Robot Discovery
This is not fully thought through, but bear with me. It's a response
to one of the ideas floating around here about a consistent way to
discover whether a site publishes RSS. As we were talking about this
it occurred to me that the problem is not limited to RSS. A site might
well have many XML-based files available. It might also publish many
XML-based web services. At the moment the emphasis is all on
aggregators and indexers trying to locate these, and on builders
trying to promote them. Perhaps a standard way for builders to publish
their existence would turn this on its head.
Imagine a discovery.xml, somewhat similar to robots.txt. This would be
a single file in the root of the website listing all the XML available
at that site. Each entry would consist of a single parameter: the URL
of the XML resource, or the URL of a deeper list. A spider reading
this would then have to look at each entry to determine its type, and
perhaps use that to go off and look further. So we might have:
discovery.xml => mainnews.rss
              => subscriptions.opml
              => sitemeta.dc
              => feedlist.ocs     => subcategory.rss
              => servicelist.wsdl => getstockquote
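
To make this concrete, here's a rough sketch of the walk such a
spider might do. Both the discovery.xml vocabulary (a <discovery>
root holding <entry href="..."> children) and the type-sniffing
table are pure invention on my part, just to show the shape of the
thing; none of it is a standard:

import xml.etree.ElementTree as ET

# Stand-in for real HTTP fetches; a live spider would use
# urllib.request.urlopen(url).read() here instead.
PAGES = {
    "http://example.com/discovery.xml": """
        <discovery>
          <entry href="http://example.com/mainnews.rss"/>
          <entry href="http://example.com/feedlist.xml"/>
        </discovery>""",
    "http://example.com/mainnews.rss": "<rss version='0.92'/>",
    "http://example.com/feedlist.xml": """
        <discovery>
          <entry href="http://example.com/subcategory.rss"/>
        </discovery>""",
    "http://example.com/subcategory.rss": "<rss version='0.92'/>",
}

def fetch(url):
    return PAGES[url]

# Guess a document's type from its root element name.
SNIFF = {
    "rss": "RSS feed",
    "opml": "OPML outline",
    "definitions": "WSDL service list",
    "discovery": "deeper list",
}

def crawl(url, depth=0):
    root = ET.fromstring(fetch(url))
    kind = SNIFF.get(root.tag.split("}")[-1].lower(), "unknown")
    print("  " * depth + url + " -> " + kind)
    if kind == "deeper list":  # recurse into nested lists
        for entry in root.iter("entry"):
            crawl(entry.get("href"), depth + 1)

crawl("http://example.com/discovery.xml")

The stub fetcher stands in for real HTTP requests; the interesting
part is dispatching on the sniffed root element and recursing into
deeper lists.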
Now, like all standards, this would need very widespread
implementation to work. I suspect there are already plenty (i.e. more
than one) of potential formats for the file itself. I can also see
problems where the individual entries are not single files but CGI
scripts taking multiple parameters.
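For fixed parameters the query string can simply live in the URL; the
harder case is where the parameters vary per request. Two purely
hypothetical entries, just to illustrate:

  <entry href="http://example.com/cgi-bin/news.cgi?cat=tech&amp;format=rss"/>

  <entry href="http://example.com/cgi-bin/quote.cgi" params="symbol"/>

The first is still just a URL, so it fits the scheme. The second would
need the entry to describe its parameters somehow, which is territory
WSDL already covers for web services.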
Is this something new? Or am I just re-hashing work that's already under
way?
--
Julian Bond email: julian_bond@voidstar.com
CV/Resume: http://www.voidstar.com/cv/
WebLog: http://www.voidstar.com/
HomeURL: http://www.shockwav.demon.co.uk/
M: +44 (0)77 5907 2173 T: +44 (0)192 0412 433
ICQ:33679568 tag:So many words, so little time