
Re: [syndication] site-wide metadata discovery



Chad,

I think you're onto a good idea here.

Perhaps we can all wander forth and find the reasons, pro and con, for using
extensions to the robots.txt file.  Let's see if we can dig up circumstances or
use cases that demonstrate whether or not doing this is workable.  I'd hate to
see us latch onto the idea only to have the experts trash it and open up a whole
permathread of arguments on it.

What I will ask is that we find a way to absolutely require some sort of type
definition.  I'd very much like to see this as a way not only to point to the
data but also to indicate what format the data is using.  For example:

Site-Index: http://somesite.example.com/feedindex.xml http://purl.org/ocs/directory/0.5/

Here we're saying it's a site-index (or whatever syntax we eventually agree
upon), that it can be found at that feedindex.xml location, and that it's using
the OCS version 0.5 format.  I'm completely open to any number of different
formats being allowable in this context.  I'm also suggesting that the absence
of a type should not imply any one particular format as the default.  Let's not
even go there.
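To make the proposal concrete, here's a rough sketch of how a crawler might pull
such lines out of a robots.txt file.  The "Site-Index" field name and the
optional trailing format URL are assumptions from this thread, not part of any
published spec; note the draft forbids breaking a line, so location and format
sit on one line.

```python
# Sketch of parsing a hypothetical "Site-Index" extension line from
# robots.txt.  The field name and the trailing format URL are proposals
# from this thread only -- nothing here is standardized.

def parse_site_index(robots_txt):
    """Return a list of (location, format_url_or_None) tuples."""
    entries = []
    for line in robots_txt.splitlines():
        # Drop trailing comments (naive: assumes no '#' inside the URLs).
        line = line.split('#', 1)[0].strip()
        if not line.lower().startswith('site-index:'):
            continue
        parts = line.split(':', 1)[1].split()
        if not parts:
            continue  # bare field with no value: skip, no default format
        location = parts[0]
        fmt = parts[1] if len(parts) > 1 else None
        entries.append((location, fmt))
    return entries

example = """User-agent: *
Disallow: /private/
Site-Index: http://somesite.example.com/feedindex.xml http://purl.org/ocs/directory/0.5/
"""
print(parse_site_index(example))
```

Robots that implement the existing spec should simply ignore the unrecognized
field, which is what makes the extension backward compatible.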

At the same time I'll also support using <head> section <link> tags for the
sites that want to do this as an aid to existing HTML handling tools (like
browsers).
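For the HTML route, a <link> element in the <head> could carry the same two
pieces of information.  The rel value "site-index" and the use of the type
attribute to hold the format URL below are placeholders of my own, not agreed
names; a minimal extraction sketch using Python's standard-library parser:

```python
# Sketch: pull a hypothetical site-index <link> out of an HTML <head>.
# The rel value "site-index" and the use of type= to carry the format
# URL are illustrative assumptions only.
from html.parser import HTMLParser

class LinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == 'link':
            a = dict(attrs)
            if a.get('rel') == 'site-index':
                self.found.append((a.get('href'), a.get('type')))

page = """<html><head>
<link rel="site-index"
      href="http://somesite.example.com/feedindex.xml"
      type="http://purl.org/ocs/directory/0.5/">
</head><body></body></html>"""

finder = LinkFinder()
finder.feed(page)
print(finder.found)
```

Browsers and existing HTML tools already know how to walk <link> elements,
which is the appeal of supporting this alongside robots.txt.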

This will help separate the issues of discovery and content.  Each has its own
set of issues and they deserve individual analysis.

-Bill Kearney
Syndic8.com

----- Original Message -----
From: "Chad Everett" <yahoogroups@jayseae.cxliv.org>
To: <syndication@yahoogroups.com>
Sent: Wednesday, October 15, 2003 6:14 PM
Subject: RE: [syndication] site-wide metadata discovery


> Only thing more recent I found is this Internet Draft, dated November of
> 1996 (expired June 4, 1997).
>
> http://www.robotstxt.org/wc/norobots-rfc.html
>
> It has some more words about extending the specification, specifically:
>
>    Lines with Fields not explicitly specified by this specification
>    may occur in the /robots.txt, allowing for future extension of the
>    format. Consult the BNF for restrictions on the syntax of such
>    extensions. Note specifically that for backwards compatibility
>    with robots implementing earlier versions of this specification,
>    breaking of lines is not allowed.
>
> I think this would fall under:
>
>     extension    = token : *space value [comment] CRLF
>
> Which (I think) means that:
>
> Site-Index:
>
> Would not be valid, but:
>
> Site-Index: default
> Site-Index: folder
> Site-Index: another
> Site-Index: include folder/robots.txt
>
> Would be okay.
>
> Of course, since it's expired, I don't know that it's useful.  But thought
> I'd add it to the discussion anyway.
>
>
> -----Original Message-----
> From: Chad Everett [mailto:yahoogroups@jayseae.cxliv.org]
> Sent: Wednesday, October 15, 2003 5:53 PM
> To: 'syndication@yahoogroups.com'
> Subject: RE: [syndication] site-wide metadata discovery
>
>
> Hi Dave -
>
> Thanks for the vote of confidence.  :)
>
> Here's where I found that text:
>
> http://www.robotstxt.org/wc/norobots.html
>
> Under "The Format", at the end of the sentence just prior to "User-agent".
>
> It actually says "Unrecognised headers are ignored".  I took some small,
> hopefully irrelevant, liberties with my interpretation.  :)
>
> Chad.
>
> -----Original Message-----
> From: Dave Winer [mailto:dave@userland.com]
> Sent: Wednesday, October 15, 2003 5:44 PM
> To: syndication@yahoogroups.com
> Subject: Re: [syndication] site-wide metadata discovery
>
>
> This is a very interesting idea.
>
> I just went for a quick look and couldn't find where it says "unrecognized
> headers should be ignored."
>
> And then we'd have to know if it works in practice too, just because a spec
> says robots should work some way doesn't mean they do.
>
> But kudos for coming up with a new angle. Worth exploring, imho.
>
> Dave
>