
Re: [syndication] site-wide metadata discovery



This is a very interesting idea.

I just went for a quick look and couldn't find where it says "unrecognized
headers should be ignored."

And then we'd have to know whether it works in practice, too; just because a
spec says robots should behave a certain way doesn't mean they do.

But kudos for coming up with a new angle. Worth exploring, imho.

Dave


----- Original Message -----
From: "Chad Everett" <yahoogroups@jayseae.cxliv.org>
To: <syndication@yahoogroups.com>
Sent: Wednesday, October 15, 2003 5:13 PM
Subject: [syndication] site-wide metadata discovery


> There have been some nice comments about discoverability in general being
> broken, but they don't really solve the problem.  Here's an off-the-wall
> idea.  What
> about adding functionality to a file that's already present, namely the
> robots.txt file?  As it's tolerated already in many cases, let's make it
> useful.
>
> Rather than user-agent/disallow recordsets, it could use something like:
>
> Site-Index:
> Public-Feeds: myPublicFeeds.opml
>
> According to the standard, unrecognized headers should be ignored, so this
> shouldn't affect any "normal" robot/spider/crawler.  But when an app came
> along that did recognize this recordset, it could get the data it needs.  No
> new file name clutter, no link clutter.  You could still use those if you
> want, of course.  :)
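
For concreteness, the ignore-unknown-fields behavior being counted on here can be sketched in Python. The field names (Site-Index, Public-Feeds) come from the example above; everything else, including the function name, is hypothetical:

```python
# Sketch of the proposed discovery step: scan robots.txt for extension
# fields (e.g. "Public-Feeds"), collecting any "Field: value" pair and
# leaving it to the caller to pick out the ones it recognizes.

def parse_extension_fields(robots_txt: str) -> dict:
    """Collect 'Field: value' pairs from robots.txt, keyed case-insensitively."""
    fields = {}
    for line in robots_txt.splitlines():
        line = line.split('#', 1)[0].strip()   # drop comments and whitespace
        if ':' not in line:
            continue                           # not a field line; ignore it
        name, _, value = line.partition(':')
        fields.setdefault(name.strip().lower(), value.strip())
    return fields

robots = """User-agent: *
Disallow: /private/

Site-Index:
Public-Feeds: myPublicFeeds.opml
"""

fields = parse_extension_fields(robots)
print(fields.get('public-feeds'))  # -> myPublicFeeds.opml
```

A conventional crawler reading the same file would simply never ask for the `public-feeds` key, which is the "unrecognized headers are ignored" behavior in miniature.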
>
> Of course, this doesn't help much if you're talking about folder-level data,
> since robots.txt exists only at the root of the domain.  But at the very
> least, an app could read the root to determine the file name, then look for
> that file name in the current folder.
>
> For instance, if you're browsing example.com/folder, your browsing
> application of choice reads example.com/robots.txt, finds that the public
> feeds are stored in myPublicFeeds.opml, and so looks in
> example.com/folder/myPublicFeeds.opml for the data.  If you want data below
> or above the current location, apply the same logic - traverse the folder
> structure and fetch the named file.
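
The folder-level lookup just described amounts to joining the root-declared file name onto whatever folder you are browsing; a small helper sketch, with the function name being an assumption:

```python
# Hypothetical resolution rule from the post: the feed file name comes
# from the root robots.txt once, then the same name is looked up in the
# folder currently being browsed.
from urllib.parse import urlsplit, urlunsplit

def feed_url_for(folder_url: str, feed_name: str) -> str:
    """Join the root-declared feed file name onto the current folder."""
    parts = urlsplit(folder_url)
    path = parts.path
    if not path.endswith('/'):
        path += '/'
    return urlunsplit((parts.scheme, parts.netloc, path + feed_name, '', ''))

print(feed_url_for('http://example.com/folder', 'myPublicFeeds.opml'))
# -> http://example.com/folder/myPublicFeeds.opml
```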
>
> This might be preferred in some cases, where file names should be
> standardized across the domain.  In other situations, perhaps an alternative
> to allow for differences between folders:
>
> Site-Index: folder
> Public-Feeds: myOtherFeeds.opml
>
> Site-Index: another
> Public-Feeds: evenMoreFeeds.opml
>
> Or even add include functionality to the file:
>
> Site-Index: include folder/robots.txt
>
> And let each subdivision of the domain create its own file, which gets read
> into the "master" as the data is parsed.  Naturally this could create a
> whole bunch of crawling to get all the data, so this last idea might not be
> the best - but it could be there for those who want the functionality at the
> cost of the bandwidth/resources required.  What's more, the include allows
> for different file names in different folders.  Only the top-level
> robots.txt is "standardized", and that file is already there in most cases.
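
A minimal sketch of the include-driven crawl, assuming the `Site-Index: include <path>` syntax proposed above; the fetch callback stands in for HTTP retrieval so the sketch stays offline, and the depth cap is an added safeguard against include loops:

```python
# Walk a "master" robots.txt, following "Site-Index: include <path>"
# lines into sub-folder files and collecting every Public-Feeds value.
# fetch() is caller-supplied (here a dict lookup); a real client would
# fetch each path over HTTP.

def collect_feeds(path, fetch, depth=3):
    feeds = []
    if depth < 0:
        return feeds                          # include chain too deep; stop
    for line in fetch(path).splitlines():
        name, _, value = line.partition(':')
        name, value = name.strip().lower(), value.strip()
        if name == 'site-index' and value.startswith('include '):
            feeds += collect_feeds(value[len('include '):], fetch, depth - 1)
        elif name == 'public-feeds':
            feeds.append(value)
    return feeds

site = {
    'robots.txt': 'Public-Feeds: myPublicFeeds.opml\n'
                  'Site-Index: include folder/robots.txt',
    'folder/robots.txt': 'Public-Feeds: evenMoreFeeds.opml',
}
print(collect_feeds('robots.txt', site.__getitem__))
# -> ['myPublicFeeds.opml', 'evenMoreFeeds.opml']
```

As the post notes, each include is another round trip, so the crawl cost grows with the number of subdivisions; the depth cap keeps that (and any accidental cycle) bounded.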
>
> If these are all really bad ideas, I blame Mark's medicine.  :)
>
>
> ---
> Chad Everett
> yahoogroups@jayseae.cxliv.org
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/