
Re: Standard RSS location?



--- In syndication@egroups.com, "james@x" <james@x> wrote:
> Kevin
> 
> > - RSSRegistry (moreover,xmltree,etc) just registers ocs files.
> I agree that xmlTree should contain OCS files, and allow RSS data
> consumers to grab the metadata of the RSS channels themselves if they
> wish.  I will be confirming and adding details of the OCS files that
> I know about to xmlTree this week.  But so far, the only publishers
> who bother with the additional step of OCS are those who produce more
> than 10 channels (only a handful).  For publishers who produce only
> one or two channels, writing and updating an OCS file represents a
> significant increase in work.  Having said that, the opportunity to
> allow more distributed discovery is exciting.
> 
> Ironically, it seems like this role for OCS would resemble the
> sitemap role originally planned for RSS.  Meanwhile, we are
> discussing how to move OCS along.  My particular interest is in
> adding category information to OCS to allow OCS consumers to browse
> OCS instead of having to work with a large monolithic document
> containing details of thousands of RSS files.

While OCS superficially seems like a good idea, it has its
limitations.

I recently added RSS to my site http://www.growinglifestyle.com/ (Home
and Garden - the links are at the bottom of almost every page except
the home page)

More specifically, I added RSS almost everywhere on my site. 
Several hundred topics X several hundred sources X arbitrary searches
= billions of RSS feeds
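To make the arithmetic concrete (the figures below are illustrative
round numbers, not exact counts from my site):

```python
# Back-of-envelope count of distinct feed URLs on a site like mine.
# All numbers are illustrative assumptions, not measured values.
topics = 300          # "several hundred topics"
sources = 300         # "several hundred sources"
searches = 50_000     # arbitrary search queries, effectively unbounded

topic_source_feeds = topics * sources        # 90,000 fixed feeds
total_feeds = topic_source_feeds * searches  # billions of combinations

print(f"{topic_source_feeds:,} topic x source feeds")
print(f"{total_feeds:,} possible feed URLs")
```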

Why did I do this?

- Because I could (it only took an hour or two - RSS is very simple)

- Because it made sense to me.  Why should everybody get the same RSS
feed, when I can personalise it to their particular interests?  Some
sites target very narrow niches, and others broad areas.  A single RSS
feed can't satisfy them all.

- Because it allows all my content to be syndicated, not just the
front page/headlines

- Because my typical update event generates a thousand or more new
links.  A single RSS file can only show the 15 "most recent" links.

This sounds like just the thing for OCS.

But wait!  I'm getting several hundred hits per day from single RSS
feeds (not counting on-demand end-user RSS grabs - just automated
regular grabs).  On its own, that's fine, but if I register my billion
RSS feeds, my server will be toast.

If I register just my top 500 RSS feeds (say 1 per source, 1 per
subject area), my server may still have to field several hundred
thousand RSS requests per day.  There are only 86,400 CPU seconds in
my server's day, and with a 400MB full-text database, my server may
not be too happy.
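The budget works out something like this (the per-feed aggregator
count is a guess for illustration; real traffic would vary widely):

```python
# Rough CPU budget per request if 500 registered feeds each get
# polled hourly.  "aggregators_per_feed" is an assumed figure.
feeds = 500
aggregators_per_feed = 25
requests_per_day = feeds * aggregators_per_feed * 24  # 300,000/day

cpu_seconds_per_day = 86_400
budget = cpu_seconds_per_day / requests_per_day       # under 0.3 s each

print(f"{requests_per_day:,} requests/day")
print(f"{budget:.2f} CPU seconds available per request")
```

Less than a third of a CPU second per request, and that is before the
server does anything else at all.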

Adding to the problem, it seems that very few RSS aggregators wait
more than a couple of seconds between requests.  On the hour, every
hour, they come and bang away at my server.  My content may only be
updated weekly, but hourly they come in one big group.
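A kinder aggregator would spread its fetches out instead of firing
them all at the top of the hour.  A minimal sketch of that idea (the
feed URLs and base interval are made-up placeholders):

```python
import random

# Hypothetical feed list; these URLs are placeholders.
feeds = [
    "http://example.com/rss/topic1.xml",
    "http://example.com/rss/topic2.xml",
]

BASE_INTERVAL = 3600  # poll roughly hourly, in seconds

def next_fetch_delay():
    # Jitter each fetch by +/-20% so every client doesn't hit the
    # server at the same second past the hour.
    return BASE_INTERVAL * random.uniform(0.8, 1.2)

schedule = {url: next_fetch_delay() for url in feeds}
```

With every client jittering independently, the hourly spike flattens
into a fairly even trickle.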

And what about the structuring / hierarchy of RSS feeds?  OCS does not
handle this at the moment.

Now I'm not complaining.  It's just that the reason I am not
publishing a list of all my RSS feeds (and I think I publish more of
them than any other site on the net - not that that means very much;
like I said, it only took an hour or two) is that I cannot currently
trust the bulk RSS aggregators to behave sensibly with such a list.

Most RSS aggregators are built with the assumption that a site
publishes only 1 or at most a few RSS feeds.  So a single RSS
aggregator will make only a very small impact on any given site.  Why
bother implementing code to automatically infer optimal update
intervals, when the impact is so small?

It doesn't help matters that the RSS standard method for describing
update intervals is so strangely constructed and obtusely worded that
I've yet to see an RSS reader that knows how to obey it.
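For what it's worth, RSS 0.91 does define skipHours/skipDays on the
channel, and a reader could honour them in a few lines.  A sketch,
assuming hours are interpreted in GMT (the sample channel below is
made up for illustration):

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# A made-up RSS 0.91 channel that asks readers to skip weekends
# and the small hours of the morning.
SAMPLE = """<rss version="0.91"><channel>
  <title>Example</title>
  <skipDays><day>Saturday</day><day>Sunday</day></skipDays>
  <skipHours><hour>2</hour><hour>3</hour><hour>4</hour></skipHours>
</channel></rss>"""

def should_fetch(rss_text, now):
    """Return False if the channel asks readers to skip this day/hour."""
    chan = ET.fromstring(rss_text).find("channel")
    days = {d.text for d in chan.findall("skipDays/day")}
    hours = {int(h.text) for h in chan.findall("skipHours/hour")}
    if now.strftime("%A") in days:
        return False
    if now.hour in hours:
        return False
    return True

# Monday 03:00 GMT falls inside the skipHours above
print(should_fetch(SAMPLE, datetime(2000, 7, 3, 3, 0, tzinfo=timezone.utc)))
```

Nothing exotic is required; the problem is that nobody bothers.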

So I think the bottleneck is not in the discovery mechanisms for RSS
files, but in the RSS standard and the RSS aggregators.

Specifically:

 - Both the standard and the aggregators need to acknowledge the
possibility that RSS feeds may be infinite in number, rather than a
small positive integer.  Systems that work with a few thousand RSS
feeds will not be able to scale gracefully to millions of feeds.

 - Both the standard and the aggregators need to acknowledge that
simply reading all RSS files on the hour, every hour does not scale
very well (for either the source, the aggregator, or the "destination
portal").
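One way an aggregator could address both points is to combine HTTP
conditional GETs with a backoff: poll less often when a feed turns out
not to have changed.  A sketch of the backoff logic only (no network
code; the interval bounds are arbitrary assumptions):

```python
MIN_INTERVAL = 3600           # never poll more often than hourly
MAX_INTERVAL = 7 * 24 * 3600  # weekly content deserves weekly polls

def next_interval(current, changed):
    """Halve the polling interval when a feed changed, double it when
    it didn't, staying within [MIN_INTERVAL, MAX_INTERVAL]."""
    if changed:
        return max(MIN_INTERVAL, current // 2)
    return min(MAX_INTERVAL, current * 2)
```

Paired with an If-Modified-Since request header, a 304 Not Modified
response counts as "unchanged" and costs the server almost nothing, so
a weekly-updated site like mine would quickly settle at weekly polls.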

What is holding me back from further exploiting RSS/OCS is not the
difficulty of the standards (both are very easy to generate), nor the
difficulty of resource discovery (both userland and xmltree are quite
workable), but the lack of controls/limits inherent in the RSS
standard and the RSS readers.

happy syndicating,

Steve