
RE: [syndication] Master RSS List, Merging, and Updating...



If you are trying to get rid of duplicates, perhaps you can
make use of the Headline Viewer aliases file:

	http://www.vertexdev.com/chv_aliases.xml

This file is maintained by hand, at great personal cost. It
lists all of the URLs under which the same basic content seems
to show up. I have not updated it for a while, but it only grows.

This is not an abstract problem. Look at this entry:

  <alias>
    <url>http://alchemy.openjava.org/alchemyRSS91.XML.xml</url> 
    <url>http://alchemy.openjava.org/alchemyrss-e.xml</url> 
    <url>http://alchemy.openjava.org/alchemyrss.xml</url> 
    <url>http://alchemy.openjava.org/alchemyrss91.xml</url> 
    <url>http://internetalchemy.org/rss.php3</url> 
    <url>http://theweb.startshere.net/channels/2/RSS91.XML</url> 
    <url>http://theweb.startshere.net/channels/264/RSS91.XML</url> 
    <url>http://www.fdc.co.uk/alchemy/alchemyrss.xml</url> 
  </alias>

That's 8 different names for the same content. Some of these
may be obsolete, but they might still be on someone's list;
it's easier for me to keep them than to check.
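
For what it's worth, here is a rough sketch of how a list builder
might use an aliases file like this to collapse duplicates. It is
only an illustration: the <aliases> root element, the choice of the
first <url> in each group as the "canonical" one, and the function
names are my assumptions, not something the file itself defines.

```python
import xml.etree.ElementTree as ET

# Small inline sample. The <aliases> wrapper element is an assumption;
# only the <alias>/<url> structure is taken from the entry shown above.
SAMPLE = """<aliases>
  <alias>
    <url>http://alchemy.openjava.org/alchemyrss.xml</url>
    <url>http://internetalchemy.org/rss.php3</url>
    <url>http://www.fdc.co.uk/alchemy/alchemyrss.xml</url>
  </alias>
</aliases>"""


def load_alias_map(xml_text):
    """Map every aliased URL to the first URL listed in its group."""
    root = ET.fromstring(xml_text)
    canonical = {}
    # iter() finds <alias> at any depth, so the root name does not matter.
    for alias in root.iter("alias"):
        urls = [u.text.strip() for u in alias.findall("url") if u.text]
        if not urls:
            continue
        for url in urls:
            canonical[url] = urls[0]
    return canonical


def dedupe(channel_urls, canonical):
    """Return the channels with aliases collapsed, keeping first-seen order."""
    seen = set()
    result = []
    for url in channel_urls:
        key = canonical.get(url, url)  # unknown URLs stand for themselves
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

Picking the first URL as the canonical one is arbitrary; any stable
choice would do, as long as every tool reading the file agrees on it.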

Assuming that you are going to publish your list(s) as XML,
it would be great if you could assign each channel a unique
ID. The ID would survive changes to the site's URL and/or title.
I have found that the lack of a unique key to the site leads
to problems and unneeded duplicates.
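
To make the unique-ID idea concrete, here is a minimal sketch of a
registry that hands out an ID once and keeps it when the URL or title
changes. The ID format ("ch00001"), the class names, and the methods
are all made up for illustration; any scheme works as long as the ID
is never derived from the URL or title.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Channel:
    channel_id: str
    title: str
    urls: list = field(default_factory=list)  # current URL first, old ones after


class ChannelRegistry:
    """Hypothetical registry: IDs are assigned once and never reused."""

    def __init__(self):
        self._next = count(1)
        self._by_id = {}
        self._id_by_url = {}

    def add(self, title, url):
        """Register a channel, or return the existing ID for a known URL."""
        if url in self._id_by_url:
            return self._id_by_url[url]
        channel_id = f"ch{next(self._next):05d}"
        self._by_id[channel_id] = Channel(channel_id, title, [url])
        self._id_by_url[url] = channel_id
        return channel_id

    def move(self, channel_id, new_url, new_title=None):
        """Record a new URL (and optionally a new title) under the same ID."""
        channel = self._by_id[channel_id]
        if new_url in channel.urls:
            channel.urls.remove(new_url)
        channel.urls.insert(0, new_url)
        self._id_by_url[new_url] = channel_id
        if new_title is not None:
            channel.title = new_title

    def lookup(self, url):
        """Return the ID for any current or former URL, or None."""
        return self._id_by_url.get(url)
```

Because old URLs stay attached to the ID, a subscriber still holding
a stale URL can be matched to the same channel instead of showing up
as a new one.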

Jeff;

-----Original Message-----
From: Morbus Iff [mailto:morbus@disobey.com]
Sent: Sunday, February 25, 2001 6:21 PM
To: syndication@yahoogroups.com; rss-dev@yahoogroups.com
Subject: [syndication] Master RSS List, Merging, and Updating...


I'd like to get a communal opinion from the list members.

For my own little RSS project (to be released soon), I'm compiling a large
list of RSS channels. I'm pulling from XMLTree, GrokSoup, Weblogs,
Userland, Moreover, and a bunch of lesser places.

For each of the major lists, I'm compiling three subset lists:

  - "still there" channels.
  - channels updated in the last month.
  - "not there" (i.e. 404) channels.

Finally, from those subset files, I'm compiling three large master
lists, devoid of duplicates (or at least as much as possible): "still
there" channels, channels updated in the last month, and channels that
were once there, but are now MIA.
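
A rough sketch of that merge step, under some assumptions: each
provider's subsets are plain lists of URLs keyed by hypothetical names
("still_there", "updated", "not_there"), duplicates are exact URL
matches only, and a URL that any provider found alive is dropped from
the MIA list. The example.com URLs in the usage are placeholders.

```python
def merge_master(sources):
    """Union each provider's three subset lists into three master lists."""
    master = {"still_there": set(), "updated": set(), "not_there": set()}
    for lists in sources.values():
        for name in master:
            # Sets discard exact duplicate URLs across providers.
            master[name] |= set(lists.get(name, ()))
    # Assumption: alive in any provider's list beats "not there".
    master["not_there"] -= master["still_there"]
    return {name: sorted(urls) for name, urls in master.items()}


sources = {
    "xmltree": {
        "still_there": ["http://example.com/a.rss", "http://example.com/b.rss"],
        "updated": ["http://example.com/a.rss"],
        "not_there": ["http://example.com/c.rss"],
    },
    "weblogs": {
        "still_there": ["http://example.com/a.rss", "http://example.com/c.rss"],
        "updated": [],
        "not_there": ["http://example.com/d.rss"],
    },
}
master = merge_master(sources)
```

This only catches duplicates with identical URLs; the same content
published under different URLs would need some extra alias mapping.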

My sticky question is: once this supreme list is created, how often (if
at all) should I recreate the list from scratch from all the other
providers (and merge the unique added entries into the supreme list)?
Should I? Or should I just keep adding to the supreme list from other
sites, submissions, and so forth?

Your comments are appreciated.

-- 
Morbus Iff
    _____
   |  (@ \
 __| <\/> |____.         Here we have Head-Wound Morbus who discovers
|  |------|    |          that flesh objects in motion tend to stay
|_ |______|_ __|          in motion until they hit metal and bleed.
  (_)      (_)

-05--- <\/> ---- <http://www.disobey.com/> --- Bad Ascii, Short Notice ----


 
