
RE: [RSS-DEV] RE: [syndication] Master RSS List, Merging, and Updating...



Quick notes before I head off to LAX and SJC (you know you travel too
much when you think in terms of airport codes instead of actual cities):

>  - besides numbering, the unique id has no purpose.

Actually, the unique id is the only piece of info about the site that
never changes. From the user's point of view, the fact that a site
gets a new URL, or is syndicated in a new format is irrelevant. By
indexing off of the unique id I can give my users a long-term stable
view of the sites that they read.
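
A rough sketch of what indexing off the unique id could look like, in Python. The names here (Channel, ChannelList, move, lookup_url) and the example URLs are made up for illustration; this is not how Headline Viewer stores its list. The point is only that the id stays fixed while the URL underneath it changes, and that the old URL can still be related to the same site.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    channel_id: int                 # never changes once assigned
    title: str                      # may change
    xml_url: str                    # may change
    old_urls: list = field(default_factory=list)


class ChannelList:
    """Hypothetical channel list keyed by a stable unique id."""

    def __init__(self):
        self._by_id = {}
        self._id_by_url = {}
        self._next_id = 1

    def add(self, title, xml_url):
        channel = Channel(self._next_id, title, xml_url)
        self._by_id[channel.channel_id] = channel
        self._id_by_url[xml_url] = channel.channel_id
        self._next_id += 1
        return channel.channel_id

    def move(self, channel_id, new_url):
        # The site got a new URL: keep the id, remember the old URL.
        channel = self._by_id[channel_id]
        channel.old_urls.append(channel.xml_url)
        channel.xml_url = new_url
        self._id_by_url[new_url] = channel_id

    def lookup_url(self, url):
        # Old and new URLs both resolve to the same channel (or None).
        return self._by_id.get(self._id_by_url.get(url))
```

A reader subscribed to id 1 keeps the same subscription after move(1, new_url); nothing on the user's side has to know that the URL changed.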

> - if we can assure getting rid of duplicates, and 404's,
>   then wouldn't the xmlUrl be the unique id, ie, if a
>   channel changes a url, then the old xmlUrl would 404
>   and be removed...

No, because there is nothing to relate one to the other.

> You mention that the ID would survive changes -
> that sounds like more work to me, actually:

Indeed, it is a lot of work. Headline Viewer (when built in a special
internal "for me only" mode) has a command which does an (N**2)/2 
comparison of titles to find some potential matches which I then sift
through to add to the aliases list. I first load in all of the external
lists from UserLand, XMLTree, and GrokSoup.
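
For illustration, the pairwise title comparison might be sketched like this in Python. This is not the Headline Viewer code; the similarity measure (difflib's SequenceMatcher), the normalize helper, and the 0.85 threshold are all assumptions picked for the example. Comparing every unordered pair once gives N*(N-1)/2 comparisons, which is roughly the (N**2)/2 mentioned above.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(title):
    # Lower-case and collapse whitespace so trivial differences do not matter.
    return " ".join(title.lower().split())


def potential_matches(titles, threshold=0.85):
    """Return (title_a, title_b, score) for pairs that look alike.

    The threshold is an arbitrary assumption; the results are only
    candidates and still need to be sifted by hand.
    """
    matches = []
    for a, b in combinations(titles, 2):   # each unordered pair once
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            matches.append((a, b, score))
    return matches
```

The candidate pairs that survive the manual sifting are what would end up in the aliases list.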

> - We go with a unique id system. A channel dies. Instead
>   of removing the channel, it has to be put in a pending
>   area so we don't lose the unique id number. The
>   author contacts us with the new xmlUrl*. We now
>   have to hunt down the unique id, and change the xmlUrl.
>   To make sure the rest of the junk is up to date, a script
>   downloads xmlUrl and reassigns title/description, etc.

I do not think we can count on the authors to (with all due respect)
know anything. It should be up to us to keep the list clean. 

Gotta go!

Jeff;

-----Original Message-----
From: Morbus Iff [mailto:morbus@disobey.com]
Sent: Monday, February 26, 2001 9:03 AM
To: rss-dev@yahoogroups.com; syndication@yahoogroups.com
Subject: [RSS-DEV] RE: [syndication] Master RSS List, Merging, and
Updating...


>	http://www.vertexdev.com/chv_aliases.xml

Yes, I had noticed this after browsing through your documentation for the
program and immediately bookmarked it as "hella useful"...

>Assuming that you are going to publish your list(s) as XML,
>it would be great if you could assign each channel a unique
>ID. The ID would survive changes to the site's URL and/or title.
>I have found that the lack of a unique key to the site leads
>to problems and unneeded duplicates.

I was indeed, and had worried about the unique id issue. I'd like to get
your thoughts on it. My thoughts are thusly:

 - besides numbering, the unique id has no purpose.
 - if we can assure getting rid of duplicates, and 404's,
   then wouldn't the xmlUrl be the unique id, ie, if a
   channel changes a url, then the old xmlUrl would 404
   and be removed...

You mention that the ID would survive changes -
that sounds like more work to me, actually:

 - We go with a unique id system. A channel dies. Instead
   of removing the channel, it has to be put in a pending
   area so we don't lose the unique id number. The
   author contacts us with the new xmlUrl*. We now
   have to hunt down the unique id, and change the xmlUrl.
   To make sure the rest of the junk is up to date, a script
   downloads xmlUrl and reassigns title/description, etc.

 - We go with the unique xmlUrl system. A url 404's, so
   it's automatically removed from the list. An author
   submits a url, a script downloads the url and assigns
   a new entry.

(*: I'm ignoring the extra time needed to make a web based frontend of RSS
modification. Should there be one?) Either way would still force us to
check the whole list for duplicate entries because overzealous authors may
pad the results (or else ignore the "we're busy. it'll take three days to
be added."). For simplicity's sake, I'm also leaving out any mention of "if
channel has 404'd three times in a row, then remove".
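
A quick sketch of the "three times in a row" rule, in Python, just to show it is not much extra work. The function name, the failures dict, and the MAX_FAILURES constant are made up for the example, and the HTTP status is passed in rather than fetched, so nothing here touches the network.

```python
MAX_FAILURES = 3  # "404'd three times in a row"


def record_check(failures, xml_url, status):
    """Record one check of xml_url; return True if it should be removed.

    failures maps xml_url -> count of consecutive 404s seen so far.
    Any non-404 status resets the count for that URL.
    """
    if status == 404:
        failures[xml_url] = failures.get(xml_url, 0) + 1
    else:
        failures.pop(xml_url, None)
    return failures.get(xml_url, 0) >= MAX_FAILURES
```

The same counter would work under either scheme; only the key changes (the xmlUrl itself, or the unique id).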

Thoughts? (Sorry - this was a rushed email. I'm heading off to work).

-- 
Morbus Iff

 Disobey has been mentioned in The Netly News, Internet World, ABC News,
   Bruce Sterling's Dead Media Notes and many more. Microsoft and 3Com
  ripped us off also... that's GOTTA mean we're important. And hell, we
  got a rise out of Playboy! With sections that have nothing to do with
    the others, you'll like at least one thing. No, really. Go there.

-07--- <\/> ---- <http://www.disobey.com/> ------- Bad Ego, Any Notice ----


 
