Re: [syndication] Persistence of items
Julian, the problem you outline might be the beginning of a spec for an open
source project: a shareable algorithm that computes the "distance" between
two strings of characters. There's no hope of getting all the different
content management systems to put a unique ID on every item. But an item
whose text differs by only one character from another's could be assumed to
be the same "item". I think there's lots of prior art on this. We've never
had the time to write the code at UserLand, but it might make sense for a
few developers to work together to create something we all can share.
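
As a starting point for discussion, here is a minimal sketch of such a
"distance" function in Python. It uses Levenshtein edit distance, which is
one well-known piece of prior art and not something anyone has agreed on
here. The names edit_distance, item_text and same_item, the dict layout of
an item, and the max_distance threshold of 1 are all assumptions made for
illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # Keep the shorter string as b so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the first i-1 chars of a
    # and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def item_text(item: dict) -> str:
    """Join the fields of an RSS item into one string (assumed layout).
    Missing fields, such as an absent link, count as empty."""
    fields = ("title", "link", "description")
    return "\n".join(item.get(name) or "" for name in fields)


def same_item(old: dict, new: dict, max_distance: int = 1) -> bool:
    """Treat two items as the same if their text is within max_distance
    edits. The default threshold of 1 is an assumption, not a tested value."""
    return edit_distance(item_text(old), item_text(new)) <= max_distance
```

A caller would compare each incoming item against the items already stored
for that channel and reuse the stored key when same_item returns True. A
fixed threshold of one character is probably too strict for long
descriptions; scaling it by the length of the text might work better, but
that would need testing against real feeds.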
Dave
----- Original Message -----
From: "Julian Bond" <julian@netmarketseurope.com>
To: <syndication@yahoogroups.com>
Sent: Sunday, March 18, 2001 9:24 AM
Subject: [syndication] Persistence of items
> Let's say you take an RSS feed from one source. You store the results in
> a dbms. You collect the feed regularly. The source updates the feed by
> adding items at the top and knocking them off the bottom so there's
> always 15. Is there any easy way of identifying a particular item
> between updates? I want the item to be persistent in my dbms so I can do
> other things with it.
>
> Is the only way to look for an exact match on the whole item, so the key
> would be Title+link+description for my channel_id?
>
> In reality, how often does the link change for what is in fact the same
> item, in which case I could just key on the link? But then RSS 0.92 made
> link optional and typically Manila and RU RSS feeds have no item.link,
> damn!
>
> A related problem is that I receive the same RSS item from several
> feeds. That is, the link is the same, the title is usually the same and
> the description is similar, if not the same. Is it reasonable in the
> general case to say that if the item.link is the same, the item is
> referring to the same source story?
>
> --
> Julian Bond eMail: julian@netmarketseurope.com
> HomeURL: http://www.shockwav.demon.co.uk/
> WorkURL: http://www.netmarketseurope.com/
> WebLog: http://roguemoon.manilasites.com/
> M: +44 (0)77 5907 2173 T: +44 (0)20 7420 4363
> ICQ:33679668 tag:So many words, so little time