Re: [syndication] Re: State of unique IDs on newsfeed items?
In article <9p3tj1+qno9@eGroups.com>, Bill Kearney
<wkearney99@hotmail.com> writes
>I'm asking as to whether the various tools out there used to create
>and deliver newsfeeds are supporting unique IDs for their items. As
>links back to both the HTML presentation and to an XML structure.
IMHO, the vast majority of RSS writers have an internal unique
representation for the item. This may or may not be reflected in the URL
to the html representation of the item, and may or may not end up in the
<link>.
>How many of the clients out there that aren't using RDF would break
>if other elements started appearing in the XML?
My guess is very few. The multiplicity of RSS standards means that
Readers have to be resilient to new elements and are typically just
looking for <title>, <link>, <description> in an <item> element. Any
other new elements in <item> are likely to be just ignored.
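As a rough illustration of that kind of resilient Reader, here is a minimal Python sketch. It assumes a plain, non-namespaced RSS 0.9x feed (an RDF/RSS 1.0 feed would need namespace handling), and the names read_items, WANTED and SAMPLE are made up for the example, not taken from any real tool.

```python
import xml.etree.ElementTree as ET

# The only sub-elements of <item> this hypothetical Reader cares about.
WANTED = ("title", "link", "description")

def read_items(feed_xml):
    """Return one dict per <item>, silently skipping unknown sub-elements."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        fields = {}
        for child in item:
            if child.tag in WANTED:
                fields[child.tag] = (child.text or "").strip()
            # Any other element (for example a new <id>) is simply ignored.
        items.append(fields)
    return items

# Illustrative feed: the <id> element is the "new" element a Writer might add.
SAMPLE = """<rss version="0.91"><channel><title>Example</title>
<item><title>First</title><link>http://example.com/1</link>
<description>Hello</description><id>abc@example.com</id></item>
</channel></rss>"""

if __name__ == "__main__":
    print(read_items(SAMPLE))
```

A Reader written this way carries on working when an extra element turns up inside <item>; it just never sees it.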
Why would we want a unique id on <item>?
Well, all aggregators that store <item> locally from multiple feeds need
to try and ensure they don't store the same <item> from the same
<channel> twice. They all attempt this by looking for uniqueness of
Title, Link and if necessary Description. It's not very effective at the
moment on many feeds because some of those sub-elements are missing, or
Link is used differently, or a tiny and inconsequential edit (possibly
of another Item) at the Writer end makes the Item look different.
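To make the weakness concrete, here is a small Python sketch of that style of duplicate check. It is only an assumption about how an aggregator might do it; item_key and store_new are hypothetical names, and items are represented as plain dicts.

```python
def item_key(item):
    """Build a key from Title, Link and Description (missing ones become '')."""
    return (
        item.get("title", ""),
        item.get("link", ""),
        item.get("description", ""),
    )

def store_new(seen, items):
    """Return the items not seen before, recording their keys in `seen`."""
    new = []
    for item in items:
        key = item_key(item)
        if key not in seen:
            seen.add(key)
            new.append(item)
    return new

if __name__ == "__main__":
    seen = set()
    original = {"title": "A", "link": "http://example.com/a",
                "description": "Some text"}
    # Same item, but the Writer fixed a trivial typo in the description.
    edited = dict(original, description="Some text.")
    print(len(store_new(seen, [original])))  # stored once
    print(len(store_new(seen, [original])))  # exact repeat is caught
    print(len(store_new(seen, [edited])))    # tiny edit slips through as "new"
```

The exact repeat is filtered out, but the one-character edit produces a different key and the item is stored a second time.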
So it looks like it would be useful to have a unique ID on item as a
hint to Readers. But how should the situation be handled where an Item
is modified? Should this generate a new Unique ID or re-use the old one?
If it re-uses the old one, then in practice, the Reader may miss the
update as it is old in terms of the stream of items.
How would we structure an ID?
Well, NNTP and SMTP solved this long ago. Message-ID is required on news
articles and near-universal on mail, and has the form
<Locally_Unique_String@Domain_Name>. This is guaranteed globally unique as
long as the local system is capable of generating unique strings. The
unique-filename call available on virtually all operating systems makes
this easy. Writers could just start adding
<id>Locally_Unique_String@Domain_Name</id> as a sub element of <item>
and nothing much would break.
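A Writer could generate such an ID along these lines. This is a sketch only: it borrows Python's standard email.utils.make_msgid to produce the Message-ID-style string, the <id> element name is the one floated above and not part of any RSS spec, and new_item_id, add_id and example.com are placeholders.

```python
from email.utils import make_msgid
import xml.etree.ElementTree as ET

def new_item_id(domain):
    """Return a Message-ID-style string, Locally_Unique_String@Domain_Name."""
    # make_msgid returns "<local@domain>"; drop the angle brackets so the
    # value can sit directly inside an XML element.
    return make_msgid(domain=domain).strip("<>")

def add_id(item, domain):
    """Append a hypothetical <id> sub-element to an <item> element."""
    id_el = ET.SubElement(item, "id")
    id_el.text = new_item_id(domain)
    return id_el.text

if __name__ == "__main__":
    item = ET.fromstring("<item><title>T</title><link>http://example.com/t</link></item>")
    add_id(item, "example.com")
    print(ET.tostring(item, encoding="unicode"))
```

The Writer would need to store the generated value alongside the item so that the same ID is emitted every time the feed is rebuilt.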
Do we even need this?
Well, the vast majority of feeds have a globally unique value in <link>
which is the URL of the html representation of the <item> (not some link
in the item). Since all aggregators (?) already use this (if it's
available) to check for uniqueness, nothing in the spec needs to change.
So it looks like the only problem is the feeds that do something else
with <link>, like leave it out altogether.
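In code terms, the check I have in mind looks something like the Python sketch below. It is a guess at typical behaviour and not any particular aggregator's logic; dedup_key and the channel_url argument are hypothetical.

```python
def dedup_key(channel_url, item):
    """Prefer <link> as the identity of an item; fall back when it is absent."""
    link = (item.get("link") or "").strip()
    if link:
        # A globally unique link identifies the item on its own.
        return ("link", link)
    # No usable link: fall back to channel plus title and description,
    # with all the fragility that implies.
    return ("fallback", channel_url,
            item.get("title", ""), item.get("description", ""))

if __name__ == "__main__":
    a = {"title": "A", "link": "http://example.com/a", "description": "x"}
    a_edited = dict(a, description="x, now edited")
    no_link = {"title": "B", "description": "y"}
    feed = "http://example.com/rss.xml"
    print(dedup_key(feed, a) == dedup_key(feed, a_edited))  # same item
    print(dedup_key(feed, no_link))                          # fallback key
```

With a unique <link> present, an edit to the title or description no longer makes the item look new; only the feeds without one drop back to the weaker comparison.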
So it seems to me that a new element is *not* needed. We just need to
persuade the existing feeds to use <link> and make sure it is globally
unique. Since only a small proportion of them fail to do this already,
this ought to be an easier task than persuading the larger proportion to
add a new element.
--
Julian Bond email: julian_bond@voidstar.com
CV/Resume: http://www.voidstar.com/cv/
WebLog: http://www.voidstar.com/
HomeURL: http://www.shockwav.demon.co.uk/
M: +44 (0)77 5907 2173 T: +44 (0)192 0412 433
ICQ:33679568 tag:So many words, so little time