
RE: [syndication] Question: discovering sources

To: <syndication@egroups.com>
Subject: RE: [syndication] Question: discovering sources
From: "Jeff Barr" <jeff@vertexdev.com>
Date: Thu, 12 Oct 2000 14:08:11 -0700
Importance: Normal
In-reply-to: <NDBBKBHFKDFKAOFDBHAHEEIACKAA.david@moreover.com>

This is a great idea, but it is harder than it looks.

Headline Viewer has an internal site list and it also loads
the lists from Userland, XMLTree, and Grok Soup. The
unscrubbed union of these lists is not a pretty sight.
There is a lot of effective duplication -- sites
register themselves under slightly different URLs, or
they register different formats, or they change domains.
There are also issues with intermediate aggregators that
republish a site's feed under their own URLs (some of these
have gone away now that StartsHere.net is mostly history).

To deal with this, I have built an alias file which
attempts to organize all of the various URLs for a site
into an "alias set". The file is available at

	http://www.vertexdev.com/chv_aliases.xml

Here is an example alias entry:

  <alias>
  <url>http://chaosn.com/xml/chaos.rdf</url>
  <url>http://chaosn.com/xml/chaos.xml</url>
  <url>http://chaosnetwork.com/xml/chaos.xml</url>
  <url>http://chaosnetwork.com/xml/chaos.rdf</url>
  <url>http://chaosn.weblogger.com/xml/scriptingnews2.xml</url>
  <url>http://theweb.startshere.net/channels/229/RSS91.XML</url>
  <url>http://theweb.startshere.net/channels/265/RSS91.XML</url>
  </alias>
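
As a rough illustration of how a client might consume such a file, here is a minimal Python sketch. It assumes the entries sit inside some root element (the `<aliases>` wrapper below is hypothetical; the post does not show the file's top-level structure) and it arbitrarily treats the first URL of each set as the canonical one. The function names `build_alias_map` and `dedupe` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: the <aliases> root element is an assumption.
# Only the <alias>/<url> structure comes from the example entry above.
SAMPLE = """<aliases>
  <alias>
  <url>http://chaosn.com/xml/chaos.rdf</url>
  <url>http://chaosn.com/xml/chaos.xml</url>
  <url>http://chaosnetwork.com/xml/chaos.xml</url>
  <url>http://chaosnetwork.com/xml/chaos.rdf</url>
  <url>http://chaosn.weblogger.com/xml/scriptingnews2.xml</url>
  <url>http://theweb.startshere.net/channels/229/RSS91.XML</url>
  <url>http://theweb.startshere.net/channels/265/RSS91.XML</url>
  </alias>
</aliases>"""


def build_alias_map(xml_text):
    """Map every URL in an alias set to the first URL listed in that set."""
    root = ET.fromstring(xml_text)
    alias_map = {}
    # iter() finds <alias> elements at any depth, whatever the root is called.
    for alias in root.iter("alias"):
        urls = [u.text.strip() for u in alias.findall("url") if u.text]
        if not urls:
            continue
        canonical = urls[0]  # assumption: first URL stands for the whole set
        for url in urls:
            alias_map[url] = canonical
    return alias_map


def dedupe(urls, alias_map):
    """Collapse a list of feed URLs so each alias set appears only once."""
    seen = set()
    result = []
    for url in urls:
        canonical = alias_map.get(url, url)  # unknown URLs stand for themselves
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result


alias_map = build_alias_map(SAMPLE)
merged = dedupe(
    [
        "http://chaosnetwork.com/xml/chaos.rdf",
        "http://theweb.startshere.net/channels/229/RSS91.XML",
        "http://example.org/other.rdf",
    ],
    alias_map,
)
```

Here the first two URLs fall into the same alias set and collapse into one entry, while the unlisted `example.org` URL passes through unchanged.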

The file currently has 2122 lines of text and it contains
814 entries. Building it and keeping it current is a real
pain -- definitely the least exciting part of my release
process. I have a debug Headline Viewer which does some
semi-automatic detection of potential duplicates.
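
For anyone who wants to experiment with duplicate detection, one possible heuristic is sketched below in Python. This is not how Headline Viewer does it; it is only an assumption-laden illustration that groups URLs differing in a leading "www." or in the feed file extension (the "different formats" case above). The names `candidate_key` and `potential_duplicates` and the extension list are invented for this example. It will not catch domain changes, which still need a human eye.

```python
from collections import defaultdict
from urllib.parse import urlsplit
import posixpath

# Assumed set of extensions that distinguish formats of the same feed.
FEED_EXTENSIONS = {".rdf", ".xml", ".rss"}


def candidate_key(url):
    """Reduce a feed URL to a (host, path) key that ignores www. and format."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/").lower()
    stem, ext = posixpath.splitext(path)
    if ext in FEED_EXTENSIONS:
        path = stem  # chaos.rdf and chaos.xml share the key /xml/chaos
    return (host, path)


def potential_duplicates(urls):
    """Return groups of URLs sharing a key; candidates for manual review."""
    groups = defaultdict(list)
    for url in urls:
        groups[candidate_key(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]


candidates = potential_duplicates(
    [
        "http://chaosn.com/xml/chaos.rdf",
        "http://chaosn.com/xml/chaos.xml",
        "http://chaosnetwork.com/xml/chaos.xml",
        "http://www.chaosnetwork.com/xml/chaos.rdf",
        "http://example.org/news.rdf",
    ]
)
```

The output is a list of candidate groups, not a verdict: two feeds on the same host can legitimately differ only by extension, so each group would still need to be confirmed before being written into an alias set.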

The file is definitely available for public use, and it
would be great if other users could contribute to it in
some way.

I am happy to help push this forward in any way possible.

Jeff


Jeff Barr - Vertex Development
Office: 425-868-4919  *** New Number ***
Home:   425-836-5624
mailto:jeff@vertexdev.com
http://www.vertexdev.com/~jeff
http://jeffbarr.editthispage.com/
4610 191st Place NE. Redmond, WA


-----Original Message-----
From: David Galbraith [mailto:david@moreover.com]
Sent: Thursday, October 12, 2000 12:46 PM
To: syndication@egroups.com
Subject: RE: [syndication] Question: discovering sources


I guess a 'meta' OCS format that aggregated OCS category listings from a
number of sites and removed duplicates would be the way to go,
e.g. http://w.moreover.com/categories/ocs/ocsdirectory.rdf plus others from
Userland, Meerkat, Netscape, etc.
Dave

. . . . . . . . . . . . . . . . .
David Galbraith - Chief Architect, founder
Moreover.com - the webfeed company
david@moreover.com
415-577-8828 (US)
0777-565-8880 (UK)
favorite webfeed:
http://www.moreover.com/xml

