
RE: [syndication] Re: New poll for syndication



Yes!

Let's focus on the use of syndicated content. 

We can do so on this list or on another one. 
Let's talk about what we are doing now and what 
we can do in the future. Perhaps this will bridge 
those of us doing cool stuff now and those of us
planning to do something cool in the future, and 
let us find some common ground driven by real 
needs of real applications.

Let's find some issues and solve them in a clean
fashion. 

I'm wondering if one of the issues here is that
some of the "namespace folks" have some kind of
future vision that the rest of us have yet to 
get. Not because they have hoarded it up and kept
it a secret, but perhaps because they've looked
at the current level of "cool" and seen where it
can go next -- if only RSS had "amazing new 
feature X". I'm not accusing anyone of anything
here. 

So let me talk about what I am doing, right here,
right now. Headline Viewer inhales content from
over 2600 sites. We ship it with 536 sites built in,
and the user can choose to load in Userland's
service list (1687 more providers), the xmlTree
list (366 more when loaded after Userland's), and
30-50 more providers from GrokSoup.

On any given day, there is always something broken.
Servers are sometimes down or busy. By far the
more difficult problem to deal with is the fact
that much of the XML is in bad shape. Here are
some real-world examples:

* I've had to drop at least one site (www.allusb.com) 
  because the XML parser could not digest the 
  non-standard encoding attribute in their XML.
  Repeated messages to the site over the course
  of several months failed to elicit a response.

* Even XML's very simple escaping rules cause
  problems. You'd all be amazed at how many sites
  suddenly become dysfunctional when "AT&T", with
  its bare ampersand, is prominent in the news. This
  is compounded by the fact that the news scrolls by
  quickly enough that troublesome headlines are
  often gone between the time I get a report and
  the time I can investigate.

* Many sites can't manage to include the simple
  88x31 "button" <imageurl> without fouling it up.
  The Dire Straits Lyrics archive includes a
  320x442 picture of Mark Knopfler in there. It's a
  nice picture, but it definitely does not follow
  the rules. Lots of sites have no image, so we
  spend time digging them up.

* There is not much of a consensus on the use of
  <title> vs. <description>. Headline Viewer
  uses description if present, and then defaults
  to title. But a fair number of providers put
  weird meta-info in the description, so I store
  and respect a "use title" flag for each built-in
  news provider.

* Some sites accidentally spew debugging info into
  their XML. Don't laugh, it's happened more than
  once.
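
A rough sketch of how some of these problems could be screened for automatically, in Python. This is not Headline Viewer's code. It assumes RSS 0.91-style <rss><channel><item> markup with an optional <channel><image> that declares <width> and <height>; the names check_feed, headline_text and BUTTON_SIZE are hypothetical. headline_text mirrors the description-then-title rule and the per-provider "use title" flag described above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical expectation: a channel "button" image is 88x31.
BUTTON_SIZE = (88, 31)


def check_feed(raw: bytes) -> list[str]:
    """Return human-readable problems found in one RSS 0.91-style feed."""
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as err:
        # A bare '&' (as in "AT&T") lands here, as does debugging
        # text printed ahead of the root element.
        return [f"not well-formed XML: {err}"]

    channel = root.find("channel")
    if channel is None:
        return ["no <channel> element"]

    problems = []
    image = channel.find("image")
    if image is None:
        problems.append("no channel image")
    else:
        # Size can only be checked here when the feed declares it.
        width = (image.findtext("width") or "").strip()
        height = (image.findtext("height") or "").strip()
        if width.isdigit() and height.isdigit():
            if (int(width), int(height)) != BUTTON_SIZE:
                problems.append(f"image is {width}x{height}, expected 88x31")
    return problems


def headline_text(item: ET.Element, use_title: bool = False) -> str:
    """Prefer <description>, fall back to <title>; use_title forces <title>."""
    title = (item.findtext("title") or "").strip()
    if use_title:
        return title
    description = (item.findtext("description") or "").strip()
    return description or title


if __name__ == "__main__":
    # A producer that runs its text through escape() emits well-formed XML.
    good = f"<rss><channel><item><title>{escape('AT&T news')}</title></item></channel></rss>"
    bad = b"<rss><channel><item><title>AT&T news</title></item></channel></rss>"
    print(check_feed(good.encode()))  # ['no channel image']
    print(check_feed(bad)[0])
```

The escaping fix belongs on the producer's side; a reader can only report the failure, which is why a per-feed problem list is more useful than a silent drop.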

Now how could things be improved? I've got to get
to sleep, but let me rattle off a few ideas:

1. Categorization. This is a rat's nest. Some 
   sites want thousands of categories. I want
   10-20. I want them to reflect the kinds of
   things that users want (Business, Technical,
   Sports, etc.) 

2. More widespread use of service lists like
   that found on Userland. eGroups can now
   generate RSS, so the next release of Headline
   Viewer (going out the door in a day or two)
   features drag-and-drop eGroups integration. Subscribe
   to an eGroup, drop any URL that mentions the
   group name on Headline Viewer, and wham, you
   can read the list headlines. Way cool. But
   it would be cooler to get a list of mailing
   lists from eGroups, let the user subscribe
   to them, etc.

3. More content. We've barely scratched the surface
   here. I want content in all sorts of languages
   for all sorts of topics. I want the NY Times
   bestsellers as an RSS file, and I want 
   eBay categories in RSS. I want press releases,
   I want regional info for places I've never
   heard of.

4. More metainfo so that I can more easily track
   down those who generate bad RSS (and help 
   them). I think that this has been proposed. 
   Excuse me if it's there already; it's late and I
   am tired.

5. More awareness of the whole syndication concept.
   Imagine how much great content we would have
   if we could position syndication as a form of
   site advertising? Give out your headlines for
   free, get visitors to read the articles. Not a
   bad deal at all.

6. Unique site IDs. This is a messy problem. What
   I have found is that the same content has been 
   registered for syndication under multiple URLs.
   Sometimes this reflects evolution, perhaps from
   a sub-domain on a free site to a true top-level
   domain. Other times the site can generate content
   in several forms that I have to consider equivalent
   for my purposes. The Moreover service, for
   example, can emit either RSS or its own
   <moreover> format. I use the <moreover> form,
   but the serviceList at Userland includes the
   RSS form. 

   To detect and eliminate the duplication
   that this causes, I've built and maintain an 
   alias list (http://www.vertexdev.com/chv_aliases.xml).
   Take a look at this to understand the problem.
   The first entry contains 7 names for the same
   content. I build this list semi-automatically but
   I have to do sufficient manual checking that I 
   cannot see this scaling to accommodate, say, 50K
   providers. Headline Viewer loads this list and
   uses it to avoid duplicating built-in providers
   with those loaded from the service lists.
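
One way such an alias list could be applied when merging provider lists, sketched in Python. The real format of chv_aliases.xml isn't described here, so this assumes the file has already been read into groups of URLs whose first entry is the canonical name. The URL normalization step (lowercased scheme and host, trailing slash dropped) is an added assumption, not something Headline Viewer is said to do, and the example.com URLs are made up.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Crude canonical form: lowercase scheme and host, drop trailing '/'."""
    parts = urlsplit(url.strip())
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}{query}"


def build_alias_map(groups: list[list[str]]) -> dict[str, str]:
    """Map every alias in a group to the group's first URL (its canonical name)."""
    alias_map: dict[str, str] = {}
    for group in groups:
        canonical = normalize(group[0])
        for url in group:
            alias_map[normalize(url)] = canonical
    return alias_map


def merge_providers(builtin: list[str], loaded: list[str],
                    alias_map: dict[str, str]) -> list[str]:
    """Keep every built-in provider; add loaded ones only if not already present."""
    def key(url: str) -> str:
        n = normalize(url)
        return alias_map.get(n, n)

    seen = {key(u) for u in builtin}
    merged = list(builtin)
    for url in loaded:
        k = key(url)
        if k not in seen:
            seen.add(k)
            merged.append(url)
    return merged


if __name__ == "__main__":
    # Hypothetical alias group: one site registered under three URLs.
    aliases = build_alias_map([[
        "http://www.example.com/news.rss",
        "http://example.freehost.net/news.rss",
        "http://www.example.com/news.rss?format=moreover",
    ]])
    builtin = ["http://www.example.com/news.rss"]
    loaded = ["http://EXAMPLE.freehost.net/news.rss/", "http://other.example.org/feed.rss"]
    print(merge_providers(builtin, loaded, aliases))
    # ['http://www.example.com/news.rss', 'http://other.example.org/feed.rss']
```

Normalizing before lookup lets the alias list hold one spelling of each URL instead of every case and trailing-slash variant. It does nothing for the part that doesn't scale, which is deciding by hand that two genuinely different URLs carry the same content.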

Gack, I've written a lot. I hope this is some good 
food for thought. I'm really looking forward to 
some productive discussion and forward motion.

Jeff;

Jeff Barr - Home: 425-836-5624 Office: 425-936-3098
mailto:jeff@vertexdev.com
http://www.vertexdev.com/~jeff
http://jeffbarr.editthispage.com/
4610 191st Place NE. Redmond, WA


-----Original Message-----
From: Dave Winer [mailto:dave@userland.com]
Sent: Wednesday, September 13, 2000 9:50 PM
To: syndication@egroups.com
Cc: Lynn Siprelle
Subject: Re: [syndication] Re: New poll for syndication 


Maybe some people don't want to contribute to standards, maybe they just
want to get more hits for their sites. I think we should start a new mail
list with a charter that puts the focus on use of this stuff, not on a
standards process. Just speaking for myself, this is exactly what I want to
get away from. I've had enough standing at the fork. Aaron, you got
everything you wanted; go create something new with the RSS 1.0 people.
Thanks. Dave