
blogs and syndication



I'm hitting a problem that I feel the need to share. It's all related to
blogs and how they can or should be syndicated. And then how this
relates to RSS. 

News sites typically contain articles produced reasonably often. Each
article usually breaks down into 4 main elements.
- A Title or headline of about 40-50 characters, designed to attract
attention.
- An Abstract paragraph of 200-300 characters that explains what the
article is about.
- A Body that contains the main text and graphics of the article.
- A Link that is a fairly permanent URL for the article.

This is a pretty good match to an RSS item with Title=<title>,
Abstract=<description>, Link=<link>. There's quite a lot of consensus
out there in examples of RSS and this is the way most news sites produce
their files. It's common for the Abstract to contain one or more <a href
links and it's useful therefore for the <description> to contain those
same links. 
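
To make the mapping concrete, here is a minimal sketch in Python of
building one RSS item from those elements. The function name
build_item and the example.com values are made up for illustration;
the point is only that the abstract's <a href links end up
entity-encoded inside <description> rather than stripped.

```python
import xml.etree.ElementTree as ET

def build_item(title, abstract_html, link):
    """Build an RSS <item> from an article's title, abstract and link."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # Assigning to .text makes the serializer escape the markup, so any
    # <a href> links in the abstract survive as entity-encoded HTML.
    ET.SubElement(item, "description").text = abstract_html
    ET.SubElement(item, "link").text = link
    return item

# Hypothetical article values, for illustration only.
abstract = 'An abstract with <a href="http://example.com/more">a link</a>.'
item = build_item("Example headline", abstract,
                  "http://example.com/articles/1")
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

A reader that parses the item back gets the original abstract,
links included.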

This is also an easy layout for an RSS reader or aggregator. Lists of
items can be displayed as a box of titles where each title is a link to
the page with the full text. Description can be left out for a condensed
box, or left in for a larger and more complete display. Taking the
slashdot.org home page as an example, the full display is like the
centre main section, and the condensed display is like the RHS "Older
Stuff" list.
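
As a rough sketch of those two displays (Python again, with a
hypothetical render_items function and items held as plain dicts,
which is an assumption and not how any particular aggregator stores
them):

```python
from html import escape

def render_items(items, condensed=False):
    """Render items as linked titles, with descriptions unless condensed."""
    lines = []
    for it in items:
        # Each title links to the page holding the full text.
        lines.append('<a href="%s">%s</a>'
                     % (escape(it["link"], quote=True), escape(it["title"])))
        # The description is passed through as HTML so its links survive.
        if not condensed and it.get("description"):
            lines.append("<p>%s</p>" % it["description"])
    return "\n".join(lines)

# Hypothetical item, for illustration only.
sample = [{"title": "Example headline",
           "link": "http://example.com/articles/1",
           "description": "An abstract paragraph."}]
full_view = render_items(sample)
condensed_view = render_items(sample, condensed=True)
```

Both views depend on every item having a usable title and link.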

But now what about blogs? There are numerous examples of what are
actually news sites that use blog technology as a quick and easy
publishing route. Of course, blogs get used for a lot of other things
besides. But a typical blog item only contains the abstract/body.
There's no title and frequently no permalink to the item or even a page
that the item will always be on. Blog systems also typically have an
HTML templating system, so even though the HTML is generally consistent
for each item within a site, there is no consistency between sites. Some
blogs don't even have the concept of an item, the closest being a whole
day.

Then with the two main sources of blogs, Radio/Manila and Blogger, we
get another set of issues. Radio/Manila has 3 syndication formats in
use: Scripting News format, RSS 0.91 (with the HTML stripped from
<description>) and RSS 0.92 (with no <title> or <link>), while Blogger
has none. Various people including myself have tried to hack an RSS feed
on top of Blogger but it's awkward and a kludge.

Now if I've built an aggregator (which I have) that takes some news RSS
and some blog output and tries to display it in either a full or
condensed display I've got a series of problems. 
- Blogs that treat a whole page as an item make it impossible to check
for dupes so you see a new copy for every edit.
- The lack of title means you have to synthesize one.
- The lack of agreement about how to use <link>, or the complete lack of
it, means it's hard to get consistency about what happens when the user
clicks on a link.
- Getting html stripped out of some of the <description>s means you lose
the links that the blogmeister patiently added.
- Bad html code or bad underlying code (like the infamous
"title="Permanent link to ") have a habit of screwing up the display.   
- And then, many blogmeisters have little or no understanding of what is
producing their RSS and not much more about what is good or bad html. So
suggesting to them that there's something wrong is met with a blank
stare.
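
For what it's worth, here is a sketch of the kind of workaround the
first two problems push you towards. It is Python, the names
(plain_text, synthesize_title, item_key, merge) are invented for
illustration, and the regex tag stripping is deliberately crude. It
synthesizes a title from the start of the description, and keys each
item on its <link> so that an edited copy replaces the old one instead
of piling up as a dupe. When there is no <link> it can only fall back
to hashing the text, so every edit still looks like a new item.

```python
import hashlib
import re
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")

def plain_text(html_fragment):
    """Crudely strip tags and collapse whitespace."""
    text = unescape(TAG_RE.sub(" ", html_fragment))
    return " ".join(text.split())

def synthesize_title(description, max_len=50):
    """Use the first few words of the description as a title."""
    text = plain_text(description)
    if len(text) <= max_len:
        return text
    # Cut at the last space inside the limit to avoid splitting a word.
    return text[:max_len].rsplit(" ", 1)[0] + "..."

def item_key(feed_url, item):
    """Key on the link when there is one, else on the item text."""
    basis = item.get("link") or plain_text(item.get("description", ""))
    return hashlib.sha1((feed_url + "\n" + basis).encode("utf-8")).hexdigest()

def merge(store, feed_url, items):
    """Add items to store; a repeated key overwrites the older copy."""
    for item in items:
        if not item.get("title"):
            item = dict(item,
                        title=synthesize_title(item.get("description", "")))
        store[item_key(feed_url, item)] = item
    return store
```

Fetching the same day's item twice, before and after an edit, then
leaves one entry in the store as long as its <link> stayed the same.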

At which point this is turning into a rant. I can always just give up on
reading the output of sites that don't produce manageable RSS, but this
seems an enormous shame when their output is actually worth reading. I'm
not sure what to do about this beyond airing my frustration here and
hassling the individuals involved. I think the technical and social
problems are solvable but it's going to take a little commitment both
from them and from the rest of us. It feels to me that the Syndic8
project will have to devote as much time to trying to fix the RSS we
have and get blog sites to produce something useful, as to evangelizing
RSS to people who've never heard of it.
 
So here are some real-world examples of what I'm talking about. Any
ideas on how to deal with this will be gratefully received!

http://www.evhead.com - Evan Williams' site from blogger.com - No RSS and
a fairly complicated template with permalinks. I had to write a custom
parser with code to come up with
http://www.newsisfree.com/sources/info/2373/
It's not perfect but it almost works.

http://www.boingboing.net - Blogger powered - They've implemented the
<span class= kludge, but their item template is complex and the
heuristics don't work well. I ended up using another custom parser to
get
http://www.newsisfree.com/sources/info/2376/
Like Evhead, it's not perfect.

http://doc.weblogs.com - Horribly broken RSS at 
http://doc.weblogs.com/xml/rss.xml with an unusable title and the whole
day packed into one item.

http://blackholebrain.editthispage.com/xml/rss.xml - one item per day
with all the links stripped out. And because they edit frequently, my
view of it contains most of the intermediate copies. eg
http://www.voidstar.com/module.php?mod=import&op=feed&id=32

http://glennf.com/blog - Scripting News format only from the blog. But
with "RSS feed" next to the XML glyph. And http://glennf.com/ uses the
<span class= kludge *and* has a link to Radio. I'd suggest that Glenn
should have talked to the Cluetrain people when he met them, but that
would be unkind.

And finally a moderate success story.
http://blog.org/ - Blogger powered - David installed the <span class=
kludge in 5 minutes and the output came out not great, but serviceable.
http://www.newsisfree.com/sources/info/2271/ 

Oh, linkrot! 
 
-- 
Julian Bond    email: julian_bond@voidstar.com
CV/Resume:         http://www.voidstar.com/cv/
WebLog:               http://www.voidstar.com/
HomeURL:      http://www.shockwav.demon.co.uk/ 
M: +44 (0)77 5907 2173  T: +44 (0)192 0412 433
ICQ:33679568 tag:So many words, so little time