Rogue "&"



The Syndic8.com project is growing at an amazing rate. There are now 1200
quality feeds, with another 1000 awaiting approval. Cleaning the data is
slow, laborious work, and we could use a few more people ;-(

But it's become apparent that a large number of feeds share two
basic errors:
1. Unescaped &'s all over the place: in titles, links, descriptions,
site names and so on.
2. Entity references that are valid in HTML but undefined in XML
(e.g. &nbsp;), in feeds with no DTD.
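One way to clean up both errors, sketched in Python under some assumptions:
clean_feed_text is a hypothetical helper (not part of Syndic8), and it
assumes it is handed the text content of an element, not a whole document
or a CDATA section. It escapes any "&" that doesn't already start a
well-formed reference, then rewrites HTML named entities as numeric
character references using the standard library's
html.entities.name2codepoint table, leaving the five entities XML
predefines alone.

```python
import html.entities
import re

# An "&" that does NOT begin a well-formed reference such as
# &amp;, &#38; or &#x26; (XML only allows a lowercase "x" for hex).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# A named entity reference, e.g. &nbsp; or &copy;.
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# The only named entities XML defines without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def _to_numeric(match):
    """Rewrite one named reference so it is legal XML without a DTD."""
    name = match.group(1)
    if name in XML_PREDEFINED:
        return match.group(0)
    codepoint = html.entities.name2codepoint.get(name)
    if codepoint is None:
        # Not an HTML entity either: escape the "&" so the text
        # survives literally instead of breaking the parser.
        return "&amp;" + name + ";"
    return "&#%d;" % codepoint


def clean_feed_text(text):
    """Hypothetical cleaner for titles, links, descriptions, etc."""
    # Step 1: escape rogue ampersands (error 1).
    text = BARE_AMP.sub("&amp;", text)
    # Step 2: replace HTML-only entities with numeric references (error 2).
    return NAMED_REF.sub(_to_numeric, text)
```

For example, clean_feed_text("Fish & Chips &copy; AT&T") gives
"Fish &amp; Chips &#169; AT&amp;T". Numeric character references are used
rather than adding a DTD because XML accepts them with no declarations at all.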

This is so common that I'm not sure what to do about it. It's only
actually a problem for readers that parse XML strictly, and most readers
are tolerant. But it isn't well-formed XML.
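To check whether a feed would trip up a strict reader, one option is to run
it through a conforming parser. A minimal sketch, assuming Python's
standard-library xml.etree.ElementTree (which is built on expat); the helper
name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a strict XML parser accepts the text, else False."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for a bare "&" ("invalid token") and for entities
        # like &nbsp; that are undefined when there is no DTD.
        return False
    return True
```

Both "<title>Fish & Chips</title>" and "<title>a&nbsp;b</title>" are
rejected, while "<title>Fish &amp; Chips</title>" is accepted.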

-- 
Julian Bond    email: julian_bond@voidstar.com
CV/Resume:         http://www.voidstar.com/cv/
WebLog:               http://www.voidstar.com/
HomeURL:      http://www.shockwav.demon.co.uk/ 
M: +44 (0)77 5907 2173  T: +44 (0)192 0412 433
ICQ:33679568 tag:So many words, so little time