
Re: [syndication] Proper use of DOCTYPE?



Hi,

On Friday 22 March 2002 20:35, you wrote:
> Changing their header to read:
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
> This assumes, of course, that they're going to be using that particular
> form of encoding.  The other choices for Korean, Russian, Japanese, etc. 
> Or leaving it out entirely to indicate use of UTF-8 encoding.  And to avoid
> using entities at ALL by using the UTF-8 numeric encodings.
Actually, if they reference the DTD and use its entities, it doesn't matter 
which encoding they declare, as there are then no characters left to encode 
(the encoding only affects non-ASCII characters). The entity definitions in 
the DTD just map to numeric character references, which are valid in all 
encodings.
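
To make the point about numeric references concrete, here is a small Python 
sketch. Python's standard xml.etree.ElementTree parser is just my choice for 
illustration, and the <title> fragment is made up. The same ASCII-only 
fragment is declared with three different encodings, and &#233; comes out as 
the same character each time:

```python
import xml.etree.ElementTree as ET

# An ASCII-only fragment: "&#233;" is a numeric character reference for
# U+00E9 (e with acute accent), so no byte depends on the encoding.
BODY = "<title>Caf&#233; news</title>"

def parse_with_declared_encoding(encoding: str) -> str:
    """Declare `encoding`, encode the document with it, and parse it back."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?>{BODY}'
    root = ET.fromstring(doc.encode(encoding))
    return root.text

for enc in ("ISO-8859-1", "UTF-8", "US-ASCII"):
    print(enc, repr(parse_with_declared_encoding(enc)))
```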

Also, they don't even need the entities: UTF-8 can always be used, and most 
Western characters exist in ISO-8859-1 and can thus be used directly with 
that encoding, which must then of course be declared.
There are a whole lot of combinations that will work...
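
Writing the character directly works just as well, as long as the 
declaration matches the bytes. A minimal Python sketch, again with a made-up 
<title> fragment: é becomes one byte in ISO-8859-1 and two bytes in UTF-8, 
yet both documents parse to the same text.

```python
import xml.etree.ElementTree as ET

TEXT = "Café news"  # contains U+00E9 written directly, no entity

def encode_title(encoding: str) -> bytes:
    """Build a <title> document, declare `encoding`, and encode it with it."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><title>{TEXT}</title>'
    return doc.encode(encoding)

latin1_bytes = encode_title("ISO-8859-1")  # é is the single byte 0xE9
utf8_bytes = encode_title("UTF-8")         # é is the two bytes 0xC3 0xA9

# Different bytes on the wire, same text after parsing.
print(b"\xe9" in latin1_bytes, b"\xc3\xa9" in utf8_bytes)
print(ET.fromstring(latin1_bytes).text == ET.fromstring(utf8_bytes).text)
```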

> As you point out, they could put their own entity definitions in the
> header.  I doubt many would really want to be doing this.
It depends on the RSS format in question, and I believe it's something like 
the following:

RSS 0.9 is really ugly: it states [1] that the character set must be 
ISO-8859-1 and that UTF-8 is not allowed, yet its XML declaration omits the 
encoding, which in XML terms means the encoding defaults to UTF-8!
More importantly in this regard, it states that decimal and HTML entities are 
allowed, but named entities other than XML's five predefined ones must be 
declared before use, and there's no reference to a DTD or any other entity 
declaration. In my opinion it's impossible to create a well-formed 
international RSS 0.9 feed without violating something.
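
Both RSS 0.9 problems are easy to reproduce with a conforming parser. The 
Python sketch below (fragments invented for illustration) feeds ElementTree 
an ISO-8859-1 document with no encoding declaration, which the parser reads 
as UTF-8, and a document that uses &eacute; with nothing declaring it. Both 
are rejected as not well-formed; declaring encoding="ISO-8859-1" fixes the 
first, while the second needs either a declaration or a numeric reference 
such as &#233;.

```python
import xml.etree.ElementTree as ET

def try_parse(data: bytes) -> str:
    """Return the <title> text, or the parser's error message."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"error: {err}"

# 1. ISO-8859-1 bytes, as RSS 0.9 requires, but no encoding declaration:
#    the parser assumes UTF-8, and a lone 0xE9 byte is not valid UTF-8.
no_decl = '<?xml version="1.0"?><title>Café</title>'.encode("ISO-8859-1")

# 2. An HTML entity with no DTD or internal declaration in scope.
undeclared = b'<?xml version="1.0"?><title>Caf&eacute;</title>'

# 3. The same bytes as case 1, but with the encoding declared.
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
            '<title>Café</title>').encode("ISO-8859-1")

for label, data in (("no declaration", no_decl),
                    ("undeclared entity", undeclared),
                    ("declared", declared)):
    print(label, "->", try_parse(data))
```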

RSS 0.91 states [2] that there must be a reference to the DTD (this is 
missing from this feed, but that is somewhat understandable, since Netscape 
removed the DTD at one point, causing a lot of grief). The DTD contains 
declarations for the most common entities, e.g. &eacute;, so if the DTD 
reference is present, use of the entities defined there is legal.
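
Given that the DTD has disappeared before, a publisher who wants an RSS 0.91 
feed that still parses without it could rewrite named entities into numeric 
references before publishing. A minimal Python sketch, assuming the feed only 
uses HTML 4 entity names (Python's html.entities.name2codepoint table); I 
haven't checked that this matches the 0.91 DTD's list exactly, so unknown 
names are left alone. entities_to_numeric is a hypothetical helper name.

```python
import re
from html.entities import name2codepoint

# XML's five predefined entities never need a DTD, so leave them alone.
XML_PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

# Named references only; numeric ones like "&#233;" start with '#'.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def entities_to_numeric(text: str) -> str:
    """Replace HTML named entities such as &eacute; with numeric references
    such as &#233;, which need no DTD. Unknown names are left untouched."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"
    return ENTITY_RE.sub(replace, text)

print(entities_to_numeric("<title>Caf&eacute; &amp; cr&egrave;me</title>"))
```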

RSS 0.92 [3] doesn't say anything on the subject, except that it's 
'upward-compatible' with 0.91.

RSS 1.0 [4] specifically states that:
"Since RSS 1.0 does not require a DTD, be sure to include inline declarations 
of entities used aside from the aforementioned five."
(The five mentioned entities are the always-valid &lt;, &gt;, &amp;, &apos; 
and &quot;.) It also includes a widely used example.
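
For RSS 1.0, the inline declaration lives in the document's internal DTD 
subset. A cut-down Python sketch (a real RSS 1.0 channel also needs link, 
description and items; http://example.org/ is a placeholder): the parser 
expands &eacute; from the internal subset without fetching anything.

```python
import xml.etree.ElementTree as ET

# The internal subset declares the one non-predefined entity used below;
# no external DTD is referenced.
FEED = """<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY eacute "&#233;">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <title>Caf&eacute; news</title>
  </channel>
</rdf:RDF>"""

NS = {"rss": "http://purl.org/rss/1.0/"}

root = ET.fromstring(FEED)
title = root.find("rss:channel/rss:title", NS)
print(title.text)
```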


Morten Frederiksen

[1] http://www.purplepages.ie/RSS/netscape/rss0.90.html
[2] http://my.netscape.com/publish/formats/rss-spec-0.91.html
[3] http://backend.userland.com/rss092
[4] http://purl.org/rss/