
Re: Proper use of DOCTYPE?



> The distinctions between characters, how they're numbered, how
> they're encoded, and how they're represented textually are,
> admittedly, tricky to understand initially, but they're absolutely 
> essential to understand when you're writing international documents.

These were interesting reads:

http://www.macchiato.com/unicode/charts.html
http://www.macchiato.com/unicode/convert.html
http://www-106.ibm.com/developerworks/library/utfencodingforms/
http://www.w3.org/2001/06/utf-8-test/postscript-utf-8.html
http://www.hclrss.demon.co.uk/demos/ent4_frame.html

So I'm not completely clear on this: when using simple RSS (not the 
RDF flavor), what's really "allowed" as character encoding in the 
elements?  While there's certainly debate about using HTML inside the 
<description> element, let's assume for a minute we're going to allow 
HTML in there.  How should it be encoded?  

I do realize this presents a problem in that you'd have no real way 
to indicate what charset a given element is supposed to be using, if 
it required something different from that of the whole document.  
This would really be a problem for international feeds.  So let's 
just look at it for Western languages (for now).  

Is it correct to say that in order to use HTML entity references 
you'd have to "say so" by declaring them in the DOCTYPE or DTD?  And 
that unless you did this you'd have to depend on the encoding given 
in the XML declaration, which, unless specified, would be UTF-8?
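
To make the question concrete, here is a minimal sketch of how one 
could check this against a conforming XML parser.  It assumes Python's 
standard xml.etree.ElementTree module; the helper name 
description_text and the sample <item> snippets are made up purely 
for illustration and are not taken from any real feed.

```python
import xml.etree.ElementTree as ET


def description_text(xml_source):
    """Return the <description> text, or None if the input is not well-formed XML.

    Hypothetical helper for illustration only.
    """
    try:
        root = ET.fromstring(xml_source)
    except ET.ParseError:
        return None
    return root.findtext("description")


# HTML named entity with nothing in the document declaring it.
undeclared_entity = "<item><description>caf&eacute;</description></item>"

# The same entity, declared in the DOCTYPE's internal subset.
declared_entity = (
    '<!DOCTYPE item [<!ENTITY eacute "&#233;">]>'
    "<item><description>caf&eacute;</description></item>"
)

# Numeric character reference: needs no declaration at all.
numeric_reference = "<item><description>caf&#233;</description></item>"

# Raw bytes with no XML declaration: the parser assumes UTF-8.
utf8_bytes = b"<item><description>caf\xc3\xa9</description></item>"

# ISO-8859-1 bytes, first without and then with an encoding declaration.
latin1_undeclared = b"<item><description>caf\xe9</description></item>"
latin1_declared = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_undeclared
)

samples = [
    ("undeclared entity", undeclared_entity),
    ("declared entity", declared_entity),
    ("numeric reference", numeric_reference),
    ("UTF-8 bytes, no declaration", utf8_bytes),
    ("Latin-1 bytes, no declaration", latin1_undeclared),
    ("Latin-1 bytes, declared", latin1_declared),
]

for label, source in samples:
    print(label, "->", repr(description_text(source)))
```

With this parser I'd expect the undeclared &eacute; and the 
undeclared Latin-1 bytes to be rejected as not well-formed, and the 
other four to come back as the same accented word.  That only shows 
what one strict parser does, not what RSS clients do in practice.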

Is this technically correct?  I'm not asking what behavior is 
observed in clients; I'm sure the clients are making some pretty 
liberal assumptions.  

What's the "right" way to do this?

-Bill Kearney