Re: [syndication] XML Character encoding (again)
> Umm. Err.
>
> I'm getting seriously pissed off with this. The site where this is
> happening typically contains UK English text so we're talking about a
> limited number of awkward characters. £, €, and smart quotes and
> that's about it.
Fine, then use ISO-8859-1. Just remember that only ISO-8859-1 characters can be
used with that encoding: £ is in it, but € and the smart quotes are not. Which
encoding you use is less important than using it /consistently/.
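A rough illustration of that limit, in Python only because it's handy (the helper name `to_latin1_xml` is made up for the example). Characters with no ISO-8859-1 byte can still appear in a feed declared as ISO-8859-1, but only as numeric character references like &#8364;, which Python's standard `xmlcharrefreplace` error handler produces:

```python
from xml.sax.saxutils import escape


def to_latin1_xml(text: str) -> bytes:
    """Encode element text for a document declared encoding="ISO-8859-1".

    Markup characters (&, <, >) are escaped first; any character that has
    no ISO-8859-1 byte (e.g. the euro sign or curly quotes) is then written
    as a decimal numeric character reference such as &#8364;.
    """
    return escape(text).encode("iso-8859-1", errors="xmlcharrefreplace")


if __name__ == "__main__":
    print("£5".encode("iso-8859-1"))  # b'\xa35': the pound sign is in ISO-8859-1
    try:
        "€5".encode("iso-8859-1")
    except UnicodeEncodeError as exc:
        print("no ISO-8859-1 byte for", exc.object[exc.start])
    print(to_latin1_xml("\u201cprice: €5 & up\u201d"))
```

One caveat: character references are only recognized in ordinary element text, not inside a CDATA section, where &#8364; would come through as those seven literal characters.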
> I'm really tempted to just say "tough". If you don't like the character
> put in a "?". If your parser barfs on my feed, well don't read it.
> Programming hours are too short to start figuring out client browser
> capability, UTF-8 conversion from arbitrary encodings and so on.
This is an XML thing. That it's hard is a result of it being /possible/. Think
this is bad? Imagine what it was like before.
> The point here is that it's RSS containing plain text read by human
> beenz. I'm not trying to get 100% perfect transfer of data, I'm trying
> to facilitate human communication.
Certainly a laudable goal.
> Getting back to trying to solve this.
>
> I'm genuinely puzzled that a CDATA block isn't enough to protect the
> text byte stream from aggressive parsers.
"aggressive parsers" wooo, there's a loaded statement. CDATA doesn't change
the fact that all data within the document MUST use the same encoding. A
document declared as UTF-8 requires ALL of its bytes to be valid UTF-8,
regardless of whether they're wrapped in a CDATA section. Likewise, if you use
any other encoding you have to make sure your XML data is ALL consistently
using that encoding scheme.
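To see that CDATA offers no protection at the byte level, here's a sketch using Python's standard `xml.etree.ElementTree` (which sits on the expat parser). The same "£5" title is fed in three ways; only the one whose bytes contradict the declared encoding fails, CDATA or not. The constant and function names are just for illustration:

```python
import xml.etree.ElementTree as ET

# The same "£5" title wrapped in CDATA, as three raw byte strings.
LATIN1_BYTES_DECLARED_UTF8 = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<title><![CDATA[\xa35]]></title>"  # a lone 0xA3 byte is not valid UTF-8
)
UTF8_BYTES_DECLARED_UTF8 = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<title><![CDATA[\xc2\xa35]]></title>"  # U+00A3 as the UTF-8 pair C2 A3
)
LATIN1_BYTES_DECLARED_LATIN1 = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<title><![CDATA[\xa35]]></title>"  # 0xA3 is the pound sign in ISO-8859-1
)


def parse_title(raw: bytes) -> str:
    """Return the <title> text; raises ET.ParseError when the bytes do not
    match the declared encoding, CDATA section or not."""
    return ET.fromstring(raw).text or ""


if __name__ == "__main__":
    cases = {
        "UTF-8 declared, Latin-1 bytes": LATIN1_BYTES_DECLARED_UTF8,
        "UTF-8 declared, UTF-8 bytes": UTF8_BYTES_DECLARED_UTF8,
        "ISO-8859-1 declared, Latin-1 bytes": LATIN1_BYTES_DECLARED_LATIN1,
    }
    for label, raw in cases.items():
        try:
            print(f"{label}: {parse_title(raw)!r}")
        except ET.ParseError as err:
            print(f"{label}: ParseError ({err})")
```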
> And I wonder if I'm confusing everyone by suggesting UTF-8. Perhaps if I
> used another encoding, the feed would be more likely to survive given
> that the vast majority of users are generating this text with Wintel PCs
> running IE.
Heh, it could be worse: you could be getting windows-1252 encoding. Or one of
the *hundreds* of other possible encodings.
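For what it's worth, windows-1252 matches ISO-8859-1 everywhere except bytes 0x80 to 0x9F, and that range is exactly where Windows keeps € and the smart quotes. Assuming the text you receive really is windows-1252 (something you'd have to confirm for your own forms), converting it to UTF-8 before it goes into the feed is short in Python; `cp1252_to_utf8` is a hypothetical helper name:

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes that are assumed to be windows-1252 as UTF-8.

    Python's cp1252 codec raises UnicodeDecodeError for the five byte
    values that code page leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D),
    which is a useful hint that the input was not windows-1252 after all.
    """
    return raw.decode("cp1252").encode("utf-8")


if __name__ == "__main__":
    # Hypothetical bytes as a Windows browser might submit them.
    submitted = b"\x93\x80 5 or \xa3 4\x94"
    print(submitted.decode("cp1252"))            # smart quotes, euro, pound
    print(repr(submitted.decode("iso-8859-1")))  # 0x93/0x80/0x94 become C1 controls
    print(cp1252_to_utf8(submitted))
```

Decoding the same bytes as ISO-8859-1 doesn't fail; it silently gives you invisible control characters, which is why guessing the wrong one of those two is so easy to miss.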
FYI: http://www.w3schools.com/xml/xml_encoding.asp
Weeee, more encoding foolishness. Web browsers, once again, prove not to be
your friend. The steps various browsers take in 'helping' you display data
often mislead you into thinking the data is set up right. Those aggressive
parsers? Heed their warnings, for they're correct.
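One way to take that advice, sketched in Python with the standard library's ElementTree: run the generated feed through a strict parser before publishing and have it report where the first bad byte sits, instead of trusting what a browser renders. The function name `first_error_position` is hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def first_error_position(raw: bytes) -> Optional[Tuple[int, int]]:
    """Return None if `raw` is well-formed XML, else the (line, column)
    where the parser gave up.

    ElementTree's parser does not guess or repair: bytes that are invalid
    for the declared encoding stop it at the first offending token.
    """
    try:
        ET.fromstring(raw)
    except ET.ParseError as err:
        return err.position
    return None


if __name__ == "__main__":
    feed = (
        b'<?xml version="1.0" encoding="UTF-8"?>\n'
        b'<rss version="2.0"><channel><title>Price \x80 5</title>'
        b"</channel></rss>"
    )
    print(first_error_position(feed))  # points at the stray 0x80 on line 2
```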
-Bill Kearney