
Re: [syndication] Re: syndication and i18n




> > - How does one deal with creating an HTML page from XML feeds which
> >  have potentially radically different charsets (e.g., ASCII and
> >  double-byte Chinese on the same page)?
>

> Pick a superset of all the encodings (UCS-2?).

UTF-8 handles everything, including mixed languages on the same page. That's one of the reasons the XML spec requires every parser to handle UTF-8 (and UTF-16) properly.
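
A quick Python check of that: a string mixing ASCII and Chinese (the sample text is made up for illustration) survives a UTF-8 round trip unchanged. ASCII characters stay one byte each, and these two Chinese characters take three bytes each.

```python
# Mixed ASCII and Chinese text (illustrative sample only).
text = "Hello, 你好"

# Encode to UTF-8 bytes, then decode back to a str.
data = text.encode("utf-8")
roundtrip = data.decode("utf-8")

print(len(text), len(data))  # 9 characters, 13 bytes (7 ASCII + 2 x 3)
print(roundtrip == text)     # True
```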

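Since HTML doesn't have XML's <?xml?> declaration, you have to declare UTF-8 some other way: in the HTTP Content-Type header (text/html; charset=utf-8), or in a <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> tag in the page's <head>, which also covers pages read without any HTTP headers.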
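
A minimal sketch of declaring the charset when the page is built in Python. The helper `render_page` and its markup are hypothetical, for illustration only: it sets the charset in the HTTP Content-Type header, repeats it in a <meta> tag, and encodes the page as UTF-8 so the bytes actually match what is declared.

```python
from html import escape


def render_page(title, items):
    """Build an HTML page and return (headers, body_bytes).

    Hypothetical helper: the charset is declared in the HTTP
    Content-Type header and repeated in a <meta> tag, so a copy
    saved to disk and opened without headers still decodes correctly.
    """
    rows = "\n".join(f"<li>{escape(item)}</li>" for item in items)
    page = (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head><body>\n"
        f"<ul>\n{rows}\n</ul>\n"
        "</body></html>\n"
    )
    headers = {"Content-Type": "text/html; charset=utf-8"}
    # The body must really be UTF-8 bytes to agree with both declarations.
    return headers, page.encode("utf-8")


headers, body = render_page("Mixed feeds", ["ASCII headline", "中文标题"])
print(headers["Content-Type"])  # text/html; charset=utf-8
```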

My take: use a decent XML parser and all the parse-side encoding issues are handled for you; your Python code just sees Unicode. It might mean you end up with a stricter aggregator than some (e.g. you won't be able to accept <item>stuff<img src=""></item>, because the unclosed <img> isn't well-formed XML), but IMHO that's not a bad thing.
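
A minimal sketch using the standard library's xml.etree.ElementTree (one decent parser among several; the two sample feeds are made up). Feeds in different encodings both come back as ordinary Python str values, and an item with an unclosed <img> is rejected with ParseError rather than silently accepted.

```python
import xml.etree.ElementTree as ET

# Two tiny sample feeds (made up for illustration) in different encodings.
latin1_feed = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<rss><channel><item><title>Café news</title></item></channel></rss>"
).encode("iso-8859-1")

utf8_feed = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<rss><channel><item><title>中文新闻</title></item></channel></rss>"
).encode("utf-8")


def item_titles(raw_bytes):
    """Parse raw feed bytes; the parser reads the encoding declaration."""
    root = ET.fromstring(raw_bytes)
    return [t.text for t in root.iter("title")]


titles = item_titles(latin1_feed) + item_titles(utf8_feed)
print(titles)  # plain str values, whatever the source encoding was

# Not well-formed: the <img> element is never closed.
broken = b'<rss><channel><item>stuff<img src=""></item></channel></rss>'
try:
    item_titles(broken)
except ET.ParseError as err:
    print("rejected:", err)
```

One caveat, hedged: Python's expat-based parsers decode UTF-8, UTF-16, ISO-8859-1 and US-ASCII natively, but I'd test any multi-byte legacy encoding (Big5, GB2312) before relying on it.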


-Hugh

hpyle@agora.co.uk       | +44 (0)20 8783 3592
http://www.agora.co.uk/ | http://groovelog.agora.co.uk/  | http://rendezvoo.net/