Re: Quickie: HTML in RSS?



Morbus Iff wrote:
> Should HTML in RSS *always* be encoded to its entity? (&lt;, etc.)?

No, not always.

I looked up HTML support in the different specs [1-3], since I was
curious about this FAQ. It's, of course, different for the different
RSS versions:

  RSS 0.9 :  no
  RSS 0.91:  no (by spec)
  RSS 0.92:  yes, entity-escaped
  RSS 1.0 :  no; maybe, with content module

All the "no"s are inferred: none of those specs mentions HTML. Since
0.92 lists entity-escaped HTML as a new feature, 0.9 and 0.91 must
not allow HTML. (However, 0.92 also claims to describe then-current
use of 0.91, so there are/were feeds with entity-escaped HTML
claiming to be 0.91.) 1.0 is presumably derived closely enough from 0.9
to allow no HTML, and its examples contain no HTML (though in the
version I looked at, the examples had some unescaped 'es).
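
To make "entity-escaped" concrete, here is a minimal Python sketch of
what a 0.92-style producer and consumer would do. The sample HTML
fragment and the bare <description> wrapper are illustrative
assumptions, not text from any of the specs.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical HTML fragment a feed author wants to publish.
html_fragment = '<p>Tom & Jerry <a href="http://example.com/">episode</a></p>'

# Producer side: escape &, < and > so the markup travels as plain
# character data inside the element.
escaped = escape(html_fragment)
description_xml = "<description>" + escaped + "</description>"

# Consumer side: the XML parser undoes the escaping, handing back the
# original HTML string for an HTML-aware user-agent to interpret.
recovered = ET.fromstring(description_xml).text

print(escaped)
print(recovered == html_fragment)
```

A reader that treats the same element as 0.9 or 0.91 content would
instead show the recovered string to the user literally, tags and all.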

If an RSS 1.0 document makes use of the content module [4], it will
have a <content:format/> that may specify XHTML, and may have a
<content:encoding/>. If the format is XHTML and the encoding is not
given, entity-escaped character data (as in 0.92) is assumed. The
other encoding option the spec explicitly names is well-formed XML,
which is the only case in all of the RSS specs where HTML is carried
as actual markup rather than escaped into character data.
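
The difference between the two encodings can be sketched in Python
with ElementTree. The element name "value" below is only a stand-in
for whatever element holds the content; it is an assumption for
illustration, not the content module's actual structure.

```python
import xml.etree.ElementTree as ET

# Entity-escaped: the markup is ordinary text, so the serializer
# escapes the angle brackets.
escaped_item = ET.Element("value")
escaped_item.text = "<b>bold</b>"
escaped_xml = ET.tostring(escaped_item, encoding="unicode")

# Well-formed XML: the markup is real child elements of the holder.
wellformed_item = ET.Element("value")
bold = ET.SubElement(wellformed_item, "b")
bold.text = "bold"
wellformed_xml = ET.tostring(wellformed_item, encoding="unicode")

print(escaped_xml)     # <value>&lt;b&gt;bold&lt;/b&gt;</value>
print(wellformed_xml)  # <value><b>bold</b></value>
```

In the first case an XML parser sees one text node; in the second it
sees a <b> element, which is why the well-formed option only works
for markup that is itself well-formed XML (XHTML, not tag-soup HTML).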

So for 0.9, vanilla 1.0, and 0.91, all character data is for display
to the user. In 0.92, the character data is for interpretation by an
HTML-aware user-agent. Some 0.92 files may claim to be 0.91. In 1.0
with the content module, <content:format/> and <content:encoding/>
tell what to do.
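
As a rough sketch, that summary could be written as a small Python
dispatch function. The function name, the returned labels, and the
two URI constants are my assumptions (the URIs are what I would
expect the content module to use; check them against [4] before
relying on them).

```python
# Assumed identifiers; verify against the content module spec.
XHTML_FORMAT = "http://www.w3.org/1999/xhtml"
WELLFORMED_ENCODING = "http://www.w3.org/TR/REC-xml#dt-wellformed"


def interpret(version, content_format=None, content_encoding=None):
    """Say how a reader should treat an item's character data."""
    if version in ("0.9", "0.91"):
        # Note: some feeds labeled 0.91 really follow 0.92 practice;
        # detecting that would need a heuristic not shown here.
        return "plain text"
    if version == "0.92":
        return "entity-escaped HTML"
    if version == "1.0":
        if content_format is None:
            return "plain text"  # vanilla 1.0, no content module
        if content_format == XHTML_FORMAT:
            if content_encoding is None:
                return "entity-escaped HTML"
            if content_encoding == WELLFORMED_ENCODING:
                return "well-formed XML"
        return "unknown"
    raise ValueError("unrecognized RSS version: " + repr(version))


print(interpret("0.91"))
print(interpret("1.0", XHTML_FORMAT))
print(interpret("1.0", XHTML_FORMAT, WELLFORMED_ENCODING))
```

Anything the function does not recognize is reported as "unknown"
rather than guessed at.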

Anyone who knows better (such as anyone involved in RSS 1.0
development, on RSS 1.0) should feel free to correct me. Anyone
compiling a FAQ should feel free to swipe from this post.

[1] http://backend.userland.com/rss091
[2] http://backend.userland.com/rss092
[3] http://purl.org/rss/1.0/spec
[4] http://purl.org/rss/1.0/modules/content/


Mark Paschal
markpasc@mindspring.com