Re: [syndication] Bad entities



On Sat, 6 Oct 2001, Julian Bond wrote:

> In the last few days, I've seen a few rss feeds that have malformed XML
> due to non XML entities in encoded html in the <description> element.
> These are almost always perfectly valid html entities.
> 
> Looking at http://www.webreference.com/xml/reference/xhtml-lat1.txt it
> appears that there's a way of adding an entry in the xml file that more
> or less says "Support xhtml entity definitions". But I don't quite
> understand the syntax. Can someone throw some light on this and suggest
> exactly what RSS writers should add to their files? 

There are two options:

1) Include the needed single-character entity definitions from the
xhtml-lat1.ent file _directly_ inside the DTD (the internal subset) at the
start of an XML (RSS) message, as in:

<!DOCTYPE rss ......  [
 
<!ENTITY nbsp   "&#160;"> <!-- no-break space = non-breaking space,
                                  U+00A0 ISOnum -->
<!ENTITY iexcl  "&#161;"> <!-- inverted exclamation mark, U+00A1 ISOnum -->
<!ENTITY cent   "&#162;"> <!-- cent sign, U+00A2 ISOnum -->
<!ENTITY pound  "&#163;"> <!-- pound sign, U+00A3 ISOnum -->

]>

Of course, it is then up to the feed author to do this, which is exactly
what is not happening.
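
For what it's worth, here is a minimal sketch of option 1 from the
consuming side, assuming Python's standard xml.etree.ElementTree parser.
The feed content, the version attribute and the bare "DOCTYPE rss" without
an external identifier are made up for illustration, not taken from any
real feed:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal feed. The DOCTYPE carries only an internal subset
# that declares the two HTML entities used in <description>.
FEED = """<!DOCTYPE rss [
<!ENTITY nbsp  "&#160;">
<!ENTITY pound "&#163;">
]>
<rss version="0.91">
  <channel>
    <item>
      <description>Price:&nbsp;&pound;5</description>
    </item>
  </channel>
</rss>
"""

# The parser expands the declared entities while reading the document.
root = ET.fromstring(FEED)
description = root.find("channel/item/description").text
print(repr(description))
```

Only the entities a feed actually uses need to be declared this way.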

2) include an external entity declaration in the DTD (one that references
the complete xhtml-lat1.ent resource) and then include that entire entity
into the DTD, as in:

<!DOCTYPE rss .... [

<!ENTITY % HTMLlat1 PUBLIC
       "-//W3C//ENTITIES Latin 1 for XHTML//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
    %HTMLlat1;

]>

The ENTITY declaration maps the parameter entity name HTMLlat1 onto the
URL-referenced resource, and the reference

%HTMLlat1;

asks the XML parser to retrieve the entity and insert it at that point.

Both options only work if the XML (RSS) parser actually processes DTD
content; option 2 also requires that the parser can fetch the external
resource.
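
To make that caveat concrete, here is a rough check one could run against a
given parser. It is a sketch under assumptions: it uses Python's standard
xml.etree.ElementTree, which to my knowledge reads the internal subset but
does not fetch external entities by default. The helper name resolves_nbsp
and the three sample documents are hypothetical:

```python
import xml.etree.ElementTree as ET

# Option 1: the entity is declared directly in the internal subset.
INTERNAL = """<!DOCTYPE rss [
<!ENTITY nbsp "&#160;">
]>
<rss><description>a&nbsp;b</description></rss>"""

# Option 2: the entity is only reachable through an external resource.
EXTERNAL = """<!DOCTYPE rss [
<!ENTITY % HTMLlat1 PUBLIC
       "-//W3C//ENTITIES Latin 1 for XHTML//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
    %HTMLlat1;
]>
<rss><description>a&nbsp;b</description></rss>"""

# The problem case: an HTML entity with no declaration at all.
NO_DTD = "<rss><description>a&nbsp;b</description></rss>"


def resolves_nbsp(document: str) -> bool:
    """Return True only if parsing succeeds and &nbsp; became U+00A0."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        # Undefined or unresolvable entity: the document is rejected.
        return False
    text = root.findtext("description") or ""
    return "\u00a0" in text


for name, doc in [("internal", INTERNAL), ("external", EXTERNAL), ("none", NO_DTD)]:
    print(name, resolves_nbsp(doc))
```

With a parser that behaves as assumed, only the internal-subset document
should resolve the entity, which is why option 1 is the safer choice for
feeds consumed by parsers that do not retrieve external resources.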

Hope this is the answer you were looking for --

Ian