
RE: [syndication] XML validation with XSD



Thanks Ian - I thought perhaps I'd missed something (I'll never forget the
trouble I had trying to find something in the XML spec that stated whether
or not it was case-sensitive...)

In these bandwidth-conscious days I'll do my bit by responding to Bill's
comments here too.
I agree in principle that brute-forcing your way around bad XML is generally
a bad idea. Far better to flag & report the problem, and help the
content-generator fix it. There are times, however, when this isn't an
option - for example, if you are building a reader then you can't always
expect the end user to provide feedback to the generator of the dodgy feed.

A possibility here is to use something like the W3C's HTML Tidy to clean up
the markup before the rest of your app gets to see it. I'm trying this now
myself, and the actual code needed for the filtering is pretty small. I'm
not yet certain about the processing overhead, but I suspect it won't be
significant compared with the time taken to fetch the data.

Unfortunately this doesn't address validation at the schema level, so
again it isn't much help to Kanimozhi :-(
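
For what it's worth, here is a rough sketch of the "clean up before the app
sees it" idea. It is not Tidy: clean_markup() is a hypothetical stand-in that
only escapes bare ampersands (one common feed problem), and parse_feed() is
a name made up for this example. The point is only the shape of the filter:
parse strictly first, and clean up and retry once only if that fails.

```python
import re
import xml.etree.ElementTree as ET

# Matches an ampersand that does not start an entity or character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def clean_markup(text):
    """Hypothetical stand-in for a Tidy-style filter: escape bare ampersands."""
    return BARE_AMP.sub("&amp;", text)


def parse_feed(text, cleaner=clean_markup):
    """Parse strictly first; only on failure run the cleaner and retry once.

    Returns (root element, True if the cleaner had to be used).
    """
    try:
        return ET.fromstring(text), False
    except ET.ParseError:
        # A second failure propagates to the caller, so it can still be reported.
        return ET.fromstring(cleaner(text)), True


root, was_cleaned = parse_feed("<item><title>Fish & Chips</title></item>")
print(root.find("title").text, was_cleaned)
```

Returning the was_cleaned flag means the reader can still log that the feed
was broken, even though the end user never sees the problem.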

Cheers,
Danny.




-----------
Danny Ayers

Semantic Web Log :
http://www.citnames.com/blog

"The lyf so short, the craft so long to lerne." - Chaucer



>-----Original Message-----
>From: Ian Graham [mailto:ian.graham@utoronto.ca]
>Sent: 01 December 2002 22:16
>To: syndication@yahoogroups.com
>Cc: Danny Ayers
>Subject: RE: [syndication] XML validation with XSD
>
>
>
>You're correct -- I mis-stated what's in the specification, layering on
>what seems to be common practice (most parsers I've seen simply bail on
>the first problem), rather than set-in-stone requirements.
>
>There in fact is nothing to stop the parser from continuing, even in the
>face of so-called 'fatal' errors --
>
>XML 1.0 defines a 'fatal error' (http://www.w3.org/TR/REC-xml#dt-fatal) to
>be:
>
>   An error which a conforming XML processor must detect and
>   report to the application. After encountering a fatal error, the
>   processor may continue processing the data to search for further errors
>   and may report such errors to the application. In order to support
>   correction of errors, the processor may make unprocessed data from
>   the document (with intermingled character data and markup) available to
>   the application. Once a fatal error is detected, however, the processor
>   must not continue normal processing (i.e., it must not continue to pass
>   character data and information about the document's logical structure
>   to the application in the normal way).
>
>A 'regular' error is then defined as:
>
>   A violation of the rules of this specification; results are
>   undefined. Conforming software may detect and report an error and may
>   recover from it.
>
>So there is absolutely no 'requirement' to stop -- just that you have to
>appropriately indicate what happened if the error is 'fatal'.
>
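
As an illustration of the fatal-error behaviour described above, here is a
minimal sketch using the SAX parser in Python's standard library. The Recorder
class and parse_and_report() are names invented for this example. This
particular parser happens to stop at the first fatal error, which the spec
permits but does not require; what matters is that elements before the error
reach the application and nothing after it does.

```python
import xml.sax


class Recorder(xml.sax.ContentHandler):
    """Records the name of every element the parser reports."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


def parse_and_report(xml_bytes):
    """Return (element names seen, fatal error message or None)."""
    recorder = Recorder()
    try:
        # The default error handler raises on a fatal error.
        xml.sax.parseString(xml_bytes, recorder)
    except xml.sax.SAXParseException as exc:
        return recorder.seen, exc.getMessage()
    return recorder.seen, None


# <c> is never closed, so </a> is a well-formedness (fatal) error;
# the <d/> element after it is never passed to the application.
seen, problem = parse_and_report(b"<a><b>ok</b><c></a><d/>")
print(seen, problem)
```
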
>The specification then is careful to note when errors are 'fatal'. There
>are actually very few classes of 'fatal' errors -- basically (as you note)
>violations of well-formedness constraints, plus some additional
>requirements on the structure of entities (e.g., it's fatal if the
>processor can't handle the character encoding of an entity; if there is a
>general entity reference in an entity that points to an unparsed entity;
>if an attribute value refers to an external entity; and a few others).
>
>The definition of 'validity constraint' in the XML specification
>(Section 1.2) states:
>
>  Violations of validity constraints are errors; they must, at user
>  option, be reported by validating XML processors
>
>And then later, at http://www.w3.org/TR/REC-xml#dt-validating
>
>   Definition: Validating processors must, at user option,
>   report violations of the constraints expressed by the declarations in
>   the DTD, and failures to fulfill the validity constraints given in this
>   specification. To accomplish this, validating XML processors must read
>   and process the entire DTD and all external parsed entities referenced
>   in the document.
>
>So these are errors, but not fatal ones, and the processor can happily
>continue on.
>
>This doesn't help Kanimozhi, unfortunately, but at least the truth is now
>out there ;-)
>
>Ian
>
>On Sun, 1 Dec 2002, Danny Ayers wrote:
>
>>
>> >Parsers are required (by specification) to stop when they encounter
>> >an error, which is why you're seeing this behavior. Indeed, a parser
>> >that does not stop at the first error would be in violation of the XML
>> >specification.
>>
>> I can't find anything that says this in the spec [1] - do you have a ref?
>> This would certainly be expected behaviour if the XML wasn't well-formed
>> (no way of telling what was meant to come next). But Kanimozhi's data is
>> well-formed, and I think it probably should be possible to report more
>> than a terminating validation error. I haven't personally done much with
>> XSD validation, but I'm guessing that even if existing tools don't support
>> full error reporting then there are at least two (hacky) possibilities -
>> dive into the parser's source and get rid of all the "System.exit()"s or
>> whatever; or write a series of schemas, separating out the constraints,
>> and run them in different orders (sounds a bit unlikely, but I think it
>> might just work).
>>
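
The "separate out the constraints" idea can be sketched roughly as below.
This is not XSD validation (Python's standard library has no XSD validator):
check_title(), check_link() and check_all() are hypothetical names, and each
check function merely stands in for one small schema. The sketch assumes a
made-up feed layout with <item> elements, and shows only the pattern of
running every check and collecting all the problems instead of stopping at
the first.

```python
import xml.etree.ElementTree as ET


# Hypothetical constraints; each one stands in for a small, separate schema.
def check_title(item):
    if item.find("title") is None:
        yield "missing <title>"


def check_link(item):
    link = item.findtext("link", default="")
    if not link.startswith("http"):
        yield "link is not an http URL: %r" % link


def check_all(xml_text, checks=(check_title, check_link)):
    """Run every check on every <item>, collecting all problems found."""
    problems = []
    root = ET.fromstring(xml_text)  # the input must still be well-formed
    for n, item in enumerate(root.iter("item"), start=1):
        for check in checks:
            problems.extend("item %d: %s" % (n, msg) for msg in check(item))
    return problems


doc = (
    "<feed>"
    "<item><link>ftp://x</link></item>"
    "<item><title>t</title><link>http://example.org/</link></item>"
    "</feed>"
)
for line in check_all(doc):
    print(line)
```

Because the checks are independent, a failure in one never hides a failure
in another, which is the property Kanimozhi is after.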
>> Cheers,
>> Danny.
>>
>> [1] http://www.w3.org/TR/REC-xml#proc-types
>>
>> >On Thu, 14 Nov 2002, skanimozhi_sel wrote:
>> >
>> >> hi,
>> >>     We are at present building an application which requires the
>> >> validation of a well-formed XML file against an XSD. But the problem we
>> >> faced is that all the parsers stopped parsing when they encountered the
>> >> first error in the XML file. Our requirement is that the XML file be
>> >> parsed fully so that we get all the errors in that file. Can anybody
>> >> help me?
>> >>
>> >> Thanks
>> >> Kanimozhi
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/