Thanks! A DOM implementation works just as well for me, since I'd be running XPath over it, but in other situations a SAX driver would be more efficient and use less memory.
What's the license? (I suppose I'll just ask on the SourceForge lists...)
mike
-----Original Message-----
From: Leigh Dodds [mailto:ldodds@ingenta.com]
Sent: Wednesday, November 07, 2001 2:31 AM
To: syndication@yahoogroups.com
Subject: RE: [syndication] ANN: xpath2rss 0.5
The Java port of HTML Tidy will generate a DOM tree from
an HTML page. You could either manipulate this DOM directly,
or walk over it to generate SAX events manually. The latter
is very simple to do.
http://lempinen.net/sami/jtidy/
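A minimal sketch of that DOM-to-SAX walk, assuming a standard `org.w3c.dom.Document` such as the one JTidy returns. To keep it self-contained, the sample parses a small well-formed string with the JDK's own parser instead of calling Tidy. The class name `DomToSax` is hypothetical. Only elements, attributes and text are translated; comments and processing instructions are skipped, and no namespace processing is done.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: replays an existing DOM tree as SAX events.
public class DomToSax {

    /** Wraps a depth-first walk of the tree in startDocument/endDocument. */
    public static void emit(Document doc, ContentHandler handler) throws SAXException {
        handler.startDocument();
        walk(doc.getDocumentElement(), handler);
        handler.endDocument();
    }

    /** Elements become start/endElement pairs, text nodes become characters(). */
    private static void walk(Node node, ContentHandler handler) throws SAXException {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                String name = node.getNodeName();
                // Copy DOM attributes into a SAX Attributes object (all typed as CDATA).
                AttributesImpl atts = new AttributesImpl();
                NamedNodeMap map = node.getAttributes();
                for (int i = 0; i < map.getLength(); i++) {
                    Node a = map.item(i);
                    atts.addAttribute("", a.getNodeName(), a.getNodeName(), "CDATA", a.getNodeValue());
                }
                // No namespace URI; local and qualified names are both the DOM node name.
                handler.startElement("", name, name, atts);
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    walk(c, handler);
                }
                handler.endElement("", name, name);
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE: {
                char[] chars = node.getNodeValue().toCharArray();
                handler.characters(chars, 0, chars.length);
                break;
            }
            default:
                // Comments, processing instructions, etc. are ignored in this sketch.
                break;
        }
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p class=\"x\">Hello</p></body></html>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder log = new StringBuilder();
        // A tiny handler that just echoes element names and text.
        emit(doc, new DefaultHandler() {
            @Override public void startElement(String uri, String local, String qName, Attributes a) {
                log.append('<').append(qName).append('>');
            }
            @Override public void endElement(String uri, String local, String qName) {
                log.append("</").append(qName).append('>');
            }
            @Override public void characters(char[] ch, int start, int len) {
                log.append(ch, start, len);
            }
        });
        System.out.println(log); // <html><body><p>Hello</p></body></html>
    }
}
```

Any existing `ContentHandler` (a serializer, a filter chain, an XSLT `TransformerHandler`) can be passed to `emit` in place of the echoing handler, which is what makes the walk a drop-in stand-in for a real HTML SAX driver, at the cost of holding the whole DOM in memory first.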
I suspect your main requirement is to get a view of
the HTML as a well-formed XML document, which Tidy will
do very well.
Cheers,
L.
-----Original Message-----
From: Mike Dierken [mailto:mike@datachannel.com]
Sent: 06 November 2001 21:50
To: 'syndication@yahoogroups.com'
Subject: RE: [syndication] ANN: xpath2rss 0.5
Do you know of a Java HTML->SAX event generator?
I'd like to do the same sort of 'screen scraping' via a normalized XML document created from HTML.
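For the scraping step itself, once the page has been normalized into well-formed XML it can be queried with XPath. A rough sketch using the `javax.xml.xpath` API found in current JDKs (it was added in Java 5, well after this thread). The class `ScrapeHeadlines`, its `select` method and the `//h2` expression are hypothetical, and the input is assumed to already be well-formed XHTML, for example Tidy's output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pull text out of normalized (well-formed) HTML with XPath.
public class ScrapeHeadlines {

    /** Returns the trimmed text content of every node matched by the XPath expression. */
    public static List<String> select(String xhtml, String expr) throws Exception {
        // Parser is not namespace-aware, so plain names like //h2 match unprefixed elements.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><h2>First story</h2><p>...</p><h2>Second story</h2></body></html>";
        System.out.println(select(page, "//h2")); // [First story, Second story]
    }
}
```

Each matched string could then become an RSS item title; real pages would need a more specific expression than `//h2`.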
> -----Original Message-----
> From: Mark Nottingham [mailto:mnot@mnot.net]
> Sent: Thursday, November 01, 2001 10:49 PM
> To: syndication@yahoogroups.com
> Subject: [syndication] ANN: xpath2rss 0.5
>
>
>
> It's back, it's marginally better.
>
> http://www.mnot.net/xpath2rss/
>
>
> --
> Mark Nottingham
> http://www.mnot.net/
>
>
Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.