Re: [syndication] Re: PHP tool for parsing RSS1.0?



> Not at all.  But it's cool anyway.  What are you using the metaphone
> and soundex fields for?

    Comparisons for duplicate data, etc. The Snewp will soon use them for
some new features too.
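
    As a rough illustration of that kind of duplicate check (not the actual
Snewp code, which is PHP and can call the built-in soundex() and metaphone()
functions directly), here is a minimal Python sketch. The helper names
title_key and likely_duplicates are hypothetical, and comparing whole titles
word by word is an assumption about how the fields might be used.

```python
# Standard American Soundex digit groups.
_CODES = {}
for _digit, _letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(word):
    """Return the 4-character Soundex key for one word ('' if it has no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code after it counts
    return (key + "000")[:4]


def title_key(title):
    """Hypothetical helper: join the Soundex keys of every word in a title."""
    keys = (soundex(w) for w in title.split())
    return " ".join(k for k in keys if k)


def likely_duplicates(a, b):
    """Treat two titles as probable duplicates when their phonetic keys match."""
    return title_key(a) == title_key(b)


print(soundex("Robert"))  # R163
print(likely_duplicates("Apple Releases New Powerbook",
                        "Aple releases new PowerBook"))  # True
```

    Storing the key next to each item means the comparison becomes a plain
string match, which tolerates small spelling and capitalization differences.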

> This doesn't work for me because:
>
> * it seems to be discarding any fields associated with an item
> besides title, description, and link

    The current version only looks for title, description, link, and
category - the most common tags. Adding new elements is easy - I can add 25
more in a matter of seconds. I do plan to add the extra elements, but this
parser was written for a specific purpose, so I haven't bothered yet.
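
    To show why adding elements is cheap in a design like this, here is a
minimal sketch in Python (the real parser is PHP and its internals may well
differ). It assumes the parser keeps a simple set of wanted tag names; WANTED,
local_name, and parse_items are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Tags the sketch keeps for each item; everything else is discarded.
WANTED = {"title", "description", "link", "category"}


def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix, leaving the bare tag name."""
    return tag.rsplit("}", 1)[-1]


def parse_items(xml_text, wanted=WANTED):
    """Return one dict per <item>, holding only the wanted child elements."""
    root = ET.fromstring(xml_text)
    items = []
    for elem in root.iter():
        if local_name(elem.tag) != "item":
            continue
        fields = {}
        for child in elem:
            name = local_name(child.tag)
            if name in wanted:
                fields[name] = (child.text or "").strip()
        items.append(fields)
    return items
```

    Under that assumption, supporting another element is a one-line change,
for example parse_items(feed_text, WANTED | {"creator"}) to also keep a
dc:creator child. Matching on the local name alone ignores which namespace
an element came from, which is a simplification.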

> * it's a remote service, where I want something local (perhaps the
> source is available?)

    It is not a remote service. That page is simply the only place where I
have that particular code's output available to look at. I plan on releasing
a version of the code once it is more filled out; as you noted, it is still
missing some possible item elements, etc.

> * looks like it's more focused on the .9x RSS formats.

    Not sure what you mean by that. <pick-a-fight>In a parsing sense, there
is no difference except how much BS the code has to deal with (the BS gets
thicker as the versions progress).</pick-a-fight>

    I run this parser against over 8500 different RSS (0.9x and 1.0) and true
RDF documents - several times a day. I have normalized the output for its
intended use, but that doesn't change anything except identifiers. For
example, DocType for RSS 0.9x is the URL from the <!DOCTYPE .. > tag, if it
exists, but DocType for RSS 1.0 is the URL of the primary Schema used.
Sometimes RSS 1.0 and RDF elements will be missed because the developer (or
author) of a feed isn't using standard namespace identifiers, but as I find
them, I can normalize them so they are parsed properly.
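
    The DocType rule described above could be sketched roughly as follows,
in Python rather than the parser's own PHP. Two things here are assumptions:
that "the primary Schema" means the document's default xmlns, and the
function name doc_type, which is hypothetical.

```python
import re

# System identifier (a URL) at the end of a <!DOCTYPE ...> declaration.
_DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+[^>]*?["\'](https?://[^"\']+)["\']\s*>',
                         re.IGNORECASE)
# Default namespace declaration; prefixed forms such as xmlns:rdf= do not match.
_DEFAULT_NS_RE = re.compile(r'\sxmlns\s*=\s*["\']([^"\']+)["\']')


def doc_type(xml_text):
    """Return the DOCTYPE URL if present, else the default namespace, else None."""
    match = _DOCTYPE_RE.search(xml_text)
    if match:
        return match.group(1)
    match = _DEFAULT_NS_RE.search(xml_text)
    if match:
        return match.group(1)
    return None
```

    For a Netscape-style RSS 0.91 feed this returns the DTD URL from the
DOCTYPE; for an RSS 1.0 feed whose root declares
xmlns="http://purl.org/rss/1.0/" it returns that namespace URL.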

    If you want to peek at the source - feel free -
http://syndicatethe.net/dev/reader/stn-reader.txt

    This version excludes a rather complex schema recognition function that
is still having some issues. The function parses the schema includes so the
parser can recognize namespace specific elements on the fly -- like I said,
a bit buggy right now.

James