Re: [syndication] Scraper code?



On Fri, 11 Oct 2002, Mark Nottingham wrote:

> > The client can only react to what it's being told by the server.  If the
> > filename extension isn't consistent (and many are NOT) then how is your
> > client going to 'detect' what to do with the URL?  It'd have to use
> > content type info or even content negotiation.  This DOES work but as
> > I've posited, many servers are NOT currently doing this correctly AND
> > won't be able to do so.  Not from a technical standpoint as much as from
> > a procedure/policy issue.  Folks are often faced with serving up RSS from
> > servers that won't let them change the content type.
>
> Manipulating metadata in Web servers is a big problem, yes. However, most
> *do* allow setting content-types and other headers if there's even a
> minimal amount of administrative access (cacheability headers are also an
> area that suffers because of this). Worst case scenario, a CGI or other
> server-side script can be used to set the Content-Type...
>
> Another approach is to use the 'type' attribute on links to determine the
> type of the target. I realize that it was intended as a hint, and
> therefore the actual Content-Type should take precedence, but IMHO it's a
> useful thing precisely because of the usability/policy problems with HTTP
> headers.
>
>
> > Yes, content type would help a lot and everything out there should be
> > making steps to be sure their RSS output is being sent as
> > 'application/rss+xml'.  Now bear in mind that the RDF WG would prefer
> > using 'application/rdf+xml'.  That's fine but for documents clearly
> > intended to be RSS I'd suggest using the former.  Small steps here folks.
>
> Agreed. IMHO 'application/rdf+xml' is no better than 'application/xml';
> you'll still need some mechanism to look at the namespaces in the doc to
> figure out where you should dispatch it to (in this case, to an
> aggregator) and having *two* dispatch mechanisms is just silly.

I think there's a difference.

RDF-based mixed namespace XML documents often have a much finer-grained
mixing of their namespaces. Non-RDF XML docs often use a single namespace,
or have large islands of homogeneous XML (eg. XHTML with blobs of MathML or
SVG inside).  In an RDF context, you're less likely to dispatch based on
namespace recognition alone, since there'll often be several being used
together (eg. DC and RSS). So for me at least, application/rdf+xml is
handy... since it picks out a whole class of XML docs that I'll be
dispatching to the same tools/services.
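
A rough sketch of that kind of dispatch, in Python. The handler names
("rdf-tools", "aggregator") and the function names are made up for
illustration; only the media types and namespace URIs are real. The idea is
that the Content-Type settles the RDF case in one step, and namespace
sniffing is left as the fallback for generic XML:

```python
import xml.etree.ElementTree as ET


def namespaces_used(xml_text):
    """Return the set of namespace URIs that appear on element names."""
    found = set()
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree writes qualified names as "{namespace-uri}localname".
        if elem.tag.startswith("{"):
            found.add(elem.tag[1:].split("}", 1)[0])
    return found


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or "" if it has none."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def choose_handler(content_type, xml_text):
    """Pick a (hypothetical) handler name from the Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/rdf+xml":
        # The whole class of RDF documents goes to the same tools,
        # however many vocabularies are mixed together inside.
        return "rdf-tools"
    if media_type == "application/rss+xml":
        return "aggregator"
    if media_type in ("application/xml", "text/xml"):
        # Generic XML: fall back to dispatching on the root namespace.
        return "namespace:" + root_namespace(xml_text)
    return "unknown"
```

Running namespaces_used() over a typical RSS 1.0 feed gives back the RDF,
RSS and DC namespaces all at once, which is why picking a single one to
dispatch on is awkward there.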

Dan


-- 
mailto:danbri@w3.org
http://www.w3.org/People/DanBri/