
Re: [syndication] Scraper code?



> The client can only react to what it's being told by the server.  If the
> filename extension isn't consistent (and many are NOT) then how is your
> client going to 'detect' what to do with the URL?  It'd have to use
> content type info or even content negotiation.  This DOES work but as
> I've posited, many servers are NOT currently doing this correctly AND
> won't be able to do so.  Not from a technical standpoint as much as from
> a procedure/policy issue.  Folks are often faced with serving up RSS
> from servers that won't let them change the content type.

Manipulating metadata in Web servers is a big problem, yes. However, most
*do* allow setting content-types and other headers if there's even a
minimal amount of administrative access (cacheability headers are another
area that suffers because of this). Worst case, a CGI or other
server-side script can be used to set the Content-Type...
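To make the worst-case workaround concrete, here's a minimal sketch of the
CGI approach: the script just emits the Content-Type header itself and then
the feed body, bypassing whatever the server would have guessed from the
extension. The function name and the idea of passing the body in as a
string are my own framing, not anything standardized:

```python
def rss_cgi_response(body):
    # A CGI script controls its own headers: emit an explicit
    # Content-Type, a blank line, then the feed document itself.
    # The server's extension-to-type mapping never gets a say.
    return "Content-Type: application/rss+xml\r\n\r\n" + body

# In an actual CGI script you'd read the feed file from disk and
# write rss_cgi_response(...) to stdout.
```

It's crude, but it works even on hosts where you can't touch the server
configuration at all.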

Another approach is to use the 'type' attribute on links to determine the
type of the target. I realize that it was intended as a hint, and
therefore the actual Content-Type should take precedence, but IMHO it's a
useful thing precisely because of the usability/policy problems with HTTP
headers.
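For concreteness, this is the sort of link I mean, as it appears in HTML
autodiscovery markup (the href is a placeholder):

```xml
<link rel="alternate" type="application/rss+xml"
      title="RSS feed" href="http://example.com/index.rdf" />
```

A client that trusts the type attribute can pick the right handler before
it ever sees the server's (possibly wrong) Content-Type header.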


> Yes, content type would help a lot and everything out there should be
> making steps to be sure their RSS output is being sent as
> 'application/rss+xml'.  Now bear in mind that the RDF WG would prefer
> using 'application/rdf+xml'.  That's fine but for documents clearly
> intended to be RSS I'd suggest using the former.
> Small steps here folks.

Agreed. IMHO 'application/rdf+xml' is no better than 'application/xml';
you'll still need some mechanism to look at the namespaces in the doc to
figure out where to dispatch it (in this case, to an aggregator), and
having *two* dispatch mechanisms is just silly.
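To illustrate why the generic RDF type doesn't save you any work, here's a
rough sketch of the namespace peek you'd end up doing anyway before routing
a document to an aggregator. The function name is mine, and the RSS 1.0
namespace URI is the only assumption baked in:

```python
import xml.etree.ElementTree as ET

RSS1_NS = "http://purl.org/rss/1.0/"

def looks_like_rss1(doc):
    # An RSS 1.0 feed is an RDF document whose channel/item elements
    # live in the RSS 1.0 namespace; the media type alone can't tell
    # you that, so we have to parse and inspect the namespaces.
    root = ET.fromstring(doc)
    return any(el.tag.startswith("{%s}" % RSS1_NS) for el in root.iter())
```

So under 'application/rdf+xml' the dispatcher still parses the document to
decide where it goes, which is exactly the mechanism the media type was
supposed to replace.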