Re: [syndication] Scraper code?
> Manipulating metadata in Web servers is a big problem, yes. However, most
> *do* allow setting content-types and other headers if there's even a
> minimal amount of administrative access (cacheability headers are also an
> area that suffer because of this). Worst case scenario, a CGI or other
> server-side script can be used to set the Content-Type...
I strongly disagree. Most hosted setups do not, by default, allow .htaccess
overrides. They indeed SHOULD allow them, but out of the box most do not. So if
we're going to help the thousands of users out there, we need to take into
account the reality they're working within.
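For anyone stuck without header control, here is a rough sketch of the CGI fallback mentioned in the quoted text: a script that emits the Content-Type itself ahead of the feed body. The function name build_cgi_response and the sample feed are made up for illustration, and this assumes the host permits CGI scripts at all, which many of the same locked-down hosts may not.

```python
import sys

# Hypothetical sample feed; a real script would read the feed from disk.
SAMPLE_FEED = b'<?xml version="1.0"?><rss version="0.91"><channel/></rss>'


def build_cgi_response(feed_bytes, content_type="application/rss+xml"):
    """Return a raw CGI response: header lines, a blank line, then the body."""
    headers = (
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(feed_bytes)}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + feed_bytes


if __name__ == "__main__":
    # The web server relays whatever the script writes to stdout.
    sys.stdout.buffer.write(build_cgi_response(SAMPLE_FEED))
```

The point is only that the script, not the server configuration, decides the content type.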
> Another approach is to use the 'type' attribute on links to determine the
> type of the target. I realize that it was intended as a hint, and
> therefore the actual Content-Type should take precedence, but IMHO it's a
> useful thing precisely because of the usability/policy problems with HTTP
> headers.
Uh oh, headed down that slippery slope to hlink... Heh, you're right of course,
but expecting people to alter their HTML to accommodate a type attribute on <a>
tags is an even more remote prospect than content header configuration.
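To make the 'type' attribute idea concrete, a scraper could collect links that declare a type and treat it purely as a hint. This is a minimal sketch using Python's standard html.parser; the names TypedLinkCollector and find_feed_links are hypothetical, and the real Content-Type returned on fetch should still take precedence.

```python
from html.parser import HTMLParser


class TypedLinkCollector(HTMLParser):
    """Collect (href, type) pairs from <a> and <link> tags that declare a type."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        if attr.get("href") and attr.get("type"):
            self.links.append((attr["href"], attr["type"].strip().lower()))


def find_feed_links(html, wanted="application/rss+xml"):
    """Return hrefs whose declared type matches the wanted media type."""
    collector = TypedLinkCollector()
    collector.feed(html)
    return [href for href, declared in collector.links if declared == wanted]
```

Untyped links are simply ignored, which is the common case Bill describes.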
> Agreed. IMHO 'application/rdf+xml' is no better than 'application/xml';
> you'll still need some mechanism to look at the namespaces in the doc to
> figure out where you should dispatch it to (in this case, to an
> aggregator) and having *two* dispatch mechanisms is just silly.
Well, it'd certainly be great to have some sort of local 'dispatcher' handling
all incoming RDF but that's not terribly likely in the near future. So using
application/rss+xml does at least avoid stepping on the toes of anything that
might ever emerge for handling RDF in a dispatched manner. Like many folks, I'm
not holding my breath there...
It'd be a lot less silly to have one that worked specifically for RSS. Then we
could concentrate on getting people to serve up their RSS with that specific
content type. Given that RSS is probably the single most widely distributed
form of XML and RDF, it's probably best to give it its own content type.
A generic dispatcher would be interesting. The complexity required of it,
however, would be quite daunting.
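For a sense of what even a tiny RSS-only dispatcher has to do when the content type is no help, here is a sketch that peeks at the root element and its namespaces. It is an illustration under assumptions, not a proposal: the function name classify and the three labels are made up, and it only recognizes un-namespaced <rss> roots and RSS 1.0 channels inside rdf:RDF.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10_NS = "http://purl.org/rss/1.0/"


def classify(xml_text):
    """Return 'rss', 'rdf', or 'unknown' based on the document's root element."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # RSS 0.9x style: plain <rss> root with no namespace.
        return "rss"
    if root.tag == f"{{{RDF_NS}}}RDF":
        # RSS 1.0 is RDF whose root contains a channel in the RSS 1.0 namespace.
        for child in root:
            if child.tag == f"{{{RSS10_NS}}}channel":
                return "rss"
        return "rdf"
    return "unknown"
```

A generic dispatcher would need a rule like this for every vocabulary it might meet, which is where the complexity comes from.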
-Bill Kearney