
Re: [syndication] Scraper code?



> The client can only react to what it's being told by the server.  If the
> filename extension isn't consistent (and many are NOT) then how is your
> client going to 'detect' what to do with the URL?  It'd have to use
> content type info or even content negotiation.  This DOES work but as
> I've posited, many servers are NOT currently doing this correctly AND
> won't be able to do so.  Not from a technical standpoint as much as from
> a procedure/policy issue.  Folks are often faced with serving up RSS
> from servers that won't let them change the content type.

Manipulating metadata in Web servers is a big problem, yes. However, most
*do* allow setting content-types and other headers if there's even a
minimal amount of administrative access (cacheability headers are another
area that suffers because of this). Worst case, a CGI or other
server-side script can be used to set the Content-Type...
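To make the worst-case workaround concrete, here's a minimal sketch of the
CGI approach: the script just emits the Content-Type header itself and then
the feed body, bypassing whatever the server would have guessed from the
extension. The function name and the idea of passing the body in as a
string are my own framing, not anything standardized:

```python
def rss_cgi_response(body):
    # A CGI script controls its own headers: emit an explicit
    # Content-Type, a blank line, then the feed document itself.
    # The server's extension-to-type mapping never gets a say.
    return "Content-Type: application/rss+xml\r\n\r\n" + body

# In an actual CGI script you'd read the feed file from disk and
# write rss_cgi_response(...) to stdout.
```

It's crude, but it works even on hosts where you can't touch the server
configuration at all.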

Another approach is to use the 'type' attribute on links to determine the
type of the target. I realize that it was intended as a hint, and
therefore the actual Content-Type should take precedence, but IMHO it's a
useful thing precisely because of the usability/policy problems with HTTP
headers.
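For concreteness, this is the sort of link I mean, as it appears in HTML
autodiscovery markup (the href is a placeholder):

```xml
<link rel="alternate" type="application/rss+xml"
      title="RSS feed" href="http://example.com/index.rdf" />
```

A client that trusts the type attribute can pick the right handler before
it ever sees the server's (possibly wrong) Content-Type header.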


> Yes, content type would help a lot and everything out there should be
> making steps to be sure their RSS output is being sent as
> 'application/rss+xml'.  Now bear in mind that the RDF WG would prefer
> using 'application/rdf+xml'.  That's fine but for documents clearly
> intended to be RSS I'd suggest using the former.
> Small steps here folks.

Agreed. IMHO 'application/rdf+xml' is no better than 'application/xml';
you'll still need some mechanism to look at the namespaces in the doc to
figure out where to dispatch it (in this case, to an aggregator), and
having *two* dispatch mechanisms is just silly.
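To illustrate why the generic RDF type doesn't save you any work, here's a
rough sketch of the namespace peek you'd end up doing anyway before routing
a document to an aggregator. The function name is mine, and the RSS 1.0
namespace URI is the only assumption baked in:

```python
import xml.etree.ElementTree as ET

RSS1_NS = "http://purl.org/rss/1.0/"

def looks_like_rss1(doc):
    # An RSS 1.0 feed is an RDF document whose channel/item elements
    # live in the RSS 1.0 namespace; the media type alone can't tell
    # you that, so we have to parse and inspect the namespaces.
    root = ET.fromstring(doc)
    return any(el.tag.startswith("{%s}" % RSS1_NS) for el in root.iter())
```

So under 'application/rdf+xml' the dispatcher still parses the document to
decide where it goes, which is exactly the mechanism the media type was
supposed to replace.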