
Re: [syndication] Re: robots.txt and rss



On Fri, Nov 08, 2002 at 11:58:52AM -0500, Dan Brickley wrote:
> You might want to restrict certain (disfunctional, annoying etc)
> user-agents, eg. if they poll impolitely often.

my own experience with such hostile user-agents is that the odds of them
honoring robots.txt are roughly zero. they usually end up being custom
perl or php scripts run from a cronjob or during a page load. the few
times an aggregation tool has been released with an overzealous default
poll frequency, the tool was quickly corrected.

sometimes the authors notice if you start handing them 403 responses,
sometimes not. yesterday, these hosts got 403 responses from my feeds
(request count, then host):

  48 cassium.procopia.com
  47 csociety.ecn.purdue.edu
  72 usersweb1.go-concepts.com
  20 wwwcache2-ext.lancs.ac.uk
  40 wwwcache3-ext.lancs.ac.uk
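
a tally like the one above can be pulled out of a web server access log.
the following is only a minimal sketch, not the script used for the
numbers in this message: it assumes log lines in Common Log Format, and
the names count_403s and format_report are made up for illustration.

```python
import re
from collections import Counter

# Assumed input: Common Log Format lines, e.g.
#   host ident user [time] "request" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3})\s')


def count_403s(lines):
    """Count 403 responses per requesting host (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) == "403":
            counts[match.group(1)] += 1
    return counts


def format_report(counts):
    """Render 'count host' rows, sorted by host name."""
    return "\n".join(f"{n:4d} {host}" for host, n in sorted(counts.items()))
```

to use it, pass any iterable of log lines, for example
print(format_report(count_403s(open(path)))), where path points at your
own access log.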

the csociety.ecn.purdue.edu site has been getting them for over a year.
(and attempts to contact the author of the software doing so have been
fruitless, although the frequency appears to have gone down since last
time i looked.)
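
for anyone wanting to do something similar, here is a rough sketch of
handing 403s to hosts that poll too often. it is not a description of
my actual setup: the 30-minute minimum interval, the in-memory table,
and the PollThrottle name are all assumptions made for the example.

```python
import time

# Assumed policy: a host polling more often than this gets a 403.
MIN_INTERVAL = 30 * 60  # seconds


class PollThrottle:
    """Track the last request time per host (hypothetical helper)."""

    def __init__(self, min_interval=MIN_INTERVAL, clock=time.time):
        self.min_interval = min_interval
        self.clock = clock
        self.last_seen = {}

    def status_for(self, host):
        """Return 403 if the host polled too recently, else 200."""
        now = self.clock()
        last = self.last_seen.get(host)
        # Record every request, including refused ones, so a client
        # that keeps hammering stays refused until it slows down.
        self.last_seen[host] = now
        if last is not None and now - last < self.min_interval:
            return 403
        return 200
```

recording refused requests as well is a deliberate choice in this
sketch: the client only gets the feed again once it actually waits out
the interval.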

as i see it, having robots.txt handling in popular aggregation tools
would just multiply the number of requests to my site with zero
likelihood of benefit.

jim