
Re: [syndication] Can anyone guess the aggregator?



> Once an hour from
> the same IP? That sounds like it might be a scraper, looking a little too
> hard to see if you've suddenly added an RSS feed that it can switch to.

A pretty dumb scraper.  There's little sense in harassing a site every hour on
the hour to see if it's suddenly started providing one of those magic URLs.

Does a traceroute on the IP address of the offending program turn up anything
intelligent?  We've 'outed' lame IP addresses before (did perdue ever stop
running that thing?) so it's not like you'd be making a horrible faux pas.  Or
perhaps share it privately with other site operators to see if they turn up
any matches.  I could run it against the syndic8 logs...
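
For anyone wanting to check their own logs for a match, here is a rough sketch in
Python.  It assumes the server writes Apache-style "combined" log lines; the
function name summarize_ip and the regular expression are my own illustration,
not anything syndic8 actually runs.

```python
import re
from collections import Counter

# Assumed layout (Apache "combined" format):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_ip(lines, ip):
    """Count the request lines and user agents logged for one IP address."""
    requests, agents = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("ip") == ip:
            requests[match.group("request")] += 1
            agents[match.group("agent")] += 1
    return requests, agents
```

Feed it an open log file and the address in question, e.g.
summarize_ip(open("access.log"), "192.0.2.1"), where both the file name and the
address are placeholders.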

Here's a thought: set up a dummy page at one of those URLs.  Use some PHP to pull
out the user agent info (if your webserver doesn't already log it) as well as
whatever other data it can collect.  Then consider returning a legitimately formed
RSS file that says something to the effect of "HEY!  YOU! STOP DOING THIS!"
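
A minimal sketch of that dummy page, written in Python rather than PHP and using
only the standard library.  The names build_warning_feed and app, the port, and
the example.com link are placeholders of mine; map the handler to whichever URL
the scraper keeps requesting.

```python
import sys
from wsgiref.simple_server import make_server
from xml.sax.saxutils import escape


def build_warning_feed(message, link):
    """Return a minimal, well-formed RSS 2.0 document carrying one item."""
    text = escape(message)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<rss version="2.0">\n'
        '  <channel>\n'
        f'    <title>{text}</title>\n'
        f'    <link>{escape(link)}</link>\n'
        f'    <description>{text}</description>\n'
        '    <item>\n'
        f'      <title>{text}</title>\n'
        f'      <description>{text}</description>\n'
        '    </item>\n'
        '  </channel>\n'
        '</rss>\n'
    )


def app(environ, start_response):
    """WSGI handler: log what the client reveals, then serve the feed."""
    # Record the address plus the headers a polite robot would fill in.
    seen = {key: environ.get(key) for key in
            ("REMOTE_ADDR", "HTTP_USER_AGENT", "HTTP_REFERER", "HTTP_FROM")}
    print(seen, file=sys.stderr)

    body = build_warning_feed("HEY!  YOU! STOP DOING THIS!",
                              "http://example.com/").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/rss+xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Placeholder port; serves the same feed for every path requested.
    make_server("", 8000, app).serve_forever()
```

If the thing really is a feed-hunting scraper, the message should show up wherever
it republishes what it finds, and the logged headers may say who runs it.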

-Bill Kearney