Re: Policies and so on

To: syndication@yahoogroups.com
Subject: Re: Policies and so on
From: Pino Calzo <pino@calzo.com>
Date: Sat, 17 Jan 2004 17:16:20 +0100
In-reply-to: <1074342355.332.69415.m12@yahoogroups.com>
References: <1074342355.332.69415.m12@yahoogroups.com>
Reply-to: Pino Calzo <pino@calzo.com>

Hm - interesting subject, although really not very new ;)

> I assume they generate them from the html pages (which is
> probably a pain, I am grateful and impressed)!
Thanks - although our system is highly automated, it takes quite a bit
of manual work to keep our thousands of sources up to date. We also
had to outsource parts of that work to paid "employees", in case you
wonder what the premium, advertising and PayPal stuff (aka: money) on
our site is about ;) Some sites that cannot produce their own RSS
point to us instead. It's our policy not to scrape a site if there's
an XML export (for obvious reasons).

> And they would probably provide such files if they knew about RSS...
Some actually do. Especially since the middle of last year we have
noticed a strong increase in emails from newspapers/content providers
stating something like: "we noticed that you scrape our site - we are
proud to now provide our own RSS feeds - please reflect that".
That is something we love to do, as every RSS source reduces our admin
overhead.

On the legal side: there are things like the Digital Millennium
Copyright Act, and the European Union has something similar. Based on
these, current court rulings (yes - there have been court cases about
this - not with us, but with others in the field) hold that scraping
is considered "fair use" - as long as you clearly say where you've got
the headlines from and don't take more than the headline and 1-2
description sentences.
Our servers reside in Switzerland, where the situation is a bit of a
grey area as there are no corresponding laws, but the rulings here
tend to reflect EU rulings.
I remember some huge discussions about this 2-3 years ago ;) It has
all become very relaxed now. I personally think this has a lot to do
with the evangelization efforts, the court rulings, and the big
publishing houses having understood that RSS & scraping is NOT "evil",
but to their benefit ;) Also, any restrictive ruling would put heavy
legal pressure on the search engine industry - something nobody wants.
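
As a rough illustration of the "headline plus 1-2 description sentences,
with the source named" rule of thumb above, here is a minimal Python
sketch. It is not our actual code: the names Excerpt and make_excerpt
and the naive sentence splitting are assumptions made up for this
example.

```python
import re
from dataclasses import dataclass


@dataclass
class Excerpt:
    """What gets exported: headline, short summary, and where it came from."""
    headline: str
    summary: str
    source_url: str


def make_excerpt(headline: str, description: str, source_url: str,
                 max_sentences: int = 2) -> Excerpt:
    """Keep the headline and at most `max_sentences` description sentences.

    Hypothetical helper: sentences are split naively on '.', '!' or '?'
    followed by whitespace, which is good enough for a sketch only.
    """
    sentences = re.split(r"(?<=[.!?])\s+", description.strip())
    summary = " ".join(s for s in sentences[:max_sentences] if s)
    # The source URL always travels with the excerpt (attribution).
    return Excerpt(headline.strip(), summary, source_url)
```

Calling make_excerpt("Title", "One. Two. Three.", "http://example.com/a")
keeps only the first two sentences and carries the source URL along.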

Scraping:
There are quite a few "unofficial" rules we consider when scraping,
but the same applies to search engines. Google doesn't index every
site either.

I don't know if we're "more" responsible than others - it's simply in
our own interest to be responsible in scraping, to be responsive to
removal requests (these have happened - and no public "black list"
exists) and to be responsible in the way stuff gets exported on the
other side. Our spiders are recognizable & blockable, we respect
robots.txt, etc.
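
For anyone wondering what "respect robots.txt" can look like in
practice, here is a minimal sketch using Python's standard
urllib.robotparser module. The spider name ExampleFeedSpider, the
helper may_scrape and the sample robots.txt are made up for
illustration; they are not our real user agent or code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical spider name; a recognizable user agent is what lets a
# publisher block the spider by name in robots.txt.
SPIDER_UA = "ExampleFeedSpider"


def may_scrape(robots_txt: str, url: str, user_agent: str = SPIDER_UA) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse robots.txt content that was already downloaded, so the
    # check itself needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample robots.txt: blocks our hypothetical spider everywhere and
# keeps all other robots out of /private/ only.
SAMPLE_ROBOTS = """\
User-agent: ExampleFeedSpider
Disallow: /

User-agent: *
Disallow: /private/
"""
```

With this sample, may_scrape(SAMPLE_ROBOTS, "http://example.com/news.html")
is refused for the named spider, while a differently named robot would
only be refused under /private/.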

I suppose this all helps us now, as we get many add/modify requests
directly from the publishers.

OK - that has to be sufficient for now.
The subject has been discussed here several times - otherwise
there's an archive for details ;)

Pino

