Re: [syndication] Re: News Aggregator Software
> All true, except that I took his situation to be that he already had
> a feed but wanted the *articles* in plain-text. Surely nobody is
> going to provide the entire body of the article in the feed.
Some sites do just that. They have a headlines feed and a content feed. It's
entirely up to the site why it provides content. Some folks want to
drive traffic to a web site. They often have very good reasons for doing this.
Keeping free content available often depends on web page views for statistics
and revenues. Thus having an RSS feed of headlines only is a way to preserve
their /required/ methodology. I'm all for this. It makes them have to put
something compelling in their RSS feed to ever get me to come visit their web
site. To pump out the full content into a feed would not be in their interest.
And to have someone scrape this specifically for that purpose is ASKING for
trouble. While it may be freely accessible via a browser, someone had to pay to
get it there. Cut off that revenue stream and there won't be content to scrape,
let alone headlines. There's a fine line of diplomacy here. One side wants to
take the stance "I'll do damn well whatever the hell I want with your content
and you can't stop me". The other side wants to enslave you with an endless
series of pop-up ads and other dreck. I'm hoping to encourage people to find
the middle ground.
A point to consider: I pull feeds on my laptop through a caching proxy (squid) I
keep running on it. I have altered my aggregator to push the item URLs to the
proxy so that it pulls and caches the web pages. That way I can surf to most
of the pages and see the content. Alternatively, on Windows you can simply push
the URLs to the Synchronization Manager and let it do the dirty work. It's
worked pretty well in most cases. I have hacked in a bit of "always download X
levels deep" for many sites I know I'm going to need.
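
For anyone who wants to try the proxy approach, here is a rough sketch in
Python. It is not the actual aggregator change, only an illustration under
assumptions: the function names are made up, the feed is taken to be plain
RSS 0.9x/2.0 (no namespaces), and the proxy is assumed to listen on
127.0.0.1:3128, squid's default port. Only http URLs are routed through the
proxy here, since https traffic is normally tunneled rather than cached. The
"X levels deep" crawling is left out.

```python
import urllib.request
import xml.etree.ElementTree as ET


def extract_links(rss_xml):
    """Return the <link> URL of every <item> in an RSS 0.9x/2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links


def warm_cache(urls, proxy="http://127.0.0.1:3128", timeout=30):
    """Request each URL through the proxy so the proxy can cache the page.

    Returns a dict mapping each URL to its HTTP status or to the error text.
    The proxy address is an assumption; adjust it to your own setup.
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )
    results = {}
    for url in urls:
        try:
            with opener.open(url, timeout=timeout) as response:
                response.read()  # drain the body so the whole page is fetched
                results[url] = response.status
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            results[url] = str(exc)
    return results
```

Typical use would be warm_cache(extract_links(feed_text)) right after the
aggregator fetches a feed. Whether a page actually ends up cached still
depends on the proxy configuration and on the cache headers the site sends.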
> Maybe Josh can clarify...?
Indeed, a URL of the web page in question would be most helpful.
-Bill Kearney