
Re: How to scrape?



Alis Marsden <alis@purplepages.ie> wrote:

> I'm guessing I'd do it by spidering the pages somehow but there really
> doesn't seem to be much information about how it could be done on the web.

The way I do it is to load the text of the page into memory, then use
regular expressions to extract the relevant information. Then I just spit
it out as RSS.

If you'd like some Tcl code to do this, I can send you some.
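
In the meantime, here is a rough sketch of the same three steps (load the
page, pull items out with a regular expression, write RSS). It is in Python
rather than Tcl and is only an illustration, not the code offered above. The
function names, the link pattern, and the assumption that each item is a
plain <a href="...">text</a> link are all made up for the example; a real
page will need its own pattern.

```python
import re
import urllib.request
from html import unescape
from xml.sax.saxutils import escape

# Hypothetical pattern: assumes each item is a simple <a href="...">text</a>.
LINK_RE = re.compile(r'<a\s+href="([^"]+)"[^>]*>([^<]+)</a>', re.IGNORECASE)


def fetch(url):
    """Load the whole page into memory as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_items(page):
    """Return (link, title) pairs matched by the regular expression."""
    return [(unescape(href), unescape(text).strip())
            for href, text in LINK_RE.findall(page)]


def to_rss(items, title, link):
    """Build a bare-bones RSS 2.0 document from the extracted items."""
    parts = ['<?xml version="1.0"?>',
             '<rss version="2.0">',
             '<channel>',
             '<title>%s</title>' % escape(title),
             '<link>%s</link>' % escape(link),
             '<description>%s</description>' % escape(title)]
    for href, text in items:
        parts.append('<item><title>%s</title><link>%s</link></item>'
                     % (escape(text), escape(href)))
    parts += ['</channel>', '</rss>']
    return "\n".join(parts)
```

To use it, something like print(to_rss(extract_items(fetch(url)), "My feed",
url)) would do, where url is whatever page you want to scrape. Regular
expressions are brittle against markup changes, so expect to adjust the
pattern whenever the site is redesigned.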

-- 
[ Aaron Swartz | me@aaronsw.com | http://www.aaronsw.com ]