Translate unstructured documents into XML RSS format



Hi all,

Though I'm pretty new to this list, I don't think this issue has 
already been discussed... 

I would like to know if anybody has already worked on a bot that 
could grab unstructured documents and translate them into RSS format.

It should work as follows:
Someone fills a queue with several URLs. The bot then browses 
those web pages like a spider, recognises documents, grabs them and 
saves them into a database using the different fields required by the 
RSS format. Then you only have to export your data in RSS format.
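
To make the workflow above more concrete, here is a rough sketch in 
Python. It is only an illustration under several assumptions of mine: 
the names (extract_item, run_bot, export_rss), the table layout, the 
choice of RSS 2.0, and the naive rule "title = <title>, description = 
first <p>" are all hypothetical. The fetch function is passed in so 
the sketch can be tried without network access.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import deque
from html.parser import HTMLParser


class TitleAndTextParser(HTMLParser):
    """Collects the <title> text and the text of the first <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._current = None   # tag whose text is being collected
        self._seen_p = False   # True once the first <p> has been closed

    def handle_starttag(self, tag, attrs):
        if tag == "title" or (tag == "p" and not self._seen_p):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            if tag == "p":
                self._seen_p = True
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.description += data


def extract_item(url, html):
    """Map one page onto the basic RSS item fields (naive assumption)."""
    parser = TitleAndTextParser()
    parser.feed(html)
    return {"title": parser.title.strip(),
            "link": url,
            "description": parser.description.strip()}


def run_bot(queue, fetch, db):
    """Empty the URL queue, storing one database row per page."""
    db.execute("CREATE TABLE IF NOT EXISTS items "
               "(title TEXT, link TEXT, description TEXT)")
    while queue:
        url = queue.popleft()
        item = extract_item(url, fetch(url))
        db.execute("INSERT INTO items VALUES (:title, :link, :description)",
                   item)
    db.commit()


def export_rss(db, channel_title, channel_link):
    """Export the stored rows as an RSS 2.0 document (assumed version)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Items collected by the bot"
    rows = db.execute("SELECT title, link, description FROM items")
    for title, link, description in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = description
    return ET.tostring(rss, encoding="unicode")
```

For a quick try, fetch can be a dictionary lookup over canned pages, 
e.g. run_bot(deque(pages), pages.get, sqlite3.connect(":memory:")); a 
real bot would replace it with an HTTP client and follow links.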

Each time you need to grab documents from a new URL, the bot 
has to be taught to recognise the document structure, based on that 
website's page layout and design.
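
One possible way to "teach" the bot a site, sketched in Python, is a 
small table of per-site rules saying which element holds which RSS 
field. Everything here is an assumption for illustration: the host 
names, the SITE_RULES table, the idea that fields can be located by 
tag name plus class attribute, and the helper extract_fields are all 
hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical per-site rules: RSS field -> (tag, class) that marks it.
SITE_RULES = {
    "news.example.com": {"title": ("h1", "headline"),
                         "description": ("div", "summary")},
    "blog.example.org": {"title": ("h2", "post-title"),
                         "description": ("p", "intro")},
}


class RuleParser(HTMLParser):
    """Collects the text of the first element matching each rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.fields = {name: "" for name in rules}
        self._field = None  # field currently being collected
        self._depth = 0     # nesting depth of that field's tag

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            # Track nested elements with the same tag name so the
            # matching end tag is found correctly.
            if tag == self.rules[self._field][0]:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for name, (rule_tag, rule_class) in self.rules.items():
            if (tag == rule_tag and rule_class in classes
                    and not self.fields[name]):
                self._field, self._depth = name, 1
                break

    def handle_endtag(self, tag):
        if self._field is not None and tag == self.rules[self._field][0]:
            self._depth -= 1
            if self._depth == 0:
                self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.fields[self._field] += data


def extract_fields(url, html):
    """Apply the rules taught for the URL's host to one page."""
    host = urlparse(url).hostname
    rules = SITE_RULES.get(host)
    if rules is None:
        raise KeyError(f"no extraction rules taught for {host}")
    parser = RuleParser(rules)
    parser.feed(html)
    return {name: text.strip() for name, text in parser.fields.items()}
```

With this design, supporting a new website means adding one entry to 
the rules table rather than changing the bot itself. Sites whose 
markup carries no usable tags or classes would need another kind of 
rule.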

I think Ondisplay has such a bot... has anyone already read about 
anything like this?

Thanks a lot 
Ben.