Guaranteed unique?
We are working on a syndicated news site (not the one I posted about
before), and we have a problem with guaranteeing the uniqueness of the
information we pull.
It goes like this:
We pull an XML file that is not generated dynamically; it is placed
statically on a server and is updated once every two hours.
A cron job on our site pulls it every hour (it also pulls from
30 other sources, and we can't change the schedule).
The database on our site is already under heavy load, and we don't
want to run a query to find out if we already have a text somewhere
with the same guid, so we don't want to use the guid.
Now, if we say "take only news from the last hour" - we will throw
away half of their items.
If we say "take only the news from the last two hours" - then, since
our script runs twice as often as theirs, we will get every item
twice...
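
To make the trade-off concrete, here is a rough Python sketch of the
two options. The numbers are assumptions for illustration only: one
item published every 30 minutes, their file rebuilt every 120 minutes
with the items from the preceding 120 minutes, and our cron pulling
every 60 minutes. The function name simulate is made up as well.

```python
from collections import Counter

# All timings are in minutes and are illustrative assumptions.
REBUILD_EVERY = 120  # their file is rebuilt every two hours
PULL_EVERY = 60      # our cron job pulls every hour
ITEM_EVERY = 30      # assumed: one new item every 30 minutes


def simulate(window, rebuilds=4):
    """Return (missed, duplicated) item counts for a given time window."""
    taken = Counter()
    last_rebuild = REBUILD_EVERY * rebuilds
    # Item publication times, offset by 15 to avoid boundary cases.
    items = range(15, last_rebuild, ITEM_EVERY)
    # Pull at every hour, from the first rebuild until one hour after the last.
    for pull in range(REBUILD_EVERY, last_rebuild + REBUILD_EVERY, PULL_EVERY):
        # The file we see is the one from the most recent rebuild.
        built = (pull // REBUILD_EVERY) * REBUILD_EVERY
        feed = [t for t in items if built - REBUILD_EVERY <= t < built]
        for t in feed:
            # The time-window filter, applied without any guid lookup.
            if pull - t <= window:
                taken[t] += 1
    missed = sum(1 for t in items if taken[t] == 0)
    duplicated = sum(1 for t in items if taken[t] > 1)
    return missed, duplicated


print(simulate(60))   # "last hour" window
print(simulate(120))  # "last two hours" window
```

Under these assumptions the one-hour window never sees half of the
items, and the two-hour window never misses an item but takes some of
them on two consecutive pulls. How many are taken twice depends on
how the items' timestamps are spread across the two hours.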
Any ideas what we can do?