Re: [syndication] RSS aggregators for non-technical folk
> How long do you keep it for and how big is your data-store now?
The store is over 2 GB. I've not bothered to break down what is consuming the
space. It covers a range of about 2 years of RSS and 6 years of mail.
I can tell you that when using Radio it was *hideously* bad at handling anything
more than a few months' worth of data.
> I keep questioning it, but I'm still convinced that an RSS aggregator
> really needs a database.
I disagree. The trouble is the old George Carlin joke, "you can't have
everything, where would you put it?" What I am finding is that by having a
database of the items, even the ones I don't bother reposting or bookmarking, I
can build inferences. Were they not in a database, it would be nearly
impossible to query them this way.
> Even though I use MySQL, it doesn't have to be
> relational, that's just the way I think.
I'm not taking the relational approach. I'm brute-forcing queries.
> This tends to work against
> client side aggregators a bit as requiring a real database to run does
> restrict things a bit. But for server side it's trivial as any apache
> system is very likely to already have mysql along with php, perl and
> python. And any MS ASP system is likely to have access to a SQL Server.
The RDF people are on to something when they talk about using triples to store
this stuff. It does require more queries to "get" to the stuff, but given the
obscene amounts of CPU available these days that doesn't seem like a
problem. The trick is in building use patterns that help the system build the
indices it needs to give you the instantaneous results. Even now I'm only
halfway to understanding my own usage patterns well enough to build the needed
queries.
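As a rough illustration of the triple idea (not anyone's actual aggregator
code; TripleStore, the item/feed identifiers and the predicate names are all
made up for the example), here is a minimal in-memory sketch in Python. It
keeps one index per position and shows why getting to the data takes more
than one query:

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory (subject, predicate, object) store. Hypothetical sketch."""

    def __init__(self):
        self.triples = set()
        # One index per position so a lookup does not scan every triple.
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self.triples.add(triple)
        self.by_subject[s].add(triple)
        self.by_predicate[p].add(triple)
        self.by_object[o].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        lookups = ((self.by_subject, s), (self.by_predicate, p), (self.by_object, o))
        candidates = [index.get(key, set()) for index, key in lookups if key is not None]
        if not candidates:
            return set(self.triples)
        return set.intersection(*candidates)


store = TripleStore()
store.add("item:1", "title", "RSS aggregators for non-technical folk")
store.add("item:1", "source", "feed:a")
store.add("item:2", "source", "feed:a")

# Two hops: first find the items from a feed, then fetch each item's title.
items = {s for s, _, _ in store.match(p="source", o="feed:a")}
titles = {o for s in items for _, _, o in store.match(s=s, p="title")}
```

A relational table would answer the same question with one join. Here it
takes one query per hop, which is the extra work being traded against
cheap CPU.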
What does help, and this is where the ODB in Radio is /almost/ right, is being
able to store structured blobs of the data. When using the ill-fated Newton we
had a construct known as a frame. This is similar to what Radio calls a table.
The Newton's ability to introduce inheritance was /far/ superior, however. But
much like the Newton, Radio's handling of very large tables introduces
absolutely unacceptable performance problems. The object store in Exchange does
let me do /some/ of this but it's still not quite right.
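To make the frame-with-inheritance idea concrete, here is a small Python
sketch. It is only loosely modeled on prototype-style inheritance and is not
the Newton's or Radio's actual API; the Frame class and the slot names are
assumptions for illustration. A frame is a bag of named slots, and a lookup
that misses falls through to the frame's prototype:

```python
class Frame:
    """A bag of named slots with an optional prototype. Hypothetical sketch."""

    def __init__(self, proto=None, **slots):
        self.proto = proto
        self.slots = dict(slots)

    def get(self, name):
        # Walk up the prototype chain until some frame defines the slot.
        frame = self
        while frame is not None:
            if name in frame.slots:
                return frame.slots[name]
            frame = frame.proto
        raise KeyError(name)

    def set(self, name, value):
        # Writes always land on this frame, never on the prototype.
        self.slots[name] = value


# Defaults shared by every stored item live in one prototype frame.
item_defaults = Frame(kind="rss-item", read=False)
item = Frame(proto=item_defaults, title="RSS aggregators for non-technical folk")

inherited_kind = item.get("kind")       # found on the prototype
item.set("read", True)                  # overrides locally
still_default = item_defaults.get("read")
```

Each stored item carries only the slots that differ from the shared
defaults, which is part of what makes the structured-blob approach
attractive for a large store.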
Bolt RDF triples onto something that presents data in constructs like cursors
and frames and you've REALLY got something cooking.
-Bill Kearney