Re: [syndication] Automatically Transforming Blog or HTML Content into XML



> > i'd argue that a link-crawler is totally the wrong approach to getting
> > accurate stats on a mega-site like blogger - it works much better to
> > get a manual dump of the blogs hosted there that are updated, as
> > weblo.gs does....
>
> I'd argue a combination of the two is an even better idea.  It's one
> thing to see that a site exists.  It's trivial to create a site and
> participate in a listing site like blo.gs or syndic8.  What's not
> trivial is having that same site appear 'associated' with other sites
> already in the index.  The combination of link crawling with site
> cross-linking is probably the best bet.

absolutely true.  :)  our paper is very clear about the methodology we're
using - the statistical numbers have to be interpreted in light of the
methods that produced them.
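
to make the crawl-plus-listing combination quoted above concrete, here's
a minimal sketch (hypothetical names and toy data throughout - this is
not the methodology from our paper): a site only counts as corroborated
if it both appears in a listing like blo.gs *and* is linked from at
least one site already in the index.

    # hypothetical sketch: corroborate listing entries against a crawled
    # link graph.  toy data; not anyone's production code.
    listed_sites = {"alice.example.com", "bob.example.com",
                    "spam.example.com"}

    # inbound links seen by the crawler: target -> set of linking sites
    inbound = {
        "alice.example.com": {"bob.example.com", "carol.example.com"},
        "bob.example.com": {"alice.example.com"},
        # spam.example.com was submitted to the listing, but nobody
        # links to it
    }

    def corroborated(site):
        """listed *and* linked from elsewhere in the index."""
        return site in listed_sites and bool(inbound.get(site))

    for site in sorted(listed_sites):
        print(site, "->", "keep" if corroborated(site) else "unverified")

running that keeps alice and bob but flags spam.example.com as
unverified, which is exactly the spam-resistance the cross-linking idea
buys you.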

i'd hazard a guess that the # of blogger blogs is substantially more than
57% of what's out there, but maybe not the 80% we're getting as a result.
again, it depends entirely on what you call a blog, and that's a shifty
question in itself.  the reified answers that most people are willing to
provide will not suffice for research purposes.  :)


> As we build out larger lists of feeds, we stand a better chance of
> cross-referencing them.  Likewise, as more content comes online, users
> will demand more effective ways to refine the list of what's presented
> to them.  But at this stage of the game there's frankly TOO LITTLE
> content online to start thinking about using exclusionary filters.

i'm very interested in ways that this piece of the game might develop.

it is pretty interesting that you say there's "too little content";
what're your criteria for "enough content" to start filtering in earnest?
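
for concreteness, here's the kind of exclusionary filter i take you to
mean - a minimal sketch with made-up feeds and keywords, not anybody's
actual implementation:

    # hypothetical sketch: drop feeds whose titles contain any blocked
    # keyword.  with a small corpus, even this crude rule can throw
    # away a big share of what little content exists.
    feeds = [
        {"title": "daily cat photos",
         "url": "http://cats.example.com/rss"},
        {"title": "python programming notes",
         "url": "http://py.example.com/rss"},
        {"title": "cat toys review",
         "url": "http://toys.example.com/rss"},
    ]
    blocked = {"cat"}

    def keep(feed):
        words = set(feed["title"].lower().split())
        return not (blocked & words)

    visible = [f for f in feeds if keep(f)]
    # drops 2 of the 3 feeds - the cost you're pointing at while
    # content is still scarce
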

elijah