Re: [syndication] Automatically Transforming Blog or HTML Content into XML



> > i'd argue that a link-crawler is totally the wrong approach to getting
> > accurate stats on a mega-site like blogger - it works much better to
> > get a manual dump of the blogs hosted there that are updated, as
> > weblo.gs does....
>
> I'd argue a combination of the two is an even better idea.  It's one
> thing to see that a site exists.  It's trivial to create a site and
> participate in a listing site like blo.gs or syndic8.  What's not
> trivial is having that same site appear 'associated' with other sites
> already in the index.  The combination of link crawling with site
> cross-linking is probably the best bet.

absolutely true.  :)  our paper is very clear about the methodology we're
using - the statistical numbers have to be interpreted in light of the
methods that produced them.
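
to make the crawl-plus-listing combination quoted above concrete, here's
a minimal sketch (hypothetical names and toy data throughout - this is
not the methodology from our paper): a site only counts as corroborated
if it both appears in a listing like blo.gs *and* is linked from at
least one site already in the index.

    # hypothetical sketch: corroborate listing entries against a crawled
    # link graph.  toy data; not anyone's production code.
    listed_sites = {"alice.example.com", "bob.example.com",
                    "spam.example.com"}

    # inbound links seen by the crawler: target -> set of linking sites
    inbound = {
        "alice.example.com": {"bob.example.com", "carol.example.com"},
        "bob.example.com": {"alice.example.com"},
        # spam.example.com was submitted to the listing, but nobody
        # links to it
    }

    def corroborated(site):
        """listed *and* linked from elsewhere in the index."""
        return site in listed_sites and bool(inbound.get(site))

    for site in sorted(listed_sites):
        print(site, "->", "keep" if corroborated(site) else "unverified")

running that keeps alice and bob but flags spam.example.com as
unverified, which is exactly the spam-resistance the cross-linking idea
buys you.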

i'd hazard a guess that the # of blogger blogs is substantially more than
57% of what's out there, but maybe not the 80% we're getting as a result.
again, it depends entirely on what you call a blog, and that's a shifty
question in itself.  the reified answers that most people are willing to
provide will not suffice for research purposes.  :)


> As we build out larger lists of feeds, we stand a better chance of
> cross-referencing them.  Likewise, as more content comes online, users
> will demand more effective ways to refine the list of what's presented
> to them.  But at this stage of the game there's frankly TOO LITTLE
> content online to start thinking about using exclusionary filters.

i'm very interested in ways that this piece of the game might develop.

it is pretty interesting that you say there's "too little content";
what're your criteria for "enough content" to start filtering in earnest?
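
for concreteness, here's the kind of exclusionary filter i take you to
mean - a minimal sketch with made-up feeds and keywords, not anybody's
actual implementation:

    # hypothetical sketch: drop feeds whose titles contain any blocked
    # keyword.  with a small corpus, even this crude rule can throw
    # away a big share of what little content exists.
    feeds = [
        {"title": "daily cat photos",
         "url": "http://cats.example.com/rss"},
        {"title": "python programming notes",
         "url": "http://py.example.com/rss"},
        {"title": "cat toys review",
         "url": "http://toys.example.com/rss"},
    ]
    blocked = {"cat"}

    def keep(feed):
        words = set(feed["title"].lower().split())
        return not (blocked & words)

    visible = [f for f in feeds if keep(f)]
    # drops 2 of the 3 feeds - the cost you're pointing at while
    # content is still scarce
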

elijah