One publishing problem keyword filtering might address
Re: yesterday's discussion about keywords, here's the actual
publishing problem that prompted my message to Carl about the
keywords issue and his subsequent post to the list...
Here's my particular scenario: Let's say I want to filter 100 feeds
using keywords. In my case, "keywords" isn't a meta thing -- I
actually mean "search terms." The 100 feeds, in my scenario, are
mostly online versions of print newspapers around the country. (We
can ignore for now whether 100 such outlets actually provide RSS
feeds.) In my case, I'm interested in environmental news and at any
given time I have enough of a sense of what MIGHT be written about to
set useful search terms like "superfund," "new source review," "clean
water act," "mountaintop mining," "mountaintop removal
mining," "asthma AND pollution," "acid rain," etc. Many such terms
would be required, but there's no reason why I couldn't cast a wide
enough net with, say, 150 search phrases. Of course, my net will
still miss some articles, and my net will occasionally grab stuff
that isn't relevant, but it's still a very useful net.
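As a sketch of what "casting the net" could mean in code: a minimal Python phrase matcher, where plain phrases match as substrings and a phrase like "asthma AND pollution" requires both terms. The phrase list and headline here are made-up examples, not a real feed.

```python
# Minimal sketch: test a headline against a list of search phrases.
# A phrase containing " AND " requires every term to appear;
# otherwise the whole phrase must appear as a substring.
def phrase_matches(phrase, text):
    text = text.lower()
    terms = [t.strip() for t in phrase.lower().split(" and ")]
    return all(t in text for t in terms)

phrases = ["superfund", "asthma AND pollution", "acid rain"]
headline = "Study links asthma rates to air pollution near plants"

# Which phrases caught this headline?
print([p for p in phrases if phrase_matches(p, headline)])
```

Scaling this to 150 phrases is just a longer list; the point is that the matching itself is cheap once the headlines are on your end.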
Ultimately what I want is 15 or so headlines on my web page that
refresh every 30-60 minutes based on automated searches of 100 (or
maybe 50, or 25) news sources. Basically, it's what you get from
Moreover.com or Yellowbrix, but I don't want to pay $6000/year for a
custom feed for my nonprofit.
I could also do this with Nexis -- almost. I could have Nexis email
me search results, but then I'd have to go and get URLs for those
articles every hour and update my site accordingly. An impossible
thing to do manually unless you do absolutely nothing else.
I'm not sure, from an electronic standpoint, what actually would be
happening to "filter" RSS feeds with my search terms. Would a spider
be visiting the sites supplying the feeds (which might be bad
etiquette -- that's a lot of spidering) or would those feeds be
sending stuff to something on my end, which could then be filtered
according to my search terms, thus avoiding any ethical dilemmas
about bandwidth? As a non-programmer, I don't know what actually
would be happening.
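For what it's worth, an RSS feed is just a small XML file that your software fetches periodically -- it's much closer to the second picture than to site-wide spidering. A rough sketch of the "filter on my end" idea, using Python's standard library and an invented sample feed (in practice you'd fetch each feed's URL every 30-60 minutes instead of using a hardcoded string):

```python
# Sketch of client-side RSS filtering: parse a feed's XML and keep
# only the items whose titles contain one of your search phrases.
# The feed below is a made-up example, not a real newspaper's feed.
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example Paper</title>
  <item>
    <title>Superfund cleanup delayed at local site</title>
    <link>http://example.com/a</link>
  </item>
  <item>
    <title>City council debates parking rates</title>
    <link>http://example.com/b</link>
  </item>
</channel></rss>"""

SEARCH_PHRASES = ["superfund", "clean water act", "acid rain"]

def matching_items(feed_xml, phrases):
    """Return (title, link) pairs for items matching any phrase."""
    root = ET.fromstring(feed_xml)
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title") or ""
        link = item.findtext("link") or ""
        if any(p.lower() in title.lower() for p in phrases):
            hits.append((title, link))
    return hits

for title, link in matching_items(SAMPLE_FEED, SEARCH_PHRASES):
    print(title, "->", link)
```

Since each fetch pulls only the feed file (a few kilobytes) rather than crawling the whole site, polling even 100 feeds every hour is a light load on the publishers.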
Another way to pose my question: with so many people developing
software that almost does what Moreover and YellowBrix do for
$6000/year, why hasn't anyone gone the extra yard to actually provide
something that can be downloaded for free or for $100 or so, that
does the same thing? I'm not the person to do it -- I'm too busy
being a website editor. And as a non-programmer, I don't know how
difficult it is to go that extra yard.
Such a program would not solve every problem faced by every person
who has ever asked about filtering RSS feeds using keywords -- but I
think automated searches of the type I'm describing can work,
provided you're familiar enough with the behavior of the publications
you're filtering and you're smart about use of search terms.
P.S. Since I wrote this, Carl tells me Nexis might have the
functionality I seek, maybe, but I'd have to make like a programmer
and find out how XML works... sheesh...
Ryan Walker
Website Editor
Environmental Media Services
www.ems.org
ryan@ems.org