
Re: [syndication] RSS vs. HTML Bandwidth and "Scalability"...



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Morbus Iff <morbus@disobey.com> writes:

<snip>
> He makes a bit of an interesting point when you think. Beside search engine 
> spiders and proxies, I can think of no magical programs that hit a website 
> time and time again to get updates, when there aren't updates to be had.

... I think a lot of XML content producers *are* the problem and are *not* the
victims.

HTTP's Last-Modified header (paired with If-Modified-Since requests) lets a client keep from
downloading something which hasn't changed.  The problem is that most people return their
content from a database, so their CGI framework (JSP, Servlet, PHP, etc.) always returns a
Last-Modified of 'now'.

Granted, it's tough to tell everyone to rewrite their content engines, but I don't think
they should complain so loudly.  ;)
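The server-side fix is small: stamp Last-Modified from the record's real modification time and answer If-Modified-Since with a 304. A minimal sketch in Python of the idea (the function and its arguments are hypothetical, not from any particular CGI engine):

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def respond(article_mtime, if_modified_since=None):
    """Decide the response for a feed request.

    article_mtime: aware UTC datetime of the newest record in the database
    if_modified_since: the If-Modified-Since header value, if the client sent one

    Returns (status, headers).  Instead of stamping Last-Modified with
    'now', we use the real modification time of the underlying data.
    """
    last_modified = format_datetime(article_mtime, usegmt=True)
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None
        # HTTP dates have one-second resolution; compare accordingly.
        if since is not None and article_mtime.replace(microsecond=0) <= since:
            return 304, {"Last-Modified": last_modified}  # no body re-sent
    return 200, {"Last-Modified": last_modified}
```

On the first fetch the client gets a 200 plus a meaningful Last-Modified; on every later poll where nothing changed, it gets a 304 and no feed body at all.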

> On the other hand, most "constant on" RSS aggregators hit websites every hour
> to get the latest updates. However, I'm not really sure how Meerkat or
> NewsIsFree.com handles it (or for that matter, xmltree.com).
> 
> So, what's the solution to this pointless waste of possible bandwidth?

P2P-style node-by-node caching, or using something like Swarmcast...
<snip>


>   b) Check the HTTP headers from the server. This would
>      only work if the content wasn't dynamic, which is
>      rare nowadays. For a while now, I've been thinking
>      of checking content-length's / filesizes and
>      comparing for newness.

The only problem is that for some dynamic content engines, the CPU time spent generating
your Content-Length is about the same as generating the content itself.  So you trade
wasting their bandwidth for wasting their CPU time :)
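Content-Length comparisons aside, the cheap move on the aggregator side is the conditional GET: remember the Last-Modified the server handed back and replay it as If-Modified-Since on the next poll. A rough sketch, assuming a server that actually honors 304s (the helper names and URL are illustrative only):

```python
import urllib.error
import urllib.request

def conditional_headers(last_modified=None):
    """Extra request headers for a conditional GET; empty on the first fetch."""
    return {"If-Modified-Since": last_modified} if last_modified else {}

def fetch_if_changed(url, last_modified=None):
    """Return (body, new_last_modified); body is None on a 304 Not Modified."""
    req = urllib.request.Request(url, headers=conditional_headers(last_modified))
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read(), resp.headers.get("Last-Modified", last_modified)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged: server sent headers but no feed body
            return None, last_modified
        raise

# Typical polling loop: carry last_modified forward between runs, e.g.
#   body, last_modified = fetch_if_changed("http://example.com/index.rss",
#                                          last_modified)
```

When the server does its part, every hourly poll after the first costs a few hundred bytes of headers instead of the whole feed.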

>   c) Implement server control - block repetitive ip's
>      on a cron'd schedule and allow them back in when
>      the going gets happy. This shifts the "blame"
>      onto the server people though, and we really shouldn't
>      be making RSS maintenance any harder than it is.

Ouch... don't do that...

> What are your thoughts? Any additions to the above?

node-by-node caching is my suggestion :)

Kevin

- -- 
Kevin A. Burton ( burton@apache.org, burton@openprivacy.org, burtonator@acm.org )
        Cell: 408-910-6145 URL: http://relativity.yi.org ICQ: 73488596 

Boycott Amazon.com http://www.gnu.org/philosophy/amazon.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (GNU/Linux)
Comment: Get my public key at: http://relativity.yi.org/pgpkey.txt

iD8DBQE7afd0AwM6xb2dfE0RAmukAKCKxEmVjg+Q+ZbaOzQAxYwIhOXZWwCeJ4nF
mbP9hAJ0rq5g9IqfVPPUKOo=
=PeLV
-----END PGP SIGNATURE-----