Mark Nottingham

Web Feeds in 2026: A Survey

Sunday, 10 May 2026

Web Feeds

A couple of weeks ago, I made a straw-man proposal for a new Web feed autodiscovery mechanism. I got some encouragement and some pushback (usual for this sort of thing). One of the issues raised concerned internationalisation – a few people said that I should support multiple languages in the format, rather than relying on HTTP content negotiation.

I felt my approach was adequate, given the specifics of the use case. However, it bugged me: I didn’t have data to back me up – AFAICT there’s no significant information about how feeds are used on the Web today.

I realised that it didn’t have to stay that way. Two things helped: a friend at Common Crawl, who assured me that they do indeed crawl feeds, and AI. I don’t have nearly enough time to learn the ins and outs of CC dumps, map/reduce, and the assorted data science bits, but I can babysit a couple of agents1 through it while I do other things. So I did.
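
If you’re curious what the plumbing looks like, here’s a minimal sketch of pulling feed-like responses out of a single Common Crawl WARC file. It is not the actual pipeline used for the survey; it assumes the warcio package and a segment that has already been fetched locally (the file name is hypothetical).

```python
# A minimal sketch (not the survey's actual pipeline) of pulling feed-like
# responses out of one Common Crawl WARC file, assuming the `warcio`
# package is installed and a segment has already been fetched locally.
from warcio.archiveiterator import ArchiveIterator

FEEDISH_TYPES = ("application/rss+xml", "application/atom+xml",
                 "application/xml", "text/xml")

def feed_responses(warc_path):
    """Yield (url, content_type) for responses whose Content-Type looks feed-ish."""
    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response" or record.http_headers is None:
                continue
            ctype = (record.http_headers.get_header("Content-Type") or "").lower()
            if any(t in ctype for t in FEEDISH_TYPES):
                yield record.rec_headers.get_header("WARC-Target-URI"), ctype

if __name__ == "__main__":
    # Hypothetical local file name; real segment paths come from the crawl index.
    for url, ctype in feed_responses("CC-MAIN-sample.warc.gz"):
        print(ctype, url)
```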

You can peruse the full (inaugural?) survey report. Below are the major takeaways from my perspective.

Web Feeds are Still a Thing…

In the top 500,000 web sites (per Tranco) seen by Common Crawl, the run analyzed 196,598 registrable sites and found 303,790 parseable feeds. 35.9% of sites exposed feed autodiscovery, and 19.7% of analyzed HTML responses had feed links.

That’s huge – more than a third of sites offering some sort of feed is a big statement about the nature of the Open Web.

… But a LOT of Them are Abandoned.

High-quality feeds are a minority. Using a quality metric that considers feed recency, content, and metadata, 57,995 feeds scored above 0.5 – only 19.1% of parsed feeds. Only 100,643 feeds (33.1% of parsed feeds) had any freshness signal within a 365-day cutoff, and only 67,997 were both fresh and had entries.
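
To make “a quality metric that considers recency, content, and metadata” a bit more concrete, here’s a hypothetical sketch of how those signals might fold into a 0–1 score. It is not the formula the survey actually uses; it just shows why a stale, empty feed lands well under a 0.5 bar.

```python
# A hypothetical 0-1 quality score combining recency, content, and metadata.
# This is NOT the survey's formula; it only illustrates the kind of signals
# such a metric might weigh.
from datetime import datetime, timezone

def feed_quality(last_updated, entry_count, has_title, has_description):
    now = datetime.now(timezone.utc)
    age_days = (now - last_updated).days if last_updated else 10_000
    recency = max(0.0, 1.0 - age_days / 365)        # 1.0 today, 0.0 after a year
    content = min(entry_count, 10) / 10             # saturates at ten entries
    metadata = (int(has_title) + int(has_description)) / 2
    return 0.5 * recency + 0.3 * content + 0.2 * metadata

# A feed with full metadata but no entries and no update in two years
# scores 0.2 here, nowhere near a 0.5 threshold.
print(feed_quality(datetime(2024, 1, 1, tzinfo=timezone.utc), 0, True, True))
```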

So there are a lot of abandoned feeds on the Web.

I suspect a major contributor is the tendency of content management systems to automatically expose feeds, even when publisher customisation means those feeds no longer reflect the useful parts of the site. For example, only 19.3% of feeds that we could fingerprint as being created by WordPress exceeded our quality bar (a measure of recency, content, and metadata). Drupal was a bit higher but still very low at 24.9%, and for Blogger, it was an abysmal 3.5%.

In other words, when you go to a site hosted by a popular CMS, chances are uncomfortably high that the feed it creates will be stale, empty, or otherwise not very useful. The takeaway here is simple but urgent: CMS software should not silently create and advertise feeds that publishers never see or maintain; feeds should be visible, testable, and consciously enabled. Fixing that in the next release of a couple of platforms could dramatically increase the quality of feeds on the Web in a short time.

Autodiscovery is Not a Quality Signal.

17.5% of feeds had HTML feed autodiscovery pointing at them, but those links don’t necessarily lead to higher-quality feeds. Although autodiscovered feeds showed a slight bump in measured quality, the mean was still low – 0.251, vs. 0.179 for feeds without autodiscovery.

This compromises autodiscovery as a user-facing affordance; if people experience lots of stale or zero-entry feeds when they use autodiscovery information (and that is my personal experience!), they won’t rely upon it.

That failure is why I made a proposal for a new feed autodiscovery mechanism, with a prototype implementation as an extension for major browsers. The reasoning is straightforward: if autodiscovery is more deliberate and in a central place on the site, it has a better chance of leading to working, useful feeds.

An aside: autodiscovery overwhelmingly means rel=alternate. Feed autodiscovery via rel=alternate appeared on 81.94M pages; rel=feed was tiny by comparison at 12,793 pages. WHATWG should deprecate the feed link relation; the cowpath is well and truly paved.
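
For anyone who hasn’t written one lately, this is what conventional rel=alternate autodiscovery looks like in a page’s head, along with a minimal, standard-library-only way a reader might extract it. It’s a sketch; production feed readers are far more forgiving.

```python
# A sketch of conventional rel=alternate autodiscovery: the markup a page
# carries in its <head>, and a minimal (standard-library-only) way a reader
# might extract it. Production feed readers are far more forgiving.
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if "alternate" in rels and (a.get("type") or "").lower() in FEED_TYPES:
            self.feeds.append((a.get("type"), a.get("href")))

page = """<html><head>
  <link rel="alternate" type="application/atom+xml"
        title="Example Feed" href="/feed.atom">
</head><body></body></html>"""

finder = FeedLinkFinder()
finder.feed(page)
print(finder.feeds)   # [('application/atom+xml', '/feed.atom')]
```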

Most Feeds Parse.

The run checked 311,382 feed URLs and parsed 303,790 RSS/Atom feeds, a 97.6% parse success rate. So broken XML exists, but outright parse failure is not the main quality problem.

This was a major concern in the early RSS and Atom days; XML was still new, it was complex,2 and implementations – both of XML parsers and feed software – weren’t quite baked. From what we see here, this doesn’t seem to be a concern any more.

The biggest problem – over 52% of the errors encountered – was ‘XML declaration allowed only at the start of the document’. Next at about 13% was ‘EntityRef: expecting ‘;’’ and then ‘CData section not finished’ at about 8%.
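
For illustration, here’s a quick local check for that most common failure – anything emitted before the XML declaration (a BOM, whitespace, stray template output) invalidates it. It’s a sketch against a locally saved feed file, not a substitute for a real validator.

```python
# A quick local check for the most common failure above: anything emitted
# before the XML declaration makes it invalid. A sketch only - it is no
# substitute for a real feed validator, which checks much more than
# well-formedness.
import sys
import xml.etree.ElementTree as ET

def check_feed(path):
    raw = open(path, "rb").read()
    body = raw.lstrip(b"\xef\xbb\xbf \t\r\n")
    if body.startswith(b"<?xml") and body != raw:
        print("warning: bytes before the XML declaration")
    try:
        ET.fromstring(raw)
        print("well-formed")
    except ET.ParseError as e:
        print("parse error:", e)

if __name__ == "__main__":
    check_feed(sys.argv[1])
```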

So, yes, sites should still use feed validators. As an ecosystem, however, we shouldn’t worry too much about this aspect of feed quality.

RSS and Atom Co-Exist.

Of 303,790 parsed feeds, about 200k were RSS-family and 104k were Atom. RSS 2.0 alone was 181,975 feeds. There are minimal quality differences between them. So the feed wars rage on more than twenty years later – or, alternatively, no one really cares.

The takeaway for sites is simple: Don’t over-fixate on choosing a format. Choose one – either will do – and don’t double up the feeds. Consuming software will support both.

One thing I specifically tried to check for was pages that advertised both Atom and RSS versions of the same feed. From what we can see here, that isn’t happening much; only seven sampled pages appeared to be doing this.

Feeds are Monolingual.

Finally, to the question that sparked this for me – 19.2% of feeds have HTTP Content-Language, while almost exactly 50% have feed-level language information (e.g., dc:language, xml:lang, or an RSS language tag). However, only 1.2% (3,571 feeds) have entry-level language information, and only 2,527 feeds (0.8% of parsed feeds) showed multiple entry languages.
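
For reference, here’s a deliberately simplistic, standard-library-only sketch of where those in-document signals live – <language> and dc:language in RSS channels, xml:lang attributes in Atom – while HTTP Content-Language travels on the response headers rather than in the document.

```python
# A sketch (standard library only, deliberately simplistic) of where the
# in-document language signals live: <language>/dc:language in RSS channels,
# xml:lang attributes in Atom. HTTP Content-Language, by contrast, is on the
# response headers, not in the document.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
DC = "{http://purl.org/dc/elements/1.1/}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def feed_languages(xml_bytes):
    root = ET.fromstring(xml_bytes)
    langs = set()
    if root.tag == ATOM + "feed":                     # Atom: xml:lang attributes
        for el in [root] + root.findall(ATOM + "entry"):
            if el.get(XML_LANG):
                langs.add(el.get(XML_LANG))
    else:                                             # RSS: <language> or dc:language
        channel = root.find("channel")
        if channel is not None:
            for el in channel.findall("language") + channel.findall(DC + "language"):
                if el.text:
                    langs.add(el.text.strip())
    return langs

rss = b"""<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example</title><language>en-au</language>
</channel></rss>"""
print(feed_languages(rss))   # {'en-au'}
```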

So, based upon the current Web, mixing languages in a feed isn’t a widespread practice.

A Few Words About Methodology

Like all statistics, these should be taken with a grain (or three) of salt. The code is AI-written (although guided by me, so I get to take the blame). I have not reviewed every line of code; this is a side project. Common Crawl doesn’t see the whole Web (although the open Web nature of feeds means it’s a good match!). I filtered to the Tranco top 500,000 sites to reduce the influence of very low-quality domains. The run completed 44,281 WARCs and skipped/failed 71, so the missing-WARC rate was about 0.16%. I’m sure there are many more caveats, but you get the idea. Take a look at the code and file an issue if you see something.

  1. First Gemini, then Claude, then Codex – in decidedly increasing order of effectiveness.

  2. Still is.