Why Do Web Server APIs Suck So Much?

Monday, 8 December 2003

HTTP provides considerable benefits to Web applications that take advantage of it: scalability (through caching), client-integrated authentication, automated redirection, multiple format support and lots more.

I’ve been drafting some entries about how cool all of these things are; I’m going to try to get a few up in the coming weeks. As I’ve been writing, however, I’ve noticed a common thread — just about every one of them is difficult to realise using existing server-side technology.

Web Metadata

For example, one of the biggest problems we found in the caching world was the inability of content authors to effectively set appropriate caching metadata for their resources. Most servers provide some mechanism, of course, but none of them is at once always available, usable by the average person and standardised beyond a single product.
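To make "caching metadata" concrete, here is a minimal sketch of the headers an author would want to attach to a response. The function name `caching_headers` and its parameters are made up for illustration; only `email.utils.formatdate` is a real standard-library call, and a real server would still need some way of letting authors invoke something like this per resource.

```python
from email.utils import formatdate

def caching_headers(max_age, now):
    """Return headers that let any cache keep a response for max_age seconds.

    `now` is the current time in seconds since the epoch (e.g. time.time()).
    Hypothetical helper, not part of any server's API.
    """
    return {
        # HTTP/1.1 caches read Cache-Control...
        "Cache-Control": "public, max-age=%d" % max_age,
        # ...while HTTP/1.0 caches only understand an absolute Expires date.
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

Calling `caching_headers(3600, time.time())` would mark a response as fresh for an hour. The hard part is not producing these two lines, but getting a server to let the content author say so.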

Folks writing RSS aggregators, to give another example, can’t rely on the media type being set for RSS files, because the process for associating a new media type with content is so Byzantine on most servers (if the person who needs to do it has access at all). As a result, they can’t rely on content negotiation working, and can’t fully leverage the Web infrastructure.
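As a rough illustration of how small the underlying operation is, this sketch uses Python's standard `mimetypes` module to associate a media type with a file extension. The `.rss` extension, the file name and the `application/rss+xml` label are assumptions chosen for the example, not something every server or aggregator agrees on.

```python
import mimetypes

# A private type table, so the example does not alter process-wide settings.
types = mimetypes.MimeTypes()

# Associate the (assumed) RSS media type with the .rss extension.
types.add_type("application/rss+xml", ".rss")

def media_type_for(filename):
    """Return the media type to send as Content-Type for a file, if known."""
    guessed, _encoding = types.guess_type(filename)
    return guessed

rss_type = media_type_for("index.rss")
```

One line of configuration is all it takes here. The complaint above is that most deployed servers make the equivalent step far harder, or put it out of reach of the person publishing the feed.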

The list goes on. Redirection is really simple, but I’d wager that 90% of intentional redirection on the Web happens through META refresh HTML elements, not HTTP redirection. The only clients that understand and follow them, then, are Web browsers, not the bulk of automated agents (where automated redirection does the most good).
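For comparison, an HTTP-level redirect is only a status code and a Location header. The sketch below uses `http.server`, the Python 3 name for BaseHTTPServer. The `MOVED` table, the example.org URIs and the `redirect_for` helper are hypothetical.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical table of retired paths and the URIs that replaced them.
MOVED = {"/old-feed": "http://example.org/feed"}

def redirect_for(path):
    """Return (status, headers) for a moved path, or None if it has not moved."""
    target = MOVED.get(path)
    if target is None:
        return None
    return 301, {"Location": target, "Content-Length": "0"}

class RedirectingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        redirect = redirect_for(self.path)
        if redirect is not None:
            # A real HTTP redirect: any client, not just a browser, can follow it.
            status, headers = redirect
            self.send_response(status)
            for name, value in headers.items():
                self.send_header(name, value)
            self.end_headers()
            return
        body = b"current content\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

An automated agent that requests `/old-feed` sees the 301 and the new URI without having to parse any HTML, which is exactly what a META refresh cannot offer.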

All of this adds up to people not being able to count on the availability of mechanisms to set Web metadata, and therefore a failure to use what the Web provides. To see this, take a look at Web applications like Wikis, Blog engines and commercial packages that you deploy on a Web server (I don’t want to pick on anyone particular here, because everybody’s in the same boat, and it’s not their fault).

URIs

The problem isn’t limited to setting metadata, either. URIs are the lynchpin of the Web; to get the full value of the Web infrastructure, you need to be able to identify every interesting part of your Web application with a URI. Unfortunately, common Web APIs don’t encourage this, and some actively discourage it.

For example, one of the most prevalent server-side APIs for HTTP (and therefore REST, for most people), the Java Servlet API, does things backwards. It dispatches requests first on the HTTP method, and then leaves the application to handle the URI. A Python BaseHTTPServer handler (which has roughly the same API) for an imaginary address book might look like this:

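import BaseHTTPServer  # Python 2 name; the module is http.server in Python 3

class ResourceHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def __init__(self, *args):
        # application-specific setup (elided) would go here
        BaseHTTPServer.BaseHTTPRequestHandler.__init__(self, *args)
    def do_GET(self):
        # every GET arrives here, whatever the URI, so the handler sorts them out
        if self.path == '/':
            pass  # return the home page
        elif self.path == '/add':
            pass  # return a form to add an entry
        elif self.path == '/search':
            pass  # return a search form
        else:
            pass  # return an entry's page
    def do_POST(self):
        pass  # ...and the same switch on self.path again for POST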

This stuffs a number of URIs (and therefore resources) into a single container, making it difficult to model an application as the transfer of state. Now imagine doing it the other way around: dispatching first on the URI to an object, and then on the HTTP method to one of that object’s methods:

class FrontPage(Resource):
    def GET(self, request):
        ...
class AddForm(Resource):
    def GET(self, request):
        ...
    def POST(self, request):
        ...
class SearchForm(Resource):
    def GET(self, request):
        ...
class EntryPage(Resource):
    def GET(self, request):
        ...
    def DELETE(self, request):
        ...
    def PUT(self, request):
        ...

Isn’t that a much more natural way of writing a Web application, leveraging both the Web infrastructure and the good practices surrounding object-oriented programming? Even better, it might just steer people away from creating Web sites where everything interesting is stuffed behind a single URI with a bunch of query parameters and a POST.
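The dispatch step behind such an API can be very small. What follows is a sketch under stated assumptions, not a description of any existing framework: the `Resource` base class, the `ROUTES` table, the `dispatch` function and the `(status, body)` return convention are all invented for illustration.

```python
class Resource:
    """Hypothetical base class: one object per URI, one method per HTTP method."""
    METHODS = ("GET", "POST", "PUT", "DELETE")

    def handle(self, method, request):
        handler = getattr(self, method, None) if method in self.METHODS else None
        if handler is None:
            # The resource exists but does not support this method.
            return 405, "method not allowed"
        return handler(request)

class FrontPage(Resource):
    def GET(self, request):
        return 200, "home page"

class EntryPage(Resource):
    def GET(self, request):
        return 200, "entry page for " + request["path"]

    def DELETE(self, request):
        return 204, ""

# URI-first routing table; any path not listed is treated as an entry's page.
ROUTES = {"/": FrontPage()}
DEFAULT = EntryPage()

def dispatch(method, path):
    resource = ROUTES.get(path, DEFAULT)             # step 1: the URI picks the object
    return resource.handle(method, {"path": path})   # step 2: the method picks the function
```

With this shape, `dispatch("DELETE", "/smith")` reaches `EntryPage.DELETE`, while `dispatch("POST", "/")` yields a 405 without the front page author writing any code for it. Knowing which methods a resource supports falls out of the class definition rather than a chain of path comparisons.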

What Next

I do have ideas about how to fix this. I’ve started talking about some of it (e.g., Tarawa, which takes the approach to URIs outlined above), and will go into the rest over time, but more importantly I want to highlight the issues.

Mark Nottingham
