Tuesday, 25 October 2011
Web API Versioning Smackdown
A lot of bits have been used over on the OpenStack list recently about versioning the HTTP APIs they provide.
This over-long and rambling post summarises my current thoughts on the topic, both as background for that discussion, as well as for review in the wider community.
The Warm-up: Software vs. Web Versioning
Developers are used to software versioning; e.g., for every release, you bump an identifier. There are usually major versions, minor versions, and sometimes things like package identifiers.
This fine level of granularity is useful to both developers and users; each of these things has precise semantics that helps in figuring out compatibility and debugging.
For example, on my Fedora box, I can do:
cloud:~> yum -q list installed httpd
httpd.x86_64 2.2.17-1.fc14 @updates
… and I’ll know that Apache httpd version 2.2.17 is installed, and it’s the first package of that version for Fedora 14.
This lets me know that any modules I want to use with the server will need to work with Apache 2.2; and, that if there are security bugs found in httpd 2.2.15, I’m safe. Furthermore, when I install software that depends upon Apache, it can specify a specific version — and even packaging — to require, so that if it wants to avoid specific bugs, or require specific features, it can.
These are good and useful things to use software versioning for; it’s evolved into best practice that’s pretty well-understood. See, for example, Fedora’s package versioning guidelines.
However, they don’t directly apply to versioning on the Web. While there are
similar use cases — e.g., maintaining compatibility, enabling debugging,
dependency control — the mechanisms are completely different.
For example, if you throw such a version identifier into your URI, like this:
then every time you make a minor change to your software, you’ll be minting
an entire new set of resources on the Web;
Moreover, you’ll need to still support the old ones for old clients, so you’ll have a massive footprint of URIs to support. Now consider what this does to
caches in the middle; they have to maintain duplicates of the same thing — because it’s unlikely that foo has changed, but it can’t be sure — and your cache hit rate goes down.
Likewise, anybody holding onto a link from the previous version of the API has to decide what to do with it going forward; while they can guess that there’ll be compatibility between the two versions, they can’t really be sure, and they’ll still need to be rewriting a bunch of APIs.
In other words, just sticking software versions into Web URL removes a lot of the value we get from using HTTP, and if you do this, you might as well be using a ‘dumb’ RPC protocol.
So what does work, on the Web?
The answer is that there is no one answer; there are lots of different mechanisms in HTTP to meet the goals that people have for versioning.
However, there is an underlying principle to almost any kind of of versioning on the Web; not breaking existing clients.
The reasoning is simple; once you publish a Web API, people are going to start writing software that relies upon it, and every time you introduce a change, you introduce the potential to break them. That means that changes have to happen in predictable and well-understood ways.
For example, if you start using the Foo HTTP header, you can’t change its semantics or syntax afterwards. Even fixing bugs in how it works can be tricky, because clients will start to work around the bugs, and when you change things, you break the workarounds.
In other words, good mechanisms are extensible, so that you can introduce change without wiping the slate clean, and it means that any change that doesn’t fit into an extension needs to use a new identifier, so it doesn’t confuse clients expecting the old behaviour.
So, if you want to change the semantics of that Foo header, you can either take advantage of extensibility (if it allows it; see the Cache-Control headers extensibility policy for a great example), or you have to introduce another header, e.g., Foo2.
This approach extends to lots of other things, whether they be media types, URI parameters, and potentially URIs themselves (see below).
Because of this, versioning is something that should not take place often, because every time you change a version identifier, you’re potentially orphaning clients who “speak” that language.
The fundamental principle is that you can’t break existing clients, because you don’t know what they implement, and you don’t control them. In doing so, you need to turn a backwards-incompatible change into a compatible one.
This implies that API versioning absolutely cannot be tied to software versioning in any way; doing so will needlessly limit (and often break) your clients, and generally upset people.
There’s an interesting effect to observe here, by the way; this approach to versioning is inherently non-linear. In other words, every time you mint a new identifier, you’re minting a fundamentally new thing, whether it be a HTTP header, a format identified by a media type, or a URI. you might as well use “foo” and “bar” as “v1” and “v2”. In some ways, that’s preferred, because people read so much into numbers (especially when there are decimal points involved).
The tricky part, as we’ll see in a bit, is what identifiers you nominate to pivot interoperability around.
An Aside: Debugging with Product Tokens
So, if you don’t put minor version information into URIs, media types and other identifiers, how do you debug when you have an implementation-specific problem? How do you track these minor changes?
HTTP’s answer to this is product tokens. The appear in things like the User-Agent, Server and Via headers, and allow software to identify itself, without surfacing minor versioning and packaging information into the protocols “core” identifiers (whether it’s a URI, a media type, a HTTP header, or whatever).
These sorts of versions are free — or even encouraged, delta the security considerations — to contain fine-grained identifiers for what version, package, etc. of software is running. It’s what they’re for.
The Main Event: Resource Versioning
All of that said, the question remains of how to manage change in your Web application’s interface. These changes can be divided into two rough categories; representation format changes and resource changes.
Representation format changes have been covered fairly well by others (e.g., Dave), and they’re both simple and maddeningly complex. In a nutshell, don’t make backwards-incompatible changes, and if you do, change the media type.
JSON makes this easier than XML, because it has both a simpler metamodel, as well as a default mustIgnore rule.
Resource changes are what I’m more interested in here. This is doing things like adding new methods, changing the URIs that clients use (including query parameters and their semantics), and so forth.
Again, many (if not most) changes to resources can be accommodated by turning them into backwards-compatible changes. For example, rather than bumping a version when you want to modify how a resource handles query parameters, you mint a new, sibling resource with a different name that takes the alternate query parameters.
However, there comes a time when you need to “wipe the slate clean.” Perhaps it’s because your API has become overburdened with such add-on resources, or you’ve got some new insights into your problem that benefit from a fresh sheet. Then, it’s time to introduce a new API version (which again, shouldn’t happen often). The question is, “how?”
In this Corner: URI Versioning
The most widely accepted way to do version resources of Web APIs currently is in the URI. A typical example might be:
Here, first path segment is a major version identifier, and when it changes, everything under it does as well. Therefore, the client needs to decide what version of the API it wants to interact with; there isn’t any correlation between URIs between v1 and v2, for example.
So, even if you have:
There isn’t necessarily any correlation between the two URIs. This is important, because it gives you that clean slate; if there were correlation between v1 and v2 URIs, you’d be tying your hands in terms of what you could do in v2 (and beyond).
You can see evidence of this in lots of popular Web APIs out there; e.g., Twitter and Yahoo.
However, it’s not necessary to have that version number in there. Consider Facebook; their so-called old REST API has been deprecated in favour of their new Graph API. Neither has “v1” or “v2” in them; rather, they just use the hostname to name space the different interfaces (“api.facebook.com” vs. “graph.facebook.com”). Old clients are still supported, and new clients can get new functionality; they just called their new version something less boring than “v2”.
Fundamentally, this is how the Web works, and there’s nothing wrong with this approach, whether you use “v1” and “v2” or “foo” and “bar” — although I think there’s less confusion inherent in the latter approach.
The Contender: HATEOS
However, there is one lingering concern that gets tied up into this; people assume — very reasonably — that when you document a set of URIs and ship them as a version of an interface, clients can count on those URIs being useful.
This violates a core REST principle called “Hypertext As The Engine of Application State”, or HATEOS for short.
RESTafarians have long searched for signs of HATEOS in Web APIs, and Roy has lamented its absence in the majority of them.
Tying your clients into a pre-set understanding of URIs tightly couples the client implementation to the server; in practice, this makes your interface fragile, because any change can inadvertently break things, and people tend to like to change URIs over time.
In a HATEOS approach to an API, you’d define everything in terms of media types (what formats your accept and produce) and link relations (how the resources producing those representations are related).
This means that your first interaction with an interface might look like this:
GET / HTTP/1.1
HTTP/1.1 200 OK
Please don’t read too much into this representation; it’s just a sketch. The important thing is that the client uses information from the server to dynamically generate URIs at runtime, rather than baking them into the implementations.
All of the semantics are baked into those link relations — they should probably be URIs if they’re not registered, by the way — and in the formats produced. URIs are effectively semantic-free.
This gives a LOT of flexibility in the implementation; the client can choose which resources to use based upon the link relations it understands, and changes are introduced by adding new link relations, rather than new URIs (although that’s likely to be a side effect). The URIs in use are completely under control of the server, and can be arranged at will.
In this manner, you don’t need a different URI for your interface, ever, because the entry point is effectively used for agent-driven content negotiation.
The downsides? This approach requires clients to make requests to discover URIs, and not to take shortcuts. It’s therefore chatty — a fairly damning condemnation.
However, notice the all-important Cache-Control header in that response; it may be chatty without caching, but if the client caches, it’s not that bad at all.
The main issues with going HATEOS for your API, then, are the requirements it places upon clients. If client-side HTTP tools were more widely capable, this wouldn’t be a big deal, but currently you can only assume a very low-level, bare HTTP API without caching, so it does place a lot of responsibility on your client developer’s shoulders — not a good thing, since there are usually many more of them than there are server-side.
So, there are arguments for and against HATEOS, and one could say the trade-offs are somewhat balanced; both are at least reasoned positions. However, there’s one more thing…
Extensibility and Versioning are the peanut butter and jelly of protocol engineering. Sure, my kids’ cohort in Australian primary schools are horrified by this combination, but stay with me.
OpenStack has an especially nasty extensibility problem; they allow vendors to add pretty much arbitrary things to the protocol, from new resources to new representations, as well as extensions inside their existing formats.
Allowing such freedom with “baked-in” URIs is hard. You have to carve out extension prefixes to avoid collisions, and then hope that that’s good enough. For example, what if an API uses URIs like this:
and HP wants to add a new subresource to the users collection? Does it become
? No, that’s bad, because then no userid can be “hp”, and special cases are evil, especially when they’re under the control of others.
You could do:
and special-case only one thing, “ext”, but that’s pretty nasty too, especially when you can still potentially add “hp” to any point in the URI tree.
Instead, if you take a HATEOS approach, you push extensibility into link relations, so that you have something like:
GET / HTTP/1.1
HTTP/1.1 200 OK
Now, the implementation has full control over the URIs used for extensions, and it’s responsible for avoiding collisions. All that HP (or anyone else wanting an extension) has to do is mint a new link relation type, and describe what it points to (using existing or new media types).
This isn’t the whole extensibility story, of course; format extensions are independent of URIs, for example. However, the freedom of extensibility that taking a HATEOS approach gives you is too good to pass up, in my estimation.
The key insight here, I think, is that URIs are used for so many things — persistent identifiers, cache keys, bases for relative resolution, bookmarks — that overloading them with versioning and extensibility information as well makes them worse for all of their various purposes. By pushing these concerns into link relations and media types using HATEOS, you end up with a flexible, future-proof system that can evolve in a controllable way, without giving up the benefits of using HTTP (never mind REST).