API Versioning with the Django Rest Framework

A quick Google search for “REST API versioning” turns up lots of discussion around how a version should be specified in the request. There is no shortage of passionate debate around RESTful principles and whether embedding version numbers in URLs, using custom request headers, or leveraging the existing Accept header is the best way to go. Unfortunately, there isn’t any one correct answer that will satisfy everyone. Regardless of which approach you end up using, some camp will proclaim that you are doing it wrong. Irrespective of the approach that is used, once the version is parsed out of the request, there is another, larger question around how you should manage your API code to support the different schemas used across different versions. Surprisingly, there is little discussion out there around how to best accomplish this.
At Rescale, we have built our API with the Django Rest Framework. The upcoming 3.1 release will offer some API versioning support. The project lead has wisely decided to sidestep the versioning debate by providing a framework that allows API builders to select between different strategies for specifying the version in the request. However, there isn’t much official guidance around what the best practices are for dealing with this version in your code. Indeed, the documentation mostly just punts on the issue by stating “how you vary your behavior is up to you”.
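
For context, picking a strategy in DRF is a settings change. The fragment below is a minimal sketch that assumes the Accept header scheme and two allowed versions; the version values are illustrative and are not taken from Rescale's configuration.

```python
# settings.py (fragment). The version values are illustrative assumptions.
REST_FRAMEWORK = {
    # Other built-in schemes include URLPathVersioning, NamespaceVersioning,
    # HostNameVersioning and QueryParameterVersioning.
    'DEFAULT_VERSIONING_CLASS': 'rest_framework.versioning.AcceptHeaderVersioning',
    # Used when the client does not specify a version.
    'DEFAULT_VERSION': '2',
    # Requests for any other version are rejected.
    'ALLOWED_VERSIONS': ('1', '2'),
}
```

With a scheme enabled, DRF exposes the parsed value as request.version. As far as I can tell the built-in schemes hand it back as a string, so any numeric comparison needs an explicit conversion.
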
One of Stripe’s engineers posted a nice high-level summary of how Stripe deals with backwards compatibility with their API. The author describes a transformation pipeline that requests and responses are passed through. One of the main appeals of this approach is that the core API logic is always dealing with the current version of the API and there are separate “compatibility” layers to deal with the different versions. Requests from the client pass through a “request compatibility” layer to transform them into the current schema before being handed off to the core API logic. Responses from the core API logic are passed through a “response compatibility” layer to downgrade the response into the schema format of the requested version.
In this post, I want to explore a potential approach for supporting this type of transformation pipeline with DRF. The natural place to inject the transform logic in DRF is the Serializer, since it is involved both in validating the request data (Serializer.to_internal_value) and in preparing the response data returned from an APIView method (Serializer.to_representation).
The general idea is to create an ordered series of transformations that will be applied to request data to convert it into the current schema. Similarly, response data will be passed through the transformations in the reverse order to convert data in the current schema to the version requested by the client. This ends up looking very similar to the forwards and backwards methods on database migrations.
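
Stripped of any framework, the idea can be sketched as a list of upgrade/downgrade pairs. Everything here (the function names, the `STEPS` list, and the schema changes themselves) is hypothetical and only illustrates the forwards/backwards symmetry; it is not the DRF integration described later.

```python
# Hypothetical, framework-free sketch: each step pairs an upgrade (forwards)
# with a downgrade (backwards), much like a database migration.

def upgrade_v1_to_v2(data):
    # v1 sent a single 'name'; the illustrative v2 splits it in two.
    first, _, last = data.pop('name').partition(' ')
    data['first_name'], data['last_name'] = first, last

def downgrade_v2_to_v1(data):
    data['name'] = (data.pop('first_name') + ' ' + data.pop('last_name')).strip()

def upgrade_v2_to_v3(data):
    # The illustrative v3 adds an 'active' flag that defaults to True.
    data.setdefault('active', True)

def downgrade_v3_to_v2(data):
    data.pop('active', None)

# STEPS[i] converts between version i + 1 and version i + 2.
STEPS = [
    (upgrade_v1_to_v2, downgrade_v2_to_v1),
    (upgrade_v2_to_v3, downgrade_v3_to_v2),
]
CURRENT_VERSION = len(STEPS) + 1

def upgrade_request(data, version):
    """Apply upgrades in order, from the client's version to the current one."""
    for upgrade, _ in STEPS[version - 1:]:
        upgrade(data)
    return data

def downgrade_response(data, version):
    """Apply downgrades in reverse order, from the current version down."""
    for _, downgrade in reversed(STEPS[version - 1:]):
        downgrade(data)
    return data
```

A v1 client's request is upgraded through both steps before the core logic sees it, and the response takes the same steps in reverse on the way out.
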
As a basic example of how a response transform would be used, the following is a simple serializer that returns a mailing list name, a description, and a list of subscriber emails:

class MailingListSerializer(serializers.ModelSerializer):
    subscribers = serializers.SerializerMethodField('_subscribers')
    def _subscribers(self, obj):
        return [s.email for s in obj.subscribers_set.all()]
    class Meta:
        model = models.MailingList
        fields = ('name', 'description', 'subscribers')

The payload from an endpoint that uses this serializer might return JSON formatted as:

{'name': 'Cat Facts',
 'description': 'Fun facts about cats',
 'subscribers': ['joe@email.com', 'jane@email.com']}

Some time later, we decide that this endpoint also needs to return the date that each subscriber signed up for the mailing list. This is going to be a breaking change for any client that is using the original version of the API as each element in the subscribers array is now going to be an object instead of a simple string:

{'name': 'Cat Facts',
 'description': 'Fun facts about cats',
 'subscribers': [
    {'email': 'joe@email.com', 'date_subscribed': '2015-01-15T00:01:34Z'},
    {'email': 'jane@email.com', 'date_subscribed': '2015-02-18T04:57:56Z'}
 ]}

The serializer needs to be updated to return data in this new format. To support backwards compatibility with the original version of the API, it will also need to be modified to derive from a VersioningMixin class and specify the location of the Transform classes (more on this in a bit):

class MailingListSerializer(VersioningMixin, serializers.ModelSerializer):
    transform_base = 'api.transforms.mailinglist.MailingListSerializerTransform'
    subscribers = serializers.SerializerMethodField('_subscribers')
    def _subscribers(self, obj):
        return [{'email': s.email, 'date_subscribed': s.date_subscribed}
                for s in obj.subscribers_set.all()]
    class Meta:
        model = models.MailingList
        fields = ('name', 'description', 'subscribers')

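Whenever a new API version is introduced for this serializer, a new numbered Transform class needs to be added to the api.transforms.mailinglist module. Each Transform handles the downgrade from one version to the version immediately before it by munging the response data dict. Transform number N sits between version N+1 and version N, so the first Transform converts v2 data to v1: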

class MailingListSerializerTransform0001(Transform):
    def update_response_data(self, data, request, source):
        # Transform a raw v2 input dict into a v1 dict
        data['subscribers'] = [s['email'] for s in data['subscribers']]
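
To make the effect concrete, here is a small standalone sketch that runs this transform against the v2 payload shown earlier. The `Transform` base class is a no-op stand-in for the one described in this post, and passing `None` for `request` and `source` is a shortcut that only works because this particular transform ignores both.

```python
class Transform(object):
    """No-op stand-in for the base class described in this post."""

    def update_request_data(self, data, request):
        pass

    def update_response_data(self, data, request, source):
        pass


class MailingListSerializerTransform0001(Transform):
    def update_response_data(self, data, request, source):
        # Collapse each v2 subscriber object back into a bare email string.
        data['subscribers'] = [s['email'] for s in data['subscribers']]


v2_payload = {
    'name': 'Cat Facts',
    'description': 'Fun facts about cats',
    'subscribers': [
        {'email': 'joe@email.com', 'date_subscribed': '2015-01-15T00:01:34Z'},
        {'email': 'jane@email.com', 'date_subscribed': '2015-02-18T04:57:56Z'},
    ],
}

# The transform mutates the dict in place and returns None.
result = MailingListSerializerTransform0001().update_response_data(
    v2_payload, request=None, source=None)
```

After the call, `v2_payload['subscribers']` is the plain list of emails that a v1 client expects. Because the dict is mutated in place, a transform must not assume that it still holds the current-version data once it has run.
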

The Transform class is the analogue of a schema migration and contains methods to transform request and response data. Each Transform class name needs to have a numerical value as a suffix. The VersioningMixin class uses this suffix to determine the order in which Transforms are applied to the request or response data.

class Transform(object):
    def update_request_data(self, data, request):
        """data is the native dict that is the output of parsing the request
        from the client.
        """

    def update_response_data(self, data, request, source):
        """data is the native dict that will be passed to the renderer and
        returned back to the client.
        source is the original object that is being converted into a native
        dict.
        """

The VersioningMixin class provides the Serializer.to_internal_value and Serializer.to_representation overrides that look up the Transforms pointed to by the transform_base property on the serializer and apply them in order, either to convert requests into the current API version or to downgrade responses from the current API version to the requested version. In the following code snippet, settings.API_VERSION refers to the latest, current API version number and the request.version field is set to the API version requested by the client:

import inspect
import re
from importlib import import_module

from django.conf import settings


class VersioningMixin(object):
    def to_internal_value(self, data, *args, **kwargs):
        """Applies a pipeline of transformations against the specified request
        data until it is updated to the current API version.
        """
        if data:
            request = self.context['request']
            # int() guards against schemes that report the version as a string.
            for v in range(int(request.version), settings.API_VERSION):
                self._get_transform(v).update_request_data(data, request)
        return super(VersioningMixin, self).to_internal_value(data, *args, **kwargs)

    def to_representation(self, obj, *args, **kwargs):
        """Applies a pipeline of transformations against the specified response
        data until it is downgraded to the desired API version.
        """
        data = super(VersioningMixin, self).to_representation(obj, *args, **kwargs)
        if obj:
            request = self.context['request']
            for v in range(settings.API_VERSION, int(request.version), -1):
                # Transform number v - 1 downgrades version v to version v - 1.
                t = self._get_transform(v - 1)
                t.update_response_data(data, request, obj)
        return data

    def _get_transform(self, version):
        # Fall back to the no-op base Transform when a version has no class.
        transform_dict = dict(self._transform_classes())
        TransformKlass = transform_dict.get(version, Transform)
        return TransformKlass()

    def _transform_classes(self):
        module, base = self.transform_base.rsplit('.', 1)
        mod = import_module(module)
        # Only inspect classes so issubclass() never sees a non-class member.
        for name, Klass in inspect.getmembers(mod, inspect.isclass):
            if name.startswith(base) and issubclass(Klass, Transform):
                m = re.search(r'\d+$', name)
                if m:
                    yield (int(m.group(0)), Klass)
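
The discovery step can be exercised on its own, without Django. The sketch below builds a throwaway module in memory to stand in for api.transforms.mailinglist; the `discover` helper and the `fake_transforms` module name are hypothetical and exist only for this illustration.

```python
import inspect
import re
import types


class Transform(object):
    """No-op stand-in for the base class described in this post."""


class MailingListSerializerTransform0001(Transform):
    pass


class MailingListSerializerTransform0002(Transform):
    pass


class Unrelated(object):
    pass


# Build a throwaway module standing in for api.transforms.mailinglist.
mod = types.ModuleType('fake_transforms')
for klass in (MailingListSerializerTransform0001,
              MailingListSerializerTransform0002,
              Unrelated):
    setattr(mod, klass.__name__, klass)
# A non-class attribute that matches the prefix and must be skipped.
mod.MailingListSerializerTransformNotes = 'not a class'


def discover(mod, base):
    """Map each numeric class-name suffix to its Transform subclass."""
    found = {}
    for name, klass in inspect.getmembers(mod, inspect.isclass):
        if name.startswith(base) and issubclass(klass, Transform):
            m = re.search(r'\d+$', name)
            if m:
                found[int(m.group(0))] = klass
    return found


found = discover(mod, 'MailingListSerializerTransform')
```

Zero-padded suffixes such as 0001 are parsed as plain integers, so the padding only affects how the class names sort in an editor, not the order in which the Transforms are applied.
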

The main benefit of this approach is that the APIView classes (which generally implement the core API logic and use Serializers for request/response processing) only need to worry about the latest schema version. In addition, writing a Transform requires knowledge of only the current and previous versions of the API. When creating version 10 of a particular response, only a single Transform between v10 and v9 needs to be created. The response to a request asking for v7 will first be transformed from v10 to v9 by the new Transform. The existing v9 to v8 and v8 to v7 Transforms will handle the rest.
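
The v10 to v7 walk can be spelled out as a short sketch. The two helper names are hypothetical; they simply mirror the loops in the mixin and assume the same numbering as the Transform0001 example, where Transform number N sits between version N+1 and version N.

```python
def downgrade_chain(current_version, requested_version):
    """Transform numbers applied, in order, to a response."""
    return [v - 1 for v in range(current_version, requested_version, -1)]


def upgrade_chain(current_version, requested_version):
    """Transform numbers applied, in order, to a request."""
    return list(range(requested_version, current_version))


response_steps = downgrade_chain(10, 7)
request_steps = upgrade_chain(10, 7)
```

For a v7 client talking to a v10 API, `response_steps` is `[9, 8, 7]` and `request_steps` is `[7, 8, 9]`: the same three Transforms, walked in opposite directions.
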
We do not believe that this is a panacea for every backwards compatibility issue that will crop up. There are performance costs to consider when constantly running requests and responses through a series of potentially expensive transformations. Further, in the same way that it is sometimes impossible to create backwards migrations for database schema changes, there are more complex breaking API changes that are not easily resolved by this approach. However, for basic API changes, this seems like it could be a nice way to isolate concerns and avoid embedding conditional goop inside the core API logic for versioning purposes.
