Key-based cache expiration with Django

Last week, the team over at 37Signals wrote up an article on their newly implemented key-based cache expiration system, and it hit me: it's such a simple idea with obvious benefits, so why hadn't I implemented a similar caching mechanism before? As a Django user, I couldn't make much sense of the Rails code, but the concept was clear - so here's my take on it with a quick Django example.

Background

I've just implemented this caching strategy for WhisperGifts for a re-launch that will go live in the next few weeks. We allow couples to publish an online gift list, then let people select items from that list. Pretty basic stuff, but rendering the gift list can require n+1 queries due to the way that my purchase data is kept. This hasn't been a big issue until now, when I've built new functionality and generally just extended things a bit.

The cache strategy is so simple it's taken longer to write up here than it did to alter my existing codebase.

My basic model is as follows:

Registry, the top-level "collection" of items for each wedding.
Item, of which there are many for each Registry
Buyer, of which there are 0-n for each Item

With that in mind, read on...

Model Setup

The first task is to make sure your models have a last_modified timestamp. Mine already did, but just in case you need an example:

    from datetime import datetime

    from django.db import models


    class Registry(models.Model):
        name = models.CharField(max_length=100)
        last_modified = models.DateTimeField(null=True, editable=False)

        def save(self, *args, **kwargs):
            # Stamp every save so the cache key changes along with the data.
            self.last_modified = datetime.now()
            super(Registry, self).save(*args, **kwargs)

Next, make sure that whenever you save an Item or Buyer the last_modified change cascades upwards to the Registry.

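    from datetime import datetime

    class Item(models.Model):
        # on_delete is required on Django 2.0 and later.
        registry = models.ForeignKey(Registry, on_delete=models.CASCADE)
        name = models.CharField(max_length=100)
        last_modified = models.DateTimeField(null=True, editable=False)

        def save(self, *args, **kwargs):
            self.last_modified = datetime.now()
            # Save the item first, so the registry's new timestamp never
            # points at item data that hasn't been written yet.
            super(Item, self).save(*args, **kwargs)
            # Registry.save() refreshes the registry's own last_modified,
            # so saving the parent is enough to cascade the change upwards.
            self.registry.save()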

Note: I'd highly recommend this gets taken care of in an abstract base class. The above is just a (bad) example.
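As a rough sketch of what that shared logic could look like, the save behaviour can live in one place. To keep the example runnable without a Django project, it is written as a plain Python mixin and exercised against a stand-in base class. TouchParentMixin, cache_parent_field and FakeModel are names made up for this illustration, not part of Django. In a real project the mixin would sit in front of models.Model, or be folded into an abstract model that also declares the last_modified field.

```python
from datetime import datetime


class TouchParentMixin:
    """Stamp last_modified on save, then re-save the parent (if any)."""

    # Hypothetical hook: name of the attribute holding the parent object.
    cache_parent_field = None

    def save(self, *args, **kwargs):
        self.last_modified = datetime.now()
        super().save(*args, **kwargs)
        if self.cache_parent_field:
            parent = getattr(self, self.cache_parent_field, None)
            if parent is not None:
                # The parent's own save() refreshes its timestamp.
                parent.save()


class FakeModel:
    """Stand-in for models.Model so the sketch runs without Django."""

    def __init__(self, **fields):
        self.last_modified = None
        self.save_count = 0
        for name, value in fields.items():
            setattr(self, name, value)

    def save(self, *args, **kwargs):
        self.save_count += 1  # a real model would write to the database


class Registry(TouchParentMixin, FakeModel):
    pass


class Item(TouchParentMixin, FakeModel):
    cache_parent_field = "registry"


registry = Registry(name="Our wedding")
item = Item(name="Toaster", registry=registry)
item.save()
print(registry.last_modified is not None)  # the change cascaded upwards
```

A Buyer would set cache_parent_field to its item, so saving a Buyer touches the Item, which in turn touches the Registry.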

Define your cache keys

At the top of models.py I defined a variable named DATA_SCHEMA_REVISION. For now I increment it whenever I make a change to my schema that should invalidate the entire cache - not that it's a regular occurrence.

Now define a cache_key property on each model. For example, on my Item model:

    @property
    def cache_key(self):
        return 'myproject/%s/item-%s-%s' % (DATA_SCHEMA_REVISION, self.id, self.last_modified)

Again, as a good programmer you'll put this in your abstract model, won't you?
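To see why no explicit invalidation is needed, here is a small standalone sketch of the key format. make_cache_key is a hypothetical helper, not a Django API, and it formats the timestamp with isoformat() rather than the default string conversion used above, purely so the key contains no spaces.

```python
from datetime import datetime, timedelta

DATA_SCHEMA_REVISION = 1  # bump to invalidate every key at once


def make_cache_key(kind, pk, last_modified, revision=None):
    """Build a key that changes whenever the object or the schema changes."""
    if revision is None:
        revision = DATA_SCHEMA_REVISION
    return 'myproject/%s/%s-%s-%s' % (
        revision, kind, pk, last_modified.isoformat())


saved_at = datetime(2012, 2, 20, 10, 30, 0)  # arbitrary example timestamp
old_key = make_cache_key('item', 42, saved_at)
new_key = make_cache_key('item', 42, saved_at + timedelta(seconds=5))
bumped_key = make_cache_key('item', 42, saved_at, revision=2)

print(old_key)  # myproject/1/item-42-2012-02-20T10:30:00
print(old_key != new_key, old_key != bumped_key)
```

Saving the object or bumping the revision produces a key nobody has written to yet, so the old fragment is simply never looked up again.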

Configure Django caching

This isn't hard. I use memcached. YMMV.
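For reference, a minimal memcached configuration in settings.py might look like the following. The backend path shown is the pymemcache-based one that ships with Django 3.2 and later, and the address is an assumption (a memcached instance on the default port of the local machine). Older Django versions use a different memcached backend class.

```python
# settings.py (fragment)
CACHES = {
    'default': {
        # Assumes Django 3.2+ with the pymemcache package installed.
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        # Assumed address: memcached on its default port on this machine.
        'LOCATION': '127.0.0.1:11211',
    }
}
```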

Update your templates

My existing template was along these lines:

    <b>{{ registry.name }}</b>
    {% for item in registry.item_set.all %}
        {{ item.name }}
        {{ item.expensive_calculation_and_rendering }}
    {% endfor %}

The new, improved, fully cached version:

    {% load cache %}
    {% cache 9999999 registry registry.cache_key %}
    <b>{{ registry.name }}</b>
    {% for item in registry.item_set.all %}
        {% cache 9999999 item item.cache_key %}
        {{ item.name }}
        {{ item.expensive_calculation_and_rendering }}
        {% endcache %}
    {% endfor %}
    {% endcache %}

Notice that we're caching this for a very long time - because the cache key will change whenever our data is changed. If an Item changes, that item's cache key (and the parent Registry's cache key) will change due to the new timestamp, and both fragments will be re-rendered. However, when we re-render the outer Registry fragment, it's primarily made up of still-valid cached fragments for the other Items.
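The nesting behaviour can be simulated in a few lines of plain Python. This is only a sketch of the idea, not how Django's cache tag is implemented: a dict stands in for memcached, a version counter stands in for the last_modified timestamp, and all function names are invented for the example.

```python
cache = {}  # stand-in for memcached
render_counts = {'item': 0, 'registry': 0}


def cached(key, render):
    """Return the cached fragment for key, rendering it only on a miss."""
    if key not in cache:
        cache[key] = render()
    return cache[key]


def render_item(item):
    def render():
        render_counts['item'] += 1  # the expensive part
        return '<li>%s</li>' % item['name']
    return cached('item-%s-%s' % (item['id'], item['version']), render)


def render_registry(registry):
    def render():
        render_counts['registry'] += 1
        items = ''.join(render_item(i) for i in registry['items'])
        return '<b>%s</b><ul>%s</ul>' % (registry['name'], items)
    key = 'registry-%s-%s' % (registry['id'], registry['version'])
    return cached(key, render)


registry = {
    'id': 1, 'name': 'Our wedding', 'version': 1,
    'items': [{'id': n, 'name': 'Gift %d' % n, 'version': 1}
              for n in range(3)],
}

render_registry(registry)  # cold cache: 1 registry + 3 items rendered
render_registry(registry)  # warm cache: nothing rendered
registry['items'][0]['name'] = 'Toaster'
registry['items'][0]['version'] += 1  # the item's "timestamp" changed...
registry['version'] += 1              # ...and cascaded to the registry
html = render_registry(registry)
print(render_counts)  # {'item': 4, 'registry': 2}
```

Only the changed item and the outer wrapper were rendered again; the other two items came straight from the cache. The old entries are never deleted explicitly. They just stop being requested, and memcached evicts them once it needs the space.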

This is an incredibly simple way to keep your site nice and snappy. There's some fantastic discussion over on the original 37Signals blog post - I strongly recommend you read and understand that post as well as its comments, then consider my examples above as a bare minimum implementation in Django.

How quick is it?

A brief update, 10 minutes after this post was written: I have done some very rudimentary tests and come up with the following numbers. These are NOT representative, but the difference is very clear. Keep in mind this is running in a development environment with debugging enabled and various other slowdowns not present in production!

Cold cache 17.34 seconds
Warm cache 1.11 seconds

This is the time taken for Django to process the request from start to finish, from the initial HTTP request to returning a complete HTML page. There are plenty of other gains to be made, but as a starting point this is not too bad at all!
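One way to collect this kind of number (not necessarily how the figures above were measured) is a small timing middleware. The sketch below follows the callable middleware style used by Django 1.10 and later. The class name and the X-Render-Seconds header are made up for the example, and a stand-in view returning a dict replaces a real Django response so the snippet runs on its own.

```python
import time


class TimingMiddleware:
    """Record how long the rest of the request/response cycle takes."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start = time.perf_counter()
        response = self.get_response(request)
        elapsed = time.perf_counter() - start
        # Hypothetical header name; a Django response accepts item
        # assignment for headers in the same way a dict does.
        response['X-Render-Seconds'] = '%.3f' % elapsed
        return response


def fake_view(request):
    """Stand-in for a Django view; returns a dict instead of HttpResponse."""
    return {}


response = TimingMiddleware(fake_view)('fake request')
print(response['X-Render-Seconds'])
```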