How Django's @cache_page works

rixx

2022-01-22

Hi, Future Me! It seems that you've forgotten once again how Django's @cache_page decorator works. But this time, I'm here to tell you what you (or rather, I) worked out in the past. So here it is: @cache_page from the ground up.

Step 1: What is `cache_page`?

cache_page is Django's way of providing view-level caching. The docs are pretty good for the default use case: attach it to a view function, give it a cache duration in seconds, done. Except you want to do something slightly different, or else you wouldn't be here, right? Alright, let's go and see how this works under the hood.

If you look at the source, all the cache_page function does is call the CacheMiddleware (wrapped in a helpful decorator function), so that's where we are headed.

Step 2: `decorator_from_middleware_with_args`

For our intents and purposes (thanks to said helpful decorator), the following will happen:

CacheMiddleware.process_request will be called, and if it returns anything at all, that value will be returned. End of story.
Otherwise, the view function will be called, and its returned response decides what happens next.
If the response has a render method (as, for example, all template responses do), CacheMiddleware.process_response will be called after the rendering step.
Otherwise, the result of CacheMiddleware.process_response will be returned immediately.
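
The four steps above can be sketched in plain Python, no Django required. This is a simplified stand-in and not Django's actual implementation: `decorator_from_middleware_sketch`, `ToyResponse`, `ToyTemplateResponse` and `RecordingMiddleware` are names I made up for illustration, and the real helper deals with more hooks than the two shown here.

```python
from functools import wraps


def decorator_from_middleware_sketch(middleware):
    """Turn a middleware-like object into a view decorator (simplified)."""

    def decorator(view_func):
        @wraps(view_func)
        def wrapper(request, *args, **kwargs):
            # 1. The middleware may answer the request on its own.
            early = middleware.process_request(request)
            if early is not None:
                return early
            # 2. Otherwise the view runs.
            response = view_func(request, *args, **kwargs)
            # 3. Responses that still need rendering: defer process_response.
            if callable(getattr(response, "render", None)):
                response.callbacks.append(
                    lambda rendered: middleware.process_response(request, rendered)
                )
                return response
            # 4. Everything else goes through process_response right away.
            return middleware.process_response(request, response)

        return wrapper

    return decorator


class ToyResponse:
    """Hypothetical plain response: content is final from the start."""

    def __init__(self, content):
        self.content = content


class ToyTemplateResponse(ToyResponse):
    """Hypothetical lazy response: content only exists after render()."""

    def __init__(self, template):
        super().__init__(content=None)
        self.template = template
        self.callbacks = []

    def render(self):
        self.content = self.template.upper()
        for callback in self.callbacks:
            callback(self)
        return self


class RecordingMiddleware:
    """Hypothetical middleware that records what process_response gets to see."""

    def __init__(self):
        self.seen = []

    def process_request(self, request):
        # Requests are plain dicts here; "canned" plays the role of a cache hit.
        return request.get("canned")

    def process_response(self, request, response):
        self.seen.append(response.content)
        return response
```

The point of the deferred branch: a cache that stored the response before rendering would store an empty page, so process_response has to wait until the content exists.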

Step 3: `CacheMiddleware`

CacheMiddleware is really just two other classes in a trenchcoat, plus some safe initialisation, during which it stores our cache duration as page_timeout.

In process_request, a cache key is calculated and checked. If anything is found in the cache, it is returned immediately. Otherwise, _cache_update_cache is set to True on the request and nothing is returned. Remember, that means that next, the view function is evaluated, and its result is then provided to process_response.

In process_response, this _cache_update_cache attribute is checked. If it's unset or False, the response is passed through (because it already came from the cache, there's nothing left to do). But if this attribute is set (and a bunch of other conditions apply, like a 200 status code and so on), the magic happens: The response in its final, rendered state is written to the Django cache for the duration of our page_timeout (auto-expiring, which makes this entire thing work). Helpfully, Django also sets the required headers, Expires and Cache-Control.
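
Here is a toy version of that request/response pair, again without Django, so the moving parts fit on one screen. Everything in it is an assumption for illustration: the `ToyCacheMiddleware` class, the `Request` and `Response` dataclasses, the dict that plays the cache, the injectable `clock`, and using the bare path as the cache key (the real key is more involved).

```python
import time
from dataclasses import dataclass, field


@dataclass
class Request:
    path: str
    method: str = "GET"


@dataclass
class Response:
    content: str
    status_code: int = 200
    headers: dict = field(default_factory=dict)


class ToyCacheMiddleware:
    def __init__(self, page_timeout, clock=time.monotonic):
        self.page_timeout = page_timeout
        self.clock = clock
        self.store = {}  # cache key -> (expiry timestamp, response)

    def process_request(self, request):
        if request.method != "GET":
            # Not cacheable: don't look anything up, don't store anything later.
            request._cache_update_cache = False
            return None
        entry = self.store.get(request.path)
        if entry is not None and entry[0] > self.clock():
            # Cache hit: hand back the stored response, nothing to update.
            request._cache_update_cache = False
            return entry[1]
        # Cache miss: let the view run and ask process_response to store it.
        request._cache_update_cache = True
        return None

    def process_response(self, request, response):
        if not getattr(request, "_cache_update_cache", False):
            return response
        if response.status_code != 200:
            return response
        # The real thing also sets Expires; one header is enough for the sketch.
        response.headers["Cache-Control"] = f"max-age={self.page_timeout}"
        expires_at = self.clock() + self.page_timeout
        self.store[request.path] = (expires_at, response)
        return response
```

The expiry timestamp is what makes the whole thing work: once it has passed, process_request treats the entry as a miss and the next response overwrites it.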

Step 4: ???

Now, all this is pretty standard for how a cache works, but it's a bit annoying to find all the moving parts when you need them, because you have to follow the redirection from decorator to decorator maker to middleware to middleware subclasses. I'm guessing this doesn't really prove a problem to anybody except for me and you, Future Me, but then, that's who this post is for.

Step 5: Profit

Just to show off why this is useful knowledge (and to show off in general, of course), here's what I built with this knowledge. All this is something I built for pretalx, for the heavily cached view that returns the schedule data on any page.

First off, I needed a decorator like cache_page, but with the opportunity to opt out of the caching – because all schedule pages and schedule versions should be cached for 60 seconds (to gracefully handle thousands of attendees pulling up the schedule at the same time) – except for the WIP schedule. The work in progress schedule is only visible to organisers, and naturally isn't meant to be cached. A fairly standard decorator took care of that part:

from functools import wraps

from django.views.decorators.cache import cache_page


def conditional_cache_page(timeout, condition, *, cache=None, key_prefix=None):
    """This decorator is exactly like cache_page, but with the option to skip
    the caching entirely.

    The second argument is a callable, ``condition``. It's given the
    request and all further arguments, and if it evaluates to a true-ish
    value, the cache is used.
    """

    def decorator(func):
        @wraps(func)  # keep the view's name and docstring intact
        def wrapper(request, *args, **kwargs):
            if condition(request, *args, **kwargs):
                # Wrap the view in the regular cache_page and call it right away.
                return cache_page(timeout=timeout, cache=cache, key_prefix=key_prefix)(
                    func
                )(request, *args, **kwargs)
            # Condition not met: bypass the cache completely.
            return func(request, *args, **kwargs)

        return wrapper

    return decorator

As you can see, all this decorator does is to add a condition argument, to call it, and then to either return the view function's result immediately, or to call the standard cache function. Yes, the triple ()()() is mildly cursed, but other than that, this is looking alright.
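
To watch the opt-out in isolation, here is a Django-free sketch in which a tiny memoising decorator stands in for cache_page. `toy_cache_page`, `conditional_toy_cache`, the dict-shaped requests and the `schedule` view are all hypothetical. Unlike the decorator above, this one builds the cached wrapper once up front, which is an option here because the toy store is passed in explicitly.

```python
from functools import wraps


def toy_cache_page(store):
    """Stand-in for cache_page: memoise responses by request path, forever."""

    def decorator(func):
        @wraps(func)
        def wrapper(request, *args, **kwargs):
            key = request["path"]
            if key not in store:
                store[key] = func(request, *args, **kwargs)
            return store[key]

        return wrapper

    return decorator


def conditional_toy_cache(store, condition):
    """Use the toy cache only when condition(request, ...) is true-ish."""

    def decorator(func):
        cached = toy_cache_page(store)(func)

        @wraps(func)
        def wrapper(request, *args, **kwargs):
            if condition(request, *args, **kwargs):
                return cached(request, *args, **kwargs)
            return func(request, *args, **kwargs)

        return wrapper

    return decorator


calls = []  # records every time the view body actually runs


@conditional_toy_cache({}, condition=lambda request, version=None: version != "wip")
def schedule(request, version=None):
    calls.append(version)
    return f"schedule {version}"
```

Calling `schedule` twice for a released version runs the view body once, while every request for the "wip" version runs it again.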

Now, in the next step, I wanted to tackle one of the three big problems: Cache invalidation. Organisers can release new schedule versions, and in case a new version has been released, the cached version is outdated and should not be used. (This isn't extremely relevant to pretalx itself, which always retrieves data for a specific version, but external users call the API without any version and expect to see the latest data).

Ideally, you'd just invalidate the old cache parts when a new schedule version is released, but since the cache keys depend on the request, and I didn't want to muck around with internals or fake requests, this is the next best place to do it. The part that prompted the explanatory wall of text above was that I needed to understand that you can't just opt out of caching when a new version should be cached – you need to actively invalidate the old cache. (Yes, I felt somewhat stupid after figuring out the obvious. Hi, Future Me, you're in great company.)

This resulted in the following condition function (slightly shortened and without the doc comments):

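from django.core.cache import caches
from django.utils.cache import get_cache_key


def cache_version(request, event, version=None):
    if version:
        # Explicit versions are immutable and safe to cache, except the WIP one.
        if version == "wip":
            return False
        return True

    cache = caches["default"]
    content_key = get_cache_key(request, "", "GET", cache=cache)
    version_key = f"{content_key}_known_version"
    current_version = request.event.current_schedule.version

    if current_version != cache.get(version_key):
        # timeout=None keeps the marker until it is overwritten; a timeout of 0
        # would expire it immediately and wipe the cached page on every request.
        cache.set(version_key, current_version, None)
        cache.delete(content_key)
    return True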
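
The invalidation part on its own, as a minimal sketch with a plain dict in place of the Django cache. `invalidate_if_version_changed` is a hypothetical helper, not something from pretalx or Django, and the key names are made up.

```python
def invalidate_if_version_changed(cache, content_key, current_version):
    """Drop the cached page if the remembered version is outdated.

    Returns True if the page was invalidated, False if it is still current.
    """
    version_key = f"{content_key}_known_version"
    if cache.get(version_key) == current_version:
        return False
    # Remember the new version first, then throw away the stale page.
    cache[version_key] = current_version
    cache.pop(content_key, None)
    return True


cache = {"page": "old schedule", "page_known_version": "0.1"}
unchanged = invalidate_if_version_changed(cache, "page", "0.1")
released = invalidate_if_version_changed(cache, "page", "0.2")
```

The version marker has to outlive the cached page it describes. If it expired first, every request would look like a new release and the page would never stay cached.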

If you're not Future Me and assumed that this would be in some way useful to you … I'm sorry? Thank you for reading? Not sure what the correct response is – but if it was useful to you, against all odds, let me know – you'll make my day.
