DjangoCon Europe 2019: Sketching out a Django redesign.

Writeup of the DjangoCon Europe 2019 talk »Sketching out a Django redesign.« by Tom Christie

Tom Christie: 🌟, maintainer and author of Django REST Framework, among other things. Tom now works on open source full-time, thanks to the DRF sponsorship programme, through which companies pay for features and support.

Python is at a big crossroads. Python 3.5 introduced the async/await keywords. They give us a new concurrency model that enables high-throughput, low-latency services, non-blocking HTTP requests, and real-time protocols such as WebSockets, opening up new worlds for web applications especially.

Concurrency

Concurrency is about the number of tasks your server is able to handle simultaneously (in web speak: how many requests we can handle per second, or at the same time). Concurrency can be increased at several levels: across multiple hosts (the sledgehammer method), with a multi-process setup on a single server, or, one level below that, within a single process using multiple threads – or async.

This is doable because programs spend a lot of time waiting for I/O, be it disk or network. With multi-threading, the operating system handles the switching between the waiting threads. Async is an alternative to multi-threading: it runs in only a single thread, but has multiple tasks within that thread, handing off control between them.

With thread-based concurrency, everything is managed by the OS, and as a programmer you don't get to see when you switch to the next thread, whereas in async, you choose your points of context switching, and the switch is more explicit. Thread-based concurrency has been around forever and supports comparatively low levels of concurrency, while async is new and supports very high concurrency. These two methods are largely incompatible. So this is a fork in the road.
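To make the difference concrete, here is a minimal sketch (not from the talk) of async concurrency within a single thread, where the explicit await points are exactly where control is handed over:

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # "await" marks the explicit point where this task hands control back
    # to the event loop while it waits (simulated I/O here).
    await asyncio.sleep(delay)
    return name

async def main():
    # Both tasks run concurrently within one thread; switching only ever
    # happens at the explicit "await" points.
    results = await asyncio.gather(fetch("a", 1.0), fetch("b", 1.0))
    print(results)  # ['a', 'b'] after ~1 second, not ~2

asyncio.run(main())
```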

Costs

  • Async (and threading, too) is more to think about, so you might not need it.
  • You might not need high throughput, so you could stick with existing threading code instead of investing the cost and effort of rewriting functionality in async.
  • If a function performs I/O, it has to be async, and async functions can only be awaited from other async functions, so you'll end up rewriting a whole lot of code. Unless you actually need the benefits, that rewrite may not be worth it.

Benefits

  • Performance is higher, and sometimes it really, really does matter. We do not want performance issues to be a blocker for businesses looking to adopt Python.
  • WebSockets and other real-time communications
  • Non-blocking HTTP requests
  • Lightweight parallelization that is easier to reason about
  • Explicit I/O

Performance can mean different things in different contexts. Being able to build highly concurrent web services means they will run well even on small systems. Or maybe you don't have high throughput overall, but high spikes, which you'll be better able to deal with.

It will also address the "Python is slow" issue. The TechEmpower benchmarks are the least horrible benchmarks for reference. They have many Go results at the top, and then Python in a reasonable middle position.

WSGI and ASGI

WSGI is an inherently thread-concurrency interface, so it cannot do any async context switching inside. It only handles HTTP requests/responses and passes them to Python, so it cannot do WebSockets.

ASGI has an async interface, covers more than just HTTP request/response, and is much more general purpose and highly adaptable. The interface is fairly clean: the application is handed a dictionary (the "scope") with information about the request, plus receiving and sending channels that are addressed as async functions.
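As a concrete illustration (a sketch, not taken from the talk), a minimal ASGI application is just an async callable receiving that scope dictionary plus the two channels:

```python
# Minimal "hello world" ASGI application: the server calls it with a scope
# dict describing the connection, plus awaitable receive/send channels.
async def app(scope, receive, send):
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello, ASGI!"})
```

Any ASGI server can run this, e.g. `uvicorn module:app`.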

ASGI provides

  • High raw throughput
  • WebSockets and other real-time communications
  • Server-Sent Events and long polling
  • HTTP/2 server push
  • Background tasks
  • Clock and timer tasks
  • A more adaptable interface, e.g. via startup/shutdown events, which allow you to modify the context your ASGI service runs in (see the sketch below)
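As an example of that last point, the lifespan protocol delivers startup and shutdown events to the application. A minimal hand-rolled sketch (not from the talk) of handling it looks like this:

```python
# Sketch: handling ASGI "lifespan" startup/shutdown events, which an app or
# framework can use to set up and tear down shared resources.
async def lifespan_app(scope, receive, send):
    assert scope["type"] == "lifespan"
    while True:
        message = await receive()
        if message["type"] == "lifespan.startup":
            # e.g. open database connection pools, warm caches
            await send({"type": "lifespan.startup.complete"})
        elif message["type"] == "lifespan.shutdown":
            # e.g. close connections, flush buffers
            await send({"type": "lifespan.shutdown.complete"})
            return
```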

The ASGI landscape features Daphne, Hypercorn, and Uvicorn as servers. There are ASGI frameworks, too: Starlette, Responder, FastAPI, Bocadillo, Quart, and kind of Django Channels. There is also lots of other stuff, like Gunicorn worker classes.

Framework development for ASGI

Starlette

Starlette is a web framework that is async/ASGI compatible, built on ASGI all the way up through views and the request/response cycle, and it gives you the opportunity to work directly with the ASGI interface if you want to. The test client is essentially a requests client, with an adapter class that plugs it into an ASGI application instead of pushing out raw network requests.

ASGI also serves as the middleware interface: instead of creating a request instance as early as possible and passing it through a middleware stack, we pass the ASGI interface along as far as we can. Because they speak ASGI, middlewares are reusable across all ASGI frameworks, and they handle WebSocket connections just the same as HTTP requests.

Starlette also provides mountable apps, e.g. for serving static files, and favours per-component configuration over framework-wide or application-wide configuration. This leads to less entanglement and better re-use, and helps you understand components in isolation.
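To illustrate the middleware point above: a pure-ASGI middleware is just an ASGI app wrapping another one. The class below is a hypothetical sketch, not Starlette code:

```python
import time

class TimingMiddleware:
    """Wraps any ASGI app; works for HTTP and WebSocket scopes alike."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        start = time.monotonic()
        try:
            await self.app(scope, receive, send)
        finally:
            print(scope["type"], scope.get("path"), time.monotonic() - start)

# wrapped = TimingMiddleware(app)  # reusable with any ASGI framework
```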

Naturally, as a developer you don't usually want to be working at this level. Starlette provides a request/response interface to you, if you want it. These can be used in regular function-based or class-based views, just as you are used to.
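A rough sketch of what that looks like in Starlette (API details may differ slightly from the version shown at the talk):

```python
from starlette.applications import Starlette
from starlette.endpoints import HTTPEndpoint
from starlette.responses import JSONResponse
from starlette.routing import Route

async def homepage(request):
    # Function-based view working with request/response objects.
    return JSONResponse({"hello": "world"})

class UserDetail(HTTPEndpoint):
    # Class-based view; HTTP methods map to method names.
    async def get(self, request):
        return JSONResponse({"user": request.path_params["username"]})

app = Starlette(routes=[
    Route("/", homepage),
    Route("/users/{username}", UserDetail),
])
```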

This gives us a less complex stack that is very composable, with a single interface style.

Databases

How do we handle databases? The Django ORM and SQLAlchemy are standard thread-concurrency APIs, sitting on top of database drivers that are likewise thread-synchronous. On the async side we don't have full database interfaces yet, only individual async drivers. Tom recently released databases, which can run raw SQL or work with SQLAlchemy core (table definitions and the query builder). It supports transactions, and you can add on migrations with a bit of work. It is only a low-level answer, not a fully-fledged ORM, but it's a start.
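A small sketch of using databases with raw SQL (the connection URL and table are invented placeholders):

```python
import databases

database = databases.Database("postgresql://localhost/example")

async def show_notes():
    await database.connect()
    rows = await database.fetch_all(query="SELECT id, text FROM notes")
    for row in rows:
        print(row["id"], row["text"])
    await database.disconnect()
```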

Tom also released orm, the start of an asynchronous ORM with a highly Django-like API. It supports filter and select_related, and foreign keys, but not many-to-many relationships at the moment. The design differs from Django's in a few ways: there cannot be any lazy loading of relationships, because all I/O has to be awaited explicitly. Model instances are only sparsely populated for the same reason. Paging through querysets also needs to be explicit, since that too results in separate I/O. This explicitness has advantages, though – it makes you very aware of database connection usage, by necessity. Reasoning about these things at coding time can improve your performance and code quality.
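A rough sketch in the spirit of the orm package (model and field names are invented, and the exact API may differ between releases):

```python
import databases
import orm
import sqlalchemy

database = databases.Database("sqlite:///db.sqlite")
metadata = sqlalchemy.MetaData()

class Note(orm.Model):
    __tablename__ = "notes"
    __database__ = database
    __metadata__ = metadata

    id = orm.Integer(primary_key=True)
    text = orm.String(max_length=100)
    completed = orm.Boolean(default=False)

async def completed_notes():
    # Every query is awaited explicitly; no lazy loading behind the scenes.
    return await Note.objects.filter(completed=True).all()
```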

Connections and transactions need to be considered, too. Holding database transactions (or even connections) across the whole request/response cycle can be a very blunt hammer. Having a more granular approach would be great.
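With databases, for instance, a transaction can be scoped to just the statements that need it rather than to the whole request (the queries and names below are invented placeholders):

```python
import databases

database = databases.Database("postgresql://localhost/example")

async def archive_note(note_id: int) -> None:
    # Transaction scoped to exactly the two statements that need it.
    async with database.transaction():
        await database.execute(
            query="INSERT INTO archive (note_id) VALUES (:id)",
            values={"id": note_id},
        )
        await database.execute(
            query="DELETE FROM notes WHERE id = :id",
            values={"id": note_id},
        )
    # The transaction (and its connection) is released here, before the
    # rest of the request/response cycle runs.
```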

Other relevant places to think about include HTTP requests (check out requests-async), SMTP requests, caching, validation, password hashing – all of these are I/O bound, at least potentially.
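For outgoing HTTP, requests-async keeps the familiar requests API but makes it awaitable; a minimal sketch:

```python
import requests_async as requests

async def fetch_status(url: str) -> int:
    # The call no longer blocks the event loop while waiting on the network.
    response = await requests.get(url)
    return response.status_code
```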

Bringing it together

  • uvicorn is an ASGI web server
  • starlette is an ASGI web framework or toolkit
  • databases is an async database interface
  • orm is an async ORM
  • typesystem provides data validation
  • requests-async handles outgoing HTTP requests

This combination makes I/O (network or database access most of all) very visible. Please check out Tom's slides, they include about five slides of code showing how this ties in together, and it looks very good.
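The slides are the canonical reference; as a rough, hypothetical approximation of how the pieces combine (names, URLs, and the table definition below are invented, and API details may have shifted since 2019), a small service might look like this:

```python
import databases
import sqlalchemy
import uvicorn
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route

database = databases.Database("postgresql://localhost/example")

metadata = sqlalchemy.MetaData()
notes = sqlalchemy.Table(
    "notes",
    metadata,
    sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column("text", sqlalchemy.String(length=100)),
)

async def list_notes(request):
    # The database query is an explicit, awaited piece of I/O.
    rows = await database.fetch_all(notes.select())
    return JSONResponse([{"id": row["id"], "text": row["text"]} for row in rows])

app = Starlette(
    routes=[Route("/notes", list_notes)],
    on_startup=[database.connect],
    on_shutdown=[database.disconnect],
)

if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)
```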

This is Django-ish, but decoupled from overall stack complexity. It is scalable and performs very well in terms of throughput. It supports WebSockets, HTTP/2 push, background tasks, and many other things. It lets us do things Django currently cannot do: high-availability API or proxy services, real-time subscription endpoints. All of these would be great to bring to Django. (And it provides a clean interface all the way up.)

Django …?

  1. Add ASGI into the stack progressively. Good groundwork has been laid, especially by Andrew Godwin in his async proposal.
  2. Consider using existing tools like databases or orm – maybe not by direct inclusion, but at least as inspiration.
  3. Keep maturing the async ecosystem.

The Python community has a really important message here: Python is not beaten out there in terms of productivity. Async makes Python competitive with Node and Go, and brings support for real-time protocols. If we can reach a great level of functionality with this stuff, Python will be hitting the sweet spot in many ways.

Productivity + Performance = the sweet spot

Call for sponsors

In open source, cash is collaboration, made liquid. Support this effort. If your business can see the potential of this, support it financially.

We will also need other strategies of monetisation as communities. We need to succeed in the product space, providing the best possible products, with fast on-ramping for new developers – improving, or even revolutionizing, the onboarding process. We could have a company that works on Django or similar projects full-time.

In the end it's not about the money – the money is a tool. Our goal has to be impact on society, and betterment of society. Remember the quote: "We may find ourselves climbing the ladder of success, only to find at the very top of the ladder that it has been leaning against the wrong wall".