DjangoCon Europe 2019: Maintaining a Django codebase after 10k commits

rixx

2019-04-12

Writeup of the DjangoCon Europe 2019 talk »Maintaining a Django codebase after 10k commits« by Joachim Jablon, Stéphane "Twidi" Angel

Joachim Jablon: A French developer, from Paris, living in Poitiers. Small-time Django contributor, also maintains a few small Python/Django open source packages (Raincoat, Django-Readonly-Field, ...). Stéphane "Twidi" Angel: A web/backend developer. He tries to share his experience through open source, during meetups or in front of a good meal.

If you have a huge codebase and you are pressured into adding features daily, there is no magic trick to save yourself. Let's assume a fictional project called MONOLI.TH, delivering monoliths to people's gardens. It has 30k commits, 13 open and 6k closed PRs, 840 releases, tests that run for an hour, over 100 dependencies, 100 models, more than 1000 fields, etc. It's huge.

This project involves a huge amount of related tasks. To add to that, you have team turnover, company pivoting, complexity growth, and abandoned third-party dependencies, so complexity gets trickier and grows everywhere. You can't control all of that, but you can reduce the effects.

Third-party code

If you need a feature, you can code it yourself, ask a contractor, or use open-source (which you'll still pay for, one way or another), so you'll end up with external code used by your project. Let's mostly consider third-party packages installed with pip for now.

We choose packages to reduce the effort, time, and cost of implementation, but they can also increase the effort, time, cost, and risk associated with the project. Security updates are a problem, and you have no direct control here. You cannot influence project direction directly, so APIs might just break. And not everybody uses semver. The project can also lag behind in Django support, forcing you to stay on old versions. Contributions might just be ignored, and you'll be stuck with no real way out. And you might not even need the full application: do you only need a small part?

Never forget the hidden costs of adding third-party dependencies.

Sometimes, adding a dependency is ok. Maybe you have ownership of it. Or you have a complex scoped problem. Or it's an official package, or at least a very well-known app.

Test the next versions of your dependencies. It's okay if they fail – but at least you will know. Then, participate by providing issues or even PRs – and remember to be nice about it!

Remember that you're free to start a public or private fork, license permitting. Or you could vendor the code, again, license permitting. Both of these will also cut you off from upstream security updates, though. You can also end up implementing everything yourself, leading to a huge overhead.

Another way can be providing an abstraction layer, which can improve the update situation!

Isolate your third-party dependencies.
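A minimal sketch of what such an abstraction layer could look like, assuming a hypothetical third-party geocoding client. Every name here (`Geocoder`, `VendorGeocoder`, `FakeVendorClient`, `delivery_zone`) is invented for illustration; the point is only that one adapter knows the vendor's API shape, so a breaking upstream change is fixed in one place.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Coordinates:
    latitude: float
    longitude: float


class Geocoder(Protocol):
    """The only interface the rest of the project is allowed to use."""

    def locate(self, address: str) -> Coordinates: ...


class FakeVendorClient:
    """Stand-in for a hypothetical third-party client, so the sketch runs."""

    def lookup(self, query: str) -> dict:
        return {"lat": 46.58, "lng": 0.34, "raw_query": query}


class VendorGeocoder:
    """Adapter: the single place that knows the vendor's response format."""

    def __init__(self, client: FakeVendorClient) -> None:
        self._client = client

    def locate(self, address: str) -> Coordinates:
        result = self._client.lookup(address)
        return Coordinates(latitude=result["lat"], longitude=result["lng"])


def delivery_zone(geocoder: Geocoder, address: str) -> str:
    """Project code depends on Geocoder, never on the vendor package."""
    coordinates = geocoder.locate(address)
    return "north" if coordinates.latitude >= 45.0 else "south"
```

Swapping the vendor, or replacing it with your own implementation, then only means writing another class with a `locate` method.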

A note regarding Django models: You need to have full ownership of your database. External apps providing migrations are not a good idea. External apps having abstract models to define fields are dangerous.

Don't let anybody define your models.

Don't forget to contribute back to open source.

Business logic

Business logic is the part of your app that provides logic, and would still hold true if you were to swap out Django for Flask. It mostly includes rules, actions, and workflows.

In many modern Django applications, this is very broken. There's logic in templates. And forms. And serializers. And views. And models and managers. All of these parts have other purposes, though! Not all code needs to fit in Django's base files (models/views/templates).

A good alternative is a service layer: Hanna Kollo's Avoiding Monoliths and Radoslav Georgiev's Django structure talk are great resources for this! It can go like this:

Have a services.py module and a selectors.py module. Every non-trivial operation should be a service/selector interaction: Services for creation/interaction, selectors for access, leading to proper separation.

Taking this idea further: Let's assume you implement your business logic completely separate from Django ORM code (optionally supporting any ORM). You'd need a pure Python class for model interfacing. Your business logic could then be completely free from Django, outside the Django app architecture.
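One way to picture that pure Python class, as a sketch only: the `Monolith` entity, its `from_record` constructor, and the `needs_crane` rule are hypothetical names, and a `SimpleNamespace` stands in for a Django model instance so the example runs without Django.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass(frozen=True)
class Monolith:
    """Plain Python entity; it knows nothing about Django or any ORM."""

    height_m: float
    weight_kg: int

    @classmethod
    def from_record(cls, record: Any) -> "Monolith":
        """Build the entity from any object exposing the same attributes,
        for example an ORM model instance."""
        return cls(height_m=record.height_m, weight_kg=record.weight_kg)


def needs_crane(monolith: Monolith, *, limit_kg: int = 1000) -> bool:
    """Business rule expressed purely in terms of the entity."""
    return monolith.weight_kg > limit_kg


# SimpleNamespace plays the role of a model instance in this sketch.
record = SimpleNamespace(pk=1, height_m=3.5, weight_kg=1500)
heavy = needs_crane(Monolith.from_record(record))
```

Because `needs_crane` only sees the entity, it can be tested and reused without a database or a configured Django project.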

Split your business logic, Django views, and the ORM.

But don't so many layers make maintenance harder, and start looking like business Java? Well, maybe Java's layering is over the top, and Django's layering is a bit lacking, in the average project.

Architecture

Architecture means putting things in the right boxes, naming your boxes, keeping them from mixing, and defining links between them instead. Let's talk about possible architectures.

Functional core / Imperative shell

A concept by Gary Bernhardt about separating the logic from the glue. The logic should be expressed as functional code without side effects (no external links). It's very easy to unit test, because you only need to check the output for a given input, which lets you cover all the corner cases.

The imperative shell has all the side effects but as little logic as possible. You can get by with few tests here, but you'll want them to be integration tests without mocks! HTTP, file reads, and database connections happen here.
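A small sketch of the split, assuming a hypothetical fee calculation for MONOLI.TH. The fee formula and all function names are made up; what matters is that `delivery_fee` and `summarize` never touch I/O, while `main` does nothing but I/O.

```python
import json
from typing import Dict, List, TextIO


# --- functional core: pure functions, no I/O, trivial to unit test ---

def delivery_fee(weight_kg: int, distance_km: int) -> int:
    """Return the fee in cents; the result depends only on the arguments."""
    base_fee = 5000
    return base_fee + 2 * weight_kg + 10 * distance_km


def summarize(orders: List[Dict[str, int]]) -> Dict[str, int]:
    """Aggregate fees for a list of plain order dictionaries."""
    fees = [delivery_fee(o["weight_kg"], o["distance_km"]) for o in orders]
    return {"orders": len(fees), "total_fee": sum(fees)}


# --- imperative shell: all the I/O, almost no logic ---

def main(source: TextIO, sink: TextIO) -> None:
    """Read orders as JSON, delegate to the core, write the summary."""
    orders = json.load(source)
    json.dump(summarize(orders), sink)
```

The core can be covered by many fast unit tests on plain values; the shell needs only a handful of integration tests, for example by calling `main(sys.stdin, sys.stdout)` against real input.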

Hoist your I/O: you're not reducing complexity by burying I/O down the stack. Look at @brandon_rhodes's talk at PyWarsaw(?)

Hexagonal Architecture

A concept by Alistair Cockburn, based on layers and communication/dependency rules. We see the domain as the innermost part, the application around that, and the infrastructure on the outside. Objects on the inside should never know about or call objects on the outside. Any communication towards the outside must go through fixed interfaces.
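The layering rules can be sketched as follows, assuming a hypothetical order flow. `Order`, `Notifier`, `place_order`, `InMemoryNotifier`, and the weight limit are all illustrative. The inner layers own the `Notifier` interface (a "port"), and the outer layer supplies an implementation (an "adapter").

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


# --- domain (innermost): plain rules, no knowledge of outer layers ---

@dataclass(frozen=True)
class Order:
    customer_email: str
    weight_kg: int


def is_deliverable(order: Order) -> bool:
    """Hypothetical rule: weight must be positive and at most 20 tonnes."""
    return 0 < order.weight_kg <= 20000


# --- application: orchestrates the domain through a fixed interface ---

class Notifier(Protocol):
    """Port: defined by the inner layers, implemented on the outside."""

    def notify(self, recipient: str, message: str) -> None: ...


def place_order(order: Order, notifier: Notifier) -> bool:
    if not is_deliverable(order):
        return False
    notifier.notify(order.customer_email, "Your monolith is on its way.")
    return True


# --- infrastructure (outermost): adapters implementing the ports ---

class InMemoryNotifier:
    """Adapter used here in place of a real e-mail backend."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def notify(self, recipient: str, message: str) -> None:
        self.sent.append((recipient, message))
```

Dependencies only point inwards: `place_order` never learns which adapter it was given, so an e-mail or SMS adapter can be swapped in without touching the domain.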

There is a good talk by Brandon Rhodes, The Clean Architecture in Python, at PyOhio 2014.

Domain-Driven Design

Concept by Eric Evans – very … much, at every level, some even about code. Software design should be driven by the actual knowledge/business logic, focused on roles/entities. You'll separate the domain into subdomains (Bounded Contexts), and you'll have to learn and use the same words as any user in any role (Ubiquitous Language)!

TALK TO PEOPLE ABOUT THEIR TASKS AND AREAS OF COMPETENCE. Learn their vocabulary and apply it. You're not "just" building a web app.

Be aware that you stand on the shoulders of giants (or taller folks).

Tests

Be reminded of the testing pyramid. At the bottom, there are the unit tests. You'll want to have many of them, so you'll want them to be fast (~10ms). Then there are integration tests, which test how different parts go together (longer running, not too many: roughly 1/30 as many, ~1s each). And then there are full functional tests, of which you'll have even fewer (1/10 as many, ~30s each).

What do you test in a Django view? Functional tests cover page content and flow control. Integration tests cover the view, persistence, and similar things. Unit tests provide coverage for actions and data requisition.

Don't just write tests, write the right tests.

Code quality in tests involves ease of reading, one test for one thing, but not things like DRY – ease of reading beats having to hunt down what happens due to cleverness.

In your tests, aim for ease of reading and writing above cleverness.

Tests get complicated in the long run: think of mixins, client fixtures (only needed once in integration tests and repeatedly in functional tests, but never in unit tests), and fixture files, which soon grow outdated.

Use pytest with pytest style function tests, with pytest fixtures, and not auto-using them. And use Factory Boy!
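A sketch of what that style could look like, assuming pytest is installed. To keep the example dependency-free beyond pytest, the plain helper `make_delivery` stands in for a Factory Boy factory; it and the `needs_crane` rule are hypothetical.

```python
import pytest


def make_delivery(**overrides):
    """Plain helper standing in for a Factory Boy factory in this sketch:
    sensible defaults, overridable per test."""
    delivery = {"customer": "Ada", "weight_kg": 1200, "confirmed": False}
    delivery.update(overrides)
    return delivery


def needs_crane(delivery):
    """Hypothetical rule under test."""
    return delivery["weight_kg"] > 1000


@pytest.fixture
def delivery():
    # Not autouse: only tests that name `delivery` as an argument receive it.
    return make_delivery()


def test_default_delivery_needs_crane(delivery):
    assert needs_crane(delivery)


def test_light_delivery_needs_no_crane():
    # One test, one thing: the relevant value is visible right here.
    assert not needs_crane(make_delivery(weight_kg=300))
```

Each test states its inputs explicitly, so a reader never has to hunt through mixins or implicit setup to see what is being checked.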

And go further: Have Behaviour Driven Development (inspired by Test Driven Development). Write up Feature Scenarios, and test them! You can use pytest-bdd, for example. This documents your specifications by design, and gives you functional tests from the very beginning.

Snapshot testing is useful, too: do screenshot testing by comparing images, compare SQL queries (django-perf-rec), and maybe snapshot your HTML or your API output.
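The idea behind snapshot testing can be sketched with a hand-rolled helper. This is not the API of django-perf-rec or of any snapshot library; `assert_matches_snapshot` and `api_response` are hypothetical names, and a temporary directory is used so the example cleans up after itself.

```python
import json
import tempfile
from pathlib import Path


def assert_matches_snapshot(value, snapshot_path: Path) -> None:
    """Compare `value` with the stored snapshot; record it on the first run."""
    serialized = json.dumps(value, indent=2, sort_keys=True)
    if not snapshot_path.exists():
        snapshot_path.write_text(serialized)
        return
    expected = snapshot_path.read_text()
    assert serialized == expected, f"Snapshot mismatch for {snapshot_path.name}"


def api_response():
    """Hypothetical API output under test."""
    return {"id": 1, "status": "delivered", "weight_kg": 1200}


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "api_response.json"
    assert_matches_snapshot(api_response(), snapshot)  # first run records
    assert_matches_snapshot(api_response(), snapshot)  # second run compares
```

In a real project the snapshot file would be committed next to the tests, so any change in the output shows up as a reviewable diff.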

Make your tests a prime part of your process.

Stay curious, try things, think before you code.

Don't overengineer it!