DjangoCon Europe 2017 - To Index or Not, That’s Not The Question
Speaker: Markus Holtermann (Django core developer with a particular focus on the migrations framework)
Background theory
Filtering/selecting for a specific value does a full table scan by default. This does not scale for large tables.
DB Indexes give access to a single or multiple column in your table, making guarantees on timing for specific lookups.
Indexes use B Trees. B trees are self-balancing, providing us with the timing guarantees.
B Trees have nodes combining several values and several + 1 pointers. Leaf values then hold a pointer to the table row it belongs to. The other pointers point to further nodes which hold values smaller, in-between, and larger than the values in the node, respectively. This provides an easy way to find a value, binary-search style.
An additional pointer points go to the next node in the same layer. This makes it easy to select ranges, since we need to find the initial value and can then just follow the same-layer pointers.
The values are balanced regarding the nodes, which means "evenly distributed within the layers and nodes".
Indexes in Django
db_index = True
index_together = (('foo', 'bar'), )
-
ForeignKey
/OneToOneField
primary_key = True
Index(fields, name)
New, fresh from GSoC 2016, arriving in 1.11!
You can define an indexes
list on a model's Meta
class. Every element is a field of type models.Index
.
models.Index
takes a fields
list of strings, and a name. But you can also use different index classes, eg special
classes provided by django.contrib.postgres
to index JSON fields.
Feature Ideas
- Functional Indexes
- Let
db_index
take other index types, instead of justTrue
/False
- Define default index types for fields (maybe useful for JSON fields?)
- Refactor
index_together
anddb_index
so that we do not have three ways to define indexes - Add
GiSTIndex
which allows you to search for locations near other locations
Feel free to work on this during sprints!