DjangoCon Europe 2019: Jupyter, Django and Altair - Quick and dirty business analytics

rixx

2019-04-11

Writeup of the DjangoCon Europe 2019 talk »Jupyter, Django and Altair - Quick and dirty business analytics« by Chris Adams

Chris Adams: Chris Adams is an environmentally focussed tech generalist who has spent the last ten years working in tech startups, blue chip companies and government, as a user researcher, product manager, developer, sysadmin and UX-er. He runs Product Science, a small product development consultancy, lives in Berlin, and has been using Django since 2008.

Notebooks

Python and, by extension, Django are very popular and stable. One of the big trends is data analysis, and one of the most popular ways to approach it is Jupyter notebooks.

Jupyter notebooks are brilliant for creating narratives. Narratives are collaborative and meant for multiple people. They are shareable and publishable, especially since they are very reproducible. This makes them very popular in academic communities, naturally.

One popular user of Jupyter notebooks is The Economist: they have brilliant data visualisations, and in the background, there are Jupyter notebooks (which they share, too). Another is O'Reilly, who use them for educational content, e.g. with Peter Norvig.

Notebooks like that are great for operations, too: they let you show the data, explore it, and reason about big decisions. Netflix does this a lot, re-running notebooks for current analysis. They shared this as open source code, too!

JuPyTeR stands for Julia + Python + R. The Jupyter kernel executes Julia or Python or R code, and then saves results to the notebook file, which is then served by the notebook server to the browser.
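
To make "saves results to the notebook file" a bit more concrete: a notebook file is just JSON, with a list of cells that each carry their source and, for code cells, their saved outputs. Here is a minimal sketch, assuming the nbformat 4 layout. The notebook content and the `saved_results` helper are made up for illustration and were not part of the talk.

```python
import json

# A hand-written, minimal notebook in the nbformat 4 layout (illustrative only).
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Monthly signups"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["2"]},
                }
            ],
        },
    ],
})


def saved_results(raw):
    """Return (source, plain-text results) pairs for every code cell."""
    notebook = json.loads(raw)
    results = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown cells have no outputs
        source = "".join(cell["source"])
        texts = [
            "".join(output["data"].get("text/plain", []))
            for output in cell["outputs"]
            if output["output_type"] == "execute_result"
        ]
        results.append((source, texts))
    return results


print(saved_results(notebook_json))
```

Because the results live in the file next to the code, the notebook server can show them in the browser without re-running anything.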

Visualisation theory

Terminal and plain text output works, but it's harder to understand and less informationally dense. Having the browser available allows you to do plenty of helpful stuff, like showing actual tables or clickable/foldable dictionaries. You can expand GeoJSON directly.

Visualisation is hard to get right and easy to get wrong. If you want to read just one book, read "Visualization Analysis and Design" by Tamara Munzner. Do eeet. And/or watch her talk at a d3 conference.

So data science has data types (think data structures), and attributes, which can be discrete or not. You can then represent data points as marks (any visual representation) by modifying channels (position, color, size, shape, …).

Application

All of this theory is encoded in Vega-Lite, which describes a chart as JSON data plus JSON configuration. That's a JavaScript library, but we can use it from Python, too: this is Altair, which uses Vega-Lite, which uses Vega, which uses D3 and renders to canvas. The good part is that you can get help for every part of this stack.
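
To show what "JSON data and JSON configuration" looks like, here is a rough sketch of a Vega-Lite bar chart spec written as a plain Python dict, with one mark and three channels. The `bar_chart_spec` helper, the field names and the sample rows are invented for illustration, and the schema version in the URL is an assumption. Altair builds this kind of dictionary for you (you can inspect it with `Chart.to_dict()`), so you would not normally write it by hand.

```python
import json


def bar_chart_spec(rows, category, value, group):
    """Build a Vega-Lite spec: one bar mark, with channels bound to fields."""
    return {
        # Schema version is an assumption; use whatever your Vega-Lite expects.
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},  # the JSON data
        "mark": "bar",  # the mark: how a data point is drawn
        "encoding": {  # the channels: which attribute drives what
            "x": {"field": category, "type": "nominal"},
            "y": {"field": value, "type": "quantitative"},
            "color": {"field": group, "type": "nominal"},
        },
    }


# Made-up sample data, one dict per data point.
rows = [
    {"month": "Jan", "signups": 12, "plan": "free"},
    {"month": "Jan", "signups": 3, "plan": "paid"},
    {"month": "Feb", "signups": 20, "plan": "free"},
]

spec = bar_chart_spec(rows, category="month", value="signups", group="plan")
print(json.dumps(spec, indent=2))
```

Discrete attributes get the `nominal` type and continuous ones get `quantitative`, which is the data-type distinction from the theory section showing up in the configuration.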

In Django you can use Python to get the data from the ORM, pass it through Altair to the template, where Vega will render it.
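
A minimal sketch of that flow, kept free of Django imports so it runs on its own. The `chart_context` function, the `Signup` model mentioned in the comments and the field names are all hypothetical. In a real view the rows would come from the ORM and the spec would most likely come from Altair rather than being written by hand.

```python
import json


def chart_context(rows):
    """Turn ORM-style rows into a template context holding a chart spec.

    In a Django view, `rows` could be something like
    Signup.objects.values("month", "signups") (hypothetical model).
    """
    spec = {
        "data": {"values": list(rows)},
        "mark": "line",
        "encoding": {
            "x": {"field": "month", "type": "ordinal"},
            "y": {"field": "signups", "type": "quantitative"},
        },
    }
    return {"chart_spec": spec}


# Stand-in for a queryset: .values() yields one dict per row.
context = chart_context([
    {"month": "2019-01", "signups": 12},
    {"month": "2019-02", "signups": 20},
])

# One possible template side (an assumption, not shown in the talk):
#   {{ chart_spec|json_script:"chart-spec" }}
# and then, in JavaScript, parse that element's text and hand the
# result to vegaEmbed("#chart", spec).
print(json.dumps(context["chart_spec"]))
```

The view only ever deals with plain dicts and lists, so everything that ends up in the template is ordinary JSON that Vega can read.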