EuroPython 2018: Let’s Build a Python Profiler in 25 LOC
Noam Elfanbaum is the author of pycubator.com and a data engineering lead at Bluevine.
Builtins
Python has two builtin profilers in stdlib: profile
and cProfile
(for speed).
python -m cProfile -s tottime file.py
Deterministic profiling
We can access the process call stack ourself on a trigger, and then print it out. We'll use ``sys.setprofile```, and register a profiler which receives a call_stack, an event, and further args. and then we run traceback.extact_stack(call_stack) to get to the data.
def profiler(call_stack, event, arg): line = traceback.extract_stac(call_stack)[0] print(line) sys.setprofiler(profiler)
Statistical profiling
Statistical prifliers sample the program on a given interval. One way to do so is using OS interrupts.
def print_handler(handler, call_stac): line = traceback.extract_stac(call_stack)[0] print(line) def start(handler, interval=1): signal.signal(signal.SIGPROF, handler) signal.setitemer(sgnal.ITIMER_PROF, interval, interval) atexit.register(malbda: signal.setitimer(0)) start(print_handler)
Selection
The statistical profilers run with low, controllable and predictable overhead. (Use in prod) The deterministic profiler introduces latency and stress, but is more complete. (Don't use in prod)
Building your own statistical flame graph profiler
Set an interval as before. After extracting the tack, join the stack lines, and count how often we arrived at this location.
import atexit import collections import signal import traceback stats = collections.defaultdict(int) def _sample(_, call_stack): stack = traceback.extract_stack(call_stack) formatted_stack = ';'.join(line.line for line in stack) stats[formatted_stack] += 1 def start(interval=0.005): signal.signal(signal.SIGPROF, _sample) signal.setitimer(signal.ITIMER_PROF, interval, interval) atexit.register(lambda: signal.setitimer( signal.ITIMER_PROF, 0)) def format_stats(): stats_out = [f'{trace} {count}' for trace, count in stats.items()] return '\n'.join(reversed(stats_out))
Run this and pipe the result into flamegraph | browser
. You'll now get an svg plotting where most time was spent.