First Sunday of Advent: Built-in Functions

rixx

2020-11-29

This post is part of my series on Traversing the Python Standard Library.

Built-in functions are included in the standard library and are always available with no need for imports. There are 69 (nice) of them. I've rearranged them from the Python documentation order (alphabetical) into groups. This part was short enough to do a detailed breakdown; the following days will be more of a summary.

Highlights

These are the things I did not know or had forgotten:

open() takes an errors argument that you can set to surrogateescape, which allows you to process and even roundtrip encoding errors in files.
iter() can take two arguments, in which case it calls the first one repeatedly until its result equals the second argument. Good for chunked reading.
round(), when a number is exactly halfway between two candidates, rounds to the even one, so both round(1.5) and round(2.5) are 2.
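A minimal sketch of the first and last highlight. The byte string is made up for illustration. It decodes bytes directly with the same surrogateescape error handler that open() applies when you pass errors="surrogateescape", so no file is needed.

```python
# round() uses round-half-to-even ("banker's rounding") for exact halves
assert round(0.5) == 0
assert round(1.5) == 2
assert round(2.5) == 2
assert round(3.5) == 4

# b"caf\xe9" is Latin-1, not valid UTF-8 (hypothetical example data)
raw = b"caf\xe9"
# surrogateescape turns the undecodable byte 0xe9 into the lone surrogate U+DCE9...
text = raw.decode("utf-8", errors="surrogateescape")
assert text == "caf\udce9"
# ...and encoding with the same handler restores the original bytes exactly
assert text.encode("utf-8", errors="surrogateescape") == raw
```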

Favourites

My favourites among the built-ins:

I love all() and any() and throw them at iterables all the time, particularly at generators.
The introduction of breakpoint() was theoretically a godsend, but my fingers are so used to typing import pdb; pdb.set_trace() that I'm still breaking (ha.) them of the habit.
dir() returns a list of attributes of its argument (or of all names in your current scope if called without an argument). Both of these modes are extremely useful when debugging. It uses __dir__() where available, and otherwise gathers what it can from __dict__. There are absolutely no guarantees of, or even attempts at, completeness: e.g. it does not include metaclass attributes when called on a class.
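This sketch shows why all() and any() pair so well with generators: they short-circuit, so they stop pulling from the generator as soon as the answer is known. The is_positive helper and the checked list are made-up instrumentation that record how much of the input was consumed.

```python
checked = []

def is_positive(n):
    # Record every element we actually look at
    checked.append(n)
    return n > 0

numbers = [3, -1, 4, 1, 5]

# all() stops at the first falsy result: only 3 and -1 are checked
assert not all(is_positive(n) for n in numbers)
assert checked == [3, -1]

checked.clear()
# any() stops at the first truthy result: only 3 is checked
assert any(is_positive(n) for n in numbers)
assert checked == [3]

# Empty inputs: all() is vacuously True, any() is False
assert all([]) is True
assert any([]) is False

# dir() on an object lists its attributes, including methods
assert "append" in dir([])
```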

Types and casting

Types and casting are pretty intuitive in Python, but it's easy to forget some of the built-in types (bytearray, for instance, when you don't typically work on bytes).

bool() tests the truth value of its argument. It's a subclass of int and cannot be subclassed further.
callable() tells us whether the argument looks callable, i.e. whether its type has a __call__ method. That covers classes, functions, lambdas, methods, and instances of classes that define __call__ (and probably something I'm forgetting). It does not single out async functions: callable() returns True for them as well, even though calling one only gives you a coroutine.
bytearray() objects can only be created with this constructor, which returns a mutable array of bytes. It can be used with a string + encoding, a buffer, or an iterable of integers between 0 and 255 (all of these initialise the bytearray to the values given). Unintuitive: it can also be called with an integer and will return a zero-filled bytearray of that length. bytes() is the same thing, just immutable.
dict(), list(), set(), frozenset(), tuple(): construct a new container from their input, typically an iterable.
range([start, ]stop[, step]) and slice([start, ]stop[, step]) are powerful, and slice in particular is underused.
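A short sketch of the less obvious points in this group. The names MyBool, fetch and last_three are made up for the example.

```python
# bool is a subclass of int, so True behaves like 1 in arithmetic...
assert issubclass(bool, int)
assert True + True == 2

# ...but bool itself refuses to be subclassed
try:
    class MyBool(bool):
        pass
    bool_is_final = False
except TypeError:
    bool_is_final = True
assert bool_is_final

# Async functions count as callable (calling one would return a coroutine)
async def fetch():
    return 42

assert callable(fetch)

# bytearray constructors: string + encoding, iterable of ints, or a length
assert bytearray("hé", "utf-8") == bytearray(b"h\xc3\xa9")
assert bytearray([104, 105]) == bytearray(b"hi")
assert bytearray(3) == bytearray(b"\x00\x00\x00")  # zero-filled

# bytearray is mutable in place, bytes is not
buf = bytearray(b"cat")
buf[0] = ord("b")
assert buf == bytearray(b"bat")

# A named, reusable slice object works on any sequence
last_three = slice(-3, None)
assert "library"[last_three] == "ary"
assert [1, 2, 3, 4, 5][last_three] == [3, 4, 5]

# range() and slice() both take an optional step
assert list(range(0, 10, 3)) == [0, 3, 6, 9]
assert "abcdefgh"[slice(None, None, 2)] == "aceg"
```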

Math and numbers

Doing math with Python is reasonably pleasant even without waiting for NumPy or SciPy to install. We'll come back to this in a couple of days when I stare at the statistics standard library module.

I always forget that abs exists (and uses __abs__()) and doesn't need to be imported from math.
divmod(a, b) is exactly that: the tuple (a // b, a % b), for integers.
Why use pow() when you can use **? Because for integer operands you can also pass a third mod argument: pow(a, b, c) is much more efficient than the technically equivalent a ** b % c.
bin() converts integers to binary strings (prefixed with 0b), hex() does the same for hexadecimal (0x), and oct() for octal (0o).
int() and float() create numbers (for objects via __int__() and __float__()). int() can take a base argument when converting strings.
complex() creates a complex number, either from a string or from two numbers (for objects, it uses __complex__(), falling back on __float__() and then __index__()).
hash() returns an object's hash value as an integer, computed by its __hash__() method; unhashable objects such as lists raise a TypeError.
max() and min() are pretty straightforward. You can supply a key= named argument just like for list sorting, and when passing a single iterable you can provide a default= named argument that will be returned if the iterable is empty. Both of these take either an iterable or two or more positional arguments to be compared.
sum() has an optional start named argument that is easy to forget.
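A quick sketch exercising the number built-ins above. The values are arbitrary. The negative-exponent form of pow() requires Python 3.8 or later.

```python
# divmod() returns floor division and remainder together
assert divmod(17, 5) == (3, 2)
assert divmod(-17, 5) == (-4, 3)  # floors, so the remainder takes the divisor's sign

# Three-argument pow() does modular exponentiation without the huge intermediate
assert pow(3, 200, 7) == (3 ** 200) % 7
assert pow(3, -1, 7) == 5  # modular inverse, Python 3.8+

# abs() works on complex numbers too (magnitude)
assert abs(-3.5) == 3.5
assert abs(3 + 4j) == 5.0

# Base conversions in both directions
assert (bin(10), hex(255), oct(8)) == ("0b1010", "0xff", "0o10")
assert int("ff", 16) == 255
assert int("0x1f", 0) == 31  # base 0: infer the base from the prefix

assert complex("1+2j") == complex(1, 2) == 1 + 2j

# Equal hashable objects have equal hashes; lists are unhashable
assert hash((1, 2)) == hash((1, 2))
try:
    hash([1, 2])
    list_hashable = True
except TypeError:
    list_hashable = False
assert not list_hashable

# key= and default= on max()/min()
assert max(["apple", "fig", "banana"], key=len) == "banana"
assert min([], default=None) is None
assert max(3, 7, 5) == 7

# sum()'s start value, e.g. to flatten a list of lists
assert sum([1, 2, 3], 10) == 16
assert sum([[1], [2, 3]], []) == [1, 2, 3]
```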

Stringy things

Everything is a string if you squint hard enough.

str() turns objects into strings via __str__(), and that includes classes: str(str) is "<class 'str'>".
repr() calls __repr__(), and ascii() does the same, but escapes all non-ASCII characters.
chr() and ord() convert between Unicode code points (integers) and characters.
format(value, format_spec) is the same as "{:format_spec}".format(value): it calls value.__format__(format_spec), so you can format all sorts of things, not just strings.
input() reads a line from stdin, optionally printing a prompt first. I mostly use the inquirer library when looking at input(), because parsing user input 🙄.
open() turns files into file objects. Use it as a context manager. The mode is one of r, w, a, or x (read, write, append, exclusive create), combined with one of b or t (binary or text mode) and optionally + (open for reading and writing); the default is rt. The buffering argument changes how files are buffered. By default, binary files are buffered in chunks sized to the underlying device's block size where that can be determined, falling back on io.DEFAULT_BUFFER_SIZE. Text files do the same unless they are interactive (isatty() is true), in which case they use line buffering. The encoding argument is a life saver and you'll know it when you need it. When it stops saving your life, use the errors argument and set it to surrogateescape.
print() prints all its non-named arguments to its file= named argument, defaulting to sys.stdout. Set sep= and end= for fun.
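A sketch of the string helpers. It uses io.StringIO as a stand-in for a real file so that print()'s output can be inspected.

```python
import io

# format(value, spec) delegates to value.__format__(spec)
assert format(3.14159, ".2f") == "3.14"
assert format(3.14159, ".2f") == "{:.2f}".format(3.14159)
assert format(5, "08b") == "00000101"
assert format(1234567, ",") == "1,234,567"

# str() works on anything, including types
assert str(str) == "<class 'str'>"

# repr() vs ascii(): same idea, but ascii() escapes non-ASCII characters
assert repr("café") == "'café'"
assert ascii("café") == "'caf\\xe9'"

# chr() and ord() convert between code points and characters
assert ord("€") == 8364
assert chr(8364) == "€"

# print() with custom separator, line ending, and target file
out = io.StringIO()
print("a", "b", "c", sep="-", end="!\n", file=out)
assert out.getvalue() == "a-b-c!\n"
```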

List things

Lists and iterables are probably why I like Python most in everyday life, particularly thinking back to my clunky experiences with Java. The biggest hurdle here is the fact that JavaScript iterating works so similarly but never quite the same. So whenever I switch between the two, I get to take a tour of the respective language docs.

len() is the most basic list thing. It doesn't work on generators at all (you get a TypeError), and the usual workaround, len(list(gen)), will eat your generator alive.
enumerate() goes through an iterable and yields both the next element and its index (optionally starting the index count at the second argument). The index comes FIRST, which I will never remember. (I've added it to my Anki deck, though.)
filter(function, iterable) is the same as (a for a in iterable if function(a)). I will never remember that the filter function comes first.
map(function, iterable) gives you an iterator that applies the first argument to each element of the second and yields the result. Guess what I will never remember. (I never use map, because list comprehension is always there for me.)
zip(*iterables) steps through all iterables given to it at the same time. If they have different lengths, everything past the length of the shortest iterable is discarded.
reversed() is what you use when you don't want to be eDgY and cool and use [::-1]. It uses __reversed__() if present, and otherwise needs a sequence (__len__() and __getitem__()). Unlike sorted(), it does not take a key function.
sorted() returns a sorted version of the provided iterable, optionally using key=, optionally reverse=True. Use list.sort() for in-place sorting and sorted() on non-lists and to create new objects.
next() retrieves the next object from an iterator. I think all the places where I've needed next() were hacky or just plain bad life choices.
iter(), when given one argument, creates an iterator object from it. Boring, and not often useful. Much more interesting is the version with two arguments: The first one is a callable, and the second one is the stopping value ("sentinel"). Iterating through the result will call the first argument until it returns the second one, making for an easy way to build block-based iteration:

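from functools import partial

# Call f.read(64) over and over until it returns b'' (end of file).
# process_block() is a placeholder for whatever you do with each chunk.
with open('mydata.db', 'rb') as f:
    for block in iter(partial(f.read, 64), b''):
        process_block(block)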
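A runnable variant of the block-reading idea and the other iteration helpers. It assumes io.BytesIO as an in-memory stand-in for the database file in the example above. The 4-byte block size is arbitrary.

```python
import io
from functools import partial

# iter(callable, sentinel): read 4 bytes at a time until read() returns b""
data = io.BytesIO(b"0123456789")
blocks = list(iter(partial(data.read, 4), b""))
assert blocks == [b"0123", b"4567", b"89"]

# enumerate() yields (index, element), index first
assert list(enumerate("ab", start=1)) == [(1, "a"), (2, "b")]

# filter(None, ...) keeps truthy elements; otherwise the function comes first
assert list(filter(None, [0, 1, "", "x"])) == [1, "x"]
assert list(filter(str.isdigit, "a1b2")) == ["1", "2"]

# map() accepts several iterables and passes one element from each
assert list(map(pow, [2, 3], [3, 2])) == [8, 9]

# zip() stops at the shortest input
assert list(zip("abc", [1, 2])) == [("a", 1), ("b", 2)]

assert list(reversed([1, 2, 3])) == [3, 2, 1]
assert sorted(["bb", "a", "ccc"], key=len, reverse=True) == ["ccc", "bb", "a"]

# next() can return a default instead of raising StopIteration
assert next(iter([]), "empty") == "empty"

# len() refuses generators; len(list(gen)) works but consumes them
gen = (n * n for n in range(3))
try:
    len(gen)
    has_len = True
except TypeError:
    has_len = False
assert not has_len
assert len(list(gen)) == 3
assert list(gen) == []  # nothing left
```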

Debugging

To take the optimistic view: programming in a dynamic language makes for great debugging skills.

globals(), locals() and vars(obj) are nice for debugging. Please please please do not use them beyond that.
help() is something I should use more, but since it's hard to predict which library provides good help strings, I usually don't bother – when I hit this level of confusion, I go read the source.
id() is great to see if you've got a shallow copy problem.
isinstance() and issubclass() do exactly what it says on the tin. Remember that the class to check against can be a tuple of classes for both of these. type(argument) is good for debugging, but try to use isinstance() or issubclass() in code.
type(name, bases, dict) is completely different from type(obj): it allows dynamic class creation, which you will know when you need.
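A sketch of these debugging helpers. It uses copy.copy to produce the shallow-copy problem that id() exposes. Point and norm1 are made-up names for the dynamically created class.

```python
import copy

# id(): a shallow copy duplicates the outer list but shares the inner ones
grid = [[0, 0], [0, 0]]
shallow = copy.copy(grid)
assert id(shallow) != id(grid)
assert id(shallow[0]) == id(grid[0])  # same inner list object
shallow[0][0] = 1
assert grid[0][0] == 1  # so the "original" changes too

# isinstance()/issubclass() accept a tuple of classes
assert isinstance(3.0, (int, float))
assert issubclass(bool, (int, str))

# type(name, bases, namespace) builds a class at runtime
Point = type(
    "Point",
    (object,),
    {"x": 0, "y": 0, "norm1": lambda self: abs(self.x) + abs(self.y)},
)
p = Point()
p.x, p.y = 3, -4
assert p.norm1() == 7
assert type(p) is Point and Point.__name__ == "Point"

# vars(obj) returns the instance __dict__ (class attributes are not included)
assert vars(p) == {"x": 3, "y": -4}
```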

Magic

And all the other built-ins that didn't fit in the categories above, and that provide dark magic not to be provoked lightly.

super() is the height of magic, but points towards the extremely useful __mro__ attribute of classes. For more information and recipes, read Raymond Hettinger's evergreen "Python's super() considered super!"
@classmethod turns the decorated function into a class method, which receives the class itself as its implicit first argument. By convention, you call it cls instead of self.
@staticmethod transforms a method into one that does not receive an implicit first self or cls argument. Like class methods, static methods can be called on both the class and its instances.
property(fget, fset, fdel, doc) creates a property. More commonly, use @property as a decorator on def x(), then use @x.setter and @x.deleter if you need special access handling.
delattr() (basically del), getattr(), hasattr() and setattr() make dynamic attribute access way too tempting and fun.
compile() turns a string (or bytes, or an AST object) into a code object, which you can then exec() or eval() (eval() returns the result). You only get an AST object back if you pass the ast.PyCF_ONLY_AST flag. Further arguments control which __future__ features apply, whether the code may contain top-level async code, and the optimization level (none / remove asserts / remove asserts and docstrings). You never want this, and if you do want an AST, you probably want ast.parse(). compile(), exec() and eval() all raise auditing events.
__import__() is invoked by the import statement. When you find yourself eyeing __import__(), you nearly always want importlib.import_module() instead.
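A sketch pulling the magic built-ins together. The Temperature class and the Base/Left/Right/Child hierarchy are made up for illustration. The json module is only a convenient target for importlib.import_module().

```python
import ast
import importlib
import types

class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # Receives the class as cls, so subclasses get instances of themselves
        return cls((fahrenheit - 32) * 5 / 9)

    @staticmethod
    def is_freezing(celsius):
        # No implicit self or cls argument
        return celsius <= 0

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

t = Temperature.from_fahrenheit(212)
assert t.celsius == 100.0
assert Temperature.is_freezing(-5) and t.is_freezing(-5)  # class or instance

# Dynamic attribute access goes through the property, setter included
setattr(t, "celsius", 20)
assert getattr(t, "celsius") == 20
assert not hasattr(t, "kelvin")
assert getattr(t, "kelvin", None) is None

# super() follows the MRO, not just "the parent class"
class Base:
    def chain(self):
        return ["Base"]

class Left(Base):
    def chain(self):
        return ["Left"] + super().chain()

class Right(Base):
    def chain(self):
        return ["Right"] + super().chain()

class Child(Left, Right):
    def chain(self):
        return ["Child"] + super().chain()

assert Child.__mro__ == (Child, Left, Right, Base, object)
assert Child().chain() == ["Child", "Left", "Right", "Base"]

# compile() returns a code object by default; the AST needs a flag
code = compile("2 + 3", "<example>", "eval")
assert isinstance(code, types.CodeType)
assert eval(code) == 5
tree = compile("2 + 3", "<example>", "eval", flags=ast.PyCF_ONLY_AST)
assert isinstance(tree, ast.Expression)

# Prefer importlib.import_module() over __import__()
json_module = importlib.import_module("json")
assert json_module.dumps([1]) == "[1]"
```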