Strap in, this is a long one: As part of the Python Standard Library
traversal, after yesterday's short post about binary
data services, today we're going to look at Python data type modules.
There's a lot of them:
- Don't use
- I'll be happy to forget calendar in a few days.
- All of collections is good and underused.
- deques have a maxlen attribute and will discard items past that length.
- deques have a
- bisect maintains sorted lists cheaply.
- Trigger callbacks on garbage collection with weakref.finalize.
- You can only weak ref dict subclasses, not the original types themselves.
- types.coroutine() can turn a generator function into an awaitable coroutine.
- If an enum has two members with the same value, the second one is an alias for the first.
Aware and naive objects
datetime is a big reason that I have not fallen out of love with Python yet. Native support for naive and aware
immutable objects that block interactions between the two? Sign me up!
timedeltas are made out of love. You use them to add intervals to dates or times, but you can also use all the usual math operations on them and compare them to other timedeltas.
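A minimal sketch of the timedelta arithmetic described above. The variable names are made up for illustration.

```python
from datetime import timedelta

short = timedelta(minutes=90)
long = timedelta(hours=2)

total = short + long        # timedeltas add up
doubled = short * 2         # and scale by plain numbers
ratio = long / short        # dividing two timedeltas gives a float
is_shorter = short < long   # comparisons work as you would expect
```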
dates use the idealized Gregorian calendar: going infinitely forwards and backwards – remember this if you need dates in Russia in 1900, and/or watch my talk about calendars. Note that dates on their own do not have a timezone, and as such are scary and misleading.
Interfaces include the usual attributes (day, month, year, weekday, you name it). Next to that, there's
strftime(), of course.
datetimes combine, you guessed it, date and time. Use
now() (or better
now(timezone.utc)) for handy
constructors. There's also
fromtimestamp(), which you should be calling with an explicit tz argument, because
utcnow() and utcfromtimestamp() return naive datetimes (and are deprecated since Python 3.12), and using
naive datetimes is just asking for trouble.
combine() dates and times. The attribute
fold (0 or 1) tells you if it's
the first or second occurrence of a time when it occurs twice in a day (due to daylight savings and similar edge cases).
You can add timedeltas, or get them by way of subtraction. You can compare them, which is pretty useful. You can extract
time(), though you should really be using timetz(), which keeps the tzinfo.
To add a timezone to a naive datetime, use
replace(tzinfo=tz). To cast a datetime to a certain timezone, use
astimezone(tz). You can use
dst() to retrieve timezone related data.
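Here is a small sketch of the difference between replace(tzinfo=...) and astimezone(). It uses fixed-offset timezones from the standard library so it runs without a timezone database. The "CET" offset is a simplification for illustration and not a real timezone definition.

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
cet = timezone(timedelta(hours=1), "CET")  # fixed offset, illustration only

naive = datetime(2020, 12, 24, 18, 0)
labelled = naive.replace(tzinfo=utc)   # same wall clock, now tagged as UTC
converted = labelled.astimezone(cet)   # same instant, different wall clock

# Ordering a naive against an aware datetime is blocked with a TypeError.
try:
    naive < labelled
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

replace() only attaches a label, while astimezone() does the actual conversion, so labelled and converted compare as equal.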
time is like
datetime without the date.
tzinfo is an abstract class. Use instances of it, provided by other people who know what they are doing. Hopefully.
strftime and strptime
Hey, look, it's another string formatting language! There's a lot of placeholders.
zoneinfo uses system timezone information, if available, and otherwise
tzdata from PyPI, if available. It implements the
tzinfo abstract base class.
Have you ever said to yourself "I wish I could build a calendar grid like unix's
cal does in Python"? Well, aren't you
calendar.TextCalendar().prmonth(2020, 12) has you covered. There is also an
HTMLCalendar and localised
versions of each.
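A quick sketch of the calendar grid, assuming you want the text as a string instead of printing it: formatmonth() returns what prmonth() would print, and monthdayscalendar() gives you the same grid as lists of day numbers.

```python
import calendar

cal = calendar.TextCalendar(firstweekday=calendar.MONDAY)

grid = cal.formatmonth(2020, 12)           # the string that prmonth() prints
weeks = cal.monthdayscalendar(2020, 12)    # one list per week, 0 = padding day
```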
ChainMap allows you to group mappings together, and is more efficient than chaining
.update() calls. You can access
the initial list of mappings in the
maps attribute and update it with
Feed any iterable to a
Counter, or hand it keyword arguments. You can also
subtract() iterables, or
repeatedly. Access results via normal dictionary access, use
.elements() to get the raw elements fed into the counter,
most_common(n) for more evaluation.
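A short Counter sketch with made-up data, showing counting, updating, subtracting and evaluation:

```python
from collections import Counter

votes = Counter("abracadabra")     # count items from any iterable
votes.update(["a", "z"])           # add more observations
votes.subtract({"b": 1})           # and take some away again

top_two = votes.most_common(2)
a_count = votes["a"]
missing = votes["nope"]            # unknown keys count as 0, no KeyError
```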
Deques (pronounced "deck"?!, double ended queue) permit threadsafe and fast (
O(1)) appends and pops to both sides of a
list-like. You can pass the constructor a
maxlen argument that will cause items to be discarded from the opposite end when the deque grows
past its limit.
It follows a list-like interface, though it adds the
popleft methods for easier access.
Additionally, it comes with a
rotate() method that rotates the deque contents n steps to the right (or to the left, if n is negative).
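A minimal sketch of a bounded deque and of rotate(), with illustrative values. Note that a positive argument to rotate() moves items to the right.

```python
from collections import deque

recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)      # once full, the oldest item falls off the left end

ring = deque([1, 2, 3, 4, 5])
ring.rotate(1)            # positive n rotates to the right
ring.appendleft(0)
first = ring.popleft()    # takes the 0 back off the left end
```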
defaultdict is supremely useful, and I'm not sure there is more to say. Pass it a factory function, and you never have
to see any
KeyError ever again. My favourites are
int (though you can use
Counter for most of the
namedtuples are slightly better than peppering everything with just unstructured tuples and dicts. Create them with
namedtuple('Point', ['x', 'y']), then use them as a real class:
Point(11, y=30). The field definition can also,
for some reason, be a single string with comma- or whitespace-separated values. Since they are full-featured classes,
you can also use them as base class for inheritance.
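A sketch of both field definition styles and of inheritance. Point3D and its norm1() method are hypothetical and only there for illustration.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(11, y=30)

# The field definition can also be a single whitespace-separated string.
class Point3D(namedtuple("Point3D", "x y z")):
    def norm1(self):
        return abs(self.x) + abs(self.y) + abs(self.z)

q = Point3D(1, -2, 3)
moved = p._replace(x=0)   # namedtuples are immutable, so this returns a copy
```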
OrderedDict is important if you're unable to upgrade to modern Python. My sympathies.
… No, sorry. It still has things like
popitem() with a direction indicator, and better ordering
Use classes provided in
collections.abc to test if a collection type has certain attributes, such as
heapq implements priority queues on top of binary heaps. That means
heap[0] is always the smallest item, which is also what you get from
heappop(). Initialize one with an empty list or run
heapify() on an existing one.
heappush() and
heappop() interact with the heap. If you want to do both, run
heappushpop(). With
merge() you can combine several
already sorted lists into one sorted iterator, and you can retrieve the
nsmallest items. Insertion and removal, if
I understand it correctly, run in O(log n).
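A small heapq sketch with made-up numbers. The heap is just a regular list that the module functions keep in heap order.

```python
import heapq

heap = [5, 1, 4]
heapq.heapify(heap)                  # turn an existing list into a heap in place
heapq.heappush(heap, 0)
smallest = heapq.heappop(heap)       # always removes the smallest item
peek = heap[0]                       # look at the next smallest without removing it

merged = list(heapq.merge([1, 4, 7], [2, 3, 8]))   # lazily merges sorted inputs
two_smallest = heapq.nsmallest(2, [9, 3, 7, 1])
```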
Maintain a sorted list without expensive resorting – uses bisecting under the hood. You can tell this one's by and for
the theorists because everything is just called
bisect_right finds the correct insertion point, while
insort_right handle the insertion,
too. Check the docs to see how to use
bisect to find items in a list.
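A sketch of the bisect functions on an already sorted list. The contains() helper is hypothetical; it follows the lookup pattern from the docs, but the name is mine.

```python
import bisect

scores = [10, 20, 20, 30]

left = bisect.bisect_left(scores, 20)     # insertion point before equal items
right = bisect.bisect_right(scores, 20)   # insertion point after equal items
bisect.insort(scores, 25)                 # insert while keeping the list sorted

def contains(sorted_list, x):
    """Binary search membership test (hypothetical helper)."""
    i = bisect.bisect_left(sorted_list, x)
    return i < len(sorted_list) and sorted_list[i] == x
```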
Arrays are like lists, but constrained to one type out of a fairly small type pool. The type is specified at creation
time. Create one with
array.array(), which provides list-style interfaces. Additional methods: check its byte size
byteswap() to swap the byte order on all items (if the items are 1, 2, 4, or 8 bytes long).
You can append additional items with methods like
frombytes (fromstring was removed in Python 3.9), and export with
weakref is for people who miss playing with garbage collectors when they use Python: A weak reference lets you access
its referent, but does not keep it from being gc'd. Most types support weak references. Some built-in types, like
dict do not, but if you subclass them, the subclasses support weak referencing.
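A minimal sketch of the subclass trick. Registry is a hypothetical class that exists only so it can be weakly referenced. The gc.collect() call is there for interpreters without reference counting; on CPython the object is already gone after del.

```python
import gc
import weakref

class Registry(dict):
    """Hypothetical dict subclass, only here to allow weak references."""

# A plain dict refuses to be weakly referenced.
try:
    weakref.ref({})
    plain_dict_ok = True
except TypeError:
    plain_dict_ok = False

reg = Registry(a=1)
ref = weakref.ref(reg)
alive = ref() is reg     # calling the reference returns the referent

del reg
gc.collect()
gone = ref() is None     # the weak reference did not keep the object alive
```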
Usually you want to handle many weak references at a time. For this purpose,
finalize, you can register cleanup functions to be run when an object is
gc'd (but do not call the finalizer manually – it will run at most once!). Set the finalizer's
atexit attribute to False if you do not want the
callback to trigger when the whole program exits. You can remove them by running the returned finalizer's
Usually, those functions and classes will be sufficient, but you can also manually create weak references with
weakref.ref(object[, callback]). If you create several references to the same object, the most recent garbage
collection callback will be called first. You can count and collect all weak references to an object with
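A sketch of weakref.finalize, assuming a placeholder Resource class and a list that stands in for real cleanup work. Note that atexit is set as an attribute on the returned finalizer.

```python
import gc
import weakref

class Resource:
    """Hypothetical placeholder for something that needs cleanup."""

log = []
res = Resource()

fin = weakref.finalize(res, log.append, "cleaned up")
fin.atexit = False        # do not run this callback at interpreter exit
was_alive = fin.alive

del res
gc.collect()              # not needed on CPython, but harmless
```

After the object is collected, the callback has run exactly once and fin.alive is False.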
Dynamic Type Creation
A module for wizards who create their own types dynamically. Use
new_class given a name, base classes, a metaclass and
so on, and
prepare_class to just generate the metaclass and namespaces.
Use resolve_bases() to replace any bases that define an
__mro_entries__ method with the entries that method returns.
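A minimal sketch of types.new_class. The class name, the fill_namespace callback and the greet method are all made up for illustration.

```python
import types

def fill_namespace(ns):
    # exec_body receives the fresh class namespace and populates it
    ns["greet"] = lambda self: f"hi from {type(self).__name__}"

Dynamic = types.new_class("Dynamic", bases=(object,), exec_body=fill_namespace)
obj = Dynamic()
greeting = obj.greet()
```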
Standard Interpreter Types
types includes some classes that are mostly useful for
isinstance checks, such as
CoroutineType and so on.
With types.coroutine(), you can transform a generator function into a coroutine function.
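A sketch of types.coroutine() in action, assuming asyncio as the event loop. The pause() generator is hypothetical: its bare yield hands control back to the loop once, after which it can be awaited like any other coroutine.

```python
import asyncio
import types

@types.coroutine
def pause():
    yield   # a bare yield gives the event loop one turn

async def main():
    await pause()   # the decorated generator is awaitable
    return "done"

result = asyncio.run(main())
```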
copy provides exactly
copy.copy for shallow copies and
copy.deepcopy for deep copies.
deepcopy has your back, and typically resolves recursion gracefully. You can change copy behaviour on your own classes.
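A short sketch of the difference between the two, plus a self-referencing list to show that deepcopy copes with recursion. The data is made up.

```python
import copy

original = {"tags": ["a", "b"]}
shallow = copy.copy(original)       # new dict, but the same inner list
deep = copy.deepcopy(original)      # new dict and a new inner list

original["tags"].append("c")        # visible through the shallow copy only

loop = []
loop.append(loop)                   # a list that contains itself
loop_copy = copy.deepcopy(loop)     # the copy contains itself, not the original
```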
I disagree with the decision to put pprint and
reprlib into the data types section, when they would be better placed
in the string services section.
pprint serves to represent arbitrary data structures in a way that can be used as input to the Python
interpreter. Instantiate a
PrettyPrinter and call
PrettyPrinter.pprint. On the printer class, you can set an
sort_dicts=True, and most interestingly a
depth. I can recommend using a higher indent value than
pformat as a shortcut to instantiating a printer and receiving its result as a string, or
print to stdout or an arbitrary stream. Use
saferepr for a string representation that is protected against recursive data structures.
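A sketch of the module-level shortcuts with made-up data. pformat() takes the same options as the printer class, including depth and sort_dicts.

```python
from pprint import pformat, saferepr

data = {"b": [1, 2, {"deep": {"deeper": True}}], "a": "x"}

shallow_view = pformat(data, depth=2)         # deeper levels are elided
sorted_view = pformat(data, sort_dicts=True)  # keys come out sorted

cyclic = []
cyclic.append(cyclic)       # a list that contains itself
safe = saferepr(cyclic)     # marks the recursion instead of looping
```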
reprlib provides an alternative
reprlib.repr() that limits the size of the result. You can customize things like the
maximum array entries to be printed and the maximum recursion level. You can also decorate your __repr__ method with
@recursive_repr(fillvalue="...") to control how recursive references are represented.
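A minimal sketch of both features. Node is a hypothetical class that ends up containing itself, which would normally send repr() into infinite recursion.

```python
import reprlib

short = reprlib.repr(list(range(100)))   # truncated after a handful of items

class Node:
    def __init__(self):
        self.children = []

    @reprlib.recursive_repr(fillvalue="...")
    def __repr__(self):
        return f"Node({self.children!r})"

n = Node()
n.children.append(n)    # the node is now its own child
text = repr(n)          # the recursive call is replaced by the fill value
```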
Enums map their attributes ("members", symbolic constants) to unique constant values. They can be compared and iterated
over. You can use the classes
IntEnum for numerical constants,
IntFlag for numerical constants that should
support bitwise combination (and remain subclasses of
Flag for the same just without the
Flag enums that are not enum members have a boolean evaluation of
You can decorate an enum class with
enum.unique() to ensure that each value occurs only once (names are always unique).
enum.auto() can be called in the member definition to assign a value by function. By default you get integers starting at
1, but you can override
_generate_next_value_ on your enum class for different behaviour.
The type of the enum's members is the enum class itself, so you can use
isinstance() to check for proper values. You
can iterate over enums to get all their members, or access
__members__. Enum members have properties to access their
value. They are also hashable, so you can use them as dict keys and in sets.
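A sketch of auto(), a custom _generate_next_value_, and the membership checks described above. Both enum classes are made up. Note that the override has to be defined before the members that use it.

```python
from enum import Enum, auto

class Colour(Enum):
    RED = auto()      # 1
    GREEN = auto()    # 2

class Named(Enum):
    # Called by auto(); here the value is derived from the member name.
    def _generate_next_value_(name, start, count, last_values):
        return name.lower()

    ALPHA = auto()
    BETA = auto()

names = [member.name for member in Colour]   # iteration yields the members
lookup = {Colour.RED: "warm"}                # members are hashable
```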
Methods defined on the enum class are available on the enum members. You can subclass enums only when they don't define
any members. They can be pickled. If you use throwaway values (like
object) for your values, change
__repr__ to hide that value.
You create enums by subclassing
Enum, and you set members by defining class attributes.
You can also create them programmatically, like
Enum('Colour', 'RED BLUE GREEN'). Add a
module=__name__ if you want
to pickle them.
Enums are so weird. You can retrieve enum members by name with
MyEnum["NAME"], and you can retrieve
them by value with
MyEnum(VALUE). If a value occurs multiple times, you just get the first one, because all later ones
are actually just aliases to the first member with their value.
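A small sketch of lookup and aliasing, using a made-up Status enum where two names share a value:

```python
from enum import Enum

class Status(Enum):
    ACTIVE = 1
    ENABLED = 1      # same value, so this is an alias for ACTIVE
    INACTIVE = 2

by_name = Status["ENABLED"]
by_value = Status(1)
names = [member.name for member in Status]   # iteration skips aliases
all_names = list(Status.__members__)         # __members__ includes them
```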
You should compare enum members by identity with is and
is not, though use of equality comparison is also supported.
IntEnum members can return
True when comparing to something other than an enum member.
Enums are usually not ordered – check the docs to see how to implement an
OrderedEnum base class.
Enums are extremely weird when you think in normal Python classes, which is mostly due to their custom metaclass. Members are somewhat like instances, except that they are singletons.
graphlib provides TopologicalSorter, a handy sorting and iterating class for hashable node elements. In a
topological sort, elements are connected by directed edges. The linear sort order guarantees that of two connected
nodes, the originating node comes before the other in the sorted list. A complete topological ordering is only possible
in acyclic graphs.
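A minimal sketch with a made-up build pipeline. The graph maps each node to the set of nodes that have to come before it, and a cycle raises CycleError.

```python
from graphlib import CycleError, TopologicalSorter

# node -> set of predecessors
deps = {"deploy": {"test"}, "test": {"build"}, "build": set()}
order = list(TopologicalSorter(deps).static_order())

try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
    cyclic = False
except CycleError:
    cyclic = True
```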