December 2: Python Binary Data Services
As part of the Python Standard Library traversal, today we're following up yesterday's text processing services with the binary data services: struct and codecs.
Highlights
- codecs.encode("foo", "rot13") 😍
- Now I know about yet another formatting language in Python. Thanks, struct.
struct
Convert between Python values and C structs, which are represented as bytes objects. You can pack and unpack values according to the format strings you pass in. There are also iter_unpack, which walks a buffer of repeated fixed-size records, and unpack_from, which reads at an offset into a larger buffer without slicing it first.
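To make the packing and unpacking functions concrete, here is a minimal sketch. The record layout (an id, a size and a four-byte tag) is made up for illustration, and the leading < just selects little-endian order without padding so the sizes are predictable.

```python
import struct

# Hypothetical record layout: unsigned short id, unsigned int size, 4-byte tag.
# "<" means little-endian with standard sizes and no padding.
RECORD = "<HI4s"

record = struct.pack(RECORD, 7, 1024, b"DATA")
record_size = struct.calcsize(RECORD)  # 2 + 4 + 4 = 10 bytes

# unpack needs a buffer of exactly the right size.
ident, size, tag = struct.unpack(RECORD, record)

# unpack_from reads at an offset inside a larger buffer, no slicing needed.
buffer = b"\x00\x00" + record
from_offset = struct.unpack_from(RECORD, buffer, offset=2)

# iter_unpack yields one tuple per fixed-size record in the buffer.
points = struct.pack("<4h", 1, 2, 3, 4)
pairs = list(struct.iter_unpack("<2h", points))  # [(1, 2), (3, 4)]
```

unpack always returns a tuple, even for a single value, which is easy to forget.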
Format strings
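By default, you get native byte order, sizes and alignment, but you can change those with the first character of the format string: @ (native, the default), = (native byte order, standard sizes), < (little-endian), > (big-endian) or ! (network order, which is big-endian). Other than that, you get a typical type-based placeholder language, aka yet another formatting language, yay?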
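As a quick illustration of what that first character changes, this sketch packs the same integer with different byte order prefixes and compares sizes with and without native alignment. The native size is platform-dependent, so it is only computed here, not asserted to be a specific number.

```python
import struct

value = 1

little = struct.pack("<I", value)   # b'\x01\x00\x00\x00'
big = struct.pack(">I", value)      # b'\x00\x00\x00\x01'
network = struct.pack("!I", value)  # network order is big-endian

# With an explicit byte order there is no padding: 1 byte + 4 bytes.
packed_size = struct.calcsize("<BI")  # 5

# With native alignment ("@", the default) the compiler's padding rules
# apply, so this is often larger, depending on the platform.
native_size = struct.calcsize("@BI")
```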
codecs
This module defines base classes for codecs: encoders and decoders. Most of the ones included with Python are for text/byte conversion. codecs.encode() and codecs.decode() do what str.encode() and bytes.decode() do, but they also accept codecs that aren't text encodings, such as rot13. With codecs.lookup(name) you can get a CodecInfo object, which gives you direct access to a stream writer, a stream reader, encode and decode functions, and incremental encoding and decoding. If you have codecs of your own, you can register() them. codecs.open() behaves like the general open(), but it opens the underlying file in binary mode and does the encoding and decoding on top. If you need to do weird transcoding magic, use EncodedFile. codecs also defines BOM constants for when you have to meddle with byte order marks in platform-dependent data.
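Here is a small sketch of what a CodecInfo object from codecs.lookup() gives you, using UTF-8 as the example. The sample string and the way the input is split for the incremental decoder are arbitrary choices for the demo.

```python
import codecs
import io

info = codecs.lookup("utf8")
canonical_name = info.name  # aliases resolve to the canonical 'utf-8'

# Stateless encoding returns the result plus the amount of input consumed.
data, consumed = info.encode("héllo")

# The incremental decoder copes with a multi-byte character that is
# split across two chunks.
decoder = info.incrementaldecoder()
parts = [decoder.decode(b"h\xc3"), decoder.decode(b"\xa9llo", final=True)]
text = "".join(parts)

# The stream writer wraps a binary stream and encodes on write.
raw = io.BytesIO()
writer = info.streamwriter(raw)
writer.write("héllo")
written = raw.getvalue()

# One of the BOM constants the module defines.
bom = codecs.BOM_UTF8  # b'\xef\xbb\xbf'
```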
Codec Base Classes
Executive summary: You can implement your own Codec subclasses, and it's neither impossible nor particularly painful.
For a complete codec, you implement stateless encoding and decoding as well as stream reading and writing. Additionally, you are encouraged to support at least the two main kinds of error handling, strict and ignore, and optionally further error modes. The module provides base classes for incremental encoding and decoding, and stream encoding and decoding.
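A minimal sketch of a home-grown codec might look like the following. Everything about it is hypothetical: the codec name reverse_utf8_demo, the ReverseCodec class and the search function are invented for this example, and only the stateless encode/decode pair is filled in, so stream and incremental support are left out.

```python
import codecs


class ReverseCodec(codecs.Codec):
    """Toy codec (hypothetical): UTF-8 with the encoded bytes reversed."""

    def encode(self, input, errors="strict"):
        # Return the encoded bytes and the number of characters consumed.
        return input.encode("utf-8", errors)[::-1], len(input)

    def decode(self, input, errors="strict"):
        # input may arrive as a memoryview, so copy it into bytes first.
        data = bytes(input)[::-1]
        return data.decode("utf-8", errors), len(input)


def search(name):
    """Codec search function: return a CodecInfo or None for unknown names."""
    if name != "reverse_utf8_demo":
        return None
    codec = ReverseCodec()
    return codecs.CodecInfo(
        name="reverse_utf8_demo",
        encode=codec.encode,
        decode=codec.decode,
    )


codecs.register(search)

encoded = "héllo".encode("reverse_utf8_demo")
decoded = encoded.decode("reverse_utf8_demo")
```

Delegating the errors argument to the underlying UTF-8 calls is the cheap way to get strict and ignore handling here; a codec with its own byte-level logic would have to handle those modes itself.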
Standard Encodings
Python comes with a bunch of standard encodings, not just the usual utf-8, latin-1 etc. If you ever need weird encodings, you'll be thankful for them. Some of these are specific to Python itself (unicode_escape, for example) and have no application outside the language domain. Naq gurer vf nyfb EBG13 fhccbeg.
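Some of the standard codecs are not text encodings at all, which is why they go through codecs.encode() rather than str.encode(). A short sketch, with arbitrary sample inputs:

```python
import codecs

# rot13 is a str-to-str codec.
scrambled = codecs.encode("foo", "rot13")        # 'sbb'
restored = codecs.decode(scrambled, "rot13")     # 'foo'

# hex is a bytes-to-bytes codec.
hexed = codecs.encode(b"hi", "hex")              # b'6869'

# str.encode() only accepts text encodings, so rot13 is rejected there.
try:
    "foo".encode("rot13")
    rejected = False
except LookupError:
    rejected = True
```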
idna
encodings.idna is there to transform non-ASCII characters in domain names into those xn-- strings, with ToASCII() and ToUnicode(). It also provides the nameprep() function, which normalizes domains, mostly by lowercasing them.
If you're looking at this, please use the idna package from PyPI instead, as encodings.idna only supports an outdated RFC (RFC 3490, i.e. IDNA 2003).
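For reference, this is roughly what the built-in module does, shown only with the standard library (the PyPI idna package has its own API, which is not covered here). The domain bücher.example is just a sample input.

```python
from encodings.idna import ToASCII, nameprep

# The "idna" codec converts a whole domain name, label by label.
ascii_domain = "bücher.example".encode("idna")   # b'xn--bcher-kva.example'
unicode_domain = ascii_domain.decode("idna")     # 'bücher.example'

# ToASCII works on a single label rather than a full domain.
label = ToASCII("bücher")                        # b'xn--bcher-kva'

# nameprep normalizes a label, which includes case folding.
normalized = nameprep("BÜCHER")                  # 'bücher'
```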