December 11: Python Operating System Services

rixx

2020-12-11

After some restful days, today's Python Standard Library Traversal follows up yesterday's Cryptographic Service Modules with all the generic OS interaction modules. That is os (but not os.path, we did that a week ago), io, time, argparse, getopt, logging and its submodules, getpass, curses and submodules, platform, errno and ctypes. Strap in!

Highlights

On Windows, os.startfile() acts like double-clicking the file.
There are way too many ways to start new processes in Python.
"seconds since epoch" usually excludes leap seconds.
argparse is honestly not as bad as I remembered it. Stockholm syndrome?
logger.getChild() is like calling getLogger() on the full target name, very useful when you use stand-in values like __name__.
logging.handlers.TimedRotatingFileHandler and logging.handlers.RotatingFileHandler handle log file rotation, nice!
logging.handlers.HTTPHandler defaults to secure=False.
You can change the default %-style string formatting in logging formatters to str.format() or string.Template.
You can query the platform you are running on with the platform module.

os

The os module provides "a portable way of using system dependent functionality", just like half the other modules (scnr).

Process parameters

You can gather information on the current process and user with functions like ctermid (filename of the controlling terminal), environ (a mapping of all environment variables and their values), get_exec_path, getegid, geteuid (for effective IDs, the same exist without the e for the "real" IDs), getgrouplist, getlogin, getpgid, getpid, getppid (parent process ID), uname and umask. You can use putenv to modify environment variables (relevant for child processes), setegid/seteuid/setpgid etc to change process affiliation, and setpriority to change process scheduling priority.
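
A minimal sketch of a few of these calls. TRAVERSAL_DEMO is a made-up variable name for illustration. The example assigns to os.environ rather than calling putenv, since the assignment updates both the mapping and the real environment.

```python
import os

# PID of this process and of the process that started it.
pid = os.getpid()
parent_pid = os.getppid()

# Assigning to os.environ also changes the real environment, so child
# processes started later inherit the new value.
# TRAVERSAL_DEMO is a hypothetical variable name.
os.environ["TRAVERSAL_DEMO"] = "1"

# Directories that are searched for executables (derived from PATH).
exec_path = os.get_exec_path()

print(pid, parent_pid, os.environ["TRAVERSAL_DEMO"], exec_path[:2])
```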

File descriptors

You can interact with files using file descriptors, small integers referring to files opened by the current process. Usually 0 is stdin, 1 is stdout, 2 is stderr, and further files opened by the process are numbered incrementally. You can run fileno() on a file-like object to retrieve the descriptor. os.close() and os.closerange() can then close files by descriptor. copy_file_range can copy a given range of bytes from the source descriptor to the target descriptor. You can create new special descriptors with openpty and pipe (or pipe2). You can manually read and write using read and write, or pread and pwrite while leaving the file offset unchanged. For most socket activity, consider using the socket module instead.
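
To make the descriptor idea concrete, here is a small sketch that pushes a few bytes through a pipe using only descriptor-level calls. Real code would usually wrap the descriptors in file objects instead.

```python
import os

# pipe() returns two descriptors: one for reading, one for writing.
read_fd, write_fd = os.pipe()

os.write(write_fd, b"hello")   # write() takes bytes, not str
os.close(write_fd)             # closing the write end signals EOF

data = os.read(read_fd, 1024)  # read up to 1024 bytes
os.close(read_fd)

# File objects expose their descriptor through fileno().
with open(os.devnull, "rb") as f:
    devnull_fd = f.fileno()

print(data, devnull_fd)
```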

Files and directories

os defines a bunch of functions to interact with files and directories, in addition to os.path and pathlib. access tells you if you have access to a path. The ch family (chdir, chmod, chown, chroot) does exactly what you'd expect (some of them also have l-prefixed versions, like lchmod and lchown, that act on a link itself instead of its target). There is a lot of overlap with pathlib, so I'm skipping over quite a bit here.

Many of these functions generally require a path (or a Path), but can also deal with file descriptors, if os.supports_fd. Often you can also pass dir_fd, that is, a file descriptor pointing to a directory to which the path is relative.

Process management

You can use the os module to create and manage processes: Use nice() to change process niceness, and use times() to retrieve process times (user/system/elapsed real time).

There are a ton of exec* and spawn* functions to start a new program, either replacing the current process or running as a new one. There is also fork() and the related register_at_fork() (to register callables that run when the process forks). abort(), kill() and killpg() (for process groups) serve to terminate processes. All of, I kid you not, wait, waitid, waitpid, wait3 and wait4 provide different ways to wait until a process terminates and retrieve its result.

There are also a bunch of functions that have better/more usable versions in other modules, like os.system() (runs a command in a subshell, use subprocess).
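
As a rough illustration of the subprocess route, this sketch starts a child Python interpreter and collects its output. Using sys.executable as the command is just a portable choice for the example.

```python
import subprocess
import sys

# Run a child Python interpreter without going through a shell.
# Passing the command as a list avoids shell quoting problems.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from the child')"],
    capture_output=True,   # collect stdout and stderr
    text=True,             # decode the output to str
    check=True,            # raise CalledProcessError on a non-zero exit code
)

print(result.returncode, result.stdout.strip())
```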

Scheduler

On some Unix platforms, you can interfere with how a process is allocated CPU time. If you want to do this in Python, you'll probably want to consult your OS man pages.

System information

os.name exists, but you should probably use platform instead. cpu_count() and getloadavg() do what you would expect them to do. os also exposes a bunch of informational attributes, like sep (path name separator), pathsep (separator for PATH) or linesep (what even is a newline).

Randomness

We not only have random and secrets, we also have ways of retrieving random data in os. Yay. In particular, os provides getrandom (relies on entropy from environmental noise, available on Linux) and urandom (suitable for cryptographic use, calls getrandom if available).
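
A tiny sketch of the os-level call. For tokens and passwords, the secrets module is usually the more convenient front end.

```python
import os

token = os.urandom(16)   # 16 random bytes from the operating system
print(token.hex())       # the same bytes as 32 hexadecimal characters
```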

io

The io module deals with text I/O, binary I/O and raw I/O (which is a building block for the other two, you usually don't want that). The objects used for interaction are usually called streams or file-like objects. Streams can be read-only, write-only or read-write, and can permit random access or only sequential access. All streams support the file-like interface of methods like close(), flush(), read() and readline(), seek() if they are seekable(), and write() and writelines() if they are writable().

Text streams are created when open()-ing a file in a non-binary mode, but you can also create them with io.StringIO("my text"). Binary streams are created when open()-ing a file in a binary mode, or with io.BytesIO.
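
A short sketch of the two in-memory stream types, using the same methods you would call on a real file:

```python
import io

# An in-memory text stream behaves like a file opened in text mode.
text_stream = io.StringIO("first line\nsecond line\n")
first = text_stream.readline()
text_stream.seek(0)                 # StringIO is seekable
everything = text_stream.read()

# An in-memory binary stream behaves like a file opened in binary mode.
binary_stream = io.BytesIO()
binary_stream.write(b"\x00\x01\x02")
payload = binary_stream.getvalue()

print(repr(first), len(everything), payload)
```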

time

time provides system clock related functions that map pretty straight onto their C equivalents. If you think about using time, please think at least twice about using datetime instead (which you should use for anything that handles real-life dates and times), and make sure you understand which of the clocks in time are monotonic. The module also defines constants that give you access to different system clocks, like CLOCK_REALTIME and CLOCK_MONOTONIC. You can also use this module to change the system time and timezone.

A lot of these functions are system dependent: 11 are available only on Unix. I'll only mention those that don't have "USE DATETIME" printed all over them. With clock_gettime() you read the value of a given clock, and with clock_settime() you can, on Unix, change the realtime clock value. monotonic() (and monotonic_ns for nanosecond support) retrieves the value of a clock that is guaranteed to be monotonic, which means it is not affected by system time changes and its starting point is undefined. Arguably the most useful function of the bunch. perf_counter() uses the highest-resolution clock, also with an undefined start value, and is system wide (thus includes sleep times). process_time() tells you the sum of system and user time your script has used up (but also with an undefined start value), and sleep makes the time fly by. time and time_ns give you the epoch time as a float (but note that you can get that from datetime if you want an actual date, and you should use monotonic() if you use it for timings or counting).
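
A sketch of how the timing functions fit together: monotonic() for elapsed wall-clock time, process_time() for CPU time. The sleep and the loop are placeholder workloads.

```python
import time

start = time.monotonic()          # unaffected by system clock changes
cpu_start = time.process_time()   # CPU time only, sleeping does not count

time.sleep(0.05)                  # placeholder: waiting
total = sum(range(100_000))       # placeholder: actual computation

elapsed = time.monotonic() - start
cpu_used = time.process_time() - cpu_start

# Only differences between two readings are meaningful, because the
# starting point of both clocks is undefined.
print(f"elapsed {elapsed:.3f}s, cpu {cpu_used:.3f}s")
```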

argparse

With argparse (which supersedes the old optparse), you can define and parse CLI flags, including help texts, long flags, default values and automatic usage string generation. You do this by instantiating ArgumentParser objects, and calling add_argument(), then running parse_args() (with a list of strings, defaulting to argv). Instead of doing all this directly, you can also add subparsers, if your script provides wildly different subcommands. You can customise many-but-not-all things – change start and end (and middle, if you want) of the help output, and enable or disable users using unambiguous shortened versions of --long-flags. Parsers can also parse flags from files that are passed on the command line with a specified prefix.

All of this is much better than I remember, but I think I'll still default to using click in many cases.

Arguments

Arguments can specify a name or a set of flags, the number of arguments that should be consumed (+ just noms up all the arguments until encountering a new flag, ? takes one if possible or else nothing), a default value, a type (everything is a string until you tell argparse to convert it), valid argument choices, a help message, and the name of the variable in which to store the received value. You can put arguments into groups (nice for better help display) and into mutually exclusive groups when you need only one of them to be present.
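
Here is a sketch of those argument options in one hypothetical parser. The program name greet and all flag names are invented for the example, and parse_args() gets an explicit list so it can run without a real command line.

```python
import argparse

parser = argparse.ArgumentParser(prog="greet", description="Greet some people.")

# Positional argument that consumes one or more values.
parser.add_argument("names", nargs="+", help="one or more names")
# Converted to int, with a default when the flag is absent.
parser.add_argument("--times", type=int, default=1, help="repetitions")
# Restricted to a set of choices, stored under a different attribute name.
parser.add_argument("--lang", choices=["en", "de"], default="en", dest="language")

# At most one of these two flags may be given.
group = parser.add_mutually_exclusive_group()
group.add_argument("--loud", action="store_true")
group.add_argument("--quiet", action="store_true")

args = parser.parse_args(["alice", "bob", "--times", "2", "--lang", "de"])
print(args.names, args.times, args.language, args.loud, args.quiet)
```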

Actions

You can also set an associated action to be called. The default is store, which just shoves the value into the result object, but you can also use store_true (good for flags where you don't care about any value), append when you're collecting a list of things, count for flags like -vvv. There are also special actions for help and version printing. You can also specify arbitrary actions, eg argparse.BooleanOptionalAction to add support for --no-foo flags that will set foo=False.
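
The actions can be sketched like this. The flag names are made up, and argparse.BooleanOptionalAction needs Python 3.9 or newer.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")

# count: -v, -vv, -vvv give 1, 2, 3
parser.add_argument("-v", "--verbose", action="count", default=0)
# append: every occurrence adds to a list
parser.add_argument("--tag", action="append")
# store_true: False unless the flag is present
parser.add_argument("--dry-run", action="store_true")
# BooleanOptionalAction: generates both --color and --no-color
parser.add_argument("--color", action=argparse.BooleanOptionalAction, default=True)

args = parser.parse_args(["-vvv", "--tag", "a", "--tag", "b", "--no-color"])
print(args.verbose, args.tag, args.dry_run, args.color)
```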

getopt

Use getopt if you need (or want?!) to parse CLI options the way C's getopt does. Everybody else probably wants to use argparse. The module defines both getopt and gnu_getopt because of course it does.
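
For comparison, a minimal getopt sketch with invented option names. A colon after a short option and an equals sign after a long option mean that the option takes a value.

```python
import getopt

argv = ["-v", "-o", "out.txt", "--name=demo", "input.txt"]

# "vo:" means: -v takes no value, -o requires one.
# ["name="] means: --name requires a value.
opts, rest = getopt.getopt(argv, "vo:", ["name="])

print(opts)   # list of (option, value) pairs
print(rest)   # leftover positional arguments
```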

logging

The logging module is fiendish – there are multiple official tutorials, which are worth reading. The module comes with four major concepts: Loggers (the application interface), Handlers (which decide where to send the input), Filters which can contain more detailed rules for content to pass or to silence, and Formatters to change the final layout of the messages.

Loggers

Never instantiate loggers directly, always use getLogger (sic) for more threadsafe singleton goodness. Loggers work hierarchically, and a logger named a.b will have a parent called a. You can set the propagate attribute to configure if a logger should pass events to its parent. With setLevel, you set the threshold for this logger, and all messages of a lower level will be ignored. Loggers are often named __name__ to follow Python packaging hierarchy – you can then use getChild() to get new loggers lower in the hierarchy.

You use loggers by calling methods named after the log level in use: debug(), info(), warning(), error() and critical(). When you call these methods on the logging module, they get passed to the root logger. You pass a message, and optionally exception/traceback information, and optional extra={} additional data. You can also use the log() method and provide the log level there, or the exception() method from an exception handling context, which will add exception information to the logging message.
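
A small sketch of the hierarchy. The logger name traversal is a stand-in for what would usually be __name__.

```python
import logging

parent = logging.getLogger("traversal")   # hypothetical logger name
child = parent.getChild("os")             # same as getLogger("traversal.os")

# The child has no level of its own, so it inherits the parent's threshold.
parent.setLevel(logging.WARNING)

child.info("ignored, below the effective level")
child.warning("disk is %d%% full", 93)    # arguments are merged in lazily

print(child.name, child.parent is parent, child.getEffectiveLevel())
```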

config

If you don't want to configure logging with the methods and functions mentioned above, you can also use the logging.config interface, and pass a dictionary or a file.
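
A dictionary-based setup might look roughly like this. The logger, handler and formatter names are placeholders, and the schema has many more options than shown.

```python
import logging
import logging.config

config = {
    "version": 1,                        # the only schema version so far
    "disable_existing_loggers": False,   # leave other loggers alone
    "formatters": {
        "brief": {"format": "%(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "brief",
            "level": "INFO",
            "stream": "ext://sys.stdout",   # ext:// refers to a Python object
        },
    },
    "loggers": {
        "traversal": {"level": "DEBUG", "handlers": ["console"]},
    },
}

logging.config.dictConfig(config)
log = logging.getLogger("traversal")
log.info("configured from a dictionary")
```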

Handlers

You can manage handlers with the addHandler() and removeHandler() methods on a logger. They have a configured log level, same as loggers, and will ignore anything less severe than their level. The basic handlers live in logging itself: StreamHandler for streams (most notably sys.stdout, sys.stderr and file-like objects), FileHandler for all files, and NullHandler as a no-op. The rest are in logging.handlers: WatchedFileHandler, which detects changes in a file and closes and re-opens it to avoid overriding other changes, and RotatingFileHandler and TimedRotatingFileHandler to start a new log file after a set byte length or time. There's also SocketHandler for TCP and DatagramHandler for UDP-based logging. SysLogHandler talks to a Unix syslog, and NTEventLogHandler to the Windows event log. Use SMTPHandler to spam a poor email account somewhere, or HTTPHandler (with secure=True) to do the same to a poor API endpoint. With the QueueHandler class, you can send the messages to a queue.Queue or a multiprocessing.Queue if you want to handle logging in a separate thread, to avoid congestion due to slow logging methods (think SMTP).
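
The following sketch shows two of these handlers: a StreamHandler with its own level, and a RotatingFileHandler writing into a temporary directory. The logger name, file name and size limit are arbitrary choices for the demo.

```python
import io
import logging
import logging.handlers
import os
import tempfile

log = logging.getLogger("traversal.handlers")   # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False   # keep the demo output away from the root logger

# A stream handler with its own threshold: the logger accepts DEBUG,
# but this handler only emits WARNING and above.
buffer = io.StringIO()
stream_handler = logging.StreamHandler(buffer)
stream_handler.setLevel(logging.WARNING)
log.addHandler(stream_handler)

log.info("dropped by the handler")
log.warning("kept")
output = buffer.getvalue()
log.removeHandler(stream_handler)

# A rotating handler: start a new file at roughly 200 bytes and keep
# at most two old files (demo.log.1 and demo.log.2).
with tempfile.TemporaryDirectory() as tmp:
    rotating = logging.handlers.RotatingFileHandler(
        os.path.join(tmp, "demo.log"), maxBytes=200, backupCount=2
    )
    log.addHandler(rotating)
    for i in range(20):
        log.info("message number %d", i)
    log.removeHandler(rotating)
    rotating.close()   # close the file before the directory is cleaned up
    log_files = sorted(os.listdir(tmp))

print(repr(output), log_files)
```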

Filters

Filters are useful when log levels alone are not enough to decide what to do with a log message, either in a handler or a logger. You handle filters with the addFilter() and removeFilter() methods on a logger. With logger.filter(), you can test if a message will be processed by the logger.

The base filter works based on logger hierarchy, and will only allow events in the given branch of the logging hierarchy, but implementing your own filters is not hard – you can even just implement a filter function without a wrapper class.
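
A sketch of the plain-function variant. The filter rule (dropping anything that mentions a password) and the logger name are invented for the example.

```python
import io
import logging

def no_passwords(record):
    # Return a false value to drop the record, a true value to keep it.
    return "password" not in record.getMessage()

log = logging.getLogger("traversal.filters")   # hypothetical logger name
log.setLevel(logging.INFO)
log.propagate = False

buffer = io.StringIO()
log.addHandler(logging.StreamHandler(buffer))
log.addFilter(no_passwords)   # a bare callable works as a filter

log.info("user logged in")
log.info("password=hunter2")  # dropped by the filter

output = buffer.getvalue()
print(repr(output))
```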

Formatters

Formatters convert LogRecord objects to strings. They use %-style string formatting, though you can change this behaviour by setting the style to { (for str.format()) or $ (for string.Template).
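
To illustrate the style switch, this sketch formats one hand-built LogRecord with both alternative styles. Constructing a record manually is only done here to keep the example self-contained, and the path example.py is a placeholder.

```python
import logging

brace_formatter = logging.Formatter("{levelname}:{name}:{message}", style="{")
dollar_formatter = logging.Formatter("${levelname} ${message}", style="$")

# Normally the logger creates records for you.
record = logging.LogRecord(
    name="traversal", level=logging.INFO, pathname="example.py", lineno=1,
    msg="hello %s", args=("world",), exc_info=None,
)

brace_line = brace_formatter.format(record)
dollar_line = dollar_formatter.format(record)
print(brace_line)
print(dollar_line)
```

Note that the style only affects the format string of the formatter. The message itself is still merged with its arguments using % formatting.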

LogRecord

LogRecord objects contain all the logging information prior to formatting. It's generally what you'd expect: The message, the originating function/level/path, but also things that surprised me: A relativeCreated time in milliseconds, the process name, and the running thread and its name.

getpass

The getpass module defines exactly two functions: getuser() retrieves the login name of the current user, and is generally better than os.getlogin(). getpass() prompts the user for a password.

curses

With curses, you can use, uhm, curses. Surprise! It allows advanced terminal handling, including drawing pretty UI elements. Use the tutorial linked in the docs, as the docs themselves are a bit unhelpful. You can use curses to retrieve data about the terminal (e.g. baudrate(), can_change_color() and erasechar()), and change its state (e.g. curs_set() for cursor visibility). You can do wonderful things like beep and flash the screen. You can even listen for and react to mouse events, in a way. You can even enter raw mode, where normal handling of interrupts, flow control and line buffering is turned off.

You'll want to start out with initscr and start_color, which need to get called first. On the window object that you get from initscr, you can call low-level methods like addch to paint a character at a given coordinate. You can manipulate colours, draw borders and boxes and lines, create new subwindows, and redraw your changes on refresh(). If you want to interact with keyboard input, use the defined key constants.

curses.panel

With curses.panel, you can create window objects that understand depth, so you can stack them on top of one another.

curses.textpad

With curses.textpad, you can provide a basic textbox class that allows text editing with emacs-like shortcuts.

platform

Query the platform you are running on with the platform module. There's a range of information functions: architecture() for the architecture bits (returns a tuple like ("64bit", "ELF") for some reason), machine() for what I would have called the architecture, eg "x86_64", node() for the network name, platform() for a magic string identifying your system, processor(), a whole bunch of python_{} functions, release(), the much more useful system() ("Linux", "Java", "Windows") and version() (system version). You can also choose to run uname(). There are some OS specific checks, like libc_ver().
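
A quick sketch that collects a few of these values. The actual strings depend entirely on the machine it runs on.

```python
import platform

info = {
    "system": platform.system(),          # e.g. "Linux" or "Windows"
    "machine": platform.machine(),        # e.g. "x86_64"
    "release": platform.release(),
    "python": platform.python_version(),
    "bits": platform.architecture()[0],   # first element of the tuple
}

for key, value in info.items():
    print(f"{key}: {value}")
```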

errno

The errno module defines standard system errors. They have integer values, and you can use os.strerror to translate them to hooman speech.
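
A sketch of how the pieces connect: OSError instances carry an errno attribute that you can compare with the constants and translate back. The path is assumed not to exist.

```python
import errno
import os

message = os.strerror(errno.ENOENT)   # human readable, platform dependent

try:
    open("/this/path/should/not/exist")   # hypothetical missing path
except OSError as exc:
    code = exc.errno                      # integer error code
    symbolic = errno.errorcode[code]      # maps the integer back to its name

print(code, symbolic, message)
```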

ctypes

ctypes is "a foreign function library", which sounds very ambassadorial. It provides C compatible data types and allows you to call functions in shared libraries and wrap them in Python. You can, for example, load the C library with ctypes.CDLL and call its printf (and generally use ctypes.util.find_library to find a shared library). You can do all sorts of unholy cross-code accessing, but I'm just listing the standard use cases here.

Types

The docs contain a handy table to help you map C types to ctypes. You instantiate them by calling them with a compatible value, like c_wchar_p("Hello World"). They typically are mutable, and you can change their value by assigning to their value attribute. Note that when you assign a new value to the pointer types (string, char, void), you change the memory location they point to, not the contents.

If you need (for function calls, usually) mutable memory blocks, you'll want to use create_string_buffer() or create_unicode_buffer. You can create structs and unions with the Structure and Union base classes, where you specify the _fields_ as tuples of a name and a type. If you need an array, it is recommended that you just multiply a type with the length of the array. Use cast(obj, new_type) to cast an object to a different type.
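
These building blocks can be tried without loading any library. The Point structure and the array length are made up for the sketch.

```python
import ctypes

# Simple types are mutable through their value attribute.
number = ctypes.c_int(42)
number.value = 7

# A struct is a Structure subclass with a list of (name, type) fields.
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

point = Point(1, 2)

# Multiplying a type by a length creates an array type.
IntArray3 = ctypes.c_int * 3
numbers = IntArray3(10, 20, 30)

# A mutable 10 byte buffer, initialised with b"hello".
buffer = ctypes.create_string_buffer(b"hello", 10)
buffer.value = b"bye"   # overwrites the start of the buffer in place

print(number.value, point.x, point.y, list(numbers), buffer.value)
```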

Functions

You can set argtypes to a list of types on any function to protect from calling it with incompatible types (though I don't think you can handle overloading here?). Functions are assumed to return integers unless you specify a different restype.

Function calls

For function calls, you'll have to wrap all types except for None, ints, strings and bytes in their ctypes equivalent. To pass your own types, they have to define an _as_parameter_ attribute or property. If you need to pass pointers to a function, you can wrap your value in a byref call.
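
Putting argtypes, restype and byref together, a sketch for a Unix-like system could look like this. It assumes that find_library can locate the C and math libraries, and it will not work as written on Windows. strlen and frexp are standard C functions picked for the example.

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like system where these lookups succeed.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
length = libc.strlen(b"hello")

# double frexp(double x, int *exp): the exponent comes back via a pointer.
libm.frexp.argtypes = [ctypes.c_double, ctypes.POINTER(ctypes.c_int)]
libm.frexp.restype = ctypes.c_double   # without this, an int would be assumed

exponent = ctypes.c_int()
mantissa = libm.frexp(8.0, ctypes.byref(exponent))   # 8.0 == 0.5 * 2**4

print(length, mantissa, exponent.value)
```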