December 11: Python Operating System Services
After some restful days, today's Python Standard Library Traversal follows up yesterday's Cryptographic Service Modules with all the generic OS interaction modules. That is os (but not os.path, we did that a week ago), io, time, argparse, getopt, logging and its submodules, getpass, curses and submodules, platform, errno and ctypes. Strap in!
Highlights
- On Windows, os.startfile() acts like double-clicking the file.
- There are way too many ways to start new processes in Python.
- "seconds since epoch" usually excludes leap seconds.
- argparse is honestly not as bad as I remembered it. Stockholm syndrome?
- logger.getChild() is like calling getLogger() on the full target name, very useful when you use stand-in values like __name__.
- logging.handlers.TimedRotatingFileHandler and logging.handlers.RotatingFileHandler handle log file rotation, nice!
- logging.handlers.HTTPHandler defaults to secure=False.
- You can change the default %-style string formatting in logging formatters to str.format() or string.Template.
- You can query the platform you are running on with the platform module.
os
The os module provides "a portable way of using system dependent functionality", just like half the other modules (scnr).
Process parameters
You can gather information on the current process and user with functions like ctermid (filename of the controlling terminal), environ (a mapping of all environment variables and their values), get_exec_path, getegid, geteuid (for effective IDs, the same exist without the e for the "real" IDs), getgrouplist, getlogin, getpgid, getpid, getppid (parent process ID), uname and umask. You can use putenv to modify environment variables (relevant for child processes), setegid/seteuid/setpgid etc to change process affiliation, and setpriority to change process scheduling priority.
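A quick sketch of reading process information with the functions above:

```python
import os

# Inspect the current process and its environment.
pid = os.getpid()          # ID of this process
ppid = os.getppid()        # ID of the parent process
path = os.get_exec_path()  # directories searched for executables
home = os.environ.get("HOME", "(unset)")  # environ is a plain mapping

print(pid, ppid, home)
print(path[:3])
```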
File descriptors
You can interact with files using file descriptors: small integers referring to files opened by the current process. Usually 0 is stdin, 1 is stdout, 2 is stderr, and further files opened by the process are numbered incrementally. You can run fileno() on a file-like to retrieve the descriptor. os.close() and os.closerange() can then close files by descriptor. copy_file_range can copy a given range of bytes from the source descriptor to the target descriptor. You can create new special descriptors with openpty and pipe (or pipe2). You can manually read and write using read and write, or pread and pwrite while leaving the file offset unchanged. For most socket activity, consider using the socket module instead.
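A minimal sketch of descriptor-level I/O, using a pipe so we don't need a real file:

```python
import os

# Create an anonymous pipe: r is the read end, w the write end.
r, w = os.pipe()
os.write(w, b"hello")    # write raw bytes to the write descriptor
os.close(w)              # close the write end so the reader sees EOF
data = os.read(r, 1024)  # read up to 1024 bytes from the read descriptor
os.close(r)
print(data)  # b'hello'
```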
Files and directories
os defines a bunch of functions to interact with files and directories, in addition to os.path and pathlib.
access tells you if you have access to a path. The ch family (chdir, chmod, chown, chroot) does exactly what you'd expect (some members also have l-prefixed versions to interact with links). There is a lot of overlap with pathlib, so I'm skipping over quite a bit here.
Many of these functions generally require a path (or a Path), but can also deal with file descriptors, if os.supports_fd. Often you can also pass dir_fd, that is, a file descriptor pointing to a directory to which the path is relative.
Process management
You can use the os module to create and manage processes: Use nice() to change process niceness, and use times() to retrieve process times (user/system/elapsed real time).
There are a ton of exec* and spawn* functions to start a new process, either replacing the current one or as new processes. fork() creates a child process, and the related register_at_fork() registers callables to be executed when processes are forked. abort() and kill()/killpg (process group) serve to terminate processes. All of, I kid you not, wait, waitid, waitpid, wait3 and wait4 provide different ways to wait until a process terminates and retrieve its result.
There are also a bunch of functions that have better/more usable versions in other modules, like os.system() (runs a command in a subshell, use subprocess).
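The subprocess alternative mentioned above looks like this (here just running a tiny Python one-liner as the child process):

```python
import subprocess
import sys

# Prefer subprocess.run over os.system: no shell involved by default,
# and you get the output and return code back as an object.
result = subprocess.run(
    [sys.executable, "-c", "print('hi')"],
    capture_output=True, text=True,
)
print(result.returncode)      # 0
print(result.stdout.strip())  # hi
```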
Scheduler
On some Unix platforms, you can interfere with how a process is allocated CPU time. If you want to do this in Python, you'll probably want to consult your OS man pages.
System information
os.name exists, but you should probably use platform instead. cpu_count() and getloadavg() do what you would expect them to do. os also exposes a bunch of informational attributes, like sep (path name separator), pathsep (separator for PATH) or linesep (what even is a newline).
Randomness
We not only have random and secrets, we also have ways of retrieving random data in os. Yay. In particular, os provides getrandom (relies on entropy from environmental noise, available on Linux) and urandom (suitable for cryptographic use, calls getrandom if available).
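Using os.urandom is a one-liner:

```python
import os

# os.urandom returns the requested number of cryptographically
# usable random bytes.
token = os.urandom(16)
print(len(token))   # 16
print(token.hex())  # 32 hex characters, different every run
```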
io
The io module deals with text I/O, binary I/O and raw I/O (which is a building block for the other two, you usually don't want that). The objects used for interaction are usually called streams or file-likes. Streams can be read-only, write-only or read-write, and can permit random access or only sequential seeking. All streams support the file-like interface of methods like close(), flush(), read() and readline(), seek() if they are seekable(), and write() and writelines() if they are writable().
Text streams are created when open()-ing a file in a non-binary mode, but you can also create them with io.StringIO("my text"). Binary streams are created when open()-ing a file in a binary mode, or with io.BytesIO.
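The two in-memory stream types can be sketched like this:

```python
import io

# An in-memory text stream behaves like a file opened in text mode.
text = io.StringIO("line one\nline two\n")
first = text.readline()   # 'line one\n'
text.seek(0)              # rewind: in-memory streams are seekable
everything = text.read()

# The binary counterpart works on bytes instead of str.
binary = io.BytesIO(b"\x00\x01\x02")
head = binary.read(2)     # b'\x00\x01'
print(first, everything, head)
```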
time
time provides system clock related functions that map pretty straight onto their C equivalents. If you think about using time, please think at least twice about using datetime instead (which you should use for anything that handles real-life dates and times), and make sure you understand which of time's clocks are monotonic. The module also defines constants that give you access to different system clocks, like CLOCK_REALTIME and CLOCK_MONOTONIC. You can also use this module to change the system time and timezone.
A lot of these functions are system dependent: 11 are available only on Unix. I'll only mention those that don't have "USE DATETIME" printed all over them. With clock_gettime() you read the value of a given clock, and with clock_settime() you can, on Unix, change the realtime clock value. monotonic() (and monotonic_ns for nanosecond support) retrieves the value of a clock that is guaranteed to be monotonic, which means it is not affected by system time changes and its starting point is undefined. Arguably the most useful function of the bunch. perf_counter() uses the highest-resolution clock, also with an undefined start value, and is system wide (thus includes sleep times). process_time() tells you the sum of system and user time your script has used up (but also with an undefined start value), and sleep makes the time fly by. time gives you the epoch time as a float (and time_ns as integer nanoseconds), but note that you can get that from datetime if you want an actual date, and you should use monotonic() if you use it for timings or counting.
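A quick sketch of the recommended way to time things:

```python
import time

# monotonic() is the clock to use for measuring durations:
# it never jumps backwards, even if the system time is changed.
start = time.monotonic()
time.sleep(0.01)
elapsed = time.monotonic() - start
print(f"slept for about {elapsed:.4f}s")

# perf_counter() has the highest available resolution.
t0 = time.perf_counter()
t1 = time.perf_counter()
print(t1 - t0)  # a tiny, non-negative number
```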
argparse
With argparse (which supersedes the old optparse), you can define and parse CLI flags, including help texts, long flags, default values and automatic usage string generation. You do this by instantiating ArgumentParser objects, calling add_argument(), then running parse_args() (with a list of strings, defaulting to argv). Instead of doing all this directly, you can also add subparsers, if your script provides wildly different subcommands. You can customise many-but-not-all things – change start and end (and middle, if you want) of the help output, and enable or disable users using unambiguous shortened versions of --long-flags. Parsers can also parse flags from files that are passed on the command line with a specified prefix.
All of this is much better than I remember, but I think I'll still default to using click in many cases.
Arguments
Arguments can specify a name or a set of flags, the number of arguments that should be consumed (+ just noms up all the arguments until encountering a new flag, ? takes one if possible or else nothing), a default value, a type (everything is a string until you tell argparse to convert it), valid argument choices, a help message, and the name of the variable in which to store the received value. You can put arguments into groups (nice for better help display) and into mutually exclusive groups when you need only one of them to be present.
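A small sketch of these argument features (the program name and flags here are made up):

```python
import argparse

parser = argparse.ArgumentParser(prog="greet")
parser.add_argument("names", nargs="+")              # one or more positionals
parser.add_argument("--shout", action="store_true")  # bare flag
parser.add_argument("--times", type=int, default=1)  # converted from string
parser.add_argument("--lang", choices=["en", "de"], default="en")

args = parser.parse_args(["Alice", "Bob", "--times", "2"])
print(args.names)  # ['Alice', 'Bob']
print(args.times)  # 2
print(args.shout)  # False
```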
Actions
You can also set an associated action to be called. The default is store, which just shoves the value into the result object, but you can also use store_true (good for flags where you don't care about any value), append when you're collecting a list of things, and count for flags like -vvv. There are also special actions for help and version printing. You can also pass custom action classes, or eg argparse.BooleanOptionalAction to add support for --no-foo flags that will set foo=False.
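A sketch of the actions mentioned above (flag names are invented; BooleanOptionalAction needs Python 3.9+):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="count", default=0)
parser.add_argument("--tag", action="append")  # collect repeated flags
parser.add_argument("--color", action=argparse.BooleanOptionalAction,
                    default=True)  # adds both --color and --no-color

args = parser.parse_args(["-vvv", "--tag", "a", "--tag", "b", "--no-color"])
print(args.verbose)  # 3
print(args.tag)      # ['a', 'b']
print(args.color)    # False
```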
getopt
Use getopt if you need (or want?!) to parse CLI options the way C's getopt does. Everybody else probably wants to use argparse. The module defines both getopt and gnu_getopt, because of course it does.
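For completeness, the C-style interface looks like this ('o:' takes a value, 'v' is a bare flag):

```python
import getopt

# Parse an argument list directly; getopt stops at the first
# non-option argument, just like its C namesake.
opts, rest = getopt.getopt(["-v", "-o", "out.txt", "input.txt"], "vo:")
print(opts)  # [('-v', ''), ('-o', 'out.txt')]
print(rest)  # ['input.txt']
```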
logging
The logging module is fiendish – there are multiple official tutorials, which are worth reading. The module comes with four major concepts: Loggers (the application interface), Handlers (which decide where to send the input), Filters (which contain more detailed rules for content to pass or to silence), and Formatters (which change the final layout of the messages).
Loggers
Never instantiate loggers directly, always use getLogger (sic) for more threadsafe singleton goodness. Loggers work hierarchically, and a logger named a.b will have a parent called a. You can set the propagate attribute to configure if a logger should pass events to its parent. With setLevel, you set the threshold for this logger, and all messages of a lower level will be ignored. Loggers are often named __name__ to follow the Python packaging hierarchy – you can then use getChild() to get new loggers lower in the hierarchy.
You use loggers by calling methods named after the log level in use: debug(), info(), warning(), error() and critical(). When you call these methods on the logging module, they get passed to the root logger. You pass a message, optionally exception/traceback information, and optional extra={} additional data. You can also use the log() method and provide the log level there, or the exception() method from an exception handling context, which will add exception information to the logging message.
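A sketch of these calls, routed into a string buffer so we can inspect the result (the logger name is made up):

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("demo")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(stream))

logger.info("starting up")
try:
    1 / 0
except ZeroDivisionError:
    # exception() logs at ERROR level and appends the traceback.
    logger.exception("math is hard")

output = stream.getvalue()
print("starting up" in output)        # True
print("ZeroDivisionError" in output)  # True
```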
config
If you don't want to configure logging with the methods and functions mentioned above, you can also use the logging.config interface, and pass a dictionary or a file.
Handlers
You can manage handlers with the addHandler() and removeHandler() methods on a logger. Handlers have a configured log level, same as loggers, and will ignore anything less severe than their level. Some handlers are included in the standard library in logging.handlers: StreamHandler for streams (most notably sys.stdout, sys.stderr and file-like objects), FileHandler for all files, NullHandler as no-op, WatchedFileHandler which detects changes in a file and closes and re-opens it to avoid overriding other changes, and RotatingFileHandler and TimedRotatingFileHandler to start a new log file after a set byte length or time. There's also SocketHandler for TCP and DatagramHandler for UDP-based logging. SysLogHandler talks to the OS syslog, and NTEventLogHandler to the Windows event log. Use SMTPHandler to spam a poor email account somewhere, or HTTPHandler (with secure=True) to do the same to a poor API endpoint. With the QueueHandler class, you can send the messages to a queue.Queue or a multiprocessing.Queue if you want to handle logging in a separate thread, to avoid congestion due to slow logging methods (think SMTP).
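A sketch of file rotation by size; the paths are throwaway temp files and the limits are deliberately tiny so rotation actually happens:

```python
import logging
import logging.handlers
import os
import tempfile

# Keep up to 3 old files of at most ~1 KiB each.
logdir = tempfile.mkdtemp()
logfile = os.path.join(logdir, "app.log")
handler = logging.handlers.RotatingFileHandler(
    logfile, maxBytes=1024, backupCount=3)

logger = logging.getLogger("rotating-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

for i in range(100):
    logger.info("message number %d", i)

handler.close()
files = sorted(os.listdir(logdir))
print(files)  # ['app.log', 'app.log.1', ...]
```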
Filters
Filters are useful when log levels alone are not enough to decide what to do with a log message, either in a handler or a logger. You handle filters with the addFilter() and removeFilter() methods on a logger. With logger.filter(), you can test if a message will be processed by the logger.
The base filter works based on logger hierarchy, and will only allow events in the given branch of the logging hierarchy, but implementing your own filters is not hard – you can even just implement a filter function without a wrapper class.
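A filter without a wrapper class can be just a function (Python 3.2+); the filter and logger names here are made up:

```python
import io
import logging

# Return a truthy value to keep the record, falsy to drop it.
def no_secrets(record):
    return "password" not in record.getMessage()

stream = io.StringIO()
logger = logging.getLogger("filter-demo")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(stream))
logger.addFilter(no_secrets)

logger.info("user logged in")
logger.info("password is hunter2")  # silently dropped by the filter

print(stream.getvalue())  # 'user logged in\n'
```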
Formatters
Formatters convert LogRecord objects to strings. They use %-style string formatting, though you can change this behaviour by setting the style to { (for str.format()) or $ (for string.Template).
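A sketch of a {-style formatter, applied to a hand-built record so we can see the output directly (the record contents are invented):

```python
import logging

# str.format-style placeholders instead of the default %-style.
fmt = logging.Formatter("{levelname}:{name}:{message}", style="{")

record = logging.LogRecord(
    name="demo", level=logging.WARNING, pathname="example.py",
    lineno=1, msg="disk almost full", args=(), exc_info=None)
formatted = fmt.format(record)
print(formatted)  # WARNING:demo:disk almost full
```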
LogRecord
LogRecord objects contain all the logging information prior to formatting. It's generally what you'd expect: the message, the originating function/level/path, but also things that surprised me: a relativeCreated time in milliseconds, the process name, and the running thread and its name.
getpass
The getpass module defines exactly two functions: getuser() retrieves the login name of the current user, and is generally better than os.getlogin(). getpass() prompts the user for a password.
curses
With curses, you can use, uhm, curses. Surprise! It allows advanced terminal handling, including drawing pretty UI elements. Use the tutorial linked in the docs, as the docs themselves are a bit unhelpful. You can use curses to retrieve data about the terminal (e.g. baudrate() and can_change_color()), and change its state (e.g. erasechar, curs_set). You can do wonderful things like beep and flash the screen. You can even listen for and react to mouse events, in a way. You can even enter raw mode, where normal handling of interrupts, flow control and line buffering is turned off.
You'll want to start out with initscr and start_color, which need to get called first. On the window object that you get from there, you can call low-level methods like addch to paint a character at a given coordinate. You can manipulate colours, draw borders, boxes and lines, create new subwindows, and redraw your changes on refresh(). If you want to interact with keyboard input, use the defined key constants.
curses.panel
With curses.panel, you can create window objects that understand depth, so you can stack them on top of one another.
curses.textpad
curses.textpad provides a basic textbox class that allows text editing with emacs-like shortcuts.
platform
Query the platform you are running on with the platform module. There's a range of information functions: architecture() for the architecture bits (returns a string like "64bit" for some reason), machine() for what I would have called the architecture, eg "x86_64", node() for the network name, platform() for a magic string identifying your system, processor(), a whole bunch of python_{} functions, release(), the much more useful system() ("Linux", "Java", "Windows") and version() (system version). You can also choose to run uname(). There are some OS specific checks, like libc_ver().
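The most common calls in one glance:

```python
import platform

# Each of these returns a (possibly empty) string.
sysname = platform.system()    # e.g. 'Linux', 'Darwin' or 'Windows'
arch = platform.machine()      # e.g. 'x86_64'
pyver = platform.python_version()  # e.g. '3.12.1'
print(sysname, arch, pyver)
```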
errno
The errno module defines standard system errors. They have integer values, and you can use os.strerror to translate them to hooman speech.
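For example (the exact message and number are system dependent):

```python
import errno
import os

# Translate an error number into a human-readable message.
message = os.strerror(errno.ENOENT)
print(errno.ENOENT, message)  # e.g. 2 'No such file or directory'

# Failed OS calls carry the matching errno on the raised exception.
try:
    open("/definitely/not/a/real/path")
except OSError as e:
    caught = e.errno
print(caught == errno.ENOENT)  # True
```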
ctypes
ctypes is "a foreign function library", which sounds very ambassadorial. It provides C compatible data types and allows you to call functions in shared libraries and wrap them in Python. You can, for example, load libc and call its printf (and generally use find_library to locate a shared library). You can do all sorts of unholy cross-code accessing – I'm just listing the standard use cases here.
Types
The docs contain a handy table to help you map C types to ctypes. You instantiate them by calling them with a compatible value, like c_wchar_p("Hello World"). They are typically mutable, and you can change their value by assigning to their value attribute. Note that when you assign a new value to the pointer types (string, char, void), you change the memory location they point to, not the contents.
If you need (for function calls, usually) mutable memory blocks, you'll want to use create_string_buffer() or create_unicode_buffer. You can create structs and unions with the Structure and Union base classes, where you specify the _fields_ as tuples of a name and a type. If you need an array, it is recommended that you just multiply a type with the length of the array. Use cast(obj, new_type) to cast an object to a different type.
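All of these type mechanics in one sketch (the Point struct is invented):

```python
import ctypes

# Simple ctypes values are mutable via .value.
n = ctypes.c_int(42)
n.value = 7
print(n.value)  # 7

# A mutable buffer for functions that write into memory.
buf = ctypes.create_string_buffer(b"Hello", 16)
print(buf.value)  # b'Hello'

# A struct maps field names to C types.
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

p = Point(1, 2)
print(p.x, p.y)  # 1 2

# Arrays are made by multiplying a type by a length.
TenInts = ctypes.c_int * 10
arr = TenInts(*range(10))
print(arr[9])  # 9
```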
Functions
You can set argtypes to a list of types on any function to protect it from being called with incompatible types (though I don't think you can handle overloading here?). Functions are assumed to return integers unless you specify a different restype.
Function calls
For function calls, you'll have to wrap all types except for ints, strings and bytes in their ctype equivalent. To pass
your own types, they have to define an _as_parameter_
attribute or property. If you need to pass pointers to a
function, you can wrap your value in a byref
call.
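A minimal end-to-end sketch, assuming a Unix-like system where libc can be found:

```python
import ctypes
import ctypes.util

# find_library("c") locates libc; if it returns None, CDLL(None)
# falls back to the symbols of the running process on Unix.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declaring argtypes and restype lets ctypes check and convert calls.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int
result = libc.abs(-5)
print(result)  # 5

# byref() wraps a value as a lightweight pointer, for C functions
# that expect e.g. an int* they can write into.
value = ctypes.c_int(0)
pointer = ctypes.byref(value)
```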