There's no place where "one obvious way to do things" fails as much as it does with file and OS interaction. As part of
the Python Standard Library traversal, after
yesterday's post about functional programming modules, today
we're going to look at the file and directory access modules. Yes, all of them:
- The 80% overlap between pathlib and os.path is bordering on hilarious. One obvious way indeed.
- fileinput is really weird.
- stat allows you to query and extract results of os.stat().
- shutil.disk_usage() returns total, used and free bytes for a given directory.
- shutil.which() looks up executables (platform independent).
- shutil.make_archive supports zip, tar, gztar, bztar and xztar out of the box.
Nearly anything you want to do with files or paths, you want to do with
pathlib. Instantiate a Path,
then enjoy the goodness. You can concatenate paths with
Path("a") / "b" / "c", which is really neat. For paths that
don't access the filesystem, you can use
PurePath instances. Paths are immutable, hashable, orderable.
Paths expose their parts via the attributes
parts (all of them),
parents (all parent directories),
name (the complete file name),
suffix (file extension),
suffixes (extensions as a list) and
stem (name without suffix).
as_posix() gets you the path with forward slashes,
as_uri() as a file:// URI.
is_absolute() is useful,
is_reserved() is extremely useful on Windows and
False everywhere else.
relative_to() expresses one path starting from the other.
with_name() returns a new Path with the same path, but the new filename.
with_suffix() does the same for the file suffix.
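
To make the attribute zoo above a bit more tangible, here is a small sketch. It uses PurePosixPath so that nothing touches the filesystem and the output is the same on every OS; the path itself is made up for illustration.

```python
from pathlib import PurePosixPath

# A hypothetical path; PurePosixPath never accesses the filesystem.
p = PurePosixPath("/srv") / "data" / "archive.tar.gz"

print(p.parts)                    # ('/', 'srv', 'data', 'archive.tar.gz')
print(p.name)                     # archive.tar.gz
print(p.suffix)                   # .gz
print(p.suffixes)                 # ['.tar', '.gz']
print(p.stem)                     # archive.tar
print(p.parent)                   # /srv/data
print(p.is_absolute())            # True
print(p.relative_to("/srv"))      # data/archive.tar.gz
print(p.with_name("notes.txt"))   # /srv/data/notes.txt
print(p.with_suffix(".bz2"))      # /srv/data/archive.tar.bz2

# Paths are hashable and orderable, so they work as dict keys and in sorted().
lookup = {p: "backup"}
ordered = sorted([PurePosixPath("b"), PurePosixPath("a")])
```

Note that with_suffix() only swaps the last suffix, which is why the .tar part survives.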
match() returns True if the path matches the given pattern. You can use
glob() (or rglob() for an implicit
**/) to find files matching a pattern starting at the path. With
iterdir() you can iterate over all files and directories in
a given directory.
cwd() and home() return a new path for your current or home directory.
resolve() turns any path into an absolute
path with no symlinks and no relative elements.
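
Here is a rough sketch of the pattern-matching and traversal methods in action. The directory layout is invented and built inside a temporary directory, so it should be safe to run anywhere.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: a.py, pkg/b.py, pkg/sub/c.py, pkg/readme.txt
    (root / "pkg" / "sub").mkdir(parents=True)
    for rel in ("a.py", "pkg/b.py", "pkg/sub/c.py", "pkg/readme.txt"):
        (root / rel).touch()

    top_level = sorted(p.name for p in root.glob("*.py"))   # this directory only
    recursive = sorted(p.name for p in root.rglob("*.py"))  # implicit **/
    children = sorted(p.name for p in root.iterdir())       # files and directories
    matches = (root / "pkg" / "b.py").match("*.py")

    # resolve() removes the ".." element and returns an absolute path.
    resolved = (root / "pkg" / ".." / "a.py").resolve()
    resolved_is_clean = resolved.is_absolute() and ".." not in resolved.parts
```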
exists() checks file or directory existence, and follows symlinks.
Path.stat() returns an os.stat_result object, which
you can either query directly or with the stat module.
chmod() changes permissions and modes. You can also use
methods like is_dir(), is_file() and is_char_device() as shortcuts. With
samefile() you can compare files, just like with os.path.samefile().
expanduser() expands ~ and ~user strings in a path.
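Create new directories with
mkdir() and new files with
touch(), and remove files with
unlink(). Move files or directories with rename() or
replace(). Remove empty directories with
rmdir(). Create symlinks with
symlink_to() (link_to creates hard links, and newer Python versions replace it with hardlink_to()), remove them with
unlink(). Open files with open(), or use read_text()
for immediate read access (and
write_text() for writing).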
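
A minimal sketch of the create, write, move and remove cycle, run inside a temporary directory. The file and directory names are placeholders.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "notes"          # hypothetical directory name
    folder.mkdir()

    draft = folder / "draft.txt"
    draft.touch()                          # create an empty file
    draft.write_text("hello\n", encoding="utf-8")
    content = draft.read_text(encoding="utf-8")

    final = folder / "final.txt"
    draft.rename(final)                    # move the file
    moved = final.exists() and not draft.exists()

    final.unlink()                         # remove the file
    folder.rmdir()                         # only works on empty directories
    cleaned_up = not folder.exists()
```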
os.path is in a weird spot: Some of its functions are being replaced by the much nicer
pathlib module, but nowhere
near all of them. For example,
basename() is available on
pathlib.Path objects, but
commonpath is not.
These functions, to my knowledge, are available by way of pathlib:
isabs() is replaced by
is_absolute() (look at the readability!),
isfile() by is_file() (and same for links and directories).
ismount does not have an equivalent in
splitext() is replaced by
Path.stem and Path.suffix. File information functions like
getsize() are replaced by Path.stat().
These functions (again, to my knowledge), are still exclusive to os.path:
commonpath tells you the longest common
path from a list of paths, and
commonprefix does the same, but character by character, so its result is not necessarily a valid path.
expandvars expands environment
variables of the form $name and ${name}.
normcase changes a path to lowercase on Windows and leaves it unchanged elsewhere.
relpath is kind of equivalent to
Path.relative_to, but the signature is different enough to be annoying.
samefile is available on
Path objects, but
sameopenfile (comparing descriptors) is not, and
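
To see how commonpath, commonprefix and relpath differ in practice, here is a small sketch. It uses posixpath (the POSIX flavour of os.path) so the results are the same on every platform; the paths are invented.

```python
import posixpath
from pathlib import PurePosixPath

paths = ["/usr/lib/python3", "/usr/local/bin"]

# commonpath works on whole path components ...
common_path = posixpath.commonpath(paths)      # '/usr'
# ... while commonprefix compares character by character.
common_prefix = posixpath.commonprefix(paths)  # '/usr/l'

# relpath happily walks upwards with "..".
relative = posixpath.relpath("/usr/local/bin", start="/usr/lib")  # '../local/bin'

# relative_to() refuses by default when the other path is not a parent.
try:
    PurePosixPath("/usr/local/bin").relative_to("/usr/lib")
    strict_result = "no error"
except ValueError:
    strict_result = "ValueError"
```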
In a strange set of capabilities,
fileinput allows you to loop over standard input or a list of files. If you call
fileinput.input() without any arguments, it uses
sys.argv[1:]. You can use
fileinput.lineno() and the like to get information about the file you're currently reading.
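
A quick sketch of looping over several files at once. The two files are created on the fly in a temporary directory, and their names are made up.

```python
import fileinput
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.txt"    # hypothetical input files
    second = Path(tmp) / "second.txt"
    first.write_text("a\nb\n", encoding="utf-8")
    second.write_text("c\n", encoding="utf-8")

    seen = []
    with fileinput.input(files=[str(first), str(second)]) as stream:
        for line in stream:
            seen.append((
                Path(fileinput.filename()).name,  # file currently being read
                fileinput.lineno(),               # cumulative line number
                fileinput.filelineno(),           # line number within the file
                line.rstrip("\n"),
            ))
```

The cumulative lineno() keeps counting across files, while filelineno() restarts at 1 for each new file.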
Once you have called os.stat() on something, you can use the
stat module to interpret the result. It contains just a ton of functions like
S_ISDIR() that return a boolean when passed a stat mode. It can also be used to extract the file mode
(e.g. as a
-rwxrwxrwx string), and to extract other information from stat results, like
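
As a small illustration of interpreting a stat mode, the sketch below stats a temporary directory and also formats a hand-written mode value. The octal constant is just an example.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    mode = os.stat(tmp).st_mode
    is_dir = stat.S_ISDIR(mode)      # True for a directory
    is_regular = stat.S_ISREG(mode)  # False for a directory
    listing = stat.filemode(mode)    # ls-style string, starts with 'd' here

# A regular file (0o100000) with permission bits 755:
fixed = stat.filemode(0o100755)      # '-rwxr-xr-x'
```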
filecmp compares files and directories with varying speed and accuracy. For detailed content comparison, use difflib.
cmp() compares two files. If called with
shallow=True it only compares the os.stat() signatures (file type, size and modification time).
cmpfiles() does the same for a list of file names in the two given directories, and dircmp gives you a comparison object which you can use to
see matching and different elements in the comparison process.
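
Here is a sketch of the filecmp functions on two throwaway directories. The file names and contents are invented, and I pass shallow=False to force an actual content comparison.

```python
import filecmp
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    left = Path(tmp) / "left"
    right = Path(tmp) / "right"
    left.mkdir()
    right.mkdir()
    (left / "same.txt").write_text("x", encoding="utf-8")
    (right / "same.txt").write_text("x", encoding="utf-8")
    (left / "diff.txt").write_text("one", encoding="utf-8")
    (right / "diff.txt").write_text("three", encoding="utf-8")
    (left / "only_left.txt").write_text("!", encoding="utf-8")

    # shallow=False compares the file contents instead of os.stat() signatures.
    same = filecmp.cmp(left / "same.txt", right / "same.txt", shallow=False)
    differ = filecmp.cmp(left / "diff.txt", right / "diff.txt", shallow=False)

    # cmpfiles() checks a list of names in both directories at once.
    match, mismatch, errors = filecmp.cmpfiles(
        left, right, ["same.txt", "diff.txt"], shallow=False
    )

    # dircmp is the comparison object for whole directories.
    report = filecmp.dircmp(left, right)
    left_only = list(report.left_only)
    common = sorted(report.common)
```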
tempfile provides platform-independent temporary files and directories. All of its interface classes can be used as
context managers, providing automatic cleanup once you exit the context. If you don't use them as context managers, they
will be deleted when you close them or when they get gc'd.
TemporaryFile, the basic variant, is not visible in directory listings under Unix, but that's not a platform-independent
guarantee. Use NamedTemporaryFile to get cross-platform consistent, visible files.
SpooledTemporaryFile keeps the data
spooled in memory until it exceeds max_size or you call
rollover() (which causes the data to be written to disk).
TemporaryDirectory also removes all of its contents when it is deleted.
If you don't want the whole context manager thing, you can use
mkstemp() and mkdtemp() to create temporary files and
directories where you have to handle removal yourself.
gettempdir() tells you where
tempfile will create its files.
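
A short sketch of both styles, the context managers that clean up after themselves and the mk*temp() functions that leave cleanup to you. The file name scratch.txt is arbitrary.

```python
import os
import tempfile

# Context manager: directory and contents vanish when the block ends.
with tempfile.TemporaryDirectory() as tmp_dir:
    scratch = os.path.join(tmp_dir, "scratch.txt")
    with open(scratch, "w", encoding="utf-8") as fh:
        fh.write("temporary")
    existed_inside = os.path.exists(scratch)
gone_afterwards = not os.path.exists(tmp_dir)

# Spooled file: lives in memory until max_size is exceeded or rollover() is called.
with tempfile.SpooledTemporaryFile(max_size=1024) as spool:
    spool.write(b"small payload")
    spool.rollover()          # force the data onto disk
    spool.seek(0)
    payload = spool.read()

# Manual variants: you are responsible for removal.
fd, manual_file = tempfile.mkstemp()
os.close(fd)
os.remove(manual_file)

manual_dir = tempfile.mkdtemp()
os.rmdir(manual_dir)

temp_root = tempfile.gettempdir()
```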
glob provides Unix-style wildcard expansion.
glob.glob() uses fnmatch.fnmatch() to return
a list of files matching your input string. If you set
recursive=True, you can also use
** directory wildcards. Use
iglob if the result can be huge, since it returns an iterator instead of a list. Both these functions raise auditing events.
You can use
glob.escape() to deal with paths that contain the special characters ?, * and [.
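
The following sketch shows plain, recursive and escaped globbing against a temporary directory. The layout and the oddly named data[1].txt file are made up for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "src", "deep"))
    for rel in ("top.py",
                os.path.join("src", "mod.py"),
                os.path.join("src", "deep", "leaf.py")):
        open(os.path.join(tmp, rel), "w").close()

    # Escape the directory part in case it contains special characters.
    base = glob.escape(tmp)

    flat = sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(base, "*.py")))
    deep = sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(base, "**", "*.py"),
                                     recursive=True))
    # iglob yields the same matches lazily.
    lazy_count = sum(1 for _ in glob.iglob(os.path.join(base, "**", "*.py"),
                                           recursive=True))

    # A file name containing glob special characters.
    odd = os.path.join(tmp, "data[1].txt")
    open(odd, "w").close()
    unescaped = glob.glob(odd)             # [1] is read as a character class
    escaped = [os.path.basename(p) for p in glob.glob(glob.escape(odd))]
```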
fnmatch provides Unix-style wildcard matching, with a
fnmatch and a
filter function for easy use, and a
translate function that translates the given pattern to a regular expression to be used with re.match().
In most cases, you're going to want to use
glob, since it treats
/ as a special separator character. Or
pathlib.Path.glob, of course.
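
To illustrate the point about separators, here is a small sketch of fnmatch on plain strings. The names are invented, and nothing here touches the filesystem.

```python
import fnmatch
import re

names = ["report.txt", "image.png", "src/main.txt"]  # hypothetical names

# filter() keeps the names matching the pattern.
hits = fnmatch.filter(names, "*.txt")

# fnmatch does not treat "/" specially, so "*" happily crosses directories.
crosses_separator = fnmatch.fnmatchcase("src/main.txt", "*.txt")

# translate() turns the pattern into a regular expression string.
regex = re.compile(fnmatch.translate("*.txt"))
regex_hit = regex.match("report.txt") is not None
regex_miss = regex.match("image.png") is None
```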
linecache is used to retrieve the file contents printed in tracebacks. You call it with a file name and a line number, as in linecache.getline(filename, lineno).
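
A tiny sketch of fetching single lines, assuming the getline() entry point of linecache. The example file is written to a temporary directory first.

```python
import linecache
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "example.py"   # hypothetical file
    source.write_text("first = 1\nsecond = 2\nthird = 3\n", encoding="utf-8")

    second_line = linecache.getline(str(source), 2)    # line numbers start at 1
    missing_line = linecache.getline(str(source), 99)  # no exception, just ''

    linecache.clearcache()  # drop the cached file contents again
```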
shutil provides high-level access to files, particularly for copying files. But be warned that even copy2()
cannot transfer all metadata, and the metadata transferred is different depending on your OS. All of these functions
raise auditing events.
Directories and files
copyfileobj copies a file-like object to another file-like object, and
copyfile copies a file from one path-like to another.
copy does the same as
copyfile, but also accepts a directory as destination.
copymode copies permission bits.
copystat additionally also copies access times and flags. copy2 combines the
copy behaviour with copystat.
chown changes user and group ownership of a given path.
copytree copies an entire directory tree. You can ignore files, and you can use
ignore_patterns to create an ignore
function from a list of ignore patterns.
rmtree removes an entire directory tree.
move moves either files or
directory trees with a copying function of your choice.
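
Here is a rough sketch that strings a few of these together: copying a tree with ignore patterns, copying a single file, moving and finally removing a tree. The project layout is invented and lives in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "project"                      # hypothetical source tree
    (src / "cache").mkdir(parents=True)
    (src / "main.py").write_text("print('hi')\n", encoding="utf-8")
    (src / "main.pyc").write_text("", encoding="utf-8")
    (src / "cache" / "blob.bin").write_text("x", encoding="utf-8")

    # Copy the tree, skipping compiled files and the cache directory.
    backup = root / "backup"
    shutil.copytree(src, backup,
                    ignore=shutil.ignore_patterns("*.pyc", "cache"))
    copied = sorted(p.name for p in backup.rglob("*"))

    # copy2 copies the file and as much metadata as it can.
    single = root / "main_copy.py"
    shutil.copy2(src / "main.py", single)
    single_ok = single.read_text(encoding="utf-8") == "print('hi')\n"

    # Move the backup, then remove the original tree.
    shutil.move(str(backup), str(root / "archive"))
    archived = (root / "archive" / "main.py").exists()
    shutil.rmtree(src)
    src_removed = not src.exists()
```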
disk_usage returns a named tuple of total, used and free bytes in a given directory.
which looks up executables, using the PATH environment variable by default.
make_archive creates an archive. Like all archiving tools, it has an unholy amount of options, and the directory options
are particularly confusing. I highly recommend playing around with them a bit. It supports several formats: zip, tar,
gztar, bztar and xztar (depending on available modules; use get_archive_formats() to see what you've got).
unpack_archive does the same in reverse.
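
A minimal round trip through make_archive and unpack_archive, plus the two utility lookups. Take it as a sketch: the directory names are invented, it assumes zip support is available, and looking up python3 is just an example that may well return None on your machine.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    payload = root / "payload"               # hypothetical directory to archive
    payload.mkdir()
    (payload / "hello.txt").write_text("hello", encoding="utf-8")

    # base_name comes without extension; root_dir is what ends up in the archive.
    archive_path = shutil.make_archive(str(root / "bundle"), "zip",
                                       root_dir=str(payload))

    target = root / "unpacked"
    shutil.unpack_archive(archive_path, str(target))
    restored = (target / "hello.txt").read_text(encoding="utf-8")

# Which archive formats does this interpreter support?
formats = [name for name, _description in shutil.get_archive_formats()]

usage = shutil.disk_usage(tempfile.gettempdir())  # total, used, free in bytes
python_path = shutil.which("python3")             # None if not found on PATH
```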
To make sure
shutil isn't too intuitive to use, it also houses
get_terminal_size(), which returns a tuple of rows