tl.rename core functionality

The tl.rename.core module defines those parts of the package’s functionality that are not concerned with the user interface or specific file name transformation algorithms. It covers reading the new file names from a file or standard input, applying all known transformations to the original file names, and renaming the files accordingly.

Reading new file names

While the read_names_from_file function is counted among tl.rename’s core functionality, it is really just another file name transformation. It is passed a sequence of old file names and returns a sequence of new ones. If no processing options are given, the original names are returned:

>>> from tl.rename.core import read_names_from_file
>>> read_names_from_file(['foo', 'bar/baz'])
['foo', 'bar/baz']

The function accepts an optional names_file parameter, which is expected to be a file-like object listing each new name on a separate line:

>>> from StringIO import StringIO
>>> new_names = """\
... as/df
... fdsa
... """
>>> read_names_from_file(['foo', 'bar/baz'], names_file=StringIO(new_names))
['as/df', 'fdsa']

Whether the multi-line string read from the file-like object ends with a line break makes no difference:

>>> new_names = """\
... as/df
... fdsa"""
>>> read_names_from_file(['foo', 'bar/baz'], names_file=StringIO(new_names))
['as/df', 'fdsa']

However, any other whitespace, including empty lines, ends up in the file names:

>>> new_names = """\
...
...   as/df
... fdsa  """
>>> read_names_from_file(['foo', 'bar/baz'], names_file=StringIO(new_names))
['', '  as/df', 'fdsa  ']
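The documented behaviour can be reproduced in a few lines of Python. This is a minimal sketch under assumptions, not the package's actual implementation: the name read_names_sketch is hypothetical, it uses Python 3's io.StringIO instead of the StringIO module, and it relies on str.splitlines, which also treats characters such as \r as line breaks where the real function may not.

```python
import io


def read_names_sketch(old_names, names_file=None, **options):
    """Hypothetical re-implementation of read_names_from_file's documented behaviour."""
    # Without a names file, the original names pass through unchanged.
    if names_file is None:
        return list(old_names)
    # One new name per line: splitlines() drops the line terminators and
    # yields no extra entry for a final trailing line break, but it keeps
    # all other whitespace, including empty lines.
    return names_file.read().splitlines()


print(read_names_sketch(['foo', 'bar/baz'], names_file=io.StringIO("\n  as/df\nfdsa  ")))
# ['', '  as/df', 'fdsa  ']
```

Accepting and ignoring **options mirrors how transform hands every option to every transformation.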

Applying all registered transformations

How it works

The tl.rename.core module keeps a list of known file name transformations, the first of which is read_names_from_file:

>>> from tl.rename.core import transformations
>>> transformations
[<function read_names_from_file at 0x...>, ...]

It also defines a function, transform, that applies these transformations to a list of file names in order. This function works just like any of the individual transformations: it takes a sequence of old names and any number of optional keyword arguments and returns a list of new names. Without any options, it also returns the original file names:

>>> from tl.rename.core import transform
>>> transform(['foo', 'bar/baz'])
['foo', 'bar/baz']

Options given to transform are passed to all transformations, so each of them can pick whichever options are of interest to it. We stick with read_names_from_file in this example:

>>> new_names = """\
... as/df
... fdsa
... """
>>> transform(['foo', 'bar/baz'], names_file=StringIO(new_names))
['as/df', 'fdsa']
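The chaining described above can be sketched as follows. The function transform_sketch and the two example transformations are hypothetical; in particular, the list of transformations is passed in explicitly here, whereas tl.rename.core keeps it at module level, and the consistency checks discussed further down are left out.

```python
def transform_sketch(names, transformations, **options):
    """Hypothetical sketch: apply each transformation in order, passing all options."""
    names = list(names)
    for transformation in transformations:
        # Every transformation receives every option and uses only those it knows.
        names = list(transformation(names, **options))
    return names


def upper_if_requested(names, upper=False, **options):
    # Hypothetical example transformation reacting to an 'upper' option.
    return [name.upper() for name in names] if upper else names


def add_suffix(names, suffix='', **options):
    # Hypothetical example transformation reacting to a 'suffix' option.
    return [name + suffix for name in names]


steps = [upper_if_requested, add_suffix]
print(transform_sketch(['foo', 'bar/baz'], steps))
# ['foo', 'bar/baz']
print(transform_sketch(['foo', 'bar/baz'], steps, upper=True, suffix='.txt'))
# ['FOO.txt', 'BAR/BAZ.txt']
```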

Note that a transformation may change the whole file path, not just the part after the last path separator.

If an applied_slice option is passed to transform, the transformations apply only to the specified slice of each name [1]:

>>> transform(['01 - foo.txt', '02 - bar/baz.ogg'],
...           applied_slice=(5, -4), names_file=StringIO(new_names))
['01 - as/df.txt', '02 - fdsa.ogg']
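One way to restrict a transformation to a slice is to cut each name into three parts, transform only the middle parts, and join the results back together. The sketch below assumes that approach; transform_slice_sketch is a hypothetical name, and unlike the real transform it does not check that the transformation returned the right number of names (zip would silently truncate).

```python
def transform_slice_sketch(names, transformation, applied_slice=(None, None), **options):
    """Hypothetical sketch: run one transformation on a slice of each name only."""
    lefts, middles, rights = [], [], []
    for name in names:
        # slice.indices() resolves None and negative values for this name's length.
        start, stop, _ = slice(*applied_slice).indices(len(name))
        stop = max(start, stop)  # keep the three parts from overlapping
        lefts.append(name[:start])
        middles.append(name[start:stop])
        rights.append(name[stop:])
    new_middles = transformation(middles, **options)
    # Reassemble each name around its transformed middle part.
    return [left + middle + right
            for left, middle, right in zip(lefts, new_middles, rights)]


def fixed_names(names, **options):
    # Hypothetical stand-in for read_names_from_file with a fixed names file.
    return ['as/df', 'fdsa']


print(transform_slice_sketch(['01 - foo.txt', '02 - bar/baz.ogg'],
                             fixed_names, applied_slice=(5, -4)))
# ['01 - as/df.txt', '02 - fdsa.ogg']
```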

What it guards against

In contrast to the individual transformations, transform makes sure that no ambiguities arise. To begin with, old file names passed to transform must be unique [2]:

>>> transform(['foo', 'foo'])
Traceback (most recent call last):
AssertionError: Original names are not unique.

An exception is also raised if any of the transformations introduces an ambiguity:

>>> new_names = """\
... as/df
... as/df
... """
>>> transform(['foo', 'bar/baz'], names_file=StringIO(new_names))
Traceback (most recent call last):
AssertionError:
Result of transformation <tl.rename.core.read_names_from_file> is not unique.

When slices are used, uniqueness is checked on the whole file names, so the transformed slices themselves may coincide:

>>> transform(['foo', 'bar/baz'],
...           applied_slice=(1, None), names_file=StringIO(new_names))
['fas/df', 'bas/df']

Another mistake that transform guards against is a transformation returning a different number of file names than it was passed. This check works both with and without slices:

>>> new_names = """\
... as/df
... """
>>> transform(['foo', 'bar/baz'], names_file=StringIO(new_names))
Traceback (most recent call last):
AssertionError:
Transformation <tl.rename.core.read_names_from_file> changed number of names.
>>> transform(['foo', 'bar/baz'],
...           applied_slice=(1, 3), names_file=StringIO(new_names))
Traceback (most recent call last):
AssertionError:
Transformation <tl.rename.core.read_names_from_file> changed number of names.
>>> new_names = """\
... as/df
... fdsa
... asdf
... """
>>> transform(['foo', 'bar/baz'], names_file=StringIO(new_names))
Traceback (most recent call last):
AssertionError:
Transformation <tl.rename.core.read_names_from_file> changed number of names.
>>> transform(['foo', 'bar/baz'],
...           applied_slice=(1, 3), names_file=StringIO(new_names))
Traceback (most recent call last):
AssertionError:
Transformation <tl.rename.core.read_names_from_file> changed number of names.
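The three guards described in this section can be sketched as checks wrapped around the transformation loop. This is a hypothetical sketch, not the package's code: slicing is omitted, path normalisation follows the description of assert_unique in footnote [2], and the error messages name the function via __name__ rather than in the exact format shown above. Errors are raised explicitly so that the checks survive python -O.

```python
import os


def checked_transform_sketch(names, transformations, **options):
    """Hypothetical sketch of transform's documented consistency checks."""
    def unique(paths):
        # Compare normalised absolute paths so that different spellings collide.
        normalised = [os.path.abspath(path) for path in paths]
        return len(set(normalised)) == len(normalised)

    names = list(names)
    if not unique(names):
        raise AssertionError('Original names are not unique.')
    for transformation in transformations:
        new_names = list(transformation(names, **options))
        if len(new_names) != len(names):
            raise AssertionError('Transformation %s changed number of names.'
                                 % transformation.__name__)
        if not unique(new_names):
            raise AssertionError('Result of transformation %s is not unique.'
                                 % transformation.__name__)
        names = new_names
    return names
```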

Renaming files

Dry-run mode

Finally, the rename function applies the changes computed by transform. It accepts the lists of old and new file paths as arguments, as well as an option to turn on dry-run mode. In dry-run mode, it only prints the changes that would be applied:

>>> from tl.rename.core import rename
>>> rename(['foo', 'bar/baz'], ['as/df', 'fdsa'], dry_run=True)
foo -> as/df
bar/baz -> fdsa

Paths that remain unchanged are skipped:

>>> rename(['foo', 'bar/baz'], ['foo', 'fdsa'], dry_run=True)
bar/baz -> fdsa
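The dry-run output shown above amounts to one "old -> new" line per changed path. A hypothetical sketch (the name rename_dry_run_sketch is an assumption) that prints those lines and also returns them for inspection:

```python
def rename_dry_run_sketch(old_paths, new_paths):
    """Hypothetical sketch of rename's dry-run output: one line per changed path."""
    lines = []
    for old, new in zip(old_paths, new_paths):
        if old == new:
            continue  # unchanged paths are skipped
        lines.append('%s -> %s' % (old, new))
    for line in lines:
        print(line)
    return lines


rename_dry_run_sketch(['foo', 'bar/baz'], ['foo', 'fdsa'])
# prints: bar/baz -> fdsa
```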

Basic usage

In order to demonstrate the rename function’s actions on the file system, we create and list sandboxes containing sample directories and files:

>>> from tl.testing.fs import new_sandbox, ls
>>> new_sandbox("""\
... d bar
... f bar/baz some content
... f foo other content
... """)
>>> ls()
d bar
f bar/baz some content
f foo other content
>>> rename(['foo', 'bar'], ['asdf', 'fdsa'])
>>> ls()
f asdf other content
d fdsa
f fdsa/baz some content

A file path may be almost any string, including whitespace and non-printable characters [3]:

>>> import os
>>> rename(['asdf'], [' bar\tbaz\n\xff '])
>>> sorted(os.listdir('.'))
[' bar\tbaz\n\xff ', 'fdsa']
>>> rename([' bar\tbaz\n\xff '], ['asdf'])
>>> ls()
f asdf other content
d fdsa
f fdsa/baz some content

Renaming an item that no longer exists by the time the transformations have finished and renaming begins results in an error:

>>> rename(['not-here'], ['whatever'])
Traceback (most recent call last):
OSError: [Errno 2] No such file or directory

Trailing path separators make no difference when renaming directories:

>>> rename(['fdsa'], ['foobar/'])
>>> ls()
f asdf other content
d foobar
f foobar/baz some content
>>> rename(['foobar/'], ['fdsa'])
>>> ls()
f asdf other content
d fdsa
f fdsa/baz some content

Moving between directories

Files may be moved between directories by renaming:

>>> rename(['asdf'], ['fdsa/bar'])
>>> ls()
d fdsa
f fdsa/bar other content
f fdsa/baz some content

Renaming a directory with some content works as expected:

>>> rename(['fdsa'], ['foo'])
>>> ls()
d foo
f foo/bar other content
f foo/baz some content

Moving a file to a directory that does not yet exist will create directories as needed along the new path:

>>> rename(['foo/bar'], ['as/df/bar'])
>>> ls()
d as
d as/df
f as/df/bar other content
d foo
f foo/baz some content

On the other hand, moving the last file out of a directory causes that directory, and any of its parents that thereby become empty, to be removed:

>>> rename(['as/df/bar'], ['bar'])
>>> ls()
f bar other content
d foo
f foo/baz some content
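The directory handling shown in this section can be sketched with the standard library: create missing directories along the new path, rename, then prune the old parent and any ancestors that have become empty. This is a hypothetical sketch assuming POSIX-style paths; move_sketch is not part of tl.rename, and the package may implement this differently.

```python
import os


def move_sketch(old, new):
    """Hypothetical sketch: move one path, creating and pruning directories."""
    # Trailing separators make no difference, so strip them up front
    # (but never reduce a path to the empty string).
    old = old.rstrip(os.sep) or old
    new = new.rstrip(os.sep) or new
    target_dir = os.path.dirname(new)
    if target_dir and not os.path.isdir(target_dir):
        os.makedirs(target_dir)  # create missing directories along the new path
    os.rename(old, new)
    source_dir = os.path.dirname(old)
    if source_dir:
        try:
            # Remove the old parent, then each ancestor named in the path,
            # stopping at the first one that is not empty.
            os.removedirs(source_dir)
        except OSError:
            pass  # the old parent still has content, so it stays
```

The standard library's os.renames combines these steps in a single call; the explicit version merely makes each step visible.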

An existing empty directory can be moved and renamed without being deleted:

>>> new_sandbox("""\
... d foo
... d foo/bar
... """)
>>> rename(['foo/bar'], ['baz'])
>>> ls()
d baz

Renaming to existing paths

If a file is renamed to a path already occupied by another file, that other file is replaced. The same goes for a directory renamed to an existing directory, provided the target is empty:

>>> new_sandbox("""\
... f foo first file
... f bar second file
... """)
>>> rename(['foo'], ['bar'])
>>> ls()
f bar first file
>>> new_sandbox("""\
... d foo
... f foo/baz
... d bar
... """)
>>> rename(['foo'], ['bar'])
>>> ls()
d bar
f bar/baz

If the target directory is not empty, renaming is not possible lest the directory’s content be lost:

>>> new_sandbox("""\
... d foo
... d bar
... f bar/baz
... """)
>>> rename(['foo'], ['bar'])
Traceback (most recent call last):
OSError: [Errno 39] Directory not empty

Renaming a file to an existing directory or a directory to an existing file does not work either:

>>> new_sandbox("""\
... f foo
... d bar
... """)
>>> rename(['foo'], ['bar'])
Traceback (most recent call last):
OSError: [Errno 21] Is a directory
>>> rename(['bar'], ['foo'])
Traceback (most recent call last):
OSError: [Errno 20] Not a directory

Renaming to paths renamed in turn

In contrast to the above, an item may be renamed to an existing path without the item at that path being removed, provided that item is itself renamed by the same rename call:

>>> new_sandbox("""\
... f asdf first
... f bar second
... f baz third
... f foo fourth
... """)
>>> rename(['asdf', 'bar'], ['bar', 'baz'])
>>> ls()
f bar first
f baz second
f foo fourth

This also works for cycles, including the swapping of two items:

>>> rename(['bar', 'baz', 'foo'], ['foo', 'bar', 'baz'])
>>> ls()
f bar second
f baz fourth
f foo first
>>> rename(['bar', 'foo'], ['foo', 'bar'])
>>> ls()
f bar first
f baz fourth
f foo second
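Chains and cycles like these cannot be handled by renaming pairs one after another, because an early rename would overwrite a source that is still waiting to be moved. One common technique, assumed here rather than taken from tl.rename, is a two-phase rename: first move every source to a unique temporary name, then move each temporary name to its target. The function name and the ".rename-" prefix are hypothetical; directory creation, pruning, and rollback on failure are left out, and replacing existing file targets relies on POSIX os.rename semantics.

```python
import os
import uuid


def rename_all_sketch(old_paths, new_paths):
    """Hypothetical sketch: two-phase rename so targets may be renamed in turn."""
    pairs = [(old, new) for old, new in zip(old_paths, new_paths) if old != new]
    staged = []
    # Phase 1: move every source out of the way under a unique temporary name
    # in its own directory, so no target can clobber a pending source.
    for old, new in pairs:
        temporary = os.path.join(os.path.dirname(old),
                                 '.rename-%s' % uuid.uuid4().hex)
        os.rename(old, temporary)
        staged.append((temporary, new))
    # Phase 2: move each temporary name to its final target.
    for temporary, new in staged:
        os.rename(temporary, new)
```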

The combined runner

A simple run function ties together everything demonstrated above. Its signature is basically that of transform, plus the dry_run option, which is passed on to rename:

>>> from tl.rename.core import run
>>> new_sandbox("""\
... f bar BAR
... f baz BAZ
... f foo FOO
... """)
>>> new_names = """\
... foo
... asdf/bsdf
... """
>>> run(['bar', 'baz'], names_file=StringIO(new_names), dry_run=True)
bar -> foo
baz -> asdf/bsdf
>>> ls()
f bar BAR
f baz BAZ
f foo FOO
>>> run(['bar', 'baz'], names_file=StringIO(new_names))
>>> ls()
d asdf
f asdf/bsdf BAZ
f foo BAR
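The division of labour in run can be sketched as follows. This is a hypothetical sketch: run_sketch receives transform and rename as parameters so that it can be tried with stand-ins, whereas the real run uses the module's own functions; the validity checks described next are omitted.

```python
def run_sketch(names, transform, rename, dry_run=False, **options):
    """Hypothetical sketch of run: compute new names, then rename or report."""
    names = list(names)
    # All remaining options go to the transformation step.
    new_names = transform(names, **options)
    # dry_run is consumed here and handed only to the rename step.
    rename(names, new_names, dry_run=dry_run)
```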

All old and new file names must be valid [3]. In particular, they may not contain null bytes, nor may they be empty strings, which would cause ambiguities:

>>> run(['\x00'], names_file=StringIO('bar'))
Traceback (most recent call last):
AssertionError: Invalid old file name: '\x00'
>>> run(['foo'], names_file=StringIO('\x00'))
Traceback (most recent call last):
AssertionError: Invalid new file name: '\x00'

Footnotes

[1]

Slicing file names

The apply_slice function takes a sequence of file names and any keyword parameters and returns a triple of sequences which contain the left, middle and right portions of the names as determined by the applied_slice option.

If the option is not given, left and right portions are empty, and the middle portions are the whole names:

>>> from tl.rename.core import apply_slice
>>> apply_slice(['foo', 'bar-baz'])
(['', ''], ['foo', 'bar-baz'], ['', ''])

Slices are given as tuples of start and stop index. Omitting both indices, by passing None for each, yields the same result:

>>> apply_slice(['foo', 'bar-baz'], applied_slice=(None, None))
(['', ''], ['foo', 'bar-baz'], ['', ''])

Other values produce the results expected from Python’s simple slices:

>>> apply_slice(['foo', 'bar-baz'], applied_slice=(None, 2))
(['', ''], ['fo', 'ba'], ['o', 'r-baz'])
>>> apply_slice(['foo', 'bar-baz'], applied_slice=(4, None))
(['foo', 'bar-'], ['', 'baz'], ['', ''])
>>> apply_slice(['foo', 'bar-baz'], applied_slice=(1, 5))
(['f', 'b'], ['oo', 'ar-b'], ['', 'az'])
>>> apply_slice(['foo', 'bar-baz'], applied_slice=(1, -1))
(['f', 'b'], ['o', 'ar-ba'], ['o', 'z'])
>>> apply_slice(['foo', 'bar-baz'], applied_slice=(-2, 100))
(['f', 'bar-b'], ['oo', 'az'], ['', ''])
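The examples above follow directly from Python's slicing rules. A hypothetical sketch (apply_slice_sketch is not the package's code) uses slice.indices to resolve None, negative, and out-of-range indices against each name's length before cutting the three portions. Forcing stop to be at least start, so that the portions never overlap when start lies beyond stop, is an assumption the examples above do not cover.

```python
def apply_slice_sketch(names, applied_slice=(None, None), **options):
    """Hypothetical sketch: split names into left, middle and right portions."""
    lefts, middles, rights = [], [], []
    for name in names:
        # slice.indices() clamps out-of-range values and resolves None and
        # negative indices for this name's length, as ordinary slicing does.
        start, stop, _ = slice(*applied_slice).indices(len(name))
        stop = max(start, stop)  # keep the portions from overlapping
        lefts.append(name[:start])
        middles.append(name[start:stop])
        rights.append(name[stop:])
    return lefts, middles, rights


print(apply_slice_sketch(['foo', 'bar-baz'], applied_slice=(1, -1)))
# (['f', 'b'], ['o', 'ar-ba'], ['o', 'z'])
```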

[2]

Ensuring unique file names

The assert_unique function takes a sequence of file names and an error message and raises an AssertionError with that error message if and only if the file names are not unique:

>>> from tl.rename.core import assert_unique
>>> assert_unique(['foo', 'bar', 'baz'], 'not unique')
>>> assert_unique(['foo', 'foo', 'baz'], 'not unique')
Traceback (most recent call last):
AssertionError: not unique

File paths are normalised using os.path.abspath prior to comparison. Ambiguities are thus noticed even when mixing absolute and relative paths:

>>> import os
>>> assert_unique(['as/df', '%s/as/df' % os.getcwd()], 'not unique')
Traceback (most recent call last):
AssertionError: not unique

As another consequence, paths that differ only by a trailing path separator are considered equivalent:

>>> assert_unique(['as/df', 'as/df/'], 'not unique')
Traceback (most recent call last):
AssertionError: not unique
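A hypothetical sketch of the check described in this footnote: normalise every path with os.path.abspath and compare the count of distinct results with the count of names. Note that abspath neither resolves symbolic links nor folds case, so paths that differ only in those respects would not be reported by this sketch.

```python
import os


def assert_unique_sketch(names, message):
    """Hypothetical sketch of assert_unique's documented behaviour."""
    # abspath makes relative and absolute spellings comparable and, via
    # normpath, drops trailing separators.
    normalised = [os.path.abspath(name) for name in names]
    if len(set(normalised)) != len(normalised):
        raise AssertionError(message)
```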

[3]

Ensuring valid file names

A file path is considered invalid if it contains null bytes (in order to avoid a TypeError while renaming) or is an empty string (in order to avoid ambiguities).

The assert_valid function takes an iterable of file names and an error message and raises an AssertionError with that error message as soon as it encounters the first invalid name. The error message should contain exactly one %r formatting operator, which is replaced by the representation of the invalid name:

>>> from tl.rename.core import assert_valid
>>> assert_valid(['fdsa', 'foo\x00bar', ''], 'invalid: %r')
Traceback (most recent call last):
AssertionError: invalid: 'foo\x00bar'
>>> assert_valid(['fdsa/', '', 'foo\x00bar'], 'invalid: %r')
Traceback (most recent call last):
AssertionError: invalid: ''
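A hypothetical sketch of assert_valid, following the rules stated in this footnote. It stops at the first invalid name, which is why the two examples above report different names. The name assert_valid_sketch is an assumption; note also that Python 3 rejects null bytes in paths with ValueError, whereas the TypeError mentioned above is Python 2 behaviour.

```python
def assert_valid_sketch(names, message):
    """Hypothetical sketch of assert_valid: fail on the first invalid name."""
    for name in names:
        # Empty names would be ambiguous; null bytes would make the
        # operating-system rename call fail.
        if not name or '\x00' in name:
            raise AssertionError(message % (name,))
```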