Python API

This section includes information for using the pure Python API of bob.io.base.

Classes

bob.io.base.File Use this object to read and write data into files
bob.io.base.HDF5File Reads and writes data to HDF5 files.

Functions

bob.io.base.load((inputs) -> data) Loads the contents of a file, an iterable of files, or an iterable of bob.io.base.File objects into a numpy.ndarray.
bob.io.base.merge((filenames) -> files) Converts an iterable of filenames into an iterable over read-only bob.io.base.File objects.
bob.io.base.save(array, filename[, ...]) Saves the contents of an array-like object to file.
bob.io.base.append((array, filename) -> position) Appends the contents of an array-like object to file.
bob.io.base.peek((filename) -> dtype, shape, ...) Returns the type of array (frame or sample) saved in the given file.
bob.io.base.peek_all((filename) -> dtype, ...) Returns the type of array (for full readouts) saved in the given file.
bob.io.base.create_directories_safe(directory) Creates a directory if it does not exist, with concurrent access support.
bob.io.base.extensions(() -> extensions) Returns a dictionary containing all extensions and descriptions
bob.io.base.get_config() Returns a string containing the configuration information.

Test Utilities

These functions might be useful when you are writing your nose tests. Please note that this is not part of the default bob.io.base API, so in order to use it, you have to import bob.io.base.test_utils separately.

bob.io.base.test_utils.datafile((f, ...) Returns the test file on the “data” subdirectory of the current module.
bob.io.base.test_utils.temporary_filename(...) Generates a temporary filename to be used in tests, using the default temp directory (on Unix-like systems, usually /tmp).
bob.io.base.test_utils.extension_available(...) Decorator to check if an extension is available before enabling a test

Details

class bob.io.base.File[source]

Bases: File

Use this object to read and write data into files

Constructor Documentation:

File (filename, [mode], [pretend_extension])

Opens a file for reading or writing

Normally, we read the file by matching its extension to one of the available codecs installed with the present release of Bob. If you set the pretend_extension parameter, though, we will read the file as if it had the given extension. The value should start with a '.', for example '.hdf5' to make the file be treated like an HDF5 file.

Parameters:

filename : str

The file path to the file you want to open

mode : one of (‘r’, ‘w’, ‘a’)

[Default: 'r'] A single character indicating if you’d like to 'r'ead, 'w'rite or 'a'ppend into the file; if you choose 'w' and the file already exists, it will be truncated

pretend_extension : str

[optional] An extension to use; see bob.io.base.extensions() for a list of (currently) supported extensions
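
The way pretend_extension overrides the extension found in the filename can be modelled in a few lines of plain Python. This is only a sketch of the selection rule described above, not Bob's implementation: effective_extension, describe_codec and EXAMPLE_REGISTRY are hypothetical names, and the registry contents are made up (the real mapping is whatever bob.io.base.extensions() reports on your installation).

```python
import os

# Hypothetical registry, for illustration only. The real mapping comes from
# bob.io.base.extensions() and depends on the installed codecs.
EXAMPLE_REGISTRY = {
    ".hdf5": "Hierarchical Data Format version 5",
    ".h5": "Hierarchical Data Format version 5",
}


def effective_extension(filename, pretend_extension=None):
    """Return the extension used to select a codec in this sketch."""
    if pretend_extension is not None:
        if not pretend_extension.startswith("."):
            raise ValueError("pretend_extension should start with a '.'")
        return pretend_extension
    # os.path.splitext keeps the leading dot, e.g. 'a/b.hdf5' gives '.hdf5'
    return os.path.splitext(filename)[1]


def describe_codec(filename, pretend_extension=None, registry=EXAMPLE_REGISTRY):
    """Look up the description of the codec that would handle the file."""
    ext = effective_extension(filename, pretend_extension)
    if ext not in registry:
        raise KeyError("no codec registered for extension %r" % ext)
    return registry[ext]
```

With this model, a file named 'sample.bin' opened with pretend_extension='.hdf5' is dispatched exactly like 'sample.hdf5'.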

Class Members:

append(data) → position

Adds the contents of an object to the file

This method appends data to the file. If the file does not exist, creates a new file, else, makes sure that the inserted array respects the previously set file structure.

Parameters:

data : array_like

The array to be written into the file; it can be a numpy.ndarray, a bob.blitz.array or any other object which can be converted to either of them

Returns:

position : int

The current position of the newly written data
codec_name

str <– Name of the File class implementation

This variable is available for compatibility reasons with the previous versions of this library.

describe([all]) → dtype, shape, stride

Returns a description (dtype, shape, stride) of data at the file

Parameters:

all : bool

[Default: False] If set to True, returns the shape and strides for reading the whole file contents in one shot.

Returns:

dtype : numpy.dtype

The data type of the object

shape : tuple

The shape of the object
filename

str <– The path to the file being read/written

read([index]) → data

Reads a specific object in the file, or the whole file

This method reads data from the file. If you specified an index, it reads just the object indicated by the index, as you would do using the [] operator. If the index is not specified, reads the whole contents of the file into a numpy.ndarray.

Parameters:

index : int

[optional] The index to the object one wishes to retrieve from the file; negative indexing is supported; if not given, implies retrieval of the whole file contents.

Returns:

data : numpy.ndarray

The contents of the file, as array
write(data) → None

Writes the contents of an object to the file

This method writes data to the file. It acts like the given array is the only piece of data that will ever be written to such a file. No more data appending may happen after a call to this method.

Parameters:

data : array_like

The array to be written into the file; it can be a numpy.ndarray, a bob.blitz.array or any other object which can be converted to either of them
class bob.io.base.HDF5File[source]

Bases: HDF5File

Reads and writes data to HDF5 files.

HDF5 stands for Hierarchical Data Format version 5. It is a flexible, binary file format that allows one to store and read data efficiently into or from files. It is a cross-platform, cross-architecture format.

Objects of this class allow users to read and write data from and to files in HDF5 format. For an introduction to HDF5, visit the HDF5 Website.

Constructor Documentation:

  • HDF5File (filename, [mode])
  • HDF5File (hdf5)

Opens an HDF5 file for reading, writing or appending.

For the open mode, use 'r' for read-only, 'a' for read/write/append, 'w' for read/write/truncate or 'x' for read/write/exclusive. When another HDF5File object is given, a shallow copy is created, pointing to the same file.

Parameters:

filename : str

The file path to the file you want to open for reading or writing

mode : one of (‘r’, ‘w’, ‘a’, ‘x’)

[Default: 'r'] The opening mode

hdf5 : HDF5File

An HDF5 file to copy-construct

Class Members:

append(path, data[, compression]) → None

Appends a scalar or an array to a dataset

The object must be compatible with the typing information on the dataset, or an exception will be raised. You can also, optionally, set this to an iterable of scalars or arrays. This will cause this method to iterate over the elements and add each individually.

The compression parameter is effective when appending arrays. Set this to a number between 0 (default) and 9 (maximum) to compress the contents of this dataset. This setting is only effective if the dataset does not yet exist; otherwise, the previous setting is respected.

Parameters:

path : str

The path to the dataset to append data at; can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

data : numpy.ndarray or scalar

Object to append to the dataset

compression : int

A compression value between 0 and 9
cd(path) → None

Changes the current prefix path

When this object is created the prefix path is empty, which means all following paths to data objects should be given using the full path. If you set the path to a different value, it will be used as a prefix to any subsequent operation until you reset it. If path starts with '/', it is treated as an absolute path. If the value is relative, it is added to the current path; '..' and '.' are supported. If it is absolute, it causes the prefix to be reset.

Note

All operations taking a relative path, following a cd(), will be considered relative to the value defined by the cwd property of this object.

Parameters:

path : str

The path to change directories to
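
The prefix rules described for cd() behave much like POSIX path resolution. The sketch below models them with the standard posixpath module; resolve_cwd is a hypothetical helper, and representing the initial (empty) prefix as '/' is an assumption made for the illustration, not something taken from the library.

```python
import posixpath


def resolve_cwd(cwd, path):
    """Model how a cd(path) call could update the current prefix."""
    if path.startswith("/"):
        # Absolute path: the prefix is reset to the given value.
        new = path
    else:
        # Relative path: it is added to the current prefix.
        new = posixpath.join(cwd, path)
    # normpath collapses '.' and '..' components.
    return posixpath.normpath(new)


# Walk through a short sequence of cd() calls, starting at the root.
cwd = "/"
cwd = resolve_cwd(cwd, "group1")    # relative: appended to the prefix
cwd = resolve_cwd(cwd, "sub")       # relative: appended again
cwd_after_up = resolve_cwd(cwd, "..")        # goes back up one level
cwd_after_reset = resolve_cwd(cwd, "/other")  # absolute: prefix is reset
```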
close() → None

Closes this file

This function closes the HDF5File after flushing all its contents to disk. After the HDF5File is closed, any operation on it will result in an exception.

copy(hdf5) → None

Copies all accessible content to another HDF5 file

Unlinked contents of this file will not be copied. This can be used as a method to trim unwanted content in a file.

Parameters:

hdf5 : HDF5File

The HDF5 file (already opened for writing), to copy the contents to
create_group(path) → None

Creates a new path (group) inside the file

A relative path is taken with respect to the current directory. If the directory already exists (check it with has_group()), an exception will be raised.

Parameters:

path : str

The path to create.
cwd

str <– The current working directory set on the file

del_attribute(name[, path]) → None

Removes a given attribute at the named resource

Parameters:

name : str

The name of the attribute to delete; if the attribute is not available, a RuntimeError is raised

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to delete an attribute from; if the path does not exist, a RuntimeError is raised
del_attributes([attributes][, path]) → None

Removes attributes in a given (existing) path

If the attributes are not given or set to None, then remove all attributes at the named resource.

Parameters:

attributes : [str] or None

[Default: None] An iterable containing the names of the attributes to be removed, or None

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to delete attributes from; if the path does not exist, a RuntimeError is raised
describe(key) → shape, size, expandable

Describes a dataset type/shape, if it exists inside a file

If a given key to an HDF5 dataset exists inside the file, returns a type description of objects recorded in such a dataset, otherwise, raises an exception. The returned value type is a tuple of tuples (HDF5Type, number-of-objects, expandable) describing the capabilities if the file is read using these formats.

Parameters:

key : str

The dataset path to describe

Returns:

shape : tuple

The shape of the returned array

expandable : bool

Defines if this object can be resized.
filename

str <– The name (and path) of the underlying file on hard disk

flush() → None

Flushes the content of the HDF5 file to disk

When the HDF5File is open for writing, this function synchronizes the contents on the disk with the one from the file. When the file is open for reading, nothing happens.

get(key) → data

Reads whole datasets from the file

This function reads full data sets from this file. The data type is dependent on the stored data, but is generally a numpy.ndarray.

Note

The functions read() and get() are synonyms.

Parameters:

key : str

The path to the dataset to read data from; can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

Returns:

data : numpy.ndarray or other

The data read from this file at the given key
get_attribute(name[, path]) → attribute

Retrieve a given attribute from the named resource

This method returns a single value corresponding to what is stored inside the attribute container for the given resource. If you would like to retrieve all attributes at once, use get_attributes() instead.

Parameters:

name : str

The name of the attribute to retrieve; if the attribute is not available, a RuntimeError is raised

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to get an attribute from; if the path does not exist, a RuntimeError is raised

Returns:

attribute : numpy.ndarray or scalar

The read attribute
get_attributes([path]) → attributes

Reads all attributes of the given path

Attributes are returned in a dictionary in which each key corresponds to the attribute name and each value corresponds to the value stored inside the HDF5 file. To retrieve only a specific attribute, use get_attribute().

Parameters:

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to get all attributes from; if the path does not exist, a RuntimeError is raised.

Returns:

attributes : {str:value}

The attributes organized in dictionary, where value might be a numpy.ndarray or a scalar
has_attribute(name[, path]) → existence

Checks existence of a given attribute at the named resource

Parameters:

name : str

The name of the attribute to check

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to check for the attribute; if the path does not exist, a RuntimeError is raised

Returns:

existence : bool

True, if the attribute name exists, otherwise False
has_dataset(key) → None

Checks if a dataset exists inside a file

Checks if a dataset exists inside a file, on the specified path. If the given path is relative, it is taken with respect to the current working directory.

Note

The functions has_dataset() and has_key() are synonyms.

Parameters:

key : str

The dataset path to check
has_group(path) → None

Checks if a path (group) exists inside a file

This method does not work for datasets, only for directories. If the given path is relative, it is taken with respect to the current working directory.

Parameters:

path : str

The path to check
has_key(key) → None

Checks if a dataset exists inside a file

Checks if a dataset exists inside a file, on the specified path. If the given path is relative, it is taken with respect to the current working directory.

Note

The functions has_dataset() and has_key() are synonyms.

Parameters:

key : str

The dataset path to check
keys([relative]) → paths

Lists datasets available inside this file

Returns all paths to datasets available inside this file, stored under the current working directory. If relative is set to True, the returned paths are relative to the current working directory, otherwise they are absolute.

Note

The functions keys() and paths() are synonyms.

Parameters:

relative : bool

[Default: False] If set to True, the returned paths are relative to the current working directory, otherwise they are absolute

Returns:

paths : [str]

A list of paths inside this file
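
The effect of the relative flag can be illustrated on a plain list of dataset paths. This is a sketch under stated assumptions: keys_sketch is a hypothetical helper, the file is modelled as a list of absolute paths, and datasets in nested groups below the current working directory are assumed to be included.

```python
import posixpath


def keys_sketch(all_paths, cwd="/", relative=False):
    """List the dataset paths stored under `cwd` in this model."""
    # Normalise the prefix so that '/g1' does not match '/g10/...'.
    prefix = cwd.rstrip("/") + "/"
    under = [p for p in all_paths if p.startswith(prefix)]
    if relative:
        # Strip the current working directory from each path.
        return [posixpath.relpath(p, cwd) for p in under]
    return under


paths = ["/top", "/g1/a", "/g1/sub/b", "/g2/c"]
absolute_keys = keys_sketch(paths, cwd="/g1")
relative_keys = keys_sketch(paths, cwd="/g1", relative=True)
```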
lread(key[, pos]) → data

Reads some contents of the dataset

This method reads contents from a dataset, treating the N-dimensional dataset like a container for multiple objects with N-1 dimensions. It returns a single numpy.ndarray in case pos is set to a value >= 0, or a list of arrays otherwise.

Parameters:

key : str

The path to the dataset to read data from, can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

pos : int

If given and >= 0 returns the data object with the given index, otherwise returns a list by reading all objects in sequence

Returns:

data : numpy.ndarray or [numpy.ndarray]

The data read from this file
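
The two return shapes of lread() can be pictured with nested lists standing in for an N-dimensional dataset. The helper below is a hypothetical model, not the library code; in particular, using -1 as the default for pos is an assumption.

```python
def lread_sketch(dataset, pos=-1):
    """Model of lread(): `dataset` is a sequence of objects along axis 0."""
    if pos >= 0:
        # A single object with N-1 dimensions.
        return dataset[pos]
    # Otherwise, every object is read in sequence and returned as a list.
    return [obj for obj in dataset]


# A 2D dataset seen as a container of two 1D objects.
dataset = [[1, 2, 3], [4, 5, 6]]
second = lread_sketch(dataset, 1)
everything = lread_sketch(dataset)
```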
paths([relative]) → paths

Lists datasets available inside this file

Returns all paths to datasets available inside this file, stored under the current working directory. If relative is set to True, the returned paths are relative to the current working directory, otherwise they are absolute.

Note

The functions keys() and paths() are synonyms.

Parameters:

relative : bool

[Default: False] If set to True, the returned paths are relative to the current working directory, otherwise they are absolute

Returns:

paths : [str]

A list of paths inside this file
read(key) → data

Reads whole datasets from the file

This function reads full data sets from this file. The data type is dependent on the stored data, but is generally a numpy.ndarray.

Note

The functions read() and get() are synonyms.

Parameters:

key : str

The path to the dataset to read data from; can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

Returns:

data : numpy.ndarray or other

The data read from this file at the given key
rename(from, to) → None

Renames datasets in a file

Parameters:

from : str

The path to the data to be renamed

to : str

The new name of the dataset
replace(path, pos, data) → None

Modifies the value of a scalar/array in a dataset.

Parameters:

path : str

The path to the dataset to modify; can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

pos : int

Position, within the dataset, of the object to be replaced; the object position on the dataset must exist, or an exception is raised

data : numpy.ndarray or scalar

Object to replace the value with; this value must be compatible with the typing information on the dataset, or an exception will be raised
set(path, data[, compression]) → None

Sets the scalar or array at position 0 to the given value

This method is equivalent to checking if the scalar or array at position 0 exists and then replacing it. If the path does not exist, we append the new scalar or array.

The data must be compatible with the typing information on the dataset, or an exception will be raised. You can also, optionally, set this to an iterable of scalars or arrays. This will cause this method to iterate over the elements and add each individually.

The compression parameter is effective when writing arrays. Set this to a number between 0 (default) and 9 (maximum) to compress the contents of this dataset. This setting is only effective if the dataset does not yet exist; otherwise, the previous setting is respected.

Note

The functions set() and write() are synonyms.

Parameters:

path : str

The path to the dataset to write data to; can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

data : numpy.ndarray or scalar

Object to write to the dataset

compression : int

A compression value between 0 and 9
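
The "replace position 0, or append if missing" behaviour described for set() can be modelled with a dictionary of lists. This sketch is illustrative only: set_sketch and the store dictionary are hypothetical, and the type-compatibility check and the compression setting are left out.

```python
def set_sketch(store, path, data):
    """Replace position 0 of the dataset at `path`, or create it."""
    if path in store and len(store[path]) > 0:
        # Position 0 exists: replace it in place.
        store[path][0] = data
    else:
        # The dataset does not exist yet: append the first object.
        store.setdefault(path, []).append(data)


store = {}
set_sketch(store, "/values", 1)   # creates the dataset with one object
set_sketch(store, "/values", 2)   # replaces the object at position 0
store["/values"].append(5)        # stands in for a later append() call
set_sketch(store, "/values", 9)   # still only touches position 0
```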
set_attribute(name, value[, path]) → None

Sets a given attribute at the named resource

Only simple scalars (booleans, integers, floats and complex numbers) and arrays of those are supported at the time being. You can use numpy scalars to set values with arbitrary precision (e.g. numpy.uint8).

Warning

Attributes in HDF5 files are supposed to be small containers or simple scalars that provide extra information about the data stored on the main resource (dataset or group|directory). Attributes cannot be retrieved in chunks, contrary to data in datasets. Currently, no limit is imposed on the size of values stored in attributes.

Parameters:

name : str

The name of the attribute to set

value : numpy.ndarray or scalar

A simple scalar to set for the given attribute on the named resources path

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to set an attribute at
set_attributes(attributes[, path]) → None

Sets several attributes at the named resource using a dictionary

Each value in the dictionary should be a simple scalar (boolean, integer, float or complex number) or an array of those; only these types are supported for the time being. You can use numpy scalars to set values with arbitrary precision (e.g. numpy.uint8).

Warning

Attributes in HDF5 files are supposed to be small containers or simple scalars that provide extra information about the data stored on the main resource (dataset or group|directory). Attributes cannot be retrieved in chunks, contrary to data in datasets. Currently, no limit is imposed on the size of values stored in attributes.

Parameters:

attributes : {str: value}

A Python dictionary containing pairs of strings and values, which can be a numpy.ndarray or a scalar

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to set attributes at
sub_groups([relative][, recursive]) → groups

Lists groups (directories) in the current file

Parameters:

relative : bool

[Default: False] If set to True, the returned sub-groups are relative to the current working directory, otherwise they are absolute

recursive : bool

[Default: True] If set to False, the returned sub-groups are only the ones in the current directory, otherwise recurses down the directory structure

Returns:

groups : [str]

The list of directories (groups) inside this file

unlink(key) → None

Unlinks datasets inside the file, making them invisible

If a given path to an HDF5 dataset exists inside the file, unlinks it. Please note this will not remove the data from the file, just make it inaccessible. If you wish to clean up, save the reachable objects from this file to another HDF5File object using copy(), for example.

Parameters:

key : str

The dataset path to unlink
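
Why unlinking followed by copy() trims a file can be seen in a toy model where stored data and visible links are kept apart. FileModel below is a hypothetical stand-in written for this explanation; it does not use HDF5 or Bob at all.

```python
class FileModel:
    """Toy file: stored blocks and the visible links that point at them."""

    def __init__(self):
        self.storage = {}  # block id -> data physically kept in the file
        self.links = {}    # visible dataset path -> block id

    def write(self, key, data):
        block_id = len(self.storage)
        self.storage[block_id] = data
        self.links[key] = block_id

    def unlink(self, key):
        # Only the link goes away; the block stays in storage.
        del self.links[key]

    def copy(self, other):
        # Only reachable (still linked) content is copied.
        for key, block_id in self.links.items():
            other.write(key, self.storage[block_id])


source = FileModel()
source.write("/keep", [1, 2, 3])
source.write("/drop", [4, 5, 6])
source.unlink("/drop")

trimmed = FileModel()
source.copy(trimmed)
```

In this model the source still holds two stored blocks after the unlink, while the copy holds only the one that remained reachable.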
writable

bool <– Has this file been opened in writable mode?

write(path, data[, compression]) → None

Sets the scalar or array at position 0 to the given value

This method is equivalent to checking if the scalar or array at position 0 exists and then replacing it. If the path does not exist, we append the new scalar or array.

The data must be compatible with the typing information on the dataset, or an exception will be raised. You can also, optionally, set this to an iterable of scalars or arrays. This will cause this method to iterate over the elements and add each individually.

The compression parameter is effective when writing arrays. Set this to a number between 0 (default) and 9 (maximum) to compress the contents of this dataset. This setting is only effective if the dataset does not yet exist; otherwise, the previous setting is respected.

Note

The functions set() and write() are synonyms.

Parameters:

path : str

The path to the dataset to write data to; can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

data : numpy.ndarray or scalar

Object to write to the dataset

compression : int

A compression value between 0 and 9
bob.io.base.create_directories_safe(directory, dryrun=False)[source]

Creates a directory if it does not exist, with concurrent access support. This function will also create any parent directories that might be required. If the dryrun option is selected, it does not actually create the directory, but just writes the (Linux) command that would have been executed.

Parameters:

directory
: str
The directory that you want to create.
dryrun
: bool
Only print the command to console, but do not execute it.
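
A function with this contract can be approximated with the standard library alone. The sketch below is not the Bob implementation: create_directories_safe_sketch is a hypothetical name, the printed dry-run text is an assumption, and os.makedirs(..., exist_ok=True) is used here as one way to tolerate another process creating the same directory at the same time.

```python
import os
import tempfile


def create_directories_safe_sketch(directory, dryrun=False):
    """Create `directory` and its parents, tolerating concurrent creation."""
    if dryrun:
        # Only report what would be done.
        print("mkdir -p '%s'" % directory)
        return
    # exist_ok=True makes the call succeed if the directory already exists,
    # including when another process created it in the meantime.
    os.makedirs(directory, exist_ok=True)


# Usage on a throw-away location.
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "a", "b")
    create_directories_safe_sketch(target, dryrun=True)
    existed_after_dryrun = os.path.exists(target)
    create_directories_safe_sketch(target)
    create_directories_safe_sketch(target)  # second call is harmless
    existed_after_create = os.path.isdir(target)
```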
bob.io.base.load(inputs) → data[source]

Loads the contents of a file, an iterable of files, or an iterable of bob.io.base.File objects into a numpy.ndarray.

Parameters:

inputs : various types

This might represent several different entities:

  1. The name of a file (full path) from where to load the data. In this case, this assumes that the file contains an array and returns a loaded numpy ndarray.
  2. An iterable of filenames to be loaded in memory. In this case, each file is assumed to contain a single 1D sample or a set of 1D samples; they are loaded in memory and concatenated into a single 2D numpy.ndarray, which is returned.
  3. An iterable of File. In this case, each File is assumed to contain a single 1D sample or a set of 1D samples; they are loaded in memory if required and concatenated into a single 2D numpy.ndarray, which is returned.
  4. An iterable with mixed filenames and File. In this case, a 2D numpy.ndarray is returned, as described by points 2 and 3 above.

Returns:

data
: numpy.ndarray
The data loaded from the given inputs.
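
The stacking rule in points 2 to 4 can be modelled without Bob or NumPy. The sketch below is an illustration only: stack_samples is a hypothetical helper, and nested Python lists stand in for the arrays that the real function returns as a numpy.ndarray.

```python
def stack_samples(loaded):
    """Stack already-loaded inputs row by row into one 2D list.

    Each element of `loaded` is either one 1D sample (a flat list) or a
    set of 1D samples (a list of lists).
    """
    rows = []
    for item in loaded:
        if len(item) > 0 and isinstance(item[0], (list, tuple)):
            # A set of 1D samples: every sample becomes its own row.
            rows.extend(list(sample) for sample in item)
        else:
            # A single 1D sample: it becomes one row.
            rows.append(list(item))
    return rows


# One input holding a single sample, one holding a set of two samples.
stacked = stack_samples([[1, 2], [[3, 4], [5, 6]]])
```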
bob.io.base.merge(filenames) → files[source]

Converts an iterable of filenames into an iterable over read-only bob.io.base.File objects.

Parameters:

filenames : str or [str]

A list of file names. This might represent:

  1. A single filename. In this case, an iterable with a single File is returned.
  2. An iterable of filenames to be converted into an iterable of File objects.

Returns:

files
: [File]
The list of files.
bob.io.base.save(array, filename, create_directories=False)[source]

Saves the contents of an array-like object to file.

Effectively, this is the same as creating a File object with the mode flag set to 'w' (write with truncation) and calling File.write() passing array as parameter.

Parameters:

array
: array_like
The array-like object to be saved on the file
filename
: str
The name of the file where you need the contents saved to
create_directories
: bool
Automatically generate the directories if required (defaults to False because of compatibility reasons; might change in future to default to True)
bob.io.base.write(array, filename, create_directories=False)

Saves the contents of an array-like object to file.

Effectively, this is the same as creating a File object with the mode flag set to 'w' (write with truncation) and calling File.write() passing array as parameter.

Parameters:

array
: array_like
The array-like object to be saved on the file
filename
: str
The name of the file where you need the contents saved to
create_directories
: bool
Automatically generate the directories if required (defaults to False because of compatibility reasons; might change in future to default to True)
bob.io.base.read(inputs)

load(inputs) -> data

Loads the contents of a file, an iterable of files, or an iterable of bob.io.base.File objects into a numpy.ndarray.

Parameters:

inputs : various types

This might represent several different entities:

  1. The name of a file (full path) from where to load the data. In this case, this assumes that the file contains an array and returns a loaded numpy ndarray.
  2. An iterable of filenames to be loaded in memory. In this case, each file is assumed to contain a single 1D sample or a set of 1D samples; they are loaded in memory and concatenated into a single 2D numpy.ndarray, which is returned.
  3. An iterable of File. In this case, each File is assumed to contain a single 1D sample or a set of 1D samples; they are loaded in memory if required and concatenated into a single 2D numpy.ndarray, which is returned.
  4. An iterable with mixed filenames and File. In this case, a 2D numpy.ndarray is returned, as described by points 2 and 3 above.

Returns:

data
: numpy.ndarray
The data loaded from the given inputs.
bob.io.base.append(array, filename) → position[source]

Appends the contents of an array-like object to file.

Effectively, this is the same as creating a File object with the mode flag set to 'a' (append) and calling File.append() passing array as parameter.

Parameters:

array
: array_like
The array-like object to be saved on the file
filename
: str
The name of the file where you need the contents saved to

Returns:

position
: int
See File.append()
bob.io.base.peek(filename) → dtype, shape, stride[source]

Returns the type of array (frame or sample) saved in the given file.

Effectively, this is the same as creating a File object with the mode flag set to 'r' (read-only) and calling File.describe().

Parameters:

filename
: str
The name of the file to peek information from

Returns:

dtype, shape, stride : see File.describe()

bob.io.base.peek_all(filename) → dtype, shape, stride[source]

Returns the type of array (for full readouts) saved in the given file.

Effectively, this is the same as creating a File object with the mode flag set to 'r' (read-only) and returning File.describe with its parameter all set to True.

Parameters:

filename
: str
The name of the file to peek information from

Returns:

dtype, shape, stride : see File.describe()

bob.io.base.open

alias of File

bob.io.base.get_config()[source]

Returns a string containing the configuration information.

bob.io.base.get_include_directories() → includes[source]

Returns a list of include directories for dependent libraries, such as HDF5. This function is automatically used by bob.extension.get_bob_libraries() to retrieve the non-standard include directories that are required to use the C bindings of this library in dependent classes. You shouldn’t normally need to call this function by hand.

Returns:

includes
: [str]
The list of non-standard include directories required to use the C bindings of this class. For now, only the directory for the HDF5 headers is returned.
bob.io.base.get_macros() → macros[source]

Returns a list of preprocessor macros, such as (HAVE_HDF5, 1). This function is automatically used by bob.extension.get_bob_libraries() to retrieve the preprocessor definitions that are required to use the C bindings of this library in dependent classes. You shouldn’t normally need to call this function by hand.

Returns:

macros
: [(str,str)]
The list of preprocessor macros required to use the C bindings of this class. For now, only ('HAVE_HDF5', '1') is returned, when applicable.
bob.io.base.File_C

alias of File

bob.io.base.HDF5File_C

alias of HDF5File

bob.io.base.extensions() → extensions

Returns a dictionary containing all extensions and descriptions currently stored on the global codec registry

The extensions are returned as a dictionary from the filename extension to a description of the data format.

Returns:

extensions : {str : str}

A dictionary of supported extensions

Re-usable decorators and utilities for bob test code

bob.io.base.test_utils.datafile(f[, module][, path]) → filename[source]

Returns the test file on the “data” subdirectory of the current module.

Parameters:

f
: str
This is the filename of the file you want to retrieve. Something like 'movie.avi'.
module: str
[optional] This is the python-style package name of the module you want to retrieve the data from. This should be something like bob.io.base, but you normally refer to it using the __name__ property of the module you want to find the path relative to.
path: str
[Default: 'data'] The subdirectory where the datafile will be taken from inside the module. It can be set to None if it should be taken from the module path root (where the __init__.py file sits).

Returns:

filename
: str
The full path of the file
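
How the three parameters combine into a path can be sketched with importlib and os.path. This is not the library's implementation: datafile_sketch is a hypothetical helper, it requires the module name explicitly, and it locates the package through its __file__ attribute, which is an assumption about how the lookup could be done.

```python
import importlib
import os


def datafile_sketch(f, module, path="data"):
    """Build the full path of `f` inside `path`, relative to `module`."""
    mod = importlib.import_module(module)
    # Directory that contains the package's __init__.py (or the module file).
    root = os.path.dirname(os.path.abspath(mod.__file__))
    # path=None means "take the file from the module root".
    relative = f if path is None else os.path.join(path, f)
    return os.path.join(root, relative)


# The standard 'json' package is used only because it is always importable;
# the returned path does not need to exist for the sketch to work.
in_data_dir = datafile_sketch("movie.avi", "json")
in_module_root = datafile_sketch("movie.avi", "json", path=None)
```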
bob.io.base.test_utils.temporary_filename([prefix][, suffix]) → filename[source]

Generates a temporary filename to be used in tests, using the default temp directory (on Unix-like systems, usually /tmp). Please note that you are responsible for deleting the file after your test has finished. A common way to ensure the file is deleted is:

import os
import bob.io.base.test_utils
temp = bob.io.base.test_utils.temporary_filename()
try:
  # use the temp file
  ...
finally:
  # remove the file only if the test actually created it
  if os.path.exists(temp): os.remove(temp)

Parameters:

prefix
: str
[Default: 'bobtest_'] The file name prefix to be added in front of the random file name
suffix
: str
[Default: '.hdf5'] The file name extension of the temporary file name

Returns:

filename
: str
The name of a temporary file that you can use in your test. Don’t forget to delete!
bob.io.base.test_utils.extension_available(extension)[source]

Decorator to check if an extension is available before enabling a test

This decorator is mainly used to decorate a test function, in order to skip tests when the extension is not available. The syntax is:

import bob.io.base.test_utils

@bob.io.base.test_utils.extension_available('.ext')
def my_test():
  ...
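
For readers curious how a decorator of this kind can work, here is a minimal sketch built on the standard library. It is not the code shipped in bob.io.base.test_utils: extension_available_sketch and KNOWN_EXTENSIONS are hypothetical, the registry is a made-up stand-in for bob.io.base.extensions(), and raising unittest.SkipTest is an assumption about how the skip is signalled.

```python
import functools
import unittest

# Hypothetical registry standing in for bob.io.base.extensions().
KNOWN_EXTENSIONS = {".hdf5": "Hierarchical Data Format version 5"}


def extension_available_sketch(extension, registry=KNOWN_EXTENSIONS):
    """Skip the decorated test when `extension` is not in `registry`."""

    def decorator(test):
        @functools.wraps(test)
        def wrapper(*args, **kwargs):
            if extension not in registry:
                # Test runners report a raised SkipTest as a skipped test.
                raise unittest.SkipTest(
                    "extension %r is not available" % extension
                )
            return test(*args, **kwargs)

        return wrapper

    return decorator


@extension_available_sketch(".hdf5")
def test_supported():
    return "ran"


@extension_available_sketch(".unknown")
def test_unsupported():
    return "ran"
```

Calling test_supported() runs the body, while calling test_unsupported() raises unittest.SkipTest before the body is reached.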