Python API

This section includes information for using the pure Python API of bob.learn.linear.

Summary

Classes

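`bob.learn.linear.Machine`	A linear classifier, see C. M. Bishop, ‘Pattern Recognition and Machine Learning’, chapter 4 for more details.
`bob.learn.linear.PCATrainer`	Sets a linear machine to perform the Principal Component Analysis (PCA; a.k.a. Karhunen-Loeve Transform – KLT) on a given dataset.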
`bob.learn.linear.FisherLDATrainer`	Trains a `bob.learn.linear.Machine` to perform Fisher’s Linear Discriminant Analysis (LDA).
`bob.learn.linear.WCCNTrainer`	Trains a linear machine to perform Within-Class Covariance Normalization (WCCN).
`bob.learn.linear.WhiteningTrainer`	Trains a linear `bob.learn.linear.Machine` to perform Cholesky whitening.
`bob.learn.linear.CGLogRegTrainer`	Trains a linear machine to perform Linear Logistic Regression.
`bob.learn.linear.BICMachine`	This machine is designed to classify image difference vectors to be either intrapersonal or extrapersonal.
`bob.learn.linear.BICTrainer`	A trainer for a `bob.learn.linear.BICMachine`
`bob.learn.linear.GFKMachine`([hdf5])	Geodesic flow Kernel (GFK) Machine.
`bob.learn.linear.GFKTrainer`([...])	Trains the Geodesic Flow Kernel (GFK) that models the domain shift from a certain source linear subspace $P_S$ to a certain target linear subspace $P_T$ .

Functions

`bob.learn.linear.get_config`()	Returns a string containing the configuration information.
`bob.learn.linear.bic_intra_extra_pairs`(...)	Computes intra-class and extra-class pairs from given training data.
`bob.learn.linear.bic_intra_extra_pairs_between_factors`(...)	Computes intra-class and extra-class pairs from given training data, where only pairs between the first and second factors are considered.

Reference

bob.learn.linear.get_config(): Returns a string containing the configuration information.

class bob.learn.linear.BICMachine

Bases: object

This machine is designed to classify image difference vectors as either intrapersonal or extrapersonal.

There are two possible implementations of the BIC:

‘The Bayesian Intrapersonal/Extrapersonal Classifier’ from Teixeira [Teixeira2003]. A full projection of the data is performed. No prior for the classes has to be selected.
‘Face Detection and Recognition using Maximum Likelihood Classifiers on Gabor Graphs’ from Guenther and Wuertz [Guenther2009]. Only the mean and variance of the difference vectors are calculated. There is no subspace truncation and there are no priors.

Which kind of machine is used depends on how this class is trained via bob.learn.linear.BICTrainer.

[Teixeira2003]

Marcio Luis Teixeira. The Bayesian intrapersonal/extrapersonal classifier, Colorado State University, 2003.

[Guenther2009]

Manuel Guenther and Rolf P. Wuertz. Face detection and recognition using maximum likelihood classifiers on Gabor graphs, International Journal of Pattern Recognition and Artificial Intelligence, 23(3):433-461, 2009.

Constructor Documentation:

bob.learn.linear.BICMachine ([use_DFFS])

bob.learn.linear.BICMachine (bic)

bob.learn.linear.BICMachine (hdf5)

Creates a BIC Machine

Parameters:

use_DFFS : bool

[default: False] Use the Distance From Feature Space measure as described in [Teixeira2003]

bic : bob.learn.linear.BICMachine

Another machine to copy

hdf5 : bob.io.base.HDF5File

An HDF5 file open for reading

Class Members:

forward(input) → score

Computes the BIC or IEC score for the given input vector, which results from comparing two (facial) images

The resulting value is returned as a single float value. The score itself is the log-likelihood score of the given input vector belonging to the intrapersonal class.

Note

the __call__ method is an alias for this one

Parameters:

input : array_like (float, 1D)

The input vector, which is the result of comparing two (facial) images

Returns:

score : float

The log-likelihood that the given input belongs to the intrapersonal class

input_size: int <– The expected input dimensionality, read-only

is_similar_to(other[, r_epsilon][, a_epsilon]) → similar

Checks whether this BICMachine is approximately the same as the other one

The optional values r_epsilon and a_epsilon refer to the relative and absolute precision, similarly to numpy.allclose().

Parameters:

other : bob.learn.linear.BICMachine

The other BICMachine to compare with

r_epsilon : float

[Default: 1e-5] The relative precision

a_epsilon : float

[Default: 1e-8] The absolute precision

Returns:

similar : bool

True if the other machine is similar to this one, otherwise False
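The tolerance semantics can be illustrated with numpy.allclose itself, which the description above names as the model for r_epsilon and a_epsilon. The snippet below is a NumPy-only sketch; the arrays are made-up stand-ins for machine parameters and no bob.learn.linear call is involved.

```python
import numpy as np

# Hypothetical parameter vectors standing in for two machines' internals.
a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9   # differs far below the default tolerances
c = a + 1e-2   # differs well above them

# numpy.allclose tests |x - y| <= atol + rtol * |y| element-wise.
close_ab = np.allclose(a, b, rtol=1e-5, atol=1e-8)
close_ac = np.allclose(a, c, rtol=1e-5, atol=1e-8)
print(close_ab, close_ac)
```

Here rtol and atol play the roles that r_epsilon and a_epsilon are described as playing.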

load(hdf5) → None

Loads the BIC machine from the given HDF5 file

Parameters:

hdf5 : bob.io.base.HDF5File

An HDF5 file opened for reading

save(hdf5) → None

Saves the BIC machine to the given HDF5 file

Parameters:

hdf5 : bob.io.base.HDF5File

An HDF5 file open for writing

use_DFFS: bool <– Use the Distance From Feature Space during forwarding?

class bob.learn.linear.BICTrainer

Bases: object

A trainer for a bob.learn.linear.BICMachine

It trains either a BIC model (including projection matrix and eigenvalues) [Teixeira2003] or an IEC model (containing mean and variance only) [Guenther2009]. See bob.learn.linear.BICMachine for more details.

Constructor Documentation:

bob.learn.linear.BICTrainer ()

bob.learn.linear.BICTrainer (intra_dim, extra_dim)

Creates a BIC Trainer

There are two ways of creating a BIC trainer. When you specify the intra_dim and extra_dim subspaces, a BIC model will be created, otherwise an IEC model is created.

Parameters:

intra_dim : int

The subspace dimensionality of the intrapersonal class

extra_dim : int

The subspace dimensionality of the extrapersonal class

Class Members:

train(intra_differences, extra_differences[, machine]) → machine

Trains the given machine to classify intrapersonal (image) difference vectors vs. extrapersonal ones

The given difference vectors might be the result of any (image) comparison function, e.g., the pixel difference of two images. In any case, all distance vectors must have the same length.

Parameters:

intra_differences : array_like (float, 2D)

The input vectors, which are the result of intrapersonal (facial image) comparisons, in shape (#vectors, length)

extra_differences : array_like (float, 2D)

The input vectors, which are the result of extrapersonal (facial image) comparisons, in shape (#vectors, length)

machine : bob.learn.linear.BICMachine

The machine to be trained

Returns:

machine : bob.learn.linear.BICMachine

A newly generated and trained BIC machine, where the bob.learn.linear.BICMachine.use_DFFS flag is set to False
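To make the IEC idea more concrete (only the mean and variance of each class of difference vectors are kept), the following NumPy sketch scores a vector with a diagonal-Gaussian log-likelihood ratio. This is an assumption made for illustration: the exact score computed by BICMachine is not spelled out here, and the helper names and synthetic data are hypothetical.

```python
import numpy as np

def fit_diag_gaussian(differences):
    """Per-dimension mean and variance of difference vectors (one per row)."""
    return differences.mean(axis=0), differences.var(axis=0, ddof=1)

def log_likelihood(x, mean, var):
    """Log-density of x under an axis-aligned Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

rng = np.random.default_rng(0)
# Synthetic data: intrapersonal differences are small, extrapersonal ones large.
intra = rng.normal(0.0, 0.5, size=(200, 4))
extra = rng.normal(0.0, 3.0, size=(200, 4))

intra_model = fit_diag_gaussian(intra)
extra_model = fit_diag_gaussian(extra)

def score(x):
    # Positive values favour the intrapersonal class.
    return log_likelihood(x, *intra_model) - log_likelihood(x, *extra_model)

small_difference = np.full(4, 0.1)
large_difference = np.full(4, 5.0)
print(score(small_difference) > 0, score(large_difference) < 0)
```

Both arrays have one difference vector per row and the same vector length, matching the layout expected by train().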

class bob.learn.linear.CGLogRegTrainer

Bases: object

Trains a linear machine to perform Linear Logistic Regression

The training stage will place the resulting weights (and bias) in a linear machine with a single output dimension. For details about Linear Logistic Regression, please see:

A comparison of numerical optimizers for logistic regression, T. Minka, (See Microsoft Research paper)
FoCal, https://sites.google.com/site/nikobrummer/focal

Constructor Documentation:

bob.learn.linear.CGLogRegTrainer ([prior], [convergence_threshold], [max_iterations], [reg], [mean_std_norm])

bob.learn.linear.CGLogRegTrainer (other)

Creates a new trainer to perform Linear Logistic Regression

There are two initializers for objects of this class. In the first variant, the user passes the discrete training parameters, including the class prior, the convergence threshold and the maximum number of conjugate gradient (CG) iterations, among other parameters. If mean_std_norm is set to True, your input data will be mean/standard-deviation normalized and the corresponding values will be set as normalization factors in the resulting machine. The second initialization form copy-constructs a new trainer from an existing one.

Parameters:

prior : float

[Default: 0.5] The synthetic prior (should be in range $]0.,1.[$ )

convergence_threshold : float

[Default: 1e-5] The convergence threshold for the conjugate gradient algorithm

max_iterations : int

[Default: 10000] The maximum number of iterations for the conjugate gradient algorithm

reg : float

[Default: 0.] The regularization factor lambda. If you set this to the value of 0., then the algorithm will apply no regularization whatsoever

mean_std_norm : bool

[Default: False] Performs mean and standard-deviation normalization (whitening) of the input data before training the (resulting) bob.learn.linear.Machine. Setting this to True is recommended for large data sets with significant amplitude variations between dimensions

other : CGLogRegTrainer

If you decide to copy construct from another object of the same type, pass it using this parameter

Class Members:

convergence_threshold: float <– The convergence threshold for the conjugate gradient algorithm

max_iterations: int <– The maximum number of iterations for the conjugate gradient algorithm

mean_std_norm

bool <– Perform whitening on input data?

If set to True, performs mean and standard-deviation normalization (whitening) of the input data before training the (resulting) Machine. Setting this to True is recommended for large data sets with significant amplitude variations between dimensions
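The normalization described above can be sketched in NumPy as follows. This is an illustration only: whether the trainer uses the biased or unbiased standard deviation is not stated here, so ddof=1 is an assumption, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 samples (rows), 3 features (columns) with very different scales.
X = rng.normal(loc=[0.0, 50.0, -3.0], scale=[1.0, 100.0, 0.01], size=(100, 3))

mean = X.mean(axis=0)         # would play the role of input_subtract
std = X.std(axis=0, ddof=1)   # would play the role of input_divide
X_norm = (X - mean) / std     # every feature now has zero mean, unit deviation
print(X_norm.mean(axis=0), X_norm.std(axis=0, ddof=1))
```

After this step every feature contributes on a comparable scale, which is why the option is recommended when amplitudes vary strongly between dimensions.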

prior: float <– The synthetic prior (should be in range $]0.,1.[$ )

reg

float <– The regularization factor lambda

If you set this to the value of 0., the algorithm will apply no regularization whatsoever.

train(negatives, positives[, machine]) → machine

Trains a linear machine to perform linear logistic regression

The resulting machine will have the same number of inputs as columns in negatives and positives and a single output. This method always returns a machine, which will be identical to the one provided (if the user passed one) or a new one allocated internally.

Parameters:

negatives, positives : array_like(2D, float)

negatives and positives should be arrays organized in such a way that every row corresponds to a new observation of the phenomena (i.e., a new sample) and every column corresponds to a different feature

machine : bob.learn.linear.Machine

The user may provide or not a machine that will be set by this method. If provided, the machine should have 1 output and the number of inputs matching the number of columns in the input data arrays

Returns:

machine : bob.learn.linear.Machine

The trained linear machine; identical to the machine parameter, if given
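The data layout and the shape of the result can be illustrated with a small NumPy sketch. It uses plain gradient descent rather than the conjugate gradient method of this trainer and ignores the synthetic prior, so it is a simplified stand-in under those assumptions. The function name train_logreg and its lr and n_iter parameters are hypothetical.

```python
import numpy as np

def train_logreg(negatives, positives, reg=0.0, lr=0.1, n_iter=2000):
    """Fit weights (n_features, 1) and a scalar bias by gradient descent."""
    X = np.vstack([negatives, positives])
    y = np.concatenate([np.zeros(len(negatives)), np.ones(len(positives))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # predicted P(positive)
        grad_w = X.T @ (p - y) / len(y) + reg * w   # L2 penalty weighted by reg
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w.reshape(-1, 1), b

rng = np.random.default_rng(0)
negatives = rng.normal(-2.0, 1.0, size=(100, 2))   # rows: samples, cols: features
positives = rng.normal(+2.0, 1.0, size=(100, 2))
weights, bias = train_logreg(negatives, positives)

# Positive scores should indicate the positive class.
neg_scores = negatives @ weights + bias
pos_scores = positives @ weights + bias
accuracy = (np.sum(neg_scores < 0) + np.sum(pos_scores > 0)) / 200.0
print(weights.shape, accuracy)
```

The weights have as many rows as there are feature columns and a single output column, mirroring the single-output machine described above.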

class bob.learn.linear.FisherLDATrainer

Bases: object

Trains a bob.learn.linear.Machine to perform Fisher’s Linear Discriminant Analysis (LDA).

LDA finds the projection matrix W that allows us to linearly project the data matrix X to another (sub) space in which the between-class and within-class variances are jointly optimized: the between-class variance is maximized while the within-class variance is minimized. The (inverse) cost function for this criterion can be posed as follows:

$J(W) = \frac{W^T S_b W}{W^T S_w W}$

where:

$W$

the transformation matrix that converts X into the LD space

$S_b$

the between-class scatter; it has dimensions (X.shape[0], X.shape[0]) and is defined as $S_b = \sum_{k=1}^K N_k (m_k-m)(m_k-m)^T$ , with $K$ equal to the number of classes.

$S_w$

the within-class scatter; it also has dimensions (X.shape[0], X.shape[0]) and is defined as $S_w = \sum_{k=1}^K \sum_{n \in C_k} (x_n-m_k)(x_n-m_k)^T$ , with $K$ equal to the number of classes and $C_k$ a set representing all samples for class $k$ .

$m_k$

the class k empirical mean, defined as $m_k = \frac{1}{N_k}\sum_{n \in C_k} x_n$

$m$

the overall set empirical mean, defined as $m = \frac{1}{N}\sum_{n=1}^N x_n = \frac{1}{N}\sum_{k=1}^K N_k m_k$

Note

A scatter matrix equals the covariance matrix if we remove the division factor.

Because this cost function is convex, you can just find its maximum by solving $dJ/dW = 0$ . This problem can be re-formulated as finding the eigen-values ( $\lambda_i$ ) that solve the following condition:

$\begin{aligned} S_b &= \lambda_i S_w \text{ or} \\ (S_b - \lambda_i S_w) &= 0 \end{aligned}$

The respective eigen-vectors that correspond to the eigen-values $\lambda_i$ form W.
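The definitions of $S_b$ , $S_w$ and the eigen-problem can be traced in a few lines of NumPy. The sketch below follows the pseudo-inverse route (eigen-decomposition of $S_w^{-1} S_b$ ) on synthetic data. It is not the trainer's implementation, and the function name lda_pinv is hypothetical.

```python
import numpy as np

def lda_pinv(classes):
    """classes: list of 2D arrays, one per class, samples in rows."""
    all_data = np.vstack(classes)
    m = all_data.mean(axis=0)                 # overall mean
    n_features = all_data.shape[1]
    S_b = np.zeros((n_features, n_features))
    S_w = np.zeros((n_features, n_features))
    for X_k in classes:
        m_k = X_k.mean(axis=0)                # class mean
        d = (m_k - m)[:, None]
        S_b += len(X_k) * (d @ d.T)           # N_k (m_k - m)(m_k - m)^T
        centered = X_k - m_k
        S_w += centered.T @ centered          # sum of (x_n - m_k)(x_n - m_k)^T
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]    # decreasing eigen-value
    return eigvals.real[order], eigvecs.real[:, order]

rng = np.random.default_rng(0)
means = ([0.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0])
classes = [rng.normal(loc=mu, scale=1.0, size=(50, 4)) for mu in means]
eigvals, W = lda_pinv(classes)

# With K = 3 classes, at most K - 1 = 2 eigen-values are non-negligible.
n_significant = int(np.sum(eigvals > 1e-8 * eigvals[0]))
print(n_significant)
```

The remaining eigen-values are zero up to rounding, which is what limits the useful output to $K-1$ dimensions.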

Constructor Documentation:

bob.learn.linear.FisherLDATrainer ([use_pinv, strip_to_rank])

bob.learn.linear.FisherLDATrainer (other)

Constructs a new FisherLDATrainer

Objects of this class can be initialized in two ways. In the first variant, the user creates a new trainer from discrete flags indicating a couple of optional parameters. If use_pinv is set to True, the trainer uses the pseudo-inverse to calculate $S_w^{-1} S_b$ and then performs an eigen-value decomposition (using LAPACK’s dgeev) instead of using (the more numerically stable) LAPACK’s dsygvd to solve the generalized symmetric-definite eigen-problem of the form $S_b v=(\lambda) S_w v$ .

Note

Using the pseudo-inverse for LDA is only recommended if you cannot make it work using the default method (via dsygvd). It is slower and requires more machine memory to store partial values of the pseudo-inverse and the dot product $S_w^{-1} S_b$ .

strip_to_rank specifies how to calculate the final size of the to-be-trained bob.learn.linear.Machine. The default setting (True) makes the trainer return only the K-1 eigen-values/vectors, limiting the output to the rank of $S_w^{-1} S_b$ . If you set this value to False, then it returns all eigen-values/vectors of $S_w^{-1} S_b$ , including the ones that are supposed to be zero.

The second initialization variant allows the user to deep copy an object of the same type creating a new identical object.

Parameters:

use_pinv : bool

[Default: False] use the pseudo-inverse to calculate $S_w^{-1} S_b$ ?

strip_to_rank : bool

[Default: True] return only the non-zero eigen-values/vectors

other : FisherLDATrainer

The trainer to copy-construct

Class Members:

output_size(X) → size

Returns the expected size of the output (or the number of eigen-values returned) given the data

This number could be either $K-1$ (where $K$ is number of classes) or the number of columns (features) in X, depending on the setting of strip_to_rank. This method should be used to setup linear machines and input vectors prior to feeding them into this trainer.

The value of X should be a sequence over as many 2D 64-bit floating point number arrays as classes in the problem. All arrays will be checked for conformance (identical number of columns). To accomplish this, either prepare a list with all your class observations organized in 2D arrays or pass a 3D array in which the first dimension (depth) contains as many elements as classes you want to discriminate.

Parameters:

X : [array_like(2D, floats)] or array_like(3D, floats)

The input data, separated to contain the training data per class in the first dimension

Returns:

size : int

The number of eigen-vectors/values that will be created in a call to train(), given the same input data X

strip_to_rank

bool <– Only return the non-zero eigen-values/vectors?

If True, strip the resulting LDA projection matrix to keep only the eigen-vectors with non-zero eigenvalues. Otherwise the full projection matrix is returned.

train(X[, machine]) → machine, eigen_values

Trains a given machine to perform Fisher/LDA discrimination

After this method has been called, an input machine (or one allocated internally) will have the eigen-vectors of the $S_w^{-1} S_b$ product, arranged by decreasing energy. Each input data set represents data from a given input class. This method also returns the eigen-values allowing you to implement your own compression scheme.

The user may provide or not an object of type bob.learn.linear.Machine that will be set by this method. If provided, machine should have the correct number of inputs and outputs matching, respectively, the number of columns in the input data arrays X and the output of the method output_size().

The value of X should be a sequence over as many 2D 64-bit floating point number arrays as classes in the problem. All arrays will be checked for conformance (identical number of columns). To accomplish this, either prepare a list with all your class observations organized in 2D arrays or pass a 3D array in which the first dimension (depth) contains as many elements as classes you want to discriminate.

Note

We set at most output_size() eigen-values and vectors on the passed machine. You can compress the machine output further using Machine.resize() if necessary.

Parameters:

X : [array_like(2D, floats)] or array_like(3D, floats)

The input data, separated to contain the training data per class in the first dimension

machine : bob.learn.linear.Machine

The machine to be trained; this machine will be returned by this function

Returns:

machine : bob.learn.linear.Machine

The machine that has been trained; if given, identical to the machine parameter

eigen_values : array_like(1D, floats)

The eigen-values of the LDA projection.

use_pinv

bool <– Use the pseudo-inverse?

If True, use the pseudo-inverse to calculate $S_w^{-1} S_b$ and then perform the eigen-value decomposition (using LAPACK’s dgeev) instead of using (the more numerically stable) LAPACK’s dsygvd to solve the generalized symmetric-definite eigen-problem of the form $S_b v=(\lambda) S_w v$ .

class bob.learn.linear.GFKMachine(hdf5=None)

Bases: object

Geodesic flow Kernel (GFK) Machine.

This is the output of the bob.learn.linear.GFKTrainer

compute_binetcouchy_distance()

Compute the Binet-Cauchy distance between the source ( $P_s$ ) and target ( $P_t$ ) subspaces on a Grassmann manifold, which is defined as follows:

$d(P_s, P_t) = 1 - (det(P_{s}^{T} * P_{t}^{T}))^{2}$

compute_principal_angles()

Compute the principal angles $\theta_i$ between the source ( $P_s$ ) and target ( $P_t$ ) subspaces on a Grassmann manifold, from which the squared distance is defined as follows:

$d^{2}(P_s, P_t) = \sum_{i}( \theta_i^{2} )$
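Both quantities can be reproduced with NumPy for a small example. The sketch below assumes that $P_s$ and $P_t$ hold orthonormal basis vectors in their columns and have the same number of columns, and it uses the common form $1 - \det(P_s^T P_t)^2$ for the Binet-Cauchy distance. It is not the GFKMachine code, and the function name principal_angles is hypothetical.

```python
import numpy as np

def principal_angles(Ps, Pt):
    """Principal angles between the column spaces of Ps and Pt.

    Both inputs are assumed to have orthonormal columns.
    """
    cosines = np.linalg.svd(Ps.T @ Pt, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Two planes in R^3 that share the x axis and are tilted by 30 degrees.
angle = np.deg2rad(30.0)
Ps = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Pt = np.array([[1.0, 0.0], [0.0, np.cos(angle)], [0.0, np.sin(angle)]])

thetas = principal_angles(Ps, Pt)
squared_distance = np.sum(thetas ** 2)               # d^2 = sum_i theta_i^2
binet_cauchy = 1.0 - np.linalg.det(Ps.T @ Pt) ** 2   # 1 - cos^2(30 deg)
print(np.rad2deg(thetas), squared_distance, binet_cauchy)
```

The singular values of $P_s^T P_t$ are the cosines of the principal angles, so one angle is zero (the shared axis) and the other equals the tilt.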

load(hdf5)

Loads the machine from the given HDF5 file

Parameters

hdf5: bob.io.base.HDF5File

An HDF5 file opened for reading

save(hdf5)

Saves the machine to the given HDF5 file

Parameters

hdf5: bob.io.base.HDF5File

An HDF5 file opened for writing

shape()

A tuple that represents the shape of the kernel matrix

Returns: (int, int) <– The size of the weights matrix

class bob.learn.linear.GFKTrainer(number_of_subspaces=-1, subspace_dim_source=0.99, subspace_dim_target=0.99, eps=1e-20)

Bases: object

Trains the Geodesic Flow Kernel (GFK) that models the domain shift from a certain source linear subspace $P_S$ to a certain target linear subspace $P_T$ .

GFK models the source domain and the target domain with d-dimensional linear subspaces and embeds them onto a Grassmann manifold. Specifically, let $P_S, P_T$ denote the bases of the PCA subspaces for the two domains, respectively. The Grassmann manifold $G(d,D)$ is the collection of all d-dimensional subspaces of the feature vector space $\mathbb{R}^D$ .

The geodesic flow $\phi(t)$ between $P_S, P_T$ on the manifold parameterizes a path connecting the two subspaces. In the beginning of the flow, the subspace is similar to that of the source domain and in the end of the flow, the subspace is similar to that of the target. The original feature $x$ is projected into these subspaces and forms a feature vector of infinite dimensions:

$z^{\infty} = \phi(t)^T x: t \in [0, 1]$ .

Using the new feature representation for learning forces the classifier NOT to lean towards either the source domain or the target domain; in other words, it forces the classifier to use domain-invariant features. The infinite-dimensional feature vectors are handled conveniently through their inner product, which gives rise to a positive semidefinite kernel defined on the original features,

$G(x_i, x_j) = x_{i}^T \int_0^1 \! \phi(t)\phi(t)^T \, \mathrm{d}t x_{j} = x_i^T G x_j$ .

The matrix G can be computed efficiently using singular value decomposition. Moreover, computing the kernel does not require any labeled data.

More details can be found in:

Gong, Boqing, et al. “Geodesic flow kernel for unsupervised domain adaptation.” Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

A very good intuition can be found in: http://www-scf.usc.edu/~boqinggo/domainadaptation.html#gfk_section

Constructor Documentation:

bob.learn.linear.GFKTrainer (number_of_subspaces, subspace_dim_source, subspace_dim_target, eps)

Parameters

number_of_subspaces: int

Number of subspaces for the transfer learning. If set to -1, this value will be estimated automatically. For more information, see Section 3.4 of the paper.

subspace_dim_source: float

Energy kept in the source linear subspace

subspace_dim_target: float

Energy kept in the target linear subspace

eps: float

Floor value
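One plausible reading of "energy kept" is the fraction of the total eigen-value mass retained by the leading PCA components. The sketch below turns such a fraction into a subspace dimension; this interpretation is an assumption, and the function name components_for_energy and the spectrum are made up for illustration.

```python
import numpy as np

def components_for_energy(eigenvalues, energy=0.99):
    """Smallest number of leading components whose eigen-values reach `energy`."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    # First index at which the cumulative fraction reaches the target.
    return int(np.searchsorted(cumulative, energy) + 1)

eigenvalues = [60.0, 25.0, 10.0, 4.5, 0.5]   # made-up spectrum, sums to 100
d_99 = components_for_energy(eigenvalues, 0.99)
d_90 = components_for_energy(eigenvalues, 0.90)
print(d_99, d_90)
```

With the default of 0.99, four of the five components are kept; lowering the target to 0.90 keeps three.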

get_best_d(Ps, Pt, Pst)

Get the best value for the number of subspaces

For more details, read section 3.4 of the paper.

Parameters

Ps: Source subspace

Pt: Target subspace

Pst: Source + Target subspace

train(source_data, target_data, norm_inputs=True)

Trains the GFK (bob.learn.linear.GFKMachine)

Parameters

source_data: numpy.array(): Data from the source domain
target_data: numpy.array(): Data from the target domain

Returns

machine: bob.learn.linear.GFKMachine

class bob.learn.linear.Machine

Bases: object

A linear classifier, see C. M. Bishop, ‘Pattern Recognition and Machine Learning’, chapter 4 for more details.

The basic matrix operation performed for projecting the input to the output is: $o = w \times i$ (with $w$ being the vector of machine weights and $i$ the input data vector). The weights matrix is therefore organized column-wise. In this scheme, each column of the weights matrix can be interpreted as a vector onto which the input is projected. The number of columns of the weights matrix determines the number of outputs this linear machine will have. The number of rows is the number of inputs it can process.

Input and output is always performed on 1D arrays with 64-bit floating point numbers.

Constructor Documentation:

bob.learn.linear.Machine ([input_size], [output_size])

bob.learn.linear.Machine (weights)

bob.learn.linear.Machine (config)

bob.learn.linear.Machine (other)

Creates a new linear machine

A linear machine can be constructed in different ways. In the first form, the user specifies optional input and output vector sizes; the machine remains uninitialized. In the second form, the user passes a 2D array of 64-bit floats to be used as the weights matrix of the new machine. In the third form, the user passes a bob.io.base.HDF5File opened for reading, which points to the machine information to be loaded into memory. Finally, in the last form (copy constructor), the user passes another bob.learn.linear.Machine that will be deep copied.

Parameters:

input_size : int

[Default: 0] The dimensionality of the input data that should be projected

output_size : int

[Default: 0] The dimensionality of the output data

weights : array_like(2D, float)

A weight matrix to initialize the weights

config : bob.io.base.HDF5File

The HDF5 file open for reading

other : bob.learn.linear.Machine

The machine to copy construct

Class Members:

activation

bob.learn.activation.Activation or one of its derivatives <– The activation function

By default, the activation function is the bob.learn.activation.Identity function.

biases

array_like(1D, float) <– Bias to the output units of this linear machine

These values will be added to the output before the activation is applied. Must have the same size as shape[1]

forward(input[, output]) → output

Projects input through its internal weights and biases

The input (and output) arrays can be either 1D or 2D 64-bit float arrays. If one provides a 1D array, the output array, if provided, should also be 1D, matching the output size of this machine. If one provides a 2D array, it is considered a set of vertically stacked 1D arrays (one input per row) and a 2D array is produced or expected in output. The output array in this case shall have the same number of rows as the input array and as many columns as the output size for this machine.

Note

The __call__ method is an alias for this method.

Parameters:

input : array_like(1D or 2D, float)

The array that should be projected; must be compatible with shape[0]

output : array_like(1D or 2D, float)

The output array that will be filled. If given, must be compatible with input and shape[1]

Returns:

output : array_like(1D or 2D, float)

The projected data; identical to the output parameter, if given

input_divide

array_like(1D, float) <– Input division factor

The input data will be divided by input_divide before being fed through the weights matrix. The division is applied just after the subtraction. Must have the same size as shape[0]. By default, it is set to 1.

input_subtract

array_like(1D, float) <– Input subtraction factor

These values will be subtracted before feeding data through the weights matrix. Must have the same size as shape[0]. By default, it is set to 0.

is_similar_to(other[, r_epsilon][, a_epsilon]) → similar

Checks whether this machine is approximately the same as the other one

The optional values r_epsilon and a_epsilon refer to the relative and absolute precision for the weights, biases and any other values internal to this machine.

Parameters:

other : bob.learn.linear.Machine

The other machine to compare with

r_epsilon : float

[Default: 1e-5] The relative precision

a_epsilon : float

[Default: 1e-8] The absolute precision

Returns:

similar : bool

True if the other machine is similar to this one, otherwise False

load(hdf5) → None

Loads the machine from the given HDF5 file

Parameters:

hdf5 : bob.io.base.HDF5File

An HDF5 file opened for reading

resize(input, output) → None

Resizes the machine

If either the input or output increases in size, the weights and other factors should be considered uninitialized. If the size is preserved or reduced, already initialized values will not be changed.

Note

Use this method to force data compression. This works as long as the most relevant factors to be preserved are organized at the top of the weight matrix; reducing the system size then suppresses the less relevant projections.
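The effect of shrinking the output size can be pictured with plain NumPy slicing. The sketch assumes that reducing the output dimension keeps the leading columns of the weight matrix (for example, after a PCA whose columns are sorted by decreasing eigen-value); the weights and input are made up.

```python
import numpy as np

# Hypothetical trained weights: 4 inputs, 3 outputs, most relevant column first.
weights = np.array([[0.9, 0.1, 0.01],
                    [0.3, 0.8, 0.02],
                    [0.2, 0.5, 0.03],
                    [0.1, 0.2, 0.04]])
x = np.array([1.0, 2.0, 3.0, 4.0])

full = x @ weights                # 3 output values
compressed = x @ weights[:, :2]   # keep only the 2 leading projections
print(full, compressed)
```

The compressed output equals the first two entries of the full output, so the retained projections are unchanged and only the least relevant one is dropped.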

Parameters:

input : int

The input dimension to be set

output : int

The output dimension to be set

save(hdf5) → None

Saves the machine to the given HDF5 file

Parameters:

hdf5 : bob.io.base.HDF5File

An HDF5 file open for writing

shape

(int, int) <– The size of the weights matrix

A tuple that represents the size of the input vector followed by the size of the output vector in the format (input, output).

weights

array_like(2D, float) <– Weight matrix onto which the input is projected

The result of the projection is subject to bias and activation before being output

class bob.learn.linear.PCATrainer
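Taken together, the members documented for this class describe the following computation, written here as a NumPy stand-in rather than the machine itself. The order of operations (subtract, divide, project, add bias, apply activation) follows the member descriptions; the function name and the numbers are hypothetical, and the identity activation is used by default.

```python
import numpy as np

def forward(x, weights, biases, input_subtract, input_divide,
            activation=lambda v: v):
    """NumPy stand-in for the projection described for Machine.forward."""
    normalized = (x - input_subtract) / input_divide   # subtraction, then division
    return activation(normalized @ weights + biases)   # bias added before activation

weights = np.array([[1.0, 0.0],
                    [0.0, 2.0],
                    [1.0, 1.0]])   # shape (input, output) = (3, 2)
biases = np.array([0.5, -0.5])
input_subtract = np.array([1.0, 1.0, 1.0])
input_divide = np.array([2.0, 2.0, 2.0])

# A 1D input gives a 1D output; a 2D input is treated as one sample per row.
single = forward(np.array([3.0, 5.0, 7.0]), weights, biases,
                 input_subtract, input_divide)
batch = forward(np.ones((4, 3)), weights, biases, input_subtract, input_divide)
print(single, batch.shape)
```

The batch call shows the 2D case: four stacked inputs yield an output with four rows and as many columns as the machine has outputs.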

Bases: object

Sets a linear machine to perform the Principal Component Analysis (PCA; a.k.a. Karhunen-Loeve Transform – KLT) on a given dataset using either Singular Value Decomposition (SVD, the default) or the Covariance Matrix Method

The training stage will place the resulting principal components in the linear machine and set it up to extract the variable means automatically. As an option, you may preset the trainer so that the normalization performed by the resulting linear machine also divides the variables by the standard deviation of each variable ensemble. The principal components correspond to the directions in which the data points are maximally spread.

Computing these principal components is equivalent to computing the eigen-vectors $U$ for the covariance matrix $\Sigma$ extracted from the data matrix $X$ . The covariance matrix for the data is computed using the equation below:

$\begin{aligned} \Sigma &= \frac{(X-\mu_X)^T(X-\mu_X)}{m-1} \text{ with}\\ \mu_X &= \frac{1}{m}\sum_{i=1}^{m} x_i \end{aligned}$

where $m$ is the number of rows in $X$ (that is, the number of samples).

Once you are in possession of $\Sigma$ , it suffices to compute the eigen-vectors $U$ , solving the linear equation:

$(\Sigma - e I) U = 0$

In this trainer, we make use of LAPACK’s dsyevd to solve the above equation, if you choose to use the Covariance Method for extracting the principal components of your data matrix $X$ .

By default though, this class will perform PC extraction using Singular Value Decomposition (SVD). SVD is a factorization technique that allows for the decomposition of a matrix $X$ , with size (m,n) into 3 other matrices in this way:

$X = U S V^*$

where:

$U$

unitary matrix of size (m,m) - a.k.a., left singular vectors of $X$

$S$

rectangular diagonal matrix with nonnegative real numbers, size (m,n)

$V^*$

(the conjugate transpose of $V$ ) unitary matrix of size (n,n), a.k.a. right singular vectors of $X$

We can use this property to avoid the computation of the covariance matrix of the data matrix $X$ , if we note the following:

$\begin{aligned} X &= U S V^* \text{ , so} \\ XX^T &= U S V^* V S U^*\\ XX^T &= U S^2 U^* \end{aligned}$

If $X$ has zero mean, we can conclude by inspection that the $U$ matrix obtained by SVD contains the eigen-vectors of the covariance matrix of $X$ ( $XX^T$ ) and $S^2/(m-1)$ corresponds to its eigen values.

Note

Our implementation uses LAPACK’s dgesdd to compute the solution to this linear equation.

The corresponding bob.learn.linear.Machine and returned eigen-values of $\Sigma$ , are pre-sorted in descending order (the first eigen-vector - or column - of the weight matrix in the bob.learn.linear.Machine corresponds to the highest eigen-value obtained).
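The equivalence between the two methods can be checked numerically with NumPy. The sketch below is an illustration on synthetic data, not the trainer's code. Note that with one sample per row of $X$ , the covariance is formed as $X^T X/(m-1)$ after mean removal, and the principal axes then appear as the right singular vectors of the centered data.

```python
import numpy as np

rng = np.random.default_rng(0)
# 20 samples (rows), 5 features (columns) with clearly different spreads.
X = rng.normal(size=(20, 5)) @ np.diag([5.0, 3.0, 2.0, 1.0, 0.5])
m = X.shape[0]
Xc = X - X.mean(axis=0)                      # remove the per-feature mean

# Covariance method: eigen-decomposition of the symmetric matrix Sigma.
Sigma = Xc.T @ Xc / (m - 1)
cov_vals, cov_vecs = np.linalg.eigh(Sigma)   # eigh returns ascending order
cov_vals, cov_vecs = cov_vals[::-1], cov_vecs[:, ::-1]

# SVD method: singular values come back sorted in descending order.
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = S ** 2 / (m - 1)

# Eigen-vectors agree up to sign, so compare absolute dot products.
alignment = np.abs(np.sum(cov_vecs * Vt.T, axis=0))
print(cov_vals, svd_vals, alignment)
```

Both routes give the same eigen-values in descending order, and each pair of eigen-vectors differs at most by sign.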

Note

One question you should pose yourself is which of the methods to choose. Here is some advice: you should prefer the covariance method over SVD when the number of samples (rows of $X$ ) is greater than the number of features (columns of $X$ ). It provides a faster execution path in that case. Otherwise, use the default SVD method.

References:

Eigenfaces for Recognition, Turk & Pentland, Journal of Cognitive Neuroscience (1991) Volume: 3, Issue: 1, Publisher: MIT Press, Pages: 71-86
http://en.wikipedia.org/wiki/Singular_value_decomposition
http://en.wikipedia.org/wiki/Principal_component_analysis
http://www.netlib.org/lapack/double/dsyevd.f
http://www.netlib.org/lapack/double/dgesdd.f

Constructor Documentation:

bob.learn.linear.PCATrainer ([use_svd])

bob.learn.linear.PCATrainer (other)

Constructs a new PCA trainer

There are two initializers for objects of this class. In the first variant, the user can pass a flag indicating if the trainer should use SVD (default) or the covariance method for PCA extraction. The second initialization form copy constructs a new trainer from an existing one.

Parameters:

use_svd : bool

[Default: True] Use SVD for computing the PCA?

other : PCATrainer

The trainer to copy-construct

Class Members:

output_size(X) → size

Calculates the maximum possible rank for the covariance matrix of the given X

Returns the maximum number of non-zero eigen-values that can be generated by this trainer, given X. This number (K) depends on the size of X and is calculated as follows: $K=\min{(S-1,F)}$ , with $S$ being the number of rows in X (samples) and $F$ the number of columns (or features).

This method should be used to setup linear machines and input vectors prior to feeding them into the train() function.

Parameters:

X : array_like(2D, floats)

The input data that should be trained on

Returns:

size : int

The number of eigen-vectors/values that will be created in a call to train(), given the same input data X
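The bound $K=\min{(S-1,F)}$ reflects the rank of the mean-centered data, which can be verified with NumPy on random matrices. The helper names below are hypothetical and the snippet does not call the trainer.

```python
import numpy as np

def max_pca_rank(X):
    """K = min(S - 1, F) for S samples (rows) and F features (columns)."""
    samples, features = X.shape
    return min(samples - 1, features)

def centered_rank(X):
    """Numerical rank of X after removing the per-feature mean."""
    return int(np.linalg.matrix_rank(X - X.mean(axis=0)))

rng = np.random.default_rng(0)
few_samples = rng.normal(size=(4, 10))     # S=4, F=10 -> K=3
many_samples = rng.normal(size=(50, 10))   # S=50, F=10 -> K=10
print(max_pca_rank(few_samples), max_pca_rank(many_samples))
```

Removing the mean costs one degree of freedom, which is why four samples can support at most three non-zero eigen-values.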

safe_svd

bool <– Use the safe LAPACK SVD function?

If the use_svd flag is enabled, this flag will indicate which LAPACK SVD function to use (dgesvd if set to True, dgesdd otherwise). By default, this flag is set to False upon construction, which makes this trainer use the fastest possible SVD decomposition.

train(X[, machine]) → machine, eigen_values

Trains a linear machine to perform the PCA (aka. KLT)

The resulting machine will have the same number of inputs as columns in X and $K$ eigen-vectors, where $K=\min{(S-1,F)}$ , with $S$ being the number of rows in X (samples) and $F$ the number of columns (or features). The vectors are arranged by decreasing eigen-value automatically – there is no need to sort the results.

The user may optionally provide an object of type bob.learn.linear.Machine that will be set by this method. If provided, machine must have as many inputs as there are columns in the input data array X and as many outputs as returned by output_size().

The input data matrix X should correspond to a 64-bit floating point array organized in such a way that every row corresponds to a new observation of the phenomenon (i.e., a new sample) and every column corresponds to a different feature.

This method returns a tuple consisting of the trained machine and a 1D 64-bit floating point array containing the eigen-values calculated while computing the KLT. The eigen-value ordering matches that of eigen-vectors set in the machine.

Parameters:

X : array_like(2D, floats)

The input data to train on

machine : bob.learn.linear.Machine

The machine to be trained; this machine will be returned by this function

Returns:

machine : bob.learn.linear.Machine

The machine that has been trained; if given, identical to the machine parameter

eigen_values : array_like(1D, floats)

The eigen-values of the PCA projection.
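To make the description of train() concrete, here is a minimal NumPy sketch of PCA via SVD. It is not the bob.learn.linear implementation: the function name pca_svd_sketch is hypothetical, and normalizing the eigen-values by $S-1$ (the sample covariance) is an assumption. The sketch also compares the result with the covariance method.

```python
import numpy as np


def pca_svd_sketch(X):
    """Illustrative PCA via SVD; not the bob.learn.linear implementation."""
    X = np.asarray(X, dtype=np.float64)
    n_samples, n_features = X.shape
    k = min(n_samples - 1, n_features)  # K = min(S - 1, F)
    mean = X.mean(axis=0)
    # Economy SVD of the mean-centered data; NumPy returns the singular
    # values in decreasing order, so no sorting is needed afterwards.
    _, singular_values, vt = np.linalg.svd(X - mean, full_matrices=False)
    # Assumption: eigen-values are those of the sample covariance (ddof=1).
    eigen_values = singular_values[:k] ** 2 / (n_samples - 1)
    eigen_vectors = vt[:k].T  # one eigen-vector per column, shape (F, K)
    return mean, eigen_vectors, eigen_values


rng = np.random.RandomState(1)
X = rng.randn(6, 4)  # 6 samples (rows), 4 features (columns)
mean, eigen_vectors, eigen_values = pca_svd_sketch(X)

# Covariance method for comparison; eigh returns ascending eigen-values,
# so they are reversed to match the decreasing order used above.
cov_eigen_values = np.linalg.eigh(np.cov(X, rowvar=False))[0][::-1]

# Projecting the centered data onto the eigen-vectors
projected = (X - mean) @ eigen_vectors
```

In this sketch the variance of each projected column equals the corresponding eigen-value, which is why the ordering of the eigen-values and eigen-vectors has to match.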

use_svd¶

bool <– Use the SVD to compute PCA?

This flag determines if this trainer will use the SVD method (set it to True) to calculate the principal components or the Covariance method (set it to False).

class bob.learn.linear.WCCNTrainer¶

Bases: object

Trains a linear machine to perform Within-Class Covariance Normalization (WCCN)

WCCN finds the projection matrix W that allows us to linearly project the data matrix X to another (sub) space such that:

$[(1/K) S_{w}]^{-1} = W W^T$

where $W$ is an upper triangular matrix computed using Cholesky Decomposition:

$W = cholesky([(1/K) S_{w} ]^{-1})$

where:

$K$

the number of classes

$S_w$

the within-class scatter; it is a square matrix with one row and one column per feature and is defined as $S_w = \sum_{k=1}^K \sum_{n \in C_k} (x_n-m_k)(x_n-m_k)^T$ , with $C_k$ being the set of all samples of class k.

$m_k$

the class k empirical mean, defined as $m_k = \frac{1}{N_k}\sum_{n \in C_k} x_n$

References:

Within-class covariance normalization for SVM-based speaker recognition, Andrew O. Hatch, Sachin Kajarekar, and Andreas Stolcke, In INTERSPEECH, 2006.
http://en.wikipedia.org/wiki/Cholesky_decomposition

Constructor Documentation:

bob.learn.linear.WCCNTrainer ()

bob.learn.linear.WCCNTrainer (other)

Constructs a new trainer to train a linear machine to perform WCCN

Parameters:

other : WCCNTrainer

Another WCCN trainer to copy

Class Members:

train(X[, machine]) → machine¶

Trains a linear machine using WCCN

The value of X should be a sequence over as many 2D 64-bit floating point number arrays as classes in the problem. All arrays will be checked for conformance (identical number of columns). To accomplish this, either prepare a list with all your class observations organized in 2D arrays or pass a 3D array in which the first dimension (depth) contains as many elements as classes you want to train for.

The resulting machine will have the same number of inputs and outputs as columns in any of X's matrices.

The user may optionally provide an object of type bob.learn.linear.Machine that will be set by this method. In such a case, the machine should be square, with as many inputs and outputs as there are columns in each of X's matrices. If the user does not provide a machine to be set, then a new one will be allocated internally. In both cases, the resulting machine is always returned.

Parameters:

X : [array_like(2D,float)] or array_like(3D, float)

The training data arranged by class

machine : bob.learn.linear.Machine

A pre-allocated machine to be trained; may be omitted

Returns:

machine : bob.learn.linear.Machine

The trained machine; identical to the machine parameter, if specified
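The WCCN computation described for this class can be sketched in NumPy as follows. This is an illustration of the formulas only, not the library's code: wccn_sketch is a hypothetical name, the scatter is normalized by the number of classes $K$ as in the Cholesky formula above, and np.linalg.cholesky returns a lower-triangular factor (the text above speaks of an upper-triangular one, so the library's convention may differ).

```python
import numpy as np


def wccn_sketch(classes):
    """Illustrative WCCN; not the bob.learn.linear implementation.

    classes: sequence of 2D arrays, one per class, rows are samples.
    """
    classes = [np.asarray(c, dtype=np.float64) for c in classes]
    n_features = classes[0].shape[1]
    # Conformance check: every class must have the same number of columns.
    if any(c.shape[1] != n_features for c in classes):
        raise ValueError("all classes need the same number of columns")
    # Within-class scatter: sum over classes of centered outer products.
    s_w = np.zeros((n_features, n_features))
    for c in classes:
        centered = c - c.mean(axis=0)
        s_w += centered.T @ centered
    k = len(classes)
    # Lower-triangular W with W @ W.T == inv(S_w / K).
    w = np.linalg.cholesky(np.linalg.inv(s_w / k))
    return w, s_w


rng = np.random.RandomState(0)
# Three classes, 10 samples each, 3 features, with different class means
data = [rng.randn(10, 3) + offset for offset in (0.0, 2.0, -1.0)]
W, S_w = wccn_sketch(data)

# After projecting with W, the normalized within-class scatter is the identity.
normalized = W.T @ (S_w / len(data)) @ W
```

The same list-of-arrays layout is what the X parameter of train() expects; a 3D array whose first dimension indexes the classes would work with this sketch as well.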

class bob.learn.linear.WhiteningTrainer¶

Bases: object

Trains a linear bob.learn.linear.Machine to perform Cholesky whitening.

The whitening transformation is a decorrelation method that converts the covariance matrix of a set of samples into the identity matrix $I$ . It linearly transforms random variables such that the resulting variables are uncorrelated and have unit variance. This transformation is invertible. The method is called the whitening transform because it transforms the input matrix $X$ closer towards white noise (let’s call it $\tilde{X}$ ):

$Cov(\tilde{X}) = I$

with:

$\tilde{X} = X W$

where $W$ is the projection matrix that allows us to linearly project the data matrix $X$ to another (sub) space such that:

$[Cov(X)]^{-1} = W W^T$

$W$ is computed using Cholesky decomposition:

$W = cholesky([Cov(X)]^{-1})$

Constructor Documentation:

bob.learn.linear.WhiteningTrainer ()

bob.learn.linear.WhiteningTrainer (other)

Constructs a new whitening trainer

Parameters:

other : WhiteningTrainer

Another whitening trainer to copy

Class Members:

train(X[, machine]) → machine¶

Trains a linear machine to perform Cholesky whitening

The user may optionally provide an object of type bob.learn.linear.Machine that will be set by this method. In such a case, the machine should have a shape that matches (X.shape[1], X.shape[1]). If the user does not provide a machine to be set, then a new one will be allocated internally. In both cases, the resulting machine is always returned by this method.

The input data matrix $X$ should correspond to a 64-bit floating point 2D array organized in such a way that every row corresponds to a new observation of the phenomenon (i.e., a new sample) and every column corresponds to a different feature.

Parameters:

X : array_like(2D, float)

The training data

machine : bob.learn.linear.Machine

A pre-allocated machine to be trained; may be omitted

Returns:

machine : bob.learn.linear.Machine

The trained machine; identical to the machine parameter, if specified
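The Cholesky whitening described for this class can be illustrated with NumPy. This is a sketch under stated assumptions rather than the library's implementation: whitening_sketch is a hypothetical name, the sample covariance (ddof=1) is used, the mean is subtracted before projecting, and np.linalg.cholesky returns a lower-triangular factor.

```python
import numpy as np


def whitening_sketch(X):
    """Illustrative Cholesky whitening; not the bob.learn.linear code."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # rows are samples, columns are features
    # Lower-triangular W with W @ W.T == inv(Cov(X)).
    w = np.linalg.cholesky(np.linalg.inv(cov))
    return mean, w


rng = np.random.RandomState(0)
# Correlated data: 200 samples, 3 features, mixed by a fixed matrix
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X = rng.randn(200, 3) @ mixing

mean, W = whitening_sketch(X)
X_white = (X - mean) @ W  # whitened data
cov_white = np.cov(X_white, rowvar=False)  # numerically the identity
```

Because $W$ is triangular with a positive diagonal it is invertible, so the original centered data can be recovered by multiplying X_white with the inverse of W.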

bob.learn.linear.bic_intra_extra_pairs(training_data) → intra_pairs, extra_pairs[source]¶

Computes intra-class and extra-class pairs from given training data.

The training_data should be aligned in a list of sub-lists, where each sub-list contains the data of one class. This function will return two lists of tuples of data, where the first list contains tuples of the same class, while the second list contains tuples of different classes. These tuples can be used to compute difference vectors, which then can be fed into the BICTrainer.train() method.

Note

In general, many more extra_pairs than intra_pairs are returned.

Warning

This function returns two lists of pairs of references to the given data. Even for relatively low numbers of classes and elements per class, the returned lists may contain billions of pairs, which require huge amounts of memory.

Keyword parameters

training_data: The training data, where the data for each class are enclosed in one list.

Return values

intra_pairs: A list of tuples whose two elements belong to the same class; each element is a reference to one element of the given training_data.
extra_pairs: A list of tuples whose two elements belong to different classes; each element is a reference to one element of the given training_data.
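The pairing logic can be sketched in pure Python as follows. This is an illustration of the behavior described above, not the library's source: the function name intra_extra_pairs_sketch is hypothetical, and the order in which pairs are emitted is an assumption.

```python
from itertools import combinations, product


def intra_extra_pairs_sketch(training_data):
    """Illustrative pairing; not the bob.learn.linear implementation."""
    intra_pairs, extra_pairs = [], []
    # Intra-class: every unordered pair of elements within one class.
    for class_data in training_data:
        intra_pairs.extend(combinations(class_data, 2))
    # Extra-class: every element of one class paired with every element
    # of each later class, so each unordered pair appears exactly once.
    for first_class, second_class in combinations(training_data, 2):
        extra_pairs.extend(product(first_class, second_class))
    return intra_pairs, extra_pairs


# Three classes with 3, 2 and 2 elements (labels stand in for feature vectors)
data = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2"]]
intra, extra = intra_extra_pairs_sketch(data)

n_intra = len(intra)  # 3 + 1 + 1 pairs
n_extra = len(extra)  # 3*2 + 3*2 + 2*2 pairs
```

Even in this tiny example the extra-class pairs outnumber the intra-class pairs, which illustrates the note above. The pairs hold references to the original elements, not copies.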

bob.learn.linear.bic_intra_extra_pairs_between_factors(first_factor, second_factor) → intra_pairs, extra_pairs[source]¶

Computes intra-class and extra-class pairs from given training data, where only pairs between the first and second factors are considered.

Both first_factor and second_factor should be aligned in a list of sub-lists, where each sub-list contains the data of one class. Both lists need to contain the same classes in the same order; empty classes (empty lists) are allowed. This function will return two lists of tuples of data, where the first list contains tuples of the same class, while the second list contains tuples of different classes. These tuples can be used to compute difference vectors, which then can be fed into the BICTrainer.train() method.

Note

In general, many more extra_pairs than intra_pairs are returned.

Warning

This function returns two lists of pairs of references to the given data. Even for relatively low numbers of classes and elements per class, the returned lists may contain billions of pairs, which require huge amounts of memory.

Keyword parameters

first_factor: The training data for the first factor, where the data for each class are enclosed in one list.
second_factor: The training data for the second factor, where the data for each class are enclosed in one list. Must have the same size as first_factor.

Return values

intra_pairs: A list of tuples of data, where both data belong to the same class, but different factors.
extra_pairs: A list of tuples of data, where both data belong to different classes and different factors.
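A pure Python sketch of the between-factor pairing is shown below. It is not the library's source: the function name between_factors_sketch is hypothetical, and pairing every class of the first factor with every other class of the second factor (rather than only one direction) is an assumption.

```python
from itertools import product


def between_factors_sketch(first_factor, second_factor):
    """Illustrative pairing between two factors; not the library's code."""
    if len(first_factor) != len(second_factor):
        raise ValueError("both factors must contain the same classes")
    intra_pairs, extra_pairs = [], []
    for c1, first_class in enumerate(first_factor):
        for c2, second_class in enumerate(second_factor):
            # Each pair takes one element from each factor.
            pairs = product(first_class, second_class)
            if c1 == c2:
                intra_pairs.extend(pairs)  # same class, different factors
            else:
                extra_pairs.extend(pairs)  # different classes and factors
    return intra_pairs, extra_pairs


# Three classes; the last class is empty in the first factor
first = [["a1", "a2"], ["b1"], []]
second = [["A1"], ["B1", "B2"], ["C1"]]
intra, extra = between_factors_sketch(first, second)

n_intra = len(intra)  # 2*1 + 1*2 + 0*1 pairs
n_extra = len(extra)  # 2*(2 + 1) + 1*(1 + 1) + 0 pairs
```

No pair ever combines two elements of the same factor, which is what distinguishes this function from bic_intra_extra_pairs.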