Feedforward networks

Implementation of a feedforward network, including the Gauss-Newton approximation used in Hessian-free optimization.

Based on Martens, J. (2010). Deep learning via Hessian-free optimization. In Proceedings of the 27th International Conference on Machine Learning.

class hessianfree.ffnet.FFNet(shape, layers=<hessianfree.nonlinearities.Logistic object>, conns=None, loss_type=<hessianfree.loss_funcs.SquaredError object>, W_init_params=None, use_GPU=False, load_weights=None, debug=False, rng=None, dtype=numpy.float32)

Bases: object

Implementation of a feedforward network (including gradient and curvature computation).

Parameters:

shape (list) – the number of neurons in each layer
layers (Nonlinearity or list) – nonlinearity to use in the network (or a list giving a nonlinearity for each layer)
conns (dict) – dictionary of the form {layer_x:[layer_y, layer_z], ...} specifying the connections between layers (default is to connect in series)
loss_type (LossFunction or list) – loss function (or list of loss functions) used to evaluate network
W_init_params (dict) – parameters passed to init_weights() (see parameter descriptions in that function)
use_GPU (bool) – run curvature computation on GPU (requires PyCUDA and scikit-cuda)
load_weights (str or ndarray) – load initial weights from given array or filename
debug (bool) – activates expensive features to help with debugging
rng (RandomState) – used to generate any random numbers for this network (use this to control the seed)
dtype (dtype) – floating point precision used throughout the network
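
The conns dictionary maps each layer index to the layers it projects to. The following is a minimal sketch of that form, using a hypothetical helper series_conns that is not part of hessianfree; the dictionary the library builds internally for the default case may differ in detail.

```python
def series_conns(n_layers):
    """Hypothetical helper: connect layer i to layer i + 1 (series wiring)."""
    return {i: [i + 1] for i in range(n_layers - 1)}


shape = [2, 5, 5, 1]
conns = series_conns(len(shape))  # {0: [1], 1: [2], 2: [3]}

# An illustrative skip connection from the input layer to the output layer,
# written in the same {layer_x: [layer_y, layer_z], ...} form.
skip_conns = dict(conns)
skip_conns[0] = [1, 3]
```

A dictionary such as skip_conns would be passed as the conns argument, with shape giving the number of neurons in each of the four layers.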

run_epochs(inputs, targets, optimizer, max_epochs=100, minibatch_size=None, test=None, test_err=None, target_err=1e-06, plotting=False, file_output=None, print_period=10)

Apply the given optimizer with a sequence of (mini)batches.

Parameters:

inputs (ndarray or Plant) – input vectors (or a Plant that will generate the input vectors dynamically)
targets (ndarray) – target vectors corresponding to each input vector (or None if a plant is being used)
optimizer – computes the weight update each epoch (see optimizers.py)
max_epochs (int) – the maximum number of epochs to run
minibatch_size (int) – the size of the minibatch to use in each epoch (or None to use full batches)
test (tuple) – tuple of (inputs, targets) to use as the test data (if None then the training inputs and targets are used)
test_err (LossFunction) – a custom error function to be applied to the test data (e.g., classification error)
target_err (float) – run will terminate if this test error is reached
file_output (str) – output files from the run will use this as a prefix (if None then don’t output files)
plotting (bool) – if True then data from the run will be output to a file, which can be displayed via dataplotter.py
print_period (int) – print out information about the run every print_period epochs
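
The interaction between max_epochs, minibatch_size and target_err can be sketched as a plain loop. This is an illustration only, not the library's implementation: the linear model, the gradient descent step standing in for the optimizer, and the learning rate of 0.1 are all assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy data: a noiseless linear map, so the error can reach the target.
inputs = rng.randn(200, 3)
true_W = np.array([[1.0], [-2.0], [0.5]])
targets = inputs @ true_W

W = np.zeros((3, 1))
max_epochs, minibatch_size, target_err = 500, 20, 1e-6

for epoch in range(max_epochs):
    # Sample a minibatch (without replacement) for this epoch.
    idx = rng.choice(len(inputs), size=minibatch_size, replace=False)
    x, t = inputs[idx], targets[idx]

    # Stand-in for the optimizer: one gradient descent step on squared error.
    grad = x.T @ (x @ W - t) / minibatch_size
    W -= 0.1 * grad

    # With test=None, the training data doubles as the test data.
    err = 0.5 * np.mean(np.sum((inputs @ W - targets) ** 2, axis=1))
    if err < target_err:
        break  # terminate once the target error is reached
```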

forward(inputs, params=None, deriv=False)

Compute layer activations for given input and parameters.

Parameters:

inputs (ndarray) – input vectors (passed to first layer)
params (ndarray) – parameter vector (weights) for the network (defaults to self.W)
deriv (bool) – if True then also compute the derivative of the activations
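
As a rough picture of what a forward pass with deriv=True produces, here is a minimal NumPy sketch for a series network with logistic layers. The function name, the separate weights/biases lists, and the choice to pass the input layer through unchanged are assumptions for illustration; the library works on a single flat parameter vector and its return format may differ.

```python
import numpy as np


def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward_sketch(inputs, weights, biases, deriv=False):
    """Series feedforward pass with logistic layers after the input layer.

    Returns a list of activations (one array per layer) and, if deriv is
    True, a matching list of elementwise activation derivatives.
    """
    activations = [inputs]            # input layer assumed linear
    derivs = [np.ones_like(inputs)]   # derivative of the identity is 1
    for W, b in zip(weights, biases):
        a = logistic(activations[-1] @ W + b)
        activations.append(a)
        derivs.append(a * (1.0 - a))  # d/dx logistic(x) = a * (1 - a)
    return (activations, derivs) if deriv else activations


rng = np.random.RandomState(0)
shape = [2, 5, 1]
weights = [rng.randn(pre, post) for pre, post in zip(shape[:-1], shape[1:])]
biases = [np.zeros(post) for post in shape[1:]]

acts, d_acts = forward_sketch(rng.randn(4, 2), weights, biases, deriv=True)
```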

error(W=None, inputs=None, targets=None)

Compute network error.

Parameters:

W (ndarray) – network parameters (defaults to self.W)
inputs (ndarray) – input vectors (defaults to the cached (mini)batch for current epoch)
targets (ndarray) – target vectors (defaults to the cached (mini)batch for current epoch)

cache_minibatch(inputs, targets, minibatch=None)

Pick a subset of inputs and targets to use in minibatch, and cache the activations for that minibatch.

load_GPU_data()

Load data for the current epoch onto GPU.

static J_dot(J, vec, transpose_J=False, out=None)

Compute the product of a Jacobian and some vector.
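
A batched Jacobian-vector product of this kind can be sketched with NumPy as follows. The array shapes are assumptions made for the example: J is taken to be either (batch, n) for a diagonal Jacobian (as produced by an elementwise nonlinearity) or (batch, n, m) for a full Jacobian per batch item. The function j_dot_sketch is illustrative and omits the out argument.

```python
import numpy as np


def j_dot_sketch(J, vec, transpose_J=False):
    """Batched Jacobian-vector product (illustrative, not the library code)."""
    if J.ndim == 2:
        # Diagonal case: the product reduces to elementwise multiplication,
        # and transposing a diagonal matrix changes nothing.
        return J * vec
    if transpose_J:
        J = np.transpose(J, (0, 2, 1))
    # For each batch item i, compute J[i] @ vec[i].
    return np.einsum("ijk,ik->ij", J, vec)


rng = np.random.RandomState(0)
J_full = rng.randn(3, 4, 5)   # 3 batch items, each a 4 x 5 Jacobian
v = rng.randn(3, 5)
Jv = j_dot_sketch(J_full, v)  # shape (3, 4)
```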

calc_grad()

Compute parameter gradient.

check_grad(calc_grad)

Check gradient via finite differences (for debugging).
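
The idea behind a finite-difference gradient check can be sketched as below. A linear model with squared error stands in for the network, and all function names here are hypothetical; the point is only that perturbing one parameter at a time and differencing the error should reproduce the analytic gradient.

```python
import numpy as np


def error(W, inputs, targets):
    """Squared error of a linear model (a stand-in for the network error)."""
    return 0.5 * np.sum((inputs @ W - targets) ** 2) / len(inputs)


def analytic_grad(W, inputs, targets):
    return inputs.T @ (inputs @ W - targets) / len(inputs)


def finite_diff_grad(f, W, eps=1e-6):
    """Central-difference estimate of df/dW, one parameter at a time."""
    grad = np.zeros_like(W)
    for i in range(W.size):
        step = np.zeros_like(W)
        step.flat[i] = eps
        grad.flat[i] = (f(W + step) - f(W - step)) / (2 * eps)
    return grad


rng = np.random.RandomState(0)
inputs, targets = rng.randn(10, 3), rng.randn(10, 2)
W = rng.randn(3, 2)

numeric = finite_diff_grad(lambda w: error(w, inputs, targets), W)
max_diff = np.max(np.abs(numeric - analytic_grad(W, inputs, targets)))
```

The sketch uses float64; with the network's default float32 precision a larger eps and a looser tolerance would be needed.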

calc_G(v, damping=0, out=None)

Compute Gauss-Newton matrix-vector product.
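
The Gauss-Newton matrix has the form G = J^T H_L J, where J is the Jacobian of the network outputs with respect to the parameters and H_L is the Hessian of the loss with respect to the outputs. The product with a vector can be computed without ever forming G. The sketch below assumes an explicit Jacobian matrix is available and that damping is added as damping * v; the library computes the same kind of product layer by layer, and its scaling conventions may differ.

```python
import numpy as np


def gauss_newton_vec(J, v, damping=0.0, loss_hess=None):
    """Compute (J^T H_L J + damping * I) v without forming the matrix.

    J: Jacobian of outputs w.r.t. parameters, shape (n_out, n_params).
    loss_hess: Hessian of the loss w.r.t. the outputs, shape (n_out, n_out);
        None means the identity, which is the squared error case.
    """
    Jv = J @ v                     # parameter space -> output space
    if loss_hess is not None:
        Jv = loss_hess @ Jv        # curvature of the loss at the outputs
    return J.T @ Jv + damping * v  # output space -> parameter space


rng = np.random.RandomState(0)
J = rng.randn(4, 6)
v = rng.randn(6)

Gv = gauss_newton_vec(J, v, damping=0.1)

# For comparison only: the explicit matrix that the product avoids building.
G_explicit = J.T @ J + 0.1 * np.eye(6)
```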

GPU_calc_G(v, damping=0, out=None)

Compute Gauss-Newton matrix-vector product on GPU.

check_J()

Compute the Jacobian of the network via finite differences.

check_G(calc_G, v, damping=0)

Check Gv calculation via finite differences (for debugging).
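
One way to picture these two checks together: build the Jacobian by finite differences, form the reference product J^T J v from it, and compare against an analytically computed product. The sketch below does this for a tiny one-layer logistic network with squared error (so the loss Hessian is the identity). The network, the helper names, and the absence of any batch averaging are assumptions for illustration.

```python
import numpy as np


def outputs(params, inputs):
    """Tiny one-layer logistic network; params is the flattened weight matrix."""
    W = params.reshape(inputs.shape[1], -1)
    return (1.0 / (1.0 + np.exp(-(inputs @ W)))).ravel()


def finite_diff_jacobian(f, params, eps=1e-6):
    """Central-difference Jacobian of f at params, one column per parameter."""
    cols = []
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        cols.append((f(params + step) - f(params - step)) / (2 * eps))
    return np.stack(cols, axis=1)


rng = np.random.RandomState(0)
inputs = rng.randn(5, 3)
params = rng.randn(3 * 2)
v = rng.randn(params.size)
damping = 0.01

# Reference product built from the finite-difference Jacobian.
J_fd = finite_diff_jacobian(lambda p: outputs(p, inputs), params)
Gv_fd = J_fd.T @ (J_fd @ v) + damping * v

# Analytic product for the same network, to be checked against Gv_fd.
a = outputs(params, inputs).reshape(5, 2)
d = a * (1.0 - a)                                 # logistic derivative
Jv = d * (inputs @ v.reshape(3, 2))               # J v
Gv = (inputs.T @ (d * Jv)).ravel() + damping * v  # J^T (J v) + damping * v
```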

init_weights(shapes, coeff=1.0, biases=0.0, init_type='sparse')

Weight initialization, given shapes of weight matrices.

Note: coeff, biases, and init_type can be specified by the W_init_params dict in FFNet. Each can be specified as a single value (for all matrices) or as a list giving a value for each matrix.

Parameters:

shapes (list) – list of (pre, post) shapes for each weight matrix
coeff (float) – scales the magnitude of the connection weights
biases (float) – bias values for the post of each matrix
init_type (str) – type of initialization to use (currently supports 'sparse', 'uniform', 'gaussian')
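
Martens (2010) describes a sparse initialization in which each unit receives a small fixed number of nonzero incoming weights (15 in the paper), with the rest set to zero. The following sketch shows that idea for a single weight matrix. The function name, the n_nonzero argument, and the layout with biases in the last row are assumptions for the example and may not match what init_weights does internally.

```python
import numpy as np


def sparse_init_sketch(pre, post, coeff=1.0, bias=0.0, n_nonzero=15, rng=None):
    """Sparse initialization in the style of Martens (2010).

    Each post unit gets at most n_nonzero nonzero incoming weights, drawn
    from a Gaussian and scaled by coeff. Returns a (pre + 1, post) array
    whose last row holds the biases (an assumed layout).
    """
    rng = np.random.RandomState() if rng is None else rng
    W = np.zeros((pre + 1, post))
    k = min(n_nonzero, pre)
    for j in range(post):
        # Choose which incoming connections of unit j are nonzero.
        rows = rng.choice(pre, size=k, replace=False)
        W[rows, j] = rng.randn(k) * coeff
    W[-1, :] = bias
    return W


W = sparse_init_sketch(50, 10, coeff=1.0, bias=0.0, rng=np.random.RandomState(0))
```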

compute_offsets()

Precompute offsets for layers in the overall parameter vector.

get_weights(params, conn)

Get weight matrix for a connection from overall parameter vector.
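
Storing all weights in one flat parameter vector requires knowing where each connection's block begins and ends. The sketch below illustrates one possible layout, with each connection's weights followed by its biases. The layout, the offsets dictionary, and the helper slice_weights are hypothetical and are not taken from the library.

```python
import numpy as np

shape = [2, 5, 1]
conns = {0: [1], 1: [2]}

# Assumed layout: for each connection, pre * post weights then post biases.
offsets = {}
start = 0
for pre in sorted(conns):
    for post in conns[pre]:
        n_weights = shape[pre] * shape[post]
        n_biases = shape[post]
        offsets[(pre, post)] = (start, start + n_weights,
                                start + n_weights + n_biases)
        start += n_weights + n_biases

n_params = start
params = np.arange(n_params, dtype=np.float32)


def slice_weights(params, conn):
    """Return (weights, biases) views into params for connection conn."""
    w_start, w_end, b_end = offsets[conn]
    pre, post = conn
    W = params[w_start:w_end].reshape(shape[pre], shape[post])
    b = params[w_end:b_end]
    return W, b


W01, b01 = slice_weights(params, (0, 1))
```

Because the slices are views rather than copies, an in-place update to the flat vector is immediately visible in every weight matrix.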

init_loss(loss_type)

Set the loss type for this network to the given LossFunction (or a list of functions can be passed to create a LossSet).