Feedforward networks

Implementation of a feedforward network, including the Gauss-Newton approximation used in Hessian-free optimization.

Based on Martens, J. (2010). Deep learning via Hessian-free optimization. In Proceedings of the 27th International Conference on Machine Learning.

class hessianfree.ffnet.FFNet(shape, layers=hessianfree.nonlinearities.Logistic(), conns=None, loss_type=hessianfree.loss_funcs.SquaredError(), W_init_params=None, use_GPU=False, load_weights=None, debug=False, rng=None, dtype=numpy.float32)

Bases: object

Implementation of a feedforward network (including gradient and curvature computation).

  • shape (list) – the number of neurons in each layer
  • layers (Nonlinearity or list) – nonlinearity to use in the network (or a list giving a nonlinearity for each layer)
  • conns (dict) – dictionary of the form {layer_x:[layer_y, layer_z], ...} specifying the connections between layers (default is to connect in series)
  • loss_type (LossFunction or list) – loss function (or list of loss functions) used to evaluate the network
  • W_init_params (dict) – parameters passed to init_weights() (see parameter descriptions in that function)
  • use_GPU (bool) – run curvature computation on GPU (requires PyCUDA and scikit-cuda)
  • load_weights (str or ndarray) – load initial weights from given array or filename
  • debug (bool) – activates expensive features to help with debugging
  • rng (RandomState) – used to generate any random numbers for this network (use this to control the seed)
  • dtype (dtype) – floating point precision used throughout the network
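
The `conns` format can be illustrated with a short sketch. It assumes that the dictionary keys are source layers and the values are the layers they project to (the parameter description does not state the direction), and `series_conns` is a hypothetical helper, not part of hessianfree.

```python
def series_conns(n_layers):
    """Connect each layer to the next one (the assumed default wiring)."""
    return {i: [i + 1] for i in range(n_layers - 1)}


shape = [2, 5, 5, 1]              # neurons per layer, as in the `shape` argument
conns = series_conns(len(shape))  # layers connected in series

# Same network with an extra skip connection from layer 0 to layer 3.
skip_conns = dict(conns)
skip_conns[0] = [1, 3]
```

A dictionary like `skip_conns` is the kind of value the `conns` argument describes; whether a given topology is accepted depends on the library.
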
run_epochs(inputs, targets, optimizer, max_epochs=100, minibatch_size=None, test=None, test_err=None, target_err=1e-06, plotting=False, file_output=None, print_period=10)

Apply the given optimizer with a sequence of (mini)batches.

  • inputs (ndarray or Plant) – input vectors (or a Plant that will generate the input vectors dynamically)
  • targets (ndarray) – target vectors corresponding to each input vector (or None if a plant is being used)
  • optimizer – computes the weight update each epoch (see optimizers.py)
  • max_epochs (int) – the maximum number of epochs to run
  • minibatch_size (int) – the size of the minibatch to use in each epoch (or None to use full batches)
  • test (tuple) – tuple of (inputs, targets) to use as the test data (if None then the training inputs and targets will be used)
  • test_err (LossFunction) – a custom error function to be applied to the test data (e.g., classification error)
  • target_err (float) – run will terminate if this test error is reached
  • file_output (str) – output files from the run will use this as a prefix (if None then don’t output files)
  • plotting (bool) – if True then data from the run will be output to a file, which can be displayed via dataplotter.py
  • print_period (int) – print out information about the run every print_period epochs
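
The control flow these parameters describe can be outlined as follows. This is a minimal sketch, not hessianfree's code: `run_epochs_sketch` and `gradient_step` are hypothetical names, the model is a plain linear map, and ordinary gradient descent stands in for the optimizer.

```python
import numpy as np


def run_epochs_sketch(inputs, targets, update_fn, max_epochs=100,
                      minibatch_size=None, target_err=1e-6, rng=None):
    """Hypothetical outline of an epoch loop for a linear model."""
    rng = np.random.RandomState(0) if rng is None else rng
    W = np.zeros((inputs.shape[1], targets.shape[1]))
    err = np.inf
    n_epochs = 0
    while n_epochs < max_epochs:
        if minibatch_size is None:
            x, t = inputs, targets                 # full batch
        else:
            idx = rng.choice(len(inputs), minibatch_size, replace=False)
            x, t = inputs[idx], targets[idx]       # random minibatch
        W = W + update_fn(W, x, t)                 # optimizer computes the update
        n_epochs += 1
        # No separate test set here: the training data doubles as test data.
        err = 0.5 * np.mean(np.sum((inputs @ W - targets) ** 2, axis=1))
        if err <= target_err:
            break                                  # target error reached
    return W, err, n_epochs


def gradient_step(W, x, t, rate=0.3):
    """Stand-in optimizer: one step of plain gradient descent."""
    return -rate * x.T @ (x @ W - t) / len(x)


rng = np.random.RandomState(1)
inputs = rng.randn(64, 3)
targets = inputs @ np.array([[1.0], [-2.0], [0.5]])
W, err, n_epochs = run_epochs_sketch(inputs, targets, gradient_step,
                                     max_epochs=500, minibatch_size=16)
```
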
forward(inputs, params=None, deriv=False)

Compute layer activations for given input and parameters.

  • inputs (ndarray) – input vectors (passed to first layer)
  • params (ndarray) – parameter vector (weights) for the network (defaults to self.W)
  • deriv (bool) – if True then also compute the derivative of the activations
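
As a rough illustration of what a forward pass with `deriv=True` produces, the sketch below propagates a batch through series-connected logistic layers. It assumes the input layer passes its input through unchanged and takes separate weight and bias lists rather than a flat parameter vector; `forward_sketch` is a hypothetical name.

```python
import numpy as np


def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward_sketch(inputs, weights, biases, deriv=False):
    """Series-connected logistic layers; a stand-in for forward()."""
    activations = [inputs]             # layer 0: input passed through (assumption)
    derivs = [np.ones_like(inputs)]    # derivative of the identity
    for W, b in zip(weights, biases):
        a = logistic(activations[-1] @ W + b)
        activations.append(a)
        derivs.append(a * (1.0 - a))   # logistic derivative in terms of its output
    return (activations, derivs) if deriv else activations


rng = np.random.RandomState(0)
shape = [3, 4, 2]
weights = [rng.randn(pre, post).astype(np.float32)
           for pre, post in zip(shape[:-1], shape[1:])]
biases = [np.zeros(post, dtype=np.float32) for post in shape[1:]]
x = rng.randn(5, 3).astype(np.float32)
acts, d_acts = forward_sketch(x, weights, biases, deriv=True)
```
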
error(W=None, inputs=None, targets=None)

Compute network error.

  • W (ndarray) – network parameters (defaults to self.W)
  • inputs (ndarray) – input vectors (defaults to the cached (mini)batch for current epoch)
  • targets (ndarray) – target vectors (defaults to the cached (mini)batch for current epoch)
cache_minibatch(inputs, targets, minibatch=None)

Pick a subset of inputs and targets to use as the minibatch, and cache the activations for that minibatch.


Load data for the current epoch onto GPU.

static J_dot(J, vec, transpose_J=False, out=None)

Compute the product of a Jacobian and some vector.
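
A per-sample Jacobian-vector product can be sketched as below. The storage layout is an assumption: elementwise nonlinearities are taken to have diagonal Jacobians stored as a `(batch, n)` array, and all others full `(batch, n, n)` Jacobians. `j_dot_sketch` is a hypothetical stand-in, not the library function.

```python
import numpy as np


def j_dot_sketch(J, vec, transpose_J=False, out=None):
    """Multiply each sample's Jacobian by that sample's vector."""
    if J.ndim == 2:
        # Diagonal Jacobians stored as (batch, n): an elementwise product,
        # for which the transpose makes no difference.
        return np.multiply(J, vec, out=out)
    # Full Jacobians stored as (batch, n, n).
    spec = "bji,bj->bi" if transpose_J else "bij,bj->bi"
    return np.einsum(spec, J, vec, out=out)


rng = np.random.RandomState(0)
J_full = rng.randn(4, 3, 3)
v = rng.randn(4, 3)
Jv = j_dot_sketch(J_full, v)                       # J v for every sample
JTv = j_dot_sketch(J_full, v, transpose_J=True)    # J^T v for every sample
```
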


Compute parameter gradient.


Check gradient via finite differences (for debugging).
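
A finite-difference gradient check generally looks like the sketch below, shown here on a toy linear model rather than an FFNet. `finite_diff_grad`, `error`, and `analytic_grad` are hypothetical names, and the step size is an illustrative choice for float64.

```python
import numpy as np


def finite_diff_grad(f, W, eps=1e-6):
    """Central-difference estimate of df/dW for a scalar function f."""
    grad = np.zeros_like(W)
    for i in range(W.size):
        step = np.zeros_like(W)
        step[i] = eps
        grad[i] = (f(W + step) - f(W - step)) / (2 * eps)
    return grad


# Toy problem: squared error of a linear model with a flat parameter vector.
rng = np.random.RandomState(0)
x = rng.randn(10, 3)
t = rng.randn(10)


def error(W):
    return 0.5 * np.mean((x @ W - t) ** 2)


def analytic_grad(W):
    return x.T @ (x @ W - t) / len(x)


W = rng.randn(3)  # float64; float32 would need a much larger eps
grad_ok = np.allclose(finite_diff_grad(error, W), analytic_grad(W), atol=1e-5)
```
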

calc_G(v, damping=0, out=None)

Compute Gauss-Newton matrix-vector product.
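
For reference, the Gauss-Newton product has the form Gv = J^T H_L J v + damping * v, where J is the Jacobian of the network outputs with respect to the parameters and H_L is the Hessian of the loss with respect to the outputs. The sketch below computes it from an explicit Jacobian without forming G. Averaging over the batch and the array layout are assumptions, and `gauss_newton_vec` is a hypothetical name; the library computes these products with forward and backward passes rather than an explicit J.

```python
import numpy as np


def gauss_newton_vec(J, v, loss_hess=None, damping=0.0):
    """Compute (mean_b J_b^T H_b J_b + damping * I) v without forming the matrix.

    J: (batch, n_outputs, n_params) Jacobian of outputs w.r.t. parameters
    loss_hess: (batch, n_outputs, n_outputs) Hessian of the loss w.r.t. the
        outputs; None means the identity (as for squared error)
    """
    Jv = np.einsum("bop,p->bo", J, v)                  # J v
    if loss_hess is not None:
        Jv = np.einsum("boq,bq->bo", loss_hess, Jv)    # H_L (J v)
    Gv = np.einsum("bop,bo->p", J, Jv) / len(J)        # J^T (...), batch average
    return Gv + damping * v


rng = np.random.RandomState(0)
J = rng.randn(6, 2, 5)
v = rng.randn(5)
Gv = gauss_newton_vec(J, v, damping=0.1)
```

Because G is a sum of terms of the form J^T H_L J, it is positive semidefinite whenever H_L is, which is what makes it suitable for conjugate gradient.
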

GPU_calc_G(v, damping=0, out=None)

Compute Gauss-Newton matrix-vector product on GPU.


Compute the Jacobian of the network via finite differences.
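
A finite-difference Jacobian can be sketched as follows for a toy one-layer logistic network. The names `finite_diff_jacobian` and `net`, the row-major parameter layout, and the step size are illustrative assumptions.

```python
import numpy as np


def finite_diff_jacobian(f, W, eps=1e-6):
    """Estimate d f(W) / d W one parameter at a time with central differences."""
    out = f(W)
    J = np.zeros(out.shape + (W.size,))
    for i in range(W.size):
        step = np.zeros_like(W)
        step[i] = eps
        J[..., i] = (f(W + step) - f(W - step)) / (2 * eps)
    return J


# Toy network: one logistic layer with 3 inputs and 2 outputs.
rng = np.random.RandomState(0)
x = rng.randn(4, 3)


def net(W):
    return 1.0 / (1.0 + np.exp(-(x @ W.reshape(3, 2))))


W = rng.randn(6)
J = finite_diff_jacobian(net, W)   # shape (batch, n_outputs, n_params)
```

An explicit Jacobian like this costs two network evaluations per parameter, so it is only practical for small networks used in debugging.
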

check_G(calc_G, v, damping=0)

Check Gv calculation via finite differences (for debugging).

init_weights(shapes, coeff=1.0, biases=0.0, init_type='sparse')

Weight initialization, given shapes of weight matrices.

Note: coeff, biases, and init_type can be specified by the W_init_params dict in FFNet. Each can be specified as a single value (for all matrices) or as a list giving a value for each matrix.

  • shapes (list) – list of (pre,post) shapes for each weight matrix
  • coeff (float) – scales the magnitude of the connection weights
  • biases (float) – bias values for the post units of each matrix
  • init_type (str) – type of initialization to use (currently supports ‘sparse’, ‘uniform’, ‘gaussian’)
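
The three initialization types might look roughly like the sketch below. Everything beyond the parameter names is an assumption: the limit of 15 nonzero incoming weights per unit for `'sparse'` follows the scheme described in Martens (2010), `n_sparse` is a hypothetical parameter, and storing the biases as the last row of each matrix is an illustrative layout.

```python
import numpy as np


def init_weights_sketch(shapes, coeff=1.0, biases=0.0, init_type="sparse",
                        n_sparse=15, rng=None):
    """Hypothetical initializer returning one (pre + 1, post) array per
    connection, with the biases in the last row (assumed layout)."""
    rng = np.random.RandomState(0) if rng is None else rng
    weights = []
    for pre, post in shapes:
        if init_type == "gaussian":
            W = rng.randn(pre, post) * coeff
        elif init_type == "uniform":
            W = rng.uniform(-coeff, coeff, size=(pre, post))
        elif init_type == "sparse":
            # Each post unit receives at most n_sparse nonzero incoming weights.
            W = np.zeros((pre, post))
            for j in range(post):
                idx = rng.choice(pre, min(n_sparse, pre), replace=False)
                W[idx, j] = rng.randn(len(idx)) * coeff
        else:
            raise ValueError("unknown init_type: %s" % init_type)
        b = np.full((1, post), biases)
        weights.append(np.concatenate([W, b], axis=0))
    return weights


Ws = init_weights_sketch([(20, 4), (4, 2)], coeff=0.5, biases=0.1)
```
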

Precompute offsets for layers in the overall parameter vector.

get_weights(params, conn)

Get weight matrix for a connection from overall parameter vector.
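
One way to store every connection in a single flat parameter vector is sketched below. The layout (weights followed by biases for each connection, connections ordered by source layer) and the helper names and signatures are assumptions made for illustration; the library's own layout may differ.

```python
import numpy as np


def compute_offsets_sketch(shape, conns):
    """Map each (pre, post) connection to (start, mid, end) bounds:
    weights occupy [start, mid) and biases occupy [mid, end)."""
    offsets, n_params = {}, 0
    for pre in sorted(conns):
        for post in conns[pre]:
            n_w = shape[pre] * shape[post]
            offsets[(pre, post)] = (n_params, n_params + n_w,
                                    n_params + n_w + shape[post])
            n_params += n_w + shape[post]
    return offsets, n_params


def get_weights_sketch(params, conn, shape, offsets):
    """Return (W, b) as views into the flat parameter vector."""
    start, mid, end = offsets[conn]
    pre, post = conn
    return params[start:mid].reshape(shape[pre], shape[post]), params[mid:end]


shape = [2, 3, 1]
conns = {0: [1], 1: [2]}
offsets, n_params = compute_offsets_sketch(shape, conns)
params = np.arange(n_params, dtype=np.float32)
W, b = get_weights_sketch(params, (1, 2), shape, offsets)
```

Because `W` and `b` are views, writing to them updates `params` in place, which avoids copying when an optimizer works on the flat vector.
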


Set the loss type for this network to the given LossFunction (a list of functions can be passed instead to create a LossSet).