Recurrent networks

Implementation of a recurrent network, including the Gauss-Newton approximation used in Hessian-free optimization.

Based on Martens, J., & Sutskever, I. (2011). Learning recurrent neural networks with Hessian-free optimization. In Proceedings of the 28th International Conference on Machine Learning (ICML).

class hessianfree.rnnet.RNNet(shape, rec_layers=None, W_rec_params=None, truncation=None, **kwargs)

Bases: hessianfree.ffnet.FFNet

Implementation of a deep recurrent network (including gradient and curvature computation).

Parameters:

rec_layers (list) – indices of layers with recurrent connections (by default, all layers except the first and last are recurrent)
W_rec_params (dict) – parameters used to initialize recurrent weights (passed to init_weights())
truncation (tuple) – a tuple (n, k) meaning that backpropagation through time is executed every n timesteps and runs backwards for k steps each time (if None, full backpropagation through time is used)

See FFNet for the remaining parameters.
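One way to read the truncation parameter: with truncation=(n, k), a backward pass is launched every n timesteps and each pass covers at most the k most recent timesteps. The sketch below only enumerates which timesteps each backward pass would visit under that reading. The helper name truncation_windows is hypothetical (not part of hessianfree), and the choice to launch passes at t = n-1, 2n-1, ... (with no extra pass for a trailing partial chunk) is an assumption.

```python
def truncation_windows(num_steps, truncation=None):
    """Hypothetical helper: list the timesteps visited by each backward pass
    of truncated BPTT, assuming truncation=(n, k) means
    'every n steps, backpropagate over (at most) the previous k steps'."""
    if truncation is None:
        # Full BPTT: a single backward pass over the whole sequence.
        return [range(num_steps - 1, -1, -1)]
    n, k = truncation
    windows = []
    # Assumed schedule: launch a backward pass at t = n-1, 2n-1, ...
    for end in range(n - 1, num_steps, n):
        start = max(end - k + 1, 0)
        # Visit timesteps from `end` back down to `start`.
        windows.append(range(end, start - 1, -1))
    return windows


for w in truncation_windows(10, truncation=(4, 6)):
    print(list(w))
```

Choosing k > n, as in this example, makes consecutive windows overlap, so every error signal propagates back at least n steps, at the cost of revisiting some timesteps.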

forward(inputs, params=None, deriv=False, init_activations=None, init_state=None)

Compute layer activations for given input and parameters.

Parameters:

inputs (ndarray) – input vectors (passed to the first layer)
params (ndarray) – parameter vector (weights) for the network (defaults to self.W)
deriv (bool) – if True, also compute the derivatives of the activations
init_activations (list) – initial values for the activations in each layer
init_state (list) – initial values for the internal state of any stateful nonlinearities
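For intuition about what forward does in a recurrent layer, here is a minimal NumPy sketch of a single-layer recurrent forward pass. At each timestep the layer combines the current input with its own activation from the previous step, starting from an optional initial activation (the role init_activations plays). The function name, the (batch, timesteps, features) input layout, the tanh nonlinearity, and the omission of biases are all assumptions for illustration, not RNNet's actual internals.

```python
import numpy as np


def rnn_forward(inputs, W_in, W_rec, init_activation=None, deriv=False):
    """Hypothetical single-layer recurrent forward pass.

    inputs: (batch, timesteps, n_in) array (assumed layout)
    W_in: (n_in, n_hidden) input weights
    W_rec: (n_hidden, n_hidden) recurrent weights
    init_activation: (batch, n_hidden) activation before the first step
    deriv: if True, also return d(activation)/d(input) for tanh
    """
    batch, timesteps, _ = inputs.shape
    n_hidden = W_rec.shape[0]
    h = (np.zeros((batch, n_hidden)) if init_activation is None
         else init_activation)
    activations = np.zeros((batch, timesteps, n_hidden))
    derivs = np.zeros_like(activations)
    for t in range(timesteps):
        # Input drive at step t plus recurrent drive from step t - 1.
        h = np.tanh(inputs[:, t] @ W_in + h @ W_rec)
        activations[:, t] = h
        if deriv:
            derivs[:, t] = 1 - h ** 2  # derivative of tanh
    return (activations, derivs) if deriv else activations


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 3))
acts = rnn_forward(x, rng.standard_normal((3, 4)) * 0.1,
                   rng.standard_normal((4, 4)) * 0.1)
print(acts.shape)  # (2, 5, 4)
```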

calc_grad()

Compute the parameter gradient.
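In a recurrent network the gradient is computed by backpropagation through time (BPTT). The sketch below is not the library's implementation: it computes the gradient of a squared-error loss for a tiny single-sequence tanh RNN with a linear readout, stepping backwards through the sequence and carrying the gradient with respect to the hidden state from step t + 1 to step t. The model, loss, and function name are assumptions chosen for illustration.

```python
import numpy as np


def rnn_loss_and_grad(x, targets, W_in, W_rec, W_out):
    """Hypothetical BPTT for
        h_t = tanh(x_t W_in + h_{t-1} W_rec),  h_{-1} = 0
        y_t = h_t W_out
        loss = 0.5 * sum_t ||y_t - target_t||^2
    x: (T, n_in), targets: (T, n_out). Returns loss and (dW_in, dW_rec, dW_out).
    """
    T = x.shape[0]
    n_hidden = W_rec.shape[0]
    hs = np.zeros((T + 1, n_hidden))  # hs[t + 1] holds h_t; hs[0] is h_{-1}
    errs = []
    loss = 0.0
    # Forward pass: store every hidden state for the backward pass.
    for t in range(T):
        hs[t + 1] = np.tanh(x[t] @ W_in + hs[t] @ W_rec)
        err = hs[t + 1] @ W_out - targets[t]
        errs.append(err)
        loss += 0.5 * np.sum(err ** 2)

    dW_in = np.zeros_like(W_in)
    dW_rec = np.zeros_like(W_rec)
    dW_out = np.zeros_like(W_out)
    dh_next = np.zeros(n_hidden)  # dL/dh_t contributed by step t + 1
    # Backward pass through time.
    for t in reversed(range(T)):
        dW_out += np.outer(hs[t + 1], errs[t])
        # Total gradient w.r.t. h_t: from the output at t and from step t + 1.
        dh = errs[t] @ W_out.T + dh_next
        dz = dh * (1 - hs[t + 1] ** 2)  # back through tanh
        dW_in += np.outer(x[t], dz)
        dW_rec += np.outer(hs[t], dz)
        dh_next = dz @ W_rec.T  # pass gradient on to h_{t-1}
    return loss, (dW_in, dW_rec, dW_out)
```

Comparing gradients like these against finite differences is what check_grad is intended for.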

check_grad(calc_grad)

Check the computed gradient calc_grad against a finite-difference estimate (for debugging).
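The idea behind a finite-difference gradient check can be sketched generically on a flat parameter vector: perturb each parameter by ±eps, take the central difference of the loss, and compare the result with the analytic gradient using np.allclose. The function name, eps, and tolerances below are illustrative assumptions, not the values used by hessianfree.

```python
import numpy as np


def check_grad_fd(loss_fn, params, calc_grad, eps=1e-6, rtol=1e-4, atol=1e-8):
    """Hypothetical finite-difference gradient check.

    loss_fn: maps a flat parameter vector to a scalar loss
    params: flat parameter vector at which to check
    calc_grad: analytically computed gradient at params
    Returns (passed, finite-difference gradient).
    """
    params = np.asarray(params, dtype=float)
    fd_grad = np.zeros_like(params)
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = eps
        # Central difference: (L(w + eps e_i) - L(w - eps e_i)) / (2 eps)
        fd_grad[i] = (loss_fn(params + step) - loss_fn(params - step)) / (2 * eps)
    passed = np.allclose(fd_grad, calc_grad, rtol=rtol, atol=atol)
    return passed, fd_grad


# Example: L(w) = sum(w**3), whose gradient is 3 * w**2.
w = np.array([0.5, -1.0, 2.0])
ok, fd = check_grad_fd(lambda p: np.sum(p ** 3), w, 3 * w ** 2)
print(ok)  # True
```

Central differences are used because their truncation error is O(eps²) rather than the O(eps) of one-sided differences, which keeps the tolerances tight. The cost is two loss evaluations per parameter, so this is only practical for debugging small networks.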

calc_G(v, damping=0, out=None)

Compute the Gauss-Newton matrix-vector product Gv.
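In Hessian-free optimization the Gauss-Newton matrix G = JᵀH_L J is never formed explicitly. Here J is the Jacobian of the network outputs with respect to the parameters and H_L is the Hessian of the loss with respect to those outputs. Conjugate gradient only needs products Gv. The dense NumPy sketch below spells out that definition for a toy model with an explicit Jacobian, which is only feasible at tiny sizes. The function name is hypothetical, and treating damping as adding λv (plain Tikhonov damping) is an assumption. The paper also describes structural damping, which this sketch omits.

```python
import numpy as np


def gauss_newton_product(J, H_L, v, damping=0.0):
    """Hypothetical dense Gauss-Newton product (J^T H_L J + damping * I) v.

    J: (n_outputs, n_params) Jacobian of network outputs w.r.t. parameters
    H_L: (n_outputs, n_outputs) Hessian of the loss w.r.t. the outputs
    v: (n_params,) direction to multiply
    """
    Jv = J @ v        # change in outputs along v (a forward-mode pass in practice)
    HJv = H_L @ Jv    # apply the loss curvature in output space
    Gv = J.T @ HJv    # map back to parameter space (a backward pass in practice)
    return Gv + damping * v  # assumed Tikhonov-style damping


rng = np.random.default_rng(0)
J = rng.standard_normal((3, 5))
H_L = np.eye(3)  # squared-error loss 0.5 * ||y - t||^2 has H_L = I
v = rng.standard_normal(5)
Gv = gauss_newton_product(J, H_L, v, damping=0.1)
# Matches forming the damped matrix explicitly.
print(np.allclose(Gv, (J.T @ J + 0.1 * np.eye(5)) @ v))  # True
```

Because H_L is positive semidefinite for convex losses such as squared error, G is positive semidefinite too. This is the property that makes it a safer curvature matrix than the true Hessian for conjugate gradient.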

load_GPU_data()

Load data for the current epoch onto the GPU.

GPU_calc_G(v, damping=0, out=None)

Compute the Gauss-Newton matrix-vector product on the GPU.

check_J(start=0, stop=None)

Compute the Jacobian of the network via finite differences.
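A finite-difference Jacobian can be sketched with central differences, one parameter at a time. The function name is hypothetical. Reading start and stop as a range of parameters (columns) to perturb is an assumption, since the signature alone does not say which dimension they index.

```python
import numpy as np


def fd_jacobian(f, params, start=0, stop=None, eps=1e-6):
    """Hypothetical central-difference Jacobian of f at params.

    f: maps a flat parameter vector to a flat output vector
    start, stop: assumed to select the parameters (columns) to perturb
    Returns an array of shape (n_outputs, stop - start).
    """
    params = np.asarray(params, dtype=float)
    stop = len(params) if stop is None else stop
    out_dim = np.ravel(f(params)).size
    J = np.zeros((out_dim, stop - start))
    for col, i in enumerate(range(start, stop)):
        step = np.zeros_like(params)
        step[i] = eps
        # Central difference along parameter i.
        J[:, col] = (np.ravel(f(params + step)) - np.ravel(f(params - step))) / (2 * eps)
    return J


# Example: f(w) = [w0 * w1, sin(w2)] has Jacobian [[w1, w0, 0], [0, 0, cos(w2)]].
w = np.array([2.0, 3.0, 0.5])
J = fd_jacobian(lambda p: np.array([p[0] * p[1], np.sin(p[2])]), w)
print(np.round(J, 4))
```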

check_G(calc_G, v, damping=0)

Check the Gv calculation calc_G via finite differences (for debugging).
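One way to check a Gauss-Newton product numerically is to estimate J by finite differences, form (JᵀJ + λI)v densely, and compare it with the fast result. The sketch below assumes a squared-error loss (so H_L = I and G = JᵀJ) and assumes that damping adds λv. All function names are hypothetical. A toy model f(w) = tanh(Aw) with a known analytic Jacobian stands in for the network.

```python
import numpy as np


def fd_jacobian(f, w, eps=1e-6):
    """Central-difference Jacobian of f at w (outputs x parameters)."""
    cols = []
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        cols.append((f(w + step) - f(w - step)) / (2 * eps))
    return np.stack(cols, axis=1)


def check_Gv(f, w, v, calc_Gv, damping=0.0, rtol=1e-4, atol=1e-6):
    """Hypothetical check of a Gauss-Newton product for a squared-error loss
    (H_L = I, so G = J^T J), assuming damping adds damping * v."""
    J = fd_jacobian(f, w)
    reference = J.T @ (J @ v) + damping * v
    return np.allclose(calc_Gv, reference, rtol=rtol, atol=atol)


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))


def f(w):
    # Toy model f(w) = tanh(A w); its Jacobian is diag(1 - f(w)^2) A.
    return np.tanh(A @ w)


w = rng.standard_normal(3)
v = rng.standard_normal(3)
J_exact = (1 - f(w) ** 2)[:, None] * A
Gv = J_exact.T @ (J_exact @ v) + 0.5 * v
print(check_Gv(f, w, v, Gv, damping=0.5))  # True
```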

compute_offsets()

Precompute the offsets of each layer's weights within the overall parameter vector.
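Hessian-free optimization works on a single flat parameter vector, so the network has to record where each weight matrix lives inside it. A minimal sketch of that bookkeeping follows. It makes two hypothetical layout assumptions: each feedforward connection (pre, post) stores a (pre_size + 1) × post_size matrix, with the extra row holding biases, and each recurrent layer has an additional size × size matrix. The actual layout in hessianfree may differ.

```python
def compute_offsets(shape, conns, rec_layers):
    """Hypothetical offset table for a flat parameter vector.

    shape: number of units in each layer
    conns: list of (pre, post) feedforward connections
    rec_layers: indices of layers with recurrent (layer -> itself) weights
    Returns ({key: (start, end)}, total number of parameters).
    """
    offsets = {}
    offset = 0
    for pre, post in conns:
        # Assumed layout: (pre units + 1 bias row) x post units.
        size = (shape[pre] + 1) * shape[post]
        offsets[(pre, post)] = (offset, offset + size)
        offset += size
    for layer in rec_layers:
        # Assumed layout: square recurrent matrix with no separate bias.
        size = shape[layer] * shape[layer]
        offsets[(layer, layer)] = (offset, offset + size)
        offset += size
    return offsets, offset


offsets, n_params = compute_offsets([2, 4, 1], [(0, 1), (1, 2)], [1])
print(offsets)   # {(0, 1): (0, 12), (1, 2): (12, 17), (1, 1): (17, 33)}
print(n_params)  # 33
```

With such a table, a layer's matrix can be viewed from a flat NumPy vector W as W[start:end].reshape(rows, cols) without copying, so gradient and curvature code can write into the shared vector directly.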