Built-in objects
Optimizers
class hessianfree.optimizers.Optimizer

    Bases: object

    Base class for optimizers.

    Each optimizer has a self.net attribute that is set automatically when
    the optimizer is added to a network (it refers to that network).

    compute_update(printing=False)

        Compute a weight update for the current batch.

        It can be assumed that the batch has already been stored in
        net.inputs and net.targets, and that the nonlinearity
        activations/derivatives for the batch are cached in net.activations
        and net.d_activations.

        Parameters:
            - printing (bool) – if True, print out data about the optimization
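The following is a minimal sketch (not part of the library) of how a custom optimizer subclass could use this interface. The self.net.calc_grad() call and the convention that compute_update returns a flat vector which the network adds to its weights are assumptions; check the built-in SGD optimizer for the actual pattern.

    import numpy as np
    from hessianfree.optimizers import Optimizer

    class ScaledSGD(Optimizer):
        """Hypothetical optimizer: gradient descent with a fixed step size."""

        def __init__(self, l_rate=0.1):
            super(ScaledSGD, self).__init__()
            self.l_rate = l_rate

        def compute_update(self, printing=False):
            # self.net is set when this optimizer is attached to a network;
            # the batch and cached activations are already stored on the
            # network, so only the gradient is needed here.
            grad = self.net.calc_grad()  # assumed gradient helper on the network
            if printing:
                print("gradient norm: %g" % np.linalg.norm(grad))
            return -self.l_rate * grad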
class hessianfree.optimizers.HessianFree(CG_iter=250, init_damping=1, plotting=True)

    Bases: hessianfree.optimizers.Optimizer

    Use Hessian-free optimization to compute the weight update.

    Parameters:
        - CG_iter (int) – maximum number of CG iterations to run per epoch
        - init_damping (float) – the initial value of the Tikhonov damping
        - plotting (bool) – if True, collect data for plotting (the actual
          plotting is handled by the parent network)
class hessianfree.optimizers.SGD(l_rate=1, plotting=False)

    Bases: hessianfree.optimizers.Optimizer

    Compute the weight update using first-order gradient descent.

    Parameters:
        - l_rate (float) – learning rate to apply to weight updates
        - plotting (bool) – if True, collect data for plotting (the actual
          plotting is handled by the parent network)
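For context, an optimizer is not normally used on its own; it is handed to a network's training routine, which sets self.net and calls compute_update each epoch. The sketch below assumes that FFNet is importable from the top-level package and that its training entry point is run_epochs(inputs, targets, optimizer, max_epochs=...); both are assumptions, so check the FFNet/RNNet documentation for the exact signature.

    import numpy as np
    import hessianfree as hf
    from hessianfree.optimizers import HessianFree

    # toy dataset: learn the product of two inputs
    inputs = np.random.uniform(-1, 1, (1000, 2)).astype(np.float32)
    targets = np.prod(inputs, axis=1, keepdims=True).astype(np.float32)

    net = hf.FFNet([2, 10, 1])  # layer sizes (assumed constructor form)
    net.run_epochs(inputs, targets,
                   optimizer=HessianFree(CG_iter=100, init_damping=1),
                   max_epochs=10)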
Nonlinearities
class hessianfree.nonlinearities.Nonlinearity(stateful=False)

    Bases: object

    Base class for layer nonlinearities.

    Parameters:
        - stateful (bool) – True if this nonlinearity has internal state (in
          which case it needs to return d_input, d_state, and d_output in
          d_activation(); see Continuous for an example)

    activation(x)

        Apply the nonlinearity to the inputs.

        Parameters:
            - x – input to the nonlinearity
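As a sketch of this interface (not a built-in class), a stateless nonlinearity only needs to implement activation() plus a derivative. The d_activation(x, a) signature used below, taking the inputs and the cached activations, is an assumption; only the stateful case is described above, so check the built-in nonlinearities for the exact form.

    import numpy as np
    from hessianfree.nonlinearities import Nonlinearity

    class Softplus(Nonlinearity):
        """Hypothetical stateless nonlinearity: f(x) = log(1 + e^x)."""

        def activation(self, x):
            return np.log1p(np.exp(x))

        def d_activation(self, x, a):
            # the derivative of log(1 + e^x) is the logistic sigmoid
            return 1.0 / (1.0 + np.exp(-x))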
class hessianfree.nonlinearities.Tanh

    Bases: hessianfree.nonlinearities.Nonlinearity

    Hyperbolic tangent function

    \(f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\)
class hessianfree.nonlinearities.Logistic

    Bases: hessianfree.nonlinearities.Nonlinearity

    Logistic sigmoid function

    \(f(x) = \frac{1}{1 + e^{-x}}\)

    Note: if scipy is installed then this will use the slightly faster
    scipy.special.expit
class hessianfree.nonlinearities.Linear

    Bases: hessianfree.nonlinearities.Nonlinearity

    Linear activation function (passes inputs through unchanged).

    \(f(x) = x\)
class hessianfree.nonlinearities.ReLU(max=10000000000.0)

    Bases: hessianfree.nonlinearities.Nonlinearity

    Rectified linear unit

    \(f(x) = \max(x, 0)\)

    Parameters:
        - max – an upper bound on the activation, to help avoid numerical errors
class hessianfree.nonlinearities.Gaussian

    Bases: hessianfree.nonlinearities.Nonlinearity

    Gaussian activation function

    \(f(x) = e^{-x^2}\)
class hessianfree.nonlinearities.Softmax

    Bases: hessianfree.nonlinearities.Nonlinearity

    Softmax activation function

    \(f(x_i) = \frac{e^{x_i}}{\sum_j{e^{x_j}}}\)
class hessianfree.nonlinearities.SoftLIF(sigma=1, tau_rc=0.02, tau_ref=0.002, amp=0.01)

    Bases: hessianfree.nonlinearities.Nonlinearity

    SoftLIF activation function

    Based on Hunsberger, E. and Eliasmith, C. (2015). Spiking deep networks
    with LIF neurons. arXiv:1510.08829.

    \[f(x) = \frac{amp}{\tau_{ref} + \tau_{RC} \log\left(1 + \frac{1}{\sigma \log(1 + e^{x/\sigma})}\right)}\]

    Note: this is equivalent to \(LIF(SoftReLU(x))\)

    Parameters:
        - sigma (float) – controls the smoothness of the nonlinearity threshold
        - tau_rc (float) – LIF RC time constant
        - tau_ref (float) – LIF refractory time constant
        - amp (float) – scales the output of the nonlinearity
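The formula above can be transcribed directly into NumPy; the function below is a standalone numerical sketch for checking values, not the library's implementation (which also has to handle derivatives and numerical edge cases).

    import numpy as np

    def softlif(x, sigma=1.0, tau_rc=0.02, tau_ref=0.002, amp=0.01):
        # SoftReLU: sigma * log(1 + exp(x / sigma))
        j = sigma * np.log1p(np.exp(x / sigma))
        # LIF rate curve applied to the softened current
        return amp / (tau_ref + tau_rc * np.log1p(1.0 / j))

    print(softlif(np.array([0.5, 1.0, 2.0])))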
class hessianfree.nonlinearities.Continuous(base, tau=1.0, dt=1.0)

    Bases: hessianfree.nonlinearities.Nonlinearity

    Creates a version of the base nonlinearity that operates in continuous
    time (filtering the inputs with the given tau/dt).

    \[\frac{ds}{dt} = \frac{x - s}{\tau}\]
    \[f(x) = base(s)\]

    Parameters:
        - base (Nonlinearity) – nonlinear output function applied to the
          continuous state
        - tau (float) – time constant of the input filter (a higher value
          means the internal state changes more slowly)
        - dt (float) – simulation time step
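To make the filtering concrete, the sketch below steps the ds/dt equation with simple Euler integration and applies the base nonlinearity to the filtered state at each step. This is an illustrative standalone transcription of the math, not the class's actual implementation; in the library you would simply wrap an existing nonlinearity, e.g. Continuous(Tanh(), tau=5.0, dt=1.0).

    import numpy as np

    def filtered_outputs(base_fn, xs, tau=5.0, dt=1.0):
        """Euler integration of ds/dt = (x - s) / tau, then f(x) = base(s)."""
        s = np.zeros_like(xs[0])
        outputs = []
        for x in xs:
            s = s + (x - s) * dt / tau
            outputs.append(base_fn(s))
        return outputs

    print(filtered_outputs(np.tanh, [np.ones(3)] * 5))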
class hessianfree.nonlinearities.Plant(stateful=True)

    Bases: hessianfree.nonlinearities.Nonlinearity

    Base class for a plant that can be called to dynamically generate inputs
    for a network.

    See demos.plant() for an example of this being used in practice.

    __call__(x)

        Update the internal state of the plant based on its input.

        Parameters:
            - x – the output of the last layer in the network on the
              previous timestep

    get_vecs()

        Return a tuple of the (inputs, targets) vectors generated by the
        plant since the last reset.

    reset(init=None)

        Reset the plant to its initial state.

        Parameters:
            - init – override the default initial state with these values
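A minimal sketch of this interface, assuming the three methods documented above are all that is strictly required. The class and the way it stores inputs/targets below are hypothetical, and a working plant also needs whatever shape and derivative bookkeeping the network expects; see demos.plant() for a complete, working example.

    import numpy as np
    from hessianfree.nonlinearities import Plant

    class Integrator(Plant):
        """Hypothetical plant whose state accumulates the network's output."""

        def __init__(self, init_state=0.0):
            super(Integrator, self).__init__(stateful=True)
            self.init_state = init_state
            self.reset()

        def __call__(self, x):
            # x is the network's output from the previous timestep
            self.state = self.state + x
            self.inputs.append(np.copy(self.state))
            self.targets.append(np.zeros_like(self.state))  # drive the state to zero
            return self.state

        def get_vecs(self):
            return np.asarray(self.inputs), np.asarray(self.targets)

        def reset(self, init=None):
            self.state = self.init_state if init is None else init
            self.inputs, self.targets = [], []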
Loss functions
class hessianfree.loss_funcs.LossFunction

    Bases: object

    Defines a loss function that maps nonlinearity activations to error.

    loss(activities, targets)

        Computes the loss for each unit in the network.

        Note that most loss functions are based only on the output of the
        final layer, activities[-1]. However, the activities of all layers
        are passed here so that loss functions can include things like
        sparsity constraints. Targets, however, are only defined for the
        output layer.

        Targets can be defined as np.nan, which will be translated into
        zero error.

        Parameters:
            - activities (list) – output activations of each layer
            - targets (ndarray) – target activation values for the last layer

    d_loss(activities, targets)

        First derivative of the loss function (with respect to activities).
hessianfree.loss_funcs.output_loss(func)

    Convenience decorator that takes a loss defined for the output layer and
    converts it into the more general form defined in terms of all layers.
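A sketch of how this decorator might be used, assuming it is applied to the loss/d_loss methods of a LossFunction subclass so that they receive only the final layer's activations (an assumption; it may equally be intended for standalone functions). The AbsoluteError class itself is hypothetical.

    import numpy as np
    from hessianfree.loss_funcs import LossFunction, output_loss

    class AbsoluteError(LossFunction):
        """Hypothetical loss defined only on the output layer."""

        @output_loss
        def loss(self, output, targets):
            # the decorator supplies activities[-1] here instead of all layers
            return np.abs(output - targets)

        @output_loss
        def d_loss(self, output, targets):
            return np.sign(output - targets)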
class hessianfree.loss_funcs.SquaredError

    Bases: hessianfree.loss_funcs.LossFunction

    Squared error

    \(\frac{1}{2} \sum(output - target)^2\)
class hessianfree.loss_funcs.CrossEntropy

    Bases: hessianfree.loss_funcs.LossFunction

    Cross-entropy error

    \(-\sum(target * \log(output))\)
class hessianfree.loss_funcs.ClassificationError

    Bases: hessianfree.loss_funcs.LossFunction

    Classification error

    \(argmax(output) \neq argmax(target)\)

    Note: d_loss and d2_loss are not defined; classification error should
    only be used for validation, which doesn't require either.
class hessianfree.loss_funcs.StructuralDamping(weight, layers=None, optimizer=None)

    Bases: hessianfree.loss_funcs.LossFunction

    Applies structural damping, which penalizes layers for having highly
    variable output activity.

    Note: this is not exactly the same as the structural damping in
    Martens (2010), because it is applied on the output side of the
    nonlinearity (meaning that this error will be filtered through
    d_activations during the backwards propagation).

    Parameters:
        - weight (float) – scale on structural damping relative to other losses
        - layers (list) – indices specifying which layers will have the
          damping applied (defaults to all except the first/last layers)
        - optimizer (Optimizer) – if provided, the weight on structural
          damping will be scaled relative to the damping attribute in the
          optimizer (so that any processes dynamically adjusting the damping
          during the optimization will also affect the structural damping)
class hessianfree.loss_funcs.SparseL1(weight, layers=None, target=0.0)

    Bases: hessianfree.loss_funcs.LossFunction

    Imposes an L1 sparsity constraint on nonlinearity activations.

    Parameters:
        - weight (float) – relative weight of the sparsity constraint
        - layers (list) – indices specifying which layers will have the
          sparsity constraint applied (defaults to all except the first/last
          layers)
        - target (float) – target activation level for the nonlinearities
class hessianfree.loss_funcs.SparseL2(weight, layers=None, target=0.0)

    Bases: hessianfree.loss_funcs.LossFunction

    Imposes an L2 sparsity constraint on nonlinearity activations.

    Parameters:
        - weight (float) – relative weight of the sparsity constraint
        - layers (list) – indices specifying which layers will have the
          sparsity constraint applied (defaults to all except the first/last
          layers)
        - target (float) – target activation level for the nonlinearities
class hessianfree.loss_funcs.LossSet(set)

    Bases: hessianfree.loss_funcs.LossFunction

    Combines several loss functions into one (e.g., combining SquaredError
    and SparseL2). It doesn't need to be created directly; a list of loss
    functions can be passed to FFNet/RNNet and a LossSet will be created
    automatically.

    Parameters:
        - set (list) – list of LossFunctions to combine

    group_func(func_name, activities, targets)

        Computes the given function for each LossFunction in the set, and
        sums the results.
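As noted above, a LossSet is normally created implicitly by passing a list of loss functions to the network. The sketch below assumes the FFNet constructor accepts that list through a loss_type keyword; the keyword name is an assumption, so check the FFNet documentation.

    import hessianfree as hf
    from hessianfree.loss_funcs import SquaredError, SparseL2

    # the network wraps the list in a LossSet internally (assumed keyword name)
    net = hf.FFNet([1, 20, 1], loss_type=[SquaredError(), SparseL2(0.01)])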