berry.optimizers

The berry.optimizers module contains classes for training a neural network.

The training process involves looping over the entire training data multiple times (epochs), seeing a handful of samples at a time (a training step, or iteration). Each training step in turn consists mainly of two steps (a minimal sketch of the loop follows the list below):

  • computing the gradient of the loss function w.r.t. each parameter
  • updating each parameter using this gradient weighted by the learning rate (the exact update rule depends on the type of optimizer used)
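
Here is a minimal sketch of such a training loop, assuming loss_op is the loss operation returned by berry.objectives.get_objective() and next_batch() is a hypothetical helper that yields mini-batches of input and target data:

>>> import tensorflow as tf
>>> from berry.optimizers import get_optimizer
>>> # assume: loss_op comes from berry.objectives.get_objective()
>>> # assume: next_batch() is a hypothetical mini-batch generator
>>> train_op, grads = get_optimizer("sgd", loss_op, 0.1)
>>> sess = tf.Session()
>>> sess.run(tf.global_variables_initializer())
>>> for epoch in range(10):                       # loop over the data (epochs)
...     for x_batch, y_batch in next_batch(32):   # a handful of samples per step
...         sess.run(train_op, feed_dict={'x:0': x_batch, 'y:0': y_batch})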

List of optimizers available:

sgd Stochastic Gradient Descent

It is possible to catch the vanishing/exploding gradient problem by monitoring the gradients every few iterations. The class below prints the mean and standard deviation of the parameters and their corresponding gradients in order to facilitate debugging when the neural network is not learning.

GradientSanity Monitor the values of parameters and their corresponding gradients.
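
For example, a sanity check can be run every few hundred iterations inside the training loop sketched above (the interval of 100 steps and the next_batch() helper are illustrative):

>>> from berry.optimizers import GradientSanity
>>> sanity = GradientSanity(sess, tf.trainable_variables(), grads)
>>> for step, (x_batch, y_batch) in enumerate(next_batch(32)):
...     feed_dict = {'x:0': x_batch, 'y:0': y_batch}
...     sess.run(train_op, feed_dict=feed_dict)
...     if step % 100 == 0:       # print the parameter/gradient summary periodically
...         sanity.run(feed_dict=feed_dict)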

Helper function

braid.berry.optimizers.get_optimizer(key, loss_op, learning_rate, global_step=None, **kwargs)

Helper function to retrieve the appropriate optimizer class.

Parameters

key
: string
Name of the optimizer class - "sgd", etc.

Returns

train_op
: tf.Tensor
Training operation.
grads
: list of tf.Variable
List of gradients w.r.t. each trainable parameter.
>>> from berry.optimizers import get_optimizer
>>> # assume: loss_op is the loss operation returned by
>>> # berry.objectives.get_objective()
>>> train_op, grads = get_optimizer("sgd", loss_op, 0.1)

Examples

You are encouraged to use the helper function get_optimizer(). Alternatively, you can use the optimizer functions directly:

>>> from berry.optimizers import sgd
>>> optim = sgd(learning_rate=0.01)

Optimizers

braid.berry.optimizers.sgd(learning_rate)

Stochastic Gradient Descent

Parameters

learning_rate
: float
Rate of update of parameter values based on their gradients.

Returns

optim
: derived class of tf.train.Optimizer
Class which performs the gradient computation and the backward pass.

The parameter update step for SGD is

\[\theta_i = \theta_i - \alpha \nabla \mathcal{J}_{\theta_i} (x^{(i)}, y^{(i)})\]

where \(\nabla \mathcal{J}_{\theta_i}(x^{(i)}, y^{(i)})\) is the gradient of the loss for the \(i\)-th mini-batch w.r.t. the parameter \(\theta_i\), and \(\alpha\) is the learning rate.
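
As a plain NumPy illustration of this update rule (not berry's internal implementation), a single SGD step on a parameter vector might look like:

>>> import numpy as np
>>> theta = np.array([0.5, -1.2])   # current parameter values
>>> grad = np.array([0.1, -0.3])    # gradient of the loss w.r.t. theta
>>> alpha = 0.01                    # learning rate
>>> theta = theta - alpha * grad    # step against the gradient to reduce the loss
>>> theta
array([ 0.499, -1.197])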

Gradient Monitoring

class braid.berry.optimizers.GradientSanity(session, param_ops, grad_ops)

Monitor the values of parameters and their corresponding gradients.

Prints the mean and standard deviation of the parameter values and their gradients in a neatly formatted table.

Parameters

session
: tf.Session
The tensorflow session in which the operations are defined.
param_ops
: list of tf.Variable
List of trainable parameters.
grad_ops
: list of tf.Variable
List of gradients of the loss function w.r.t. the trainable parameters.

Attributes

ops
: list of tf.Variable
List of parameters and gradients.
>>> import tensorflow as tf
>>> from berry.optimizers import get_optimizer, GradientSanity
>>> # assume: loss_op is the loss operation returned by
>>> # berry.objectives.get_objective()
>>> sess = tf.Session()
>>> _, grads = get_optimizer("sgd", loss_op, 0.1)
>>> param_ops = tf.trainable_variables()
>>> sanity = GradientSanity(sess, param_ops, grads)
>>> # assume: feed_dict = {'x:0': ..., 'y:0': ...}
>>> sanity.run(feed_dict=feed_dict)
print_msg(str)

Format a message string for printing.

print_summary(vals, grads, names)

Print a formatted table summary of parameter and gradient values.

vals
: list of np.ndarray
List of parameter values.
grads
: list of np.ndarray
List of gradient values.
names
: list of string
List of names of the parameters.
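
The summary can also be printed directly from values fetched in the session (a sketch, assuming param_ops, grads, sanity and feed_dict are defined as in the example above):

>>> vals, grad_vals = sess.run([param_ops, grads], feed_dict=feed_dict)
>>> sanity.print_summary(vals, grad_vals, [p.name for p in param_ops])
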
run(feed_dict={})

Perform the forward pass and print a summary of the parameter and gradient values.

feed_dict
: dict
Dict containing the input and target data for forward pass.