`berry.optimizers`¶

The berry.optimizers module contains classes for training a neural network.

The training process involves looping over the entire training data multiple times (epochs), seeing a handful of samples at a time (a training step, or iteration). Each training step in turn consists mainly of two steps:

computing gradient of loss function w.r.t. each parameter
updating each parameter using this gradient weighted by the learning rate (this depends on the type of optimizer used)
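The two steps above can be sketched without any framework. The block below is a minimal illustration in plain Python, not berry's implementation: the helper names `minibatches` and `train`, the toy data, and the hyperparameter values are all assumptions made for this example. It fits a line with mean squared error, looping over epochs and mini-batches.

```python
import random


def minibatches(data, batch_size, rng):
    """Yield shuffled mini-batches that together cover the data once (one epoch)."""
    indices = list(range(len(data)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]


def train(data, epochs=200, batch_size=4, learning_rate=0.05, seed=0):
    """Fit y = w * x + b by mini-batch gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):                               # one pass over the data
        for batch in minibatches(data, batch_size, rng):  # one training step
            n = len(batch)
            # Step 1: gradient of the loss w.r.t. each parameter.
            grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
            grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / n
            # Step 2: update each parameter using its gradient and the learning rate.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1.
data = [(x / 10.0, 2.0 * x / 10.0 + 1.0) for x in range(-10, 11)]
w, b = train(data)
print("w = {:.3f}, b = {:.3f}".format(w, b))
```

In a TensorFlow graph both steps are bundled into the training operation, so user code typically only runs that operation once per mini-batch.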

List of optimizers available:

sgd Stochastic Gradient Descent

It is possible to catch the vanishing/exploding gradients problem by monitoring the gradients every few iterations. The class below prints the mean and standard deviation of the parameters and their corresponding gradients to make debugging easier when the neural network is not learning.

GradientSanity Monitor the value of parameters and their corresponding gradients.
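To make the idea of monitoring concrete, the sketch below computes the same kind of statistics (mean and standard deviation of each parameter and of its gradient) in plain Python. It is an illustration only and does not reproduce how GradientSanity is implemented; the function names `mean_and_std` and `summarize`, the parameter names, and all numeric values are hypothetical.

```python
import math


def mean_and_std(values):
    """Return the mean and population standard deviation of a flat list of floats."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)


def summarize(params, grads):
    """Build one row per parameter: (name, param mean, param std, grad mean, grad std)."""
    rows = []
    for name in params:
        p_mean, p_std = mean_and_std(params[name])
        g_mean, g_std = mean_and_std(grads[name])
        rows.append((name, p_mean, p_std, g_mean, g_std))
    return rows


# Hypothetical flattened values for two parameters and their gradients.
params = {"W": [0.2, -0.1, 0.4, 0.3], "b": [0.0, 0.1]}
grads = {"W": [1e-9, -2e-9, 3e-9, 0.0], "b": [0.5, -0.5]}

rows = summarize(params, grads)
header = ("name", "param mean", "param std", "grad mean", "grad std")
print("{:<6}{:>12}{:>12}{:>12}{:>12}".format(*header))
for row in rows:
    print("{:<6}{:>12.3e}{:>12.3e}{:>12.3e}{:>12.3e}".format(*row))
```

In this made-up data the gradient statistics for `W` are many orders of magnitude smaller than those for `b`. If that pattern persisted across iterations, it would be a hint of vanishing gradients for that parameter.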

Helper function¶

braid.berry.optimizers.get_optimizer(key, loss_op, learning_rate, global_step=None, **kwargs)¶

Helper function to retrieve the appropriate optimizer class.

key: Name of the optimizer class - “sgd”, etc.

tf.Tensor: Training operation.
list of tf.Variable: List of gradients w.r.t. each trainable parameter.

>>> from berry.optimizers import get_optimizer
>>> # assume: loss_op is the loss operation returned by
>>> # berry.objectives.get_objective()
>>> train_op, grads = get_optimizer("sgd", loss_op, 0.1)

Examples¶

You are encouraged to use the helper function get_optimizer(). Alternatively, you can use the optimizer functions directly:

>>> from berry.optimizers import sgd
>>> optim = sgd(learning_rate=0.01)

Optimizers¶

braid.berry.optimizers.sgd(learning_rate)¶

Stochastic Gradient Descent

learning_rate: Rate of update of parameter values based on their gradients.

Derived class of tf.train.Optimizer: Class which performs the gradient computation and the backward pass.

The parameter update step for SGD is

\[\theta_j = \theta_j - \alpha \nabla_{\theta_j} \mathcal{J}(x^{(i)}, y^{(i)})\]

where \(\alpha\) is the learning rate and \(\nabla_{\theta_j} \mathcal{J}(x^{(i)}, y^{(i)})\) is the gradient of the loss for the \(i\)-th mini-batch w.r.t. parameter \(\theta_j\).
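As a worked example of a single gradient descent update, the sketch below applies one step to two scalar parameters: each parameter moves against its gradient, scaled by the learning rate. The function `sgd_step` and the numbers are hypothetical and chosen only to make the arithmetic easy to follow; this is not the code behind berry's sgd.

```python
def sgd_step(params, grads, learning_rate):
    """Return the parameters after one step: theta - learning_rate * gradient."""
    return [theta - learning_rate * grad for theta, grad in zip(params, grads)]


params = [1.0, -2.0]   # current parameter values (made up)
grads = [0.5, -4.0]    # gradients of the loss w.r.t. each parameter (made up)

updated = sgd_step(params, grads, learning_rate=0.1)
# 1.0 - 0.1 * 0.5 and -2.0 - 0.1 * (-4.0): approximately 0.95 and -1.6
print(updated)
```

A positive gradient lowers the parameter and a negative gradient raises it, which is what moves the loss downhill for a sufficiently small learning rate.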

Gradient Monitoring¶

class braid.berry.optimizers.GradientSanity(session, param_ops, grad_ops)¶

Monitor the value of parameters and their corresponding gradients.

Prints the mean and standard deviation of the parameter values and their gradients in a neatly formatted table.

session: The tensorflow session in which the operations are defined.
param_ops: List of trainable parameters.
grad_ops: List of gradients of the loss function w.r.t. the trainable parameters.

ops: List of parameters and gradients.

>>> import tensorflow as tf
>>> from berry.optimizers import GradientSanity, get_optimizer
>>> sess = tf.Session()
>>> _, grads = get_optimizer("sgd", loss_op, 0.1)
>>> param_ops = tf.trainable_variables()
>>> sanity = GradientSanity(sess, param_ops, grads)
>>> # assume: feed_dict = {'x:0': ..., 'y:0': ...}
>>> sanity.run(feed_dict=feed_dict)

print_msg(str)¶: Format for printing

print_summary(vals, grads, names)¶

Print a formatted table summary of parameter and gradient values.

vals: List of parameter values.
grads: List of gradient values.
names: List of names of the parameters.

run(feed_dict={})¶

Perform the forward pass and print a summary of the parameter and gradient values.

feed_dict: Dict containing the input and target data for forward pass.
