# berry.optimizers¶

The berry.optimizers module contains the optimizers used for training a neural network.

The training process involves looping over the entire training dataset multiple times (epochs), seeing a handful of samples at a time (a training step, or iteration). Each training step in turn consists mainly of two parts:

• computing the gradient of the loss function w.r.t. each parameter
• updating each parameter using this gradient weighted by the learning rate (the exact update depends on the type of optimizer used)
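The two parts above can be sketched in plain NumPy (this is a minimal illustration of the idea, not the berry API; the quadratic loss and mini-batch here are hypothetical):

```python
import numpy as np

# Sketch of the two parts of a training step: gradient computation and a
# learning-rate-weighted update. Loss: J(theta) = mean((x @ theta - y)^2)
# for a tiny linear model on one fixed mini-batch.

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))              # a mini-batch of 8 samples
true_theta = np.array([1.0, -2.0, 0.5])
y = x @ true_theta

theta = np.zeros(3)
learning_rate = 0.1

for step in range(500):
    residual = x @ theta - y
    grad = 2.0 * x.T @ residual / len(x)  # part 1: gradient of the loss
    theta -= learning_rate * grad         # part 2: weighted parameter update

# theta should now be close to true_theta
```

Repeating these two parts over fresh mini-batches, epoch after epoch, is the whole training loop; the optimizer classes in this module only change how part 2 is computed.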

List of optimizers available:

 sgd Stochastic Gradient Descent

It is possible to catch the vanishing/exploding gradient problem by monitoring the gradients every few iterations. The class below prints the mean and standard deviation of the parameters and their corresponding gradients, to facilitate debugging when the neural network is not learning.

 GradientSanity Monitor the values of parameters and their corresponding gradients.

## Helper function¶

braid.berry.optimizers.get_optimizer(key, loss_op, learning_rate, global_step=None, **kwargs)

Helper function to retrieve the appropriate optimizer class.

key
: string
Name of the optimizer class - “sgd”, etc.

Returns

train_op
: tf.Tensor
Training operation.
grads
: list of tf.Variable
List of gradients w.r.t. each trainable parameter.
>>> from berry.optimizers import get_optimizer
>>> # assume: loss_op is the loss operation returned by
>>> # berry.objectives.get_objective()
>>> train_op, grads = get_optimizer("sgd", loss_op, 0.1)


## Examples¶

You are encouraged to use the helper function get_optimizer(). Alternatively, you can use the optimizer functions directly:

>>> from berry.optimizers import sgd
>>> optim = sgd(learning_rate=0.01)


## Optimizers¶

braid.berry.optimizers.sgd(learning_rate)

learning_rate
: float
Rate of update of parameter values based on their gradients.

Derived class of tf.train.Optimizer
Class which performs the gradient computation and the backward pass.

The parameter update step for SGD is

$\theta = \theta - \alpha \nabla_{\theta} \mathcal{J}(x^{(i)}, y^{(i)})$

where $\nabla_{\theta} \mathcal{J}(x^{(i)}, y^{(i)})$ is the gradient of the loss for the $i$-th mini-batch $(x^{(i)}, y^{(i)})$ w.r.t. the parameters $\theta$, and $\alpha$ is the learning rate.
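As a quick sanity check on the sign of the update, a single SGD step should decrease the loss. The sketch below uses hypothetical values and plain NumPy rather than the berry API:

```python
import numpy as np

# One SGD step on J(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta = np.array([3.0, -4.0])
alpha = 0.1

grad = theta                       # gradient of the loss at theta
theta_new = theta - alpha * grad   # update: theta <- theta - alpha * grad

loss = lambda t: 0.5 * np.dot(t, t)
# loss(theta_new) is smaller than loss(theta): the step moved downhill
```

Had the update used a plus sign, the step would move against the gradient descent direction and the loss would grow.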

class braid.berry.optimizers.GradientSanity(session, param_ops, grad_ops)

Monitor the values of parameters and their corresponding gradients.

Prints the mean and standard deviation of the parameter values and their gradients in a nicely formatted table.

session
: tf.Session
The tensorflow session in which the operations are defined.
param_ops
: list of tf.Variable
List of trainable parameters.
grad_ops
: list of tf.Variable
List of gradients of the loss function w.r.t. the trainable parameters.
ops
: list of tf.Variable
>>> sess = tf.Session()
>>> _, grads = get_optimizer("sgd", loss_op, 0.1)
>>> param_ops = tf.trainable_variables()
>>> sanity = GradientSanity(sess, param_ops, grads)
>>> # assume: feed_dict = {'x:0': ..., 'y:0': ...}
>>> sanity.run(feed_dict=feed_dict)

print_msg(str)

Format the given message for printing.

print_summary(vals, grads, names)

Print a formatted table summary of parameter and gradient values.

vals
: list of np.ndarray
List of parameter values.
grads
: list of np.ndarray
List of gradient values.
names
: list of str
Names of the parameters.
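A minimal sketch of what such a summary table could look like (this is an illustrative stand-in, not the berry implementation; the column layout and sample arrays are hypothetical):

```python
import numpy as np

def print_summary(vals, grads, names):
    """Print a mean/std summary table for parameters and their gradients."""
    header = "{:<12} {:>10} {:>10} {:>10} {:>10}".format(
        "name", "val mean", "val std", "grad mean", "grad std")
    lines = [header]
    for v, g, n in zip(vals, grads, names):
        lines.append("{:<12} {:>10.4f} {:>10.4f} {:>10.4f} {:>10.4f}".format(
            n, v.mean(), v.std(), g.mean(), g.std()))
    table = "\n".join(lines)
    print(table)
    return table

# Hypothetical parameter values and gradients for two layers.
vals = [np.array([1.0, 3.0]), np.zeros(4)]
grads = [np.array([0.5, 0.5]), np.ones(4)]
names = ["w1", "w2"]
table = print_summary(vals, grads, names)
```

A table like this makes vanishing gradients (grad mean and std collapsing toward zero) or exploding gradients (both blowing up) easy to spot across iterations.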
run(feed_dict={})