berry.optimizers¶
The berry.optimizers module contains classes for training a neural network.
The training process involves looping over the entire training data multiple times (epochs), seeing a handful of samples at a time (training step/iteration). Each training step in turn consists mainly of two steps (a toy sketch of both follows the list below):
- computing the gradient of the loss function w.r.t. each parameter
- updating each parameter using this gradient weighted by the learning rate (the exact update depends on the type of optimizer used)
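For intuition, the two steps can be sketched with plain NumPy on a toy least-squares problem. This sketch is an illustration only, not part of berry or its implementation; all names and values in it are made up for the example.
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> x = rng.normal(size=(32, 3))            # one mini-batch of inputs
>>> y = x @ np.array([1.0, -2.0, 0.5])      # toy regression targets
>>> theta = np.zeros(3)                     # trainable parameters
>>> learning_rate = 0.1
>>> for step in range(100):                 # training steps (iterations)
...     # step 1: gradient of the mean-squared loss w.r.t. the parameters
...     grad = 2.0 * x.T @ (x @ theta - y) / len(x)
...     # step 2: update the parameters using the gradient weighted by
...     # the learning rate (plain SGD-style update)
...     theta -= learning_rate * grad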
List of optimizers available:
sgd : Stochastic Gradient Descent
It is possible to catch the vanishing/exploding gradients problem by monitoring the gradients every few iterations. The class below prints the mean and standard deviation of the parameters and their corresponding gradients to facilitate debugging when the neural network is not learning.
GradientSanity : Monitor the values of parameters and their corresponding gradients.
Helper function¶
braid.berry.optimizers.get_optimizer(key, loss_op, learning_rate, global_step=None, **kwargs)¶
Helper function to retrieve the appropriate optimizer class.
Parameters:
- key : string
  Name of the optimizer class - "sgd", etc.
Returns:
- train_op : tf.Tensor
  Training operation.
- grads : list of tf.Variable
  List of gradients w.r.t. each trainable parameter.
>>> from berry.optimizers import get_optimizer
>>> # assume: loss_op is the loss operation returned by
>>> # berry.objectives.get_objective()
>>> train_op, grads = get_optimizer("sgd", loss_op, 0.1)
Examples¶
You are encouraged to use the helper function get_optimizer().
Alternatively, you can use the optimizer functions directly,
>>> from berry.optimizers import sgd
>>> optim = sgd(learning_rate=0.01)
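To put the helper's output in context, the sketch below shows one way the returned train_op might be driven inside a TensorFlow session loop. As in the example above, loss_op is assumed to come from berry.objectives.get_objective(); num_steps and feed_dict stand in for your own data and are not defined by berry.optimizers.
>>> import tensorflow as tf
>>> # assume: loss_op from berry.objectives.get_objective(), and
>>> # num_steps, feed_dict defined for your own data
>>> train_op, grads = get_optimizer("sgd", loss_op, 0.1)
>>> with tf.Session() as sess:
...     sess.run(tf.global_variables_initializer())
...     for step in range(num_steps):
...         # one training step: compute gradients and update parameters
...         sess.run(train_op, feed_dict=feed_dict)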
Optimizers¶
braid.berry.optimizers.sgd(learning_rate)¶
Stochastic Gradient Descent.
Parameters:
- learning_rate : float
  Rate of update of parameter values based on their gradients.
Returns:
- Derived class of tf.train.Optimizer
  Class which performs the gradient computation and the backward pass.
The parameter update step for SGD is
\[\theta_i = \theta_i - \alpha \nabla \mathcal{J}_{\theta_i} (x^{(i)}, y^{(i)})\]
where \(\nabla \mathcal{J}_{\theta_i} (x^{(i)}, y^{(i)})\) is the gradient of the loss for the \(i\)-th mini-batch w.r.t. parameter \(\theta_i\).
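As a toy numeric illustration of this update rule (the values are made up for the example and unrelated to berry):
>>> theta, alpha, grad = 1.0, 0.5, 1.0   # toy parameter, learning rate, gradient
>>> theta - alpha * grad                 # the parameter moves against the gradient
0.5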
Gradient Monitoring¶
class braid.berry.optimizers.GradientSanity(session, param_ops, grad_ops)¶
Monitor the values of parameters and their corresponding gradients.
Prints the mean and standard deviation of the parameter values and their gradients in a nicely formatted table.
Parameters:
- session : tf.Session
  The TensorFlow session in which the operations are defined.
- param_ops : list of tf.Variable
  List of trainable parameters.
- grad_ops : list of tf.Variable
  List of gradients of the loss function w.r.t. the trainable parameters.
Attributes:
- ops : list of tf.Variable
  List of parameters and gradients.
>>> sess = tf.Session()
>>> _, grads = get_optimizer("sgd", loss_op, 0.1)
>>> param_ops = tf.trainable_variables()
>>> sanity = GradientSanity(sess, param_ops, grads)
>>> # assume: feed_dict = {'x:0': ..., 'y:0': ...}
>>> sanity.run(feed_dict=feed_dict)
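Building on the example above, monitoring the gradients every few iterations (as suggested in the module introduction) could be sketched as follows; train_op, num_steps, feed_dict and the 100-step interval are illustrative assumptions, not part of berry's API.
>>> # assume: train_op from get_optimizer(), num_steps and feed_dict as above;
>>> # the 100-step interval is an arbitrary illustrative choice
>>> for step in range(num_steps):
...     sess.run(train_op, feed_dict=feed_dict)   # regular training step
...     if step % 100 == 0:
...         sanity.run(feed_dict=feed_dict)       # print parameter/gradient stats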
print_msg(str)¶
Format for printing.
print_summary(vals, grads, names)¶
Print a formatted table summary of parameter and gradient values; a toy sketch of such a summary follows the parameter list below.
Parameters:
- vals : list of np.ndarray
  List of parameter values.
- grads : list of np.ndarray
  List of gradient values.
- names : list of string
  List of names of the parameters.
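For intuition, the toy NumPy lines below approximate the kind of per-parameter statistics such a summary reports; this is an illustration only, not GradientSanity's actual output format.
>>> import numpy as np
>>> vals = [np.array([0.2, -0.1]), np.array([1.5])]     # toy parameter values
>>> grads = [np.array([0.01, 0.03]), np.array([0.5])]   # toy gradient values
>>> names = ["w", "b"]
>>> for name, v, g in zip(names, vals, grads):
...     print(name, v.mean(), v.std(), g.mean(), g.std())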
run(feed_dict={})¶
Perform the forward pass and print a summary of the parameter and gradient values.
Parameters:
- feed_dict : dict
  Dict containing the input and target data for the forward pass.