berry.initializations

Methods for weight initialization.

To train a deep network, we initialize the weights and biases in all the layers with small random values. It is very important for these values to lie in a suitable range, since this largely determines whether the model will train effectively. Two main problems arise from poor parameter initialization:

  • Values too large: this leads to exploding gradients and causes a NaN loss, after which the gradients drop to 0, rendering the rest of training useless.
  • Values too small: this leads to vanishing gradients - the gradient signals are too small to make any significant change to the parameter values, so the model does not learn anything. Both failure modes are illustrated in the sketch below.

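A quick way to see both failure modes is to push an input through a deep stack of random linear layers and watch the activation magnitudes. The following is a minimal numpy sketch (illustrative only, not part of berry) using deliberately bad scales:

>>> import numpy as np
>>> h = np.random.normal(size=(1, 500))
>>> for _ in range(20):  # a 20-layer stack of random linear maps
...     W = np.random.normal(scale=1.0, size=(500, 500))  # scale too large
...     h = h @ W
>>> big = np.abs(h).mean()  # explodes to astronomically large values
>>> h = np.random.normal(size=(1, 500))
>>> for _ in range(20):
...     W = np.random.normal(scale=0.001, size=(500, 500))  # scale too small
...     h = h @ W
>>> small = np.abs(h).mean()  # collapses towards 0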
Below is a list of the initializations defined in berry, to help choose a proper initialization based on the model architecture.

xavier   Xavier weight initialization.
deepnet  This initialization gives good performance with deep nets, e.g. VGG-16.

Helper function

braid.berry.initializations.get_initialzation(key)

Helper function to retrieve the appropriate initialization function.

Parameters

key : string
    Name of the type of initialization - “xavier”, “deepnet”, etc.

Returns

function
    The appropriate function given the key.
>>> from berry.initializations import get_initialzation
>>> func = get_initialzation("deepnet")
>>> params = {'shape': [1500, 500], 'fan_out': 500}
>>> stddev = func(**params)
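The returned standard deviation can then be used to draw the actual weight values from a zero-mean normal distribution (a minimal sketch using numpy; berry's layers may perform this step internally):

>>> import numpy as np
>>> W = np.random.normal(loc=0.0, scale=stddev, size=params['shape'])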

Examples

You can either use the get_initialzation() helper or call the initialization function directly:

>>> from berry.initializations import xavier
>>> params = {'shape': [1500, 500], 'fan_out': 500, 'fan_in': 1500}
>>> stddev = xavier(**params)
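Both routes should yield the same value for the same parameters, assuming get_initialzation() simply dispatches to the corresponding module-level function:

>>> from berry.initializations import get_initialzation
>>> func = get_initialzation("xavier")
>>> func(**params) == stddev  # same function, same result (assumed dispatch)
True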

Initializations

braid.berry.initializations.xavier(shape=None, fan_in=1, fan_out=1, **kwargs)

Xavier weight initialization.

This is also known as Glorot initialization [1]. It is known to give good performance with sigmoid units.

Parameters

shape : tuple or list
    Shape of the weight tensor to sample.
fan_in : int
    The number of units connected at the input of the current layer.
fan_out : int
    The number of units connected at the output of the current layer.

Returns

float
    Standard deviation of the normal distribution for weight initialization.

References

[1] Xavier Glorot and Yoshua Bengio (2010): Understanding the difficulty of training deep feedforward neural networks. International Conference on Artificial Intelligence and Statistics.

The weights are initialized as

\[\begin{split}\sigma &= \sqrt{\frac{2}{fan_{in}+fan_{out}}}\\ W &\sim N(0, \sigma)\end{split}\]
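For concreteness, the standard deviation above can be computed directly; this is a one-line re-implementation for illustration, not berry's internal code:

>>> import math
>>> fan_in, fan_out = 1500, 500
>>> sigma = math.sqrt(2.0 / (fan_in + fan_out))  # ~0.0316 for these fans
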
braid.berry.initializations.deepnet(shape=None, fan_out=1, **kwargs)

This initialization gives good performance with deep nets, e.g. VGG-16.

This method [1] was developed mainly with ReLU/PReLU activations in mind and is known to give good performance with very deep networks that use these activations.

Parameters

shape : tuple or list
    Shape of the weight tensor to sample.
fan_out : int
    The number of units connected at the output of the current layer.

Returns

float
    Standard deviation of the normal distribution for weight initialization.

References

[1] He, K., Zhang, X., Ren, S., and Sun, J. (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv e-prints, February 2015.

The weights are initialized as

\[\sigma = \sqrt{\frac{2}{fan_{out}}}\]
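Again for concreteness, this standard deviation can be computed directly (an illustrative re-implementation, not berry's internal code):

>>> import math
>>> fan_out = 500
>>> sigma = math.sqrt(2.0 / fan_out)  # ~0.0632 for fan_out = 500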