berry.initializations

Methods for weight initialization.

To train a deep network, we initialize the weights and biases in all the layers with small random values. It is very important for these values to lie in a suitable range, since this largely determines whether the model will train effectively. Two main problems arise from poor parameter initialization:

  • Values too large: this leads to exploding gradients and causes a NaN loss, after which the gradients drop to 0, rendering the rest of training useless.
  • Values too small: this leads to vanishing gradients - the gradient signals are too small to make any significant change to the parameter values, so the model does not learn anything. Both failure modes are illustrated in the sketch below.

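A quick way to see both failure modes is to push an input through a deep stack of random linear layers and watch the activation magnitudes. The following is a minimal numpy sketch (illustrative only, not part of berry) using deliberately bad scales:

>>> import numpy as np
>>> h = np.random.normal(size=(1, 500))
>>> for _ in range(20):  # a 20-layer stack of random linear maps
...     W = np.random.normal(scale=1.0, size=(500, 500))  # scale too large
...     h = h @ W
>>> big = np.abs(h).mean()  # explodes to astronomically large values
>>> h = np.random.normal(size=(1, 500))
>>> for _ in range(20):
...     W = np.random.normal(scale=0.001, size=(500, 500))  # scale too small
...     h = h @ W
>>> small = np.abs(h).mean()  # collapses towards 0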
Below is a list of the initializations defined in berry, to help choose a proper initialization based on the model architecture.

xavier   Xavier weight initialization.
deepnet  This initialization gives good performance with deep nets, e.g. VGG-16.

Helper function

braid.berry.initializations.get_initialzation(key)

Helper function to retrieve the appropriate initialization function.

Parameters

key : string
    Name of the type of initialization - “xavier”, “deepnet”, etc.

Returns

function
    The appropriate function given the key.
>>> from berry.initializations import get_initialzation
>>> func = get_initialzation("deepnet")
>>> params = {'shape': [1500, 500], 'fan_out': 500}
>>> stddev = func(**params)
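The returned standard deviation can then be used to draw the actual weight values from a zero-mean normal distribution (a minimal sketch using numpy; berry's layers may perform this step internally):

>>> import numpy as np
>>> W = np.random.normal(loc=0.0, scale=stddev, size=params['shape'])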

Examples

You can either use the get_initialzation() helper or call the initialization function directly:

>>> from berry.initializations import xavier
>>> params = {'shape': [1500, 500], 'fan_out': 500, 'fan_in': 1500}
>>> stddev = xavier(**params)
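Both routes should yield the same value for the same parameters, assuming get_initialzation() simply dispatches to the corresponding module-level function:

>>> from berry.initializations import get_initialzation
>>> func = get_initialzation("xavier")
>>> func(**params) == stddev  # same function, same result (assumed dispatch)
True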

Initializations

braid.berry.initializations.xavier(shape=None, fan_in=1, fan_out=1, **kwargs)

Xavier weight initialization.

This is also known as Glorot initialization [1]. It is known to give good performance with sigmoid units.

Parameters

shape : tuple or list
    Shape of the weight tensor to sample.
fan_in : int
    The number of units connected at the input of the current layer.
fan_out : int
    The number of units connected at the output of the current layer.

Returns

float
    Standard deviation of the normal distribution for weight initialization.

References

[1] Xavier Glorot and Yoshua Bengio (2010): Understanding the difficulty of training deep feedforward neural networks. International Conference on Artificial Intelligence and Statistics.

The weights are initialized as

\[\begin{split}\sigma &= \sqrt{\frac{2}{fan_{in}+fan_{out}}}\\ W &\sim N(0, \sigma)\end{split}\]
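For concreteness, the standard deviation above can be computed directly; this is a one-line re-implementation for illustration, not berry's internal code:

>>> import math
>>> fan_in, fan_out = 1500, 500
>>> sigma = math.sqrt(2.0 / (fan_in + fan_out))  # ~0.0316 for these fans
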
braid.berry.initializations.deepnet(shape=None, fan_out=1, **kwargs)

This initialization gives good performance with deep nets, e.g. VGG-16.

This method [1] was developed mainly with ReLU/PReLU activations in mind and is known to give good performance with very deep networks that use these activations.

Parameters

shape : tuple or list
    Shape of the weight tensor to sample.
fan_out : int
    The number of units connected at the output of the current layer.

Returns

float
    Standard deviation of the normal distribution for weight initialization.

References

[1] He, K., Zhang, X., Ren, S., and Sun, J. (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv e-prints, February 2015.

The weights are initialized as

\[\sigma = \sqrt{\frac{2}{fan_{out}}}\]
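Again for concreteness, this standard deviation can be computed directly (an illustrative re-implementation, not berry's internal code):

>>> import math
>>> fan_out = 500
>>> sigma = math.sqrt(2.0 / fan_out)  # ~0.0632 for fan_out = 500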