berry.initializations
Methods for weight initialization.
To train a deep network, we initialize the weights and biases in all layers with small random values. It is important for these values to lie in an appropriate range, since this largely determines whether the model trains effectively. Two main problems arise from poor parameter initialization:

- Too large values: these lead to exploding gradients and a NaN loss. After this the gradients drop to 0, rendering the training process useless.
- Too small values: these lead to vanishing gradients, where the gradient signals generated are too small to make any significant change to the parameter values, so the model does not learn anything.
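A rough back-of-the-envelope sketch of why the initialization scale matters: in a fully-connected layer with fan_in inputs, the activation magnitude is scaled by roughly stddev * sqrt(fan_in) per layer, so across many layers it compounds and either explodes or vanishes. The layer width and depth below are hypothetical, not values from this library:

```python
import math

fan_in = 500   # hypothetical layer width
layers = 10    # hypothetical network depth

for stddev in (0.5, 1e-4):
    # each layer scales the activation magnitude by roughly stddev * sqrt(fan_in)
    scale = (stddev * math.sqrt(fan_in)) ** layers
    print(f"stddev={stddev}: activation scale after {layers} layers ~ {scale:.2g}")
```

With stddev=0.5 the scale blows up (exploding gradients); with stddev=1e-4 it collapses toward zero (vanishing gradients). The initializations below choose the stddev so that this per-layer scale factor stays close to 1.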
Below is a list of the initializations defined in berry to help with proper initialization based on the model architecture:

- xavier: Xavier weight initialization.
- deepnet: This initialization gives good performance with deep nets, e.g. VGG16.
Helper function

braid.berry.initializations.get_initialzation(key)

Helper function to retrieve the appropriate initialization function.

Parameters:
    key : string
        Name of the type of initialization: "xavier", "deepnet", etc.

Returns:
    function
        The appropriate initialization function for the given key.

>>> from berry.initializations import get_initialzation
>>> func = get_initialzation("deepnet")
>>> params = {'shape': [1500, 500], 'fan_out': 500}
>>> stddev = func(**params)
Examples

You can either use the get_initialzation() function or use the initialization function directly:
>>> from berry.initializations import xavier
>>> params = {'shape': [1500, 500], 'fan_out': 500, 'fan_in': 1500}
>>> stddev = xavier(**params)
Initializations

braid.berry.initializations.xavier(shape=None, fan_in=1, fan_out=1, **kwargs)

Xavier weight initialization. This is also known as Glorot initialization [1]. Known to give good performance with sigmoid units.

Parameters:
    shape : tuple or list
        Shape of the weight tensor to sample.
    fan_in : int
        The number of units connected at the input of the current layer.
    fan_out : int
        The number of units connected at the output of the current layer.

Returns:
    float
        Standard deviation of the normal distribution for weight initialization.
[1] Xavier Glorot and Yoshua Bengio (2010): Understanding the difficulty of training deep feedforward neural networks. International Conference on Artificial Intelligence and Statistics.

The weights are initialized as
\[\begin{split}\sigma &= \sqrt{\frac{2}{fan_{in}+fan_{out}}}\\ W &\sim N(0, \sigma)\end{split}\]
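The standard deviation above is straightforward to compute by hand. This is a minimal sketch of the Glorot formula (a re-implementation for illustration, not the library's own code), using the fan values from the xavier() example earlier:

```python
import math

def xavier_stddev(fan_in, fan_out):
    # sigma = sqrt(2 / (fan_in + fan_out)), per the formula above
    return math.sqrt(2.0 / (fan_in + fan_out))

# fan values from the xavier() example: a 1500 -> 500 layer
print(xavier_stddev(fan_in=1500, fan_out=500))  # ~0.0316
```

Weights would then be drawn from a normal distribution with mean 0 and this standard deviation.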

braid.berry.initializations.deepnet(shape=None, fan_out=1, **kwargs)

This initialization gives good performance with deep nets, e.g. VGG16. This method [1] was mainly developed with ReLU/PReLU activations in mind and is known to give good performance with very deep networks that use these activations.

Parameters:
    shape : tuple or list
        Shape of the weight tensor to sample.
    fan_out : int
        The number of units connected at the output of the current layer.

Returns:
    float
        Standard deviation of the normal distribution for weight initialization.
[1] He, K., Zhang, X., Ren, S., and Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. ArXiv e-prints, February 2015.

The weights are initialized as
\[\sigma = \sqrt{\frac{2}{fan_{out}}}\]
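As a sanity check, the standard deviation from the formula above can be sketched directly (a re-implementation for illustration, not the library's own code), using the fan_out value from the get_initialzation() example:

```python
import math

def deepnet_stddev(fan_out):
    # sigma = sqrt(2 / fan_out), per the formula above
    return math.sqrt(2.0 / fan_out)

# fan_out from the get_initialzation() example: 500 output units
print(deepnet_stddev(fan_out=500))  # ~0.0632
```

Note that, unlike xavier, this formula depends only on fan_out, which is why the example dictionary for deepnet omits fan_in.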