berry.initializations
Methods for weight initialization.
In order to train a deep network, we initialize the weights and biases in all the layers with small random values. It is very important for these values to lie in a suitable range, since this largely determines whether the model will train effectively. Two main problems arise with poor parameter initialization:
- Too large values: This leads to exploding gradients and causes a NaN loss. After this the gradients drop to 0, rendering the training process useless.
- Too small values: This leads to vanishing gradients - the gradient signals generated are too small to make any significant change to the parameter values, and hence the model does not learn anything (see the sketch below).
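A minimal numpy sketch of both failure modes (illustrative only, not berry code; the depth, width, and stddev values are arbitrary):

>>> import numpy as np
>>> def final_std(stddev, n_layers=10, width=500):
...     # Push a random input through n_layers linear layers whose
...     # weights are drawn from N(0, stddev); return the activation std.
...     rng = np.random.default_rng(0)
...     x = rng.standard_normal((1, width))
...     for _ in range(n_layers):
...         x = x @ rng.normal(0.0, stddev, size=(width, width))
...     return float(x.std())
>>> final_std(0.1) > 1e3      # too large: activations explode layer by layer
True
>>> final_std(0.001) < 1e-10  # too small: activations vanish layer by layer
True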
Below is a list of initializations defined in berry to help with proper initialization based on the model architecture.
- xavier: Xavier weight initialization.
- deepnet: This initialization gives good performance with deep nets, e.g. VGG-16.
Helper function
braid.berry.initializations.get_initialzation(key)

Helper function to retrieve the appropriate initialization function.
Parameters
- key : string
  Name of the type of initialization: "xavier", "deepnet", etc.

Returns
- function
  The appropriate function given the key.
>>> from berry.initializations import get_initialzation
>>> func = get_initialzation("deepnet")
>>> params = {'shape': [1500, 500], 'fan_out': 500}
>>> stddev = func(**params)
Examples
You can either use the get_initialzation() function or use the initialization function directly:
>>> from berry.initializations import xavier
>>> params = {'shape': [1500, 500], 'fan_out': 500, 'fan_in': 1500}
>>> stddev = xavier(**params)
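The returned value is only a standard deviation; the weight tensor itself still has to be sampled with it. A minimal numpy sketch of that step (an illustration of intended usage, not berry's internal code):

>>> import numpy as np
>>> W = np.random.normal(0.0, stddev, size=params['shape'])  # W ~ N(0, stddev)
>>> W.shape
(1500, 500)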
Initializations
braid.berry.initializations.xavier(shape=None, fan_in=1, fan_out=1, **kwargs)

Xavier weight initialization.
This is also known as Glorot initialization [1]. It is known to give good performance with sigmoid units.
Parameters
- shape : tuple or list
  Shape of the weight tensor to sample.
- fan_in : int
  The number of units connected at the input of the current layer.
- fan_out : int
  The number of units connected at the output of the current layer.

Returns
- float
  Standard deviation of the normal distribution for weight initialization.
References

[1] Xavier Glorot and Yoshua Bengio (2010): Understanding the difficulty of training deep feedforward neural networks. International Conference on Artificial Intelligence and Statistics.

The weights are initialized as
\[\begin{split}\sigma &= \sqrt{\frac{2}{fan_{in}+fan_{out}}}\\ W &\sim N(0, \sigma)\end{split}\]
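As a quick check of the formula, a sketch in plain Python using the fan_in and fan_out values from the example above:

>>> fan_in, fan_out = 1500, 500
>>> sigma = (2.0 / (fan_in + fan_out)) ** 0.5  # sqrt(2 / (fan_in + fan_out))
>>> round(sigma, 4)
0.0316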
braid.berry.initializations.deepnet(shape=None, fan_out=1, **kwargs)

This initialization gives good performance with deep nets, e.g. VGG-16.
This method [1] was mainly developed with ReLU/PReLU activations in mind and is known to give good performance with very deep networks which use these activations.
Parameters
- shape : tuple or list
  Shape of the weight tensor to sample.
- fan_out : int
  The number of units connected at the output of the current layer.

Returns
- float
  Standard deviation of the normal distribution for weight initialization.
References

[1] He, K., Zhang, X., Ren, S., and Sun, J. (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv e-prints.

The weights are initialized as
\[\sigma = \sqrt{\frac{2}{fan_{out}}}\]
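Likewise, a sketch of the deepnet stddev for the fan_out value used in the helper-function example:

>>> fan_out = 500
>>> sigma = (2.0 / fan_out) ** 0.5  # sqrt(2 / fan_out)
>>> round(sigma, 4)
0.0632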