Hyperparameters

Here we will tune the hyperparameters, e.g. the number of epochs, batch_size, etc. A hyperparameter is a parameter whose value is set before training and controls the learning process.

Overview:

In the next tab, we will tune the hyperparameters used to train models on multiple CPUs and GPUs in Deep Learning Studio (DLS).

  • Choosing good values for the hyperparameters helps maximize the model's predictive accuracy.

DLS provides the following hyperparameters:

1) Number of epochs:

  • An epoch is one complete pass through the entire training dataset; the number of epochs is how many such passes training makes.

  • The number of epochs varies, often in the hundreds or thousands depending on your data and the goal of the model.

  • Enough epochs allow the learning algorithm to run until the model's error has been sufficiently minimized (see the sketch below).
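As a minimal, purely illustrative sketch (the `dataset` and `train_step` below are hypothetical stand-ins, not DLS code), the number of epochs is simply how many times the training loop passes over the data:

```python
# Toy illustration of what "epochs" means; dataset and train_step
# are hypothetical stand-ins for real data and a real update step.
dataset = [([0.0, 1.0], 1), ([1.0, 0.0], 0)]  # (features, label) pairs

def train_step(example):
    pass  # stand-in for one gradient update on one example

epochs = 100
for epoch in range(epochs):   # number of epochs = number of passes
    for example in dataset:   # one epoch = one full pass over the data
        train_step(example)
```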

2) Batch_size:

  • Datasets are usually grouped into batches (especially when the amount of data is very large).

  • In DLS, you can load the dataset into memory one batch at a time or all at once (as shown below).
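A minimal sketch of what batching means (plain Python, not DLS internals): the data is consumed in chunks of batch_size examples, so only one chunk has to be in memory at a time:

```python
# Split a dataset into consecutive chunks of batch_size examples.
def batches(data, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

data = list(range(10))
for batch in batches(data, batch_size=4):
    print(batch)  # [0, 1, 2, 3] then [4, 5, 6, 7] then [8, 9]
```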

3) Loss_function:

  • The loss (error) function to be minimized during training.

  • Click the loss_function drop-down to see the different loss functions available for training a neural network; each one evaluates how much the model's predictions deviate from the actual results (see the sketch below).
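For instance, mean squared error, one common choice, scores that deviation like this (a self-contained illustration, not DLS's implementation):

```python
# Mean squared error: the average squared gap between predictions
# and actual results; smaller means the model fits better.
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_squared_error([1.0, 0.0, 1.0], [0.9, 0.2, 0.8]))  # ~0.03
```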

4) Optimizer:

  • Optimizers are responsible for reducing the loss and providing the most accurate results possible.

  • DLS provides many optimizer options (the sketch below shows the equivalent choice in code).
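Since the parameter names and defaults on this page mirror Keras conventions (an assumption, not DLS's documented internals), the same choices look like this in Keras code:

```python
from tensorflow import keras

# Illustrative Keras snippet (not DLS internals): the optimizer and
# loss chosen here play the same roles as the DLS drop-downs.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(1),
])
model.compile(
    optimizer=keras.optimizers.Adadelta(),  # or SGD, Adam, RMSprop, ...
    loss="mse",                             # the loss to minimize
)
```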

5) Lr:

  • The LR (learning rate) hyperparameter can significantly affect the time taken to train a model.

  • With cyclical schedules, training is divided into cycles spanning a number of epochs, and the LR is restarted at the beginning of each cycle.

  • Another variation makes each cycle longer than the previous one by some constant factor (see the sketch after this item).

(DLS suggests: Float >= 0. Initial learning rate; defaults to 1. It is recommended to leave it at the default value.)
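That restart-with-growing-cycles idea can be sketched with TensorFlow's cosine decay with warm restarts (an assumed stand-in for whatever schedule DLS uses internally); here `t_mul=2.0` makes each cycle twice as long as the previous one:

```python
import tensorflow as tf

# LR schedule with warm restarts: the LR is reset at the start of
# each cycle, and t_mul=2.0 doubles the length of every new cycle.
schedule = tf.keras.optimizers.schedules.CosineDecayRestarts(
    initial_learning_rate=1.0,  # DLS's suggested default of 1
    first_decay_steps=1000,     # length of the first cycle, in steps
    t_mul=2.0,                  # each new cycle is 2x longer
)
print(float(schedule(0)), float(schedule(999)))  # start vs. end of cycle 1
```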

6) Rho:

  • Rho is a hyperparameter which attenuates the influence of past gradients (sketched below).

  • It is similar to momentum and relates to the memory of prior weight updates. Typical values are between 0.9 and 0.999. This parameter is only active if the adaptive learning rate is enabled.

(DLS suggests: Float >= 0. Adadelta decay factor, corresponding to the fraction of the gradient to keep at each time step.)
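A simplified sketch of rho's role (not DLS's exact code): Adadelta-style optimizers keep an exponential moving average of squared gradients, and rho is the fraction of that history retained at each step:

```python
# Exponential moving average of squared gradients: a larger rho
# keeps more of the past gradient history at every step.
rho = 0.95
acc = 0.0
for grad in [1.0, 0.5, 0.1]:  # toy sequence of gradients
    acc = rho * acc + (1 - rho) * grad ** 2
    print(round(acc, 4))      # 0.05, 0.06, 0.0575
```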

7) Epsilon:

  • It is similar to learning rate annealing during initial training and to momentum at later stages, where it allows forward progress (see the sketch below). This parameter is only active if the adaptive learning rate is enabled.

  • Typical values are between 1e-10 and 1e-4.

(DLS suggests: Float >= 0. Fuzz factor. If `None`, defaults to `K.epsilon()`.)
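Mechanically, epsilon is the small constant added to the denominator of the adaptive update (a simplified RMSprop-style sketch, not DLS's exact code), keeping the step well-defined when the accumulated gradient is near zero:

```python
import math

# The "fuzz factor": without eps in the denominator, an empty
# gradient history (acc == 0) would cause a division by zero.
lr, eps = 0.001, 1e-7
grad, acc = 0.5, 0.0  # no accumulated squared gradient yet
step = lr * grad / (math.sqrt(acc) + eps)
print(step)  # finite, instead of a ZeroDivisionError
```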

8) Decay:

  • The learning rate decay parameter controls how the learning rate decreases as training progresses (see the sketch below).

  • This parameter is only active if the adaptive learning rate is disabled.

(DLS suggests: Float >= 0. Initial learning rate decay.)
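One common form of such decay, used by Keras's legacy optimizers and assumed here purely as an illustration, shrinks the LR a little on every update:

```python
# Time-based decay as in Keras's legacy optimizers:
#   lr_t = initial_lr / (1 + decay * iteration)
initial_lr, decay = 1.0, 0.01
for iteration in range(0, 500, 100):
    lr = initial_lr / (1.0 + decay * iteration)
    print(iteration, round(lr, 4))  # 1.0, 0.5, 0.3333, 0.25, 0.2
```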

Note: While training your model, you can change some parameters as per your requirements and keep the rest at their default values.
