utils.schedulers
Module for custom LRScheduler class
Classes
| Name | Description |
|---|---|
| InterpolatingLogScheduler | A scheduler that interpolates learning rates in a logarithmic fashion. |
| RexLR | Reflected Exponential (REX) learning rate scheduler. |
InterpolatingLogScheduler
utils.schedulers.InterpolatingLogScheduler(
    self,
    optimizer,
    num_steps,
    min_lr,
    max_lr,
    last_epoch=-1,
)
A scheduler that interpolates learning rates in a logarithmic fashion
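A minimal usage sketch (the toy model, step count, and learning-rate bounds are illustrative; the import path follows the module name shown above):

```python
import torch

from utils.schedulers import InterpolatingLogScheduler

# Toy model and optimizer purely for illustration
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Interpolate the learning rate logarithmically between min_lr and max_lr over num_steps
scheduler = InterpolatingLogScheduler(
    optimizer, num_steps=100, min_lr=1e-6, max_lr=1e-4
)

for step in range(100):
    # ... forward/backward pass ...
    optimizer.step()
    scheduler.step()
```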
RexLR
utils.schedulers.RexLR(
    self,
    optimizer,
    max_lr,
    min_lr,
    total_steps=0,
    num_warmup_steps=0,
    last_step=0,
)
Reflected Exponential (REX) learning rate scheduler.
- Original implementation: https://github.com/IvanVassi/REX_LR
- Original license: Apache 2.0
- Based on: https://arxiv.org/abs/2107.04197
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | torch.optim.Optimizer | The optimizer to schedule the learning rate for. | required |
| max_lr | float | The maximum learning rate. | required |
| min_lr | float | The minimum learning rate. | required |
| total_steps | int | The total number of training steps. | 0 |
| num_warmup_steps | int | The number of warmup steps. | 0 |
| last_step | int | The index of the last step. | 0 |
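A minimal usage sketch (the toy model, step counts, and learning-rate bounds are illustrative; the import path follows the module name shown above):

```python
import torch

from utils.schedulers import RexLR

# Toy model and optimizer purely for illustration
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Decay from max_lr down to min_lr over total_steps, with an initial warmup
scheduler = RexLR(
    optimizer,
    max_lr=1e-3,
    min_lr=1e-5,
    total_steps=1_000,
    num_warmup_steps=100,
)

for step in range(1_000):
    # ... forward/backward pass ...
    optimizer.step()
    scheduler.step()
```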
Functions
| Name | Description |
|---|---|
| get_cosine_schedule_with_min_lr | Create a learning rate schedule with linear warmup followed by cosine annealing down to a minimum learning rate. |
| get_cosine_schedule_with_quadratic_warmup | Create a schedule with a learning rate that decreases following a cosine curve, after a quadratic warmup period. |
| get_cosine_schedule_with_warmup_decay_constant | Implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? (https://arxiv.org/pdf/2308.04014.pdf) |
get_cosine_schedule_with_min_lr
utils.schedulers.get_cosine_schedule_with_min_lr(
    optimizer,
    num_warmup_steps,
    num_training_steps,
    min_lr_ratio=0.0,
)
Create a learning rate schedule which has:
- linear warmup from 0 -> `max_lr` over `num_warmup_steps`
- cosine learning rate annealing from `max_lr` -> `min_lr` over `num_training_steps`
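A minimal usage sketch (the toy model and step counts are illustrative; the relationship `min_lr = min_lr_ratio * max_lr` is an assumption based on the parameter name, and the optimizer's `lr` is assumed to act as `max_lr`):

```python
import torch

from utils.schedulers import get_cosine_schedule_with_min_lr

model = torch.nn.Linear(16, 2)                               # toy model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)   # optimizer lr assumed to act as max_lr

scheduler = get_cosine_schedule_with_min_lr(
    optimizer,
    num_warmup_steps=100,
    num_training_steps=1_000,
    min_lr_ratio=0.1,  # assumption: min_lr is min_lr_ratio * max_lr
)

for step in range(1_000):
    # ... forward/backward pass ...
    optimizer.step()
    scheduler.step()
```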
get_cosine_schedule_with_quadratic_warmup
utils.schedulers.get_cosine_schedule_with_quadratic_warmup(
    optimizer,
    num_warmup_steps,
    num_training_steps,
    num_cycles=0.5,
    last_epoch=-1,
)
Create a schedule with a learning rate that decreases following the values of the cosine function from the initial lr set in the optimizer to 0, after a warmup period during which it increases quadratically from 0 to the initial lr set in the optimizer.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | `torch.optim.Optimizer` | The optimizer for which to schedule the learning rate. | required |
| num_warmup_steps | `int` | The number of steps for the warmup phase. | required |
| num_training_steps | `int` | The total number of training steps. | required |
| num_cycles | `float`, optional | The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine). | 0.5 |
| last_epoch | `int`, optional | The index of the last epoch when resuming training. | -1 |
Return
`torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
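For intuition, a sketch of the per-step multiplier such a schedule applies to the initial learning rate might look like the following. This is an illustration of the behaviour described above, not the library's exact code; the quadratic ramp during warmup is assumed from the function name.

```python
import math


def example_quadratic_warmup_cosine_lambda(
    current_step: int,
    num_warmup_steps: int,
    num_training_steps: int,
    num_cycles: float = 0.5,
) -> float:
    """Illustrative multiplier applied to the initial lr at a given step."""
    if current_step < num_warmup_steps:
        # Quadratic warmup: ramps from 0 to 1 over the warmup steps
        return float(current_step**2) / float(max(1, num_warmup_steps**2))
    # Cosine decay from 1 down to 0 over the remaining steps
    progress = float(current_step - num_warmup_steps) / float(
        max(1, num_training_steps - num_warmup_steps)
    )
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * float(num_cycles) * 2.0 * progress)))
```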
get_cosine_schedule_with_warmup_decay_constant
utils.schedulers.get_cosine_schedule_with_warmup_decay_constant(
    optimizer,
    num_warmup_steps,
    num_training_steps,
    constant_lr_ratio,
    min_lr_ratio,
    num_cycles=0.5,
    last_epoch=-1,
)
Implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? (https://arxiv.org/pdf/2308.04014.pdf). Creates a schedule in which the learning rate first increases linearly from 0 to the initial lr set in the optimizer during the warmup period, then decreases following the values of the cosine function from the initial lr down to `min_lr_ratio` times the initial lr until step `num_training_steps * constant_lr_ratio`, and finally stays constant at that minimum value for the remaining steps.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | `torch.optim.Optimizer` | The optimizer for which to schedule the learning rate. | required |
| num_warmup_steps | `int` | The number of steps for the warmup phase. | required |
| num_training_steps | `int` | The total number of training steps. | required |
| constant_lr_ratio | `float` | The fraction of `num_training_steps` over which the learning rate decreases by the cosine function; after this point it stays constant. | required |
| min_lr_ratio | `float` | The ratio of the maximum learning rate that the cosine function decays to as the minimum learning rate. | required |
| num_cycles | `float`, optional | The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine). | 0.5 |
| last_epoch | `int`, optional | The index of the last epoch when resuming training. | -1 |
Return
`torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
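For intuition, a sketch of the three-phase multiplier described above (linear warmup, cosine decay, then a constant floor) might look like the following. This is an illustration of the described behaviour, not the library's exact code; the `constant_lr_ratio` and `min_lr_ratio` values are hypothetical.

```python
import math


def example_warmup_cosine_constant_lambda(
    current_step: int,
    num_warmup_steps: int,
    num_training_steps: int,
    constant_lr_ratio: float = 0.8,  # hypothetical value for illustration
    min_lr_ratio: float = 0.1,       # hypothetical value for illustration
    num_cycles: float = 0.5,
) -> float:
    """Illustrative multiplier applied to the initial lr at a given step."""
    # Phase 1: linear warmup from 0 up to the initial lr
    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))

    # Step at which the cosine decay ends and the constant phase begins
    num_constant_steps = int(num_training_steps * constant_lr_ratio)

    # Phase 3: hold constant at min_lr_ratio of the initial lr
    if current_step >= num_constant_steps:
        return min_lr_ratio

    # Phase 2: cosine decay from 1 down to min_lr_ratio
    progress = float(current_step - num_warmup_steps) / float(
        max(1, num_constant_steps - num_warmup_steps)
    )
    cosine = 0.5 * (1.0 + math.cos(math.pi * float(num_cycles) * 2.0 * progress))
    return cosine * (1.0 - min_lr_ratio) + min_lr_ratio
```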