utils.schedulers

Module for custom LRScheduler classes and learning rate schedule functions

Classes

Name Description
InterpolatingLogScheduler A scheduler that interpolates learning rates in a logarithmic fashion
RexLR Reflected Exponential (REX) learning rate scheduler.

InterpolatingLogScheduler

utils.schedulers.InterpolatingLogScheduler(
    self,
    optimizer,
    num_steps,
    min_lr,
    max_lr,
    last_epoch=-1,
)

A scheduler that interpolates learning rates in a logarithmic fashion.
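
A minimal usage sketch, assuming the class is importable from utils.schedulers as shown above and that the scheduler is stepped once per optimizer step:

import torch
from utils.schedulers import InterpolatingLogScheduler

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

# Interpolate the learning rate logarithmically from min_lr to max_lr over num_steps.
scheduler = InterpolatingLogScheduler(
    optimizer, num_steps=100, min_lr=1e-6, max_lr=1e-4
)

for _ in range(100):
    optimizer.step()
    scheduler.step()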

RexLR

utils.schedulers.RexLR(
    self,
    optimizer,
    max_lr,
    min_lr,
    total_steps=0,
    num_warmup_steps=0,
    last_step=0,
)

Reflected Exponential (REX) learning rate scheduler.

  • Original implementation: https://github.com/IvanVassi/REX_LR
  • Original license: Apache 2.0
  • Based on: https://arxiv.org/abs/2107.04197

Parameters

Name Type Description Default
optimizer torch.optim.Optimizer The optimizer to schedule the learning rate for. required
max_lr float The maximum learning rate. required
min_lr float The minimum learning rate. required
total_steps int The total number of training steps. 0
num_warmup_steps int The number of warmup steps. 0
last_step int The index of the last step. 0
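
A minimal usage sketch, assuming the import path from the module name above and that the scheduler is stepped once per optimizer step:

import torch
from utils.schedulers import RexLR

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

total_steps = 1_000
# Warm up for the first 100 steps, then follow the REX curve from max_lr down to min_lr.
scheduler = RexLR(
    optimizer,
    max_lr=1e-4,
    min_lr=1e-6,
    total_steps=total_steps,
    num_warmup_steps=100,
)

for _ in range(total_steps):
    optimizer.step()
    scheduler.step()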

Functions

Name Description
get_cosine_schedule_with_min_lr Create a schedule with linear warmup followed by cosine annealing from the maximum to the minimum learning rate.
get_cosine_schedule_with_quadratic_warmup Create a schedule with a learning rate that decreases following the values of the cosine function after a quadratic warmup period.
get_cosine_schedule_with_warmup_decay_constant Implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? (https://arxiv.org/pdf/2308.04014.pdf)

get_cosine_schedule_with_min_lr

utils.schedulers.get_cosine_schedule_with_min_lr(
    optimizer,
    num_warmup_steps,
    num_training_steps,
    min_lr_ratio=0.0,
)

Create a learning rate schedule which has

  • linear warmup from 0 -> max_lr over num_warmup_steps
  • cosine learning rate annealing from max_lr -> min_lr over num_training_steps
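
A minimal usage sketch, assuming the import path from the module name above and that the minimum learning rate is derived as min_lr_ratio times the optimizer's initial lr:

import torch
from utils.schedulers import get_cosine_schedule_with_min_lr

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # initial lr acts as max_lr

# Linear warmup for 100 steps, then cosine annealing down to 10% of max_lr.
scheduler = get_cosine_schedule_with_min_lr(
    optimizer,
    num_warmup_steps=100,
    num_training_steps=1_000,
    min_lr_ratio=0.1,
)

for _ in range(1_000):
    optimizer.step()
    scheduler.step()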

get_cosine_schedule_with_quadratic_warmup

utils.schedulers.get_cosine_schedule_with_quadratic_warmup(
    optimizer,
    num_warmup_steps,
    num_training_steps,
    num_cycles=0.5,
    last_epoch=-1,
)

Create a schedule with a learning rate that decreases following the values of the cosine function from the initial lr set in the optimizer to 0, after a warmup period during which it increases quadratically from 0 to the initial lr set in the optimizer.

Parameters

Name Type Description Default
optimizer torch.optim.Optimizer The optimizer for which to schedule the learning rate. required
num_warmup_steps int The number of steps for the warmup phase. required
num_training_steps int The total number of training steps. required
num_cycles float, optional, defaults to 0.5 The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine). 0.5
last_epoch int, optional, defaults to -1 The index of the last epoch when resuming training. -1

Return

torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
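
The returned LambdaLR applies a per-step multiplier to the optimizer's initial lr. The sketch below illustrates the shape implied by the name and description (quadratic warmup, then half-cosine decay); it is an illustration under those assumptions, not necessarily the library's exact code:

import math

def quadratic_warmup_cosine_multiplier(
    current_step, num_warmup_steps, num_training_steps, num_cycles=0.5
):
    # Warmup: rise quadratically from 0.0 to 1.0 over num_warmup_steps.
    if current_step < num_warmup_steps:
        return (float(current_step) / float(max(1, num_warmup_steps))) ** 2
    # Decay: follow a cosine from 1.0 down to 0.0 over the remaining steps.
    progress = float(current_step - num_warmup_steps) / float(
        max(1, num_training_steps - num_warmup_steps)
    )
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * float(num_cycles) * 2.0 * progress)))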

get_cosine_schedule_with_warmup_decay_constant

utils.schedulers.get_cosine_schedule_with_warmup_decay_constant(
    optimizer,
    num_warmup_steps,
    num_training_steps,
    constant_lr_ratio,
    min_lr_ratio,
    num_cycles=0.5,
    last_epoch=-1,
)

Implementation of "Continual Pre-Training of Large Language Models: How to (re)warm your model?" (https://arxiv.org/pdf/2308.04014.pdf). Creates a schedule with a warmup period during which the learning rate increases linearly from 0 to the initial lr set in the optimizer, then decreases following the values of the cosine function from the initial lr down to min_lr_ratio times the initial lr until step num_training_steps * constant_lr_ratio, after which it stays constant at that minimum value.
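
A sketch of the per-step multiplier this describes (the factor a LambdaLR applies to the optimizer's initial lr); an illustration of the schedule shape under the description above, not necessarily the library's exact code:

import math

def warmup_cosine_constant_multiplier(
    current_step,
    num_warmup_steps,
    num_training_steps,
    constant_lr_ratio,
    min_lr_ratio,
    num_cycles=0.5,
):
    # Linear warmup from 0.0 up to 1.0 (the optimizer's initial lr).
    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))
    # After num_training_steps * constant_lr_ratio the multiplier is held at min_lr_ratio.
    num_decay_steps = int(num_training_steps * constant_lr_ratio)
    if current_step >= num_decay_steps:
        return min_lr_ratio
    # Cosine decay from 1.0 down to min_lr_ratio between warmup and the constant phase.
    progress = float(current_step - num_warmup_steps) / float(
        max(1, num_decay_steps - num_warmup_steps)
    )
    cosine = 0.5 * (1.0 + math.cos(math.pi * float(num_cycles) * 2.0 * progress))
    return min_lr_ratio + (1.0 - min_lr_ratio) * cosine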

Parameters

Name Type Description Default
optimizer torch.optim.Optimizer The optimizer for which to schedule the learning rate. required
num_warmup_steps int The number of steps for the warmup phase. required
num_training_steps int The total number of training steps. required
constant_lr_ratio float The fraction of num_training_steps during which the learning rate decreases following the cosine function; after that point it is held constant. required
min_lr_ratio float The ratio of the maximum learning rate that the cosine function decays to, used as the constant minimum learning rate. required
num_cycles float, optional, defaults to 0.5 The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine). 0.5
last_epoch int, optional, defaults to -1 The index of the last epoch when resuming training. -1

Return

torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.