Learning Rate Groups

Setting different learning rates by module name

Background

Inspired by LoRA+, Axolotl allows practitioners to specify separate learning rates for each module or groups of modules in a model.

Example

lr_groups:
  - name: o_proj
    modules:
      - self_attn.o_proj.weight
    lr: 1e-6
  - name: q_proj
    modules:
      - model.layers.2.self_attn.q_proj.weight
    lr: 1e-5

learning_rate: 2e-5

In this example, we have a default learning rate of 2e-5 across the entire model, but we have a separate learning rate of 1e-6 for all the self attention o_proj modules across all layers, and a learning are of 1e-5 to the 3rd layer’s self attention q_proj module.