Applies optimized Triton kernel patches to a PEFT model.
Patches a PEFT model with optimized implementations for MLP and attention
computations. The optimizations include custom Triton kernels for activation
functions and specialized autograd functions for LoRA computations.
Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `model` | `PeftModelForCausalLM` | A PEFT model to be patched with optimized kernels. | required |
| `cfg` | `DictDefault` | Dictionary mapping axolotl config keys to values. | required |
Returns

| Type | Description |
|------|-------------|
| `PeftModelForCausalLM` | The patched model with optimized kernels. |
Raises

| Type | Description |
|------|-------------|
| `TypeError` | If the provided model is not a `PeftModelForCausalLM`. |
| `NotImplementedError` | If the model type is not supported. |
| `AssertionError` | If multiple adapters are active (currently unsupported). |
Note
The optimizations require LoRA adapters with no dropout and no bias terms. The
function will skip patching if these conditions aren’t met.
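A minimal usage sketch is shown below. The import paths, the function name `apply_lora_kernel_patches`, and the config keys are assumptions for illustration; the adapter is configured with zero dropout and no bias so the patch is not skipped.

```python
# Sketch only: import paths, function name, and config keys below are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

from axolotl.monkeypatch.lora_kernels import apply_lora_kernel_patches  # assumed path
from axolotl.utils.dict import DictDefault  # assumed path

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # any causal LM

# The optimizations require LoRA adapters with no dropout and no bias terms.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,   # required: no dropout
    bias="none",        # required: no bias terms
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, lora_config)

# Assumed config keys enabling the MLP and attention kernel patches.
cfg = DictDefault(
    {"lora_mlp_kernel": True, "lora_qkv_kernel": True, "lora_o_kernel": True}
)
patched_model = apply_lora_kernel_patches(peft_model, cfg)
```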
Get the appropriate attention class by inspecting the model config.
Uses dynamic import to support any model architecture that follows
the standard transformers naming convention.
Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `cfg` | `DictDefault` | Dictionary mapping axolotl config keys to values. | required |
Returns

| Type | Description |
|------|-------------|
| `Type[nn.Module]` | The appropriate attention class for the model. |
Raises

| Type | Description |
|------|-------------|
| `ValueError` | If `base_model` is not specified or the attention class cannot be imported. |
| `ImportError` | If the model module or attention class doesn't exist. |
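To illustrate the dynamic-import pattern described above, here is a hedged sketch that resolves an attention class from the standard transformers module layout (`transformers.models.<model_type>.modeling_<model_type>`). The helper name and error handling are illustrative, not the library's exact implementation.

```python
# Illustrative sketch of the dynamic-import pattern; not the exact library code.
import importlib
from typing import Type

import torch.nn as nn
from transformers import AutoConfig


def _resolve_attention_cls(base_model: str) -> Type[nn.Module]:
    if not base_model:
        raise ValueError("base_model must be specified to resolve the attention class")

    # Read the model type from the HF config, e.g. "llama" or "mistral".
    model_type = AutoConfig.from_pretrained(base_model).model_type
    module_name = f"transformers.models.{model_type}.modeling_{model_type}"
    # Simple heuristic; multi-word model types may need an explicit mapping.
    class_name = f"{model_type.capitalize()}Attention"  # e.g. "LlamaAttention"

    try:
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    except (ImportError, AttributeError) as err:
        raise ValueError(f"Could not import {class_name} from {module_name}") from err
```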