Unsloth

Hyper-optimized QLoRA finetuning for single GPUs

Overview

Unsloth provides hand-written, optimized kernels for LLM finetuning that modestly improve training speed and reduce VRAM usage compared to standard industry baselines.

Installation

The following commands install Unsloth from source and pin xformers to an older release, since Unsloth is not compatible with the latest versions of its dependencies.

pip install --no-deps "unsloth @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps --force-reinstall xformers==0.0.26.post1
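
After installing, a quick sanity check is to confirm that the package imports cleanly. This is a generic Python import check, not an Unsloth-specific command:

python -c "import unsloth"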

Using Unsloth with Axolotl

Axolotl exposes a few configuration options that let you try out Unsloth and capture most of its performance gains.

Our Unsloth integration is currently limited to the following model architectures:

  • llama

These options apply only to LoRA finetuning and cannot be used with multi-GPU finetuning:

unsloth_lora_mlp: true
unsloth_lora_qkv: true
unsloth_lora_o: true

These options are composable and can also be used with multi-GPU finetuning (a combined example config is sketched after this list):

unsloth_cross_entropy_loss: true
unsloth_rms_norm: true
unsloth_rope: true
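
For reference, here is a minimal sketch of how these flags might sit inside a larger Axolotl QLoRA config. The surrounding keys (base_model, adapter, lora_r, and so on) are standard Axolotl options shown with illustrative values; adjust them for your own model and hardware.

base_model: meta-llama/Llama-2-7b-hf
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

# single-GPU LoRA/QLoRA optimizations
unsloth_lora_mlp: true
unsloth_lora_qkv: true
unsloth_lora_o: true

# composable optimizations (also usable with multi-GPU)
unsloth_cross_entropy_loss: true
unsloth_rms_norm: true
unsloth_rope: true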

Limitations

  • Single GPU only; i.e. no multi-GPU support
  • No DeepSpeed or FSDP support (both require multi-GPU)
  • LoRA and QLoRA support only; no full finetunes or fp8 support
  • Limited model architecture support: Llama, Phi, Gemma, and Mistral only
  • No MoE support.