# Unsloth

Hyper-optimized QLoRA finetuning for single GPUs.
## Overview
Unsloth provides hand-written, optimized kernels for LLM finetuning that modestly improve training speed and reduce VRAM usage compared to standard industry baselines.
> **Important**
>
> Due to breaking changes in transformers v4.48.0, users will need to downgrade to <=v4.47.1 to use this patch. This integration will later be deprecated in favor of LoRA Optimizations.
## Installation
The following command installs the correct version of unsloth and its extras from source.
```bash
python scripts/unsloth_install.py | sh
```
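If you hit the transformers v4.48.0 breaking change called out above, a pip version pin is one way to downgrade; this is a minimal sketch, and the exact constraint should match your environment.

```bash
# Pin transformers to the last release known to work with this patch
# (see the note above); adjust the constraint if your setup differs.
pip install "transformers<=4.47.1"
```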
## Usage
Axolotl exposes a few configuration options to try out unsloth and get most of the performance gains.
Our unsloth integration is currently limited to the following model architectures:

- llama
These options are specific to LoRA finetuning and cannot be used with multi-GPU finetuning:
```yaml
unsloth_lora_mlp: true
unsloth_lora_qkv: true
unsloth_lora_o: true
```
These options are composable and can be used with multi-GPU finetuning:
```yaml
unsloth_cross_entropy_loss: true
unsloth_rms_norm: true
unsloth_rope: true
```
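For context, here is a minimal sketch of how these flags might sit alongside a typical Axolotl QLoRA config. The base model and LoRA hyperparameters below are illustrative assumptions, not required values; only the `unsloth_*` flags are specific to this feature.

```yaml
# Illustrative Axolotl config fragment; model name and LoRA hyperparameters
# are example values only.
base_model: NousResearch/Llama-2-7b-hf   # example llama-architecture model
adapter: qlora
load_in_4bit: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.0
lora_target_linear: true

# Single-GPU-only LoRA kernel patches
unsloth_lora_mlp: true
unsloth_lora_qkv: true
unsloth_lora_o: true

# Composable patches (also usable with multi-GPU finetuning)
unsloth_cross_entropy_loss: true
unsloth_rms_norm: true
unsloth_rope: true
```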
## Limitations
- Single GPU only; no multi-GPU support
- No DeepSpeed or FSDP support (these require multi-GPU)
- LoRA and QLoRA support only; no full finetunes or fp8 support
- Limited model architecture support: Llama, Phi, Gemma, and Mistral only
- No MoE support.