Unsloth
Hyper-optimized QLoRA finetuning for single GPUs
Overview
Unsloth provides hand-written optimized kernels for LLM finetuning that slightly improve speed and reduce VRAM usage compared to standard industry baselines.
Installation
The following will install unsloth from source and pin xformers to an older release, since unsloth is incompatible with the most up-to-date libraries.
pip install --no-deps "unsloth @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps --force-reinstall xformers==0.0.26.post1
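If the install succeeds, both packages should import cleanly. A minimal sanity check (a sketch only, assuming a CUDA-capable environment with PyTorch already installed) is:

```bash
# Confirm unsloth imports and that xformers was pinned to the expected release.
python -c "import unsloth, xformers; print(xformers.__version__)"
```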
Using unsloth with Axolotl
Axolotl exposes a few configuration options to try out unsloth and get most of the performance gains.
Our unsloth integration is currently limited to the following model architectures:
- llama
The following options are specific to LoRA finetuning and cannot be used for multi-GPU finetuning:
unsloth_lora_mlp: true
unsloth_lora_qkv: true
unsloth_lora_o: true
The following options are composable and can also be used with multi-GPU finetuning:
unsloth_cross_entropy_loss: true
unsloth_rms_norm: true
unsloth_rope: true
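Taken together, these flags sit in a normal Axolotl config alongside the usual LoRA/QLoRA settings. The sketch below is illustrative only; the base model and LoRA hyperparameters are placeholder values, not recommendations:

```yaml
# Illustrative single-GPU QLoRA config fragment; model and LoRA values are placeholders.
base_model: meta-llama/Llama-2-7b-hf
adapter: qlora
load_in_4bit: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.0
lora_target_linear: true

# fused LoRA kernels: single-GPU LoRA/QLoRA only
unsloth_lora_mlp: true
unsloth_lora_qkv: true
unsloth_lora_o: true

# composable kernels: also usable with multi-GPU finetuning
unsloth_cross_entropy_loss: true
unsloth_rms_norm: true
unsloth_rope: true
```

Training is then launched the same way as any other Axolotl run; only the first group of flags needs to be dropped if you later move to multiple GPUs.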
Limitations
- Single GPU only, i.e. no multi-GPU support
- No DeepSpeed or FSDP support (both require multi-GPU)
- LoRA and QLoRA support only; no full finetunes or fp8 support
- Limited model architecture support: Llama, Phi, Gemma, and Mistral only
- No MoE support.