# API Reference
## Core

Core functionality for training.

| Module | Description |
| --- | --- |
| train | Prepare and train a model on a dataset. Can also run inference from a model or merge LoRA adapters. |
| evaluate | Module for evaluating models. |
| datasets | Module containing Dataset functionality. |
| convert | Module containing File Reader, File Writer, Json Parser, and Jsonl Serializer classes. |
| prompt_tokenizers | Module containing PromptTokenizingStrategy and Prompter classes. |
| logging_config | Common logging module for axolotl. |
| core.trainer_builder | Builder for the training args and trainer. |
| core.training_args | Extra Axolotl-specific training args. |
| core.chat.messages | Internal message representations of chat messages. |
| core.chat.format.chatml | ChatML transformation functions for MessageContents. |
| core.chat.format.llama3x | Llama 3.x chat formatting functions for MessageContents. |
| core.chat.format.shared | Shared functions for format transforms. |
| core.datasets.chat | Chat dataset module. |
| core.datasets.transforms.chat_builder | Builds a transform that converts a dataset row to a Chat. |
## CLI

Command-line interface.

| Module | Description |
| --- | --- |
| cli.main | Click CLI definitions for various axolotl commands. |
| cli.train | CLI to run training on a model. |
| cli.evaluate | CLI to run evaluation on a model. |
| cli.args | Module for axolotl CLI command arguments. |
| cli.checks | Various checks for the Axolotl CLI. |
| cli.config | Configuration loading and processing. |
| cli.inference | CLI to run inference on a trained model. |
| cli.merge_lora | CLI to merge a trained LoRA into a base model. |
| cli.merge_sharded_fsdp_weights | CLI to merge sharded FSDP model checkpoints into a single combined checkpoint. |
| cli.preprocess | CLI to run preprocessing of a dataset. |
| cli.sweeps | Utilities for handling sweeps over configs for the axolotl train CLI command. |
| cli.utils | Utility methods for the axolotl CLI. |
| cli.cloud.base | Base class for cloud platform support in the CLI. |
| cli.cloud.modal_ | Modal cloud support for the CLI. |
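A typical workflow chains these commands: preprocess the dataset, train, then optionally merge LoRA weights. Below is a minimal sketch of driving the CLI from Python; the subcommand names mirror the Click definitions in cli.main, but exact flags vary by version, and `config.yml` is a placeholder for your own config file.

```python
# Minimal sketch of invoking the axolotl CLI from Python. The subcommands
# shown (preprocess, train, merge-lora) mirror the Click CLI in cli.main;
# "config.yml" is a placeholder path, and flags may differ across versions.
import subprocess

config = "config.yml"

# Tokenize and cache the dataset ahead of training.
subprocess.run(["axolotl", "preprocess", config], check=True)

# Launch training with the same config.
subprocess.run(["axolotl", "train", config], check=True)

# Optionally merge trained LoRA weights back into the base model.
subprocess.run(["axolotl", "merge-lora", config], check=True)
```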
## Trainers

Training implementations.

| Module | Description |
| --- | --- |
| core.trainers.base | Module for customized trainers. |
| core.trainers.trl | Module for the TRL PPO trainer. |
| core.trainers.dpo.trainer | DPO trainer for axolotl. |
| core.trainers.grpo.trainer | Axolotl GRPO trainer. |
## Prompt Strategies

Prompt formatting strategies.

| Module | Description |
| --- | --- |
| prompt_strategies.base | Module for base dataset transform strategies. |
| prompt_strategies.chat_template | HF chat templates prompt strategy. |
| prompt_strategies.alpaca_chat | Module for Alpaca prompt strategy classes. |
| prompt_strategies.alpaca_instruct | Module loading the AlpacaInstructPromptTokenizingStrategy class. |
| prompt_strategies.alpaca_w_system | Prompt strategies loader for Alpaca instruction datasets with system prompts. |
| prompt_strategies.user_defined | User-defined prompts configured from the YML config. |
| prompt_strategies.llama2_chat | Prompt strategy for fine-tuning Llama 2 chat models. |
| prompt_strategies.completion | Basic completion text. |
| prompt_strategies.input_output | Module for plain input/output prompt pairs. |
| prompt_strategies.stepwise_supervised | Module for stepwise-supervised datasets, typically including a prompt and reasoning traces. |
| prompt_strategies.metharme | Module containing the MetharmePromptTokenizingStrategy and MetharmePrompter classes. |
| prompt_strategies.orcamini | Prompt strategy for fine-tuning Orca Mini (v2) models. |
| prompt_strategies.pygmalion | Module containing the PygmalionPromptTokenizingStrategy and PygmalionPrompter classes. |
| prompt_strategies.messages.chat | Chat dataset wrapping strategy for the new internal message representations. |
| prompt_strategies.dpo.chat_template | DPO prompt strategies using tokenizer chat templates. |
| prompt_strategies.dpo.llama3 | DPO strategies for the Llama 3 chat template. |
| prompt_strategies.dpo.chatml | DPO strategies for ChatML. |
| prompt_strategies.dpo.zephyr | DPO strategies for Zephyr. |
| prompt_strategies.dpo.user_defined | User-defined DPO strategies. |
| prompt_strategies.dpo.passthrough | Passthrough (zero-processing) DPO prompt strategy. |
| prompt_strategies.kto.llama3 | KTO strategies for the Llama 3 chat template. |
| prompt_strategies.kto.chatml | KTO strategies for ChatML. |
| prompt_strategies.kto.user_defined | User-defined KTO strategies. |
| prompt_strategies.orpo.chat_template | ChatML prompt tokenization strategy for ORPO. |
| prompt_strategies.bradley_terry.llama3 | ChatML transforms for datasets with system/input/chosen/rejected fields to match the Llama 3 chat template. |
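A prompt strategy is selected per dataset through the `type` field of each dataset entry in the config, which maps to one of the modules above. The sketch below shows the shape of that selection as a Python dict for illustration; in practice the config lives in YAML, and the dataset paths here are placeholders.

```python
# Illustrative only: dataset paths are placeholders, and the config normally
# lives in a YAML file rather than a Python dict. The `type` values map to
# the prompt strategy modules listed above (e.g. chat_template, alpaca_chat).
cfg = {
    "datasets": [
        {"path": "org/sharegpt-style-dataset", "type": "chat_template"},
        {"path": "org/alpaca-style-dataset", "type": "alpaca_chat"},
    ],
}
```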
## Kernels

Low-level performance optimizations.

| Module | Description |
| --- | --- |
| kernels.lora | Module for definition of Low-Rank Adaptation (LoRA) Triton kernels. |
| kernels.geglu | Module for definition of GEGLU Triton kernels. |
| kernels.swiglu | Module for definition of SwiGLU Triton kernels. |
| kernels.quantize | Dequantization utilities for bitsandbytes integration. |
| kernels.utils | Utilities for axolotl.kernels submodules. |
## MonkeyPatches

Runtime patches for model optimizations.

| Module | Description |
| --- | --- |
| monkeypatch.llama_attn_hijack_flash | Flash attention monkey patch for Llama models. |
| monkeypatch.llama_attn_hijack_xformers | Code copied from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py with some adjustments. |
| monkeypatch.mistral_attn_hijack_flash | Flash attention monkey patch for Mistral models. |
| monkeypatch.multipack | Multipack patching for v2 of sample packing. |
| monkeypatch.relora | Implements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695, minus the initial full fine-tune. |
| monkeypatch.llama_expand_mask | Expands the binary attention mask per section 3.2.2 of https://arxiv.org/pdf/2107.02027.pdf |
| monkeypatch.lora_kernels | Module for patching custom LoRA Triton kernels and torch.autograd functions. |
| monkeypatch.utils | Shared utils for the monkey patches. |
| monkeypatch.btlm_attn_hijack_flash | Flash attention monkey patch for the Cerebras BTLM model. |
| monkeypatch.llama_patch_multipack | Patched LlamaAttention to use torch.nn.functional.scaled_dot_product_attention. |
| monkeypatch.stablelm_attn_hijack_flash | Flash attention monkey patch for the PyTorch StableLM Epoch model. |
| monkeypatch.trainer_fsdp_optim | Fix for FSDP optimizer save in the trainer with transformers 4.47.0. |
| monkeypatch.transformers_fa_utils | See https://github.com/huggingface/transformers/pull/35834 |
| monkeypatch.unsloth_ | Module for patching with Unsloth optimizations. |
| monkeypatch.attention.mllama | Monkey patch for Vision Llama for FA2 support. |
| monkeypatch.data.batch_dataset_fetcher | Monkey patches for the dataset fetcher to handle batches of packed indexes. |
| monkeypatch.mixtral | Patches to support multipack for Mixtral. |
## Utils

Utility functions.

| Module | Description |
| --- | --- |
| utils.models | Module for models and model loading. |
| utils.tokenization | Module for tokenization utilities. |
| utils.chat_templates | Functionality for selecting chat templates based on user choices. |
| utils.lora | Module to get the state dict of a merged LoRA model. |
| utils.lora_embeddings | Helpers for LoRA embeddings. |
| utils.model_shard_quant | Module to handle loading a model on the CPU/meta device for FSDP. |
| utils.bench | Benchmarking and measurement utilities. |
| utils.freeze | Module to freeze/unfreeze parameters by name. |
| utils.trainer | Module containing the Trainer class and related functions. |
| utils.schedulers | Module for custom LRScheduler class. |
| utils.distributed | Utility helpers for distributed checks. |
| utils.dict | Module containing the DictDefault class (see the sketch after this table). |
| utils.optimizers.adopt | ADOPT optimizer, copied from https://github.com/iShohei220/adopt |
| utils.data.pretraining | Data handling specific to pretraining. |
| utils.data.sft | Data handling specific to SFT. |
| utils.gradient_checkpointing.unsloth | Unsloth checkpointing. |
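DictDefault is the container Axolotl uses to pass the parsed config around internally. The sketch below assumes it is an addict-style dict with attribute access and non-raising lookups for missing keys; check utils.dict for the exact semantics.

```python
# Minimal sketch, assuming DictDefault is an addict-style dict with
# attribute access and non-raising lookups for missing keys; see utils.dict
# for the exact semantics.
from axolotl.utils.dict import DictDefault

cfg = DictDefault(
    {
        "base_model": "NousResearch/Llama-3.2-1B",  # placeholder model id
        "micro_batch_size": 2,
    }
)

print(cfg.base_model)         # attribute-style access to config keys
print(cfg.not_a_real_option)  # missing keys return a falsy value, not KeyError
```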
## Schemas

Pydantic data models for Axolotl config.

| Module | Description |
| --- | --- |
| utils.schemas.config | Module with Pydantic models for configuration. |
| utils.schemas.model | Pydantic models for model input/output and related configuration. |
| utils.schemas.training | Pydantic models for training hyperparameters. |
| utils.schemas.datasets | Pydantic models for dataset-related configuration. |
| utils.schemas.peft | Pydantic models for PEFT-related configuration. |
| utils.schemas.trl | Pydantic models for TRL trainer configuration. |
| utils.schemas.multimodal | Pydantic models for multimodal-related configuration. |
| utils.schemas.integrations | Pydantic models for Axolotl integrations. |
| utils.schemas.enums | Enums for the Axolotl input config. |
| utils.schemas.utils | Utilities for Axolotl Pydantic models. |
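These models validate the raw YAML config before any training starts. The sketch below assumes the top-level model in utils.schemas.config is named AxolotlInputConfig; that name and the field set shown are assumptions to verify against the module, and the point is only that malformed configs surface as Pydantic validation errors.

```python
# Hypothetical sketch: the class name AxolotlInputConfig is an assumption;
# consult utils.schemas.config for the real top-level model and its required
# fields. The point is only that raw config dicts are validated by Pydantic
# before training rather than failing mid-run.
from pydantic import ValidationError

from axolotl.utils.schemas.config import AxolotlInputConfig

raw = {
    "base_model": "NousResearch/Llama-3.2-1B",  # placeholder model id
    "learning_rate": 2e-4,
    "micro_batch_size": 2,
    "sequence_len": 2048,
}

try:
    cfg = AxolotlInputConfig(**raw)
    print(cfg.learning_rate)
except ValidationError as err:
    # Missing or invalid fields are reported here, with field names and reasons.
    print(err)
```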
## Integrations

Third-party integrations and extensions.

| Module | Description |
| --- | --- |
| integrations.base | Base class for all plugins (see the sketch after this table). |
| integrations.cut_cross_entropy.args | Module for handling Cut Cross Entropy input arguments. |
| integrations.grokfast.optimizer | Grokfast optimizer. |
| integrations.kd.trainer | Knowledge distillation (KD) trainer. |
| integrations.liger.args | Module for handling LIGER input arguments. |
| integrations.lm_eval.args | Module for handling lm-eval harness input arguments. |
| integrations.spectrum.args | Module for handling Spectrum input arguments. |
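Plugins hook into the training lifecycle by subclassing the base class from integrations.base. The sketch below assumes that class is named BasePlugin and exposes a pre_model_load hook receiving the config; both names are assumptions to verify against the module, which defines the full set of lifecycle hooks. Plugins are then enabled from the config (in recent versions, via a list of import paths under a `plugins` key).

```python
# Hypothetical plugin sketch: the class name BasePlugin and the
# pre_model_load hook are assumptions; check integrations.base for the
# actual interface and the full set of lifecycle hooks.
from axolotl.integrations.base import BasePlugin


class LogBaseModelPlugin(BasePlugin):
    """Toy plugin that logs the base model before it is loaded."""

    def pre_model_load(self, cfg):
        # cfg is the parsed Axolotl config
        print(f"about to load base model: {cfg.base_model}")
```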
## Common

Common utilities and shared functionality.

| Module | Description |
| --- | --- |
| common.architectures | Common architecture-specific constants. |
| common.const | Various shared constants. |
| common.datasets | Dataset loading utilities. |
## Models

Custom model implementations.

| Module | Description |
| --- | --- |
| models.mamba.modeling_mamba | Mamba model implementation. |
## Data Processing

Data processing utilities.

| Module | Description |
| --- | --- |
| utils.collators.core | Basic shared collator constants. |
| utils.collators.batching | Data collators for axolotl that pad labels and position_ids for packed sequences. |
| utils.collators.mamba | Collators for Mamba. |
| utils.collators.mm_chat | Collators for multi-modal chat messages and packing. |
| utils.samplers.multipack | Multipack batch sampler. |
## Callbacks

Training callbacks.

| Module | Description |
| --- | --- |
| utils.callbacks.perplexity | Callback to calculate perplexity as an evaluation metric. |
| utils.callbacks.profiler | HF Trainer callback for creating PyTorch profiling snapshots. |
| utils.callbacks.lisa | Module for LISA. |
| utils.callbacks.mlflow_ | MLflow module for trainer callbacks. |
| utils.callbacks.comet_ | Comet module for trainer callbacks. |