API Reference

Core

Core functionality for training

train Prepare and train a model on a dataset. Can also infer from a model or merge lora
evaluate Module for evaluating models.
datasets Module containing Dataset functionality
convert Module containing File Reader, File Writer, Json Parser, and Jsonl Serializer classes
prompt_tokenizers Module containing PromptTokenizingStrategy and Prompter classes
logging_config Common logging module for axolotl
core.trainer_builder Builder for the training args and trainer
core.training_args extra axolotl specific training args
core.chat.messages internal message representations of chat messages
core.chat.format.chatml ChatML transformation functions for MessageContents
core.chat.format.llama3x Llama 3.x chat formatting functions for MessageContents
core.chat.format.shared shared functions for format transforms
core.datasets.chat chat dataset module
core.datasets.transforms.chat_builder This module contains a function that builds a transform that takes a row from the dataset and converts it to a Chat.

CLI

Command-line interface

cli.main Click CLI definitions for various axolotl commands.
cli.train CLI to run training on a model.
cli.evaluate CLI to run evaluation on a model.
cli.args Module for axolotl CLI command arguments.
cli.checks Various checks for Axolotl CLI.
cli.config Configuration loading and processing.
cli.inference CLI to run inference on a trained model.
cli.merge_lora CLI to merge a trained LoRA into a base model.
cli.merge_sharded_fsdp_weights CLI to merge sharded FSDP model checkpoints into a single combined checkpoint.
cli.preprocess CLI to run preprocessing of a dataset.
cli.sweeps Utilities for handling sweeps over configs for axolotl train CLI command
cli.utils Utility methods for axolotl CLI.
cli.cloud.base base class for cloud platforms from cli
cli.cloud.modal_ Modal Cloud support from CLI

Trainers

Training implementations

core.trainers.base Module for customized trainers
core.trainers.trl Module for TRL PPO trainer
core.trainers.dpo.trainer DPO trainer for axolotl
core.trainers.grpo.trainer Axolotl GRPO trainer

Prompt Strategies

Prompt formatting strategies

prompt_strategies.base module for base dataset transform strategies
prompt_strategies.chat_template HF Chat Templates prompt strategy
prompt_strategies.alpaca_chat Module for Alpaca prompt strategy classes
prompt_strategies.alpaca_instruct Module loading the AlpacaInstructPromptTokenizingStrategy class
prompt_strategies.alpaca_w_system Prompt strategies loader for alpaca instruction datasets with system prompts
prompt_strategies.user_defined User Defined prompts with configuration from the YML config
prompt_strategies.llama2_chat Prompt Strategy for finetuning Llama2 chat models
prompt_strategies.completion Basic completion text
prompt_strategies.input_output Module for plain input/output prompt pairs
prompt_strategies.stepwise_supervised Module for stepwise datasets, typically including a prompt and reasoning traces,
prompt_strategies.metharme Module containing the MetharmenPromptTokenizingStrategy and MetharmePrompter class
prompt_strategies.orcamini Prompt Strategy for finetuning Orca Mini (v2) models
prompt_strategies.pygmalion Module containing the PygmalionPromptTokenizingStrategy and PygmalionPrompter class
prompt_strategies.messages.chat Chat dataset wrapping strategy for new internal messages representations
prompt_strategies.dpo.chat_template DPO prompt strategies for using tokenizer chat templates.
prompt_strategies.dpo.llama3 DPO strategies for llama-3 chat template
prompt_strategies.dpo.chatml DPO strategies for chatml
prompt_strategies.dpo.zephyr DPO strategies for zephyr
prompt_strategies.dpo.user_defined User-defined DPO strategies
prompt_strategies.dpo.passthrough DPO prompt strategies passthrough/zero-processing strategy
prompt_strategies.kto.llama3 KTO strategies for llama-3 chat template
prompt_strategies.kto.chatml KTO strategies for chatml
prompt_strategies.kto.user_defined User-defined KTO strategies
prompt_strategies.orpo.chat_template chatml prompt tokenization strategy for ORPO
prompt_strategies.bradley_terry.llama3 chatml transforms for datasets with system, input, chosen, rejected to match llama3 chat template

Kernels

Low-level performance optimizations

kernels.lora Module for definition of Low-Rank Adaptation (LoRA) Triton kernels.
kernels.geglu Module for definition of GEGLU Triton kernels.
kernels.swiglu Module for definition of SwiGLU Triton kernels.
kernels.quantize Dequantization utilities for bitsandbytes integration.
kernels.utils Utilities for axolotl.kernels submodules.

MonkeyPatches

Runtime patches for model optimizations

monkeypatch.llama_attn_hijack_flash Flash attention monkey patch for llama model
monkeypatch.llama_attn_hijack_xformers Directly copied the code from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py and made some adjustments
monkeypatch.mistral_attn_hijack_flash Flash attention monkey patch for mistral model
monkeypatch.multipack multipack patching for v2 of sample packing
monkeypatch.relora Implements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695, minus the initial full fine-tune.
monkeypatch.llama_expand_mask expands the binary attention mask per 3.2.2 of https://arxiv.org/pdf/2107.02027.pdf
monkeypatch.lora_kernels Module for patching custom LoRA Triton kernels and torch.autograd functions.
monkeypatch.utils Shared utils for the monkeypatches
monkeypatch.btlm_attn_hijack_flash Flash attention monkey patch for cerebras btlm model
monkeypatch.llama_patch_multipack Patched LlamaAttention to use torch.nn.functional.scaled_dot_product_attention
monkeypatch.stablelm_attn_hijack_flash PyTorch StableLM Epoch model.
monkeypatch.trainer_fsdp_optim fix for FSDP optimizer save in trainer w 4.47.0
monkeypatch.transformers_fa_utils see https://github.com/huggingface/transformers/pull/35834
monkeypatch.unsloth_ module for patching with unsloth optimizations
monkeypatch.attention.mllama Monkeypatch for Vision Llama for FA2 support
monkeypatch.data.batch_dataset_fetcher monkey patches for the dataset fetcher to handle batches of packed indexes
monkeypatch.mixtral Patches to support multipack for mixtral

Utils

Utility functions

utils.models Module for models and model loading
utils.tokenization Module for tokenization utilities
utils.chat_templates This module provides functionality for selecting chat templates based on user choices.
utils.lora module to get the state dict of a merged lora model
utils.lora_embeddings helpers for lora embeddings
utils.model_shard_quant module to handle loading model on cpu/meta device for FSDP
utils.bench Benchmarking and measurement utilities
utils.freeze module to freeze/unfreeze parameters by name
utils.trainer Module containing the Trainer class and related functions
utils.schedulers Module for custom LRScheduler class
utils.distributed utility helpers for distributed checks
utils.dict Module containing the DictDefault class
utils.optimizers.adopt Copied from https://github.com/iShohei220/adopt
utils.data.pretraining data handling specific to pretraining
utils.data.sft data handling specific to SFT
utils.gradient_checkpointing.unsloth Unsloth checkpointing

Schemas

Pydantic data models for Axolotl config

utils.schemas.config Module with Pydantic models for configuration.
utils.schemas.model Pydantic models for model input / output, etc. configuration
utils.schemas.training Pydantic models for training hyperparameters
utils.schemas.datasets Pydantic models for datasets-related configuration
utils.schemas.peft Pydantic models for PEFT-related configuration
utils.schemas.trl Pydantic models for TRL trainer configuration
utils.schemas.multimodal Pydantic models for multimodal-related configuration
utils.schemas.integrations Pydantic models for Axolotl integrations
utils.schemas.enums Enums for Axolotl input config
utils.schemas.utils Utilities for Axolotl Pydantic models

Integrations

Third-party integrations and extensions

integrations.base Base class for all plugins.
integrations.cut_cross_entropy.args Module for handling Cut Cross Entropy input arguments.
integrations.grokfast.optimizer
integrations.kd.trainer KD trainer
integrations.liger.args Module for handling LIGER input arguments.
integrations.lm_eval.args Module for handling lm eval harness input arguments.
integrations.spectrum.args Module for handling Spectrum input arguments.

Common

Common utilities and shared functionality

common.architectures Common architecture specific constants
common.const Various shared constants
common.datasets Dataset loading utilities.

Models

Custom model implementations

models.mamba.modeling_mamba

Data Processing

Data processing utilities

utils.collators.core basic shared collator constants
utils.collators.batching Data collators for axolotl to pad labels and position_ids for packed sequences. Also
utils.collators.mamba collators for Mamba
utils.collators.mm_chat Collators for multi-modal chat messages and packing
utils.samplers.multipack Multipack Batch Sampler

Callbacks

Training callbacks

utils.callbacks.perplexity callback to calculate perplexity as an evaluation metric.
utils.callbacks.profiler HF Trainer callback for creating pytorch profiling snapshots
utils.callbacks.lisa module for LISA
utils.callbacks.mlflow_ MLFlow module for trainer callbacks
utils.callbacks.comet_ Comet module for trainer callbacks