Dataset Formats
Supported dataset formats.
Axolotl supports a variety of dataset formats, and JSONL is the recommended one. The schema of each JSONL record depends on the task and on the prompt template you want to use. Instead of a JSONL file, you can also use a Hugging Face dataset with one column per JSONL field.
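The following is a minimal sketch of how the two inputs line up, assuming only the Python standard library. Each JSONL line holds one JSON object, and pivoting those objects into one list per field gives the column layout of a tabular dataset such as a Hugging Face dataset. The field names `instruction` and `output` and the helpers `write_jsonl`, `read_jsonl`, and `to_columns` are hypothetical and exist only for illustration. The fields you actually need depend on the format you choose from the table below. The resulting `columns` dict has the field-to-list shape that `datasets.Dataset.from_dict` accepts.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records: the field names are illustrative assumptions,
# not the schema of any specific Axolotl dataset format.
records = [
    {"instruction": "Translate to French: Hello", "output": "Bonjour"},
    {"instruction": "What is 2 + 2?", "output": "4"},
]


def write_jsonl(path, rows):
    """Write one JSON object per line (the JSONL convention)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def to_columns(rows):
    """Pivot row records into {field: [values...]}, one column per JSONL field.

    Assumes every record has the same set of fields; raises otherwise.
    """
    keys = list(rows[0])
    for row in rows:
        if set(row) != set(keys):
            raise ValueError(f"inconsistent fields: {sorted(row)} vs {sorted(keys)}")
    return {key: [row[key] for row in rows] for key in keys}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "train.jsonl"
    write_jsonl(path, records)
    loaded = read_jsonl(path)
    columns = to_columns(loaded)
```

Requiring identical fields in every record keeps the pivot well-defined. A row-oriented JSONL file and a column-oriented dataset describe the same data only when each line carries the same fields.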
The table below lists these formats, organized by task:
Title | Description |
---|---|
Pre-training | Data format for a pre-training completion task. |
Instruction Tuning | Instruction tuning formats for supervised fine-tuning. |
Conversation | Conversation format for supervised fine-tuning. |
Template-Free | Construct prompts without a template. |
Custom Pre-Tokenized Dataset | How to use a custom pre-tokenized dataset. |