common.datasets

Dataset loading utilities.

Classes

Name Description
TrainDatasetMeta Dataclass with fields for training and validation datasets and metadata.

TrainDatasetMeta

common.datasets.TrainDatasetMeta(
    self,
    train_dataset,
    eval_dataset=None,
    total_num_steps=None,
)

Dataclass with fields for training and validation datasets and metadata.
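
As a rough illustration of the shape this dataclass describes, the sketch below defines a stand-in with the same three fields and defaults as the signature above. The name TrainDatasetMetaSketch, the type annotations, and the list used in place of a real dataset are assumptions made for this example; this is not the library's definition.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TrainDatasetMetaSketch:
    """Hypothetical stand-in mirroring the documented fields (not the library class)."""

    train_dataset: Any                     # required, as in the documented signature
    eval_dataset: Optional[Any] = None     # optional evaluation split
    total_num_steps: Optional[int] = None  # optional step count


# A plain list stands in for a real dataset object here (assumption).
meta = TrainDatasetMetaSketch(train_dataset=["example 1", "example 2"])

# Fields that were not supplied fall back to their None defaults.
has_eval = meta.eval_dataset is not None
```

Callers that consume such an object would typically check eval_dataset and total_num_steps for None before using them, since both are optional.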

Functions

Name Description
load_datasets Loads one or more training or evaluation datasets, calling axolotl.utils.data.prepare_dataset.
load_preference_datasets Loads one or more training or evaluation datasets for RL training using paired preference data.
sample_dataset Randomly sample num_samples samples from dataset.

load_datasets

common.datasets.load_datasets(cfg, cli_args)

Loads one or more training or evaluation datasets by calling axolotl.utils.data.prepare_dataset. Optionally logs debug information.

Parameters

Name Type Description Default
cfg DictDefault Dictionary mapping axolotl config keys to values. required
cli_args Union[PreprocessCliArgs, TrainerCliArgs] Command-specific CLI arguments. required

Returns

Name Type Description
TrainDatasetMeta Dataclass with fields for training and evaluation datasets and the computed total_num_steps.

load_preference_datasets

common.datasets.load_preference_datasets(cfg, cli_args)

Loads one or more training or evaluation datasets for RL training using paired preference data by calling axolotl.utils.data.rl.load_prepare_preference_datasets. Optionally logs debug information.

Parameters

Name Type Description Default
cfg DictDefault Dictionary mapping axolotl config keys to values. required
cli_args Union[PreprocessCliArgs, TrainerCliArgs] Command-specific CLI arguments. required

Returns

Name Type Description
TrainDatasetMeta Dataclass with fields for training and evaluation datasets and the computed total_num_steps.

sample_dataset

common.datasets.sample_dataset(dataset, num_samples)

Randomly samples num_samples examples from dataset, with replacement.

Parameters

Name Type Description Default
dataset Dataset Dataset to sample from. required
num_samples int Number of samples to return. required

Returns

Name Type Description
Dataset Random sample (with replacement) of examples in dataset.
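
To make "with replacement" concrete, here is a minimal sketch of the sampling idea on a plain Python list. The helper name sample_with_replacement, its seed parameter, and the use of a list instead of a Dataset are assumptions for illustration; the library function may be implemented differently.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_with_replacement(
    rows: Sequence[T], num_samples: int, seed: Optional[int] = None
) -> List[T]:
    """Hypothetical helper: draw num_samples items, allowing repeats."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    # Each draw picks an independent index, so the same row can appear more than once.
    return [rows[rng.randrange(len(rows))] for _ in range(num_samples)]


rows = ["a", "b", "c"]
# Requesting more samples than there are rows is valid when sampling with replacement.
sample = sample_with_replacement(rows, num_samples=5, seed=0)
```

Because draws are independent, the result can contain duplicates and can be longer than the input, which is worth keeping in mind when the sample is used for debugging output rather than for evaluation.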