prompt_tokenizers

Module containing PromptTokenizingStrategy and Prompter classes

Classes

Name                                          Description
AlpacaMultipleChoicePromptTokenizingStrategy  Tokenizing strategy for Alpaca Multiple Choice prompts.
AlpacaPromptTokenizingStrategy                Tokenizing strategy for Alpaca prompts.
AlpacaReflectionPTStrategy                    Tokenizing strategy for Alpaca Reflection prompts.
DatasetWrappingStrategy                       Abstract class for wrapping datasets for chat messages.
GPTeacherPromptTokenizingStrategy             Tokenizing strategy for GPTeacher prompts.
InstructionPromptTokenizingStrategy           Tokenizing strategy for instruction-based prompts.
InvalidDataException                          Exception raised when the data is invalid.
JeopardyPromptTokenizingStrategy              Tokenizing strategy for Jeopardy prompts.
NomicGPT4AllPromptTokenizingStrategy          Tokenizing strategy for NomicGPT4All prompts.
OpenAssistantPromptTokenizingStrategy         Tokenizing strategy for OpenAssistant prompts.
PromptTokenizingStrategy                      Abstract class for tokenizing strategies.
ReflectionPromptTokenizingStrategy            Tokenizing strategy for Reflection prompts.
SummarizeTLDRPromptTokenizingStrategy         Tokenizing strategy for SummarizeTLDR prompts.

AlpacaMultipleChoicePromptTokenizingStrategy

prompt_tokenizers.AlpacaMultipleChoicePromptTokenizingStrategy(
    prompter,
    tokenizer,
    train_on_inputs=False,
    sequence_len=2048,
)

Tokenizing strategy for Alpaca Multiple Choice prompts.

AlpacaPromptTokenizingStrategy

prompt_tokenizers.AlpacaPromptTokenizingStrategy(
    prompter,
    tokenizer,
    train_on_inputs=False,
    sequence_len=2048,
)

Tokenizing strategy for Alpaca prompts.

AlpacaReflectionPTStrategy

prompt_tokenizers.AlpacaReflectionPTStrategy(
    prompter,
    tokenizer,
    train_on_inputs=False,
    sequence_len=2048,
)

Tokenizing strategy for Alpaca Reflection prompts.

DatasetWrappingStrategy

prompt_tokenizers.DatasetWrappingStrategy()

Abstract class for wrapping datasets for chat messages.

GPTeacherPromptTokenizingStrategy

prompt_tokenizers.GPTeacherPromptTokenizingStrategy(
    prompter,
    tokenizer,
    train_on_inputs=False,
    sequence_len=2048,
)

Tokenizing strategy for GPTeacher prompts.

InstructionPromptTokenizingStrategy

prompt_tokenizers.InstructionPromptTokenizingStrategy(
    prompter,
    tokenizer,
    train_on_inputs=False,
    sequence_len=2048,
)

Tokenizing strategy for instruction-based prompts.
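
This page does not describe what `train_on_inputs=False` does. A common reading in instruction tuning is that prompt tokens are excluded from the loss. The sketch below illustrates that idea only; it assumes the conventional ignore index of -100, and `build_labels` and the token ids are hypothetical and not part of `prompt_tokenizers`.

```python
IGNORE_INDEX = -100  # assumption: conventional "ignore this position" label value


def build_labels(prompt_ids, response_ids, train_on_inputs=False):
    """Hypothetical helper: return (input_ids, labels) for one example."""
    input_ids = list(prompt_ids) + list(response_ids)
    if train_on_inputs:
        # Every token, prompt included, contributes to the loss.
        labels = list(input_ids)
    else:
        # Mask the prompt so only the response tokens contribute to the loss.
        labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


input_ids, labels = build_labels([11, 12, 13], [21, 22])
print(input_ids)
print(labels)
```

With `train_on_inputs=True` the same helper returns labels identical to `input_ids`.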

InvalidDataException

prompt_tokenizers.InvalidDataException()

Exception raised when the data is invalid.
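
One way an invalid-data exception can be used is to skip bad rows rather than abort a whole preprocessing run. The following is a minimal sketch under assumptions: `SketchInvalidData`, `validate_row`, `keep_valid`, and the "non-empty instruction" rule are all hypothetical stand-ins, since this page does not say where or why `InvalidDataException` is raised.

```python
class SketchInvalidData(Exception):
    """Hypothetical stand-in for an invalid-data exception."""


def validate_row(row):
    # Assumed rule for illustration: a row needs a non-empty "instruction".
    if not row.get("instruction"):
        raise SketchInvalidData(f"missing instruction in row: {row!r}")
    return row


def keep_valid(rows):
    valid = []
    for row in rows:
        try:
            valid.append(validate_row(row))
        except SketchInvalidData:
            # Skip rows that fail validation instead of stopping the run.
            continue
    return valid


rows = [{"instruction": "Add 2 and 2"}, {"instruction": ""}, {}]
kept = keep_valid(rows)
print(kept)
```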

JeopardyPromptTokenizingStrategy

prompt_tokenizers.JeopardyPromptTokenizingStrategy(
    prompter,
    tokenizer,
    train_on_inputs=False,
    sequence_len=2048,
)

Tokenizing strategy for Jeopardy prompts.

NomicGPT4AllPromptTokenizingStrategy

prompt_tokenizers.NomicGPT4AllPromptTokenizingStrategy(
    prompter,
    tokenizer,
    train_on_inputs=False,
    sequence_len=2048,
)

Tokenizing strategy for NomicGPT4All prompts.

OpenAssistantPromptTokenizingStrategy

prompt_tokenizers.OpenAssistantPromptTokenizingStrategy(
    prompter,
    tokenizer,
    train_on_inputs=False,
    sequence_len=2048,
)

Tokenizing strategy for OpenAssistant prompts.

PromptTokenizingStrategy

prompt_tokenizers.PromptTokenizingStrategy(
    prompter,
    tokenizer,
    train_on_inputs=False,
    sequence_len=2048,
)

Abstract class for tokenizing strategies.
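
To illustrate the abstract-base pattern, here is a self-contained sketch that mirrors the four constructor arguments listed above. It does not import `prompt_tokenizers`: `SketchStrategy`, `WhitespaceStrategy`, the `tokenize_prompt` hook, and the toy prompter and tokenizer are hypothetical, and truncating to `sequence_len` is an assumed interpretation of that parameter.

```python
from abc import ABC, abstractmethod


class SketchStrategy(ABC):
    """Hypothetical stand-in mirroring the constructor arguments listed above."""

    def __init__(self, prompter, tokenizer, train_on_inputs=False, sequence_len=2048):
        self.prompter = prompter
        self.tokenizer = tokenizer
        self.train_on_inputs = train_on_inputs
        self.sequence_len = sequence_len

    @abstractmethod
    def tokenize_prompt(self, row):
        """Assumed hook: turn one dataset row into token ids."""


class WhitespaceStrategy(SketchStrategy):
    def tokenize_prompt(self, row):
        text = self.prompter(row)    # build the prompt string
        ids = self.tokenizer(text)   # convert it to token ids
        # Assumed behavior: cap the output at the configured maximum length.
        return ids[: self.sequence_len]


def toy_prompter(row):
    return f"{row['instruction']} {row['output']}"


def toy_tokenizer(text):
    # Toy "tokenizer": one id per whitespace-separated word (its length).
    return [len(word) for word in text.split()]


strategy = WhitespaceStrategy(toy_prompter, toy_tokenizer, sequence_len=4)
ids = strategy.tokenize_prompt(
    {"instruction": "Name a prime", "output": "seven is prime"}
)
print(ids)
```

Because `tokenize_prompt` is declared abstract, instantiating `SketchStrategy` directly raises `TypeError`; only concrete subclasses can be constructed.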

ReflectionPromptTokenizingStrategy

prompt_tokenizers.ReflectionPromptTokenizingStrategy(
    prompter,
    tokenizer,
    train_on_inputs=False,
    sequence_len=2048,
)

Tokenizing strategy for Reflection prompts.

SummarizeTLDRPromptTokenizingStrategy

prompt_tokenizers.SummarizeTLDRPromptTokenizingStrategy(
    prompter,
    tokenizer,
    train_on_inputs=False,
    sequence_len=2048,
)

Tokenizing strategy for SummarizeTLDR prompts.

Functions

Name                       Description
parse_tokenized_to_result  Parses the tokenized prompt and appends the tokenized input_ids, attention_mask and labels to the result.
tokenize_prompt_default    Returns the default values for the tokenize prompt function.

parse_tokenized_to_result

prompt_tokenizers.parse_tokenized_to_result(
    result,
    current_len,
    res,
    labels,
    pad_token_id=None,
)

Parses the tokenized prompt and appends the tokenized input_ids, attention_mask and labels to the result.
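
The accumulate-and-advance pattern this description suggests can be sketched as follows. `sketch_parse` is a hypothetical re-implementation written only from the signature and description above, not the library's code. It assumes that `res` is a dict with an `input_ids` list, that positions equal to `pad_token_id` get an attention mask of 0, and that the function returns the updated result together with the new running length.

```python
def sketch_parse(result, current_len, res, labels, pad_token_id=None):
    """Hypothetical re-implementation based only on the description above."""
    input_ids = res["input_ids"]
    end = current_len + len(input_ids)
    # Write the new segment into the accumulator at the running offset.
    result["input_ids"][current_len:end] = input_ids
    # Assumption: padding tokens get attention 0, everything else 1.
    result["attention_mask"][current_len:end] = [
        0 if tok == pad_token_id else 1 for tok in input_ids
    ]
    result["labels"][current_len:end] = labels
    return result, end


result = {"input_ids": [], "attention_mask": [], "labels": []}
# First segment: two prompt tokens whose labels are masked with -100.
result, n = sketch_parse(result, 0, {"input_ids": [5, 6]}, [-100, -100])
# Second segment: one real token followed by one padding token (id 0).
result, n = sketch_parse(result, n, {"input_ids": [7, 0]}, [7, 0], pad_token_id=0)
print(result, n)
```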

tokenize_prompt_default

prompt_tokenizers.tokenize_prompt_default()

Returns the default values for the tokenize prompt function.
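
The return shape is not documented on this page. A plausible sketch, assuming the defaults are an empty accumulator keyed by the three fields named under `parse_tokenized_to_result` plus a running length of zero, is shown below; `sketch_default` is a hypothetical stand-in, not the library function.

```python
def sketch_default():
    """Hypothetical stand-in: a fresh accumulator plus a zero running length."""
    # Assumption: the keys match the fields named in parse_tokenized_to_result.
    result = {"input_ids": [], "attention_mask": [], "labels": []}
    current_len = 0
    return result, current_len


first, first_len = sketch_default()
second, _ = sketch_default()
first["input_ids"].append(1)
print(first, second)
```

Building the containers inside the function means each call gets independent lists, so mutating one accumulator does not leak into the next example.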