Instruction Tuning
alpaca
instruction; input (optional)
data.jsonl
{"instruction": "...", "input": "...", "output": "..."}
jeopardy
question and answer
data.jsonl
{"question": "...", "category": "...", "answer": "..."}
oasst
instruction
data.jsonl
{"INSTRUCTION": "...", "RESPONSE": "..."}
gpteacher
instruction; input (optional)
data.jsonl
{"instruction": "...", "input": "...", "response": "..."}
reflection
instruction with reflection; input (optional)
data.jsonl
{"instruction": "...", "input": "...", "output": "...", "reflection": "...", "corrected": "..."}
explainchoice
question, choices, (solution OR explanation)
data.jsonl
{"question": "...", "choices": ["..."], "solution": "...", "explanation": "..."}
concisechoice
question, choices, (solution OR explanation)
data.jsonl
{"question": "...", "choices": ["..."], "solution": "...", "explanation": "..."}
summarizetldr
article and summary
data.jsonl
{"article": "...", "summary": "..."}
alpaca_chat
basic instruction format for alpaca chat
data.jsonl
{"instruction": "...", "input": "...", "response": "..."}
alpaca_chat.load_qa
question and answer for alpaca chat
data.jsonl
{"question": "...", "answer": "..."}
alpaca_chat.load_concise
question and answer for alpaca chat, for concise answers
data.jsonl
{"instruction": "...", "input": "...", "response": "..."}
alpaca_chat.load_camel_ai
question and answer for alpaca chat, for camel_ai datasets
data.jsonl
{"message_1": "...", "message_2": "..."}
alpaca_w_system.load_open_orca
support for Open Orca datasets with included system prompts, instruction format
data.jsonl
{"system_prompt": "...", "question": "...", "response": "..."}
context_qa
in-context question answering from an article
data.jsonl
{"article": "...", "question": "...", "answer": "..."}
context_qa.load_v2
in-context question answering (alternate format)
data.jsonl
{"context": "...", "question": "...", "answer": "..."}
context_qa.load_404
in-context question answering from an article, with a default response when the context contains no answer
data.jsonl
{"article": "...", "unanswerable_question": "..."}
creative_acr.load_answer
instruction and revision
data.jsonl
{"instruction": "...", "revision": "..."}
creative_acr.load_critique
critique
data.jsonl
{"scores": "...", "critiques": "...", "instruction": "...", "answer": "..."}
creative_acr.load_revise
critique and revise
data.jsonl
{"scores": "...", "critiques": "...", "instruction": "...", "answer": "...", "revision": "..."}
metharme
instruction, adds additional eos tokens
data.jsonl
{"prompt": "...", "generation": "..."}
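Every format above is a JSON Lines file: one JSON object per line, carrying the keys shown for its dataset type. The following is a minimal Python sketch for checking rows against those keys before training. REQUIRED_KEYS and check_jsonl are hypothetical names introduced here for illustration (they are not part of any library), the table covers only a few of the types listed above, and the sample rows are made up. Fields marked optional, such as input, are deliberately left out of the required set.

```python
import json

# Hypothetical table: keys copied from the examples above for a few types.
# "input" is optional for alpaca, so it is not listed as required.
REQUIRED_KEYS = {
    "alpaca": {"instruction", "output"},
    "jeopardy": {"question", "category", "answer"},
    "oasst": {"INSTRUCTION", "RESPONSE"},
    "summarizetldr": {"article", "summary"},
    "metharme": {"prompt", "generation"},
}


def check_jsonl(lines, dataset_type):
    """Return (line_number, missing_keys) pairs for rows lacking required keys."""
    required = REQUIRED_KEYS[dataset_type]
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        missing = required - set(row)
        if missing:
            problems.append((number, sorted(missing)))
    return problems


# Made-up sample rows; in practice, pass an open file handle for data.jsonl.
rows = [
    json.dumps({"instruction": "Add 2 and 3.", "input": "", "output": "5"}),
    json.dumps({"instruction": "Name a color."}),  # missing "output"
]
print(check_jsonl(rows, "alpaca"))  # [(2, ['output'])]
```

Because check_jsonl only iterates over lines, it works the same way on a list of strings and on a file opened with open("data.jsonl").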
How to add a custom prompt format
For a dataset that has already been preprocessed for instruction tuning:
data.jsonl
{"input": "...", "output": "..."}
You can use this example in your YAML config:
config.yaml
datasets:
  - path: repo
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
      format: "[INST] {instruction} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"
See the config reference for the full list of options.
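To make the role of format and no_input_format in the config above more concrete, here is a rough Python sketch of how such a template might be filled in for a single row. It uses plain str.format and is an illustration only, not the library's actual implementation. render_prompt is a hypothetical helper, the sample row is made up, and the mapping of field_instruction to the {instruction} placeholder is an assumption read off the config.

```python
# Values copied from the YAML config above.
config = {
    "field_instruction": "input",
    "field_output": "output",
    "format": "[INST] {instruction} [/INST]",
    "no_input_format": "[INST] {instruction} [/INST]",
}


def render_prompt(row, cfg):
    """Hypothetical helper: fill the template and return (prompt, target)."""
    instruction = row[cfg["field_instruction"]]
    # Assumption: no separate input field is configured, so the
    # no_input_format template is the one that applies.
    prompt = cfg["no_input_format"].format(instruction=instruction)
    return prompt, row[cfg["field_output"]]


# Made-up row in the {"input": ..., "output": ...} shape shown earlier.
row = {"input": "Translate 'bonjour' to English.", "output": "Hello."}
prompt, target = render_prompt(row, config)
print(prompt)  # [INST] Translate 'bonjour' to English. [/INST]
print(target)  # Hello.
```

In this config both templates are identical, so the choice between them makes no difference; they would diverge only if format also referenced a separate input placeholder.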