FAQ
General
Q: The trainer stopped and hasn’t progressed in several minutes.
A: Usually an issue with the GPUs communicating with each other. See the NCCL doc
Q: Exitcode -9
A: This usually happens when you run out of system RAM.
Q: Exitcode -7 while using deepspeed
A: Try upgrading deepspeed w:
pip install -U deepspeed
Q: AttributeError: ‘DummyOptim’ object has no attribute ‘step’
Q: ModuleNotFoundError: No module named ‘mpi4py’ using single GPU with deepspeed
A: You may be using deepspeed with single gpu. Please remove the
deepspeed:
section in the yaml file or--deepspeed
CLI flag.
Q: The codes is stuck on saving preprocessed datasets.
A: This is usually an issue with the GPU. This can be resolved through setting the os environment variable
CUDA_VISIBLE_DEVICES=0
. If you are on runpod, this is usually a pod issue. Starting a new pod should take care of it.
Chat templates
Q: jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content' / 'role' / ____
A: This means that the property mapping for the stated attribute does not exist when building
chat_template
prompt. For example, ifno attribute 'content'
, please check you have added the correct mapping forcontent
undermessage_property_mappings
.
Q: Empty template generated for turn ___
A: The
content
is empty for that turn.
Q: Could not find content start/end boundary for turn __
A: The specific turn’s start/end could not be detected. Please ensure you have set the
eos_token
following yourchat_template
. Otherwise, this could be achat_template
which doesn’t use proper boundaries for each turn (like system). On the rare occurrence, make sure your content is not[[dummy_message]]
. Please let us know about this.
Q: Content end boundary is before start boundary for turn ___
A: This is an edge case which should not occur. Please create an Issue if this happens.
Q: Content end boundary is the same as start boundary for turn ___. This is likely an empty turn.
A: This is likely an empty turn.
Q: The EOS/EOT token is incorrectly being masked or not being masked.
A: This is because of the mismatch between
tokenizer.eos_token
and EOS/EOT token in template. Please make sure to seteos_token
underspecial_tokens
to the same EOS/EOT token as in template.