FAQ

Frequently asked questions

General

Q: The trainer stopped and hasn’t progressed in several minutes.

A: This is usually an issue with the GPUs failing to communicate with each other. See the NCCL doc.

Q: Exitcode -9

A: This usually happens when you run out of system RAM.
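If you need to lower host-RAM usage, a few standard axolotl config knobs are a reasonable starting point. A minimal sketch; the values are illustrative, not recommendations:

```yaml
# Illustrative values -- tune for your machine.
dataset_processes: 2            # fewer preprocessing workers, lower peak RAM
micro_batch_size: 1             # smaller per-device batches
gradient_accumulation_steps: 8  # preserve the effective batch size
```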

Q: Exitcode -7 while using deepspeed

A: Try upgrading deepspeed with: pip install -U deepspeed

Q: AttributeError: ‘DummyOptim’ object has no attribute ‘step’

A: This typically also comes from running deepspeed on a single GPU; see the next question.

Q: ModuleNotFoundError: No module named ‘mpi4py’ using single GPU with deepspeed

A: You may be using deepspeed with a single GPU. Remove the deepspeed: section from the yaml file or drop the --deepspeed CLI flag.
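Concretely, the fix is deleting or commenting out the deepspeed entry in your config. A minimal sketch, assuming the stock ZeRO-2 config that ships with axolotl:

```yaml
# Single-GPU run: remove or comment out the deepspeed entry.
# deepspeed: deepspeed_configs/zero2.json
```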

Q: The code is stuck on saving preprocessed datasets.

A: This is usually an issue with the GPU. It can often be resolved by setting the environment variable CUDA_VISIBLE_DEVICES=0 before launching. If you are on RunPod, this is usually a pod issue; starting a new pod should take care of it.

Chat templates

Q: jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content' / 'role' / ____

A: This means the named attribute is missing when the chat_template prompt is built. For example, if there is no attribute 'content', check that you have added the correct mapping for content under message_property_mappings.
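For example, with ShareGPT-style data where each message looks like {"from": ..., "value": ...}, the mapping would be as below. A sketch; the dataset path and field names are placeholders for your own data:

```yaml
datasets:
  - path: your/dataset            # placeholder
    type: chat_template
    field_messages: conversations # key holding the list of turns
    message_property_mappings:
      role: from                  # dataset key with the speaker role
      content: value              # dataset key with the message text
```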

Q: Empty template generated for turn ___

A: The content is empty for that turn.

Q: Could not find content start/end boundary for turn __

A: The start/end of that specific turn could not be detected. Ensure you have set the eos_token to match your chat_template. Otherwise, the chat_template may not use proper boundaries for each turn (such as the system turn). In rare cases, check that your content is not [[dummy_message]]; if it is, please let us know by opening an issue.

Q: Content end boundary is before start boundary for turn ___

A: This is an edge case that should not occur. Please open an issue if it happens.

Q: Content end boundary is the same as start boundary for turn ___. This is likely an empty turn.

A: As the message suggests, the turn is likely empty. Check the dataset for turns with blank content.

Q: The EOS/EOT token is incorrectly being masked or not being masked.

A: This is caused by a mismatch between tokenizer.eos_token and the EOS/EOT token in the template. Make sure to set eos_token under special_tokens to the same EOS/EOT token used in the template.
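For example, assuming a ChatML-style template where each turn ends with <|im_end|>, the relevant config would look like this. A sketch; use whatever token your template actually emits:

```yaml
chat_template: chatml
special_tokens:
  eos_token: <|im_end|>
```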