Tokenizer mismatch all the time
#47
by
tian9
- opened
Is it related?
The original Llama 3 8B (base) model has special token embedding weights that are all zero, which might cause NaN gradients during fine-tuning. This version re-initializes the weights of those special tokens to alleviate the problem.
https://huggingface.co/imone/Llama-3-8B-fixed-special-embedding
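For reference, here is a minimal sketch of that kind of fix, assuming a mean-based re-initialization of any all-zero embedding rows; the base model id and the exact init strategy are my assumptions, not necessarily what the linked checkpoint does:

```python
# Sketch: re-initialize zero-valued special-token embeddings with the mean of
# the existing (non-zero) embedding rows. Assumptions: base model id and
# mean-init strategy.
import torch
from transformers import AutoModelForCausalLM

model_name = "meta-llama/Meta-Llama-3-8B"  # assumed base model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

embeddings = model.get_input_embeddings().weight  # (vocab_size, hidden_size)

with torch.no_grad():
    # Rows that are exactly zero are the uninitialized special tokens.
    zero_rows = embeddings.abs().sum(dim=-1) == 0
    if zero_rows.any():
        embeddings[zero_rows] = embeddings[~zero_rows].mean(dim=0)
        # If the LM head is not tied to the input embeddings, fix it as well.
        lm_head = model.get_output_embeddings()
        if lm_head is not None and lm_head.weight is not embeddings:
            lm_head.weight[zero_rows] = lm_head.weight[~zero_rows].mean(dim=0)
```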
I have no idea what your tokenization mismatch is, but make sure that the tokenizer you are using is of the PreTrainedTokenizerFast class, not LlamaTokenizerFast.
It should be completely possible otherwise!
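A quick way to check which class you actually get (a minimal sketch; the model id is an assumption):

```python
# Sketch: confirm the tokenizer loads as plain PreTrainedTokenizerFast.
from transformers import AutoTokenizer, PreTrainedTokenizerFast

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B", use_fast=True)
print(type(tokenizer).__name__)  # expect "PreTrainedTokenizerFast" for Llama 3

assert type(tokenizer) is PreTrainedTokenizerFast, (
    f"Got {type(tokenizer).__name__}; for Llama 3 the tokenizer should load "
    "as plain PreTrainedTokenizerFast, not LlamaTokenizerFast."
)
```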
