Tokenizer mismatch all the time
#47
by
tian9
- opened
Is it related?
The original Llama 3 8B (base) model has special token embedding weights that are all zero, which might cause NaN gradients during fine-tuning. This version re-initializes the weights of those special tokens to alleviate the problem.
https://huggingface.co/imone/Llama-3-8B-fixed-special-embedding
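For reference, here is a minimal sketch of that kind of fix, assuming a mean-based re-initialization of any all-zero embedding rows; the base model id and the exact init strategy are my assumptions, not necessarily what the linked checkpoint does:

```python
# Sketch: re-initialize zero-valued special-token embeddings with the mean of
# the existing (non-zero) embedding rows. Assumptions: base model id and
# mean-init strategy.
import torch
from transformers import AutoModelForCausalLM

model_name = "meta-llama/Meta-Llama-3-8B"  # assumed base model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

embeddings = model.get_input_embeddings().weight  # (vocab_size, hidden_size)

with torch.no_grad():
    # Rows that are exactly zero are the uninitialized special tokens.
    zero_rows = embeddings.abs().sum(dim=-1) == 0
    if zero_rows.any():
        embeddings[zero_rows] = embeddings[~zero_rows].mean(dim=0)
        # If the LM head is not tied to the input embeddings, fix it as well.
        lm_head = model.get_output_embeddings()
        if lm_head is not None and lm_head.weight is not embeddings:
            lm_head.weight[zero_rows] = lm_head.weight[~zero_rows].mean(dim=0)
```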
I have no idea what your tokenization mismatch is, but make sure that the tokenizer you are using is of the PreTrainedTokenizerFast class, not LlamaTokenizerFast.
It should be completely possible otherwise!
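A quick way to check which class you actually get (a minimal sketch; the model id is an assumption):

```python
# Sketch: confirm the tokenizer loads as plain PreTrainedTokenizerFast.
from transformers import AutoTokenizer, PreTrainedTokenizerFast

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B", use_fast=True)
print(type(tokenizer).__name__)  # expect "PreTrainedTokenizerFast" for Llama 3

assert type(tokenizer) is PreTrainedTokenizerFast, (
    f"Got {type(tokenizer).__name__}; for Llama 3 the tokenizer should load "
    "as plain PreTrainedTokenizerFast, not LlamaTokenizerFast."
)
```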
