πŸ”οΈ MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-ber that translates from English β†’ Atlasic Tamazight (Tachelhit/Central Atlas Tamazight).


πŸ“˜ Model Overview

| Property | Description |
|---|---|
| Base model | Helsinki-NLP/opus-mt-en-ber |
| Architecture | MarianMT |
| Languages | English β†’ Tamazight (Tachelhit / Central Atlas Tamazight) |
| Fine-tuning dataset | 486K medium-quality synthetic sentence pairs, generated by translating English corpora with NLLB-200 |
| Training objective | Sequence-to-sequence translation fine-tuning |
| Framework | πŸ€— Transformers |
| Tokenizer | SentencePiece |
| Parameters | 62.6M (F32) |
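A minimal usage sketch with the πŸ€— Transformers Marian classes. The model id matches this repository, and the decoding settings mirror the recommended parameters in the notes section; device placement and batching are left out for brevity:

```python
# Minimal inference sketch for this model (wiring is illustrative, not the
# author's published script).
MODEL_ID = "ilyasaqit/opus-mt-en-atlasic_tamazight-synth486k-nmv"

def translate(texts, model, tokenizer):
    """Translate a batch of English sentences with the recommended decoding settings."""
    batch = tokenizer(texts, return_tensors="pt", padding=True,
                      truncation=True, max_length=140)
    generated = model.generate(**batch,
                               num_beams=6,
                               no_repeat_ngram_size=3,
                               repetition_penalty=1.5,
                               max_length=140)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

if __name__ == "__main__":
    # Imports kept here so the helper above stays dependency-free.
    from transformers import MarianMTModel, MarianTokenizer
    tok = MarianTokenizer.from_pretrained(MODEL_ID)
    mdl = MarianMTModel.from_pretrained(MODEL_ID)
    print(translate(["I will go to school."], mdl, tok))
```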

🧠 Training Details

| Hyperparameter | Value |
|---|---|
| per_device_train_batch_size | 16 |
| per_device_eval_batch_size | 48 |
| learning_rate | 2e-5 |
| num_train_epochs | 4 |
| max_length | 140 |
| num_beams | 6 |
| eval_steps | 10000 |
| save_steps | 10000 |
| generation_no_repeat_ngram_size | 3 |
| generation_repetition_penalty | 1.5 |

Training Environment:
- 1 Γ— NVIDIA P100 (16 GB) on Kaggle
- Total training time: 7 h 3 m 15 s
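The hyperparameters above, collected into one place. Key names follow πŸ€— Transformers `Seq2SeqTrainingArguments` conventions; the author's exact training script is not published, so treat this as a sketch:

```python
# Hyperparameters from the table above, as a single config dict.
# Keys mirror Seq2SeqTrainingArguments naming; the exact trainer wiring
# used for this model is an assumption, not published code.
training_config = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 48,
    "learning_rate": 2e-5,
    "num_train_epochs": 4,
    "eval_steps": 10_000,
    "save_steps": 10_000,
    "generation_max_length": 140,
    "generation_num_beams": 6,
    "generation_no_repeat_ngram_size": 3,  # applied via generate() at eval time
    "generation_repetition_penalty": 1.5,  # applied via generate() at eval time
}
```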

πŸ“ˆ Evaluation Results

⚠️ Note: The validation set is fully synthetic (NLLB-200 output), so BLEU measures similarity to the synthetic references, not agreement with human translations.

| Step | Train Loss | Val Loss | BLEU |
|---|---|---|---|
| 10000 | 0.2569 | 0.2449 | 27.47 |
| 20000 | 0.2067 | 0.2019 | 33.75 |
| 30000 | 0.1890 | 0.1813 | 37.79 |
| 40000 | 0.1758 | 0.1691 | 40.11 |
| 50000 | 0.1633 | 0.1604 | 42.00 |
| 60000 | 0.1596 | 0.1536 | 42.70 |
| 70000 | 0.1510 | 0.1491 | 44.42 |
| 80000 | 0.1445 | 0.1452 | 45.05 |
| 90000 | 0.1426 | 0.1425 | 46.04 |
| 100000 | 0.1385 | 0.1405 | 46.27 |
| 110000 | 0.1371 | 0.1392 | 46.70 |
| 120000 | 0.1369 | 0.1386 | 46.98 |

πŸ’¬ Example Translations

| English | Atlasic Tamazight (Latin) | Atlasic Tamazight (Tifinagh) |
|---|---|---|
| I will go to school. | rad dduΙ£ s tinml. | β΅”β΄°β΄· β΄·β΄·β΅“β΅– β΅™ β΅œβ΅‰β΅β΅Žβ΅. |
| What did you say? | mayd tnnit? | ⡎ⴰ⡒ⴷ β΅œβ΅β΅β΅‰β΅œ? |
| I want to know where Tom and Mary come from. | riΙ£ ad ssnΙ£ mani d idda αΉ­um d mari. | ⡔⡉⡖ β΄°β΄· ⡙⡙⡏⡖ β΅Žβ΄°β΅β΅‰ β΄· ⡉ⴷⴷⴰ β΅Ÿβ΅“β΅Ž β΄· β΅Žβ΄°β΅”β΅‰. |
| How many girls are there in this picture? | mnck n trbatin ayd illan g twlaft ad? | β΅Žβ΅β΅›β΄½ ⡏ β΅œβ΅”β΄±β΄°β΅œβ΅‰β΅ β΄°β΅’β΄· ⡉⡍⡍ⴰ⡏ β΄³ ⡜⡑⡍ⴰⴼ⡜ β΄°β΄·? |

Hugging Face Space:
πŸ‘‰ ilyasaqit/English-Tamazight-Translator


πŸͺΆ Notes

  • The dataset is synthetic, not manually verified.
  • The model performs best on short and simple general-domain sentences.
  • Recommended decoding parameters:
    • num_beams=6
    • repetition_penalty=1.2–1.5
    • no_repeat_ngram_size=3

πŸ“š Citation

If you use this model, please cite:

@misc{marian-en-tamazight-2025,
  title  = {MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas)},
  year   = {2025},
  url    = {https://huggingface.co/ilyasaqit/opus-mt-en-atlasic_tamazight-synth486k-nmv}
}