MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)
This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-ber that translates English into Atlasic Tamazight (Tachelhit / Central Atlas Tamazight).
Model Overview
| Property | Description |
|---|---|
| Base Model | Helsinki-NLP/opus-mt-en-ber |
| Architecture | MarianMT |
| Languages | English → Tamazight (Tachelhit / Central Atlas Tamazight) |
| Fine-tuning Dataset | 486K medium-quality synthetic sentence pairs, generated by translating English corpora with NLLB-200 |
| Training Objective | Sequence-to-sequence translation fine-tuning |
| Framework | 🤗 Transformers |
| Tokenizer | SentencePiece |
Training Details
| Hyperparameter | Value |
|---|---|
| per_device_train_batch_size | 16 |
| per_device_eval_batch_size | 48 |
| learning_rate | 2e-5 |
| num_train_epochs | 4 |
| max_length | 140 |
| num_beams | 6 |
| eval_steps | 10000 |
| save_steps | 10000 |
| generation_no_repeat_ngram_size | 3 |
| generation_repetition_penalty | 1.5 |
Training Environment:
- 1 × NVIDIA P100 (16 GB) on Kaggle
- Total training time: 7 h 3 m 15 s
Evaluation Results
⚠️ Note: The validation set is fully synthetic (generated with NLLB-200), so BLEU measures only similarity to NLLB-200 output, not agreement with human translations.
| Step | Train Loss | Val Loss | BLEU |
|---|---|---|---|
| 10000 | 0.2569 | 0.2449 | 27.47 |
| 20000 | 0.2067 | 0.2019 | 33.75 |
| 30000 | 0.1890 | 0.1813 | 37.79 |
| 40000 | 0.1758 | 0.1691 | 40.11 |
| 50000 | 0.1633 | 0.1604 | 42.00 |
| 60000 | 0.1596 | 0.1536 | 42.70 |
| 70000 | 0.1510 | 0.1491 | 44.42 |
| 80000 | 0.1445 | 0.1452 | 45.05 |
| 90000 | 0.1426 | 0.1425 | 46.04 |
| 100000 | 0.1385 | 0.1405 | 46.27 |
| 110000 | 0.1371 | 0.1392 | 46.70 |
| 120000 | 0.1369 | 0.1386 | 46.98 |
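The step counts in the table can be cross-checked against the training hyperparameters. The sketch below is only a sanity check under stated assumptions: that "486K" means exactly 486,000 pairs, that all of them were used for training (the card does not state the train/validation split), and that each step is one batch of 16 on a single GPU with no gradient accumulation.

```python
import math

# Values taken from the model card.
NUM_PAIRS = 486_000   # assumption: "486K" is exactly 486,000 training pairs
BATCH_SIZE = 16       # per_device_train_batch_size, single GPU
NUM_EPOCHS = 4        # num_train_epochs
EVAL_STEPS = 10_000   # eval_steps

# Assumption: one optimizer step per batch (no gradient accumulation).
steps_per_epoch = math.ceil(NUM_PAIRS / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS

# Steps at which an evaluation would be logged.
eval_points = list(range(EVAL_STEPS, total_steps + 1, EVAL_STEPS))

print(steps_per_epoch, total_steps, len(eval_points), eval_points[-1])
```

Under these assumptions training runs for 121,500 steps, which gives 12 evaluation points ending at step 120,000, matching the 12 rows of the table.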
Example Translations
| English | Atlasic Tamazight (Latin script) | Atlasic Tamazight (Tifinagh script) |
|---|---|---|
| I will go to school. | rad dduɣ s tinml. | ⵔⴰⴷ ⴷⴷⵓⵖ ⵙ ⵜⵉⵏⵎⵍ. |
| What did you say? | mayd tnnit? | ⵎⴰⵢⴷ ⵜⵏⵏⵉⵜ? |
| I want to know where Tom and Mary come from. | riɣ ad ssnɣ mani d idda ṭum d mari. | ⵔⵉⵖ ⴰⴷ ⵙⵙⵏⵖ ⵎⴰⵏⵉ ⴷ ⵉⴷⴷⴰ ⵟⵓⵎ ⴷ ⵎⴰⵔⵉ. |
| How many girls are there in this picture? | mnck n trbatin ayd illan g twlaft ad? | ⵎⵏⵛⴽ ⵏ ⵜⵔⴱⴰⵜⵉⵏ ⴰⵢⴷ ⵉⵍⵍⴰⵏ ⴳ ⵜⵡⵍⴰⴼⵜ ⴰⴷ? |
Hugging Face Space: ilyasaqit/English-Tamazight-Translator
Notes
- The dataset is synthetic, not manually verified.
- The model performs best on short and simple general-domain sentences.
- Recommended decoding parameters:
  - num_beams=6
  - repetition_penalty=1.2–1.5
  - no_repeat_ngram_size=3
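The recommended decoding parameters can be passed straight to `generate`. What follows is a minimal sketch rather than the author's own inference script: the model ID is taken from the citation URL, `repetition_penalty=1.2` is simply the lower end of the recommended range, `max_length=140` mirrors the training setting, and the helper name `translate` is hypothetical. The card does not say whether the input needs a target-language prefix token, so none is added here. It assumes `transformers`, `torch`, and `sentencepiece` are installed.

```python
MODEL_ID = "ilyasaqit/opus-mt-en-atlasic_tamazight-synth486k-nmv"

# Decoding settings recommended in the notes of this card.
DECODING_KWARGS = {
    "num_beams": 6,
    "repetition_penalty": 1.2,   # card recommends 1.2 to 1.5; 1.2 chosen here
    "no_repeat_ngram_size": 3,
    "max_length": 140,           # assumption: reuse the training max_length
}


def translate(sentences, model_id=MODEL_ID, **overrides):
    """Translate a list of English sentences (hypothetical helper)."""
    # Imported lazily so the settings above can be inspected without
    # having transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    # Pad to the longest sentence and truncate overly long inputs.
    batch = tokenizer(
        sentences,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=140,
    )
    generated = model.generate(**batch, **{**DECODING_KWARGS, **overrides})
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate(["What did you say?"])` downloads the model on first use and returns a list with one translated string; a keyword such as `repetition_penalty=1.5` overrides the default for that call.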
Citation
If you use this model, please cite:
@misc{marian-en-tamazight-2025,
title = {MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas)},
year = {2025},
url = {https://huggingface.co/ilyasaqit/opus-mt-en-atlasic_tamazight-synth486k-nmv}
}