πŸ”οΈ MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-ber that translates from English β†’ Atlasic Tamazight (Tachelhit/Central Atlas Tamazight).


πŸ“˜ Model Overview

| Property | Description |
|---|---|
| Base model | Helsinki-NLP/opus-mt-en-ber |
| Architecture | MarianMT |
| Languages | English β†’ Tamazight (Tachelhit / Central Atlas Tamazight) |
| Fine-tuning dataset | 486K medium-quality synthetic sentence pairs, generated by translating English corpora with NLLB-200 |
| Training objective | Sequence-to-sequence translation fine-tuning |
| Framework | πŸ€— Transformers |
| Tokenizer | SentencePiece |
| Parameters | 62.6M (F32) |
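A minimal usage sketch with the πŸ€— Transformers Marian classes. The model id matches this repository, and the decoding settings mirror the recommended parameters in the notes section; device placement and batching are left out for brevity:

```python
# Minimal inference sketch for this model (wiring is illustrative, not the
# author's published script).
MODEL_ID = "ilyasaqit/opus-mt-en-atlasic_tamazight-synth486k-nmv"

def translate(texts, model, tokenizer):
    """Translate a batch of English sentences with the recommended decoding settings."""
    batch = tokenizer(texts, return_tensors="pt", padding=True,
                      truncation=True, max_length=140)
    generated = model.generate(**batch,
                               num_beams=6,
                               no_repeat_ngram_size=3,
                               repetition_penalty=1.5,
                               max_length=140)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

if __name__ == "__main__":
    # Imports kept here so the helper above stays dependency-free.
    from transformers import MarianMTModel, MarianTokenizer
    tok = MarianTokenizer.from_pretrained(MODEL_ID)
    mdl = MarianMTModel.from_pretrained(MODEL_ID)
    print(translate(["I will go to school."], mdl, tok))
```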

🧠 Training Details

| Hyperparameter | Value |
|---|---|
| per_device_train_batch_size | 16 |
| per_device_eval_batch_size | 48 |
| learning_rate | 2e-5 |
| num_train_epochs | 4 |
| max_length | 140 |
| num_beams | 6 |
| eval_steps | 10000 |
| save_steps | 10000 |
| generation_no_repeat_ngram_size | 3 |
| generation_repetition_penalty | 1.5 |

Training Environment:
- 1 Γ— NVIDIA P100 (16 GB) on Kaggle
- Total training time: 7 h 3 m 15 s
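The hyperparameters above, collected into one place. Key names follow πŸ€— Transformers `Seq2SeqTrainingArguments` conventions; the author's exact training script is not published, so treat this as a sketch:

```python
# Hyperparameters from the table above, as a single config dict.
# Keys mirror Seq2SeqTrainingArguments naming; the exact trainer wiring
# used for this model is an assumption, not published code.
training_config = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 48,
    "learning_rate": 2e-5,
    "num_train_epochs": 4,
    "eval_steps": 10_000,
    "save_steps": 10_000,
    "generation_max_length": 140,
    "generation_num_beams": 6,
    "generation_no_repeat_ngram_size": 3,  # applied via generate() at eval time
    "generation_repetition_penalty": 1.5,  # applied via generate() at eval time
}
```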

πŸ“ˆ Evaluation Results

⚠️ Note: The validation set is fully synthetic (NLLB-200 output), so BLEU measures similarity to the synthetic references, not agreement with human translations.

| Step | Train Loss | Val Loss | BLEU |
|---|---|---|---|
| 10000 | 0.2569 | 0.2449 | 27.47 |
| 20000 | 0.2067 | 0.2019 | 33.75 |
| 30000 | 0.1890 | 0.1813 | 37.79 |
| 40000 | 0.1758 | 0.1691 | 40.11 |
| 50000 | 0.1633 | 0.1604 | 42.00 |
| 60000 | 0.1596 | 0.1536 | 42.70 |
| 70000 | 0.1510 | 0.1491 | 44.42 |
| 80000 | 0.1445 | 0.1452 | 45.05 |
| 90000 | 0.1426 | 0.1425 | 46.04 |
| 100000 | 0.1385 | 0.1405 | 46.27 |
| 110000 | 0.1371 | 0.1392 | 46.70 |
| 120000 | 0.1369 | 0.1386 | 46.98 |

πŸ’¬ Example Translations

| English | Atlasic Tamazight (Latin) | Atlasic Tamazight (Tifinagh) |
|---|---|---|
| I will go to school. | rad dduΙ£ s tinml. | β΅”β΄°β΄· β΄·β΄·β΅“β΅– β΅™ β΅œβ΅‰β΅β΅Žβ΅. |
| What did you say? | mayd tnnit? | ⡎ⴰ⡒ⴷ β΅œβ΅β΅β΅‰β΅œ? |
| I want to know where Tom and Mary come from. | riΙ£ ad ssnΙ£ mani d idda αΉ­um d mari. | ⡔⡉⡖ β΄°β΄· ⡙⡙⡏⡖ β΅Žβ΄°β΅β΅‰ β΄· ⡉ⴷⴷⴰ β΅Ÿβ΅“β΅Ž β΄· β΅Žβ΄°β΅”β΅‰. |
| How many girls are there in this picture? | mnck n trbatin ayd illan g twlaft ad? | β΅Žβ΅β΅›β΄½ ⡏ β΅œβ΅”β΄±β΄°β΅œβ΅‰β΅ β΄°β΅’β΄· ⡉⡍⡍ⴰ⡏ β΄³ ⡜⡑⡍ⴰⴼ⡜ β΄°β΄·? |

Hugging Face Space:
πŸ‘‰ ilyasaqit/English-Tamazight-Translator


πŸͺΆ Notes

  • The dataset is synthetic, not manually verified.
  • The model performs best on short and simple general-domain sentences.
  • Recommended decoding parameters:
    • num_beams=6
    • repetition_penalty=1.2–1.5
    • no_repeat_ngram_size=3

πŸ“š Citation

If you use this model, please cite:

@misc{marian-en-tamazight-2025,
  title  = {MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas)},
  year   = {2025},
  url    = {https://huggingface.co/ilyasaqit/opus-mt-en-atlasic_tamazight-synth486k-nmv}
}