πŸ”οΈ MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-ber that translates from English β†’ Atlasic Tamazight (Tachelhit/Central Atlas Tamazight).


πŸ“˜ Model Overview

| Property | Description |
|---|---|
| Base Model | Helsinki-NLP/opus-mt-en-ber |
| Architecture | MarianMT |
| Languages | English β†’ Tamazight (Tachelhit / Central Atlas Tamazight) |
| Fine-tuning Dataset | 97K medium-quality synthetic sentence pairs generated by translating English corpora |
| Training Objective | Sequence-to-sequence translation fine-tuning |
| Framework | πŸ€— Transformers |
| Tokenizer | SentencePiece |
| Model Size | 62.6M parameters (F32, Safetensors) |
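
A minimal usage sketch, assuming the checkpoint is loaded from the repository id shown on this card (ilyasaqit/opus-mt-en-atlasic_tamazight-synth97k-nmv) with the standard MarianMT classes:

```python
# Minimal usage sketch; the repository id below is taken from this card's page.
from transformers import MarianMTModel, MarianTokenizer

model_id = "ilyasaqit/opus-mt-en-atlasic_tamazight-synth97k-nmv"
tokenizer = MarianTokenizer.from_pretrained(model_id)
model = MarianMTModel.from_pretrained(model_id)

def translate(text: str) -> str:
    # Tokenize, generate with the recommended decoding settings, and decode.
    batch = tokenizer([text], return_tensors="pt", truncation=True, max_length=128)
    generated = model.generate(
        **batch,
        num_beams=5,
        no_repeat_ngram_size=3,
        repetition_penalty=1.5,
        max_length=128,
    )
    return tokenizer.decode(generated[0], skip_special_tokens=True)

print(translate("I will go to school."))  # e.g. "Rad ftuΙ£ s tinml."
```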

🧠 Training Details

| Hyperparameter | Value |
|---|---|
| per_device_train_batch_size | 16 |
| per_device_eval_batch_size | 48 |
| learning_rate | 2e-5 |
| num_train_epochs | 5 |
| max_length | 128 |
| num_beams | 5 |
| eval_steps | 5000 |
| save_steps | 5000 |
| generation_no_repeat_ngram_size | 3 |
| generation_repetition_penalty | 1.5 |
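
For reference, a hedged sketch of how these hyperparameters map onto transformers.Seq2SeqTrainingArguments; the actual training script, dataset loading, and preprocessing are not shown here, and the output directory name is hypothetical:

```python
# Sketch only: reconstructs the hyperparameters above as Seq2SeqTrainingArguments.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="marian-en-tamazight",   # hypothetical output path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=48,
    learning_rate=2e-5,
    num_train_epochs=5,
    evaluation_strategy="steps",
    eval_steps=5000,
    save_strategy="steps",
    save_steps=5000,
    predict_with_generate=True,         # needed to compute BLEU during evaluation
    generation_max_length=128,
    generation_num_beams=5,
)
# no_repeat_ngram_size=3 and repetition_penalty=1.5 are not Seq2SeqTrainingArguments
# fields; they would typically be set on the model's generation_config instead.
```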

Training Environment:

  • 1 Γ— NVIDIA P100 (16 GB) on Kaggle
  • Total training time: 2 h 4 m 38 s

πŸ“ˆ Evaluation Results

| Step | Train Loss | Val Loss | BLEU |
|---|---|---|---|
| 5,000 | 0.453 | 0.4296 | 3.24 |
| 10,000 | 0.386 | 0.3777 | 4.97 |
| 15,000 | 0.357 | 0.3546 | 5.99 |
| 20,000 | 0.334 | 0.3419 | 6.60 |
| 25,000 | 0.326 | 0.3351 | 7.02 |
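
The metric implementation behind the BLEU column is not specified on this card; one common way to reproduce such a score is sacrebleu via the evaluate library, sketched below:

```python
# Hedged sketch: scoring model outputs against references with sacrebleu.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Rad ftuΙ£ s tinml."]    # model outputs
references = [["Rad ftuΙ£ s tinml."]]   # one list of reference translations per prediction
print(bleu.compute(predictions=predictions, references=references)["score"])  # 100.0 for an exact match
```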

πŸ’¬ Example Translations

| English | Atlasic Tamazight |
|---|---|
| I will go to school. | Rad ftuΙ£ s tinml. |
| What did you say? | Mad tnnit? |
| I'm not talking to you, I'm talking to him! | Ur dik a s ar sawalΙ£!!! |
| Everyone has a secret face. | Kraygatt yan ila yat tguri. |

Hugging Face Space:
πŸ‘‰ ilyasaqit/English-Tamazight-Translator


πŸͺΆ Notes

  • The dataset is synthetic, not manually verified.
  • The model performs best on short and simple general-domain sentences.
  • Recommended decoding parameters:
    • num_beams=5
    • repetition_penalty=1.2–1.5
    • no_repeat_ngram_size=3

πŸ“š Citation

If you use this model, please cite:

@misc{marian-en-tamazight-2025,
  title  = {MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas)},
  year   = {2025},
  url    = {https://huggingface.co/ilyasaqit/stage2_marian_opus_synth_model_kaggle2}
}