SHAMI-MT: A Machine Translation Model from MSA to Syrian Dialect

Model Description

SHAMI-MT is a specialized machine translation model designed to translate from Modern Standard Arabic (MSA) to Syrian dialect. Built on the robust AraT5v2-base-1024 architecture, this model bridges the gap between formal Arabic and the rich dialectal variations of Syrian Arabic.

Model Details
  • Model Type : Sequence-to-Sequence Translation
  • Base Model : UBC-NLP/AraT5v2-base-1024
  • Language : Arabic (MSA → Syrian Dialect)
  • License : Apache 2.0
  • Library : Transformers
Dataset

The model was trained on the Nâbra dataset, a comprehensive corpus of Syrian Arabic dialects with morphological annotations.

Nâbra Dataset Details

Citation:

Nayouf, A., Hammouda, T., Jarrar, M., Zaraket, F., & Kurdy, M. B. (2023). Nâbra: Syrian Arabic dialects with morphological annotations. arXiv preprint arXiv:2310.17315.

Key Statistics:

  • Tokens : ~60,000
  • Dialects Covered : Multiple Syrian regional dialects including:
    • Aleppo
    • Damascus
    • Deir-ezzur
    • Hama
    • Homs
    • Huran
    • Latakia
    • Mardin
    • Raqqah
    • Suwayda

Data Sources:

  • Social media posts
  • Movie and TV series scripts
  • Song lyrics
  • Local proverbs
Training Details

The model was fine-tuned from the UBC-NLP/AraT5v2-base-1024 checkpoint with the following training metrics (an illustrative configuration sketch follows the list):

  • Total Training Steps : 10,384
  • Epochs : 22
  • Final Training Loss : 1.396
  • Final Evaluation Loss : 0.771
  • Learning Rate : Cosine schedule starting at 5e-5
  • Batch Size : 256
  • Total FLOPs : 1.58e+17
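
For reference, the reported hyperparameters roughly correspond to a Hugging Face Seq2SeqTrainingArguments configuration along the lines of the sketch below. This is a hypothetical reconstruction, not the authors' published training script; the output path, commented-out trainer wiring, and dataset objects are assumptions.

from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "UBC-NLP/AraT5v2-base-1024"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

args = Seq2SeqTrainingArguments(
    output_dir="shami-mt",            # assumed output path
    num_train_epochs=22,              # reported: 22 epochs
    per_device_train_batch_size=256,  # reported batch size
    learning_rate=5e-5,               # reported starting learning rate
    lr_scheduler_type="cosine",       # reported cosine schedule
    predict_with_generate=True,
)

# The MSA→Shami sentence pairs from Nâbra would be tokenized into
# train_ds / eval_ds (not shown) before training:
# trainer = Seq2SeqTrainer(
#     model=model,
#     args=args,
#     train_dataset=train_ds,
#     eval_dataset=eval_ds,
#     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
# )
# trainer.train()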
Training Progress

The model showed consistent improvement throughout training:

  • Initial loss: 12.93 → Final loss: 1.40
  • Evaluation loss steadily decreased from 1.44 to 0.77
  • Gradient norms remained stable throughout training
Usage
Installation
pip install transformers torch sentencepiece
Inference Code
from transformers import T5Tokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("Omartificial-Intelligence-Space/Shami-MT")
model = AutoModelForSeq2SeqLM.from_pretrained("Omartificial-Intelligence-Space/Shami-MT")

# Example usage
ar_prompt = "مرحبا بك هنا"  # MSA input
input_ids = tokenizer(ar_prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=128)  # allow up to 128 new tokens

print("Input (MSA):", ar_prompt)
print("Tokenized input:", tokenizer.tokenize(ar_prompt))
print("Output (Syrian Dialect):", tokenizer.decode(outputs[0], skip_special_tokens=True))
Generation Parameters

For optimal results, you can adjust generation parameters:

outputs = model.generate(
    input_ids,
    max_length=128,
    num_beams=4,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id
)
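
Note that combining num_beams with do_sample=True selects Transformers' beam-sample decoding, which injects randomness into beam search; dropping do_sample and temperature falls back to plain beam search, which is preferable when reproducible translations are needed.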
Evaluation Results
  • Test Set : 1,500 unseen sentences
  • Evaluation Method : GPT-4.1 as automated judge
  • Average Score : 4.01/5.0
  • Evaluation Criteria : Translation quality, dialectal accuracy, and semantic preservation

The model was evaluated using GPT-4.1 as an automated judge with the following structured prompt:

"You are a language evaluation assistant. Compare the predicted Shami sentence to the reference.
Please return a rating from 0 to 5 and a short comment.

MSA Input: [input sentence]
Model Prediction (Shami dialect): [model output]
Ground Truth (Shami dialect): [reference translation]

Respond in this format:
Score: <number from 0 to 5>
Comment: <brief explanation of the score>"
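
As an illustration of how such a judge loop could be scripted, the sketch below fills the template for each test sentence and parses the judge's reply. The ask_llm callable, the parsing regexes, and the overall structure are assumptions for illustration, not the authors' published evaluation code.

import re

# The structured prompt shown above, with format placeholders.
JUDGE_PROMPT = (
    "You are a language evaluation assistant. Compare the predicted Shami "
    "sentence to the reference.\n"
    "Please return a rating from 0 to 5 and a short comment.\n\n"
    "MSA Input: {msa}\n"
    "Model Prediction (Shami dialect): {pred}\n"
    "Ground Truth (Shami dialect): {ref}\n\n"
    "Respond in this format:\n"
    "Score: <number from 0 to 5>\n"
    "Comment: <brief explanation of the score>"
)

def parse_judgment(reply):
    """Extract the numeric score and the comment from a judge reply."""
    score = re.search(r"Score:\s*([0-5](?:\.\d+)?)", reply)
    comment = re.search(r"Comment:\s*(.+)", reply, re.DOTALL)
    return (float(score.group(1)) if score else None,
            comment.group(1).strip() if comment else "")

def judge_translation(msa, pred, ref, ask_llm):
    """ask_llm is any callable that sends a prompt to the judge model
    (e.g. GPT-4.1 behind an API client) and returns its text reply."""
    reply = ask_llm(JUDGE_PROMPT.format(msa=msa, pred=pred, ref=ref))
    return parse_judgment(reply)

Averaging the parsed scores over the 1,500-sentence test set yields the reported 4.01/5.0 average.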

Score Distribution Analysis:

  • Excellent (5.0) : High-quality translations with perfect dialectal conversion
  • Good (4.0-4.9) : Minor dialectal variations or stylistic differences
  • Average (3.0-3.9) : Acceptable translations with some dialectal inconsistencies
  • Below Average (2.0-2.9) : Noticeable errors in dialect or meaning
  • Poor (0-1.9) : Significant translation errors or loss of meaning
Performance Highlights
  • Strong Dialectal Conversion : Successfully transforms MSA into authentic Syrian dialect
  • Semantic Preservation : Maintains original meaning while adapting linguistic style
  • Regional Adaptability : Handles various Syrian sub-dialects effectively
  • Consistent Quality : Stable performance across different text types and domains
Applications

This model is particularly useful for:

  • Content Localization : Adapting MSA content for Syrian audiences
  • Cultural Preservation : Maintaining and promoting Syrian dialectal variations
  • Educational Tools : Teaching differences between MSA and Syrian dialect
  • Research : Syrian Arabic NLP and dialectology studies
Regional Coverage

The model handles multiple Syrian sub-dialects, making it versatile for different regions within Syria:

🏛️ Urban Centers : Damascus, Aleppo
🏔️ Northern Regions : Latakia, Mardin
🏜️ Eastern Areas : Deir-ezzur, Raqqah
🌄 Central/Southern : Hama, Homs, Huran, Suwayda

Limitations
  • Trained specifically on Syrian dialect variations
  • Performance may vary for other Arabic dialects
  • Limited to text-based translation (no speech support)
  • Dataset size constraints may affect handling of very rare dialectal expressions
Citation

If you use this model in your research, please cite:

@misc{shami-mt-2024,
  title={SHAMI-MT: A Machine Translation Model From MSA to Syrian Dialect},
  author={Omartificial Intelligence Space},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Omartificial-Intelligence-Space/Shami-MT}
}

@article{nayouf2023nabra,
  title={Nâbra: Syrian Arabic dialects with morphological annotations},
  author={Nayouf, Amal and Hammouda, Tymaa Hasanain and Jarrar, Mustafa and Zaraket, Fadi A and Kurdy, Mohamad-Bassam},
  journal={arXiv preprint arXiv:2310.17315},
  year={2023}
}

@misc{onajar2025shamiMT,
  title={Shami-MT-2MSA: A Machine Translation from Syrian Dialect to MSA},
  author={Sibaee, Serry and Nacar, Omer},
  year={2025}
}
Contact & Support

For questions, issues, or contributions, please visit the model repository or contact the development team.
