SA-BERT-Classifier is a binary classifier that distinguishes between Saudi and non-Saudi Arabic dialects. Built on top of the SA-BERT-V1 embeddings, this model achieves high accuracy in identifying Saudi dialectal expressions across various domains and contexts.
Intended Use
This model is designed for:
Dialect identification in Arabic text
Content filtering for region-specific applications
Improving NLP pipelines for Saudi audience targeting
Research on dialectal variations in Arabic
Performance Metrics
The model achieves the following performance on our test set:
| Metric | Score |
|---|---|
| Accuracy | 0.9821 |
| Precision | 0.9745 |
| Recall | 0.9890 |
| F1 Score | 0.9817 |
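As a quick consistency check on the reported metrics, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two published values. The snippet below is only an illustrative sketch; the function name `f1_from` is hypothetical and not part of the model's code.

```python
def f1_from(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)


# Values copied from the reported metrics.
reported_precision = 0.9745
reported_recall = 0.9890

f1 = f1_from(reported_precision, reported_recall)
print(f"F1 recomputed: {f1:.4f}")  # prints "F1 recomputed: 0.9817"
```

The recomputed value matches the reported F1 score to four decimal places.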
Usage
Using the Hugging Face Transformers Pipeline
import os
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TextClassificationPipeline,
)

# Configuration
MODEL_ID = "Omartificial-Intelligence-Space/SA-BERT-Classifier"
HF_TOKEN = os.getenv("HUGGINGFACE_HUB_TOKEN", "<YOUR_TOKEN_HERE>")
DEVICE = 0 if torch.cuda.is_available() else -1  # GPU index 0, or -1 for CPU

# Load tokenizer & model
# (recent transformers releases deprecate use_auth_token in favor of token)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_auth_token=HF_TOKEN)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, use_auth_token=HF_TOKEN
).to("cuda" if DEVICE == 0 else "cpu")

# Build the pipeline
pipeline = TextClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    device=DEVICE,
    return_all_scores=True,  # return a score for every label, not only the top one
)

# Example
text = "السلام عليكم ورحمة الله كيف حالك اليوم؟"
results = pipeline(text)[0]  # per-label scores for the single input

# Format results (labels are expected to end in the class index, e.g. "LABEL_1")
scores = {int(item["label"].split("_")[-1]): item["score"] for item in results}
p_non_saudi = scores.get(0, 0.0)
p_saudi = scores.get(1, 0.0)
prediction = "Saudi" if p_saudi > p_non_saudi else "Non-Saudi"

print(f"Text: {text}")
print(f"P(Non-Saudi): {p_non_saudi:.4f}")
print(f"P(Saudi): {p_saudi:.4f}")
print(f"Prediction: {prediction}")
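When classifying many texts, the formatting step can be factored into a small helper. This is a sketch rather than part of the released model: the name `format_scores` is hypothetical, and it assumes, as the usage code does, that each label ends in the class index (for example `LABEL_0` and `LABEL_1`) with index 1 meaning Saudi. The sample input is made up to mimic the pipeline's per-label output, so the snippet runs without downloading the model.

```python
from typing import Dict, List, Union

Score = Dict[str, Union[str, float]]


def format_scores(results: List[Score]) -> Dict[str, Union[str, float]]:
    """Turn per-label pipeline scores into probabilities and a prediction.

    Assumes labels end in the class index (0 = non-Saudi, 1 = Saudi).
    """
    scores = {
        int(str(item["label"]).split("_")[-1]): float(item["score"])
        for item in results
    }
    p_non_saudi = scores.get(0, 0.0)
    p_saudi = scores.get(1, 0.0)
    return {
        "p_non_saudi": p_non_saudi,
        "p_saudi": p_saudi,
        "prediction": "Saudi" if p_saudi > p_non_saudi else "Non-Saudi",
    }


# Made-up scores in the shape the pipeline returns for one input.
mock_results = [
    {"label": "LABEL_0", "score": 0.0002},
    {"label": "LABEL_1", "score": 0.9998},
]
formatted = format_scores(mock_results)
print(formatted["prediction"])  # prints "Saudi"
```

With a real pipeline, the same helper could be applied to each element of `pipeline(list_of_texts)`.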
Training Parameters
Embedding model: Omartificial-Intelligence-Space/SA-BERT-V1
Max sequence length: 256
Classifier: Logistic Regression with balanced class weights
Training split: 80% train, 20% test (stratified)
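To make "stratified" and "balanced class weights" concrete, here is a minimal pure-Python sketch on made-up labels. It is not the authors' training code: the helper names and the toy 70/30 label distribution are hypothetical. The weight formula, `n_samples / (n_classes * class_count)`, is the one scikit-learn documents for `class_weight="balanced"`; whether scikit-learn was actually used to train this classifier is an assumption.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting every class in the same proportion."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy, made-up labels: 70 non-Saudi (0) and 30 Saudi (1).
labels = [0] * 70 + [1] * 30
train_idx, test_idx = stratified_split(labels)
weights = balanced_class_weights([labels[i] for i in train_idx])

print(len(train_idx), len(test_idx))  # prints "80 20"
print(weights)  # the rarer class (1) receives the larger weight
```

Because the split is done per class, the test set keeps the same 70/30 class ratio as the full data, and the balanced weights compensate for the rarer class during training.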
Example Results
Here are some example predictions from our test set:
| Sample Text | P(Non-Saudi) | P(Saudi) | Predicted |
|---|---|---|---|
| الإسلام دين رحمة وتسامح، مو تعصب ولا قسوة. | 0.0000 | 1.0000 | Saudi |
| مهرجان الملك عبدالعزيز للإبل له قيمة ثقافية واقتصادية كبيرة. | 0.0000 | 1.0000 | Saudi |
| قبل تبدأ بأي بزنس، لازم تسوي دراسة جدوى كويسة. | 0.0000 | 1.0000 | Saudi |
| هل الطريق إلى المدينة الأخرى سالك؟ وهل توجد تحويلات؟ | 0.9998 | 0.0002 | Non-Saudi |
| هل المطعم مفتوح الآن لتناول الغداء؟ وكم وقت الانتظار تقريباً؟ | 0.9999 | 0.0001 | Non-Saudi |
| تحب سياحة البر؟ عندك أماكن كثيرة بالجنوب والوسط. | 0.9993 | 0.0007 | Non-Saudi |
| صبحك الله بالخير والعافية يالغالي، عسى يومك كله خير وسعادة. | 0.0000 | 1.0000 | Saudi |
Analysis
The classifier demonstrates several noteworthy characteristics:
High-confidence predictions: The model often predicts with very high confidence (near 0.0 or 1.0)
Dialectal markers: Expressions like "مو" (not), "وش" (what), "عشان" (because) are strong Saudi dialect indicators
MSA (Modern Standard Arabic) sensitivity: Formal, MSA-heavy sentences tend to be classified as non-Saudi, regardless of content
Lexical features: Saudi-specific vocabulary (e.g., references to places like "جازان", "العلا") increases Saudi classification probability
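When interpreting a prediction, it can help to see which of the dialectal markers quoted above occur in the input. The sketch below is an inspection aid only; it does not reproduce how the classifier reaches its decision. The name `find_markers` and the marker set are hypothetical and limited to the three expressions listed in this analysis.

```python
import re

# Markers quoted in the analysis; illustrative, not exhaustive.
SAUDI_MARKERS = {"مو", "وش", "عشان"}


def find_markers(text: str) -> list:
    """Return the listed markers that occur as whole tokens in `text`."""
    # \w+ is Unicode-aware for str patterns, so Arabic punctuation
    # such as "،" and "؟" is dropped from the tokens.
    tokens = re.findall(r"\w+", text)
    return [tok for tok in tokens if tok in SAUDI_MARKERS]


dialectal = "الإسلام دين رحمة وتسامح، مو تعصب ولا قسوة."
formal = "هل الطريق إلى المدينة الأخرى سالك؟ وهل توجد تحويلات؟"

print(find_markers(dialectal))  # one marker found: "مو"
print(find_markers(formal))     # prints "[]"
```

The two sample sentences come from the example results: the first was predicted Saudi and contains "مو", while the second is MSA-style and contains none of the listed markers.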
Limitations
The model may perform less effectively on mixed-dialect text or code-switching between MSA and dialect
Very short text with limited dialectal markers may yield less reliable results
Performance may vary for specialized domains not well-represented in the training data
The binary classification (Saudi/non-Saudi) does not distinguish between specific non-Saudi dialects
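One way to act on the first two limitations is to flag predictions for manual review when the input is very short or the probability is close to 0.5. This is a sketch under stated assumptions, not a recommendation from the model authors: the function name `needs_review` and the thresholds `min_tokens=4` and `margin=0.2` are hypothetical and would need tuning on real data.

```python
def needs_review(text: str, p_saudi: float,
                 min_tokens: int = 4, margin: float = 0.2) -> bool:
    """Flag a prediction that may be unreliable.

    Returns True when the text has fewer than `min_tokens`
    whitespace-separated tokens, or when `p_saudi` lies within
    `margin` of 0.5 (the model is undecided).
    """
    too_short = len(text.split()) < min_tokens
    uncertain = abs(p_saudi - 0.5) < margin
    return too_short or uncertain


print(needs_review("وش", 0.99))  # prints "True" (too short)
print(needs_review("قبل تبدأ بأي بزنس، لازم تسوي دراسة جدوى كويسة.", 1.0))  # prints "False"
```

Flagged inputs can then be routed to a human reviewer or excluded from downstream filtering.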
Citation
If you use this model in your research or applications, please cite the model repository, Omartificial-Intelligence-Space/SA-BERT-Classifier, on Hugging Face.