TookaBERT is a family of encoder models trained on Persian, available in two sizes: base and large. The models were pre-trained on over 500GB of Persian data covering a variety of topics such as news, blogs, forums, and books. They were pre-trained with the masked language modeling (MLM) objective using whole-word masking (WWM) at two context lengths.
How to use
You can use this model directly for masked language modeling with the code below.
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("PartAI/TookaBERT-Base")
model = AutoModelForMaskedLM.from_pretrained("PartAI/TookaBERT-Base")

# prepare input (Persian for: "The city of Berlin is located in <mask>.")
text = "شهر برلین در کشور <mask> واقع شده است."
encoded_input = tokenizer(text, return_tensors="pt")

# forward pass
output = model(**encoded_input)
```
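To inspect the result, you can decode the top-scoring tokens at the masked position. This is a minimal post-processing sketch (not part of the original card); it assumes a single `<mask>` token in the input:

```python
import torch

# find the position of the <mask> token in the input
mask_index = (encoded_input["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

# take the five highest-scoring vocabulary ids at that position and decode them
top_ids = torch.topk(output.logits[0, mask_index], k=5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```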
It is also possible to use an inference pipeline, as shown below.
```python
from transformers import pipeline

inference_pipeline = pipeline("fill-mask", model="PartAI/TookaBERT-Base")
inference_pipeline("شهر برلین در کشور <mask> واقع شده است.")
```
You can also fine-tune this model on your own dataset to adapt it to your task; example tasks include the following, and a minimal fine-tuning sketch is shown after this list.

- DeepSentiPers (Sentiment Analysis)
- ParsiNLU - Multiple-choice (Multiple-choice)
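As a starting point, here is a minimal fine-tuning sketch for a sentiment-analysis-style classification task using the Hugging Face `Trainer`. The dataset identifier, column names, label count, and hyperparameters below are placeholders, not the settings used by the model authors:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("PartAI/TookaBERT-Base")
model = AutoModelForSequenceClassification.from_pretrained(
    "PartAI/TookaBERT-Base", num_labels=3  # set num_labels to match your dataset
)

dataset = load_dataset("your_dataset")  # hypothetical identifier; use your own data

def tokenize(batch):
    # assumes the text column is named "text"; adjust for your dataset
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tookabert-finetuned", num_train_epochs=3),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```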
Evaluation
TookaBERT models were evaluated on a wide range of downstream NLP tasks, including sentiment analysis (SA), text classification, multiple-choice, question answering, and named entity recognition (NER).
Here are some key performance results:
| Model name | DeepSentiPers (f1/acc) | MultiCoNER-v2 (f1/acc) | PQuAD (best_exact/best_f1/HasAns_exact/HasAns_f1) | FarsTail (f1/acc) | ParsiNLU-Multiple-choice (f1/acc) | ParsiNLU-Reading-comprehension (exact/f1) | ParsiNLU-QQP (f1/acc) |
|---|---|---|---|---|---|---|---|
| TookaBERT-large | 85.66/85.78 | 69.69/94.07 | 75.56/88.06/70.24/87.83 | 89.71/89.72 | 36.13/35.97 | 33.6/60.5 | 82.72/82.63 |
| TookaBERT-base | 83.93/83.93 | 66.23/93.3 | 73.18/85.71/68.29/85.94 | 83.26/83.41 | 33.6/33.81 | 20.8/42.52 | 81.33/81.29 |
| Shiraz | 81.17/81.08 | 59.1/92.83 | 65.96/81.25/59.63/81.31 | 77.76/77.75 | 34.73/34.53 | 17.6/39.61 | 79.68/79.51 |
| ParsBERT | 80.22/80.23 | 64.91/93.23 | 71.41/84.21/66.29/84.57 | 80.89/80.94 | 35.34/35.25 | 20/39.58 | 80.15/80.07 |
| XLM-V-base | 83.43/83.36 | 58.83/92.23 | 73.26/85.69/68.21/85.56 | 81.1/81.2 | 35.28/35.25 | 8/26.66 | 80.1/79.96 |
| XLM-RoBERTa-base | 83.99/84.07 | 60.38/92.49 | 73.72/86.24/68.16/85.8 | 82.0/81.98 | 32.4/32.37 | 20.0/40.43 | 79.14/78.95 |
| FaBERT | 82.68/82.65 | 63.89/93.01 | 72.57/85.39/67.16/85.31 | 83.69/83.67 | 32.47/32.37 | 27.2/48.42 | 82.34/82.29 |
| mBERT | 78.57/78.66 | 60.31/92.54 | 71.79/84.68/65.89/83.99 | 82.69/82.82 | 33.41/33.09 | 27.2/42.18 | 79.19/79.29 |
| AriaBERT | 80.51/80.51 | 60.98/92.45 | 68.09/81.23/62.12/80.94 | 74.47/74.43 | 30.75/30.94 | 14.4/35.48 | 79.09/78.84 |
*Note: because of randomness in the fine-tuning process, results that differ by less than 1% are considered equivalent.
Contact us
If you have any questions regarding this model, you can reach us via the community tab of the model on Hugging Face.