**Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.
Best-in-Class Multilingual Models
Introduced in our EMNLP 2025 paper *Voice of a Continent*, the **Simba Series** represents the current state-of-the-art for African speech AI.
- **Unified Suite:** Models optimized for African languages.
- **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
- **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
- **Inclusion-First:** Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.
The **Simba** family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.
🗣️✍️ Simba-ASR
The New Standard for African Speech-to-Text
🎯 Task: **Automatic Speech Recognition** — powering high-accuracy transcription across the continent.
🌍 Language Coverage (43 African languages)
Amharic (`amh`), Arabic (`ara`), Asante Twi (`asanti`), Bambara (`bam`), Baoulé (`bau`), Bemba (`bem`), Ewe (`ewe`), Fanti (`fat`), Fon (`fon`), French (`fra`), Ganda (`lug`), Hausa (`hau`), Igbo (`ibo`), Kabiye (`kab`), Kinyarwanda (`kin`), Kongo (`kon`), Lingala (`lin`), Luba-Katanga (`lub`), Luo (`luo`), Malagasy (`mlg`), Mossi (`mos`), Northern Sotho (`nso`), Nyanja (`nya`), Oromo (`orm`), Portuguese (`por`), Shona (`sna`), Somali (`som`), Southern Sotho (`sot`), Swahili (`swa`), Swati (`ssw`), Tigrinya (`tir`), Tsonga (`tso`), Tswana (`tsn`), Twi (`twi`), Umbundu (`umb`), Venda (`ven`), Wolof (`wol`), Xhosa (`xho`), Yoruba (`yor`), Zulu (`zul`), Tamazight (`tzm`), Sango (`sag`), Dinka (`din`).
**Simba-S** emerged as the best-performing ASR model overall.
🧩 Usage Example
You can easily run inference using the Hugging Face `transformers` library.
```python
from transformers import pipeline

# Load Simba-S for ASR.
# Available Simba models: `UBC-NLP/Simba-S`, `UBC-NLP/Simba-W`,
# `UBC-NLP/Simba-X`, `UBC-NLP/Simba-H`, `UBC-NLP/Simba-M`
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model="UBC-NLP/Simba-S",
)

# Load the multilingual African adapter (only for `UBC-NLP/Simba-M`)
asr_pipeline.model.load_adapter("multilingual_african")

# Transcribe audio from a file or URL
result = asr_pipeline("https://africa.dlnlp.ai/simba/audio/afr_Lwazi_afr_test_idx3889.wav")
print(result["text"])

# Transcribe audio from an in-memory audio array (16 kHz mono)
result = asr_pipeline({
    "array": audio_array,
    "sampling_rate": 16_000,
})
print(result["text"])
```
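The `audio_array` in the snippet above is assumed to be a one-dimensional float array sampled at 16 kHz; in practice you would load it from a file with a library such as `soundfile` or `librosa` (resampling to 16 kHz if needed). As a self-contained sketch, the expected shape and dtype can be illustrated with a synthetic NumPy signal (a tone, not real speech):

```python
import numpy as np

# Stand-in for real speech: one second of a 440 Hz tone at 16 kHz.
# The ASR pipeline's dict input expects a 1-D float array plus its sampling rate.
sampling_rate = 16_000
t = np.linspace(0.0, 1.0, sampling_rate, endpoint=False)
audio_array = (0.1 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

print(audio_array.shape, audio_array.dtype)
```

With real audio, replace the synthesis step with your loader of choice, keeping the array mono and the sampling rate at 16 kHz.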
Example Outputs
Using the same audio file with different Simba models:
```python
# Simba-S
{'text': 'watter verontwaardiging sou daar, in ons binneste gewees het.'}
# Simba-W
{'text': 'watter veronwaardigingsel daar, in ons binneste gewees het.'}
# Simba-X
{'text': 'fator fr on ar taamsodr is'}
# Simba-M
{'text': 'watter veronwaardiging sodaar in ons binniste gewees het'}
# Simba-H
{'text': 'watter vironwaardiging so daar in ons binneste geweeshet'}
```
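Differences between such hypotheses are usually quantified with word error rate (WER). A minimal sketch, assuming plain whitespace tokenization and, purely for illustration, treating the Simba-S output as the reference transcript:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance (whitespace tokens)."""
    ref, hyp = reference.split(), hypothesis.split()
    dp = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, or substitution/match
            cur = min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
            prev, dp[j] = dp[j], cur
    return dp[-1] / max(len(ref), 1)

reference = "watter verontwaardiging sou daar, in ons binneste gewees het."
print(f"Simba-M WER: {wer(reference, 'watter veronwaardiging sodaar in ons binniste gewees het'):.2f}")
```

Note that punctuation counts as part of a word under this simple tokenization; production evaluations typically normalize text first.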
Get started with Simba models in minutes using our interactive Colab notebook:
Citation
If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find the resources on this website useful, please cite our paper.
```bibtex
@inproceedings{elmadany-etal-2025-voice,
    title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
    author = "Elmadany, AbdelRahim A. and
      Kwon, Sang Yun and
      Toyin, Hawau Olamide and
      Alcoba Inciarte, Alcides and
      Aldarmaki, Hanan and
      Abdul-Mageed, Muhammad",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.559/",
    doi = "10.18653/v1/2025.emnlp-main.559",
    pages = "11039--11061",
    ISBN = "979-8-89176-332-6",
}
```