mHuBERT-147 are compact and competitive multilingual HuBERT models trained on 90K hours of open-license data in 147 languages.
Unlike traditional HuBERT models, mHuBERT-147 models are trained using faiss IVF discrete speech units.
Training employs two-level up-sampling, over languages and over data sources. See our paper for more information.
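As a rough illustration of what "discrete speech units" means here: each frame-level feature vector is mapped to the ID of its nearest cluster centroid, and those IDs serve as training targets. The toy sketch below does this exhaustively in plain Python. It is not the project's code and does not use faiss; an IVF index performs a comparable assignment approximately and at scale. The centroids, frames, and function names are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_units(
    frames: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
) -> List[int]:
    """Map each frame to the index of its nearest centroid (its discrete unit)."""
    units = []
    for frame in frames:
        nearest = min(
            range(len(centroids)),
            key=lambda k: squared_distance(frame, centroids[k]),
        )
        units.append(nearest)
    return units


# Hypothetical 2-D toy data; real speech features have hundreds of dimensions.
centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.8], [1.1, 0.9]]

units = assign_units(frames, centroids)
print(units)  # [0, 1, 2, 1]
```

The exhaustive search above costs one distance computation per frame and centroid, which is the cost an inverted-file index is designed to reduce.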
This repository contains:
Fairseq checkpoint (original);
HuggingFace checkpoint (conversion using transformers library);
Faiss index for continuous pre-training (OPQ16_64,IVF1000_HNSW32,PQ16x4fsr).
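A minimal sketch of loading the HuggingFace checkpoint with the transformers library might look like the following. It assumes the checkpoint is published on the Hub under the ID `utter-project/mHuBERT-147` and that it loads as a standard `HubertModel`. The helper names `load_mhubert` and `encode` are hypothetical, and whether the input waveform needs normalization beforehand is not stated here, so check the repository's preprocessing configuration.

```python
# Assumed Hub repository ID; adjust it if the checkpoint lives elsewhere.
MODEL_ID = "utter-project/mHuBERT-147"


def load_mhubert(model_id: str = MODEL_ID):
    """Load the converted checkpoint as a HubertModel in evaluation mode."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import HubertModel

    model = HubertModel.from_pretrained(model_id)
    model.eval()
    return model


def encode(model, waveform):
    """Return frame-level hidden states for a (batch, samples) float tensor.

    The waveform is assumed to be mono audio sampled at 16 kHz.
    """
    import torch

    with torch.no_grad():
        return model(waveform).last_hidden_state
```

For example, `encode(load_mhubert(), torch.zeros(1, 16000))` would return a tensor of shape `(1, frames, hidden_size)` for one second of silence, given network access to download the weights.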
Manifest list available here.
Please note that there have been CommonVoice removal requests since training, so some of the listed files are no longer available.
The Fairseq fork contains the scripts for training with multilingual batching with two-level up-sampling.
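To give a feel for two-level up-sampling, the sketch below first smooths the sampling distribution over languages and then, within each language, over data sources. It uses exponent smoothing (probability proportional to size raised to a power `alpha`), which is a common choice but an assumption here. The exact formula and hyperparameters used for mHuBERT-147 are described in the paper and the fork. The hours, names, and `alpha` values are hypothetical.

```python
from typing import Dict, Tuple


def smoothed_probs(sizes: Dict[str, float], alpha: float) -> Dict[str, float]:
    """Sampling probabilities proportional to size ** alpha.

    alpha = 1 keeps the natural proportions; alpha closer to 0 moves
    towards uniform sampling, which up-samples small groups.
    """
    weights = {name: size ** alpha for name, size in sizes.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def two_level_probs(
    hours: Dict[str, Dict[str, float]],
    lang_alpha: float,
    source_alpha: float,
) -> Dict[Tuple[str, str], float]:
    """Probability of drawing from each (language, source) pair."""
    # Level 1: smooth over the total hours per language.
    lang_totals = {lang: sum(srcs.values()) for lang, srcs in hours.items()}
    lang_p = smoothed_probs(lang_totals, lang_alpha)

    # Level 2: smooth over the data sources inside each language.
    result = {}
    for lang, srcs in hours.items():
        for src, p in smoothed_probs(srcs, source_alpha).items():
            result[(lang, src)] = lang_p[lang] * p
    return result


# Hypothetical corpus: hours of audio per language and data source.
hours = {
    "en": {"source_a": 900.0, "source_b": 100.0},
    "xx": {"source_a": 10.0},
}

probs = two_level_probs(hours, lang_alpha=0.5, source_alpha=0.5)
for pair, p in sorted(probs.items()):
    print(pair, round(p, 4))
```

With these toy numbers the low-resource language "xx" holds about 1% of the hours but is sampled roughly 9% of the time, and the smaller English source rises from 10% to 25% of the English draws.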
mHuBERT-147 reaches second and first position in the ML-SUPERB 10min and 1h leaderboards, respectively. We achieve new SOTA scores for three LID tasks.
See our paper for more information.
Languages and Datasets
Datasets:
For ASR/ST/TTS datasets, only the train set is used.
This is an output of the European Project UTTER (Unified Transcription and Translation for Extended Reality), funded by the European Union’s Horizon Europe Research and Innovation programme under grant agreement number 101070631.