utter-project / mHuBERT-147

huggingface.co
Total runs: 51.2K
24-hour runs: 0
7-day runs: 2.3K
30-day runs: 5.3K
Model's Last Updated: December 20 2024
feature-extraction

Model Details of mHuBERT-147

This repository contains the best mHuBERT-147 pre-trained model.

MODEL DETAILS: 3rd iteration, K=1000, HuBERT base architecture (95M parameters), 147 languages.

Table of Contents:

  1. Summary
  2. Training Data and Code
  3. ML-SUPERB Scores
  4. Languages and Datasets
  5. Citing and Funding Information

mHuBERT-147 models

The mHuBERT-147 models are compact and competitive multilingual HuBERT models trained on 90K hours of open-license data in 147 languages. Unlike traditional HuBERT models, they are trained on discrete speech units obtained with faiss IVF indices. Training also uses two-level up-sampling, first over languages and then over data sources. See our paper for more information.
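To make the two-level up-sampling idea more concrete, here is a minimal sketch. It assumes a common temperature-style rule in which each item is sampled with probability proportional to its hours raised to an exponent below 1, applied first across languages and then across data sources within a language. The exact rule and exponents used for mHuBERT-147 are described in the paper, not on this page, so the function names, the exponent values, and the toy corpus below are illustrative assumptions only.

```python
import random


def upsampling_probs(hours, alpha):
    """Return probabilities proportional to hours ** alpha.

    alpha = 1.0 keeps the natural proportions; alpha < 1.0 boosts small items.
    """
    weights = {name: h ** alpha for name, h in hours.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def two_level_probs(corpus, alpha_lang=0.7, alpha_source=0.9):
    """Compute a joint probability for every (language, source) pair.

    corpus maps language -> {source: hours}. The default exponents are
    placeholders, not the values used to train mHuBERT-147.
    """
    lang_hours = {lang: sum(srcs.values()) for lang, srcs in corpus.items()}
    p_lang = upsampling_probs(lang_hours, alpha_lang)  # level 1: languages
    joint = {}
    for lang, srcs in corpus.items():
        p_src = upsampling_probs(srcs, alpha_source)  # level 2: sources
        for src, p in p_src.items():
            joint[(lang, src)] = p_lang[lang] * p
    return joint


def sample_pair(joint, rng):
    """Draw one (language, source) pair according to the joint probabilities."""
    pairs = list(joint)
    return rng.choices(pairs, weights=[joint[p] for p in pairs], k=1)[0]


# Hypothetical toy corpus (hours per source); not the real training data.
TOY_CORPUS = {
    "eng": {"source_a": 900.0, "source_b": 100.0},
    "haw": {"source_a": 10.0},
}

if __name__ == "__main__":
    joint = two_level_probs(TOY_CORPUS)
    for pair, prob in sorted(joint.items()):
        print(pair, round(prob, 4))
    print(sample_pair(joint, random.Random(0)))
```

With exponents below 1, the low-resource language in the toy corpus is drawn more often than its raw share of hours would suggest, which is the intended effect of up-sampling.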

This repository contains:

  • Fairseq checkpoint (original);
  • HuggingFace checkpoint (conversion using transformers library);
  • Faiss index for continuous pre-training (OPQ16_64,IVF1000_HNSW32,PQ16x4fsr).
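The faiss index listed above is what maps continuous speech features to discrete units: in faiss index-factory notation, `IVF1000` denotes an inverted file with 1000 centroids, which matches the K=1000 stated in the model details. The sketch below illustrates the underlying idea with an exact nearest-centroid search in pure Python. It is a simplification under stated assumptions: the real index performs approximate search over compressed vectors, and the centroids, frames, and function names here are made up for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_units(frames, centroids):
    """Map each feature frame to the index of its nearest centroid.

    The returned indices play the role of discrete speech units, i.e. the
    targets a HuBERT-style model learns to predict for masked frames.
    """
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(frame, centroids[k]))
        for frame in frames
    ]


# Hypothetical 2-D centroids and frames; real features are high-dimensional
# and the real codebook has 1000 entries.
CENTROIDS = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
FRAMES = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.3], [1.1, 0.8]]

units = assign_units(FRAMES, CENTROIDS)
print(units)  # [0, 1, 2, 1]
```

For continued pre-training with the released index, the faiss library itself would be used to load the index and search it; the brute-force loop above only shows what the resulting unit IDs represent.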

ML-SUPERB Scores

mHuBERT-147 ranks second on the ML-SUPERB 10min leaderboard and first on the 1h leaderboard. It also achieves new SOTA scores for three LID tasks. See our paper for more information.

Languages and Datasets

Datasets: For ASR/ST/TTS datasets, only the train set is used.

Languages present in the training data but not indexed by Hugging Face: Asturian (ast), Basaa (bas), Cebuano (ceb), Central Kurdish/Sorani (ckb), Hakha Chin (cnh), Hawaiian (haw), Upper Sorbian (hsb), Kabyle (kab), Moksha (mdf), Meadow Mari (mhr), Hill Mari (mrj), Erzya (myv), Taiwanese Hokkien (nan-tw), Sursilvan (rm-sursilv), Vallader (rm-vallader), Sakha (sah), Santali (sat), Scots (sco), Saraiki (skr), Tigre (tig), Tok Pisin (tpi), Akuapem Twi (tw-akuapem), Asante Twi (tw-asante), Votic (vot), Waray (war), Cantonese (yue).

Citing and Funding Information

@inproceedings{boito2024mhubert,
author={Marcely Zanon Boito and Vivek Iyer and Nikolaos Lagos and Laurent Besacier and Ioan Calapodescu},
title={{mHuBERT-147: A Compact Multilingual HuBERT Model}},
year=2024,
booktitle={Interspeech 2024},
}
This is an output of the European Project UTTER (Unified Transcription and Translation for Extended Reality), funded by the European Union’s Horizon Europe Research and Innovation programme under grant agreement number 101070631.

For more information please visit https://he-utter.eu/

More Information About mHuBERT-147

License of mHuBERT-147 (CC BY-NC-SA 4.0):

https://choosealicense.com/licenses/cc-by-nc-sa-4.0

mHuBERT-147 model page on huggingface.co:

https://huggingface.co/utter-project/mHuBERT-147
