amphion / Metis

huggingface.co
Total runs: 44
24-hour runs: 0
7-day runs: 3
30-day runs: 25
Model's Last Updated: April 13 2025
text-to-speech


Metis: A Foundation Speech Generation Model with Masked Generative Pre-training


Overview

We introduce Metis, a foundation model for unified speech generation. Unlike previous task-specific or multi-task models, Metis follows a pre-training and fine-tuning paradigm: it is pre-trained on large-scale unlabeled speech data using masked generative modeling and then fine-tuned to adapt to diverse speech generation tasks. Specifically, (1) Metis uses two discrete speech representations: SSL tokens derived from speech self-supervised learning (SSL) features, and acoustic tokens directly quantized from waveforms. (2) Metis performs masked generative pre-training on SSL tokens, using 300K hours of diverse speech data, without any additional condition. (3) Through fine-tuning with task-specific conditions, Metis adapts efficiently to various speech generation tasks while supporting multimodal input, even with limited data and trainable parameters. Experiments demonstrate that Metis can serve as a foundation model for unified speech generation: it outperforms state-of-the-art task-specific or multi-task systems across five speech generation tasks (zero-shot text-to-speech, voice conversion, target speaker extraction, speech enhancement, and lip-to-speech), even with fewer than 20M trainable parameters or 300 times less training data. Audio samples are available on the demo page.
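To make the masking idea in point (2) concrete, here is a minimal sketch of the data-corruption step in masked generative modeling: a random subset of a discrete token sequence is replaced by a mask symbol, and those positions are what a model would be trained to predict. This is an illustration only, not the Metis training code; the `MASK_ID` sentinel, the `mask_tokens` helper, and the fixed mask ratio are all assumptions made for the example.

```python
import math
import random

MASK_ID = -1  # hypothetical sentinel marking a masked position


def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of tokens with MASK_ID.

    Returns the corrupted sequence and the sorted masked positions,
    i.e. the positions a model would be asked to reconstruct.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    if not 0.0 < mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in (0, 1]")
    # Always mask at least one position so there is something to predict.
    n_mask = max(1, math.ceil(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for p in positions:
        corrupted[p] = MASK_ID
    return corrupted, positions


if __name__ == "__main__":
    rng = random.Random(0)
    ssl_tokens = [5, 6, 7, 8, 9, 10, 11, 12]  # made-up token ids
    corrupted, positions = mask_tokens(ssl_tokens, 0.5, rng)
    print(corrupted, positions)
```

In practice the mask ratio is usually sampled per training example from a schedule rather than fixed; which schedule Metis uses is not stated in this overview.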

Model Introduction

Metis is fully compatible with MaskGCT and shares several key model components with it. These shared components are:

| Model Name | Description |
| --- | --- |
| Semantic Codec | Converting speech to semantic tokens. |
| Acoustic Codec | Converting speech to acoustic tokens and reconstructing waveform from acoustic tokens. |
| Semantic2Acoustic | Predicts acoustic tokens conditioned on semantic tokens. |
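The way these three shared components chain together can be sketched as a simple function composition: speech is turned into semantic tokens, acoustic tokens are predicted from them, and the acoustic codec decodes a waveform. The sketch below is deliberately simplified and assumes nothing about the real Amphion interfaces; `run_pipeline` and the three callables are hypothetical names, and the real Semantic2Acoustic model may take additional conditioning inputs that are omitted here.

```python
from typing import Callable, List, Sequence

Waveform = List[float]
Tokens = List[int]


def run_pipeline(
    speech: Sequence[float],
    speech_to_semantic: Callable[[Sequence[float]], Tokens],  # Semantic Codec role
    semantic_to_acoustic: Callable[[Tokens], Tokens],         # Semantic2Acoustic role
    acoustic_to_wave: Callable[[Tokens], Waveform],           # Acoustic Codec decoder role
) -> Waveform:
    """Chain the three component roles described in the table above."""
    semantic = speech_to_semantic(speech)
    acoustic = semantic_to_acoustic(semantic)
    return acoustic_to_wave(acoustic)


# Toy stand-ins so the sketch runs; they are NOT real codecs.
def toy_semantic(wave: Sequence[float]) -> Tokens:
    return [int(round(x * 10)) for x in wave]


def toy_s2a(tokens: Tokens) -> Tokens:
    return [t * 2 for t in tokens]


def toy_decode(tokens: Tokens) -> Waveform:
    return [t / 20 for t in tokens]


if __name__ == "__main__":
    print(run_pipeline([0.1, -0.3, 0.5], toy_semantic, toy_s2a, toy_decode))
```

Passing the components in as callables keeps the sketch independent of any particular checkpoint or loader.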

We open-source the pretrained model checkpoint of the first stage of Metis (with masked generative pre-training), as well as the fine-tuned models for speech enhancement (SE), target speaker extraction (TSE), voice conversion (VC), lip-to-speech (L2S), and the unified multi-task (Omni) model.

For zero-shot text-to-speech, you can download the text2semantic model from MaskGCT, which is compatible with the Metis framework.

| Model Name | Description |
| --- | --- |
| Metis-Base | The base model pre-trained with masked generative pre-training. |
| Metis-TSE | Fine-tuned model for target speaker extraction. Available in both full-scale and LoRA (r = 32) versions. |
| Metis-VC | Fine-tuned model for voice conversion. Available in full-scale version. |
| Metis-SE | Fine-tuned model for speech enhancement. Available in both full-scale and LoRA (r = 32) versions. |
| Metis-L2S | Fine-tuned model for lip-to-speech. Available in full-scale version. |
| Metis-TTS | Zero-shot text-to-speech model (the same as the first stage of MaskGCT). |
| Metis-Omni | Unified multi-task model supporting zero-shot TTS, VC, TSE, and SE. |

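
To give a sense of why the LoRA (r = 32) variants need far fewer trainable parameters than full-scale fine-tuning, the following sketch counts the parameters a single LoRA adapter adds to one linear layer. The rank comes from the table above; the layer width of 1024 is a hypothetical value chosen for illustration and is not taken from the Metis configuration, and the helper names are likewise assumptions.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter on a (d_out x d_in) weight.

    LoRA learns two low-rank factors, A with shape (rank x d_in) and
    B with shape (d_out x rank), while the original weight stays frozen.
    """
    return rank * (d_in + d_out)


def full_trainable_params(d_in: int, d_out: int) -> int:
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_in * d_out


if __name__ == "__main__":
    d = 1024   # hypothetical layer width, not from the Metis config
    r = 32     # LoRA rank stated in the model table
    lora = lora_trainable_params(d, d, r)
    full = full_trainable_params(d, d)
    print(f"LoRA: {lora} params, full: {full} params, ratio: {lora / full:.4f}")
```

For this hypothetical square layer the adapter trains 65,536 parameters against 1,048,576 for the full matrix, about 6% of the total; the actual savings depend on which layers are adapted.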
Usage
Citations

If you use Metis in your research, please cite the following papers:

@article{wang2025metis,
  title={Metis: A Foundation Speech Generation Model with Masked Generative Pre-training},
  author={Wang, Yuancheng and Zheng, Jiachen and Zhang, Junan and Zhang, Xueyao and Liao, Huan and Wu, Zhizheng},
  journal={arXiv preprint arXiv:2502.03128},
  year={2025}
}
@inproceedings{wang2024maskgct,
  author={Wang, Yuancheng and Zhan, Haoyue and Liu, Liwei and Zeng, Ruihong and Guo, Haotian and Zheng, Jiachen and Zhang, Qiang and Zhang, Xueyao and Zhang, Shunsi and Wu, Zhizheng},
  title={MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer},
  booktitle    = {{ICLR}},
  publisher    = {OpenReview.net},
  year         = {2025}
}
@article{amphion_v0.2,
  title        = {Overview of the Amphion Toolkit (v0.2)},
  author       = {Jiaqi Li and Xueyao Zhang and Yuancheng Wang and Haorui He and Chaoren Wang and Li Wang and Huan Liao and Junyi Ao and Zeyu Xie and Yiqiao Huang and Junan Zhang and Zhizheng Wu},
  year         = {2025},
  journal      = {arXiv preprint arXiv:2501.15442},
}
@inproceedings{amphion,
    author={Zhang, Xueyao and Xue, Liumeng and Gu, Yicheng and Wang, Yuancheng and Li, Jiaqi and He, Haorui and Wang, Chaoren and Song, Ting and Chen, Xi and Fang, Zihao and Chen, Haopeng and Zhang, Junan and Tang, Tze Ying and Zou, Lexiao and Wang, Mingxuan and Han, Jun and Chen, Kai and Li, Haizhou and Wu, Zhizheng},
    title={Amphion: An Open-Source Audio, Music and Speech Generation Toolkit},
    booktitle={{IEEE} Spoken Language Technology Workshop, {SLT} 2024},
    year={2024}
}

Model page on huggingface.co:

https://huggingface.co/amphion/Metis
