Metis huggingface.co api & amphion Metis github AI Model

Introduction of Metis

Model Details of Metis

Metis: A Foundation Speech Generation Model with Masked Generative Pre-training

Overview

We introduce Metis, a foundation model for unified speech generation. Unlike previous task-specific or multi-task models, Metis follows a pre-training and fine-tuning paradigm. It is pre-trained on large-scale unlabeled speech data using masked generative modeling and then fine-tuned to adapt to diverse speech generation tasks. Specifically, (1) Metis utilizes two discrete speech representations: SSL tokens derived from speech self-supervised learning (SSL) features, and acoustic tokens directly quantized from waveforms. (2) Metis performs masked generative pre-training on SSL tokens, utilizing 300K hours of diverse speech data, without any additional condition. (3) Through fine-tuning with task-specific conditions, Metis achieves efficient adaptation to various speech generation tasks while supporting multimodal input, even when using limited data and trainable parameters. Experiments demonstrate that Metis can serve as a foundation model for unified speech generation: Metis outperforms state-of-the-art task-specific or multi-task systems across five speech generation tasks, including zero-shot text-to-speech, voice conversion, target speaker extraction, speech enhancement, and lip-to-speech, even with fewer than 20M trainable parameters or 300 times less training data. Audio samples are available at the demo page.
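To make the idea of masked generative pre-training on SSL tokens more concrete, here is a minimal sketch of the corruption step: a random subset of token positions is replaced by a mask placeholder, and those positions become the prediction targets. The mask id, the fixed mask ratio, and the uniform random choice of positions are assumptions made for illustration; the overview does not specify how Metis samples its masks.

```python
import random

MASK_ID = -1  # hypothetical id standing in for the [MASK] placeholder


def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of tokens with MASK_ID.

    Returns the corrupted sequence and the sorted masked positions,
    i.e. the positions a model would be trained to predict.
    """
    # Mask at least one token, but never more than the sequence holds.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = MASK_ID
    return corrupted, positions


rng = random.Random(0)
ssl_tokens = [17, 4, 256, 93, 8, 41, 300, 12]  # toy SSL token ids
corrupted, positions = mask_tokens(ssl_tokens, mask_ratio=0.5, rng=rng)
```

In this toy setup the training target would be the original ids at `positions`, predicted from `corrupted` alone, which matches the "without any additional condition" setting described for pre-training.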

Model Introduction

Metis is fully compatible with MaskGCT and shares several key model components with it. These shared components are:

Model Name	Description
Semantic Codec	Converts speech to semantic tokens.
Acoustic Codec	Converts speech to acoustic tokens and reconstructs the waveform from acoustic tokens.
Semantic2Acoustic	Predicts acoustic tokens conditioned on semantic tokens.
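
The data flow implied by this table can be sketched as follows, assuming the semantic tokens have already been produced by the first stage. This only illustrates how the components compose: the function names and the toy stand-ins are hypothetical and are not the Amphion API.

```python
from typing import Callable, List

Tokens = List[int]
Waveform = List[float]


def synthesize(
    semantic_tokens: Tokens,
    semantic2acoustic: Callable[[Tokens], Tokens],
    acoustic_decode: Callable[[Tokens], Waveform],
) -> Waveform:
    """Compose the shared components: semantic tokens -> acoustic tokens -> waveform."""
    acoustic_tokens = semantic2acoustic(semantic_tokens)
    return acoustic_decode(acoustic_tokens)


# Toy stand-ins so the sketch runs; the real components are neural networks.
def toy_s2a(tokens: Tokens) -> Tokens:
    return [t % 8 for t in tokens]


def toy_decode(tokens: Tokens) -> Waveform:
    return [t / 8.0 for t in tokens]


waveform = synthesize([17, 4, 256, 93], toy_s2a, toy_decode)
```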

We open-source the pretrained model checkpoint of the first stage of Metis (with masked generative pre-training), as well as the fine-tuned models for speech enhancement (SE), target speaker extraction (TSE), voice conversion (VC), lip-to-speech (L2S), and the unified multi-task (Omni) model.

For zero-shot text-to-speech, you can download the text2semantic model from MaskGCT, which is compatible with the Metis framework.

Model Name	Description
Metis-Base	The base model pre-trained with masked generative pre-training.
Metis-TSE	Fine-tuned model for target speaker extraction. Available in both full-scale and LoRA (r = 32) versions.
Metis-VC	Fine-tuned model for voice conversion. Available in full-scale version.
Metis-SE	Fine-tuned model for speech enhancement. Available in both full-scale and LoRA (r = 32) versions.
Metis-L2S	Fine-tuned model for lip-to-speech. Available in full-scale version.
Metis-TTS	Zero-shot text-to-speech model (the same as the first stage of MaskGCT).
Metis-Omni	Unified multi-task model supporting zero-shot TTS, VC, TSE, and SE.
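
The LoRA (r = 32) variants train low-rank adapters instead of full weight matrices. As a rough illustration of why this keeps the trainable parameter count small, the sketch below counts the adapter parameters for a single linear layer. Only the rank r = 32 comes from the table above; the hidden size of 1024 is an assumption chosen for illustration, not a value taken from Metis.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def dense_params(d_in: int, d_out: int) -> int:
    """Parameters of the full weight matrix that the adapter sits beside."""
    return d_in * d_out


HIDDEN = 1024  # hypothetical layer width, for illustration only
RANK = 32      # LoRA rank listed for Metis-TSE and Metis-SE

adapter = lora_trainable_params(HIDDEN, HIDDEN, RANK)
dense = dense_params(HIDDEN, HIDDEN)
fraction = adapter / dense
```

Under these assumed dimensions the adapter holds 65,536 parameters against 1,048,576 in the dense matrix, about 6% of the full layer.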

Usage
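
As a minimal sketch, assuming the `huggingface_hub` Python package is installed, the released checkpoints could be fetched from the `amphion/Metis` repository with `snapshot_download`. The helper name is hypothetical, and treating `amphion/MaskGCT` as the repository that holds the compatible text2semantic model is an assumption based on the model description above.

```python
METIS_REPO = "amphion/Metis"      # repository id taken from the model URL
MASKGCT_REPO = "amphion/MaskGCT"  # assumed home of the text2semantic model


def download_checkpoints(repo_id: str, local_dir: str) -> str:
    """Download every file of a Hugging Face model repo and return the local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_checkpoints(METIS_REPO, "./ckpt/Metis")` would place the files under the given directory; the directory name here is only an example.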

Citations

If you use Metis in your research, please cite the following papers:

@article{wang2025metis,
  title={Metis: A Foundation Speech Generation Model with Masked Generative Pre-training},
  author={Wang, Yuancheng and Zheng, Jiachen and Zhang, Junan and Zhang, Xueyao and Liao, Huan and Wu, Zhizheng},
  journal={arXiv preprint arXiv:2502.03128},
  year={2025}
}
@inproceedings{wang2024maskgct,
  author={Wang, Yuancheng and Zhan, Haoyue and Liu, Liwei and Zeng, Ruihong and Guo, Haotian and Zheng, Jiachen and Zhang, Qiang and Zhang, Xueyao and Zhang, Shunsi and Wu, Zhizheng},
  title={MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer},
  booktitle    = {{ICLR}},
  publisher    = {OpenReview.net},
  year         = {2025}
}
@article{amphion_v0.2,
  title        = {Overview of the Amphion Toolkit (v0.2)},
  author       = {Jiaqi Li and Xueyao Zhang and Yuancheng Wang and Haorui He and Chaoren Wang and Li Wang and Huan Liao and Junyi Ao and Zeyu Xie and Yiqiao Huang and Junan Zhang and Zhizheng Wu},
  year         = {2025},
  journal      = {arXiv preprint arXiv:2501.15442},
}
@inproceedings{amphion,
    author={Zhang, Xueyao and Xue, Liumeng and Gu, Yicheng and Wang, Yuancheng and Li, Jiaqi and He, Haorui and Wang, Chaoren and Song, Ting and Chen, Xi and Fang, Zihao and Chen, Haopeng and Zhang, Junan and Tang, Tze Ying and Zou, Lexiao and Wang, Mingxuan and Han, Jun and Chen, Kai and Li, Haizhou and Wu, Zhizheng},
    title={Amphion: An Open-Source Audio, Music and Speech Generation Toolkit},
    booktitle={{IEEE} Spoken Language Technology Workshop, {SLT} 2024},
    year={2024}
}

More Information About Metis huggingface.co Model

Metis is released under the CC BY-NC 4.0 license:

https://choosealicense.com/licenses/cc-by-nc-4.0

Metis huggingface.co

Metis is hosted on huggingface.co under the amphion organization. The model files can be downloaded from the model page or fetched programmatically through the huggingface.co API, for example from Python, Node.js, or plain HTTP.

Metis huggingface.co Url

https://huggingface.co/amphion/Metis

Metis install

Metis is an open-source model: its code is available on GitHub, and its model files are hosted on huggingface.co at the URL given above.

Other models from amphion on huggingface.co

Model	Total runs	Run growth	Growth rate	Updated
amphion/MaskGCT	694	-38	-5.52%	April 13 2025
amphion/Vevo	67	19	26.39%	April 13 2025
amphion/TaDiCodec-TTS-MGM	38	32	84.21%	September 02 2025
amphion/TaDiCodec	36	16	44.44%	September 02 2025
amphion/anyaccomp	25	13	54.17%	December 22 2025
amphion/TaDiCodec-TTS-AR-Qwen2.5-0.5B	24	18	72.00%	September 02 2025
amphion/Vevo1.5	19	-7	-36.84%	April 13 2025
amphion/TaDiCodec-TTS-AR-Qwen2.5-3B	8	-4	-50.00%	August 27 2025
amphion/valle	4	0	0.00%	July 19 2024
amphion/anyenhance	0	0	0.00%	January 14 2025
amphion/vits_hifitts	0	0	0.00%	February 24 2024
amphion/deepfake_detection	0	0	0.00%	January 04 2025
amphion/dualcodec	0	0	0.00%	October 14 2025
amphion/naturalspeech3_facodec	0	0	0.00%	March 13 2024
amphion/INTP	0	0	0.00%	September 09 2025
amphion/diffwave	0	0	0.00%	December 21 2023
amphion/VevoSing	0	0	0.00%	April 10 2025
amphion/hifigan_ljspeech	0	0	0.00%	February 24 2024
amphion/text_to_audio	0	0	0.00%	December 18 2023
amphion/fastspeech2_ljspeech	0	0	0.00%	February 24 2024
amphion/Ints	0	0	0.00%	May 18 2025
amphion/valle_librilight_6k	0	0	0.00%	January 24 2024
amphion/naturalspeech2_libritts	0	0	0.00%	December 19 2023
amphion/valle_libritts	0	0	0.00%	January 11 2024
amphion/BigVGAN_singing_bigdata	0	0	0.00%	December 21 2023
amphion/singing_voice_conversion	0	0	0.00%	December 21 2023
amphion/vits_ljspeech	0	0	0.00%	February 24 2024
amphion/hifigan_speech_bigdata	0	0	0.00%	December 21 2023
amphion/Vevo2	0	0	0.00%	September 08 2025
amphion/dualcodec-tts	0	0	0.00%	June 03 2025
