You can use the following code to generate speech from a target text and a speech prompt:
from models.tts.maskgct.maskgct_utils import *
from huggingface_hub import hf_hub_download
import safetensors.torch  # explicit submodule import so safetensors.torch.load_model resolves
import soundfile as sf
import torch

if __name__ == "__main__":
    # build model
    device = torch.device("cuda:0")
    cfg_path = "./models/tts/maskgct/config/maskgct.json"
    cfg = load_config(cfg_path)
    # 1. build semantic model (w2v-bert-2.0)
    semantic_model, semantic_mean, semantic_std = build_semantic_model(device)
    # 2. build semantic codec
    semantic_codec = build_semantic_codec(cfg.model.semantic_codec, device)
    # 3. build acoustic codec
    codec_encoder, codec_decoder = build_acoustic_codec(cfg.model.acoustic_codec, device)
    # 4. build t2s model
    t2s_model = build_t2s_model(cfg.model.t2s_model, device)
    # 5. build s2a model
    s2a_model_1layer = build_s2a_model(cfg.model.s2a_model.s2a_1layer, device)
    s2a_model_full = build_s2a_model(cfg.model.s2a_model.s2a_full, device)

    # download checkpoint
    # (elided here; the *_ckpt paths used below must be defined in this step)
    ...

    # load semantic codec
    safetensors.torch.load_model(semantic_codec, semantic_code_ckpt)
    # load acoustic codec
    safetensors.torch.load_model(codec_encoder, codec_encoder_ckpt)
    safetensors.torch.load_model(codec_decoder, codec_decoder_ckpt)
    # load t2s model
    safetensors.torch.load_model(t2s_model, t2s_model_ckpt)
    # load s2a model
    safetensors.torch.load_model(s2a_model_1layer, s2a_1layer_ckpt)
    safetensors.torch.load_model(s2a_model_full, s2a_full_ckpt)

    # inference
    prompt_wav_path = "./models/tts/maskgct/wav/prompt.wav"
    save_path = "[YOUR SAVE PATH]"
    prompt_text = " We do not break. We never give in. We never back down."
    target_text = "In this paper, we introduce MaskGCT, a fully non-autoregressive TTS model that eliminates the need for explicit alignment information between text and speech supervision."
    # Specify the target duration (in seconds).
    # If target_len = None, we use a simple rule to predict the target duration.
    target_len = 18
    recovered_audio = maskgct_inference(prompt_wav_path, prompt_text, target_text, "en", "en", target_len=target_len)
    # write the generated waveform at a 24 kHz sample rate
    sf.write(save_path, recovered_audio, 24000)
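
The comment on target_len says a simple rule predicts the duration when target_len is None, but the snippet does not show that rule. The sketch below is one plausible stand-in and not necessarily what MaskGCT does: it scales the prompt duration by the ratio of target to prompt text length, measured in UTF-8 bytes. The function names estimate_target_len and expected_num_samples are hypothetical and are not part of maskgct_utils. Only the 24000 Hz sample rate comes from the sf.write call above.

```python
def estimate_target_len(prompt_seconds: float, prompt_text: str, target_text: str) -> float:
    """Hypothetical duration rule (an assumption, not MaskGCT's own code).

    Assumes speech duration grows roughly linearly with text length, so the
    target duration is the prompt duration scaled by the text-length ratio.
    """
    prompt_bytes = len(prompt_text.strip().encode("utf-8"))
    target_bytes = len(target_text.strip().encode("utf-8"))
    if prompt_bytes == 0:
        raise ValueError("prompt_text must not be empty")
    return prompt_seconds * target_bytes / prompt_bytes


def expected_num_samples(seconds: float, sample_rate: int = 24000) -> int:
    """Number of waveform samples for a given duration at the given sample rate."""
    return round(seconds * sample_rate)


if __name__ == "__main__":
    # Illustrative values only: a 4-second prompt and a target twice as long in bytes.
    seconds = estimate_target_len(4.0, "abcd", "abcdefgh")
    print(seconds, expected_num_samples(seconds))
```

A value computed this way could be passed as target_len in place of the hard-coded 18. The sample count is also a quick sanity check on the length of recovered_audio before writing it to disk.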