
DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation

About

DualCodec is a low-frame-rate (12.5Hz or 25Hz), semantically enhanced neural audio codec (it incorporates self-supervised learning (SSL) features) designed to extract discrete tokens for efficient speech generation.

You can check out its demo page. An overview figure of the DualCodec system is available in the project repository.

Installation

pip install dualcodec

News

2025-05-19: DualCodec is accepted to Interspeech 2025!
2025-03-30: Added automatic downloading from Hugging Face. Uploaded some TTS models (DualCodec-VALLE, DualCodec-Voicebox).
2025-01-22: I added training and finetuning instructions for DualCodec, as well as a Gradio interface. The version is v0.3.0.
2025-01-16: Finished writing the DualCodec inference code; the version is v0.1.0. The latest versions are synced to PyPI.

Available models

Model_ID	Frame Rate	RVQ Quantizers	Semantic Codebook Size (RVQ-1 Size)	Acoustic Codebook Size (RVQ-rest Size)	Training Data
12hz_v1	12.5Hz	Any from 1-8 (maximum 8)	16384	4096	100K hours Emilia
25hz_v1	25Hz	Any from 1-12 (maximum 12)	16384	1024	100K hours Emilia
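The nominal bitrate implied by the table can be worked out directly: every frame carries one semantic token from the RVQ-1 codebook (log2 16384 = 14 bits) plus one acoustic token per additional quantizer. The sketch below gives an upper bound, assuming every token is stored with a fixed log2(codebook size) bits and no entropy coding; the `MODELS` dictionary and `bitrate_bps` function are illustrative names, not part of the dualcodec package.

```python
import math

# Frame rates, quantizer limits and codebook sizes copied from the "Available models" table.
MODELS = {
    "12hz_v1": {"frame_rate": 12.5, "max_quantizers": 8, "semantic_size": 16384, "acoustic_size": 4096},
    "25hz_v1": {"frame_rate": 25.0, "max_quantizers": 12, "semantic_size": 16384, "acoustic_size": 1024},
}


def bitrate_bps(model_id: str, n_quantizers: int) -> float:
    """Nominal bitrate in bits/s with fixed-length token coding (an upper bound)."""
    spec = MODELS[model_id]
    if not 1 <= n_quantizers <= spec["max_quantizers"]:
        raise ValueError(f"{model_id} supports 1-{spec['max_quantizers']} quantizers")
    # RVQ-1 uses the semantic codebook; RVQ-2..RVQ-n use the acoustic codebook.
    bits_per_frame = (math.log2(spec["semantic_size"])
                      + (n_quantizers - 1) * math.log2(spec["acoustic_size"]))
    return spec["frame_rate"] * bits_per_frame


for model_id, spec in MODELS.items():
    for n in (1, spec["max_quantizers"]):
        print(f"{model_id}, {n} quantizer(s): {bitrate_bps(model_id, n):.0f} bps")
```

For example, 12hz_v1 with all 8 quantizers gives 12.5 × (14 + 7 × 12) = 1225 bps, and 25hz_v1 with all 12 gives 25 × (14 + 11 × 10) = 3100 bps; with only the semantic quantizer these drop to 175 and 350 bps.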

How to run inference with DualCodec

1. Programmatic usage (automatically downloads checkpoints from Hugging Face):

import dualcodec

model_id = "12hz_v1" # select from available Model_IDs, "12hz_v1" or "25hz_v1"

dualcodec_model = dualcodec.get_model(model_id)
dualcodec_inference = dualcodec.Inference(dualcodec_model=dualcodec_model, device="cuda")

# do inference for your wav
import torchaudio
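audio, sr = torchaudio.load("YOUR_WAV.wav")  # audio shape: [channels, samples]
# downmix to mono first; reshaping a multi-channel tensor would concatenate the channels in time
audio = audio.mean(dim=0, keepdim=True)
# resample to 24kHz
audio = torchaudio.functional.resample(audio, sr, 24000)
audio = audio.reshape(1, 1, -1)  # [batch=1, channels=1, samples]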
audio = audio.to("cuda")
# extract codes, for example, using 8 quantizers here:
semantic_codes, acoustic_codes = dualcodec_inference.encode(audio, n_quantizers=8)
# semantic_codes shape: torch.Size([B, 1, T])
# acoustic_codes shape: torch.Size([B, n_quantizers-1, T])

# produce output audio
out_audio = dualcodec_inference.decode(semantic_codes, acoustic_codes)

# save output audio
torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)
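To sanity-check the shapes in the comments above, you can estimate T from the clip length: at the 24 kHz input rate, one 12.5Hz frame covers 24000 / 12.5 = 1920 samples and one 25Hz frame covers 960 samples. The helper below is a hypothetical sketch, not part of the dualcodec API; rounding a partial final frame up is an assumption, so the real encoder may differ by a frame depending on how it pads.

```python
import math
from typing import Tuple

# Frame rates and quantizer limits from the "Available models" table.
MODELS = {
    "12hz_v1": {"frame_rate": 12.5, "max_quantizers": 8},
    "25hz_v1": {"frame_rate": 25.0, "max_quantizers": 12},
}
SAMPLE_RATE = 24000  # audio is resampled to 24 kHz before encoding


def expected_code_shapes(model_id: str, num_samples: int, n_quantizers: int,
                         batch_size: int = 1) -> Tuple[Tuple[int, int, int], Tuple[int, int, int]]:
    """Estimated (semantic_codes, acoustic_codes) shapes for a clip of num_samples at 24 kHz."""
    spec = MODELS[model_id]
    if not 1 <= n_quantizers <= spec["max_quantizers"]:
        raise ValueError(f"{model_id} supports 1-{spec['max_quantizers']} quantizers")
    samples_per_frame = SAMPLE_RATE / spec["frame_rate"]
    # Assumption: a partial final frame is padded and counted as a full frame.
    t = math.ceil(num_samples / samples_per_frame)
    return (batch_size, 1, t), (batch_size, n_quantizers - 1, t)
```

For a 4-second clip (96000 samples) encoded by 12hz_v1 with 8 quantizers, this estimates semantic codes of shape (1, 1, 50) and acoustic codes of shape (1, 7, 50).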

2. Alternative usage with local checkpoints

First, download the checkpoints locally:

# export HF_ENDPOINT=https://hf-mirror.com      # uncomment this to use huggingface mirror if you're in China
huggingface-cli download facebook/w2v-bert-2.0 --local-dir w2v-bert-2.0
huggingface-cli download amphion/dualcodec dualcodec_12hz_16384_4096.safetensors dualcodec_25hz_16384_1024.safetensors w2vbert2_mean_var_stats_emilia.pt --local-dir dualcodec_ckpts

The first command downloads the w2v-BERT 2.0 model to w2v-bert-2.0. The second command downloads the two DualCodec model checkpoints (12hz_v1 and 25hz_v1) and the w2v-BERT 2.0 mean and variance statistics into the local directory dualcodec_ckpts.
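Before loading the model from local paths, it can help to confirm that the downloads landed where the inference code expects them. This is a hypothetical helper (not part of dualcodec); the file names come from the huggingface-cli commands above, and the mapping of 12hz_v1 and 25hz_v1 to the two .safetensors files is an assumption inferred from the codebook sizes in the "Available models" table.

```python
from pathlib import Path
from typing import List, Union

# Assumed mapping, based on the codebook sizes (16384/4096 and 16384/1024) in the model table.
CHECKPOINTS = {
    "12hz_v1": "dualcodec_12hz_16384_4096.safetensors",
    "25hz_v1": "dualcodec_25hz_16384_1024.safetensors",
}
W2V_STATS = "w2vbert2_mean_var_stats_emilia.pt"


def missing_files(dualcodec_model_path: Union[str, Path], w2v_path: Union[str, Path],
                  model_id: str) -> List[str]:
    """Return the expected paths that do not exist; an empty list means everything is in place."""
    ckpt_dir = Path(dualcodec_model_path)
    expected = [ckpt_dir / CHECKPOINTS[model_id], ckpt_dir / W2V_STATS]
    missing = [str(p) for p in expected if not p.is_file()]
    # The w2v-BERT 2.0 download is a directory, not a single file.
    if not Path(w2v_path).is_dir():
        missing.append(str(w2v_path))
    return missing
```

For example, `missing_files("./dualcodec_ckpts", "./w2v-bert-2.0", "12hz_v1")` should return an empty list once both commands have finished.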

Then you can use the following code to run inference with DualCodec using the local checkpoints.

import dualcodec

w2v_path = "./w2v-bert-2.0" # your downloaded path
dualcodec_model_path = "./dualcodec_ckpts" # your downloaded path
model_id = "12hz_v1" # select from available Model_IDs, "12hz_v1" or "25hz_v1"

dualcodec_model = dualcodec.get_model(model_id, dualcodec_model_path)
dualcodec_inference = dualcodec.Inference(dualcodec_model=dualcodec_model, dualcodec_path=dualcodec_model_path, w2v_path=w2v_path, device="cuda")

# do inference for your wav
import torchaudio
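audio, sr = torchaudio.load("YOUR_WAV.wav")  # audio shape: [channels, samples]
# downmix to mono first; reshaping a multi-channel tensor would concatenate the channels in time
audio = audio.mean(dim=0, keepdim=True)
# resample to 24kHz
audio = torchaudio.functional.resample(audio, sr, 24000)
audio = audio.reshape(1, 1, -1)  # [batch=1, channels=1, samples]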
audio = audio.to("cuda")
# extract codes, for example, using 8 quantizers here:
semantic_codes, acoustic_codes = dualcodec_inference.encode(audio, n_quantizers=8)
# semantic_codes shape: torch.Size([1, 1, T])
# acoustic_codes shape: torch.Size([1, n_quantizers-1, T])

# produce output audio. Passing `acoustic_codes=None` decodes only the semantic codes (RVQ-1)
out_audio = dualcodec_inference.decode(semantic_codes, acoustic_codes)

# save output audio
torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)

See "example.ipynb" for a runnable example.

3. Google Colab

A Colab notebook demonstrates reconstructing audio using different numbers of RVQ layers.
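Why the number of RVQ layers matters can be illustrated with a toy residual vector quantizer: each stage quantizes what the previous stages left over, so adding layers refines the reconstruction. This is a generic pure-Python illustration, not DualCodec's implementation. The hand-built grid codebooks, the 2-D vectors and the function names are assumptions for demonstration, and DualCodec's first (semantic) codebook is additionally shaped by SSL features, which this toy does not model.

```python
import itertools
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_codebooks(n_stages: int, dim: int = 2) -> List[List[Vector]]:
    """Toy codebooks: stage k is the grid {-s, 0, s}^dim with s = 0.5**k.

    Every codebook contains the zero vector, so (up to floating-point rounding)
    adding a stage never increases the reconstruction error.
    """
    books = []
    for k in range(n_stages):
        s = 0.5 ** k
        books.append([list(p) for p in itertools.product((-s, 0.0, s), repeat=dim)])
    return books


def rvq_encode(x: Sequence[float], codebooks: List[List[Vector]], n_quantizers: int) -> List[int]:
    """Greedy RVQ: at each stage pick the codeword nearest to the current residual."""
    residual = list(x)
    codes = []
    for book in codebooks[:n_quantizers]:
        idx = min(range(len(book)), key=lambda i: sq_dist(book[i], residual))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: List[List[Vector]]) -> Vector:
    """Sum the selected codewords of the first len(codes) stages."""
    out = [0.0] * len(codebooks[0][0])
    for idx, book in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, book[idx])]
    return out


books = make_codebooks(8)
x = [0.8, -0.3]
for n in range(1, 9):
    codes = rvq_encode(x, books, n)
    # codes[0] is the coarse first-stage token; codes[1:] are successive refinements
    print(n, codes, round(sq_dist(x, rvq_decode(codes, books)), 6))
```

Running the loop shows the squared error shrinking as quantizers are added, mirroring how decoding DualCodec with more RVQ layers adds acoustic detail on top of the RVQ-1 tokens.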

4. Gradio interface

If you want to use the Gradio interface, you can run the following command:

python -m dualcodec.app

This will launch an app that allows you to upload a wav file and get the output wav file.

DualCodec-based TTS models

Models available:

DualCodec-VALLE: A super fast 12.5Hz VALL-E TTS model based on DualCodec.
DualCodec-Voicebox: A flow-matching decoder for the semantic codes of the 12.5Hz DualCodec. It can be used as the second stage of a TTS pipeline, but on its own it is not a TTS model.

To continue, first install the additional required components:

pip install "dualcodec[tts]"

Alternatively, to install from source (run this from the repository root):

pip install -e ".[tts]"

DualCodec-VALLE

DualCodec-VALLE is a TTS model based on DualCodec. It operates on 12.5Hz DualCodec tokens (a token frame rate, not an audio sampling rate) with 8 quantizers, and is trained on 100K hours of Emilia data.

CLI Inference

python -m dualcodec.infer.valle.cli_valle_infer --ref_audio <path_to_ref_audio> --ref_text "TEXT OF YOUR REF AUDIO" --gen_text "This is the generated text" --output_dir test --output_file test.wav

You can also omit all options, in which case the default values are used.
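If you need several utterances, one option is to call the CLI from Python once per sentence. This is a minimal sketch that uses only the flags shown in the command above; the helper names are hypothetical, it assumes the CLI writes `--output_file` inside `--output_dir`, and because each call starts a new process the model is reloaded for every sentence, trading speed for simplicity.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def valle_cli_args(ref_audio: str, ref_text: str, gen_text: str,
                   output_dir: str, output_file: str) -> List[str]:
    """Build the argument list for the DualCodec-VALLE CLI (flags as documented above)."""
    return [
        sys.executable, "-m", "dualcodec.infer.valle.cli_valle_infer",
        "--ref_audio", str(ref_audio),
        "--ref_text", ref_text,
        "--gen_text", gen_text,
        "--output_dir", str(output_dir),
        "--output_file", output_file,
    ]


def synthesize_all(ref_audio: str, ref_text: str, sentences: Sequence[str],
                   output_dir: str = "test") -> List[Path]:
    """Hypothetical helper: run the CLI once per sentence (gen_000.wav, gen_001.wav, ...)."""
    outputs = []
    for i, text in enumerate(sentences):
        name = f"gen_{i:03d}.wav"
        # A list argv avoids shell quoting problems with the text; check=True raises on failure.
        subprocess.run(valle_cli_args(ref_audio, ref_text, text, output_dir, name), check=True)
        # Assumption: the CLI writes --output_file inside --output_dir.
        outputs.append(Path(output_dir) / name)
    return outputs
```

For example, `synthesize_all("ref.wav", "TEXT OF YOUR REF AUDIO", ["First sentence.", "Second sentence."])` would produce two files in test/ under the assumption above.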

Gradio interface

python -m dualcodec.infer.valle.gradio_valle_demo

DualCodec-Voicebox

CLI Inference

python -m dualcodec.infer.voicebox.cli_voicebox_infer --ref_audio <path_to_ref_audio> --output_dir test --output_file test.wav

You can also omit all options, in which case the default values are used.

FAQ

If you run into environment problems at this stage, try upgrading the following packages:

pip install -U wandb protobuf transformers

Training DualCodec from scratch

Install other necessary components for training:

pip install "dualcodec[tts]"

Clone this repository and cd to the project root folder (the folder that contains this readme):

git clone https://github.com/jiaqili3/DualCodec.git
cd DualCodec

To run an example training job on the example Emilia German data:

accelerate launch train.py --config-name=dualcodec_train \
model=dualcodec_12hz_16384_4096_8vq \
trainer.batch_size=3 \
data.segment_speech.segment_length=24000

This trains a 12hz_v1 model from scratch with a training batch size of 3 (in practice you typically need larger batch sizes, such as 10).
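To get a feel for what these settings mean, the arithmetic below converts the batch size and segment length into seconds of audio per step and codec frames per segment. It assumes (this is not stated in the command) that `data.segment_speech.segment_length` counts samples at DualCodec's 24 kHz input rate, so 24000 corresponds to 1-second segments; the function name is hypothetical.

```python
SAMPLE_RATE = 24000  # assumption: segment_length counts samples at 24 kHz


def batch_audio_stats(batch_size: int, segment_length: int, frame_rate: float) -> dict:
    """Seconds of audio per segment and per batch, and codec frames per segment."""
    seconds_per_segment = segment_length / SAMPLE_RATE
    return {
        "seconds_per_segment": seconds_per_segment,
        "seconds_per_batch": batch_size * seconds_per_segment,
        "frames_per_segment": seconds_per_segment * frame_rate,
    }


# Batch size 3 from the example command versus the suggested 10, for the 12.5Hz model.
for bs in (3, 10):
    print(bs, batch_audio_stats(bs, 24000, 12.5))
```

Under that assumption, each step of the example command sees 3 seconds of audio, and a batch size of 10 raises this to 10 seconds.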

To train a 25hz_v1 model:

accelerate launch train.py --config-name=dualcodec_train \
model=dualcodec_25hz_16384_1024_12vq \
trainer.batch_size=3 \
data.segment_speech.segment_length=24000

Finetuning DualCodec

Install other necessary components for training:

pip install "dualcodec[train]"

Clone this repository and cd to the project root folder (the folder that contains this README), then download the discriminator checkpoints:

huggingface-cli download amphion/dualcodec --local-dir dualcodec_ckpts

To run an example finetuning job on Emilia German data (the data is streamed, so no files need to be downloaded, but network access to Hugging Face is required):

accelerate launch train.py --config-name=dualcodec_ft_12hzv1 \
trainer.batch_size=3 \
data.segment_speech.segment_length=24000

This finetunes a 12hz_v1 model with a training batch size of 3 (in practice you typically need larger batch sizes, such as 10).

To finetune a 25hz_v1 model:

accelerate launch train.py --config-name=dualcodec_ft_25hzv1 \
trainer.batch_size=3 \
data.segment_speech.segment_length=24000

Citation

@inproceedings{dualcodec,
  title     = {DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation},
  author    = {Li, Jiaqi and Lin, Xiaolong and Li, Zhekai and Huang, Shixi and Wang, Yuancheng and Wang, Chaoren and Zhan, Zhenpeng and Wu, Zhizheng},
  booktitle = {Proceedings of Interspeech 2025},
  year      = {2025}
}

If you use this with the Amphion toolkit, please consider citing:

@article{amphion2,
  title        = {Overview of the Amphion Toolkit (v0.2)},
  author       = {Jiaqi Li and Xueyao Zhang and Yuancheng Wang and Haorui He and Chaoren Wang and Li Wang and Huan Liao and Junyi Ao and Zeyu Xie and Yiqiao Huang and Junan Zhang and Zhizheng Wu},
  year         = {2025},
  journal      = {arXiv preprint arXiv:2501.15442},
}

@inproceedings{amphion,
    author={Xueyao Zhang and Liumeng Xue and Yicheng Gu and Yuancheng Wang and Jiaqi Li and Haorui He and Chaoren Wang and Ting Song and Xi Chen and Zihao Fang and Haopeng Chen and Junan Zhang and Tze Ying Tang and Lexiao Zou and Mingxuan Wang and Jun Han and Kai Chen and Haizhou Li and Zhizheng Wu},
    title={Amphion: An Open-Source Audio, Music and Speech Generation Toolkit},
    booktitle={{IEEE} Spoken Language Technology Workshop, {SLT} 2024},
    year={2024}
}


License

DualCodec is released under the Apache 2.0 license: https://choosealicense.com/licenses/apache-2.0


Hugging Face model page

https://huggingface.co/amphion/dualcodec


