Auffusion is a latent diffusion model (LDM) for text-to-audio (TTA) generation. It can generate realistic audio from textual prompts, including human sounds, animal sounds, natural and artificial sounds, and sound effects. We introduce Auffusion, a TTA system that adapts text-to-image (T2I) model frameworks to the TTA task by leveraging their inherent generative strengths and precise cross-modal alignment. Our objective and subjective evaluations demonstrate that Auffusion surpasses previous TTA approaches while using limited data and computational resources. We release our model, inference code, and pre-trained checkpoints for the research community.
📣 We are releasing Auffusion-Full-no-adapter, which was pre-trained on all datasets described in the paper and is intended for easy audio manipulation.
📣 We are releasing Auffusion-Full, which was pre-trained on all datasets described in the paper.
📣 We are releasing Auffusion, which was pre-trained on AudioCaps.
Please follow the instructions in the repository for installation, usage and experiments.
Quickstart Guide
First, clone the repository and install the requirements:
git clone https://github.com/happylittlecat2333/Auffusion/
cd Auffusion
pip install -r requirements.txt
Download the Auffusion model and generate audio from a text prompt:
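import IPython, torch
import soundfile as sf
from auffusion_pipeline import AuffusionPipeline

# Load the pre-trained checkpoint (downloaded from Hugging Face on first use).
pipeline = AuffusionPipeline.from_pretrained("auffusion/auffusion")

# Generate audio from a text prompt and take the first result.
prompt = "Birds singing sweetly in a blooming garden"
output = pipeline(prompt=prompt)
audio = output.audios[0]

# Save the waveform at 16 kHz and play it back in a notebook.
sf.write(f"{prompt}.wav", audio, samplerate=16000)
IPython.display.Audio(data=audio, rate=16000)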
The Auffusion model is downloaded automatically from Hugging Face and saved in the local cache. Subsequent runs load the model directly from the cache.
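The quickstart uses the prompt itself as the output file name, which can fail for prompts containing characters such as "/" or "?". The following is a minimal sketch of one way to derive a safer name. The helper prompt_to_filename is hypothetical and is not part of the Auffusion repository.

```python
import re


def prompt_to_filename(prompt: str, extension: str = ".wav", max_length: int = 80) -> str:
    """Turn a free-form prompt into a filesystem-friendly file name.

    Hypothetical helper for illustration; not part of the Auffusion code base.
    """
    # Replace each run of characters other than letters, digits, "-" or "_"
    # with a single underscore, then trim underscores at both ends.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", prompt).strip("_")
    # Truncate long prompts and fall back to a generic name if nothing is left.
    slug = slug[:max_length].rstrip("_") or "audio"
    return slug + extension


if __name__ == "__main__":
    print(prompt_to_filename("Birds singing sweetly in a blooming garden"))
```

In the quickstart, the result could replace the f-string passed to sf.write, for example sf.write(prompt_to_filename(prompt), audio, samplerate=16000).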
The generate function uses 100 steps and a guidance_scale of 7.5 by default to sample from the latent diffusion model. You can vary these parameters to obtain different results.
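To give some intuition for what guidance_scale controls, here is a small sketch of classifier-free guidance as it is commonly implemented in Stable Diffusion-style pipelines. It assumes Auffusion follows that standard formulation, which this page does not state explicitly. The function apply_guidance and its argument names are illustrative only.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale=7.5):
    """Combine unconditional and text-conditioned noise predictions.

    Standard classifier-free guidance (assumed here, not confirmed for
    Auffusion): start from the unconditional prediction and move along the
    direction of the text-conditioned one, scaled by guidance_scale.
    """
    return [
        uncond + guidance_scale * (text - uncond)
        for uncond, text in zip(noise_uncond, noise_text)
    ]


if __name__ == "__main__":
    # Toy predictions for two latent values.
    uncond = [0.0, 1.0]
    text = [1.0, 1.0]
    print(apply_guidance(uncond, text, guidance_scale=7.5))
    # With guidance_scale=1.0 the result equals the text-conditioned prediction.
    print(apply_guidance(uncond, text, guidance_scale=1.0))
```

Under this formulation, larger values push the sample harder toward the prompt, usually at some cost in diversity, while a value of 1.0 disables the extra guidance.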
Please consider citing the following article if you found our work useful:
@article{xue2024auffusion,
title={Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation},
author={Jinlong Xue and Yayue Deng and Yingming Gao and Ya Li},
journal={arXiv preprint arXiv:2401.01044},
year={2024}
}