Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System
Attention2Probability (A2P) is a lightweight intervention scheme for speech terminology. The core idea is to use a cross-attention mechanism to retrieve the terms that are likely to appear in the audio and add these terms to the prompt of the LLM, thereby completing the terminology intervention.
News
[2025-08-27] We have released the training and inference code for A2P.
Structure
The overall architecture of Attention2Probability: audio features are extracted and fed into a cross-attention retriever, which retrieves the Top-k terms most likely to occur in the audio. The retrieved terms are concatenated with the prompt, and the prompt and audio features are then jointly input into the speech large language model.
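The retrieval step above can be sketched in a few lines of NumPy. This is a simplified illustration, not the repository's trained retriever: each candidate term's token embeddings act as queries that cross-attend to the audio frames, and the per-term probability is reduced here to the mean of each token's maximum attention weight (the actual model learns this probability head). All function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def term_probability(audio_feats, term_embed):
    """Score one candidate term against the audio via cross-attention.

    audio_feats: (T, d) encoded audio frames (keys).
    term_embed:  (L, d) embeddings of the term's tokens (queries).
    Returns a scalar in (0, 1): a stand-in for the probability that
    the term occurs in the audio (mean of max attention per token).
    """
    d = audio_feats.shape[1]
    attn = softmax(term_embed @ audio_feats.T / np.sqrt(d), axis=-1)  # (L, T)
    return float(attn.max(axis=-1).mean())

def retrieve_topk(audio_feats, term_table, k=2):
    """Rank all candidate terms and keep the Top-k for the prompt."""
    scored = sorted(
        ((term_probability(audio_feats, emb), term)
         for term, emb in term_table.items()),
        reverse=True,
    )
    return [term for _, term in scored[:k]]
```

The retrieved terms would then be concatenated into the prompt before the speech LLM decodes, e.g. `"Possible terms: " + ", ".join(retrieve_topk(audio, table)) + "\nTranscribe the audio."`.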
Installation
A2P is implemented on top of the open-source toolkit accelerate:

```bash
pip3 install -r requirements.txt
```
Training
1. Download the data to `/pathtodata`. Be sure to change the audio paths in the JSON files to your own.
2. Download the model to `/path/pretrained-modelh`. You can also download Qwen2-Audio-Instruction and split it into the audio_tower, projector, and embedding components.
3. Run `bash ./retriever/train.sh` on an A100-SXM-80GB GPU.
4. For the dataset configuration, the `phrase_type` parameter can be adjusted to specify either word-level or phrase-level granularity. Note that the Chinese models are trained only at the phrase level, since word-level granularity is not meaningful for Chinese.
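As an illustration, a dataset configuration entry selecting phrase-level granularity might look like the fragment below. Only the `phrase_type` key is taken from the description above; the surrounding structure is a guess, so check the repository's own JSON files for the exact schema. The other valid value would select word-level granularity.

```json
{
  "phrase_type": "phrase"
}
```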
Inference
1. Same as Training steps 1-2.
2. Download the checkpoint to `ckpt`.
3. Run `python3 ./infer/infer.py --config ./infer/infer_config` on an A100-SXM-80GB GPU. You can change the settings in `infer_config.json`. Enjoy!
Citation
If you find A2P useful, please cite the paper:
@inproceedings{dy2025attention,
  title={Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System},
  author={Yangfan Du and Jun Zhang and Bin Wang and Jin Qiu and Lu Huang and Yuan Ge and Xiaoqian Liu and Tong Xiao and Jingbo Zhu},
}