fa-zh huggingface.co api & funasr fa-zh github AI Model

Introduction of fa-zh

Model Details of fa-zh

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

FunASR aims to bridge academic research and industrial applications of speech recognition. By supporting the training and fine-tuning of industrial-grade speech recognition models, it lets researchers and developers study and deploy such models more conveniently, and helps grow the speech recognition ecosystem. ASR for Fun!

Highlights

FunASR is a fundamental speech recognition toolkit that offers a variety of features, including automatic speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR. It provides convenient scripts and tutorials, and supports inference and fine-tuning of pre-trained models.
We have released a large collection of academic and industrial pre-trained models on ModelScope and Hugging Face, which can be accessed through our Model Zoo. The representative Paraformer-large, a non-autoregressive end-to-end speech recognition model, offers high accuracy, high efficiency, and convenient deployment, supporting the rapid construction of speech recognition services. For more details on service deployment, please refer to the service deployment document.

Installation

pip3 install -U funasr

Or install from source code

git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./

Install modelscope for the pretrained models (Optional)

pip3 install -U modelscope
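
After installing, it can help to confirm which versions ended up in the active environment. The following is a minimal sketch using only the Python standard library; the helper name installed_version is illustrative and not part of FunASR or ModelScope.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages installed in the steps above.
for name in ("funasr", "modelscope"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```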

Model Zoo

FunASR has open-sourced a large number of models pre-trained on industrial data. You are free to use, copy, modify, and share FunASR models under the Model License Agreement. Some representative models are listed below; for more models, please refer to the Model Zoo.

(Note: 🤗 represents the Huggingface model zoo link, ⭐ represents the ModelScope model zoo link)

Model Name	Task Details	Training Data	Parameters
paraformer-zh ( ⭐ 🤗 )	speech recognition, with timestamps, non-streaming	60000 hours, Mandarin	220M
paraformer-zh-streaming ( ⭐ 🤗 )	speech recognition, streaming	60000 hours, Mandarin	220M
paraformer-en ( ⭐ 🤗 )	speech recognition, with timestamps, non-streaming	50000 hours, English	220M
conformer-en ( ⭐ 🤗 )	speech recognition, non-streaming	50000 hours, English	220M
ct-punc ( ⭐ 🤗 )	punctuation restoration	100M, Mandarin and English	1.1G
fsmn-vad ( ⭐ 🤗 )	voice activity detection	5000 hours, Mandarin and English	0.4M
fa-zh ( ⭐ 🤗 )	timestamp prediction	5000 hours, Mandarin	38M
cam++ ( ⭐ 🤗 )	speaker verification/diarization	5000 hours	7.2M

Quick Start

Below is a quick start tutorial. Test audio files are available for Mandarin and English.

Command-line usage

funasr +model=paraformer-zh +vad_model="fsmn-vad" +punc_model="ct-punc" +input=asr_example_zh.wav

Note: Recognition of a single audio file is supported, as well as a file list in Kaldi-style wav.scp format, where each line has the form: wav_id wav_path
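
A Kaldi-style wav.scp is a plain text file with one utterance per line: an ID, a space, and the path to the audio. A minimal sketch of generating such a file in Python follows. The helper name write_wav_scp and the choice of the file stem as the utterance ID are assumptions made for illustration, not part of FunASR.

```python
from pathlib import Path


def write_wav_scp(wav_paths, scp_path):
    """Write a Kaldi-style wav.scp with one `wav_id wav_path` pair per line.

    Hypothetical helper: the utterance ID is taken from the file stem,
    and duplicate IDs are rejected because each ID must be unique.
    """
    lines = []
    seen = set()
    for path in map(Path, wav_paths):
        wav_id = path.stem
        if wav_id in seen:
            raise ValueError(f"duplicate utterance id: {wav_id}")
        seen.add(wav_id)
        lines.append(f"{wav_id} {path}")
    Path(scp_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

The resulting file would then be given as the input in place of a single audio path.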

Speech Recognition (Non-streaming)

from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-zh", model_revision="v2.0.4",
                  vad_model="fsmn-vad", vad_model_revision="v2.0.4",
                  punc_model="ct-punc-c", punc_model_revision="v2.0.4",
                  # spk_model="cam++", spk_model_revision="v2.0.2",
                  )
res = model.generate(input=f"{model.model_path}/example/asr_example.wav", 
                     batch_size_s=300, 
                     hotword='魔搭')
print(res)

Note: model_hub specifies the model repository: ms selects download from ModelScope, and hf selects download from Huggingface.

Speech Recognition (Streaming)

import os

import soundfile
from funasr import AutoModel

chunk_size = [0, 10, 5]  # [0, 10, 5] -> 600 ms, [0, 8, 4] -> 480 ms
encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

model = AutoModel(model="paraformer-zh-streaming", model_revision="v2.0.4")

wav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # samples per chunk: 10 * 960 = 9600 (600 ms at 16 kHz)

cache = {}  # streaming state carried across calls
# Ceiling division, so an exact multiple does not produce an empty trailing chunk
total_chunk_num = (len(speech) - 1) // chunk_stride + 1
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1  # forces output of the last word
    res = model.generate(
        input=speech_chunk,
        cache=cache,
        is_final=is_final,
        chunk_size=chunk_size,
        encoder_chunk_look_back=encoder_chunk_look_back,
        decoder_chunk_look_back=decoder_chunk_look_back,
    )
    print(res)

Note: chunk_size configures the streaming latency. [0, 10, 5] means that the real-time display granularity is 10*60=600ms and the lookahead is 5*60=300ms. Each inference input is 600ms of audio (16000*0.6=9600 sample points), and the output is the corresponding text. For the last speech segment, is_final=True must be set to force the output of the last word.
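
The arithmetic in this note can be checked with a few lines of Python. This is a sketch under the assumptions stated in the note (60 ms per chunk_size unit, 16 kHz audio); the function name streaming_latency is illustrative and not part of FunASR.

```python
FRAME_MS = 60        # duration of one chunk_size unit, as stated in the note
SAMPLE_RATE = 16000  # assumed sampling rate in Hz


def streaming_latency(chunk_size, frame_ms=FRAME_MS, sample_rate=SAMPLE_RATE):
    """Return (display granularity ms, lookahead ms, samples per chunk)."""
    _, chunk, lookahead = chunk_size
    granularity_ms = chunk * frame_ms
    lookahead_ms = lookahead * frame_ms
    samples_per_chunk = sample_rate * granularity_ms // 1000
    return granularity_ms, lookahead_ms, samples_per_chunk


print(streaming_latency([0, 10, 5]))  # (600, 300, 9600)
print(streaming_latency([0, 8, 4]))   # (480, 240, 7680)
```

The 9600 samples per chunk match chunk_size[1] * 960 in the streaming example.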

Voice Activity Detection (Non-Streaming)

from funasr import AutoModel

model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")
wav_file = f"{model.model_path}/example/asr_example.wav"
res = model.generate(input=wav_file)
print(res)

Voice Activity Detection (Streaming)

import soundfile
from funasr import AutoModel

chunk_size = 200  # ms
model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")

wav_file = f"{model.model_path}/example/vad_example.wav"
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = int(chunk_size * sample_rate / 1000)  # samples per chunk

cache = {}  # streaming state carried across calls
# Ceiling division, so an exact multiple does not produce an empty trailing chunk
total_chunk_num = (len(speech) - 1) // chunk_stride + 1
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size)
    # Print only when this chunk produced a detection result
    if len(res[0]["value"]):
        print(res)
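
Both streaming examples follow the same pattern: slice the signal into fixed-size chunks and flag the last one so the model can flush its state. A minimal sketch of that pattern, independent of FunASR, is shown below. iter_chunks is a hypothetical helper name, and a plain list stands in for the audio samples.

```python
def iter_chunks(samples, chunk_stride):
    """Yield (chunk, is_final) pairs that cover `samples` in order.

    Uses ceiling division so a length that is an exact multiple of
    chunk_stride does not produce an extra empty chunk. An empty input
    still yields one (empty, final) chunk so the caller can flush.
    """
    if chunk_stride <= 0:
        raise ValueError("chunk_stride must be positive")
    total = max(1, (len(samples) - 1) // chunk_stride + 1)
    for i in range(total):
        chunk = samples[i * chunk_stride:(i + 1) * chunk_stride]
        yield chunk, i == total - 1


signal = list(range(10))  # stand-in for audio samples
for chunk, is_final in iter_chunks(signal, 4):
    print(len(chunk), is_final)
```

In the examples above, each yielded chunk would be passed to model.generate together with the shared cache and the is_final flag.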

Punctuation Restoration

from funasr import AutoModel

model = AutoModel(model="ct-punc", model_revision="v2.0.4")
res = model.generate(input="那今天的会就到这里吧 happy new year 明年见")
print(res)

Timestamp Prediction

from funasr import AutoModel

model = AutoModel(model="fa-zh", model_revision="v2.0.4")
wav_file = f"{model.model_path}/example/asr_example.wav"
text_file = f"{model.model_path}/example/text.txt"
res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
print(res)

For more examples, refer to the docs.


More Information About the fa-zh Model on huggingface.co

For the fa-zh license, visit:

https://choosealicense.com/licenses/model-license

fa-zh on huggingface.co

The funasr fa-zh model is hosted on huggingface.co, where it can be tried out directly. huggingface.co offers a free trial of the fa-zh model as well as paid use. The model can be called through an API from Node.js, Python, or HTTP.

fa-zh URL on huggingface.co

https://huggingface.co/funasr/fa-zh



Other models from funasr on huggingface.co

Model	Total runs	Run growth	Growth rate	Updated
funasr/campplus	754	-40	-5.31%	February 01 2024
funasr/paraformer-zh	453	158	34.88%	February 02 2024
funasr/fsmn-vad	400	32	8.00%	February 01 2024
funasr/ct-punc	199	60	30.15%	February 01 2024
funasr/Paraformer-large	176	78	44.32%	April 23 2023
funasr/paraformer-zh-streaming	123	46	37.40%	February 02 2024
funasr/paraformer-en	16	2	12.50%	February 01 2024
funasr/conformer-en	11	-16	-145.45%	February 01 2024
funasr/ct-punc-onnx	0	0	0.00%	April 22 2023
funasr/SeACo-Paraformer-large	0	0	0.00%	February 01 2024
funasr/fsmn-vad-onnx	0	0	0.00%	May 09 2023
