Automatic speech recognition (ASR) model for English transcription as well as translation
OpenAI’s Whisper ASR (Automatic Speech Recognition) model is a state-of-the-art system designed for transcribing spoken language into written text. It exhibits robust performance in realistic, noisy environments, making it highly reliable for real-world applications. Specifically, it excels in long-form transcription, accurately transcribing audio clips of up to 30 seconds. Time to the first token is the encoder's latency, while time to each additional token is the decoder's latency, assuming a mean decoded length as specified below.
This model is an implementation of Whisper-Tiny-En found here.
This repository provides scripts to run Whisper-Tiny-En on Qualcomm® devices.
More details on model performance across various devices can be found here.
Profile Job summary of WhisperEncoder
--------------------------------------------------
Device: SA8255 (Proxy) (13)
Estimated Inference Time: 278.96 ms
Estimated Peak Memory Range: 0.16-50.72 MB
Compute Units: NPU (313) | Total (313)
Profile Job summary of WhisperDecoder
--------------------------------------------------
Device: SA8255 (Proxy) (13)
Estimated Inference Time: 2.21 ms
Estimated Peak Memory Range: 0.02-152.14 MB
Compute Units: NPU (447) | Total (447)
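As a rough illustration of how these figures combine, the sketch below estimates end-to-end latency as one encoder pass plus one decoder pass per generated token. The mean decoded length of 30 tokens is an assumption for illustration only, not a profiled value.
# Rough end-to-end latency estimate from the profile figures above
encoder_ms = 278.96            # WhisperEncoder estimated inference time
decoder_ms_per_token = 2.21    # WhisperDecoder estimated inference time (per token)
mean_decoded_tokens = 30       # assumed mean decoded length, for illustration only
total_ms = encoder_ms + mean_decoded_tokens * decoder_ms_per_token
print(f"Estimated end-to-end latency: {total_ms:.2f} ms")  # ~345 ms under these assumptions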
How does this work?
This export script leverages Qualcomm® AI Hub to optimize, validate, and deploy this model on-device. Let's go through each step below in detail:
Step 1: Compile model for on-device deployment
To compile a PyTorch model for on-device deployment, we first trace the model in memory using torch.jit.trace and then call the submit_compile_job API.
import torch
import qai_hub as hub
from qai_hub_models.models.whisper_tiny_en import Model
# Load the model
torch_model = Model.from_pretrained()
# Device
device = hub.Device("Samsung Galaxy S23")
# Trace model
input_shape = torch_model.get_input_spec()
sample_inputs = torch_model.sample_inputs()
pt_model = torch.jit.trace(torch_model, [torch.tensor(data[0]) for _, data in sample_inputs.items()])
# Compile model on a specific device
compile_job = hub.submit_compile_job(
    model=pt_model,
    device=device,
    input_specs=torch_model.get_input_spec(),
)
# Get target model to run on-device
target_model = compile_job.get_target_model()
Step 2: Performance profiling on cloud-hosted device
After compiling the model in step 1, it can be profiled on-device using the target_model. Note that this script runs the model on a device automatically provisioned in the cloud. Once the job is submitted, you can navigate to the provided job URL to view a variety of on-device performance metrics.
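A minimal sketch of submitting a profile job is shown below; it assumes the device and target_model objects from the step 1 snippet are still in scope.
# Profile the previously compiled model on a cloud-hosted device
profile_job = hub.submit_profile_job(
    model=target_model,
    device=device,
)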