This paper presents an efficient decoding approach for end-to-end automatic speech recognition (E2E-ASR) with large language models (LLMs). Although shallow fusion is the most common approach to incorporate language models into E2E-ASR decoding, we face two practical problems with LLMs: (1) LLM inference is computationally costly; (2) there may be a vocabulary mismatch between the ASR model and the LLM. Resolving this mismatch requires retraining the ASR model and/or the LLM, which is at best time-consuming and in many cases not feasible. We propose "delayed fusion," which applies LLM scores to ASR hypotheses with a delay during decoding and enables easier use of pre-trained LLMs in ASR tasks. This method can reduce not only the number of hypotheses scored by the LLM but also the number of LLM inference calls. It also allows re-tokenization of ASR hypotheses during decoding when the ASR model and the LLM employ different tokenizations. We demonstrate that delayed fusion provides improved decoding speed and accuracy compared with shallow fusion and N-best rescoring, using the LibriHeavy ASR corpus and three public LLMs: OpenLLaMA 3B & 7B and Mistral 7B.
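The contrast between the two fusion strategies can be sketched as follows. This is a toy illustration only, assuming a mock LLM scorer and hypothetical function names; the actual decoder, delay criterion, and scoring interpolation are defined in the paper.

```python
# Toy contrast of shallow vs. delayed fusion. All names, hypotheses, and
# scores are hypothetical; this only illustrates why delayed fusion needs
# fewer LLM inference calls.

def llm_score(text, counter):
    """Mock LLM: returns a dummy log-probability and counts inference calls."""
    counter["calls"] += 1
    return -0.1 * len(text.split())

def shallow_fusion(hypotheses, counter):
    # Shallow fusion: the LLM must score every hypothesis prefix at every
    # decoding step, so the call count grows with hypothesis length.
    scored = []
    for hyp in hypotheses:
        words = hyp.split()
        for step in range(1, len(words) + 1):
            score = llm_score(" ".join(words[:step]), counter)
        scored.append((hyp, score))
    return scored

def delayed_fusion(hypotheses, counter):
    # Delayed fusion (sketch): LLM scores are applied with a delay, only once
    # a hypothesis reaches a word boundary after re-tokenization. Distinct
    # token-level hypotheses that collapse to the same word sequence are
    # scored once, so fewer hypotheses and fewer calls reach the LLM.
    cache = {}
    scored = []
    for hyp in hypotheses:
        if hyp not in cache:
            cache[hyp] = llm_score(hyp, counter)
        scored.append((hyp, cache[hyp]))
    return scored

# Two of these word sequences coincide after re-tokenization.
hyps = ["the cat sat", "the cat sat", "the cat sang"]
c1, c2 = {"calls": 0}, {"calls": 0}
shallow_fusion(hyps, c1)
delayed_fusion(hyps, c2)
print(c1["calls"], c2["calls"])  # delayed fusion issues far fewer LLM calls
```

In this sketch the savings come from two sources that the abstract names: scoring is deferred to word boundaries rather than performed per token, and re-tokenized duplicates are scored only once.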