dMel: Speech Tokenization Made Simple

Large language models have revolutionized natural language processing by leveraging self-supervised pretraining on vast amounts of text. Inspired by this success, researchers have investigated complicated speech tokenization methods that discretize continuous speech signals so that language modeling techniques can be applied to speech data. However, existing approaches either model semantic (content) tokens, potentially losing acoustic information, or model acoustic tokens, risking the loss of semantic (content) information. Having multiple token types also complicates the architecture and requires additional pretraining. Here we show that discretizing mel-filterbank channels into discrete intensity bins produces a simple representation (dMel) that performs better than other existing speech tokenization methods. Using an LM-style transformer architecture for speech-text modeling, we comprehensively evaluate different speech tokenization methods on speech recognition (ASR) and speech synthesis (TTS). Our results demonstrate the effectiveness of dMel in achieving high performance on both tasks within a unified framework, paving the way for efficient and effective joint modeling of speech and text.

Figure 1. dMel tokenization and detokenization process.
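The idea of discretizing mel-filterbank channels into intensity bins can be illustrated with a minimal sketch. It assumes evenly spaced bins between the minimum and maximum log-mel value and nearest-bin assignment. The function names, the 16-bin setting, the 80-channel shape, and the random stand-in spectrogram are illustrative assumptions, not values taken from the text above, and the vocoder step that turns a reconstructed spectrogram back into audio is omitted.

```python
import numpy as np

def make_bin_centers(low, high, num_bins=16):
    """Evenly spaced intensity levels between low and high (inclusive).
    num_bins=16 is an illustrative choice, not a value from the text."""
    return np.linspace(low, high, num_bins)

def dmel_tokenize(log_mel, centers):
    """Map every (frame, channel) value to the index of its nearest bin center."""
    # Broadcast to shape (frames, channels, num_bins), then pick the closest bin.
    distances = np.abs(log_mel[..., None] - centers)
    return distances.argmin(axis=-1)

def dmel_detokenize(tokens, centers):
    """Replace every token index with the intensity of its bin center."""
    return centers[tokens]

# Random values stand in for a real log-mel spectrogram (5 frames, 80 channels).
rng = np.random.default_rng(0)
log_mel = rng.uniform(-10.0, 2.0, size=(5, 80))

centers = make_bin_centers(log_mel.min(), log_mel.max(), num_bins=16)
tokens = dmel_tokenize(log_mel, centers)
reconstructed = dmel_detokenize(tokens, centers)

# With evenly spaced bins, the round-trip error is at most half a bin width.
bin_width = centers[1] - centers[0]
max_error = np.abs(reconstructed - log_mel).max()
```

Each frame becomes one small integer per channel, so no learned codebook or separate tokenizer pretraining is involved in this sketch; the only parameters are the intensity range and the number of bins.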

Figure 2. Our speech reconstruction experiments compared various tokenization methods across three audio conditions: clean speech, speech with musical background noise, and speech with overlapping speakers. The results demonstrate that dMel's reconstruction performance matched ground-truth audio quality in terms of Word Error Rate (WER) for clean speech. Moreover, while all other tokenization methods failed when musical or speech noise was introduced, dMel maintained its performance.
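For readers unfamiliar with the metric, Word Error Rate is commonly computed as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and the example sentences are hypothetical, and it is not the evaluation code behind the figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

In a reconstruction experiment of this kind, the hypothesis would typically come from running a speech recognizer on the reconstructed audio, so a lower WER indicates that more of the spoken content survived tokenization.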

Md Sazzad Hossain
