Speech Studio

Early-stage speech recognition and synthesis experiments for Balochi. Beta, coming in Phases 3-4.

Phase 3-4: Speech Modules (Months 10-18)

Coming Soon

Speech is one of the most challenging areas for a low-resource language like Balochi. We are starting small: collecting initial speech samples and testing how existing models respond to Balochi audio. This is a beta effort that will improve gradually.

Automatic Speech Recognition (ASR)
Phase 4
Experimenting with OpenAI's Whisper and Meta's Wav2Vec2 models on Balochi speech recordings. This is an early-stage effort to explore how well these models can adapt to Balochi; results will depend on the quality and volume of speech data we can collect.
Base Model: Whisper (OpenAI) + Wav2Vec2 (Meta)
Data Needed: 150-200 hours of transcribed Balochi speech across 3 dialects
Data collection pending
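One common way to measure "how well these models can adapt" is word error rate (WER): the word-level edit distance between a human transcript and the model's output, divided by the number of words in the human transcript. The sketch below is a minimal, dependency-free version for illustration. The function name and the placeholder tokens (w1, w2, ...) are assumptions made for this example; they are not part of Whisper, Wav2Vec2, or this project's tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens stand in for real transcripts.
# One substitution (w2 -> wX) and one deletion (w4) over 4 reference words.
print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))  # 0.5
```

WER can exceed 1.0 when the model outputs many extra words, so it is best read as an error ratio rather than a percentage of correct words.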
Text-to-Speech (TTS)
Phase 3
Exploring a basic TTS prototype using Tacotron 2 or FastSpeech 2 with a HiFi-GAN vocoder. This is experimental: initial results may sound rough because the Balochi phoneme mapping and clean training data are still being developed.
Base Model: Tacotron 2 / FastSpeech 2 + HiFi-GAN
Data Needed: Clean single-speaker Balochi recordings + phoneme mapping tables
Data collection pending
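To illustrate what a phoneme mapping table does in a TTS front end, here is a minimal sketch of a greedy longest-match lookup that turns a written word into a phoneme sequence. The table below is a toy placeholder invented for this example and is not a real Balochi phoneme inventory; the names TOY_G2P and to_phonemes are likewise hypothetical.

```python
# Toy table for illustration only: NOT a real Balochi phoneme inventory.
TOY_G2P = {
    "sh": "ʃ",
    "ch": "tʃ",
    "s": "s",
    "h": "h",
    "a": "a",
}


def to_phonemes(word: str, table: dict[str, str]) -> list[str]:
    """Map a word to phonemes, preferring the longest matching spelling."""
    longest = max(map(len, table), default=0)
    phonemes = []
    i = 0
    while i < len(word):
        # Try the longest candidate first so "sh" wins over "s" + "h".
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in table:
                phonemes.append(table[chunk])
                i += size
                break
        else:
            # Fail loudly so gaps in the table are found during data prep.
            raise KeyError(f"no mapping for {word[i]!r} at position {i} in {word!r}")
    return phonemes


print(to_phonemes("shach", TOY_G2P))  # ['ʃ', 'a', 'tʃ']
```

Raising an error on unmapped characters, rather than skipping them, is a deliberate choice in this sketch: silent gaps in the mapping would otherwise surface later as mispronunciations in synthesized audio.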
Dialect Coverage Plan

Southern Balochi
Makran, coastal areas
~8M speakers
Primary focus

Eastern Balochi
Punjab, Sindh borders
~3M speakers
Planned

Western Balochi
Iran, Afghanistan
~4M speakers
Planned
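As a rough sketch of how collection progress could be tracked against the 150-200 hour ASR goal, the snippet below sums clip durations per dialect and reports how many hours remain to reach the lower bound. The record format (dialect name, duration in seconds), the function names, and the sample durations are all assumptions made up for this example; no data has been collected yet.

```python
from collections import defaultdict

TARGET_HOURS_MIN = 150  # lower bound of the 150-200 hour ASR data goal


def hours_by_dialect(clips):
    """Sum clip durations (given in seconds) per dialect, in hours."""
    totals = defaultdict(float)
    for dialect, seconds in clips:
        totals[dialect] += seconds / 3600
    return dict(totals)


def remaining_hours(clips, target=TARGET_HOURS_MIN):
    """Hours still needed to reach the target, never below zero."""
    collected = sum(hours_by_dialect(clips).values())
    return max(0.0, target - collected)


# Made-up sample records: (dialect, clip duration in seconds).
sample_clips = [
    ("Southern Balochi", 5400),
    ("Eastern Balochi", 1800),
    ("Southern Balochi", 3600),
]
print(hours_by_dialect(sample_clips))  # {'Southern Balochi': 2.5, 'Eastern Balochi': 0.5}
print(remaining_hours(sample_clips))   # 147.0
```

Keeping a per-dialect total makes it easy to see whether the planned dialects are falling behind the primary focus as recordings come in.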

Note: Speech tools are in an early research phase. Quality will depend on the volume of Balochi speech data we can collect and annotate. Initial results may be limited but will improve over time as the dataset grows.