Speech Studio
Early-stage speech recognition and synthesis experiments for Balochi (beta, coming in Phases 3-4).
Phases 3-4: Speech Modules (Months 10-18)
Coming Soon
Speech is one of the most challenging areas for a low-resource language like Balochi, because little transcribed Balochi speech is available to train on. We are starting small: collecting initial speech samples and testing how existing models respond to Balochi audio. This is a beta effort that will improve gradually.
Automatic Speech Recognition (ASR)
Phase 4
Experimenting with OpenAI's Whisper and Meta's Wav2Vec2 models on Balochi speech recordings. This early-stage effort explores how well these models can be adapted to Balochi; results will depend on the quality and volume of speech data we can collect.
Base Model: Whisper (OpenAI) + Wav2Vec2 (Meta)
Data Needed: 150-200 hours of transcribed Balochi speech across 3 dialects
Data collection pending
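The page does not name an evaluation metric, but one common way to check how closely Whisper or Wav2Vec2 transcripts match reference Balochi transcripts is word error rate (WER): the word-level edit distance divided by the number of reference words. A minimal sketch, assuming whitespace-tokenized transcripts; `word_error_rate` is a hypothetical helper, not part of either model's tooling, and the placeholder tokens stand in for real Balochi words:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Row for the empty reference prefix: producing hyp[:j] takes j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Placeholder tokens stand in for Balochi words.
print(word_error_rate("a b c d", "a x c"))  # 0.5: one substitution + one deletion over 4 words
```

WER can exceed 1.0 when a model inserts many extra words, which is worth expecting from models that have seen little Balochi audio.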
Text-to-Speech (TTS)
Phase 3
Exploring a basic TTS prototype using Tacotron 2 or FastSpeech 2 with a HiFi-GAN vocoder. This is experimental: initial results may sound rough while Balochi phoneme mapping and clean training data are still being developed.
Base Model: Tacotron 2 / FastSpeech 2 + HiFi-GAN
Data Needed: Clean single-speaker Balochi recordings + phoneme mapping tables
Data collection pending
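A phoneme mapping table can be applied as a grapheme-to-phoneme lookup using greedy longest match, so that multi-letter graphemes take priority over single letters. A minimal sketch, assuming a hypothetical Latin romanization; the entries in `PHONEME_TABLE` and the helper `to_phonemes` are illustrative placeholders, not a validated Balochi phoneme inventory:

```python
# HYPOTHETICAL romanized grapheme -> IPA entries for illustration only;
# a real Balochi table would come from linguists and cover the full script.
PHONEME_TABLE = {
    "sh": "ʃ", "ch": "tʃ", "aa": "aː",
    "a": "a", "b": "b", "d": "d", "i": "i", "k": "k", "l": "l",
    "m": "m", "n": "n", "o": "o", "r": "r", "s": "s", "t": "t", "u": "u",
}


def to_phonemes(word: str, table: dict = PHONEME_TABLE) -> list:
    """Map a word to phonemes using greedy longest match over the table keys."""
    word = word.lower()
    max_len = max(len(g) for g in table)
    phonemes = []
    i = 0
    while i < len(word):
        # Try the longest grapheme first so "sh" is not split into "s" + "h".
        for size in range(min(max_len, len(word) - i), 0, -1):
            grapheme = word[i:i + size]
            if grapheme in table:
                phonemes.append(table[grapheme])
                i += size
                break
        else:
            # No table entry matched: surface the gap instead of guessing.
            raise KeyError(f"no phoneme mapping for {word[i]!r} at position {i}")
    return phonemes


print(to_phonemes("shaar"))  # made-up input string, not a claimed Balochi word
```

Raising on unmapped characters, rather than skipping them, makes gaps in the table visible before they silently degrade TTS training data.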
Dialect Coverage Plan
Southern Balochi: Makran and coastal areas, ~8M speakers (primary focus)
Eastern Balochi: Punjab and Sindh border areas, ~3M speakers (planned)
Western Balochi: Iran and Afghanistan, ~4M speakers (planned)
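To track collection against the 150-200 hour ASR target across the three dialects, a simple tally over a clip manifest is enough. A minimal sketch, assuming each clip is recorded as a hypothetical `(dialect, duration_seconds, transcript)` tuple and that clips without a transcript do not yet count; `hours_by_dialect` and `progress` are illustrative names, and the manifest values are placeholders, not real collection figures:

```python
TARGET_HOURS = (150, 200)  # total hours across all dialects, per the ASR "Data Needed" line
DIALECTS = ("Southern", "Eastern", "Western")


def hours_by_dialect(clips):
    """Sum transcribed audio per dialect, in hours.

    clips: iterable of (dialect, duration_seconds, transcript) tuples
    (a hypothetical manifest format).
    """
    totals = {dialect: 0.0 for dialect in DIALECTS}
    for dialect, seconds, transcript in clips:
        if dialect not in totals:
            raise ValueError(f"unknown dialect: {dialect!r}")
        if not transcript.strip():
            continue  # untranscribed audio does not count toward the ASR target
        totals[dialect] += seconds / 3600
    return totals


def progress(totals, target=TARGET_HOURS):
    """Return (total hours, fraction of the lower target, whether the lower target is met)."""
    total = sum(totals.values())
    low, _high = target
    return total, total / low, total >= low


manifest = [
    ("Southern", 7200, "<transcript>"),
    ("Eastern", 3600, "<transcript>"),
    ("Western", 1800, ""),  # recorded but not yet transcribed
]
totals = hours_by_dialect(manifest)
print(totals)            # {'Southern': 2.0, 'Eastern': 1.0, 'Western': 0.0}
print(progress(totals))  # (3.0, 0.02, False)
```

Reporting per-dialect totals alongside the overall figure makes it easy to see whether the primary-focus dialect is outpacing the planned ones.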
Note: Speech tools are in an early research phase. Their quality will depend on how much Balochi speech data we can collect and annotate. Initial results may be limited but should improve over time as the dataset grows.