Speech recognition that understands how the world talks.
Kundoo transcribes 30+ languages with native-level accuracy, handling the code-mixing, accents, and noisy environments that break other models.
Why Kundoo
Code-mixed fluency
Handles Hindi-English, Tamil-English, and other mixed-language conversations natively, with no separate language-detection step needed.
Real-time streaming
WebSocket-based streaming API with sub-second latency. Transcribe live calls, meetings, and voice agents.
Noisy environments
Trained on recordings from factory floors, call centers, and outdoor settings. Performs where studio-trained models fail.
30+ languages
Hindi, Tamil, Telugu, Bengali, Kannada, Malayalam, Arabic, Japanese, and more, all through the same API call.
Custom vocabulary
Add domain-specific terms such as product names, medical terms, and part numbers to improve accuracy for your use case.
Speaker diarization
Identify who said what in multi-speaker conversations. Essential for call center analytics and meeting transcription.
From audio to structured text.
Audio input
- PCM, WAV, MP3, FLAC, OGG supported
- Upload files or stream via WebSocket
- Automatic sample rate detection
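As a rough illustration of the audio input step, the sketch below reads the sample rate from a WAV header with Python's standard wave module and splits the raw PCM into fixed-duration chunks, the way a streaming client might before sending audio over a WebSocket. It is a client-side sketch only, not Kundoo's implementation: the helper names and the 100 ms chunk size are assumptions, and no Kundoo endpoint is called.

```python
import io
import wave
from typing import Iterator, Tuple


def read_wav(data: bytes) -> Tuple[dict, bytes]:
    """Read audio parameters and raw PCM frames from an in-memory WAV file."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        info = {
            "sample_rate_hz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
        }
        pcm = wav.readframes(wav.getnframes())
    return info, pcm


def iter_pcm_chunks(pcm: bytes, info: dict, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration slices of PCM; the last slice may be shorter.

    chunk_ms is an assumed value, not a documented Kundoo requirement.
    """
    bytes_per_frame = info["sample_width_bytes"] * info["channels"]
    frames_per_chunk = max(1, info["sample_rate_hz"] * chunk_ms // 1000)
    step = frames_per_chunk * bytes_per_frame
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


def make_silent_wav(sample_rate_hz: int, seconds: float) -> bytes:
    """Build a mono 16-bit WAV of silence, used here only as demo input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * int(sample_rate_hz * seconds))
    return buffer.getvalue()


if __name__ == "__main__":
    info, pcm = read_wav(make_silent_wav(8000, 0.5))
    chunks = list(iter_pcm_chunks(pcm, info))
    print(info["sample_rate_hz"], len(chunks))
```

Reading the rate from the header means the caller never has to declare it, which is the behavior the list above describes for WAV input. Headerless PCM carries no such metadata, so a real client would presumably need to state the rate explicitly in that case.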
Language detection
- Auto-detect from 30+ languages
- Force a specific language if known
- Handle code-mixed input gracefully
Transcription
- GPU-accelerated inference
- Real-time streaming output
- Punctuation and capitalization
Structured output
- JSON with timestamps per segment
- Speaker labels when diarization is enabled
- Confidence scores per word
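To make the structured output concrete, here is a minimal sketch of how a client might consume it. The JSON shape is an assumption made for illustration: the field names (segments, start, end, speaker, words, text, confidence) and the sample values are invented, not taken from Kundoo's API reference.

```python
import json
from typing import List

# Hypothetical response; field names and values are invented for illustration.
SAMPLE_RESPONSE = """
{
  "segments": [
    {
      "start": 0.0, "end": 1.8, "speaker": "S1",
      "words": [
        {"text": "Order", "confidence": 0.97},
        {"text": "status", "confidence": 0.93},
        {"text": "batao", "confidence": 0.58}
      ]
    },
    {
      "start": 2.1, "end": 3.0, "speaker": "S2",
      "words": [
        {"text": "Sure", "confidence": 0.99}
      ]
    }
  ]
}
"""


def format_transcript(response: dict) -> List[str]:
    """Render one line per segment: time range, speaker label, text."""
    lines = []
    for segment in response["segments"]:
        text = " ".join(word["text"] for word in segment["words"])
        # Speaker labels are assumed to be absent when diarization is off.
        speaker = segment.get("speaker", "unknown")
        lines.append(f"[{segment['start']:.1f}-{segment['end']:.1f}] {speaker}: {text}")
    return lines


def low_confidence_words(response: dict, threshold: float = 0.6) -> List[str]:
    """Collect words whose confidence falls below the given threshold."""
    return [
        word["text"]
        for segment in response["segments"]
        for word in segment["words"]
        if word["confidence"] < threshold
    ]


if __name__ == "__main__":
    response = json.loads(SAMPLE_RESPONSE)
    print("\n".join(format_transcript(response)))
    print(low_confidence_words(response))
```

Per-word confidence scores are useful for routing: a pipeline could, for example, send segments containing low-confidence words to human review while passing the rest through automatically. The 0.6 threshold here is arbitrary.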
Built for real conversations.
Call center analytics
Transcribe customer calls in real time. Extract intent, sentiment, and compliance violations automatically.
Voice agents
Power conversational AI that understands mixed-language customer queries across phone and WhatsApp.
Meeting transcription
Capture every word in multilingual team meetings. Search, summarize, and share transcripts.
Factory floor logging
Transcribe verbal reports from production workers in noisy environments. Feed structured data into your manufacturing execution system (MES).
