SpeechKit lives in its own repository, kombify-SpeechKit, and is a separate component from the main kombify-AI service.
Two modes of operation
SpeechKit provides two distinct voice interaction modes:
- Assist Mode
- Voice Agent Mode
In Assist Mode, SpeechKit runs the standard STT/TTS pipeline for kombify AI Companions.
Flow: Microphone input → STT transcription → Companion processes request → TTS speech output
- Push-to-talk or voice activity detection (VAD)
- Audio visualization with waveforms
- Works with any configured Companion
- Provider hot-switching (change STT provider without restart)
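Voice activity detection can be as simple as comparing each audio frame's energy against a threshold. The Go sketch below shows that generic technique on 16-bit PCM frames; it is not necessarily how SpeechKit implements VAD, and the `EnergyVAD` type and its threshold and hangover values are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// rms returns the root-mean-square amplitude of a frame of 16-bit PCM samples.
func rms(frame []int16) float64 {
	if len(frame) == 0 {
		return 0
	}
	var sum float64
	for _, s := range frame {
		v := float64(s)
		sum += v * v
	}
	return math.Sqrt(sum / float64(len(frame)))
}

// EnergyVAD is a simple energy-threshold voice activity detector.
// Threshold and Hangover are illustrative values, not SpeechKit defaults.
type EnergyVAD struct {
	Threshold float64 // RMS level at or above which a frame counts as speech
	Hangover  int     // quiet frames tolerated before speech is considered ended
	silent    int
	active    bool
}

// Process classifies one frame and reports whether speech is currently active.
func (v *EnergyVAD) Process(frame []int16) bool {
	if rms(frame) >= v.Threshold {
		v.active = true
		v.silent = 0
	} else if v.active {
		v.silent++
		if v.silent > v.Hangover {
			v.active = false
			v.silent = 0
		}
	}
	return v.active
}

// constFrame builds a frame of n identical samples (a stand-in for real audio).
func constFrame(n int, value int16) []int16 {
	f := make([]int16, n)
	for i := range f {
		f[i] = value
	}
	return f
}

func main() {
	vad := &EnergyVAD{Threshold: 500, Hangover: 2}
	frames := [][]int16{
		constFrame(160, 10),   // quiet
		constFrame(160, 3000), // loud: speech starts
		constFrame(160, 10),   // pause, within hangover
		constFrame(160, 10),   // pause, within hangover
		constFrame(160, 10),   // hangover exceeded: speech ends
	}
	for i, f := range frames {
		fmt.Println(i, vad.Process(f))
	}
}
```

The hangover keeps short pauses between words from cutting an utterance off. Production detectors often add noise-floor adaptation or a model-based classifier on top of a plain energy threshold.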
Supported STT providers
SpeechKit supports six speech-to-text providers. You can switch between them at runtime without restarting the service.
| Provider | Type | Latency | Notes |
|---|---|---|---|
| Azure Speech | Cloud | Low | Microsoft Cognitive Services |
| Google Cloud STT | Cloud | Low | Google Cloud Speech-to-Text |
| Deepgram | Cloud | Very low | Optimized for real-time streaming |
| OpenAI Whisper API | Cloud | Medium | OpenAI hosted Whisper |
| Groq Whisper | Cloud | Low | Groq-accelerated Whisper inference |
| Faster Whisper | Local | Medium | Runs entirely on your machine, no cloud needed |
Text-to-speech
SpeechKit uses Qwen3-TTS via the Kokoro pipeline for text-to-speech. It runs locally on your machine, so no cloud API is required.
Architecture
SpeechKit runs as a self-contained service built on the following stack:
- Go 1.25 (backend HTTP server, WebSocket handling via gorilla/websocket)
- React 19 + Vite 6 (frontend UI)
- WebRTC for Gemini Live API connectivity
Configuration
API keys (BYOK)
SpeechKit uses the BYOK (Bring Your Own Keys) model: you provide API keys for whichever cloud STT providers you want to use.
Select an STT provider
Choose your preferred provider from the dropdown. You can switch providers at any time without restarting.
Provider selection
You can hot-switch between STT providers during a session: SpeechKit routes audio to the currently selected provider without requiring a restart.
Quick start
Assist Mode
Start SpeechKit
Launch the SpeechKit service. The Go backend starts on port 8787 and serves the React frontend.
Configure a provider
Select an STT provider and enter your API key (or choose Faster Whisper for local processing).
Voice Agent Mode
Platform support
| Platform | Status |
|---|---|
| Windows | Production-ready |
| Linux | Planned |
SpeechKit is currently Windows-first. Linux support is on the roadmap.
Further reading
Voice interaction
User-facing voice features in kombify AI
BYOK setup
Configure your own API keys for AI providers
