kombify SpeechKit is the speech-to-text (STT) and text-to-speech (TTS) framework that powers voice interaction in kombify AI. It runs as a standalone service with a Go backend and React frontend, connecting to your Companions and the Voice Agent pipeline.
SpeechKit lives in its own repository: kombify-SpeechKit. It is a separate component from the main kombify-AI service.

Two modes of operation

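SpeechKit provides two distinct voice interaction modes: Assist Mode, described here, and Voice Agent Mode, which uses the Gemini Live API for real-time bidirectional audio.
Assist Mode is the standard STT/TTS pipeline for kombify AI Companions.
Flow: Microphone input → STT transcription → Companion processes request → TTS speech output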
  • Push-to-talk or voice activity detection (VAD)
  • Audio visualization with waveforms
  • Works with any configured Companion
  • Provider hot-switching (change STT provider without restart)
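
The flow above can be sketched as three pluggable stages. This is a minimal illustration only: `AssistPipeline`, `handle_utterance`, and the stub stages are hypothetical names, not SpeechKit's actual API, and the real service is implemented in Go.

```python
from dataclasses import dataclass
from typing import Callable

# All names here are hypothetical stand-ins, not SpeechKit's actual API.
Transcriber = Callable[[bytes], str]   # STT: audio bytes -> text
Companion = Callable[[str], str]       # Companion: request text -> reply text
Synthesizer = Callable[[str], bytes]   # TTS: reply text -> audio bytes


@dataclass
class AssistPipeline:
    transcribe: Transcriber
    companion: Companion
    synthesize: Synthesizer

    def handle_utterance(self, audio: bytes) -> bytes:
        """Run one microphone utterance through STT -> Companion -> TTS."""
        text = self.transcribe(audio)
        if not text.strip():
            return b""  # nothing recognized, so skip the Companion call
        reply = self.companion(text)
        return self.synthesize(reply)


# Stub stages so the sketch runs without any real provider.
pipeline = AssistPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    companion=lambda text: f"You said: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
```

Because each stage is just a callable, swapping the STT stage for a different provider does not affect the Companion or TTS stages.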

Supported STT providers

SpeechKit supports six speech-to-text providers. You can switch between them at runtime without restarting the service.
Provider | Type | Latency | Notes
--- | --- | --- | ---
Azure Speech | Cloud | Low | Microsoft Cognitive Services
Google Cloud STT | Cloud | Low | Google Cloud Speech-to-Text
Deepgram | Cloud | Very low | Optimized for real-time streaming
OpenAI Whisper API | Cloud | Medium | OpenAI hosted Whisper
Groq Whisper | Cloud | Low | Groq-accelerated Whisper inference
Faster Whisper | Local | Medium | Runs entirely on your machine, no cloud needed
For a fully local setup with no cloud dependencies, use Faster Whisper for STT and Qwen3-TTS for TTS.
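
As a rough illustration, the provider list can be modeled as a small registry that is filtered by type or latency. The provider names, types, and latency labels come from the list above; the `SttProvider` class and helper functions are hypothetical and do not reflect how SpeechKit stores this data internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SttProvider:
    name: str
    kind: str     # "cloud" or "local"
    latency: str  # label as documented: "very low", "low", or "medium"


# Values transcribed from the documented provider list.
PROVIDERS = [
    SttProvider("Azure Speech", "cloud", "low"),
    SttProvider("Google Cloud STT", "cloud", "low"),
    SttProvider("Deepgram", "cloud", "very low"),
    SttProvider("OpenAI Whisper API", "cloud", "medium"),
    SttProvider("Groq Whisper", "cloud", "low"),
    SttProvider("Faster Whisper", "local", "medium"),
]


def names_by_kind(providers: List[SttProvider], kind: str) -> List[str]:
    """Return provider names of the given kind, in list order."""
    return [p.name for p in providers if p.kind == kind]


def names_by_latency(providers: List[SttProvider], latency: str) -> List[str]:
    """Return provider names with the given latency label, in list order."""
    return [p.name for p in providers if p.latency == latency]
```

For example, `names_by_kind(PROVIDERS, "local")` picks out the only option that works without a cloud dependency.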

Text-to-speech

SpeechKit uses Qwen3-TTS via the Kokoro pipeline for text-to-speech. This runs locally on your machine — no cloud API required.

Architecture

SpeechKit runs as a self-contained service with two components:
┌──────────────────────────────────┐
│         React Frontend           │
│       (Vite 6 / React 19)        │
│  Audio capture, visualization,   │
│  provider selection UI           │
└──────────────┬───────────────────┘
               │ WebSocket / HTTP
┌──────────────▼───────────────────┐
│          Go Backend              │
│        (HTTP on :8787)           │
│  STT routing, TTS pipeline,      │
│  WebRTC (Voice Agent Mode)       │
└──────────────┬───────────────────┘
               │
    ┌──────────▼──────────┐
    │   STT Providers     │
    │   TTS (Qwen3/Kokoro)│
    │   Gemini Live API   │
    └─────────────────────┘
Tech stack:
  • Go 1.25 (backend HTTP server, WebSocket handling via gorilla/websocket)
  • React 19 + Vite 6 (frontend UI)
  • WebRTC for Gemini Live API connectivity

Configuration

API keys (BYOK)

SpeechKit uses the BYOK (Bring Your Own Keys) model. You provide API keys for whichever cloud STT providers you want to use.
1. Open the SpeechKit UI
   Navigate to the SpeechKit frontend in your browser.
2. Select an STT provider
   Choose your preferred provider from the dropdown. You can switch providers at any time without restarting.
3. Enter your API key
   Provide the API key for the selected cloud provider. For Faster Whisper (local), no key is needed.
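
The BYOK rule in these steps (cloud providers need a key, Faster Whisper does not) can be expressed as a small check. This is a sketch under assumptions: `check_selection`, its error messages, and the key-lookup dictionary are hypothetical, and SpeechKit's UI may validate keys differently.

```python
from typing import Dict, Optional

# Provider names follow the documented list; everything else is hypothetical.
CLOUD_PROVIDERS = {
    "Azure Speech",
    "Google Cloud STT",
    "Deepgram",
    "OpenAI Whisper API",
    "Groq Whisper",
}
LOCAL_PROVIDERS = {"Faster Whisper"}


def check_selection(provider: str, api_keys: Dict[str, str]) -> Optional[str]:
    """Return an error message, or None if the provider is ready to use."""
    if provider in LOCAL_PROVIDERS:
        return None  # local inference: no key needed
    if provider not in CLOUD_PROVIDERS:
        return f"unknown provider: {provider}"
    if not api_keys.get(provider, "").strip():
        return f"missing API key for {provider}"
    return None
```

For example, `check_selection("Deepgram", {})` reports a missing key, while `check_selection("Faster Whisper", {})` returns `None`.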

Provider selection

You can hot-switch between STT providers during a session. SpeechKit routes audio to the currently selected provider without requiring a restart.
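
One common way to implement this kind of hot-switching is a router that holds the active provider behind a lock, so a switch takes effect on the next audio request. The sketch below assumes that design; `SttRouter` and the provider keys are hypothetical, and the actual Go backend may work differently.

```python
import threading
from typing import Callable, Dict

Transcriber = Callable[[bytes], str]  # hypothetical provider signature


class SttRouter:
    """Routes audio to whichever provider is currently selected."""

    def __init__(self, providers: Dict[str, Transcriber], active: str) -> None:
        if active not in providers:
            raise KeyError(active)
        self._providers = dict(providers)
        self._active = active
        self._lock = threading.Lock()

    def switch(self, name: str) -> None:
        """Select a different provider; no restart or re-registration needed."""
        if name not in self._providers:
            raise KeyError(name)
        with self._lock:
            self._active = name

    def transcribe(self, audio: bytes) -> str:
        # Read the active provider under the lock, then call it outside the
        # lock so a slow transcription does not block a concurrent switch.
        with self._lock:
            provider = self._providers[self._active]
        return provider(audio)


# Stub providers so the sketch runs without network access.
router = SttRouter(
    {
        "deepgram": lambda audio: "cloud transcript",
        "faster-whisper": lambda audio: "local transcript",
    },
    active="deepgram",
)
```

A request that is already in flight finishes on the provider it started with; only later requests see the new selection.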

Quick start

Assist Mode

1. Start SpeechKit
   Launch the SpeechKit service. The Go backend starts on port 8787 and serves the React frontend.
2. Configure a provider
   Select an STT provider and enter your API key (or choose Faster Whisper for local processing).
3. Start speaking
   Use push-to-talk or enable VAD, then speak your request. The transcription is sent to your Companion, and the response is spoken back via Qwen3-TTS.
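
To give a feel for what VAD does in the last step, here is a simple energy-threshold detector with a short hangover period. This page does not specify which VAD algorithm SpeechKit uses, so treat this as a generic sketch: the function names, the threshold of 0.02, and the hangover length are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.02,  # illustrative value, not a SpeechKit default
    hangover: int = 2,        # quiet frames still counted as speech
) -> List[bool]:
    """Flag each frame as speech (True) or silence (False).

    A frame counts as speech if its energy reaches the threshold, or if it
    falls within `hangover` frames after the last loud frame. The hangover
    avoids cutting off the ends of words during short pauses.
    """
    flags = []
    quiet = hangover  # start as if we have already been silent for a while
    for frame in frames:
        if rms(frame) >= threshold:
            quiet = 0
            flags.append(True)
        else:
            quiet += 1
            flags.append(quiet <= hangover)
    return flags
```

Push-to-talk avoids this detection step entirely, since the user marks the start and end of speech explicitly.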

Voice Agent Mode

1. Configure Gemini API key
   Voice Agent Mode requires a Google Gemini API key for the Gemini Live API.
2. Switch to Voice Agent Mode
   Select Voice Agent Mode in the SpeechKit UI.
3. Start a conversation
   Begin speaking. The Gemini Live API provides real-time bidirectional audio; you can interrupt and redirect the conversation naturally.
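
Interruption in a bidirectional audio session usually means that queued response audio is discarded as soon as the user starts talking again. The sketch below shows that idea on the client side only. It is an assumption about how such a client might behave, with a hypothetical `PlaybackQueue` class, and it does not describe the Gemini Live API or SpeechKit's actual implementation.

```python
from collections import deque
from typing import Deque, Optional


class PlaybackQueue:
    """Holds response audio chunks that have not been played yet."""

    def __init__(self) -> None:
        self._pending: Deque[bytes] = deque()

    def enqueue(self, chunk: bytes) -> None:
        """Add a response audio chunk received from the agent."""
        self._pending.append(chunk)

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if nothing is queued."""
        return self._pending.popleft() if self._pending else None

    def interrupt(self) -> int:
        """Drop all unplayed audio when the user starts speaking.

        Returns the number of chunks that were discarded.
        """
        dropped = len(self._pending)
        self._pending.clear()
        return dropped
```

Calling `interrupt()` when new user speech is detected stops the old answer from continuing to play while the agent responds to the redirected request.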

Platform support

Platform | Status
--- | ---
Windows | Production-ready
Linux | Planned
SpeechKit is currently Windows-first. Linux support is on the roadmap.

Further reading

  • Voice interaction: User-facing voice features in kombify AI
  • BYOK setup: Configure your own API keys for AI providers