kombify AI supports voice interaction through text-to-speech (TTS) and speech-to-text (STT), enabling hands-free homelab management. The underlying voice framework is SpeechKit, which provides the full STT/TTS pipeline and real-time voice agent capabilities.

Voice capabilities

| Feature | Technology | Status |
| --- | --- | --- |
| Speech-to-text | Azure Speech, Google Cloud STT, Deepgram, OpenAI Whisper, Groq Whisper, Faster Whisper (local) | Available |
| Text-to-speech | Qwen3-TTS via Kokoro pipeline (local) | Available |
| Voice Agent Mode | Gemini Live API (real-time bidirectional) | Available |
| Speaker verification | Voice biometrics | In development |

Enabling voice

Voice interaction is available in the mobile app and web chat. Enable it in AI Settings > Voice.

Mobile app

The mobile app is designed voice-first. Tap the microphone button to start speaking, or enable always-listening mode for hands-free operation.

Web chat

In the web chat at chat.kombify.io, click the microphone icon next to the text input field.

Use cases

  • Hands-free monitoring — “What is the status of my servers?”
  • Quick commands — “Restart the Traefik container on server-1”
  • Troubleshooting — “Why is my NAS running slow?”
  • Smart home control — “Turn off the office lights” (via Smart Home Companion)

Voice processing happens locally in self-hosted mode. In SaaS mode, audio is processed securely and not stored after transcription. With Faster Whisper (STT) and Qwen3-TTS, you can run a fully local voice pipeline with no cloud dependencies.
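
Conceptually, a local voice round trip is record → speech-to-text → command routing → text-to-speech. The sketch below illustrates that flow only; the `transcribe` and `speak` stubs are hypothetical stand-ins for real Faster Whisper and Qwen3-TTS calls, and the intent matching is not kombify's actual routing logic:

```python
# Illustrative sketch of a local voice round trip. The stubs below stand in
# for real Faster Whisper (STT) and Qwen3-TTS (TTS) calls.

def transcribe(audio: bytes) -> str:
    """Stub for a local STT call (e.g. Faster Whisper)."""
    return "restart the traefik container on server-1"

def route_command(text: str) -> str:
    """Match the transcript against a known intent (illustrative only)."""
    if "restart" in text and "container" in text:
        return "Restarting the container now."
    return "Sorry, I didn't catch that."

def speak(text: str) -> None:
    """Stub for a local TTS call (e.g. Qwen3-TTS); prints instead."""
    print(text)

speak(route_command(transcribe(b"")))  # prints "Restarting the container now."
```

In a self-hosted deployment, every stage of this loop can run on your own hardware, which is what makes the no-cloud-dependency pipeline possible.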

Further reading

Companions

Configure which Companion responds to voice commands

Mobile app

Set up the mobile app for voice-first interaction

SpeechKit

Technical details on the STT/TTS framework powering voice