Open Source Voice Interface

Kiwi Voice

ML wake word detection, speaker identification, voice‑gated security, 5 TTS engines, 15 languages, and a real‑time web dashboard — for your own AI stack.

How it works

Kiwi Voice turns your OpenClaw agent into a hands-free assistant. It captures audio from your microphone (or directly from the browser), detects the wake word, transcribes speech locally, identifies who is speaking, enforces security policies, sends the command to any LLM through OpenClaw's WebSocket gateway, and speaks the response back — all in a continuous loop.

You:  "Kiwi, turn on the lights in the bedroom"

Kiwi: [identifies speaker as Owner → full access]
      [sends to OpenClaw → routes to Home Assistant]
      "Done, the bedroom lights are on."

Think Alexa or Siri, but self-hosted, privacy-first, and plugged into your own AI stack.
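The loop above (wake word → transcribe → identify → gate → respond) can be sketched in a few lines. This is an illustrative stand-in, not the actual kiwi-voice API: the real pipeline works on streamed audio and pyannote voice embeddings, while here "audio" is just text and the speaker is hard-coded.

```python
def detect_wake_word(audio: str, wake: str = "kiwi") -> bool:
    # Real kiwi-voice uses OpenWakeWord (ML) or fuzzy text match; this is exact-prefix only.
    return audio.lower().startswith(wake)

def identify_speaker(audio: str) -> str:
    # Stand-in: real speaker ID compares pyannote embeddings against enrolled voices.
    return "owner"

# Priority levels as described in the pipeline; None means the command is dropped.
ACCESS = {"owner": "full", "friend": "limited", "guest": "ask", "blocked": None}

def handle(audio):
    """One turn of the loop: gate on wake word and speaker, then forward the command."""
    if audio is None or not detect_wake_word(audio):
        return None                               # no wake word -> keep listening
    command = audio.split(",", 1)[-1].strip()     # strip the wake-word prefix
    role = identify_speaker(audio)
    if ACCESS.get(role) is None:
        return None                               # blocked speakers are ignored
    return f"[{role}:{ACCESS[role]}] -> {command}"

print(handle("Kiwi, turn on the lights in the bedroom"))
# -> [owner:full] -> turn on the lights in the bedroom
```

In the real system the gated command goes to the LLM over OpenClaw's WebSocket gateway and the reply is spoken back; here it is just returned.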

Quick Start

git clone https://github.com/ekleziast/kiwi-voice.git
cd kiwi-voice
pip install -r requirements.txt
cp .env.example .env
python -m kiwi

Open http://localhost:7789 for the web dashboard.

Full installation guide →

Architecture

Mic (24kHz) / Browser WebSocket → Audio Pipeline (Silero VAD + energy detection)
  → Wake Word (OpenWakeWord ML or text fuzzy match)
  → Faster Whisper STT (or MLX Whisper on Apple Silicon)
  → Speaker ID (pyannote embeddings) → Priority Gate (Owner/Friend/Guest/Blocked)
  → Voice Security (dangerous command regex → Telegram approval)
  → OpenClaw Gateway (WebSocket v3)
  → LLM response stream (delta → sentence chunking)
  → Streaming TTS (Kokoro/Piper/Qwen3/ElevenLabs) → Speaker output + browser playback
  → Barge-in detection → back to listening
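The "delta → sentence chunking" step can be sketched as follows: accumulate streamed LLM deltas and emit each sentence as soon as it closes, so TTS can start speaking before the full response has arrived. This is a minimal illustration; the actual kiwi-voice chunker may handle abbreviations, minimum chunk lengths, and other edge cases differently.

```python
import re

def chunk_sentences(deltas):
    """Yield complete sentences from a stream of text deltas."""
    buf = ""
    for delta in deltas:
        buf += delta
        # Emit every sentence whose terminator is already followed by whitespace,
        # i.e. we know nothing more belongs to it.
        while (m := re.match(r"(.+?[.!?])\s+", buf)):
            yield m.group(1)
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()   # flush the trailing fragment at end of stream

deltas = ["Done, the bed", "room lights are on. Anything", " else?"]
print(list(chunk_sentences(deltas)))
# -> ['Done, the bedroom lights are on.', 'Anything else?']
```

Chunking on sentence boundaries rather than raw deltas keeps TTS prosody natural while still letting playback begin after the first sentence instead of after the whole response.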

Architecture deep dive →