Architecture
Audio Pipeline
Mic (24kHz) / Browser WebSocket
→ Audio Callback (energy detection + Silero VAD)
→ Audio Queue
→ KiwiListener._record_loop()
→ Faster Whisper STT (or MLX Whisper on Apple Silicon)
→ Wake Word Detection ("kiwi" — text fuzzy match or ML pre-detection)
→ Speaker ID (pyannote embedding → cosine similarity)
→ Priority Gate (OWNER > FRIEND > GUEST > BLOCKED)
→ Voice Security (DangerousCommandDetector → Telegram approval)
→ OpenClaw Gateway (WebSocket v3: chat.send → delta/final events)
→ LLM response stream (delta → sentence chunking)
→ Streaming TTS (Kokoro / Piper / Qwen3 / ElevenLabs)
→ Speaker Output (with barge-in detection)
→ Loop back to listening
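The "delta → sentence chunking" stage lets TTS start speaking before the whole reply has streamed in. Here is a minimal sketch of the idea. It assumes a sentence ends at ., ! or ? followed by whitespace. The function name `chunk_sentences` is hypothetical, and Kiwi's actual splitter may use different rules, for example to handle abbreviations or other languages' punctuation.

```python
import re
from typing import Iterable, Iterator

# Assumption: a sentence ends at ., ! or ? (possibly repeated, e.g. "?!") followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]+\s+")


def chunk_sentences(deltas: Iterable[str]) -> Iterator[str]:
    """Turn a stream of LLM token deltas into complete sentences for TTS."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        # Emit every complete sentence currently held in the buffer.
        while (match := _SENTENCE_END.search(buffer)):
            sentence = buffer[: match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # End of stream (the "final" event): flush the remainder, even without a trailing space.
    if buffer.strip():
        yield buffer.strip()
```

For example, the deltas "Hello! I am" and " Kiwi." yield "Hello!" as soon as the first delta arrives, then "I am Kiwi." when the stream ends. Requiring whitespace after the punctuation keeps "3.14" intact and waits for the next token to confirm that a sentence really ended.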
Key Modules
| Module | File | Purpose |
|---|---|---|
| Service | kiwi/service.py | Main orchestrator, lifecycle management |
| Listener | kiwi/listener.py | Microphone capture, VAD, STT, wake word |
| OpenClaw WS | kiwi/openclaw_ws.py | WebSocket client to OpenClaw Gateway |
| Speaker Manager | kiwi/speaker_manager.py | Voiceprint storage, identification, priority |
| Voice Security | kiwi/voice_security.py | Dangerous command detection, Telegram approval |
| Soul Manager | kiwi/soul_manager.py | Personality loading and switching |
| i18n | kiwi/i18n.py | Internationalization (t() function) |
| Event Bus | kiwi/event_bus.py | Internal pub/sub event system |
| API Server | kiwi/api/server.py | REST API + WebSocket events, aiohttp |
| TTS Providers | kiwi/tts/ | ElevenLabs, Kokoro, Piper, Qwen3 |
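The Speaker Manager's identification and priority steps (pyannote embedding → cosine similarity, then the OWNER > FRIEND > GUEST > BLOCKED gate) can be sketched as follows. This rests on several assumptions. Voiceprints are stored as plain float vectors with a priority per speaker. The 0.7 threshold is an illustrative value. Unknown voices fall back to GUEST. The names `Priority`, `identify_speaker`, and `allowed` are hypothetical and not Kiwi's API. Extracting the pyannote embedding itself is out of scope here.

```python
import math
from enum import IntEnum
from typing import Dict, Optional, Sequence, Tuple


class Priority(IntEnum):
    # Higher value wins, mirroring OWNER > FRIEND > GUEST > BLOCKED.
    BLOCKED = 0
    GUEST = 1
    FRIEND = 2
    OWNER = 3


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(
    embedding: Sequence[float],
    voiceprints: Dict[str, Tuple[Sequence[float], Priority]],
    threshold: float = 0.7,  # hypothetical value; tune on real recordings
) -> Tuple[Optional[str], Priority]:
    """Return the best-matching known speaker, or (None, GUEST) if nobody is close enough."""
    best_name, best_score = None, threshold
    for name, (voiceprint, _priority) in voiceprints.items():
        score = cosine_similarity(embedding, voiceprint)
        if score >= best_score:
            best_name, best_score = name, score
    if best_name is None:
        return None, Priority.GUEST  # assumption: unrecognized voices are treated as guests
    return best_name, voiceprints[best_name][1]


def allowed(priority: Priority) -> bool:
    # Simplified gate (hypothetical): only BLOCKED speakers are stopped here.
    return priority > Priority.BLOCKED
```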
Mixins
The main service class uses mixins to separate concerns:
| Mixin | File | Responsibility |
|---|---|---|
| LLMCallbacks | kiwi/mixins/llm_callbacks.py | LLM token, completion, and exec-approval handlers |
| DialoguePipeline | kiwi/mixins/dialogue_pipeline.py | Dialogue stages, including approval checks |
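Here is a minimal sketch of how the two mixins might combine into the service class. Only the mixin names and their responsibilities come from the table above. The class `KiwiServiceSketch` and the methods `on_llm_token`, `on_llm_complete`, `run_dialogue`, and `is_dangerous` are hypothetical stand-ins.

```python
from typing import List


class LLMCallbacks:
    """Hypothetical stand-in for kiwi/mixins/llm_callbacks.py."""

    def on_llm_token(self, token: str) -> None:
        self.tokens.append(token)  # relies on state initialized by the host class

    def on_llm_complete(self) -> str:
        text = "".join(self.tokens)
        self.tokens.clear()
        return text


class DialoguePipeline:
    """Hypothetical stand-in for kiwi/mixins/dialogue_pipeline.py."""

    def run_dialogue(self, command: str) -> str:
        # Approval stage: stop before the LLM if the command needs sign-off.
        if self.is_dangerous(command):
            return "approval required"
        return "forwarded to LLM"


class KiwiServiceSketch(LLMCallbacks, DialoguePipeline):
    """Holds shared state; behaviour comes from the mixins."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def is_dangerous(self, command: str) -> bool:
        # Placeholder; in Kiwi this check belongs to Voice Security (DangerousCommandDetector).
        return "rm -rf" in command
```

Each mixin reads and writes attributes on `self` that the service initializes. This keeps each file focused on one concern, at the cost of implicit coupling between the mixins and the host class.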
Event Bus
Kiwi uses an internal event bus (kiwi/event_bus.py) for decoupled communication between modules:
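```python
from kiwi.event_bus import EventBus

def handler(payload):  # assumption: receives the dict passed to emit()
    print("Wake word heard:", payload["text"])

bus = EventBus()
bus.subscribe("WAKE_WORD_DETECTED", handler)
bus.emit("WAKE_WORD_DETECTED", {"text": "kiwi"})
```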
Events are also forwarded to the WebSocket API (/api/events) for the dashboard and external integrations.
Key events: STATE_CHANGED, WAKE_WORD_DETECTED, SPEECH_RECOGNIZED, SPEAKER_IDENTIFIED, TTS_STARTED, TTS_FINISHED, LLM_TOKEN, LLM_COMPLETE, EXEC_APPROVAL_REQUESTED, EXEC_APPROVAL_RESOLVED, SOUL_CHANGED, ERROR.
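The underlying publish/subscribe pattern fits in a few lines. This is an illustrative stand-in, not the contents of kiwi/event_bus.py. `SimpleEventBus` is hypothetical. It assumes handlers receive the payload dict passed to emit() and calls them synchronously on the emitting thread.

```python
import threading
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class SimpleEventBus:
    """Thread-safe publish/subscribe keyed by event name."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self._lock = threading.Lock()

    def subscribe(self, event: str, handler: Handler) -> None:
        with self._lock:
            self._handlers[event].append(handler)

    def emit(self, event: str, payload: Dict[str, Any]) -> None:
        # Snapshot the handler list under the lock, then call handlers without holding it,
        # so a handler may itself subscribe or emit without deadlocking.
        with self._lock:
            handlers = list(self._handlers.get(event, []))
        for handler in handlers:
            try:
                handler(payload)
            except Exception as exc:  # one failing subscriber must not break the others
                print(f"handler for {event} failed: {exc!r}")
```

Under this design, forwarding to /api/events would simply be another subscriber, for example one that hands each payload to the API server's event loop.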
OpenClaw Protocol
Kiwi communicates with OpenClaw via WebSocket Gateway v3:
- Connect to ws://127.0.0.1:18789
- Send chat.send with the user message
- Receive delta events (streaming tokens) and a final event (complete response)
- Subscribe to exec.approval.requested for shell-command approvals
- Send exec.approval.resolve with an approve/deny decision
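A sketch of the client-side logic for this exchange is shown below. It operates on already-received JSON text frames, so no WebSocket library is involved. Every frame field name (`method`, `params`, `event`, `text`, `id`, `command`, `decision`) is a placeholder, not the documented Gateway v3 schema. The `approve` callback stands in for the Telegram approval step, and the function names are hypothetical.

```python
import json
from typing import Callable, Iterable, List, Optional

# NOTE: all frame field names below are placeholders, not the real OpenClaw v3 schema.


def chat_send_frame(message: str) -> str:
    return json.dumps({"method": "chat.send", "params": {"message": message}})


def collect_reply(
    frames: Iterable[str],
    approve: Callable[[str], bool],
    send: Callable[[str], None],
) -> Optional[str]:
    """Consume frames until 'final', answering approval requests along the way."""
    parts: List[str] = []
    for raw in frames:
        frame = json.loads(raw)
        event = frame.get("event")
        if event == "delta":
            parts.append(frame["text"])  # streaming tokens
        elif event == "exec.approval.requested":
            decision = "approve" if approve(frame["command"]) else "deny"
            send(json.dumps({"method": "exec.approval.resolve",
                             "params": {"id": frame["id"], "decision": decision}}))
        elif event == "final":
            # Assumption: 'final' may carry the full text; otherwise join the deltas.
            return frame.get("text", "".join(parts))
    return None  # stream ended without a final event
```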
Threading Model
- Main thread: Service lifecycle, signal handling
- Audio thread: Microphone callback (daemon)
- Record loop: STT + wake word + command processing (daemon)
- TTS thread: Audio output (daemon)
- API thread: aiohttp server in separate event loop (daemon)
- WebSocket thread: OpenClaw Gateway connection (daemon)
All background threads are daemon threads with crash protection. Each loop wraps its body in try/except, and on an exception it sleeps briefly and continues, so the thread does not die. Shared resources are guarded by threading.Lock.
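The crash-protection pattern can be sketched as a minimal loop. `run_protected_loop`, `start_daemon`, the `step` callback, and the 0.5 s back-off are hypothetical choices. Shutdown uses a `threading.Event`, because this section does not say how Kiwi actually stops its loops.

```python
import threading
from typing import Callable


def run_protected_loop(step: Callable[[], None], stop: threading.Event,
                       backoff: float = 0.5) -> None:
    """Call step() until stop is set; an exception costs a short pause, not the thread."""
    while not stop.is_set():
        try:
            step()
        except Exception as exc:
            print(f"loop error: {exc!r}; retrying in {backoff}s")
            # Event.wait doubles as a sleep that ends early if shutdown is requested.
            stop.wait(backoff)


def start_daemon(step: Callable[[], None], stop: threading.Event) -> threading.Thread:
    # daemon=True: the thread does not keep the process alive once the main thread exits.
    thread = threading.Thread(target=run_protected_loop, args=(step, stop), daemon=True)
    thread.start()
    return thread
```

A step that updates state shared with other threads should hold a threading.Lock around that update.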