Streaming TTS & Barge-In¶
Streaming TTS¶
Kiwi doesn't wait for the full LLM response before speaking. Instead, it uses sentence-aware chunking to start TTS as soon as the first complete sentence arrives.
How it Works¶
- LLM streams response tokens via WebSocket (
deltaevents) - Kiwi's sentence chunker accumulates tokens until it detects a sentence boundary (
.,!,?, or other punctuation) - The completed sentence is immediately sent to the TTS provider
- Audio playback begins while the LLM is still generating the next sentence
- Subsequent sentences are queued and played sequentially
This reduces perceived latency significantly — the user hears the first sentence within ~1 second of the LLM starting to respond, regardless of total response length.
Sentence Chunking¶
The chunker is language-aware and handles:
- Standard punctuation (
.,!,?) - Abbreviations that shouldn't trigger a split (e.g., "Dr.", "U.S.A.")
- Minimum chunk length to avoid tiny audio fragments
- Maximum chunk length to prevent overly long sentences
Barge-In¶
Barge-in lets you interrupt Kiwi mid-sentence by speaking over it.
How it Works¶
- While Kiwi is speaking (TTS playing), the microphone remains passively active
- If the energy level or VAD detects speech during playback, Kiwi:
- Immediately stops TTS playback
- Clears the audio queue
- Switches back to active listening mode
- Your new command is processed normally
This creates a natural conversational flow — you don't have to wait for Kiwi to finish a long response before giving a new command.
Note
Barge-in sensitivity is tied to the VAD (Voice Activity Detection) settings. If it triggers too easily, adjust the energy threshold in the audio configuration.