Configuration

Kiwi Voice is configured through two files:

| File | Purpose |
| --- | --- |
| config.yaml | All settings: language, STT, TTS, wake word, security, API |
| .env | Secrets and provider overrides (API keys, tokens) |

Precedence: config.yaml → environment variables (.env) → hardcoded defaults.

config.yaml

Language

language: "en"   # en, ru, es, pt, fr, it, de, tr, pl, zh, ja, ko, hi, ar, id

This setting controls all user-facing strings, the STT language, TTS voice selection, wake word variants, and security patterns.

Wake Word

wake_word:
  engine: "text"             # text (fuzzy match) or openwakeword (ML model)
  keyword: "kiwi"
  model: "hey_jarvis"        # OpenWakeWord model name or path to .onnx
  threshold: 0.5             # Detection sensitivity (0.0–1.0)

See Wake Word Detection for details on both engines.
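The text engine is described above only as a fuzzy match against the keyword. As a rough illustration of that idea, and not Kiwi Voice's actual matcher, the sketch below compares each transcribed word to the keyword with the standard library's difflib. The function name contains_wake_word and the min_ratio cutoff are hypothetical; they are not settings from config.yaml.

```python
from difflib import SequenceMatcher


def contains_wake_word(transcript: str, keyword: str = "kiwi",
                       min_ratio: float = 0.75) -> bool:
    """Return True if any word in the transcript is close to the keyword.

    Hypothetical helper: min_ratio is an illustrative cutoff, not the
    `threshold` option documented for the wake_word section.
    """
    keyword = keyword.lower()
    for word in transcript.lower().split():
        # Drop surrounding punctuation so "Kiwi," still matches "kiwi".
        word = word.strip(".,!?;:\"'")
        if not word:
            continue
        if SequenceMatcher(None, word, keyword).ratio() >= min_ratio:
            return True
    return False
```

A similarity cutoff like this tolerates small transcription slips (for example a doubled letter) while rejecting unrelated words; a lower cutoff trades fewer missed activations for more false ones.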

Speech-to-Text

stt:
  engine: "faster-whisper"   # faster-whisper or mlx-whisper (Apple Silicon)
  model: "small"             # tiny, base, small, medium, large
  device: "cuda"             # cuda or cpu
  compute_type: "float16"    # float16 (GPU) or int8 (CPU)

Model size tradeoff

small is the sweet spot: fast, with good accuracy. Use large for the best accuracy (at the cost of slower startup) and tiny for minimal resource use.
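The comments in the stt block pair float16 with GPU and int8 with CPU. A minimal sketch of how a loader could keep those two settings consistent when CUDA is missing is shown below. The helper resolve_stt_settings and its cuda_available argument are hypothetical, and this page does not say whether Kiwi Voice performs such a fallback itself.

```python
def resolve_stt_settings(stt: dict, cuda_available: bool) -> dict:
    """Return a copy of the stt settings that is usable on this machine.

    Hypothetical helper: if "cuda" is requested but unavailable, fall back
    to the CPU pairing from the comments above (device "cpu", int8).
    """
    resolved = dict(stt)
    resolved.setdefault("device", "cuda")
    resolved.setdefault("compute_type", "float16")
    if resolved["device"] == "cuda" and not cuda_available:
        resolved["device"] = "cpu"
        resolved["compute_type"] = "int8"
    return resolved
```

Settings that already target the CPU, or a machine where CUDA is present, pass through unchanged.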

Text-to-Speech

tts:
  provider: "kokoro"         # kokoro, piper, qwen3, elevenlabs
  elevenlabs:
    voice_id: "aEO01A4wXwd1O8GPgGlF"
    model_id: "eleven_multilingual_v2"
    stability: 0.45
    similarity_boost: 0.75
    speed: 1.0
  kokoro:
    voice: "af_heart"        # 14 voices available
    speed: 1.0
  piper:
    model: "en_US-lessac-medium"
  qwen3:
    backend: "local"         # local or runpod

See TTS Providers for a full comparison.

Speaker Priority

speaker_priority:
  owner:
    name: "Owner"            # Change to your name

Voice Security

security:
  telegram_approval_enabled: true

LLM

llm:
  model: "openai/gpt-4o"
  chat_timeout: 120

Audio Devices

audio:
  output_device: null        # null = system default
  input_device: null         # null = system default

List available devices:

python -c "import sounddevice; print(sounddevice.query_devices())"
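The device list printed by that command can be long. As a small sketch, the hypothetical helper below searches such a list for a device by name. It relies only on the name, max_input_channels, and max_output_channels fields that sounddevice reports for each device. Whether input_device and output_device accept an index, a name, or both is not stated here, so treat the returned index as something to check against the project's own documentation.

```python
from typing import Iterable, Optional


def find_device(devices: Iterable[dict], name_fragment: str,
                kind: str = "input") -> Optional[int]:
    """Return the position of the first device whose name contains
    name_fragment and that has at least one channel of the given kind.

    Hypothetical helper; `devices` is expected to look like the result of
    sounddevice.query_devices().
    """
    channel_key = "max_input_channels" if kind == "input" else "max_output_channels"
    for index, device in enumerate(devices):
        if (name_fragment.lower() in device["name"].lower()
                and device.get(channel_key, 0) > 0):
            return index
    return None


# Example usage (requires the sounddevice package and audio hardware):
#   import sounddevice
#   print(find_device(sounddevice.query_devices(), "USB", kind="input"))
```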

REST API

api:
  enabled: true
  host: "0.0.0.0"
  port: 7789

Web Audio

web_audio:
  enabled: true
  sample_rate: 16000
  max_clients: 3

Home Assistant

homeassistant:
  enabled: true
  url: "http://homeassistant.local:8123"
  token: ""                  # Long-Lived Access Token
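Before enabling the integration, it can help to confirm that the URL and Long-Lived Access Token work. The sketch below is independent of Kiwi Voice: it calls Home Assistant's REST API root (/api/), which responds with HTTP 200 when the bearer token is accepted. The function names are hypothetical.

```python
import urllib.request


def build_ha_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the Home Assistant REST API root."""
    return urllib.request.Request(
        url.rstrip("/") + "/api/",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def check_home_assistant(url: str, token: str, timeout: float = 5.0) -> bool:
    """Return True if Home Assistant answers with HTTP 200 for this token."""
    request = build_ha_request(url, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection failures, timeouts, and HTTP errors such as 401.
        return False


# Example usage:
#   check_home_assistant("http://homeassistant.local:8123", "<your token>")
```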

Souls (Personalities)

souls:
  default: "mindful-companion"
  nsfw:
    model: "openrouter/mistralai/mistral-7b-instruct"
    session: "kiwi-nsfw"

Environment Variables

All settings can be overridden via environment variables in .env:

| Variable | Description |
| --- | --- |
| KIWI_LANGUAGE | Override language (en, ru, etc.) |
| KIWI_TTS_PROVIDER | Override TTS provider (kokoro, piper, qwen3, elevenlabs) |
| KIWI_ELEVENLABS_API_KEY | ElevenLabs API key |
| RUNPOD_API_KEY | RunPod API key (Qwen3-TTS serverless) |
| RUNPOD_TTS_ENDPOINT_ID | RunPod endpoint ID |
| KIWI_TELEGRAM_BOT_TOKEN | Telegram bot token (voice security) |
| KIWI_TELEGRAM_CHAT_ID | Telegram chat ID for approvals |
| KIWI_WAKE_ENGINE | Override wake word engine (text, openwakeword) |
| KIWI_WAKE_MODEL | Override OpenWakeWord model |
| KIWI_WAKE_THRESHOLD | Override detection threshold |
| KIWI_STT_ENGINE | Override STT engine (faster-whisper, mlx-whisper) |
| KIWI_FFMPEG_PATH | Custom FFmpeg path |
| KIWI_DEBUG | Enable debug logging |
| LLM_MODEL | Override LLM model |

See .env.example in the repository for the full list.
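To illustrate what an override from the variables listed above might look like, here is a minimal sketch that layers a few of them onto a parsed config.yaml dictionary. It assumes that a variable, when set, replaces the corresponding config.yaml value, and the mapping from variable names to config keys is inferred from the descriptions. How Kiwi Voice actually merges the two sources is not shown on this page.

```python
import copy

# Assumed mapping from environment variable to (config path, type).
# The paths are inferred from the section names in config.yaml.
ENV_TO_CONFIG = {
    "KIWI_LANGUAGE": (("language",), str),
    "KIWI_TTS_PROVIDER": (("tts", "provider"), str),
    "KIWI_WAKE_ENGINE": (("wake_word", "engine"), str),
    "KIWI_WAKE_MODEL": (("wake_word", "model"), str),
    "KIWI_WAKE_THRESHOLD": (("wake_word", "threshold"), float),
    "KIWI_STT_ENGINE": (("stt", "engine"), str),
    "LLM_MODEL": (("llm", "model"), str),
}


def apply_env_overrides(config: dict, environ: dict) -> dict:
    """Return a copy of config with any set variables applied on top."""
    merged = copy.deepcopy(config)
    for variable, (path, cast) in ENV_TO_CONFIG.items():
        raw = environ.get(variable)
        if raw is None or raw == "":
            continue  # Unset or empty: keep the config.yaml value.
        node = merged
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = cast(raw)  # Environment values arrive as strings.
    return merged


# Example usage:
#   import os
#   settings = apply_env_overrides(loaded_yaml, dict(os.environ))
```

Environment values are always strings, so numeric settings such as the detection threshold need an explicit conversion, which is why the mapping carries a type alongside each path.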