TTS Providers
Kiwi supports five text-to-speech providers. Switch between them in config.yaml or via an environment variable.
Comparison
| Provider | Quality | Latency | Cost | Local GPU | Languages |
|---|---|---|---|---|---|
| ElevenLabs | Excellent | ~0.3s | ~$0.30/1K chars | No | 29 |
| Qwen3-TTS (local) | Excellent | ~1–3s | Free | Yes (CUDA) | Many |
| Qwen3-TTS (RunPod) | Excellent | ~2–5s | ~$0.0003/sec | No | Many |
| Kokoro ONNX | High | <0.5s | Free | No | 8 |
| Piper | Good | <0.5s | Free | No | 30+ |
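To make the Cost column concrete, here is a rough estimate using the two paid rates from the table. It is only a sketch: the rates are the approximate figures listed above, and the 4-second RunPod duration is an assumed value picked from the table's ~2–5s latency range (billed time may differ from observed latency).

```python
# Approximate rates copied from the comparison table.
ELEVENLABS_USD_PER_1K_CHARS = 0.30
RUNPOD_USD_PER_SECOND = 0.0003


def elevenlabs_cost(text: str) -> float:
    """Approximate ElevenLabs cost in USD, charged per character."""
    return len(text) / 1000 * ELEVENLABS_USD_PER_1K_CHARS


def runpod_cost(billed_seconds: float) -> float:
    """Approximate RunPod cost in USD, charged per second."""
    return billed_seconds * RUNPOD_USD_PER_SECOND


if __name__ == "__main__":
    reply = "x" * 200  # stand-in for a 200-character spoken reply
    print(f"ElevenLabs: ${elevenlabs_cost(reply):.4f}")
    # 4.0 seconds is an assumption, not a measured value.
    print(f"RunPod:     ${runpod_cost(4.0):.4f}")
```

Under these assumptions a 200-character reply costs about $0.06 on ElevenLabs and about $0.0012 on RunPod.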
Kokoro ONNX
FREE RECOMMENDED
Fully local TTS with 14 voices at 24 kHz. Models auto-download on first use (~340 MB). No GPU needed.
Supports English, Japanese, Chinese, Korean, and several European languages. Russian is not yet supported; use Piper or ElevenLabs for Russian.
Piper
FREE
Fast local TTS using ONNX models. Wide language support including Russian.
Models are downloaded automatically. See the Piper voices list for available models.
ElevenLabs
PAID
Cloud-based TTS with low latency (~0.3s in the comparison above) and excellent voice quality. Requires an API key.
tts:
  provider: "elevenlabs"
  elevenlabs:
    voice_id: "aEO01A4wXwd1O8GPgGlF"
    model_id: "eleven_multilingual_v2"
    stability: 0.45
    similarity_boost: 0.75
    speed: 1.0
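For orientation, the sketch below shows how settings like these could map onto ElevenLabs' public text-to-speech HTTP endpoint. It is not Kiwi's implementation: `build_request` and `ELEVENLABS_CONFIG` are hypothetical names, and the placement of `speed` inside `voice_settings` follows the ElevenLabs API as best understood here, so check it against the current API reference. The request is only built, never sent.

```python
import json
import urllib.request

# Values copied from the ElevenLabs settings above.
ELEVENLABS_CONFIG = {
    "voice_id": "aEO01A4wXwd1O8GPgGlF",
    "model_id": "eleven_multilingual_v2",
    "stability": 0.45,
    "similarity_boost": 0.75,
    "speed": 1.0,
}


def build_request(text: str, api_key: str, cfg: dict) -> urllib.request.Request:
    """Build (but do not send) an ElevenLabs text-to-speech request."""
    # The voice is selected in the URL path; the model goes in the JSON body.
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{cfg['voice_id']}"
    body = {
        "text": text,
        "model_id": cfg["model_id"],
        "voice_settings": {
            "stability": cfg["stability"],
            "similarity_boost": cfg["similarity_boost"],
            "speed": cfg["speed"],
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call and return the audio bytes. The API key is taken as an argument here because the page does not say where Kiwi reads it from.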
Qwen3-TTS (Local)
FREE GPU REQUIRED
High-quality local TTS using the Qwen3-TTS model. Requires a CUDA GPU with sufficient VRAM.
Qwen3-TTS (RunPod)
PAY-PER-USE
Same Qwen3-TTS quality, but running on RunPod serverless GPUs. No local GPU needed.
Switching Providers
Via config.yaml
Via environment variable
Via REST API
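As an illustration of how an environment override can interact with config.yaml, here is a minimal sketch. Everything specific in it is an assumption: the variable name `KIWI_TTS_PROVIDER` is hypothetical, the provider identifiers other than "elevenlabs" are guesses, and the rule that the environment wins over the file is assumed rather than documented on this page.

```python
import os

# Hypothetical identifiers; only "elevenlabs" appears in this page's config.
KNOWN_PROVIDERS = {"elevenlabs", "kokoro", "piper", "qwen3", "qwen3_runpod"}
ENV_VAR = "KIWI_TTS_PROVIDER"  # hypothetical variable name


def resolve_provider(config: dict, environ=os.environ) -> str:
    """Pick the TTS provider, letting the environment override the config."""
    configured = config.get("tts", {}).get("provider")
    # An unset or empty environment value falls back to the config file.
    provider = environ.get(ENV_VAR) or configured
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown TTS provider: {provider!r}")
    return provider
```

With a parsed config of `{"tts": {"provider": "elevenlabs"}}`, the function returns "elevenlabs" unless the (hypothetical) variable is set, in which case the variable's value is used.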
Test TTS without changing config: