GPU & Apple Silicon¶
Kiwi Voice auto-detects GPU availability and falls back to CPU when no GPU is found.
NVIDIA CUDA¶
For GPU-accelerated STT and local TTS:
Verify:
Configure in config.yaml:
Tip
Even without a GPU, Kiwi works well — Faster Whisper runs on CPU with int8 quantization, and Kokoro ONNX / Piper don't need a GPU at all.
Apple Silicon (MLX)¶
On M-series Macs, you can use Lightning Whisper MLX for ~10x faster STT:
Configure:
MLX Whisper is auto-detected on Apple Silicon. On non-Apple hardware, it falls back to Faster Whisper.
CPU-Only Setup¶
Kiwi runs fully on CPU with these settings:
stt:
device: "cpu"
compute_type: "int8"
model: "small" # Use small for better CPU performance
tts:
provider: "kokoro" # or piper — both are CPU-only
No CUDA, no GPU drivers needed. This is the lightest configuration.
VRAM Requirements¶
Approximate VRAM usage for GPU components:
| Component | Model | VRAM |
|---|---|---|
| Faster Whisper | small | ~1 GB |
| Faster Whisper | large | ~3 GB |
| Qwen3-TTS | 0.6B | ~2 GB |
| Qwen3-TTS | 1.7B | ~4 GB |
| pyannote (Speaker ID) | — | ~0.5 GB |
Kokoro ONNX and Piper run on CPU and don't use VRAM.