GPU & Apple Silicon

Kiwi Voice auto-detects GPU availability and falls back to CPU when no GPU is found.

NVIDIA CUDA

For GPU-accelerated STT and local TTS, install the CUDA 12.1 build of PyTorch:

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121

Verify that PyTorch can see the GPU (this prints True when CUDA is usable):

python -c "import torch; print(torch.cuda.is_available())"
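If the one-liner prints False, a slightly longer check can help narrow down why. The following is a minimal sketch that uses only standard PyTorch calls; the function name `describe_cuda` is illustrative and not part of Kiwi Voice.

```python
def describe_cuda():
    """Return a one-line summary of what PyTorch can see on this machine."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    if not torch.cuda.is_available():
        # torch.version.cuda is None when a CPU-only wheel is installed,
        # which usually means the --index-url step above was skipped.
        return (
            f"CUDA unavailable (torch {torch.__version__}, "
            f"built for CUDA {torch.version.cuda})"
        )

    # Report the first GPU and its total memory in GiB.
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024**3
    return (
        f"{props.name}: {total_gb:.1f} GB VRAM "
        f"(torch {torch.__version__}, CUDA {torch.version.cuda})"
    )


if __name__ == "__main__":
    print(describe_cuda())
```

A result of "built for CUDA None" points to a CPU-only PyTorch wheel rather than a driver problem.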

Configure in config.yaml:

stt:
  device: "cuda"
  compute_type: "float16"

Tip

Kiwi remains fully usable without a GPU: Faster Whisper runs on CPU with int8 quantization, and Kokoro ONNX and Piper don't need a GPU at all.
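The choice between the GPU and CPU settings can be approximated as follows. This is a sketch of the idea, not Kiwi Voice's actual detection code: `pick_stt_settings` is a hypothetical helper, and the returned keys simply mirror the `stt` section of config.yaml.

```python
def pick_stt_settings(cuda_available=None):
    """Return stt settings for the available hardware.

    cuda_available can be passed explicitly (useful for testing);
    when left as None it is probed through PyTorch.
    """
    if cuda_available is None:
        try:
            import torch
            cuda_available = torch.cuda.is_available()
        except ImportError:
            # No PyTorch means no CUDA path, so treat it as CPU-only.
            cuda_available = False

    if cuda_available:
        return {"device": "cuda", "compute_type": "float16"}
    # CPU fallback: int8 quantization, as described in the tip above.
    return {"device": "cpu", "compute_type": "int8"}


if __name__ == "__main__":
    print(pick_stt_settings())
```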

Apple Silicon (MLX)

On M-series Macs, you can use Lightning Whisper MLX for ~10x faster STT:

pip install lightning-whisper-mlx

Configure in config.yaml:

stt:
  engine: "mlx-whisper"
  model: "small"          # or large, medium, etc.

Kiwi auto-detects MLX Whisper on Apple Silicon. On non-Apple hardware, Kiwi falls back to Faster Whisper.
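The hardware check behind such a fallback can be sketched with the standard library alone. This is an illustration under assumptions, not Kiwi Voice's implementation: both function names are hypothetical, and the string "faster-whisper" is an assumed label for the fallback engine rather than a confirmed config value.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True when running natively on an M-series Mac.

    Note: a Python interpreter running under Rosetta 2 reports
    x86_64, so it is not detected as Apple Silicon here.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


def pick_stt_engine(system=None, machine=None):
    """Choose an engine name based on the detected hardware."""
    if is_apple_silicon(system, machine):
        return "mlx-whisper"
    return "faster-whisper"  # assumed label for the fallback engine


if __name__ == "__main__":
    print(pick_stt_engine())
```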

CPU-Only Setup

Kiwi runs fully on CPU with these settings:

stt:
  device: "cpu"
  compute_type: "int8"
  model: "small"          # Use small for better CPU performance

tts:
  provider: "kokoro"      # or piper — both are CPU-only

No CUDA, no GPU drivers needed. This is the lightest configuration.
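To see what these `stt` values mean at the library level, the sketch below maps them onto the constructor of faster-whisper's `WhisperModel`. It assumes Kiwi passes the values through more or less unchanged, which this page does not confirm; `whisper_model_args` and `load_cpu_model` are hypothetical helper names.

```python
def whisper_model_args(stt_config):
    """Translate an stt config section into WhisperModel keyword arguments."""
    return {
        "model_size_or_path": stt_config.get("model", "small"),
        "device": stt_config.get("device", "cpu"),
        "compute_type": stt_config.get("compute_type", "int8"),
    }


def load_cpu_model(stt_config):
    """Load the model; requires `pip install faster-whisper`."""
    # Imported lazily so the helper above works without the package.
    from faster_whisper import WhisperModel
    return WhisperModel(**whisper_model_args(stt_config))


if __name__ == "__main__":
    cpu_stt = {"device": "cpu", "compute_type": "int8", "model": "small"}
    print(whisper_model_args(cpu_stt))
```

Calling `load_cpu_model` downloads the model weights on first use, so expect the first run to take longer than later ones.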

VRAM Requirements

Approximate VRAM usage for GPU components:

| Component | Model | Approx. VRAM |
|---|---|---|
| Faster Whisper | `small` | ~1 GB |
| Faster Whisper | `large` | ~3 GB |
| Qwen3-TTS | 0.6B | ~2 GB |
| Qwen3-TTS | 1.7B | ~4 GB |
| pyannote (Speaker ID) | n/a | ~0.5 GB |

Kokoro ONNX and Piper run on CPU and don't use VRAM.
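Because these components can be loaded together, it helps to add up the figures before choosing models. The sketch below does that arithmetic using the approximate values from the table; the helper names and the 1 GB headroom default are assumptions made for illustration, not Kiwi Voice settings.

```python
# Approximate VRAM per component in GB, copied from the table above.
APPROX_VRAM_GB = {
    ("Faster Whisper", "small"): 1.0,
    ("Faster Whisper", "large"): 3.0,
    ("Qwen3-TTS", "0.6B"): 2.0,
    ("Qwen3-TTS", "1.7B"): 4.0,
    ("pyannote", None): 0.5,
}


def estimate_vram_gb(components):
    """Sum the approximate VRAM for a list of (component, model) pairs."""
    return sum(APPROX_VRAM_GB[c] for c in components)


def fits(components, available_gb, headroom_gb=1.0):
    """Check whether the stack fits, leaving some headroom (assumed 1 GB)."""
    return estimate_vram_gb(components) + headroom_gb <= available_gb


if __name__ == "__main__":
    stack = [("Faster Whisper", "large"), ("Qwen3-TTS", "1.7B"), ("pyannote", None)]
    print(estimate_vram_gb(stack))  # 3 + 4 + 0.5 = 7.5
    print(fits(stack, available_gb=8.0))
```

With the largest models and speaker ID enabled, the estimate is about 7.5 GB, which leaves little margin on an 8 GB card; the smaller models total about 3.5 GB.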