# 2026-01-27

## voice messaging setup completed

successfully implemented full voice message support for telegram:

**tts (text-to-speech):**
- qwen3-tts-12hz-1.7b-base model
- voice cloned from nicholai's alan rickman/snape impression
- reference audio: /mnt/work/clawdbot-voice/reference_snape_v2.wav
- running as systemd service on port 8765
- larger 1.7b model chosen over 0.6b for better british accent preservation

**transcription (speech-to-text):**
- using pywhispercpp (whisper.cpp python bindings)
- same setup as hyprwhspr
- model: base.en with 4 threads
- known issue: longer messages get truncated (only first segment captured)

**workflow:**
1. receive voice message → transcribe with whisper
2. respond with text
3. generate voice reply via tts service (port 8765)
4. send voice message back

all documented in VOICE-WORKFLOW.md

**project location:** /mnt/work/clawdbot-voice/

**setup by:** sonnet (with opencode sub-agent assistance)

## tts service idle timeout added

implemented automatic vram management for qwen3-tts service:

**changes:**
- created tts_service_idle.py with idle timeout functionality
- model unloads after 120 seconds of inactivity (frees ~3.5GB VRAM)
- lazy loading: model only loads on first request, not at startup
- background monitor task checks idle status every 10 seconds
- updated systemd service to use new idle-aware version

**configuration:**
- TTS_IDLE_TIMEOUT=120 (configurable via environment variable)
- service still runs continuously, just unloads model when idle
- automatically reloads on next TTS request

**benefit:** nicholai can use comfyui without tts service consuming vram when not actively generating speech
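
for reference, the lazy-load plus idle-unload pattern described above can be sketched roughly as follows. this is not the contents of tts_service_idle.py: the `IdleModelHolder` class, its method names, and the `loader` callable are hypothetical, and the sketch assumes an asyncio-based service. only `TTS_IDLE_TIMEOUT`, the 120 second default, and the 10 second check interval come from the notes.

```python
import asyncio
import os
import time
from typing import Any, Callable, Optional

# values taken from the notes: 120 s idle timeout (env-configurable), 10 s check interval
IDLE_TIMEOUT = float(os.environ.get("TTS_IDLE_TIMEOUT", "120"))
CHECK_INTERVAL = 10.0


class IdleModelHolder:
    """hypothetical helper: loads a model on first use, drops it after inactivity."""

    def __init__(
        self,
        loader: Callable[[], Any],
        idle_timeout: float = IDLE_TIMEOUT,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader            # assumed: callable that builds the tts model
        self._idle_timeout = idle_timeout
        self._clock = clock              # injectable so the logic can be tested
        self._model: Optional[Any] = None
        self._last_used = clock()

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """lazy loading: build the model only when a request needs it."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()  # every request resets the idle timer
        return self._model

    def unload_if_idle(self) -> bool:
        """drop the model reference if it has been idle for the full timeout."""
        if self._model is None:
            return False
        if self._clock() - self._last_used < self._idle_timeout:
            return False
        # a real service would also need to release gpu memory here; not shown
        self._model = None
        return True

    async def monitor(self, interval: float = CHECK_INTERVAL) -> None:
        """background task: check idle status periodically until cancelled."""
        while True:
            await asyncio.sleep(interval)
            self.unload_if_idle()
```

in a service like this, `monitor()` would presumably be started once at startup with `asyncio.create_task(...)`, and each tts request handler would call `get()`, so the process keeps running while the model comes and goes.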