# 2026-01-27

## voice messaging setup completed
successfully implemented full voice message support for telegram:

**tts (text-to-speech):**

- qwen3-tts-12hz-1.7b-base model
- voice cloned from nicholai's alan rickman/snape impression
- reference audio: /mnt/work/clawdbot-voice/reference_snape_v2.wav
- running as systemd service on port 8765
- larger 1.7b model chosen over 0.6b for better british accent preservation
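step 3 of the workflow talks to this service on port 8765. the notes don't record its api, so this is a minimal client sketch that assumes the service speaks HTTP and exposes a hypothetical `POST /tts` endpoint taking `{"text": ...}` as JSON and returning raw audio bytes. `build_tts_request` and `synthesize` are illustrative names; adjust the path and payload to whatever the service actually exposes.

```python
import json
import urllib.request

TTS_HOST = "127.0.0.1"  # assumption: service listens on localhost
TTS_PORT = 8765          # port from the notes


def build_tts_request(text: str, host: str = TTS_HOST, port: int = TTS_PORT) -> urllib.request.Request:
    """Build a POST request for the tts service.

    Hypothetical API: the "/tts" path and the {"text": ...} JSON body are
    assumptions for illustration, not taken from the service code.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, timeout: float = 120.0) -> bytes:
    """Send the request and return the response body (assumed to be audio bytes)."""
    # generous timeout for slow synthesis; the value is an arbitrary choice
    with urllib.request.urlopen(build_tts_request(text), timeout=timeout) as resp:
        return resp.read()
```

splitting request construction from sending keeps the url/payload logic checkable without a running service; `synthesize("good evening")` would then return whatever bytes the service sends back.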

**transcription (speech-to-text):**

- using pywhispercpp (whisper.cpp python bindings)
- same setup as hyprwhspr
- model: base.en with 4 threads
- known issue: longer messages get truncated (only first segment captured)
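one plausible cause of the truncation (an assumption worth checking, not a confirmed diagnosis) is calling code that reads only the first element of the segment list. pywhispercpp's `Model.transcribe()` returns a list of segments, each with a `text` attribute, so joining every segment avoids dropping the tail of longer messages. `join_segments` and `transcribe_file` are hypothetical helper names.

```python
from typing import Iterable, Protocol


class HasText(Protocol):
    """Anything with a .text attribute, such as a pywhispercpp Segment."""
    text: str


def join_segments(segments: Iterable[HasText]) -> str:
    """Join the text of every segment instead of keeping only the first one."""
    parts = (seg.text.strip() for seg in segments)
    return " ".join(p for p in parts if p)


def transcribe_file(path: str) -> str:
    """Transcribe a whole audio file with the settings from the notes."""
    # imported lazily so join_segments works without pywhispercpp installed
    from pywhispercpp.model import Model

    # in a long-running bot, create the Model once and reuse it
    model = Model("base.en", n_threads=4)
    segments = model.transcribe(path)  # list of segments, one per chunk of speech
    return join_segments(segments)
```

whisper.cpp segment texts usually carry a leading space, which is why each piece is stripped before joining with single spaces.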

**workflow:**

1. receive voice message → transcribe with whisper
2. respond with text
3. generate voice reply via tts service (port 8765)
4. send voice message back
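the four steps above can be sketched as one handler with each step injected as a function. all names here (`VoicePipeline`, `handle`, and the four callables) are hypothetical; this shows the data flow only, not how the bot is actually wired to telegram.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """The four workflow steps wired together.

    Each callable is a hypothetical placeholder: plug in the whisper
    transcription, the bot's text reply, the tts client and the telegram
    send call.
    """
    transcribe: Callable[[bytes], str]   # 1. voice message audio -> text
    respond: Callable[[str], str]        # 2. text -> reply text
    synthesize: Callable[[str], bytes]   # 3. reply text -> audio via tts service
    send_voice: Callable[[bytes], None]  # 4. audio -> voice message back to the chat

    def handle(self, voice_audio: bytes) -> str:
        heard = self.transcribe(voice_audio)
        reply = self.respond(heard)
        self.send_voice(self.synthesize(reply))
        # return the reply text so the caller can also log or send it as text
        return reply
```

injecting the steps keeps the handler testable with fakes, e.g. passing a list's `append` as `send_voice` to capture what would have been sent.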

all documented in VOICE-WORKFLOW.md

**project location:** /mnt/work/clawdbot-voice/

**setup by:** sonnet (with opencode sub-agent assistance)
## tts service idle timeout added

implemented automatic vram management for qwen3-tts service:

**changes:**

- created tts_service_idle.py with idle timeout functionality
- model unloads after 120 seconds of inactivity (frees ~3.5GB VRAM)
- lazy loading: model only loads on first request, not at startup
- background monitor task checks idle status every 10 seconds
- updated systemd service to use new idle-aware version
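the idle-timeout behaviour in the changes list (lazy load on first request, unload after a period of inactivity, periodic background check) can be sketched as a small wrapper. this is an illustration, not the contents of tts_service_idle.py: `IdleUnloader`, `load` and `unload` are hypothetical names, and the service is assumed to run an asyncio event loop for the background task.

```python
import asyncio
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class IdleUnloader(Generic[T]):
    """Lazily load a model and drop it after `idle_timeout` seconds unused.

    `load` and `unload` are hypothetical hooks, e.g. building the tts
    model and then releasing its GPU memory.
    """

    def __init__(self, load: Callable[[], T], unload: Callable[[T], None],
                 idle_timeout: float = 120.0, check_interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._unload = unload
        self.idle_timeout = idle_timeout
        self.check_interval = check_interval
        self._clock = clock
        self._model: Optional[T] = None
        self._last_used = clock()

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> T:
        """Return the model, loading it on first use (lazy loading)."""
        if self._model is None:
            self._model = self._load()
        self._last_used = self._clock()
        return self._model

    def unload_if_idle(self) -> bool:
        """Unload the model if idle for at least idle_timeout; True if it was unloaded."""
        if self._model is not None and self._clock() - self._last_used >= self.idle_timeout:
            model, self._model = self._model, None
            self._unload(model)
            return True
        return False

    async def monitor(self) -> None:
        """Background task: check idle status every check_interval seconds."""
        while True:
            await asyncio.sleep(self.check_interval)
            self.unload_if_idle()
```

the monitor would be started with `asyncio.create_task(unloader.monitor())` once the event loop is running. a real service would also want a lock around load/unload and to refresh the timestamp when a long generation finishes, so the monitor cannot unload a model that is still in use. assuming the service uses pytorch, the `unload` hook would drop the last model reference and call `torch.cuda.empty_cache()` so the cached memory is released for other processes such as comfyui.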

**configuration:**

- TTS_IDLE_TIMEOUT=120 (seconds; configurable via environment variable)
- service still runs continuously, just unloads model when idle
- model automatically reloads on next TTS request
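reading the timeout from the environment might look like this. only the variable name `TTS_IDLE_TIMEOUT` and the 120-second value come from the notes; the helper name `idle_timeout_from_env` and the fallback policy for invalid values are assumptions.

```python
import os
from typing import Mapping, Optional

DEFAULT_IDLE_TIMEOUT = 120.0  # seconds, the value noted above


def idle_timeout_from_env(env: Optional[Mapping[str, str]] = None) -> float:
    """Read TTS_IDLE_TIMEOUT (seconds), falling back to the 120 s default.

    Missing, non-numeric or non-positive values fall back to the default;
    that policy is an assumption, not necessarily what the service does.
    """
    env = os.environ if env is None else env
    raw = env.get("TTS_IDLE_TIMEOUT", "")
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_IDLE_TIMEOUT
    return value if value > 0 else DEFAULT_IDLE_TIMEOUT
```

with systemd, the value can be overridden with an `Environment=TTS_IDLE_TIMEOUT=300` line in the unit's `[Service]` section, followed by `systemctl daemon-reload` and a service restart.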

**benefit:** nicholai can use comfyui without the tts service consuming vram when not actively generating speech