.agents/memory/2026-01-26.md
2026-01-31 14:50:27 -07:00

2026-01-27

voice messaging setup completed

successfully implemented full voice message support for telegram:

tts (text-to-speech):

  • qwen3-tts-12hz-1.7b-base model
  • voice cloned from nicholai's alan rickman/snape impression
  • reference audio: /mnt/work/clawdbot-voice/reference_snape_v2.wav
  • running as systemd service on port 8765
  • larger 1.7b model chosen over 0.6b for better british accent preservation

transcription (speech-to-text):

  • using pywhispercpp (whisper.cpp python bindings)
  • same setup as hyprwhspr
  • model: base.en with 4 threads
  • known issue: longer messages get truncated (only first segment captured)
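
the truncation noted above would be consistent with reading only the first element of the segment list that whisper returns. a minimal sketch of joining every segment instead, assuming pywhispercpp's `Model.transcribe` returns a list of segment objects with a `.text` attribute. `join_segments` and `transcribe_file` are hypothetical helper names, not the code in the project:

```python
from typing import Iterable, Protocol


class HasText(Protocol):
    """Anything with a .text attribute, e.g. a pywhispercpp segment."""
    text: str


def join_segments(segments: Iterable[HasText]) -> str:
    """Concatenate the text of every segment, not just the first one."""
    parts = (seg.text.strip() for seg in segments)
    return " ".join(p for p in parts if p)


def transcribe_file(path: str, model_name: str = "base.en", n_threads: int = 4) -> str:
    """Transcribe one audio file and return the full text (hypothetical helper)."""
    # imported lazily so join_segments stays usable without pywhispercpp installed
    from pywhispercpp.model import Model

    model = Model(model_name, n_threads=n_threads)
    return join_segments(model.transcribe(path))
```

if the truncation has a different cause (for example the audio being cut before it reaches whisper), this sketch would not help; it only covers the "first segment only" case.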

workflow:

  1. receive voice message → transcribe with whisper
  2. respond with text
  3. generate voice reply via tts service (port 8765)
  4. send voice message back
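
the four steps above can be sketched as one function. this is an illustration only: `handle_voice_message`, `VoiceReply`, and the four callables are hypothetical names, and the real wiring is whatever VOICE-WORKFLOW.md describes. each stage is passed in as a callable so the sketch makes no assumption about the telegram or tts service apis:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceReply:
    transcript: str     # what the incoming voice message said
    reply_text: str     # the text response
    reply_audio: bytes  # synthesized audio that was sent back


def handle_voice_message(
    audio_path: str,
    transcribe: Callable[[str], str],     # step 1: whisper
    respond: Callable[[str], str],        # step 2: text reply
    synthesize: Callable[[str], bytes],   # step 3: tts service (port 8765)
    send_voice: Callable[[bytes], None],  # step 4: send voice message back
) -> VoiceReply:
    """Run the receive -> transcribe -> respond -> synthesize -> send pipeline."""
    transcript = transcribe(audio_path)
    reply_text = respond(transcript)
    reply_audio = synthesize(reply_text)
    send_voice(reply_audio)
    return VoiceReply(transcript, reply_text, reply_audio)
```

keeping the stages as plain callables means each one can be swapped for a stub when testing without the gpu or a telegram connection.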

all documented in VOICE-WORKFLOW.md

project location: /mnt/work/clawdbot-voice/

setup by: sonnet (with opencode sub-agent assistance)

tts service idle timeout added

implemented automatic vram management for qwen3-tts service:

changes:

  • created tts_service_idle.py with idle timeout functionality
  • model unloads after 120 seconds of inactivity (frees ~3.5 GB of vram)
  • lazy loading: model only loads on first request, not at startup
  • background monitor task checks idle status every 10 seconds
  • updated systemd service to use new idle-aware version
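
a minimal sketch of the lazy-load plus idle-unload pattern described above. it is not the contents of tts_service_idle.py: `IdleModelHolder` and the `loader` callable are hypothetical, and the real service presumably also releases gpu memory explicitly when it drops the model:

```python
import asyncio
import time
from typing import Any, Callable, Optional


class IdleModelHolder:
    """Loads a model on first use and drops it after a period of inactivity."""

    def __init__(
        self,
        loader: Callable[[], Any],      # hypothetical: builds and returns the model
        idle_timeout: float = 120.0,    # seconds of inactivity before unloading
        check_interval: float = 10.0,   # how often the monitor checks
    ) -> None:
        self._loader = loader
        self.idle_timeout = idle_timeout
        self.check_interval = check_interval
        self._model: Optional[Any] = None  # nothing is loaded at startup
        self._last_used = time.monotonic()

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """Return the model, loading it if needed, and reset the idle clock."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = time.monotonic()
        return self._model

    def unload_if_idle(self) -> bool:
        """Drop the model if it has been idle long enough; True if it was dropped."""
        idle_for = time.monotonic() - self._last_used
        if self._model is not None and idle_for >= self.idle_timeout:
            self._model = None  # assumption: a real service also frees vram here
            return True
        return False

    async def monitor(self) -> None:
        """Background task: check idle status every check_interval seconds."""
        while True:
            await asyncio.sleep(self.check_interval)
            self.unload_if_idle()
```

a request handler would call `holder.get()` on every tts request, and the service would start `holder.monitor()` once with `asyncio.create_task` at startup.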

configuration:

  • TTS_IDLE_TIMEOUT=120 (configurable via environment variable)
  • service still runs continuously, just unloads model when idle
  • automatically reloads on next TTS request
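
one way the timeout could be read from the environment, as a sketch. only the variable name TTS_IDLE_TIMEOUT and the 120 second default come from these notes; `read_idle_timeout` and its fallback behaviour for bad values are assumptions:

```python
import os
from typing import Mapping

DEFAULT_IDLE_TIMEOUT = 120.0  # seconds, matching the default noted above


def read_idle_timeout(env: Mapping[str, str] = os.environ) -> float:
    """Return the idle timeout in seconds, falling back to the default."""
    raw = env.get("TTS_IDLE_TIMEOUT")
    if raw is None or not raw.strip():
        return DEFAULT_IDLE_TIMEOUT
    try:
        value = float(raw)
    except ValueError:
        # assumption: an unparsable value falls back rather than crashing the service
        return DEFAULT_IDLE_TIMEOUT
    # assumption: zero, negative, or nan values are treated as invalid
    return value if value > 0 else DEFAULT_IDLE_TIMEOUT
```

with systemd, the variable would typically be set through an `Environment=` line in the unit file.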

benefit: nicholai can use comfyui without the tts service holding vram while it is not actively generating speech