Nicholai/.agents

Nicholai e48e59586e manual backup

2026-01-31 14:50:27 -07:00


2026-01-27

voice messaging setup completed

successfully implemented full voice message support for telegram:

tts (text-to-speech):

qwen3-tts-12hz-1.7b-base model
voice cloned from nicholai's alan rickman/snape impression
reference audio: /mnt/work/clawdbot-voice/reference_snape_v2.wav
running as systemd service on port 8765
larger 1.7b model chosen over 0.6b for better british accent preservation

transcription (speech-to-text):

using pywhispercpp (whisper.cpp python bindings)
same setup as hyprwhspr
model: base.en with 4 threads
known issue: longer messages get truncated (only the first segment is captured)
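
a possible fix for the truncation issue, sketched under an assumption: the cause is not confirmed in these notes, but if the current code reads only the first segment of the transcription result, joining every segment would recover the full message. `join_segments` is a hypothetical helper, and the demo uses stand-in objects rather than real whisper output.

```python
from types import SimpleNamespace


def join_segments(segments):
    """Concatenate the text of every segment instead of only the first."""
    parts = [seg.text.strip() for seg in segments]
    return " ".join(p for p in parts if p)


# stand-in segments for illustration. with pywhispercpp these would come
# from something like Model("base.en", n_threads=4).transcribe(path),
# which returns a list of segment objects carrying a .text attribute.
demo = [
    SimpleNamespace(text=" hello there,"),
    SimpleNamespace(text=" this is the second segment "),
]
print(join_segments(demo))
```

if the truncation happens earlier (for example in how the audio is decoded or chunked), this helper will not fix it; it only addresses the "first segment only" reading of the issue.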

workflow:

receive voice message → transcribe with whisper
respond with text
generate voice reply via tts service (port 8765)
send voice message back
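
the workflow above can be sketched as one function that chains the steps. this is a minimal illustration, not the code in VOICE-WORKFLOW.md: `handle_voice_message`, `VoiceReply`, and the three callables are hypothetical names, and the real transcriber, responder, and tts client would be passed in by the caller.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceReply:
    transcript: str
    reply_text: str
    reply_audio: bytes


def handle_voice_message(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceReply:
    """Run the voice workflow: transcribe, respond, then synthesize speech."""
    transcript = transcribe(audio)        # step 1: speech-to-text
    reply_text = respond(transcript)      # step 2: text response
    reply_audio = synthesize(reply_text)  # step 3: tts (service on port 8765)
    # step 4 (sending the voice message back) is left to the caller
    return VoiceReply(transcript, reply_text, reply_audio)


# fake components so the sketch runs without whisper or the tts service
result = handle_voice_message(
    b"fake-audio",
    transcribe=lambda audio: "hello",
    respond=lambda text: text.upper(),
    synthesize=lambda text: text.encode("utf-8"),
)
print(result)
```

passing the steps in as callables keeps the order of operations testable without a gpu or a telegram connection.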

all documented in VOICE-WORKFLOW.md

project location: /mnt/work/clawdbot-voice/

set up by: sonnet (with opencode sub-agent assistance)

tts service idle timeout added

implemented automatic vram management for qwen3-tts service:

changes:

created tts_service_idle.py with idle timeout functionality
model unloads after 120 seconds of inactivity (frees ~3.5GB VRAM)
lazy loading: model only loads on first request, not at startup
background monitor task checks idle status every 10 seconds
updated systemd service to use new idle-aware version
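
a rough sketch of the lazy-load plus idle-unload pattern described above. this is not the contents of tts_service_idle.py: `IdleModel` and its methods are hypothetical names, and the loader is a stand-in for loading the real model.

```python
import asyncio
import time


class IdleModel:
    """Load a model on first use and drop it after a period of inactivity."""

    def __init__(self, loader, idle_timeout=120.0, check_interval=10.0):
        self._loader = loader
        self.idle_timeout = idle_timeout
        self.check_interval = check_interval
        self.model = None  # nothing is loaded at startup
        self._last_used = time.monotonic()

    def get(self):
        """Return the model, loading it if needed, and mark it as in use."""
        if self.model is None:
            self.model = self._loader()
        self._last_used = time.monotonic()
        return self.model

    def unload_if_idle(self):
        """Drop the model if it has been unused for idle_timeout seconds."""
        idle_for = time.monotonic() - self._last_used
        if self.model is not None and idle_for >= self.idle_timeout:
            # a real service would also release gpu memory here, e.g. via
            # torch.cuda.empty_cache() if it is pytorch-based (assumption)
            self.model = None
            return True
        return False

    async def monitor(self):
        """Background task: check idle status every check_interval seconds."""
        while True:
            await asyncio.sleep(self.check_interval)
            self.unload_if_idle()


async def demo():
    # short timings so the demo finishes quickly; the notes use 120s / 10s
    mgr = IdleModel(lambda: "fake-model", idle_timeout=0.05, check_interval=0.01)
    task = asyncio.create_task(mgr.monitor())
    mgr.get()                  # first request triggers the load
    await asyncio.sleep(0.3)   # stay idle past the timeout
    print("unloaded while idle:", mgr.model is None)
    mgr.get()                  # next request reloads
    task.cancel()


asyncio.run(demo())
```

using `time.monotonic()` rather than wall-clock time keeps the idle check unaffected by system clock changes.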

configuration:

TTS_IDLE_TIMEOUT=120 (configurable via environment variable)
the service process keeps running continuously; only the model is unloaded when idle
automatically reloads on next TTS request
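
one way the environment variable might be read, as a small sketch. `read_idle_timeout` is a hypothetical helper, and falling back to 120 seconds on a missing, non-numeric, or non-positive value is an assumption rather than documented behaviour of the service.

```python
import os

DEFAULT_IDLE_TIMEOUT = 120.0  # seconds, matching the default in these notes


def read_idle_timeout(env=None):
    """Read TTS_IDLE_TIMEOUT, falling back to the default on bad input."""
    env = os.environ if env is None else env
    raw = env.get("TTS_IDLE_TIMEOUT", "")
    try:
        value = float(raw)
    except ValueError:
        # unset or non-numeric: use the default (assumed behaviour)
        return DEFAULT_IDLE_TIMEOUT
    # reject zero and negative timeouts (assumed behaviour)
    return value if value > 0 else DEFAULT_IDLE_TIMEOUT


print(read_idle_timeout({"TTS_IDLE_TIMEOUT": "30"}))
```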

benefit: nicholai can run comfyui without the tts service holding vram while it is not generating speech