# Voice Workflow Skill

Transcribe incoming voice messages and respond with voice.

## When to Use

When you receive a voice message (an audio file attachment, usually in .ogg format).

## Workflow

### 1. Transcribe Incoming Voice

```bash
# Set INPUT_AUDIO to the path of the received audio attachment
# Convert to 16 kHz mono WAV
ffmpeg -i "$INPUT_AUDIO" -ar 16000 -ac 1 -f wav /tmp/voice_input.wav -y 2>&1 | tail -3

# Transcribe using whisper
~/.local/share/hyprwhspr/venv/bin/python << 'EOF'
from pywhispercpp.model import Model

m = Model('base.en', n_threads=4)
result = m.transcribe('/tmp/voice_input.wav')
# Join the segment texts into one string; empty if nothing was recognized
full_text = ' '.join(seg.text for seg in result) if result else ''
print(full_text)
EOF
```

### 2. Generate Voice Reply

```bash
curl -s -X POST http://localhost:8765/tts \
  -H "Content-Type: application/json" \
  -d '{"text":"YOUR REPLY TEXT HERE","format":"ogg"}' \
  --output /tmp/voice_reply.ogg
```

Note: The TTS service lazy-loads the model on the first request (about 30 s of warmup); subsequent requests are fast.

### 3. Send Voice Reply

Use the message tool with filePath:

```
message(action: send, message: "🎤", filePath: /tmp/voice_reply.ogg)
```

This auto-routes to the current channel.

## Technical Details

- **Transcription**: pywhispercpp (whisper.cpp bindings), base.en model
- **TTS**: qwen3-tts-12hz-1.7b-base with Snape voice clone
- **Reference audio**: /mnt/work/clawdbot-voice/reference_snape_v2.wav
- **Service**: clawdbot-tts.service (systemd user service, port 8765)
- **Idle timeout**: Model unloads after 120 s of inactivity (frees ~3.5 GB VRAM)
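
The curl call in step 2 embeds the reply text in a hand-written JSON string, which breaks if the text contains quotes or newlines. The following is a minimal Python sketch of the same request, using only the standard library. The endpoint and the `text`/`format` fields come from the curl example; the function names `build_tts_request` and `synthesize` and the 90 s timeout are assumptions made for illustration, not part of the service.

```python
import json
import urllib.request

# Endpoint and payload fields are taken from the curl example in step 2.
TTS_URL = "http://localhost:8765/tts"


def build_tts_request(text: str, audio_format: str = "ogg") -> urllib.request.Request:
    """Build the POST request for the TTS service without sending it (hypothetical helper)."""
    # json.dumps escapes quotes, backslashes, and newlines in the reply text.
    body = json.dumps({"text": text, "format": audio_format}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "/tmp/voice_reply.ogg", timeout: float = 90.0) -> str:
    """Send the request and write the returned audio to out_path (hypothetical helper)."""
    # Assumption: 90 s leaves headroom for the ~30 s model warmup on the first request.
    with urllib.request.urlopen(build_tts_request(text), timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize('He said "hello"')` would write the reply to `/tmp/voice_reply.ogg`, ready for step 3. Keeping request construction separate from sending makes the payload easy to inspect without the service running.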