Voice Workflow Skill
Transcribe incoming voice messages and respond with voice.
When to Use
Use this skill when you receive a voice message (an audio file attachment, usually in .ogg format).
Workflow
1. Transcribe Incoming Voice
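# Convert to 16 kHz mono WAV (-y overwrites any previous output; placed before -i so it is not parsed as a trailing option)
ffmpeg -y -i <input.ogg> -ar 16000 -ac 1 -f wav /tmp/voice_input.wav 2>&1 | tail -3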
# Transcribe using whisper
~/.local/share/hyprwhspr/venv/bin/python << 'EOF'
from pywhispercpp.model import Model
m = Model('base.en', n_threads=4)
result = m.transcribe('/tmp/voice_input.wav')
full_text = ' '.join(seg.text for seg in result) if result else ''
print(full_text)
EOF
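The two commands above can also be combined into one Python helper, which may be convenient when ffmpeg failures should stop the run instead of being hidden behind `tail`. This is a minimal sketch, not the skill's actual implementation: the function names `build_ffmpeg_cmd`, `join_segments`, and `transcribe` are hypothetical, and it assumes the script is run with the same venv interpreter (`~/.local/share/hyprwhspr/venv/bin/python`) so that `pywhispercpp` is importable.

```python
import subprocess


def build_ffmpeg_cmd(src: str, dst: str = "/tmp/voice_input.wav") -> list:
    """Return the ffmpeg argv that converts src to 16 kHz mono WAV."""
    return ["ffmpeg", "-y", "-i", src,
            "-ar", "16000", "-ac", "1", "-f", "wav", dst]


def join_segments(segments) -> str:
    """Join segment texts into one string, trimming stray whitespace."""
    return " ".join(seg.text.strip() for seg in (segments or [])).strip()


def transcribe(src: str, wav: str = "/tmp/voice_input.wav") -> str:
    """Convert src with ffmpeg, then transcribe it with the base.en model."""
    # check=True raises CalledProcessError if ffmpeg fails, so whisper
    # never runs on a stale or missing WAV file.
    subprocess.run(build_ffmpeg_cmd(src, wav), check=True, capture_output=True)
    # Imported lazily so the helpers above work without pywhispercpp installed.
    from pywhispercpp.model import Model
    model = Model("base.en", n_threads=4)
    return join_segments(model.transcribe(wav))
```

Calling `transcribe("/path/to/message.ogg")` would then return the full transcript as a single string, or an empty string if whisper produced no segments.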
2. Generate Voice Reply
curl -s -X POST http://localhost:8765/tts \
-H "Content-Type: application/json" \
-d '{"text":"YOUR REPLY TEXT HERE","format":"ogg"}' \
--output /tmp/voice_reply.ogg
Note: The TTS service lazy-loads the model on the first request (~30 s warm-up); subsequent requests are fast.
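If the reply text contains quotes or newlines, hand-writing the JSON inside the curl `-d` string becomes error-prone. The sketch below sends the same request from Python using only the standard library, letting `json.dumps` handle the escaping. The helper names `build_tts_request` and `synthesize` are hypothetical, and the 120-second timeout is an assumed value chosen to leave headroom over the ~30 s warm-up, not a documented setting of the service.

```python
import json
import urllib.request

TTS_URL = "http://localhost:8765/tts"


def build_tts_request(text: str, fmt: str = "ogg") -> urllib.request.Request:
    """Build the POST request with a correctly escaped JSON body."""
    body = json.dumps({"text": text, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "/tmp/voice_reply.ogg",
               timeout: float = 120.0) -> str:
    """Request speech for text and write the returned audio to out_path."""
    # urlopen raises HTTPError on 4xx/5xx responses, so an error body is
    # never silently written to the .ogg file.
    with urllib.request.urlopen(build_tts_request(text), timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

For example, `synthesize('He said "hello".')` should write the reply to `/tmp/voice_reply.ogg` without any manual escaping.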
3. Send Voice Reply
Use the message tool with filePath:
message(action: send, message: "🎤", filePath: /tmp/voice_reply.ogg)
This auto-routes to the current channel.
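Because the curl command in step 2 uses `-s` without `--fail`, an error response from the TTS service would be saved to `/tmp/voice_reply.ogg` as if it were audio. A quick sanity check before sending is to confirm that the file starts with the Ogg capture pattern `OggS`. The function name `looks_like_ogg` is hypothetical, and this is only a rough sketch: it checks the container signature, not whether the audio is playable.

```python
def looks_like_ogg(path: str) -> bool:
    """Return True if the file at path begins with the Ogg magic bytes."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == b"OggS"
    except OSError:
        # A missing or unreadable file is treated as "not a valid reply".
        return False
```

If `looks_like_ogg("/tmp/voice_reply.ogg")` returns `False`, it is probably safer to fall back to a text reply than to send the file.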
Technical Details
- Transcription: pywhispercpp (whisper.cpp bindings), base.en model
- TTS: qwen3-tts-12hz-1.7b-base with Snape voice clone
- Reference audio: /mnt/work/clawdbot-voice/reference_snape_v2.wav
- Service: clawdbot-tts.service (systemd user service, port 8765)
- Idle timeout: the model unloads after 120 s of inactivity (frees ~3.5 GB of VRAM)