Voice Workflow Skill

Transcribe incoming voice messages and respond with voice.

When to Use

Use this skill when you receive a voice message (an audio file attachment, usually in .ogg format).

Workflow

1. Transcribe Incoming Voice

# Convert to 16 kHz mono WAV
ffmpeg -i <input.ogg> -ar 16000 -ac 1 -f wav /tmp/voice_input.wav -y 2>&1 | tail -3

# Transcribe using whisper
~/.local/share/hyprwhspr/venv/bin/python << 'EOF'
from pywhispercpp.model import Model
m = Model('base.en', n_threads=4)
result = m.transcribe('/tmp/voice_input.wav')
full_text = ' '.join(seg.text for seg in result) if result else ''
print(full_text)
EOF
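
The conversion and the segment joining above can also be sketched as small Python helpers. This is a minimal sketch, not part of the skill itself: the function names build_ffmpeg_cmd, convert_to_wav, and join_texts are hypothetical, and the ffmpeg flags are copied from the command above. One reason to wrap the call is that the exit status of the `ffmpeg ... | tail -3` pipeline is normally that of tail, so a failed conversion can go unnoticed.

```python
import subprocess


def build_ffmpeg_cmd(src, dst="/tmp/voice_input.wav"):
    """Return the argument list for the 16 kHz mono WAV conversion."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", "-f", "wav", dst]


def convert_to_wav(src, dst="/tmp/voice_input.wav"):
    """Run ffmpeg (hypothetical helper); raises CalledProcessError on failure."""
    # check=True turns a non-zero ffmpeg exit status into an exception.
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True, capture_output=True)
    return dst


def join_texts(texts):
    """Join segment texts with single spaces, dropping empty pieces."""
    return " ".join(t.strip() for t in (texts or []) if t.strip())
```

With these in place, the transcript line in the heredoc could become `full_text = join_texts(seg.text for seg in result)`, which avoids doubled spaces when segment texts carry their own leading whitespace.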

2. Generate Voice Reply

curl -s -X POST http://localhost:8765/tts \
  -H "Content-Type: application/json" \
  -d '{"text":"YOUR REPLY TEXT HERE","format":"ogg"}' \
  --output /tmp/voice_reply.ogg

Note: The TTS service lazy-loads the model on the first request (about 30 s of warmup); subsequent requests are fast.
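
If the reply text contains double quotes or newlines, the inline JSON in the curl command will break unless the text is escaped by hand. A minimal Python sketch of the same request follows, assuming only what the curl call shows (a POST to /tts with "text" and "format" fields, audio bytes in the response body). The helper names and the 120 s timeout are assumptions, the latter chosen to leave room for the warmup described above.

```python
import json
import urllib.request

TTS_URL = "http://localhost:8765/tts"  # endpoint taken from the curl call


def build_tts_request(text, fmt="ogg", url=TTS_URL):
    """Build the POST request; json.dumps escapes quotes and newlines in text."""
    body = json.dumps({"text": text, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="/tmp/voice_reply.ogg", timeout=120):
    """Request speech and save it to out_path (hypothetical helper).

    The timeout is an assumed value, not a documented service limit.
    """
    with urllib.request.urlopen(build_tts_request(text), timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize('He said "hello"')` would then write /tmp/voice_reply.ogg, ready for step 3.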

3. Send Voice Reply

Use the message tool with filePath:

message(action: send, message: "🎤", filePath: /tmp/voice_reply.ogg)

This auto-routes to the current channel.

Technical Details

  • Transcription: pywhispercpp (whisper.cpp bindings), base.en model
  • TTS: qwen3-tts-12hz-1.7b-base with Snape voice clone
  • Reference audio: /mnt/work/clawdbot-voice/reference_snape_v2.wav
  • Service: clawdbot-tts.service (systemd user service, port 8765)
  • Idle timeout: the model unloads after 120 s of inactivity (frees ~3.5 GB of VRAM)
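
Because the model unloads after 120 s of inactivity, a request sent after a quiet period will presumably pay the warmup cost from step 2 again. The sketch below shows one way a caller might budget for that. The function pick_timeout, the 15 s base timeout, and the 2x safety margin are all assumptions, and whether the service measures idleness from the start or the end of the last request is not stated here.

```python
import time

IDLE_UNLOAD_SECONDS = 120  # idle timeout listed above
WARMUP_SECONDS = 30        # approximate warmup from the note in step 2


def pick_timeout(last_request_at, now=None, base_timeout=15.0, margin=2.0):
    """Return a request timeout in seconds (hypothetical helper).

    last_request_at is a time.monotonic() reading from the previous TTS
    call, or None if no call has been made yet.
    """
    if now is None:
        now = time.monotonic()
    model_probably_cold = (
        last_request_at is None
        or now - last_request_at >= IDLE_UNLOAD_SECONDS
    )
    if model_probably_cold:
        # Allow extra time for the model to load before synthesis starts.
        return base_timeout + WARMUP_SECONDS * margin
    return base_timeout
```

A caller would record `time.monotonic()` after each successful TTS request and pass that value in on the next call.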