# Voice Workflow Skill
Transcribe incoming voice messages and respond with voice.
## When to Use
When you receive a voice message (an audio file attachment, usually in .ogg format).
## Workflow
### 1. Transcribe Incoming Voice
```bash
# Convert to 16 kHz mono 16-bit PCM WAV (the input format whisper.cpp expects)
ffmpeg -y -i <input.ogg> -ar 16000 -ac 1 -c:a pcm_s16le /tmp/voice_input.wav 2>&1 | tail -3
# Transcribe using whisper
~/.local/share/hyprwhspr/venv/bin/python << 'EOF'
from pywhispercpp.model import Model
m = Model('base.en', n_threads=4)
result = m.transcribe('/tmp/voice_input.wav')
# Segment texts usually start with a space; strip each one so the join yields single spaces
full_text = ' '.join(seg.text.strip() for seg in result)
print(full_text)
EOF
```
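The same step can also be sketched as a single Python function, which is handy if several voice messages may arrive close together: a temporary directory replaces the fixed `/tmp/voice_input.wav` path so concurrent runs do not overwrite each other. This is a minimal sketch, assuming `ffmpeg` is on `PATH` and that it runs under the hyprwhspr venv interpreter where `pywhispercpp` is installed; the names `join_segments` and `transcribe_voice` are hypothetical helpers, not part of any existing tool.

```python
import subprocess
import tempfile
from pathlib import Path


def join_segments(segments) -> str:
    """Join whisper segment texts into one string.

    Each segment's text is stripped (whisper.cpp output often has a leading
    space) and empty pieces are dropped before joining with single spaces.
    """
    pieces = (seg.text.strip() for seg in segments)
    return " ".join(p for p in pieces if p)


def transcribe_voice(input_path: str, model_name: str = "base.en") -> str:
    """Convert a voice note to 16 kHz mono WAV, then transcribe it."""
    # Imported lazily so join_segments works even where pywhispercpp is absent.
    from pywhispercpp.model import Model

    with tempfile.TemporaryDirectory() as tmp:
        wav_path = str(Path(tmp) / "voice_input.wav")
        # Same conversion as the ffmpeg command above; check=True raises
        # CalledProcessError if ffmpeg fails instead of transcribing nothing.
        subprocess.run(
            ["ffmpeg", "-y", "-loglevel", "error", "-i", input_path,
             "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav_path],
            check=True,
        )
        # Loads the model on every call; fine for one-off use, cache it otherwise.
        model = Model(model_name, n_threads=4)
        return join_segments(model.transcribe(wav_path))
```

Usage, under the venv interpreter: `print(transcribe_voice("/path/to/input.ogg"))`.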
### 2. Generate Voice Reply
```bash
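# --fail: exit non-zero on HTTP errors instead of saving the error body as the .ogg
# The text must be a valid JSON string (escape double quotes and newlines)
curl -s --fail -X POST http://localhost:8765/tts \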
-H "Content-Type: application/json" \
-d '{"text":"YOUR REPLY TEXT HERE","format":"ogg"}' \
--output /tmp/voice_reply.ogg
```
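Note: the TTS service lazy-loads the model on the first request (~30 s warmup); later requests are fast until the model unloads after the idle timeout, at which point the next request pays the warmup again.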
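Embedding the reply directly in a single-quoted `-d '...'` string breaks as soon as the reply contains an apostrophe, a double quote, or a newline. A minimal Python sketch that builds the body with `json.dumps` instead, assuming the endpoint and the `{"text", "format"}` body shape are exactly those of the curl call above; `build_tts_payload` and `synthesize` are hypothetical names.

```python
import json
import urllib.request

TTS_URL = "http://localhost:8765/tts"


def build_tts_payload(text: str) -> bytes:
    """Encode the request body; json.dumps escapes quotes, apostrophes and newlines."""
    return json.dumps({"text": text, "format": "ogg"}).encode("utf-8")


def synthesize(text: str, out_path: str = "/tmp/voice_reply.ogg",
               timeout: float = 90.0) -> str:
    """POST text to the local TTS service and write the returned audio to out_path."""
    req = urllib.request.Request(
        TTS_URL,
        data=build_tts_payload(text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Socket timeout must comfortably exceed the ~30 s cold-start warmup.
    # urlopen raises HTTPError on 4xx/5xx, so an error body is never saved as audio.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

For example, `synthesize("It's done, and \"quoted\" text is fine.")` writes `/tmp/voice_reply.ogg`, ready for step 3.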
### 3. Send Voice Reply
Send the file with the `message` tool, passing it as `filePath`:
```
message(action: send, message: "🎤", filePath: /tmp/voice_reply.ogg)
```
This auto-routes to the current channel.
## Technical Details
- **Transcription**: pywhispercpp (whisper.cpp bindings), base.en model
- **TTS**: qwen3-tts-12hz-1.7b-base with a Snape voice clone
- **Reference audio**: /mnt/work/clawdbot-voice/reference_snape_v2.wav
- **Service**: clawdbot-tts.service (systemd user service, port 8765)
- **Idle timeout**: The model unloads after 120 s of inactivity (frees ~3.5 GB VRAM)
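If the TTS request fails outright, a quick first check is whether anything is listening on the service port at all. A minimal sketch, assuming the service listens on `localhost:8765` as in the curl call; `tts_port_open` is a hypothetical helper. It only shows that the process accepts connections: because of the lazy loading and idle unload described above, the model itself may still be unloaded, and a plain TCP connect does not trigger a warmup.

```python
import socket


def tts_port_open(host: str = "localhost", port: int = 8765,
                  timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection tries each resolved address (e.g. ::1, then 127.0.0.1).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unresolvable.
        return False
```

When it returns `False`, `systemctl --user start clawdbot-tts.service` starts the service, and `journalctl --user -u clawdbot-tts.service` shows its logs.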