# Voice Workflow Skill

Transcribe incoming voice messages and respond with voice.

## When to Use

Use this skill when you receive a voice message (an audio file attachment, usually in `.ogg` format).
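The trigger check can be sketched as a simple extension test. This assumes the attachment is available as a local file path. The extension set is also an assumption: `.ogg` is the documented common case, and `.oga` and `.opus` are illustrative additions for other Ogg/Opus voice notes.

```python
from pathlib import Path

# Assumed set of extensions treated as voice messages; only .ogg is
# documented, the others are illustrative additions.
VOICE_EXTENSIONS = {".ogg", ".oga", ".opus"}


def is_voice_attachment(path: str) -> bool:
    """Return True if the attachment's file extension suggests a voice message."""
    # suffix is case-sensitive, so normalize before the lookup
    return Path(path).suffix.lower() in VOICE_EXTENSIONS
```

For example, `is_voice_attachment("/tmp/incoming/voice_123.OGG")` is true and `is_voice_attachment("photo.jpg")` is false.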

## Workflow

### 1. Transcribe Incoming Voice

```bash
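# Convert the attachment to 16 kHz mono WAV, the sample rate whisper.cpp expects
INPUT=/path/to/incoming.ogg   # placeholder: path of the received voice message
ffmpeg -y -hide_banner -loglevel error -i "$INPUT" -ar 16000 -ac 1 -f wav /tmp/voice_input.wav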

# Transcribe with whisper.cpp (pywhispercpp bindings) from the hyprwhspr virtualenv
~/.local/share/hyprwhspr/venv/bin/python << 'EOF'
from pywhispercpp.model import Model

m = Model('base.en', n_threads=4)
result = m.transcribe('/tmp/voice_input.wav')
# Segment texts usually carry leading spaces; strip them before joining
full_text = ' '.join(seg.text.strip() for seg in result) if result else ''
print(full_text)
EOF
```
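The two shell steps above can also be driven from one Python script, which avoids the heredoc and surfaces ffmpeg failures as exceptions. This is a sketch under stated assumptions: `ffmpeg` is on `PATH`, the script runs inside the hyprwhspr venv where `pywhispercpp` is installed, and the helper names `ffmpeg_command`, `join_segments`, and `transcribe` are hypothetical.

```python
import subprocess

WAV_PATH = "/tmp/voice_input.wav"


def ffmpeg_command(input_path: str, output_path: str = WAV_PATH) -> list:
    """Build the ffmpeg argv that converts the input to 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y", "-hide_banner", "-loglevel", "error",
        "-i", input_path,
        "-ar", "16000", "-ac", "1", "-f", "wav",
        output_path,
    ]


def join_segments(texts) -> str:
    """Join segment texts with single spaces, dropping empty segments."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe(input_path: str) -> str:
    """Convert input_path to WAV, then transcribe it with the base.en model."""
    # check=True raises CalledProcessError if ffmpeg exits non-zero
    subprocess.run(ffmpeg_command(input_path), check=True)
    # Imported lazily so the helpers above work without pywhispercpp installed
    from pywhispercpp.model import Model

    model = Model("base.en", n_threads=4)
    segments = model.transcribe(WAV_PATH)
    return join_segments(seg.text for seg in segments)
```

Calling `transcribe("/path/to/incoming.ogg")` returns the transcript as one string. Loading the model inside the function keeps the sketch simple. A long-running agent would likely keep one `Model` instance around instead.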

### 2. Generate Voice Reply

```bash
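# Build the JSON body with jq (required) so quotes and newlines in the reply are escaped
REPLY='Your reply text here'
jq -n --arg text "$REPLY" '{text: $text, format: "ogg"}' |
  curl -sS --fail -X POST http://localhost:8765/tts \
    -H "Content-Type: application/json" \
    --data-binary @- \
    --output /tmp/voice_reply.ogg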
```

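Note: The TTS service lazy-loads its model on the first request, so that request takes roughly 30 s to warm up. Later requests are fast until the model is unloaded again after the idle timeout.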
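A standard-library Python alternative to the `curl` call is sketched below. It assumes the endpoint accepts the same JSON body shown above and returns raw audio bytes. The 90 s default timeout is an assumed value chosen to cover the cold-start warmup, and the helper names are hypothetical.

```python
import json
import urllib.request

TTS_URL = "http://localhost:8765/tts"


def build_tts_request(text: str, fmt: str = "ogg") -> urllib.request.Request:
    """Build the POST request for the local TTS endpoint."""
    # json.dumps escapes quotes, newlines, and non-ASCII text safely
    body = json.dumps({"text": text, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "/tmp/voice_reply.ogg",
               timeout: float = 90.0) -> str:
    """Send text to the TTS service and write the returned audio to out_path."""
    # urlopen raises HTTPError on non-2xx responses, so no error body is saved
    with urllib.request.urlopen(build_tts_request(text), timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

`synthesize("Ten points from Gryffindor.")` writes `/tmp/voice_reply.ogg` and returns its path, ready for step 3.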

### 3. Send Voice Reply

Send the generated file with the `message` tool, passing its path as `filePath`:

```
message(action: send, message: "🎤", filePath: /tmp/voice_reply.ogg)
```

This auto-routes to the current channel.

## Technical Details

- **Transcription**: pywhispercpp (whisper.cpp bindings), base.en model
- **TTS**: qwen3-tts-12hz-1.7b-base with a Snape voice clone
- **Reference audio**: `/mnt/work/clawdbot-voice/reference_snape_v2.wav`
- **Service**: `clawdbot-tts.service` (systemd user service, port 8765)
- **Idle timeout**: the model unloads after 120 s of inactivity, freeing ~3.5 GB of VRAM
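The lazy-load and idle-unload behavior described above follows a common pattern, sketched here as a small wrapper. This is a hypothetical illustration, not the code of `clawdbot-tts.service`. The loader and clock are injected so the logic can be exercised without a GPU, and `LazyModel` and its methods are invented names.

```python
import time
from typing import Any, Callable, Optional


class LazyModel:
    """Load a model on first use and drop it after a period of inactivity."""

    def __init__(self, loader: Callable[[], Any], idle_timeout: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._model: Optional[Any] = None
        self._last_used = 0.0

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """Return the model, loading it first if needed (the slow warmup path)."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def unload_if_idle(self) -> bool:
        """Drop the model if unused for idle_timeout seconds; return True if dropped."""
        if self._model is not None and self._clock() - self._last_used >= self._idle_timeout:
            # A real service would also release GPU memory here; dropping the
            # reference alone may not return VRAM to the driver.
            self._model = None
            return True
        return False
```

A service built this way would call `get()` per request and run `unload_if_idle()` periodically from a background timer. Every request after an unload pays the warmup cost again.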