# Voice Workflow Skill

Transcribe incoming voice messages and reply with a voice message.

## When to Use

When you receive a voice message (an audio file attachment, usually in `.ogg` format).
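
A quick way to confirm that an attachment really is Ogg audio, rather than trusting the file extension, is to check its first four bytes: every Ogg page begins with the capture pattern `OggS`. A minimal sketch, assuming the attachment has already been saved to a local path; `is_ogg_audio` is a hypothetical name, and voice notes in other formats would need a different check:

```bash
# Hypothetical helper: succeed only if the file exists and starts with the
# Ogg capture pattern "OggS" (the first 4 bytes of every Ogg page).
is_ogg_audio() {
  [ -f "$1" ] && [ "$(head -c 4 -- "$1")" = "OggS" ]
}
```

Usage: `if is_ogg_audio "$attachment"; then ...; fi`, where `attachment` holds the saved file path.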

## Workflow

### 1. Transcribe Incoming Voice

```bash
# Convert the voice note to 16 kHz mono WAV, the sample rate whisper models expect
ffmpeg -y -i <input.ogg> -ar 16000 -ac 1 -f wav /tmp/voice_input.wav 2>&1 | tail -3

# Transcribe with whisper.cpp (via pywhispercpp) using the base.en model
~/.local/share/hyprwhspr/venv/bin/python << 'EOF'
from pywhispercpp.model import Model

m = Model('base.en', n_threads=4)
result = m.transcribe('/tmp/voice_input.wav')
# Segment texts usually carry leading spaces; strip them before joining
full_text = ' '.join(seg.text.strip() for seg in result) if result else ''
print(full_text)
EOF
```
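
The two commands above can be wrapped in one shell function that stops at the first failure. This is a sketch, not part of the skill: `transcribe_voice` and the `VOICE_PY` override are hypothetical names. It reuses the venv path and model settings shown above, but swaps `| tail -3` for `-loglevel error`, so ffmpeg errors are neither truncated nor hidden behind the pipe's exit status.

```bash
# Hypothetical helper (not part of the skill): convert and transcribe in one call.
transcribe_voice() {
  local input="$1"
  local wav="${2:-/tmp/voice_input.wav}"
  # Venv interpreter from the steps above; VOICE_PY is an assumed override.
  local py="${VOICE_PY:-$HOME/.local/share/hyprwhspr/venv/bin/python}"

  if [ ! -f "$input" ]; then
    echo "transcribe_voice: no such file: $input" >&2
    return 1
  fi

  # -loglevel error prints only real errors, and without a pipe the
  # function sees ffmpeg's own exit status.
  ffmpeg -nostdin -loglevel error -y -i "$input" -ar 16000 -ac 1 -f wav "$wav" || return 1

  # "-" makes Python read the script from stdin; the WAV path arrives as sys.argv[1].
  "$py" - "$wav" <<'EOF'
import sys
from pywhispercpp.model import Model

model = Model('base.en', n_threads=4)
segments = model.transcribe(sys.argv[1])
print(' '.join(seg.text.strip() for seg in segments))
EOF
}
```

Usage: `text=$(transcribe_voice /path/to/attachment.ogg)`. Anything the library itself writes to stdout would be captured along with the transcript.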

### 2. Generate Voice Reply
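
```bash
# Replace the text below. The payload must be valid JSON inside a single-quoted
# shell string: escape " and \ in the reply, and write an apostrophe as '\''.
curl -s -X POST http://localhost:8765/tts \
  -H "Content-Type: application/json" \
  -d '{"text":"YOUR REPLY TEXT HERE","format":"ogg"}' \
  --output /tmp/voice_reply.ogg
```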
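
Pasting arbitrary reply text into the single-quoted `-d` argument breaks as soon as the text contains an apostrophe, a double quote, or a newline. A hedged sketch of a pure-Bash payload builder; `tts_payload` is a hypothetical name, and it only escapes backslash, double quote, newline, carriage return, and tab (other control characters are assumed not to appear in replies):

```bash
# Hypothetical helper: build the /tts request body from arbitrary reply text.
tts_payload() {
  local s="$1" bs='\' q='"' nl=$'\n' cr=$'\r' tab=$'\t'
  # Quoted patterns/replacements are taken literally by ${var//pat/rep}.
  s=${s//"$bs"/"$bs$bs"}   # backslashes first, so escapes added below are not doubled
  s=${s//"$q"/"$bs$q"}     # "  -> \"
  s=${s//"$nl"/"${bs}n"}   # newline -> \n
  s=${s//"$cr"/"${bs}r"}   # carriage return -> \r
  s=${s//"$tab"/"${bs}t"}  # tab -> \t
  # %s prints the escaped text verbatim; printf does not reinterpret it.
  printf '{"text":"%s","format":"ogg"}' "$s"
}
```

Usage, assuming the reply is in a shell variable `reply`: `curl -s -X POST http://localhost:8765/tts -H "Content-Type: application/json" -d "$(tts_payload "$reply")" --output /tmp/voice_reply.ogg`. `jq` or Python's `json.dumps` would also produce valid JSON; the pure-Bash version just avoids the extra dependency.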
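
Note: The TTS service lazy-loads the model on the first request, so that request takes about 30 s to warm up; later requests are fast. Because the model unloads again after 120 s of inactivity (see Technical Details), the warmup repeats after idle periods.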
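
Because the first request after startup or an idle unload includes the warmup, a client timeout should comfortably exceed 30 s. A minimal sketch that also treats HTTP errors and empty responses as failures; `tts_request`, `TTS_URL`, and `TTS_TIMEOUT` are hypothetical names, and the 90 s default is an arbitrary margin, not a documented service limit:

```bash
# Hypothetical wrapper around the /tts request. TTS_URL and TTS_TIMEOUT are
# assumed overrides, not settings of clawdbot-tts.service.
tts_request() {
  local payload="$1" out="${2:-/tmp/voice_reply.ogg}"
  local url="${TTS_URL:-http://localhost:8765/tts}"

  # --max-time leaves room for the ~30 s cold start; --fail makes HTTP
  # 4xx/5xx responses a non-zero exit instead of saving the error body as audio.
  curl -sS --fail --max-time "${TTS_TIMEOUT:-90}" -X POST "$url" \
    -H "Content-Type: application/json" \
    -d "$payload" \
    --output "$out" || return 1

  # An empty output file means nothing usable came back.
  [ -s "$out" ]
}
```

Usage: `tts_request '{"text":"On it.","format":"ogg"}'` writes `/tmp/voice_reply.ogg` and returns non-zero if the service is unreachable, answers with an HTTP error, or sends back nothing.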

### 3. Send Voice Reply

Use the message tool with `filePath`:

```
message(action: send, message: "🎤", filePath: /tmp/voice_reply.ogg)
```

The message tool routes the reply to the current channel automatically.

## Technical Details

- **Transcription**: pywhispercpp (whisper.cpp bindings), `base.en` model
- **TTS**: qwen3-tts-12hz-1.7b-base with a Snape voice clone
- **Reference audio**: `/mnt/work/clawdbot-voice/reference_snape_v2.wav`
- **Service**: `clawdbot-tts.service` (systemd user service, port 8765)
- **Idle timeout**: the model unloads after 120 s of inactivity, freeing ~3.5 GB of VRAM
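
Before requesting speech, it can help to confirm that the service is up. A sketch, assuming Bash (for its `/dev/tcp` redirection) and a systemd user session; `tts_service_ready`, `TTS_UNIT`, and `TTS_PORT` are hypothetical names. A successful check only means the port accepts connections; the model may still be unloaded and need the ~30 s warmup.

```bash
# Hypothetical pre-flight check. Unit name and port come from the details above;
# TTS_UNIT and TTS_PORT are assumed overrides.
tts_service_ready() {
  local unit="${TTS_UNIT:-clawdbot-tts.service}"
  local port="${TTS_PORT:-8765}"

  # If systemctl is available, require the user unit to be active.
  if command -v systemctl >/dev/null 2>&1; then
    systemctl --user is-active --quiet "$unit" 2>/dev/null || return 1
  fi

  # Bash-only: redirecting to /dev/tcp/HOST/PORT opens a TCP connection.
  # The subshell closes it again; success means something is listening.
  (exec 3<>"/dev/tcp/localhost/$port") 2>/dev/null
}
```

If the check fails, `systemctl --user start clawdbot-tts.service` starts the unit and `journalctl --user -u clawdbot-tts.service -n 50` shows its recent log lines.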