# voice message workflow
## when you receive a voice message:

1. **transcribe it:**

```bash
# convert to 16 kHz mono wav (the sample rate whisper.cpp expects)
ffmpeg -y -i <input.ogg> -ar 16000 -ac 1 -f wav /tmp/voice_input.wav 2>&1 | tail -3

# transcribe using hyprwhspr's whisper
~/.local/share/hyprwhspr/venv/bin/python << 'EOF'
from pywhispercpp.model import Model

m = Model('base.en', n_threads=4)
result = m.transcribe('/tmp/voice_input.wav')

# concatenate all segments (fixes truncation for longer audio);
# strip each one so the join doesn't produce double spaces
full_text = ' '.join(seg.text.strip() for seg in result) if result else ''
print(full_text)
EOF
```
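
If this step ever needs to run from a script rather than a heredoc, a minimal Python sketch of the same two stages could look like this. `build_ffmpeg_cmd`, `join_texts` and `transcribe` are hypothetical helper names; the ffmpeg flags mirror the bash command, and the pywhispercpp calls (`Model('base.en', n_threads=4)`, `transcribe(...)`, `seg.text`) are the same ones the heredoc uses. It assumes ffmpeg is on PATH and that it runs with hyprwhspr's venv interpreter.

```python
import subprocess
from typing import Iterable


def build_ffmpeg_cmd(src: str, dst: str = "/tmp/voice_input.wav") -> list[str]:
    # same flags as the bash step: overwrite, 16 kHz sample rate, mono, wav container
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", "-f", "wav", dst]


def join_texts(texts: Iterable[str]) -> str:
    # strip each segment and skip empty ones so the result has single spaces only
    parts = (t.strip() for t in texts)
    return " ".join(p for p in parts if p)


def transcribe(src: str, wav: str = "/tmp/voice_input.wav") -> str:
    # assumes ffmpeg is on PATH and this runs with hyprwhspr's venv interpreter
    subprocess.run(build_ffmpeg_cmd(src, wav), check=True, capture_output=True)
    from pywhispercpp.model import Model  # deferred so the helpers above import without it
    model = Model("base.en", n_threads=4)
    return join_texts(seg.text for seg in model.transcribe(wav))
```

Keeping the command builder and the join separate from the actual ffmpeg/whisper calls means they can be checked without audio files or a model on disk.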
2. **respond normally with text**
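
3. **generate voice reply:**

```bash
# replace YOUR REPLY TEXT HERE. the reply must be JSON-escaped (\" for quotes, \n for
# newlines), and an apostrophe would close the single-quoted -d argument.
# --fail: exit non-zero on HTTP errors instead of writing the error response into the .ogg
curl -s --fail -X POST http://localhost:8765/tts \
  -H "Content-Type: application/json" \
  -d '{"text":"YOUR REPLY TEXT HERE","format":"ogg"}' \
  --output /tmp/voice_reply.ogg
```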
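
Hand-writing the `-d` JSON is fragile once the reply contains quotes, apostrophes or newlines. A minimal Python sketch that lets `json.dumps` do the escaping, assuming the service accepts exactly the `{"text": ..., "format": ...}` body shown in the curl call and returns the audio bytes as the response body. `build_tts_request` and `synthesize` are hypothetical helper names, and the 120 s timeout is a guess to cover the first (lazy-loading) request.

```python
import json
import urllib.request

TTS_URL = "http://localhost:8765/tts"


def build_tts_request(text: str, fmt: str = "ogg") -> urllib.request.Request:
    # json.dumps escapes quotes, backslashes and newlines in the reply text
    body = json.dumps({"text": text, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "/tmp/voice_reply.ogg", timeout: float = 120.0) -> str:
    # urlopen raises HTTPError on 4xx/5xx, so an error page is never saved as audio
    with urllib.request.urlopen(build_tts_request(text), timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize("it's \"done\"\nsee you")` would then write the same `/tmp/voice_reply.ogg` that the next step sends.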

4. **send voice reply:**

**discord (preferred method: inline MEDIA tag):**

Include this line in your text reply and clawdbot auto-attaches it:

```
MEDIA:/tmp/voice_reply.ogg
```

**telegram (via message tool):**

```bash
clawdbot message send --channel telegram --target 6661478571 --media /tmp/voice_reply.ogg
```

**fallback (if the message tool has auth issues):**

Use the `MEDIA:` tag method. It works on all channels because it goes through clawdbot's internal reply routing rather than the gateway HTTP API.
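
The delivery rules (MEDIA tag on discord, message tool on telegram, MEDIA tag as the fallback when the tool fails) can be sketched as one small Python function. This is a hypothetical helper, not part of clawdbot: `deliver_voice_reply`, `with_media_tag` and the injectable `run` parameter are illustrative, the CLI flags are copied from the telegram command, and treating any non-zero exit code as an "auth issue" is an assumption. It also assumes the tag works when placed on its own line at the end of the reply.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def default_runner(argv: Sequence[str]) -> int:
    # run the CLI and report its exit code; a missing binary counts as a failure
    try:
        return subprocess.run(list(argv)).returncode
    except OSError:
        return 127


def with_media_tag(reply: str, audio_path: str) -> str:
    # put the MEDIA: tag on its own line at the end of the text reply
    return f"{reply.rstrip()}\nMEDIA:{audio_path}"


def deliver_voice_reply(
    channel: str,
    reply: str,
    audio_path: str = "/tmp/voice_reply.ogg",
    telegram_target: str = "6661478571",
    run: Runner = default_runner,
) -> str:
    """Return the text to post; on telegram, try to send the audio via the CLI first."""
    if channel == "telegram":
        argv = [
            "clawdbot", "message", "send",
            "--channel", "telegram",
            "--target", telegram_target,
            "--media", audio_path,
        ]
        if run(argv) == 0:
            return reply  # audio already delivered by the message tool
    # discord, other channels, or a failed CLI call: fall back to the inline tag
    return with_media_tag(reply, audio_path)
```

Injecting `run` keeps the routing logic testable without a live clawdbot install.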

## tts service details:

- running on port 8765
- using qwen3-tts-12hz-1.7b-base (upgraded from 0.6b for better accent preservation)
- voice cloning with nicholai's snape voice impression
- reference audio: /mnt/work/clawdbot-voice/reference_snape_v2.wav
- systemd service: clawdbot-tts.service
- auto-starts on boot, restarts on failure
- **idle timeout**: automatically unloads the model after 120s of inactivity (frees ~3.5 GB VRAM)
- lazy loading: the model loads on the first request, not at startup
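
The lazy-load plus idle-unload behaviour can be pictured with a small Python sketch. It is not the service's actual code: `IdleModel`, the `loader` callable and the injectable `clock` are hypothetical, and only the 120 s timeout comes from the notes above. Actually releasing GPU memory after dropping the reference is framework-specific and not shown.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class IdleModel(Generic[T]):
    """Load on first use; drop the model after `idle_timeout` seconds without requests."""

    def __init__(self, loader: Callable[[], T], idle_timeout: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._model: Optional[T] = None
        self._last_used = 0.0
        self._lock = threading.Lock()

    def get(self) -> T:
        # lazy loading: nothing is loaded until the first request
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            self._last_used = self._clock()
            return self._model

    def maybe_unload(self) -> bool:
        # call periodically (e.g. from a background timer); True if the model was dropped
        with self._lock:
            idle = self._clock() - self._last_used
            if self._model is not None and idle >= self._idle_timeout:
                self._model = None  # freeing VRAM itself depends on the framework
                return True
            return False

    @property
    def loaded(self) -> bool:
        return self._model is not None
```

A request handler would call `get()` for every synthesis, while a timer calls `maybe_unload()` every few seconds; the next request after an unload simply pays the load cost again.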

## transcription details:

- using pywhispercpp (whisper.cpp python bindings)
- model: base.en (same as hyprwhspr)
- venv: ~/.local/share/hyprwhspr/venv/