Remi Bot Self-Healing System
Set up on 2026-01-24 to monitor the bot automatically and recover from failures.
What Was Fixed
Root Causes of /scan Timeout (2026-01-24)
- Missing `asyncio` import in `packages/core/analyzers/scoring_async.py`, which caused a `NameError` when calling `asyncio.gather()`
- Synchronous HTTP calls blocking the event loop: SoundCloud API calls were synchronous and blocked the Discord heartbeat for 850+ seconds
- No timeouts on scoring operations
Fixes Applied
- Added `asyncio` and `ThreadPoolExecutor` imports to `scoring_async.py`
- Changed `score_batch_async()` to run synchronous scoring in a thread pool with a 5-minute timeout
- Added a 2-minute timeout on the enrichment phase
- Reduced the SoundCloud HTTP client timeout from 30s to 10s
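The thread-pool change can be sketched roughly as follows. This is a minimal illustration, not the code in `scoring_async.py`: `score_track_sync` is a hypothetical stand-in for the synchronous scorer, and the pool size of 4 is an arbitrary assumption. It uses `loop.run_in_executor` to move blocking work off the event loop and `asyncio.wait_for` to enforce the 5-minute cap.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for the real synchronous scorer in scoring_async.py.
def score_track_sync(track: str) -> float:
    time.sleep(0.01)  # simulates blocking work such as a synchronous HTTP call
    return float(len(track))


# Shared worker pool; max_workers=4 is an arbitrary choice for this sketch.
_EXECUTOR = ThreadPoolExecutor(max_workers=4)

SCORING_TIMEOUT_S = 300.0  # 5-minute cap, matching the fix listed above


async def score_batch_async(tracks: list[str], timeout_s: float = SCORING_TIMEOUT_S) -> list[float]:
    """Run the blocking scorer in worker threads so the event loop (and the
    Discord heartbeat) stays responsive; raise asyncio.TimeoutError after timeout_s."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(_EXECUTOR, score_track_sync, t) for t in tracks]
    # gather preserves input order; wait_for stops waiting once the timeout expires.
    return await asyncio.wait_for(asyncio.gather(*futures), timeout=timeout_s)


async def main() -> None:
    try:
        print(await score_batch_async(["alpha", "be"]))  # [5.0, 2.0]
    except asyncio.TimeoutError:
        print("scoring timed out")


if __name__ == "__main__":
    asyncio.run(main())
```

One caveat of this design: `asyncio.wait_for` stops *waiting* when the timeout fires, but Python threads cannot be killed, so a stuck scorer keeps running in its worker until it returns. A short per-request HTTP timeout (like the 10s SoundCloud setting) is what bounds how long each worker can stay stuck. The same `asyncio.wait_for(..., timeout=120)` pattern could cover the 2-minute enrichment timeout.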
Self-Healing Infrastructure
Watchdog Service (com.remi.watchdog)
- Location: `~/Library/LaunchAgents/com.remi.watchdog.plist`
- Script: `~/projects/remix-sniper/scripts/watchdog.sh`
- Behavior:
  - Checks every 60 seconds whether the bot process is running
  - Restarts the bot automatically if the process dies
  - Monitors `bot_error.log` for critical errors
  - Writes status to the `~/.bot_health` file
  - Logs alerts to the `~/.bot_alert` file
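The watchdog's per-tick logic might look roughly like the Python sketch below. The real `watchdog.sh` is a shell script whose contents are not shown here, so this illustrates the described behavior rather than porting it: the `CRITICAL` regex, the 200-line tail, and the status and alert line formats are all assumptions. `bot_is_running` uses `pgrep -f` with the same pattern as the manual `pkill` command.

```python
import re
import subprocess
import tempfile
import time
from pathlib import Path
from typing import Callable

# Hypothetical pattern for "critical errors"; the real watchdog.sh may match other strings.
CRITICAL = re.compile(r"Traceback|CRITICAL")


def bot_is_running(pattern: str = "python.*main.py") -> bool:
    """True if `pgrep -f` finds a matching command line (same pattern as the manual pkill)."""
    return subprocess.run(["pgrep", "-f", pattern], capture_output=True).returncode == 0


def watchdog_pass(running: bool, restart: Callable[[], None], error_log: Path,
                  health_file: Path, alert_file: Path, tail: int = 200) -> str:
    """One watchdog tick: restart a dead bot, scan the last `tail` lines of the
    error log, overwrite the health file, and append any alerts."""
    alerts = []
    if not running:
        restart()
        alerts.append("bot process not found; restart issued")
    if error_log.exists():
        lines = error_log.read_text(errors="replace").splitlines()[-tail:]
        hits = sum(1 for line in lines if CRITICAL.search(line))
        if hits:
            alerts.append(f"{hits} critical line(s) in {error_log.name}")
    status = "degraded" if alerts else "healthy"
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    health_file.write_text(f"{stamp} {status}\n")
    if alerts:
        with alert_file.open("a") as f:
            f.writelines(f"{stamp} {a}\n" for a in alerts)
    return status


if __name__ == "__main__":
    # Demo against a temporary directory instead of the real ~/ files.
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        log = root / "bot_error.log"
        log.write_text("Traceback (most recent call last):\n")
        print(watchdog_pass(False, lambda: print("restarting bot"), log,
                            root / ".bot_health", root / ".bot_alert"))
```

As written, a traceback that stays within the last 200 lines re-triggers an alert on every tick; remembering the log's byte offset between runs would avoid duplicate alerts.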
Health Check Script
- Location: `~/projects/remix-sniper/scripts/health_check.sh`
- Checks: process alive, recent errors, gateway connection
- Exit codes: 0 = healthy, 1 = issues found
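A Python sketch of the three checks and the exit-code contract follows. How `health_check.sh` actually detects recent errors and the gateway connection is not documented here, so both are assumptions: "recent errors" is approximated by the error log's modification time within a hypothetical 5-minute window, and gateway status is simply passed in as a boolean.

```python
import tempfile
import time
from pathlib import Path


def health_check(process_alive: bool, gateway_connected: bool,
                 error_log: Path, window_s: float = 300.0) -> list[str]:
    """Return a list of problems; an empty list means healthy.
    Recent errors are approximated as 'error log modified within window_s seconds'."""
    problems = []
    if not process_alive:
        problems.append("bot process not running")
    if error_log.exists() and time.time() - error_log.stat().st_mtime < window_s:
        problems.append(f"{error_log.name} written in the last {window_s:.0f}s")
    if not gateway_connected:
        problems.append("Discord gateway not connected")
    return problems


def exit_status(problems: list[str]) -> int:
    """Map problems to the documented exit codes: 0 = healthy, 1 = issues."""
    return 0 if not problems else 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = Path(d) / "bot_error.log"  # does not exist yet, so no recent errors
        problems = health_check(process_alive=True, gateway_connected=True, error_log=log)
        print(exit_status(problems), problems)  # 0 []
```

A command-line wrapper would end with `sys.exit(exit_status(problems))` so callers can branch on 0 versus 1. Because `bot_error.log` receives all of the bot's stderr, an mtime check may also fire on warnings; matching specific error patterns would be stricter.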
Manual Commands
```bash
# Check watchdog status
launchctl list | grep remi
# View watchdog logs
tail -f ~/projects/remix-sniper/watchdog.log
# Check bot health
cat ~/projects/remix-sniper/.bot_health
# Restart bot manually
pkill -f "python.*main.py"
cd ~/projects/remix-sniper && source venv/bin/activate
nohup python packages/bot/main.py >> bot.log 2>> bot_error.log &
# Stop watchdog
launchctl unload ~/Library/LaunchAgents/com.remi.watchdog.plist
# Start watchdog
launchctl load ~/Library/LaunchAgents/com.remi.watchdog.plist
```
Bubabot Integration
When alerted about Remi issues:
- Run `~/projects/remix-sniper/scripts/health_check.sh`
- If it fails, check `bot_error.log` for the root cause
- Fix the code in `~/projects/remix-sniper/packages/`
- Restart the bot
- Test with the `/scan` command
- Report to #quick-tasks
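The first two steps (run the health check, then read `bot_error.log` on failure) could be automated with a helper like the one below. `triage` is a hypothetical name, the 50-line tail is arbitrary, and invoking the script via `bash` assumes it is Bash-compatible. The log path follows the manual restart command, which appends stderr to `bot_error.log` in the project directory.

```python
import subprocess
import sys
import tempfile
from collections import deque
from pathlib import Path


def triage(health_cmd: list[str], error_log: Path, tail_lines: int = 50) -> tuple[bool, list[str]]:
    """Run the health check; if it exits non-zero, return the last `tail_lines`
    lines of the error log as a starting point for finding the root cause."""
    result = subprocess.run(health_cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return True, []
    if not error_log.exists():
        return False, []
    with error_log.open(errors="replace") as f:
        # deque with maxlen keeps only the final tail_lines lines of the file.
        return False, [line.rstrip("\n") for line in deque(f, maxlen=tail_lines)]


if __name__ == "__main__":
    project = Path.home() / "projects" / "remix-sniper"
    script = project / "scripts" / "health_check.sh"
    if script.exists():
        cmd, log = ["bash", str(script)], project / "bot_error.log"
    else:
        # Offline demo: a stand-in command that always fails, plus a fake log.
        log = Path(tempfile.mkdtemp()) / "bot_error.log"
        log.write_text("NameError: name 'asyncio' is not defined\n")
        cmd = [sys.executable, "-c", "raise SystemExit(1)"]
    healthy, tail = triage(cmd, log)
    print("healthy" if healthy else "\n".join(tail))
```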