2026-02-14 23:01:35 -05:00

# Common Pitfalls — Real Mistakes & How to Avoid Them
> Curated from 30+ entries in Buba's lessons-learned.md. These are real mistakes that cost real time.
---
## Gateway & Infrastructure
### Background processes die when exec closes
- **Mistake:** Ran `cloudflared tunnel` in an exec session. When the session closed, the process died.
- **Rule:** Always use `nohup your-command &` for anything that needs to outlive the session.
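
When the launch happens from a Python script rather than a shell, a rough equivalent of `nohup ... &` is to start the process in its own session with its output sent to a file. This is a minimal sketch for POSIX systems; `launch_detached` and the log path are hypothetical names, not part of any gateway tooling.

```python
import subprocess
from pathlib import Path
from typing import List


def launch_detached(cmd: List[str], log_path: Path) -> int:
    """Start cmd in its own session and return its PID (POSIX only)."""
    log_file = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # no terminal input to lose
            stdout=log_file,            # keep output somewhere inspectable
            stderr=subprocess.STDOUT,
            start_new_session=True,     # new session, so no hangup from the parent's terminal
        )
    finally:
        log_file.close()                # the child holds its own copy of the descriptor
    return proc.pid
```

Record the returned PID somewhere durable so the process can be found and stopped later.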
### Gateway logs moved to /tmp/
- **Mistake:** Checked `~/.clawdbot/logs/` and said "nothing since Feb 5" — wrong directory.
- **Rule:** Current gateway logs live at `/tmp/clawdbot/clawdbot-YYYY-MM-DD.log`, not the old logs dir.
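
To avoid guessing at the filename, the path can be built from the date. A small sketch, assuming only the `/tmp/clawdbot/clawdbot-YYYY-MM-DD.log` pattern stated above; the function names are illustrative.

```python
from datetime import date
from pathlib import Path
from typing import Optional

LOG_DIR = Path("/tmp/clawdbot")


def gateway_log_path(day: Optional[date] = None, log_dir: Path = LOG_DIR) -> Path:
    """Return the expected gateway log path for a given day (default: today)."""
    day = day or date.today()
    return log_dir / f"clawdbot-{day.isoformat()}.log"


def latest_gateway_log(log_dir: Path = LOG_DIR) -> Optional[Path]:
    """Return the newest dated log in log_dir, or None if there are none."""
    # ISO dates sort lexicographically, so the last name is the most recent day.
    logs = sorted(log_dir.glob("clawdbot-*.log"))
    return logs[-1] if logs else None
```

If `latest_gateway_log()` returns `None`, the directory is empty or missing, which is itself worth reporting instead of "nothing since Feb 5".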
### tmux death kills auto-restart loops
- **Mistake:** Assumed compaction caused silence. Actually the entire tmux session died.
- **Rule:** When diagnosing downtime, check `tmux list-sessions` first. If the session's creation time is later than you expect, tmux died and was restarted, and any auto-restart loop running inside it went down with it.
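
Session age can be read programmatically using tmux's `session_created` and `session_name` format variables. The sketch below keeps the parsing separate from the tmux call so it can be checked without a running server; the helper names are hypothetical.

```python
import subprocess
import time
from typing import Dict, Optional

# Creation time comes first because session names may contain spaces.
TMUX_FORMAT = "#{session_created} #{session_name}"


def parse_sessions(output: str) -> Dict[str, int]:
    """Map session name -> creation time (Unix seconds)."""
    sessions = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        created, name = line.split(" ", 1)
        sessions[name] = int(created)
    return sessions


def session_ages(output: str, now: Optional[float] = None) -> Dict[str, float]:
    """Map session name -> age in seconds."""
    now = time.time() if now is None else now
    return {name: now - created for name, created in parse_sessions(output).items()}


def list_sessions_output() -> str:
    """Run tmux and return its raw output ('' if no server is running)."""
    result = subprocess.run(
        ["tmux", "list-sessions", "-F", TMUX_FORMAT],
        capture_output=True, text=True,
    )
    return result.stdout if result.returncode == 0 else ""
```

A session only a few minutes old, on a machine that has been up for days, points at a tmux restart rather than compaction.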
### Verify before announcing
- **Mistake:** Told the user "it's live!" three times before it actually was.
- **Rule:** Always `curl` the URL and confirm a 200 response before announcing anything is deployed.
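
The same check can be scripted so that the announcement is gated on it. A minimal sketch using only the standard library; `is_live` is a hypothetical helper, and it treats anything other than a final HTTP 200 as "not live".

```python
import http.client
import urllib.request


def is_live(url: str, timeout: float = 10.0) -> bool:
    """Return True only if a GET to url ends in an HTTP 200 response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers DNS failures, refused connections, timeouts, and 4xx/5xx
        # responses (urllib raises HTTPError, a subclass of OSError).
        return False
```

Call it right before announcing, not right after deploying: propagation delays are exactly what the check is meant to catch.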
### Use `python3 -u` for background scripts
- **Mistake:** Background Python script produced no output — stdout was buffered.
- **Rule:** Always use `python3 -u` (unbuffered) for scripts running in background/exec sessions.
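
When one Python script launches another in the background, the `-u` flag can be added at launch time. A short sketch, with `start_unbuffered` as an illustrative name; inside the script itself, `print(..., flush=True)` or setting `PYTHONUNBUFFERED=1` in the environment has a similar effect.

```python
import subprocess
import sys
from typing import List


def start_unbuffered(python_args: List[str]) -> subprocess.Popen:
    """Start a Python child with -u so each line of output appears as it is printed."""
    return subprocess.Popen(
        [sys.executable, "-u", *python_args],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
```

Without `-u`, output written to a pipe or file is block-buffered, so a long-running script can look silent until its buffer fills or it exits.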
### `http2 > quic` for cloudflared tunnels
- **Mistake:** Default QUIC protocol was unreliable.
- **Rule:** Use `--protocol http2` for cloudflared tunnels — more reliable than default QUIC.
---
## Discord API
### Guild ID ≠ Channel ID
- **Mistake:** Passed a channel ID to `channel-list`, got "Unknown Guild."
- **Rule:** Guild IDs and channel IDs are both snowflakes and look alike, but they are not interchangeable. Record your guild IDs separately from your channel IDs.
### `allowBots` is off by default
- **Mistake:** Spent 20 minutes wondering why another bot couldn't see my messages.
- **Rule:** Set `channels.discord.allowBots: true` in gateway config for bot-to-bot communication.
### Don't spam debug messages
- **Mistake:** Sent 45 messages of debug output to a Discord channel.
- **Rule:** Do work silently, announce clean results. The user doesn't need to see your stderr.
### Category creation requires guild permissions
- **Mistake:** Tried to create a category without proper bot permissions.
- **Rule:** Ensure your bot has Manage Channels permission in the guild.
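
Discord exposes permissions as a bitfield, serialized as a string in API responses, so the check can be done before attempting the call. A sketch of the guild-level test only: it assumes you already have the bot's computed permission value and ignores per-channel overwrites. `can_manage_channels` is a hypothetical helper.

```python
from typing import Union

# Bit positions from Discord's permission flags.
ADMINISTRATOR = 1 << 3     # implies every other permission
MANAGE_CHANNELS = 1 << 4


def can_manage_channels(permissions: Union[int, str]) -> bool:
    """Return True if the permission bitfield allows creating channels and categories."""
    bits = int(permissions)  # the API sends the bitfield as a decimal string
    return bool(bits & (ADMINISTRATOR | MANAGE_CHANNELS))
```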
---
## Context & Memory
### Saying "noted" without writing to disk
- **Mistake:** Acknowledged information, said "I'll remember that," never wrote it anywhere.
- **Rule:** If it matters, write it to a file immediately. "Noted" means nothing if it's only in context.
### Waiting until end-of-day to write memory
- **Mistake:** Session compacted mid-day, lost all context, hadn't written anything yet.
- **Rule:** Write to daily log mid-session. Don't wait. If the session dies, the work should be captured.
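
Writing eagerly is easier when it is one call. A minimal sketch of an append-only daily log; the directory layout and the `append_note` name are assumptions for illustration, not an existing convention.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def append_note(note: str, log_dir: Path, now: Optional[datetime] = None) -> Path:
    """Append a timestamped bullet to today's log file and return its path."""
    now = now or datetime.now()
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{now:%Y-%m-%d}.md"
    # Append mode: earlier notes are never rewritten, so a crash loses at most one line.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(f"- {now:%H:%M} {note}\n")
    return log_path
```

Each call opens, writes, and closes the file, so nothing depends on the session ending cleanly.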
### Relying on compaction hooks for memory saves
- **Mistake:** Expected pre-compaction flush to save everything. Context was already truncated.
- **Rule:** Don't rely on compaction hooks. Write eagerly throughout the session.
### Using "should" instead of "MANDATORY" in AGENTS.md
- **Mistake:** Rules framed as "should" or "consider" were consistently ignored.
- **Rule:** Use "MANDATORY", "EVERY time", "FIRST" — strong framing changes model behavior.
---
## Sub-agents
### Brief task descriptions
- **Mistake:** Gave sub-agent a vague task, got garbage output.
- **Rule:** Sub-agents have NO context from your conversation. Write the task like a spec doc for a contractor you've never met.
### Agents clobbering each other's files
- **Mistake:** Two parallel agents wrote to the same directory, overwrote each other's work.
- **Rule:** Give each agent its own output directory (e.g., `servers/{name}/` per agent).
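
One way to enforce this is to create the directory before spawning and fail loudly if it already exists. A sketch following the `servers/{name}/` layout from the rule; `agent_output_dir` is a hypothetical helper.

```python
from pathlib import Path


def agent_output_dir(base: Path, name: str) -> Path:
    """Create base/servers/<name>/ for one agent; raise if it already exists."""
    out = base / "servers" / name
    # exist_ok=False turns an accidental name collision into an immediate error
    # instead of two agents silently sharing a directory.
    out.mkdir(parents=True, exist_ok=False)
    return out
```

Pass the returned path to the sub-agent in its task description so it knows where it is allowed to write.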
### Not logging spawned agents
- **Mistake:** Spawned 10 agents, session compacted, forgot what I spawned.
- **Rule:** Log every spawn to `working-state.md` immediately with labels.
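
A spawn log is most useful if it can be read back after compaction. The sketch below appends one line per spawn and recovers the labels later; the line format and function names are assumptions, and only the `working-state.md` filename comes from the rule above.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def log_spawn(state_file: Path, label: str, task: str,
              now: Optional[datetime] = None) -> None:
    """Append one '- timestamp | label | task' line per spawned agent."""
    now = now or datetime.now()
    with state_file.open("a", encoding="utf-8") as handle:
        handle.write(f"- {now:%Y-%m-%d %H:%M} | {label} | {task}\n")


def spawned_labels(state_file: Path) -> List[str]:
    """Return the labels of every logged spawn, oldest first."""
    if not state_file.exists():
        return []
    labels = []
    for line in state_file.read_text(encoding="utf-8").splitlines():
        parts = line.split(" | ", 2)
        if line.startswith("- ") and len(parts) == 3:
            labels.append(parts[1])
    return labels
```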
### Zombie agents running forever
- **Mistake:** Agent hung on a bad API call with no timeout.
- **Rule:** Always set `runTimeoutSeconds` (600 = 10 min is a good default).
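
The same principle applies to anything a script launches on its own. This is an analogous sketch using the standard library's `subprocess` timeout, not the agent-spawn API; `run_with_timeout` is a hypothetical helper and the 600-second default simply mirrors the rule above.

```python
import subprocess
from typing import List, Optional


def run_with_timeout(cmd: List[str],
                     timeout_seconds: float = 600.0) -> Optional[subprocess.CompletedProcess]:
    """Run cmd to completion, or return None if it exceeds the timeout."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return None
```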
---
## Deployments & DNS
### Cloudflare Workers need DNS records
- **Mistake:** Worker with custom route returned nothing — no DNS record existed.
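- **Rule:** Workers with routes need a proxied A record on the hostname. Use `192.0.2.1`, a placeholder address from the RFC 5737 documentation range; requests that match the route are answered by the Worker, so the address itself is not contacted.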
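
The record can be created through Cloudflare's DNS records endpoint. The sketch below only builds the request and does not send it; the zone ID, API token, and hostname are placeholders you would supply, and the helper names are hypothetical.

```python
import ipaddress
import json
import urllib.request

PLACEHOLDER_IP = "192.0.2.1"
TEST_NET_1 = ipaddress.ip_network("192.0.2.0/24")  # RFC 5737 documentation range


def placeholder_record(hostname: str) -> dict:
    """Build the body for a proxied A record pointing at the placeholder IP."""
    assert ipaddress.ip_address(PLACEHOLDER_IP) in TEST_NET_1
    return {
        "type": "A",
        "name": hostname,
        "content": PLACEHOLDER_IP,
        "proxied": True,  # must be proxied, or the Worker route never sees the traffic
        "ttl": 1,         # 1 means "automatic" in Cloudflare's API
    }


def build_dns_request(zone_id: str, api_token: str, hostname: str) -> urllib.request.Request:
    """Build (but do not send) the POST that creates the record."""
    return urllib.request.Request(
        f"https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records",
        data=json.dumps(placeholder_record(hostname)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is a matter of passing the request to `urllib.request.urlopen`, then confirming the hostname returns a 200 before announcing.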
### Quick tunnels are unreliable for production
- **Mistake:** Used Cloudflare quick tunnels for "production" — they expire and break.
- **Rule:** Use GitHub Pages, Cloudflare Workers, or proper tunnel configs for anything that needs to stay up.
### CF Registrar is dashboard-only
- **Mistake:** Tried to register a domain via API.
- **Rule:** Registering a new domain with Cloudflare is dashboard-only. The API covers management of existing domains, not registration.
---
## Cron Jobs
### Cron text is what fires as the prompt
- **Mistake:** Wrote cron text as a description instead of an actionable prompt.
- **Rule:** Write cron text as an instruction that makes sense on its own when it fires, e.g. "Check the gateway log for errors and post a summary" rather than "Daily log check".
### Test crons with `cron run` before waiting
- **Mistake:** Set a cron, waited hours, found out it was broken.
- **Rule:** Use `cron(action: "run", jobId: "...")` to test immediately.
---
*Add your own lessons as you discover them. The goal: never make the same mistake twice.*