# Common Pitfalls — Real Mistakes & How to Avoid Them

> Curated from 30+ entries in Buba's lessons-learned.md. These are real mistakes that cost real time.

---

## Gateway & Infrastructure

### Background processes die when exec closes
- **Mistake:** Ran `cloudflared tunnel` in an exec session. When the session closed, the process died.
- **Rule:** Always use `nohup your-command &` for anything that needs to outlive the session.
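
If the launcher is itself a Python script, the same idea as `nohup your-command &` can be sketched with `subprocess`. This is a minimal sketch that assumes a POSIX host; `launch_detached` and the log path are hypothetical names, not part of any tool mentioned here.

```python
import subprocess
from pathlib import Path


def launch_detached(command: list[str], log_path: str) -> int:
    """Start `command` so it can keep running after the parent session exits.

    Returns the child's PID so it can be recorded and stopped later.
    """
    # Append mode, so a restart does not wipe earlier output.
    log_file = Path(log_path).open("ab")
    process = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,    # no terminal input to lose
        stdout=log_file,
        stderr=subprocess.STDOUT,    # keep errors in the same log
        start_new_session=True,      # own session, detached from the caller's (POSIX only)
    )
    log_file.close()  # the child keeps its own copy of the descriptor
    return process.pid
```

Calling `launch_detached(["your-command"], "/tmp/your-command.log")` returns the PID; writing that PID down right away makes the process easy to find and stop later.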

### Gateway logs moved to /tmp/
- **Mistake:** Checked `~/.clawdbot/logs/` and reported "nothing since Feb 5". That was the wrong directory.
- **Rule:** Current gateway logs live at `/tmp/clawdbot/clawdbot-YYYY-MM-DD.log`, not in the old logs directory.
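
The file name pattern above can be built rather than typed from memory. A small sketch, assuming only the path stated in the rule; `gateway_log_path` is a hypothetical helper name.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# Directory taken from the rule above; adjust if your gateway logs elsewhere.
LOG_DIR = Path("/tmp/clawdbot")


def gateway_log_path(day: Optional[date] = None) -> Path:
    """Return the expected gateway log file for `day` (defaults to today)."""
    day = day or date.today()
    # date.isoformat() yields YYYY-MM-DD, matching the file name pattern.
    return LOG_DIR / f"clawdbot-{day.isoformat()}.log"
```

Checking `gateway_log_path().exists()` before concluding that "nothing was logged" catches the wrong-directory mistake early.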

### tmux death kills auto-restart loops
- **Mistake:** Assumed compaction caused the silence. In fact, the entire tmux session had died.
- **Rule:** When diagnosing downtime, check `tmux list-sessions` first. If a session's creation time is more recent than expected, tmux died and was restarted.
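
The "newer than expected" check can be made explicit by asking tmux for each session's creation time. A sketch that relies on tmux's `#{session_name}` and `#{session_created}` format variables; the function names are hypothetical.

```python
import subprocess
import time


def parse_sessions(output: str) -> dict[str, int]:
    """Map session name -> creation time (Unix epoch seconds)."""
    sessions = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split from the right: session names may contain spaces.
        name, created = line.rsplit(" ", 1)
        sessions[name] = int(created)
    return sessions


def session_ages_seconds() -> dict[str, float]:
    """Ask tmux for its sessions and return how long each has existed."""
    result = subprocess.run(
        ["tmux", "list-sessions", "-F", "#{session_name} #{session_created}"],
        capture_output=True,
        text=True,
        check=True,  # raises if tmux is missing a server or fails
    )
    now = time.time()
    return {
        name: now - created
        for name, created in parse_sessions(result.stdout).items()
    }
```

A session that is supposed to have been up for days but reports an age of a few minutes points at a tmux restart rather than compaction.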

### Verify before announcing
- **Mistake:** Told the user "it's live!" three times before it actually was.
- **Rule:** Always `curl` the URL and confirm a 200 response before announcing that anything is deployed.
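
The same check can be scripted so that the announcement is gated on it. A minimal sketch using only the standard library; `is_live` is a hypothetical helper, and it treats anything other than a final HTTP 200 as "not live".

```python
import http.client
import urllib.request


def is_live(url: str, timeout: float = 10.0) -> bool:
    """Return True only if a GET to `url` ends in an HTTP 200 response."""
    try:
        # urlopen follows redirects, so the status is that of the final response.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers 4xx/5xx (HTTPError), DNS failures, refused connections
        # and timeouts, all of which mean "do not announce yet".
        return False
```

Only say "it's live" when `is_live(url)` returns `True`; if it returns `False`, keep working instead of announcing.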

### Use `python3 -u` for background scripts
- **Mistake:** Background Python script produced no output because stdout was buffered.
- **Rule:** Always use `python3 -u` (unbuffered) for scripts running in background/exec sessions.
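
When the `-u` flag cannot be passed, for instance because something else starts the script, flushing each write explicitly has a similar effect. A minimal sketch; `log` is a hypothetical helper. Setting the `PYTHONUNBUFFERED=1` environment variable is another standard way to get unbuffered output.

```python
import time


def log(message: str) -> None:
    """Print a timestamped line and flush it immediately."""
    stamp = time.strftime("%H:%M:%S")
    # flush=True pushes the line out now instead of waiting for the buffer
    # to fill, which is what hides output when stdout is a file or a pipe.
    print(f"{stamp} {message}", flush=True)
```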

### `http2 > quic` for cloudflared tunnels
- **Mistake:** Left cloudflared on its default QUIC protocol, which proved unreliable.
- **Rule:** Use `--protocol http2` for cloudflared tunnels; it proved more reliable than the default QUIC.

---

## Discord API

### Guild ID ≠ Channel ID
- **Mistake:** Passed a channel ID to `channel-list` and got "Unknown Guild."
- **Rule:** Guild IDs and channel IDs are different identifiers. Record your guild IDs separately from your channel IDs and pass the one the command expects.

### `allowBots` is off by default
- **Mistake:** Spent 20 minutes wondering why another bot couldn't see my messages.
- **Rule:** Set `channels.discord.allowBots: true` in gateway config for bot-to-bot communication.

### Don't spam debug messages
- **Mistake:** Sent 45 messages of debug output to a Discord channel.
- **Rule:** Do the work silently, then announce clean results. The user doesn't need to see your stderr.

### Category creation requires guild permissions
- **Mistake:** Tried to create a category without the required bot permissions.
- **Rule:** Ensure your bot has the Manage Channels permission in the guild.

---

## Context & Memory

### Saying "noted" without writing to disk
- **Mistake:** Acknowledged information, said "I'll remember that," and never wrote it anywhere.
- **Rule:** If it matters, write it to a file immediately. "Noted" means nothing if it exists only in context.

### Waiting until end-of-day to write memory
- **Mistake:** The session compacted mid-day and all context was lost before anything had been written down.
- **Rule:** Write to the daily log mid-session. Don't wait. If the session dies, the work should already be captured.

### Relying on compaction hooks for memory saves
- **Mistake:** Expected the pre-compaction flush to save everything, but the context was already truncated.
- **Rule:** Don't rely on compaction hooks. Write eagerly throughout the session.

### Using "should" instead of "MANDATORY" in AGENTS.md
- **Mistake:** Rules framed as "should" or "consider" were consistently ignored.
- **Rule:** Use "MANDATORY", "EVERY time", "FIRST". Strong framing changes model behavior.

---

## Sub-agents

### Brief task descriptions
- **Mistake:** Gave a sub-agent a vague task and got garbage output.
- **Rule:** Sub-agents have NO context from your conversation. Write the task like a spec doc for a contractor you've never met.

### Agents clobbering each other's files
- **Mistake:** Two parallel agents wrote to the same directory and overwrote each other's work.
- **Rule:** Give each agent its own output directory (e.g., `servers/{name}/` per agent).
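
One way to enforce the per-agent directory is to create it up front and fail loudly if the name is already taken. A minimal sketch; `agent_output_dir` is a hypothetical helper, and the name sanitizing is an added assumption rather than something the lesson requires.

```python
from pathlib import Path


def agent_output_dir(base: str, agent_name: str) -> Path:
    """Create and return a directory reserved for a single agent."""
    # Keep the name filesystem-safe so "a/b" cannot escape into another directory.
    safe_name = "".join(
        ch if ch.isalnum() or ch in "-_" else "_" for ch in agent_name
    )
    out_dir = Path(base) / safe_name
    # exist_ok=False raises FileExistsError if two agents are given the same
    # name, which surfaces the collision instead of letting them overwrite.
    out_dir.mkdir(parents=True, exist_ok=False)
    return out_dir
```

Calling `agent_output_dir("servers", name)` once per agent before spawning, and passing the resulting path into the task description, keeps parallel agents from sharing a directory.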

### Not logging spawned agents
- **Mistake:** Spawned 10 agents, the session compacted, and I forgot what I had spawned.
- **Rule:** Log every spawn to `working-state.md` immediately, with labels.
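
Logging a spawn can be a one-line append, so there is little excuse to postpone it. A minimal sketch; `log_spawn` and the entry format are hypothetical, and only the `working-state.md` file name comes from the rule above.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_spawn(label: str, task: str, state_file: str = "working-state.md") -> str:
    """Append one labeled spawn entry to the state file and return it."""
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = f"- [{timestamp}] spawned `{label}`: {task}\n"
    # Append mode: earlier entries survive, and the file is created if missing.
    with Path(state_file).open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

Calling it immediately before each spawn means the list of running agents is on disk even if the session compacts a moment later.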

### Zombie agents running forever
- **Mistake:** An agent hung on a bad API call with no timeout.
- **Rule:** Always set `runTimeoutSeconds` (600, i.e. 10 minutes, is a good default).

---

## Deployments & DNS

### Cloudflare Workers need DNS records
- **Mistake:** A Worker with a custom route returned nothing because no DNS record existed for the hostname.
- **Rule:** Workers with routes need a proxied A record. Use `192.0.2.1` (RFC 5737 dummy IP).
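
The record described above can be written down as the JSON body for Cloudflare's DNS record creation endpoint (`POST /zones/{zone_id}/dns_records`). This is a sketch under assumptions: `worker_dns_record` and the example hostname are hypothetical, and the field names should be checked against the current API reference before use.

```python
import json


def worker_dns_record(hostname: str) -> dict:
    """Build the body for a proxied placeholder A record."""
    return {
        "type": "A",
        "name": hostname,
        # RFC 5737 documentation address: a placeholder that is never routed.
        "content": "192.0.2.1",
        # Proxying sends requests through Cloudflare, where the Worker
        # route can match them; the placeholder origin should not be contacted.
        "proxied": True,
        "ttl": 1,  # 1 means "automatic" in Cloudflare's API
    }


# Example hostname is illustrative only.
payload = json.dumps(worker_dns_record("api.example.com"), indent=2)
```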

### Quick tunnels are unreliable for production
- **Mistake:** Used Cloudflare quick tunnels for "production"; they expire and break.
- **Rule:** Use GitHub Pages, Cloudflare Workers, or proper tunnel configs for anything that needs to stay up.

### CF Registrar is dashboard-only
- **Mistake:** Tried to register a domain via the API.
- **Rule:** Cloudflare domain registration is dashboard-only. There is no API for it; the API covers only management of existing domains.

---

## Cron Jobs

### Cron text is what fires as the prompt
- **Mistake:** Wrote the cron text as a description instead of an actionable prompt.
- **Rule:** Write the cron text so that it reads like an instruction when it fires.

### Test crons with `cron run` before waiting
- **Mistake:** Set a cron, waited hours, then found out it was broken.
- **Rule:** Use `cron(action: "run", jobId: "...")` to test immediately.

---

*Add your own lessons as you discover them. The goal: never make the same mistake twice.*