# Lessons Learned

## Cloudflare / Tunnels / DNS (2026-02-12)
- **nohup your tunnels**: cloudflared processes die when exec sessions close. Always use `nohup cloudflared tunnel ... &`
- **Verify before announcing**: Always curl the tunnel URL and confirm it returns HTTP 200 before posting to Discord. Got burned 3 times in a row.
- **Workers need DNS**: Cloudflare Workers with routes need a proxied A record (use the dummy IP 192.0.2.1 from the RFC 5737 documentation range)
- **http2 > quic**: `--protocol http2` works more reliably than the default quic for cloudflared tunnels
- **CF Registrar is dashboard-only**: No API for new domain registration. Only management of existing domains.
- **Wrangler OAuth vs API Token**: The OAuth token (in wrangler config) and `CLOUDFLARE_API_TOKEN` have different scopes. Check both.
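
A minimal sketch of the "verify before announcing" check, assuming `curl` is installed. The helper name `url_is_live` and the 10-second timeout are illustrative choices, not part of cloudflared.

```bash
# url_is_live URL
# Succeeds only if the URL answers with HTTP 200.
# (Hypothetical helper; the name and timeout are assumptions.)
url_is_live() {
  local url="$1" status
  # -s: quiet, -o /dev/null: discard the body,
  # -w '%{http_code}': print only the status code,
  # --max-time 10: give up after 10 seconds instead of hanging.
  status="$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url")"
  [ "$status" = "200" ]
}
```

Usage might look like `url_is_live "$TUNNEL_URL" && echo "safe to announce"`, where `TUNNEL_URL` is whatever URL the tunnel printed on startup.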

## Python / Veo (2026-02-12)
- **Unbuffered output**: Use `python3 -u` for scripts running in the background. Otherwise stdout is block-buffered when it is not a terminal, and output may not appear until the buffer fills or the script exits.
- **Veo download workaround**: `client.files.download()` returns 404. Instead, grab the URI from `video.video.uri` and download it with `?key=API_KEY` appended.
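
A small sketch of appending the key to the download URI. The helper `with_api_key` is hypothetical. It also handles a URI that already carries a query string by using `&` instead of `?`; that case is an assumption, since the note above only mentions `?key=API_KEY`.

```bash
# with_api_key URI KEY
# Print URI with key=KEY appended as a query parameter.
# (Hypothetical helper; not part of any SDK.)
with_api_key() {
  local uri="$1" key="$2"
  case "$uri" in
    *\?*) printf '%s&key=%s\n' "$uri" "$key" ;;  # URI already has a query string
    *)    printf '%s?key=%s\n' "$uri" "$key" ;;  # first query parameter
  esac
}

# Example (VIDEO_URI and API_KEY are placeholders):
#   curl -L -o video.mp4 "$(with_api_key "$VIDEO_URI" "$API_KEY")"
```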

## Discord Etiquette (2026-02-12)
- **Don't spam debug messages**: Do work silently, announce clean results. Jake had to tell me to delete 45 messages of debug spam.

# Buba's Self-Learning Log

> Every mistake is a lesson. Every lesson makes us mega beastly.
> This file is updated CONSTANTLY whenever I figure something out the hard way.
> Search this BEFORE attempting anything similar.

---

## Gateway & Infrastructure

### Gateway logs live at /tmp/clawdbot/, not ~/.clawdbot/logs/
- **Date:** 2026-02-11
- **Mistake:** Checked `~/.clawdbot/logs/` and said "nothing since Feb 5", which confused Jake
- **Reality:** Gateway switched to `/tmp/clawdbot/clawdbot-YYYY-MM-DD.log`. The old logs dir is stale.
- **Rule:** Always check `/tmp/clawdbot/` for current gateway logs.
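
For quick lookups, the dated filename can be built from the pattern above. This is a sketch; `gateway_log_path` is a hypothetical helper name.

```bash
# gateway_log_path [YYYY-MM-DD]
# Print the gateway log path for the given day (default: today).
gateway_log_path() {
  local day="${1:-$(date +%F)}"   # %F expands to YYYY-MM-DD
  printf '/tmp/clawdbot/clawdbot-%s.log\n' "$day"
}
```

For example, `tail -n 50 "$(gateway_log_path)"` would show the latest entries for today.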

### tmux death kills the auto-restart loop
- **Date:** 2026-02-11
- **Mistake:** Assumed compaction caused the silence. In fact, the entire tmux session had died.
- **Reality:** `run-gateway.sh` has a `while true` loop that only works if tmux survives. If tmux itself dies, nothing restarts the gateway.
- **Rule:** When diagnosing downtime, check `tmux list-sessions` and session creation time (a Unix timestamp) with `tmux display-message -t clawdbot -p '#{session_created}'`. If the session is newer than expected, tmux died.

### Gateway freeze vs crash: different diagnostics
- **Date:** 2026-02-11
- **Mistake:** Initially thought it was an event loop freeze (alive but hung). It was actually a full crash.
- **Rule:** Check the log timeline for gaps. If there's a gap AND the tmux session is freshly created, it was a crash. If the tmux session is old but logs have a gap, THEN it's a freeze.
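
The rule above can be sketched as a comparison of two timestamps. This assumes a log gap has already been found and that "freshly created" means the tmux session was created at or after the start of that gap. `classify_outage` is a hypothetical helper.

```bash
# classify_outage SESSION_CREATED GAP_START
# Both arguments are Unix timestamps in seconds.
# Assumes a gap in the logs has already been identified.
classify_outage() {
  local session_created="$1" gap_start="$2"
  if [ "$session_created" -ge "$gap_start" ]; then
    echo crash    # session was recreated after the logs went quiet
  else
    echo freeze   # session predates the gap, so the process hung in place
  fi
}

# Example (gap_start is read off the log timeline by hand):
#   classify_outage "$(tmux display-message -t clawdbot -p '#{session_created}')" "$gap_start"
```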

## Discord API

### channel-list needs guildId, not channel ID
- **Date:** 2026-02-10
- **Mistake:** Passed a channel ID to `channel-list` and got "Unknown Guild"
- **Rule:** Guild ID ≠ channel ID. Jake's main guild is `1458233582404501547`. Channel IDs are different.

### Guild ID reference
- **Main server:** `1458233582404501547`
- **Config has all guilds listed** under `channels.discord.guilds` in `clawdbot.json`

### Deleting messages needs the channel as target
- **Date:** 2026-02-10
- **Rule:** `message delete` needs `target` set to the channel ID where the message lives.

## Cron Jobs

### Cron job parameter format
- **Date:** 2026-02-10
- **Mistake:** Tried multiple wrong formats before getting it right
- **Correct format:**
```json
{
  "name": "job-name",
  "schedule": {"kind": "cron", "expr": "0 9 * * 1,4"},
  "sessionTarget": "main",
  "payload": {"kind": "systemEvent", "text": "..."},
  "enabled": true
}
```
- **Rule:** `schedule` needs `kind` + `expr`. `payload` needs `kind: "systemEvent"` + `text`. NOT `label`, NOT `message`. The example `expr` fires at 09:00 on Mondays and Thursdays.

## File Operations

### Edit tool requires EXACT text match
- **Date:** 2026-02-11 (CREdispo sub-agent)
- **Mistake:** Multiple edit failures on CREdispo files because `oldText` didn't match exactly
- **Rule:** Always read the file first to get exact text before editing. Never guess at whitespace or content.
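
One way to pre-check a candidate string before an edit is a literal `grep`. This is only a sketch: `has_exact_text` is a hypothetical helper, and it handles single-line text only, because `grep -F` treats each line of a multi-line pattern as a separate alternative.

```bash
# has_exact_text FILE TEXT
# Succeeds if TEXT occurs verbatim (whitespace included) on some line of FILE.
# Single-line TEXT only.
has_exact_text() {
  local file="$1" text="$2"
  # -F: literal string, not a regex; -q: exit status only;
  # --: stop option parsing so TEXT may begin with a dash.
  grep -qF -- "$text" "$file"
}
```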

## iMessage / BlueBubbles

### Sending images to group chats via AppleScript is unreliable
- **Date:** 2026-02-10
- **Mistake:** Tried to send images to iMessage group chats via AppleScript. Text sends, but images may not deliver.
- **Rule:** For image delivery to group chats, use the BlueBubbles API directly or have Jake send manually from Discord.

### Group chat ID format
- **Date:** 2026-02-10
- **Rule:** iMessage group chat IDs look like `chat358249523368699090`. The send format is `any;+;chat358249523368699090`.
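
A tiny sketch of building the send target from a chat ID. `imessage_group_target` is a hypothetical helper, and the `chat` + digits check is inferred from the single example above, so treat it as an assumption.

```bash
# imessage_group_target CHAT_ID
# Print the send target for a group chat ID, or fail if the ID
# does not look like "chat" followed by digits (assumed shape).
imessage_group_target() {
  local chat_id="$1"
  case "$chat_id" in
    chat[0-9]*) printf 'any;+;%s\n' "$chat_id" ;;
    *) echo "not a group chat ID: $chat_id" >&2; return 1 ;;
  esac
}
```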

## Context & Memory

### ALWAYS save state to memory before heavy work
- **Date:** 2026-02-11
- **Mistake:** Was deep in CREdispo work, context got compacted, lost all working state
- **Rule:** Before starting any multi-step project, write current state to `memory/YYYY-MM-DD.md`. Update it at milestones. This survives compaction.
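
A sketch of a milestone logger that appends to the dated memory file. `save_state` and the bullet format it writes are illustrative assumptions, not an existing tool.

```bash
# save_state NOTE [MEMORY_DIR]
# Append a timestamped bullet to MEMORY_DIR/YYYY-MM-DD.md (default dir: memory)
# and print the file path.
save_state() {
  local note="$1" dir="${2:-memory}" file
  file="$dir/$(date +%F).md"
  mkdir -p "$dir" || return 1
  # Append rather than overwrite so earlier milestones survive.
  printf '%s %s %s\n' '-' "$(date +%H:%M)" "$note" >> "$file"
  printf '%s\n' "$file"
}
```

For example: `save_state "postgres migration: schema done, data copy pending"`. Only names of tokens belong in these notes, never their values.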

### Compaction ≠ crash: don't confuse them
- **Date:** 2026-02-11
- **Mistake:** Told Jake compaction caused the silence when it was actually a gateway crash
- **Rule:** Compaction just compresses context. It doesn't stop me from responding. If I went silent, something else happened.

## Image Generation

### Nano Banana Pro needs specific iterative prompting for character accuracy
- **Date:** 2026-02-10
- **Mistake:** Took 4 iterations to get Caleb's appearance right (white hair → brown, no beard → beard, etc.)
- **Rule:** When generating character images, be VERY specific about hair color, facial hair, build, and clothing in the first prompt. Don't assume defaults.

## Sub-agents

### Sub-agent results arrive as system messages after compaction
- **Date:** 2026-02-11
- **Mistake:** Didn't realize the CREdispo postgres migration had completed because context was compacted
- **Rule:** After spawning a sub-agent for heavy work, the result comes back as a user message. If context compacts before I process it, I need to check `sessions_list` for completed sub-agents.

## Security

### Cloudflare quick tunnels break HTML form POST (405 Method Not Allowed)
- **Date:** 2026-02-11
- **Mistake:** Signup/login forms used native HTML `<form method="POST">`, which returned 405 through cloudflared quick tunnels
- **Reality:** Cloudflare quick tunnels can mangle POST form submissions. JSON API calls via `fetch()` work fine.
- **Rule:** When serving apps through cloudflared tunnels, use JavaScript `fetch()` for form submissions instead of native HTML form POSTs. Keep the old form routes for direct access but add `/api/` JSON endpoints.

### VPN breaks Cloudflare tunnels
- **Date:** 2026-02-11
- **Mistake:** Had Mullvad VPN connected to Mexico while trying to create new cloudflared tunnels. The tunnels couldn't establish.
- **Rule:** Disconnect VPN before creating new cloudflared tunnels. Existing tunnels may also break when VPN connects.

### API tokens must go in gateway config env.vars, not just .env files
- **Date:** 2026-02-11
- **Mistake:** Saved Cloudflare token to `.env.local` but not to gateway config. Gateway couldn't use it.
- **Reality:** The gateway reads env vars from `clawdbot.json` → `env.vars`. A `.env.local` file is for apps, not the gateway process.
- **Rule:** When Jake gives a new API token, save it via `gateway config.patch` to `env.vars` so the gateway has it. Also save to `.env.local` for local app use.

### NEVER save secrets/tokens in memory/*.md files
- **Date:** 2026-02-11
- **Rule:** Memory files are git-backed and could leak. Save tokens/keys to `.env.local` (which is in `.gitignore`). Reference them by name in memory, never by value.
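
A rough pre-commit style scan for token-like strings in memory files. `find_token_like` is a hypothetical helper, and the pattern (32 or more consecutive letters, digits, `_` or `-`) is a crude assumed heuristic that can both miss real secrets and flag harmless strings.

```bash
# find_token_like [DIR]
# Print the Markdown files under DIR (default: memory) that contain a long
# unbroken run of token characters. Only file names are printed, so the
# suspected secret itself is never echoed.
find_token_like() {
  local dir="${1:-memory}"
  # -l: list matching files only; -E: extended regex.
  find "$dir" -type f -name '*.md' \
    -exec grep -lE '[A-Za-z0-9_-]{32,}' {} +
}
```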

### Delete messages containing tokens IMMEDIATELY
- **Date:** 2026-02-11
- **Rule:** If Jake or anyone pastes a secret in Discord, delete the message FIRST, then save the token. Every second it sits in a channel is a risk.

---

## Agent Coordination / Factory Builds

### 18. Parallel agents on shared filesystem = disaster
- **Date:** 2026-02-12
- **Mistake:** Spawned 5-10 sub-agents simultaneously, all writing to the same `mcpengine-repo/servers/` directory
- **What happened:** Agents deleted each other's files, overwrote each other's work, and left half-built servers everywhere
- **Rule:** For file-heavy work on a shared repo, go SEQUENTIAL (one agent at a time) or give each agent a SEPARATE directory, then merge. Never let multiple agents write to the same folder simultaneously.

### 19. "Delete everything and rebuild" agents are time bombs
- **Date:** 2026-02-12
- **Mistake:** Gave rebuild agents instructions to "DELETE everything, build from scratch"
- **What happened:** Agent deletes all files in minute 1, then times out at minute 10 with only 30% rebuilt. Now the server is WORSE than before.
- **Rule:** NEVER tell agents to delete first. Say "build new files alongside existing ones" or "write to a temp directory, then swap." Always keep the old code until the new code is verified.
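
The "write to a temp directory, then swap" idea could look roughly like this. `swap_in_build` is a hypothetical helper, and the non-empty check is only a stand-in for real verification such as a type check or file count.

```bash
# swap_in_build NEW_DIR LIVE_DIR
# Move NEW_DIR into place as LIVE_DIR, keeping the old code as LIVE_DIR.bak.
swap_in_build() {
  local new="$1" live="${2%/}" backup   # strip a trailing slash from LIVE_DIR
  backup="${live}.bak"
  # Refuse to swap if the new build is missing or empty.
  if [ ! -d "$new" ] || [ -z "$(ls -A "$new")" ]; then
    echo "new build is missing or empty: $new" >&2
    return 1
  fi
  rm -rf "$backup"                      # drop only the previous backup
  if [ -e "$live" ]; then
    mv "$live" "$backup" || return 1    # old code survives as the backup
  fi
  mv "$new" "$live"
}
```

If the swap goes wrong, `LIVE_DIR.bak` still holds the previous version.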

### 20. Factory monitor cron + manual spawns = competing agents
- **Date:** 2026-02-12
- **Mistake:** Had a cron job (every 10 min) spawning fix agents for incomplete servers, PLUS I was manually spawning rebuild agents
- **What happened:** 3-4 agents fighting over the same server simultaneously, each deleting what the others wrote
- **Rule:** Before spawning fix agents, DISABLE any cron monitors that might also spawn agents for the same servers. One coordinator, one set of workers. No freelancers.
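
As an extra guard on top of disabling the cron monitor, a per-server lock would let only one spawner claim a server at a time. This is purely a sketch of one possible approach; `claim_server`, `release_server`, and the `/tmp/agent-locks` path are hypothetical and not part of the factory tooling.

```bash
# claim_server NAME [LOCK_ROOT]
# Succeeds only for the first caller; later callers fail until release.
claim_server() {
  local name="$1" root="${2:-/tmp/agent-locks}"
  mkdir -p "$root" || return 1
  # mkdir without -p fails if the directory already exists, and the
  # create-or-fail step is atomic, so only one caller can win.
  mkdir "$root/$name" 2>/dev/null
}

# release_server NAME [LOCK_ROOT]
release_server() {
  local name="$1" root="${2:-/tmp/agent-locks}"
  rmdir "$root/$name"
}
```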

### 21. 10-minute timeout is too short for full MCP builds
- **Date:** 2026-02-12
- **Mistake:** Set 600s (10min) timeout for agents building entire MCP servers (tools + apps + types + server + README)
- **What happened:** Agents got 60-80% done then died. "No output" completions burning 60-70k tokens each.
- **Rule:** Full MCP server builds need 900s (15min). App-only or tool-only jobs can use 600s. Always set `runTimeoutSeconds` based on scope.
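
The scope-to-timeout mapping above, written as a lookup. `run_timeout_for` and the scope labels `full`, `apps`, and `tools` are hypothetical names; the numbers come from the rule.

```bash
# run_timeout_for SCOPE
# Print the value to use for runTimeoutSeconds.
run_timeout_for() {
  case "$1" in
    full)       echo 900 ;;  # tools + apps + types + server + README
    apps|tools) echo 600 ;;  # single-purpose jobs
    *) echo "unknown scope: $1" >&2; return 1 ;;
  esac
}
```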

### 22. Git checkout HEAD restores wiped files
- **Date:** 2026-02-12
- **Mistake:** Panicked when rebuild agents wiped committed files
- **What saved us:** `git checkout HEAD -- servers/{name}/` instantly restores all committed files
- **Rule:** Always commit after each server completes. Then if a rogue agent wipes files, one git command fixes it. Commit early, commit often.

### 23. Single-purpose agents > multi-purpose agents
- **Date:** 2026-02-12
- **Mistake:** Gave agents broad tasks like "build the complete MCP server" (tools + apps + types + infra + README)
- **What happened:** They'd run out of tokens/time trying to do everything, often failing at the apps stage
- **Rule:** Split into focused agents: "build tools only", "build apps only", "fix TSC errors only". Smaller scope = higher success rate. Each agent should have ONE clear deliverable.

### 24. Always verify sub-agent output: "success" doesn't mean complete
- **Date:** 2026-02-12
- **Mistake:** Trusted agent completion messages like "50+ tools built!" without checking
- **What happened:** Agent claimed 50 tools but only wrote 2 files. The "findings" text was aspirational, not factual.
- **Rule:** After EVERY sub-agent completion, run a file count check: `find src/tools -name "*.ts" | wc -l`. Never trust the narrative. Trust the filesystem.
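
The file count check can be wrapped so it fails loudly when the count falls short of what the agent claimed. `has_min_files` is a hypothetical helper built on the same `find | wc -l` command.

```bash
# has_min_files DIR PATTERN MIN
# Print how many files under DIR match PATTERN; succeed only if count >= MIN.
has_min_files() {
  local dir="$1" pattern="$2" min="$3" count
  # tr strips the padding some wc implementations add around the number.
  count="$(find "$dir" -type f -name "$pattern" 2>/dev/null | wc -l | tr -d ' ')"
  echo "$count"
  [ "$count" -ge "$min" ]
}
```

For a claim of "50+ tools built", `has_min_files src/tools '*.ts' 50` prints the real count and returns non-zero if it falls short.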

### 25. Count apps correctly: multiple storage patterns exist
- **Date:** 2026-02-12
- **Mistake:** Kept miscounting apps because different servers store them differently
- **What happened:** Some use subdirectories, some use .tsx files, some use .ts files, some use .html files, some use src/apps/ instead of src/ui/react-app/
- **Rule:** Check ALL patterns: subdirs in react-app/, .tsx files, .ts files, .html files, AND src/apps/*.ts. Take the max. Use a consistent counting script.
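
A possible shape for the consistent counting script. It assumes the `.tsx`, `.ts`, and `.html` files sit directly inside `src/ui/react-app/`; that placement and the helper names are assumptions, so adjust the paths to the actual layouts.

```bash
# _count_found FIND_ARGS...
# Count paths matched by find; a missing directory counts as 0.
_count_found() {
  find "$@" 2>/dev/null | wc -l | tr -d ' '
}

# count_apps SERVER_DIR
# Print the largest app count found across the known storage patterns.
count_apps() {
  local server="$1" ui apps max=0 n
  ui="$server/src/ui/react-app"
  apps="$server/src/apps"
  for n in \
    "$(_count_found "$ui" -mindepth 1 -maxdepth 1 -type d)" \
    "$(_count_found "$ui" -maxdepth 1 -type f -name '*.tsx')" \
    "$(_count_found "$ui" -maxdepth 1 -type f -name '*.ts')" \
    "$(_count_found "$ui" -maxdepth 1 -type f -name '*.html')" \
    "$(_count_found "$apps" -maxdepth 1 -type f -name '*.ts')"
  do
    if [ "$n" -gt "$max" ]; then max="$n"; fi
  done
  echo "$max"
}
```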

*Last updated: 2026-02-12 22:20 EST*
*Total lessons: 25*

### 17. Jake's Preferred Image Style
- **Mistake:** Used comic book/vibrant cartoon style when Jake asked for "the style I like"
- **What happened:** Jake corrected me: his preferred style is **chibi kawaii anime**, NOT comic book
- **Rule:** Jake's go-to image style = chibi/kawaii anime (pastel colors, big eyes, oversized heads, tiny bodies, sparkles, hearts, stars). Same style as Buba's visual identity in IDENTITY.md. Always default to this unless he says otherwise.