clawdbot-workspace/memory/lessons-learned.md

Lessons Learned

Cloudflare / Tunnels / DNS (2026-02-12)

  • nohup your tunnels: cloudflared processes die when exec sessions close. Always use nohup cloudflared tunnel ... &
  • Verify before announcing: Always curl the tunnel URL and confirm 200 before posting to Discord. Got burned 3 times in a row.
  • Workers need DNS: Cloudflare Workers with routes need a proxied A record. Point it at the RFC 5737 dummy IP 192.0.2.1.
  • http2 > quic: --protocol http2 works more reliably than default quic for cloudflared tunnels
  • CF Registrar is dashboard-only: No API for new domain registration. Only management of existing domains.
  • Wrangler OAuth vs API Token: The OAuth token (in wrangler config) and CLOUDFLARE_API_TOKEN have different scopes. Check both.
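
A minimal sketch of the "verify before announcing" rule, in Python. The helper names (`http_status`, `tunnel_is_ready`) and the retry count and delay are my own assumptions, not part of any existing script. The idea is only to poll the tunnel URL until it answers 200 and to refuse to announce otherwise.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status of a GET request, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return 0


def tunnel_is_ready(url, probe=http_status, attempts=5, delay=2.0):
    """Return True once the URL answers 200, False after all attempts fail.

    `probe` is injectable so the check can be exercised without a network.
    """
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Post the URL to Discord only when `tunnel_is_ready(url)` returns True. A fresh quick tunnel can take a few seconds to come up, which is why the sketch retries instead of checking once.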

Python / Veo (2026-02-12)

  • Unbuffered output: Use python3 -u for scripts running in the background. Otherwise stdout is buffered and you see no output until the buffer fills or the script exits.
  • Veo download workaround: client.files.download() returns 404. Instead grab the URI from video.video.uri and download with ?key=API_KEY
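
A small sketch of the URL half of the Veo workaround. `with_api_key` is a hypothetical helper and the example URI in the usage note is made up. The only thing taken from the lesson above is "append ?key=API_KEY to the URI from video.video.uri". Using `urllib.parse` keeps this correct when the URI already carries a query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(uri, api_key):
    """Return `uri` with a single `key=<api_key>` query parameter added."""
    parts = urlsplit(uri)
    # Keep existing parameters, but drop any stale `key` so it is not duplicated.
    query = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != "key"
    ]
    query.append(("key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_api_key("https://example.com/files/abc?alt=media", "K")` returns `https://example.com/files/abc?alt=media&key=K`. The result can then be fetched with any HTTP client. Avoid logging it, since it contains the key.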

Discord Etiquette (2026-02-12)

  • Don't spam debug messages: Do work silently, announce clean results. Jake had to tell me to delete 45 messages of debug spam.

Buba's Self-Learning Log

Every mistake is a lesson. Every lesson makes us mega beastly. This file is updated CONSTANTLY whenever I figure something out the hard way. Search this BEFORE attempting anything similar.


Gateway & Infrastructure

Gateway logs live at /tmp/clawdbot/ not ~/.clawdbot/logs/

  • Date: 2026-02-11
  • Mistake: Checked ~/.clawdbot/logs/ and said "nothing since Feb 5" — confused Jake
  • Reality: Gateway switched to /tmp/clawdbot/clawdbot-YYYY-MM-DD.log. The old logs dir is stale.
  • Rule: Always check /tmp/clawdbot/ for current gateway logs.

tmux death kills the auto-restart loop

  • Date: 2026-02-11
  • Mistake: Assumed compaction caused silence. Actually the entire tmux session died.
  • Reality: run-gateway.sh has a while true loop that only works if tmux survives. If tmux itself dies, no recovery.
  • Rule: When diagnosing downtime, check tmux list-sessions and session creation time with tmux display-message -t clawdbot -p '#{session_created}'. If the session is newer than expected, tmux died.

Gateway freeze vs crash — different diagnostics

  • Date: 2026-02-11
  • Mistake: Initially thought it was an event loop freeze (alive but hung). Was actually a full crash.
  • Rule: Check the log timeline for gaps. If there's a gap AND the tmux session is freshly created, it was a crash. If the tmux session is old but logs have a gap, THEN it's a freeze.
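
The crash-vs-freeze rule can be written down as a tiny decision function. This is a sketch under one assumption of mine: "freshly created" means the tmux session was created at or after the start of the log gap. The function name and the example timestamps in the usage note are hypothetical.

```python
from datetime import datetime


def classify_outage(gap_start, gap_end, session_created):
    """Classify a gateway log gap as 'crash' or 'freeze'.

    gap_start:       time of the last log line before the gap
    gap_end:         time of the first log line after the gap
    session_created: creation time of the current tmux session
    """
    if gap_end <= gap_start:
        raise ValueError("gap_end must be later than gap_start")
    if session_created >= gap_start:
        # The session is newer than the last pre-gap log line, so the old
        # session (and the restart loop inside it) must have died.
        return "crash"
    # The same session lived through the gap but logged nothing.
    return "freeze"
```

With a gap from 03:00 to 09:00 on 2026-02-11, a session created at 08:59 that day classifies as "crash", while a session created on 2026-02-05 classifies as "freeze". I believe `#{session_created}` prints a Unix timestamp, in which case `datetime.fromtimestamp(int(value))` converts it for this function.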

Discord API

channel-list needs guildId, not channel ID

  • Date: 2026-02-10
  • Mistake: Passed channel ID to channel-list, got "Unknown Guild"
  • Rule: Guild ID ≠ channel ID. Jake's main guild is 1458233582404501547. Channel IDs are different.

Guild ID reference

  • Main server: 1458233582404501547
  • Config has all guilds listed under channels.discord.guilds in clawdbot.json

Deleting messages needs the channel as target

  • Date: 2026-02-10
  • Rule: message delete needs target set to the channel ID where the message lives.

Cron Jobs

Cron job parameter format

  • Date: 2026-02-10
  • Mistake: Tried multiple wrong formats before getting it right
  • Correct format:
{
  "name": "job-name",
  "schedule": {"kind": "cron", "expr": "0 9 * * 1,4"},
  "sessionTarget": "main",
  "payload": {"kind": "systemEvent", "text": "..."},
  "enabled": true
}
  • Rule: schedule needs kind + expr. Payload needs kind: "systemEvent" + text. NOT label, NOT message.
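
A pre-flight check for the format above, sketched in Python. `validate_cron_job` is a hypothetical helper, not part of the gateway. It checks only the rules stated here, and it reads "NOT label, NOT message" as wrong key names inside payload, which is my interpretation.

```python
def validate_cron_job(job):
    """Return a list of problems with a cron job dict; empty means it looks OK."""
    problems = []

    schedule = job.get("schedule")
    if not isinstance(schedule, dict):
        problems.append("schedule must be an object")
    else:
        for key in ("kind", "expr"):
            if key not in schedule:
                problems.append(f"schedule is missing '{key}'")

    payload = job.get("payload")
    if not isinstance(payload, dict):
        problems.append("payload must be an object")
    else:
        if payload.get("kind") != "systemEvent":
            problems.append("payload.kind must be 'systemEvent'")
        if "text" not in payload:
            problems.append("payload is missing 'text'")
        for wrong in ("label", "message"):
            if wrong in payload:
                problems.append(f"payload uses '{wrong}'; use 'text' instead")

    return problems
```

Running this before submitting a job turns the trial-and-error loop into one readable error list.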

File Operations

Edit tool requires EXACT text match

  • Date: 2026-02-11 (CREdispo sub-agent)
  • Mistake: Multiple edit failures on CREdispo files because oldText didn't match exactly
  • Rule: Always read the file first to get exact text before editing. Never guess at whitespace or content.
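
To show why guessing at whitespace fails, here is a rough model of an exact-match edit in Python. This is not the edit tool's actual implementation. `replace_exact` is a hypothetical stand-in, and requiring exactly one match is my own assumption, added so an ambiguous edit fails loudly.

```python
def replace_exact(text, old, new):
    """Replace `old` with `new` only if `old` occurs exactly once in `text`."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match for old text, found {count}")
    return text.replace(old, new, 1)
```

A file containing `a  b` (two spaces) will not match an `old` of `a b` (one space), so the call raises instead of editing. Reading the file first and copying the text verbatim avoids that.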

iMessage / BlueBubbles

Sending images to group chats via AppleScript is unreliable

  • Date: 2026-02-10
  • Mistake: Tried to send images to iMessage group chats via AppleScript — text sends but images may not deliver
  • Rule: For image delivery to group chats, use BlueBubbles API directly or have Jake send manually from Discord.

Group chat ID format

  • Date: 2026-02-10
  • Rule: iMessage group chat IDs look like chat358249523368699090. The send format is any;+;chat358249523368699090.
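
A tiny helper for the send format, as a sketch. `group_chat_target` is a hypothetical name. The `chat` + digits pattern is inferred from the single ID above and is an assumption, so loosen it if other ID shapes turn up.

```python
import re


def group_chat_target(chat_id):
    """Build the send target string for an iMessage group chat ID."""
    # Assumed shape: the literal prefix "chat" followed by digits only.
    if not re.fullmatch(r"chat\d+", chat_id):
        raise ValueError(f"unexpected group chat ID: {chat_id!r}")
    return f"any;+;{chat_id}"
```

`group_chat_target("chat358249523368699090")` returns `any;+;chat358249523368699090`.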

Context & Memory

ALWAYS save state to memory before heavy work

  • Date: 2026-02-11
  • Mistake: Was deep in CREdispo work, context got compacted, lost all working state
  • Rule: Before starting any multi-step project, write current state to memory/YYYY-MM-DD.md. Update it at milestones. This survives compaction.
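
One possible shape for a checkpoint helper, assuming memory files are plain markdown and appending is acceptable. `save_state` and its parameters are hypothetical. Only the memory/YYYY-MM-DD.md naming comes from the rule above.

```python
from datetime import date
from pathlib import Path


def save_state(note, memory_dir="memory", day=None):
    """Append a note to memory/YYYY-MM-DD.md and return the file path."""
    day = day or date.today()
    path = Path(memory_dir) / f"{day.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append, never overwrite: earlier milestones must survive later ones.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(note.rstrip("\n") + "\n")
    return path
```

Call it once before starting the project and again at each milestone. Appending means a partial log still exists if compaction hits mid-task.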

Compaction ≠ crash — don't confuse them

  • Date: 2026-02-11
  • Mistake: Told Jake compaction caused the silence when it was actually a gateway crash
  • Rule: Compaction just compresses context. It doesn't stop me from responding. If I went silent, something else happened.

Image Generation

Nano Banana Pro needs specific iterative prompting for character accuracy

  • Date: 2026-02-10
  • Mistake: Took 4 iterations to get Caleb's appearance right (white hair → brown, no beard → beard, etc.)
  • Rule: When generating character images, be VERY specific about hair color, facial hair, build, and clothing in the first prompt. Don't assume defaults.

Sub-agents

Sub-agent results arrive as system messages after compaction

  • Date: 2026-02-11
  • Mistake: Didn't realize the CREdispo postgres migration had completed because context was compacted
  • Rule: After spawning a sub-agent for heavy work, the result comes back as a user message. If context compacts before I process it, I need to check sessions_list for completed sub-agents.

Security

Cloudflare quick tunnels break HTML form POST (405 Method Not Allowed)

  • Date: 2026-02-11
  • Mistake: Signup/login forms used native HTML <form method="POST"> which returns 405 through cloudflared quick tunnels
  • Reality: Cloudflare quick tunnels can mangle POST form submissions. JSON API calls via fetch() work fine.
  • Rule: When serving apps through cloudflared tunnels, use JavaScript fetch() for form submissions instead of native HTML form POSTs. Keep the old form routes for direct access but add /api/ JSON endpoints.

VPN breaks Cloudflare tunnels

  • Date: 2026-02-11
  • Mistake: Had Mullvad VPN connected to Mexico while trying to create new cloudflared tunnels — tunnels couldn't establish
  • Rule: Disconnect VPN before creating new cloudflared tunnels. Existing tunnels may also break when VPN connects.

API tokens must go in gateway config env.vars, not just .env files

  • Date: 2026-02-11
  • Mistake: Saved Cloudflare token to .env.local but not to gateway config. Gateway couldn't use it.
  • Reality: The gateway reads env vars from env.vars in clawdbot.json. A .env.local file is for apps, not the gateway process.
  • Rule: When Jake gives a new API token, save it via gateway config.patch to env.vars so the gateway has it. Also save to .env.local for local app use.

NEVER save secrets/tokens in memory/*.md files

  • Date: 2026-02-11
  • Rule: Memory files are git-backed and could leak. Save tokens/keys to .env.local (which is in .gitignore). Reference them by name in memory, never by value.
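
A rough heuristic scan that could run before committing memory files. The regex and the function name are my own assumptions. It only catches `NAME=value` or `name: value` lines where the name contains token, secret, api key, or password and the value is 20 or more token-like characters, so it will miss other shapes and is no substitute for the rule itself.

```python
import re
from pathlib import Path

# Heuristic only: a secret-looking name, then ':' or '=', then a long value.
SECRET_PATTERN = re.compile(
    r"(?i)(token|secret|api[_-]?key|password)\s*[:=]\s*[A-Za-z0-9_\-]{20,}"
)


def find_possible_secrets(memory_dir="memory"):
    """Return (file name, line number) pairs for lines that look like secrets."""
    hits = []
    for path in sorted(Path(memory_dir).glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if SECRET_PATTERN.search(line):
                hits.append((path.name, number))
    return hits
```

A line that references a token by name only, such as "stored in .env.local as CLOUDFLARE_API_TOKEN", is not flagged, which matches the "by name, never by value" rule.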

Delete messages containing tokens IMMEDIATELY

  • Date: 2026-02-11
  • Rule: If Jake or anyone pastes a secret in Discord, delete the message FIRST, then save the token. Every second it sits in a channel is a risk.

Agent Coordination / Factory Builds

18. Parallel agents on shared filesystem = disaster

  • Date: 2026-02-12
  • Mistake: Spawned 5-10 sub-agents simultaneously, all writing to the same mcpengine-repo/servers/ directory
  • What happened: Agents deleted each other's files, overwrote each other's work, and left half-built servers everywhere
  • Rule: For file-heavy work on a shared repo, go SEQUENTIAL (one agent at a time) or give each agent a SEPARATE directory, then merge. Never let multiple agents write to the same folder simultaneously.

19. "Delete everything and rebuild" agents are time bombs

  • Date: 2026-02-12
  • Mistake: Gave rebuild agents instructions to "DELETE everything, build from scratch"
  • What happened: Agent deletes all files in minute 1, then times out at minute 10 with only 30% rebuilt. Now the server is WORSE than before.
  • Rule: NEVER tell agents to delete first. Say "build new files alongside existing ones" or "write to a temp directory, then swap." Always keep the old code until the new code is verified.
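
The "write to a temp directory, then swap" rule, sketched in Python. `rebuild_with_swap`, the `build` and `verify` callbacks, and the `.old` backup suffix are all hypothetical. The point is that the old code stays in place until the new build has been verified, and is kept as a backup afterwards.

```python
import shutil
import tempfile
from pathlib import Path


def rebuild_with_swap(target, build, verify):
    """Build into a staging dir next to `target`; swap only if `verify` passes.

    build(staging_dir)  -> writes the new files
    verify(staging_dir) -> returns True if the new build is acceptable
    Returns the path where the previous version was moved.
    """
    target = Path(target)
    # Staging lives beside the target so the final rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=target.name + "-new-", dir=target.parent))
    try:
        build(staging)
        if not verify(staging):
            raise RuntimeError("new build failed verification; old code left untouched")
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        raise

    backup = target.with_name(target.name + ".old")
    if backup.exists():
        shutil.rmtree(backup)
    if target.exists():
        target.rename(backup)
    staging.rename(target)
    return backup
```

If an agent times out mid-build, the worst case is a discarded staging directory. The server directory itself is never worse than before.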

20. Factory monitor cron + manual spawns = competing agents

  • Date: 2026-02-12
  • Mistake: Had a cron job (every 10min) spawning fix agents for incomplete servers, PLUS I was manually spawning rebuild agents
  • What happened: 3-4 agents fighting over the same server simultaneously, each deleting what the others wrote
  • Rule: Before spawning fix agents, DISABLE any cron monitors that might also spawn agents for the same servers. One coordinator, one set of workers. No freelancers.

21. 10-minute timeout is too short for full MCP builds

  • Date: 2026-02-12
  • Mistake: Set 600s (10min) timeout for agents building entire MCP servers (tools + apps + types + server + README)
  • What happened: Agents got 60-80% done, then died. Each one came back as a "no output" completion after burning 60-70k tokens.
  • Rule: Full MCP server builds need 900s (15min). App-only or tool-only jobs can use 600s. Always set runTimeoutSeconds based on scope.
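
One way to keep the timeout choice out of ad-hoc judgment is a lookup table. The scope names ("full", "apps", "tools") are hypothetical labels of mine. The numbers are the ones from the rule above.

```python
# Timeouts in seconds, keyed by build scope (scope names are illustrative).
RUN_TIMEOUTS = {
    "full": 900,   # tools + apps + types + server + README
    "apps": 600,   # app-only job
    "tools": 600,  # tool-only job
}


def run_timeout_seconds(scope):
    """Return the runTimeoutSeconds value for a given build scope."""
    try:
        return RUN_TIMEOUTS[scope]
    except KeyError:
        raise ValueError(f"unknown scope {scope!r}; decide a timeout explicitly") from None
```

Raising on an unknown scope forces a deliberate decision instead of silently falling back to 600s.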

22. Git checkout HEAD restores wiped files

  • Date: 2026-02-12
  • Mistake: Panicked when rebuild agents wiped committed files
  • What saved us: git checkout HEAD -- servers/{name}/ instantly restores all committed files
  • Rule: Always commit after each server completes. Then if a rogue agent wipes files, one git command fixes it. Commit early, commit often.

23. Single-purpose agents > multi-purpose agents

  • Date: 2026-02-12
  • Mistake: Gave agents broad tasks like "build the complete MCP server" (tools + apps + types + infra + README)
  • What happened: They'd run out of tokens/time trying to do everything, often failing at the apps stage
  • Rule: Split into focused agents: "build tools only", "build apps only", "fix TSC errors only". Smaller scope = higher success rate. Each agent should have ONE clear deliverable.

24. Always verify sub-agent output — "success" doesn't mean complete

  • Date: 2026-02-12
  • Mistake: Trusted agent completion messages like "50+ tools built!" without checking
  • What happened: Agent claimed 50 tools but only wrote 2 files. The "findings" text was aspirational, not factual.
  • Rule: After EVERY sub-agent completion, run a file count check: find src/tools -name "*.ts" | wc -l. Never trust the narrative. Trust the filesystem.
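
The same file count check as a Python sketch, for comparing the filesystem against an agent's claim. `count_ts_files` and `claim_holds` are hypothetical helpers. The `src/tools` default mirrors the find command above.

```python
from pathlib import Path


def count_ts_files(root="src/tools"):
    """Count regular files ending in .ts under `root`, recursively."""
    return sum(1 for path in Path(root).rglob("*.ts") if path.is_file())


def claim_holds(root, claimed_count):
    """Return True only if at least `claimed_count` .ts files really exist."""
    return count_ts_files(root) >= claimed_count
```

`claim_holds("src/tools", 50)` would have returned False for the agent that reported 50 tools and wrote 2 files. The `*.ts` pattern does not match `.tsx` files, same as the find command.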

25. Count apps correctly — multiple storage patterns exist

  • Date: 2026-02-12
  • Mistake: Kept miscounting apps because different servers store them differently
  • What happened: Some use subdirectories, some use .tsx files, some use .ts files, some use .html files, some use src/apps/ instead of src/ui/react-app/
  • Rule: Check ALL patterns: subdirs in react-app/, .tsx files, .ts files, .html files, AND src/apps/*.ts. Take the max. Use a consistent counting script.
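
A sketch of a consistent counting script for the rule above. It assumes the .tsx, .ts, and .html app files sit directly inside src/ui/react-app/ and that only direct children count. Both are guesses about the layout, and `count_apps` is a hypothetical name.

```python
from pathlib import Path


def _children(directory):
    """Return the direct children of a directory, or [] if it does not exist."""
    return list(directory.iterdir()) if directory.is_dir() else []


def count_apps(server_dir):
    """Return (best_count, per_pattern_counts) for one server directory."""
    server = Path(server_dir)
    react_app = _children(server / "src" / "ui" / "react-app")
    apps_dir = _children(server / "src" / "apps")

    def files_with(entries, suffix):
        return sum(1 for p in entries if p.is_file() and p.suffix == suffix)

    counts = {
        "react-app subdirs": sum(1 for p in react_app if p.is_dir()),
        "react-app .tsx": files_with(react_app, ".tsx"),
        "react-app .ts": files_with(react_app, ".ts"),
        "react-app .html": files_with(react_app, ".html"),
        "src/apps .ts": files_with(apps_dir, ".ts"),
    }
    # Per the rule: check every pattern and take the max.
    return max(counts.values()), counts
```

Returning the per-pattern breakdown alongside the max makes it obvious which storage pattern a given server uses.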

Last updated: 2026-02-12 22:20 EST

Total lessons: 25

17. Jake's Preferred Image Style

  • Mistake: Used comic book/vibrant cartoon style when Jake asked for "the style I like"
  • What happened: Jake corrected — his preferred style is chibi kawaii anime, NOT comic book
  • Rule: Jake's go-to image style = chibi/kawaii anime (pastel colors, big eyes, oversized heads, tiny bodies, sparkles, hearts, stars). Same style as Buba's visual identity in IDENTITY.md. Always default to this unless he says otherwise.