Lessons Learned
Cloudflare / Tunnels / DNS (2026-02-12)
- nohup your tunnels: cloudflared processes die when exec sessions close. Always use `nohup cloudflared tunnel ... &`.
- Verify before announcing: Always curl the tunnel URL and confirm HTTP 200 before posting to Discord. Got burned 3 times in a row.
- Workers need DNS: Cloudflare Workers with routes need a proxied A record (use 192.0.2.1, an RFC 5737 documentation address, as the dummy IP).
- http2 > quic: `--protocol http2` works more reliably than the default quic for cloudflared tunnels.
- CF Registrar is dashboard-only: there is no API for new domain registration, only for managing existing domains.
- Wrangler OAuth vs API Token: The OAuth token (in wrangler config) and CLOUDFLARE_API_TOKEN have different scopes. Check both.
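The "verify before announcing" rule can be scripted so it never gets skipped. A minimal sketch using only the Python standard library; `wait_for_200` is a hypothetical helper name, and the retry count and delay are arbitrary defaults, not tested values.

```python
import time
import urllib.error
import urllib.request


def wait_for_200(url: str, attempts: int = 10, delay: float = 3.0, timeout: float = 10.0) -> bool:
    """Poll `url` until it answers HTTP 200; return False if it never does."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Non-2xx answers (HTTPError), DNS failures, refused connections
            # and timeouts all count as "not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Post the tunnel link to Discord only when `wait_for_200(tunnel_url)` returns `True`; otherwise keep debugging quietly.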
Python / Veo (2026-02-12)
- Unbuffered output: Use `python3 -u` for scripts running in the background; otherwise stdout is buffered and you see no output.
- Veo download workaround: `client.files.download()` returns 404. Instead, grab the URI from `video.video.uri` and download it with `?key=API_KEY`.
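A small sketch of the download workaround. `with_api_key` is a hypothetical helper; it merges the key into whatever query string the URI may already carry instead of blindly appending `?key=`, because a second `?` would corrupt the URL. Whether Veo URIs already include query parameters is an assumption the helper simply tolerates either way.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(uri: str, api_key: str) -> str:
    """Return `uri` with key=<api_key> in its query string, keeping other params."""
    parts = urlsplit(uri)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query = [(k, v) for k, v in query if k != "key"]  # drop any stale key
    query.append(("key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Usage, assuming the key lives in an environment variable of your choosing: `urllib.request.urlretrieve(with_api_key(video.video.uri, os.environ["GEMINI_API_KEY"]), "out.mp4")`.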
Discord Etiquette (2026-02-12)
- Don't spam debug messages: Do work silently, announce clean results. Jake had to tell me to delete 45 messages of debug spam.
Buba's Self-Learning Log
Every mistake is a lesson. Every lesson makes us mega beastly. This file is updated CONSTANTLY whenever I figure something out the hard way. Search this BEFORE attempting anything similar.
Gateway & Infrastructure
Gateway logs live at /tmp/clawdbot/, not ~/.clawdbot/logs/
- Date: 2026-02-11
- Mistake: Checked ~/.clawdbot/logs/ and said "nothing since Feb 5" — confused Jake
- Reality: Gateway switched to /tmp/clawdbot/clawdbot-YYYY-MM-DD.log. The old logs dir is stale.
- Rule: Always check `/tmp/clawdbot/` for current gateway logs.
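A minimal sketch that builds today's log path from the `clawdbot-YYYY-MM-DD.log` pattern above and tails it. Whether the date in the filename is local time or UTC is an assumption (local time here); `gateway_log_path` and `tail` are hypothetical helper names.

```python
from collections import deque
from datetime import date
from pathlib import Path
from typing import Optional

# Current gateway log location; ~/.clawdbot/logs/ is stale.
LOG_DIR = Path("/tmp/clawdbot")


def gateway_log_path(day: Optional[date] = None) -> Path:
    """Path of the gateway log for `day` (defaults to today, local time)."""
    day = day or date.today()
    return LOG_DIR / f"clawdbot-{day:%Y-%m-%d}.log"


def tail(path: Path, n: int = 50) -> list[str]:
    """Last `n` lines of a log file, like `tail -n`."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        return list(deque(fh, maxlen=n))
```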
tmux death kills the auto-restart loop
- Date: 2026-02-11
- Mistake: Assumed compaction caused silence. Actually the entire tmux session died.
- Reality: `run-gateway.sh` has a `while true` loop that only works if tmux survives. If tmux itself dies, there is no recovery.
- Rule: When diagnosing downtime, check `tmux list-sessions` and the session creation time with `tmux display-message -t clawdbot -p '#{session_created}'`. If the session is newer than expected, tmux died.
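tmux prints `#{session_created}` as a Unix timestamp, so the "newer than expected" check is a simple comparison once it is parsed. A sketch with hypothetical helper names; `expected_start` is whatever time you believe the session was started.

```python
import subprocess
from datetime import datetime, timezone


def parse_session_created(output: str) -> datetime:
    """tmux prints #{session_created} as a Unix timestamp in seconds."""
    return datetime.fromtimestamp(int(output.strip()), tz=timezone.utc)


def tmux_session_created(session: str = "clawdbot") -> datetime:
    """Creation time of a tmux session; raises CalledProcessError if it doesn't exist."""
    result = subprocess.run(
        ["tmux", "display-message", "-t", session, "-p", "#{session_created}"],
        capture_output=True, text=True, check=True,
    )
    return parse_session_created(result.stdout)


def tmux_died_since(created: datetime, expected_start: datetime) -> bool:
    """A session created after we expected it to exist means tmux died and was recreated."""
    return created > expected_start
```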
Gateway freeze vs crash — different diagnostics
- Date: 2026-02-11
- Mistake: Initially thought it was an event loop freeze (alive but hung). Was actually a full crash.
- Rule: Check the log timeline for gaps. If there's a gap AND the tmux session is freshly created, it was a crash. If the tmux session is old but logs have a gap, THEN it's a freeze.
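The crash-vs-freeze rule above as a sketch. The 10-minute gap threshold is an arbitrary assumption, and the timestamps must first be parsed out of the `/tmp/clawdbot/` logs (their line format is not recorded here). `session_is_fresh` would come from the tmux creation-time check.

```python
from datetime import datetime, timedelta


def find_gaps(
    timestamps: list[datetime], threshold: timedelta = timedelta(minutes=10)
) -> list[tuple[datetime, datetime]]:
    """(before, after) pairs where consecutive log entries are more than `threshold` apart."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > threshold]


def diagnose(log_gap: bool, session_is_fresh: bool) -> str:
    """Gap + freshly created tmux session = crash; gap + old session = freeze."""
    if not log_gap:
        return "no gap in logs"
    return "crash" if session_is_fresh else "freeze"
```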
Discord API
channel-list needs guildId, not channel ID
- Date: 2026-02-10
- Mistake: Passed channel ID to channel-list, got "Unknown Guild"
- Rule: Guild ID ≠ channel ID. Jake's main guild is `1458233582404501547`. Channel IDs are different.
Guild ID reference
- Main server: `1458233582404501547`
- Config has all guilds listed under `channels.discord.guilds` in `clawdbot.json`.
Deleting messages needs the channel as target
- Date: 2026-02-10
- Rule: `message delete` needs `target` set to the channel ID where the message lives.
Cron Jobs
Cron job parameter format
- Date: 2026-02-10
- Mistake: Tried multiple wrong formats before getting it right
- Correct format:
```json
{
  "name": "job-name",
  "schedule": {"kind": "cron", "expr": "0 9 * * 1,4"},
  "sessionTarget": "main",
  "payload": {"kind": "systemEvent", "text": "..."},
  "enabled": true
}
```
- Rule: `schedule` needs `kind` + `expr`. `payload` needs `kind: "systemEvent"` + `text`. NOT `label`, NOT `message`.
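A pre-flight check for the rules above, as a sketch. `validate_cron_job` is hypothetical and only encodes the lessons listed here, not the full schema the gateway accepts.

```python
def validate_cron_job(job: dict) -> list[str]:
    """Return problems with a cron job spec; an empty list means it matches the known-good shape."""
    problems = []
    schedule = job.get("schedule") or {}
    if not schedule.get("kind") or not schedule.get("expr"):
        problems.append("schedule needs both 'kind' and 'expr'")
    payload = job.get("payload") or {}
    if payload.get("kind") != "systemEvent":
        problems.append("payload.kind must be 'systemEvent'")
    if not payload.get("text"):
        problems.append("payload needs 'text'")
    for wrong in ("label", "message"):
        if wrong in payload:
            problems.append(f"payload uses '{wrong}'; the field is 'text'")
    return problems
```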
File Operations
Edit tool requires EXACT text match
- Date: 2026-02-11 (CREdispo sub-agent)
- Mistake: Multiple edit failures on CREdispo files because oldText didn't match exactly
- Rule: Always read the file first to get exact text before editing. Never guess at whitespace or content.
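The Edit tool's internals aren't documented here, but a hypothetical helper with the same exact-match requirement shows why guessed whitespace fails: the old text must appear verbatim, exactly once.

```python
from pathlib import Path


def exact_edit(path: Path, old_text: str, new_text: str) -> None:
    """Replace old_text with new_text only if it occurs exactly once in the file."""
    content = path.read_text(encoding="utf-8")
    count = content.count(old_text)
    if count == 0:
        raise ValueError(f"old_text not found in {path}; re-read the file and copy it exactly")
    if count > 1:
        raise ValueError(f"old_text matches {count} times in {path}; include more context")
    path.write_text(content.replace(old_text, new_text, 1), encoding="utf-8")
```

A tab guessed where the file has four spaces raises `ValueError` and leaves the file untouched.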
iMessage / BlueBubbles
Sending images to group chats via AppleScript is unreliable
- Date: 2026-02-10
- Mistake: Tried to send images to iMessage group chats via AppleScript — text sends but images may not deliver
- Rule: For image delivery to group chats, use BlueBubbles API directly or have Jake send manually from Discord.
Group chat ID format
- Date: 2026-02-10
- Rule: iMessage group chat IDs look like `chat358249523368699090`. The send format is `any;+;chat358249523368699090`.
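A tiny sketch that builds the send target and rejects anything that doesn't look like a group chat ID (for example a Discord guild ID pasted by mistake). The `chat` + digits pattern is inferred from the single example above, so treat it as an assumption.

```python
import re

# Assumed shape, inferred from the one known example: "chat" followed by digits.
_GROUP_CHAT_ID = re.compile(r"chat\d+")


def group_send_target(chat_id: str) -> str:
    """Turn an iMessage group chat ID into the any;+;<id> send format."""
    if not _GROUP_CHAT_ID.fullmatch(chat_id):
        raise ValueError(f"not a group chat id: {chat_id!r}")
    return f"any;+;{chat_id}"
```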
Context & Memory
ALWAYS save state to memory before heavy work
- Date: 2026-02-11
- Mistake: Was deep in CREdispo work, context got compacted, lost all working state
- Rule: Before starting any multi-step project, write current state to memory/YYYY-MM-DD.md. Update it at milestones. This survives compaction.
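A minimal sketch of the save-state habit: append a timestamped note to `memory/YYYY-MM-DD.md` at each milestone. The `## HH:MM working state` heading is an arbitrary choice, and `save_state` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_state(note: str, memory_dir: Path = Path("memory"), now: Optional[datetime] = None) -> Path:
    """Append a timestamped state note to memory/YYYY-MM-DD.md and return the file path."""
    now = now or datetime.now()
    path = memory_dir / f"{now:%Y-%m-%d}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"\n## {now:%H:%M} working state\n{note.rstrip()}\n")
    return path
```

Appending rather than overwriting keeps earlier milestones readable after compaction.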
Compaction ≠ crash — don't confuse them
- Date: 2026-02-11
- Mistake: Told Jake compaction caused the silence when it was actually a gateway crash
- Rule: Compaction just compresses context. It doesn't stop me from responding. If I went silent, something else happened.
Image Generation
Nano Banana Pro needs specific iterative prompting for character accuracy
- Date: 2026-02-10
- Mistake: Took 4 iterations to get Caleb's appearance right (white hair → brown, no beard → beard, etc.)
- Rule: When generating character images, be VERY specific about hair color, facial hair, build, and clothing in the first prompt. Don't assume defaults.
Sub-agents
Sub-agent results arrive as system messages after compaction
- Date: 2026-02-11
- Mistake: Didn't realize the CREdispo postgres migration had completed because context was compacted
- Rule: After spawning a sub-agent for heavy work, the result comes back as a user message. If context compacts before I process it, I need to check sessions_list for completed sub-agents.
Security
Cloudflare quick tunnels break HTML form POST (405 Method Not Allowed)
- Date: 2026-02-11
- Mistake: Signup/login forms used native HTML `<form method="POST">`, which returns 405 through cloudflared quick tunnels.
- Reality: Cloudflare quick tunnels can mangle POST form submissions. JSON API calls via `fetch()` work fine.
- Rule: When serving apps through cloudflared tunnels, use JavaScript `fetch()` for form submissions instead of native HTML form POSTs. Keep the old form routes for direct access, but add `/api/` JSON endpoints.
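One way to keep both routes without duplicating logic is a shared body parser that accepts either encoding. A sketch, assuming hypothetical `email`/`password` field names; the browser side would send something like `fetch("/api/signup", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ email, password }) })`, where `/api/signup` is also a hypothetical route.

```python
import json
from urllib.parse import parse_qs


def parse_credentials(content_type: str, body: bytes) -> dict[str, str]:
    """Parse a login/signup body from either a native form POST or a fetch() JSON call."""
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "application/json":
        data = json.loads(body or b"{}")
        if not isinstance(data, dict):
            raise ValueError("JSON body must be an object")
    elif mime == "application/x-www-form-urlencoded":
        # parse_qs returns a list of values per field; keep the first one
        data = {k: v[0] for k, v in parse_qs(body.decode("utf-8")).items()}
    else:
        raise ValueError(f"unsupported Content-Type: {content_type!r}")
    return {"email": str(data.get("email", "")), "password": str(data.get("password", ""))}
```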
VPN breaks Cloudflare tunnels
- Date: 2026-02-11
- Mistake: Had Mullvad VPN connected to Mexico while trying to create new cloudflared tunnels — tunnels couldn't establish
- Rule: Disconnect VPN before creating new cloudflared tunnels. Existing tunnels may also break when VPN connects.
API tokens must go in gateway config env.vars, not just .env files
- Date: 2026-02-11
- Mistake: Saved the Cloudflare token to `.env.local` but not to the gateway config. The gateway couldn't use it.
- Reality: The gateway reads env vars from `clawdbot.json` → `env.vars`. A `.env.local` file is for apps, not the gateway process.
- Rule: When Jake gives a new API token, save it via `gateway config.patch` to `env.vars` so the gateway has it. Also save it to `.env.local` for local app use.
NEVER save secrets/tokens in memory/*.md files
- Date: 2026-02-11
- Rule: Memory files are git-backed and could leak. Save tokens/keys to `.env.local` (which is in `.gitignore`). Reference them by name in memory, never by value.
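A pre-commit style check for the "by name, never by value" rule: load the values from `.env.local` and report any memory file that contains one verbatim. A sketch with hypothetical helpers; it prints secret names, not values, and skips values shorter than 8 characters (an arbitrary cutoff) to avoid false positives.

```python
from pathlib import Path


def load_env_values(env_file: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines (optionally prefixed with `export`)."""
    values = {}
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        value = value.strip().strip("'\"")
        if value:
            values[name] = value
    return values


def leaked_secrets(memory_dir: Path, env_file: Path, min_len: int = 8) -> list[tuple[str, str]]:
    """(memory file, secret NAME) pairs where a secret VALUE appears verbatim."""
    secrets = {n: v for n, v in load_env_values(env_file).items() if len(v) >= min_len}
    hits = []
    for md in sorted(memory_dir.glob("*.md")):
        text = md.read_text(encoding="utf-8", errors="replace")
        hits.extend((md.name, name) for name, value in secrets.items() if value in text)
    return hits
```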
Delete messages containing tokens IMMEDIATELY
- Date: 2026-02-11
- Rule: If Jake or anyone pastes a secret in Discord, delete the message FIRST, then save the token. Every second it sits in a channel is a risk.
Agent Coordination / Factory Builds
18. Parallel agents on shared filesystem = disaster
- Date: 2026-02-12
- Mistake: Spawned 5-10 sub-agents simultaneously, all writing to the same `mcpengine-repo/servers/` directory.
- What happened: Agents deleted each other's files, overwrote each other's work, and left half-built servers everywhere.
- Rule: For file-heavy work on a shared repo, go SEQUENTIAL (one agent at a time) or give each agent a SEPARATE directory, then merge. Never let multiple agents write to the same folder simultaneously.
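The "separate directory per agent, then merge" option can be sketched as below. Both helpers are hypothetical; the merge checks every path for conflicts before copying anything, so two agents that produced the same file stop the merge instead of silently overwriting each other.

```python
import shutil
from pathlib import Path


def agent_workdir(staging_root: Path, agent_id: str) -> Path:
    """Private output directory for one agent; fails if the id is already taken."""
    d = staging_root / agent_id
    d.mkdir(parents=True, exist_ok=False)
    return d


def merge_agent_outputs(staging_root: Path, target: Path) -> list[str]:
    """Copy every agent's files into `target`, refusing if two agents wrote the same path."""
    owners: dict[Path, str] = {}
    for agent_dir in sorted(p for p in staging_root.iterdir() if p.is_dir()):
        for f in agent_dir.rglob("*"):
            if f.is_file():
                rel = f.relative_to(agent_dir)
                if rel in owners:
                    raise RuntimeError(f"{rel} written by both {owners[rel]} and {agent_dir.name}")
                owners[rel] = agent_dir.name
    # Only copy once we know there are no conflicts.
    for rel, agent in owners.items():
        dest = target / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(staging_root / agent / rel, dest)
    return sorted(str(r) for r in owners)
```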
19. "Delete everything and rebuild" agents are time bombs
- Date: 2026-02-12
- Mistake: Gave rebuild agents instructions to "DELETE everything, build from scratch"
- What happened: Agent deletes all files in minute 1, then times out at minute 10 with only 30% rebuilt. Now the server is WORSE than before.
- Rule: NEVER tell agents to delete first. Say "build new files alongside existing ones" or "write to a temp directory, then swap." Always keep the old code until the new code is verified.
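"Write to a temp directory, then swap" as a sketch. `build_then_swap` is hypothetical; the build and verify steps are passed in as callables. The swap is two renames, so it is not atomic, but the old code survives as `<name>.old` and is restored if the second rename fails.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def build_then_swap(target: Path, build: Callable[[Path], None], verify: Callable[[Path], bool]) -> bool:
    """Build into a temp dir beside `target`; swap it in only if `verify` passes."""
    staging = Path(tempfile.mkdtemp(prefix=f"{target.name}.new-", dir=target.parent))
    try:
        build(staging)
        if not verify(staging):
            return False  # old code is untouched
        backup = target.with_name(f"{target.name}.old")
        if backup.exists():
            shutil.rmtree(backup)  # discard the backup from the previous swap
        if target.exists():
            target.rename(backup)
        try:
            staging.rename(target)
        except OSError:
            if backup.exists() and not target.exists():
                backup.rename(target)  # roll back to the old code
            raise
        return True
    finally:
        if staging.exists():
            shutil.rmtree(staging, ignore_errors=True)
```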
20. Factory monitor cron + manual spawns = competing agents
- Date: 2026-02-12
- Mistake: Had a cron job (every 10min) spawning fix agents for incomplete servers, PLUS I was manually spawning rebuild agents
- What happened: 3-4 agents fighting over the same server simultaneously, each deleting what the others wrote
- Rule: Before spawning fix agents, DISABLE any cron monitors that might also spawn agents for the same servers. One coordinator, one set of workers. No freelancers.
21. 10-minute timeout is too short for full MCP builds
- Date: 2026-02-12
- Mistake: Set 600s (10min) timeout for agents building entire MCP servers (tools + apps + types + server + README)
- What happened: Agents got 60-80% done then died. "No output" completions burning 60-70k tokens each.
- Rule: Full MCP server builds need 900s (15min). App-only or tool-only jobs can use 600s. Always set `runTimeoutSeconds` based on scope.
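Keeping the timeout choice in one table makes it hard to forget. The values come from the rule above; the scope labels are hypothetical names.

```python
# Timeouts from the rule above; the scope labels are hypothetical names.
RUN_TIMEOUT_SECONDS = {
    "full-server": 900,  # tools + apps + types + server + README
    "apps-only": 600,
    "tools-only": 600,
}


def run_timeout_seconds(scope: str) -> int:
    """Pick runTimeoutSeconds for a sub-agent from its scope; unknown scopes are an error."""
    try:
        return RUN_TIMEOUT_SECONDS[scope]
    except KeyError:
        raise ValueError(f"unknown scope {scope!r}; choose one of {sorted(RUN_TIMEOUT_SECONDS)}") from None
```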
22. Git checkout HEAD restores wiped files
- Date: 2026-02-12
- Mistake: Panicked when rebuild agents wiped committed files
- What saved us: `git checkout HEAD -- servers/{name}/` instantly restores all committed files.
- Rule: Always commit after each server completes. Then if a rogue agent wipes files, one git command fixes it. Commit early, commit often.
23. Single-purpose agents > multi-purpose agents
- Date: 2026-02-12
- Mistake: Gave agents broad tasks like "build the complete MCP server" (tools + apps + types + infra + README)
- What happened: They'd run out of tokens/time trying to do everything, often failing at the apps stage
- Rule: Split into focused agents: "build tools only", "build apps only", "fix TSC errors only". Smaller scope = higher success rate. Each agent should have ONE clear deliverable.
24. Always verify sub-agent output — "success" doesn't mean complete
- Date: 2026-02-12
- Mistake: Trusted agent completion messages like "50+ tools built!" without checking
- What happened: Agent claimed 50 tools but only wrote 2 files. The "findings" text was aspirational, not factual.
- Rule: After EVERY sub-agent completion, run a file count check: `find src/tools -name "*.ts" | wc -l`. Never trust the narrative. Trust the filesystem.
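The same check in Python, compared against the number the agent claimed. A sketch with hypothetical helper names; unlike `find -name`, it counts regular files only.

```python
from pathlib import Path


def count_files(root: Path, pattern: str = "*.ts") -> int:
    """Recursive file count, roughly `find <root> -name "*.ts" | wc -l` (files only)."""
    return sum(1 for p in root.rglob(pattern) if p.is_file())


def check_claim(root: Path, claimed: int, pattern: str = "*.ts") -> tuple[bool, int]:
    """Compare an agent's claimed file count with what is actually on disk."""
    actual = count_files(root, pattern)
    return actual >= claimed, actual
```

For the "50+ tools built!" case, `check_claim(Path("src/tools"), 50)` would have returned `(False, 2)`.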
25. Count apps correctly — multiple storage patterns exist
- Date: 2026-02-12
- Mistake: Kept miscounting apps because different servers store them differently
- What happened: Some use subdirectories, some use .tsx files, some use .ts files, some use .html files, some use src/apps/ instead of src/ui/react-app/
- Rule: Check ALL patterns: subdirs in react-app/, .tsx files, .ts files, .html files, AND src/apps/*.ts. Take the max. Use a consistent counting script.
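A consistent counting script along the lines of the rule. Assumptions: the `.tsx`, `.ts` and `.html` app files sit directly inside `src/ui/react-app/`, and the excluded directory names are illustrative guesses at non-app folders; `count_apps` is hypothetical.

```python
from pathlib import Path

# Illustrative guesses at react-app/ subdirectories that are not apps.
NOT_APPS = {"node_modules", "components", "shared"}


def count_apps(server: Path) -> int:
    """Count apps under every known storage pattern and return the largest count."""
    counts = [0]
    react = server / "src" / "ui" / "react-app"
    if react.is_dir():
        entries = list(react.iterdir())
        counts.append(sum(
            1 for p in entries
            if p.is_dir() and p.name not in NOT_APPS and not p.name.startswith(".")
        ))
        for ext in (".tsx", ".ts", ".html"):
            counts.append(sum(1 for p in entries if p.is_file() and p.suffix == ext))
    apps = server / "src" / "apps"
    if apps.is_dir():
        counts.append(sum(1 for p in apps.glob("*.ts") if p.is_file()))
    return max(counts)
```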
Last updated: 2026-02-12 22:20 EST
Total lessons: 25
17. Jake's Preferred Image Style
- Mistake: Used comic book/vibrant cartoon style when Jake asked for "the style I like"
- What happened: Jake corrected — his preferred style is chibi kawaii anime, NOT comic book
- Rule: Jake's go-to image style = chibi/kawaii anime (pastel colors, big eyes, oversized heads, tiny bodies, sparkles, hearts, stars). Same style as Buba's visual identity in IDENTITY.md. Always default to this unless he says otherwise.