Jake Shore 9a11210151 Pre-reboot backup: Feb 13 — SOLVR contract, Buba 3D dashboard, MCP V3 batch 1+2, artist research, daily log

2026-02-13 03:35:43 -05:00


Lessons Learned

Cloudflare / Tunnels / DNS (2026-02-12)

nohup your tunnels: cloudflared processes die when exec sessions close. Always use nohup cloudflared tunnel ... &
Verify before announcing: Always curl the tunnel URL and confirm 200 before posting to Discord. Got burned 3 times in a row.
Workers need DNS: Cloudflare Workers with routes need a proxied A record. Use 192.0.2.1 (an RFC 5737 documentation address) as the dummy IP.
http2 > quic: --protocol http2 works more reliably than the default quic for cloudflared tunnels.
CF Registrar is dashboard-only: there is no API for registering new domains, only for managing existing ones.
Wrangler OAuth vs API Token: The OAuth token (in wrangler config) and CLOUDFLARE_API_TOKEN have different scopes. Check both.
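Here is a minimal Python sketch of the "verify before announcing" rule, using only the standard library. `tunnel_is_live`, its retry defaults, and the injectable `fetch` parameter are illustrative assumptions; cloudflared provides none of them. The idea is to poll until the URL actually answers 200 instead of trusting that the tunnel came up.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET to url (4xx/5xx included)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; report the code instead of crashing.
        return err.code


def tunnel_is_live(url: str, attempts: int = 5, delay: float = 2.0,
                   fetch: Callable[[str], int] = fetch_status) -> bool:
    """Poll url until it answers 200; give up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # URLError, DNS failures and timeouts are all OSError subclasses.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Post the link only after `tunnel_is_live(url)` returns True. The retries cover the short window in which a fresh tunnel hostname may not resolve yet.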

Python / Veo (2026-02-12)

Unbuffered output: Use python3 -u for scripts running in the background; otherwise stdout is buffered and you see no output.
Veo download workaround: client.files.download() returns 404. Instead, grab the URI from video.video.uri and download it with ?key=API_KEY appended.
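Both Python fixes above fit in a few lines. This sketch assumes you already have the URI string from `video.video.uri`; it shows no Veo SDK calls. `with_api_key` is a hypothetical helper that keeps any query string the URI already carries instead of blindly appending `?key=`. The `reconfigure` call is a rough in-script stand-in for `python3 -u` when logs are line-oriented.

```python
import sys
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# In-script stand-in for `python3 -u`: flush stdout at every newline so a
# backgrounded script's log shows progress immediately.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(line_buffering=True)


def with_api_key(uri: str, api_key: str) -> str:
    """Return uri with key=<api_key> in its query string (hypothetical helper)."""
    parts = urlsplit(uri)
    # Keep any existing params, but drop a stale key= so it isn't duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "key"]
    query.append(("key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```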

Discord Etiquette (2026-02-12)

Don't spam debug messages: Do work silently, announce clean results. Jake had to tell me to delete 45 messages of debug spam.

Buba's Self-Learning Log

Every mistake is a lesson. Every lesson makes us mega beastly. This file is updated CONSTANTLY whenever I figure something out the hard way. Search this BEFORE attempting anything similar.

Gateway & Infrastructure

Gateway logs live at /tmp/clawdbot/ not ~/.clawdbot/logs/

Date: 2026-02-11
Mistake: Checked ~/.clawdbot/logs/ and said "nothing since Feb 5" — confused Jake
Reality: Gateway switched to /tmp/clawdbot/clawdbot-YYYY-MM-DD.log. The old logs dir is stale.
Rule: Always check /tmp/clawdbot/ for current gateway logs.
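This small sketch locates the current gateway log. It assumes the `clawdbot-YYYY-MM-DD.log` naming above is the only dated pattern in `/tmp/clawdbot/`. The helper names are hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import Optional

LOG_DIR = Path("/tmp/clawdbot")


def gateway_log_for(day: date, log_dir: Path = LOG_DIR) -> Path:
    """Expected gateway log path for a given day: clawdbot-YYYY-MM-DD.log."""
    return log_dir / f"clawdbot-{day.isoformat()}.log"


def latest_gateway_log(log_dir: Path = LOG_DIR) -> Optional[Path]:
    """Newest dated gateway log in log_dir, or None if there is none."""
    # ISO dates sort chronologically, so the last name is the newest day.
    logs = sorted(log_dir.glob("clawdbot-????-??-??.log"))
    return logs[-1] if logs else None
```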

tmux death kills the auto-restart loop

Date: 2026-02-11
Mistake: Assumed compaction caused the silence. Actually, the entire tmux session had died.
Reality: run-gateway.sh has a while true loop that only works if tmux survives. If tmux itself dies, no recovery.
Rule: When diagnosing downtime, check tmux list-sessions and session creation time with tmux display-message -t clawdbot -p '#{session_created}'. If the session is newer than expected, tmux died.
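Here is the tmux check above as a Python sketch. tmux prints `#{session_created}` as Unix epoch seconds. You supply `last_known_good` yourself: it is the last time you know the bot was responding. The helper names are hypothetical.

```python
import subprocess
from datetime import datetime, timezone


def parse_session_created(output: str) -> datetime:
    """tmux prints #{session_created} as Unix epoch seconds."""
    return datetime.fromtimestamp(int(output.strip()), tz=timezone.utc)


def tmux_session_created(session: str = "clawdbot") -> datetime:
    """Ask tmux when `session` was created; raises CalledProcessError if it doesn't exist."""
    out = subprocess.run(
        ["tmux", "display-message", "-t", session, "-p", "#{session_created}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_session_created(out)


def tmux_died_since(session_created: datetime, last_known_good: datetime) -> bool:
    """A session newer than the last known-good moment means tmux itself was restarted."""
    return session_created > last_known_good
```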

Gateway freeze vs crash — different diagnostics

Date: 2026-02-11
Mistake: Initially thought it was an event loop freeze (process alive but hung). It was actually a full crash.
Rule: Check the log timeline for gaps. If there's a gap AND the tmux session is freshly created, it was a crash. If the tmux session is old but logs have a gap, THEN it's a freeze.
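This sketch turns the freeze-vs-crash rule into a decision function. It rests on two assumptions: you have already parsed timestamps out of the gateway log, and "freshly created" means the tmux session started after the log went quiet. The 10-minute `min_gap` is an arbitrary threshold.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def largest_gap(times: List[datetime]) -> Optional[Tuple[datetime, datetime]]:
    """(start, end) of the longest silence between consecutive log timestamps."""
    ordered = sorted(times)
    if len(ordered) < 2:
        return None
    return max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])


def classify_downtime(log_times: List[datetime], session_created: datetime,
                      min_gap: timedelta = timedelta(minutes=10)) -> str:
    """Gap + tmux session created during it = crash; gap + older session = freeze."""
    gap = largest_gap(log_times)
    if gap is None or gap[1] - gap[0] < min_gap:
        return "no gap"
    gap_start, _ = gap
    return "crash" if session_created >= gap_start else "freeze"
```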

Discord API

channel-list needs guildId, not channel ID

Date: 2026-02-10
Mistake: Passed channel ID to channel-list, got "Unknown Guild"
Rule: Guild ID ≠ channel ID. Jake's main guild is 1458233582404501547. Channel IDs are different.

Guild ID reference

Main server: 1458233582404501547
Config has all guilds listed under channels.discord.guilds in clawdbot.json
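A guard like this one would have caught the "Unknown Guild" mistake before the call. ASSUMPTION: `channels.discord.guilds` in `clawdbot.json` is an object keyed by guild ID. This log doesn't document the real layout, so adjust the lookup if it is a list. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Set


def known_guild_ids(config_path: Path) -> Set[str]:
    """Guild IDs under channels.discord.guilds (ASSUMED to be an object keyed by ID)."""
    config = json.loads(config_path.read_text())
    guilds = config.get("channels", {}).get("discord", {}).get("guilds", {})
    return {str(gid) for gid in guilds}


def require_guild_id(candidate: str, guild_ids: Set[str]) -> str:
    """Fail fast before channel-list if the ID isn't a known guild (likely a channel ID)."""
    if candidate not in guild_ids:
        raise ValueError(f"{candidate} is not a known guild ID; channel ID passed by mistake?")
    return candidate
```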

Deleting messages needs the channel as target

Date: 2026-02-10
Rule: message delete needs target set to the channel ID where the message lives.

Cron Jobs

Cron job parameter format

Date: 2026-02-10
Mistake: Tried multiple wrong formats before getting it right
Correct format:

```
{
  "name": "job-name",
  "schedule": {"kind": "cron", "expr": "0 9 * * 1,4"},
  "sessionTarget": "main",
  "payload": {"kind": "systemEvent", "text": "..."},
  "enabled": true
}
```

Rule: schedule needs kind + expr. Payload needs kind: "systemEvent" + text. NOT label, NOT message.
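Here is the rule above as a hypothetical pre-flight check in Python. It enforces only what this lesson states: schedule `kind` + `expr`, payload `kind: "systemEvent"` + `text`, and no `label`/`message`. It is not the gateway's own schema validation.

```python
from typing import Any, Dict, List


def cron_job_problems(job: Dict[str, Any]) -> List[str]:
    """Check a cron job definition against the format rule; [] means it looks right."""
    problems = []
    schedule = job.get("schedule", {})
    for key in ("kind", "expr"):
        if not schedule.get(key):
            problems.append(f"schedule is missing {key}")
    payload = job.get("payload", {})
    if payload.get("kind") != "systemEvent":
        problems.append('payload.kind must be "systemEvent"')
    if not payload.get("text"):
        problems.append("payload is missing text")
    for wrong in ("label", "message"):
        if wrong in payload:
            problems.append(f'payload uses "{wrong}"; the field is "text"')
    return problems
```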

File Operations

Edit tool requires EXACT text match

Date: 2026-02-11 (CREdispo sub-agent)
Mistake: Multiple edit failures on CREdispo files because oldText didn't match exactly
Rule: Always read the file first to get exact text before editing. Never guess at whitespace or content.
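The Edit tool's internals aren't described here, so this is a hypothetical Python stand-in that mirrors the rule. It reads the file, requires the old text to match exactly once, and only then writes.

```python
from pathlib import Path


def exact_replace(path: Path, old_text: str, new_text: str) -> None:
    """Replace old_text only if it occurs exactly once in the file's current contents."""
    if not old_text:
        raise ValueError("old_text must not be empty")
    content = path.read_text()  # read first: never guess at whitespace
    matches = content.count(old_text)
    if matches != 1:
        raise ValueError(f"expected exactly 1 match for old_text, found {matches}")
    path.write_text(content.replace(old_text, new_text, 1))
```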

iMessage / BlueBubbles

Sending images to group chats via AppleScript is unreliable

Date: 2026-02-10
Mistake: Tried to send images to iMessage group chats via AppleScript — text sends but images may not deliver
Rule: For image delivery to group chats, use BlueBubbles API directly or have Jake send manually from Discord.

Group chat ID format

Date: 2026-02-10
Rule: iMessage group chat IDs look like chat358249523368699090. The send format is any;+;chat358249523368699090.
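Converting a group chat ID to the send format is a one-liner. Validating the input also catches the case where the prefix was already added. `group_send_target` is a hypothetical helper, and the `chat` + digits shape is inferred from the single example above.

```python
import re

GROUP_CHAT_ID = re.compile(r"chat\d+")


def group_send_target(chat_id: str) -> str:
    """chat358249523368699090 -> any;+;chat358249523368699090 (hypothetical helper)."""
    if not GROUP_CHAT_ID.fullmatch(chat_id):
        raise ValueError(f"not a bare group chat ID: {chat_id!r}")
    return f"any;+;{chat_id}"
```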

Context & Memory

ALWAYS save state to memory before heavy work

Date: 2026-02-11
Mistake: Was deep in CREdispo work, context got compacted, lost all working state
Rule: Before starting any multi-step project, write current state to memory/YYYY-MM-DD.md. Update it at milestones. This survives compaction.
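Here is a minimal sketch of the save-state habit. It assumes plain Markdown files under `memory/` and that appending, rather than overwriting, is acceptable. `save_state` and its heading format are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def save_state(memory_dir: Path, project: str, state: str, now: datetime) -> Path:
    """Append a timestamped snapshot to memory/YYYY-MM-DD.md and return its path."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{now:%Y-%m-%d}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(f"\n## {project} ({now:%H:%M})\n{state.strip()}\n")
    return path
```

Call it once before the first step and again at each milestone, so a compaction loses at most one milestone's worth of state.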

Compaction ≠ crash — don't confuse them

Date: 2026-02-11
Mistake: Told Jake compaction caused the silence when it was actually a gateway crash
Rule: Compaction just compresses context. It doesn't stop me from responding. If I went silent, something else happened.

Image Generation

Nano Banana Pro needs specific iterative prompting for character accuracy

Date: 2026-02-10
Mistake: Took 4 iterations to get Caleb's appearance right (white hair → brown, no beard → beard, etc.)
Rule: When generating character images, be VERY specific about hair color, facial hair, build, and clothing in the first prompt. Don't assume defaults.
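One way to enforce "be VERY specific in the first prompt" is to refuse to build a prompt until every trait this lesson names is filled in. This hypothetical helper only assembles plain text; it makes no Nano Banana Pro API calls.

```python
from typing import Dict

# Traits the lesson says to spell out up front instead of letting the model pick defaults.
REQUIRED_TRAITS = ("hair_color", "facial_hair", "build", "clothing")


def character_prompt(subject: str, traits: Dict[str, str]) -> str:
    """Assemble a first-pass character prompt; refuse if any required trait is missing."""
    missing = [t for t in REQUIRED_TRAITS if not traits.get(t, "").strip()]
    if missing:
        raise ValueError(f"specify before generating: {', '.join(missing)}")
    details = "; ".join(f"{t.replace('_', ' ')}: {traits[t]}" for t in REQUIRED_TRAITS)
    return f"{subject}. {details}."
```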

Sub-agents

Sub-agent results arrive as system messages after compaction

Date: 2026-02-11
Mistake: Didn't realize the CREdispo postgres migration had completed because context was compacted
Rule: After spawning a sub-agent for heavy work, the result comes back as a user message. If context compacts before I process it, I need to check sessions_list for completed sub-agents.

Security

Cloudflare quick tunnels break HTML form POST (405 Method Not Allowed)

Date: 2026-02-11
Mistake: Signup/login forms used native HTML <form method="POST"> which returns 405 through cloudflared quick tunnels
Reality: Cloudflare quick tunnels can mangle POST form submissions. JSON API calls via fetch() work fine.
Rule: When serving apps through cloudflared tunnels, use JavaScript fetch() for form submissions instead of native HTML form POSTs. Keep the old form routes for direct access but add /api/ JSON endpoints.

VPN breaks Cloudflare tunnels

Date: 2026-02-11
Mistake: Had Mullvad VPN connected to Mexico while trying to create new cloudflared tunnels — tunnels couldn't establish
Rule: Disconnect VPN before creating new cloudflared tunnels. Existing tunnels may also break when VPN connects.

API tokens must go in gateway config env.vars, not just .env files

Date: 2026-02-11
Mistake: Saved Cloudflare token to .env.local but not to gateway config. Gateway couldn't use it.
Reality: The gateway reads env vars from clawdbot.json → env.vars. A .env.local file is for apps, not the gateway process.
Rule: When Jake gives a new API token, save it via gateway config.patch to env.vars so the gateway has it. Also save to .env.local for local app use.

NEVER save secrets/tokens in memory/*.md files

Date: 2026-02-11
Rule: Memory files are git-backed and could leak. Save tokens/keys to .env.local (which is in .gitignore). Reference them by name in memory, never by value.
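This is a rough pre-commit-style scan for secret values that slipped into `memory/*.md`. The regexes are illustrative assumptions, not a complete secret detector: an `UPPER_NAME=value` assignment, `sk-` style keys, and the `AIza...` shape of Google API keys. Referencing a token by name passes; pasting its value does not.

```python
import re
from pathlib import Path
from typing import List, Tuple

# ASSUMPTION: illustrative patterns only; extend for the providers you actually use.
SECRET_PATTERNS = [
    re.compile(r"\b[A-Z0-9_]*(?:TOKEN|SECRET|API_KEY|PASSWORD)[A-Z0-9_]*\s*[=:]\s*\S{12,}"),
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    re.compile(r"\bAIza[0-9A-Za-z_-]{35}"),
]


def find_secrets(memory_dir: Path) -> List[Tuple[str, int]]:
    """(filename, line number) for every memory/*.md line that looks like a secret value."""
    hits = []
    for md in sorted(memory_dir.glob("*.md")):
        for lineno, line in enumerate(md.read_text(encoding="utf-8").splitlines(), start=1):
            if any(p.search(line) for p in SECRET_PATTERNS):
                hits.append((md.name, lineno))
    return hits
```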

Delete messages containing tokens IMMEDIATELY

Date: 2026-02-11
Rule: If Jake or anyone pastes a secret in Discord, delete the message FIRST, then save the token. Every second it sits in a channel is a risk.

Agent Coordination / Factory Builds

18. Parallel agents on shared filesystem = disaster

Date: 2026-02-12
Mistake: Spawned 5-10 sub-agents simultaneously, all writing to the same mcpengine-repo/servers/ directory
What happened: Agents deleted each other's files, overwrote each other's work, and left half-built servers everywhere
Rule: For file-heavy work on a shared repo, go SEQUENTIAL (one agent at a time) or give each agent a SEPARATE directory, then merge. Never let multiple agents write to the same folder simultaneously.
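Here is a sketch of the "separate directory, then merge" pattern. It assumes each agent writes only inside its own staging directory, whose layout mirrors the final `servers/` tree. The merge refuses to overwrite anything, whether another agent's file or one already in the target, so a collision surfaces as an error instead of silently clobbering work. The function name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import Dict


def merge_agent_outputs(agent_dirs: Dict[str, Path], target: Path) -> None:
    """Copy each agent's private output dir into target; refuse on any collision."""
    # Pass 1: detect collisions before touching target at all.
    claimed: Dict[Path, str] = {}
    for agent, src in agent_dirs.items():
        for f in src.rglob("*"):
            if f.is_file():
                rel = f.relative_to(src)
                if rel in claimed or (target / rel).exists():
                    owner = claimed.get(rel, "an existing file")
                    raise FileExistsError(f"{rel}: {agent} collides with {owner}")
                claimed[rel] = agent
    # Pass 2: nothing collides, so copy everything.
    for agent, src in agent_dirs.items():
        for f in src.rglob("*"):
            if f.is_file():
                dest = target / f.relative_to(src)
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(f, dest)
```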

19. "Delete everything and rebuild" agents are time bombs

Date: 2026-02-12
Mistake: Gave rebuild agents instructions to "DELETE everything, build from scratch"
What happened: Agent deletes all files in minute 1, then times out at minute 10 with only 30% rebuilt. Now the server is WORSE than before.
Rule: NEVER tell agents to delete first. Say "build new files alongside existing ones" or "write to a temp directory, then swap." Always keep the old code until the new code is verified.
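Here is the "write to a temp directory, then swap" instruction as a Python sketch. It assumes staging and target sit on the same filesystem, so `Path.rename` is a cheap move. `verify` is whatever check you trust, such as a file count or a clean TSC run. The old tree is kept as a `.bak` sibling rather than deleted. The function name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable


def swap_in_rebuild(target: Path, staging: Path, verify: Callable[[Path], bool]) -> None:
    """Replace target with staging only after verify(staging) passes; keep the old tree."""
    if not verify(staging):
        raise RuntimeError(f"{staging} failed verification; {target} left untouched")
    backup = target.with_name(target.name + ".bak")
    if backup.exists():
        shutil.rmtree(backup)  # drop only the previous backup, never the live code
    if target.exists():
        target.rename(backup)  # old code survives as alpha.bak, etc.
    staging.rename(target)
```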

20. Factory monitor cron + manual spawns = competing agents

Date: 2026-02-12
Mistake: Had a cron job (every 10min) spawning fix agents for incomplete servers, PLUS I was manually spawning rebuild agents
What happened: 3-4 agents fighting over the same server simultaneously, each deleting what the others wrote
Rule: Before spawning fix agents, DISABLE any cron monitors that might also spawn agents for the same servers. One coordinator, one set of workers. No freelancers.

21. 10-minute timeout is too short for full MCP builds

Date: 2026-02-12
Mistake: Set 600s (10min) timeout for agents building entire MCP servers (tools + apps + types + server + README)
What happened: Agents got 60-80% done then died. "No output" completions burning 60-70k tokens each.
Rule: Full MCP server builds need 900s (15min). App-only or tool-only jobs can use 600s. Always set runTimeoutSeconds based on scope.

22. Git checkout HEAD restores wiped files

Date: 2026-02-12
Mistake: Panicked when rebuild agents wiped committed files
What saved us: git checkout HEAD -- servers/{name}/ instantly restores all committed files
Rule: Always commit after each server completes. Then if a rogue agent wipes files, one git command fixes it. Commit early, commit often.

23. Single-purpose agents > multi-purpose agents

Date: 2026-02-12
Mistake: Gave agents broad tasks like "build the complete MCP server" (tools + apps + types + infra + README)
What happened: They'd run out of tokens/time trying to do everything, often failing at the apps stage
Rule: Split into focused agents: "build tools only", "build apps only", "fix TSC errors only". Smaller scope = higher success rate. Each agent should have ONE clear deliverable.

24. Always verify sub-agent output — "success" doesn't mean complete

Date: 2026-02-12
Mistake: Trusted agent completion messages like "50+ tools built!" without checking
What happened: Agent claimed 50 tools but only wrote 2 files. The "findings" text was aspirational, not factual.
Rule: After EVERY sub-agent completion, run a file count check: find src/tools -name "*.ts" | wc -l. Never trust the narrative. Trust the filesystem.
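Here is a Python version of the same filesystem check, for use inside a larger verification script. File count is only a proxy for tool count, since one file may define several tools. Treat a shortfall as a red flag rather than an exact audit. The names are hypothetical.

```python
from pathlib import Path


def count_tool_files(server_dir: Path) -> int:
    """Python equivalent of: find src/tools -name "*.ts" | wc -l"""
    tools_dir = server_dir / "src" / "tools"
    if not tools_dir.is_dir():
        return 0
    return sum(1 for p in tools_dir.rglob("*.ts") if p.is_file())


def check_claim(server_dir: Path, claimed: int) -> int:
    """Trust the filesystem, not the narrative: fail if fewer files exist than claimed."""
    actual = count_tool_files(server_dir)
    if actual < claimed:
        raise ValueError(f"agent claimed {claimed}, filesystem has {actual} tool files")
    return actual
```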

25. Count apps correctly — multiple storage patterns exist

Date: 2026-02-12
Mistake: Kept miscounting apps because different servers store them differently
What happened: Some use subdirectories, some use .tsx files, some use .ts files, some use .html files, some use src/apps/ instead of src/ui/react-app/
Rule: Check ALL patterns: subdirs in react-app/, .tsx files, .ts files, .html files, AND src/apps/*.ts. Take the max. Use a consistent counting script.
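Here is a sketch of a consistent counting script. ASSUMPTION: the `.tsx`, `.ts`, and `.html` app files live directly in `src/ui/react-app/`. The lesson lists the patterns but not exactly where each one sits. The function name is hypothetical.

```python
from pathlib import Path


def count_apps(server_dir: Path) -> int:
    """Count apps under every storage pattern and take the max."""
    react_app = server_dir / "src" / "ui" / "react-app"
    apps_dir = server_dir / "src" / "apps"

    def files(d: Path, pattern: str) -> int:
        return sum(1 for p in d.glob(pattern) if p.is_file()) if d.is_dir() else 0

    counts = [
        sum(1 for p in react_app.iterdir() if p.is_dir()) if react_app.is_dir() else 0,
        files(react_app, "*.tsx"),
        files(react_app, "*.ts"),
        files(react_app, "*.html"),
        files(apps_dir, "*.ts"),  # servers using src/apps/ instead of src/ui/react-app/
    ]
    return max(counts)
```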

MCP Factory Quality Standards (2026-02-13)

26. ALWAYS start from the actual API spec — never hand-pick tools from vibes

Date: 2026-02-13
Mistake: For the 30 SMB MCP servers, I read API docs casually and hand-picked 7-8 "obvious" tools per server
What happened: Ended up with surface-level CRUD (list/get/create/update) covering maybe 10-15% of each API, missing the tools people actually need
Rule: ALWAYS pull the official OpenAPI/Swagger spec (or systematically crawl every endpoint). Build a complete endpoint inventory BEFORE deciding what becomes a tool. If Mailchimp has 127 endpoints, I need to know all 127 before picking which 50 become tools.
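Once the spec is loaded (for example with `json.load`, or a YAML parser for YAML specs), building the endpoint inventory is mechanical. This sketch walks `paths` in an OpenAPI 3 or Swagger 2 document. It keeps only HTTP-method keys and skips path-level fields such as `parameters`.

```python
from typing import Any, Dict, List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def endpoint_inventory(spec: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Every (METHOD, path, operationId) in an OpenAPI 3 / Swagger 2 document."""
    inventory = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method.lower() in HTTP_METHODS:  # skip "parameters", "summary", x-* keys
                inventory.append((method.upper(), path, op.get("operationId", "")))
    return sorted(inventory, key=lambda e: (e[1], e[0]))
```

`len(endpoint_inventory(spec))` is the "Total API endpoints" figure, and the list itself is what you tier before deciding which entries become tools.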

27. Prioritize tools by real user workflows, not alphabetical CRUD

Date: 2026-02-13
Mistake: Mechanically created list_X / get_X / create_X / update_X for each resource — zero workflow awareness
What happened: A CRM MCP that can list_leads but can't log_a_call or add_note_to_lead — the things salespeople do 50x/day
Rule: Research the platform's top use cases. Map workflow chains (create contact → add to list → send campaign → check results). Tier the tools:
- Tier 1 (daily): 10-15 things every user does daily
- Tier 2 (power user): 15-30 things power users need
- Tier 3 (complete): Everything else for full API coverage
- Ship Tier 1+2 minimum. Tier 3 = "best on market" differentiator.

28. Rich tool descriptions are NOT optional — they drive agent behavior

Date: 2026-02-13
Mistake: Wrote basic descriptions like "Lists contacts" with minimal parameter docs
What happened: AI agents make tool selection decisions based on descriptions. Vague = wrong tool chosen = bad UX
Rule: Every tool description must tell an AI agent WHEN to use it:
- BAD: "Lists contacts"
- GOOD: "Lists contacts with optional filtering by email, name, tag, or date range. Use when the user wants to find, search, or browse their contact database. Returns paginated results up to 100 per page."
- Every param needs: description, type+format constraints, defaults, required/optional, example values
- _meta labels from day one: category, access (read/write/destructive), complexity, rateLimit
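Here is a sketch of a lint pass over tool definitions. It assumes an MCP-style shape: `description`, a JSON Schema `inputSchema`, and a `_meta` object. It checks only a subset of the rule: description length and a "Use when" clause, per-param `description` and `examples`, and the four `_meta` labels. The 15-word minimum is an arbitrary threshold.

```python
from typing import Any, Dict, List

REQUIRED_META = ("category", "access", "complexity", "rateLimit")
ACCESS_LEVELS = {"read", "write", "destructive"}
MIN_DESCRIPTION_WORDS = 15  # arbitrary: "Lists contacts" (2 words) fails


def lint_tool(tool: Dict[str, Any]) -> List[str]:
    """Return problems with a tool definition; an empty list means it passes."""
    problems = []
    desc = tool.get("description", "")
    if len(desc.split()) < MIN_DESCRIPTION_WORDS:
        problems.append("description too short to tell an agent when to use it")
    if "use when" not in desc.lower():
        problems.append('description should say when to use the tool ("Use when ...")')
    for name, schema in tool.get("inputSchema", {}).get("properties", {}).items():
        if not schema.get("description"):
            problems.append(f"param {name}: missing description")
        if "examples" not in schema:
            problems.append(f"param {name}: missing example values")
    meta = tool.get("_meta", {})
    for key in REQUIRED_META:
        if key not in meta:
            problems.append(f"_meta missing {key}")
    if "access" in meta and meta["access"] not in ACCESS_LEVELS:
        problems.append(f"_meta.access must be one of {sorted(ACCESS_LEVELS)}")
    return problems
```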

29. Maintain a coverage manifest for every MCP server

Date: 2026-02-13
Mistake: No tracking of which endpoints were covered vs skipped. No way to measure quality.
Rule: Every server gets a coverage manifest in its README:
```
Total API endpoints: 127
Tools implemented: 45
Intentionally skipped: 12 (deprecated/admin-only)
Not yet covered: 70 (backlog)
Coverage: 35% → target 80%+
```
Every skipped endpoint needs a REASON (deprecated, admin-only, OAuth-only, redundant). Set 80%+ as "production quality" threshold.
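You can generate the manifest instead of hand-writing it, which also enforces the "every skip needs a reason" rule. In this sketch, `skipped` maps each skipped endpoint to its reason. Coverage is computed against the full endpoint total, matching the 45/127 ≈ 35% in the example above. The function name is hypothetical.

```python
from typing import Dict


def coverage_manifest(total: int, implemented: int, skipped: Dict[str, str],
                      target_pct: int = 80) -> str:
    """Render the README coverage block; skipped maps endpoint -> reason."""
    if total <= 0:
        raise ValueError("total endpoints must be positive")
    unexplained = sorted(ep for ep, reason in skipped.items() if not reason.strip())
    if unexplained:
        raise ValueError(f"skipped endpoints need a reason: {unexplained}")
    backlog = total - implemented - len(skipped)
    if backlog < 0:
        raise ValueError("implemented + skipped is larger than the endpoint total")
    reasons = "/".join(sorted(set(skipped.values())))
    pct = round(100 * implemented / total)
    return "\n".join([
        f"Total API endpoints: {total}",
        f"Tools implemented: {implemented}",
        f"Intentionally skipped: {len(skipped)}" + (f" ({reasons})" if reasons else ""),
        f"Not yet covered: {backlog} (backlog)",
        f"Coverage: {pct}% → target {target_pct}%+",
    ])
```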

30. 7-8 tools per server is a demo, not a product

Date: 2026-02-13
Mistake: Treated 7-8 tools as "enough" for the initial 30 servers
What it actually is: A toy. Nobody can do their real job with 7 tools for a platform that has 100+ API endpoints.
Rule: Minimum viable tool count depends on API size:
- Small API (<30 endpoints): 15-20 tools
- Medium API (30-100 endpoints): 30-50 tools
- Large API (100+ endpoints): 50-80+ tools
- If customers install it and can't do their #1 use case, it's not a product.
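Here are the thresholds above as a lookup. This sketch returns the lower bound of each range. The source ranges overlap at exactly 100 endpoints ("30-100" and "100+"), and this version treats 100 as large, which is a judgment call.

```python
def min_tool_count(endpoints: int) -> int:
    """Lower bound of the recommended tool range for an API with this many endpoints."""
    if endpoints < 30:
        return 15  # small: 15-20 tools
    if endpoints < 100:
        return 30  # medium: 30-50 tools
    return 50      # large: 50-80+ tools
```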

31. Consistent naming conventions across ALL servers — no exceptions

Date: 2026-02-13
Rule: Factory-wide naming standard:
- list_* for paginated collections
- get_* for single resource by ID
- create_*, update_*, delete_* for mutations
- search_* for query-based lookups
- Domain verbs: send_email, cancel_event, archive_card, assign_task
- NEVER mix fetch_* / get_* / retrieve_* — pick ONE
- All snake_case, all lowercase
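Here is a sketch of a factory-wide name check. It enforces lowercase snake_case. Because the standard picks `get_*` for single-resource reads, it also flags `fetch_*` and `retrieve_*`. It cannot judge whether a domain verb is the right one.

```python
import re
from typing import List

SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")
# The standard uses get_* for single-resource reads, so these synonyms are rejected.
SYNONYM_PREFIXES = ("fetch_", "retrieve_")


def naming_problems(names: List[str]) -> List[str]:
    """Flag tool names that break the factory-wide naming standard."""
    problems = []
    for name in names:
        if not SNAKE_CASE.fullmatch(name):
            problems.append(f"{name}: not lowercase snake_case")
        elif name.startswith(SYNONYM_PREFIXES):
            problems.append(f"{name}: use get_* instead of {name.split('_')[0]}_*")
    return problems
```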

32. Handle pagination and rate limits properly in every server

Date: 2026-02-13
Rule: Every list_* tool must:
- Support cursor/page tokens
- Use reasonable default page sizes (25-100, never "all")
- Return has_more / next_page indicators
- Handle API rate limits (429) with retry + exponential backoff
- Document known rate limits in tool _meta
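Here is a sketch of the pagination and rate-limit rules. It assumes a hypothetical client that raises `RateLimited` on HTTP 429, optionally carrying the server's `Retry-After` seconds. It also assumes a `fetch_page(cursor, page_size)` callable that returns `{"items": [...], "next_cursor": ...}`. Neither shape comes from a real SDK.

```python
import random
import time
from typing import Any, Callable, Dict, Optional


class RateLimited(Exception):
    """Raised by the (hypothetical) API client when the server answers HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("HTTP 429 Too Many Requests")
        self.retry_after = retry_after


def call_with_backoff(fn: Callable[[], Any], max_retries: int = 5, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fn, retrying on 429 with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_retries:
                raise
            if err.retry_after is not None:
                delay = err.retry_after  # the server said how long to wait
            else:
                delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)


def list_page(fetch_page: Callable[[Optional[str], int], Dict[str, Any]],
              cursor: Optional[str] = None, page_size: int = 50) -> Dict[str, Any]:
    """One list_* call: bounded page size, with has_more/next_cursor returned to the caller."""
    page_size = max(1, min(page_size, 100))  # never "all"
    resp = call_with_backoff(lambda: fetch_page(cursor, page_size))
    next_cursor = resp.get("next_cursor")
    return {"items": resp["items"], "has_more": next_cursor is not None,
            "next_cursor": next_cursor}
```

The tool returns `next_cursor` to the agent instead of looping until the collection is exhausted. That is what keeps a `list_*` call from ever meaning "all".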

Last updated: 2026-02-13 02:46 EST
Total lessons: 32

17. Jake's Preferred Image Style

Mistake: Used comic book/vibrant cartoon style when Jake asked for "the style I like"
What happened: Jake corrected — his preferred style is chibi kawaii anime, NOT comic book
Rule: Jake's go-to image style = chibi/kawaii anime (pastel colors, big eyes, oversized heads, tiny bodies, sparkles, hearts, stars). Same style as Buba's visual identity in IDENTITY.md. Always default to this unless he says otherwise.
