diff --git a/HEARTBEAT.md b/HEARTBEAT.md
index 91ab611..7892cfd 100644
--- a/HEARTBEAT.md
+++ b/HEARTBEAT.md
@@ -1,50 +1,64 @@
 # HEARTBEAT.md — Active Task State

 ## Current Task
-- **Project:** Multi-project day — CREdispo, OSKV, MCP Pipeline, FB Ads App, Memory System
-- **Last completed:** Memory system upgrade (lessons-learned + working-state), coaching Day 6 escalation, FB Ads quiz app
-- **Next step:** Await Jake's coaching decision, dec-003/dec-004 approval, CREdispo domain purchase
-- **Blockers:** Expired Anthropic API key, BlueBubbles DOWN, Oliver/Kevin total silence (6 days), dec-003 stale (3+ days)
+- **Project:** MCP Factory V2 — massive 30-server build push + community site builds
+- **Last completed:** Batches 1-4 (20 servers), sub-agents spawned for batch 5+6 fixes, CannaBri site, TheNicheQuiz.com deployment, Veo 3.1 video gen, OpenClaw contract update
+- **Next step:** Verify batch 5+6 sub-agent results, await dec-004, await Jake's coaching decision
+- **Blockers:** dec-004 pending (29h+), Opus rate limits hit, BlueBubbles still DOWN, coaching paused

 ## Active Projects

-### Content Coaching — Oliver & Kevin (ESCALATED)
-- **Channel:** Discord OSKV #general (1468856284634943489) + iMessage
-- **Status:** Day 6 complete — STILL zero posts, ZERO RESPONSES across 6 full days
-- **Oliver:** @quowavy on IG, +19175028872 — total ghost all 6 days
-- **Kevin:** @kevinthevp on IG, +19179929834 — total ghost all 6 days
-- **Escalation:** Sent to Jake in OSKV #general with 4 options (Jake talks to them, pause, change format, other)
-- **Waiting on:** Jake's decision on whether to continue, change approach, or pause
+### MCP Factory V2 (MASSIVE PUSH — 20/30 DONE)
+- **Location:** `mcp-command-center/` + `mcpengine-repo/`
+- **Status:** Batches 1-4 complete (20 servers). Batch 5+6 have fixer sub-agents running.
+- **Sub-agents spawned (may be complete):**
+  - `rebuild-fieldedge-final` — full rebuild from empty src
+  - `fix-lightspeed-tools` — add 50+ tools
+  - `fix-toast-final` — fix TSC errors in tables.ts
+  - `fix-touchbistro-final` — add missing tool modules
+  - `fix-batch6-wave-close-freshdesk` — TSC + apps for 3 servers
+  - `add-apps-brevo-helpscout` — add 15+ React apps each
+- **Pipeline state:**
+  - Stage 19 (Registry Listed): 6 — awaiting dec-004
+  - Stage 9 (Integration Complete): 2 — Meta Ads, Twilio (need API keys)
+  - Stage 7 (UI Apps Built): 1 — Google Console (design approval)
+  - Stage 6 (Core Tools Built): 23 — need API key signups
+  - 1 KILLED (HR People Ops)
+- **Decisions pending:**
+  - dec-004: Registry listing for 6 MCPs (29h+ pending, no reaction)
+- **Dashboard:** `http://192.168.0.25:8888`
+
+### TheNicheQuiz.com (DEPLOYED & LIVE)
+- **Domain:** thenichequiz.com (Cloudflare Registrar, $9.15/yr)
+- **Stack:** Flask + PostgreSQL 17 + Cloudflare Worker reverse proxy
+- **Features:** Premium landing, auth, 3-step niche quiz, 10 parallel campaigns via Gemini, AI image gen, FB ad preview, CSV export
+
+### CannaBri Site (BUILT — DEPLOYED TO GITHUB PAGES)
+- **URL:** busybee3333.github.io/cannabriny-site
+- **Client:** fatgordo's cannabis contact (CannaBri Processing NY)
+- **Design:** Dark theme, emerald+gold, glassmorphism, particle hero, 75KB single HTML
+- **Note:** fatgordo wants original site imagery/videos — not yet approved by Jake
+
+### Content Coaching — Oliver & Kevin (PAUSED)
+- **Channel:** Discord OSKV #general + iMessage
+- **Status:** Day 7 — PAUSED pending Jake's decision
+- **7 days of total silence** from both Oliver and Kevin
+- **Escalation sent to Jake** with 4 options — awaiting response

 ### CREdispo Web App (MVP COMPLETE — NEEDS DOMAIN)
 - **Location:** `credispo/`
-- **Requested by:** Henry Eisenstein (Discord 1468417808323838033)
-- **Approved by:** Jake on 2026-02-11
-- **Stack:** Next.js 14 + PostgreSQL 16 + Tailwind + shadcn/ui
-- **Status:** MVP complete — Postgres migration done, 16 API endpoints, 3 demo accounts, `npm run build` clean
-- **Demo shared** via Cloudflare tunnel in #general
-- **Next:** Jake needs to purchase domain via Cloudflare (API doesn't support registration, only management)
-- **Henry's access level:** Full tool access for this project (Jake approved)
+- **Status:** MVP complete, demo shared via tunnel
+- **Next:** Jake needs to purchase domain via Cloudflare dashboard

-### MCP Pipeline Factory (HOLDING PATTERN)
-- **Location:** `mcp-command-center/`
-- **Status:** All autonomous advances exhausted. Everything gated on human actions.
-- **State:**
-  - Stage 19 (Registry Listed): 6 — GHL, CloseBot, Brevo, Close, FreshDesk, HelpScout (awaiting dec-004)
-  - Stage 8 (Integration Complete): 2 — Meta Ads, Twilio (need API keys)
-  - Stage 7 (UI Apps Built): 1 — Google Console (design approval)
-  - Stage 6 (Core Tools Built): 21 — need API key signups
-  - Stage 3 (API Research): 3 — Compliance GRC, Product Analytics, HR People Ops (awaiting dec-003, 3+ days old)
-- **Decisions pending:**
-  - dec-003: Architecture approval for 3 MCPs (rec: approve 2, kill HR People Ops)
-  - dec-004: Registry listing for 6 MCPs
-- **Dashboard:** `http://192.168.0.25:8888`
+### OpenClaw Upwork Service (CONTRACT ACTIVE)
+- **Location:** `openclaw-gallery/`
+- First $20k deal closed + $2k/mo retainer
+- Updated SOW with LLC name today

-### FB Ads App (COMPLETE)
-- **Location:** `fb-ads-app/`
-- **Port:** 8877
-- **Features:** Three-step niche quiz, 10 parallel campaign generation via Gemini, AI image gen via Nano Banana Pro, Facebook-style gallery preview with inline editing, batch CSV export
-- **Built for:** Advertising Report Card
+### Burton Method Research Intel
+- **Location:** `memory/burton-method-research-intel.md`
+- **Urgent:** 13 days to score release (Feb 25), retake campaign window closing
+- Competitor scan #6 posted (LSAC remote ban breaking news)

 ### MCPEngine Studio (DESIGNED — NOT STARTED)
 - Full architecture delivered to #mcp-strategy
@@ -52,52 +66,36 @@

 ### Pentests (COMPLETE — Feb 7-8)
 - **Reports:** `pentest-superfunnels/`, `pentest-realwave/`, `pentest-closebot/`
-- **TODO:** Consolidated CORS fix plan
-
-### OpenClaw Upwork Service Launch (PENDING REVIEW)
-- **Location:** `openclaw-gallery/`
-- First $20k deal closed + $2k/mo retainer
-
-### Burton Method Research Intel
-- **Location:** `memory/burton-method-research-intel.md`
-- **Urgent:** Retake campaign content needed by Feb 24 (scores release Feb 25)
-
-### Mixed-Use Entertainment Intel
-- **Location:** `memory/mixed-use-entertainment-intel.md`
-
-### Reonomy Scraper v14 (NEEDS COMPLETION)
-- **Location:** `reonomy-scraper-v14.js`, `reonomy-run-v14.sh`, `reonomy-to-csv.js`
-- **Henry's pending request:** 20 NJ Industrial 50k+ SF properties not sold in 10 years
-- **Key:** Must use Saved Searches for cross-session reliability

 ## Known Issues
 - **BlueBubbles DOWN** — can't receive iMessages
 - **Expired Anthropic API key** — blocks MCP build page + LocalBosses
-- **Gateway tmux death** — no auto-recovery if tmux itself dies (need launchd wrapper)
-- **Browser extension** — not loaded in Brave
+- **Opus rate limits** — hit 429 tonight from heavy factory sub-agent spawning
+- **Quick tunnels unreliable** — use GitHub Pages or permanent Workers for production

 ## Infrastructure
-- **Cloudflare token** saved in `.env.local` (broad capabilities: DNS, Workers, R2, AI, Zero Trust, Registrar)
-- **Gemini API key** in env for Nano Banana Pro image gen
+- **Cloudflare token** in `.env.local` (DNS, Workers, R2, AI, Zero Trust, Registrar)
+- **Cloudflare Account ID:** 2ab41abbaef7afaa6b844a72957f078a
+- **Gemini API key** in env for image gen + quiz
+- **PostgreSQL 17** via brew services (databases: nichequiz, credispo)

 ## Other Active Projects
-- **SongSense** — AI Music Analysis Product (QUEUED, Jake approved, build not started)
+- **SongSense** — AI Music Analysis (QUEUED)
 - **SURYA Blender Export** — `surya-blender/`, needs follow-up
-- **AI Factory HITL System** — 25 modal types designed, awaiting prototype priority
-- **LocalBosses App** — `localbosses-app/`, blocked on expired Anthropic key
-- **CloseBot MCP** — `closebot-mcp/`, 119 tools, needs CLOSEBOT_API_KEY
-- **8-Week Agent Study Plan** — `agent-repos-study-plan.md`
+- **LocalBosses App** — blocked on expired Anthropic key
+- **CloseBot MCP** — 119 tools, needs CLOSEBOT_API_KEY
+- **Reonomy Scraper v14** — Henry's pending NJ Industrial request
+- **Mixed-Use Entertainment Intel** — `memory/mixed-use-entertainment-intel.md`

 ## Memory System
-- **Lessons learned:** `memory/lessons-learned.md` (16 entries)
+- **Lessons learned:** `memory/lessons-learned.md` (16+ entries)
 - **Working state:** `memory/working-state.md` (live breadcrumbs)
 - **Daily logs:** `memory/YYYY-MM-DD.md`

-## Smart Model Routing
-- **Status:** Active — Sonnet default, auto-escalate to Opus
-
 ## Git Status
 - **Workspace repo:** `github.com/BusyBee3333/clawdbot-workspace.git`
+- **MCPEngine repo:** `github.com/BusyBee3333/mcpengine.git` — all servers pushed
+- **GHL repo:** `github.com/BusyBee3333/Go-High-Level-MCP-2026-Complete` — up to date

 ---
-*Last updated: 2026-02-11 23:00 EST*
+*Last updated: 2026-02-12 23:00 EST*
diff --git a/mcp-command-center/state.json b/mcp-command-center/state.json
index caf574c..0f17c66 100644
--- a/mcp-command-center/state.json
+++ b/mcp-command-center/state.json
@@ -1,7 +1,7 @@
 {
   "version": 1,
-  "lastUpdated": "2026-02-12T18:00:00-05:00",
-  "updatedBy": "Buba (heartbeat: no advances — all gated on dec-004 or API credentials)",
+  "lastUpdated": "2026-02-12T22:02:00-05:00",
+  "updatedBy": "Buba (heartbeat: no advances — dec-004 still awaiting reaction 29h+, all else gated on API credentials)",
   "phases": [
     {
       "id": 1,
diff --git a/mcpengine-repo b/mcpengine-repo
index c8bf4df..2c41d0f 160000
--- a/mcpengine-repo
+++ b/mcpengine-repo
@@ -1 +1 @@
-Subproject commit c8bf4df518c0fff8b5d21091a62c51f261077944
+Subproject commit 2c41d0fb3b0c4a00c4b114551dc5efe835dc8bb7
diff --git a/memory/2026-02-12.md b/memory/2026-02-12.md
index a14eb00..99c27f0 100644
--- a/memory/2026-02-12.md
+++ b/memory/2026-02-12.md
@@ -1,4 +1,4 @@
-# 2026-02-12 — Session Memory (Wednesday)
+# 2026-02-12 — Session Memory (Wednesday Feb 12)

 ## TheNicheQuiz.com — FULLY DEPLOYED
 - **Domain:** thenichequiz.com ($9.15/yr on Cloudflare Registrar)
@@ -70,7 +70,65 @@
 - `2026-02-12-buba-flying-hero.mp4` — Veo 3.1 generated video
 - `2026-02-11-buba-feeling-victorious.png` — Nano Banana Pro generated image

+## MCP Factory V2 — Massive Build Day (Afternoon/Evening)
+- **Batches 1-4 (20 servers):** All complete, committed, pushed to mcpengine repo
+- **Batch 5 (Squarespace, FieldEdge, Lightspeed, Toast, TouchBistro):** Had issues — spawned fixer sub-agents for rebuilds
+  - FieldEdge: full rebuild from empty src
+  - Lightspeed: add 50+ tools (had apps but no tools)
+  - Toast: fix 3 TSC errors in tables.ts types
+  - TouchBistro: add missing tool modules (tables, reservations, staff)
+- **Batch 6 (Wave, Close, FreshDesk, Brevo, HelpScout):** TSC fixes + add apps
+  - fix-batch6-wave-close-freshdesk sub-agent spawned
+  - add-apps-brevo-helpscout sub-agent spawned
+- **dec-003 resolved:** Product Analytics + Compliance GRC approved, HR People Ops killed
+- **dec-004 reminder posted** (~24h pending, no reaction from Jake)
+- **All repos pushed:** mcpengine, GHL MCP, clawdbot-workspace — all clean
+- Fixed `.gitignore` to exclude `.next/` dirs (purged 124MB webpack pack files)
+- **Total pipeline state:** 6 at Stage 19, 2 at Stage 9, 1 at Stage 7, 23 at Stage 6, 2 new at Stage 6, 1 killed
+
+## MCP Strategy — MCPEngine Studio
+- Full architecture delivered to #mcp-strategy
+- 4-phase plan designed, awaiting Jake's go-ahead
+
+## Coaching Day 7 — PAUSED
+- Still waiting on Jake's decision re: Oliver & Kevin (total silence 7 days)
+- Escalation options presented: Jake talks to them, pause, change format, other
+
+## Burton Method — Competitor Scan #6
+- LSAC remote ban breaking news posted to #competitor-digest
+- Retake campaign window narrowing (13 days to Feb 25 score release)
+
+## Discord Community — Evening Highlights
+- **CannaBri site deployed to GitHub Pages** (busybee3333.github.io/cannabriny-site) after Cloudflare tunnel instability
+- fatgordo tried to social engineer Buba into fixing site by pretending to be Jake — caught instantly
+- **Agent Incubator idea** — Jake pitched incubator where everyone's agents interact and learn from each other
+- **Memory System push** — Jake: "If anyone's agent is ever getting amnesia mid convo you NEED to install my memory system"
+- **AgentCraft discovery** — ARC shared getagentcraft.com (RTS-style AI agent orchestration)
+- **Framer Server API** — Eric shared new Framer Server API update enabling programmatic CMS access
+- **Jake flying** — Frontier flight, no WiFi, wanted to get back to monitor
+- **Opus rate limit hit** — Jake burned through quota going crazy with Claude (HTTP 429)
+
+## OpenClaw Contract
+- Updated SOW with LLC name for client contract
+
+## Rate Limits
+- Hit Anthropic 429 rate limits in evening from heavy factory sub-agent spawning
+- Multiple cron/session runs failed with rate_limit_error
+
+## Decisions Made
+1. dec-003: Approved Product Analytics + Compliance GRC MCPs, killed HR People Ops
+2. CannaBri site moved to GitHub Pages for stability (tunnels unreliable)
+3. Coaching paused pending Jake's direction
+4. All MCP work committed to mcpengine repo (mandatory rule enforced)
+
+## Next Steps
+- Await dec-004 approval (registry listing for 6 MCPs)
+- Await Jake's coaching decision for Oliver/Kevin
+- Complete remaining batch 5+6 server fixes (sub-agents may still be running)
+- CREdispo domain purchase (Jake needs to buy via Cloudflare dashboard)
+- Research 5 new high-money MCPs once all 30 servers complete
+
 ## Active Tunnels/Services (may need restart after reboot)
 - Port 8877: Flask FB Ads app (thenichequiz.com via Worker)
-- Port 8878: CannaBri site (quick tunnel)
+- Port 8878: CannaBri site (quick tunnel) — also on GitHub Pages as backup
 - PostgreSQL 17: Running via brew services
diff --git a/memory/lessons-learned.md b/memory/lessons-learned.md
index 1f6ef29..3cf5874 100644
--- a/memory/lessons-learned.md
+++ b/memory/lessons-learned.md
@@ -145,8 +145,58 @@

 ---

-*Last updated: 2026-02-11 22:56 EST*
-*Total lessons: 16*
+## Agent Coordination / Factory Builds
+
+### 18. Parallel agents on shared filesystem = disaster
+- **Date:** 2026-02-12
+- **Mistake:** Spawned 5-10 sub-agents simultaneously, all writing to the same `mcpengine-repo/servers/` directory
+- **What happened:** Agents deleted each other's files, overwrote each other's work, and left half-built servers everywhere
+- **Rule:** For file-heavy work on a shared repo, go SEQUENTIAL (one agent at a time) or give each agent a SEPARATE directory, then merge. Never let multiple agents write to the same folder simultaneously.
+
+### 19. "Delete everything and rebuild" agents are time bombs
+- **Date:** 2026-02-12
+- **Mistake:** Gave rebuild agents instructions to "DELETE everything, build from scratch"
+- **What happened:** Agent deletes all files in minute 1, then times out at minute 10 with only 30% rebuilt. Now the server is WORSE than before.
+- **Rule:** NEVER tell agents to delete first. Say "build new files alongside existing ones" or "write to a temp directory, then swap." Always keep the old code until the new code is verified.
+
+### 20. Factory monitor cron + manual spawns = competing agents
+- **Date:** 2026-02-12
+- **Mistake:** Had a cron job (every 10min) spawning fix agents for incomplete servers, PLUS I was manually spawning rebuild agents
+- **What happened:** 3-4 agents fighting over the same server simultaneously, each deleting what the others wrote
+- **Rule:** Before spawning fix agents, DISABLE any cron monitors that might also spawn agents for the same servers. One coordinator, one set of workers. No freelancers.
+
+### 21. 10-minute timeout is too short for full MCP builds
+- **Date:** 2026-02-12
+- **Mistake:** Set 600s (10min) timeout for agents building entire MCP servers (tools + apps + types + server + README)
+- **What happened:** Agents got 60-80% done then died. "No output" completions burning 60-70k tokens each.
+- **Rule:** Full MCP server builds need 900s (15min). App-only or tool-only jobs can use 600s. Always set `runTimeoutSeconds` based on scope.
+
+### 22. Git checkout HEAD restores wiped files
+- **Date:** 2026-02-12
+- **Mistake:** Panicked when rebuild agents wiped committed files
+- **What saved us:** `git checkout HEAD -- servers/{name}/` instantly restores all committed files
+- **Rule:** Always commit after each server completes. Then if a rogue agent wipes files, one git command fixes it. Commit early, commit often.
+
+### 23. Single-purpose agents > multi-purpose agents
+- **Date:** 2026-02-12
+- **Mistake:** Gave agents broad tasks like "build the complete MCP server" (tools + apps + types + infra + README)
+- **What happened:** They'd run out of tokens/time trying to do everything, often failing at the apps stage
+- **Rule:** Split into focused agents: "build tools only", "build apps only", "fix TSC errors only". Smaller scope = higher success rate. Each agent should have ONE clear deliverable.
+
+### 24. Always verify sub-agent output — "success" doesn't mean complete
+- **Date:** 2026-02-12
+- **Mistake:** Trusted agent completion messages like "50+ tools built!" without checking
+- **What happened:** Agent claimed 50 tools but only wrote 2 files. The "findings" text was aspirational, not factual.
+- **Rule:** After EVERY sub-agent completion, run a file count check: `find src/tools -name "*.ts" | wc -l`. Never trust the narrative. Trust the filesystem.
+
+### 25. Count apps correctly — multiple storage patterns exist
+- **Date:** 2026-02-12
+- **Mistake:** Kept miscounting apps because different servers store them differently
+- **What happened:** Some use subdirectories, some use .tsx files, some use .ts files, some use .html files, some use src/apps/ instead of src/ui/react-app/
+- **Rule:** Check ALL patterns: subdirs in react-app/, .tsx files, .ts files, .html files, AND src/apps/*.ts. Take the max. Use a consistent counting script.
+
+*Last updated: 2026-02-12 22:20 EST*
+*Total lessons: 25*

 ### 17. Jake's Preferred Image Style
 - **Mistake:** Used comic book/vibrant cartoon style when Jake asked for "the style I like"
diff --git a/memory/working-state.md b/memory/working-state.md
index 3e5ab5e..b0fbf09 100644
--- a/memory/working-state.md
+++ b/memory/working-state.md
@@ -1,52 +1,26 @@
-# Working State — Last Updated: 2026-02-12 6:01 PM ET
+# Working State — Last Updated Feb 12, 11:00 PM ET

 ## Right Now
-**ACTIVE: MCP Factory V2 — Batches 1-4 DONE (20 servers), Batch 5 REBUILDING (5 fresh agents spawned)**
-
-### Completed Servers (20 — all TSC pass, all pushed)
-| MCP | Tools | Apps | TSC | Git |
-|-----|-------|------|-----|-----|
-| Zendesk | 92 | 18 | ✅ | ✅ |
-| Mailchimp | 104 | 11 | ✅ | ✅ |
-| Pipedrive | 118 | 20 | ✅ | ✅ |
-| ClickUp | 93 | 18 | ✅ | ✅ |
-| Trello | 96 | 18 | ✅ | ✅ |
-| Calendly | 37 | 26 | ✅ | ✅ |
-| BigCommerce | 21 | 20 | ✅ | ✅ |
-| FreshBooks | 97 | 41 | ✅ | ✅ |
-| Keap | 111 | 20 | ✅ | ✅ |
-| Wrike | 88 | 20 | ✅ | ✅ |
-| Constant Contact | 50+ | 17 | ✅ | ✅ |
-| BambooHR | 50+ | 19 | ✅ | ✅ |
-| Gusto | 59 | 19 | ✅ | ✅ |
-| Rippling | 50+ | 18 | ✅ | ✅ |
-| Basecamp | 50+ | 20 | ✅ | ✅ |
-| ServiceTitan | 61 | 18 | ✅ | ✅ |
-| Housecall Pro | 112 | 15 | ✅ | ✅ |
-| Jobber | 96 | 15 | ✅ | ✅ |
-| Acuity | 46 | 14 | ✅ | ✅ |
-| Clover | 118 | 18 | ✅ | ✅ |
-
-### Batch 5 — REBUILDING (first attempt agents aborted, left fragments)
-| MCP | Agent Label | Status |
-|-----|------------|--------|
-| FieldEdge | rebuild-fieldedge-v2 | Spawned 6:01 PM |
-| Lightspeed | rebuild-lightspeed-v2 | Spawned 6:01 PM |
-| Squarespace | rebuild-squarespace-v2 | Spawned 6:01 PM |
-| Toast | rebuild-toast-v2 | Spawned 6:01 PM |
-| TouchBistro | rebuild-touchbistro-v2 | Spawned 6:01 PM |
-
-### Queue
-- Batch 6: Wave, Brevo, Close CRM, FreshDesk, HelpScout (after Batch 5 completes)
-- After all 30: Research 5 new high-money MCP opportunities
+End-of-day memory checkpoint. All major work done for today. Sub-agents for batch 5+6 may have completed or failed (rate limits hit tonight).

 ## Today's Done List
-- Batches 1-4 all verified complete (20 servers, ~1,634+ tools, ~381 apps)
-- Batch 4 fix agents completed: Acuity (46 tools, 14 apps), Clover (118 tools, 18 apps), Jobber (96 tools, 15 apps)
-- Discovered Batch 5 first attempt failed (all 5 agents aborted mid-build)
-- Spawned 5 fresh rebuild agents for Batch 5
+- TheNicheQuiz.com: fully deployed (domain + Worker + Flask + Postgres)
+- CannaBri site: built + deployed to GitHub Pages
+- Veo 3.1 video generation: first successful test
+- MCP Factory V2 batches 1-4: 20 servers complete, committed, pushed
+- Batch 5+6: 6 fixer sub-agents spawned for remaining 10 servers
+- dec-003 resolved (approved 2, killed HR People Ops)
+- dec-004 reminder posted (still pending after 29h)
+- Burton Method competitor scan #6 posted
+- OpenClaw contract SOW updated with LLC name
+- All repos pushed (mcpengine, GHL, workspace)
+- .gitignore fixed to exclude .next/ dirs
+- Discord TLDR summaries generated

 ## Pending
-- Batch 5 completion verification
-- Batch 6 spawn after Batch 5 completes
-- Then: Research 5 new high-money MCP opportunities
+- dec-004: Registry listing approval (29h+ pending)
+- Jake's coaching decision for Oliver/Kevin (paused at Day 7)
+- CREdispo domain purchase
+- Verify batch 5+6 sub-agent results
+- Complete all 30 servers → research 5 new high-money MCPs
+- fatgordo's CannaBri original imagery request (needs Jake approval)
diff --git a/skills/agent-swarm-coordinator/SKILL.md b/skills/agent-swarm-coordinator/SKILL.md
new file mode 100644
index 0000000..03f6b7b
--- /dev/null
+++ b/skills/agent-swarm-coordinator/SKILL.md
@@ -0,0 +1,175 @@
+---
+name: agent-swarm-coordinator
+description: Coordinate teams of sub-agents for parallel and sequential work at scale. Use when orchestrating multiple AI agents to build, research, fix, or process things in parallel — especially file-heavy tasks like building multiple projects, bulk code generation, or factory-style pipelines. Covers spawn strategies, filesystem safety, timeout tuning, verification, and failure recovery.
+---
+
+# Agent Swarm Coordinator
+
+Patterns and rules for orchestrating teams of sub-agents on large-scale tasks.
+Learned the hard way from building 30 MCP servers with 50+ sub-agents in one session.
+
+## Core Principle: Filesystem is the Bottleneck
+
+Multiple agents writing to the same directory tree = guaranteed corruption.
+The filesystem has no merge resolution. Last write wins. Agents WILL overwrite each other.
+
+## Spawn Strategies
+
+### Strategy 1: Parallel — Separate Directories (PREFERRED)
+Each agent gets its own isolated directory. Merge results after all complete.
+
+```
+workspace/
+  agent-1-output/server-a/   ← Agent 1 writes here only
+  agent-2-output/server-b/   ← Agent 2 writes here only
+  agent-3-output/server-c/   ← Agent 3 writes here only
+```
+
+After completion: verify, then `rsync` or `cp` to final location.
+
+**Use when:** Building independent projects, researching separate topics, processing separate files.
+
+### Strategy 2: Sequential — One at a Time
+One agent finishes completely before the next starts. Slow but zero conflicts.
+
+**Use when:** All agents need to modify the same files/repo, or agent N depends on agent N-1's output.
+
+### Strategy 3: Parallel — Disjoint File Sets
+Multiple agents write to the SAME repo but strictly different subdirectories.
+
+**Use when:** Each agent owns a completely separate subdirectory (e.g., `servers/zendesk/` vs `servers/mailchimp/`). Works IF agents never touch shared files (package.json at root, shared types, etc.).
+
+**WARNING:** If ANY shared files exist (root configs, shared modules), this degrades to Strategy 1 or 2.
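A coordinator can check the Strategy 3 precondition before spawning anything. The bash sketch below is hypothetical: the `agent-label=path` argument format and both function names are invented for this example. It compares every pair of declared write paths and fails if one path equals or sits inside another. It only checks the paths it is given, so it cannot stop an agent from editing a shared root file such as `package.json` that it was never assigned.

```bash
#!/usr/bin/env bash
# Hypothetical pre-spawn guard for Strategy 3 (disjoint file sets).
# Each argument is "agent-label=path", with paths relative to the repo root.
# Prints one line per conflict and returns non-zero if any two paths overlap.

paths_overlap() {
  # Drop one trailing slash, then compare on a "/" boundary so that
  # servers/toast does not falsely match servers/toast-pos.
  local a="${1%/}" b="${2%/}"
  [[ "$a" == "$b" || "$a" == "$b"/* || "$b" == "$a"/* ]]
}

check_disjoint() {
  local specs=("$@")
  local conflicts=0 i j agent_a path_a agent_b path_b
  for ((i = 0; i < ${#specs[@]}; i++)); do
    for ((j = i + 1; j < ${#specs[@]}; j++)); do
      agent_a="${specs[i]%%=*}"; path_a="${specs[i]#*=}"
      agent_b="${specs[j]%%=*}"; path_b="${specs[j]#*=}"
      if paths_overlap "$path_a" "$path_b"; then
        echo "CONFLICT: $agent_a ($path_a) overlaps $agent_b ($path_b)"
        conflicts=$((conflicts + 1))
      fi
    done
  done
  [ "$conflicts" -eq 0 ]
}

# Usage (agent labels are illustrative):
#   check_disjoint "build-zendesk=servers/zendesk" \
#                  "build-mailchimp=servers/mailchimp" \
#                  "fix-types=servers/zendesk/src/types" \
#     || echo "overlap found: use Strategy 1 or 2 instead"
```

With the three assignments in the usage comment, the guard reports `build-zendesk` against `fix-types` and returns non-zero, which is the cue to fall back to Strategy 1 or 2.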
+
+### Strategy 4: Pipeline — Stage Handoffs
+Agent A does stage 1 (research), hands off to Agent B (build tools), hands off to Agent C (build apps).
+
+**Use when:** Work has clear sequential stages with different skills needed per stage.
+
+## Batch Sizing
+
+| Agent task complexity | Recommended batch size | Timeout |
+|---|---|---|
+| Simple (one deliverable, <500 LOC) | 8-10 parallel | 300s (5min) |
+| Medium (multiple files, 500-2000 LOC) | 5 parallel | 600s (10min) |
+| Heavy (full project, 2000+ LOC) | 3 parallel | 900s (15min) |
+| Mega (multi-project or research) | 1-2 parallel | 900s (15min) |
+
+**Never exceed 5 heavy agents simultaneously** — context pressure on the coordinator grows fast.
+
+## Task Scoping Rules
+
+### Single-Purpose Agents Beat Multi-Purpose Agents
+
+BAD: "Build the complete MCP server with tools, apps, types, server, and README"
+GOOD: "Build 10 tool files for the Zendesk MCP server. Tools only. Don't touch anything else."
+
+Split big jobs:
+1. **Phase 1 agent:** Build API client + types
+2. **Phase 2 agent:** Build tools (depends on types from phase 1)
+3. **Phase 3 agent:** Build apps (can reference tools for context)
+
+Each agent has ONE clear deliverable. Smaller scope = higher success rate.
+
+### Never Say "Delete Everything and Rebuild"
+
+This is the #1 factory killer. Agent deletes all files in minute 1, times out at minute 10 with 30% rebuilt. Server is now WORSE.
+
+Instead:
+- "Build new files alongside existing ones"
+- "Write to `/tmp/rebuild-{name}/` then I'll swap after verification"
+- "Add the missing tool files. Do NOT modify or delete existing files."
+
+## Git Safety (MANDATORY for shared repos)
+
+1. **Commit after EACH agent completes** — not after the whole batch
+2. **Before spawning rebuild/fix agents:** `git add -A && git commit -m "checkpoint before rebuild"`
+3. **If an agent wipes files:** `git checkout HEAD -- path/to/dir/` to restore instantly
+4. **Never let agents run `git push`** — coordinator pushes after verification
+
+## Verification Protocol
+
+**NEVER trust agent completion messages.** Agents report aspirational results, not actual results.
+
+After every agent completes:
+```bash
+# Count actual deliverables
+find src/tools -name "*.ts" | wc -l            # tools built?
+find src/ui -name "*.tsx" | wc -l              # apps built?
+find src -name "*.ts" -exec cat {} + | wc -l   # total LOC? (bash has no recursive ** unless globstar is enabled)
+npx tsc --noEmit 2>&1 | tail -5                # compiles?
+```
+
+If the counts don't match the agent's claims → respawn a focused fix agent.
+
+## Cron Monitor Anti-Pattern
+
+**NEVER run an automated cron monitor that spawns fix agents while you're also manually spawning agents.**
+
+What happens:
+1. You see Server X is broken, spawn fix agent
+2. Cron fires 2 minutes later, sees Server X is still broken, spawns ANOTHER fix agent
+3. Both agents fight over the same files
+4. Server X is now more broken than before
+
+**Rule:** Disable any automated monitors before doing manual intervention. Re-enable after manual work is complete.
+
+## Failure Recovery Playbook
+
+### Agent timed out (most common)
+- Check what files exist — it probably got 60-80% done
+- Spawn a FOCUSED agent: "Complete the remaining work. These files exist: [list]. Build only what's missing."
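The verification step and the timed-out-agent recovery can be combined into one check. This bash sketch rests on stated assumptions: tools live in `src/tools/*.ts` (the layout the verification commands use), the claimed count is read from the agent's report by hand, and the function names and prompt wording are illustrative rather than part of any existing tooling.

```bash
#!/usr/bin/env bash
# Hypothetical post-run check: compare an agent's claimed tool count with the
# filesystem, and draft a focused follow-up prompt if work is missing.

count_tool_files() {
  # Count .ts files under <server-dir>/src/tools; print 0 if the dir is absent.
  local dir="$1/src/tools"
  if [ ! -d "$dir" ]; then
    echo 0
    return
  fi
  # tr strips the leading spaces that BSD/macOS wc prints.
  find "$dir" -type f -name "*.ts" | wc -l | tr -d ' '
}

followup_prompt() {
  # $1 = server dir, $2 = tool count the agent claimed in its report
  local server_dir="$1" claimed="$2" actual
  actual=$(count_tool_files "$server_dir")
  if [ "$actual" -ge "$claimed" ]; then
    echo "OK: $server_dir has $actual tool files (claimed $claimed)"
    return 0
  fi
  echo "Complete the remaining work in $server_dir."
  echo "The previous agent claimed $claimed tool files; $actual exist on disk."
  echo "These files exist. Do NOT modify or delete them:"
  if [ -d "$server_dir/src" ]; then
    # List existing files relative to the server dir, one bullet per file.
    (cd "$server_dir" && find src -type f | sort | sed 's/^/  - /')
  fi
  echo "Build only what's missing."
  return 1
}
```

For example, `followup_prompt servers/lightspeed 50` prints an `OK:` line if at least 50 tool files exist; otherwise it prints a prompt listing the existing files and returns 1, so the exit status can gate a respawn.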
+
+### Agent returned "no output"
+- Check filesystem directly — the agent may have written files but failed to report
+- If files exist and look good → count as success
+- If files don't exist → respawn with simpler task scope
+
+### Agent wiped files then timed out
+- `git checkout HEAD -- path/` to restore
+- Respawn with explicit "DO NOT DELETE" instruction
+
+### Multiple agents corrupted each other
+- `git checkout HEAD -- path/` to restore to last good state
+- Switch to sequential strategy for affected directories
+- Disable any cron monitors
+
+## Token Optimization
+
+### Reduce input tokens per agent:
+- Don't paste entire API docs — give the API base URL and let the agent research
+- Don't repeat the full project context — just give the specific directory and what to build
+- Reference files by path instead of pasting content
+
+### Reduce wasted runs:
+- Verify prerequisite files exist BEFORE spawning (don't spawn a "build apps" agent if types don't exist yet)
+- Use 15min timeouts for heavy builds (10min causes 30% waste from timeouts)
+- Single-purpose agents fail less often than multi-purpose ones
+
+### Reduce retry cycles:
+- Commit after each success (git safety net)
+- Verify immediately after completion (catch problems early)
+- Fix specific issues, don't "rebuild everything"
+
+## Example: Building 30 MCP Servers
+
+Optimal approach (what we SHOULD have done):
+
+```
+Batch 1 (5 servers): Spawn 5 parallel agents, each building to separate dirs
+  → Wait for all 5 → Verify each → Commit each → Push
+
+Batch 2 (5 servers): Same pattern
+  → Repeat until all 30 done
+
+For each server, 2-phase approach:
+  Phase 1: "Build API client + types + tool files for {name} MCP" (10min)
+  Phase 2: "Build 15+ React apps for {name} MCP" (10min, after phase 1 verified)
+```
+
+What we actually did (don't repeat):
+- Spawned 10+ agents at once on the same repo
+- Had a cron monitor spawning MORE agents every 10 minutes
+- Gave "delete and rebuild" instructions
+- Trusted agent reports without filesystem verification
+- Result: 50+ agent sessions, massive token waste, files getting wiped and restored repeatedly
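The "optimal approach" in the SKILL.md example (parallel within a batch, then verify and commit each server before the next batch starts) maps onto bash background jobs and `wait`. This is a sketch only: `build_server`, `verify_server` and `commit_server` are hypothetical hooks whose stubs just log, so the control flow can run as-is; the real spawn call, timeouts and git commands are left to the coordinator.

```bash
#!/usr/bin/env bash
# Hypothetical batch driver: parallel builds inside a batch, a wait barrier,
# then sequential verify + commit per server. Replace the stubs with real steps.

build_server()  { echo "build $1"; }   # e.g. spawn one sub-agent into its own dir and block until it exits
verify_server() { echo "verify $1"; }  # e.g. file counts + npx tsc --noEmit
commit_server() { echo "commit $1"; }  # e.g. git add servers/$1 && git commit

run_in_batches() {
  local size="$1"
  shift
  local servers=("$@")
  local i s
  for ((i = 0; i < ${#servers[@]}; i += size)); do
    local batch=("${servers[@]:i:size}")
    for s in "${batch[@]}"; do
      build_server "$s" &   # parallel within a batch
    done
    wait                    # barrier: the whole batch finishes before any verification
    for s in "${batch[@]}"; do
      if verify_server "$s"; then
        commit_server "$s"  # commit per server, not per batch
      else
        echo "FAILED: $s (respawn a focused fix agent; do not commit)"
      fi
    done
  done
}

# Usage: run_in_batches 5 zendesk mailchimp pipedrive clickup trello calendly
```

Because `wait` is a barrier, one slow agent holds up its whole batch; that matches the "Wait for all 5" step at the cost of idle slots. Pushing stays outside the loop, consistent with the rule that only the coordinator pushes after verification.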