
name: agent-swarm-coordinator
description: Coordinate teams of sub-agents for parallel and sequential work at scale. Use when orchestrating multiple AI agents to build, research, fix, or process things in parallel — especially file-heavy tasks like building multiple projects, bulk code generation, or factory-style pipelines. Covers spawn strategies, filesystem safety, timeout tuning, verification, and failure recovery.

Agent Swarm Coordinator

Patterns and rules for orchestrating teams of sub-agents on large-scale tasks. Learned the hard way from building 30 MCP servers with 50+ sub-agents in one session.

Core Principle: Filesystem is the Bottleneck

Multiple agents writing to the same directory tree = guaranteed corruption. The filesystem has no merge resolution. Last write wins. Agents WILL overwrite each other.

Spawn Strategies

Strategy 1: Parallel — Separate Directories (PREFERRED)

Each agent gets its own isolated directory. Merge results after all complete.

workspace/
  agent-1-output/server-a/   ← Agent 1 writes here only
  agent-2-output/server-b/   ← Agent 2 writes here only
  agent-3-output/server-c/   ← Agent 3 writes here only

After completion: verify, then rsync or cp to final location.

Use when: Building independent projects, researching separate topics, processing separate files.
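A minimal sketch of Strategy 1, assuming a POSIX shell. The spawn itself is a placeholder comment (substitute your real sub-agent invocation); here each "agent" just drops a stub file, and the merge uses `cp` as the doc suggests:

```shell
# Strategy 1: each agent writes only inside its own directory; merge after verification.
WORKSPACE=$(mktemp -d)
FINAL="$WORKSPACE/final"
mkdir -p "$FINAL"

for i in 1 2 3; do
  out="$WORKSPACE/agent-$i-output"
  mkdir -p "$out"
  # Placeholder for: spawn_agent --task "build server-$i" --output "$out"
  echo "server-$i stub" > "$out/server-$i.txt"
done

# Merge phase: copy a directory into the final tree only if it is non-empty.
for i in 1 2 3; do
  out="$WORKSPACE/agent-$i-output"
  if [ -n "$(ls -A "$out")" ]; then
    cp -R "$out"/. "$FINAL"/
  else
    echo "agent $i produced nothing -- respawn before merging" >&2
  fi
done
```

The verification step before each copy is the point: an empty output directory means a failed agent, and merging it silently would hide the failure.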

Strategy 2: Sequential — One at a Time

One agent finishes completely before the next starts. Slow but zero conflicts.

Use when: All agents need to modify the same files/repo, or agent N depends on agent N-1's output.

Strategy 3: Parallel — Disjoint File Sets

Multiple agents write to the SAME repo but strictly different subdirectories.

Use when: Each agent owns a completely separate subdirectory (e.g., servers/zendesk/ vs servers/mailchimp/). Works IF agents never touch shared files (package.json at root, shared types, etc.).

WARNING: If ANY shared files exist (root configs, shared modules), this degrades to Strategy 1 or 2.
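One way to catch that degradation before spawning: a pre-flight check, sketched here with one path-manifest per agent (the manifest format and the shared-file list are illustrative assumptions, not part of any real tool):

```shell
# Strategy 3 pre-flight: confirm no path appears in two manifests and
# no agent claims a known shared file.
tmp=$(mktemp -d)
printf '%s\n' servers/zendesk/tools.ts servers/zendesk/index.ts > "$tmp/agent-a.paths"
printf '%s\n' servers/mailchimp/tools.ts package.json           > "$tmp/agent-b.paths"

overlap=$(sort "$tmp"/agent-*.paths | uniq -d)
shared=$(grep -hE '^(package\.json|tsconfig\.json|src/shared/)' "$tmp"/agent-*.paths || true)

if [ -n "$overlap" ] || [ -n "$shared" ]; then
  echo "NOT SAFE for parallel writes"
  if [ -n "$overlap" ]; then echo "  duplicated path(s): $overlap"; fi
  if [ -n "$shared" ]; then echo "  shared file claimed: $shared"; fi
else
  echo "disjoint -- safe to parallelize"
fi
```

Here agent-b claims `package.json`, so the check fails and the coordinator should fall back to Strategy 1 or 2.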

Strategy 4: Pipeline — Stage Handoffs

Agent A does stage 1 (research), hands off to Agent B (build tools), hands off to Agent C (build apps).

Use when: Work has clear sequential stages with different skills needed per stage.

Batch Sizing

Agent task complexity                    Recommended batch size   Timeout
Simple (one deliverable, <500 LOC)       8-10 parallel            300s (5 min)
Medium (multiple files, 500-2000 LOC)    5 parallel               600s (10 min)
Heavy (full project, 2000+ LOC)          3 parallel               900s (15 min)
Mega (multi-project or research)         1-2 parallel             900s (15 min)

Never exceed 5 heavy agents simultaneously — context pressure on the coordinator grows fast.
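The table above maps to a simple spawn loop, sketched here with the GNU `timeout` utility and a `bash -c` body standing in for a real spawn command:

```shell
# One batch: N parallel agents, each capped by a timeout sized to task complexity.
BATCH=(server-a server-b server-c server-d server-e)   # 5 medium tasks
LIMIT=600                                              # 600s for medium work

pids=()
for name in "${BATCH[@]}"; do
  timeout "$LIMIT" bash -c "sleep 0.2; echo 'built $name'" &   # placeholder agent
  pids+=("$!")
done

# Collect exit statuses so timeouts and failures are counted, not ignored.
fail=0
for pid in "${pids[@]}"; do
  wait "$pid" || fail=$((fail + 1))
done
echo "batch done: $((${#BATCH[@]} - fail)) succeeded, $fail failed or timed out"
```

Waiting on each PID individually (rather than a bare `wait`) is what lets the coordinator know exactly which agents to respawn.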

Task Scoping Rules

Single-Purpose Agents Beat Multi-Purpose Agents

BAD: "Build the complete MCP server with tools, apps, types, server, and README"
GOOD: "Build 10 tool files for the Zendesk MCP server. Tools only. Don't touch anything else."

Split big jobs:

  1. Phase 1 agent: Build API client + types
  2. Phase 2 agent: Build tools (depends on types from phase 1)
  3. Phase 3 agent: Build apps (can reference tools for context)

Each agent has ONE clear deliverable. Smaller scope = higher success rate.

Never Say "Delete Everything and Rebuild"

This is the #1 factory killer. Agent deletes all files in minute 1, times out at minute 10 with 30% rebuilt. Server is now WORSE.

Instead:

  • "Build new files alongside existing ones"
  • "Write to /tmp/rebuild-{name}/ then I'll swap after verification"
  • "Add the missing tool files. Do NOT modify or delete existing files."

Git Safety (MANDATORY for shared repos)

  1. Commit after EACH agent completes — not after the whole batch
  2. Before spawning rebuild/fix agents: git add -A && git commit -m "checkpoint before rebuild"
  3. If an agent wipes files: git checkout HEAD -- path/to/dir/ to restore instantly
  4. Never let agents run git push — coordinator pushes after verification
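The checkpoint/restore cycle above, demonstrated end to end in a throwaway repo. The `rm` stands in for an agent wiping files mid-task, and `verify` is a stub:

```shell
# Build a scratch repo to demonstrate checkpoint -> wipe -> restore.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email swarm@example.com
git config user.name swarm
mkdir -p servers/zendesk
echo "original" > servers/zendesk/index.ts
git add -A && git commit -qm "initial"

verify() { [ -s servers/zendesk/index.ts ]; }      # stub verification

git add -A && git commit -qm "checkpoint before rebuild" --allow-empty
rm servers/zendesk/index.ts                        # agent wipes files, times out
if verify; then
  git add -A && git commit -qm "rebuilt and verified"
else
  git checkout HEAD -- servers/zendesk/            # instant restore from checkpoint
fi
```

Because the checkpoint commit exists, the restore is one command; without it, the wiped files are simply gone.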

Verification Protocol

NEVER trust agent completion messages. Agents report aspirational results, not actual results.

After every agent completes:

# Count actual deliverables
find src/tools -name "*.ts" | wc -l            # tools built?
find src/ui -name "*.tsx" | wc -l              # apps built?
find src -name "*.ts" -exec cat {} + | wc -l   # total LOC? (avoids needing shopt -s globstar)
npx tsc --noEmit 2>&1 | tail -5                # compiles?

If counts don't match agent's claims → respawn a focused fix agent.
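That comparison can be wrapped in a small check. In practice the claimed count would be parsed from the agent's completion message; here it is hard-coded for the demo:

```shell
# Compare an agent's claimed deliverable count against the filesystem.
verify_claim() {
  local dir=$1 pattern=$2 claimed=$3 actual
  actual=$(find "$dir" -name "$pattern" | wc -l)
  if [ "$actual" -lt "$claimed" ]; then
    echo "MISMATCH: claimed $claimed, found $actual -- respawn a focused fix agent"
    return 1
  fi
  echo "OK: $actual deliverables on disk (claimed $claimed)"
}

d=$(mktemp -d)
touch "$d/a.ts" "$d/b.ts"                       # the agent actually produced 2 files
result=$(verify_claim "$d" '*.ts' 10 || true)   # ...but claimed 10
echo "$result"
```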

Cron Monitor Anti-Pattern

NEVER run an automated cron monitor that spawns fix agents while you're also manually spawning agents.

What happens:

  1. You see Server X is broken, spawn fix agent
  2. Cron fires 2 minutes later, sees Server X is still broken, spawns ANOTHER fix agent
  3. Both agents fight over the same files
  4. Server X is now more broken than before

Rule: Disable any automated monitors before doing manual intervention. Re-enable after manual work is complete.

Failure Recovery Playbook

Agent timed out (most common)

  • Check what files exist — it probably got 60-80% done
  • Spawn a FOCUSED agent: "Complete the remaining work. These files exist: [list]. Build only what's missing."
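One way to build that focused prompt is to enumerate the survivors mechanically rather than by hand. Directory and file names below are illustrative:

```shell
# List what the timed-out agent left behind and fold it into the follow-up prompt.
d=$(mktemp -d)
touch "$d/client.ts" "$d/types.ts"    # partial output from the timed-out agent

existing=$(cd "$d" && ls *.ts | tr '\n' ' ')
prompt="Complete the remaining work. These files exist: $existing. Build only what's missing. Do NOT modify or delete existing files."
echo "$prompt"
# Placeholder for: spawn_agent --task "$prompt" --cwd "$d"
```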

Agent returned "no output"

  • Check filesystem directly — the agent may have written files but failed to report
  • If files exist and look good → count as success
  • If files don't exist → respawn with simpler task scope

Agent wiped files then timed out

  • git checkout HEAD -- path/ to restore
  • Respawn with explicit "DO NOT DELETE" instruction

Multiple agents corrupted each other

  • git checkout HEAD -- path/ to restore to last good state
  • Switch to sequential strategy for affected directories
  • Disable any cron monitors

Token Optimization

Reduce input tokens per agent:

  • Don't paste entire API docs — give the API base URL and let the agent research
  • Don't repeat the full project context — just give the specific directory and what to build
  • Reference files by path instead of pasting content

Reduce wasted runs:

  • Verify prerequisite files exist BEFORE spawning (don't spawn a "build apps" agent if types don't exist yet)
  • Use 15min timeouts for heavy builds (10min causes 30% waste from timeouts)
  • Single-purpose agents fail less often than multi-purpose ones
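The prerequisite check can be a one-liner gate before the spawn. A sketch, with illustrative file names and the spawn reduced to a decision string:

```shell
# Refuse to spawn phase 2 until phase 1's outputs exist and are non-empty.
prereqs_ok() {
  for f in "$@"; do
    [ -s "$f" ] || { echo "missing prerequisite: $f" >&2; return 1; }
  done
}

d=$(mktemp -d)
echo "export type Ticket = { id: string }" > "$d/types.ts"   # phase 1 delivered only types

if prereqs_ok "$d/types.ts" "$d/client.ts"; then
  decision="spawn phase-2 agent"
else
  decision="hold -- respawn phase 1 first"
fi
echo "$decision"
```

Here `client.ts` is missing, so the phase-2 spawn is held instead of burning a 10-minute run that was doomed from the start.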

Reduce retry cycles:

  • Commit after each success (git safety net)
  • Verify immediately after completion (catch problems early)
  • Fix specific issues, don't "rebuild everything"

Example: Building 30 MCP Servers

Optimal approach (what we SHOULD have done):

Batch 1 (5 servers): Spawn 5 parallel agents, each building to separate dirs
  → Wait for all 5 → Verify each → Commit each → Push
  
Batch 2 (5 servers): Same pattern
  → Repeat until all 30 done

For each server, 2-phase approach:
  Phase 1: "Build API client + types + tool files for {name} MCP"  (10min)
  Phase 2: "Build 15+ React apps for {name} MCP" (10min, after phase 1 verified)
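The batch-of-5, two-phase plan above as a shell skeleton. `spawn` and `verify` are placeholders; `spawn` here just logs what it would run:

```shell
# Batched two-phase orchestration: spawn 5, wait, verify, then phase 2, then next batch.
SERVERS=(zendesk mailchimp stripe github linear)   # first 5 of the 30

LOG=$(mktemp)
spawn()  { echo "spawn $1 :: $2" >> "$LOG"; }      # placeholder for real spawn
verify() { true; }                                 # placeholder for filesystem checks

total=${#SERVERS[@]}
for start in $(seq 0 5 $((total - 1))); do
  batch=("${SERVERS[@]:start:5}")
  for s in "${batch[@]}"; do
    spawn "$s" "Phase 1: build API client + types + tools" &
  done
  wait
  for s in "${batch[@]}"; do
    verify "$s" && spawn "$s" "Phase 2: build 15+ React apps" &
  done
  wait
  # After verification: git add -A && git commit -m "batch done" && git push
done
echo "$(grep -c '^spawn' "$LOG") spawns logged"
```

Each `wait` is a hard barrier: phase 2 never starts until every phase-1 agent in the batch has finished and been verified, and the next batch never starts until the current one is committed.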

What we actually did (don't repeat):

  • Spawned 10+ agents at once on the same repo
  • Had a cron monitor spawning MORE agents every 10 minutes
  • Gave "delete and rebuild" instructions
  • Trusted agent reports without filesystem verification
  • Result: 50+ agent sessions, massive token waste, files getting wiped and restored repeatedly