clawdbot-workspace/MCP-FACTORY.md
2026-02-04 23:01:37 -05:00


MCP Factory — Production Pipeline

The systematic process for turning any API into a fully tested, production-ready MCP experience inside LocalBosses.


The Problem

We've been building MCP servers ad-hoc: grab an API, bang out tools, create some apps, throw them in LocalBosses, move on. Result: 30+ servers that compile but have never been tested against live APIs, apps that may not render, tool descriptions that might not trigger correctly via natural language.

The Pipeline

API Docs → Analyze → Build → Design → Integrate → Test → Ship
             P1        P2      P3        P4         P5     P6

6 phases. Phases 2 (Build) and 3 (Design) run in parallel. QA findings route back to Builder/Designer for fixes before Ship.

Every phase has:

  • Clear inputs (what you need to start)
  • Clear outputs (what you produce)
  • Quality gate (what must pass before moving on)
  • Dedicated skill (documented, repeatable instructions)
  • Agent capability (can be run by a sub-agent)

Phase 1: Analyze (API Discovery & Analysis)

Skill: mcp-api-analyzer
Input: API documentation URL(s), OpenAPI spec (if available), user guides, public marketing copy
Output: {service}-api-analysis.md

What the analysis produces:

  1. Service Overview — What the product does, who it's for, pricing tiers
  2. Auth Method — OAuth2 / API key / JWT / session — with exact flow
  3. Endpoint Catalog — Every endpoint grouped by domain
  4. Tool Groups — Logical groupings for lazy loading (aim for 5-15 groups)
  5. Tool Inventory — Each tool with:
    • Name (snake_case, descriptive)
    • Description (optimized for LLM routing — what it does, when to use it)
    • Required vs optional params
    • Read-only / destructive / idempotent annotations
  6. App Candidates — Which endpoints/features deserve visual UI:
    • Dashboard views (aggregate data, KPIs)
    • List/Grid views (searchable collections)
    • Detail views (single entity deep-dive)
    • Forms (create/edit workflows)
    • Specialized views (calendars, timelines, funnels, maps)
  7. Rate Limits & Quirks — API-specific gotchas
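The tool-inventory entries above can be captured in a small schema. A sketch only — the analysis doc itself is free-form markdown, and these field names are illustrative, not a fixed format:

```typescript
// Sketch of one tool-inventory entry from the analysis doc.
// Field names are illustrative; the real doc is prose/markdown.
interface ToolInventoryEntry {
  name: string;               // snake_case, e.g. "list_contacts"
  description: string;        // action-oriented, includes "when to use"
  group: string;              // lazy-loading group, e.g. "contacts"
  requiredParams: string[];
  optionalParams: string[];
  readOnly: boolean;          // true for GET operations
  destructive: boolean;       // true for DELETE operations
  idempotent: boolean;        // true for PUT/upsert operations
}

const example: ToolInventoryEntry = {
  name: "list_contacts",
  description:
    "List CRM contacts. Use when the user asks to see, search, or count contacts.",
  group: "contacts",
  requiredParams: [],
  optionalParams: ["query", "limit", "cursor"],
  readOnly: true,
  destructive: false,
  idempotent: true,
};
```

Entries in this shape flow directly into Phase 2 (annotations) and Phase 5 (routing tests).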

Quality Gate:

  • Every endpoint is cataloged
  • Tool groups are balanced (no group with 50+ tools)
  • Tool descriptions are LLM-friendly (action-oriented, include "when to use")
  • App candidates have clear data sources (which tools feed them)
  • Auth flow is documented with example

Phase 2: Build (MCP Server)

Skill: mcp-server-builder (updated from existing mcp-server-development)
Input: {service}-api-analysis.md
Output: Complete MCP server in {service}-mcp/

Server structure:

{service}-mcp/
├── src/
│   ├── index.ts              # Server entry, transport, lazy loading
│   ├── client.ts             # API client (auth, request, error handling)
│   ├── tools/
│   │   ├── index.ts          # Tool registry + lazy loader
│   │   ├── {group1}.ts       # Tool group module
│   │   ├── {group2}.ts       # ...
│   │   └── ...
│   └── types.ts              # Shared TypeScript types
├── dist/                     # Compiled output
├── package.json
├── tsconfig.json
├── .env.example
└── README.md

Must-haves (Feb 2026 standard):

  • MCP SDK ^1.26.0 (security fix: GHSA-345p-7cg4-v4c7 in v1.26.0). Pin to v1.x — SDK v2 is pre-alpha, stable expected Q1 2026
  • Lazy loading — tool groups load on first use, not at startup
  • MCP Annotations on every tool:
    • readOnlyHint (true for GET operations)
    • destructiveHint (true for DELETE operations)
    • idempotentHint (true for PUT/upsert operations)
    • openWorldHint (false for most API tools)
  • Zod validation on all tool inputs
  • Structured error handling — never crash, always return useful error messages
  • Rate limit awareness — respect API limits, add retry logic
  • Pagination support — tools that list things must handle pagination
  • Environment variables — all secrets via env, never hardcoded
  • TypeScript strict mode — no any, proper types throughout
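The lazy-loading requirement can be sketched as a registry that defers each group's module load until first use. A minimal illustration without the MCP SDK — in a real server each loader would dynamically import its group module (e.g. `await import("./tools/contacts.js")`) and register the tools with the SDK server instance:

```typescript
// Minimal lazy-loading sketch: tool groups load on first use, not at startup.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;
type ToolGroup = Record<string, ToolHandler>;

class LazyToolRegistry {
  private loaded = new Map<string, ToolGroup>();

  constructor(private loaders: Record<string, () => Promise<ToolGroup>>) {}

  async getGroup(group: string): Promise<ToolGroup> {
    let tools = this.loaded.get(group);
    if (!tools) {
      const loader = this.loaders[group];
      if (!loader) throw new Error(`Unknown tool group: ${group}`);
      tools = await loader(); // the group's module loads here, on first use
      this.loaded.set(group, tools);
    }
    return tools;
  }

  loadedGroups(): string[] {
    return [...this.loaded.keys()];
  }
}

// Usage: nothing loads at construction time.
const registry = new LazyToolRegistry({
  contacts: async () => ({
    list_contacts: async () => ({ contacts: [] }),
  }),
});
```

This keeps startup memory flat regardless of how many tool groups a server defines, which matters for the Mac Mini capacity numbers later in this document.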

Quality Gate:

  • npm run build succeeds (tsc compiles clean)
  • Every tool has MCP annotations
  • Every tool has Zod input validation
  • .env.example lists all required env vars
  • README documents setup + tool list

Phase 3: Design (MCP Apps)

Skill: mcp-app-designer
Input: {service}-api-analysis.md (app candidates section), server tool definitions
Output: HTML app files in {service}-mcp/app-ui/ or {service}-mcp/ui/

App types and when to use them:

| Type | When | Example |
| --- | --- | --- |
| Dashboard | Aggregate KPIs, overview | CRM Dashboard, Ad Performance |
| Data Grid | Searchable/filterable lists | Contact List, Order History |
| Detail Card | Single entity deep-dive | Contact Card, Invoice Preview |
| Form/Wizard | Create or edit flows | Campaign Builder, Appointment Booker |
| Timeline | Chronological events | Activity Feed, Audit Log |
| Funnel/Flow | Stage-based progression | Pipeline Board, Sales Funnel |
| Calendar | Date-based data | Appointment Calendar, Schedule View |
| Analytics | Charts and visualizations | Revenue Chart, Traffic Graph |

App architecture (single-file HTML):

<!DOCTYPE html>
<html>
<head>
  <style>
    /* Dark theme matching LocalBosses (#1a1d23 bg, #ff6d5a accent) */
    /* Responsive — works at 280px-800px width */
    /* No external dependencies */
  </style>
</head>
<body>
  <div id="app"><!-- Loading state --></div>
  <script>
    // 1. Receive data via postMessage
    window.addEventListener('message', (event) => {
      const data = event.data;
      if (data.type === 'mcp_app_data') render(data.data);
      // Also handle workflow_ops type for workflow apps
    });

    // 2. Also fetch from polling endpoint as fallback
    async function pollForData() {
      try {
        const res = await fetch('/api/app-data?app=APP_ID');
        if (res.ok) { const data = await res.json(); render(data); }
      } catch {}
    }

    // 3. Render function with proper empty/error/loading states
    function render(data) {
      if (!data || Object.keys(data).length === 0) {
        showEmptyState(); return;
      }
      // ... actual rendering
    }

    // Auto-poll on load
    pollForData();
    setInterval(pollForData, 3000);
  </script>
</body>
</html>

Design rules:

  • Dark theme only — #1a1d23 background, #2b2d31 cards, #ff6d5a accent, #dcddde text
  • Responsive — must work from 280px to 800px width
  • Self-contained — zero external dependencies, no CDN links
  • Three states — loading skeleton, empty state, data state
  • Compact — no wasted space, dense but readable
  • Interactive — hover effects, click handlers where appropriate
  • Data-driven — renders whatever data it receives, graceful with missing fields

Quality Gate:

  • Every app renders with sample data (no blank screens)
  • Every app has loading, empty, and error states
  • Dark theme is consistent with LocalBosses
  • Works at 280px width (thread panel minimum)
  • No external dependencies or CDN links

Phase 4: Integrate (LocalBosses)

Skill: mcp-localbosses-integrator
Input: Built MCP server + apps
Output: Fully wired LocalBosses channel

Files to update:

  1. src/lib/channels.ts — Add channel definition:

    {
      id: "channel-name",
      name: "Channel Name",
      icon: "🔥",
      category: "BUSINESS OPS",  // or MARKETING, TOOLS, SYSTEM
      description: "What this channel does",
      systemPrompt: `...`, // Must include tool descriptions + when to use them
      defaultApp: "app-id",  // Optional: auto-open app
      mcpApps: ["app-id-1", "app-id-2", ...],
    }
    
  2. src/lib/appNames.ts — Add display names:

    "app-id": { name: "App Name", icon: "📊" },
    
  3. src/lib/app-intakes.ts — Add intake questions:

    "app-id": {
      question: "What would you like to see?",
      category: "data-view",
      skipLabel: "Show dashboard",
    },
    
  4. src/app/api/mcp-apps/route.ts — Add app routing:

    // In APP_NAME_MAP:
    "app-id": "filename-without-html",
    // In APP_DIRS (if in a different location):
    path.join(process.cwd(), "path/to/app-ui"),
    
  5. src/app/api/chat/route.ts — Add tool routing:

    • System prompt must know about the tools
    • Tool results should include <!--APP_DATA:{...}:END_APP_DATA--> blocks
    • Or <!--WORKFLOW_JSON:{...}:END_WORKFLOW--> for workflow-type apps

System prompt engineering:

The channel system prompt is CRITICAL. It must:

  • Describe the tools available in natural language
  • Specify when to use each tool (not just what they do)
  • Include the hidden data block format so the AI returns structured data to apps
  • Set the tone and expertise level
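The hidden data block round-trip can be sketched as follows. The `APP_DATA` delimiters come from this document; the extraction helper itself is illustrative, not the actual chat route code:

```typescript
// Sketch: extract the hidden APP_DATA block the AI embeds in its reply,
// so the host can forward structured data to the app via postMessage.
const APP_DATA_RE = /<!--APP_DATA:(.*?):END_APP_DATA-->/s;

function extractAppData(reply: string): unknown | null {
  const match = reply.match(APP_DATA_RE);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // malformed JSON: fall back to text-only rendering
  }
}

const reply =
  'Here are your contacts. <!--APP_DATA:{"contacts":[{"name":"Ada"}]}:END_APP_DATA-->';
```

Because a malformed block degrades to plain text rather than crashing the thread, parse failures show up as a metric (see the APP_DATA parse success rate under Monitoring) instead of as user-facing errors.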

Quality Gate:

  • Channel appears in sidebar under correct category
  • All apps appear in toolbar
  • Default app auto-opens on channel entry (if configured)
  • System prompt mentions all available tools
  • Intake questions are clear and actionable

Phase 5: Test (QA & Validation)

Skill: mcp-qa-tester
Input: Integrated LocalBosses channel
Output: Test report + fixes

Testing layers:

Layer 1: Static Analysis

  • TypeScript compiles clean (tsc --noEmit)
  • No any types in tool handlers
  • All apps are valid HTML (no unclosed tags, no script errors)
  • All routes resolve (no 404s for app files)

Layer 2: Visual Testing (Peekaboo + Gemini)

# Capture the rendered app
peekaboo capture --app "Safari" --format png --output /tmp/test-{app}.png

# Or use browser tool to screenshot
# browser → screenshot → analyze with Gemini

# Gemini multimodal analysis
gemini "Analyze this screenshot of an MCP app. Check:
1. Does it render correctly (no blank screen, no broken layout)?
2. Is the dark theme consistent (#1a1d23 bg, #ff6d5a accent)?
3. Are there proper loading/empty states?
4. Is it responsive-friendly?
5. Any visual bugs?" -f /tmp/test-{app}.png

Layer 3: Functional Testing

  • Tool invocation: Send natural language messages, verify correct tool is triggered
  • Data flow: Send a message → verify AI returns APP_DATA block → verify app receives data
  • Thread lifecycle: Create thread → interact → close → delete → verify cleanup
  • Cross-channel: Open app from one channel, switch channels, come back — does state persist?

Layer 4: Live API Testing (when credentials available)

  • Authenticate with real API credentials
  • Call each tool with real parameters
  • Verify response shapes match what apps expect
  • Test error cases (invalid IDs, missing permissions, rate limits)
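Response-shape verification can be sketched with a small structural check. This is a hypothetical helper — a real suite could equally use Zod's `safeParse`, since the servers already depend on Zod:

```typescript
// Sketch: check that a live API response has the fields an app expects.
// `shape` maps field name -> expected typeof string ("string", "number", ...).
function matchesShape(
  value: unknown,
  shape: Record<string, string>
): string[] {
  if (typeof value !== "object" || value === null) {
    return [`expected object, got ${typeof value}`];
  }
  const obj = value as Record<string, unknown>;
  const errors: string[] = [];
  for (const [field, expected] of Object.entries(shape)) {
    if (!(field in obj)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof obj[field] !== expected) {
      errors.push(`${field}: expected ${expected}, got ${typeof obj[field]}`);
    }
  }
  return errors; // empty array = shape matches
}
```

Running this against each tool's live response catches the APP_DATA shape mismatches that otherwise only surface as blank app renders in Layer 2.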

Layer 5: Integration Testing

  • Full flow: user sends message → AI responds → app renders → user interacts in thread
  • Test with 2-3 realistic use cases per channel

Automated test script pattern:

#!/bin/bash
# MCP QA Test Runner
SERVICE="$1"
RESULTS="/tmp/mcp-qa-${SERVICE}.md"

echo "# QA Report: ${SERVICE}" > "$RESULTS"
echo "Date: $(date)" >> "$RESULTS"

# Static checks
echo "## Static Analysis" >> "$RESULTS"
cd "${SERVICE}-mcp"
npm run build 2>&1 | tail -5 >> "$RESULTS"

# App file checks
echo "## App Files" >> "$RESULTS"
for f in app-ui/*.html ui/dist/*.html; do
  [ -f "$f" ] && echo "✅ $f ($(wc -c < "$f") bytes)" >> "$RESULTS"
done

# Route mapping check
echo "## Route Mapping" >> "$RESULTS"
# ... verify APP_NAME_MAP entries exist

Quality Gate:

  • All static analysis passes
  • Every app renders visually (verified by screenshot)
  • At least 3 NL messages trigger correct tools
  • Thread create/interact/delete cycle works
  • No console errors in browser dev tools

QA → Fix Feedback Loop

QA findings don't just get logged — they route back to the responsible agent for fixes:

| Finding Type | Routes To | Fix Cycle |
| --- | --- | --- |
| Tool description misrouting | Agent 1 (Analyst) — update analysis doc, then Agent 2 rebuilds | Re-run QA Layer 3 after fix |
| Server crash / protocol error | Agent 2 (Builder) — fix server code | Re-run QA Layers 0-1 |
| App visual bug / accessibility | Agent 3 (Designer) — fix HTML app | Re-run QA Layers 2-2.5 |
| Integration wiring issue | Agent 4 (Integrator) — fix channel config | Re-run QA Layers 3, 5 |
| APP_DATA shape mismatch | Agent 3 + Agent 4 — align app expectations with system prompt | Re-run QA Layers 3 + 5 |

Rule: No server ships with any P0 QA failures. P1 warnings are documented. The fix cycle repeats until QA passes.


Phase 6: Ship (Documentation & Deployment)

Skill: Part of each phase (not separate)

Per-server README must include:

  • What the service does
  • Setup instructions (env vars, API key acquisition)
  • Complete tool list with descriptions
  • App gallery (screenshots or descriptions)
  • Known limitations

Post-Ship: MCP Registry Registration

Register shipped servers in the MCP Registry for discoverability:

  • Server metadata (name, description, icon, capabilities summary)
  • Authentication requirements and setup instructions
  • Tool catalog summary (names + descriptions)
  • Link to README and setup guide

The MCP Registry launched in preview in September 2025 and is heading toward GA. Registration makes your servers discoverable by any MCP client.


Post-Ship Lifecycle

Shipping is not the end. APIs change, LLMs update, user patterns evolve.

Monitoring (continuous)

  • APP_DATA parse success rate — target >98%, alert at <95% (see QA Tester Layer 6)
  • Tool correctness sampling — 5% of interactions weekly, LLM-judged
  • User retry rate — if >25%, system prompt needs tuning
  • Thread completion rate — >80% target

API Change Detection (monthly)

  • Check API changelogs for breaking changes, new endpoints, deprecated fields
  • Re-run QA Layer 4 (live API testing) quarterly for active servers
  • Update MSW mocks when API response shapes change

Re-QA Cadence

| Trigger | Scope | Frequency |
| --- | --- | --- |
| API version bump | Full QA (all layers) | On detection |
| MCP SDK update | Layers 0-1 (protocol + static) | Monthly |
| System prompt change | Layers 3, 5 (functional + integration) | On change |
| App template update | Layers 2-2.5 (visual + accessibility) | On change |
| LLM model upgrade | DeepEval tool routing eval | On model change |
| Routine health check | Layer 4 (live API) + smoke test | Quarterly |

MCP Apps Protocol (Adopt Now)

The MCP Apps extension is live as of January 26, 2026. Supported by Claude, ChatGPT, VS Code, and Goose.

Key features:

  • _meta.ui.resourceUri on tools — tools declare which UI to render
  • ui:// resource URIs — server-side HTML/JS served as MCP resources
  • JSON-RPC over postMessage — standardized bidirectional app↔host communication
  • @modelcontextprotocol/ext-apps SDK — App class with ontoolresult, callServerTool

Implication for LocalBosses: The custom <!--APP_DATA:...:END_APP_DATA--> pattern works but is LocalBosses-specific. MCP Apps is the official standard for delivering UI from tools. New servers should adopt MCP Apps. Existing servers should add MCP Apps support alongside the current pattern for backward compatibility.

Migration path:

  1. Add _meta.ui.resourceUri to tool definitions in the server builder
  2. Register app HTML files as ui:// resources in each server
  3. Update app template to use @modelcontextprotocol/ext-apps App class
  4. Maintain backward compat with postMessage/polling for LocalBosses during transition
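Step 1 can be sketched as a tool definition carrying the UI pointer. Hedged: `_meta.ui.resourceUri` is the field path named above, but the surrounding shape is illustrative, not a verbatim SDK type — check the MCP Apps spec before relying on it:

```typescript
// Sketch: a tool definition carrying the MCP Apps UI pointer.
// Only `_meta.ui.resourceUri` is taken from this document; the rest
// of the shape is illustrative.
const listContactsTool = {
  name: "list_contacts",
  description:
    "List CRM contacts. Use when the user asks to see or search contacts.",
  annotations: { readOnlyHint: true, openWorldHint: false },
  _meta: {
    ui: { resourceUri: "ui://contacts/list.html" }, // served as an MCP resource
  },
};
```

The same HTML file can back both delivery paths during the transition: registered as a `ui://` resource for MCP Apps clients, and served from `app-ui/` for the existing LocalBosses polling route.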

Operational Notes

Version Control Strategy

All pipeline artifacts should be tracked:

{service}-mcp/
├── .git/                    # Each server is its own repo (or monorepo)
├── src/                     # Server source
├── app-ui/                  # App HTML files
├── test-fixtures/           # Test data (committed)
├── test-baselines/          # Visual regression baselines (committed via LFS for images)
├── test-results/            # Test outputs (gitignored)
└── mcp-factory-reviews/     # QA reports (committed for trending)

  • Branching: main is production. dev for active work. Feature branches for new tool groups.
  • Tagging: Tag each shipped version: v1.0.0-{service}. Tag corresponds to the analysis doc version + build.
  • Monorepo option: For 30+ servers, consider a Turborepo workspace with shared packages (logger, client base class, types).

Capacity Planning (Mac Mini)

Running 30+ MCP servers as stdio processes on a Mac Mini:

| Config | Capacity | Notes |
| --- | --- | --- |
| Mac Mini M2 (8GB) | ~15 servers | Each Node.js process uses 50-80MB RSS at rest |
| Mac Mini M2 (16GB) | ~25 servers | Leave 4GB for OS + LocalBosses app |
| Mac Mini M2 Pro (32GB) | ~40 servers | Comfortable headroom |

Mitigations for constrained memory:

  • Lazy loading (already implemented) — tools only load when called
  • On-demand startup — only start servers that have active channels
  • HTTP transport with shared process — multiple "servers" behind one Node process
  • Containerized with memory limits — docker run --memory=100m per server
  • PM2 with max memory restart — pm2 start index.js --max-memory-restart 150M

Server Prioritization (30 Untested Servers)

For the 30 built-but-untested servers, prioritize by:

| Criteria | Weight | How to Assess |
| --- | --- | --- |
| Business value | 40% | Which services do users ask about most? Check channel requests. |
| Credential availability | 30% | Can we get API keys/sandbox access today? No creds = can't do Layer 4. |
| API stability | 20% | Is the API mature (v2+) or beta? Stable APIs = fewer re-QA cycles. |
| App complexity | 10% | Simple CRUD (fast) vs complex workflows (slow). Start with simple. |

Recommended first batch (highest priority): Servers with sandbox APIs + high business value + simple CRUD patterns. Run them through the full pipeline first to validate the process, then tackle complex ones.
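The weighting above can be sketched as a score function. Inputs normalized to 0-1; the signal values in the usage example are illustrative, not real assessments:

```typescript
// Sketch: priority score for an untested server, using the table's weights.
interface ServerSignals {
  businessValue: number;  // 0-1: how often users ask for this service
  credsAvailable: number; // 0-1: 1 = API keys/sandbox access available today
  apiStability: number;   // 0-1: 1 = mature v2+ API
  appSimplicity: number;  // 0-1: 1 = simple CRUD, 0 = complex workflows
}

function priorityScore(s: ServerSignals): number {
  return (
    0.4 * s.businessValue +
    0.3 * s.credsAvailable +
    0.2 * s.apiStability +
    0.1 * s.appSimplicity
  );
}

// Hypothetical example: a sandbox-ready, high-value CRUD service.
const sample = priorityScore({
  businessValue: 0.8,
  credsAvailable: 1,
  apiStability: 0.9,
  appSimplicity: 1,
});
```

Scoring all 30 servers this way turns the prioritization table into a sortable batch order rather than a judgment call per server.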


Agent Roles

For mass production, these phases map to specialized agents:

Agent 1: API Analyst (mcp-analyst)

  • Input: "Here's the API docs for ServiceX"
  • Does: Reads all docs, produces {service}-api-analysis.md
  • Model: Opus (needs deep reading comprehension)
  • Skills: mcp-api-analyzer

Agent 2: Server Builder (mcp-builder)

  • Input: {service}-api-analysis.md
  • Does: Generates full MCP server with all tools
  • Model: Sonnet (code generation, well-defined patterns)
  • Skills: mcp-server-builder, mcp-server-development

Agent 3: App Designer (mcp-designer)

  • Input: {service}-api-analysis.md + built server
  • Does: Creates all HTML apps
  • Model: Sonnet (HTML/CSS generation)
  • Skills: mcp-app-designer, frontend-design

Agent 4: Integrator (mcp-integrator)

  • Input: Built server + apps
  • Does: Wires into LocalBosses (channels, routing, intakes, system prompts)
  • Model: Sonnet
  • Skills: mcp-localbosses-integrator

Agent 5: QA Tester (mcp-qa)

  • Input: Integrated LocalBosses channel
  • Does: Visual + functional testing, produces test report
  • Model: Opus (multimodal analysis, judgment calls)
  • Skills: mcp-qa-tester
  • Tools: Peekaboo, Gemini, browser screenshots

Orchestration (6 phases with feedback loop):

[You provide API docs]
       │
       ▼
  P1: Agent 1 — Analyst ──→ analysis.md
       │
       ├──→ P2: Agent 2 — Builder ──→ MCP server ──┐
       │                                             │ (parallel)
       └──→ P3: Agent 3 — Designer ──→ HTML apps ──┘
                                                     │
                                                     ▼
                              P4: Agent 4 — Integrator ──→ LocalBosses wired up
                                                     │
                                                     ▼
                              P5: Agent 5 — QA Tester ──→ Test report
                                                     │
                                            ┌────────┴────────┐
                                            │  Findings?       │
                                            │  P0 failures ──→ Route back to
                                            │                  Agent 2/3/4 for fix
                                            │  All clear ──→   │
                                            └────────┬────────┘
                                                     ▼
                              P6: Ship + Registry Registration + Monitoring

Agents 2 and 3 run in parallel since apps only need the analysis doc + tool definitions. QA failures loop back to the responsible agent — no server ships with P0 issues.


Current Inventory (Feb 3, 2026)

Completed (in LocalBosses):

  • n8n (automations channel) — 8 apps
  • GHL CRM (crm channel) — 65 apps
  • Reonomy (reonomy channel) — 3 apps
  • CloseBot (closebot channel) — 6 apps
  • Meta Ads (meta-ads channel) — 11 apps
  • Google Console (google-console channel) — 5 apps
  • Twilio (twilio channel) — 19 apps

Built but untested (30 servers):

Acuity Scheduling, BambooHR, Basecamp, BigCommerce, Brevo, Calendly, ClickUp, Close, Clover, Constant Contact, FieldEdge, FreshBooks, Freshdesk, Gusto, Help Scout, Housecall Pro, Jobber, Keap, Lightspeed, Mailchimp, Pipedrive, Rippling, ServiceTitan, Squarespace, Toast, TouchBistro, Trello, Wave, Wrike, Zendesk

Priority: Test the 30 built servers against live APIs and bring the best ones into LocalBosses.


File Locations

| What | Where |
| --- | --- |
| This document | MCP-FACTORY.md |
| Skills | ~/.clawdbot/workspace/skills/mcp-*/ |
| Built servers | mcp-diagrams/mcp-servers/{service}/ or {service}-mcp/ |
| LocalBosses app | localbosses-app/ |
| GHL apps (65) | mcp-diagrams/GoHighLevel-MCP/src/ui/react-app/src/apps/ |
| App routing | localbosses-app/src/app/api/mcp-apps/route.ts |
| Channel config | localbosses-app/src/lib/channels.ts |