=== NEW SERVERS ADDED (7) ===
- servers/closebot — 119 tools, 14 modules, 4,656 lines TS (Stage 7)
- servers/google-console — Google Search Console MCP (Stage 7)
- servers/meta-ads — Meta/Facebook Ads MCP (Stage 8)
- servers/twilio — Twilio communications MCP (Stage 8)
- servers/competitor-research — Competitive intel MCP (Stage 6)
- servers/n8n-apps — n8n workflow MCP apps (Stage 6)
- servers/reonomy — Commercial real estate MCP (Stage 1)

=== FACTORY INFRASTRUCTURE ADDED ===
- infra/factory-tools — mcp-jest, mcp-validator, mcp-add, MCP Inspector
- 60 test configs, 702 auto-generated test cases
- All 30 servers score 100/100 protocol compliance
- infra/command-center — Pipeline state, operator playbook, dashboard config
- infra/factory-reviews — Automated eval reports

=== DOCS ADDED ===
- docs/MCP-FACTORY.md — Factory overview
- docs/reports/ — 5 pipeline evaluation reports
- docs/research/ — Browser MCP research

=== RULES ESTABLISHED ===
- CONTRIBUTING.md — All MCP work MUST go in this repo
- README.md — Full inventory of 37 servers + infra docs
- .gitignore — Updated for Python venvs

TOTAL: 37 MCP servers + full factory pipeline in one repo. This is now the single source of truth for all MCP work.
# MCP Factory — Production Pipeline
> The systematic process for turning any API into a fully tested, production-ready MCP experience inside LocalBosses.

---

## The Problem

We've been building MCP servers ad hoc: grab an API, bang out tools, create some apps, throw them into LocalBosses, move on. The result: 30+ servers that compile but have never been tested against live APIs, apps that may not render, and tool descriptions that might not trigger correctly via natural language.

## The Pipeline

```
API Docs → Analyze → Build → Design → Integrate → Test → Ship
              P1       P2      P3        P4        P5     P6
```

> **6 phases.** Agents 2 (Build) and 3 (Design) run in parallel. QA findings route back to the Builder/Designer for fixes before Ship.

Every phase has:

- **Clear inputs** (what you need to start)
- **Clear outputs** (what you produce)
- **Quality gate** (what must pass before moving on)
- **Dedicated skill** (documented, repeatable instructions)
- **Agent capability** (can be run by a sub-agent)

---
## Phase 1: Analyze (API Discovery & Analysis)

**Skill:** `mcp-api-analyzer`
**Input:** API documentation URL(s), OpenAPI spec (if available), user guides, public marketing copy
**Output:** `{service}-api-analysis.md`

### What the analysis produces:

1. **Service Overview** — What the product does, who it's for, pricing tiers
2. **Auth Method** — OAuth2 / API key / JWT / session — with the exact flow
3. **Endpoint Catalog** — Every endpoint grouped by domain
4. **Tool Groups** — Logical groupings for lazy loading (aim for 5-15 groups)
5. **Tool Inventory** — Each tool with:
   - Name (snake_case, descriptive)
   - Description (optimized for LLM routing — what it does, when to use it)
   - Required vs. optional params
   - Read-only / destructive / idempotent annotations
6. **App Candidates** — Which endpoints/features deserve a visual UI:
   - Dashboard views (aggregate data, KPIs)
   - List/Grid views (searchable collections)
   - Detail views (single-entity deep dive)
   - Forms (create/edit workflows)
   - Specialized views (calendars, timelines, funnels, maps)
7. **Rate Limits & Quirks** — API-specific gotchas
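The tool inventory can also be captured in a machine-readable shape alongside the markdown, which makes it easy for Agent 2 to consume. A minimal sketch; the field names here are illustrative, not a factory-mandated schema:

```typescript
// Illustrative shape for one tool-inventory entry in the analysis doc.
// Field names are assumptions, not a factory standard.
interface ToolInventoryEntry {
  name: string;            // snake_case, descriptive
  description: string;     // what it does + when to use it (LLM routing)
  group: string;           // lazy-loading tool group it belongs to
  requiredParams: string[];
  optionalParams: string[];
  readOnly: boolean;       // true for GET-style operations
  destructive: boolean;    // true for DELETE-style operations
  idempotent: boolean;     // true for PUT/upsert-style operations
}

const example: ToolInventoryEntry = {
  name: "list_contacts",
  description:
    "List CRM contacts with optional filters. Use when the user asks to see, search, or count contacts.",
  group: "contacts",
  requiredParams: [],
  optionalParams: ["query", "limit", "cursor"],
  readOnly: true,
  destructive: false,
  idempotent: true,
};
```

A typed inventory like this doubles as the contract between the Analyst and the Builder: the server's tool registrations can be checked against it.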
### Quality Gate:

- [ ] Every endpoint is cataloged
- [ ] Tool groups are balanced (no group with 50+ tools)
- [ ] Tool descriptions are LLM-friendly (action-oriented, include "when to use")
- [ ] App candidates have clear data sources (which tools feed them)
- [ ] The auth flow is documented with an example
---
## Phase 2: Build (MCP Server)

**Skill:** `mcp-server-builder` (updated from the existing `mcp-server-development`)
**Input:** `{service}-api-analysis.md`
**Output:** Complete MCP server in `{service}-mcp/`

### Server structure:

```
{service}-mcp/
├── src/
│   ├── index.ts          # Server entry, transport, lazy loading
│   ├── client.ts         # API client (auth, requests, error handling)
│   ├── tools/
│   │   ├── index.ts      # Tool registry + lazy loader
│   │   ├── {group1}.ts   # Tool group module
│   │   ├── {group2}.ts   # ...
│   │   └── ...
│   └── types.ts          # Shared TypeScript types
├── dist/                 # Compiled output
├── package.json
├── tsconfig.json
├── .env.example
└── README.md
```

### Must-haves (Feb 2026 standard):

- **MCP SDK `^1.26.0`** (security fix GHSA-345p-7cg4-v4c7 landed in v1.26.0). Pin to v1.x — SDK v2 is pre-alpha, with a stable release expected Q1 2026
- **Lazy loading** — tool groups load on first use, not at startup
- **MCP annotations** on every tool:
  - `readOnlyHint` (true for GET operations)
  - `destructiveHint` (true for DELETE operations)
  - `idempotentHint` (true for PUT/upsert operations)
  - `openWorldHint` (false for most API tools)
- **Zod validation** on all tool inputs
- **Structured error handling** — never crash; always return a useful error message
- **Rate-limit awareness** — respect API limits, add retry logic
- **Pagination support** — tools that list things must handle pagination
- **Environment variables** — all secrets via env, never hardcoded
- **TypeScript strict mode** — no `any`, proper types throughout
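The annotation rules above are mechanical enough to derive from the HTTP method of the underlying call. A sketch of a helper the builder could generate annotations with; the mapping is an assumed default for the common case, and individual tools should override it where the API's semantics differ:

```typescript
// MCP tool annotations derived from the HTTP method of the underlying call.
// Default mapping only (illustrative); override per tool where needed.
interface ToolAnnotations {
  readOnlyHint: boolean;
  destructiveHint: boolean;
  idempotentHint: boolean;
  openWorldHint: boolean;
}

type HttpMethod = "GET" | "POST" | "PUT" | "PATCH" | "DELETE";

function annotationsFor(method: HttpMethod): ToolAnnotations {
  return {
    readOnlyHint: method === "GET",          // reads never mutate
    destructiveHint: method === "DELETE",    // deletes are destructive
    idempotentHint: method === "PUT",        // upserts repeat safely
    openWorldHint: false,                    // most API tools are closed-domain
  };
}
```

For example, `annotationsFor("DELETE")` marks the tool destructive, which lets MCP hosts prompt the user for confirmation before running it.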
### Quality Gate:

- [ ] `npm run build` succeeds (tsc compiles clean)
- [ ] Every tool has MCP annotations
- [ ] Every tool has Zod input validation
- [ ] `.env.example` lists all required env vars
- [ ] README documents setup + the tool list
---
## Phase 3: Design (MCP Apps)

**Skill:** `mcp-app-designer`
**Input:** `{service}-api-analysis.md` (App Candidates section), server tool definitions
**Output:** HTML app files in `{service}-mcp/app-ui/` or `{service}-mcp/ui/`

### App types and when to use them:

| Type | When | Example |
|------|------|---------|
| **Dashboard** | Aggregate KPIs, overview | CRM Dashboard, Ad Performance |
| **Data Grid** | Searchable/filterable lists | Contact List, Order History |
| **Detail Card** | Single-entity deep dive | Contact Card, Invoice Preview |
| **Form/Wizard** | Create or edit flows | Campaign Builder, Appointment Booker |
| **Timeline** | Chronological events | Activity Feed, Audit Log |
| **Funnel/Flow** | Stage-based progression | Pipeline Board, Sales Funnel |
| **Calendar** | Date-based data | Appointment Calendar, Schedule View |
| **Analytics** | Charts and visualizations | Revenue Chart, Traffic Graph |

### App architecture (single-file HTML):

```html
<!DOCTYPE html>
<html>
<head>
  <style>
    /* Dark theme matching LocalBosses (#1a1d23 bg, #ff6d5a accent) */
    /* Responsive — works at 280px-800px width */
    /* No external dependencies */
  </style>
</head>
<body>
  <div id="app"><!-- Loading state --></div>
  <script>
    // 1. Receive data via postMessage
    window.addEventListener('message', (event) => {
      const data = event.data;
      if (data.type === 'mcp_app_data') render(data.data);
      // Also handle the workflow_ops type for workflow apps
    });

    // 2. Also fetch from the polling endpoint as a fallback
    async function pollForData() {
      try {
        const res = await fetch('/api/app-data?app=APP_ID');
        if (res.ok) { const data = await res.json(); render(data); }
      } catch {}
    }

    // 3. Render function with proper empty/error/loading states
    function render(data) {
      if (!data || Object.keys(data).length === 0) {
        showEmptyState(); return;
      }
      // ... actual rendering
    }

    // Auto-poll on load
    pollForData();
    setInterval(pollForData, 3000);
  </script>
</body>
</html>
```

### Design rules:

- **Dark theme only** — `#1a1d23` background, `#2b2d31` cards, `#ff6d5a` accent, `#dcddde` text
- **Responsive** — must work from 280px to 800px width
- **Self-contained** — zero external dependencies, no CDN links
- **Three states** — loading skeleton, empty state, data state
- **Compact** — no wasted space; dense but readable
- **Interactive** — hover effects, click handlers where appropriate
- **Data-driven** — renders whatever data it receives, graceful with missing fields
### Quality Gate:

- [ ] Every app renders with sample data (no blank screens)
- [ ] Every app has loading, empty, and error states
- [ ] Dark theme is consistent with LocalBosses
- [ ] Works at 280px width (the thread panel minimum)
- [ ] No external dependencies or CDN links
---
## Phase 4: Integrate (LocalBosses)

**Skill:** `mcp-localbosses-integrator`
**Input:** Built MCP server + apps
**Output:** Fully wired LocalBosses channel

### Files to update:

1. **`src/lib/channels.ts`** — Add the channel definition:

   ```typescript
   {
     id: "channel-name",
     name: "Channel Name",
     icon: "🔥",
     category: "BUSINESS OPS", // or MARKETING, TOOLS, SYSTEM
     description: "What this channel does",
     systemPrompt: `...`, // Must include tool descriptions + when to use them
     defaultApp: "app-id", // Optional: auto-open app
     mcpApps: ["app-id-1", "app-id-2", ...],
   }
   ```

2. **`src/lib/appNames.ts`** — Add display names:

   ```typescript
   "app-id": { name: "App Name", icon: "📊" },
   ```

3. **`src/lib/app-intakes.ts`** — Add intake questions:

   ```typescript
   "app-id": {
     question: "What would you like to see?",
     category: "data-view",
     skipLabel: "Show dashboard",
   },
   ```

4. **`src/app/api/mcp-apps/route.ts`** — Add app routing:

   ```typescript
   // In APP_NAME_MAP:
   "app-id": "filename-without-html",

   // In APP_DIRS (if in a different location):
   path.join(process.cwd(), "path/to/app-ui"),
   ```

5. **`src/app/api/chat/route.ts`** — Add tool routing:
   - The system prompt must know about the tools
   - Tool results should include `<!--APP_DATA:{...}:END_APP_DATA-->` blocks
   - Or `<!--WORKFLOW_JSON:{...}:END_WORKFLOW-->` for workflow-type apps
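On the receiving side, extracting an APP_DATA block from an AI response is a small parsing step. A sketch; the function name is illustrative, and the delimiters are the ones documented above:

```typescript
// Extract the hidden APP_DATA block from an AI response and parse its JSON.
// Returns null when no block is present or the JSON is malformed.
// Function name is illustrative; delimiters match the documented format.
function extractAppData(response: string): unknown {
  const match = response.match(/<!--APP_DATA:([\s\S]*?):END_APP_DATA-->/);
  if (!match || match[1] === undefined) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // malformed JSON counts as a parse failure for monitoring
  }
}

const reply =
  'Here are your contacts. <!--APP_DATA:{"contacts":[{"name":"Acme"}]}:END_APP_DATA-->';
extractAppData(reply); // → { contacts: [{ name: "Acme" }] }
```

Counting how often this returns null versus a payload is exactly the "APP_DATA parse success rate" metric used in post-ship monitoring.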
### System prompt engineering:

The channel system prompt is CRITICAL. It must:

- Describe the available tools in natural language
- Specify when to use each tool (not just what it does)
- Include the hidden data-block format so the AI returns structured data to apps
- Set the tone and expertise level

### Quality Gate:

- [ ] Channel appears in the sidebar under the correct category
- [ ] All apps appear in the toolbar
- [ ] The default app auto-opens on channel entry (if configured)
- [ ] The system prompt mentions all available tools
- [ ] Intake questions are clear and actionable
---
## Phase 5: Test (QA & Validation)

**Skill:** `mcp-qa-tester`
**Input:** Integrated LocalBosses channel
**Output:** Test report + fixes

### Testing layers:

#### Layer 1: Static Analysis

- TypeScript compiles clean (`tsc --noEmit`)
- No `any` types in tool handlers
- All apps are valid HTML (no unclosed tags, no script errors)
- All routes resolve (no 404s for app files)

#### Layer 2: Visual Testing (Peekaboo + Gemini)

```bash
# Capture the rendered app
peekaboo capture --app "Safari" --format png --output /tmp/test-{app}.png

# Or use the browser tool to screenshot:
# browser → screenshot → analyze with Gemini

# Gemini multimodal analysis
gemini "Analyze this screenshot of an MCP app. Check:
1. Does it render correctly (no blank screen, no broken layout)?
2. Is the dark theme consistent (#1a1d23 bg, #ff6d5a accent)?
3. Are there proper loading/empty states?
4. Is it responsive-friendly?
5. Any visual bugs?" -f /tmp/test-{app}.png
```

#### Layer 3: Functional Testing

- **Tool invocation:** Send natural-language messages, verify the correct tool is triggered
- **Data flow:** Send a message → verify the AI returns an APP_DATA block → verify the app receives the data
- **Thread lifecycle:** Create thread → interact → close → delete → verify cleanup
- **Cross-channel:** Open an app from one channel, switch channels, come back — does state persist?
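The tool-invocation checks in Layer 3 can be table-driven: a list of natural-language prompts paired with the tool each should trigger, run against whatever routing function the harness exposes. A sketch with a stubbed router; the stub is a stand-in, since the real routing is done by the channel's LLM:

```typescript
// Table-driven check: each NL message should route to an expected tool.
// `Router` is a stand-in for the real LLM-backed routing in the harness.
type Router = (message: string) => string;

interface RoutingCase {
  message: string;
  expectedTool: string;
}

// Run every case and collect human-readable failure descriptions.
function runRoutingCases(route: Router, cases: RoutingCase[]): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = route(c.message);
    if (actual !== c.expectedTool) {
      failures.push(`"${c.message}" routed to ${actual}, expected ${c.expectedTool}`);
    }
  }
  return failures; // empty array means the layer passes
}

// Keyword stub for illustration only; not the production router.
const stubRouter: Router = (m) =>
  /contact/i.test(m) ? "list_contacts" : "unknown_tool";

runRoutingCases(stubRouter, [
  { message: "Show me my contacts", expectedTool: "list_contacts" },
]); // → []
```

The "at least 3 NL messages trigger the correct tools" quality gate below is then just `failures.length === 0` over a three-case table.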
#### Layer 4: Live API Testing (when credentials are available)

- Authenticate with real API credentials
- Call each tool with real parameters
- Verify response shapes match what the apps expect
- Test error cases (invalid IDs, missing permissions, rate limits)

#### Layer 5: Integration Testing

- Full flow: user sends a message → AI responds → app renders → user interacts in the thread
- Test with 2-3 realistic use cases per channel
### Automated test script pattern:

```bash
#!/bin/bash
# MCP QA Test Runner
SERVICE="${1:?usage: qa.sh <service>}"
RESULTS="/tmp/mcp-qa-${SERVICE}.md"

echo "# QA Report: ${SERVICE}" > "$RESULTS"
echo "Date: $(date)" >> "$RESULTS"

# Static checks
echo "## Static Analysis" >> "$RESULTS"
cd "${SERVICE}-mcp" || { echo "❌ ${SERVICE}-mcp not found" >> "$RESULTS"; exit 1; }
npm run build 2>&1 | tail -5 >> "$RESULTS"

# App file checks
echo "## App Files" >> "$RESULTS"
for f in app-ui/*.html ui/dist/*.html; do
  [ -f "$f" ] && echo "✅ $f ($(wc -c < "$f") bytes)" >> "$RESULTS"
done

# Route mapping check
echo "## Route Mapping" >> "$RESULTS"
# ... verify APP_NAME_MAP entries exist
```
### Quality Gate:

- [ ] All static analysis passes
- [ ] Every app renders visually (verified by screenshot)
- [ ] At least 3 NL messages trigger the correct tools
- [ ] The thread create/interact/delete cycle works
- [ ] No console errors in the browser dev tools

### QA → Fix Feedback Loop

QA findings don't just get logged — they route back to the responsible agent for fixes:

| Finding Type | Routes To | Fix Cycle |
|-------------|-----------|-----------|
| Tool description misrouting | Agent 1 (Analyst) — update the analysis doc, then Agent 2 rebuilds | Re-run QA Layer 3 after the fix |
| Server crash / protocol error | Agent 2 (Builder) — fix server code | Re-run QA Layers 0-1 |
| App visual bug / accessibility | Agent 3 (Designer) — fix the HTML app | Re-run QA Layers 2-2.5 |
| Integration wiring issue | Agent 4 (Integrator) — fix channel config | Re-run QA Layers 3, 5 |
| APP_DATA shape mismatch | Agent 3 + Agent 4 — align app expectations with the system prompt | Re-run QA Layers 3 + 5 |

**Rule:** No server ships with any P0 QA failures. P1 warnings are documented. The fix cycle repeats until QA passes.
---
## Phase 6: Ship (Documentation & Deployment)

**Skill:** Part of each phase (not a separate skill)

### Per-server README must include:

- What the service does
- Setup instructions (env vars, API key acquisition)
- Complete tool list with descriptions
- App gallery (screenshots or descriptions)
- Known limitations

### Post-Ship: MCP Registry Registration

Register shipped servers in the [MCP Registry](https://registry.modelcontextprotocol.io) for discoverability:

- Server metadata (name, description, icon, capabilities summary)
- Authentication requirements and setup instructions
- Tool catalog summary (names + descriptions)
- Link to the README and setup guide

The MCP Registry launched in preview in September 2025 and is heading to GA. Registration makes your servers discoverable by any MCP client.
---
## Post-Ship Lifecycle

Shipping is not the end. APIs change, LLMs update, user patterns evolve.

### Monitoring (continuous)

- **APP_DATA parse success rate** — target >98%, alert at <95% (see QA Tester Layer 6)
- **Tool correctness sampling** — 5% of interactions weekly, LLM-judged
- **User retry rate** — if >25%, the system prompt needs tuning
- **Thread completion rate** — >80% target
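The first metric is a straight ratio with two thresholds. A sketch of the check; the status names are illustrative, the thresholds are the targets above:

```typescript
// APP_DATA parse-rate health check: >98% is healthy, <95% triggers an alert.
// Thresholds follow the monitoring targets above; status names are illustrative.
type ParseHealth = "ok" | "warn" | "alert";

function parseHealth(parsed: number, total: number): ParseHealth {
  if (total === 0) return "ok"; // no traffic, nothing to alert on
  const rate = parsed / total;
  if (rate < 0.95) return "alert";
  if (rate <= 0.98) return "warn"; // between targets: watch, don't page
  return "ok";
}

parseHealth(990, 1000); // → "ok"    (99%)
parseHealth(960, 1000); // → "warn"  (96%)
parseHealth(940, 1000); // → "alert" (94%)
```

Feeding this with counts of successful versus failed APP_DATA extractions per day gives the continuous signal the table below keys re-QA decisions off.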
### API Change Detection (monthly)

- Check API changelogs for breaking changes, new endpoints, deprecated fields
- Re-run QA Layer 4 (live API testing) quarterly for active servers
- Update MSW mocks when API response shapes change

### Re-QA Cadence

| Trigger | Scope | Frequency |
|---------|-------|-----------|
| API version bump | Full QA (all layers) | On detection |
| MCP SDK update | Layers 0-1 (protocol + static) | Monthly |
| System prompt change | Layers 3, 5 (functional + integration) | On change |
| App template update | Layers 2-2.5 (visual + accessibility) | On change |
| LLM model upgrade | DeepEval tool-routing eval | On model change |
| Routine health check | Layer 4 (live API) + smoke test | Quarterly |
---
## MCP Apps Protocol (Adopt Now)

> The MCP Apps extension is **live** as of January 26, 2026. Supported by Claude, ChatGPT, VS Code, and Goose.

Key features:

- **`_meta.ui.resourceUri`** on tools — tools declare which UI to render
- **`ui://` resource URIs** — server-side HTML/JS served as MCP resources
- **JSON-RPC over postMessage** — standardized bidirectional app↔host communication
- **`@modelcontextprotocol/ext-apps`** SDK — an App class with `ontoolresult` and `callServerTool`

**Implication for LocalBosses:** The custom `<!--APP_DATA:...:END_APP_DATA-->` pattern works but is LocalBosses-specific. MCP Apps is the official standard for delivering UI from tools. **New servers should adopt MCP Apps. Existing servers should add MCP Apps support alongside the current pattern for backward compatibility.**

Migration path:

1. Add `_meta.ui.resourceUri` to tool definitions in the server builder
2. Register app HTML files as `ui://` resources in each server
3. Update the app template to use the `@modelcontextprotocol/ext-apps` App class
4. Maintain backward compatibility with postMessage/polling for LocalBosses during the transition
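Step 1 amounts to one extra field on each tool definition. A sketch of the shape as a plain object; the field nesting shown is an assumption based on the feature list above, so verify the exact field names against the MCP Apps extension spec before shipping:

```typescript
// A tool definition carrying the MCP Apps UI pointer alongside annotations.
// Shape is illustrative; verify field names against the MCP Apps spec.
const listContactsTool = {
  name: "list_contacts",
  description: "List CRM contacts with optional filters.",
  annotations: { readOnlyHint: true, openWorldHint: false },
  _meta: {
    ui: {
      // Points at an HTML app registered as a ui:// resource on this server.
      resourceUri: "ui://contacts/list",
    },
  },
};

listContactsTool._meta.ui.resourceUri; // → "ui://contacts/list"
```

Because `_meta` is additive, a tool can carry this pointer while LocalBosses continues to consume the same tool via the APP_DATA pattern during the transition.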
---
## Operational Notes

### Version Control Strategy

All pipeline artifacts should be tracked:

```
{service}-mcp/
├── .git/                  # Each server is its own repo (or a monorepo)
├── src/                   # Server source
├── app-ui/                # App HTML files
├── test-fixtures/         # Test data (committed)
├── test-baselines/        # Visual regression baselines (committed via LFS for images)
├── test-results/          # Test outputs (gitignored)
└── mcp-factory-reviews/   # QA reports (committed for trending)
```

- **Branching:** `main` is production. `dev` is for active work. Feature branches for new tool groups.
- **Tagging:** Tag each shipped version: `v1.0.0-{service}`. The tag corresponds to the analysis doc version + build.
- **Monorepo option:** For 30+ servers, consider a Turborepo workspace with shared packages (logger, client base class, types).
### Capacity Planning (Mac Mini)

Running 30+ MCP servers as stdio processes on a Mac Mini:

| Config | Capacity | Notes |
|--------|----------|-------|
| Mac Mini M2 (8GB) | ~15 servers | Each Node.js process uses 50-80MB RSS at rest |
| Mac Mini M2 (16GB) | ~25 servers | Leave 4GB for the OS + LocalBosses app |
| Mac Mini M2 Pro (32GB) | ~40 servers | Comfortable headroom |

**Mitigations for constrained memory:**

- Lazy loading (already implemented) — tools only load when called
- On-demand startup — only start servers that have active channels
- HTTP transport with a shared process — multiple "servers" behind one Node process
- Containers with memory limits — `docker run --memory=100m` per server
- PM2 with max-memory restart — `pm2 start index.js --max-memory-restart 150M`
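As a sanity check on the table: the resting footprint alone is small, and the conservative per-machine capacities above are what leaves headroom for V8 heap spikes under load. A sketch of the arithmetic; 65MB is simply the midpoint of the 50-80MB range quoted above:

```typescript
// Resting memory footprint of N stdio MCP servers at ~65MB RSS each
// (midpoint of the 50-80MB range above). Real capacity planning, per the
// table, budgets several times this, since heaps spike under load.
function restingFootprintMb(servers: number, perServerMb = 65): number {
  return servers * perServerMb;
}

restingFootprintMb(30); // → 1950, i.e. ≈1.9GB at rest for all 30 servers
```

So keeping all 30 servers resident is feasible even on the 16GB config; the binding constraint is peak usage, which is what the mitigations above address.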
### Server Prioritization (30 Untested Servers)

For the 30 built-but-untested servers, prioritize by:

| Criteria | Weight | How to Assess |
|----------|--------|---------------|
| **Business value** | 40% | Which services do users ask about most? Check channel requests. |
| **Credential availability** | 30% | Can we get API keys/sandbox access today? No creds = no Layer 4. |
| **API stability** | 20% | Is the API mature (v2+) or beta? Stable APIs = fewer re-QA cycles. |
| **App complexity** | 10% | Simple CRUD (fast) vs. complex workflows (slow). Start simple. |

**Recommended first batch (highest priority):**

Servers with sandbox APIs + high business value + simple CRUD patterns. Run them through the full pipeline first to validate the process, then tackle the complex ones.
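The weights turn into a one-line scoring function. A sketch, scoring each criterion from 0 to 1; the criterion names mirror the table, and everything else is illustrative:

```typescript
// Weighted priority score for an untested server, per the table above.
// Each input is a 0-1 assessment; the output is 0-100.
interface ServerAssessment {
  businessValue: number;          // weight 40%
  credentialAvailability: number; // weight 30%
  apiStability: number;           // weight 20%
  appSimplicity: number;          // weight 10% (1 = simple CRUD)
}

function priorityScore(a: ServerAssessment): number {
  return Math.round(
    100 *
      (0.4 * a.businessValue +
        0.3 * a.credentialAvailability +
        0.2 * a.apiStability +
        0.1 * a.appSimplicity),
  );
}

priorityScore({
  businessValue: 0.9,
  credentialAvailability: 1,
  apiStability: 0.8,
  appSimplicity: 1,
}); // → 92
```

Scoring all 30 servers this way and sorting descending produces the first batch directly, instead of arguing it case by case.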
---
## Agent Roles

For mass production, these phases map to specialized agents:

### Agent 1: API Analyst (`mcp-analyst`)

- **Input:** "Here's the API docs for ServiceX"
- **Does:** Reads all docs, produces `{service}-api-analysis.md`
- **Model:** Opus (needs deep reading comprehension)
- **Skills:** `mcp-api-analyzer`

### Agent 2: Server Builder (`mcp-builder`)

- **Input:** `{service}-api-analysis.md`
- **Does:** Generates the full MCP server with all tools
- **Model:** Sonnet (code generation, well-defined patterns)
- **Skills:** `mcp-server-builder`, `mcp-server-development`

### Agent 3: App Designer (`mcp-designer`)

- **Input:** `{service}-api-analysis.md` + built server
- **Does:** Creates all HTML apps
- **Model:** Sonnet (HTML/CSS generation)
- **Skills:** `mcp-app-designer`, `frontend-design`

### Agent 4: Integrator (`mcp-integrator`)

- **Input:** Built server + apps
- **Does:** Wires everything into LocalBosses (channels, routing, intakes, system prompts)
- **Model:** Sonnet
- **Skills:** `mcp-localbosses-integrator`

### Agent 5: QA Tester (`mcp-qa`)

- **Input:** Integrated LocalBosses channel
- **Does:** Visual + functional testing, produces the test report
- **Model:** Opus (multimodal analysis, judgment calls)
- **Skills:** `mcp-qa-tester`
- **Tools:** Peekaboo, Gemini, browser screenshots

### Orchestration (6 phases with a feedback loop):

```
[You provide API docs]
         │
         ▼
P1: Agent 1 — Analyst ──→ analysis.md
         │
         ├──→ P2: Agent 2 — Builder  ──→ MCP server ──┐
         │                                            │ (parallel)
         └──→ P3: Agent 3 — Designer ──→ HTML apps  ──┘
                                                      │
                                                      ▼
P4: Agent 4 — Integrator ──→ LocalBosses wired up
         │
         ▼
P5: Agent 5 — QA Tester ──→ Test report
         │
   ┌─────┴──────────────────────────────────┐
   │ Findings?                              │
   │  P0 failures ──→ route back to         │
   │                  Agent 2/3/4 for fixes │
   │  All clear ──→ continue                │
   └─────┬──────────────────────────────────┘
         ▼
P6: Ship + Registry Registration + Monitoring
```

Agents 2 and 3 run in parallel since apps only need the analysis doc + tool definitions. QA failures loop back to the responsible agent — no server ships with P0 issues.
---
## Current Inventory (Feb 3, 2026)

### Completed (in LocalBosses):

- n8n (automations channel) — 8 apps
- GHL CRM (crm channel) — 65 apps
- Reonomy (reonomy channel) — 3 apps
- CloseBot (closebot channel) — 6 apps
- Meta Ads (meta-ads channel) — 11 apps
- Google Console (google-console channel) — 5 apps
- Twilio (twilio channel) — 19 apps

### Built but untested (30 servers):

Acuity Scheduling, BambooHR, Basecamp, BigCommerce, Brevo, Calendly, ClickUp, Close, Clover, Constant Contact, FieldEdge, FreshBooks, Freshdesk, Gusto, Help Scout, Housecall Pro, Jobber, Keap, Lightspeed, Mailchimp, Pipedrive, Rippling, ServiceTitan, Squarespace, Toast, TouchBistro, Trello, Wave, Wrike, Zendesk

### Priority: Test the 30 built servers against live APIs and bring the best ones into LocalBosses.
---
## File Locations

| What | Where |
|------|-------|
| This document | `MCP-FACTORY.md` |
| Skills | `~/.clawdbot/workspace/skills/mcp-*/` |
| Built servers | `mcp-diagrams/mcp-servers/{service}/` or `{service}-mcp/` |
| LocalBosses app | `localbosses-app/` |
| GHL apps (65) | `mcp-diagrams/GoHighLevel-MCP/src/ui/react-app/src/apps/` |
| App routing | `localbosses-app/src/app/api/mcp-apps/route.ts` |
| Channel config | `localbosses-app/src/lib/channels.ts` |