MCP Factory — Production Pipeline
The systematic process for turning any API into a fully tested, production-ready MCP experience inside LocalBosses.
The Problem
We've been building MCP servers ad-hoc: grab an API, bang out tools, create some apps, throw them in LocalBosses, move on. Result: 30+ servers that compile but have never been tested against live APIs, apps that may not render, tool descriptions that might not trigger correctly via natural language.
The Pipeline
API Docs → P1 Analyze → P2 Build → P3 Design → P4 Integrate → P5 Test → P6 Ship
6 phases. Phases 2 (Build) and 3 (Design) run in parallel. QA findings route back to the Builder/Designer for fixes before Ship.
Every phase has:
- Clear inputs (what you need to start)
- Clear outputs (what you produce)
- Quality gate (what must pass before moving on)
- Dedicated skill (documented, repeatable instructions)
- Agent capability (can be run by a sub-agent)
Phase 1: Analyze (API Discovery & Analysis)
Skill: mcp-api-analyzer
Input: API documentation URL(s), OpenAPI spec (if available), user guides, public marketing copy
Output: {service}-api-analysis.md
What the analysis produces:
- Service Overview — What the product does, who it's for, pricing tiers
- Auth Method — OAuth2 / API key / JWT / session — with exact flow
- Endpoint Catalog — Every endpoint grouped by domain
- Tool Groups — Logical groupings for lazy loading (aim for 5-15 groups)
- Tool Inventory — Each tool with:
- Name (snake_case, descriptive)
- Description (optimized for LLM routing — what it does, when to use it)
- Required vs optional params
- Read-only / destructive / idempotent annotations
- App Candidates — Which endpoints/features deserve visual UI:
- Dashboard views (aggregate data, KPIs)
- List/Grid views (searchable collections)
- Detail views (single entity deep-dive)
- Forms (create/edit workflows)
- Specialized views (calendars, timelines, funnels, maps)
- Rate Limits & Quirks — API-specific gotchas
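The tool-inventory entries above lend themselves to a small structured record the downstream agents can consume. A sketch of one possible shape; the field names are illustrative, not a fixed schema:

```typescript
// One possible shape for a tool-inventory entry in the analysis doc.
// Field names are illustrative, not a fixed schema.
interface ToolInventoryEntry {
  name: string;              // snake_case, descriptive
  description: string;       // action-oriented, says when to use it
  requiredParams: string[];
  optionalParams: string[];
  readOnly: boolean;         // GET-style operations
  destructive: boolean;      // DELETE-style operations
  idempotent: boolean;       // PUT/upsert-style operations
  group: string;             // one of the 5-15 tool groups
}

const exampleEntry: ToolInventoryEntry = {
  name: "list_contacts",
  description:
    "List or search contacts. Use when the user asks to find or browse contacts.",
  requiredParams: [],
  optionalParams: ["query", "cursor"],
  readOnly: true,
  destructive: false,
  idempotent: true,
  group: "contacts",
};
```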
Quality Gate:
- Every endpoint is cataloged
- Tool groups are balanced (no group with 50+ tools)
- Tool descriptions are LLM-friendly (action-oriented, include "when to use")
- App candidates have clear data sources (which tools feed them)
- Auth flow is documented with example
Phase 2: Build (MCP Server)
Skill: mcp-server-builder (updated from existing mcp-server-development)
Input: {service}-api-analysis.md
Output: Complete MCP server in {service}-mcp/
Server structure:
```
{service}-mcp/
├── src/
│   ├── index.ts          # Server entry, transport, lazy loading
│   ├── client.ts         # API client (auth, request, error handling)
│   ├── tools/
│   │   ├── index.ts      # Tool registry + lazy loader
│   │   ├── {group1}.ts   # Tool group module
│   │   ├── {group2}.ts   # ...
│   │   └── ...
│   └── types.ts          # Shared TypeScript types
├── dist/                 # Compiled output
├── package.json
├── tsconfig.json
├── .env.example
└── README.md
```
Must-haves (Feb 2026 standard):
- MCP SDK `^1.26.0` (security fix: GHSA-345p-7cg4-v4c7 in v1.26.0). Pin to v1.x — SDK v2 is pre-alpha; stable expected Q1 2026
- Lazy loading — tool groups load on first use, not at startup
- MCP Annotations on every tool:
  - `readOnlyHint` (true for GET operations)
  - `destructiveHint` (true for DELETE operations)
  - `idempotentHint` (true for PUT/upsert operations)
  - `openWorldHint` (false for most API tools)
- Zod validation on all tool inputs
- Structured error handling — never crash, always return useful error messages
- Rate limit awareness — respect API limits, add retry logic
- Pagination support — tools that list things must handle pagination
- Environment variables — all secrets via env, never hardcoded
- TypeScript strict mode — no `any`, proper types throughout
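The lazy-loading and annotation requirements can be sketched in one small module. Group and tool names here are illustrative; a real server would pull each group in via dynamic `import()` from `src/tools/`:

```typescript
// Minimal sketch of the lazy-loading pattern: tool groups resolve on
// first use and are cached, so startup stays cheap.
type LazyTool = {
  name: string;
  description: string;
  annotations: {
    readOnlyHint: boolean;
    destructiveHint: boolean;
    idempotentHint: boolean;
    openWorldHint: boolean;
  };
};

let groupLoads = 0; // counts how many times a loader actually ran

const groupLoaders: Record<string, () => Promise<LazyTool[]>> = {
  contacts: async () => {
    groupLoads++; // real impl: return (await import("./tools/contacts.js")).tools;
    return [
      {
        name: "list_contacts",
        description:
          "List or search contacts. Use when the user asks to find, browse, or count contacts.",
        annotations: {
          readOnlyHint: true,    // GET operation
          destructiveHint: false,
          idempotentHint: true,  // safe to repeat
          openWorldHint: false,  // closed API surface
        },
      },
    ];
  },
};

const groupCache = new Map<string, LazyTool[]>();

async function getToolGroup(name: string): Promise<LazyTool[]> {
  const hit = groupCache.get(name);
  if (hit) return hit;
  const tools = await groupLoaders[name]();
  groupCache.set(name, tools);
  return tools;
}
```

The same cache-on-first-use shape works whether the loader is an in-memory function (as here) or a dynamic `import()`.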
Quality Gate:
- `npm run build` succeeds (tsc compiles clean)
- Every tool has MCP annotations
- Every tool has Zod input validation
- .env.example lists all required env vars
- README documents setup + tool list
Phase 3: Design (MCP Apps)
Skill: mcp-app-designer
Input: {service}-api-analysis.md (app candidates section), server tool definitions
Output: HTML app files in {service}-mcp/app-ui/ or {service}-mcp/ui/
App types and when to use them:
| Type | When | Example |
|---|---|---|
| Dashboard | Aggregate KPIs, overview | CRM Dashboard, Ad Performance |
| Data Grid | Searchable/filterable lists | Contact List, Order History |
| Detail Card | Single entity deep-dive | Contact Card, Invoice Preview |
| Form/Wizard | Create or edit flows | Campaign Builder, Appointment Booker |
| Timeline | Chronological events | Activity Feed, Audit Log |
| Funnel/Flow | Stage-based progression | Pipeline Board, Sales Funnel |
| Calendar | Date-based data | Appointment Calendar, Schedule View |
| Analytics | Charts and visualizations | Revenue Chart, Traffic Graph |
App architecture (single-file HTML):
```html
<!DOCTYPE html>
<html>
<head>
  <style>
    /* Dark theme matching LocalBosses (#1a1d23 bg, #ff6d5a accent) */
    /* Responsive — works at 280px-800px width */
    /* No external dependencies */
  </style>
</head>
<body>
  <div id="app"><!-- Loading state --></div>
  <script>
    // 1. Receive data via postMessage
    window.addEventListener('message', (event) => {
      const data = event.data;
      if (data.type === 'mcp_app_data') render(data.data);
      // Also handle workflow_ops type for workflow apps
    });

    // 2. Also fetch from polling endpoint as fallback
    async function pollForData() {
      try {
        const res = await fetch('/api/app-data?app=APP_ID');
        if (res.ok) { const data = await res.json(); render(data); }
      } catch {}
    }

    // 3. Render function with proper empty/error/loading states
    function render(data) {
      if (!data || Object.keys(data).length === 0) {
        showEmptyState(); return;
      }
      // ... actual rendering
    }

    // Auto-poll on load
    pollForData();
    setInterval(pollForData, 3000);
  </script>
</body>
</html>
```
Design rules:
- Dark theme only — `#1a1d23` background, `#2b2d31` cards, `#ff6d5a` accent, `#dcddde` text
- Responsive — must work from 280px to 800px width
- Self-contained — zero external dependencies, no CDN links
- Three states — loading skeleton, empty state, data state
- Compact — no wasted space, dense but readable
- Interactive — hover effects, click handlers where appropriate
- Data-driven — renders whatever data it receives, graceful with missing fields
Quality Gate:
- Every app renders with sample data (no blank screens)
- Every app has loading, empty, and error states
- Dark theme is consistent with LocalBosses
- Works at 280px width (thread panel minimum)
- No external dependencies or CDN links
Phase 4: Integrate (LocalBosses)
Skill: mcp-localbosses-integrator
Input: Built MCP server + apps
Output: Fully wired LocalBosses channel
Files to update:
- `src/lib/channels.ts` — Add channel definition:

  ```ts
  {
    id: "channel-name",
    name: "Channel Name",
    icon: "🔥",
    category: "BUSINESS OPS", // or MARKETING, TOOLS, SYSTEM
    description: "What this channel does",
    systemPrompt: `...`, // Must include tool descriptions + when to use them
    defaultApp: "app-id", // Optional: auto-open app
    mcpApps: ["app-id-1", "app-id-2", ...],
  }
  ```

- `src/lib/appNames.ts` — Add display names:

  ```ts
  "app-id": { name: "App Name", icon: "📊" },
  ```

- `src/lib/app-intakes.ts` — Add intake questions:

  ```ts
  "app-id": {
    question: "What would you like to see?",
    category: "data-view",
    skipLabel: "Show dashboard",
  },
  ```

- `src/app/api/mcp-apps/route.ts` — Add app routing:

  ```ts
  // In APP_NAME_MAP:
  "app-id": "filename-without-html",
  // In APP_DIRS (if in a different location):
  path.join(process.cwd(), "path/to/app-ui"),
  ```

- `src/app/api/chat/route.ts` — Add tool routing:
  - System prompt must know about the tools
  - Tool results should include `<!--APP_DATA:{...}:END_APP_DATA-->` blocks
  - Or `<!--WORKFLOW_JSON:{...}:END_WORKFLOW-->` for workflow-type apps
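The hidden-data-block round trip can be sketched in a few lines. The block format is the one this document defines; the helper names are illustrative, not existing LocalBosses APIs:

```typescript
// Chat-route side: append structured data to the assistant's text.
function embedAppData(text: string, data: unknown): string {
  return `${text}\n<!--APP_DATA:${JSON.stringify(data)}:END_APP_DATA-->`;
}

// App side: pull the block back out of a message.
function extractAppData(message: string): unknown {
  const match = message.match(/<!--APP_DATA:([\s\S]*?):END_APP_DATA-->/);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // malformed block: caller falls back to plain text
  }
}
```

Because the block rides inside an HTML comment, it stays invisible in rendered chat text while remaining trivially parseable by the app.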
System prompt engineering:
The channel system prompt is CRITICAL. It must:
- Describe the tools available in natural language
- Specify when to use each tool (not just what they do)
- Include the hidden data block format so the AI returns structured data to apps
- Set the tone and expertise level
Quality Gate:
- Channel appears in sidebar under correct category
- All apps appear in toolbar
- Default app auto-opens on channel entry (if configured)
- System prompt mentions all available tools
- Intake questions are clear and actionable
Phase 5: Test (QA & Validation)
Skill: mcp-qa-tester
Input: Integrated LocalBosses channel
Output: Test report + fixes
Testing layers:
Layer 1: Static Analysis
- TypeScript compiles clean (`tsc --noEmit`)
- No `any` types in tool handlers
- All apps are valid HTML (no unclosed tags, no script errors)
- All routes resolve (no 404s for app files)
Layer 2: Visual Testing (Peekaboo + Gemini)
```bash
# Capture the rendered app
peekaboo capture --app "Safari" --format png --output /tmp/test-{app}.png

# Or use browser tool to screenshot
# browser → screenshot → analyze with Gemini

# Gemini multimodal analysis
gemini "Analyze this screenshot of an MCP app. Check:
1. Does it render correctly (no blank screen, no broken layout)?
2. Is the dark theme consistent (#1a1d23 bg, #ff6d5a accent)?
3. Are there proper loading/empty states?
4. Is it responsive-friendly?
5. Any visual bugs?" -f /tmp/test-{app}.png
```
Layer 3: Functional Testing
- Tool invocation: Send natural language messages, verify correct tool is triggered
- Data flow: Send a message → verify AI returns APP_DATA block → verify app receives data
- Thread lifecycle: Create thread → interact → close → delete → verify cleanup
- Cross-channel: Open app from one channel, switch channels, come back — does state persist?
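The first two Layer 3 checks (tool invocation and data flow) can be automated against the raw assistant reply. A sketch; the reply shape and function names are illustrative and would need adapting to the real chat route:

```typescript
// A Layer 3 check: given the assistant's raw reply, assert the expected
// tool was invoked and an APP_DATA block is present and parseable.
interface FunctionalCheck {
  message: string;      // natural-language message sent
  expectedTool: string; // tool that should be triggered
}

interface RawReply {
  toolCalls: string[]; // names of tools the model invoked
  text: string;        // assistant text, including hidden data blocks
}

function verifyReply(check: FunctionalCheck, reply: RawReply): string[] {
  const failures: string[] = [];
  if (!reply.toolCalls.includes(check.expectedTool)) {
    failures.push(`"${check.message}" did not trigger ${check.expectedTool}`);
  }
  const match = reply.text.match(/<!--APP_DATA:([\s\S]*?):END_APP_DATA-->/);
  if (!match) {
    failures.push("no APP_DATA block in reply");
  } else {
    try {
      JSON.parse(match[1]);
    } catch {
      failures.push("APP_DATA block is not valid JSON");
    }
  }
  return failures;
}
```

Run one of these per "at least 3 NL messages" in the quality gate and collect the failure strings into the QA report.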
Layer 4: Live API Testing (when credentials available)
- Authenticate with real API credentials
- Call each tool with real parameters
- Verify response shapes match what apps expect
- Test error cases (invalid IDs, missing permissions, rate limits)
Layer 5: Integration Testing
- Full flow: user sends message → AI responds → app renders → user interacts in thread
- Test with 2-3 realistic use cases per channel
Automated test script pattern:
```bash
#!/bin/bash
# MCP QA Test Runner
SERVICE="$1"
RESULTS="/tmp/mcp-qa-${SERVICE}.md"

echo "# QA Report: ${SERVICE}" > "$RESULTS"
echo "Date: $(date)" >> "$RESULTS"

# Static checks
echo "## Static Analysis" >> "$RESULTS"
cd "${SERVICE}-mcp" || exit 1
npm run build 2>&1 | tail -5 >> "$RESULTS"

# App file checks
echo "## App Files" >> "$RESULTS"
for f in app-ui/*.html ui/dist/*.html; do
  [ -f "$f" ] && echo "✅ $f ($(wc -c < "$f") bytes)" >> "$RESULTS"
done

# Route mapping check
echo "## Route Mapping" >> "$RESULTS"
# ... verify APP_NAME_MAP entries exist
```
Quality Gate:
- All static analysis passes
- Every app renders visually (verified by screenshot)
- At least 3 NL messages trigger correct tools
- Thread create/interact/delete cycle works
- No console errors in browser dev tools
QA → Fix Feedback Loop
QA findings don't just get logged — they route back to the responsible agent for fixes:
| Finding Type | Routes To | Fix Cycle |
|---|---|---|
| Tool description misrouting | Agent 1 (Analyst) — update analysis doc, then Agent 2 rebuilds | Re-run QA Layer 3 after fix |
| Server crash / protocol error | Agent 2 (Builder) — fix server code | Re-run QA Layers 0-1 |
| App visual bug / accessibility | Agent 3 (Designer) — fix HTML app | Re-run QA Layers 2-2.5 |
| Integration wiring issue | Agent 4 (Integrator) — fix channel config | Re-run QA Layers 3, 5 |
| APP_DATA shape mismatch | Agent 3 + Agent 4 — align app expectations with system prompt | Re-run QA Layer 3 + 5 |
Rule: No server ships with any P0 QA failures. P1 warnings are documented. The fix cycle repeats until QA passes.
Phase 6: Ship (Documentation & Deployment)
Skill: Part of each phase (not separate)
Per-server README must include:
- What the service does
- Setup instructions (env vars, API key acquisition)
- Complete tool list with descriptions
- App gallery (screenshots or descriptions)
- Known limitations
Post-Ship: MCP Registry Registration
Register shipped servers in the MCP Registry for discoverability:
- Server metadata (name, description, icon, capabilities summary)
- Authentication requirements and setup instructions
- Tool catalog summary (names + descriptions)
- Link to README and setup guide
The MCP Registry launched preview Sep 2025 and is heading to GA. Registration makes your servers discoverable by any MCP client.
Post-Ship Lifecycle
Shipping is not the end. APIs change, LLMs update, user patterns evolve.
Monitoring (continuous)
- APP_DATA parse success rate — target >98%, alert at <95% (see QA Tester Layer 6)
- Tool correctness sampling — 5% of interactions weekly, LLM-judged
- User retry rate — if >25%, system prompt needs tuning
- Thread completion rate — >80% target
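The parse-success metric above maps to a small rolling counter with the stated thresholds. A sketch; the class name and wiring are illustrative:

```typescript
// Rolling APP_DATA parse-success monitor. Thresholds come from the
// targets above: aim for >98%, alert below 95%.
class ParseRateMonitor {
  private ok = 0;
  private total = 0;

  record(success: boolean): void {
    this.total++;
    if (success) this.ok++;
  }

  rate(): number {
    return this.total === 0 ? 1 : this.ok / this.total;
  }

  shouldAlert(): boolean {
    return this.total > 0 && this.rate() < 0.95;
  }
}
```

Feed it one `record()` per assistant reply that carries an APP_DATA block, and check `shouldAlert()` on whatever cadence the monitoring job runs.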
API Change Detection (monthly)
- Check API changelogs for breaking changes, new endpoints, deprecated fields
- Re-run QA Layer 4 (live API testing) quarterly for active servers
- Update MSW mocks when API response shapes change
Re-QA Cadence
| Trigger | Scope | Frequency |
|---|---|---|
| API version bump | Full QA (all layers) | On detection |
| MCP SDK update | Layers 0-1 (protocol + static) | Monthly |
| System prompt change | Layers 3, 5 (functional + integration) | On change |
| App template update | Layers 2-2.5 (visual + accessibility) | On change |
| LLM model upgrade | DeepEval tool routing eval | On model change |
| Routine health check | Layer 4 (live API) + smoke test | Quarterly |
MCP Apps Protocol (Adopt Now)
The MCP Apps extension is live as of January 26, 2026. Supported by Claude, ChatGPT, VS Code, and Goose.
Key features:
- `_meta.ui.resourceUri` on tools — tools declare which UI to render
- `ui://` resource URIs — server-side HTML/JS served as MCP resources
- JSON-RPC over postMessage — standardized bidirectional app↔host communication
- `@modelcontextprotocol/ext-apps` SDK — App class with `ontoolresult`, `callServerTool`
Implication for LocalBosses: The custom `<!--APP_DATA:...:END_APP_DATA-->` pattern works but is LocalBosses-specific. MCP Apps is the official standard for delivering UI from tools. New servers should adopt MCP Apps. Existing servers should add MCP Apps support alongside the current pattern for backward compatibility.
Migration path:
- Add `_meta.ui.resourceUri` to tool definitions in the server builder
- Register app HTML files as `ui://` resources in each server
- Update the app template to use the `@modelcontextprotocol/ext-apps` App class
- Maintain backward compat with postMessage/polling for LocalBosses during the transition
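Step one of the migration amounts to one extra field on each tool definition. A sketch of the shape as summarized above; the exact SDK types may differ, so treat this as the shape rather than the official API:

```typescript
// A tool definition declaring its UI per the MCP Apps extension.
// The resource URI points at HTML registered as an MCP resource.
const listContactsTool = {
  name: "list_contacts",
  description: "List or search contacts.",
  _meta: {
    ui: { resourceUri: "ui://contacts/list.html" }, // served as an MCP resource
  },
};
```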
Operational Notes
Version Control Strategy
All pipeline artifacts should be tracked:
```
{service}-mcp/
├── .git/                  # Each server is its own repo (or monorepo)
├── src/                   # Server source
├── app-ui/                # App HTML files
├── test-fixtures/         # Test data (committed)
├── test-baselines/        # Visual regression baselines (committed via LFS for images)
├── test-results/          # Test outputs (gitignored)
└── mcp-factory-reviews/   # QA reports (committed for trending)
```
- Branching: `main` is production, `dev` for active work. Feature branches for new tool groups.
- Tagging: Tag each shipped version: `v1.0.0-{service}`. The tag corresponds to the analysis doc version + build.
- Monorepo option: For 30+ servers, consider a Turborepo workspace with shared packages (logger, client base class, types).
Capacity Planning (Mac Mini)
Running 30+ MCP servers as stdio processes on a Mac Mini:
| Config | Capacity | Notes |
|---|---|---|
| Mac Mini M2 (8GB) | ~15 servers | Each Node.js process uses 50-80MB RSS at rest |
| Mac Mini M2 (16GB) | ~25 servers | Leave 4GB for OS + LocalBosses app |
| Mac Mini M2 Pro (32GB) | ~40 servers | Comfortable headroom |
Mitigations for constrained memory:
- Lazy loading (already implemented) — tools only load when called
- On-demand startup — only start servers that have active channels
- HTTP transport with shared process — multiple "servers" behind one Node process
- Containerized with memory limits — `docker run --memory=100m` per server
- PM2 with max memory restart — `pm2 start index.js --max-memory-restart 150M`
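The on-demand startup mitigation can be sketched as a small process manager that spawns a server's stdio process only when its channel becomes active and kills it when idle. The command and path are illustrative (the sketch spawns an inert Node process so it is self-contained); a real manager would also watch for exits and restarts:

```typescript
import { spawn, type ChildProcess } from "node:child_process";

const running = new Map<string, ChildProcess>();

// Start (or reuse) the stdio process for a service's MCP server.
function ensureServer(service: string): ChildProcess {
  const existing = running.get(service);
  if (existing && existing.exitCode === null) return existing;
  // Real impl: spawn(process.execPath, [`${service}-mcp/dist/index.js`], ...)
  const child = spawn(process.execPath, ["-e", "setInterval(() => {}, 1000)"], {
    stdio: "ignore",
  });
  running.set(service, child);
  return child;
}

// Stop a server whose channel has gone idle, freeing its memory.
function stopServer(service: string): void {
  running.get(service)?.kill();
  running.delete(service);
}
```

Paired with lazy tool loading, this keeps resident memory proportional to active channels rather than total servers.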
Server Prioritization (30 Untested Servers)
For the 30 built-but-untested servers, prioritize by:
| Criteria | Weight | How to Assess |
|---|---|---|
| Business value | 40% | Which services do users ask about most? Check channel requests. |
| Credential availability | 30% | Can we get API keys/sandbox access today? No creds = can't do Layer 4. |
| API stability | 20% | Is the API mature (v2+) or beta? Stable APIs = fewer re-QA cycles. |
| App complexity | 10% | Simple CRUD (fast) vs complex workflows (slow). Start with simple. |
Recommended first batch (highest priority): Servers with sandbox APIs + high business value + simple CRUD patterns. Run them through the full pipeline first to validate the process, then tackle complex ones.
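The weights in the table translate directly into a score. A sketch; inputs are 0-1 assessments and the names are illustrative:

```typescript
// Weighted prioritization for the untested servers, using the weights
// from the table above.
interface ServerAssessment {
  businessValue: number;          // 40%
  credentialAvailability: number; // 30%
  apiStability: number;           // 20%
  appSimplicity: number;          // 10% (simple CRUD scores high)
}

function priorityScore(s: ServerAssessment): number {
  return (
    0.4 * s.businessValue +
    0.3 * s.credentialAvailability +
    0.2 * s.apiStability +
    0.1 * s.appSimplicity
  );
}
```

Score all 30 servers once, sort descending, and take the top handful as the first batch.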
Agent Roles
For mass production, these phases map to specialized agents:
Agent 1: API Analyst (mcp-analyst)
- Input: "Here's the API docs for ServiceX"
- Does: Reads all docs, produces `{service}-api-analysis.md`
- Model: Opus (needs deep reading comprehension)
- Skills: `mcp-api-analyzer`
Agent 2: Server Builder (mcp-builder)
- Input: `{service}-api-analysis.md`
- Does: Generates the full MCP server with all tools
- Model: Sonnet (code generation, well-defined patterns)
- Skills: `mcp-server-builder`, `mcp-server-development`
Agent 3: App Designer (mcp-designer)
- Input: `{service}-api-analysis.md` + built server
- Does: Creates all HTML apps
- Model: Sonnet (HTML/CSS generation)
- Skills: `mcp-app-designer`, `frontend-design`
Agent 4: Integrator (mcp-integrator)
- Input: Built server + apps
- Does: Wires into LocalBosses (channels, routing, intakes, system prompts)
- Model: Sonnet
- Skills: `mcp-localbosses-integrator`
Agent 5: QA Tester (mcp-qa)
- Input: Integrated LocalBosses channel
- Does: Visual + functional testing, produces test report
- Model: Opus (multimodal analysis, judgment calls)
- Skills: `mcp-qa-tester`
- Tools: Peekaboo, Gemini, browser screenshots
Orchestration (6 phases with feedback loop):
```
[You provide API docs]
        │
        ▼
P1: Agent 1 — Analyst ──→ analysis.md
        │
        ├──→ P2: Agent 2 — Builder  ──→ MCP server ──┐
        │                                (parallel)  │
        └──→ P3: Agent 3 — Designer ──→ HTML apps  ──┘
                                                     │
                                                     ▼
P4: Agent 4 — Integrator ──→ LocalBosses wired up
        │
        ▼
P5: Agent 5 — QA Tester ──→ Test report
        │
   ┌────┴──────────────────────────────────────────┐
   │ P0 failures ──→ route back to Agent 2/3/4 for │
   │                 fixes, then re-run QA         │
   │ All clear   ──→ continue                      │
   └────┬──────────────────────────────────────────┘
        ▼
P6: Ship + Registry Registration + Monitoring
```
Agents 2 and 3 run in parallel since apps only need the analysis doc + tool definitions. QA failures loop back to the responsible agent — no server ships with P0 issues.
Current Inventory (Feb 3, 2026)
Completed (in LocalBosses):
- n8n (automations channel) — 8 apps
- GHL CRM (crm channel) — 65 apps
- Reonomy (reonomy channel) — 3 apps
- CloseBot (closebot channel) — 6 apps
- Meta Ads (meta-ads channel) — 11 apps
- Google Console (google-console channel) — 5 apps
- Twilio (twilio channel) — 19 apps
Built but untested (30 servers):
Acuity Scheduling, BambooHR, Basecamp, BigCommerce, Brevo, Calendly, ClickUp, Close, Clover, Constant Contact, FieldEdge, FreshBooks, Freshdesk, Gusto, Help Scout, Housecall Pro, Jobber, Keap, Lightspeed, Mailchimp, Pipedrive, Rippling, ServiceTitan, Squarespace, Toast, TouchBistro, Trello, Wave, Wrike, Zendesk
Priority: Test the 30 built servers against live APIs and bring the best ones into LocalBosses.
File Locations
| What | Where |
|---|---|
| This document | MCP-FACTORY.md |
| Skills | ~/.clawdbot/workspace/skills/mcp-*/ |
| Built servers | mcp-diagrams/mcp-servers/{service}/ or {service}-mcp/ |
| LocalBosses app | localbosses-app/ |
| GHL apps (65) | mcp-diagrams/GoHighLevel-MCP/src/ui/react-app/src/apps/ |
| App routing | localbosses-app/src/app/api/mcp-apps/route.ts |
| Channel config | localbosses-app/src/lib/channels.ts |