MCP Factory — Production Pipeline
The systematic process for turning any API into a fully tested, production-ready MCP experience inside LocalBosses.
The Problem
We've been building MCP servers ad-hoc: grab an API, bang out tools, create some apps, throw them in LocalBosses, move on. Result: 30+ servers that compile but have never been tested against live APIs, apps that may not render, tool descriptions that might not trigger correctly via natural language.
The Pipeline
API Docs → P1 Analyze → P2 Build → P3 Design → P4 Integrate → P5 Test → P6 Ship
6 phases. Phases 2 (Build) and 3 (Design) run in parallel. QA findings route back to the Builder/Designer for fixes before Ship.
Every phase has:
- Clear inputs (what you need to start)
- Clear outputs (what you produce)
- Quality gate (what must pass before moving on)
- Dedicated skill (documented, repeatable instructions)
- Agent capability (can be run by a sub-agent)
Phase 1: Analyze (API Discovery & Analysis)
Skill: mcp-api-analyzer
Input: API documentation URL(s), OpenAPI spec (if available), user guides, public marketing copy
Output: {service}-api-analysis.md
What the analysis produces:
- Service Overview — What the product does, who it's for, pricing tiers
- Auth Method — OAuth2 / API key / JWT / session — with exact flow
- Endpoint Catalog — Every endpoint grouped by domain
- Tool Groups — Logical groupings for lazy loading (aim for 5-15 groups)
- Tool Inventory — Each tool with:
- Name (snake_case, descriptive)
- Description (optimized for LLM routing — what it does, when to use it)
- Required vs optional params
- Read-only / destructive / idempotent annotations
- App Candidates — Which endpoints/features deserve visual UI:
- Dashboard views (aggregate data, KPIs)
- List/Grid views (searchable collections)
- Detail views (single entity deep-dive)
- Forms (create/edit workflows)
- Specialized views (calendars, timelines, funnels, maps)
- Rate Limits & Quirks — API-specific gotchas
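The tool-inventory entries above lend themselves to a small structured record the downstream agents can consume. A sketch of one possible shape; the field names are illustrative, not a fixed schema:

```typescript
// One possible shape for a tool-inventory entry in the analysis doc.
// Field names are illustrative, not a fixed schema.
interface ToolInventoryEntry {
  name: string;              // snake_case, descriptive
  description: string;       // action-oriented, says when to use it
  requiredParams: string[];
  optionalParams: string[];
  readOnly: boolean;         // GET-style operations
  destructive: boolean;      // DELETE-style operations
  idempotent: boolean;       // PUT/upsert-style operations
  group: string;             // one of the 5-15 tool groups
}

const exampleEntry: ToolInventoryEntry = {
  name: "list_contacts",
  description:
    "List or search contacts. Use when the user asks to find or browse contacts.",
  requiredParams: [],
  optionalParams: ["query", "cursor"],
  readOnly: true,
  destructive: false,
  idempotent: true,
  group: "contacts",
};
```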
Quality Gate:
- Every endpoint is cataloged
- Tool groups are balanced (no group with 50+ tools)
- Tool descriptions are LLM-friendly (action-oriented, include "when to use")
- App candidates have clear data sources (which tools feed them)
- Auth flow is documented with example
Phase 2: Build (MCP Server)
Skill: mcp-server-builder (updated from existing mcp-server-development)
Input: {service}-api-analysis.md
Output: Complete MCP server in {service}-mcp/
Server structure:
```
{service}-mcp/
├── src/
│   ├── index.ts          # Server entry, transport, lazy loading
│   ├── client.ts         # API client (auth, request, error handling)
│   ├── tools/
│   │   ├── index.ts      # Tool registry + lazy loader
│   │   ├── {group1}.ts   # Tool group module
│   │   ├── {group2}.ts   # ...
│   │   └── ...
│   └── types.ts          # Shared TypeScript types
├── dist/                 # Compiled output
├── package.json
├── tsconfig.json
├── .env.example
└── README.md
```
Must-haves (Feb 2026 standard):
- MCP SDK `^1.26.0` (security fix: GHSA-345p-7cg4-v4c7 in v1.26.0). Pin to v1.x — SDK v2 is pre-alpha; stable expected Q1 2026
- Lazy loading — tool groups load on first use, not at startup
- MCP Annotations on every tool:
  - `readOnlyHint` (true for GET operations)
  - `destructiveHint` (true for DELETE operations)
  - `idempotentHint` (true for PUT/upsert operations)
  - `openWorldHint` (false for most API tools)
- Zod validation on all tool inputs
- Structured error handling — never crash, always return useful error messages
- Rate limit awareness — respect API limits, add retry logic
- Pagination support — tools that list things must handle pagination
- Environment variables — all secrets via env, never hardcoded
- TypeScript strict mode — no `any`, proper types throughout
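The lazy-loading and annotation requirements can be sketched in one small module. Group and tool names here are illustrative; a real server would pull each group in via dynamic `import()` from `src/tools/`:

```typescript
// Minimal sketch of the lazy-loading pattern: tool groups resolve on
// first use and are cached, so startup stays cheap.
type LazyTool = {
  name: string;
  description: string;
  annotations: {
    readOnlyHint: boolean;
    destructiveHint: boolean;
    idempotentHint: boolean;
    openWorldHint: boolean;
  };
};

let groupLoads = 0; // counts how many times a loader actually ran

const groupLoaders: Record<string, () => Promise<LazyTool[]>> = {
  contacts: async () => {
    groupLoads++; // real impl: return (await import("./tools/contacts.js")).tools;
    return [
      {
        name: "list_contacts",
        description:
          "List or search contacts. Use when the user asks to find, browse, or count contacts.",
        annotations: {
          readOnlyHint: true,    // GET operation
          destructiveHint: false,
          idempotentHint: true,  // safe to repeat
          openWorldHint: false,  // closed API surface
        },
      },
    ];
  },
};

const groupCache = new Map<string, LazyTool[]>();

async function getToolGroup(name: string): Promise<LazyTool[]> {
  const hit = groupCache.get(name);
  if (hit) return hit;
  const tools = await groupLoaders[name]();
  groupCache.set(name, tools);
  return tools;
}
```

The same cache-on-first-use shape works whether the loader is an in-memory function (as here) or a dynamic `import()`.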
Quality Gate:
- `npm run build` succeeds (tsc compiles clean)
- Every tool has MCP annotations
- Every tool has Zod input validation
- .env.example lists all required env vars
- README documents setup + tool list
Phase 3: Design (MCP Apps)
Skill: mcp-app-designer
Input: {service}-api-analysis.md (app candidates section), server tool definitions
Output: HTML app files in {service}-mcp/app-ui/ or {service}-mcp/ui/
App types and when to use them:
| Type | When | Example |
|---|---|---|
| Dashboard | Aggregate KPIs, overview | CRM Dashboard, Ad Performance |
| Data Grid | Searchable/filterable lists | Contact List, Order History |
| Detail Card | Single entity deep-dive | Contact Card, Invoice Preview |
| Form/Wizard | Create or edit flows | Campaign Builder, Appointment Booker |
| Timeline | Chronological events | Activity Feed, Audit Log |
| Funnel/Flow | Stage-based progression | Pipeline Board, Sales Funnel |
| Calendar | Date-based data | Appointment Calendar, Schedule View |
| Analytics | Charts and visualizations | Revenue Chart, Traffic Graph |
App architecture (single-file HTML):
```html
<!DOCTYPE html>
<html>
<head>
  <style>
    /* Dark theme matching LocalBosses (#1a1d23 bg, #ff6d5a accent) */
    /* Responsive — works at 280px-800px width */
    /* No external dependencies */
  </style>
</head>
<body>
  <div id="app"><!-- Loading state --></div>
  <script>
    // 1. Receive data via postMessage
    window.addEventListener('message', (event) => {
      const data = event.data;
      if (data.type === 'mcp_app_data') render(data.data);
      // Also handle workflow_ops type for workflow apps
    });

    // 2. Also fetch from polling endpoint as fallback
    async function pollForData() {
      try {
        const res = await fetch('/api/app-data?app=APP_ID');
        if (res.ok) { const data = await res.json(); render(data); }
      } catch {}
    }

    // 3. Render function with proper empty/error/loading states
    function render(data) {
      if (!data || Object.keys(data).length === 0) {
        showEmptyState(); return;
      }
      // ... actual rendering
    }

    // Auto-poll on load
    pollForData();
    setInterval(pollForData, 3000);
  </script>
</body>
</html>
```
Design rules:
- Dark theme only — `#1a1d23` background, `#2b2d31` cards, `#ff6d5a` accent, `#dcddde` text
- Responsive — must work from 280px to 800px width
- Self-contained — zero external dependencies, no CDN links
- Three states — loading skeleton, empty state, data state
- Compact — no wasted space, dense but readable
- Interactive — hover effects, click handlers where appropriate
- Data-driven — renders whatever data it receives, graceful with missing fields
Quality Gate:
- Every app renders with sample data (no blank screens)
- Every app has loading, empty, and error states
- Dark theme is consistent with LocalBosses
- Works at 280px width (thread panel minimum)
- No external dependencies or CDN links
Phase 4: Integrate (LocalBosses)
Skill: mcp-localbosses-integrator
Input: Built MCP server + apps
Output: Fully wired LocalBosses channel
Files to update:
- `src/lib/channels.ts` — Add channel definition:

  ```ts
  {
    id: "channel-name",
    name: "Channel Name",
    icon: "🔥",
    category: "BUSINESS OPS", // or MARKETING, TOOLS, SYSTEM
    description: "What this channel does",
    systemPrompt: `...`, // Must include tool descriptions + when to use them
    defaultApp: "app-id", // Optional: auto-open app
    mcpApps: ["app-id-1", "app-id-2", ...],
  }
  ```

- `src/lib/appNames.ts` — Add display names:

  ```ts
  "app-id": { name: "App Name", icon: "📊" },
  ```

- `src/lib/app-intakes.ts` — Add intake questions:

  ```ts
  "app-id": {
    question: "What would you like to see?",
    category: "data-view",
    skipLabel: "Show dashboard",
  },
  ```

- `src/app/api/mcp-apps/route.ts` — Add app routing:

  ```ts
  // In APP_NAME_MAP:
  "app-id": "filename-without-html",
  // In APP_DIRS (if in a different location):
  path.join(process.cwd(), "path/to/app-ui"),
  ```

- `src/app/api/chat/route.ts` — Add tool routing:
  - System prompt must know about the tools
  - Tool results should include `<!--APP_DATA:{...}:END_APP_DATA-->` blocks
  - Or `<!--WORKFLOW_JSON:{...}:END_WORKFLOW-->` for workflow-type apps
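The hidden-data-block round trip can be sketched in a few lines. The block format is the one this document defines; the helper names are illustrative, not existing LocalBosses APIs:

```typescript
// Chat-route side: append structured data to the assistant's text.
function embedAppData(text: string, data: unknown): string {
  return `${text}\n<!--APP_DATA:${JSON.stringify(data)}:END_APP_DATA-->`;
}

// App side: pull the block back out of a message.
function extractAppData(message: string): unknown {
  const match = message.match(/<!--APP_DATA:([\s\S]*?):END_APP_DATA-->/);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // malformed block: caller falls back to plain text
  }
}
```

Because the block rides inside an HTML comment, it stays invisible in rendered chat text while remaining trivially parseable by the app.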
System prompt engineering:
The channel system prompt is CRITICAL. It must:
- Describe the tools available in natural language
- Specify when to use each tool (not just what they do)
- Include the hidden data block format so the AI returns structured data to apps
- Set the tone and expertise level
Quality Gate:
- Channel appears in sidebar under correct category
- All apps appear in toolbar
- Default app auto-opens on channel entry (if configured)
- System prompt mentions all available tools
- Intake questions are clear and actionable
Phase 5: Test (QA & Validation)
Skill: mcp-qa-tester
Input: Integrated LocalBosses channel
Output: Test report + fixes
Testing layers:
Layer 1: Static Analysis
- TypeScript compiles clean (`tsc --noEmit`)
- No `any` types in tool handlers
- All apps are valid HTML (no unclosed tags, no script errors)
- All routes resolve (no 404s for app files)
Layer 2: Visual Testing (Peekaboo + Gemini)
```bash
# Capture the rendered app
peekaboo capture --app "Safari" --format png --output /tmp/test-{app}.png

# Or use browser tool to screenshot
# browser → screenshot → analyze with Gemini

# Gemini multimodal analysis
gemini "Analyze this screenshot of an MCP app. Check:
1. Does it render correctly (no blank screen, no broken layout)?
2. Is the dark theme consistent (#1a1d23 bg, #ff6d5a accent)?
3. Are there proper loading/empty states?
4. Is it responsive-friendly?
5. Any visual bugs?" -f /tmp/test-{app}.png
```
Layer 3: Functional Testing
- Tool invocation: Send natural language messages, verify correct tool is triggered
- Data flow: Send a message → verify AI returns APP_DATA block → verify app receives data
- Thread lifecycle: Create thread → interact → close → delete → verify cleanup
- Cross-channel: Open app from one channel, switch channels, come back — does state persist?
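The first two Layer 3 checks (tool invocation and data flow) can be automated against the raw assistant reply. A sketch; the reply shape and function names are illustrative and would need adapting to the real chat route:

```typescript
// A Layer 3 check: given the assistant's raw reply, assert the expected
// tool was invoked and an APP_DATA block is present and parseable.
interface FunctionalCheck {
  message: string;      // natural-language message sent
  expectedTool: string; // tool that should be triggered
}

interface RawReply {
  toolCalls: string[]; // names of tools the model invoked
  text: string;        // assistant text, including hidden data blocks
}

function verifyReply(check: FunctionalCheck, reply: RawReply): string[] {
  const failures: string[] = [];
  if (!reply.toolCalls.includes(check.expectedTool)) {
    failures.push(`"${check.message}" did not trigger ${check.expectedTool}`);
  }
  const match = reply.text.match(/<!--APP_DATA:([\s\S]*?):END_APP_DATA-->/);
  if (!match) {
    failures.push("no APP_DATA block in reply");
  } else {
    try {
      JSON.parse(match[1]);
    } catch {
      failures.push("APP_DATA block is not valid JSON");
    }
  }
  return failures;
}
```

Run one of these per "at least 3 NL messages" in the quality gate and collect the failure strings into the QA report.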
Layer 4: Live API Testing (when credentials available)
- Authenticate with real API credentials
- Call each tool with real parameters
- Verify response shapes match what apps expect
- Test error cases (invalid IDs, missing permissions, rate limits)
Layer 5: Integration Testing
- Full flow: user sends message → AI responds → app renders → user interacts in thread
- Test with 2-3 realistic use cases per channel
Automated test script pattern:
```bash
#!/bin/bash
# MCP QA Test Runner
SERVICE="$1"
RESULTS="/tmp/mcp-qa-${SERVICE}.md"

echo "# QA Report: ${SERVICE}" > "$RESULTS"
echo "Date: $(date)" >> "$RESULTS"

# Static checks
echo "## Static Analysis" >> "$RESULTS"
cd "${SERVICE}-mcp" || exit 1
npm run build 2>&1 | tail -5 >> "$RESULTS"

# App file checks
echo "## App Files" >> "$RESULTS"
for f in app-ui/*.html ui/dist/*.html; do
  [ -f "$f" ] && echo "✅ $f ($(wc -c < "$f") bytes)" >> "$RESULTS"
done

# Route mapping check
echo "## Route Mapping" >> "$RESULTS"
# ... verify APP_NAME_MAP entries exist
```
Quality Gate:
- All static analysis passes
- Every app renders visually (verified by screenshot)
- At least 3 NL messages trigger correct tools
- Thread create/interact/delete cycle works
- No console errors in browser dev tools
QA → Fix Feedback Loop
QA findings don't just get logged — they route back to the responsible agent for fixes:
| Finding Type | Routes To | Fix Cycle |
|---|---|---|
| Tool description misrouting | Agent 1 (Analyst) — update analysis doc, then Agent 2 rebuilds | Re-run QA Layer 3 after fix |
| Server crash / protocol error | Agent 2 (Builder) — fix server code | Re-run QA Layers 0-1 |
| App visual bug / accessibility | Agent 3 (Designer) — fix HTML app | Re-run QA Layers 2-2.5 |
| Integration wiring issue | Agent 4 (Integrator) — fix channel config | Re-run QA Layers 3, 5 |
| APP_DATA shape mismatch | Agent 3 + Agent 4 — align app expectations with system prompt | Re-run QA Layer 3 + 5 |
Rule: No server ships with any P0 QA failures. P1 warnings are documented. The fix cycle repeats until QA passes.
Phase 6: Ship (Documentation & Deployment)
Skill: Part of each phase (not separate)
Per-server README must include:
- What the service does
- Setup instructions (env vars, API key acquisition)
- Complete tool list with descriptions
- App gallery (screenshots or descriptions)
- Known limitations
Post-Ship: MCP Registry Registration
Register shipped servers in the MCP Registry for discoverability:
- Server metadata (name, description, icon, capabilities summary)
- Authentication requirements and setup instructions
- Tool catalog summary (names + descriptions)
- Link to README and setup guide
The MCP Registry launched preview Sep 2025 and is heading to GA. Registration makes your servers discoverable by any MCP client.
Post-Ship Lifecycle
Shipping is not the end. APIs change, LLMs update, user patterns evolve.
Monitoring (continuous)
- APP_DATA parse success rate — target >98%, alert at <95% (see QA Tester Layer 6)
- Tool correctness sampling — 5% of interactions weekly, LLM-judged
- User retry rate — if >25%, system prompt needs tuning
- Thread completion rate — >80% target
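The parse-success metric above maps to a small rolling counter with the stated thresholds. A sketch; the class name and wiring are illustrative:

```typescript
// Rolling APP_DATA parse-success monitor. Thresholds come from the
// targets above: aim for >98%, alert below 95%.
class ParseRateMonitor {
  private ok = 0;
  private total = 0;

  record(success: boolean): void {
    this.total++;
    if (success) this.ok++;
  }

  rate(): number {
    return this.total === 0 ? 1 : this.ok / this.total;
  }

  shouldAlert(): boolean {
    return this.total > 0 && this.rate() < 0.95;
  }
}
```

Feed it one `record()` per assistant reply that carries an APP_DATA block, and check `shouldAlert()` on whatever cadence the monitoring job runs.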
API Change Detection (monthly)
- Check API changelogs for breaking changes, new endpoints, deprecated fields
- Re-run QA Layer 4 (live API testing) quarterly for active servers
- Update MSW mocks when API response shapes change
Re-QA Cadence
| Trigger | Scope | Frequency |
|---|---|---|
| API version bump | Full QA (all layers) | On detection |
| MCP SDK update | Layers 0-1 (protocol + static) | Monthly |
| System prompt change | Layers 3, 5 (functional + integration) | On change |
| App template update | Layers 2-2.5 (visual + accessibility) | On change |
| LLM model upgrade | DeepEval tool routing eval | On model change |
| Routine health check | Layer 4 (live API) + smoke test | Quarterly |
MCP Apps Protocol (Adopt Now)
The MCP Apps extension is live as of January 26, 2026. Supported by Claude, ChatGPT, VS Code, and Goose.
Key features:
- `_meta.ui.resourceUri` on tools — tools declare which UI to render
- `ui://` resource URIs — server-side HTML/JS served as MCP resources
- JSON-RPC over postMessage — standardized bidirectional app↔host communication
- `@modelcontextprotocol/ext-apps` SDK — App class with `ontoolresult`, `callServerTool`
Implication for LocalBosses: The custom `<!--APP_DATA:...:END_APP_DATA-->` pattern works but is LocalBosses-specific. MCP Apps is the official standard for delivering UI from tools. New servers should adopt MCP Apps. Existing servers should add MCP Apps support alongside the current pattern for backward compatibility.
Migration path:
- Add `_meta.ui.resourceUri` to tool definitions in the server builder
- Register app HTML files as `ui://` resources in each server
- Update the app template to use the `@modelcontextprotocol/ext-apps` App class
- Maintain backward compat with postMessage/polling for LocalBosses during the transition
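Step one of the migration amounts to one extra field on each tool definition. A sketch of the shape as summarized above; the exact SDK types may differ, so treat this as the shape rather than the official API:

```typescript
// A tool definition declaring its UI per the MCP Apps extension.
// The resource URI points at HTML registered as an MCP resource.
const listContactsTool = {
  name: "list_contacts",
  description: "List or search contacts.",
  _meta: {
    ui: { resourceUri: "ui://contacts/list.html" }, // served as an MCP resource
  },
};
```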
Operational Notes
Version Control Strategy
All pipeline artifacts should be tracked:
```
{service}-mcp/
├── .git/                  # Each server is its own repo (or monorepo)
├── src/                   # Server source
├── app-ui/                # App HTML files
├── test-fixtures/         # Test data (committed)
├── test-baselines/        # Visual regression baselines (committed via LFS for images)
├── test-results/          # Test outputs (gitignored)
└── mcp-factory-reviews/   # QA reports (committed for trending)
```
- Branching: `main` is production, `dev` for active work. Feature branches for new tool groups.
- Tagging: Tag each shipped version: `v1.0.0-{service}`. The tag corresponds to the analysis doc version + build.
- Monorepo option: For 30+ servers, consider a Turborepo workspace with shared packages (logger, client base class, types).
Capacity Planning (Mac Mini)
Running 30+ MCP servers as stdio processes on a Mac Mini:
| Config | Capacity | Notes |
|---|---|---|
| Mac Mini M2 (8GB) | ~15 servers | Each Node.js process uses 50-80MB RSS at rest |
| Mac Mini M2 (16GB) | ~25 servers | Leave 4GB for OS + LocalBosses app |
| Mac Mini M2 Pro (32GB) | ~40 servers | Comfortable headroom |
Mitigations for constrained memory:
- Lazy loading (already implemented) — tools only load when called
- On-demand startup — only start servers that have active channels
- HTTP transport with shared process — multiple "servers" behind one Node process
- Containerized with memory limits — `docker run --memory=100m` per server
- PM2 with max memory restart — `pm2 start index.js --max-memory-restart 150M`
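The on-demand startup mitigation can be sketched as a small process manager that spawns a server's stdio process only when its channel becomes active and kills it when idle. The command and path are illustrative (the sketch spawns an inert Node process so it is self-contained); a real manager would also watch for exits and restarts:

```typescript
import { spawn, type ChildProcess } from "node:child_process";

const running = new Map<string, ChildProcess>();

// Start (or reuse) the stdio process for a service's MCP server.
function ensureServer(service: string): ChildProcess {
  const existing = running.get(service);
  if (existing && existing.exitCode === null) return existing;
  // Real impl: spawn(process.execPath, [`${service}-mcp/dist/index.js`], ...)
  const child = spawn(process.execPath, ["-e", "setInterval(() => {}, 1000)"], {
    stdio: "ignore",
  });
  running.set(service, child);
  return child;
}

// Stop a server whose channel has gone idle, freeing its memory.
function stopServer(service: string): void {
  running.get(service)?.kill();
  running.delete(service);
}
```

Paired with lazy tool loading, this keeps resident memory proportional to active channels rather than total servers.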
Server Prioritization (30 Untested Servers)
For the 30 built-but-untested servers, prioritize by:
| Criteria | Weight | How to Assess |
|---|---|---|
| Business value | 40% | Which services do users ask about most? Check channel requests. |
| Credential availability | 30% | Can we get API keys/sandbox access today? No creds = can't do Layer 4. |
| API stability | 20% | Is the API mature (v2+) or beta? Stable APIs = fewer re-QA cycles. |
| App complexity | 10% | Simple CRUD (fast) vs complex workflows (slow). Start with simple. |
Recommended first batch (highest priority): Servers with sandbox APIs + high business value + simple CRUD patterns. Run them through the full pipeline first to validate the process, then tackle complex ones.
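The weights in the table translate directly into a score. A sketch; inputs are 0-1 assessments and the names are illustrative:

```typescript
// Weighted prioritization for the untested servers, using the weights
// from the table above.
interface ServerAssessment {
  businessValue: number;          // 40%
  credentialAvailability: number; // 30%
  apiStability: number;           // 20%
  appSimplicity: number;          // 10% (simple CRUD scores high)
}

function priorityScore(s: ServerAssessment): number {
  return (
    0.4 * s.businessValue +
    0.3 * s.credentialAvailability +
    0.2 * s.apiStability +
    0.1 * s.appSimplicity
  );
}
```

Score all 30 servers once, sort descending, and take the top handful as the first batch.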
Agent Roles
For mass production, these phases map to specialized agents:
Agent 1: API Analyst (mcp-analyst)
- Input: "Here's the API docs for ServiceX"
- Does: Reads all docs, produces `{service}-api-analysis.md`
- Model: Opus (needs deep reading comprehension)
- Skills: `mcp-api-analyzer`
Agent 2: Server Builder (mcp-builder)
- Input: `{service}-api-analysis.md`
- Does: Generates the full MCP server with all tools
- Model: Sonnet (code generation, well-defined patterns)
- Skills: `mcp-server-builder`, `mcp-server-development`
Agent 3: App Designer (mcp-designer)
- Input: `{service}-api-analysis.md` + built server
- Does: Creates all HTML apps
- Model: Sonnet (HTML/CSS generation)
- Skills: `mcp-app-designer`, `frontend-design`
Agent 4: Integrator (mcp-integrator)
- Input: Built server + apps
- Does: Wires into LocalBosses (channels, routing, intakes, system prompts)
- Model: Sonnet
- Skills: `mcp-localbosses-integrator`
Agent 5: QA Tester (mcp-qa)
- Input: Integrated LocalBosses channel
- Does: Visual + functional testing, produces test report
- Model: Opus (multimodal analysis, judgment calls)
- Skills: `mcp-qa-tester`
- Tools: Peekaboo, Gemini, browser screenshots
Orchestration (6 phases with feedback loop):
```
[You provide API docs]
        │
        ▼
P1: Agent 1 — Analyst ──→ analysis.md
        │
        ├──→ P2: Agent 2 — Builder  ──→ MCP server ──┐
        │                                (parallel)  │
        └──→ P3: Agent 3 — Designer ──→ HTML apps  ──┘
                                                     │
                                                     ▼
P4: Agent 4 — Integrator ──→ LocalBosses wired up
        │
        ▼
P5: Agent 5 — QA Tester ──→ Test report
        │
   ┌────┴──────────────────────────────────────────┐
   │ P0 failures ──→ route back to Agent 2/3/4 for │
   │                 fixes, then re-run QA         │
   │ All clear   ──→ continue                      │
   └────┬──────────────────────────────────────────┘
        ▼
P6: Ship + Registry Registration + Monitoring
```
Agents 2 and 3 run in parallel since apps only need the analysis doc + tool definitions. QA failures loop back to the responsible agent — no server ships with P0 issues.
Current Inventory (Feb 3, 2026)
Completed (in LocalBosses):
- n8n (automations channel) — 8 apps
- GHL CRM (crm channel) — 65 apps
- Reonomy (reonomy channel) — 3 apps
- CloseBot (closebot channel) — 6 apps
- Meta Ads (meta-ads channel) — 11 apps
- Google Console (google-console channel) — 5 apps
- Twilio (twilio channel) — 19 apps
Built but untested (30 servers):
Acuity Scheduling, BambooHR, Basecamp, BigCommerce, Brevo, Calendly, ClickUp, Close, Clover, Constant Contact, FieldEdge, FreshBooks, Freshdesk, Gusto, Help Scout, Housecall Pro, Jobber, Keap, Lightspeed, Mailchimp, Pipedrive, Rippling, ServiceTitan, Squarespace, Toast, TouchBistro, Trello, Wave, Wrike, Zendesk
Priority: Test the 30 built servers against live APIs and bring the best ones into LocalBosses.
File Locations
| What | Where |
|---|---|
| This document | MCP-FACTORY.md |
| Skills | ~/.clawdbot/workspace/skills/mcp-*/ |
| Built servers | mcp-diagrams/mcp-servers/{service}/ or {service}-mcp/ |
| LocalBosses app | localbosses-app/ |
| GHL apps (65) | mcp-diagrams/GoHighLevel-MCP/src/ui/react-app/src/apps/ |
| App routing | localbosses-app/src/app/api/mcp-apps/route.ts |
| Channel config | localbosses-app/src/lib/channels.ts |