MCP Factory Review — Synthesis & Debate Summary
Date: February 4, 2026
Reviewers: Alpha (Protocol), Beta (Production), Gamma (AI/UX)
Total findings: ~48 unique recommendations across 3 reviews
Where All Three Agree (The No-Brainers)
1. Testing/QA Is the Weakest Skill
- Alpha: No MCP protocol compliance testing at all
- Beta: "Everything is manual. 30 servers × 10 apps = 300 things to manually verify. This doesn't scale."
- Gamma: "It's a manual checklist masquerading as a testing framework." No quantitative metrics, no regression baselines, no automated tests.
Verdict: QA needs a complete overhaul — automated test framework, quantitative metrics, fixture data, regression baselines.
2. MCP Spec Has Moved Past Our Skills
- Alpha: Missing structuredContent/outputSchema, Elicitation, and Tasks: 3 major spec features since June 2025
- Beta: APP_DATA format is fragile (LLMs produce bad JSON), should use proper structured output
- Gamma: Official MCP Apps extension (Jan 2026) with ui:// URIs makes our iframe/postMessage pattern semi-obsolete
Verdict: Our skills are built against ~March 2025 spec. Need to update for the November 2025 spec + January 2026 MCP Apps extension.
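As a rough illustration of the structured-output direction Alpha and Beta point to, the sketch below declares a tool with an outputSchema and returns the same payload twice: as structuredContent, and as a serialized text block for clients that only read text. The tool name `list_invoices`, its fields, and the local type definitions are assumptions made for illustration. They are not imported from the SDK, and the shapes should be checked against whichever SDK version we pin.

```typescript
// Local, simplified types: NOT imported from the MCP SDK.
interface ToolDefinition {
  name: string;
  title?: string;
  description: string;
  inputSchema: Record<string, unknown>;
  outputSchema?: Record<string, unknown>;
}

interface ToolResult {
  content: { type: "text"; text: string }[];
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

// Hypothetical tool; the name and fields are illustrative only.
const listInvoicesTool: ToolDefinition = {
  name: "list_invoices",
  title: "List Invoices",
  description: "List invoices for a customer.",
  inputSchema: {
    type: "object",
    properties: { customerId: { type: "string" } },
    required: ["customerId"],
  },
  outputSchema: {
    type: "object",
    properties: {
      invoices: {
        type: "array",
        items: {
          type: "object",
          properties: { id: { type: "string" }, total: { type: "number" } },
          required: ["id", "total"],
        },
      },
    },
    required: ["invoices"],
  },
};

// Build a result that carries typed data plus a text fallback.
function toToolResult(data: Record<string, unknown>): ToolResult {
  return {
    content: [{ type: "text", text: JSON.stringify(data) }],
    structuredContent: data,
  };
}

const result = toToolResult({ invoices: [{ id: "inv_1", total: 120.5 }] });
```

Keeping the text block as a serialization of the same object means an app can read structuredContent directly while older hosts still see the data.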
3. Tool Descriptions Are Insufficient
- Alpha: Missing title field, no outputSchema declarations
- Beta: Descriptions are too verbose for token budgets
- Gamma: Need "do NOT use when" disambiguation — reduces misrouting ~30%
Verdict: Tool descriptions are the #1 lever for quality. Add negative disambiguation, add title field, optimize for token budget.
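One possible shape for the description formula is sketched below: a one-sentence summary, a short "Use when" clause, and a "Do NOT use when" clause that names the better tool. The helper `buildDescription`, the tool `search_contacts`, and the sibling tool `get_contact` are hypothetical names, not part of our current templates.

```typescript
interface DescriptionParts {
  what: string;           // one-sentence summary of what the tool does
  useWhen: string[];      // situations where this tool is the right choice
  doNotUseWhen: string[]; // near-miss situations, ideally naming the better tool
}

// Assemble a compact description; empty sections are skipped to save tokens.
function buildDescription(parts: DescriptionParts): string {
  const lines = [parts.what];
  if (parts.useWhen.length > 0) {
    lines.push(`Use when: ${parts.useWhen.join("; ")}.`);
  }
  if (parts.doNotUseWhen.length > 0) {
    lines.push(`Do NOT use when: ${parts.doNotUseWhen.join("; ")}.`);
  }
  return lines.join(" ");
}

// Hypothetical tool definition using the formula.
const searchContacts = {
  name: "search_contacts",
  title: "Search Contacts",
  description: buildDescription({
    what: "Search contacts by name or email.",
    useWhen: ["the user wants to find a person without knowing their ID"],
    doNotUseWhen: ["the contact ID is already known (use get_contact)"],
  }),
};
```

Generating the description from parts also makes Beta's token concern easier to police, since each clause can be length-checked separately.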
4. Apps Are Display-Only
- Beta: Noted the absence of interactive patterns as a gap
- Gamma: "No drag-and-drop, no inline editing, no search-within-app. Apps feel like screenshots, not tools."
Verdict: Need at minimum: client-side sort, filter, copy-to-clipboard, expand/collapse.
Unique High-Impact Insights Per Agent
Alpha's Gems (Protocol):
- SDK v1.26.0 is current; we should pin ^1.25.0 minimum, not ^1.0.0
- Streamable HTTP is the recommended production transport; we only cover stdio
- structuredContent + outputSchema is THE proper way to send typed data to apps
- SDK v2 split coming Q1 2026 — need migration plan
Beta's Gems (Production):
- Token budget is the real bottleneck, not memory — 50+ tools = 10K+ tokens just in definitions
- Circuit breaker pattern is missing — retry without circuit breaker amplifies failures
- No request timeouts — a hanging API blocks the tool indefinitely
- MCP Gateway pattern — industry standard for managing multiple servers at scale
- OpenAPI-to-MCP automation — tools exist to auto-generate servers from specs (10x speedup potential)
- Pipeline resumability — if an agent crashes mid-phase, there's no checkpoint to resume from
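Beta's point about missing request timeouts can be sketched with AbortController. The helper name `withTimeout` and the stand-in `hangingCall` are hypothetical. The 30-second default matches the value proposed in the Tier 1 actions.

```typescript
// Run an async operation with a deadline. The operation receives an
// AbortSignal and is expected to stop when it fires; an operation that
// ignores the signal will NOT be interrupted by this helper.
async function withTimeout<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  timeoutMs = 30_000,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await operation(controller.signal);
  } finally {
    clearTimeout(timer); // always release the timer, on success or failure
  }
}

// Stand-in for a hanging upstream call: never resolves, rejects on abort.
function hangingCall(signal: AbortSignal): Promise<string> {
  return new Promise((_resolve, reject) => {
    signal.addEventListener("abort", () => reject(new Error("Request timed out")));
  });
}
```

With fetch, the call would look like `withTimeout((signal) => fetch(url, { signal }))`, since fetch rejects when the signal it was given is aborted.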
Gamma's Gems (AI/UX):
- "Do NOT use when" in tool descriptions — single highest-impact improvement per Paragon research
- WCAG contrast failure — #96989d secondary text fails AA at 3.7:1 (needs 4.5:1, fix: #b0b2b8)
- Quantitative QA metrics — Tool Correctness Rate, Task Completion Rate, not just pass/fail checklists
- Test data fixtures — standardized sample data per app type, including edge cases and adversarial data
- System prompts need structured tool routing rules, not just "describe capabilities"
- BackstopJS for visual regression — pixel-diff screenshot comparison
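The contrast finding can be checked programmatically with the WCAG 2.x relative-luminance formula. The sketch below is a minimal version. The reviews do not state which background color the 3.7:1 figure was measured against, so the background is left as a parameter, and the function names are our own.

```typescript
// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function channelToLinear(value: number): number {
  const c = value / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" color.
function relativeLuminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = channelToLinear((n >> 16) & 0xff);
  const g = channelToLinear((n >> 8) & 0xff);
  const b = channelToLinear(n & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA for normal-size text requires at least 4.5:1.
function passesAA(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

Running `passesAA` over every text/background pair in the app templates would turn the one-off fix into a repeatable check.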
The Debate: Where They Diverge
Lazy Loading: Valuable or Misguided?
- Alpha: Lazy loading is good, optimize further with selective tool registration
- Beta: "Lazy loading optimizes the wrong thing — token budget is the bottleneck"
- Gamma: "Cap active tools at 15-20 per interaction"
Resolution: Lazy loading helps with startup time but doesn't solve the token problem. Need BOTH: lazy loading for code + dynamic tool filtering for context. Only surface tools relevant to the current conversation.
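A very simple form of dynamic tool filtering is sketched below: score each tool by keyword overlap with the user message and surface only the top matches, capped in line with Gamma's 15-20 suggestion. The `keywords` field and the scoring rule are assumptions for illustration. A real router would likely use something richer, such as embeddings or conversation state.

```typescript
interface ToolSummary {
  name: string;
  description: string;
  keywords: string[]; // hypothetical routing hints, not an MCP field
}

// Return the tools whose keywords appear in the message, best match first,
// limited to maxTools entries. Tools with no matching keyword are dropped.
function selectTools(
  message: string,
  tools: ToolSummary[],
  maxTools = 15,
): ToolSummary[] {
  const text = message.toLowerCase();
  return tools
    .map((tool) => ({
      tool,
      score: tool.keywords.filter((k) => text.includes(k.toLowerCase())).length,
    }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxTools)
    .map((entry) => entry.tool);
}
```

Lazy loading would still decide when a tool's code is imported. This filter only decides which definitions reach the model's context.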
APP_DATA Pattern: Fix or Replace?
- Alpha: It's proprietary and conflated with MCP protocol. Should use structuredContent.
- Beta: It's fragile — LLMs produce bad JSON in HTML comments. Need robust parsing.
- Gamma: Official MCP Apps extension supersedes it.
Resolution: Short-term: make the parser more robust (Beta's point). Medium-term: adopt structuredContent as the data transport (Alpha's point). Long-term: support official MCP Apps protocol alongside our custom one (Gamma's point).
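For the short-term step, a more forgiving parser might look like the sketch below. It assumes the payload is embedded as `<!--APP_DATA:{...}-->`. The exact delimiters our apps use are not given in the reviews, so the pattern is a placeholder. The function name `parseAppData` and the result type are also hypothetical.

```typescript
// ASSUMPTION: the payload looks like <!--APP_DATA:{...}-->. The real
// delimiters may differ; adjust the pattern to match.
const APP_DATA_PATTERN = /<!--\s*APP_DATA:?\s*([\s\S]*?)-->/;

type ParseResult =
  | { ok: true; data: unknown }
  | { ok: false; reason: string };

function parseAppData(message: string): ParseResult {
  const match = APP_DATA_PATTERN.exec(message);
  if (!match) return { ok: false, reason: "no APP_DATA block found" };

  let raw = (match[1] ?? "").trim();
  // Keep only the outermost {...} or [...] span, dropping stray prose.
  const start = raw.search(/[{[]/);
  const end = Math.max(raw.lastIndexOf("}"), raw.lastIndexOf("]"));
  if (start === -1 || end <= start) {
    return { ok: false, reason: "no JSON object found" };
  }
  raw = raw.slice(start, end + 1);

  try {
    return { ok: true, data: JSON.parse(raw) };
  } catch {
    // Second attempt: remove trailing commas, a common LLM mistake.
    // Caveat: this regex would also alter a string value containing ",}".
    try {
      return { ok: true, data: JSON.parse(raw.replace(/,\s*([}\]])/g, "$1")) };
    } catch (err) {
      return { ok: false, reason: `invalid JSON: ${(err as Error).message}` };
    }
  }
}
```

Returning a tagged result instead of throwing lets the app fall back to an empty state rather than a blank iframe when parsing fails.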
How Much Testing Is Enough?
- Alpha: Add protocol compliance testing (MCP Inspector)
- Beta: Need Jest + Playwright automation. Manual doesn't scale.
- Gamma: Need quantitative metrics (>95% tool correctness rate) + regression baselines
Resolution: All three are right at different layers. Build a 4-tier automated test stack: MCP Inspector (protocol) → Jest (unit) → Playwright (visual) → Fixture-based routing tests (functional).
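For the functional tier, the Tool Correctness Rate that Gamma proposes reduces to a simple ratio over fixture outcomes. The sketch below shows one way to compute it. The fixture and outcome shapes are assumptions, and the 0.95 default mirrors the ">95%" target quoted above.

```typescript
interface RoutingFixture {
  prompt: string;       // what the user says
  expectedTool: string; // the tool the model should pick
}

interface RoutingOutcome {
  fixture: RoutingFixture;
  selectedTool: string | null; // null when the model called no tool
}

// Fraction of fixtures where the selected tool matched the expected one.
// An empty run returns 0 so that it cannot pass a threshold by accident.
function toolCorrectnessRate(outcomes: RoutingOutcome[]): number {
  if (outcomes.length === 0) return 0;
  const correct = outcomes.filter(
    (o) => o.selectedTool === o.fixture.expectedTool,
  ).length;
  return correct / outcomes.length;
}

// True when the run strictly exceeds the target rate.
function meetsThreshold(outcomes: RoutingOutcome[], threshold = 0.95): boolean {
  return toolCorrectnessRate(outcomes) > threshold;
}
```

Storing the rate per server and per release would also give the regression baseline Gamma asks for.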
Consolidated Priority Actions
TIER 1 — Before Shipping Next Server (1-2 days)
| # | Action | Source | Effort |
|---|---|---|---|
| 1 | Fix WCAG contrast: #96989d → #b0b2b8 in all app templates | Gamma | 30 min |
| 2 | Add request timeouts (AbortController, 30s default) to server template | Beta | 30 min |
| 3 | Add "do NOT use when" disambiguation to tool description formula | Gamma | 2 hrs |
| 4 | Pin SDK to ^1.25.0, Zod to ^3.25.0 | Alpha | 15 min |
| 5 | Add title field to all tool definitions | Alpha | 1 hr |
| 6 | Add circuit breaker to API client template | Beta | 2 hrs |
| 7 | Add structured logging to server template | Beta | 1 hr |
| 8 | Add error boundaries to all app templates | Gamma | 1 hr |
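
Action 6 could start from something like the sketch below: a small circuit breaker that stops calling a failing upstream API after repeated errors and allows a single trial call once a cool-down has passed. The class name, the default threshold of 5 failures, and the 30-second reset are hypothetical values, not figures taken from the reviews.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold = 5,                  // hypothetical default
    resetAfterMs = 30_000,                 // hypothetical default
    now: () => number = () => Date.now(),  // injectable clock for tests
  ) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  async call<T>(operation: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        throw new Error("Circuit open: upstream call skipped");
      }
      this.state = "half-open"; // cool-down elapsed, allow one trial call
    }
    try {
      const result = await operation();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

Retries would wrap `breaker.call(...)` rather than the raw request, so that once the circuit opens, retry attempts fail fast instead of piling more load onto a struggling API.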
TIER 2 — Before the 30-Server Push (1 week)
| # | Action | Source | Effort |
|---|---|---|---|
| 9 | Add structuredContent + outputSchema to server builder | Alpha | 4 hrs |
| 10 | Build automated QA framework (Jest + Playwright) | Beta+Gamma | 2 days |
| 11 | Create test data fixtures library (per app type) | Gamma | 4 hrs |
| 12 | Add quantitative QA metrics (tool correctness, task completion) | Gamma | 4 hrs |
| 13 | Add integration validation script (cross-reference all 4 files) | Beta | 3 hrs |
| 14 | Add interactive patterns to apps (sort, filter, copy, expand/collapse) | Gamma | 1 day |
| 15 | Improve system prompt engineering (routing rules, few-shot examples, negatives) | Gamma | 4 hrs |
| 16 | Add Streamable HTTP transport option | Alpha | 4 hrs |
TIER 3 — During/After 30-Server Push (2-4 weeks)
| # | Action | Source | Effort |
|---|---|---|---|
| 17 | Support official MCP Apps extension (_meta.ui.resourceUri) | Alpha+Gamma | 1 week |
| 18 | Implement dynamic tool filtering (context-aware registration) | Beta+Gamma | 3 days |
| 19 | Add Elicitation support | Alpha | 2 days |
| 20 | Explore OpenAPI-to-MCP automation for existing servers | Beta | 3 days |
| 21 | Add visual regression baselines (BackstopJS) | Gamma | 2 days |
| 22 | Add data visualization primitives (line charts, sparklines, donuts) | Gamma | 3 days |
| 23 | Implement MCP gateway layer for LocalBosses | Beta | 1-2 weeks |
| 24 | Pipeline resumability (checkpoints, idempotent phases) | Beta | 1 day |
| 25 | Add accessibility testing (axe-core, keyboard nav) | Gamma | 2 days |
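
Action 24 (checkpoints, idempotent phases) can be illustrated with a minimal runner that records each completed phase and skips it on the next run. The interfaces and the in-memory store below are hypothetical. A real store would persist to disk or a database so that it survives an agent crash.

```typescript
interface CheckpointStore {
  load(): Promise<string[]>;                // names of phases already completed
  save(completed: string[]): Promise<void>;
}

interface Phase {
  name: string;
  run: () => Promise<void>; // should be idempotent: a crash mid-phase reruns it
}

// Run phases in order, skipping any that a previous run already finished.
// Returns the names of the phases executed in this run.
async function runPipeline(
  phases: Phase[],
  store: CheckpointStore,
): Promise<string[]> {
  const completed = await store.load();
  const executed: string[] = [];
  for (const phase of phases) {
    if (completed.includes(phase.name)) continue; // finished in an earlier run
    await phase.run();
    completed.push(phase.name);
    await store.save(completed); // checkpoint after every phase
    executed.push(phase.name);
  }
  return executed;
}

// In-memory store for illustration only.
function memoryStore(initial: string[] = []): CheckpointStore {
  let saved = [...initial];
  return {
    load: async () => [...saved],
    save: async (completed) => {
      saved = [...completed];
    },
  };
}
```

Because the checkpoint is written only after a phase finishes, a crash mid-phase causes that phase to run again, which is why the phases themselves need to be idempotent.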
TIER 4 — Future / Nice-to-Have
| # | Action | Source |
|---|---|---|
| 26 | SDK v2 migration plan | Alpha |
| 27 | Non-REST API support (GraphQL, SOAP) | Beta |
| 28 | Bidirectional app communication (sendToHost) | Gamma |
| 29 | Tasks (async operations) support | Alpha |
| 30 | Centralized secret management | Beta |
| 31 | App micro-interactions (staggered animations) | Gamma |
| 32 | Multi-tenant considerations | Beta |
Key Numbers
- 3 major MCP spec features missing (structuredContent, Elicitation, Tasks)
- 30% misrouting reduction possible with "do NOT use when" disambiguation
- 10K+ tokens consumed by 50+ tool definitions (the real bottleneck)
- 3.7:1 contrast ratio on secondary text (needs 4.5:1 for WCAG AA)
- 300+ manual test cases needed for 30 servers (need automation)
- SDK v1.26.0 is current (we reference v1.x vaguely)
All three reviews are saved in mcp-factory-reviews/ for reference.