MCP Factory Review — Synthesis & Debate Summary

Date: February 4, 2026
Reviewers: Alpha (Protocol), Beta (Production), Gamma (AI/UX)
Total findings: ~48 unique recommendations across 3 reviews


Where All Three Agree (The No-Brainers)

1. Testing/QA Is the Weakest Skill

  • Alpha: No MCP protocol compliance testing at all
  • Beta: "Everything is manual. 30 servers × 10 apps = 300 things to manually verify. This doesn't scale."
  • Gamma: "It's a manual checklist masquerading as a testing framework." No quantitative metrics, no regression baselines, no automated tests.

Verdict: QA needs a complete overhaul — automated test framework, quantitative metrics, fixture data, regression baselines.

2. MCP Spec Has Moved Past Our Skills

  • Alpha: Missing structuredContent/outputSchema, Elicitation, and Tasks — three major spec features added since June 2025
  • Beta: APP_DATA format is fragile (LLMs produce bad JSON), should use proper structured output
  • Gamma: Official MCP Apps extension (Jan 2026) with ui:// URIs makes our iframe/postMessage pattern semi-obsolete

Verdict: Our skills are built against ~March 2025 spec. Need to update for the November 2025 spec + January 2026 MCP Apps extension.

3. Tool Descriptions Are Insufficient

  • Alpha: Missing title field, no outputSchema declarations
  • Beta: Descriptions are too verbose for token budgets
  • Gamma: Need "do NOT use when" disambiguation — reduces misrouting ~30%

Verdict: Tool descriptions are the #1 lever for quality. Add negative disambiguation, add title field, optimize for token budget.

4. Apps Are Display-Only

  • Beta: Flagged the absence of interactive patterns as a gap
  • Gamma: "No drag-and-drop, no inline editing, no search-within-app. Apps feel like screenshots, not tools."

Verdict: Need at minimum: client-side sort, filter, copy-to-clipboard, expand/collapse.
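A minimal sketch of what the baseline interactivity could look like as reusable client-side helpers for the app templates. The names (`sortRows`, `filterRows`) and the `Row` shape are illustrative, not from the factory codebase:

```typescript
// Minimal client-side table helpers an app template could embed.
// Names and types here are illustrative, not from the factory codebase.
type Row = Record<string, string | number>;

// Return a new array sorted by `key`; does not mutate the input.
function sortRows(rows: Row[], key: string, dir: "asc" | "desc" = "asc"): Row[] {
  const sign = dir === "asc" ? 1 : -1;
  return [...rows].sort((a, b) =>
    a[key] < b[key] ? -sign : a[key] > b[key] ? sign : 0,
  );
}

// Case-insensitive substring match across all cell values.
function filterRows(rows: Row[], query: string): Row[] {
  const q = query.toLowerCase();
  return rows.filter((r) =>
    Object.values(r).some((v) => String(v).toLowerCase().includes(q)),
  );
}
```

Copy-to-clipboard and expand/collapse are a few lines more (`navigator.clipboard.writeText`, a `hidden` toggle), so the whole set fits in the template footprint.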


Unique High-Impact Insights Per Agent

Alpha's Gems (Protocol):

  • SDK v1.26.0 is current — we should pin ^1.25.0 minimum, not ^1.0.0
  • Streamable HTTP is the recommended production transport — we only cover stdio
  • structuredContent + outputSchema is THE proper way to send typed data to apps
  • SDK v2 split coming Q1 2026 — need migration plan

Beta's Gems (Production):

  • Token budget is the real bottleneck, not memory — 50+ tools = 10K+ tokens just in definitions
  • Circuit breaker pattern is missing — retry without circuit breaker amplifies failures
  • No request timeouts — a hanging API blocks the tool indefinitely
  • MCP Gateway pattern — industry standard for managing multiple servers at scale
  • OpenAPI-to-MCP automation — tools exist to auto-generate servers from specs (10x speedup potential)
  • Pipeline resumability — if an agent crashes mid-phase, there's no checkpoint to resume from
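Beta's circuit-breaker point can be sketched concretely. This is an illustrative minimal breaker, not the factory's template code; the class name, threshold, and cooldown defaults are all assumptions:

```typescript
// Minimal circuit breaker: after `threshold` consecutive failures the breaker
// opens and rejects calls immediately until `cooldownMs` has elapsed, so a
// retrying caller stops hammering a downed API. Names/defaults are illustrative.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  private isOpen(now: number): boolean {
    return this.failures >= this.threshold && now - this.openedAt < this.cooldownMs;
  }

  // `now` is injectable for testability; defaults to the wall clock.
  async call<T>(fn: () => Promise<T>, now = Date.now()): Promise<T> {
    if (this.isOpen(now)) throw new Error("circuit open: failing fast");
    try {
      const result = await fn();
      this.failures = 0; // any success closes the breaker
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = now;
      throw err;
    }
  }
}
```

Retry logic would wrap `call(...)`; the breaker is what keeps the retries from amplifying an outage.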

Gamma's Gems (AI/UX):

  • "Do NOT use when" in tool descriptions — single highest-impact improvement per Paragon research
  • WCAG contrast failure — #96989d secondary text fails AA at 3.7:1 (needs 4.5:1, fix: #b0b2b8)
  • Quantitative QA metrics — Tool Correctness Rate, Task Completion Rate, not just pass/fail checklists
  • Test data fixtures — standardized sample data per app type, including edge cases and adversarial data
  • System prompts need structured tool routing rules, not just "describe capabilities"
  • BackstopJS for visual regression — pixel-diff screenshot comparison
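To make Gamma's highest-impact item concrete, here is a hypothetical tool definition showing the pattern: a `title` (Alpha's point), a capability sentence, and explicit "Do NOT use when" negatives. The tool name and sibling tool names are invented for illustration:

```typescript
// Hypothetical tool definition illustrating negative disambiguation.
// `search_invoices`, `search_payments`, and `get_subscription` are made-up names.
const searchInvoicesTool = {
  name: "search_invoices",
  title: "Search Invoices", // human-readable title for client display
  description: [
    "Search billing invoices by customer, date range, or status.",
    "Use when the user asks about invoices, billing history, or amounts owed.",
    "Do NOT use when the user asks about payments already made (use search_payments)",
    "or about subscription plans (use get_subscription).",
  ].join(" "),
};
```

The negatives are what cut misrouting: they tell the model where the tool's boundary is, not just what it can do.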

The Debate: Where They Diverge

Lazy Loading: Valuable or Misguided?

  • Alpha: Lazy loading is good, optimize further with selective tool registration
  • Beta: "Lazy loading optimizes the wrong thing — token budget is the bottleneck"
  • Gamma: "Cap active tools at 15-20 per interaction"

Resolution: Lazy loading helps with startup time but doesn't solve the token problem. Need BOTH: lazy loading for code + dynamic tool filtering for context. Only surface tools relevant to the current conversation.
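The dynamic-filtering half of that resolution might look like the sketch below: score each tool's keywords against the current conversation and surface only the top `cap` (Gamma's 15-20 cap). Keyword overlap is a stand-in for whatever relevance signal the factory would actually use; the shapes and names are assumptions:

```typescript
// Context-aware tool filtering sketch. Keyword-overlap scoring is an
// illustrative placeholder for a real relevance signal.
interface ToolMeta {
  name: string;
  keywords: string[];
}

function selectTools(tools: ToolMeta[], conversation: string, cap = 15): ToolMeta[] {
  const text = conversation.toLowerCase();
  return tools
    .map((t) => ({
      t,
      score: t.keywords.filter((k) => text.includes(k.toLowerCase())).length,
    }))
    .filter((x) => x.score > 0)      // drop tools with no relevance signal
    .sort((a, b) => b.score - a.score)
    .slice(0, cap)                   // hard cap on active tools per interaction
    .map((x) => x.t);
}
```

Every tool kept out of the registration list is tokens returned to the conversation budget, which is the bottleneck Beta identified.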

APP_DATA Pattern: Fix or Replace?

  • Alpha: It's proprietary and conflated with MCP protocol. Should use structuredContent.
  • Beta: It's fragile — LLMs produce bad JSON in HTML comments. Need robust parsing.
  • Gamma: Official MCP Apps extension supersedes it.

Resolution: Short-term: make the parser more robust (Beta's point). Medium-term: adopt structuredContent as the data transport (Alpha's point). Long-term: support official MCP Apps protocol alongside our custom one (Gamma's point).
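The short-term step could be sketched as below. This assumes APP_DATA is JSON embedded in an HTML comment (per Beta's description); the exact marker syntax and the repair passes are assumptions, covering the two most common LLM JSON mistakes:

```typescript
// Defensive parse of an APP_DATA payload embedded in an HTML comment.
// The <!--APP_DATA ...--> marker format is assumed for illustration.
function parseAppData(html: string): unknown | null {
  const match = html.match(/<!--\s*APP_DATA\s*([\s\S]*?)-->/);
  if (!match) return null;
  let raw = match[1].trim();
  try {
    return JSON.parse(raw);
  } catch {
    // Light repair pass before giving up: smart quotes and trailing commas.
    raw = raw
      .replace(/[\u201C\u201D]/g, '"')  // curly double quotes -> straight
      .replace(/,\s*([}\]])/g, "$1");   // trailing commas before } or ]
    try {
      return JSON.parse(raw);
    } catch {
      return null; // caller falls back to rendering without data
    }
  }
}
```

Returning `null` rather than throwing keeps a malformed payload from taking down the whole app render.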

How Much Testing Is Enough?

  • Alpha: Add protocol compliance testing (MCP Inspector)
  • Beta: Need Jest + Playwright automation. Manual doesn't scale.
  • Gamma: Need quantitative metrics (>95% tool correctness rate) + regression baselines

Resolution: All three are right at different layers. Build a 4-tier automated test stack: MCP Inspector (protocol) → Jest (unit) → Playwright (visual) → Fixture-based routing tests (functional).
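The functional tier's core metric is easy to pin down. A sketch of Gamma's Tool Correctness Rate over routing fixtures, where the fixture shape is assumed and `route` stands in for whatever actually picks the tool (the system under test):

```typescript
// Fixture-driven routing metric: each fixture pairs a user prompt with the
// tool the model should pick; the metric is the fraction it gets right.
// The fixture shape is an assumption; Gamma's target is > 0.95.
interface RoutingFixture {
  prompt: string;
  expectedTool: string;
}

function toolCorrectnessRate(
  fixtures: RoutingFixture[],
  route: (prompt: string) => string,
): number {
  if (fixtures.length === 0) return 0;
  const correct = fixtures.filter((f) => route(f.prompt) === f.expectedTool).length;
  return correct / fixtures.length;
}
```

Run against the fixture library per server, the rate becomes the regression baseline: a description change that drops it below threshold fails CI instead of being caught by hand.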


Consolidated Priority Actions

TIER 1 — Before Shipping Next Server (1-2 days)

| # | Action | Source | Effort |
|---|--------|--------|--------|
| 1 | Fix WCAG contrast: #96989d → #b0b2b8 in all app templates | Gamma | 30 min |
| 2 | Add request timeouts (AbortController, 30s default) to server template | Beta | 30 min |
| 3 | Add "do NOT use when" disambiguation to tool description formula | Gamma | 2 hrs |
| 4 | Pin SDK to ^1.25.0, Zod to ^3.25.0 | Alpha | 15 min |
| 5 | Add title field to all tool definitions | Alpha | 1 hr |
| 6 | Add circuit breaker to API client template | Beta | 2 hrs |
| 7 | Add structured logging to server template | Beta | 1 hr |
| 8 | Add error boundaries to all app templates | Gamma | 1 hr |
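Row 2's timeout can be sketched as a small wrapper: an AbortController whose signal is handed to the request, aborted if the work outlives the budget. The helper name and the generic shape are illustrative:

```typescript
// Timeout wrapper sketch for row 2: abort any signal-aware async work
// (typically a fetch) that outlives `ms`. Name and shape are illustrative.
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms = 30_000,
): Promise<T> {
  const ctrl = new AbortController();
  const timer = setTimeout(() => ctrl.abort(), ms);
  try {
    return await work(ctrl.signal);
  } finally {
    clearTimeout(timer); // always clear so the process can exit cleanly
  }
}

// usage: await withTimeout((signal) => fetch(url, { signal }), 30_000)
```

Without this, one hanging upstream API pins the tool call indefinitely, which is exactly the failure Beta flagged.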

TIER 2 — Before the 30-Server Push (1 week)

| # | Action | Source | Effort |
|---|--------|--------|--------|
| 9 | Add structuredContent + outputSchema to server builder | Alpha | 4 hrs |
| 10 | Build automated QA framework (Jest + Playwright) | Beta+Gamma | 2 days |
| 11 | Create test data fixtures library (per app type) | Gamma | 4 hrs |
| 12 | Add quantitative QA metrics (tool correctness, task completion) | Gamma | 4 hrs |
| 13 | Add integration validation script (cross-reference all 4 files) | Beta | 3 hrs |
| 14 | Add interactive patterns to apps (sort, filter, copy, expand/collapse) | Gamma | 1 day |
| 15 | Improve system prompt engineering (routing rules, few-shot examples, negatives) | Gamma | 4 hrs |
| 16 | Add Streamable HTTP transport option | Alpha | 4 hrs |

TIER 3 — During/After 30-Server Push (2-4 weeks)

| # | Action | Source | Effort |
|---|--------|--------|--------|
| 17 | Support official MCP Apps extension (_meta.ui.resourceUri) | Alpha+Gamma | 1 week |
| 18 | Implement dynamic tool filtering (context-aware registration) | Beta+Gamma | 3 days |
| 19 | Add Elicitation support | Alpha | 2 days |
| 20 | Explore OpenAPI-to-MCP automation for existing servers | Beta | 3 days |
| 21 | Add visual regression baselines (BackstopJS) | Gamma | 2 days |
| 22 | Add data visualization primitives (line charts, sparklines, donuts) | Gamma | 3 days |
| 23 | Implement MCP gateway layer for LocalBosses | Beta | 1-2 weeks |
| 24 | Pipeline resumability (checkpoints, idempotent phases) | Beta | 1 day |
| 25 | Add accessibility testing (axe-core, keyboard nav) | Gamma | 2 days |

TIER 4 — Future / Nice-to-Have

| # | Action | Source |
|---|--------|--------|
| 26 | SDK v2 migration plan | Alpha |
| 27 | Non-REST API support (GraphQL, SOAP) | Beta |
| 28 | Bidirectional app communication (sendToHost) | Gamma |
| 29 | Tasks (async operations) support | Alpha |
| 30 | Centralized secret management | Beta |
| 31 | App micro-interactions (staggered animations) | Gamma |
| 32 | Multi-tenant considerations | Beta |

Key Numbers

  • 3 major MCP spec features missing (structuredContent, Elicitation, Tasks)
  • 30% misrouting reduction possible with "do NOT use when" disambiguation
  • 10K+ tokens consumed by 50+ tool definitions (the real bottleneck)
  • 3.7:1 contrast ratio on secondary text (needs 4.5:1 for WCAG AA)
  • 300+ manual test cases needed for 30 servers (need automation)
  • SDK v1.26.0 is current (we reference v1.x vaguely)

All three reviews are saved in mcp-factory-reviews/ for reference.