MCP Factory Review — Synthesis & Debate Summary

Date: February 4, 2026
Reviewers: Alpha (Protocol), Beta (Production), Gamma (AI/UX)
Total findings: ~48 unique recommendations across 3 reviews


Where All Three Agree (The No-Brainers)

1. Testing/QA Is the Weakest Skill

  • Alpha: No MCP protocol compliance testing at all
  • Beta: "Everything is manual. 30 servers × 10 apps = 300 things to manually verify. This doesn't scale."
  • Gamma: "It's a manual checklist masquerading as a testing framework." No quantitative metrics, no regression baselines, no automated tests.

Verdict: QA needs a complete overhaul — automated test framework, quantitative metrics, fixture data, regression baselines.

2. MCP Spec Has Moved Past Our Skills

  • Alpha: Missing structuredContent/outputSchema, Elicitation, and Tasks — three major spec features added since June 2025
  • Beta: APP_DATA format is fragile (LLMs produce bad JSON), should use proper structured output
  • Gamma: Official MCP Apps extension (Jan 2026) with ui:// URIs makes our iframe/postMessage pattern semi-obsolete

Verdict: Our skills are built against ~March 2025 spec. Need to update for the November 2025 spec + January 2026 MCP Apps extension.

3. Tool Descriptions Are Insufficient

  • Alpha: Missing title field, no outputSchema declarations
  • Beta: Descriptions are too verbose for token budgets
  • Gamma: Need "do NOT use when" disambiguation — reduces misrouting ~30%

Verdict: Tool descriptions are the #1 lever for quality. Add negative disambiguation, add title field, optimize for token budget.

4. Apps Are Display-Only

  • Beta: Flagged the absence of interactive patterns as a gap
  • Gamma: "No drag-and-drop, no inline editing, no search-within-app. Apps feel like screenshots, not tools."

Verdict: Need at minimum: client-side sort, filter, copy-to-clipboard, expand/collapse.
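A minimal sketch of what the baseline interactivity could look like as reusable client-side helpers for the app templates. The names (`sortRows`, `filterRows`) and the `Row` shape are illustrative, not from the factory codebase:

```typescript
// Minimal client-side table helpers an app template could embed.
// Names and types here are illustrative, not from the factory codebase.
type Row = Record<string, string | number>;

// Return a new array sorted by `key`; does not mutate the input.
function sortRows(rows: Row[], key: string, dir: "asc" | "desc" = "asc"): Row[] {
  const sign = dir === "asc" ? 1 : -1;
  return [...rows].sort((a, b) =>
    a[key] < b[key] ? -sign : a[key] > b[key] ? sign : 0,
  );
}

// Case-insensitive substring match across all cell values.
function filterRows(rows: Row[], query: string): Row[] {
  const q = query.toLowerCase();
  return rows.filter((r) =>
    Object.values(r).some((v) => String(v).toLowerCase().includes(q)),
  );
}
```

Copy-to-clipboard and expand/collapse are a few lines more (`navigator.clipboard.writeText`, a `hidden` toggle), so the whole set fits in the template footprint.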


Unique High-Impact Insights Per Agent

Alpha's Gems (Protocol):

  • SDK v1.26.0 is current — we should pin ^1.25.0 minimum, not ^1.0.0
  • Streamable HTTP is the recommended production transport — we only cover stdio
  • structuredContent + outputSchema is THE proper way to send typed data to apps
  • SDK v2 split coming Q1 2026 — need migration plan

Beta's Gems (Production):

  • Token budget is the real bottleneck, not memory — 50+ tools = 10K+ tokens just in definitions
  • Circuit breaker pattern is missing — retry without circuit breaker amplifies failures
  • No request timeouts — a hanging API blocks the tool indefinitely
  • MCP Gateway pattern — industry standard for managing multiple servers at scale
  • OpenAPI-to-MCP automation — tools exist to auto-generate servers from specs (10x speedup potential)
  • Pipeline resumability — if an agent crashes mid-phase, there's no checkpoint to resume from
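Beta's circuit-breaker point can be sketched concretely. This is an illustrative minimal breaker, not the factory's template code; the class name, threshold, and cooldown defaults are all assumptions:

```typescript
// Minimal circuit breaker: after `threshold` consecutive failures the breaker
// opens and rejects calls immediately until `cooldownMs` has elapsed, so a
// retrying caller stops hammering a downed API. Names/defaults are illustrative.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  private isOpen(now: number): boolean {
    return this.failures >= this.threshold && now - this.openedAt < this.cooldownMs;
  }

  // `now` is injectable for testability; defaults to the wall clock.
  async call<T>(fn: () => Promise<T>, now = Date.now()): Promise<T> {
    if (this.isOpen(now)) throw new Error("circuit open: failing fast");
    try {
      const result = await fn();
      this.failures = 0; // any success closes the breaker
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = now;
      throw err;
    }
  }
}
```

Retry logic would wrap `call(...)`; the breaker is what keeps the retries from amplifying an outage.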

Gamma's Gems (AI/UX):

  • "Do NOT use when" in tool descriptions — single highest-impact improvement per Paragon research
  • WCAG contrast failure — #96989d secondary text fails AA at 3.7:1 (needs 4.5:1, fix: #b0b2b8)
  • Quantitative QA metrics — Tool Correctness Rate, Task Completion Rate, not just pass/fail checklists
  • Test data fixtures — standardized sample data per app type, including edge cases and adversarial data
  • System prompts need structured tool routing rules, not just "describe capabilities"
  • BackstopJS for visual regression — pixel-diff screenshot comparison
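To make Gamma's highest-impact item concrete, here is a hypothetical tool definition showing the pattern: a `title` (Alpha's point), a capability sentence, and explicit "Do NOT use when" negatives. The tool name and sibling tool names are invented for illustration:

```typescript
// Hypothetical tool definition illustrating negative disambiguation.
// `search_invoices`, `search_payments`, and `get_subscription` are made-up names.
const searchInvoicesTool = {
  name: "search_invoices",
  title: "Search Invoices", // human-readable title for client display
  description: [
    "Search billing invoices by customer, date range, or status.",
    "Use when the user asks about invoices, billing history, or amounts owed.",
    "Do NOT use when the user asks about payments already made (use search_payments)",
    "or about subscription plans (use get_subscription).",
  ].join(" "),
};
```

The negatives are what cut misrouting: they tell the model where the tool's boundary is, not just what it can do.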

The Debate: Where They Diverge

Lazy Loading: Valuable or Misguided?

  • Alpha: Lazy loading is good, optimize further with selective tool registration
  • Beta: "Lazy loading optimizes the wrong thing — token budget is the bottleneck"
  • Gamma: "Cap active tools at 15-20 per interaction"

Resolution: Lazy loading helps with startup time but doesn't solve the token problem. Need BOTH: lazy loading for code + dynamic tool filtering for context. Only surface tools relevant to the current conversation.
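The dynamic-filtering half of that resolution might look like the sketch below: score each tool's keywords against the current conversation and surface only the top `cap` (Gamma's 15-20 cap). Keyword overlap is a stand-in for whatever relevance signal the factory would actually use; the shapes and names are assumptions:

```typescript
// Context-aware tool filtering sketch. Keyword-overlap scoring is an
// illustrative placeholder for a real relevance signal.
interface ToolMeta {
  name: string;
  keywords: string[];
}

function selectTools(tools: ToolMeta[], conversation: string, cap = 15): ToolMeta[] {
  const text = conversation.toLowerCase();
  return tools
    .map((t) => ({
      t,
      score: t.keywords.filter((k) => text.includes(k.toLowerCase())).length,
    }))
    .filter((x) => x.score > 0)      // drop tools with no relevance signal
    .sort((a, b) => b.score - a.score)
    .slice(0, cap)                   // hard cap on active tools per interaction
    .map((x) => x.t);
}
```

Every tool kept out of the registration list is tokens returned to the conversation budget, which is the bottleneck Beta identified.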

APP_DATA Pattern: Fix or Replace?

  • Alpha: It's proprietary and conflated with MCP protocol. Should use structuredContent.
  • Beta: It's fragile — LLMs produce bad JSON in HTML comments. Need robust parsing.
  • Gamma: Official MCP Apps extension supersedes it.

Resolution: Short-term: make the parser more robust (Beta's point). Medium-term: adopt structuredContent as the data transport (Alpha's point). Long-term: support official MCP Apps protocol alongside our custom one (Gamma's point).
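The short-term step could be sketched as below. This assumes APP_DATA is JSON embedded in an HTML comment (per Beta's description); the exact marker syntax and the repair passes are assumptions, covering the two most common LLM JSON mistakes:

```typescript
// Defensive parse of an APP_DATA payload embedded in an HTML comment.
// The <!--APP_DATA ...--> marker format is assumed for illustration.
function parseAppData(html: string): unknown | null {
  const match = html.match(/<!--\s*APP_DATA\s*([\s\S]*?)-->/);
  if (!match) return null;
  let raw = match[1].trim();
  try {
    return JSON.parse(raw);
  } catch {
    // Light repair pass before giving up: smart quotes and trailing commas.
    raw = raw
      .replace(/[\u201C\u201D]/g, '"')  // curly double quotes -> straight
      .replace(/,\s*([}\]])/g, "$1");   // trailing commas before } or ]
    try {
      return JSON.parse(raw);
    } catch {
      return null; // caller falls back to rendering without data
    }
  }
}
```

Returning `null` rather than throwing keeps a malformed payload from taking down the whole app render.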

How Much Testing Is Enough?

  • Alpha: Add protocol compliance testing (MCP Inspector)
  • Beta: Need Jest + Playwright automation. Manual doesn't scale.
  • Gamma: Need quantitative metrics (>95% tool correctness rate) + regression baselines

Resolution: All three are right at different layers. Build a 4-tier automated test stack: MCP Inspector (protocol) → Jest (unit) → Playwright (visual) → Fixture-based routing tests (functional).
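The functional tier's core metric is easy to pin down. A sketch of Gamma's Tool Correctness Rate over routing fixtures, where the fixture shape is assumed and `route` stands in for whatever actually picks the tool (the system under test):

```typescript
// Fixture-driven routing metric: each fixture pairs a user prompt with the
// tool the model should pick; the metric is the fraction it gets right.
// The fixture shape is an assumption; Gamma's target is > 0.95.
interface RoutingFixture {
  prompt: string;
  expectedTool: string;
}

function toolCorrectnessRate(
  fixtures: RoutingFixture[],
  route: (prompt: string) => string,
): number {
  if (fixtures.length === 0) return 0;
  const correct = fixtures.filter((f) => route(f.prompt) === f.expectedTool).length;
  return correct / fixtures.length;
}
```

Run against the fixture library per server, the rate becomes the regression baseline: a description change that drops it below threshold fails CI instead of being caught by hand.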


Consolidated Priority Actions

TIER 1 — Before Shipping Next Server (1-2 days)

| # | Action | Source | Effort |
|---|--------|--------|--------|
| 1 | Fix WCAG contrast: #96989d → #b0b2b8 in all app templates | Gamma | 30 min |
| 2 | Add request timeouts (AbortController, 30s default) to server template | Beta | 30 min |
| 3 | Add "do NOT use when" disambiguation to tool description formula | Gamma | 2 hrs |
| 4 | Pin SDK to ^1.25.0, Zod to ^3.25.0 | Alpha | 15 min |
| 5 | Add title field to all tool definitions | Alpha | 1 hr |
| 6 | Add circuit breaker to API client template | Beta | 2 hrs |
| 7 | Add structured logging to server template | Beta | 1 hr |
| 8 | Add error boundaries to all app templates | Gamma | 1 hr |
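Row 2's timeout can be sketched as a small wrapper: an AbortController whose signal is handed to the request, aborted if the work outlives the budget. The helper name and the generic shape are illustrative:

```typescript
// Timeout wrapper sketch for row 2: abort any signal-aware async work
// (typically a fetch) that outlives `ms`. Name and shape are illustrative.
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms = 30_000,
): Promise<T> {
  const ctrl = new AbortController();
  const timer = setTimeout(() => ctrl.abort(), ms);
  try {
    return await work(ctrl.signal);
  } finally {
    clearTimeout(timer); // always clear so the process can exit cleanly
  }
}

// usage: await withTimeout((signal) => fetch(url, { signal }), 30_000)
```

Without this, one hanging upstream API pins the tool call indefinitely, which is exactly the failure Beta flagged.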

TIER 2 — Before the 30-Server Push (1 week)

| # | Action | Source | Effort |
|---|--------|--------|--------|
| 9 | Add structuredContent + outputSchema to server builder | Alpha | 4 hrs |
| 10 | Build automated QA framework (Jest + Playwright) | Beta+Gamma | 2 days |
| 11 | Create test data fixtures library (per app type) | Gamma | 4 hrs |
| 12 | Add quantitative QA metrics (tool correctness, task completion) | Gamma | 4 hrs |
| 13 | Add integration validation script (cross-reference all 4 files) | Beta | 3 hrs |
| 14 | Add interactive patterns to apps (sort, filter, copy, expand/collapse) | Gamma | 1 day |
| 15 | Improve system prompt engineering (routing rules, few-shot examples, negatives) | Gamma | 4 hrs |
| 16 | Add Streamable HTTP transport option | Alpha | 4 hrs |

TIER 3 — During/After 30-Server Push (2-4 weeks)

| # | Action | Source | Effort |
|---|--------|--------|--------|
| 17 | Support official MCP Apps extension (_meta.ui.resourceUri) | Alpha+Gamma | 1 week |
| 18 | Implement dynamic tool filtering (context-aware registration) | Beta+Gamma | 3 days |
| 19 | Add Elicitation support | Alpha | 2 days |
| 20 | Explore OpenAPI-to-MCP automation for existing servers | Beta | 3 days |
| 21 | Add visual regression baselines (BackstopJS) | Gamma | 2 days |
| 22 | Add data visualization primitives (line charts, sparklines, donuts) | Gamma | 3 days |
| 23 | Implement MCP gateway layer for LocalBosses | Beta | 1-2 weeks |
| 24 | Pipeline resumability (checkpoints, idempotent phases) | Beta | 1 day |
| 25 | Add accessibility testing (axe-core, keyboard nav) | Gamma | 2 days |

TIER 4 — Future / Nice-to-Have

| # | Action | Source |
|---|--------|--------|
| 26 | SDK v2 migration plan | Alpha |
| 27 | Non-REST API support (GraphQL, SOAP) | Beta |
| 28 | Bidirectional app communication (sendToHost) | Gamma |
| 29 | Tasks (async operations) support | Alpha |
| 30 | Centralized secret management | Beta |
| 31 | App micro-interactions (staggered animations) | Gamma |
| 32 | Multi-tenant considerations | Beta |

Key Numbers

  • 3 major MCP spec features missing (structuredContent, Elicitation, Tasks)
  • 30% misrouting reduction possible with "do NOT use when" disambiguation
  • 10K+ tokens consumed by 50+ tool definitions (the real bottleneck)
  • 3.7:1 contrast ratio on secondary text (needs 4.5:1 for WCAG AA)
  • 300+ manual test cases needed for 30 servers (need automation)
  • SDK v1.26.0 is current (we reference v1.x vaguely)

All three reviews are saved in mcp-factory-reviews/ for reference.