# Research: How MCP Servers Handle Rich Interactive UIs

**Date:** 2026-02-03
**Source:** `github.com/modelcontextprotocol/ext-apps` (official repo) + MCP Apps blog/docs

---

## Executive Summary

**There is ONE canonical pattern for UI-to-server communication in MCP Apps: `callServerTool`.** Every example that needs server data after the initial render calls `app.callServerTool()` from the client side. None of the examples uses an alternative server-communication pattern. There are, however, 3 complementary APIs for communicating with the HOST (not the server): `updateModelContext`, `sendMessage`, and `openLink`.

The key insight: **Interactivity in MCP Apps is entirely client-side JavaScript.** The server sends the initial data via the tool result, and the UI is a fully interactive web app running in a sandboxed iframe. Buttons, sliders, drag-and-drop, and forms all work natively because it's just HTML/JS in an iframe. The only time you need `callServerTool` is when you need FRESH DATA from the server.

---

## The 5 Proven Interactivity Patterns

### Pattern 1: Client-Side State Only (Most Common)
**Used by:** budget-allocator, scenario-modeler, customer-segmentation, cohort-heatmap

The server sends ALL data upfront via `structuredContent` in the tool result. The UI is fully interactive using only local JavaScript state, with no server calls needed after the initial load.

```
Server → ontoolresult(structuredContent) → UI renders with local state
User interacts → local JS handles everything (sliders, charts, forms)
```

**Example: Budget Allocator.** The server sends categories, history, and benchmarks. Sliders, charts, and percentile badges all update locally, with zero `callServerTool` calls during interaction.

**Example: Scenario Modeler.** The server sends templates + default inputs. User adjusts sliders → React recalculates projections locally using `useMemo`. No server round-trips.

**Key takeaway:** If you can send all needed data upfront, do it. This is the simplest and most responsive pattern.

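As a minimal sketch of this pattern (not the budget-allocator source), the snippet below derives everything the UI shows from one payload received at load time. The `BudgetPayload` shape and the `allocate` helper are hypothetical names chosen for illustration; the assumption is that a slider handler simply calls `allocate` again with new weights and re-renders.

```typescript
// Hypothetical shape of the data a server might send once in structuredContent.
interface BudgetPayload {
  total: number;
  categories: string[];
}

// Recompute per-category amounts from slider weights, entirely on the client.
// Weights are relative, non-negative, and aligned with payload.categories.
function allocate(payload: BudgetPayload, weights: number[]): Record<string, number> {
  const sum = weights.reduce((acc, w) => acc + w, 0);
  const result: Record<string, number> = {};
  payload.categories.forEach((name, i) => {
    const weight = weights[i] ?? 0;
    // With all sliders at zero, fall back to an even split.
    const share = sum > 0 ? weight / sum : 1 / payload.categories.length;
    result[name] = payload.total * share;
  });
  return result;
}

// The payload arrives once; every later interaction reuses it.
const payload: BudgetPayload = { total: 1000, categories: ["ads", "events", "tools"] };
const initial = allocate(payload, [1, 1, 2]); // first render
const afterDrag = allocate(payload, [3, 1, 0]); // user moved two sliders
console.log(initial, afterDrag);
```

Because `allocate` is a pure function of the payload and the current slider values, it can be wrapped in `useMemo` or called directly from an `input` event listener without any server involvement.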
### Pattern 2: callServerTool for On-Demand Data
**Used by:** wiki-explorer, system-monitor, pdf-server

The UI calls `app.callServerTool()` when it needs data the server didn't send initially. This is the ONLY way to get fresh data from the server.

```
User clicks "Expand" → app.callServerTool({ name: "get-first-degree-links", arguments: { url } })
Server returns structuredContent → UI updates graph
```

**Example: Wiki Explorer.** The initial tool result gives the first page's links. When the user clicks "Expand" on a node, the UI calls `callServerTool` to fetch that page's links. The graph visualization (force-directed d3) is entirely client-side.

**Example: System Monitor.** Two tools:
- `get-system-info` (model-visible): Returns static config once
- `poll-system-stats` (app-only, `visibility: ["app"]`): Returns live metrics

The UI polls `poll-system-stats` via `setInterval` + `callServerTool` every 2 seconds.

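The expand-on-click flow can be sketched as follows. This is an illustration under assumptions, not the wiki-explorer source: `ToolCaller` is a hypothetical stand-in for the one method of the app object used here, and the `{ links: string[] }` payload shape is assumed. Only the tool name `get-first-degree-links` comes from the example above.

```typescript
// Hypothetical minimal view of the app object; the real SDK type is richer.
interface ToolCaller {
  callServerTool(req: {
    name: string;
    arguments: Record<string, unknown>;
  }): Promise<{ structuredContent?: unknown }>;
}

type Graph = Map<string, Set<string>>; // page URL -> linked page URLs

// Merge newly fetched links into the graph, ignoring duplicates.
// Returns how many edges were actually new.
function mergeLinks(graph: Graph, from: string, links: string[]): number {
  const edges = graph.get(from) ?? new Set<string>();
  graph.set(from, edges);
  let added = 0;
  for (const link of links) {
    if (!edges.has(link)) {
      edges.add(link);
      added += 1;
    }
  }
  return added;
}

// Called from the "Expand" click handler: one server round-trip per expansion.
async function expandNode(app: ToolCaller, graph: Graph, url: string): Promise<number> {
  const result = await app.callServerTool({
    name: "get-first-degree-links",
    arguments: { url },
  });
  // Assumed payload shape: { links: string[] }. Validate before trusting it.
  const raw = (result.structuredContent as { links?: unknown } | undefined)?.links;
  const links = Array.isArray(raw)
    ? raw.filter((l): l is string => typeof l === "string")
    : [];
  return mergeLinks(graph, url, links);
}
```

Keeping `mergeLinks` separate from the network call means the graph logic can be exercised with hardcoded data, while `expandNode` stays a thin wrapper around the single server round-trip.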
### Pattern 3: App-Only Tools with `visibility: ["app"]`
**Used by:** system-monitor, pdf-server

Tools marked `visibility: ["app"]` are hidden from the LLM, so only the UI can call them. This is critical for polling, pagination, and UI-driven actions that shouldn't clutter model context.

```ts
// Server: register an app-only tool
registerAppTool(server, "poll-system-stats", {
  _meta: { ui: { visibility: ["app"] } }, // Hidden from the model
  // ...
});

// Client: call it on an interval
setInterval(async () => {
  try {
    const result = await app.callServerTool({ name: "poll-system-stats", arguments: {} });
    updateUI(result.structuredContent);
  } catch (err) {
    // Without this, a failed poll surfaces as an unhandled promise rejection
    console.error("poll-system-stats failed", err);
  }
}, 2000);
```

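A bare `setInterval` will start a new request even if the previous one is still pending. The sketch below is one possible way to add start/stop control and an in-flight guard; `createPoller` is a hypothetical helper, not part of the SDK, and `fetchStats` stands in for a function that wraps `app.callServerTool({ name: "poll-system-stats", arguments: {} })`.

```typescript
// Start/stop poller that never overlaps requests.
function createPoller<T>(
  fetchStats: () => Promise<T>,
  onStats: (stats: T) => void,
  intervalMs = 2000,
) {
  let timer: ReturnType<typeof setInterval> | undefined;
  let inFlight = false;

  // One poll. Skipped if the previous request has not finished yet.
  async function tick(): Promise<void> {
    if (inFlight) return;
    inFlight = true;
    try {
      onStats(await fetchStats());
    } catch (err) {
      console.error("poll failed", err); // keep polling after a transient error
    } finally {
      inFlight = false;
    }
  }

  return {
    tick,
    start(): void {
      if (timer === undefined) timer = setInterval(() => void tick(), intervalMs);
    },
    stop(): void {
      if (timer !== undefined) clearInterval(timer);
      timer = undefined;
    },
    isRunning: (): boolean => timer !== undefined,
  };
}
```

A "Start/Stop" button in the UI would map directly onto `start()` and `stop()`, and calling `tick()` once before `start()` avoids waiting a full interval for the first reading.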
### Pattern 4: updateModelContext (Inform the Model)
**Used by:** map-server, transcript-server

`app.updateModelContext()` pushes context TO the model without triggering a response. It tells the model what the user is currently seeing/doing.

**Example: Map Server.** When the user pans the map, a debounced handler sends location + screenshot:
```ts
app.updateModelContext({
  content: [
    { type: "text", text: `Map centered on [${lat}, ${lon}], ${widthKm}km wide` },
    { type: "image", data: screenshotBase64, mimeType: "image/png" }
  ]
});
```

**Example: Transcript Server.** Updates model context with unsent transcript text using YAML frontmatter:
```ts
app.updateModelContext({
  content: [{ type: "text", text: `---\nstatus: listening\nunsent-entries: 3\n---\n\n${text}` }]
});
```

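The map example relies on a debounced handler so that a burst of pan events produces a single context update. A minimal sketch of that idea, under stated assumptions: `debounce`, `MapView`, `describeView`, and `push` are hypothetical names, and in a real app `push` would call `app.updateModelContext({ content: [{ type: "text", text }] })` instead of appending to an array.

```typescript
// Trailing-edge debounce: runs `fn` once, `waitMs` after the last call.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const debounced = (...args: A): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
  // Drop any pending call, e.g. when the view is torn down.
  const cancel = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
  };
  return Object.assign(debounced, { cancel });
}

// Hypothetical view state for a map UI.
interface MapView {
  lat: number;
  lon: number;
  widthKm: number;
}

// Text that would go into the content array of a context update.
function describeView(v: MapView): string {
  return `Map centered on [${v.lat.toFixed(4)}, ${v.lon.toFixed(4)}], ${v.widthKm}km wide`;
}

// Stand-in for the context update; records what would have been sent.
const sent: string[] = [];
const push = (text: string): void => {
  sent.push(text);
};

// Attach this to the map's move event; only the final position is reported.
const onMapMove = debounce((v: MapView) => push(describeView(v)), 500);
```

The 500 ms wait is an arbitrary choice for the sketch; the right value depends on how quickly the model should learn about a new viewport versus how much context traffic is acceptable.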
### Pattern 5: sendMessage (Trigger Model Response)
**Used by:** transcript-server

`app.sendMessage()` sends a message AS the user, triggering a model response. Used when the UI wants the model to act on accumulated data.

```ts
await app.sendMessage({
  role: "user",
  content: [{ type: "text", text: transcriptText }]
});
```

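"Accumulated data" implies some buffering on the client before the message is sent. The sketch below shows one way that could look; `TranscriptBuffer` and `MessageSender` are hypothetical names for illustration, not taken from the transcript-server source, and the only SDK call assumed is `sendMessage` with the argument shape shown above.

```typescript
// Hypothetical minimal view of the app object for this sketch.
interface MessageSender {
  sendMessage(msg: {
    role: "user";
    content: { type: "text"; text: string }[];
  }): Promise<unknown>;
}

class TranscriptBuffer {
  private entries: string[] = [];

  // Store a non-empty, trimmed entry.
  add(entry: string): void {
    const trimmed = entry.trim();
    if (trimmed.length > 0) this.entries.push(trimmed);
  }

  // Number of entries not yet sent.
  get pending(): number {
    return this.entries.length;
  }

  // Send everything accumulated so far as one user message.
  // Entries are cleared only after sendMessage resolves, so a failed
  // send leaves the buffer intact for a retry.
  async flush(app: MessageSender): Promise<boolean> {
    if (this.entries.length === 0) return false;
    const batch = this.entries.slice();
    await app.sendMessage({
      role: "user",
      content: [{ type: "text", text: batch.join("\n") }],
    });
    // Keep anything that was added while the send was in progress.
    this.entries = this.entries.slice(batch.length);
    return true;
  }
}
```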
---

## How Specific Examples Handle Interactivity

| Example | Interactive Elements | Pattern Used | callServerTool? |
|---------|---------------------|-------------|----------------|
| **Budget Allocator** | Sliders, dropdowns, charts, sparklines | Client-side state only | ❌ No |
| **Scenario Modeler** | Sliders, template selector, React charts | Client-side state only | ❌ No |
| **Cohort Heatmap** | Hover tooltips, data viz | Client-side state only | ❌ No |
| **Customer Segmentation** | Scatter chart, tooltips | Client-side state only | ❌ No |
| **Wiki Explorer** | Click nodes, expand graph, zoom, reset | callServerTool on user action | ✅ Yes |
| **System Monitor** | Start/stop polling, live charts | callServerTool polling | ✅ Yes (every 2s) |
| **Map Server** | Pan, zoom, fullscreen 3D globe | updateModelContext on move | ❌ No (data from tool input) |
| **Transcript** | Start/stop recording, send transcript | updateModelContext + sendMessage | ❌ No |
| **PDF Server** | Page navigation, zoom | callServerTool for chunks | ✅ Yes (chunked loading) |
| **ShaderToy** | WebGL shader rendering | ontoolinputpartial for streaming | ❌ No |

---

## What Does NOT Exist (Common Misconceptions)

1. **No WebSocket/SSE from server to UI.** Communication is request/response via `callServerTool` only.
2. **No server-push to UI.** The server cannot push data to the UI after the initial tool result. The UI must poll.
3. **No shared in-memory state between tool calls.** In the examples' stateless HTTP setup, each request creates a new server instance, so a tool call cannot rely on anything a previous call left in memory.
4. **No DOM access from server.** The server never touches the UI. It only returns data.
5. **No "re-render" from server.** Once the UI is in the iframe, it's autonomous. The server doesn't control it.

---

## The Minimal Working Interactive MCP App

```
1. Server registers tool with `_meta.ui.resourceUri`
2. Server registers resource that returns bundled HTML
3. HTML includes `<script>` that creates `new App()` and calls `app.connect()`
4. `app.ontoolresult` receives initial data from the tool call
5. UI renders with that data
6. User interactions handled by local JS event listeners
7. If fresh server data is needed: `app.callServerTool()`
8. If model needs to know what user did: `app.updateModelContext()`
```

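Steps 3 to 5 can be sketched on the client side as a small wiring function. `AppLike` is a hypothetical stand-in that models only the two handler slots and `connect()` named in this document; the real `App` class has more surface, and the `UiState` type and `wire` helper are illustrative names.

```typescript
// Hypothetical minimal view of the app object (an assumption, not the SDK type).
interface AppLike {
  ontoolinput?: (params: { arguments?: Record<string, unknown> }) => void;
  ontoolresult?: (result: { structuredContent?: unknown }) => void;
  connect(): Promise<void>;
}

type UiState =
  | { phase: "idle" }
  | { phase: "loading"; args: Record<string, unknown> }
  | { phase: "ready"; data: unknown };

// Handlers are attached before connect() as a precaution, so a
// notification that arrives right after connecting is not dropped.
function wire(app: AppLike, render: (state: UiState) => void): Promise<void> {
  render({ phase: "idle" });
  app.ontoolinput = (params) =>
    render({ phase: "loading", args: params.arguments ?? {} });
  app.ontoolresult = (result) =>
    render({ phase: "ready", data: result.structuredContent });
  return app.connect();
}
```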
---

## Recommendations for Our MCP App

### Problem: Interactive components lose functionality after rendering

This is likely because we're treating the UI as a static render rather than a live web app. In MCP Apps, the UI is a **full JavaScript application** running in an iframe. Event listeners, React state, and DOM manipulation all work normally.

### What We Should Adopt

1. **Send all data upfront via `structuredContent`.** Don't drip-feed data. Send everything the UI needs in the initial tool result. This is what budget-allocator and scenario-modeler do.

2. **Build the UI as a standalone web app.** Use Vite to bundle HTML+JS+CSS into a single file. The UI should work independently with hardcoded test data before integrating with MCP.

3. **Use `callServerTool` only for fresh data.** If the user needs to load more data (like wiki-explorer expanding nodes), create an app-only tool with `visibility: ["app"]`.

4. **Use `updateModelContext` to keep the model informed.** When users interact (drag items, fill forms), send a summary to the model so it knows the current state.

5. **Use `app.ontoolresult` + `app.ontoolinput`.** These are the entry points. `ontoolinput` fires with the tool arguments (can show loading state). `ontoolresult` fires with the actual data.

6. **Bundle everything with Vite + vite-plugin-singlefile.** Every example uses this pattern. The entire UI (HTML, JS, CSS) becomes a single HTML file served as a resource.

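Recommendations 1 and 2 combine naturally: if the UI validates whatever arrives in `structuredContent` and falls back to a hardcoded fixture, the same bundle runs both inside a host and in a plain browser tab. The sketch below assumes a made-up `DashboardData` shape; the type, the fixture, and the helper names are all hypothetical.

```typescript
// Hypothetical payload for illustration; replace with the app's real shape.
interface DashboardData {
  title: string;
  points: number[];
}

// Hardcoded fixture so the UI can be developed without a host.
const FIXTURE: DashboardData = { title: "Sample data", points: [3, 1, 4, 1, 5] };

// Runtime check: structuredContent arrives as untyped JSON.
function isDashboardData(value: unknown): value is DashboardData {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const points = record["points"];
  return (
    typeof record["title"] === "string" &&
    Array.isArray(points) &&
    points.every((p) => typeof p === "number")
  );
}

// Use the tool result when it validates, otherwise fall back to the fixture.
function resolveData(structuredContent: unknown): DashboardData {
  return isDashboardData(structuredContent) ? structuredContent : FIXTURE;
}
```

In an `ontoolresult` handler this would be called as `render(resolveData(result.structuredContent))`, while a standalone dev page would call `render(resolveData(undefined))`. Silently falling back is convenient during development; a production build might prefer to show an error state instead.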
### Architecture Template

```
Server (server.ts):
  - registerAppTool(server, "my-tool", { _meta: { ui: { resourceUri } } }, handler)
  - registerAppTool(server, "my-app-action", { _meta: { ui: { visibility: ["app"] } } }, handler)  // optional
  - registerAppResource(server, resourceUri, ..., returns bundled HTML)

Client (src/mcp-app.ts or .tsx):
  - const app = new App({ name, version })
  - app.ontoolinput = (params) => { /* show loading/preview */ }
  - app.ontoolresult = (result) => { /* render UI with result.structuredContent */ }
  - app.connect()
  - Event listeners for user interactions
  - app.callServerTool() when fresh server data is needed
  - app.updateModelContext() when the model needs to know about user actions
```