# MASTER PLAN: Interactive Agent Factory SaaS
## Codename: "GooseFactory" — Your AI Factory, Your Rules

> **Author:** Buba (synthesized from 4 specialized research agents)
> **Date:** 2026-02-06
> **Status:** PLAN — Awaiting Jake's Review
> **Supporting Research:** 4 docs, ~15,000 words, 60+ sources

---

## TL;DR — The 30-Second Pitch

Fork Goose (Block's open-source AI agent). Gut its chat UI. Wire in a **Factory Command Center** — a decision queue, pipeline kanban, and approval system that makes it painfully obvious when YOU are the bottleneck. The backend is an API + MCP server that exposes every factory operation as a conversational tool. You literally type "what needs my attention?" and get a prioritized list with one-click approve/reject. Everything that doesn't need you auto-advances. Everything that needs you screams at you until you act.

---

## 1. WHY THIS MATTERS

Right now the pipeline has ~64 MCP servers across 8 stages. The bottleneck isn't the AI — it's **you not knowing what's stuck on you**. The current system (Discord channels + cron heartbeats + manual checks) is passive. You have to go looking for what needs attention. That's backwards.

**The fix:** Build a system where decisions come to YOU, not the other way around. Make human-in-the-loop a first-class experience, not an afterthought.

---

## 2. ARCHITECTURE OVERVIEW

```
┌─────────────────────────────────────────────────────────────────┐
│                     YOUR INTERFACE LAYER                         │
│                                                                  │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │ GooseFactory │  │  Discord Bot  │  │   Mobile     │           │
│  │ Desktop App  │  │  (Buttons +   │  │  Push Notifs │           │
│  │ (Forked      │  │   Embeds)     │  │  (Quick      │           │
│  │  Goose)      │  │              │  │   Approve)   │           │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘           │
│         │                  │                  │                   │
│         └──────────────────┼──────────────────┘                   │
│                            │                                      │
│  ┌─────────────────────────▼─────────────────────────────┐      │
│  │            MCP Server (Factory Operations)             │      │
│  │  11 Tools · 6 Resources · 4 Prompts                    │      │
│  │  "what needs attention?" → prioritized decision queue   │      │
│  └─────────────────────────┬─────────────────────────────┘      │
└────────────────────────────┼────────────────────────────────────┘
                             │
┌────────────────────────────┼────────────────────────────────────┐
│                            ▼                                     │
│  ┌─────────────────────────────────────────────────────┐        │
│  │              Factory API (REST + WebSocket)          │        │
│  │  30+ endpoints · Real-time events · GraphQL queries  │        │
│  └─────────────────────────┬───────────────────────────┘        │
│                            │                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐       │
│  │Pipeline  │  │Task      │  │Notif +   │  │Audit     │       │
│  │Engine    │  │Queue     │  │Escalation│  │Logger    │       │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘       │
│                            │                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                      │
│  │PostgreSQL│  │Redis     │  │S3/R2     │                      │
│  │(State)   │  │(Events)  │  │(Assets)  │                      │
│  └──────────┘  └──────────┘  └──────────┘                      │
└─────────────────────────────────────────────────────────────────┘
```

---

## 3. THE GOOSE FORK — "GooseFactory"

### Why Goose?
- **Rust backend + Electron/React frontend** — production-grade, fast
- **Apache 2.0 license** — full commercial freedom, no copyleft
- **MCP-native** — already a first-class MCP host with dynamic extension discovery
- **Built-in permission system** — 4 modes including Smart Approval (risk-based)
- **Extension ecosystem** — thousands of MCP servers plug in immediately
- **Active community** — now under the Linux Foundation's Agentic AI Foundation (AAIF), so governance is stable

### What We Change

| Component | Current Goose | GooseFactory |
|-----------|--------------|--------------|
| **Branding** | Goose logos, `goose://` protocol | Your brand, `factory://` protocol |
| **Default Extensions** | Developer, Memory, etc. | Factory MCP Server (built-in), Pipeline Manager |
| **Chat UI** | General-purpose assistant | Factory Command Center with decision queue sidebar |
| **Approval Flow** | Simple allow/deny on tool calls | Rich approval cards with context, diffs, metrics |
| **System Prompts** | Generic agent instructions | Factory operator mode — knows about pipeline stages, MCPs |
| **MCP UI Rendering** | Basic inline/sidecar (WIP) | Custom approval UIs, pipeline dashboards, code review panels |
| **Protocol Handler** | `goose://extension?...` | `factory://approve?task_id=...` deep links |
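
The `factory://approve?task_id=...` deep link in the last row could be parsed along these lines. This is a minimal sketch, not Goose's actual protocol handler: `FactoryLink` and `parseFactoryLink` are hypothetical names, and only the `factory://<action>?key=value` shape comes from the table above.

```typescript
// Hypothetical shape for a parsed deep link; not part of Goose itself.
interface FactoryLink {
  action: string;                 // e.g. "approve"
  params: Record<string, string>; // e.g. { task_id: "abc-123" }
}

// Parse "factory://<action>?key=value&..." using plain string operations,
// so the sketch does not depend on a browser or Node URL implementation.
function parseFactoryLink(link: string): FactoryLink | null {
  const prefix = "factory://";
  if (!link.startsWith(prefix)) return null;

  const rest = link.slice(prefix.length);
  const qIndex = rest.indexOf("?");
  const action = qIndex === -1 ? rest : rest.slice(0, qIndex);
  if (action.length === 0) return null;

  const params: Record<string, string> = {};
  if (qIndex !== -1) {
    for (const pair of rest.slice(qIndex + 1).split("&")) {
      if (pair.length === 0) continue;
      const eq = pair.indexOf("=");
      const key = eq === -1 ? pair : pair.slice(0, eq);
      const value = eq === -1 ? "" : pair.slice(eq + 1);
      params[decodeURIComponent(key)] = decodeURIComponent(value);
    }
  }
  return { action, params };
}
```

Returning `null` for anything outside the `factory://` scheme keeps the handler from acting on links meant for another app.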

### Fork Strategy

1. **Clone the repo** — `git clone https://github.com/block/goose GooseFactory`
2. **Rebrand** — `package.json`, `main.ts`, assets, protocol handler (~1-2 days)
3. **Add Factory MCP Server** as a built-in Rust extension in `crates/goose-mcp/`
4. **Customize the chat UI** — Add decision queue sidebar in React (the interesting part)
5. **Add MCP UI components** — Custom approval cards using `@mcp-ui/client`
6. **Configure Smart Approval** — Factory operations auto-classified by risk level

### ⚠️ Timing Risk
Goose is actively migrating to ACP (Agent Client Protocol), tracked in Issue #6642. This replaces the backend REST+SSE with JSON-RPC 2.0 and affects `goosed` ↔ desktop communication. **Recommendation:** Fork AFTER the ACP migration lands (or fork now and track upstream).

---

## 4. EVERY MOMENT YOU'RE NEEDED (Taxonomy)

Based on research across 10+ agent products and frameworks, here's every type of human-in-the-loop moment mapped to your factory:

### 🔴 CRITICAL — Always Need You

| Moment | Factory Example | UI Pattern |
|--------|----------------|------------|
| **Deploy to Production** | Promoting an MCP to live | Modal overlay with deploy checklist |
| **API Key Entry** | Configuring Stripe/GHL credentials | Secure input form in chat |
| **Client Communication** | Sending deliverables to the $20k client | Preview + approve before send |
| **Pricing/Positioning** | Setting MCP marketplace pricing | Multi-choice card with tradeoffs |
| **Legal/License Review** | Checking dependency licenses | Sidebar review panel |

### 🟡 HIGH VALUE — Usually Need You

| Moment | Factory Example | UI Pattern |
|--------|----------------|------------|
| **Design Review** | Approving UI/UX for MCP apps | Side-by-side mockup comparison |
| **Code Quality Gate** | Reviewing generated MCP server code | Diff view with inline annotations |
| **Naming/Branding** | Naming a new MCP server | A/B choice between options |
| **Test Failure Triage** | GHL's 42 failing tests — fix or skip? | Error cards with suggested actions |
| **Priority Decisions** | Which MCP to advance next? | Drag-and-drop priority list |

### 🟢 CONTEXTUAL — Sometimes Need You

| Moment | Factory Example | UI Pattern |
|--------|----------------|------------|
| **Routine Approvals** | Stage advances for passing servers | Batch approve with exceptions |
| **Parameter Tuning** | Adjusting test coverage thresholds | Slider controls |
| **Edge Cases** | AI hit a wall building a tool | Escalation card with context |
| **Delegation** | Route task to specialized agent | Dropdown assignment |

### Smart Routing (Confidence-Based)
Not everything needs to block on you:
- **>90% confidence** → Auto-execute, log for async review
- **60-90% confidence** → Queue for review, pipeline continues other work
- **<60% confidence** → Block and escalate immediately
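
The three thresholds above translate into a small routing function. A minimal sketch, assuming confidence is reported as a number between 0 and 1; `Route` and `routeByConfidence` are illustrative names, not an existing API.

```typescript
type Route = "auto_execute" | "queue_for_review" | "block_and_escalate";

// Thresholds taken from the list above. Exactly 0.9 falls into the
// "queue for review" band because auto-execute requires MORE than 90%.
function routeByConfidence(confidence: number): Route {
  if (Number.isNaN(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError("confidence must be between 0 and 1");
  }
  if (confidence > 0.9) return "auto_execute";
  if (confidence >= 0.6) return "queue_for_review";
  return "block_and_escalate";
}
```

Rejecting out-of-range values up front means a miscalibrated agent cannot silently land in the auto-execute band.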

---

## 5. THE DECISION QUEUE — Your Mission Control

This is the centerpiece. A prioritized inbox of every decision the factory needs from you.

### Layout (In GooseFactory Desktop App)

```
┌─────────────────────────────────────────────────────────────┐
│  GooseFactory                                    [≡] [−] [×]│
├──────────────────┬──────────────────────────────────────────┤
│                  │                                          │
│  📥 DECISIONS (6)│  🔴 GHL MCP — Deploy to Production      │
│                  │                                          │
│  🔴 GHL Deploy   │  Pipeline: ghl-mcp-server                │
│  🟡 Stripe Review │  Stage: staging → production             │
│  🟡 3 Batch Items│  Tests: 47/47 ✅  Coverage: 94% ✅      │
│  🟢 2 FYI Items  │  Waiting: 2h 15m  SLA: ⚠️ 45m left     │
│                  │                                          │
│  ── Pipeline ──  │  Changes since last review:              │
│  [Kanban View]   │  + 12 files modified                     │
│                  │  + 3 new API endpoints                   │
│  ── Agents ──    │  + Edge case handling improved           │
│  🟢 Builder: OK  │                                          │
│  🟢 Tester: OK   │  [View Full Diff]  [Run Tests Again]    │
│  🟡 GHL: Waiting │                                          │
│                  │  ┌─────────┐ ┌─────────┐ ┌──────────┐   │
│  ── Stats ──     │  │✅ Deploy│ │❌ Reject│ │⏰ Defer  │   │
│  Today: 12 done  │  └─────────┘ └─────────┘ └──────────┘   │
│  Avg wait: 1.2h  │                                          │
│                  │  💬 Chat: "approve the GHL deploy"       │
│                  │  [________________________________] [⏎]  │
├──────────────────┴──────────────────────────────────────────┤
│  Chat: You can also just type naturally here...             │
│  > "what else needs my attention?"                          │
│  > "approve all low-risk items"                             │
│  > "show me the GHL test failures"                          │
└─────────────────────────────────────────────────────────────┘
```

### Key Features

1. **Left Sidebar: Decision Queue** — Priority-sorted, color-coded, with age timers
2. **Center: Context Panel** — Full details for the selected decision (diffs, metrics, history)
3. **Bottom: Chat** — Natural language interface to the factory ("approve all passing servers")
4. **One-Click Actions** — Approve, reject, defer, reassign, batch approve
5. **Keyboard Shortcuts** — `j/k` navigate, `a` approve, `r` reject, `d` defer
6. **SLA Indicators** — Glowing countdown timers, escalation warnings
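
The "priority-sorted, with age timers" behavior of the sidebar can be sketched as follows. The `Decision` shape and the three priority names are assumptions that mirror the red/yellow/green tiers from Section 4; they are not a settled schema.

```typescript
type Priority = "critical" | "high" | "contextual";

interface Decision {
  id: string;
  priority: Priority;
  createdAt: number; // epoch milliseconds
}

const PRIORITY_RANK: Record<Priority, number> = {
  critical: 0,
  high: 1,
  contextual: 2,
};

// Sort by priority tier first, then oldest first within a tier.
// Works on a copy so the caller's array is left untouched.
function sortQueue(decisions: Decision[]): Decision[] {
  return decisions.slice().sort(
    (a, b) =>
      PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
      a.createdAt - b.createdAt
  );
}

// Human-readable age such as "2h 15m", matching the "Waiting:" field in the mockup.
function formatAge(createdAt: number, now: number): string {
  const totalMinutes = Math.max(0, Math.floor((now - createdAt) / 60000));
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return hours > 0 ? `${hours}h ${minutes}m` : `${minutes}m`;
}
```

Sorting oldest-first inside a tier keeps long-waiting items from being buried by newer ones of the same priority.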

---

## 6. MCP SERVER — The Brain

The Factory MCP Server is what makes the chat interface powerful. It exposes 11 tools, 6 resources, and 4 prompts.

### Tools (What You Can Do)

| Tool | What It Does | Example |
|------|-------------|---------|
| `factory_get_pending_tasks` | Your decision inbox | "what needs my attention?" |
| `factory_approve_task` | Approve and advance | "approve the GHL deploy" |
| `factory_reject_task` | Reject with feedback | "reject stripe review — needs more tests" |
| `factory_get_pipeline_status` | Pipeline overview | "show me all active pipelines" |
| `factory_advance_stage` | Manual stage advance | "move notion-mcp to testing" |
| `factory_assign_priority` | Set priority | "make GHL critical priority" |
| `factory_get_blockers` | What's stuck | "what's blocked and why?" |
| `factory_run_tests` | Trigger tests | "run tests on the stripe server" |
| `factory_deploy` | Deploy to env | "deploy freshdesk to staging" |
| `factory_search` | Search everything | "find all servers with auth issues" |
| `factory_create_pipeline` | New server pipeline | "start a new Zendesk MCP server" |
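
To make the tool surface concrete, here is a minimal sketch of how three of these tools might dispatch against task state. It is a plain TypeScript stand-in with an in-memory store, not the `@modelcontextprotocol/sdk` API; `FactoryTools`, `Task`, and the argument names are hypothetical, and a real server would call the Factory API instead.

```typescript
type TaskStatus = "pending" | "approved" | "rejected";

interface Task {
  id: string;
  title: string;
  status: TaskStatus;
  feedback?: string;
}

// In-memory stand-in for the Factory API (assumption for illustration only).
class FactoryTools {
  private tasks = new Map<string, Task>();

  constructor(initial: Task[]) {
    for (const t of initial) this.tasks.set(t.id, { ...t });
  }

  // Dispatch by tool name, mirroring three tool names from the table above.
  call(tool: string, args: { taskId?: string; feedback?: string }): Task[] | Task {
    switch (tool) {
      case "factory_get_pending_tasks":
        return Array.from(this.tasks.values()).filter((t) => t.status === "pending");
      case "factory_approve_task":
        return this.resolve(args.taskId, "approved");
      case "factory_reject_task":
        return this.resolve(args.taskId, "rejected", args.feedback);
      default:
        throw new Error(`Unknown tool: ${tool}`);
    }
  }

  // Only pending tasks can be resolved, so a decision is never applied twice.
  private resolve(taskId: string | undefined, status: TaskStatus, feedback?: string): Task {
    const task = taskId === undefined ? undefined : this.tasks.get(taskId);
    if (!task) throw new Error(`Task not found: ${taskId}`);
    if (task.status !== "pending") throw new Error(`Task already ${task.status}`);
    task.status = status;
    if (feedback !== undefined) task.feedback = feedback;
    return task;
  }
}
```

In this sketch, "reject stripe review, needs more tests" would map to `call("factory_reject_task", { taskId, feedback: "needs more tests" })`.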

### Resources (What You Can Read)

| Resource | What It Provides |
|----------|-----------------|
| `factory://dashboard/summary` | High-level factory status |
| `factory://pipelines/{id}/state` | Specific pipeline details |
| `factory://servers/{name}/status` | Individual server health |
| `factory://pipelines/{id}/test-results` | Test results + coverage |
| `factory://pipelines/{id}/build-logs` | Build output |
| `factory://config/templates` | Available pipeline templates |
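
Resource URIs with placeholders such as `{id}` need to be matched against concrete requests. A minimal sketch of that matching, assuming simple segment-by-segment comparison; `matchResource` is a hypothetical helper, not part of any MCP SDK.

```typescript
// Match a concrete URI against a template like "factory://pipelines/{id}/state".
// Returns the captured variables, or null when the URI does not match.
function matchResource(template: string, uri: string): Record<string, string> | null {
  const templateParts = template.split("/");
  const uriParts = uri.split("/");
  if (templateParts.length !== uriParts.length) return null;

  const vars: Record<string, string> = {};
  for (let i = 0; i < templateParts.length; i++) {
    const t = templateParts[i] ?? "";
    const u = uriParts[i] ?? "";
    if (t.startsWith("{") && t.endsWith("}")) {
      // Placeholder segment: capture the value, but reject empty segments.
      if (u.length === 0) return null;
      vars[t.slice(1, -1)] = u;
    } else if (t !== u) {
      // Literal segment: must match exactly.
      return null;
    }
  }
  return vars;
}
```

For example, matching `factory://pipelines/ghl-mcp-server/state` against the second template in the table would capture `id` as `ghl-mcp-server`.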

### Prompts (Structured Conversations)

| Prompt | What It Sets Up |
|--------|----------------|
| `review_server` | Pull all context for a full MCP server review |
| `what_needs_attention` | Prioritized summary of everything pending |
| `deploy_checklist` | Pre-deployment verification checklist |
| `pipeline_retrospective` | Post-completion analysis and lessons learned |

---

## 7. NOTIFICATION ESCALATION — No Decision Falls Through

This is critical. The whole point is that you CANNOT miss something.

```
T+0min     Task created → Decision appears in GooseFactory queue
                        → Discord embed in #factory-tasks with buttons

T+30min    Reminder #1  → Discord DM + badge pulse in app
                        → "⏰ GHL deploy approval waiting 30m"

T+2h       Reminder #2  → Discord @mention + push notification
                        → "🟡 GHL deploy waiting 2h — SLA in 2h"

T+4h       SLA Warning  → Discord @here + sound alert in app
                        → "🔴 GHL deploy SLA breach imminent"

T+SLA      SLA Breach   → Auto-escalate: SMS + all channels
                        → "🚨 GHL deploy SLA BREACHED — action required"

T+SLA+2h   Critical     → Phone notification + auto-default to safest action
                        → Incident report logged
```
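
The ladder above can be expressed as a pure function of elapsed time. A minimal sketch, assuming the SLA is a per-task value in minutes (the timeline does not fix it) and that the 30m, 2h, and 4h offsets apply to every task; the step names are illustrative.

```typescript
type EscalationStep =
  | "queued"
  | "reminder_1"
  | "reminder_2"
  | "sla_warning"
  | "sla_breach"
  | "critical";

// Returns the most severe step reached after `elapsedMin` minutes.
// `slaMin` is an assumed per-task SLA, in minutes since task creation.
function escalationStep(elapsedMin: number, slaMin: number): EscalationStep {
  // Ordered by severity; offsets are minutes since the task was created.
  const ladder: Array<[EscalationStep, number]> = [
    ["queued", 0],
    ["reminder_1", 30],
    ["reminder_2", 120],
    ["sla_warning", 240],
    ["sla_breach", slaMin],
    ["critical", slaMin + 120],
  ];

  let current: EscalationStep = "queued";
  for (const [step, offset] of ladder) {
    // A later (more severe) step overrides earlier ones once its offset has passed.
    if (elapsedMin >= offset) current = step;
  }
  return current;
}
```

Because the ladder is ordered by severity rather than by time, a task with an SLA shorter than 4h still reports a breach even though the fixed warning offset has not been reached.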

### Smart Batching
Instead of 10 separate pings:
```
📋 5 servers ready for review:
  ✅ freshdesk (low risk, tests pass)     [Approve]
  ✅ helpscout (low risk, tests pass)     [Approve]
  ✅ close (low risk, tests pass)         [Approve]
  ⚠️ stripe (med risk, 1 warning)        [Review]
  ❌ ghl (high risk, 42 failures)        [Review Required]

[Approve All Low-Risk (3)] [Review All]
```
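
The batching rule implied by the example above (low risk and passing tests are eligible for bulk approval, everything else needs individual review) could look like this. A minimal sketch; `ReviewItem`, `Batch`, and `batchForReview` are hypothetical names.

```typescript
type Risk = "low" | "medium" | "high";

interface ReviewItem {
  server: string;
  risk: Risk;
  testsPass: boolean;
}

interface Batch {
  autoApprovable: ReviewItem[]; // eligible for "Approve All Low-Risk"
  needsReview: ReviewItem[];    // must be opened individually
}

// Split pending items into one bulk-approvable group and one review group.
function batchForReview(items: ReviewItem[]): Batch {
  const batch: Batch = { autoApprovable: [], needsReview: [] };
  for (const item of items) {
    if (item.risk === "low" && item.testsPass) {
      batch.autoApprovable.push(item);
    } else {
      batch.needsReview.push(item);
    }
  }
  return batch;
}
```

Requiring both conditions means a low-risk server with failing tests still goes to individual review.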

---

## 8. EVERY UI PATTERN MAPPED

Based on research across Devin, Cursor, GitHub Copilot Workspace, n8n, Retool, and 20+ other products:

### Pattern → When to Use

| Pattern | Best For | Our Implementation |
|---------|----------|-------------------|
| **Inline Chat Buttons** | Quick approve/reject | Approve/reject buttons in chat messages |
| **Modal Overlay** | Critical/irreversible actions | Production deploy confirmation (type "DEPLOY" to confirm) |
| **Sidebar Panel** | Code/asset review | Diff viewer alongside approval context |
| **Decision Queue** | Managing multiple pending items | Left sidebar in GooseFactory |
| **Kanban Board** | Pipeline stage visualization | Pipeline view tab |
| **Batch Processor** | Many similar decisions | "Approve all matching criteria" |
| **Progress Dashboard** | Long-running agent monitoring | Agent status panel |
| **Run Contract** | Pre-approving expensive operations | "This will use ~$50 in API calls, take ~4h" |
| **Mobile Quick Actions** | Approvals on the go | Push notification with swipe actions |
| **Discord Embeds** | Team visibility + async approval | Rich embeds with buttons in factory channels |
| **MCP Apps** | Complex interactive reviews | Custom HTML UIs rendered in chat (code review, forms) |

---

## 9. TECH STACK

| Layer | Technology | Why |
|-------|-----------|-----|
| **Desktop App** | Forked Goose (Electron + React 19 + Rust) | Best-in-class MCP host, extensible UI |
| **Backend API** | Node.js + Hono | Fast, lightweight, TypeScript-native |
| **Database** | PostgreSQL (Neon/Supabase) | Proven, JSONB support, great for state machines |
| **Cache/Events** | Redis (Upstash) | Pub/sub, streams, fast queue |
| **Object Storage** | Cloudflare R2 | S3-compatible, no egress fees |
| **MCP Server** | TypeScript + @modelcontextprotocol/sdk | Native MCP; stdio + Streamable HTTP transport (the older HTTP+SSE transport is deprecated in the MCP spec) |
| **State Machine** | XState-inspired patterns | Explicit states, SLA timers, auto-escalation |
| **Orchestration** | Inngest (step.waitForEvent) | Durable execution, event correlation, timeouts |
| **Discord Bot** | discord.js | Buttons, embeds, modals, slash commands |
| **Auth** | JWT + API keys | Simple, stateless, scoped |
| **CI/CD** | GitHub Actions | Existing infra, dispatch triggers |

### Why Inngest over Temporal?
- **Simpler** — No separate server cluster to manage
- **TypeScript-native** — Matches our stack
- **Event matching** — `waitForEvent` with correlation is exactly our approval pattern
- **Serverless** — Functions dehydrate while waiting, no resource consumption
- Temporal is more powerful but overkill for our scale right now. Can migrate later if needed.

---

## 10. DATABASE SCHEMA (Key Tables)

```
pipelines          — One per MCP server build
├── pipeline_stages — Stage definitions + state machine
├── tasks          — Human decisions needed (the queue)
├── approvals      — Formal gate approvals
├── assets         — Generated code, configs, builds
└── audit_log      — Immutable event log

agents             — AI workers + build agents
notifications      — Multi-channel notification queue
```

8 tables total. Full SQL DDL in `research-factory-api-architecture.md`.
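
As an illustration of how `tasks` and `audit_log` might interact, here is a minimal sketch of a task state machine that appends one audit entry per transition. The states, events, and field names are assumptions for illustration; the real columns live in the DDL referenced above, and this sketch does not model the post-approval undo window.

```typescript
type TaskState = "pending" | "approved" | "rejected" | "deferred";
type TaskEvent = "approve" | "reject" | "defer" | "resume";

// Allowed transitions; anything absent from the table is rejected.
const TRANSITIONS: Record<TaskState, Partial<Record<TaskEvent, TaskState>>> = {
  pending: { approve: "approved", reject: "rejected", defer: "deferred" },
  deferred: { resume: "pending" },
  approved: {},
  rejected: {},
};

interface AuditEntry {
  taskId: string;
  from: TaskState;
  to: TaskState;
  event: TaskEvent;
  at: number; // epoch milliseconds
}

// Apply an event and return the new state plus a NEW log array.
// Entries are frozen and the input log is never mutated (append-only).
function applyEvent(
  taskId: string,
  state: TaskState,
  event: TaskEvent,
  log: readonly AuditEntry[],
  now: number
): { state: TaskState; log: readonly AuditEntry[] } {
  const next = TRANSITIONS[state][event];
  if (next === undefined) {
    throw new Error(`Invalid transition: ${state} + ${event}`);
  }
  const entry = Object.freeze({ taskId, from: state, to: next, event, at: now });
  return { state: next, log: [...log, entry] };
}
```

Keeping the transition table explicit makes illegal moves (for example, approving an already rejected task) fail loudly instead of corrupting state.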

---

## 11. IMPLEMENTATION ROADMAP

### Phase 1: Foundation (Week 1-2) — "The Skeleton"
- [ ] Fork Goose, rebrand basics (name, logo, protocol)
- [ ] Set up PostgreSQL schema + migrations
- [ ] Core REST API (pipelines, tasks, approvals CRUD)
- [ ] JWT auth
- [ ] Basic audit logging
- [ ] **Deliverable:** API accepts requests, data persists

### Phase 2: MCP Server + Real-Time (Week 3-4) — "The Brain"
- [ ] Factory MCP server with core tools (get_pending, approve, reject, status)
- [ ] MCP resources (pipeline state, dashboard summary)
- [ ] WebSocket server for real-time dashboard updates
- [ ] Redis event bus with consumer groups
- [ ] Wire MCP server into GooseFactory as built-in extension
- [ ] **Deliverable:** "What needs my attention?" works in chat

### Phase 3: Decision Queue UI (Week 5-6) — "The Centerpiece"
- [ ] Decision queue sidebar in GooseFactory React UI
- [ ] Context panel with diffs, metrics, history
- [ ] One-click approve/reject/defer actions
- [ ] Keyboard shortcuts (j/k/a/r/d)
- [ ] Pipeline kanban view
- [ ] SLA countdown indicators
- [ ] **Deliverable:** Full Command Center in desktop app

### Phase 4: Notifications + Discord (Week 7-8) — "The Nagger"
- [ ] Discord bot bridge with rich embeds + buttons
- [ ] Escalation ladder (queue → DM → mention → SMS)
- [ ] Smart batching for similar decisions
- [ ] Mobile push notifications
- [ ] SLA monitoring and auto-escalation
- [ ] GitHub webhook integration
- [ ] **Deliverable:** Decisions come to you, not the other way around

### Phase 5: Advanced Features (Week 9-10) — "The Polish"
- [ ] MCP Apps for complex reviews (code diffs, forms in chat)
- [ ] Batch approval processor
- [ ] MCP prompts (review, deploy checklist, retrospective)
- [ ] Analytics dashboard (decision velocity, bottleneck analysis)
- [ ] Confidence-based auto-routing
- [ ] Undo/rollback for 24h post-approval
- [ ] **Deliverable:** Full SaaS-grade product

### Phase 6: SaaS-ify (Week 11-12) — "The Product"
- [ ] Multi-tenant support (separate factory instances)
- [ ] User management + team roles
- [ ] Billing integration
- [ ] Landing page + docs
- [ ] Onboarding flow
- [ ] **Deliverable:** Sellable product

---

## 12. WHAT MAKES THIS DIFFERENT FROM EXISTING TOOLS

| Tool | What It Does | What We Do Better |
|------|-------------|------------------|
| **Devin** | Autonomous coding agent | We're a factory MANAGER, not a single agent |
| **Cursor/Windsurf** | IDE with AI | We manage pipelines of 64+ servers, not single files |
| **n8n/Zapier** | Workflow automation | We're AI-agent-native with MCP, not just webhooks |
| **Linear/Jira** | Project management | We have AI agents doing the work, humans just decide |
| **Retool** | Internal tools | We're purpose-built for AI agent factories |
| **Goose (vanilla)** | General AI assistant | We're a specialized factory operator |

**The unique value:** As far as this research found, no one has shipped a purpose-built human-in-the-loop command center for managing fleets of AI agents building MCP servers. You'd be first.

---

## 13. IMMEDIATE NEXT STEPS

1. **Jake reviews this plan** — What's missing? What's wrong? What's the priority?
2. **Fork Goose** — Clone, rebrand, get the build running locally
3. **Spike the MCP Server** — Build the 3 most critical tools (get_pending, approve, reject) and test in Goose
4. **Spike the Decision Queue UI** — Mockup the sidebar in GooseFactory's React app
5. **Wire to existing pipeline** — Connect to `mcp-command-center/state.json` as initial data source

**The MVP is:** Type "what needs my attention?" in GooseFactory → get a prioritized list → approve/reject from chat. Everything else builds on that.

---

## SUPPORTING RESEARCH DOCS

| Doc | Words | Focus |
|-----|-------|-------|
| `research-goose-architecture.md` | ~3,000 | Goose codebase, fork strategy, MCP integration |
| `research-hitl-ux-patterns.md` | ~5,500 | Every HITL interaction type, UI patterns, 10 products analyzed |
| `research-factory-api-architecture.md` | ~4,000 | API design, MCP server spec, database schema, real-time events |
| `research-agent-orchestration-patterns.md` | ~3,500 | LangGraph, Temporal, Inngest, state machines, notification patterns |

---

*"The best interface for managing AI agents isn't more AI — it's making it painfully obvious when a human needs to do something, and making that something take one click."*