# MASTER PLAN: Interactive Agent Factory SaaS

## Codename: "GooseFactory" — Your AI Factory, Your Rules

> **Author:** Buba (synthesized from 4 specialized research agents)
> **Date:** 2026-02-06
> **Status:** PLAN — Awaiting Jake's Review
> **Supporting Research:** 4 docs, ~15,000 words, 60+ sources

---

## TL;DR — The 30-Second Pitch

Fork Goose (Block's open-source AI agent). Gut its chat UI. Wire in a **Factory Command Center** — a decision queue, pipeline kanban, and approval system that makes it painfully obvious when YOU are the bottleneck. The backend is an API + MCP server that exposes every factory operation as a conversational tool. You literally type "what needs my attention?" and get a prioritized list with one-click approve/reject. Everything you don't touch auto-advances. Everything that needs you screams at you until you act.

---

## 1. WHY THIS MATTERS

Right now the pipeline has ~64 MCP servers across 8 stages. The bottleneck isn't the AI — it's **you not knowing what's stuck on you**. The current system (Discord channels + cron heartbeats + manual checks) is passive. You have to go looking for what needs attention. That's backwards.

**The fix:** Build a system where decisions come to YOU, not the other way around. Make human-in-the-loop a first-class experience, not an afterthought.

---

## 2. ARCHITECTURE OVERVIEW

```
┌─────────────────────────────────────────────────────────────────┐
│                      YOUR INTERFACE LAYER                       │
│                                                                 │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
│  │ GooseFactory │   │ Discord Bot  │   │ Mobile       │         │
│  │ Desktop App  │   │ (Buttons +   │   │ Push Notifs  │         │
│  │ (Forked      │   │ Embeds)      │   │ (Quick       │         │
│  │ Goose)       │   │              │   │ Approve)     │         │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘         │
│         │                  │                  │                 │
│         └──────────────────┼──────────────────┘                 │
│                            │                                    │
│  ┌─────────────────────────▼─────────────────────────────┐      │
│  │            MCP Server (Factory Operations)            │      │
│  │          11 Tools · 6 Resources · 4 Prompts           │      │
│  │ "what needs attention?" → prioritized decision queue  │      │
│  └─────────────────────────┬─────────────────────────────┘      │
└────────────────────────────┼────────────────────────────────────┘
                             │
┌────────────────────────────┼────────────────────────────────────┐
│                            ▼                                    │
│  ┌─────────────────────────────────────────────────────┐        │
│  │           Factory API (REST + WebSocket)            │        │
│  │ 30+ endpoints · Real-time events · GraphQL queries  │        │
│  └─────────────────────────┬───────────────────────────┘        │
│                            │                                    │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │Pipeline  │  │Task      │  │Notif +   │  │Audit     │         │
│  │Engine    │  │Queue     │  │Escalation│  │Logger    │         │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘         │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                       │
│  │PostgreSQL│  │Redis     │  │S3/R2     │                       │
│  │(State)   │  │(Events)  │  │(Assets)  │                       │
│  └──────────┘  └──────────┘  └──────────┘                       │
└─────────────────────────────────────────────────────────────────┘
```

---

## 3. THE GOOSE FORK — "GooseFactory"

### Why Goose?

- **Rust backend + Electron/React frontend** — production-grade, fast
- **Apache 2.0 license** — full commercial freedom, no copyleft
- **MCP-native** — already a first-class MCP host with dynamic extension discovery
- **Built-in permission system** — 4 modes including Smart Approval (risk-based)
- **Extension ecosystem** — thousands of MCP servers plug in immediately
- **Active community** — now under the Linux Foundation (AAIF), so governance is stable

### What We Change

| Component | Current Goose | GooseFactory |
|-----------|--------------|--------------|
| **Branding** | Goose logos, `goose://` protocol | Your brand, `factory://` protocol |
| **Default Extensions** | Developer, Memory, etc. | Factory MCP Server (built-in), Pipeline Manager |
| **Chat UI** | General-purpose assistant | Factory Command Center with decision queue sidebar |
| **Approval Flow** | Simple allow/deny on tool calls | Rich approval cards with context, diffs, metrics |
| **System Prompts** | Generic agent instructions | Factory operator mode — knows about pipeline stages, MCPs |
| **MCP UI Rendering** | Basic inline/sidecar (WIP) | Custom approval UIs, pipeline dashboards, code review panels |
| **Protocol Handler** | `goose://extension?...` | `factory://approve?task_id=...` deep links |

### Fork Strategy

1. **Clone the repo** — `git clone https://github.com/block/goose GooseFactory`
2. **Rebrand** — `package.json`, `main.ts`, assets, protocol handler (~1-2 days)
3. **Add Factory MCP Server** as a built-in Rust extension in `crates/goose-mcp/`
4. **Customize the chat UI** — Add decision queue sidebar in React (the interesting part)
5. **Add MCP UI components** — Custom approval cards using `@mcp-ui/client`
6. **Configure Smart Approval** — Factory operations auto-classified by risk level

### ⚠️ Timing Risk

Goose is actively migrating to ACP (Agent Client Protocol) — Issue #6642. This replaces the backend REST+SSE with JSON-RPC 2.0. **Recommendation:** Fork AFTER the ACP migration lands (or fork now and track upstream). The migration affects `goosed` ↔ desktop communication.

---

## 4. EVERY MOMENT YOU'RE NEEDED (Taxonomy)

Based on research across 10+ agent products and frameworks, here's every type of human-in-the-loop moment mapped to your factory:

### 🔴 CRITICAL — Always Need You

| Moment | Factory Example | UI Pattern |
|--------|----------------|------------|
| **Deploy to Production** | Promoting an MCP to live | Modal overlay with deploy checklist |
| **API Key Entry** | Configuring Stripe/GHL credentials | Secure input form in chat |
| **Client Communication** | Sending deliverables to the $20k client | Preview + approve before send |
| **Pricing/Positioning** | Setting MCP marketplace pricing | Multi-choice card with tradeoffs |
| **Legal/License Review** | Checking dependency licenses | Sidebar review panel |

### 🟡 HIGH VALUE — Usually Need You

| Moment | Factory Example | UI Pattern |
|--------|----------------|------------|
| **Design Review** | Approving UI/UX for MCP apps | Side-by-side mockup comparison |
| **Code Quality Gate** | Reviewing generated MCP server code | Diff view with inline annotations |
| **Naming/Branding** | Naming a new MCP server | A/B choice between options |
| **Test Failure Triage** | GHL's 42 failing tests — fix or skip? | Error cards with suggested actions |
| **Priority Decisions** | Which MCP to advance next? | Drag-and-drop priority list |

### 🟢 CONTEXTUAL — Sometimes Need You

| Moment | Factory Example | UI Pattern |
|--------|----------------|------------|
| **Routine Approvals** | Stage advances for passing servers | Batch approve with exceptions |
| **Parameter Tuning** | Adjusting test coverage thresholds | Slider controls |
| **Edge Cases** | AI hit a wall building a tool | Escalation card with context |
| **Delegation** | Route task to specialized agent | Dropdown assignment |

### Smart Routing (Confidence-Based)

Not everything needs to block on you:

- **>90% confidence** → Auto-execute, log for async review
- **60-90% confidence** → Queue for review, pipeline continues other work
- **<60% confidence** → Block and escalate immediately

---

## 5. THE DECISION QUEUE — Your Mission Control

This is the centerpiece. A prioritized inbox of every decision the factory needs from you.

### Layout (In GooseFactory Desktop App)

```
┌─────────────────────────────────────────────────────────────┐
│ GooseFactory                                     [≡] [−] [×]│
├──────────────────┬──────────────────────────────────────────┤
│                  │                                          │
│ 📥 DECISIONS (6) │ 🔴 GHL MCP — Deploy to Production        │
│                  │                                          │
│ 🔴 GHL Deploy    │ Pipeline: ghl-mcp-server                 │
│ 🟡 Stripe Review │ Stage: staging → production              │
│ 🟡 3 Batch Items │ Tests: 47/47 ✅  Coverage: 94% ✅        │
│ 🟢 2 FYI Items   │ Waiting: 2h 15m  SLA: ⚠️ 45m left        │
│                  │                                          │
│ ── Pipeline ──   │ Changes since last review:               │
│ [Kanban View]    │ + 12 files modified                      │
│                  │ + 3 new API endpoints                    │
│ ── Agents ──     │ + Edge case handling improved            │
│ 🟢 Builder: OK   │                                          │
│ 🟢 Tester: OK    │ [View Full Diff]  [Run Tests Again]      │
│ 🟡 GHL: Waiting  │                                          │
│                  │ ┌─────────┐ ┌─────────┐ ┌──────────┐     │
│ ── Stats ──      │ │✅ Deploy│ │❌ Reject│ │⏰ Defer  │     │
│ Today: 12 done   │ └─────────┘ └─────────┘ └──────────┘     │
│ Avg wait: 1.2h   │                                          │
│                  │ 💬 Chat: "approve the GHL deploy"        │
│                  │ [________________________________] [⏎]   │
├──────────────────┴──────────────────────────────────────────┤
│ Chat: You can also just type naturally here...              │
│ > "what else needs my attention?"                           │
│ > "approve all low-risk items"                              │
│ > "show me the GHL test failures"                           │
└─────────────────────────────────────────────────────────────┘
```

### Key Features

1. **Left Sidebar: Decision Queue** — Priority-sorted, color-coded, with age timers
2. **Center: Context Panel** — Full details for the selected decision (diffs, metrics, history)
3. **Bottom: Chat** — Natural language interface to the factory ("approve all passing servers")
4. **One-Click Actions** — Approve, reject, defer, reassign, batch approve
5. **Keyboard Shortcuts** — `j/k` navigate, `a` approve, `r` reject, `d` defer
6. **SLA Indicators** — Glowing countdown timers, escalation warnings

---

## 6. MCP SERVER — The Brain

The Factory MCP Server is what makes the chat interface powerful. It exposes 11 tools, 6 resources, and 4 prompts.

### Tools (What You Can Do)

| Tool | What It Does | Example |
|------|-------------|---------|
| `factory_get_pending_tasks` | Your decision inbox | "what needs my attention?" |
| `factory_approve_task` | Approve and advance | "approve the GHL deploy" |
| `factory_reject_task` | Reject with feedback | "reject stripe review — needs more tests" |
| `factory_get_pipeline_status` | Pipeline overview | "show me all active pipelines" |
| `factory_advance_stage` | Manual stage advance | "move notion-mcp to testing" |
| `factory_assign_priority` | Set priority | "make GHL critical priority" |
| `factory_get_blockers` | What's stuck | "what's blocked and why?" |
| `factory_run_tests` | Trigger tests | "run tests on the stripe server" |
| `factory_deploy` | Deploy to env | "deploy freshdesk to staging" |
| `factory_search` | Search everything | "find all servers with auth issues" |
| `factory_create_pipeline` | New server pipeline | "start a new Zendesk MCP server" |

### Resources (What You Can Read)

| Resource | What It Provides |
|----------|-----------------|
| `factory://dashboard/summary` | High-level factory status |
| `factory://pipelines/{id}/state` | Specific pipeline details |
| `factory://servers/{name}/status` | Individual server health |
| `factory://pipelines/{id}/test-results` | Test results + coverage |
| `factory://pipelines/{id}/build-logs` | Build output |
| `factory://config/templates` | Available pipeline templates |

### Prompts (Structured Conversations)

| Prompt | What It Sets Up |
|--------|----------------|
| `review_server` | Pull all context for a full MCP server review |
| `whats_needs_attention` | Prioritized summary of everything pending |
| `deploy_checklist` | Pre-deployment verification checklist |
| `pipeline_retrospective` | Post-completion analysis and lessons learned |

---

## 7. NOTIFICATION ESCALATION — No Decision Falls Through

This is critical. The whole point is that you CANNOT miss something.
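
One way to read the escalation ladder in this section is as a pure function from "how long has this decision waited" to an escalation level, with a lookup for which channels fire at each level. The sketch below is illustrative only: the level names, the channel identifiers, and the `escalationLevel` helper are assumptions, and it presumes an SLA longer than four hours so the fixed reminders come before the breach.

```typescript
// Hypothetical sketch of the escalation ladder; names are not from the real system.
type EscalationLevel =
  | "queued"
  | "reminder_1"
  | "reminder_2"
  | "sla_warning"
  | "sla_breach"
  | "critical";

const MIN = 60 * 1000;
const HOUR = 60 * MIN;

// Channels per level, taken from the ladder; identifiers are made up for this sketch.
const CHANNELS: Record<EscalationLevel, string[]> = {
  queued: ["app_queue", "discord_embed"],
  reminder_1: ["discord_dm", "app_badge"],
  reminder_2: ["discord_mention", "push"],
  sla_warning: ["discord_here", "app_sound"],
  sla_breach: ["sms", "all_channels"],
  critical: ["phone", "auto_default_safest_action"],
};

// Check the most severe rung first so a short SLA still escalates correctly.
function escalationLevel(waitedMs: number, slaMs: number): EscalationLevel {
  if (waitedMs >= slaMs + 2 * HOUR) return "critical";
  if (waitedMs >= slaMs) return "sla_breach";
  if (waitedMs >= 4 * HOUR) return "sla_warning";
  if (waitedMs >= 2 * HOUR) return "reminder_2";
  if (waitedMs >= 30 * MIN) return "reminder_1";
  return "queued";
}
```

A scheduler could call `escalationLevel` on each pending task every few minutes and notify only when the level changes, which keeps the notification logic stateless apart from the last level sent.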

```
T+0min     Task created  → Decision appears in GooseFactory queue
                         → Discord embed in #factory-tasks with buttons
T+30min    Reminder #1   → Discord DM + badge pulse in app
                         → "⏰ GHL deploy approval waiting 30m"
T+2h       Reminder #2   → Discord @mention + push notification
                         → "🟡 GHL deploy waiting 2h — SLA in 2h"
T+4h       SLA Warning   → Discord @here + sound alert in app
                         → "🔴 GHL deploy SLA breach imminent"
T+SLA      SLA Breach    → Auto-escalate: SMS + all channels
                         → "🚨 GHL deploy SLA BREACHED — action required"
T+SLA+2h   Critical      → Phone notification + auto-default to safest action
                         → Incident report logged
```

### Smart Batching

Instead of 10 separate pings:

```
📋 5 servers ready for review:
  ✅ freshdesk (low risk, tests pass)    [Approve]
  ✅ helpscout (low risk, tests pass)    [Approve]
  ✅ close     (low risk, tests pass)    [Approve]
  ⚠️ stripe    (med risk, 1 warning)     [Review]
  ❌ ghl       (high risk, 42 failures)  [Review Required]

  [Approve All Low-Risk (3)]  [Review All]
```

---

## 8. EVERY UI PATTERN MAPPED

Based on research across Devin, Cursor, GitHub Copilot Workspace, n8n, Retool, and 20+ other products:

### Pattern → When to Use

| Pattern | Best For | Our Implementation |
|---------|----------|-------------------|
| **Inline Chat Buttons** | Quick approve/reject | Approve/reject buttons in chat messages |
| **Modal Overlay** | Critical/irreversible actions | Production deploy confirmation (type "DEPLOY" to confirm) |
| **Sidebar Panel** | Code/asset review | Diff viewer alongside approval context |
| **Decision Queue** | Managing multiple pending items | Left sidebar in GooseFactory |
| **Kanban Board** | Pipeline stage visualization | Pipeline view tab |
| **Batch Processor** | Many similar decisions | "Approve all matching criteria" |
| **Progress Dashboard** | Long-running agent monitoring | Agent status panel |
| **Run Contract** | Pre-approving expensive operations | "This will use ~$50 in API calls, take ~4h" |
| **Mobile Quick Actions** | Approvals on the go | Push notification with swipe actions |
| **Discord Embeds** | Team visibility + async approval | Rich embeds with buttons in factory channels |
| **MCP Apps** | Complex interactive reviews | Custom HTML UIs rendered in chat (code review, forms) |

---

## 9. TECH STACK

| Layer | Technology | Why |
|-------|-----------|-----|
| **Desktop App** | Forked Goose (Electron + React 19 + Rust) | Best-in-class MCP host, extensible UI |
| **Backend API** | Node.js + Hono | Fast, lightweight, TypeScript-native |
| **Database** | PostgreSQL (Neon/Supabase) | Proven, JSONB support, great for state machines |
| **Cache/Events** | Redis (Upstash) | Pub/sub, streams, fast queue |
| **Object Storage** | Cloudflare R2 | S3-compatible, no egress fees |
| **MCP Server** | TypeScript + @modelcontextprotocol/sdk | Native MCP, stdio + SSE transport |
| **State Machine** | XState-inspired patterns | Explicit states, SLA timers, auto-escalation |
| **Orchestration** | Inngest (step.waitForEvent) | Durable execution, event correlation, timeouts |
| **Discord Bot** | discord.js | Buttons, embeds, modals, slash commands |
| **Auth** | JWT + API keys | Simple, stateless, scoped |
| **CI/CD** | GitHub Actions | Existing infra, dispatch triggers |

### Why Inngest over Temporal?

- **Simpler** — No separate server cluster to manage
- **TypeScript-native** — Matches our stack
- **Event matching** — `waitForEvent` with correlation is exactly our approval pattern
- **Serverless** — Functions dehydrate while waiting, no resource consumption
- Temporal is more powerful but overkill for our scale right now. Can migrate later if needed.

---

## 10. DATABASE SCHEMA (Key Tables)

```
pipelines               — One per MCP server build
├── pipeline_stages     — Stage definitions + state machine
├── tasks               — Human decisions needed (the queue)
├── approvals           — Formal gate approvals
├── assets              — Generated code, configs, builds
└── audit_log           — Immutable event log
agents                  — AI workers + build agents
notifications           — Multi-channel notification queue
```

8 tables total.
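
To make the role of the `tasks` table concrete, here is a hedged TypeScript sketch of what one row and its allowed status transitions might look like. The field names, the status values, and the `transition` helper are illustrative assumptions, not the schema from the research doc.

```typescript
// Illustrative only: field and status names are assumptions, not the real DDL.
type TaskStatus = "pending" | "approved" | "rejected" | "deferred" | "escalated";

interface TaskRow {
  id: string;
  pipelineId: string; // hypothetical foreign key to pipelines
  status: TaskStatus;
  slaDeadline: Date; // would drive the SLA countdown in the UI
  decidedBy?: string;
}

// Explicit state machine: each status lists the statuses it may move to.
const ALLOWED: Record<TaskStatus, readonly TaskStatus[]> = {
  pending: ["approved", "rejected", "deferred", "escalated"],
  deferred: ["pending"],
  escalated: ["approved", "rejected"],
  approved: [],
  rejected: [],
};

function transition(task: TaskRow, next: TaskStatus, actor: string): TaskRow {
  if (ALLOWED[task.status].indexOf(next) === -1) {
    throw new Error(`illegal transition: ${task.status} -> ${next}`);
  }
  // Return a new row so the prior state can be written to the audit log unchanged.
  return { ...task, status: next, decidedBy: actor };
}
```

Keeping the transition table as data means the same map could be used to validate API requests and to decide which action buttons the decision queue shows for a task.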
Full SQL DDL in `research-factory-api-architecture.md`.

---

## 11. IMPLEMENTATION ROADMAP

### Phase 1: Foundation (Week 1-2) — "The Skeleton"

- [ ] Fork Goose, rebrand basics (name, logo, protocol)
- [ ] Set up PostgreSQL schema + migrations
- [ ] Core REST API (pipelines, tasks, approvals CRUD)
- [ ] JWT auth
- [ ] Basic audit logging
- [ ] **Deliverable:** API accepts requests, data persists

### Phase 2: MCP Server + Real-Time (Week 3-4) — "The Brain"

- [ ] Factory MCP server with core tools (get_pending, approve, reject, status)
- [ ] MCP resources (pipeline state, dashboard summary)
- [ ] WebSocket server for real-time dashboard updates
- [ ] Redis event bus with consumer groups
- [ ] Wire MCP server into GooseFactory as built-in extension
- [ ] **Deliverable:** "What needs my attention?" works in chat

### Phase 3: Decision Queue UI (Week 5-6) — "The Centerpiece"

- [ ] Decision queue sidebar in GooseFactory React UI
- [ ] Context panel with diffs, metrics, history
- [ ] One-click approve/reject/defer actions
- [ ] Keyboard shortcuts (j/k/a/r/d)
- [ ] Pipeline kanban view
- [ ] SLA countdown indicators
- [ ] **Deliverable:** Full Command Center in desktop app

### Phase 4: Notifications + Discord (Week 7-8) — "The Nagger"

- [ ] Discord bot bridge with rich embeds + buttons
- [ ] Escalation ladder (queue → DM → mention → SMS)
- [ ] Smart batching for similar decisions
- [ ] Mobile push notifications
- [ ] SLA monitoring and auto-escalation
- [ ] GitHub webhook integration
- [ ] **Deliverable:** Decisions come to you, not the other way around

### Phase 5: Advanced Features (Week 9-10) — "The Polish"

- [ ] MCP Apps for complex reviews (code diffs, forms in chat)
- [ ] Batch approval processor
- [ ] MCP prompts (review, deploy checklist, retrospective)
- [ ] Analytics dashboard (decision velocity, bottleneck analysis)
- [ ] Confidence-based auto-routing
- [ ] Undo/rollback for 24h post-approval
- [ ] **Deliverable:** Full SaaS-grade product

### Phase 6: SaaS-ify (Week 11-12) — "The Product"

- [ ] Multi-tenant support (separate factory instances)
- [ ] User management + team roles
- [ ] Billing integration
- [ ] Landing page + docs
- [ ] Onboarding flow
- [ ] **Deliverable:** Sellable product

---

## 12. WHAT MAKES THIS DIFFERENT FROM EXISTING TOOLS

| Tool | What It Does | What We Do Better |
|------|-------------|------------------|
| **Devin** | Autonomous coding agent | We're a factory MANAGER, not a single agent |
| **Cursor/Windsurf** | IDE with AI | We manage pipelines of 64+ servers, not single files |
| **n8n/Zapier** | Workflow automation | We're AI-agent-native with MCP, not just webhooks |
| **Linear/Jira** | Project management | We have AI agents doing the work, humans just decide |
| **Retool** | Internal tools | We're purpose-built for AI agent factories |
| **Goose (vanilla)** | General AI assistant | We're a specialized factory operator |

**The unique value:** No one has built a purpose-built human-in-the-loop command center specifically for managing fleets of AI agents building MCP servers. You'd be first.

---

## 13. IMMEDIATE NEXT STEPS

1. **Jake reviews this plan** — What's missing? What's wrong? What's the priority?
2. **Fork Goose** — Clone, rebrand, get it building and running locally
3. **Spike the MCP Server** — Build the 3 most critical tools (get_pending, approve, reject) and test in Goose
4. **Spike the Decision Queue UI** — Mock up the sidebar in GooseFactory's React app
5. **Wire to existing pipeline** — Connect to `mcp-command-center/state.json` as initial data source

**The MVP is:** Type "what needs my attention?" in GooseFactory → get a prioritized list → approve/reject from chat. Everything else builds on that.
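
The MVP loop described above (ask, get a prioritized list, approve or reject) can be sketched with an in-memory store standing in for the real API. This is a minimal illustration under stated assumptions: the `FactoryStore` class, its field names, and the priority ordering are hypothetical, and only the tool names in the comments come from Section 6.

```typescript
// Hypothetical in-memory stand-in for the three MVP tools; not the real server.
type Priority = "critical" | "high" | "contextual";
type Status = "pending" | "approved" | "rejected";

interface Task {
  id: string;
  title: string;
  priority: Priority;
  createdAt: number; // epoch milliseconds
  status: Status;
  feedback?: string;
}

const PRIORITY_RANK: Record<Priority, number> = { critical: 0, high: 1, contextual: 2 };

class FactoryStore {
  private tasks = new Map<string, Task>();

  add(task: Task): void {
    this.tasks.set(task.id, task);
  }

  // Would back `factory_get_pending_tasks`: most urgent first, then oldest first.
  getPendingTasks(): Task[] {
    return Array.from(this.tasks.values())
      .filter((t) => t.status === "pending")
      .sort(
        (a, b) =>
          PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] || a.createdAt - b.createdAt
      );
  }

  // Would back `factory_approve_task`.
  approveTask(id: string): Task {
    return this.decide(id, "approved");
  }

  // Would back `factory_reject_task`; feedback is required so rejections are actionable.
  rejectTask(id: string, feedback: string): Task {
    return this.decide(id, "rejected", feedback);
  }

  private decide(id: string, status: Status, feedback?: string): Task {
    const task = this.tasks.get(id);
    if (!task) throw new Error(`unknown task: ${id}`);
    if (task.status !== "pending") throw new Error(`task ${id} is already ${task.status}`);
    task.status = status;
    if (feedback !== undefined) task.feedback = feedback;
    return task;
  }
}
```

Refusing to decide a task twice is a deliberate choice in this sketch: it keeps a double click or a repeated chat command from silently overwriting an earlier decision.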

---

## SUPPORTING RESEARCH DOCS

| Doc | Words | Focus |
|-----|-------|-------|
| `research-goose-architecture.md` | ~3,000 | Goose codebase, fork strategy, MCP integration |
| `research-hitl-ux-patterns.md` | ~5,500 | Every HITL interaction type, UI patterns, 10 products analyzed |
| `research-factory-api-architecture.md` | ~4,000 | API design, MCP server spec, database schema, real-time events |
| `research-agent-orchestration-patterns.md` | ~3,500 | LangGraph, Temporal, Inngest, state machines, notification patterns |

---

*"The best interface for managing AI agents isn't more AI — it's making it painfully obvious when a human needs to do something, and making that something take one click."*