MASTER PLAN: Interactive Agent Factory SaaS
Codename: "GooseFactory" — Your AI Factory, Your Rules
Author: Buba (synthesized from 4 specialized research agents)
Date: 2026-02-06
Status: PLAN — Awaiting Jake's Review
Supporting Research: 4 docs, ~15,000 words, 60+ sources
TL;DR — The 30-Second Pitch
Fork Goose (Block's open-source AI agent). Gut its chat UI. Wire in a Factory Command Center — a decision queue, pipeline kanban, and approval system that makes it painfully obvious when YOU are the bottleneck. The backend is an API + MCP server that exposes every factory operation as a conversational tool. You literally type "what needs my attention?" and get a prioritized list with one-click approve/reject. Everything you don't touch auto-advances. Everything that needs you screams at you until you act.
1. WHY THIS MATTERS
Right now the pipeline has ~64 MCP servers across 8 stages. The bottleneck isn't the AI — it's you not knowing what's stuck on you. The current system (Discord channels + cron heartbeats + manual checks) is passive. You have to go looking for what needs attention. That's backwards.
The fix: Build a system where decisions come to YOU, not the other way around. Make human-in-the-loop a first-class experience, not an afterthought.
2. ARCHITECTURE OVERVIEW
```
┌─────────────────────────────────────────────────────────────────┐
│                      YOUR INTERFACE LAYER                       │
│                                                                 │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
│   │ GooseFactory │    │ Discord Bot  │    │ Mobile       │      │
│   │ Desktop App  │    │ (Buttons +   │    │ Push Notifs  │      │
│   │ (Forked      │    │ Embeds)      │    │ (Quick       │      │
│   │ Goose)       │    │              │    │ Approve)     │      │
│   └──────┬───────┘    └──────┬───────┘    └──────┬───────┘      │
│          │                   │                   │              │
│          └───────────────────┼───────────────────┘              │
│                              │                                  │
│   ┌──────────────────────────▼─────────────────────────────┐    │
│   │            MCP Server (Factory Operations)             │    │
│   │           11 Tools · 6 Resources · 4 Prompts           │    │
│   │  "what needs attention?" → prioritized decision queue  │    │
│   └──────────────────────────┬─────────────────────────────┘    │
└──────────────────────────────┼──────────────────────────────────┘
                               │
┌──────────────────────────────┼──────────────────────────────────┐
│                              ▼                                  │
│   ┌────────────────────────────────────────────────────────┐    │
│   │             Factory API (REST + WebSocket)             │    │
│   │   30+ endpoints · Real-time events · GraphQL queries   │    │
│   └──────────────────────────┬─────────────────────────────┘    │
│                              │                                  │
│        ┌───────────────┬─────┴─────────┬───────────────┐        │
│        ▼               ▼               ▼               ▼        │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐ │
│  │ Pipeline   │  │ Task       │  │ Notif +    │  │ Audit      │ │
│  │ Engine     │  │ Queue      │  │ Escalation │  │ Logger     │ │
│  └────────────┘  └────────────┘  └────────────┘  └────────────┘ │
│                                                                 │
│          ┌────────────┐  ┌────────────┐  ┌────────────┐         │
│          │ PostgreSQL │  │ Redis      │  │ S3/R2      │         │
│          │ (State)    │  │ (Events)   │  │ (Assets)   │         │
│          └────────────┘  └────────────┘  └────────────┘         │
└─────────────────────────────────────────────────────────────────┘
```
3. THE GOOSE FORK — "GooseFactory"
Why Goose?
- Rust backend + Electron/React frontend — production-grade, fast
- Apache 2.0 license — full commercial freedom, no copyleft
- MCP-native — already a first-class MCP host with dynamic extension discovery
- Built-in permission system — 4 modes including Smart Approval (risk-based)
- Extension ecosystem — thousands of MCP servers plug in immediately
- Active community, now under the Linux Foundation's Agentic AI Foundation (AAIF) for stable governance
What We Change
| Component | Current Goose | GooseFactory |
|---|---|---|
| Branding | Goose logos, `goose://` protocol | Your brand, `factory://` protocol |
| Default Extensions | Developer, Memory, etc. | Factory MCP Server (built-in), Pipeline Manager |
| Chat UI | General-purpose assistant | Factory Command Center with decision queue sidebar |
| Approval Flow | Simple allow/deny on tool calls | Rich approval cards with context, diffs, metrics |
| System Prompts | Generic agent instructions | Factory operator mode — knows about pipeline stages, MCPs |
| MCP UI Rendering | Basic inline/sidecar (WIP) | Custom approval UIs, pipeline dashboards, code review panels |
| Protocol Handler | `goose://extension?...` | `factory://approve?task_id=...` deep links |
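The `factory://approve?task_id=...` deep link in the table above needs a parser on the desktop side. This is a minimal TypeScript sketch. It assumes the app receives the raw URL as a string. `parseDeepLink` and the `DeepLink` shape are hypothetical names, not Goose APIs.

```typescript
// Hypothetical result type: only the "approve" action from the plan is modeled.
interface DeepLink {
  action: "approve";
  taskId: string;
}

// Parse a factory:// URL into a typed action, or return null if it is not one we handle.
function parseDeepLink(raw: string): DeepLink | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not a parseable URL at all
  }
  if (url.protocol !== "factory:") return null;
  // For a non-special scheme like factory://approve?..., the WHATWG URL parser
  // treats "approve" as the host component.
  if (url.host !== "approve") return null;
  const taskId = url.searchParams.get("task_id");
  return taskId ? { action: "approve", taskId } : null;
}
```

Any web page can fire a custom-protocol link. For that reason, a handler like this would probably only open the approval card for `taskId` rather than approve the task directly.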
Fork Strategy
- Clone the repo: `git clone https://github.com/block/goose GooseFactory`
- Rebrand: `package.json`, `main.ts`, assets, protocol handler (~1-2 days)
- Add the Factory MCP Server as a built-in Rust extension in `crates/goose-mcp/`
- Customize the chat UI: add the decision queue sidebar in React (the interesting part)
- Add MCP UI components: custom approval cards using `@mcp-ui/client`
- Configure Smart Approval: factory operations auto-classified by risk level
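A simple starting point for "auto-classified by risk level" is a static table covering the Factory tools listed in section 6. This is a hypothetical sketch. Goose's own Smart Approval classifies risk on its own, and the levels and the default below are assumptions rather than Goose behavior.

```typescript
type RiskLevel = "read_only" | "low" | "high";

// Hypothetical risk levels for the 11 planned Factory tools.
const TOOL_RISK: Readonly<Record<string, RiskLevel>> = {
  factory_get_pending_tasks: "read_only",
  factory_get_pipeline_status: "read_only",
  factory_get_blockers: "read_only",
  factory_search: "read_only",
  factory_run_tests: "low",
  factory_assign_priority: "low",
  factory_create_pipeline: "low",
  factory_approve_task: "high",
  factory_reject_task: "high",
  factory_advance_stage: "high",
  factory_deploy: "high",
};

// Unknown tools default to "high", so a newly added tool must be classified
// before it can run without a human confirming it.
function needsHumanConfirm(tool: string): boolean {
  return (TOOL_RISK[tool] ?? "high") === "high";
}
```

In this sketch, approve and reject count as high risk. The reasoning is that a misread chat message should not silently approve a deploy.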
⚠️ Timing Risk
Goose is actively migrating to ACP (Agent Client Protocol), tracked in Issue #6642. The migration replaces the backend REST+SSE interface with JSON-RPC 2.0 and affects goosed ↔ desktop communication. Recommendation: fork AFTER the ACP migration lands (or fork now and track upstream).
4. EVERY MOMENT YOU'RE NEEDED (Taxonomy)
Based on research across 10+ agent products and frameworks, here's every type of human-in-the-loop moment mapped to your factory:
🔴 CRITICAL — Always Need You
| Moment | Factory Example | UI Pattern |
|---|---|---|
| Deploy to Production | Promoting an MCP to live | Modal overlay with deploy checklist |
| API Key Entry | Configuring Stripe/GHL credentials | Secure input form in chat |
| Client Communication | Sending deliverables to the $20k client | Preview + approve before send |
| Pricing/Positioning | Setting MCP marketplace pricing | Multi-choice card with tradeoffs |
| Legal/License Review | Checking dependency licenses | Sidebar review panel |
🟡 HIGH VALUE — Usually Need You
| Moment | Factory Example | UI Pattern |
|---|---|---|
| Design Review | Approving UI/UX for MCP apps | Side-by-side mockup comparison |
| Code Quality Gate | Reviewing generated MCP server code | Diff view with inline annotations |
| Naming/Branding | Naming a new MCP server | A/B choice between options |
| Test Failure Triage | GHL's 42 failing tests — fix or skip? | Error cards with suggested actions |
| Priority Decisions | Which MCP to advance next? | Drag-and-drop priority list |
🟢 CONTEXTUAL — Sometimes Need You
| Moment | Factory Example | UI Pattern |
|---|---|---|
| Routine Approvals | Stage advances for passing servers | Batch approve with exceptions |
| Parameter Tuning | Adjusting test coverage thresholds | Slider controls |
| Edge Cases | AI hit a wall building a tool | Escalation card with context |
| Delegation | Route task to specialized agent | Dropdown assignment |
Smart Routing (Confidence-Based)
Not everything needs to block on you:
- >90% confidence → Auto-execute, log for async review
- 60-90% confidence → Queue for review, pipeline continues other work
- <60% confidence → Block and escalate immediately
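The three bands map directly onto a routing function. This TypeScript sketch assumes the agent reports confidence as a number in [0, 1]. The bullets leave the boundaries ambiguous, so the sketch also assumes that exactly 90% and exactly 60% both fall in the review band.

```typescript
type Routing = "auto_execute" | "queue_for_review" | "block_and_escalate";

// Bands from the bullets above: >90% auto, 60-90% queue, <60% block.
function routeByConfidence(confidence: number): Routing {
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError(`confidence must be within [0, 1], got ${confidence}`);
  }
  if (confidence > 0.9) return "auto_execute"; // run now, log for async review
  if (confidence >= 0.6) return "queue_for_review"; // pipeline keeps doing other work
  return "block_and_escalate"; // stop and escalate immediately
}
```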
5. THE DECISION QUEUE — Your Mission Control
This is the centerpiece. A prioritized inbox of every decision the factory needs from you.
Layout (In GooseFactory Desktop App)
```
┌─────────────────────────────────────────────────────────────┐
│ GooseFactory                                    [≡] [−] [×] │
├──────────────────┬──────────────────────────────────────────┤
│                  │                                          │
│ 📥 DECISIONS (6) │ 🔴 GHL MCP: Deploy to Production         │
│                  │                                          │
│ 🔴 GHL Deploy    │ Pipeline: ghl-mcp-server                 │
│ 🟡 Stripe Review │ Stage: staging → production              │
│ 🟡 3 Batch Items │ Tests: 47/47 ✅  Coverage: 94% ✅         │
│ 🟢 2 FYI Items   │ Waiting: 2h 15m   SLA: ⚠️ 45m left       │
│                  │                                          │
│ ── Pipeline ──   │ Changes since last review:               │
│ [Kanban View]    │ + 12 files modified                      │
│                  │ + 3 new API endpoints                    │
│ ── Agents ──     │ + Edge case handling improved            │
│ 🟢 Builder: OK   │                                          │
│ 🟢 Tester: OK    │ [View Full Diff]  [Run Tests Again]      │
│ 🟡 GHL: Waiting  │                                          │
│                  │ ┌─────────┐ ┌─────────┐ ┌─────────┐      │
│ ── Stats ──      │ │✅ Deploy│ │❌ Reject│ │⏰ Defer │      │
│ Today: 12 done   │ └─────────┘ └─────────┘ └─────────┘      │
│ Avg wait: 1.2h   │                                          │
│                  │ 💬 Chat: "approve the GHL deploy"        │
│                  │ [______________________________] [⏎]     │
├──────────────────┴──────────────────────────────────────────┤
│ Chat: You can also just type naturally here...              │
│ > "what else needs my attention?"                           │
│ > "approve all low-risk items"                              │
│ > "show me the GHL test failures"                           │
└─────────────────────────────────────────────────────────────┘
```
Key Features
- Left Sidebar: Decision Queue — Priority-sorted, color-coded, with age timers
- Center: Context Panel — Full details for the selected decision (diffs, metrics, history)
- Bottom: Chat — Natural language interface to the factory ("approve all passing servers")
- One-Click Actions — Approve, reject, defer, reassign, batch approve
- Keyboard Shortcuts: `j`/`k` navigate, `a` approve, `r` reject, `d` defer
- SLA Indicators: glowing countdown timers, escalation warnings
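The sidebar's ordering, age timers, and `j`/`k` navigation come down to a few small functions. This is a sketch only. It assumes each decision carries a creation timestamp and a per-item SLA. `Decision`, `sortQueue`, and the mapping from 🔴/🟡/🟢 to `critical`/`high`/`contextual` are hypothetical.

```typescript
type Priority = "critical" | "high" | "contextual"; // 🔴 / 🟡 / 🟢

interface Decision {
  id: string;
  priority: Priority;
  createdAt: number; // epoch milliseconds
  slaMs: number; // time allowed before the SLA is breached
}

const PRIORITY_RANK: Record<Priority, number> = { critical: 0, high: 1, contextual: 2 };

// Most urgent priority first; within a priority, the longest-waiting item first.
function sortQueue(items: readonly Decision[]): Decision[] {
  return [...items].sort(
    (a, b) => PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] || a.createdAt - b.createdAt,
  );
}

// Time left before the SLA breach; negative once it has been breached.
function slaRemainingMs(d: Decision, now: number): number {
  return d.createdAt + d.slaMs - now;
}

type KeyAction = "next" | "prev" | "approve" | "reject" | "defer";
const KEYMAP: Record<string, KeyAction | undefined> = {
  j: "next",
  k: "prev",
  a: "approve",
  r: "reject",
  d: "defer",
};

// j/k move the selection (clamped to the list); other actions leave it in place.
function moveCursor(index: number, action: KeyAction, length: number): number {
  if (action === "next") return Math.min(index + 1, length - 1);
  if (action === "prev") return Math.max(index - 1, 0);
  return index;
}
```

Suppose the GHL deploy from the mockup has a 3h SLA. After waiting 2h 15m, `slaRemainingMs` would return 45 minutes, which matches the countdown shown.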
6. MCP SERVER — The Brain
The Factory MCP Server is what makes the chat interface powerful. It exposes 11 tools, 6 resources, and 4 prompts.
Tools (What You Can Do)
| Tool | What It Does | Example |
|---|---|---|
| `factory_get_pending_tasks` | Your decision inbox | "what needs my attention?" |
| `factory_approve_task` | Approve and advance | "approve the GHL deploy" |
| `factory_reject_task` | Reject with feedback | "reject stripe review, needs more tests" |
| `factory_get_pipeline_status` | Pipeline overview | "show me all active pipelines" |
| `factory_advance_stage` | Manual stage advance | "move notion-mcp to testing" |
| `factory_assign_priority` | Set priority | "make GHL critical priority" |
| `factory_get_blockers` | What's stuck | "what's blocked and why?" |
| `factory_run_tests` | Trigger tests | "run tests on the stripe server" |
| `factory_deploy` | Deploy to an environment | "deploy freshdesk to staging" |
| `factory_search` | Search everything | "find all servers with auth issues" |
| `factory_create_pipeline` | New server pipeline | "start a new Zendesk MCP server" |
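This sketch covers handler logic for the three MVP tools (get_pending, approve, reject). It is deliberately dependency-free, and several parts of it are assumptions: the in-memory `tasks` map, the `rank` field, and the argument names `task_id` and `feedback`. Only the result shape follows the MCP tool-result format: `content` items of `type: "text"` plus an optional `isError`. A real server would register these functions with the official TypeScript SDK's `McpServer` instead of calling them directly.

```typescript
// Minimal shape of an MCP tool result: text content plus an optional error flag.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

type Args = Record<string, string | undefined>;
type TaskStatus = "pending" | "approved" | "rejected";

interface Task {
  id: string;
  title: string;
  rank: number; // hypothetical: lower rank = more urgent
  status: TaskStatus;
  feedback?: string;
}

// Hypothetical in-memory store standing in for the Factory API.
const tasks = new Map<string, Task>([
  ["t1", { id: "t1", title: "GHL MCP: deploy to production", rank: 0, status: "pending" }],
  ["t2", { id: "t2", title: "Stripe MCP: code review", rank: 1, status: "pending" }],
]);

function reply(text: string, isError = false): ToolResult {
  return { content: [{ type: "text", text }], isError };
}

// Shared logic for approve/reject: only pending tasks can be decided, and only once.
function decide(id: string | undefined, next: TaskStatus, feedback?: string): ToolResult {
  const task = id === undefined ? undefined : tasks.get(id);
  if (!task) return reply(`Unknown task: ${id}`, true);
  if (task.status !== "pending") return reply(`Task ${task.id} is already ${task.status}`, true);
  task.status = next;
  if (feedback !== undefined) task.feedback = feedback;
  return reply(`Task ${task.id} ${next}`);
}

const handlers = {
  factory_get_pending_tasks: (_args: Args): ToolResult => {
    const pending = Array.from(tasks.values())
      .filter((t) => t.status === "pending")
      .sort((a, b) => a.rank - b.rank);
    if (pending.length === 0) return reply("Nothing needs your attention.");
    return reply(pending.map((t, i) => `${i + 1}. [${t.id}] ${t.title}`).join("\n"));
  },
  factory_approve_task: (args: Args): ToolResult => decide(args.task_id, "approved"),
  factory_reject_task: (args: Args): ToolResult => decide(args.task_id, "rejected", args.feedback),
};
```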
Resources (What You Can Read)
| Resource | What It Provides |
|---|---|
| `factory://dashboard/summary` | High-level factory status |
| `factory://pipelines/{id}/state` | Specific pipeline details |
| `factory://servers/{name}/status` | Individual server health |
| `factory://pipelines/{id}/test-results` | Test results + coverage |
| `factory://pipelines/{id}/build-logs` | Build output |
| `factory://config/templates` | Available pipeline templates |
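Several of these resources are URI templates (`{id}`, `{name}`), so a read request has to be matched back to a template and its parameters. This hypothetical sketch shows the matching step on its own terms. The official TypeScript SDK has its own resource-template support, which a real server would likely use instead.

```typescript
interface CompiledTemplate {
  template: string;
  pattern: RegExp;
  keys: string[];
}

// Turn "factory://pipelines/{id}/state" into a regex with one capture group per {param}.
function compileTemplate(template: string): CompiledTemplate {
  const keys: string[] = [];
  const source = template
    .replace(/[.*+?^$()|[\]\\]/g, "\\$&") // escape regex metacharacters (braces are kept)
    .replace(/\{(\w+)\}/g, (_match: string, key: string) => {
      keys.push(key);
      return "([^/]+)"; // a parameter matches exactly one path segment
    });
  return { template, pattern: new RegExp(`^${source}$`), keys };
}

const RESOURCE_TEMPLATES = [
  "factory://dashboard/summary",
  "factory://pipelines/{id}/state",
  "factory://servers/{name}/status",
  "factory://pipelines/{id}/test-results",
  "factory://pipelines/{id}/build-logs",
  "factory://config/templates",
].map(compileTemplate);

// Return the first template that matches the URI, with its decoded parameters.
function matchResource(uri: string): { template: string; params: Record<string, string> } | null {
  for (const t of RESOURCE_TEMPLATES) {
    const m = t.pattern.exec(uri);
    if (!m) continue;
    const params: Record<string, string> = {};
    t.keys.forEach((key, i) => {
      params[key] = decodeURIComponent(m[i + 1] ?? "");
    });
    return { template: t.template, params };
  }
  return null;
}
```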
Prompts (Structured Conversations)
| Prompt | What It Sets Up |
|---|---|
| `review_server` | Pull all context for a full MCP server review |
| `whats_needs_attention` | Prioritized summary of everything pending |
| `deploy_checklist` | Pre-deployment verification checklist |
| `pipeline_retrospective` | Post-completion analysis and lessons learned |
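Unlike tools, MCP prompts return message templates that the host passes to the model. Below is a hypothetical sketch of two of the four prompts. The `pipeline_id` argument and the instruction wording are assumptions. The result shape, `messages` with a `role` and a text `content` item, follows the MCP prompts format.

```typescript
interface PromptMessage {
  role: "user" | "assistant";
  content: { type: "text"; text: string };
}

interface PromptResult {
  description?: string;
  messages: PromptMessage[];
}

const userMessage = (text: string): PromptMessage => ({ role: "user", content: { type: "text", text } });

function getPrompt(promptName: string, args: Record<string, string | undefined>): PromptResult {
  switch (promptName) {
    case "whats_needs_attention":
      return {
        description: "Prioritized summary of everything pending",
        messages: [
          userMessage(
            "Call factory_get_pending_tasks and factory_get_blockers, then list what needs me, " +
              "most urgent first, with the action you recommend for each.",
          ),
        ],
      };
    case "deploy_checklist": {
      const id = args.pipeline_id;
      if (!id) throw new Error("deploy_checklist requires a pipeline_id argument");
      return {
        description: "Pre-deployment verification checklist",
        messages: [
          userMessage(
            `Before deploying ${id}: read factory://pipelines/${id}/test-results and ` +
              `factory://pipelines/${id}/build-logs, confirm every test passes, summarize ` +
              `changes since the last review, and ask me for an explicit go/no-go.`,
          ),
        ],
      };
    }
    default:
      throw new Error(`Unknown prompt: ${promptName}`);
  }
}
```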
7. NOTIFICATION ESCALATION — No Decision Falls Through
This is critical. The whole point is that you CANNOT miss something.
```
T+0min     Task created  → Decision appears in GooseFactory queue
                         → Discord embed in #factory-tasks with buttons
T+30min    Reminder #1   → Discord DM + badge pulse in app
                         → "⏰ GHL deploy approval waiting 30m"
T+2h       Reminder #2   → Discord @mention + push notification
                         → "🟡 GHL deploy waiting 2h, SLA in 2h"
T+4h       SLA Warning   → Discord @here + sound alert in app
                         → "🔴 GHL deploy SLA breach imminent"
T+SLA      SLA Breach    → Auto-escalate: SMS + all channels
                         → "🚨 GHL deploy SLA BREACHED: action required"
T+SLA+2h   Critical      → Phone notification + auto-default to safest action
                         → Incident report logged
```
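The ladder can be expressed as data plus a check for which rungs are due. This sketch assumes a periodic scheduler (cron, or a durable workflow step) calls that check. The channel identifiers are placeholders. The SLA is a per-task parameter because the timeline leaves it open.

```typescript
type Step = "queued" | "reminder_1" | "reminder_2" | "sla_warning" | "sla_breach" | "critical";

interface Rung {
  step: Step;
  atMs: number; // offset from task creation
  channels: string[];
}

const MIN = 60_000;
const HOUR = 60 * MIN;

// The ladder above, with the last two rungs measured from the task's SLA.
function ladder(slaMs: number): Rung[] {
  const rungs: Rung[] = [
    { step: "queued", atMs: 0, channels: ["app_queue", "discord_embed"] },
    { step: "reminder_1", atMs: 30 * MIN, channels: ["discord_dm", "app_badge"] },
    { step: "reminder_2", atMs: 2 * HOUR, channels: ["discord_mention", "push"] },
    { step: "sla_warning", atMs: 4 * HOUR, channels: ["discord_here", "app_sound"] },
    { step: "sla_breach", atMs: slaMs, channels: ["sms", "all_channels"] },
    { step: "critical", atMs: slaMs + 2 * HOUR, channels: ["phone", "auto_default", "incident_report"] },
  ];
  return rungs.sort((a, b) => a.atMs - b.atMs);
}

// Rungs whose time has come but which have not fired yet. A periodic job can call this
// and record what it sent, so re-running it never double-notifies.
function dueRungs(elapsedMs: number, slaMs: number, sent: ReadonlySet<Step>): Rung[] {
  return ladder(slaMs).filter((r) => r.atMs <= elapsedMs && !sent.has(r.step));
}
```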
Smart Batching
Instead of 10 separate pings:
```
📋 5 servers ready for review:
  ✅ freshdesk  (low risk, tests pass)    [Approve]
  ✅ helpscout  (low risk, tests pass)    [Approve]
  ✅ close      (low risk, tests pass)    [Approve]
  ⚠️ stripe     (med risk, 1 warning)     [Review]
  ❌ ghl        (high risk, 42 failures)  [Review Required]
  [Approve All Low-Risk (3)]  [Review All]
```
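The batched message above could be generated from plain review records. This is a minimal sketch. It assumes "low-risk" means low risk with zero warnings and zero failures; the plan does not define the bucket rules, so those are an assumption.

```typescript
type Risk = "low" | "med" | "high";

interface ReviewItem {
  server: string;
  risk: Risk;
  warnings: number;
  failures: number;
}

const plural = (n: number, word: string): string => `${n} ${word}${n === 1 ? "" : "s"}`;

// Only clean, low-risk items qualify for the one-click "Approve All Low-Risk" button.
function isAutoApprovable(i: ReviewItem): boolean {
  return i.risk === "low" && i.warnings === 0 && i.failures === 0;
}

// Render one summary message instead of one ping per item.
function renderBatch(items: ReviewItem[]): string {
  const lines = [`📋 ${items.length} servers ready for review:`];
  for (const i of items) {
    if (isAutoApprovable(i)) {
      lines.push(`✅ ${i.server} (low risk, tests pass) [Approve]`);
    } else if (i.failures > 0) {
      lines.push(`❌ ${i.server} (${i.risk} risk, ${plural(i.failures, "failure")}) [Review Required]`);
    } else {
      lines.push(`⚠️ ${i.server} (${i.risk} risk, ${plural(i.warnings, "warning")}) [Review]`);
    }
  }
  const approvable = items.filter(isAutoApprovable).length;
  lines.push(`[Approve All Low-Risk (${approvable})] [Review All]`);
  return lines.join("\n");
}
```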
8. EVERY UI PATTERN MAPPED
Based on research across Devin, Cursor, GitHub Copilot Workspace, n8n, Retool, and 20+ other products:
Pattern → When to Use
| Pattern | Best For | Our Implementation |
|---|---|---|
| Inline Chat Buttons | Quick approve/reject | Approve/reject buttons in chat messages |
| Modal Overlay | Critical/irreversible actions | Production deploy confirmation (type "DEPLOY" to confirm) |
| Sidebar Panel | Code/asset review | Diff viewer alongside approval context |
| Decision Queue | Managing multiple pending items | Left sidebar in GooseFactory |
| Kanban Board | Pipeline stage visualization | Pipeline view tab |
| Batch Processor | Many similar decisions | "Approve all matching criteria" |
| Progress Dashboard | Long-running agent monitoring | Agent status panel |
| Run Contract | Pre-approving expensive operations | "This will use ~$50 in API calls, take ~4h" |
| Mobile Quick Actions | Approvals on the go | Push notification with swipe actions |
| Discord Embeds | Team visibility + async approval | Rich embeds with buttons in factory channels |
| MCP Apps | Complex interactive reviews | Custom HTML UIs rendered in chat (code review, forms) |
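Two of these patterns carry logic worth pinning down: the Run Contract check and the "type DEPLOY" modal. This minimal sketch rests on one assumption the plan does not define: a pre-approved budget, below which a run contract is accepted without prompting. The typed-confirmation check is an exact, case-sensitive match. Surrounding whitespace is trimmed, which is also a design choice rather than something the plan specifies.

```typescript
interface RunContract {
  description: string;
  estimatedCostUsd: number;
  estimatedHours: number;
}

// Hypothetical: limits a human has approved in advance for unattended runs.
interface PreApprovedBudget {
  maxCostUsd: number;
  maxHours: number;
}

// Returns the reasons a human must sign off, or an empty list if the run fits the budget.
function signOffReasons(c: RunContract, b: PreApprovedBudget): string[] {
  const reasons: string[] = [];
  if (c.estimatedCostUsd > b.maxCostUsd) {
    reasons.push(`~$${c.estimatedCostUsd} in API calls exceeds the $${b.maxCostUsd} pre-approval`);
  }
  if (c.estimatedHours > b.maxHours) {
    reasons.push(`~${c.estimatedHours}h runtime exceeds the ${b.maxHours}h pre-approval`);
  }
  return reasons;
}

// Modal gate for irreversible actions: proceed only on an exact, case-sensitive "DEPLOY".
function deployConfirmed(typed: string): boolean {
  return typed.trim() === "DEPLOY";
}
```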
9. TECH STACK
| Layer | Technology | Why |
|---|---|---|
| Desktop App | Forked Goose (Electron + React 19 + Rust) | Best-in-class MCP host, extensible UI |
| Backend API | Node.js + Hono | Fast, lightweight, TypeScript-native |
| Database | PostgreSQL (Neon/Supabase) | Proven, JSONB support, great for state machines |
| Cache/Events | Redis (Upstash) | Pub/sub, streams, fast queue |
| Object Storage | Cloudflare R2 | S3-compatible, no egress fees |
| MCP Server | TypeScript + @modelcontextprotocol/sdk | Native MCP, stdio + Streamable HTTP transports (the older SSE transport is deprecated) |
| State Machine | XState-inspired patterns | Explicit states, SLA timers, auto-escalation |
| Orchestration | Inngest (step.waitForEvent) | Durable execution, event correlation, timeouts |
| Discord Bot | discord.js | Buttons, embeds, modals, slash commands |
| Auth | JWT + API keys | Simple, stateless, scoped |
| CI/CD | GitHub Actions | Existing infra, dispatch triggers |
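"XState-inspired" could mean, for a single decision task, explicit states plus a transition table that rejects illegal moves loudly. Here is a hedged sketch along those lines. The state and event names are hypothetical, loosely following the escalation ladder in section 7 (defer, SLA breach, auto-default). XState itself is not used.

```typescript
type TaskState = "pending" | "deferred" | "escalated" | "approved" | "rejected" | "auto_defaulted";
type TaskEvent = "APPROVE" | "REJECT" | "DEFER" | "WAKE" | "SLA_BREACH" | "CRITICAL_TIMEOUT";

// Explicit transition table: any (state, event) pair not listed here is illegal.
const TRANSITIONS: Record<TaskState, Partial<Record<TaskEvent, TaskState>>> = {
  pending: { APPROVE: "approved", REJECT: "rejected", DEFER: "deferred", SLA_BREACH: "escalated" },
  deferred: { WAKE: "pending", SLA_BREACH: "escalated" },
  escalated: { APPROVE: "approved", REJECT: "rejected", CRITICAL_TIMEOUT: "auto_defaulted" },
  approved: {}, // terminal states accept no events
  rejected: {},
  auto_defaulted: {},
};

function transition(state: TaskState, ev: TaskEvent): TaskState {
  const next = TRANSITIONS[state][ev];
  if (next === undefined) throw new Error(`Illegal transition: ${state} + ${ev}`);
  return next;
}
```

Keeping the table as plain data means the UI, the SLA timers, and the audit log can all share one definition of what is allowed.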
Why Inngest over Temporal?
- Simpler — No separate server cluster to manage
- TypeScript-native — Matches our stack
- Event matching: `waitForEvent` with correlation is exactly our approval pattern
- Serverless: functions pause while waiting and consume no compute in the meantime
- Temporal is more powerful but overkill for our scale right now. Can migrate later if needed.
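The approval pattern that Inngest's `step.waitForEvent` provides (with its `match` and `timeout` options) is: pause until an event with the same correlation key arrives, or give up after a timeout. An in-process sketch shows its shape. This is not Inngest's API: `EventBus`, the `factory/task.decided` event name, and the `verdict` field are hypothetical. Unlike Inngest, nothing here survives a process restart.

```typescript
interface FactoryEvent {
  name: string;
  data: { taskId: string; [key: string]: unknown };
}

interface Waiter {
  name: string;
  taskId: string;
  settle: (e: FactoryEvent | null) => void;
}

class EventBus {
  private waiters: Waiter[] = [];

  // Resolve with the first matching event, or with null if timeoutMs passes first.
  waitForEvent(name: string, taskId: string, timeoutMs: number): Promise<FactoryEvent | null> {
    return new Promise((resolve) => {
      const timer = setTimeout(() => waiter.settle(null), timeoutMs);
      const waiter: Waiter = {
        name,
        taskId,
        settle: (e) => {
          clearTimeout(timer);
          this.waiters = this.waiters.filter((w) => w !== waiter);
          resolve(e);
        },
      };
      this.waiters.push(waiter);
    });
  }

  // Deliver an event to every waiter whose name and taskId match (the correlation key).
  send(e: FactoryEvent): void {
    for (const w of [...this.waiters]) {
      if (w.name === e.name && w.taskId === e.data.taskId) w.settle(e);
    }
  }
}

// A workflow step that blocks on a human decision, escalating if the SLA passes first.
async function approvalGate(bus: EventBus, taskId: string, slaMs: number): Promise<string> {
  const decided = await bus.waitForEvent("factory/task.decided", taskId, slaMs);
  if (decided === null) return "escalate"; // no human decision within the SLA
  return String(decided.data["verdict"]);
}
```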
10. DATABASE SCHEMA (Key Tables)
```
pipelines            One per MCP server build
├── pipeline_stages  Stage definitions + state machine
├── tasks            Human decisions needed (the queue)
├── approvals        Formal gate approvals
├── assets           Generated code, configs, builds
└── audit_log        Immutable event log
agents               AI workers + build agents
notifications        Multi-channel notification queue
```
8 tables total. Full SQL DDL in research-factory-api-architecture.md.
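The `audit_log` table is described as immutable. At the application layer, one way to enforce that is to expose only an append operation, as in this hypothetical sketch. In PostgreSQL, the same guarantee would more likely come from revoking UPDATE and DELETE on the table, which the sketch does not cover. The `actor` naming convention is an assumption.

```typescript
interface AuditEntry {
  readonly seq: number;
  readonly at: Date;
  readonly actor: string; // e.g. "human:jake" or "agent:builder" (hypothetical convention)
  readonly action: string;
  readonly entityId: string;
}

// Append-only log: no update/delete methods, and each entry is frozen (shallowly) on insert.
class AuditLog {
  private readonly entries: AuditEntry[] = [];

  append(e: Omit<AuditEntry, "seq" | "at">, at: Date = new Date()): AuditEntry {
    const entry: AuditEntry = Object.freeze({ ...e, seq: this.entries.length + 1, at });
    this.entries.push(entry);
    return entry;
  }

  // Full history for one pipeline, task, or approval, oldest first.
  history(entityId: string): readonly AuditEntry[] {
    return this.entries.filter((x) => x.entityId === entityId);
  }
}
```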
11. IMPLEMENTATION ROADMAP
Phase 1: Foundation (Week 1-2) — "The Skeleton"
- Fork Goose, rebrand basics (name, logo, protocol)
- Set up PostgreSQL schema + migrations
- Core REST API (pipelines, tasks, approvals CRUD)
- JWT auth
- Basic audit logging
- Deliverable: API accepts requests, data persists
Phase 2: MCP Server + Real-Time (Week 3-4) — "The Brain"
- Factory MCP server with core tools (get_pending, approve, reject, status)
- MCP resources (pipeline state, dashboard summary)
- WebSocket server for real-time dashboard updates
- Redis event bus with consumer groups
- Wire MCP server into GooseFactory as built-in extension
- Deliverable: "What needs my attention?" works in chat
Phase 3: Decision Queue UI (Week 5-6) — "The Centerpiece"
- Decision queue sidebar in GooseFactory React UI
- Context panel with diffs, metrics, history
- One-click approve/reject/defer actions
- Keyboard shortcuts (j/k/a/r/d)
- Pipeline kanban view
- SLA countdown indicators
- Deliverable: Full Command Center in desktop app
Phase 4: Notifications + Discord (Week 7-8) — "The Nagger"
- Discord bot bridge with rich embeds + buttons
- Escalation ladder (queue → DM → mention → SMS)
- Smart batching for similar decisions
- Mobile push notifications
- SLA monitoring and auto-escalation
- GitHub webhook integration
- Deliverable: Decisions come to you, not the other way around
Phase 5: Advanced Features (Week 9-10) — "The Polish"
- MCP Apps for complex reviews (code diffs, forms in chat)
- Batch approval processor
- MCP prompts (review, deploy checklist, retrospective)
- Analytics dashboard (decision velocity, bottleneck analysis)
- Confidence-based auto-routing
- Undo/rollback for 24h post-approval
- Deliverable: Full SaaS-grade product
Phase 6: SaaS-ify (Week 11-12) — "The Product"
- Multi-tenant support (separate factory instances)
- User management + team roles
- Billing integration
- Landing page + docs
- Onboarding flow
- Deliverable: Sellable product
12. WHAT MAKES THIS DIFFERENT FROM EXISTING TOOLS
| Tool | What It Does | What We Do Better |
|---|---|---|
| Devin | Autonomous coding agent | We're a factory MANAGER, not a single agent |
| Cursor/Windsurf | IDE with AI | We manage pipelines of 64+ servers, not single files |
| n8n/Zapier | Workflow automation | We're AI-agent-native with MCP, not just webhooks |
| Linear/Jira | Project management | We have AI agents doing the work, humans just decide |
| Retool | Internal tools | We're purpose-built for AI agent factories |
| Goose (vanilla) | General AI assistant | We're a specialized factory operator |
The unique value: No one has built a purpose-built human-in-the-loop command center specifically for managing fleets of AI agents building MCP servers. You'd be first.
13. IMMEDIATE NEXT STEPS
- Jake reviews this plan — What's missing? What's wrong? What's the priority?
- Fork Goose: clone, rebrand, and get it building and running locally
- Spike the MCP Server — Build the 3 most critical tools (get_pending, approve, reject) and test in Goose
- Spike the Decision Queue UI — Mockup the sidebar in GooseFactory's React app
- Wire to existing pipeline: connect to `mcp-command-center/state.json` as the initial data source
The MVP is: Type "what needs my attention?" in GooseFactory → get a prioritized list → approve/reject from chat. Everything else builds on that.
SUPPORTING RESEARCH DOCS
| Doc | Words | Focus |
|---|---|---|
| `research-goose-architecture.md` | ~3,000 | Goose codebase, fork strategy, MCP integration |
| `research-hitl-ux-patterns.md` | ~5,500 | Every HITL interaction type, UI patterns, 10 products analyzed |
| `research-factory-api-architecture.md` | ~4,000 | API design, MCP server spec, database schema, real-time events |
| `research-agent-orchestration-patterns.md` | ~3,500 | LangGraph, Temporal, Inngest, state machines, notification patterns |
"The best interface for managing AI agents isn't more AI — it's making it painfully obvious when a human needs to do something, and making that something take one click."