clawdbot-workspace/MASTER-PLAN-interactive-agent-factory.md
2026-02-06 23:01:30 -05:00


MASTER PLAN: Interactive Agent Factory SaaS

Codename: "GooseFactory" — Your AI Factory, Your Rules

Author: Buba (synthesized from 4 specialized research agents)
Date: 2026-02-06
Status: PLAN — Awaiting Jake's Review
Supporting Research: 4 docs, ~15,000 words, 60+ sources


TL;DR — The 30-Second Pitch

Fork Goose (Block's open-source AI agent). Gut its chat UI. Wire in a Factory Command Center — a decision queue, pipeline kanban, and approval system that makes it painfully obvious when YOU are the bottleneck. The backend is an API + MCP server that exposes every factory operation as a conversational tool. You literally type "what needs my attention?" and get a prioritized list with one-click approve/reject. Everything you don't touch auto-advances. Everything that needs you screams at you until you act.


1. WHY THIS MATTERS

Right now the pipeline has ~64 MCP servers across 8 stages. The bottleneck isn't the AI — it's you not knowing what's stuck on you. The current system (Discord channels + cron heartbeats + manual checks) is passive. You have to go looking for what needs attention. That's backwards.

The fix: Build a system where decisions come to YOU, not the other way around. Make human-in-the-loop a first-class experience, not an afterthought.


2. ARCHITECTURE OVERVIEW

┌─────────────────────────────────────────────────────────────────┐
│                     YOUR INTERFACE LAYER                         │
│                                                                  │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │ GooseFactory │  │  Discord Bot  │  │   Mobile     │           │
│  │ Desktop App  │  │  (Buttons +   │  │  Push Notifs │           │
│  │ (Forked      │  │   Embeds)     │  │  (Quick      │           │
│  │  Goose)      │  │              │  │   Approve)   │           │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘           │
│         │                  │                  │                   │
│         └──────────────────┼──────────────────┘                   │
│                            │                                      │
│  ┌─────────────────────────▼─────────────────────────────┐      │
│  │            MCP Server (Factory Operations)             │      │
│  │  11 Tools · 6 Resources · 4 Prompts                    │      │
│  │  "what needs attention?" → prioritized decision queue   │      │
│  └─────────────────────────┬─────────────────────────────┘      │
└────────────────────────────┼────────────────────────────────────┘
                             │
┌────────────────────────────┼────────────────────────────────────┐
│                            ▼                                     │
│  ┌─────────────────────────────────────────────────────┐        │
│  │              Factory API (REST + WebSocket)          │        │
│  │  30+ endpoints · Real-time events · GraphQL queries  │        │
│  └─────────────────────────┬───────────────────────────┘        │
│                            │                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐       │
│  │Pipeline  │  │Task      │  │Notif +   │  │Audit     │       │
│  │Engine    │  │Queue     │  │Escalation│  │Logger    │       │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘       │
│                            │                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                      │
│  │PostgreSQL│  │Redis     │  │S3/R2     │                      │
│  │(State)   │  │(Events)  │  │(Assets)  │                      │
│  └──────────┘  └──────────┘  └──────────┘                      │
└─────────────────────────────────────────────────────────────────┘

3. THE GOOSE FORK — "GooseFactory"

Why Goose?

  • Rust backend + Electron/React frontend — production-grade, fast
  • Apache 2.0 license — full commercial freedom, no copyleft
  • MCP-native — already a first-class MCP host with dynamic extension discovery
  • Built-in permission system — 4 modes including Smart Approval (risk-based)
  • Extension ecosystem — thousands of MCP servers plug in immediately
  • Active community, now under the Linux Foundation (AAIF), so governance is stable

What We Change

| Component | Current Goose | GooseFactory |
|---|---|---|
| Branding | Goose logos, goose:// protocol | Your brand, factory:// protocol |
| Default Extensions | Developer, Memory, etc. | Factory MCP Server (built-in), Pipeline Manager |
| Chat UI | General-purpose assistant | Factory Command Center with decision queue sidebar |
| Approval Flow | Simple allow/deny on tool calls | Rich approval cards with context, diffs, metrics |
| System Prompts | Generic agent instructions | Factory operator mode: knows about pipeline stages, MCPs |
| MCP UI Rendering | Basic inline/sidecar (WIP) | Custom approval UIs, pipeline dashboards, code review panels |
| Protocol Handler | goose://extension?... | factory://approve?task_id=... deep links |

Fork Strategy

  1. Clone the repo: git clone https://github.com/block/goose GooseFactory
  2. Rebrand: package.json, main.ts, assets, protocol handler (~1-2 days)
  3. Add Factory MCP Server as a built-in Rust extension in crates/goose-mcp/
  4. Customize the chat UI — Add decision queue sidebar in React (the interesting part)
  5. Add MCP UI components — Custom approval cards using @mcp-ui/client
  6. Configure Smart Approval — Factory operations auto-classified by risk level

⚠️ Timing Risk

Goose is actively migrating to ACP (Agent Client Protocol), tracked in Issue #6642. The migration replaces the backend's REST+SSE interface with JSON-RPC 2.0, which changes how goosed ↔ desktop communicate. Recommendation: fork AFTER the ACP migration lands (or fork now and track upstream).


4. EVERY MOMENT YOU'RE NEEDED (Taxonomy)

Based on research across 10+ agent products and frameworks, here's every type of human-in-the-loop moment mapped to your factory:

🔴 CRITICAL — Always Need You

| Moment | Factory Example | UI Pattern |
|---|---|---|
| Deploy to Production | Promoting an MCP to live | Modal overlay with deploy checklist |
| API Key Entry | Configuring Stripe/GHL credentials | Secure input form in chat |
| Client Communication | Sending deliverables to the $20k client | Preview + approve before send |
| Pricing/Positioning | Setting MCP marketplace pricing | Multi-choice card with tradeoffs |
| Legal/License Review | Checking dependency licenses | Sidebar review panel |

🟡 HIGH VALUE — Usually Need You

| Moment | Factory Example | UI Pattern |
|---|---|---|
| Design Review | Approving UI/UX for MCP apps | Side-by-side mockup comparison |
| Code Quality Gate | Reviewing generated MCP server code | Diff view with inline annotations |
| Naming/Branding | Naming a new MCP server | A/B choice between options |
| Test Failure Triage | GHL's 42 failing tests: fix or skip? | Error cards with suggested actions |
| Priority Decisions | Which MCP to advance next? | Drag-and-drop priority list |

🟢 CONTEXTUAL — Sometimes Need You

| Moment | Factory Example | UI Pattern |
|---|---|---|
| Routine Approvals | Stage advances for passing servers | Batch approve with exceptions |
| Parameter Tuning | Adjusting test coverage thresholds | Slider controls |
| Edge Cases | AI hit a wall building a tool | Escalation card with context |
| Delegation | Route task to specialized agent | Dropdown assignment |

Smart Routing (Confidence-Based)

Not everything needs to block on you:

  • >90% confidence → Auto-execute, log for async review
  • 60-90% confidence → Queue for review, pipeline continues other work
  • <60% confidence → Block and escalate immediately
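The three bands reduce to a small pure function. This is a minimal sketch, assuming the agent reports confidence as a number in [0, 1]; the `Route` names and `routeByConfidence` are hypothetical, and the boundaries follow the bullets literally (exactly 90% and exactly 60% both land in the review band).

```typescript
// Hypothetical route names for the three confidence bands.
type Route = "auto_execute" | "queue_for_review" | "block_and_escalate";

// Assumes confidence is normalized to [0, 1], so 0.9 means 90%.
function routeByConfidence(confidence: number): Route {
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError(`confidence must be in [0, 1], got ${confidence}`);
  }
  if (confidence > 0.9) return "auto_execute"; // >90%: act now, log for async review
  if (confidence >= 0.6) return "queue_for_review"; // 60-90%: queue, pipeline keeps going
  return "block_and_escalate"; // <60%: stop and escalate to a human
}
```

Keeping the thresholds in one place makes them easy to tune once real approval data exists.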

5. THE DECISION QUEUE — Your Mission Control

This is the centerpiece. A prioritized inbox of every decision the factory needs from you.
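One way to get that ordering is to sort by the severity tiers from the taxonomy above (🔴 critical, 🟡 high value, 🟢 contextual), then put the longest-waiting item first within each tier. A minimal sketch; the `Decision` shape, `TIER_RANK`, and `prioritize` are hypothetical names, not an existing API.

```typescript
// Severity tiers from the taxonomy; a lower rank sorts first.
type Tier = "critical" | "high" | "contextual";

const TIER_RANK: Record<Tier, number> = { critical: 0, high: 1, contextual: 2 };

// Hypothetical shape of one pending decision in the queue.
interface Decision {
  id: string;
  title: string;
  tier: Tier;
  createdAt: Date;
}

// Returns a new array: most severe tier first, then longest-waiting first.
function prioritize(queue: readonly Decision[]): Decision[] {
  return [...queue].sort(
    (a, b) =>
      TIER_RANK[a.tier] - TIER_RANK[b.tier] ||
      a.createdAt.getTime() - b.createdAt.getTime(),
  );
}
```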

Layout (In GooseFactory Desktop App)

┌─────────────────────────────────────────────────────────────┐
│  GooseFactory                                    [≡] [] [×]│
├──────────────────┬──────────────────────────────────────────┤
│                  │                                          │
│  📥 DECISIONS (6)│  🔴 GHL MCP — Deploy to Production      │
│                  │                                          │
│  🔴 GHL Deploy   │  Pipeline: ghl-mcp-server                │
│  🟡 Stripe Review │  Stage: staging → production             │
│  🟡 3 Batch Items│  Tests: 47/47 ✅  Coverage: 94% ✅      │
│  🟢 2 FYI Items  │  Waiting: 2h 15m  SLA: ⚠️ 45m left     │
│                  │                                          │
│  ── Pipeline ──  │  Changes since last review:              │
│  [Kanban View]   │  + 12 files modified                     │
│                  │  + 3 new API endpoints                   │
│  ── Agents ──    │  + Edge case handling improved           │
│  🟢 Builder: OK  │                                          │
│  🟢 Tester: OK   │  [View Full Diff]  [Run Tests Again]    │
│  🟡 GHL: Waiting │                                          │
│                  │  ┌─────────┐ ┌─────────┐ ┌──────────┐   │
│  ── Stats ──     │  │✅ Deploy│ │❌ Reject│ │⏰ Defer  │   │
│  Today: 12 done  │  └─────────┘ └─────────┘ └──────────┘   │
│  Avg wait: 1.2h  │                                          │
│                  │  💬 Chat: "approve the GHL deploy"       │
│                  │  [________________________________] [⏎]  │
├──────────────────┴──────────────────────────────────────────┤
│  Chat: You can also just type naturally here...             │
│  > "what else needs my attention?"                          │
│  > "approve all low-risk items"                             │
│  > "show me the GHL test failures"                          │
└─────────────────────────────────────────────────────────────┘

Key Features

  1. Left Sidebar: Decision Queue — Priority-sorted, color-coded, with age timers
  2. Center: Context Panel — Full details for the selected decision (diffs, metrics, history)
  3. Bottom: Chat — Natural language interface to the factory ("approve all passing servers")
  4. One-Click Actions — Approve, reject, defer, reassign, batch approve
  5. Keyboard Shortcuts: j/k navigate, a approve, r reject, d defer
  6. SLA Indicators — Glowing countdown timers, escalation warnings
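The j/k/a/r/d shortcuts can live outside the React component as a pure function over the queue state, which keeps them easy to unit-test. This is a sketch only: `QueueState`, `handleKey`, and the result shape are hypothetical, and the component would still need to send the returned action to the factory API.

```typescript
// Actions the a/r/d keys trigger on the highlighted decision.
type QueueAction = "approve" | "reject" | "defer";

// Hypothetical UI state: decision ids in display order plus the highlighted index.
interface QueueState {
  ids: string[];
  selected: number;
}

type KeyResult =
  | { kind: "move"; state: QueueState }
  | { kind: "act"; action: QueueAction; id: string }
  | { kind: "ignore" };

const ACTION_KEYS = new Map<string, QueueAction>([
  ["a", "approve"],
  ["r", "reject"],
  ["d", "defer"],
]);

// j/k move the selection (clamped to the list); a/r/d act on the selection.
function handleKey(state: QueueState, key: string): KeyResult {
  if (key === "j" || key === "k") {
    const last = Math.max(state.ids.length - 1, 0);
    const next = state.selected + (key === "j" ? 1 : -1);
    return { kind: "move", state: { ...state, selected: Math.min(Math.max(next, 0), last) } };
  }
  const action = ACTION_KEYS.get(key);
  const id = state.ids[state.selected];
  if (action !== undefined && id !== undefined) {
    return { kind: "act", action, id };
  }
  return { kind: "ignore" };
}
```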

6. MCP SERVER — The Brain

The Factory MCP Server is what makes the chat interface powerful. It exposes 11 tools, 6 resources, and 4 prompts.

Tools (What You Can Do)

| Tool | What It Does | Example |
|---|---|---|
| factory_get_pending_tasks | Your decision inbox | "what needs my attention?" |
| factory_approve_task | Approve and advance | "approve the GHL deploy" |
| factory_reject_task | Reject with feedback | "reject stripe review, needs more tests" |
| factory_get_pipeline_status | Pipeline overview | "show me all active pipelines" |
| factory_advance_stage | Manual stage advance | "move notion-mcp to testing" |
| factory_assign_priority | Set priority | "make GHL critical priority" |
| factory_get_blockers | What's stuck | "what's blocked and why?" |
| factory_run_tests | Trigger tests | "run tests on the stripe server" |
| factory_deploy | Deploy to env | "deploy freshdesk to staging" |
| factory_search | Search everything | "find all servers with auth issues" |
| factory_create_pipeline | New server pipeline | "start a new Zendesk MCP server" |
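To show how a chat phrase turns into an operation, here is a dependency-free sketch of three of these tools (get pending, approve, reject) dispatched by name. Assumptions: tasks live in an in-memory `Map` instead of PostgreSQL, and `callTool`, `decide`, and the `Task` fields are hypothetical. The return value mirrors the MCP tool-result shape (a `content` array of text items plus an optional `isError` flag); the real server would register these handlers through @modelcontextprotocol/sdk rather than a plain `Map`.

```typescript
// Mirrors the MCP tool-result shape, restricted to text content.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical task record; the real server would read the tasks table.
interface Task {
  id: string;
  summary: string;
  status: "pending" | "approved" | "rejected";
  feedback?: string;
}

type Handler = (args: Record<string, unknown>) => ToolResult;

const tasks = new Map<string, Task>();

function textResult(text: string, isError = false): ToolResult {
  return isError ? { content: [{ type: "text", text }], isError: true } : { content: [{ type: "text", text }] };
}

// Shared approve/reject logic: only pending tasks can be decided.
function decide(taskId: unknown, status: "approved" | "rejected", feedback?: string): ToolResult {
  const task = typeof taskId === "string" ? tasks.get(taskId) : undefined;
  if (!task || task.status !== "pending") return textResult(`No pending task: ${String(taskId)}`, true);
  if (status === "rejected" && !feedback) return textResult("Rejecting requires feedback.", true);
  task.status = status;
  task.feedback = feedback;
  return textResult(`${task.id} ${status}`);
}

const handlers = new Map<string, Handler>([
  ["factory_get_pending_tasks", () => {
    const pending = Array.from(tasks.values()).filter((t) => t.status === "pending");
    return textResult(
      pending.length === 0 ? "Nothing needs your attention." : pending.map((t) => `${t.id}: ${t.summary}`).join("\n"),
    );
  }],
  ["factory_approve_task", ({ task_id }) => decide(task_id, "approved")],
  ["factory_reject_task", ({ task_id, feedback }) =>
    decide(task_id, "rejected", typeof feedback === "string" ? feedback : undefined)],
]);

// Unknown tool names come back as an error result instead of throwing.
function callTool(name: string, args: Record<string, unknown> = {}): ToolResult {
  const handler = handlers.get(name);
  return handler ? handler(args) : textResult(`Unknown tool: ${name}`, true);
}
```

Returning errors as results (rather than throwing) lets the model read the failure and tell you what went wrong in chat.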

Resources (What You Can Read)

| Resource | What It Provides |
|---|---|
| factory://dashboard/summary | High-level factory status |
| factory://pipelines/{id}/state | Specific pipeline details |
| factory://servers/{name}/status | Individual server health |
| factory://pipelines/{id}/test-results | Test results + coverage |
| factory://pipelines/{id}/build-logs | Build output |
| factory://config/templates | Available pipeline templates |
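Serving these resources means mapping an incoming URI back to its template and parameters. The official TypeScript SDK has its own support for resource templates, so the matcher below is only a dependency-free illustration of the idea; `compileTemplate` and `resolveResource` are hypothetical, and each `{name}` is assumed to match exactly one path segment.

```typescript
type Matcher = (uri: string) => Record<string, string> | null;

// Compile a template such as "factory://pipelines/{id}/state" into a matcher.
// Each {name} matches one path segment; everything else must match literally.
function compileTemplate(template: string): Matcher {
  const names: string[] = [];
  const pattern = template
    .split(/(\{[^}]+\})/)
    .map((part) => {
      if (part.startsWith("{") && part.endsWith("}")) {
        names.push(part.slice(1, -1));
        return "([^/]+)";
      }
      return part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    })
    .join("");
  const re = new RegExp(`^${pattern}$`);
  return (uri) => {
    const match = re.exec(uri);
    if (match === null) return null;
    const params: Record<string, string> = {};
    names.forEach((name, i) => {
      params[name] = decodeURIComponent(match[i + 1] ?? "");
    });
    return params;
  };
}

// The six resources from the table, tried in order.
const RESOURCES = [
  "factory://dashboard/summary",
  "factory://pipelines/{id}/state",
  "factory://servers/{name}/status",
  "factory://pipelines/{id}/test-results",
  "factory://pipelines/{id}/build-logs",
  "factory://config/templates",
].map((template) => ({ template, match: compileTemplate(template) }));

function resolveResource(uri: string): { template: string; params: Record<string, string> } | null {
  for (const { template, match } of RESOURCES) {
    const params = match(uri);
    if (params !== null) return { template, params };
  }
  return null;
}
```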

Prompts (Structured Conversations)

| Prompt | What It Sets Up |
|---|---|
| review_server | Pull all context for a full MCP server review |
| whats_needs_attention | Prioritized summary of everything pending |
| deploy_checklist | Pre-deployment verification checklist |
| pipeline_retrospective | Post-completion analysis and lessons learned |

7. NOTIFICATION ESCALATION — No Decision Falls Through

This is critical. The whole point is that you CANNOT miss something.

T+0min     Task created → Decision appears in GooseFactory queue
                        → Discord embed in #factory-tasks with buttons
                        
T+30min    Reminder #1  → Discord DM + badge pulse in app
                        → "⏰ GHL deploy approval waiting 30m"

T+2h       Reminder #2  → Discord @mention + push notification
                        → "🟡 GHL deploy waiting 2h — SLA in 2h"

T+4h       SLA Warning  → Discord @here + sound alert in app
                        → "🔴 GHL deploy SLA breach imminent"

T+SLA      SLA Breach   → Auto-escalate: SMS + all channels
                        → "🚨 GHL deploy SLA BREACHED — action required"

T+SLA+2h   Critical     → Phone notification + auto-default to safest action
                        → Incident report logged
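A scheduler only needs to know which rung of this ladder a task is currently on. A minimal sketch, assuming each task carries its own SLA in minutes (the plan doesn't fix one value) and using hypothetical channel names; actually delivering to Discord, push, SMS, or phone is out of scope here.

```typescript
// Escalation levels in increasing severity, as in the ladder above.
type Level = "queued" | "reminder_1" | "reminder_2" | "sla_warning" | "sla_breach" | "critical";

interface Step {
  level: Level;
  atMinutes: number; // minutes after task creation
  channels: string[]; // hypothetical channel identifiers
}

const HOUR = 60;

// The first four steps are fixed offsets; the last two depend on the task's SLA.
function ladder(slaMinutes: number): Step[] {
  return [
    { level: "queued", atMinutes: 0, channels: ["app_queue", "discord_embed"] },
    { level: "reminder_1", atMinutes: 30, channels: ["discord_dm", "app_badge"] },
    { level: "reminder_2", atMinutes: 2 * HOUR, channels: ["discord_mention", "push"] },
    { level: "sla_warning", atMinutes: 4 * HOUR, channels: ["discord_here", "app_sound"] },
    { level: "sla_breach", atMinutes: slaMinutes, channels: ["sms", "all_channels"] },
    { level: "critical", atMinutes: slaMinutes + 2 * HOUR, channels: ["phone", "incident_report"] },
  ];
}

// Steps are listed in severity order, so the last step already reached wins,
// even when a short SLA puts sla_breach before the fixed 4h warning.
function currentStep(elapsedMinutes: number, slaMinutes: number): Step {
  return ladder(slaMinutes).reduce((current, step) => (elapsedMinutes >= step.atMinutes ? step : current));
}
```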

Smart Batching

Instead of 10 separate pings:

📋 5 servers ready for review:
  ✅ freshdesk (low risk, tests pass)     [Approve]
  ✅ helpscout (low risk, tests pass)     [Approve]  
  ✅ close (low risk, tests pass)         [Approve]
  ⚠️ stripe (med risk, 1 warning)        [Review]
  ❌ ghl (high risk, 42 failures)        [Review Required]

[Approve All Low-Risk (3)] [Review All]
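A sketch of the grouping behind that digest, assuming "low risk" means a low risk rating and passing tests (the example above doesn't define it more precisely). `ReadyServer`, `isLowRisk`, and `buildDigest` are hypothetical names.

```typescript
type Risk = "low" | "medium" | "high";

// Hypothetical summary of one server waiting for review.
interface ReadyServer {
  name: string;
  risk: Risk;
  testsPassing: boolean;
}

interface Digest {
  approvable: string[]; // eligible for the one-click bulk approve
  needsReview: string[];
  text: string;
}

// Assumption: "low risk" means a low risk rating AND passing tests.
const isLowRisk = (s: ReadyServer): boolean => s.risk === "low" && s.testsPassing;

// Collapse many pending items into one message with a single bulk action.
function buildDigest(servers: readonly ReadyServer[]): Digest {
  const approvable = servers.filter(isLowRisk).map((s) => s.name);
  const needsReview = servers.filter((s) => !isLowRisk(s)).map((s) => s.name);
  const lines = servers.map((s) => {
    const icon = isLowRisk(s) ? "✅" : s.testsPassing ? "⚠️" : "❌";
    return `  ${icon} ${s.name} (${s.risk} risk)`;
  });
  const text = [
    `📋 ${servers.length} servers ready for review:`,
    ...lines,
    `[Approve All Low-Risk (${approvable.length})] [Review All]`,
  ].join("\n");
  return { approvable, needsReview, text };
}
```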

8. EVERY UI PATTERN MAPPED

Based on research across Devin, Cursor, GitHub Copilot Workspace, n8n, Retool, and 20+ other products:

Pattern → When to Use

| Pattern | Best For | Our Implementation |
|---|---|---|
| Inline Chat Buttons | Quick approve/reject | Approve/reject buttons in chat messages |
| Modal Overlay | Critical/irreversible actions | Production deploy confirmation (type "DEPLOY" to confirm) |
| Sidebar Panel | Code/asset review | Diff viewer alongside approval context |
| Decision Queue | Managing multiple pending items | Left sidebar in GooseFactory |
| Kanban Board | Pipeline stage visualization | Pipeline view tab |
| Batch Processor | Many similar decisions | "Approve all matching criteria" |
| Progress Dashboard | Long-running agent monitoring | Agent status panel |
| Run Contract | Pre-approving expensive operations | "This will use ~$50 in API calls, take ~4h" |
| Mobile Quick Actions | Approvals on the go | Push notification with swipe actions |
| Discord Embeds | Team visibility + async approval | Rich embeds with buttons in factory channels |
| MCP Apps | Complex interactive reviews | Custom HTML UIs rendered in chat (code review, forms) |
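For the Discord Embeds row, each button has to tell the bot which task and which action was clicked. Discord lets every message component carry a developer-defined custom_id (at most 100 characters) that comes back with the interaction. The sketch below assumes a hypothetical `factory:<action>:<taskId>` format; the prefix and layout are our own choice, not a discord.js convention.

```typescript
type ButtonAction = "approve" | "reject" | "defer";

const PREFIX = "factory"; // hypothetical prefix to tell factory buttons apart from the bot's other components
const MAX_CUSTOM_ID = 100; // Discord's limit on a component custom_id

// Encode the action and task into the button's custom_id.
function encodeButtonId(action: ButtonAction, taskId: string): string {
  const customId = `${PREFIX}:${action}:${taskId}`;
  if (customId.length > MAX_CUSTOM_ID) {
    throw new RangeError(`custom_id is ${customId.length} chars; Discord allows ${MAX_CUSTOM_ID}`);
  }
  return customId;
}

// Decode the custom_id from a button interaction; null means it isn't a factory button.
function decodeButtonId(customId: string): { action: ButtonAction; taskId: string } | null {
  const [prefix, action, ...rest] = customId.split(":");
  const taskId = rest.join(":"); // task ids may themselves contain colons
  if (prefix !== PREFIX || taskId === "") return null;
  if (action !== "approve" && action !== "reject" && action !== "defer") return null;
  return { action, taskId };
}
```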

9. TECH STACK

| Layer | Technology | Why |
|---|---|---|
| Desktop App | Forked Goose (Electron + React 19 + Rust) | Best-in-class MCP host, extensible UI |
| Backend API | Node.js + Hono | Fast, lightweight, TypeScript-native |
| Database | PostgreSQL (Neon/Supabase) | Proven, JSONB support, great for state machines |
| Cache/Events | Redis (Upstash) | Pub/sub, streams, fast queue |
| Object Storage | Cloudflare R2 | S3-compatible, no egress fees |
| MCP Server | TypeScript + @modelcontextprotocol/sdk | Native MCP, stdio + SSE transport |
| State Machine | XState-inspired patterns | Explicit states, SLA timers, auto-escalation |
| Orchestration | Inngest (step.waitForEvent) | Durable execution, event correlation, timeouts |
| Discord Bot | discord.js | Buttons, embeds, modals, slash commands |
| Auth | JWT + API keys | Simple, stateless, scoped |
| CI/CD | GitHub Actions | Existing infra, dispatch triggers |
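The "XState-inspired patterns" entry could look like this without pulling in XState itself: an explicit transition table for a task, where any state/event pair that isn't listed is rejected. A sketch only; the state and event names are hypothetical, and the 24-hour guard on UNDO borrows the roadmap's Phase 5 undo window.

```typescript
type TaskState = "pending" | "deferred" | "escalated" | "approved" | "rejected";
type TaskEvent = "APPROVE" | "REJECT" | "DEFER" | "RESUME" | "SLA_BREACH" | "UNDO";

// Explicit transition table: a (state, event) pair that is not listed is invalid.
const TRANSITIONS: Record<TaskState, Partial<Record<TaskEvent, TaskState>>> = {
  pending: { APPROVE: "approved", REJECT: "rejected", DEFER: "deferred", SLA_BREACH: "escalated" },
  deferred: { RESUME: "pending", SLA_BREACH: "escalated" },
  escalated: { APPROVE: "approved", REJECT: "rejected" },
  approved: { UNDO: "pending" },
  rejected: {},
};

const UNDO_WINDOW_MS = 24 * 60 * 60 * 1000;

// Returns the next state or throws; now/decidedAt are epoch milliseconds.
function transition(state: TaskState, event: TaskEvent, ctx: { now: number; decidedAt?: number }): TaskState {
  const next = TRANSITIONS[state][event];
  if (next === undefined) {
    throw new Error(`invalid transition: ${state} on ${event}`);
  }
  // Guard: an approval can only be undone within 24h of the decision.
  if (event === "UNDO" && (ctx.decidedAt === undefined || ctx.now - ctx.decidedAt > UNDO_WINDOW_MS)) {
    throw new Error("undo window has expired");
  }
  return next;
}
```

Because every legal move is listed in one table, the audit log can record exactly which event moved a task and from where.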

Why Inngest over Temporal?

  • Simpler — No separate server cluster to manage
  • TypeScript-native — Matches our stack
  • Event matching: waitForEvent with correlation is exactly our approval pattern
  • Serverless — Functions dehydrate while waiting, no resource consumption
  • Temporal is more powerful but overkill for our scale right now; we can migrate later if needed.
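In Inngest, the wait itself is a step.waitForEvent call that matches on a correlation field and resolves with the matching event, or null if the timeout passes first. The sketch below does not call Inngest; it replays the same decision rules over a recorded event list so the correlation and timeout behavior is easy to test. The event name factory/task.decided, the event fields, and both helper functions are hypothetical.

```typescript
// Hypothetical event shape, loosely modeled on a name + data payload.
interface FactoryEvent {
  name: string;
  ts: number; // milliseconds since epoch
  data: { taskId: string; decision?: "approve" | "reject" };
}

// First event with this name and data.taskId received inside the window, else null.
function firstMatch(
  events: readonly FactoryEvent[], // assumed to be in arrival order
  name: string,
  taskId: string,
  startedAt: number,
  timeoutMs: number,
): FactoryEvent | null {
  const deadline = startedAt + timeoutMs;
  return events.find((e) => e.name === name && e.data.taskId === taskId && e.ts >= startedAt && e.ts <= deadline) ?? null;
}

type Outcome = "advance" | "send_back" | "escalate";

// Approval gate: a decision inside the window advances or sends back; silence escalates.
function approvalOutcome(events: readonly FactoryEvent[], taskId: string, createdAt: number, timeoutMs: number): Outcome {
  const decided = firstMatch(events, "factory/task.decided", taskId, createdAt, timeoutMs);
  if (decided === null) return "escalate";
  return decided.data.decision === "approve" ? "advance" : "send_back";
}
```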

10. DATABASE SCHEMA (Key Tables)

pipelines          — One per MCP server build
├── pipeline_stages — Stage definitions + state machine
├── tasks          — Human decisions needed (the queue)
├── approvals      — Formal gate approvals
├── assets         — Generated code, configs, builds
└── audit_log      — Immutable event log

agents             — AI workers + build agents
notifications      — Multi-channel notification queue

8 tables total. Full SQL DDL in research-factory-api-architecture.md.


11. IMPLEMENTATION ROADMAP

Phase 1: Foundation (Week 1-2) — "The Skeleton"

  • Fork Goose, rebrand basics (name, logo, protocol)
  • Set up PostgreSQL schema + migrations
  • Core REST API (pipelines, tasks, approvals CRUD)
  • JWT auth
  • Basic audit logging
  • Deliverable: API accepts requests, data persists

Phase 2: MCP Server + Real-Time (Week 3-4) — "The Brain"

  • Factory MCP server with core tools (get_pending, approve, reject, status)
  • MCP resources (pipeline state, dashboard summary)
  • WebSocket server for real-time dashboard updates
  • Redis event bus with consumer groups
  • Wire MCP server into GooseFactory as built-in extension
  • Deliverable: "What needs my attention?" works in chat

Phase 3: Decision Queue UI (Week 5-6) — "The Centerpiece"

  • Decision queue sidebar in GooseFactory React UI
  • Context panel with diffs, metrics, history
  • One-click approve/reject/defer actions
  • Keyboard shortcuts (j/k/a/r/d)
  • Pipeline kanban view
  • SLA countdown indicators
  • Deliverable: Full Command Center in desktop app

Phase 4: Notifications + Discord (Week 7-8) — "The Nagger"

  • Discord bot bridge with rich embeds + buttons
  • Escalation ladder (queue → DM → mention → SMS)
  • Smart batching for similar decisions
  • Mobile push notifications
  • SLA monitoring and auto-escalation
  • GitHub webhook integration
  • Deliverable: Decisions come to you, not the other way around

Phase 5: Advanced Features (Week 9-10) — "The Polish"

  • MCP Apps for complex reviews (code diffs, forms in chat)
  • Batch approval processor
  • MCP prompts (review, deploy checklist, retrospective)
  • Analytics dashboard (decision velocity, bottleneck analysis)
  • Confidence-based auto-routing
  • Undo/rollback for 24h post-approval
  • Deliverable: Full SaaS-grade product

Phase 6: SaaS-ify (Week 11-12) — "The Product"

  • Multi-tenant support (separate factory instances)
  • User management + team roles
  • Billing integration
  • Landing page + docs
  • Onboarding flow
  • Deliverable: Sellable product

12. WHAT MAKES THIS DIFFERENT FROM EXISTING TOOLS

| Tool | What It Does | What We Do Better |
|---|---|---|
| Devin | Autonomous coding agent | We're a factory MANAGER, not a single agent |
| Cursor/Windsurf | IDE with AI | We manage pipelines of 64+ servers, not single files |
| n8n/Zapier | Workflow automation | We're AI-agent-native with MCP, not just webhooks |
| Linear/Jira | Project management | We have AI agents doing the work, humans just decide |
| Retool | Internal tools | We're purpose-built for AI agent factories |
| Goose (vanilla) | General AI assistant | We're a specialized factory operator |

The unique value: No one has built a purpose-built human-in-the-loop command center specifically for managing fleets of AI agents building MCP servers. You'd be first.


13. IMMEDIATE NEXT STEPS

  1. Jake reviews this plan — What's missing? What's wrong? What's the priority?
  2. Fork Goose — Clone, rebrand, get the build running locally
  3. Spike the MCP Server — Build the 3 most critical tools (get_pending, approve, reject) and test them in Goose
  4. Spike the Decision Queue UI — Mock up the sidebar in GooseFactory's React app
  5. Wire to existing pipeline — Connect to mcp-command-center/state.json as the initial data source

The MVP is: Type "what needs my attention?" in GooseFactory → get a prioritized list → approve/reject from chat. Everything else builds on that.


SUPPORTING RESEARCH DOCS

| Doc | Words | Focus |
|---|---|---|
| research-goose-architecture.md | ~3,000 | Goose codebase, fork strategy, MCP integration |
| research-hitl-ux-patterns.md | ~5,500 | Every HITL interaction type, UI patterns, 10 products analyzed |
| research-factory-api-architecture.md | ~4,000 | API design, MCP server spec, database schema, real-time events |
| research-agent-orchestration-patterns.md | ~3,500 | LangGraph, Temporal, Inngest, state machines, notification patterns |

"The best interface for managing AI agents isn't more AI — it's making it painfully obvious when a human needs to do something, and making that something take one click."