clawdbot-workspace/MASTER-PLAN-interactive-agent-factory.md
2026-02-06 23:01:30 -05:00

# MASTER PLAN: Interactive Agent Factory SaaS
## Codename: "GooseFactory" — Your AI Factory, Your Rules
> **Author:** Buba (synthesized from 4 specialized research agents)
> **Date:** 2026-02-06
> **Status:** PLAN — Awaiting Jake's Review
> **Supporting Research:** 4 docs, ~15,000 words, 60+ sources
---
## TL;DR — The 30-Second Pitch
Fork Goose (Block's open-source AI agent). Gut its chat UI. Wire in a **Factory Command Center** — a decision queue, pipeline kanban, and approval system that makes it painfully obvious when YOU are the bottleneck. The backend is an API + MCP server that exposes every factory operation as a conversational tool. You literally type "what needs my attention?" and get a prioritized list with one-click approve/reject. Everything you don't touch auto-advances. Everything that needs you screams at you until you act.
---
## 1. WHY THIS MATTERS
Right now the pipeline has ~64 MCP servers across 8 stages. The bottleneck isn't the AI — it's **you not knowing what's stuck on you**. The current system (Discord channels + cron heartbeats + manual checks) is passive. You have to go looking for what needs attention. That's backwards.
**The fix:** Build a system where decisions come to YOU, not the other way around. Make human-in-the-loop a first-class experience, not an afterthought.
---
## 2. ARCHITECTURE OVERVIEW
```
┌─────────────────────────────────────────────────────────────────┐
│ YOUR INTERFACE LAYER │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ GooseFactory │ │ Discord Bot │ │ Mobile │ │
│ │ Desktop App │ │ (Buttons + │ │ Push Notifs │ │
│ │ (Forked │ │ Embeds) │ │ (Quick │ │
│ │ Goose) │ │ │ │ Approve) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └──────────────────┼──────────────────┘ │
│ │ │
│ ┌─────────────────────────▼─────────────────────────────┐ │
│ │ MCP Server (Factory Operations) │ │
│ │ 11 Tools · 6 Resources · 4 Prompts │ │
│ │ "what needs attention?" → prioritized decision queue │ │
│ └─────────────────────────┬─────────────────────────────┘ │
└────────────────────────────┼────────────────────────────────────┘
┌────────────────────────────┼────────────────────────────────────┐
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Factory API (REST + WebSocket) │ │
│ │ 30+ endpoints · Real-time events · GraphQL queries │ │
│ └─────────────────────────┬───────────────────────────┘ │
│ │ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Pipeline │ │Task │ │Notif + │ │Audit │ │
│ │Engine │ │Queue │ │Escalation│ │Logger │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │PostgreSQL│ │Redis │ │S3/R2 │ │
│ │(State) │ │(Events) │ │(Assets) │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
---
## 3. THE GOOSE FORK — "GooseFactory"
### Why Goose?
- **Rust backend + Electron/React frontend** — production-grade, fast
- **Apache 2.0 license** — full commercial freedom, no copyleft
- **MCP-native** — already a first-class MCP host with dynamic extension discovery
- **Built-in permission system** — 4 modes including Smart Approval (risk-based)
- **Extension ecosystem** — thousands of MCP servers plug in immediately
- **Active community** — but now under Linux Foundation (AAIF), so stable governance
### What We Change
| Component | Current Goose | GooseFactory |
|-----------|--------------|--------------|
| **Branding** | Goose logos, `goose://` protocol | Your brand, `factory://` protocol |
| **Default Extensions** | Developer, Memory, etc. | Factory MCP Server (built-in), Pipeline Manager |
| **Chat UI** | General-purpose assistant | Factory Command Center with decision queue sidebar |
| **Approval Flow** | Simple allow/deny on tool calls | Rich approval cards with context, diffs, metrics |
| **System Prompts** | Generic agent instructions | Factory operator mode — knows about pipeline stages, MCPs |
| **MCP UI Rendering** | Basic inline/sidecar (WIP) | Custom approval UIs, pipeline dashboards, code review panels |
| **Protocol Handler** | `goose://extension?...` | `factory://approve?task_id=...` deep links |
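
The `factory://approve?task_id=...` deep link in the last row could be parsed roughly like this. It is a minimal sketch, not Goose's actual protocol handler: `parseFactoryDeepLink` and the `FactoryDeepLink` shape are hypothetical names, and only the `factory://` scheme, the `approve` action, and the `task_id` parameter come from the plan.

```typescript
// Hypothetical result shape; only `approve` and `task_id` appear in the plan.
interface FactoryDeepLink {
  action: string;
  params: Record<string, string>;
}

function parseFactoryDeepLink(raw: string): FactoryDeepLink | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not parseable as a URL
  }
  if (url.protocol !== "factory:") return null;

  // For `factory://approve?...` the URL parser places "approve" in the host slot.
  const action = url.hostname;
  if (!action) return null;

  const params: Record<string, string> = {};
  url.searchParams.forEach((value, key) => {
    params[key] = value;
  });
  return { action, params };
}
```

Returning `null` for anything that is not a well-formed `factory://` link keeps the handler from acting on leftover `goose://` links or arbitrary input.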
### Fork Strategy
1. **Clone the repo:** `git clone https://github.com/block/goose GooseFactory`
2. **Rebrand:** `package.json`, `main.ts`, assets, protocol handler (~1-2 days)
3. **Add Factory MCP Server** as a built-in Rust extension in `crates/goose-mcp/`
4. **Customize the chat UI** — Add decision queue sidebar in React (the interesting part)
5. **Add MCP UI components** — Custom approval cards using `@mcp-ui/client`
6. **Configure Smart Approval** — Factory operations auto-classified by risk level
### ⚠️ Timing Risk
Goose is actively migrating to ACP (Agent Client Protocol), tracked in Issue #6642. The migration replaces the backend REST+SSE interface with JSON-RPC 2.0 and affects `goosed` ↔ desktop communication. **Recommendation:** Fork AFTER the ACP migration lands. If you fork now instead, track upstream closely so the transport change can be merged in.
---
## 4. EVERY MOMENT YOU'RE NEEDED (Taxonomy)
Based on research across 10+ agent products and frameworks, here's every type of human-in-the-loop moment mapped to your factory:
### 🔴 CRITICAL — Always Need You
| Moment | Factory Example | UI Pattern |
|--------|----------------|------------|
| **Deploy to Production** | Promoting an MCP to live | Modal overlay with deploy checklist |
| **API Key Entry** | Configuring Stripe/GHL credentials | Secure input form in chat |
| **Client Communication** | Sending deliverables to the $20k client | Preview + approve before send |
| **Pricing/Positioning** | Setting MCP marketplace pricing | Multi-choice card with tradeoffs |
| **Legal/License Review** | Checking dependency licenses | Sidebar review panel |
### 🟡 HIGH VALUE — Usually Need You
| Moment | Factory Example | UI Pattern |
|--------|----------------|------------|
| **Design Review** | Approving UI/UX for MCP apps | Side-by-side mockup comparison |
| **Code Quality Gate** | Reviewing generated MCP server code | Diff view with inline annotations |
| **Naming/Branding** | Naming a new MCP server | A/B choice between options |
| **Test Failure Triage** | GHL's 42 failing tests — fix or skip? | Error cards with suggested actions |
| **Priority Decisions** | Which MCP to advance next? | Drag-and-drop priority list |
### 🟢 CONTEXTUAL — Sometimes Need You
| Moment | Factory Example | UI Pattern |
|--------|----------------|------------|
| **Routine Approvals** | Stage advances for passing servers | Batch approve with exceptions |
| **Parameter Tuning** | Adjusting test coverage thresholds | Slider controls |
| **Edge Cases** | AI hit a wall building a tool | Escalation card with context |
| **Delegation** | Route task to specialized agent | Dropdown assignment |
### Smart Routing (Confidence-Based)
Not everything needs to block on you:
- **>90% confidence** → Auto-execute, log for async review
- **60-90% confidence** → Queue for review, pipeline continues other work
- **<60% confidence** → Block and escalate immediately
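
The three bands above can be sketched as a single routing function. This assumes confidence arrives as a number in `[0, 1]` and that exactly 90% falls in the review band; the function and type names are hypothetical.

```typescript
type RoutingDecision = "auto_execute" | "queue_for_review" | "block_and_escalate";

function routeByConfidence(confidence: number): RoutingDecision {
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError(`confidence must be in [0, 1], got ${confidence}`);
  }
  if (confidence > 0.9) return "auto_execute";      // >90%: run it, log for async review
  if (confidence >= 0.6) return "queue_for_review"; // 60-90%: queue, pipeline keeps working
  return "block_and_escalate";                      // <60%: stop and escalate
}
```

Rejecting out-of-range values outright means a miscalibrated agent cannot slip into the auto-execute band by reporting something like `1.2`.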
---
## 5. THE DECISION QUEUE — Your Mission Control
This is the centerpiece. A prioritized inbox of every decision the factory needs from you.
### Layout (In GooseFactory Desktop App)
```
┌─────────────────────────────────────────────────────────────┐
│ GooseFactory [≡] [] [×]│
├──────────────────┬──────────────────────────────────────────┤
│ │ │
│ 📥 DECISIONS (6)│ 🔴 GHL MCP — Deploy to Production │
│ │ │
│ 🔴 GHL Deploy │ Pipeline: ghl-mcp-server │
│ 🟡 Stripe Review │ Stage: staging → production │
│ 🟡 3 Batch Items│ Tests: 47/47 ✅ Coverage: 94% ✅ │
│ 🟢 2 FYI Items │ Waiting: 2h 15m SLA: ⚠️ 45m left │
│ │ │
│ ── Pipeline ── │ Changes since last review: │
│ [Kanban View] │ + 12 files modified │
│ │ + 3 new API endpoints │
│ ── Agents ── │ + Edge case handling improved │
│ 🟢 Builder: OK │ │
│ 🟢 Tester: OK │ [View Full Diff] [Run Tests Again] │
│ 🟡 GHL: Waiting │ │
│ │ ┌─────────┐ ┌─────────┐ ┌──────────┐ │
│ ── Stats ── │ │✅ Deploy│ │❌ Reject│ │⏰ Defer │ │
│ Today: 12 done │ └─────────┘ └─────────┘ └──────────┘ │
│ Avg wait: 1.2h │ │
│ │ 💬 Chat: "approve the GHL deploy" │
│ │ [________________________________] [⏎] │
├──────────────────┴──────────────────────────────────────────┤
│ Chat: You can also just type naturally here... │
│ > "what else needs my attention?" │
│ > "approve all low-risk items" │
│ > "show me the GHL test failures" │
└─────────────────────────────────────────────────────────────┘
```
### Key Features
1. **Left Sidebar: Decision Queue:** Priority-sorted, color-coded, with age timers
2. **Center: Context Panel:** Full details for the selected decision (diffs, metrics, history)
3. **Bottom: Chat:** Natural language interface to the factory ("approve all passing servers")
4. **One-Click Actions:** Approve, reject, defer, reassign, batch approve
5. **Keyboard Shortcuts:** `j/k` navigate, `a` approve, `r` reject, `d` defer
6. **SLA Indicators:** Glowing countdown timers, escalation warnings
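
The keyboard shortcuts could be wired up with a small lookup table. In this sketch, mapping `j` to "next" and `k` to "previous" follows the usual vim convention and is an assumption, as are the names `actionForKey` and `moveSelection`. Shortcuts are ignored while you are typing in the chat box so that a stray `a` never approves anything.

```typescript
type QueueAction = "next" | "previous" | "approve" | "reject" | "defer";

// A Map avoids accidental hits on inherited object keys such as "toString".
const SHORTCUTS = new Map<string, QueueAction>([
  ["j", "next"],
  ["k", "previous"],
  ["a", "approve"],
  ["r", "reject"],
  ["d", "defer"],
]);

function actionForKey(key: string, typingInChat: boolean): QueueAction | null {
  if (typingInChat) return null; // keystrokes belong to the chat input
  return SHORTCUTS.get(key) ?? null;
}

// Moves the selection within the queue, clamped at both ends.
function moveSelection(index: number, action: QueueAction, queueLength: number): number {
  if (queueLength === 0) return -1; // nothing to select
  if (action === "next") return Math.min(index + 1, queueLength - 1);
  if (action === "previous") return Math.max(index - 1, 0);
  return index;
}
```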
---
## 6. MCP SERVER — The Brain
The Factory MCP Server is what makes the chat interface powerful. It exposes 11 tools, 6 resources, and 4 prompts.
### Tools (What You Can Do)
| Tool | What It Does | Example |
|------|-------------|---------|
| `factory_get_pending_tasks` | Your decision inbox | "what needs my attention?" |
| `factory_approve_task` | Approve and advance | "approve the GHL deploy" |
| `factory_reject_task` | Reject with feedback | "reject stripe review needs more tests" |
| `factory_get_pipeline_status` | Pipeline overview | "show me all active pipelines" |
| `factory_advance_stage` | Manual stage advance | "move notion-mcp to testing" |
| `factory_assign_priority` | Set priority | "make GHL critical priority" |
| `factory_get_blockers` | What's stuck | "what's blocked and why?" |
| `factory_run_tests` | Trigger tests | "run tests on the stripe server" |
| `factory_deploy` | Deploy to env | "deploy freshdesk to staging" |
| `factory_search` | Search everything | "find all servers with auth issues" |
| `factory_create_pipeline` | New server pipeline | "start a new Zendesk MCP server" |
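
The ordering behind `factory_get_pending_tasks` might look like the sketch below: most urgent tier first, then longest-waiting first within a tier. The `PendingTask` fields and the three tier names (borrowed from the taxonomy in section 4) are assumptions for illustration. A real handler would read from the `tasks` table and be registered with the MCP SDK.

```typescript
type Priority = "critical" | "high" | "contextual";

interface PendingTask {
  id: string;
  title: string;
  priority: Priority;
  createdAtMs: number; // epoch milliseconds when the decision was queued
}

const PRIORITY_RANK: Record<Priority, number> = { critical: 0, high: 1, contextual: 2 };

// Returns a new array; the input is left untouched.
function sortPendingTasks(tasks: readonly PendingTask[]): PendingTask[] {
  return [...tasks].sort(
    (a, b) =>
      PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
      a.createdAtMs - b.createdAtMs, // older tasks first within a tier
  );
}
```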
### Resources (What You Can Read)
| Resource | What It Provides |
|----------|-----------------|
| `factory://dashboard/summary` | High-level factory status |
| `factory://pipelines/{id}/state` | Specific pipeline details |
| `factory://servers/{name}/status` | Individual server health |
| `factory://pipelines/{id}/test-results` | Test results + coverage |
| `factory://pipelines/{id}/build-logs` | Build output |
| `factory://config/templates` | Available pipeline templates |
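
Resolving a concrete URI against the templates above could work segment by segment, as in this sketch. `matchResource` is a hypothetical helper; the MCP SDK has its own resource-template support, which this does not attempt to reproduce.

```typescript
const RESOURCE_TEMPLATES: readonly string[] = [
  "factory://dashboard/summary",
  "factory://pipelines/{id}/state",
  "factory://servers/{name}/status",
  "factory://pipelines/{id}/test-results",
  "factory://pipelines/{id}/build-logs",
  "factory://config/templates",
];

interface ResourceMatch {
  template: string;
  params: Record<string, string>;
}

function matchResource(uri: string): ResourceMatch | null {
  const uriParts = uri.split("/");
  for (const template of RESOURCE_TEMPLATES) {
    const templateParts = template.split("/");
    if (templateParts.length !== uriParts.length) continue;

    const params: Record<string, string> = {};
    let matched = true;
    for (let i = 0; i < templateParts.length; i++) {
      const expected = templateParts[i];
      const actual = uriParts[i];
      if (expected.startsWith("{") && expected.endsWith("}")) {
        if (actual === "") { matched = false; break; } // placeholder needs a value
        params[expected.slice(1, -1)] = actual;
      } else if (expected !== actual) {
        matched = false;
        break;
      }
    }
    if (matched) return { template, params };
  }
  return null;
}
```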
### Prompts (Structured Conversations)
| Prompt | What It Sets Up |
|--------|----------------|
| `review_server` | Pull all context for a full MCP server review |
| `what_needs_attention` | Prioritized summary of everything pending |
| `deploy_checklist` | Pre-deployment verification checklist |
| `pipeline_retrospective` | Post-completion analysis and lessons learned |
---
## 7. NOTIFICATION ESCALATION — No Decision Falls Through
This is critical. The whole point is that you CANNOT miss something.
```
T+0min Task created → Decision appears in GooseFactory queue
→ Discord embed in #factory-tasks with buttons
T+30min Reminder #1 → Discord DM + badge pulse in app
→ "⏰ GHL deploy approval waiting 30m"
T+2h Reminder #2 → Discord @mention + push notification
→ "🟡 GHL deploy waiting 2h — SLA in 2h"
T+4h SLA Warning → Discord @here + sound alert in app
→ "🔴 GHL deploy SLA breach imminent"
T+SLA SLA Breach → Auto-escalate: SMS + all channels
→ "🚨 GHL deploy SLA BREACHED — action required"
T+SLA+2h Critical → Phone notification + auto-default to safest action
→ Incident report logged
```
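
The ladder can be reduced to a function that maps waiting time to the current rung. This sketch assumes the SLA is a per-task value in minutes, and that the 30m, 2h, and 4h rungs are fixed offsets as drawn in the timeline. The step names are hypothetical.

```typescript
type EscalationStep =
  | "created"
  | "reminder_1"
  | "reminder_2"
  | "sla_warning"
  | "sla_breach"
  | "critical";

// Rungs in ascending severity; the last two depend on the task's SLA.
function escalationLadder(slaMinutes: number): Array<{ step: EscalationStep; atMinutes: number }> {
  return [
    { step: "created", atMinutes: 0 },
    { step: "reminder_1", atMinutes: 30 },
    { step: "reminder_2", atMinutes: 120 },
    { step: "sla_warning", atMinutes: 240 },
    { step: "sla_breach", atMinutes: slaMinutes },
    { step: "critical", atMinutes: slaMinutes + 120 },
  ];
}

// Picks the most severe rung whose threshold has already passed.
function currentEscalation(elapsedMinutes: number, slaMinutes: number): EscalationStep {
  let current: EscalationStep = "created";
  for (const { step, atMinutes } of escalationLadder(slaMinutes)) {
    if (elapsedMinutes >= atMinutes) current = step;
  }
  return current;
}
```

Because the rungs are checked in severity order rather than time order, a task with a short SLA (say 2h) goes straight to `sla_breach` without waiting for the fixed 4h warning.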
### Smart Batching
Instead of 10 separate pings:
```
📋 5 servers ready for review:
✅ freshdesk (low risk, tests pass) [Approve]
✅ helpscout (low risk, tests pass) [Approve]
✅ close (low risk, tests pass) [Approve]
⚠️ stripe (med risk, 1 warning) [Review]
❌ ghl (high risk, 42 failures) [Review Required]
[Approve All Low-Risk (3)] [Review All]
```
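
A batch like the one above could be assembled as follows. The sketch assumes each item carries a risk label and a pass/fail flag, and that only low-risk items with passing tests qualify for the one-click "Approve All Low-Risk" button. The type and function names are hypothetical.

```typescript
type Risk = "low" | "med" | "high";

interface ReviewItem {
  server: string;
  risk: Risk;
  testsPass: boolean;
}

interface ReviewBatch {
  approvable: string[];  // eligible for "Approve All Low-Risk"
  needsReview: string[]; // must be opened individually
}

function batchForReview(items: readonly ReviewItem[]): ReviewBatch {
  const batch: ReviewBatch = { approvable: [], needsReview: [] };
  for (const item of items) {
    if (item.risk === "low" && item.testsPass) {
      batch.approvable.push(item.server);
    } else {
      batch.needsReview.push(item.server);
    }
  }
  return batch;
}
```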
---
## 8. EVERY UI PATTERN MAPPED
Based on research across Devin, Cursor, GitHub Copilot Workspace, n8n, Retool, and 20+ other products:
### Pattern → When to Use
| Pattern | Best For | Our Implementation |
|---------|----------|-------------------|
| **Inline Chat Buttons** | Quick approve/reject | Approve/reject buttons in chat messages |
| **Modal Overlay** | Critical/irreversible actions | Production deploy confirmation (type "DEPLOY" to confirm) |
| **Sidebar Panel** | Code/asset review | Diff viewer alongside approval context |
| **Decision Queue** | Managing multiple pending items | Left sidebar in GooseFactory |
| **Kanban Board** | Pipeline stage visualization | Pipeline view tab |
| **Batch Processor** | Many similar decisions | "Approve all matching criteria" |
| **Progress Dashboard** | Long-running agent monitoring | Agent status panel |
| **Run Contract** | Pre-approving expensive operations | "This will use ~$50 in API calls, take ~4h" |
| **Mobile Quick Actions** | Approvals on the go | Push notification with swipe actions |
| **Discord Embeds** | Team visibility + async approval | Rich embeds with buttons in factory channels |
| **MCP Apps** | Complex interactive reviews | Custom HTML UIs rendered in chat (code review, forms) |
---
## 9. TECH STACK
| Layer | Technology | Why |
|-------|-----------|-----|
| **Desktop App** | Forked Goose (Electron + React 19 + Rust) | Best-in-class MCP host, extensible UI |
| **Backend API** | Node.js + Hono | Fast, lightweight, TypeScript-native |
| **Database** | PostgreSQL (Neon/Supabase) | Proven, JSONB support, great for state machines |
| **Cache/Events** | Redis (Upstash) | Pub/sub, streams, fast queue |
| **Object Storage** | Cloudflare R2 | S3-compatible, no egress fees |
| **MCP Server** | TypeScript + @modelcontextprotocol/sdk | Native MCP, stdio + SSE transport |
| **State Machine** | XState-inspired patterns | Explicit states, SLA timers, auto-escalation |
| **Orchestration** | Inngest (step.waitForEvent) | Durable execution, event correlation, timeouts |
| **Discord Bot** | discord.js | Buttons, embeds, modals, slash commands |
| **Auth** | JWT + API keys | Simple, stateless, scoped |
| **CI/CD** | GitHub Actions | Existing infra, dispatch triggers |
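
As an illustration of the "explicit states" idea in the State Machine row, a task's lifecycle could be expressed as a plain transition table. The state and event names here are assumptions drawn from the approve/reject/defer actions elsewhere in this plan. SLA timers and auto-escalation are left out of the sketch.

```typescript
type TaskState = "pending" | "deferred" | "approved" | "rejected";
type TaskEvent = "approve" | "reject" | "defer" | "resume";

// Every legal move is listed; anything absent is illegal by construction.
const TRANSITIONS: Record<TaskState, Partial<Record<TaskEvent, TaskState>>> = {
  pending: { approve: "approved", reject: "rejected", defer: "deferred" },
  deferred: { resume: "pending" },
  approved: {},
  rejected: {},
};

function transition(state: TaskState, event: TaskEvent): TaskState {
  const next = TRANSITIONS[state][event];
  if (next === undefined) {
    throw new Error(`illegal transition: "${event}" from "${state}"`);
  }
  return next;
}
```

Keeping the table as data makes it easy to write each accepted transition to `audit_log` and to reject anything unexpected, such as a second approval of an already-approved task.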
### Why Inngest over Temporal?
- **Simpler:** No separate server cluster to manage
- **TypeScript-native:** Matches our stack
- **Event matching:** `waitForEvent` with correlation is exactly our approval pattern
- **Serverless:** Functions dehydrate while waiting, no resource consumption
- Temporal is more powerful but overkill for our scale right now. Can migrate later if needed.
---
## 10. DATABASE SCHEMA (Key Tables)
```
pipelines — One per MCP server build
├── pipeline_stages — Stage definitions + state machine
├── tasks — Human decisions needed (the queue)
├── approvals — Formal gate approvals
├── assets — Generated code, configs, builds
└── audit_log — Immutable event log
agents — AI workers + build agents
notifications — Multi-channel notification queue
```
8 tables total. Full SQL DDL in `research-factory-api-architecture.md`.
---
## 11. IMPLEMENTATION ROADMAP
### Phase 1: Foundation (Week 1-2) — "The Skeleton"
- [ ] Fork Goose, rebrand basics (name, logo, protocol)
- [ ] Set up PostgreSQL schema + migrations
- [ ] Core REST API (pipelines, tasks, approvals CRUD)
- [ ] JWT auth
- [ ] Basic audit logging
- [ ] **Deliverable:** API accepts requests, data persists
### Phase 2: MCP Server + Real-Time (Week 3-4) — "The Brain"
- [ ] Factory MCP server with core tools (get_pending, approve, reject, status)
- [ ] MCP resources (pipeline state, dashboard summary)
- [ ] WebSocket server for real-time dashboard updates
- [ ] Redis event bus with consumer groups
- [ ] Wire MCP server into GooseFactory as built-in extension
- [ ] **Deliverable:** "What needs my attention?" works in chat
### Phase 3: Decision Queue UI (Week 5-6) — "The Centerpiece"
- [ ] Decision queue sidebar in GooseFactory React UI
- [ ] Context panel with diffs, metrics, history
- [ ] One-click approve/reject/defer actions
- [ ] Keyboard shortcuts (j/k/a/r/d)
- [ ] Pipeline kanban view
- [ ] SLA countdown indicators
- [ ] **Deliverable:** Full Command Center in desktop app
### Phase 4: Notifications + Discord (Week 7-8) — "The Nagger"
- [ ] Discord bot bridge with rich embeds + buttons
- [ ] Escalation ladder (queue → DM → mention → SMS)
- [ ] Smart batching for similar decisions
- [ ] Mobile push notifications
- [ ] SLA monitoring and auto-escalation
- [ ] GitHub webhook integration
- [ ] **Deliverable:** Decisions come to you, not the other way around
### Phase 5: Advanced Features (Week 9-10) — "The Polish"
- [ ] MCP Apps for complex reviews (code diffs, forms in chat)
- [ ] Batch approval processor
- [ ] MCP prompts (review, deploy checklist, retrospective)
- [ ] Analytics dashboard (decision velocity, bottleneck analysis)
- [ ] Confidence-based auto-routing
- [ ] Undo/rollback for 24h post-approval
- [ ] **Deliverable:** Full SaaS-grade product
### Phase 6: SaaS-ify (Week 11-12) — "The Product"
- [ ] Multi-tenant support (separate factory instances)
- [ ] User management + team roles
- [ ] Billing integration
- [ ] Landing page + docs
- [ ] Onboarding flow
- [ ] **Deliverable:** Sellable product
---
## 12. WHAT MAKES THIS DIFFERENT FROM EXISTING TOOLS
| Tool | What It Does | What We Do Better |
|------|-------------|------------------|
| **Devin** | Autonomous coding agent | We're a factory MANAGER, not a single agent |
| **Cursor/Windsurf** | IDE with AI | We manage pipelines of 64+ servers, not single files |
| **n8n/Zapier** | Workflow automation | We're AI-agent-native with MCP, not just webhooks |
| **Linear/Jira** | Project management | We have AI agents doing the work, humans just decide |
| **Retool** | Internal tools | We're purpose-built for AI agent factories |
| **Goose (vanilla)** | General AI assistant | We're a specialized factory operator |
**The unique value:** None of the products surveyed in the supporting research is a purpose-built human-in-the-loop command center for managing fleets of AI agents that build MCP servers. As far as that research shows, you'd be first.
---
## 13. IMMEDIATE NEXT STEPS
1. **Jake reviews this plan:** What's missing? What's wrong? What's the priority?
2. **Fork Goose:** Clone, rebrand, get the build running locally
3. **Spike the MCP Server:** Build the 3 most critical tools (get_pending, approve, reject) and test in Goose
4. **Spike the Decision Queue UI:** Mock up the sidebar in GooseFactory's React app
5. **Wire to existing pipeline:** Connect to `mcp-command-center/state.json` as initial data source
**The MVP is:** Type "what needs my attention?" in GooseFactory → get a prioritized list → approve/reject from chat. Everything else builds on that.
---
## SUPPORTING RESEARCH DOCS
| Doc | Words | Focus |
|-----|-------|-------|
| `research-goose-architecture.md` | ~3,000 | Goose codebase, fork strategy, MCP integration |
| `research-hitl-ux-patterns.md` | ~5,500 | Every HITL interaction type, UI patterns, 10 products analyzed |
| `research-factory-api-architecture.md` | ~4,000 | API design, MCP server spec, database schema, real-time events |
| `research-agent-orchestration-patterns.md` | ~3,500 | LangGraph, Temporal, Inngest, state machines, notification patterns |
---
*"The best interface for managing AI agents isn't more AI — it's making it painfully obvious when a human needs to do something, and making that something take one click."*