clawdbot-workspace/research-factory-api-architecture.md

# AI Agent Factory SaaS — API & MCP Server Architecture

> Comprehensive architecture document for the Factory Dashboard backend, API layer, MCP server integration, and real-time human-in-the-loop pipeline management.
>
> **Date:** 2026-02-06
> **Status:** Research / Design Phase
> **Author:** Architecture Research Agent

---

## Table of Contents

1. [System Overview](#1-system-overview)
2. [API Design (RESTful + WebSocket + GraphQL)](#2-api-design)
3. [MCP Server Design for Factory Operations](#3-mcp-server-design)
4. [Database Schema Design](#4-database-schema-design)
5. [Real-time Architecture](#5-real-time-architecture)
6. [Integration Points](#6-integration-points)
7. [Research: Existing Patterns & Prior Art](#7-existing-patterns)
8. [Deployment & Scaling Considerations](#8-deployment--scaling)
9. [Security Architecture](#9-security-architecture)
10. [Implementation Roadmap](#10-implementation-roadmap)

---

## 1. System Overview

### Architecture Diagram (Text-Based)

```
┌─────────────────────────────────────────────────────────────────────┐
│                        CLIENT LAYER                                  │
│                                                                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              │
│  │  Factory      │  │  Goose Chat  │  │  Discord/    │              │
│  │  Dashboard    │  │  UI (MCP     │  │  Slack Bot   │              │
│  │  (React SPA)  │  │  Client)     │  │              │              │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘              │
│         │ REST/WS          │ MCP/stdio        │ Webhooks            │
└─────────┼──────────────────┼──────────────────┼─────────────────────┘
          │                  │                  │
┌─────────┼──────────────────┼──────────────────┼─────────────────────┐
│         ▼                  ▼                  ▼   API GATEWAY       │
│  ┌─────────────────────────────────────────────────────────┐        │
│  │              API Gateway / Load Balancer                  │        │
│  │         (Rate Limiting, Auth, Routing, CORS)             │        │
│  └────────┬────────────┬──────────────┬─────────────────────┘        │
│           │            │              │                               │
│  ┌────────▼─────┐ ┌───▼──────┐ ┌─────▼────────┐                    │
│  │ REST API     │ │ WebSocket│ │ MCP Server   │                    │
│  │ Server       │ │ Server   │ │ (stdio/SSE)  │                    │
│  │ (Express/    │ │ (ws)     │ │              │                    │
│  │  Hono)       │ │          │ │              │                    │
│  └──────┬───────┘ └────┬─────┘ └──────┬───────┘                    │
└─────────┼──────────────┼──────────────┼─────────────────────────────┘
          │              │              │
┌─────────┼──────────────┼──────────────┼─────────────────────────────┐
│         ▼              ▼              ▼    CORE SERVICES             │
│  ┌─────────────────────────────────────────────────┐                │
│  │              Service Layer                        │                │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐         │                │
│  │  │Pipeline  │ │Task      │ │Approval  │         │                │
│  │  │Service   │ │Queue Svc │ │Service   │         │                │
│  │  └──────────┘ └──────────┘ └──────────┘         │                │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐         │                │
│  │  │Agent     │ │Asset     │ │Notif.    │         │                │
│  │  │Health Svc│ │Service   │ │Service   │         │                │
│  │  └──────────┘ └──────────┘ └──────────┘         │                │
│  │  ┌──────────┐ ┌──────────┐                       │                │
│  │  │Deploy    │ │Audit     │                       │                │
│  │  │Service   │ │Service   │                       │                │
│  │  └──────────┘ └──────────┘                       │                │
│  └───────────────────────┬─────────────────────────┘                │
└──────────────────────────┼──────────────────────────────────────────┘
                           │
┌──────────────────────────┼──────────────────────────────────────────┐
│                          ▼    DATA LAYER                            │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
│  │PostgreSQL│  │  Redis    │  │  S3/R2   │  │Event Bus │           │
│  │(Primary  │  │(Cache +   │  │(Assets/  │  │(Redis    │           │
│  │ Store)   │  │ Pub/Sub)  │  │ Builds)  │  │ Streams) │           │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘           │
└────────────────────────────────────────────────────────────────────┘
```

### Core Principles

1. **Human-in-the-Loop First** — Every automated pipeline step can be gated by a human approval checkpoint.
2. **Durable Execution** — Inspired by Temporal.io; pipeline state survives crashes, restarts, and network failures.
3. **MCP-Native** — The factory exposes its operations as MCP tools, resources, and prompts so AI agents (Goose, Claude, etc.) can drive the factory conversationally.
4. **Event-Sourced Audit Trail** — Every state change is an immutable event, enabling full replay and compliance.
5. **Multi-Channel Notifications** — No decision falls through the cracks; escalation paths through Dashboard → Discord → SMS.

---

## 2. API Design

### 2.1 Base URL & Versioning Strategy

```
Base URL:  https://api.agentfactory.dev/v1
Pattern:   URL path versioning (v1, v2)
Why:       Simple, explicit, cache-friendly. Header-based versioning adds complexity
           without benefit at our scale.
Sunset:    v(N-2) deprecated with 90-day sunset headers.
Headers:   X-API-Version (response), Sunset (deprecation date)
```

### 2.2 Authentication & Authorization

```
Auth Method:  Bearer JWT (short-lived, 15min) + Refresh tokens (7d)
API Keys:     For server-to-server / CI/CD integrations (scoped, rotatable)
OAuth 2.0:    For third-party integrations (GitHub, Discord)
RBAC Roles:   owner, admin, operator, viewer, agent (non-human)

Scopes:
  - pipelines:read, pipelines:write
  - tasks:read, tasks:approve, tasks:reject
  - deploy:staging, deploy:production
  - agents:manage
  - assets:read, assets:write
  - audit:read
```

### 2.3 RESTful Endpoints

#### Pipeline Management

```http
# List all pipelines (with filtering + pagination)
GET    /v1/pipelines?status=active&stage=testing&page=1&limit=20

# Create a new pipeline
POST   /v1/pipelines
Body: {
  "name": "ghl-mcp-server",
  "template": "mcp-server-standard",
  "config": { "platform": "go-high-level", "runtime": "node" },
  "priority": "high",
  "assignee_id": "user_abc123"
}

# Get pipeline details (includes current stage, tasks, timeline)
GET    /v1/pipelines/:pipeline_id

# Update pipeline configuration
PATCH  /v1/pipelines/:pipeline_id
Body: { "priority": "critical", "config": { ... } }

# Delete / archive pipeline
DELETE /v1/pipelines/:pipeline_id?archive=true

# Get pipeline stage history (state machine transitions)
GET    /v1/pipelines/:pipeline_id/stages

# Advance pipeline to next stage (triggers validation)
POST   /v1/pipelines/:pipeline_id/stages/advance
Body: { "target_stage": "testing", "notes": "Scaffolding complete" }
```

#### Stage Transitions & Approvals

```http
# List pending approvals (across all pipelines or filtered)
GET    /v1/approvals?status=pending&pipeline_id=xyz&assignee=me

# Get approval details
GET    /v1/approvals/:approval_id

# Approve a gate
POST   /v1/approvals/:approval_id/approve
Body: { "notes": "LGTM, tests passing", "conditions": [] }

# Reject with feedback
POST   /v1/approvals/:approval_id/reject
Body: { "reason": "Test coverage below 80%", "requested_changes": ["Add edge case tests"] }

# Escalate an approval (reassign or bump priority)
POST   /v1/approvals/:approval_id/escalate
Body: { "escalate_to": "user_xyz", "reason": "SLA breach in 2h" }
```

#### Task Queue Management

```http
# List all tasks requiring human attention
GET    /v1/tasks?status=pending&priority=high&sort=-sla_deadline

# Get task details with full context
GET    /v1/tasks/:task_id

# Claim a task (assigns to current user)
POST   /v1/tasks/:task_id/claim

# Complete a task with decision
POST   /v1/tasks/:task_id/complete
Body: { "decision": "approved", "output": { ... }, "notes": "..." }

# Reassign a task
POST   /v1/tasks/:task_id/reassign
Body: { "assignee_id": "user_xyz" }

# Bulk operations
POST   /v1/tasks/bulk
Body: { "action": "approve", "task_ids": ["t1", "t2", "t3"], "notes": "Batch approval" }

# Task queue statistics
GET    /v1/tasks/stats
Response: {
  "pending": 12,
  "in_progress": 3,
  "blocked": 2,
  "avg_wait_time_minutes": 45,
  "sla_breaches": 1,
  "by_priority": { "critical": 1, "high": 4, "medium": 5, "low": 2 }
}
```

#### Agent Status & Health Monitoring

```http
# List all agents (AI workers, build agents, deploy agents)
GET    /v1/agents

# Get specific agent status
GET    /v1/agents/:agent_id
Response: {
  "id": "agent_goose_01",
  "type": "ai_builder",
  "status": "active",
  "current_task": "task_xyz",
  "pipeline": "ghl-mcp-server",
  "health": {
    "uptime_seconds": 84200,
    "tasks_completed_24h": 47,
    "error_rate": 0.02,
    "last_heartbeat": "2026-02-06T17:00:00Z"
  }
}

# Agent heartbeat (agents POST their status)
POST   /v1/agents/:agent_id/heartbeat
Body: { "status": "active", "metrics": { ... } }

# Agent activity log
GET    /v1/agents/:agent_id/activity?since=2026-02-06T00:00:00Z
```

#### Asset Management

```http
# List assets for a pipeline
GET    /v1/pipelines/:pipeline_id/assets?type=code&stage=scaffolding

# Upload an asset
POST   /v1/assets
Body (multipart): {
  "pipeline_id": "pipe_xyz",
  "stage": "scaffolding",
  "type": "code",          # code | config | docs | build | test_report
  "file": <binary>,
  "metadata": { "path": "src/index.ts", "generated_by": "agent_goose_01" }
}

# Get asset (download or metadata)
GET    /v1/assets/:asset_id
GET    /v1/assets/:asset_id/download

# Diff between asset versions
GET    /v1/assets/:asset_id/diff?from_version=3&to_version=4
```

#### Notifications & Webhooks

```http
# List notifications for current user
GET    /v1/notifications?unread=true

# Mark notification as read
POST   /v1/notifications/:id/read

# Register a webhook
POST   /v1/webhooks
Body: {
  "url": "https://example.com/webhook",
  "events": ["pipeline.stage_changed", "approval.pending", "task.sla_breach"],
  "secret": "whsec_..."
}

# List webhooks
GET    /v1/webhooks

# Webhook event types
GET    /v1/webhooks/events
```

#### Audit Trail

```http
# Query audit log (immutable, append-only)
GET    /v1/audit?entity_type=pipeline&entity_id=xyz&since=2026-02-01

# Each entry:
{
  "id": "audit_001",
  "timestamp": "2026-02-06T15:30:00Z",
  "actor": { "type": "user", "id": "user_abc" },
  "action": "approval.approved",
  "entity": { "type": "approval", "id": "appr_xyz" },
  "changes": { "status": { "from": "pending", "to": "approved" } },
  "metadata": { "ip": "...", "user_agent": "..." }
}
```

### 2.4 WebSocket Events

```
Connection:  wss://api.agentfactory.dev/v1/ws?token=<jwt>

Channel subscription model:
  → { "type": "subscribe", "channels": ["pipeline:*", "tasks:pending", "agents:health"] }
  ← { "type": "subscribed", "channels": [...] }

Events emitted:

  pipeline.created          — New pipeline initialized
  pipeline.stage_changed    — Pipeline moved to new stage
  pipeline.completed        — Pipeline finished (success/failure)
  pipeline.blocked          — Pipeline blocked on human decision

  task.created              — New task in queue
  task.claimed              — Task claimed by a user
  task.completed            — Task completed (approved/rejected)
  task.sla_warning          — SLA deadline approaching (30min, 10min)
  task.sla_breached         — SLA exceeded, escalation triggered

  approval.pending          — New approval gate awaiting decision
  approval.resolved         — Approval approved or rejected

  agent.status_changed      — Agent went active/idle/error/offline
  agent.heartbeat           — Periodic agent health update

  asset.created             — New asset uploaded
  asset.updated             — Asset version bumped

  deploy.started            — Deployment initiated
  deploy.succeeded          — Deployment successful
  deploy.failed             — Deployment failed

Event payload format:
{
  "type": "pipeline.stage_changed",
  "timestamp": "2026-02-06T15:30:00Z",
  "data": {
    "pipeline_id": "pipe_xyz",
    "from_stage": "scaffolding",
    "to_stage": "testing",
    "triggered_by": "agent_goose_01"
  }
}
```

### 2.5 GraphQL Considerations

A secondary GraphQL endpoint at `/v1/graphql` provides flexible querying for the dashboard, avoiding over-fetching across the deeply nested pipeline → stages → tasks → assets hierarchy.

```graphql
type Query {
  # Dashboard summary — one query gets everything
  dashboard: DashboardSummary!

  # Flexible pipeline querying with nested resolution
  pipelines(
    filter: PipelineFilter
    sort: PipelineSort
    pagination: Pagination
  ): PipelineConnection!

  pipeline(id: ID!): Pipeline

  # Cross-entity queries (hard with REST)
  pendingDecisions(assignee: ID): [Task!]!
  blockers: [Blocker!]!
}

type Pipeline {
  id: ID!
  name: String!
  currentStage: Stage!
  stages: [Stage!]!
  tasks(status: TaskStatus): [Task!]!
  assets: [Asset!]!
  agent: Agent
  metrics: PipelineMetrics!
  audit: [AuditEntry!]!
}

type Subscription {
  pipelineUpdated(id: ID): Pipeline!
  taskCreated: Task!
  approvalPending: Approval!
}
```

**Decision: REST-primary, GraphQL-secondary.** REST handles mutations and simple queries. GraphQL handles complex dashboard reads and real-time subscriptions. This avoids the "GraphQL for everything" trap while gaining its benefits for deeply nested reads.

---

## 3. MCP Server Design for Factory Operations

The Factory MCP Server is the bridge between AI agents (Goose, Claude, etc.) and the Factory API. It exposes the factory's capabilities as MCP tools, resources, and prompts — enabling conversational factory management.

### 3.1 MCP Tools

```typescript
// ─── TASK & APPROVAL TOOLS ────────────────────────────────────

tool("factory_get_pending_tasks", {
  description: "Get all tasks requiring human attention, sorted by priority and SLA urgency",
  parameters: {
    pipeline_id: { type: "string", optional: true, description: "Filter by pipeline" },
    priority: { type: "string", enum: ["critical","high","medium","low"], optional: true },
    assignee: { type: "string", optional: true, description: "Filter by assignee (use 'me' for current)" },
    limit: { type: "number", optional: true, default: 20 }
  },
  returns: "Array of pending tasks with context, priority, SLA status, and blocking info"
})

tool("factory_approve_task", {
  description: "Approve a pending task or approval gate, allowing the pipeline to proceed",
  parameters: {
    task_id: { type: "string", required: true },
    notes: { type: "string", optional: true, description: "Approval notes or conditions" },
    conditions: { type: "array", items: { type: "string" }, optional: true }
  },
  returns: "Updated task status and next pipeline action triggered"
})

tool("factory_reject_task", {
  description: "Reject a pending task with feedback, sends pipeline back for rework",
  parameters: {
    task_id: { type: "string", required: true },
    reason: { type: "string", required: true, description: "Why this was rejected" },
    requested_changes: { type: "array", items: { type: "string" }, optional: true },
    severity: { type: "string", enum: ["minor","major","critical"], default: "major" }
  },
  returns: "Rejection recorded, pipeline stage reverted, agent notified"
})

// ─── PIPELINE MANAGEMENT TOOLS ────────────────────────────────

tool("factory_get_pipeline_status", {
  description: "Get current state of all pipelines or a specific pipeline with full context",
  parameters: {
    pipeline_id: { type: "string", optional: true, description: "Specific pipeline (omit for all)" },
    include_details: { type: "boolean", default: false, description: "Include tasks, assets, logs" }
  },
  returns: "Pipeline(s) with current stage, progress %, blockers, and timeline"
})

tool("factory_advance_stage", {
  description: "Manually advance a pipeline to its next stage (triggers validation checks)",
  parameters: {
    pipeline_id: { type: "string", required: true },
    target_stage: { type: "string", optional: true, description: "Target stage (default: next in sequence)" },
    skip_validation: { type: "boolean", default: false, description: "Skip gate checks (admin only)" },
    notes: { type: "string", optional: true }
  },
  returns: "Stage transition result, any validation failures, new pending tasks created"
})

tool("factory_assign_priority", {
  description: "Set or change priority on a pipeline or task",
  parameters: {
    entity_type: { type: "string", enum: ["pipeline","task"] },
    entity_id: { type: "string", required: true },
    priority: { type: "string", enum: ["critical","high","medium","low"], required: true },
    reason: { type: "string", optional: true }
  },
  returns: "Updated priority, SLA recalculated, notifications sent if escalated"
})

tool("factory_get_blockers", {
  description: "Get all items that are blocked and why — the 'what's stuck' view",
  parameters: {
    pipeline_id: { type: "string", optional: true },
    include_suggestions: { type: "boolean", default: true }
  },
  returns: "Array of blockers with root cause, duration blocked, and suggested actions"
})

// ─── BUILD & DEPLOY TOOLS ────────────────────────────────────

tool("factory_run_tests", {
  description: "Trigger test suite for a pipeline's current build",
  parameters: {
    pipeline_id: { type: "string", required: true },
    test_type: { type: "string", enum: ["unit","integration","e2e","all"], default: "all" },
    environment: { type: "string", enum: ["local","ci"], default: "ci" }
  },
  returns: "Test run ID, real-time status updates via resource subscription"
})

tool("factory_deploy", {
  description: "Deploy a pipeline's build to staging or production",
  parameters: {
    pipeline_id: { type: "string", required: true },
    target: { type: "string", enum: ["staging","production"], required: true },
    version: { type: "string", optional: true, description: "Specific version (default: latest)" },
    dry_run: { type: "boolean", default: false }
  },
  returns: "Deployment ID, status, URL. Production requires prior staging deploy + approval."
})

// ─── UTILITY TOOLS ────────────────────────────────────────────

tool("factory_search", {
  description: "Search across pipelines, tasks, assets, and logs",
  parameters: {
    query: { type: "string", required: true },
    entity_types: { type: "array", items: { type: "string" }, optional: true },
    limit: { type: "number", default: 10 }
  }
})

tool("factory_create_pipeline", {
  description: "Initialize a new MCP server pipeline from a template",
  parameters: {
    name: { type: "string", required: true },
    platform: { type: "string", required: true, description: "Target platform (e.g., 'go-high-level')" },
    template: { type: "string", default: "mcp-server-standard" },
    priority: { type: "string", enum: ["critical","high","medium","low"], default: "medium" }
  }
})
```

### 3.2 MCP Resources

Resources provide read-only, subscribable data that AI agents can reference contextually.

```typescript
// Dynamic resource: specific pipeline state
resource("factory://pipelines/{pipeline_id}/state", {
  description: "Current state of a specific pipeline including stage, progress, and blockers",
  mimeType: "application/json"
})

// Dynamic resource: individual server status
resource("factory://servers/{server_name}/status", {
  description: "Health, version, deployment status of a specific MCP server",
  mimeType: "application/json"
})

// Dynamic resource: test results for a pipeline
resource("factory://pipelines/{pipeline_id}/test-results", {
  description: "Latest test results with pass/fail counts, coverage, and failure details",
  mimeType: "application/json"
})

// Dynamic resource: build logs
resource("factory://pipelines/{pipeline_id}/build-logs", {
  description: "Build output logs for the latest or specified build",
  mimeType: "text/plain"
})

// Static resource: factory dashboard summary
resource("factory://dashboard/summary", {
  description: "High-level factory status: active pipelines, pending tasks, blockers, health",
  mimeType: "application/json"
})

// Static resource: factory configuration / templates
resource("factory://config/templates", {
  description: "Available pipeline templates and their configurations",
  mimeType: "application/json"
})

// Resource subscription for real-time updates
// Clients subscribe to resources and receive notifications on change
```

### 3.3 MCP Prompts

Prompts are pre-built conversation starters that pull relevant context and structure the AI's response.

```typescript
prompt("review_server", {
  description: "Pull all context needed to review an MCP server: code quality, test results, docs, deployment readiness",
  arguments: {
    server_name: { type: "string", required: true, description: "Name of the MCP server to review" }
  },
  // Returns a structured prompt with:
  // - Pipeline status and history
  // - Latest test results and coverage
  // - Code review checklist
  // - Asset inventory (README, TOOLS.md, package.json)
  // - Open tasks/blockers
  // - Comparison against quality standards
})

prompt("whats_needs_attention", {
  description: "Summary of everything that needs human attention right now, prioritized by urgency",
  arguments: {
    scope: { type: "string", optional: true, description: "all | my-tasks | critical-only" }
  },
  // Returns:
  // - SLA breaches (most urgent first)
  // - Pending approvals
  // - Blocked pipelines
  // - Failed tests/deploys
  // - Agent health issues
  // - Suggested next actions
})

prompt("deploy_checklist", {
  description: "Pre-deployment review checklist for an MCP server",
  arguments: {
    pipeline_id: { type: "string", required: true },
    target: { type: "string", enum: ["staging","production"], required: true }
  },
  // Returns structured checklist:
  // ☐ All tests passing (unit, integration, e2e)
  // ☐ Code review approved
  // ☐ README.md complete and accurate
  // ☐ TOOLS.md documents all tools
  // ☐ Environment variables documented
  // ☐ No hardcoded secrets
  // ☐ Performance benchmarks acceptable
  // ☐ Staging deploy tested
  // ☐ Rollback plan documented
  // ☐ Monitoring/alerting configured
})

prompt("pipeline_retrospective", {
  description: "Generate a retrospective analysis of a completed pipeline",
  arguments: {
    pipeline_id: { type: "string", required: true }
  }
  // Returns: timeline, bottlenecks, quality metrics, lessons learned
})
```

---

## 4. Database Schema Design

### 4.1 Core Tables

```sql
-- ═══════════════════════════════════════════════════════════
-- PIPELINES — The top-level entity, one per MCP server build
-- ═══════════════════════════════════════════════════════════

CREATE TABLE pipelines (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name            VARCHAR(255) NOT NULL,
  slug            VARCHAR(255) UNIQUE NOT NULL,
  template        VARCHAR(100) NOT NULL DEFAULT 'mcp-server-standard',
  platform        VARCHAR(100),                          -- e.g., 'go-high-level'

  -- State machine
  current_stage   VARCHAR(50) NOT NULL DEFAULT 'intake',
  status          VARCHAR(30) NOT NULL DEFAULT 'active', -- active, paused, completed, failed, archived
  priority        VARCHAR(20) NOT NULL DEFAULT 'medium', -- critical, high, medium, low

  -- Ownership
  created_by      UUID REFERENCES users(id),
  assignee_id     UUID REFERENCES users(id),

  -- Configuration (JSONB for flexibility)
  config          JSONB DEFAULT '{}',
  metadata        JSONB DEFAULT '{}',

  -- SLA tracking
  sla_deadline    TIMESTAMPTZ,
  started_at      TIMESTAMPTZ DEFAULT NOW(),
  completed_at    TIMESTAMPTZ,

  -- Timestamps
  created_at      TIMESTAMPTZ DEFAULT NOW(),
  updated_at      TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_pipelines_status ON pipelines(status);
CREATE INDEX idx_pipelines_stage ON pipelines(current_stage);
CREATE INDEX idx_pipelines_priority ON pipelines(priority);
CREATE INDEX idx_pipelines_assignee ON pipelines(assignee_id);

-- ═══════════════════════════════════════════════════════════
-- STAGES — Stage definitions and transitions per pipeline
-- ═══════════════════════════════════════════════════════════

CREATE TABLE pipeline_stages (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  pipeline_id     UUID REFERENCES pipelines(id) ON DELETE CASCADE,
  stage_name      VARCHAR(50) NOT NULL,
  stage_order     INT NOT NULL,

  -- State
  status          VARCHAR(30) NOT NULL DEFAULT 'pending', -- pending, active, completed, skipped, failed

  -- Gate configuration
  requires_approval BOOLEAN DEFAULT false,
  approval_type     VARCHAR(30),                          -- manual, auto, conditional
  auto_advance      BOOLEAN DEFAULT false,

  -- Validation rules (JSONB: test coverage thresholds, required assets, etc.)
  validation_rules  JSONB DEFAULT '{}',

  -- Timing
  entered_at      TIMESTAMPTZ,
  completed_at    TIMESTAMPTZ,
  duration_seconds INT,

  created_at      TIMESTAMPTZ DEFAULT NOW(),

  UNIQUE(pipeline_id, stage_name)
);

-- ═══════════════════════════════════════════════════════════
-- TASKS — Work items requiring human decisions
-- ═══════════════════════════════════════════════════════════

CREATE TABLE tasks (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  pipeline_id     UUID REFERENCES pipelines(id),
  stage_name      VARCHAR(50),

  -- Task details
  type            VARCHAR(50) NOT NULL,  -- approval, review, decision, manual_action, fix_required
  title           VARCHAR(500) NOT NULL,
  description     TEXT,
  context         JSONB DEFAULT '{}',    -- Rich context for the decision-maker

  -- Queue management
  status          VARCHAR(30) NOT NULL DEFAULT 'pending', -- pending, claimed, in_progress, completed, expired, escalated
  priority        VARCHAR(20) NOT NULL DEFAULT 'medium',

  -- Assignment
  assignee_id     UUID REFERENCES users(id),
  claimed_at      TIMESTAMPTZ,
  claimed_by      UUID REFERENCES users(id),

  -- Decision tracking
  decision        VARCHAR(30),           -- approved, rejected, deferred, escalated
  decision_notes  TEXT,
  decision_data   JSONB DEFAULT '{}',
  decided_at      TIMESTAMPTZ,
  decided_by      UUID REFERENCES users(id),

  -- SLA
  sla_deadline    TIMESTAMPTZ,
  sla_warning_sent BOOLEAN DEFAULT false,
  sla_breached    BOOLEAN DEFAULT false,
  escalation_level INT DEFAULT 0,

  -- What this task blocks
  blocks_stage_advance BOOLEAN DEFAULT false,
  blocks_pipeline_id   UUID REFERENCES pipelines(id),

  created_at      TIMESTAMPTZ DEFAULT NOW(),
  updated_at      TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_tasks_status ON tasks(status);
CREATE INDEX idx_tasks_priority_sla ON tasks(priority, sla_deadline);
CREATE INDEX idx_tasks_assignee ON tasks(assignee_id);
CREATE INDEX idx_tasks_pipeline ON tasks(pipeline_id);
CREATE INDEX idx_tasks_pending ON tasks(status) WHERE status = 'pending';

-- ═══════════════════════════════════════════════════════════
-- APPROVALS — Formal gate approvals (linked to tasks)
-- ═══════════════════════════════════════════════════════════

CREATE TABLE approvals (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  task_id         UUID REFERENCES tasks(id),
  pipeline_id     UUID REFERENCES pipelines(id),
  stage_name      VARCHAR(50) NOT NULL,

  -- Approval details
  type            VARCHAR(30) NOT NULL,  -- stage_gate, deploy, code_review, manual_check
  status          VARCHAR(30) NOT NULL DEFAULT 'pending',

  -- Decision
  approved_by     UUID REFERENCES users(id),
  approved_at     TIMESTAMPTZ,
  rejection_reason TEXT,
  conditions      JSONB DEFAULT '[]',    -- Conditional approvals

  -- Required approvers (for multi-party approvals)
  required_approvers  INT DEFAULT 1,
  current_approvers   INT DEFAULT 0,

  created_at      TIMESTAMPTZ DEFAULT NOW()
);

-- ═══════════════════════════════════════════════════════════
-- AGENTS — AI workers and build/deploy agents
-- ═══════════════════════════════════════════════════════════

CREATE TABLE agents (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name            VARCHAR(255) NOT NULL,
  type            VARCHAR(50) NOT NULL,  -- ai_builder, test_runner, deployer, monitor

  -- Status
  status          VARCHAR(30) NOT NULL DEFAULT 'idle', -- idle, active, error, offline
  current_task_id UUID REFERENCES tasks(id),
  current_pipeline_id UUID REFERENCES pipelines(id),

  -- Health
  last_heartbeat  TIMESTAMPTZ,
  uptime_seconds  INT DEFAULT 0,
  error_count_24h INT DEFAULT 0,

  -- Configuration
  capabilities    JSONB DEFAULT '[]',
  config          JSONB DEFAULT '{}',

  -- Metrics
  tasks_completed_total INT DEFAULT 0,
  avg_task_duration_seconds INT,

  created_at      TIMESTAMPTZ DEFAULT NOW(),
  updated_at      TIMESTAMPTZ DEFAULT NOW()
);

-- ═══════════════════════════════════════════════════════════
-- ASSETS — Generated files, code, configs, build artifacts
-- ═══════════════════════════════════════════════════════════

CREATE TABLE assets (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  pipeline_id     UUID REFERENCES pipelines(id),
  stage_name      VARCHAR(50),

  -- Asset info
  type            VARCHAR(30) NOT NULL,  -- code, config, docs, build, test_report, screenshot
  name            VARCHAR(500) NOT NULL,
  path            VARCHAR(1000),         -- Logical path (e.g., src/index.ts)

  -- Storage
  storage_key     VARCHAR(1000) NOT NULL, -- S3/R2 key
  storage_bucket  VARCHAR(255),
  size_bytes      BIGINT,
  content_type    VARCHAR(255),
  checksum_sha256 VARCHAR(64),

  -- Versioning
  version         INT DEFAULT 1,
  previous_version_id UUID REFERENCES assets(id),

  -- Provenance
  generated_by    UUID,                  -- Could be user or agent
  generator_type  VARCHAR(30),           -- user, agent, ci

  metadata        JSONB DEFAULT '{}',

  created_at      TIMESTAMPTZ DEFAULT NOW()
);

-- ═══════════════════════════════════════════════════════════
-- NOTIFICATIONS — Multi-channel notification queue
-- ═══════════════════════════════════════════════════════════

CREATE TABLE notifications (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id         UUID REFERENCES users(id),

  -- Content
  type            VARCHAR(50) NOT NULL,  -- task_assigned, approval_pending, sla_warning, deploy_status
  title           VARCHAR(500) NOT NULL,
  body            TEXT,
  data            JSONB DEFAULT '{}',

  -- Delivery
  channels        JSONB DEFAULT '["dashboard"]', -- dashboard, discord, email, sms
  delivered_via   JSONB DEFAULT '[]',

  -- Status
  read            BOOLEAN DEFAULT false,
  read_at         TIMESTAMPTZ,
  dismissed       BOOLEAN DEFAULT false,

  -- Linking
  entity_type     VARCHAR(50),
  entity_id       UUID,
  action_url      VARCHAR(1000),

  created_at      TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_notifications_user_unread ON notifications(user_id, read) WHERE read = false;

-- ═══════════════════════════════════════════════════════════
-- AUDIT LOG — Immutable event log
-- ═══════════════════════════════════════════════════════════

CREATE TABLE audit_log (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),

  -- Who
  actor_type      VARCHAR(30) NOT NULL,  -- user, agent, system, webhook
  actor_id        UUID,
  actor_name      VARCHAR(255),

  -- What
  action          VARCHAR(100) NOT NULL, -- e.g., pipeline.created, task.approved, deploy.started

  -- On what
  entity_type     VARCHAR(50) NOT NULL,
  entity_id       UUID NOT NULL,

  -- Details
  changes         JSONB DEFAULT '{}',    -- { field: { from: x, to: y } }
  metadata        JSONB DEFAULT '{}',    -- IP, user agent, etc.

  -- Immutable timestamp
  created_at      TIMESTAMPTZ DEFAULT NOW() NOT NULL
);

-- Partition by month for performance
CREATE INDEX idx_audit_entity ON audit_log(entity_type, entity_id);
CREATE INDEX idx_audit_action ON audit_log(action);
CREATE INDEX idx_audit_actor ON audit_log(actor_type, actor_id);
CREATE INDEX idx_audit_time ON audit_log(created_at);
```

### 4.2 Pipeline State Machine

```
┌─────────┐     ┌──────────────┐     ┌──────────┐     ┌───────────┐
│  INTAKE │────▶│ SCAFFOLDING  │────▶│ BUILDING │────▶│  TESTING  │
│         │     │              │     │          │     │           │
└─────────┘     └──────────────┘     └──────────┘     └─────┬─────┘
                                                            │
                                                    ┌───────▼───────┐
                                                    │    REVIEW     │
                                                    │ ★ HUMAN GATE  │
                                                    └───────┬───────┘
                                                            │
                         ┌──────────────┐           ┌───────▼───────┐
                         │  PRODUCTION  │◀──────────│   STAGING     │
                         │ ★ HUMAN GATE │           │ ★ HUMAN GATE  │
                         └──────┬───────┘           └───────────────┘
                                │
                         ┌──────▼───────┐
                         │  PUBLISHED   │
                         │  (Complete)  │
                         └──────────────┘

★ = Requires human approval (creates a task in the queue)

Transitions encoded as:
  intake → scaffolding         (auto, on pipeline creation)
  scaffolding → building       (auto, when scaffold complete)
  building → testing           (auto, when build succeeds)
  testing → review             (auto, when tests pass; creates approval task)
  review → staging             (manual, requires approval)
  staging → production         (manual, requires approval + staging tests pass)
  production → published       (auto, on successful deploy)

  ANY → failed                 (on unrecoverable error)
  failed → {previous_stage}    (manual retry with approval)
```

### 4.3 Blocker Tracking Query

```sql
-- "What's stuck and why?" — the factory_get_blockers query

WITH blocked_items AS (
  -- Tasks past SLA or pending too long
  SELECT
    'task' as blocker_type,
    t.id as entity_id,
    t.title,
    t.pipeline_id,
    p.name as pipeline_name,
    t.priority,
    t.sla_deadline,
    t.created_at,
    EXTRACT(EPOCH FROM (NOW() - t.created_at)) / 3600 as hours_blocked,
    t.sla_breached,
    t.assignee_id,
    CASE
      WHEN t.sla_breached THEN 'SLA breached'
      WHEN t.status = 'pending' AND t.created_at < NOW() - INTERVAL '4 hours' THEN 'Unclaimed for 4h+'
      WHEN t.assignee_id IS NULL THEN 'Unassigned'
      ELSE 'Awaiting decision'
    END as block_reason
  FROM tasks t
  JOIN pipelines p ON t.pipeline_id = p.id
  WHERE t.status IN ('pending', 'claimed')
    AND t.blocks_stage_advance = true

  UNION ALL

  -- Failed pipeline stages
  SELECT
    'stage' as blocker_type,
    ps.id,
    CONCAT(p.name, ' — ', ps.stage_name, ' failed'),
    p.id,
    p.name,
    p.priority,
    NULL,
    ps.entered_at,
    EXTRACT(EPOCH FROM (NOW() - ps.entered_at)) / 3600,
    false,
    p.assignee_id,
    'Stage failed — requires intervention'
  FROM pipeline_stages ps
  JOIN pipelines p ON ps.pipeline_id = p.id
  WHERE ps.status = 'failed'
    AND p.status = 'active'
)
SELECT * FROM blocked_items
ORDER BY
  CASE priority
    WHEN 'critical' THEN 0
    WHEN 'high' THEN 1
    WHEN 'medium' THEN 2
    WHEN 'low' THEN 3
  END,
  hours_blocked DESC;
```

---

## 5. Real-time Architecture

### 5.1 WebSocket vs SSE Decision

| Criteria | WebSocket | SSE |
|----------|-----------|-----|
| Direction | Bidirectional | Server → Client only |
| Reconnection | Manual | Automatic (built-in) |
| Protocol | ws:// | Standard HTTP |
| Proxy/CDN | Needs config | Works everywhere |
| Browser support | Universal | Universal (except IE) |
| Message types | Binary + Text | Text only |

**Decision: WebSocket for dashboard (bidirectional needed for subscriptions), SSE for MCP transport (one-way server events fit MCP's SSE transport spec).**

### 5.2 Event-Driven Architecture

```
┌──────────────────────────────────────────────────────────┐
│                    EVENT BUS (Redis Streams)              │
│                                                          │
│  Streams:                                                │
│    factory.pipelines    — All pipeline state changes     │
│    factory.tasks        — Task lifecycle events          │
│    factory.agents       — Agent status changes           │
│    factory.deploys      — Deploy lifecycle events        │
│    factory.notifications — Notification dispatch queue   │
│                                                          │
│  Consumer Groups:                                        │
│    ws-broadcaster       — Fans out to WebSocket clients  │
│    notification-worker  — Processes notification queue   │
│    discord-bridge       — Sends to Discord channels      │
│    sla-monitor          — Watches for SLA breaches       │
│    audit-writer         — Writes to audit_log table      │
│    mcp-resource-updater — Triggers MCP resource updates  │
└──────────────────────────────────────────────────────────┘
```

**Key events and their consumers:**

```
Event: task.created
  → ws-broadcaster:       Push to all subscribed dashboard clients
  → notification-worker:  Send notification to assignee
  → discord-bridge:       Post to #factory-tasks channel
  → sla-monitor:          Start SLA countdown timer

Event: task.sla_warning (30min before deadline)
  → notification-worker:  Escalation notification (dashboard + Discord DM)
  → ws-broadcaster:       Update task card with warning indicator

Event: task.sla_breached
  → notification-worker:  URGENT notification to assignee + admin
  → discord-bridge:       @mention in Discord with urgency
  → audit-writer:         Record SLA breach
  → sla-monitor:          Trigger escalation (reassign to admin)

Event: pipeline.stage_changed
  → ws-broadcaster:       Update pipeline view in real-time
  → audit-writer:         Record transition
  → mcp-resource-updater: Update factory://pipelines/{id}/state
```

### 5.3 SLA Escalation Ladder (No Decision Falls Through)

```
T+0min:    Task created → Notification to assignee (dashboard + Discord)
T+30min:   Reminder #1 → Dashboard badge + Discord DM
T+2h:      Reminder #2 → Discord channel mention + email
T+4h:      SLA Warning → Discord @here + push notification
T+SLA:     SLA Breach  → Auto-escalate to admin, Discord @everyone in factory channel
T+SLA+2h:  Critical    → SMS/phone notification to owner
```

---

## 6. Integration Points

### 6.1 Goose Chat UI ↔ MCP Server

The MCP server runs as a stdio transport process that Goose connects to directly. When a user types in Goose:

```
User: "What needs my attention?"

Goose → MCP Client → factory_get_pending_tasks(assignee: "me")
                   → factory_get_blockers()

Response rendered in chat:
  🔴 2 SLA breaches
  🟡 3 pending approvals
  🟢 5 pipelines running smoothly

  Priority items:
  1. [CRITICAL] GHL MCP Server — code review approval (SLA breached 2h ago)
  2. [HIGH] Stripe MCP Server — deploy to production approval
  3. [MEDIUM] Discord MCP Server — test failures need review
```

The "what needs attention" prompt pre-fetches all relevant resources and formats them conversationally. Users can then act directly:

```
User: "Approve the GHL code review"

Goose → MCP Client → factory_approve_task(task_id: "task_xyz", notes: "Approved via Goose")
```

### 6.2 Discord Integration Bridge

```typescript
// Discord bot that bridges factory events to Discord channels

// Channel mapping:
//   #factory-feed        — All pipeline events (low noise)
//   #factory-tasks       — Pending tasks that need human attention
//   #factory-deploys     — Deploy notifications
//   #factory-alerts      — SLA breaches, failures, critical issues

// Discord commands:
//   /factory status                — Dashboard summary
//   /factory approve <task_id>     — Approve a task
//   /factory reject <task_id>      — Reject with reason
//   /factory pipelines             — List active pipelines
//   /factory blockers              — What's stuck

// Embed format for pending approval:
{
  title: "🔔 Approval Required: GHL MCP Server → Production",
  color: 0xFFA500, // orange
  fields: [
    { name: "Pipeline", value: "ghl-mcp-server", inline: true },
    { name: "Stage", value: "staging → production", inline: true },
    { name: "Priority", value: "🔴 Critical", inline: true },
    { name: "Tests", value: "✅ 47/47 passing", inline: true },
    { name: "Coverage", value: "✅ 94%", inline: true },
    { name: "SLA Deadline", value: "<t:1707238800:R>", inline: true },
  ],
  // Interactive buttons:
  components: [
    { type: "button", label: "✅ Approve", style: "SUCCESS", custom_id: "approve:task_xyz" },
    { type: "button", label: "❌ Reject", style: "DANGER", custom_id: "reject:task_xyz" },
    { type: "button", label: "📋 Details", style: "LINK", url: "https://factory.agentfactory.dev/tasks/xyz" }
  ]
}
```

### 6.3 GitHub Integration

```
Events FROM GitHub → Factory:
  - PR opened          → Create review task in factory
  - PR review approved → Auto-advance code review stage
  - PR merged          → Trigger build stage
  - CI check passed    → Update test results resource
  - CI check failed    → Create "fix required" task

Events FROM Factory → GitHub:
  - Pipeline created   → Create tracking issue
  - Stage completed    → Update issue labels
  - Deploy succeeded   → Comment on PR with deploy URL
  - Pipeline completed → Close issue

GitHub App Permissions:
  - Pull requests (read/write)
  - Issues (read/write)
  - Checks (read/write)
  - Webhooks (receive)
```

### 6.4 CI/CD Pipeline Triggers

```yaml
# Factory triggers CI/CD via GitHub Actions dispatch or direct API

# On stage transition: building → testing
factory_event: pipeline.stage_changed
  when: to_stage == "testing"
  action:
    - github_actions.dispatch:
        workflow: "test-mcp-server.yml"
        inputs:
          pipeline_id: "{{ pipeline.id }}"
          server_name: "{{ pipeline.name }}"

# On deploy approval
factory_event: approval.approved
  when: stage == "production"
  action:
    - github_actions.dispatch:
        workflow: "deploy-mcp-server.yml"
        inputs:
          pipeline_id: "{{ pipeline.id }}"
          target: "production"
          version: "{{ pipeline.latest_build_version }}"
```

---

## 7. Existing Patterns & Prior Art

### 7.1 Temporal.io — Durable Human-in-the-Loop

Temporal's approach is the gold standard for human-in-the-loop workflows. Key takeaways:

- **Signals** allow external input (human decisions) to be sent to running workflows. The workflow pauses (via `workflow.wait_condition()`) until the signal arrives. This is exactly what our approval gates need.
- **Durability** means that if the server crashes while waiting for human input, the workflow resumes exactly where it left off. The human's decision (once signaled) is durably stored in the workflow history.
- **Queries** let external systems read workflow state without affecting it — perfect for our "get pipeline status" needs.
- **Child workflows** enable parallel processing of individual pipeline stages while a parent workflow orchestrates the overall pipeline.

**Applicability:** Our pipeline state machine should be modeled as a Temporal workflow (or Temporal-inspired durable execution). Each approval gate is a `wait_for_signal` point. Even if we don't use Temporal directly, we should adopt its patterns: signals for human input, durable state, and child workflows for parallelism.

### 7.2 Inngest — Event-Driven with `waitForEvent`

Inngest's `step.waitForEvent()` primitive is elegant for human-in-the-loop:

```typescript
// Inngest-style pattern for our factory:
const pipelineWorkflow = inngest.createFunction(
  { name: "MCP Pipeline" },
  { event: "pipeline.created" },
  async ({ event, step }) => {
    // Auto steps
    const scaffold = await step.run("scaffold", () => scaffoldServer(event));
    const build = await step.run("build", () => buildServer(scaffold));
    const tests = await step.run("test", () => runTests(build));

    // HUMAN GATE: wait for approval event
    const approval = await step.waitForEvent("wait-for-review", {
      event: "pipeline.review.completed",
      match: "data.pipeline_id",
      timeout: "72h",
    });

    if (!approval || approval.data.decision === "rejected") {
      // Handle rejection
      return { status: "rejected", reason: approval?.data.reason };
    }

    // Continue to deploy...
    await step.run("deploy-staging", () => deployToStaging(build));
  }
);
```

**Key insight:** The `waitForEvent` pattern with a `match` field and timeout is how we should implement our approval gates. The pipeline function literally pauses mid-execution until a human fires the approval event.

### 7.3 Linear — Workflow States with Automation

Linear's approach to workflow automation is relevant:

- **Workflow states** are customizable per team (like our pipeline stages)
- **Automations** trigger on state transitions (our event-driven system)
- **GraphQL API** for complex querying of deeply nested data (issues → projects → cycles)
- **Webhooks** fire on every state change with full payload
- **Triage** concept: new issues land in triage, must be explicitly moved to active (like our task queue)

**What we borrow:** Linear's triage model for our task queue. Items don't auto-proceed — they land in a queue where someone must explicitly act on them.

### 7.4 Vercel — Deployment Approvals

Vercel's deployment protection model:

- **Preview deploys** are automatic (like our staging)
- **Production deploys** can require approval from designated team members
- **Comment-based approvals** on the deployment (similar to PR reviews)
- **Rollback** is always one click away
- **Deployment checks** must pass before promotion is even available

**What we borrow:** The production deploy should be gated behind:
1. Successful staging deploy
2. Staging test pass
3. Explicit human approval
4. Pre-flight checks (the "deploy checklist" prompt)

---

## 8. Deployment & Scaling Considerations

### Technology Stack

```
Runtime:       Node.js (Hono framework for HTTP, native WebSocket)
Database:      PostgreSQL 16 (Neon or Supabase for managed)
Cache/PubSub:  Redis (Upstash for serverless, or self-hosted)
Object Store:  Cloudflare R2 or AWS S3
Queue:         Redis Streams (or BullMQ on top of Redis)
MCP Transport: stdio (local) + SSE (remote)
Auth:          JWT + API keys (consider Clerk or Auth.js)
CI/CD:         GitHub Actions
Hosting:       Railway / Fly.io (containers) or Vercel (edge functions)
Monitoring:    OpenTelemetry → Grafana Cloud
```

### Scaling Points

1. **WebSocket connections** — Use Redis pub/sub to fan out events across multiple server instances. Each WS server subscribes to Redis and pushes to its local clients.
2. **Task queue** — Redis Streams with consumer groups ensures exactly-once processing even with multiple workers.
3. **Database** — Read replicas for dashboard queries. Partition `audit_log` by month. JSONB indexes for metadata queries.
4. **MCP Server** — Stateless; each request hits the API layer. Can run as many instances as needed.

---

## 9. Security Architecture

### API Security Layers

```
1. TLS everywhere (HTTPS, WSS)
2. API Gateway rate limiting (per-user, per-IP)
3. JWT validation on every request (15min expiry)
4. RBAC permission check (scoped to resources)
5. Input validation (Zod schemas on every endpoint)
6. Webhook signature verification (HMAC-SHA256)
7. Audit logging (every mutation logged)
8. CORS whitelist (dashboard domain only)
9. Content Security Policy headers
10. SQL injection protection (parameterized queries, ORM)
```

### MCP Server Security

```
- MCP tools check permissions before executing (user's JWT passed through)
- Production deploy tool requires explicit 'deploy:production' scope
- All tool executions logged to audit trail
- Rate limiting on destructive operations (deploy, reject)
- Sensitive data (API keys, secrets) never exposed through MCP resources
```

---

## 10. Implementation Roadmap

### Phase 1: Foundation (Week 1-2)
- [ ] Database schema + migrations (PostgreSQL)
- [ ] Core REST API (pipelines, tasks, approvals CRUD)
- [ ] JWT authentication
- [ ] Basic audit logging

### Phase 2: Real-time + MCP (Week 3-4)
- [ ] WebSocket server + Redis pub/sub
- [ ] Event bus with Redis Streams
- [ ] MCP server with core tools (get_pending, approve, reject, status)
- [ ] MCP resources (pipeline state, dashboard summary)

### Phase 3: Integrations (Week 5-6)
- [ ] Discord bot bridge
- [ ] GitHub webhook integration
- [ ] CI/CD pipeline triggers
- [ ] Notification system with escalation ladder

### Phase 4: Dashboard (Week 7-8)
- [ ] React dashboard consuming REST + WebSocket
- [ ] GraphQL endpoint for complex queries
- [ ] MCP prompts (review, attention, deploy checklist)
- [ ] SLA monitoring and alerting

### Phase 5: Polish (Week 9-10)
- [ ] Bulk operations
- [ ] Advanced analytics and retrospectives
- [ ] Performance optimization
- [ ] Security audit
- [ ] Documentation and onboarding

---

## Appendix: Full Event Catalog

| Event | Trigger | Consumers |
|-------|---------|-----------|
| `pipeline.created` | New pipeline initialized | ws, audit, discord, mcp |
| `pipeline.stage_changed` | Stage transition | ws, audit, discord, mcp, github |
| `pipeline.blocked` | Human gate reached | ws, audit, discord, notification |
| `pipeline.completed` | Pipeline finished | ws, audit, discord, github |
| `pipeline.failed` | Unrecoverable error | ws, audit, discord, notification |
| `task.created` | New human task | ws, audit, discord, notification, sla-monitor |
| `task.claimed` | User claimed task | ws, audit |
| `task.completed` | Decision made | ws, audit, discord, pipeline-service |
| `task.sla_warning` | SLA approaching | ws, notification, discord |
| `task.sla_breached` | SLA exceeded | ws, notification, discord, escalation |
| `approval.pending` | Gate awaiting approval | ws, audit, notification, discord |
| `approval.approved` | Gate approved | ws, audit, discord, pipeline-service |
| `approval.rejected` | Gate rejected | ws, audit, discord, notification |
| `agent.status_changed` | Agent health change | ws, audit, notification (if error) |
| `deploy.started` | Deploy initiated | ws, audit, discord, github |
| `deploy.succeeded` | Deploy complete | ws, audit, discord, github |
| `deploy.failed` | Deploy failed | ws, audit, discord, notification |
| `asset.created` | New artifact | ws, audit |
| `test.completed` | Test run finished | ws, audit, mcp, pipeline-service |

---

*This architecture document serves as the blueprint for implementing the AI Agent Factory SaaS backend. It should be treated as a living document, updated as implementation progresses and requirements evolve.*