feat(agent): AI agent harness with memory, GitHub, audio & feedback (#37)

* feat(agent): replace ElizaOS with AI SDK v6 harness

Replace custom ElizaOS sidecar proxy with Vercel AI SDK v6 +
OpenRouter provider for a proper agentic harness with multi-step
tool loops, streaming, and D1 conversation persistence.

- Add AI SDK agent library (provider, tools, system prompt, catalog)
- Rewrite API route to use streamText with 10-step tool loop
- Add server actions for conversation save/load/delete
- Migrate chat-panel and dashboard-chat to useChat hook
- Add action handler dispatch for navigate/toast/render tools
- Use qwen/qwen3-coder-next via OpenRouter (fallbacks disabled)
- Delete src/lib/eliza/ (replaced entirely)
- Exclude references/ from tsconfig build

* fix(chat): improve dashboard chat scroll and text size

- Rewrite auto-scroll: pin user message 75% out of
  frame after send, then follow bottom during streaming
- Use useEffect for scroll timing (DOM guaranteed ready)
  instead of rAF which fired before React commit
- Add user scroll detection to disengage auto-scroll
- Bump assistant text from 13px back to 14px (text-sm)
- Tighten prose spacing for headings and lists

* chore: installing new components

* refactor(chat): unify into one component, two presentations

Extract duplicated chat logic into shared ChatProvider context
and useCompassChat hook. Single ChatView component renders as
full-page hero on /dashboard or sidebar panel elsewhere. Chat
state persists across navigation.

New: chat-provider, chat-view, chat-panel-shell, use-compass-chat
Delete: agent-provider, chat-panel, dashboard-chat, 8 deprecated UI files
Fix: AI component import paths (~/ -> @/), shadcn component updates

* fix(lint): resolve eslint errors in AI components

- escape unescaped entities in demo JSX (actions, artifact,
  branch, reasoning, schema-display, task)
- add eslint-disable for @ts-nocheck in vendor components
  (file-tree, terminal, persona)
- remove unused imports in chat-view (ArrowUp, Square,
  useChatPanel)

* feat(agent): rename AI to Slab, add proactive help

rename assistant from Compass to Slab and add first
interaction guidance so it proactively offers
context-aware help based on the user's current page.

* fix(build): use HTML entity for strict string children

ReasoningContent expects children: string, so JSX
expression {"'"} splits into string[] causing type error.
Use &apos; HTML entity instead.

* feat(agent): add memory, github, audio, feedback

- persistent memory system (remember/recall across sessions)
- github integration (commits, PRs, issues, contributors)
- audio transcription via Whisper API
- UX feedback interview flow with auto-issue creation
- memories management table in settings
- audio waveform visualization component
- new schema tables: slab_memories, feedback_interviews
- enhanced system prompt with proactive tool usage

* feat(agent): unify chat into single morphing instance

Replaces two separate ChatView instances (page + panel) with
one layout-level component that transitions between full-page
and sidebar modes. Navigation now actually works via proper
AI SDK v6 part structure detection, with view transitions for
smooth crossfades, route validation to prevent 404s, and
auto-opening the panel when leaving dashboard.

Also fixes dark mode contrast, user bubble visibility, tool
display names, input focus ring, and system prompt accuracy.

* refactor(agent): rewrite waveform as time-series viz

Replace real-time frequency equalizer with amplitude
history that fills left-to-right as user speaks.
Bars auto-calculated from container width, with
non-linear boost and scroll when full.

* feat: implement plugin and skills architecture as a foundation for future packages separate from the core application

* feat(agent): add skills.sh integration for slab

Skills client fetches SKILL.md from GitHub, parses
YAML frontmatter, and stores content in plugin DB.
Registry injects skill content into system prompt.
Agent tools and settings UI for skill management.

* feat(agent): add interactive UI action bridge

Wire agent-generated UIs to real server actions via
an action bridge API route. Forms submit, checkboxes
persist, and DataTable rows support CRUD operations.

- action-registry.ts: maps 19 dotted action names to
  server actions with zod validation + permissions
- /api/agent/action: POST route with auth, permission
  checks, schema validation, and action execution
- schema-agent.ts: agent_items table for user-scoped
  todos, notes, and checklists
- agent-items.ts: CRUD + toggle actions for agent items
- form-context.ts: FormIdProvider for input namespacing
- catalog.ts: Form component, value/onChangeAction props,
  DataTable rowActions, mutate/confirmDelete actions
- registry.tsx: useDataBinding on all form inputs, Form
  component, DataTable row action buttons, inline
  Checkbox/Switch mutations
- actions.ts: mutate + confirmDelete handlers that call
  the action bridge, formSubmit now collects + submits
- system-prompt.ts: interactive UI patterns section
- render/route.ts: interactive pattern custom rules

* docs: reorganize into topic subdirectories

Move docs into auth/, chat/, openclaw-principles/,
and ui/ subdirectories. Add openclaw architecture
and system prompt documentation.

* feat(agent): add commit diff support to github tools

Add fetchCommitDiff to github client with raw diff
fallback for missing patches. Wire commit_diff query
type into agent github tools.

* fix(ci): guard wrangler proxy init for dev only

initOpenNextCloudflareForDev() was running unconditionally
in next.config.ts, causing CI build and lint to fail with
"You must be logged in to use wrangler dev in remote mode".
Only init the proxy when NODE_ENV is development.

---------

Co-authored-by: Nicholai <nicholaivogelfilms@gmail.com>

2026-02-06 17:04:04 -07:00

8.9 KiB

Executable File


OpenClaw Architecture

reference document on OpenClaw's internal architecture, written while evaluating OpenClaw as the AI backend for Compass. it covers the agent framework, the protocol layer, and the authentication system, and is useful context if we integrate with or build on top of OpenClaw's gateway.

the stack at a glance

OpenClaw is three things layered on top of each other:

pi-ai - model abstraction. talks to Anthropic, OpenAI, Bedrock, and others through a single getModel() interface, and handles streaming, token counting, and the rest of the boring plumbing.
pi-agent-core - the agentic loop. takes a model, system prompt, and tools, then runs the standard prompt-stream-tool-repeat cycle. manages state, message history, tool execution, and event streaming. this is where the actual "agent" behavior lives.
OpenClaw gateway - the infrastructure wrapper. sessions, credential management, multi-channel routing (WhatsApp, Telegram, Discord, Slack, web), and the ACP bridge for IDE integration.

the relationship between these layers matters: pi-agent-core doesn't know about channels or sessions. the gateway doesn't know about model APIs. each layer has a clean boundary, which is what makes the system flexible enough to serve both a WhatsApp bot and a Zed editor plugin from the same codebase.

IDE (Zed, etc) ──ACP──> bridge ──ws──> ┐
                                       │
WhatsApp ──────────────────────────>   │
Telegram ──────────────────────────>   ├── Gateway ── pi-agent-core ── pi-ai ── Model APIs
Discord  ──────────────────────────>   │         │
Slack    ──────────────────────────>   │     sessions
Web UI   ──────────────────────────>   ┘     credentials
                                             routing

agent client protocol (ACP)

ACP is a standardized wire protocol for IDEs to talk to AI agents. think LSP, but for agent interactions instead of language features. OpenClaw implements both sides.

the transport is NDJSON over stdio. no HTTP server, no port management. an IDE spawns openclaw acp as a subprocess and communicates through stdin/stdout. the bridge translates ACP messages into gateway websocket calls.
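a minimal sketch of NDJSON framing, assuming one JSON object per line as NDJSON defines it and assuming chunks arrive as already-decoded strings (for example after stream.setEncoding("utf8")). NdjsonDecoder and encodeNdjson are hypothetical names, not OpenClaw's bridge code; the point is that a pipe can split a message anywhere, so the trailing partial line has to be buffered.

```typescript
// Minimal NDJSON framing sketch (hypothetical names, not OpenClaw's bridge).
// Each message is one JSON object followed by "\n". Chunks read from a pipe
// can split a message anywhere, so the trailing partial line is buffered.
class NdjsonDecoder {
  private buffer = "";

  // Feed one raw chunk; returns every message it completes, in order.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is either an incomplete line or "" after a trailing "\n".
    this.buffer = lines.pop() ?? "";
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as unknown);
  }
}

// JSON.stringify without an indent argument never emits a raw newline
// (newlines inside strings are escaped), so each message stays on one line.
function encodeNdjson(message: unknown): string {
  return JSON.stringify(message) + "\n";
}
```

both ends would use the same pair: the IDE decodes the subprocess's stdout, and the bridge decodes its own stdin.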

the protocol surface is small:

initialize - capability handshake (fs access, terminal)
newSession / loadSession / listSessions - session lifecycle
prompt - send user input (text, resources, images)
cancel - abort a running generation
sessionUpdate - streaming notifications back to the client (text chunks, tool calls, command updates)
requestPermission - agent asks the IDE for permission before acting
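to make the list concrete, here is a hedged sketch of the IDE-to-agent half as a typed dispatcher. the method names come from the list above; the request and response shapes (id, method, params, result, error) are an assumption loosely modeled on JSON-RPC, not taken from the ACP specification. sessionUpdate and requestPermission travel the other way (agent to IDE), so they have no handler here.

```typescript
// IDE-to-agent methods from the list above. Payload shapes are assumptions.
type AcpMethod =
  | "initialize"
  | "newSession"
  | "loadSession"
  | "listSessions"
  | "prompt"
  | "cancel";

interface AcpRequest {
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface AcpResponse {
  id: number;
  result?: unknown;
  error?: { message: string };
}

type Handler = (params: Record<string, unknown>) => unknown;
type HandlerTable = Partial<Record<AcpMethod, Handler>>;

// Routes one decoded request to its handler. Unknown methods get an error
// response instead of crashing the bridge; the own-property check keeps
// names like "toString" from resolving to Object.prototype members.
function dispatch(handlers: HandlerTable, req: AcpRequest): AcpResponse {
  const handler = Object.prototype.hasOwnProperty.call(handlers, req.method)
    ? handlers[req.method as AcpMethod]
    : undefined;
  if (!handler) {
    return { id: req.id, error: { message: `unsupported method: ${req.method}` } };
  }
  return { id: req.id, result: handler(req.params ?? {}) };
}
```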

each ACP session maps to a gateway session key, so reconnects preserve conversation state. the practical value: any editor that speaks ACP can use OpenClaw as its agent backend without a bespoke plugin. Zed works today, and adding another ACP-capable editor requires no changes on the OpenClaw side.

pi-agent-core internals

the agent framework is authored by Mario Zechner (badlogic on GitHub) and lives in the pi-mono monorepo. MIT licensed. the three packages OpenClaw depends on:

@mariozechner/pi-ai (model abstraction)
@mariozechner/pi-agent-core (stateful agent loop)
@mariozechner/pi-coding-agent (coding-specific tools and prompts)

the core loop is an Agent class you configure with a model, system prompt, and tools. calling prompt() kicks off the agentic cycle:

prompt("read config.json and summarize it")
│
├─ agent_start
├─ turn_start
│   ├─ user message sent to LLM
│   ├─ LLM streams response (message_update events with text deltas)
│   ├─ LLM requests tool call: read_file({path: "config.json"})
│   ├─ tool_execution_start -> tool runs -> tool_execution_end
│   └─ tool result fed back to LLM
├─ turn_end
│
├─ turn_start (next turn - LLM responds to tool result)
│   ├─ LLM streams final answer
│   └─ no more tool calls
├─ turn_end
└─ agent_end
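a simplified, synchronous sketch of the same cycle, assuming a scripted model function in place of a streaming LLM. runAgent, Model, and the message shapes are hypothetical, not pi-agent-core's Agent API, and the sketch emits one message_update per turn where the real loop emits one per text delta.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "toolResult"; toolName: string; content: string };
type Model = (history: Message[]) => ModelReply;
type Tool = (args: Record<string, unknown>) => string;

// Runs turns until the model answers without tool calls (or maxTurns is hit),
// emitting lifecycle events in the same order as the trace above.
function runAgent(
  model: Model,
  tools: Map<string, Tool>,
  prompt: string,
  emit: (event: string) => void,
  maxTurns = 10,
): Message[] {
  const history: Message[] = [{ role: "user", content: prompt }];
  emit("agent_start");
  for (let turn = 0; turn < maxTurns; turn++) {
    emit("turn_start");
    const reply = model(history);
    emit("message_update"); // a streaming loop would emit one per delta
    history.push({ role: "assistant", content: reply.text });
    for (const call of reply.toolCalls) {
      emit("tool_execution_start");
      const tool = tools.get(call.name);
      const result = tool ? tool(call.args) : `unknown tool: ${call.name}`;
      emit("tool_execution_end");
      // Results go back into history so the next turn can respond to them.
      history.push({ role: "toolResult", toolName: call.name, content: result });
    }
    emit("turn_end");
    if (reply.toolCalls.length === 0) break; // final answer, no more tool calls
  }
  emit("agent_end");
  return history;
}
```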

two design decisions worth noting:

the message pipeline has two stages. before every LLM call, messages pass through transformContext() (prune old messages, inject external context, compact history) and then convertToLlm() (filter out app-specific message types the model shouldn't see). this separation is what lets OpenClaw store channel-specific metadata, UI messages, and notification types in the conversation history without confusing the model. the LLM only ever sees clean user/assistant/toolResult messages.
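a hedged sketch of the two stages, assuming a history that mixes model-visible messages with two made-up app-specific kinds (notification and ui). the function names echo the ones above, but the signatures are illustrative: here transformContext prunes to a window and prepends external context, and convertToLlm drops everything that isn't user, assistant, or toolResult.

```typescript
// "notification" and "ui" are hypothetical app-specific message kinds.
type AppMessage =
  | { role: "user" | "assistant" | "toolResult"; content: string }
  | { role: "notification"; content: string; channel: string }
  | { role: "ui"; content: string; component: string };

type LlmMessage = { role: "user" | "assistant" | "toolResult"; content: string };

// Stage 1: works on the full app history. Here it keeps the last `keepLast`
// messages and optionally prepends external context as a user message.
function transformContext(messages: AppMessage[], keepLast: number, context?: string): AppMessage[] {
  const pruned = keepLast > 0 ? messages.slice(-keepLast) : [];
  return context ? [{ role: "user", content: context }, ...pruned] : pruned;
}

// Stage 2: removes every message type the model should never see.
function convertToLlm(messages: AppMessage[]): LlmMessage[] {
  return messages.filter(
    (m): m is LlmMessage => m.role === "user" || m.role === "assistant" || m.role === "toolResult",
  );
}
```

keeping the stages separate means pruning decisions can still see app-specific messages, while the filter guarantees none of them leak into the LLM call.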

message queuing supports mid-execution interrupts. you can inject messages while the agent is running tools. when a queued message is detected after a tool completes, remaining tool calls get skipped and the LLM receives the interruption instead. this matters for interactive use cases where a user changes their mind while a multi-tool operation is in progress.
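the interrupt rule can be sketched as a loop that checks the queue after each tool finishes. executeWithInterrupts and the Outcome shape are hypothetical; the sketch assumes tools run sequentially and that a running tool is never aborted, only the calls queued after it.

```typescript
// Hypothetical shapes; not pi-agent-core's API.
type PendingCall = { id: string; run: () => string };
type Outcome =
  | { id: string; status: "done"; result: string }
  | { id: string; status: "skipped" };

function executeWithInterrupts(
  calls: PendingCall[],
  queue: string[], // user messages injected while tools are running
): { outcomes: Outcome[]; interruption?: string } {
  const outcomes: Outcome[] = [];
  for (let i = 0; i < calls.length; i++) {
    outcomes.push({ id: calls[i].id, status: "done", result: calls[i].run() });
    // Checked only between tools: the current tool always runs to completion.
    if (queue.length > 0) {
      for (const rest of calls.slice(i + 1)) {
        outcomes.push({ id: rest.id, status: "skipped" });
      }
      // The LLM receives the queued message instead of the remaining results.
      return { outcomes, interruption: queue.shift() };
    }
  }
  return { outcomes };
}
```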

tools use TypeBox schemas for parameter validation and support streaming progress through an onUpdate callback. errors are thrown (not returned as content), caught by the agent, and reported to the LLM as isError: true tool results.
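a minimal sketch of the throw-and-catch contract. to stay dependency-free it swaps TypeBox for a hand-rolled validate() function, and runTool, AgentTool, and ToolResult are hypothetical names rather than pi-agent-core's types. the shape is what matters: tools throw, the loop catches, and the LLM sees isError: true.

```typescript
interface ToolResult {
  toolName: string;
  content: string;
  isError: boolean;
}

interface AgentTool {
  name: string;
  // Stand-in for a TypeBox schema check: an error message, or null if valid.
  validate: (args: unknown) => string | null;
  // Throws on failure; may report progress through onUpdate while running.
  execute: (args: Record<string, unknown>, onUpdate: (progress: string) => void) => string;
}

// Runs one tool call and turns any thrown error into an error result,
// so a failing tool never crashes the agent loop.
function runTool(tool: AgentTool, args: unknown, onUpdate: (progress: string) => void): ToolResult {
  try {
    const problem = tool.validate(args);
    if (problem !== null) throw new Error(`invalid arguments: ${problem}`);
    const content = tool.execute(args as Record<string, unknown>, onUpdate);
    return { toolName: tool.name, content, isError: false };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { toolName: tool.name, content: message, isError: true };
  }
}
```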

authentication architecture

this is the part that initially seemed confusing but turns out to be well-structured once you see the two layers.

layer 1: gateway access. this controls who can connect to the gateway at all. it's a simple token or password sent over the websocket connection. no OAuth, no complexity. an IDE, a chat client, or any websocket consumer provides credentials when connecting, and the gateway either accepts or rejects. configured via gateway.auth.token or gateway.auth.password.
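a sketch of a layer-1 check, assuming the gateway compares the presented value against gateway.auth.token or gateway.auth.password. the constant-time comparison and the fail-closed default are choices of this sketch, not documented OpenClaw behavior, and acceptConnection and the config shape are hypothetical.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical shape mirroring gateway.auth.token / gateway.auth.password.
interface GatewayAuthConfig {
  token?: string;
  password?: string;
}

// Hashing both sides gives timingSafeEqual equal-length inputs and keeps the
// comparison time independent of how many leading characters match.
function safeEqual(a: string, b: string): boolean {
  const ha = createHash("sha256").update(a).digest();
  const hb = createHash("sha256").update(b).digest();
  return timingSafeEqual(ha, hb);
}

function acceptConnection(config: GatewayAuthConfig, presented: GatewayAuthConfig): boolean {
  if (config.token !== undefined && presented.token !== undefined) {
    return safeEqual(config.token, presented.token);
  }
  if (config.password !== undefined && presented.password !== undefined) {
    return safeEqual(config.password, presented.password);
  }
  return false; // this sketch fails closed when no matching credential exists
}
```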

layer 2: model provider auth. this controls what API keys the agent uses when calling model providers (Anthropic, OpenAI, Bedrock, etc). this is where OAuth lives.

the two layers are fully independent. you can connect to the gateway with a simple bearer token but have your model calls authenticated via Anthropic's OAuth flow. the gateway doesn't care how your model credentials were obtained.

credentials are stored per-agent at ~/.openclaw/agents/<agentId>/auth-profiles.json and come in three flavors:

api_key - raw API key, user-provided
token - generated via claude setup-token or similar
oauth - full OAuth flow with access/refresh tokens and expiry
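the three flavors map naturally onto a discriminated union. the field names below (key, token, access, refresh, expiresAt) are assumptions for illustration, not the real auth-profiles.json schema, and the 60-second refresh margin is an arbitrary choice.

```typescript
// Illustrative types only; not the actual auth-profiles.json schema.
type AuthProfile =
  | { type: "api_key"; provider: string; key: string }
  | { type: "token"; provider: string; token: string }
  | { type: "oauth"; provider: string; access: string; refresh: string; expiresAt: number };

// Only OAuth profiles expire. Refresh slightly early to absorb clock skew.
function needsRefresh(profile: AuthProfile, now: number, skewMs = 60_000): boolean {
  return profile.type === "oauth" && profile.expiresAt - skewMs <= now;
}

// The secret to attach to a provider request, however it was obtained.
function credentialValue(profile: AuthProfile): string {
  switch (profile.type) {
    case "api_key":
      return profile.key;
    case "token":
      return profile.token;
    case "oauth":
      return profile.access;
  }
}
```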

when an agent needs to call a model, the resolution chain runs through:

1. explicit profile ID (if the request specifies one)
2. configured profile order, with round-robin and failure cooldown
3. environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, etc)
4. config file values
5. AWS credential chain (for Bedrock)
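a hedged sketch of the chain in the same order. only the environment variable names come from the list above; createResolver, the input shape, and the cooldown bookkeeping are hypothetical. the AWS credential chain is left as a comment because it would be delegated to the AWS SDK rather than reimplemented.

```typescript
interface Profile {
  provider: string;
  secret: string;
}

interface ResolveInput {
  provider: string;
  profileId?: string; // explicit profile requested by the caller
  profiles: Map<string, Profile>;
  order: string[]; // configured profile order
  cooldownUntil: Map<string, number>; // profileId -> epoch ms, set after a failure
  now: number;
  env: Record<string, string | undefined>;
  configKeys: Map<string, string>; // provider -> key from the config file
}

interface Resolved {
  source: string;
  secret: string;
}

const ENV_VARS = new Map([
  ["anthropic", "ANTHROPIC_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
]);

function createResolver(): (input: ResolveInput) => Resolved | null {
  let cursor = 0; // round-robin position over the configured order

  return (input) => {
    // 1. explicit profile ID
    if (input.profileId !== undefined) {
      const explicit = input.profiles.get(input.profileId);
      if (explicit) return { source: `profile:${input.profileId}`, secret: explicit.secret };
    }
    // 2. configured order, round-robin, skipping profiles still cooling down
    const n = input.order.length;
    for (let i = 0; i < n; i++) {
      const slot = (cursor + i) % n;
      const id = input.order[slot];
      const profile = input.profiles.get(id);
      const coolingDown = (input.cooldownUntil.get(id) ?? 0) > input.now;
      if (profile && profile.provider === input.provider && !coolingDown) {
        cursor = (slot + 1) % n; // the next call starts after this profile
        return { source: `profile:${id}`, secret: profile.secret };
      }
    }
    // 3. environment variables
    const envName = ENV_VARS.get(input.provider);
    const fromEnv = envName !== undefined ? input.env[envName] : undefined;
    if (envName !== undefined && fromEnv) return { source: `env:${envName}`, secret: fromEnv };
    // 4. config file values
    const fromConfig = input.configKeys.get(input.provider);
    if (fromConfig) return { source: "config", secret: fromConfig };
    // 5. the AWS credential chain (Bedrock) would be delegated to the AWS SDK here
    return null;
  };
}
```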

the system also mirrors credentials from external tools. if you've already authenticated Claude Code, OpenClaw can read ~/.claude/.credentials.json and use those tokens as a fallback. same for Codex, Qwen, and MiniMax CLI credentials. token refresh uses file locking to prevent multiple agents from refreshing simultaneously.
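the file-locking idea can be sketched with an exclusive create: opening a lock file with Node's "wx" flag fails with EEXIST if the file already exists. withLock and lockPathFor are hypothetical helpers, not OpenClaw's code, and the sketch does not recover stale locks left behind by a crashed process.

```typescript
import { closeSync, openSync, unlinkSync } from "node:fs";
import { join } from "node:path";

// Hypothetical lock location next to an agent's auth-profiles.json.
function lockPathFor(agentDir: string): string {
  return join(agentDir, "auth-profiles.json.lock");
}

// Runs fn while holding the lock. Returns null without running fn if the lock
// is already held by another process (or an outer call). No stale-lock recovery.
function withLock<T>(lockPath: string, fn: () => T): T | null {
  let fd: number;
  try {
    fd = openSync(lockPath, "wx"); // "wx": create exclusively, fail if it exists
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return null;
    throw err;
  }
  try {
    return fn();
  } finally {
    closeSync(fd);
    unlinkSync(lockPath); // release so the next refresher can proceed
  }
}
```

a caller that gets null would typically wait briefly and re-read auth-profiles.json, since whoever holds the lock is most likely refreshing the same token.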

the Anthropic OAuth flow specifically goes through Chutes (api.chutes.ai) as the identity provider. it uses PKCE, supports both local browser redirects and manual URL pasting for headless/VPS environments, and stores access + refresh tokens with expiry tracking.
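the PKCE step itself is standard (RFC 7636) and independent of the identity provider. a minimal sketch of generating the verifier and its S256 challenge with Node's crypto module; the function names are hypothetical, and the authorization and token endpoints are deliberately left out.

```typescript
import { createHash, randomBytes } from "node:crypto";

// RFC 7636 verifiers are 43-128 unreserved characters.
// 32 random bytes encode to exactly 43 base64url characters (no padding).
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 method: code_challenge = BASE64URL(SHA256(ASCII(code_verifier))).
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}
```

the client sends the challenge (with code_challenge_method=S256) on the authorization request and the original verifier on the token exchange, so an intercepted authorization code is useless without the verifier.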

login flow:
  openclaw login
  └─> choose provider (anthropic)
      └─> choose auth method (oauth)
          └─> PKCE flow via chutes.ai
              └─> tokens stored in auth-profiles.json

request flow:
  any client ──ws+token──> gateway ──> agent needs claude
      └─> resolveApiKeyForProvider("anthropic")
          └─> finds oauth token from auth-profiles.json
              └─> refreshes if expired (with lockfile)
                  └─> calls anthropic API

the practical consequence: you authenticate once with OpenClaw, and any tool that can talk to the gateway gets access to your model providers. the tool doesn't need to manage its own API keys or know anything about OAuth. it just sends prompts and gets responses.

relevance to compass

if Compass uses OpenClaw as its AI backend, the integration point would be the gateway websocket. Compass would authenticate to the gateway (layer 1) and send prompts. the gateway handles everything else - model selection, credential resolution, session persistence, streaming.

the ACP protocol is worth watching as a potential standard for AI tool integration, but for a web app like Compass, the websocket API is the more natural fit.

the auth architecture means Compass wouldn't need to store or manage model API keys directly. users would authenticate through OpenClaw's onboarding, and the gateway would resolve credentials at request time. this shrinks Compass's security surface: it never touches raw API keys.
