* feat(agent): replace ElizaOS with AI SDK v6 harness
Replace custom ElizaOS sidecar proxy with Vercel AI SDK v6 +
OpenRouter provider for a proper agentic harness with multi-step
tool loops, streaming, and D1 conversation persistence.
- Add AI SDK agent library (provider, tools, system prompt, catalog)
- Rewrite API route to use streamText with 10-step tool loop
- Add server actions for conversation save/load/delete
- Migrate chat-panel and dashboard-chat to useChat hook
- Add action handler dispatch for navigate/toast/render tools
- Use qwen/qwen3-coder-next via OpenRouter (fallbacks disabled)
- Delete src/lib/eliza/ (replaced entirely)
- Exclude references/ from tsconfig build
* fix(chat): improve dashboard chat scroll and text size
- Rewrite auto-scroll: pin user message 75% out of
frame after send, then follow bottom during streaming
- Use useEffect for scroll timing (DOM guaranteed ready)
instead of rAF which fired before React commit
- Add user scroll detection to disengage auto-scroll
- Bump assistant text from 13px back to 14px (text-sm)
- Tighten prose spacing for headings and lists
* chore: install new components
* refactor(chat): unify into one component, two presentations
Extract duplicated chat logic into shared ChatProvider context
and useCompassChat hook. Single ChatView component renders as
full-page hero on /dashboard or sidebar panel elsewhere. Chat
state persists across navigation.
New: chat-provider, chat-view, chat-panel-shell, use-compass-chat
Delete: agent-provider, chat-panel, dashboard-chat, 8 deprecated UI files
Fix: AI component import paths (~/ -> @/), shadcn component updates
* fix(lint): resolve eslint errors in AI components
- escape unescaped entities in demo JSX (actions, artifact,
branch, reasoning, schema-display, task)
- add eslint-disable for @ts-nocheck in vendor components
(file-tree, terminal, persona)
- remove unused imports in chat-view (ArrowUp, Square,
useChatPanel)
* feat(agent): rename AI to Slab, add proactive help
rename assistant from Compass to Slab and add first
interaction guidance so it proactively offers
context-aware help based on the user's current page.
* fix(build): use HTML entity for strict string children
ReasoningContent expects children: string, so JSX
expression {"'"} splits into string[] causing type error.
Use an HTML entity for the apostrophe instead.
* feat(agent): add memory, github, audio, feedback
- persistent memory system (remember/recall across sessions)
- github integration (commits, PRs, issues, contributors)
- audio transcription via Whisper API
- UX feedback interview flow with auto-issue creation
- memories management table in settings
- audio waveform visualization component
- new schema tables: slab_memories, feedback_interviews
- enhanced system prompt with proactive tool usage
* feat(agent): unify chat into single morphing instance
Replaces two separate ChatView instances (page + panel) with
one layout-level component that transitions between full-page
and sidebar modes. Navigation now actually works via proper
AI SDK v6 part structure detection, with view transitions for
smooth crossfades, route validation to prevent 404s, and
auto-opening the panel when leaving dashboard.
Also fixes dark mode contrast, user bubble visibility, tool
display names, input focus ring, and system prompt accuracy.
* refactor(agent): rewrite waveform as time-series viz
Replace real-time frequency equalizer with amplitude
history that fills left-to-right as user speaks.
Bars auto-calculated from container width, with
non-linear boost and scroll when full.
* feat: implement plugin and skill architecture, laying a foundation for future packages shipped separately from the core application
* feat(agent): add skills.sh integration for slab
Skills client fetches SKILL.md from GitHub, parses
YAML frontmatter, and stores content in plugin DB.
Registry injects skill content into system prompt.
Agent tools and settings UI for skill management.
* feat(agent): add interactive UI action bridge
Wire agent-generated UIs to real server actions via
an action bridge API route. Forms submit, checkboxes
persist, and DataTable rows support CRUD operations.
- action-registry.ts: maps 19 dotted action names to
server actions with zod validation + permissions
- /api/agent/action: POST route with auth, permission
checks, schema validation, and action execution
- schema-agent.ts: agent_items table for user-scoped
todos, notes, and checklists
- agent-items.ts: CRUD + toggle actions for agent items
- form-context.ts: FormIdProvider for input namespacing
- catalog.ts: Form component, value/onChangeAction props,
DataTable rowActions, mutate/confirmDelete actions
- registry.tsx: useDataBinding on all form inputs, Form
component, DataTable row action buttons, inline
Checkbox/Switch mutations
- actions.ts: mutate + confirmDelete handlers that call
the action bridge, formSubmit now collects + submits
- system-prompt.ts: interactive UI patterns section
- render/route.ts: interactive pattern custom rules
* docs: reorganize into topic subdirectories
Move docs into auth/, chat/, openclaw-principles/,
and ui/ subdirectories. Add openclaw architecture
and system prompt documentation.
* feat(agent): add commit diff support to github tools
Add fetchCommitDiff to github client with raw diff
fallback for missing patches. Wire commit_diff query
type into agent github tools.
* fix(ci): guard wrangler proxy init for dev only
initOpenNextCloudflareForDev() was running unconditionally
in next.config.ts, causing CI build and lint to fail with
"You must be logged in to use wrangler dev in remote mode".
Only init the proxy when NODE_ENV is development.
---------
Co-authored-by: Nicholai <nicholaivogelfilms@gmail.com>
System Prompt Architecture
===

how OpenClaw constructs the system prompt that shapes agent behavior. covers the prompt builder in `src/agents/system-prompt.ts`, the design decisions behind its structure, and why it works the way it does. relevant context for Compass if we build agent features that need to understand or extend how the AI's instructions are assembled.

why a prompt builder, not a prompt template
---

the obvious approach to system prompts is a template — a big string with some variables interpolated in. OpenClaw doesn't do this. instead, `buildAgentSystemPrompt()` assembles the prompt from a set of independent section builders, each returning a `string[]` that gets concatenated at the end.
the reason is that the prompt needs to change shape dramatically based on context. a main agent talking through Telegram with inline buttons enabled, a memory store, and a SOUL.md personality file needs a fundamentally different prompt than a subagent spawned to do a background file search. a template approach would drown in conditionals. the section-builder approach means each concern is isolated — the messaging section doesn't know or care about the memory section, and either can be omitted entirely without touching the other.
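
the pattern can be sketched in a few lines. everything below (names, shapes, section content) is illustrative, not OpenClaw's actual code:

```typescript
// minimal sketch of the section-builder pattern described above.
// PromptContext, the builder names, and the section text are all assumptions.
interface PromptContext {
  hasMemory: boolean;
}

type SectionBuilder = (ctx: PromptContext) => string[];

const buildSafetySection: SectionBuilder = () => [
  "## Safety",
  "Comply with stop requests. Do not bypass safeguards.",
];

// returning [] omits the section; no builder knows about the others
const buildMemorySection: SectionBuilder = (ctx) =>
  ctx.hasMemory
    ? ["## Memory", "Search memory files before answering questions about prior work."]
    : [];

function buildPrompt(ctx: PromptContext, builders: SectionBuilder[]): string {
  return builders
    .flatMap((build) => build(ctx))
    .filter((line) => line.length > 0)
    .join("\n");
}
```

dropping a section is just returning an empty array, which is what keeps the builders independent of each other.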
this matters for Compass because if we integrate with OpenClaw's agent layer, the prompt that runs behind our AI chat panel will be shaped by these same mechanics. understanding what's included (and what's excluded) in different modes determines what the agent can and can't do.

prompt modes
---

the builder supports three modes, controlled by a `PromptMode` parameter:
**"full"** is the default. every section gets included — tooling, safety, skills, memory, messaging, voice, reactions, heartbeats, silent reply protocol, runtime metadata. this is what the main agent gets when a user talks to it through a channel.
**"minimal"** strips the prompt down for subagents. skills, memory, docs, messaging, voice, reactions, heartbeats, and silent replies all get dropped. what remains is tooling, workspace, and runtime info — enough for the subagent to do its job, not enough to make it think it's the primary conversational agent.
**"none"** returns a single line: `"You are a personal assistant running inside OpenClaw."` this exists for cases where almost all behavior comes from injected context rather than hardcoded instructions.
the distinction between full and minimal is worth understanding. a subagent that inherits the full prompt would try to manage heartbeats, react to messages with emojis, and follow the silent reply protocol — behaviors that make no sense for a background worker. the mode system prevents this without requiring the caller to manually strip sections.
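
the gating can be pictured as a simple lookup. the section names come from this doc; the function itself is an assumed sketch, not the real implementation:

```typescript
// illustrative sketch of mode-driven section selection.
type PromptMode = "full" | "minimal" | "none";

// sections that only the main conversational agent should see
const FULL_ONLY = new Set([
  "skills", "memory", "docs", "messaging",
  "voice", "reactions", "heartbeats", "silentReplies",
]);

function includeSection(name: string, mode: PromptMode): boolean {
  if (mode === "none") return false; // "none" emits a single hardcoded line instead
  if (mode === "minimal") return !FULL_ONLY.has(name);
  return true; // "full" includes everything
}
```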

the sections
---

each section is built by a dedicated function that returns an array of strings (empty array means "don't include this section"). the main builder concatenates everything and filters out empty strings.
**tooling** is the most complex section. it takes a list of tool names, deduplicates them (case-insensitive but preserving original casing), and renders them in a fixed order with human-readable summaries. core tools like `read`, `exec`, and `browser` have hardcoded summaries. external tools can provide their own summaries through a separate map. tools not in the predefined order get sorted alphabetically at the end. the ordering matters because models tend to weight items higher when they appear earlier in their context.
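
a rough sketch of the ordering rule (the `ORDER` list and function name here are illustrative stand-ins):

```typescript
// tools in the predefined order render first; everything else
// sorts alphabetically at the end, as described above.
const ORDER = ["read", "exec", "browser"];

function orderTools(tools: string[]): string[] {
  const known = ORDER.filter((t) => tools.includes(t));
  const rest = tools.filter((t) => !ORDER.includes(t)).sort();
  return [...known, ...rest];
}
```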
**safety** is a short set of guardrails — no self-preservation, no manipulation, comply with stop requests, don't bypass safeguards. this section is always included in full and minimal modes.
**skills** teaches the agent how to discover and use skill files. it instructs the agent to scan available skill descriptions, pick the most specific match, and read exactly one SKILL.md file before proceeding. the constraint against reading multiple skills upfront is deliberate — it keeps the context window lean and forces the agent to commit to an approach rather than hedging.
**memory** instructs the agent to search memory files before answering questions about prior work, preferences, or decisions. it supports a citations mode that controls whether the agent includes source paths in its replies.
**messaging** handles multi-channel routing. it tells the agent how to reply in the current session versus sending cross-session messages, explains the `message` tool for proactive sends, and includes inline button guidance when the channel supports it. this section only appears when the `message` tool is available.
**context files** are user-editable files (like SOUL.md) that get injected verbatim into the prompt under a "Project Context" header. if a SOUL.md is present, the builder adds an extra instruction to embody its persona. this is how personality customization works — the prompt builder provides the mechanism, the user provides the content.
the remaining sections — user identity, time, voice, docs, reply tags, reactions, reasoning format, silent replies, heartbeats, and runtime — each handle a specific concern. most are a few lines. none depend on each other.

tool name resolution
---

one subtle design decision: tool names are case-sensitive in the final output but deduplicated case-insensitively. if the caller provides both `Read` and `read`, only the first one survives. but the output preserves whichever casing the caller used. this matters because some model providers are strict about tool name casing in their API, and the prompt needs to match exactly what the tool registry will accept.
the resolution works through a `Map<string, string>` that maps normalized (lowercase) names to their first-seen canonical form. when rendering the tool list or referencing a tool in prose (like "use `read` to read files"), the builder calls `resolveToolName()` to get the caller's preferred casing.
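
a minimal sketch of that resolution, with assumed function names:

```typescript
// keys are lowercased names, values keep the first-seen casing,
// matching the Map<string, string> described above.
function buildToolNameMap(tools: string[]): Map<string, string> {
  const canonical = new Map<string, string>();
  for (const name of tools) {
    const key = name.toLowerCase();
    if (!canonical.has(key)) canonical.set(key, name); // first casing wins
  }
  return canonical;
}

function resolveToolName(canonical: Map<string, string>, name: string): string {
  return canonical.get(name.toLowerCase()) ?? name;
}
```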

the runtime line
---

the prompt ends with a single-line runtime summary:
```
Runtime: agent=main | host=server | os=linux (x86_64) | model=claude-sonnet-4-5-20250929 | channel=telegram | capabilities=inlineButtons | thinking=off
```
this is exposed as a separate export (`buildRuntimeLine()`) because other parts of the system use it independently — for example, status commands that need to show the agent's current configuration.
the runtime line is where the agent learns what model it's running on, what channel it's connected to, and what capabilities are available. it's a compressed format because it appears at the end of every prompt and the information density matters more than readability.
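
a hypothetical sketch of how such a line could be assembled. the field set mirrors the example above, but the shape and signature are assumptions, not the real `buildRuntimeLine()` export:

```typescript
// assembles the compressed key=value runtime summary;
// optional fields simply disappear from the line.
interface RuntimeInfo {
  agent: string;
  host: string;
  os: string;
  model: string;
  channel?: string;
  capabilities?: string[];
}

function buildRuntimeLine(info: RuntimeInfo): string {
  const parts = [
    `agent=${info.agent}`,
    `host=${info.host}`,
    `os=${info.os}`,
    `model=${info.model}`,
  ];
  if (info.channel) parts.push(`channel=${info.channel}`);
  if (info.capabilities?.length) parts.push(`capabilities=${info.capabilities.join(",")}`);
  return `Runtime: ${parts.join(" | ")}`;
}
```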

the silent reply protocol
---

one of the more interesting design choices. the agent needs a way to say "i have nothing to add" without actually saying that to the user. the prompt defines a `SILENT_REPLY_TOKEN` — a special string that, when returned as the agent's entire response, gets intercepted and discarded before reaching the user.
the prompt is explicit and repetitive about the rules for this token because models tend to violate them. the token must be the *entire* message. it can't be appended to a real reply. it can't be wrapped in markdown. the prompt includes both correct and incorrect examples. this is a case where the repetition is load-bearing — without it, the model will occasionally leak the token into visible output.
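
the interception itself is a strict equality check, sketched here with an illustrative token value:

```typescript
// a reply that is exactly the token (after trimming) gets discarded;
// anything else, including a token leaked onto the end of real text,
// passes through to the user. the token value here is made up.
const SILENT_REPLY_TOKEN = "<<SILENT>>";

function filterReply(reply: string): string | null {
  return reply.trim() === SILENT_REPLY_TOKEN ? null : reply;
}
```

the strictness is why the prompt hammers on "entire message": a mixed reply is indistinguishable from a real answer, so it cannot be safely suppressed.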

design tradeoffs
---

the builder is around 680 lines, which is large for a single file. the alternative would be splitting each section into its own module. the reason it stays consolidated is that the sections share a lot of derived state — the `availableTools` set, the `isMinimal` flag, resolved tool names, runtime capabilities. splitting would mean either passing a large context object between modules or computing the same derived values in multiple places. the current approach keeps all the prompt logic in one place where the interactions between sections are visible.
the string-array-concatenation approach (as opposed to, say, a template literal or a structured prompt object) was chosen for composability. each section builder can return zero or more lines without knowing what comes before or after it. the top-level builder just concatenates and filters. this makes it straightforward to add, remove, or reorder sections without cascading changes.
one limitation: the prompt has no explicit token budget. sections are included or excluded based on mode and feature flags, but there's no mechanism to truncate or summarize if the total prompt exceeds a model's context window. in practice, the full prompt (without context files) stays well under the limit. context files are the wildcard — a large SOUL.md or many embedded files could push it over. this hasn't been a problem yet, but it's a gap worth noting.

relevance to compass
---

if Compass uses OpenClaw's agent layer for its AI chat panel, the system prompt is the primary control surface for agent behavior. understanding the builder means understanding what knobs are available:
- **prompt mode** controls how much autonomy the agent has. a chat panel agent probably wants something between minimal and full — tool access and context awareness, but not heartbeat management or reaction guidance.
- **context files** are how Compass-specific knowledge would be injected. project data, user preferences, or domain-specific instructions would flow through this mechanism.
- **tool availability** determines what the agent can do. the tool list in the prompt must match what's actually registered in the runtime, and the prompt's tool summaries influence how the model chooses between them.
- **the messaging section** would need adaptation if Compass routes messages differently than OpenClaw's built-in channels.
the builder is designed to be extended. adding a new section means writing a function that returns `string[]` and splicing it into the main assembly. the pattern is consistent enough that this is a low-risk change.