clawdbot-memory-system/ARCHITECTURE.md
Jake Shore cb28c2649f Initial commit: Clawdbot Memory System installer
One-command persistent memory for Clawdbot.
Prevents context amnesia during compaction with:
- Two-layer memory: Markdown source of truth + SQLite vector search
- Pre-compaction flush to save context before it's lost
- Semantic search across all memory files
- Daily logs, research intel, and project tracking templates
- Interactive installer with dry-run and uninstall support
2026-02-10 13:35:36 -05:00


# Architecture
Technical details of how the Clawdbot Memory System works.
---
## System Overview
```
┌──────────────────────────────────────────────────────────────┐
│                        CLAWDBOT AGENT                        │
│                                                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────┐  │
│  │ Chat Session │   │ Tool: write  │   │ Tool: memory_    │  │
│  │              │   │ (file ops)   │   │ search           │  │
│  └──────┬───────┘   └──────┬───────┘   └────────┬─────────┘  │
│         │                  │                    │            │
└─────────┼──────────────────┼────────────────────┼────────────┘
          │                  │                    │
          │           ┌──────▼───────┐     ┌──────▼───────┐
          │           │ Markdown     │     │ SQLite +     │
          │           │ Files        │◄────│ sqlite-vec   │
          │           │ (source of   │index│ Vector Store │
          │           │ truth)       │────►│              │
          │           └──────────────┘     └──────────────┘
          │                  ▲
          │                  │
          └──────────────────┘
           Agent writes memories
           during session
```
---
## Write Flow
When the agent decides to store a memory:
```
Agent decides to remember something
             │
             ▼
┌────────────────────────┐
│ Write to file          │
│ memory/YYYY-MM-DD.md   │
└────────────┬───────────┘
             │
             ▼
┌────────────────────────┐
│ File watcher           │  ← Clawdbot watches memory/ for changes
│ detects change         │    (debounced: waits for writes to settle)
└────────────┬───────────┘
             │
             ▼
┌────────────────────────┐
│ Chunking               │  ← File split into meaningful chunks
│ (by section/paragraph) │    (headers, paragraphs, list items)
└────────────┬───────────┘
             │
             ▼
┌────────────────────────┐
│ Embedding provider     │  ← Each chunk → embedding vector
│                        │    (OpenAI / Gemini / Local GGUF)
│ text-embedding-3-small │
│ (1536d)                │
│ or                     │
│ gemini-embedding-001   │
│ or                     │
│ local GGUF model       │
└────────────┬───────────┘
             │
             ▼
┌────────────────────────┐
│ SQLite +               │  ← Vectors stored in sqlite-vec
│ sqlite-vec             │    alongside original text chunks
│                        │    and metadata (file, date, section)
│ memory.db              │
└────────────────────────┘
```
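The chunking step can be sketched as follows. This is a minimal illustration, not Clawdbot's actual chunker: the `Chunk` type and `chunk_markdown` function are hypothetical names, and the rule used here (split on blank lines, tag each chunk with the nearest heading) is an assumption about what "by section/paragraph" could mean.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    file: str     # source file path
    section: str  # nearest Markdown heading, or "" before the first one
    text: str     # chunk body


def chunk_markdown(file: str, content: str) -> list[Chunk]:
    """Split Markdown into paragraph-level chunks tagged with their heading."""
    chunks: list[Chunk] = []
    section = ""
    # Blank lines separate paragraphs and list blocks.
    for block in re.split(r"\n\s*\n", content):
        block = block.strip()
        if not block:
            continue
        lines = block.splitlines()
        # A leading heading line starts a new section; keep any text after it.
        if lines[0].startswith("#"):
            section = lines[0].lstrip("#").strip()
            block = "\n".join(lines[1:]).strip()
            if not block:
                continue
        chunks.append(Chunk(file=file, section=section, text=block))
    return chunks
```

Keeping the heading with each chunk is what would let the index store the "section" metadata shown in the diagram.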
---
## Search Flow
When the agent needs to recall something:
```
Agent: "What did we decide about the API rate limits?"
           │
           ▼
┌────────────────────┐
│ memory_search      │  ← Tool invoked automatically
│ tool called        │    (or agent calls it explicitly)
└──────────┬─────────┘
           │
           ▼
┌────────────────────┐
│ Query embedding    │  ← Same provider as index
│ generated          │    "API rate limits decision"
└──────────┬─────────┘    → [0.23, -0.11, 0.87, ...]
           ├─────────────────────────┐
           │                         │
           ▼                         ▼
┌────────────────────┐    ┌────────────────────┐
│ Vector search      │    │ Keyword search     │
│ (cosine sim)       │    │ (BM25 / FTS)       │
│                    │    │                    │
│ Finds semantically │    │ Finds exact        │
│ similar chunks     │    │ keyword matches    │
└──────────┬─────────┘    └──────────┬─────────┘
           │                         │
           └────────────┬────────────┘
                        │
                        ▼
             ┌────────────────────┐
             │ Hybrid merge       │  ← Combines both result sets
             │ & ranking          │    Deduplicates, re-ranks
             └──────────┬─────────┘
                        │
                        ▼
             ┌────────────────────┐
             │ Top N chunks       │  ← Relevant memory fragments
             │ returned           │    injected into agent context
             └──────────┬─────────┘
                        │
                        ▼
              Agent has full context
              to answer the question 🎉
```
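The hybrid merge step can be sketched as a weighted combination of the two result sets. The ranking formula is not specified here, so everything below is an assumption: the `hybrid_merge` name, the max-normalization of each score set, and the 0.7 / 0.3 weighting are illustrative choices, not Clawdbot's implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_merge(vector_hits: dict[str, float],
                 keyword_hits: dict[str, float],
                 vector_weight: float = 0.7,  # assumed weighting
                 top_n: int = 5) -> list[tuple[str, float]]:
    """Merge two {chunk_id: score} dicts into one ranked list."""

    def normalize(hits: dict[str, float]) -> dict[str, float]:
        # Scale each result set to [0, 1] so cosine and BM25 scores are comparable.
        top = max(hits.values(), default=0.0)
        return {cid: (s / top if top > 0 else 0.0) for cid, s in hits.items()}

    v, k = normalize(vector_hits), normalize(keyword_hits)
    combined = {
        cid: vector_weight * v.get(cid, 0.0) + (1 - vector_weight) * k.get(cid, 0.0)
        for cid in v.keys() | k.keys()  # set union deduplicates chunk ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)[:top_n]
```

A chunk found by both searches accumulates score from each side, which is one simple way to get the "deduplicates, re-ranks" behavior described in the diagram.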
---
## Pre-Compaction Flush Flow
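The pre-compaction flush is the safety net that prevents amnesia. Before old messages are compacted, the agent is prompted to write durable notes to disk: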
```
Context Window
┌──────────────────────────────────────────┐
│ System prompt                            │
│ AGENTS.md                                │
│ Memory search results                    │
│ ──────────────────────────────────────── │
│ Old messages ← these get compacted       │
│ ...                                      │
│ ...                                      │
│ Recent messages                          │
│ ──────────────────────────────────────── │
│ Reserve tokens (floor: 20,000)           │
└────────────┬─────────────────────────────┘
             │
             ▼
Token count approaches limit
(contextWindow - reserveTokensFloor
 - softThresholdTokens)
             │
             ▼
┌─────────────────────────┐
│ Clawdbot triggers       │
│ memory flush            │
│                         │
│ Silent system prompt:   │
│ "Session nearing        │
│  compaction. Store      │
│  durable memories."     │
│                         │
│ Silent user prompt:     │
│ "Write lasting notes    │
│  to memory/; reply      │
│  NO_REPLY if nothing    │
│  to store."             │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│ Agent writes to disk    │
│                         │
│ • Current work status   │
│ • Pending decisions     │
│ • Important context     │
│ • Where we left off     │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│ File watcher triggers   │
│ re-index                │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│ Compaction happens      │
│ (old messages removed/  │
│ summarized)             │
└────────────┬────────────┘
             │
             ▼
Memories safe on disk ✅
Indexed and searchable ✅
Agent can recall later ✅
```
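The trigger condition in the diagram reduces to simple arithmetic. A minimal sketch, assuming the flush fires once usage reaches the threshold; the function names and the 200,000-token context window are hypothetical, while the two defaults come from the config values shown in this document:

```python
def flush_threshold(context_window: int,
                    reserve_tokens_floor: int = 20_000,
                    soft_threshold_tokens: int = 4_000) -> int:
    """Token count at which the memory flush is triggered."""
    return context_window - reserve_tokens_floor - soft_threshold_tokens


def should_flush(current_tokens: int, context_window: int, **limits: int) -> bool:
    # Assumption: the flush fires when usage reaches the threshold (>=).
    return current_tokens >= flush_threshold(context_window, **limits)


# Hypothetical 200,000-token context window with the default limits.
print(flush_threshold(200_000))        # 176000
print(should_flush(150_000, 200_000))  # False
print(should_flush(180_000, 200_000))  # True
```

The soft threshold leaves headroom above the reserve floor so the agent has room to write its notes before compaction starts.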
---
## Storage Layout
```
~/.clawdbot/
├── clawdbot.json                         ← Config with memorySearch settings
├── workspace/                            ← Agent workspace (configurable)
│   ├── AGENTS.md                         ← Agent instructions (with memory habits)
│   ├── MEMORY.md                         ← Curated long-term memory (optional)
│   │
│   ├── memory/                           ← Daily logs & research intel
│   │   ├── 2026-01-15.md                 ← Daily log
│   │   ├── 2026-01-16.md
│   │   ├── 2026-02-10.md                 ← Today
│   │   ├── project-x-research-intel.md
│   │   ├── TEMPLATE-daily.md             ← Reference template
│   │   ├── TEMPLATE-research-intel.md
│   │   └── TEMPLATE-project-tracking.md
│   │
│   └── ... (other workspace files)
└── agents/
    └── main/
        └── agent/
            └── memory/                   ← Vector index (managed by Clawdbot)
                └── memory.db             ← SQLite + sqlite-vec database
```
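The daily-log naming convention in the layout above is easy to derive in code. A small sketch, assuming notes are appended as Markdown list items; `daily_log_path` and `append_note` are hypothetical helpers, not part of Clawdbot:

```python
from datetime import date
from pathlib import Path


def daily_log_path(workspace: Path, day: date) -> Path:
    """Return <workspace>/memory/YYYY-MM-DD.md for the given day."""
    return workspace / "memory" / f"{day.isoformat()}.md"


def append_note(workspace: Path, day: date, note: str) -> Path:
    """Append one bullet to the day's log, creating memory/ if needed."""
    path = daily_log_path(workspace, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as log:
        log.write(f"- {note}\n")
    return path


# Resolve the log path for one of the example dates in the tree above.
workspace = Path("~/.clawdbot/workspace").expanduser()
print(daily_log_path(workspace, date(2026, 2, 10)))
```

Appending rather than overwriting keeps earlier notes from the same day intact, which matters because the Markdown files are the source of truth.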
---
## Config Structure
The memory search config lives in `clawdbot.json` under `agents.defaults.memorySearch`; the pre-compaction flush settings sit alongside it under `agents.defaults.compaction`:
```jsonc
{
  "agents": {
    "defaults": {
      // ... other config (model, workspace, etc.) ...
      "memorySearch": {
        // Embedding provider: "openai" | "gemini" | "local"
        "provider": "openai",
        // Model name (provider-specific)
        "model": "text-embedding-3-small",
        // Remote provider settings (OpenAI / Gemini)
        "remote": {
          "apiKey": "sk-...",   // Optional if using env var
          "baseUrl": "...",     // Optional custom endpoint
          "headers": {}         // Optional extra headers
        },
        // Additional paths to index (beyond memory/ and MEMORY.md)
        "extraPaths": ["../team-docs"],
        // Fallback provider if primary fails
        "fallback": "local"     // "openai" | "gemini" | "local" | "none"
      },
      // Pre-compaction memory flush (enabled by default)
      "compaction": {
        "reserveTokensFloor": 20000,
        "memoryFlush": {
          "enabled": true,
          "softThresholdTokens": 4000
        }
      }
    }
  }
}
```
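A quick way to sanity-check these settings is to read them from an already-parsed config and validate the enumerated values. This is a sketch only: `read_memory_search` is a hypothetical helper, treating `provider` as required and defaulting `fallback` to `"none"` are assumptions, and the input is a plain dict because Python's `json` module does not accept the `//` comments shown above.

```python
VALID_PROVIDERS = {"openai", "gemini", "local"}
VALID_FALLBACKS = VALID_PROVIDERS | {"none"}


def read_memory_search(config: dict) -> dict:
    """Extract and validate agents.defaults.memorySearch from a parsed config."""
    section = config.get("agents", {}).get("defaults", {}).get("memorySearch", {})

    provider = section.get("provider")
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")

    # Assumption: a missing fallback means no fallback provider.
    fallback = section.get("fallback", "none")
    if fallback not in VALID_FALLBACKS:
        raise ValueError(f"unknown fallback: {fallback!r}")

    return {
        "provider": provider,
        "model": section.get("model"),
        "fallback": fallback,
        "extraPaths": list(section.get("extraPaths", [])),
    }


config = {
    "agents": {"defaults": {"memorySearch": {
        "provider": "openai",
        "model": "text-embedding-3-small",
        "extraPaths": ["../team-docs"],
        "fallback": "local",
    }}}
}
print(read_memory_search(config))
```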
---
## Data Flow Summary
```
WRITE PATH                      READ PATH
──────────                      ─────────
Agent writes note               Agent needs context
       │                               │
       ▼                               ▼
memory/YYYY-MM-DD.md            memory_search("query")
       │                               │
       ▼                               ▼
File watcher                    Embed query
       │                               │
       ▼                               ▼
Chunk + embed                   Vector + keyword search
       │                               │
       ▼                               ▼
Store in SQLite                 Return top chunks
       │                               │
       ▼                               ▼
Index updated ✅                Context restored ✅
```
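Both paths can be illustrated end to end with Python's standard-library `sqlite3` module. This is a toy stand-in, not the real index: it stores embeddings as JSON text and scans every row, whereas Clawdbot uses sqlite-vec; the table layout and function names are hypothetical, and the hand-written vectors are assumed to be unit length so that a dot product equals cosine similarity.

```python
import json
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, file TEXT, text TEXT, embedding TEXT)"
    )
    return db


def store_chunk(db: sqlite3.Connection, file: str, text: str,
                embedding: list[float]) -> None:
    """Write path: persist a chunk next to its embedding vector."""
    db.execute(
        "INSERT INTO chunks (file, text, embedding) VALUES (?, ?, ?)",
        (file, text, json.dumps(embedding)),
    )


def search(db: sqlite3.Connection, query_embedding: list[float],
           top_n: int = 3) -> list[tuple[float, str, str]]:
    """Read path: brute-force scan, scoring each chunk by dot product."""
    scored = []
    for file, text, raw in db.execute("SELECT file, text, embedding FROM chunks"):
        vector = json.loads(raw)
        score = sum(q * v for q, v in zip(query_embedding, vector))
        scored.append((score, file, text))
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_n]


db = open_index()
store_chunk(db, "memory/2026-02-10.md", "Rate limit set to 100 req/min", [1.0, 0.0])
store_chunk(db, "memory/2026-01-15.md", "Lunch order notes", [0.0, 1.0])
store_chunk(db, "memory/2026-01-16.md", "API throttling discussion", [0.6, 0.8])
for score, file, text in search(db, [1.0, 0.0], top_n=2):
    print(f"{score:.2f}  {file}  {text}")
```

A full scan is fine for a sketch but scales linearly with the number of chunks, which is the cost a vector index such as sqlite-vec is meant to reduce.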