Jake Shore cb28c2649f Initial commit: Clawdbot Memory System installer
One-command persistent memory for Clawdbot.
Prevents context amnesia during compaction with:
- Two-layer memory: Markdown source of truth + SQLite vector search
- Pre-compaction flush to save context before it's lost
- Semantic search across all memory files
- Daily logs, research intel, and project tracking templates
- Interactive installer with dry-run and uninstall support
2026-02-10 13:35:36 -05:00

"Why does my agent forget everything after a long session?"

Because Clawdbot compacts old context to stay within its context window. Without a memory system, everything that was compacted is gone. This repo fixes that permanently.

---

## What This Is

A **two-layer memory system** for Clawdbot:

1. **Markdown files** (source of truth): daily logs, research intel, project tracking, and durable notes your agent writes to disk
2. **SQLite vector search** (retrieval layer): a semantic search index that lets your agent find relevant memories even when the wording differs

Your agent writes memories to plain Markdown. Those files get indexed into a vector store. When the agent needs context, it searches semantically and finds what it needs, even across sessions and even after compaction.

## Quick Install

```bash
bash <(curl -sL https://raw.githubusercontent.com/BusyBee3333/clawdbot-memory-system/main/install.sh)
```

That's it. The installer will:

- Detect your Clawdbot installation
- Create the `memory/` directory with templates
- Patch your `clawdbot.json` with memory search config (without touching anything else)
- Add memory habits to your `AGENTS.md`
- Build the initial vector index
- Verify everything works

### Preview First (Dry Run)

```bash
bash <(curl -sL https://raw.githubusercontent.com/BusyBee3333/clawdbot-memory-system/main/install.sh) --dry-run
```

### Uninstall

```bash
bash <(curl -sL https://raw.githubusercontent.com/BusyBee3333/clawdbot-memory-system/main/install.sh) --uninstall
```

---

## How It Works

```
┌─────────────────────────────────────────────────────────┐
│                   YOUR AGENT SESSION                    │
│                                                         │
│  Agent writes notes ──→ memory/2026-02-10.md            │
│  Agent stores facts ──→ MEMORY.md                       │
│                            │                            │
│                            ▼                            │
│                     ┌──────────────┐                    │
│                     │ File Watcher │ (debounced)        │
│                     └──────┬───────┘                    │
│                            │                            │
│                            ▼                            │
│                ┌───────────────────────┐                │
│                │  Embedding Provider   │                │
│                │  (OpenAI / Gemini /   │                │
│                │   Local GGUF)         │                │
│                └───────────┬───────────┘                │
│                            │                            │
│                            ▼                            │
│                ┌───────────────────────┐                │
│                │  SQLite + sqlite-vec  │                │
│                │  Vector Index         │                │
│                └───────────┬───────────┘                │
│                            │                            │
│  Agent asks ───────────────┤                            │
│  "what did we decide       │                            │
│   about the API?"          ▼                            │
│                ┌───────────────────────┐                │
│                │  Hybrid Search        │                │
│                │  (semantic + keyword) │                │
│                └───────────┬───────────┘                │
│                            │                            │
│                            ▼                            │
│                 Relevant memory chunks                  │
│                  injected into context                  │
└─────────────────────────────────────────────────────────┘
```

### Pre-Compaction Flush

This is the secret sauce. When your session nears its context limit:

```
Session approaching limit
           │
           ▼
┌─────────────────────┐
│ Pre-compaction ping │  ← Clawdbot silently triggers this
│ "Store durable      │
│  memories now"      │
└──────────┬──────────┘
           │
           ▼
Agent writes lasting notes to memory/YYYY-MM-DD.md
           │
           ▼
Context gets compacted (old messages removed)
           │
           ▼
BUT memories are on disk AND indexed for search
           │
           ▼
Agent can find them anytime 🎉
```

---

## Embedding Provider Options

The installer will ask which provider you want:

| Provider | Speed | Cost | Setup |
|----------|-------|------|-------|
| **OpenAI** (recommended) | Fast | ~$0.02/million tokens | API key required |
| **Gemini** | Fast | Free tier available | API key required |
| **Local** | 🐢 Slower first run | Free | Downloads GGUF model (~100MB) |

**OpenAI** (`text-embedding-3-small`) is recommended for the best experience. It's extremely cheap and fast.

**Gemini** (`gemini-embedding-001`) works great and has a generous free tier.

**Local** uses `node-llama-cpp` with a GGUF model: fully offline, no API key needed, but the first index build is slower.

---

## Manual Setup (Alternative)

If you prefer to set things up yourself instead of using the installer:

### 1. Create the memory directory

```bash
mkdir -p ~/.clawdbot/workspace/memory
```

### 2. Add memory search config to clawdbot.json

Open `~/.clawdbot/clawdbot.json` and add `memorySearch` inside `agents.defaults`:

**For OpenAI:**

```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "openai",
        "model": "text-embedding-3-small"
      }
    }
  }
}
```

**For Gemini:**

```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "gemini",
        "model": "gemini-embedding-001"
      }
    }
  }
}
```

**For Local:**

```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "local"
      }
    }
  }
}
```

### 3. Set your API key (if using OpenAI or Gemini)

For OpenAI, set `OPENAI_API_KEY` in your environment or in `clawdbot.json` under `models.providers.openai.apiKey`.

For Gemini, set `GEMINI_API_KEY` in your environment or in `clawdbot.json` under `models.providers.google.apiKey`.

### 4. Build the index

```bash
clawdbot memory index --verbose
```

### 5. Verify

```bash
clawdbot memory status --deep
```

### 6. Restart the gateway

```bash
clawdbot gateway restart
```

---

## What Gets Indexed

By default, Clawdbot indexes:

- `MEMORY.md`: long-term curated memory
- `memory/*.md`: daily logs and all memory files

All files must be Markdown (`.md`). The index watches for changes and re-indexes automatically.

### Adding Extra Paths

Want to index files outside the default layout? Add `extraPaths`:

```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "extraPaths": ["../team-docs", "/path/to/other/notes"]
      }
    }
  }
}
```

---

## Troubleshooting

### "No API key found for provider openai/google"

You need to set your embedding API key. Either:

- Set the environment variable (`OPENAI_API_KEY` or `GEMINI_API_KEY`)
- Or add it to `clawdbot.json` under `models.providers`

### "Memory search stays disabled"

Run `clawdbot memory status --deep` to see what's wrong.
Common causes:

- No embedding provider configured
- API key missing or invalid
- No `.md` files in `memory/` directory

### Index not updating

Run a manual reindex:

```bash
clawdbot memory index --force --verbose
```

### Agent still seems to forget things

Make sure your `AGENTS.md` includes memory instructions. The agent needs to be told to:

1. Search memory before answering questions about prior work
2. Write important things to daily logs
3. Flush memories before compaction

The installer handles this automatically.

### Installer fails with "jq not found"

The installer needs `jq` for safe JSON patching. Install it:

```bash
# macOS
brew install jq

# Ubuntu/Debian
sudo apt-get install jq

# Or download from https://jqlang.github.io/jq/
```

---

## FAQ

### Why does my agent forget everything?

Clawdbot uses a context window with a token limit. When a session gets long, old messages are **compacted** (summarized and removed) to make room. Without a memory system, the details in those old messages are lost forever.

This memory system solves it by:

1. Writing important context to files on disk (survives any compaction)
2. Indexing those files for semantic search (agent can find them later)
3. Flushing memories right before compaction happens (nothing falls through the cracks)

### How is this different from just having MEMORY.md?

`MEMORY.md` alone is a single file that the agent reads at session start. It works for small amounts of info, but:

- It doesn't scale (gets too big to fit in context)
- It's not searchable (agent has to read the whole thing)
- Daily details get lost (you can't put everything in one file)

This system adds **daily logs** (unlimited history) + **vector search** (find anything semantically) + **pre-compaction flush** (automatic safety net).

### Does this cost money?
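Very little, or nothing with local embeddings. As a rough sanity check, here is a minimal sketch of the arithmetic, using the ~$0.02 per million tokens OpenAI rate quoted in the provider table. The `estimate_cost` helper is hypothetical (it is not part of Clawdbot or the installer), and the figure of roughly 500 tokens per daily log is an assumption, not a measurement.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope embedding cost. All inputs are illustrative assumptions.

# estimate_cost FILES TOKENS_PER_FILE USD_PER_MILLION_TOKENS
# Prints the estimated cost in USD of embedding every file once.
estimate_cost() {
  # LC_ALL=C keeps "." as the decimal separator regardless of locale.
  LC_ALL=C awk -v files="$1" -v tokens="$2" -v rate="$3" \
    'BEGIN { printf "%.4f\n", files * tokens * rate / 1000000 }'
}

# 100 daily logs at ~500 tokens each (assumed), $0.02 per million tokens
estimate_cost 100 500 0.02
```

Swap in your own file count and average size to get a figure for your workspace. Actual billing depends on how the provider tokenizes your text and on how often files are re-indexed. Per provider: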
- **Local embeddings**: Free (but slower)
- **OpenAI embeddings**: ~$0.02 per million tokens (essentially free for personal use)
- **Gemini embeddings**: Free tier available

For reference, indexing 100 daily logs costs about $0.001 with OpenAI.

### Can I use this with multiple agents?

Yes. Each agent uses the same workspace `memory/` directory by default. You can scope commands to a single agent with the `--agent` option.

### Is my data sent to the cloud?

Only if you use remote embeddings (OpenAI/Gemini). In that case the text of your memory files is sent to the provider's API so it can be turned into embedding vectors. If you want full privacy, use `local` embeddings: everything stays on your machine.

### Can I run the installer multiple times?

Yes! It's idempotent. It checks for existing files and config before making changes, and backs up your config before patching.

---

## Architecture

See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed diagrams.

## Migrating from Another Setup

See [MIGRATION.md](MIGRATION.md) for step-by-step migration guides.

## License

MIT. See [LICENSE](LICENSE).

---

**Built for the Clawdbot community** by people who got tired of explaining things to their agent twice.