clawdbot-memory-system/README.md

<![CDATA[# 🧠 Clawdbot Memory System

**One-command persistent memory for Clawdbot — never lose context to compaction again.**

> "Why does my agent forget everything after a long session?"

Because Clawdbot compacts old context to stay within its context window. Without a memory system, the details in those compacted messages are gone. This repo fixes that by writing memories to disk and indexing them for search.

---

## What This Is

A **two-layer memory system** for Clawdbot:

1. **Markdown files** (source of truth) — Daily logs, research intel, project tracking, and durable notes your agent writes to disk
2. **SQLite vector search** (retrieval layer) — Semantic search index that lets your agent find relevant memories even when wording differs

Your agent writes memories to plain Markdown. Those files get indexed into a vector store. When the agent needs context, it searches semantically and finds what it needs — even across sessions, even after compaction.

## Quick Install

```bash
bash <(curl -sL https://raw.githubusercontent.com/BusyBee3333/clawdbot-memory-system/main/install.sh)
```

That's it. The installer will:
- ✅ Detect your Clawdbot installation
- ✅ Create the `memory/` directory with templates
- ✅ Patch your `clawdbot.json` with memory search config (without touching anything else)
- ✅ Add memory habits to your `AGENTS.md`
- ✅ Build the initial vector index
- ✅ Verify everything works

### Preview First (Dry Run)

```bash
bash <(curl -sL https://raw.githubusercontent.com/BusyBee3333/clawdbot-memory-system/main/install.sh) --dry-run
```

### Uninstall

```bash
bash <(curl -sL https://raw.githubusercontent.com/BusyBee3333/clawdbot-memory-system/main/install.sh) --uninstall
```

---

## How It Works

```
┌─────────────────────────────────────────────────────────┐
│                    YOUR AGENT SESSION                     │
│                                                           │
│  Agent writes notes ──→ memory/2026-02-10.md             │
│  Agent stores facts ──→ MEMORY.md                        │
│                          │                                │
│                          ▼                                │
│                   ┌──────────────┐                        │
│                   │  File Watcher │ (debounced)           │
│                   └──────┬───────┘                        │
│                          │                                │
│                          ▼                                │
│              ┌───────────────────────┐                    │
│              │   Embedding Provider   │                   │
│              │  (OpenAI / Gemini /    │                   │
│              │   Local GGUF)          │                   │
│              └───────────┬───────────┘                    │
│                          │                                │
│                          ▼                                │
│              ┌───────────────────────┐                    │
│              │   SQLite + sqlite-vec  │                   │
│              │   Vector Index          │                  │
│              └───────────┬───────────┘                    │
│                          │                                │
│     Agent asks ──────────┤                                │
│     "what did we decide  │                                │
│      about the API?"     ▼                                │
│              ┌───────────────────────┐                    │
│              │   Hybrid Search        │                   │
│              │   (semantic + keyword)  │                  │
│              └───────────┬───────────┘                    │
│                          │                                │
│                          ▼                                │
│              Relevant memory chunks                       │
│              injected into context                        │
└─────────────────────────────────────────────────────────┘
```

### Pre-Compaction Flush

This is the secret sauce. When your session nears its context limit:

```
Session approaching limit
         │
         ▼
┌───────────────────────┐
│  Pre-compaction ping  │  ← Clawdbot silently triggers this
│  "Store durable       │
│   memories now"       │
└──────────┬────────────┘
           │
           ▼
   Agent writes lasting notes
   to memory/YYYY-MM-DD.md
           │
           ▼
   Context gets compacted
   (old messages removed)
           │
           ▼
   BUT memories are on disk
   AND indexed for search
           │
           ▼
   Agent can find them anytime 🎉
```
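
As a rough sketch of the "agent writes lasting notes" step, the hypothetical helper below appends a timestamped bullet to today's daily log. The function name `append_memory_note`, the `MEMORY_DIR` variable, and the bullet format are assumptions for illustration; your agent decides its own note format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a note to today's daily log.
# MEMORY_DIR defaults to the manual-setup location; override it as needed.
MEMORY_DIR="${MEMORY_DIR:-$HOME/.clawdbot/workspace/memory}"

append_memory_note() {
  local note="$1"
  local today
  today="$(date +%F)"                       # YYYY-MM-DD
  local file="$MEMORY_DIR/$today.md"        # e.g. memory/2026-02-10.md
  mkdir -p "$MEMORY_DIR"
  # Start the file with a date heading the first time it is written today.
  [ -f "$file" ] || printf '# %s\n\n' "$today" > "$file"
  printf -- '- %s %s\n' "$(date +%H:%M)" "$note" >> "$file"
}

# Example (not run here):
#   append_memory_note "Decided to version the API under /v2"
```

Because the note lands in a plain Markdown file under `memory/`, it is covered by the default index layout described below.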

---

## Embedding Provider Options

The installer will ask which provider you want:

| Provider | Speed | Cost | Setup |
|----------|-------|------|-------|
| **OpenAI** (recommended) | ⚡ Fast | ~$0.02/million tokens | API key required |
| **Gemini** | ⚡ Fast | Free tier available | API key required |
| **Local** | 🐢 Slower first run | Free | Downloads GGUF model (~100MB) |

**OpenAI** (`text-embedding-3-small`) is the recommended default. At roughly $0.02 per million tokens it is cheap, and indexing is fast.

**Gemini** (`gemini-embedding-001`) works well and has a free tier.

**Local** uses `node-llama-cpp` with a GGUF model — fully offline, no API key needed, but the first index build is slower.

---

## Manual Setup (Alternative)

If you prefer to set things up yourself instead of using the installer:

### 1. Create the memory directory

```bash
mkdir -p ~/.clawdbot/workspace/memory
```

### 2. Add memory search config to clawdbot.json

Open `~/.clawdbot/clawdbot.json` and add `memorySearch` inside `agents.defaults`:

**For OpenAI:**
```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "openai",
        "model": "text-embedding-3-small"
      }
    }
  }
}
```

**For Gemini:**
```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "gemini",
        "model": "gemini-embedding-001"
      }
    }
  }
}
```

**For Local:**
```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "local"
      }
    }
  }
}
```
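
If you would rather script this step than hand-edit the file, the same change can be made with `jq`. This is a minimal sketch, assuming `jq` is installed; the function name `patch_memory_search` and the `.bak` backup suffix are illustrative and not necessarily what the installer does. jq's `*` operator merges objects recursively, so keys elsewhere in the config are left alone.

```bash
#!/usr/bin/env bash
# Hypothetical helper: merge a memorySearch block into clawdbot.json.
# Usage: patch_memory_search <config-path> <provider> [model]
patch_memory_search() {
  local config="$1" provider="$2" model="${3:-}"
  command -v jq >/dev/null 2>&1 || { echo "jq not found" >&2; return 1; }
  cp "$config" "$config.bak" || return 1    # keep a backup before patching
  local tmp
  tmp="$(mktemp)" || return 1
  # Build the memorySearch object, adding "model" only when one is given,
  # then deep-merge it into the existing config with jq's `*` operator.
  jq --arg provider "$provider" --arg model "$model" '
    . * {agents: {defaults: {memorySearch: (
      {provider: $provider} + (if $model != "" then {model: $model} else {} end)
    )}}}
  ' "$config" > "$tmp" && mv "$tmp" "$config"
}

# Example (not run here):
#   patch_memory_search ~/.clawdbot/clawdbot.json openai text-embedding-3-small
```

Writing to a temporary file and moving it into place means a failed `jq` run leaves the original config untouched.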

### 3. Set your API key (if using OpenAI or Gemini)

For OpenAI, set `OPENAI_API_KEY` in your environment or in `clawdbot.json` under `models.providers.openai.apiKey`.

For Gemini, set `GEMINI_API_KEY` in your environment or in `clawdbot.json` under `models.providers.google.apiKey`.
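
Before building the index, it can help to confirm the key is actually visible to your shell. The sketch below is a hypothetical helper (`has_embedding_key` is not a Clawdbot command) and it only checks the environment variables named above, not `clawdbot.json`.

```bash
#!/usr/bin/env bash
# Hypothetical check: is the API key for the chosen provider set in the environment?
# Usage: has_embedding_key openai|gemini|local
has_embedding_key() {
  case "$1" in
    openai) [ -n "${OPENAI_API_KEY:-}" ] ;;
    gemini) [ -n "${GEMINI_API_KEY:-}" ] ;;
    local)  return 0 ;;                      # local embeddings need no key
    *)      echo "unknown provider: $1" >&2; return 2 ;;
  esac
}

# Example (not run here; the key value is a placeholder):
#   export OPENAI_API_KEY="your-key-here"
#   has_embedding_key openai && echo "key found" || echo "key missing"
```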

### 4. Build the index

```bash
clawdbot memory index --verbose
```

### 5. Verify

```bash
clawdbot memory status --deep
```

### 6. Restart the gateway

```bash
clawdbot gateway restart
```

---

## What Gets Indexed

By default, Clawdbot indexes:
- `MEMORY.md` — Long-term curated memory
- `memory/*.md` — Daily logs and all memory files

All files must be Markdown (`.md`). A debounced file watcher picks up changes and re-indexes them automatically.
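
To preview which files fall under the default layout, a sketch like the following can help. It assumes `MEMORY.md` sits at the workspace root next to `memory/`, and that only top-level files in `memory/` match, per the `memory/*.md` pattern. `list_default_memory_files` is a hypothetical helper, not a Clawdbot command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Markdown files covered by the default layout.
# Usage: list_default_memory_files [workspace-dir]
list_default_memory_files() {
  local ws="${1:-$HOME/.clawdbot/workspace}"
  [ -f "$ws/MEMORY.md" ] && echo "$ws/MEMORY.md"
  local f
  for f in "$ws"/memory/*.md; do
    [ -f "$f" ] && echo "$f"      # skips the literal pattern when nothing matches
  done
  return 0
}

# Example (not run here):
#   list_default_memory_files ~/.clawdbot/workspace
```

For the authoritative view of what is indexed, use `clawdbot memory status --deep`.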

### Adding Extra Paths

Want to index files outside the default layout? Add `extraPaths`:

```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "extraPaths": ["../team-docs", "/path/to/other/notes"]
      }
    }
  }
}
```

---

## Troubleshooting

### "No API key found for provider openai/google"

You need to set your embedding API key. Either:
- Set the environment variable (`OPENAI_API_KEY` or `GEMINI_API_KEY`)
- Or add it to `clawdbot.json` under `models.providers`

### "Memory search stays disabled"

Run `clawdbot memory status --deep` to see what's wrong. Common causes:
- No embedding provider configured
- API key missing or invalid
- No `.md` files in the `memory/` directory

### Index not updating

Run a manual reindex:
```bash
clawdbot memory index --force --verbose
```

### Agent still seems to forget things

Make sure your `AGENTS.md` includes memory instructions. The agent needs to be told to:
1. Search memory before answering questions about prior work
2. Write important things to daily logs
3. Flush memories before compaction

The installer handles this automatically.
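
For a manual setup, the three habits above can be added to `AGENTS.md` by hand or with a small script. The sketch below is an assumption about what such a section could look like; the `## Memory Habits` heading and the exact wording are illustrative, not the installer's actual text. The `grep` guard keeps the append safe to rerun.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append memory instructions to AGENTS.md once.
# Usage: add_memory_habits <path-to-AGENTS.md>
add_memory_habits() {
  local agents_md="$1"
  local marker="## Memory Habits"
  # Skip if the section is already present, so reruns do not duplicate it.
  if [ -f "$agents_md" ] && grep -qxF "$marker" "$agents_md"; then
    return 0
  fi
  cat >> "$agents_md" <<EOF

$marker

1. Search memory before answering questions about prior work.
2. Write important decisions and facts to today's log in memory/YYYY-MM-DD.md.
3. When asked to store durable memories before compaction, write them to disk.
EOF
}

# Example (not run here):
#   add_memory_habits ~/.clawdbot/workspace/AGENTS.md
```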

### Installer fails with "jq not found"

The installer needs `jq` for safe JSON patching. Install it:
```bash
# macOS
brew install jq

# Ubuntu/Debian
sudo apt-get install jq

# Or download from https://jqlang.github.io/jq/
```

---

## FAQ

### Why does my agent forget everything?

Clawdbot uses a context window with a token limit. When a session gets long, old messages are **compacted** (summarized and removed) to make room. Without a memory system, the details in those old messages are lost forever.

This memory system solves it by:
1. Writing important context to files on disk (survives any compaction)
2. Indexing those files for semantic search (agent can find them later)
3. Flushing memories right before compaction happens (nothing falls through the cracks)

### How is this different from just having MEMORY.md?

`MEMORY.md` alone is a single file that the agent reads at session start. It works for small amounts of info, but:
- It doesn't scale (gets too big to fit in context)
- It's not searchable (agent has to read the whole thing)
- Daily details get lost (you can't put everything in one file)

This system adds **daily logs** (unlimited history) + **vector search** (find anything semantically) + **pre-compaction flush** (automatic safety net).

### Does this cost money?

- **Local embeddings**: Free (but slower)
- **OpenAI embeddings**: ~$0.02 per million tokens (essentially free for personal use)
- **Gemini embeddings**: Free tier available

For reference, indexing 100 daily logs costs about $0.001 with OpenAI.
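
That estimate follows from simple arithmetic. The sketch below assumes an average of about 500 tokens per daily log, which is an illustrative figure and not a measurement; the price is the ~$0.02 per million tokens quoted above. `estimate_embedding_cost` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Rough embedding-cost estimate. Token counts are assumptions; adjust to your logs.
# Usage: estimate_embedding_cost <num-files> <avg-tokens-per-file> <usd-per-million-tokens>
estimate_embedding_cost() {
  awk -v n="$1" -v t="$2" -v p="$3" \
    'BEGIN { printf "%.4f\n", n * t * p / 1000000 }'
}

# 100 logs x 500 tokens = 50,000 tokens; 50,000 / 1,000,000 x $0.02 = $0.001
estimate_embedding_cost 100 500 0.02   # prints 0.0010
```

Longer logs scale the cost linearly: at 5,000 tokens per log, the same 100 files would come to about $0.01.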

### Can I use this with multiple agents?

Yes. Each agent uses the same workspace `memory/` directory by default. To scope a command to a single agent, pass `--agent <id>`.

### Is my data sent to the cloud?

Only if you use remote embeddings (OpenAI/Gemini). In that case, the text of your memory files is sent to the provider's API to generate the embedding vectors, so treat it like any other data you share with that provider. If you want full privacy, use `local` embeddings, where everything stays on your machine.

### Can I run the installer multiple times?

Yes! It's idempotent. It checks for existing files and config before making changes, and backs up your config before patching.

---

## Architecture

See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed diagrams.

## Migrating from Another Setup

See [MIGRATION.md](MIGRATION.md) for step-by-step migration guides.

## License

MIT — see [LICENSE](LICENSE)

---

**Built for the Clawdbot community** by people who got tired of explaining things to their agent twice.
]]>