clawdbot-workspace/MEMORY-SYSTEM-COMPARISON.md


Memory System Comparison Matrix

Detailed comparison of Clawdbot's memory system vs. alternatives.


Quick Comparison Table

| Feature | Clawdbot Memory | Long Context | RAG on Docs | Vector DB SaaS | Notion/Obsidian |
|---|---|---|---|---|---|
| Persistent across sessions | ✅ | ❌ | ✅ | ✅ | ✅ |
| Survives crashes | ✅ | ❌ | ✅ | ✅ | ✅ |
| Semantic search | ✅ | ⚠️ (limited) | ✅ | ✅ | ❌ |
| Human-editable | ✅ | ❌ | ⚠️ | ❌ | ✅ |
| Git-backed | ✅ | ❌ | ⚠️ | ❌ | ⚠️ |
| Free/Low Cost | ✅ (~$0.50/mo) | ❌ (token-heavy) | ✅ | ❌ ($50+/mo) | ⚠️ ($10/mo) |
| No cloud dependency | ✅ (local SQLite) | ❌ | ❌ | ❌ | ❌ |
| Agent can write | ✅ | ❌ | ❌ | ⚠️ | ✅ |
| Fast search (<100ms) | ✅ | ❌ | ✅ | ⚠️ (network) | ⚠️ |
| Data sovereignty | ✅ (your disk) | ❌ | ❌ | ❌ | ❌ |
| Hybrid search (semantic + keyword) | ✅ | ❌ | ⚠️ | ⚠️ | ❌ |
| Auto-indexing | ✅ | N/A | ⚠️ | ⚠️ | ❌ |
| Multi-agent support | ⚠️ | ❌ | ✅ | ✅ | ⚠️ |

Legend:

  • ✅ = Full support, works well
  • ⚠️ = Partial support or caveats
  • ❌ = Not supported or poor fit

Detailed Comparison

1. Clawdbot Memory System (This System)

Architecture: Markdown files + SQLite + vector embeddings

Pros:

  • Agent actively curates its own memory
  • Human-readable and editable (plain Markdown)
  • Git-backed (full version history)
  • Fast semantic search (<100ms)
  • Hybrid search (semantic + keyword)
  • Local storage (no cloud lock-in)
  • Free (after embedding setup)
  • Survives crashes and restarts
  • Pre-compaction auto-flush
  • Multi-session persistence

Cons:

  • ⚠️ Requires API key for embeddings (or local setup)
  • ⚠️ Initial indexing takes a few seconds
  • ⚠️ Embedding costs scale with memory size (~$0.50/mo at 35 files)

Best for:

  • Personal AI assistants
  • Long-running projects
  • Multi-session workflows
  • Agents that need to "remember" decisions

Cost: ~$0.50/month (OpenAI Batch API)
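The hybrid search listed above can be sketched in a few lines. This is an illustrative toy, not Clawdbot's actual implementation: the corpus, the hand-made 3-dimensional embeddings, and the 70/30 score blend are all assumptions for the example (the real system embeds Markdown chunks via an API and stores them in SQLite).

```python
import math

# Hypothetical toy corpus: (path, text, embedding) triples. Real
# embeddings come from an embedding API; these are hand-made 3-vectors.
CHUNKS = [
    ("memory/decisions.md", "chose sqlite for local storage", [0.9, 0.1, 0.0]),
    ("memory/people.md",    "jake prefers batch embedding",   [0.1, 0.8, 0.2]),
    ("memory/infra.md",     "git push runs nightly",          [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two vectors (semantic signal)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def hybrid_search(query_text, query_emb, alpha=0.7):
    """Blend semantic (cosine) and keyword (term-overlap) scores."""
    q_terms = set(query_text.lower().split())
    results = []
    for path, text, emb in CHUNKS:
        semantic = cosine(query_emb, emb)
        keyword = len(q_terms & set(text.split())) / (len(q_terms) or 1)
        results.append((alpha * semantic + (1 - alpha) * keyword, path))
    return sorted(results, reverse=True)

best = hybrid_search("sqlite storage", [0.85, 0.15, 0.05])[0]
print(best[1])  # memory/decisions.md scores highest on both signals
```

The point of the blend: a keyword-only system misses paraphrases, a vector-only system misses exact identifiers; scoring both and mixing them covers each other's blind spots.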


2. Long Context Windows (Claude 200K, GPT-4 128K)

Architecture: Everything in prompt context

Pros:

  • Simple (no separate storage)
  • Agent has "all" context available
  • No indexing delay

Cons:

  • Ephemeral (lost on crash/restart)
  • Expensive at scale ($5-20 per long session)
  • Degrades with very long contexts (needle-in-haystack)
  • No semantic search (model must scan)
  • Compaction loses old context

Best for:

  • Single-session tasks
  • One-off questions
  • Contexts that fit in <50K tokens

Cost: $5-20 per session (for 100K+ token contexts)


3. RAG on External Docs

Architecture: Vector DB over static documentation

Pros:

  • Good for large doc corpora
  • Semantic search
  • Persistent

Cons:

  • Agent can't write/update docs (passive)
  • Requires separate ingestion pipeline
  • ⚠️ Human editing is indirect
  • ⚠️ Git backing depends on doc format
  • Agent doesn't "learn" (docs are static)

Best for:

  • Technical documentation search
  • Knowledge base Q&A
  • Support chatbots

Cost: Varies (Pinecone: $70/mo, OpenAI embeddings: $0.50+/mo)


4. Vector DB SaaS (Pinecone, Weaviate, Qdrant Cloud)

Architecture: Cloud-hosted vector database

Pros:

  • Fast semantic search
  • Scalable (millions of vectors)
  • Managed infrastructure

Cons:

  • Expensive ($70+/mo for production tier)
  • Cloud lock-in
  • Network latency on every search
  • Data lives on their servers
  • ⚠️ Human editing requires API calls
  • Not git-backed (proprietary storage)

Best for:

  • Enterprise-scale deployments
  • Multi-tenant apps
  • High-throughput search

Cost: $70-500/month


5. Notion / Obsidian / Roam

Architecture: Note-taking app with API

Pros:

  • Human-friendly UI
  • Rich formatting
  • Collaboration features (Notion)
  • Agent can write via API

Cons:

  • Not designed for AI memory (UI overhead)
  • ⚠️ Search is UI-focused, not API-optimized
  • Notion: cloud lock-in, $10/mo
  • ⚠️ Obsidian: local but not structured for agents
  • No vector search (keyword only)
  • ⚠️ Git backing: manual or plugin-dependent

Best for:

  • Human-first note-taking
  • Team collaboration
  • Visual knowledge graphs

Cost: $0-10/month


6. Pure Filesystem (No Indexing)

Architecture: Markdown files, no indexing

Pros:

  • Simple
  • Free
  • Git-backed
  • Human-editable

Cons:

  • No semantic search (grep only)
  • Slow to find info (must scan all files)
  • Agent can't recall context efficiently
  • No hybrid search

Best for:

  • Very small memory footprints (<10 files)
  • Temporary projects
  • Humans who manually search

Cost: Free


When to Choose Which

Choose Clawdbot Memory if:

  • You want persistent, searchable memory
  • Agent needs to write its own memory
  • You value data sovereignty (local storage)
  • Budget is <$5/month
  • You want git-backed history
  • Multi-session workflows

Choose Long Context if:

  • Single-session tasks only
  • Budget is flexible ($5-20/session OK)
  • Context fits in <50K tokens
  • Don't need persistence

Choose RAG on Docs if:

  • Large existing doc corpus
  • Docs rarely change
  • Agent doesn't need to write
  • Multiple agents share same knowledge

Choose Vector DB SaaS if:

  • Enterprise scale (millions of vectors)
  • Multi-tenant app
  • Budget is $100+/month
  • Data sovereignty isn't critical

Choose Notion/Obsidian if:

  • Humans are primary users
  • Visual knowledge graphs matter
  • Collaboration is key
  • ⚠️ Agent memory is secondary

Choose Pure Filesystem if:

  • Tiny memory footprint (<10 files)
  • Temporary project
  • Search speed doesn't matter

Hybrid Approaches

Clawdbot Memory + Long Context

Best of both worlds:

  • Use memory for durable facts/decisions
  • Use context for current session detail
  • Pre-compaction flush keeps memory updated
  • This is what Jake's setup does

Clawdbot Memory + RAG

For large doc sets:

  • Memory: agent's personal notes
  • RAG: external documentation
  • Agent searches both as needed

Clawdbot Memory + Notion

For team collaboration:

  • Memory: agent's internal state
  • Notion: shared team wiki
  • Agent syncs key info to Notion

Migration Paths

From Long Context → Clawdbot Memory

  1. Extract key facts from long sessions
  2. Write to memory/ files
  3. Index via clawdbot memory index
  4. Continue with hybrid approach
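Step 2 can be as simple as one Markdown file per topic. A minimal sketch — the memory/ layout matches this doc, but the helper name and bullet format are made up; indexing (step 3) still happens via clawdbot memory index:

```python
from pathlib import Path

def write_memory_file(memory_dir, topic, facts):
    """Write extracted facts as a bulleted Markdown file under memory/.

    Hypothetical helper: one file per topic, a heading plus one bullet
    per fact, so the result stays human-readable and git-diffable.
    """
    path = Path(memory_dir) / f"{topic}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"# {topic}", ""] + [f"- {fact}" for fact in facts]
    path.write_text("\n".join(lines) + "\n")
    return path
```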

From Notion → Clawdbot Memory

  1. Export Notion pages as Markdown
  2. Move to memory/ directory
  3. Index via clawdbot memory index
  4. Keep Notion for team wiki, memory for agent state

From Vector DB → Clawdbot Memory

  1. Export vectors (if possible) or re-embed
  2. Convert to Markdown + SQLite
  3. Index locally
  4. Optionally keep Vector DB for shared/production data

Real-World Performance

Jake's Production Stats (26 days, 35 files)

| Metric | Value |
|---|---|
| Files | 35 markdown files |
| Chunks | 121 |
| Memories | 116 |
| SQLite size | 15 MB |
| Search speed | <100ms |
| Embedding cost | ~$0.50/month |
| Crashes survived | 5+ |
| Data loss | Zero |
| Daily usage | 10-50 searches/day |
| Git commits | Daily (automated) |

Scaling Projection

| Scale | Files | Chunks | SQLite Size | Search Speed | Monthly Cost |
|---|---|---|---|---|---|
| Small | 10-50 | 50-200 | 5-20 MB | <100ms | $0.50 |
| Medium | 50-200 | 200-1000 | 20-80 MB | <200ms | $2-5 |
| Large | 200-500 | 1000-2500 | 80-200 MB | <500ms | $10-20 |
| XL | 500-1000 | 2500-5000 | 200-500 MB | <1s | $30-50 |
| XXL | 1000+ | 5000+ | 500+ MB | Consider partitioning | $50+ |

Note: At 1000+ files, consider archiving old logs or partitioning by date/project.
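One way to implement that partitioning is a small housekeeping script. The date-prefixed filename convention and the archive/ subdirectory below are assumptions for illustration, not part of Clawdbot:

```python
import re
from pathlib import Path

def archive_old_files(memory_dir, cutoff="2025-01"):
    """Move memory files whose names start with a YYYY-MM date older
    than `cutoff` into an archive/ subfolder, keeping the live index
    small. Archived files stay on disk (and in git) for later search."""
    memory_dir = Path(memory_dir)
    archive = memory_dir / "archive"
    moved = []
    for f in sorted(memory_dir.glob("*.md")):
        m = re.match(r"(\d{4}-\d{2})", f.name)
        if m and m.group(1) < cutoff:  # lexicographic compare works for YYYY-MM
            archive.mkdir(exist_ok=True)
            f.rename(archive / f.name)
            moved.append(f.name)
    return moved
```

Re-indexing after a sweep like this keeps search latency in the sub-100ms band of the smaller tiers.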


Cost Breakdown (OpenAI Batch API)

Initial Indexing (35 files, 121 chunks)

  • Tokens: ~50,000 (121 chunks × ~400 tokens avg)
  • Embedding cost: $0.001 per 1K tokens (Batch API)
  • Total: ~$0.05

Daily Updates (3 files, ~10 chunks)

  • Tokens: ~4,000
  • Embedding cost: $0.004
  • Monthly: ~$0.12

Ongoing Search (100 searches/day)

  • Search: Local SQLite (free)
  • No per-query cost

Total Monthly: ~$0.50
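The arithmetic above checks out directly (using the $0.001 per 1K tokens Batch API price quoted in this section, not an independently verified figure):

```python
# Reproduce the cost figures from this section.
PRICE_PER_1K_TOKENS = 0.001  # Batch API price as quoted above

# Initial indexing: 121 chunks x ~400 tokens avg
initial_tokens = 121 * 400                       # 48,400 ~= 50K tokens
initial_cost = initial_tokens / 1000 * PRICE_PER_1K_TOKENS
print(f"initial: ${initial_cost:.2f}")           # initial: $0.05

# Daily updates: ~4,000 tokens/day
daily_cost = 4000 / 1000 * PRICE_PER_1K_TOKENS
print(f"monthly updates: ${daily_cost * 30:.2f}")  # monthly updates: $0.12
```

The gap between the ~$0.12 of routine updates and the ~$0.50 total leaves headroom for occasional full re-indexes.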

Compare to:

  • Long context (100K tokens/session): $5-20/session
  • Pinecone: $70/month (starter tier)
  • Notion API: $10/month (plus rate limits)

Feature Matrix Deep Dive

Persistence

| System | Survives Crash | Survives Restart | Survives Power Loss |
|---|---|---|---|
| Clawdbot Memory | ✅ | ✅ | ✅ (if git pushed) |
| Long Context | ❌ | ❌ | ❌ |
| RAG | ✅ | ✅ | ✅ |
| Vector DB SaaS | ✅ | ✅ | ⚠️ (cloud dependent) |
| Notion | ✅ | ✅ | ✅ (cloud) |

Search Quality

| System | Semantic | Keyword | Hybrid | Speed |
|---|---|---|---|---|
| Clawdbot Memory | ✅ | ✅ | ✅ | <100ms |
| Long Context | ⚠️ (model scan) | ⚠️ (model scan) | ❌ | Slow |
| RAG | ✅ | ⚠️ | ⚠️ | <200ms |
| Vector DB SaaS | ✅ | ❌ | ⚠️ | <300ms (network) |
| Notion | ❌ | ✅ | ❌ | Varies |

Agent Control

| System | Agent Can Write | Agent Can Edit | Agent Can Delete | Auto-Index |
|---|---|---|---|---|
| Clawdbot Memory | ✅ | ✅ | ✅ | ✅ |
| Long Context | ❌ | ❌ | ❌ | N/A |
| RAG | ❌ | ❌ | ❌ | ⚠️ |
| Vector DB SaaS | ⚠️ (via API) | ⚠️ (via API) | ⚠️ (via API) | ⚠️ |
| Notion | ✅ (via API) | ✅ (via API) | ✅ (via API) | ❌ |

Bottom Line

For personal AI assistants like Buba:

🥇 #1: Clawdbot Memory System

  • Best balance of cost, control, persistence, and search
  • Agent-friendly (write/edit/delete)
  • Git-backed safety
  • Local storage (data sovereignty)

🥈 #2: Clawdbot Memory + Long Context (Hybrid)

  • Memory for durable facts
  • Context for current session
  • This is Jake's setup — it works great

🥉 #3: RAG on Docs

  • If you have massive existing docs
  • Agent doesn't need to write

Avoid for personal assistants:

  • Vector DB SaaS (overkill + expensive)
  • Pure long context (not persistent)
  • Notion/Obsidian (not optimized for AI)

END OF COMPARISON

ᕕ( ᐛ )ᕗ