clawdbot-workspace/MEMORY-SYSTEM-COMPARISON.md


Memory System Comparison Matrix

Detailed comparison of Clawdbot's memory system vs. alternatives.


Quick Comparison Table

| Feature | Clawdbot Memory | Long Context | RAG on Docs | Vector DB SaaS | Notion/Obsidian |
|---|---|---|---|---|---|
| Persistent across sessions | ✅ | ❌ | ✅ | ✅ | ✅ |
| Survives crashes | ✅ | ❌ | ✅ | ✅ | ✅ |
| Semantic search | ✅ | ⚠️ (limited) | ✅ | ✅ | ❌ |
| Human-editable | ✅ | ❌ | ⚠️ | ❌ | ✅ |
| Git-backed | ✅ | ❌ | ⚠️ | ❌ | ⚠️ |
| Free/Low Cost | ✅ (~$0.50/mo) | ❌ (token-heavy) | ✅ | ❌ ($50+/mo) | ⚠️ ($10/mo) |
| No cloud dependency | ✅ (local SQLite) | ❌ | ❌ | ❌ | ❌ |
| Agent can write | ✅ | ❌ | ❌ | ⚠️ | ✅ |
| Fast search (<100ms) | ✅ | ❌ | ✅ | ⚠️ (network) | ⚠️ |
| Data sovereignty | ✅ (your disk) | ❌ | ❌ | ❌ | ❌ |
| Hybrid search (semantic + keyword) | ✅ | ❌ | ⚠️ | ⚠️ | ❌ |
| Auto-indexing | ✅ | N/A | ⚠️ | ⚠️ | ❌ |
| Multi-agent support | ⚠️ | ❌ | ✅ | ✅ | ⚠️ |

Legend:

  • ✅ = Full support, works well
  • ⚠️ = Partial support or caveats
  • ❌ = Not supported or poor fit

Detailed Comparison

1. Clawdbot Memory System (This System)

Architecture: Markdown files + SQLite + vector embeddings

Pros:

  • Agent actively curates its own memory
  • Human-readable and editable (plain Markdown)
  • Git-backed (full version history)
  • Fast semantic search (<100ms)
  • Hybrid search (semantic + keyword)
  • Local storage (no cloud lock-in)
  • Free (after embedding setup)
  • Survives crashes and restarts
  • Pre-compaction auto-flush
  • Multi-session persistence

Cons:

  • ⚠️ Requires API key for embeddings (or local setup)
  • ⚠️ Initial indexing takes a few seconds
  • ⚠️ Embedding costs scale with memory size (~$0.50/mo at 35 files)

Best for:

  • Personal AI assistants
  • Long-running projects
  • Multi-session workflows
  • Agents that need to "remember" decisions

Cost: ~$0.50/month (OpenAI Batch API)
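The hybrid search listed above can be sketched in a few lines. This is an illustrative toy, not Clawdbot's actual implementation: the corpus, the hand-made 3-dimensional embeddings, and the 70/30 score blend are all assumptions for the example (the real system embeds Markdown chunks via an API and stores them in SQLite).

```python
import math

# Hypothetical toy corpus: (path, text, embedding) triples. Real
# embeddings come from an embedding API; these are hand-made 3-vectors.
CHUNKS = [
    ("memory/decisions.md", "chose sqlite for local storage", [0.9, 0.1, 0.0]),
    ("memory/people.md",    "jake prefers batch embedding",   [0.1, 0.8, 0.2]),
    ("memory/infra.md",     "git push runs nightly",          [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two vectors (semantic signal)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def hybrid_search(query_text, query_emb, alpha=0.7):
    """Blend semantic (cosine) and keyword (term-overlap) scores."""
    q_terms = set(query_text.lower().split())
    results = []
    for path, text, emb in CHUNKS:
        semantic = cosine(query_emb, emb)
        keyword = len(q_terms & set(text.split())) / (len(q_terms) or 1)
        results.append((alpha * semantic + (1 - alpha) * keyword, path))
    return sorted(results, reverse=True)

best = hybrid_search("sqlite storage", [0.85, 0.15, 0.05])[0]
print(best[1])  # memory/decisions.md scores highest on both signals
```

The point of the blend: a keyword-only system misses paraphrases, a vector-only system misses exact identifiers; scoring both and mixing them covers each other's blind spots.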


2. Long Context Windows (Claude 200K, GPT-4 128K)

Architecture: Everything in prompt context

Pros:

  • Simple (no separate storage)
  • Agent has "all" context available
  • No indexing delay

Cons:

  • Ephemeral (lost on crash/restart)
  • Expensive at scale ($5-20 per long session)
  • Degrades with very long contexts (needle-in-haystack)
  • No semantic search (model must scan)
  • Compaction loses old context

Best for:

  • Single-session tasks
  • One-off questions
  • Contexts that fit in <50K tokens

Cost: $5-20 per session (for 100K+ token contexts)


3. RAG on External Docs

Architecture: Vector DB over static documentation

Pros:

  • Good for large doc corpora
  • Semantic search
  • Persistent

Cons:

  • Agent can't write/update docs (passive)
  • Requires separate ingestion pipeline
  • ⚠️ Human editing is indirect
  • ⚠️ Git backing depends on doc format
  • Agent doesn't "learn" (docs are static)

Best for:

  • Technical documentation search
  • Knowledge base Q&A
  • Support chatbots

Cost: Varies (Pinecone: $70/mo, OpenAI embeddings: $0.50+/mo)


4. Vector DB SaaS (Pinecone, Weaviate, Qdrant Cloud)

Architecture: Cloud-hosted vector database

Pros:

  • Fast semantic search
  • Scalable (millions of vectors)
  • Managed infrastructure

Cons:

  • Expensive ($70+/mo for production tier)
  • Cloud lock-in
  • Network latency on every search
  • Data lives on their servers
  • ⚠️ Human editing requires API calls
  • Not git-backed (proprietary storage)

Best for:

  • Enterprise-scale deployments
  • Multi-tenant apps
  • High-throughput search

Cost: $70-500/month


5. Notion / Obsidian / Roam

Architecture: Note-taking app with API

Pros:

  • Human-friendly UI
  • Rich formatting
  • Collaboration features (Notion)
  • Agent can write via API

Cons:

  • Not designed for AI memory (UI overhead)
  • ⚠️ Search is UI-focused, not API-optimized
  • Notion: cloud lock-in, $10/mo
  • ⚠️ Obsidian: local but not structured for agents
  • No vector search (keyword only)
  • ⚠️ Git backing: manual or plugin-dependent

Best for:

  • Human-first note-taking
  • Team collaboration
  • Visual knowledge graphs

Cost: $0-10/month


6. Pure Filesystem (No Indexing)

Architecture: Markdown files, no indexing

Pros:

  • Simple
  • Free
  • Git-backed
  • Human-editable

Cons:

  • No semantic search (grep only)
  • Slow to find info (must scan all files)
  • Agent can't recall context efficiently
  • No hybrid search

Best for:

  • Very small memory footprints (<10 files)
  • Temporary projects
  • Humans who manually search

Cost: Free


When to Choose Which

Choose Clawdbot Memory if:

  • You want persistent, searchable memory
  • Agent needs to write its own memory
  • You value data sovereignty (local storage)
  • Budget is <$5/month
  • You want git-backed history
  • Multi-session workflows

Choose Long Context if:

  • Single-session tasks only
  • Budget is flexible ($5-20/session OK)
  • Context fits in <50K tokens
  • Don't need persistence

Choose RAG on Docs if:

  • Large existing doc corpus
  • Docs rarely change
  • Agent doesn't need to write
  • Multiple agents share same knowledge

Choose Vector DB SaaS if:

  • Enterprise scale (millions of vectors)
  • Multi-tenant app
  • Budget is $100+/month
  • Data sovereignty isn't critical

Choose Notion/Obsidian if:

  • Humans are primary users
  • Visual knowledge graphs matter
  • Collaboration is key
  • ⚠️ Agent memory is secondary

Choose Pure Filesystem if:

  • Tiny memory footprint (<10 files)
  • Temporary project
  • Search speed doesn't matter

Hybrid Approaches

Clawdbot Memory + Long Context

Best of both worlds:

  • Use memory for durable facts/decisions
  • Use context for current session detail
  • Pre-compaction flush keeps memory updated
  • This is what Jake's setup does

Clawdbot Memory + RAG

For large doc sets:

  • Memory: agent's personal notes
  • RAG: external documentation
  • Agent searches both as needed

Clawdbot Memory + Notion

For team collaboration:

  • Memory: agent's internal state
  • Notion: shared team wiki
  • Agent syncs key info to Notion

Migration Paths

From Long Context → Clawdbot Memory

  1. Extract key facts from long sessions
  2. Write to memory/ files
  3. Index via clawdbot memory index
  4. Continue with hybrid approach
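Step 2 can be as simple as one Markdown file per topic. A minimal sketch — the memory/ layout matches this doc, but the helper name and bullet format are made up; indexing (step 3) still happens via clawdbot memory index:

```python
from pathlib import Path

def write_memory_file(memory_dir, topic, facts):
    """Write extracted facts as a bulleted Markdown file under memory/.

    Hypothetical helper: one file per topic, a heading plus one bullet
    per fact, so the result stays human-readable and git-diffable.
    """
    path = Path(memory_dir) / f"{topic}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"# {topic}", ""] + [f"- {fact}" for fact in facts]
    path.write_text("\n".join(lines) + "\n")
    return path
```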

From Notion → Clawdbot Memory

  1. Export Notion pages as Markdown
  2. Move to memory/ directory
  3. Index via clawdbot memory index
  4. Keep Notion for team wiki, memory for agent state

From Vector DB → Clawdbot Memory

  1. Export vectors (if possible) or re-embed
  2. Convert to Markdown + SQLite
  3. Index locally
  4. Optionally keep Vector DB for shared/production data

Real-World Performance

Jake's Production Stats (26 days, 35 files)

| Metric | Value |
|---|---|
| Files | 35 markdown files |
| Chunks | 121 |
| Memories | 116 |
| SQLite size | 15 MB |
| Search speed | <100ms |
| Embedding cost | ~$0.50/month |
| Crashes survived | 5+ |
| Data loss | Zero |
| Daily usage | 10-50 searches/day |
| Git commits | Daily (automated) |

Scaling Projection

| Scale | Files | Chunks | SQLite Size | Search Speed | Monthly Cost |
|---|---|---|---|---|---|
| Small | 10-50 | 50-200 | 5-20 MB | <100ms | $0.50 |
| Medium | 50-200 | 200-1000 | 20-80 MB | <200ms | $2-5 |
| Large | 200-500 | 1000-2500 | 80-200 MB | <500ms | $10-20 |
| XL | 500-1000 | 2500-5000 | 200-500 MB | <1s | $30-50 |
| XXL | 1000+ | 5000+ | 500+ MB | Consider partitioning | $50+ |

Note: At 1000+ files, consider archiving old logs or partitioning by date/project.
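One way to implement that partitioning is a small housekeeping script. The date-prefixed filename convention and the archive/ subdirectory below are assumptions for illustration, not part of Clawdbot:

```python
import re
from pathlib import Path

def archive_old_files(memory_dir, cutoff="2025-01"):
    """Move memory files whose names start with a YYYY-MM date older
    than `cutoff` into an archive/ subfolder, keeping the live index
    small. Archived files stay on disk (and in git) for later search."""
    memory_dir = Path(memory_dir)
    archive = memory_dir / "archive"
    moved = []
    for f in sorted(memory_dir.glob("*.md")):
        m = re.match(r"(\d{4}-\d{2})", f.name)
        if m and m.group(1) < cutoff:  # lexicographic compare works for YYYY-MM
            archive.mkdir(exist_ok=True)
            f.rename(archive / f.name)
            moved.append(f.name)
    return moved
```

Re-indexing after a sweep like this keeps search latency in the sub-100ms band of the smaller tiers.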


Cost Breakdown (OpenAI Batch API)

Initial Indexing (35 files, 121 chunks)

  • Tokens: ~50,000 (121 chunks × ~400 tokens avg)
  • Embedding cost: $0.001 per 1K tokens (Batch API)
  • Total: ~$0.05

Daily Updates (3 files, ~10 chunks)

  • Tokens: ~4,000
  • Embedding cost: $0.004
  • Monthly: ~$0.12

Ongoing Search (100 searches/day)

  • Search: Local SQLite (free)
  • No per-query cost

Total Monthly: ~$0.50
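The arithmetic above checks out directly (using the $0.001 per 1K tokens Batch API price quoted in this section, not an independently verified figure):

```python
# Reproduce the cost figures from this section.
PRICE_PER_1K_TOKENS = 0.001  # Batch API price as quoted above

# Initial indexing: 121 chunks x ~400 tokens avg
initial_tokens = 121 * 400                       # 48,400 ~= 50K tokens
initial_cost = initial_tokens / 1000 * PRICE_PER_1K_TOKENS
print(f"initial: ${initial_cost:.2f}")           # initial: $0.05

# Daily updates: ~4,000 tokens/day
daily_cost = 4000 / 1000 * PRICE_PER_1K_TOKENS
print(f"monthly updates: ${daily_cost * 30:.2f}")  # monthly updates: $0.12
```

The gap between the ~$0.12 of routine updates and the ~$0.50 total leaves headroom for occasional full re-indexes.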

Compare to:

  • Long context (100K tokens/session): $5-20/session
  • Pinecone: $70/month (starter tier)
  • Notion API: $10/month (plus rate limits)

Feature Matrix Deep Dive

Persistence

| System | Survives Crash | Survives Restart | Survives Power Loss |
|---|---|---|---|
| Clawdbot Memory | ✅ | ✅ | ✅ (if git pushed) |
| Long Context | ❌ | ❌ | ❌ |
| RAG | ✅ | ✅ | ✅ |
| Vector DB SaaS | ✅ | ✅ | ⚠️ (cloud dependent) |
| Notion | ✅ | ✅ | ✅ (cloud) |

Search Quality

| System | Semantic | Keyword | Hybrid | Speed |
|---|---|---|---|---|
| Clawdbot Memory | ✅ | ✅ | ✅ | <100ms |
| Long Context | ⚠️ (model scan) | ⚠️ (model scan) | ❌ | Slow |
| RAG | ✅ | ⚠️ | ⚠️ | <200ms |
| Vector DB SaaS | ✅ | ❌ | ⚠️ | <300ms (network) |
| Notion | ❌ | ✅ | ❌ | Varies |

Agent Control

| System | Agent Can Write | Agent Can Edit | Agent Can Delete | Auto-Index |
|---|---|---|---|---|
| Clawdbot Memory | ✅ | ✅ | ✅ | ✅ |
| Long Context | ❌ | ❌ | ❌ | N/A |
| RAG | ❌ | ❌ | ❌ | ⚠️ |
| Vector DB SaaS | ⚠️ (via API) | ⚠️ (via API) | ⚠️ (via API) | ⚠️ |
| Notion | ✅ (via API) | ✅ (via API) | ✅ (via API) | ❌ |

Bottom Line

For personal AI assistants like Buba:

🥇 #1: Clawdbot Memory System

  • Best balance of cost, control, persistence, and search
  • Agent-friendly (write/edit/delete)
  • Git-backed safety
  • Local storage (data sovereignty)

🥈 #2: Clawdbot Memory + Long Context (Hybrid)

  • Memory for durable facts
  • Context for current session
  • This is Jake's setup — it works great

🥉 #3: RAG on Docs

  • If you have massive existing docs
  • Agent doesn't need to write

Avoid for personal assistants:

  • Vector DB SaaS (overkill + expensive)
  • Pure long context (not persistent)
  • Notion/Obsidian (not optimized for AI)

END OF COMPARISON

ᕕ( ᐛ )ᕗ