# Clawdbot Memory System Migration Plan

**Created:** 2026-01-27
**Status:** Ready to Execute
**Risk Level:** Low (old system preserved, incremental migration)

---
## Current State Inventory

| Asset | Location | Size | Records |
|-------|----------|------|---------|
| Main SQLite | `~/.clawdbot/memory/main.sqlite` | 9.0 MB | 56 chunks |
| iMessage SQLite | `~/.clawdbot/memory/imessage.sqlite` | 8.1 MB | ~42 chunks |
| Markdown files | `~/.clawdbot/workspace/memory/*.md` | ~60 KB total | 17 files |
| INDEX.json | `~/.clawdbot/workspace/memory/INDEX.json` | 7.1 KB | 6 categories, 20 nodes |
| Session transcripts | `~/.clawdbot/agents/*/sessions/*.jsonl` | 23 files | 5,593 lines |
| New memories table | `~/.clawdbot/memory/main.sqlite` | - | 36 records (migrated) |

---
## Migration Phases

### Phase 0: Backup Everything (REQUIRED FIRST)

**Time:** 5 minutes
**Risk:** None

```bash
# Create timestamped backup directory
BACKUP_DIR=~/.clawdbot/backups/pre-migration-$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR"

# Backup SQLite databases
cp ~/.clawdbot/memory/main.sqlite "$BACKUP_DIR/"
cp ~/.clawdbot/memory/imessage.sqlite "$BACKUP_DIR/"

# Backup markdown memory files
cp -r ~/.clawdbot/workspace/memory "$BACKUP_DIR/memory-markdown"

# Backup session transcripts
cp -r ~/.clawdbot/agents "$BACKUP_DIR/agents"

# Backup config
cp ~/.clawdbot/clawdbot.json "$BACKUP_DIR/"

# Verify backup
echo "Backup created at: $BACKUP_DIR"
ls -la "$BACKUP_DIR"
```

**Checkpoint:** Verify backup directory has all files before proceeding.

---
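One caveat with `cp` on a SQLite file: if Clawdbot is writing at the moment of the copy, the snapshot can be caught mid-transaction. If stopping the bot first isn't convenient, Python's standard-library `sqlite3` online backup API takes a consistent snapshot even while the database is in use. A minimal sketch (the paths are the ones from the inventory table above):

```python
import sqlite3
from pathlib import Path

def backup_db(src: str, dest_dir: str) -> Path:
    """Copy a SQLite database safely using the online backup API."""
    src_path = Path(src).expanduser()
    dest_dir_path = Path(dest_dir).expanduser()
    dest_dir_path.mkdir(parents=True, exist_ok=True)
    dest = dest_dir_path / src_path.name
    source = sqlite3.connect(src_path)
    target = sqlite3.connect(dest)
    try:
        # Consistent snapshot even if another process is writing
        source.backup(target)
    finally:
        source.close()
        target.close()
    return dest

# Usage, mirroring the cp commands above:
# backup_db("~/.clawdbot/memory/main.sqlite", backup_dir)
# backup_db("~/.clawdbot/memory/imessage.sqlite", backup_dir)
```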
### Phase 1: Complete Markdown Migration

**Time:** 15 minutes
**Risk:** Low (additive only)

We already migrated CRITICAL-REFERENCE.md, Genre Universe, and Remix Sniper. Now migrate the remaining files.

#### Files to Migrate:

| File | Content Type | Priority |
|------|-------------|----------|
| `2026-01-14.md` | Daily log - GOG setup | Medium |
| `2026-01-15.md` | Daily log - agent-browser, Reonomy | High |
| `2026-01-25.md` | Security incident - Reed breach | High |
| `2026-01-26.md` | Daily log - Reonomy v13 | Medium |
| `2026-01-19-backup-system.md` | Backup system setup | Medium |
| `2026-01-19-cloud-backup.md` | Cloud backup config | Medium |
| `burton-method-research-intel.md` | Competitor research | High |
| `contacts-leaf-gc.md` | Contact info | Medium |
| `contacts-skivals-gc.md` | Contact info | Medium |
| `imessage-rules.md` | Security rules | High |
| `imessage-security-rules.md` | Security rules | High |
| `remi-self-healing.md` | Remix Sniper healing | Medium |
| `voice-ai-comparison-2026.md` | Research | Low |
| `accounts.md` | Accounts | Low |

#### Migration Script Extension:
```python
# Add to migrate-memories.py

def migrate_daily_logs(db):
    """Migrate daily log files."""
    memories = []

    # 2026-01-14 - GOG setup
    memories.append((
        "GOG (Google Workspace CLI) configured with 3 accounts: jake@burtonmethod.com, jake@localbosses.org, jakeshore98@gmail.com",
        "fact", None, "2026-01-14.md"
    ))

    # 2026-01-15 - agent-browser
    memories.append((
        "agent-browser is Vercel Labs headless browser CLI with ref-based navigation, semantic locators, state persistence. Commands: open, snapshot -i, click @ref, type @ref 'text'",
        "fact", None, "2026-01-15.md"
    ))
    memories.append((
        "Reonomy scraper attempted with agent-browser. URL pattern discovered: ownership tab in search filters allows searching by Owner Contact Information.",
        "fact", None, "2026-01-15.md"
    ))

    # 2026-01-25 - Security incident
    memories.append((
        "SECURITY INCIDENT 2026-01-25: Reed breach. Contact memory poisoning. Password leaked. Rules updated. Rotate all passwords after breach.",
        "security", None, "2026-01-25.md"
    ))

    # ... continue for all files

    for content, mtype, guild_id, source in memories:
        insert_memory(db, content, mtype, source, guild_id)

    return len(memories)

def migrate_security_rules(db):
    """Migrate iMessage security rules."""
    memories = [
        ("iMessage password gating: Password JAJAJA2026 required. Mention gating (Buba). Never reveal password in any context.", "security", None),
        ("iMessage trust chain: Only trust Jake (914-500-9208). Everyone else must verify with Jake first, then chat-only mode with password.", "security", None),
    ]
    for content, mtype, guild_id in memories:
        insert_memory(db, content, mtype, "imessage-security-rules.md", guild_id)
    return len(memories)

def migrate_contacts(db):
    """Migrate contact information (non-sensitive parts only)."""
    memories = [
        ("Contact: Leaf GC - group chat contact for Leaf-related communications", "relationship", None),
        ("Contact: Skivals GC - group chat contact for Skivals-related communications", "relationship", None),
    ]
    for content, mtype, guild_id in memories:
        insert_memory(db, content, mtype, "contacts.md", guild_id)
    return len(memories)
```

**Checkpoint:** Run `python memory-retrieval.py stats` and verify count increased.
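The script above relies on an `insert_memory` helper that isn't shown here. A minimal version consistent with how it is called, `insert_memory(db, content, mtype, source, guild_id)`, might look like the sketch below; the column list is an assumption and should be adjusted to the real `memories` schema before running:

```python
import sqlite3
import time
from typing import Optional

def insert_memory(db: sqlite3.Connection, content: str, memory_type: str,
                  source: str, guild_id: Optional[str]) -> Optional[int]:
    """Insert one memory row, skipping exact-content duplicates."""
    # Idempotency guard: re-running the migration must not create duplicates
    row = db.execute(
        "SELECT id FROM memories WHERE content = ?", (content,)
    ).fetchone()
    if row is not None:
        return None
    cur = db.execute(
        "INSERT INTO memories (content, memory_type, source, guild_id, "
        "created_at, confidence) VALUES (?, ?, ?, ?, ?, 1.0)",
        (content, memory_type, source, guild_id, int(time.time())),
    )
    db.commit()
    return cur.lastrowid
```

The duplicate check is what makes the migration script safe to re-run after a partial failure.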
---
### Phase 2: Migrate Existing Chunks (Vector Embeddings)

**Time:** 10 minutes
**Risk:** Low (copies data, doesn't delete)

The existing `chunks` table has 56 pre-embedded chunks. Copy these to the memories table to preserve the embeddings. The `NOT EXISTS` guard makes the statement re-run safe at file granularity: any path whose chunks were already migrated is skipped.

```sql
-- Copy chunks to memories (preserving embeddings)
INSERT INTO memories (
    content,
    embedding,
    memory_type,
    source,
    source_file,
    created_at,
    confidence
)
SELECT
    text as content,
    embedding,
    'fact' as memory_type,
    'chunks_migration' as source,
    path as source_file,
    COALESCE(updated_at, unixepoch()) as created_at,
    1.0 as confidence
FROM chunks
WHERE NOT EXISTS (
    SELECT 1 FROM memories m
    WHERE m.source_file = chunks.path
      AND m.source = 'chunks_migration'
);
```

**Checkpoint:** Verify with `SELECT COUNT(*) FROM memories WHERE source = 'chunks_migration'`
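Beyond eyeballing the raw count, it's worth checking that the migrated total actually matches the source table. A small verification sketch (table and source names are the ones used in this plan):

```python
import sqlite3
from pathlib import Path

def verify_chunks_migration(db_path: str) -> bool:
    """Compare chunk rows against rows migrated into memories."""
    db = sqlite3.connect(Path(db_path).expanduser())
    try:
        chunk_count = db.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
        migrated = db.execute(
            "SELECT COUNT(*) FROM memories WHERE source = 'chunks_migration'"
        ).fetchone()[0]
        return migrated == chunk_count
    finally:
        db.close()

# verify_chunks_migration("~/.clawdbot/memory/main.sqlite") should be True after Phase 2
```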
---
### Phase 3: Session Transcript Indexing (Optional - Later)

**Time:** 30-60 minutes
**Risk:** Medium (large data volume)

Session transcripts contain conversation history. This is valuable but voluminous.

#### Strategy: Selective Indexing

Don't index every message. Index:

1. Messages where Clawdbot learned something (contains "I'll remember", "noted", "got it")
2. User corrections ("actually it's...", "no, the correct...")
3. Explicit requests ("remember that...", "don't forget...")

```python
import glob
import json
import os
import re

def extract_memorable_from_sessions():
    """Extract memorable moments from session transcripts."""
    session_files = glob.glob(os.path.expanduser(
        "~/.clawdbot/agents/*/sessions/*.jsonl"
    ))

    memorable_patterns = [
        r"I'll remember",
        r"I've noted",
        r"remember that",
        r"don't forget",
        r"actually it's",
        r"the correct",
        r"important:",
        r"key point:",
    ]
    # One case-insensitive alternation is cheaper than testing each pattern
    pattern = re.compile("|".join(memorable_patterns), re.IGNORECASE)

    memories = []
    for fpath in session_files:
        with open(fpath) as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed lines, don't swallow other errors
                # Field name depends on the transcript schema
                text = entry.get("text") or entry.get("content") or ""
                if isinstance(text, str) and pattern.search(text):
                    memories.append((text, fpath))

    return memories
```
**Recommendation:** Skip this for now. The markdown files contain the curated important stuff. Sessions are backup/audit trail.

---
### Phase 4: Wire Into Clawdbot Runtime

**Time:** 30-60 minutes
**Risk:** Medium (changes bot behavior)

This requires modifying Clawdbot's code to use the new memory system.

#### 4.1 Create Memory Interface Module

Location: `~/.clawdbot/workspace/memory_interface.py`
```python
"""
Memory interface for Clawdbot runtime.
Import this in your bot's message handler.
"""

from memory_retrieval import (
    search_memories,
    add_memory,
    get_recent_memories,
    supersede_memory,
)

def get_context_for_message(message, guild_id, channel_id, user_id):
    """
    Get relevant memory context for responding to a message.
    Call this before generating a response.
    """
    # Search for relevant memories
    results = search_memories(
        query=message,
        guild_id=guild_id,
        limit=5
    )

    if not results:
        # Fall back to recent memories for this guild
        results = get_recent_memories(guild_id=guild_id, limit=3)

    # Format for context injection
    context_lines = []
    for r in results:
        context_lines.append(f"[Memory] {r['content']}")

    return "\n".join(context_lines)

def should_remember(response_text):
    """
    Check if the bot's response indicates something should be remembered.
    """
    triggers = [
        "i'll remember",
        "i've noted",
        "got it",
        "noted",
        "understood",
    ]
    lower = response_text.lower()
    return any(t in lower for t in triggers)

def extract_and_store(message, response, guild_id, channel_id, user_id):
    """
    If the response indicates learning, extract and store the memory.
    """
    if not should_remember(response):
        return None

    # The message itself is what should be remembered
    memory_id = add_memory(
        content=message,
        memory_type="fact",
        guild_id=guild_id,
        channel_id=channel_id,
        user_id=user_id,
        source="conversation"
    )

    return memory_id
```
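One gap in the interface above: `get_context_for_message` injects up to five memories with no size cap, so a few very long memories can bloat the prompt. A small, self-contained helper (hypothetical, not part of `memory_retrieval`) that enforces a character budget:

```python
def format_memory_context(memories, max_chars: int = 1500) -> str:
    """Join memory strings into a context block, stopping at a character budget."""
    lines = []
    used = 0
    for content in memories:
        line = f"[Memory] {content}"
        if used + len(line) > max_chars:
            break  # drop the rest rather than overflow the prompt
        lines.append(line)
        used += len(line) + 1  # +1 for the joining newline
    return "\n".join(lines)
```

Called with the `content` fields of `search_memories` results, this could replace the join loop in `get_context_for_message`.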
#### 4.2 Integration Points

In Clawdbot's message handler:

```python
# Before generating response (`message` is the platform message object,
# `user_message` is its text content):
memory_context = get_context_for_message(
    message=user_message,
    guild_id=str(message.guild.id) if message.guild else None,
    channel_id=str(message.channel.id),
    user_id=str(message.author.id)
)

# Inject into prompt:
system_prompt = f"""
{base_system_prompt}

Relevant memories:
{memory_context}
"""

# After generating response:
extract_and_store(
    message=user_message,
    response=bot_response,
    guild_id=...,
    channel_id=...,
    user_id=...
)
```

---
### Phase 5: Deprecate Old System

**Time:** 5 minutes
**Risk:** Low (keep files, just stop using them)

Once the new system is validated:

1. **Keep old files** - Don't delete the markdown files; they're a human-readable backup
2. **Stop writing to old locations** - New memories go to SQLite only
3. **Archive old chunks table** - Rename to `chunks_archive`

```sql
-- Archive old chunks tables (don't delete)
ALTER TABLE chunks RENAME TO chunks_archive;
ALTER TABLE chunks_fts RENAME TO chunks_fts_archive;
```
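To confirm the renames took effect, a quick look at `sqlite_master` is enough (the table names are the ones from the statements above):

```python
import sqlite3
from pathlib import Path

def archive_tables_present(db_path: str) -> bool:
    """Check that both archive tables exist after the Phase 5 renames."""
    db = sqlite3.connect(Path(db_path).expanduser())
    try:
        rows = db.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name IN ('chunks_archive', 'chunks_fts_archive')"
        ).fetchall()
        return {name for (name,) in rows} == {"chunks_archive", "chunks_fts_archive"}
    finally:
        db.close()

# archive_tables_present("~/.clawdbot/memory/main.sqlite") should be True after Phase 5
```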
**DO NOT** delete the old files until you've run the new system for at least 2 weeks without issues.

---
## Validation Checkpoints

### After Each Phase:

| Check | Command | Expected |
|-------|---------|----------|
| Memory count | `python memory-retrieval.py stats` | Count increases |
| Search works | `python memory-retrieval.py search "Das"` | Returns results |
| Guild scoping | `python memory-retrieval.py search "remix" --guild 1449158500344270961` | Only The Hive results |
| FTS works | `sqlite3 ~/.clawdbot/memory/main.sqlite "SELECT COUNT(*) FROM memories_fts"` | Matches memories count |

### Integration Test (After Phase 4):

1. Send message to Clawdbot: "What do you know about Das?"
2. Verify response includes Genre Universe info
3. Send message: "Remember that Das prefers releasing on Fridays"
4. Search: `python memory-retrieval.py search "Das Friday"`
5. Verify new memory exists

---
## Rollback Plan

If anything goes wrong:

```bash
# 1. Restore from backup
BACKUP_DIR=~/.clawdbot/backups/pre-migration-YYYYMMDD-HHMMSS

# Restore databases
cp "$BACKUP_DIR/main.sqlite" ~/.clawdbot/memory/
cp "$BACKUP_DIR/imessage.sqlite" ~/.clawdbot/memory/

# Restore markdown (if needed)
cp -r "$BACKUP_DIR/memory-markdown/"* ~/.clawdbot/workspace/memory/

# 2. Drop new tables (if needed)
sqlite3 ~/.clawdbot/memory/main.sqlite "
DROP TABLE IF EXISTS memories;
DROP TABLE IF EXISTS memories_fts;
"

# 3. Restart Clawdbot
# (your restart command here)
```

---
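After restoring, it's worth confirming the copied databases are intact before restarting the bot. SQLite's built-in integrity check, via the stdlib:

```python
import sqlite3
from pathlib import Path

def integrity_ok(db_path: str) -> bool:
    """Run SQLite's built-in integrity check on a restored database."""
    db = sqlite3.connect(Path(db_path).expanduser())
    try:
        return db.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    finally:
        db.close()

# integrity_ok("~/.clawdbot/memory/main.sqlite") should return True after a restore
```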
## Timeline

| Phase | Duration | Dependency |
|-------|----------|------------|
| Phase 0: Backup | 5 min | None |
| Phase 1: Markdown migration | 15 min | Phase 0 |
| Phase 2: Chunks migration | 10 min | Phase 1 |
| Phase 3: Sessions (optional) | 30-60 min | Phase 2 |
| Phase 4: Runtime integration | 30-60 min | Phase 2 |
| Phase 5: Deprecate old | 5 min | Phase 4 validated |

**Total: 1-2 hours** (excluding Phase 3)

---
## Post-Migration Maintenance

### Weekly (Cron):

```bash
# Add to crontab (runs Sundays at 03:00)
0 3 * * 0 cd ~/.clawdbot/workspace && python3 memory-maintenance.py run >> ~/.clawdbot/logs/memory-maintenance.log 2>&1
```

### Monthly:
- Review `python memory-maintenance.py stats`
- Check for memories stuck at low confidence
- Verify per-guild counts are balanced

### Quarterly:
- Full backup
- Review whether session indexing is needed
- Consider re-embedding if switching embedding models

---
## Files Reference

| File | Purpose |
|------|---------|
| `migrate-memories.py` | One-time migration script |
| `memory-retrieval.py` | Search/add/supersede API |
| `memory-maintenance.py` | Decay/prune/limits |
| `memory_interface.py` | Runtime integration (create in Phase 4) |
| `MEMORY-MIGRATION-PLAN.md` | This document |

---
## Success Criteria

The migration is complete when:

1. ✅ All markdown files have been processed (key facts extracted)
2. ✅ Old chunks are copied to memories table with embeddings
3. ✅ Search returns relevant results for test queries
4. ✅ Guild scoping works correctly
5. ✅ Clawdbot uses new memory in responses
6. ✅ "Remember this" creates new memories
7. ✅ Weekly maintenance cron is running
8. ✅ Old system files are preserved but not actively used

---
## Questions Before Starting

1. **Do you want to migrate session transcripts?** (Recommended: no, for now)
2. **Which guild should we test first?** (Recommended: Das server - most memories)
3. **When do you want to do the runtime integration?** (Requires a Clawdbot restart)