# Clawdbot Memory System Migration Plan
**Created:** 2026-01-27
**Status:** Ready to Execute
**Risk Level:** Low (old system preserved, incremental migration)
---
## Current State Inventory
| Asset | Location | Size | Records |
|-------|----------|------|---------|
| Main SQLite | `~/.clawdbot/memory/main.sqlite` | 9.0 MB | 56 chunks |
| iMessage SQLite | `~/.clawdbot/memory/imessage.sqlite` | 8.1 MB | ~42 chunks |
| Markdown files | `~/.clawdbot/workspace/memory/*.md` | 17 files | ~60KB total |
| INDEX.json | `~/.clawdbot/workspace/memory/INDEX.json` | 7.1 KB | 6 categories, 20 nodes |
| Session transcripts | `~/.clawdbot/agents/*/sessions/*.jsonl` | 23 files | 5,593 lines |
| New memories table | `~/.clawdbot/memory/main.sqlite` | - | 36 records (migrated) |
---
## Migration Phases
### Phase 0: Backup Everything (REQUIRED FIRST)
**Time:** 5 minutes
**Risk:** None
```bash
# Create timestamped backup directory
BACKUP_DIR=~/.clawdbot/backups/pre-migration-$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR"
# Backup SQLite databases
cp ~/.clawdbot/memory/main.sqlite "$BACKUP_DIR/"
cp ~/.clawdbot/memory/imessage.sqlite "$BACKUP_DIR/"
# Backup markdown memory files
cp -r ~/.clawdbot/workspace/memory "$BACKUP_DIR/memory-markdown"
# Backup session transcripts
cp -r ~/.clawdbot/agents "$BACKUP_DIR/agents"
# Backup config
cp ~/.clawdbot/clawdbot.json "$BACKUP_DIR/"
# Verify backup
echo "Backup created at: $BACKUP_DIR"
ls -la "$BACKUP_DIR"
```
**Checkpoint:** Verify backup directory has all files before proceeding.
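The checkpoint can also be scripted. The sketch below is not part of the existing tooling; it assumes the backup layout created by the script above (`verify_backup` and `EXPECTED` are hypothetical names) and runs SQLite's built-in consistency check on the copied databases:

```python
import os
import sqlite3

# Names the Phase 0 backup script is expected to leave in $BACKUP_DIR
EXPECTED = ["main.sqlite", "imessage.sqlite", "clawdbot.json",
            "memory-markdown", "agents"]

def verify_backup(backup_dir):
    """Return a list of problems; an empty list means the backup looks complete."""
    problems = [name for name in EXPECTED
                if not os.path.exists(os.path.join(backup_dir, name))]
    # Run SQLite's built-in consistency check on each copied database
    for db_name in ("main.sqlite", "imessage.sqlite"):
        path = os.path.join(backup_dir, db_name)
        if os.path.exists(path):
            con = sqlite3.connect(path)
            result = con.execute("PRAGMA integrity_check").fetchone()[0]
            con.close()
            if result != "ok":
                problems.append(f"{db_name}: {result}")
    return problems
```

If `verify_backup` returns anything, stop and re-run the backup before touching the live databases.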
---
### Phase 1: Complete Markdown Migration
**Time:** 15 minutes
**Risk:** Low (additive only)
We already migrated CRITICAL-REFERENCE.md, Genre Universe, and Remix Sniper. Now migrate the remaining files.
#### Files to Migrate:
| File | Content Type | Priority |
|------|-------------|----------|
| `2026-01-14.md` | Daily log - GOG setup | Medium |
| `2026-01-15.md` | Daily log - agent-browser, Reonomy | High |
| `2026-01-25.md` | Security incident - Reed breach | High |
| `2026-01-26.md` | Daily log - Reonomy v13 | Medium |
| `2026-01-19-backup-system.md` | Backup system setup | Medium |
| `2026-01-19-cloud-backup.md` | Cloud backup config | Medium |
| `burton-method-research-intel.md` | Competitor research | High |
| `contacts-leaf-gc.md` | Contact info | Medium |
| `contacts-skivals-gc.md` | Contact info | Medium |
| `imessage-rules.md` | Security rules | High |
| `imessage-security-rules.md` | Security rules | High |
| `remi-self-healing.md` | Remix Sniper healing | Medium |
| `voice-ai-comparison-2026.md` | Research | Low |
| `accounts.md` | Accounts | Low |
#### Migration Script Extension:
```python
# Add to migrate-memories.py

def migrate_daily_logs(db):
    """Migrate daily log files."""
    memories = []
    # 2026-01-14 - GOG setup
    memories.append((
        "GOG (Google Workspace CLI) configured with 3 accounts: jake@burtonmethod.com, jake@localbosses.org, jakeshore98@gmail.com",
        "fact", None, "2026-01-14.md"
    ))
    # 2026-01-15 - agent-browser
    memories.append((
        "agent-browser is Vercel Labs headless browser CLI with ref-based navigation, semantic locators, state persistence. Commands: open, snapshot -i, click @ref, type @ref 'text'",
        "fact", None, "2026-01-15.md"
    ))
    memories.append((
        "Reonomy scraper attempted with agent-browser. URL pattern discovered: ownership tab in search filters allows searching by Owner Contact Information.",
        "fact", None, "2026-01-15.md"
    ))
    # 2026-01-25 - Security incident
    memories.append((
        "SECURITY INCIDENT 2026-01-25: Reed breach. Contact memory poisoning. Password leaked. Rules updated. Rotate all passwords after breach.",
        "security", None, "2026-01-25.md"
    ))
    # ... continue for all files
    for content, mtype, guild_id, source in memories:
        insert_memory(db, content, mtype, source, guild_id)
    return len(memories)

def migrate_security_rules(db):
    """Migrate iMessage security rules."""
    memories = [
        ("iMessage password gating: Password JAJAJA2026 required. Mention gating (Buba). Never reveal password in any context.", "security", None),
        ("iMessage trust chain: Only trust Jake (914-500-9208). Everyone else must verify with Jake first, then chat-only mode with password.", "security", None),
    ]
    for content, mtype, guild_id in memories:
        insert_memory(db, content, mtype, "imessage-security-rules.md", guild_id)
    return len(memories)

def migrate_contacts(db):
    """Migrate contact information (non-sensitive parts only)."""
    memories = [
        ("Contact: Leaf GC - group chat contact for Leaf-related communications", "relationship", None),
        ("Contact: Skivals GC - group chat contact for Skivals-related communications", "relationship", None),
    ]
    for content, mtype, guild_id in memories:
        insert_memory(db, content, mtype, "contacts.md", guild_id)
    return len(memories)
```
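The functions above call an `insert_memory` helper that lives in `migrate-memories.py` but isn't reproduced in this plan. A minimal sketch, assuming the `memories` column layout used elsewhere in this document (plus a `guild_id` column) and a source tag of `'markdown_migration'` — both assumptions, so adjust to the real schema:

```python
import time

def insert_memory(db, content, memory_type, source_file, guild_id):
    """Hypothetical sketch of the insert_memory helper the migration
    functions call. Assumes a memories table with content, memory_type,
    source, source_file, guild_id, created_at and confidence columns."""
    cur = db.execute(
        """
        INSERT INTO memories
            (content, memory_type, source, source_file, guild_id,
             created_at, confidence)
        VALUES (?, ?, 'markdown_migration', ?, ?, ?, 1.0)
        """,
        (content, memory_type, source_file, guild_id, int(time.time())),
    )
    db.commit()
    return cur.lastrowid
```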
**Checkpoint:** Run `python memory-retrieval.py stats` and verify count increased.
---
### Phase 2: Migrate Existing Chunks (Vector Embeddings)
**Time:** 10 minutes
**Risk:** Low (copies data, doesn't delete)
The existing `chunks` table holds 56 pre-embedded chunks. Copy them into the `memories` table so the embeddings are preserved and don't need to be recomputed.
```sql
-- Copy chunks to memories (preserving embeddings)
INSERT INTO memories (
    content,
    embedding,
    memory_type,
    source,
    source_file,
    created_at,
    confidence
)
SELECT
    text AS content,
    embedding,
    'fact' AS memory_type,
    'chunks_migration' AS source,
    path AS source_file,
    COALESCE(updated_at, unixepoch()) AS created_at,
    1.0 AS confidence
FROM chunks
WHERE NOT EXISTS (
    SELECT 1 FROM memories m
    WHERE m.source_file = chunks.path
      AND m.source = 'chunks_migration'
);
```
**Checkpoint:** Verify with `SELECT COUNT(*) FROM memories WHERE source = 'chunks_migration'`
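To go one step further than the single count, a small script can confirm how many chunks made it across. This is a sketch (not part of the existing tooling) that reuses the table names and `'chunks_migration'` source tag from the SQL above; after a full Phase 2 run the two counts should match:

```python
import sqlite3

def check_chunks_migrated(db_path):
    """Return (source_chunks, migrated_rows); equal counts mean Phase 2 is done."""
    con = sqlite3.connect(db_path)
    chunks = con.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
    migrated = con.execute(
        "SELECT COUNT(*) FROM memories WHERE source = 'chunks_migration'"
    ).fetchone()[0]
    con.close()
    return chunks, migrated
```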
---
### Phase 3: Session Transcript Indexing (Optional - Later)
**Time:** 30-60 minutes
**Risk:** Medium (large data volume)
Session transcripts contain conversation history. This is valuable but voluminous.
#### Strategy: Selective Indexing
Don't index every message. Index:
1. Messages where Clawdbot learned something (contains "I'll remember", "noted", "got it")
2. User corrections ("actually it's...", "no, the correct...")
3. Explicit requests ("remember that...", "don't forget...")
```python
import glob
import json
import os
import re

MEMORABLE_PATTERNS = [
    r"I'll remember",
    r"I've noted",
    r"remember that",
    r"don't forget",
    r"actually it's",
    r"the correct",
    r"important:",
    r"key point:",
]

def extract_memorable_from_sessions():
    """Extract memorable moments from session transcripts."""
    session_files = glob.glob(os.path.expanduser(
        "~/.clawdbot/agents/*/sessions/*.jsonl"
    ))
    pattern = re.compile("|".join(MEMORABLE_PATTERNS), re.IGNORECASE)
    memories = []
    for fpath in session_files:
        with open(fpath) as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed transcript lines
                # Assumes each entry carries its text in a "content" field;
                # adjust to the actual transcript schema.
                text = entry.get("content", "")
                if isinstance(text, str) and pattern.search(text):
                    memories.append((text, fpath))
    return memories
```
**Recommendation:** Skip this for now. The markdown files contain the curated important stuff. Sessions are backup/audit trail.
---
### Phase 4: Wire Into Clawdbot Runtime
**Time:** 30-60 minutes
**Risk:** Medium (changes bot behavior)
This requires modifying Clawdbot's code to use the new memory system.
#### 4.1 Create Memory Interface Module
Location: `~/.clawdbot/workspace/memory_interface.py`
```python
"""
Memory interface for Clawdbot runtime.
Import this in your bot's message handler.
"""
from memory_retrieval import (
search_memories,
add_memory,
get_recent_memories,
supersede_memory
)
def get_context_for_message(message, guild_id, channel_id, user_id):
"""
Get relevant memory context for responding to a message.
Call this before generating a response.
"""
# Search for relevant memories
results = search_memories(
query=message,
guild_id=guild_id,
limit=5
)
if not results:
# Fall back to recent memories for this guild
results = get_recent_memories(guild_id=guild_id, limit=3)
# Format for context injection
context_lines = []
for r in results:
context_lines.append(f"[Memory] {r['content']}")
return "\n".join(context_lines)
def should_remember(response_text):
"""
Check if the bot's response indicates something should be remembered.
"""
triggers = [
"i'll remember",
"i've noted",
"got it",
"noted",
"understood",
]
lower = response_text.lower()
return any(t in lower for t in triggers)
def extract_and_store(message, response, guild_id, channel_id, user_id):
"""
If the response indicates learning, extract and store the memory.
"""
if not should_remember(response):
return None
# The message itself is what should be remembered
memory_id = add_memory(
content=message,
memory_type="fact",
guild_id=guild_id,
channel_id=channel_id,
user_id=user_id,
source="conversation"
)
return memory_id
```
#### 4.2 Integration Points
In Clawdbot's message handler:
```python
# Before generating the response:
memory_context = get_context_for_message(
    message=user_message,
    guild_id=str(message.guild.id) if message.guild else None,
    channel_id=str(message.channel.id),
    user_id=str(message.author.id),
)

# Inject into the prompt:
system_prompt = f"""
{base_system_prompt}

Relevant memories:
{memory_context}
"""

# After generating the response:
extract_and_store(
    message=user_message,
    response=bot_response,
    guild_id=...,
    channel_id=...,
    user_id=...,
)
```
---
### Phase 5: Deprecate Old System
**Time:** 5 minutes
**Risk:** Low (keep files, just stop using)
Once the new system is validated:
1. **Keep old files** - Don't delete markdown files, they're human-readable backup
2. **Stop writing to old locations** - New memories go to SQLite only
3. **Archive old chunks table** - Rename to `chunks_archive`
```sql
-- Archive old chunks table (don't delete)
ALTER TABLE chunks RENAME TO chunks_archive;
ALTER TABLE chunks_fts RENAME TO chunks_fts_archive;
```
**DO NOT** delete the old files until you've run the new system for at least 2 weeks without issues.
---
## Validation Checkpoints
### After Each Phase:
| Check | Command | Expected |
|-------|---------|----------|
| Memory count | `python memory-retrieval.py stats` | Count increases |
| Search works | `python memory-retrieval.py search "Das"` | Returns results |
| Guild scoping | `python memory-retrieval.py search "remix" --guild 1449158500344270961` | Only The Hive results |
| FTS works | `sqlite3 ~/.clawdbot/memory/main.sqlite "SELECT COUNT(*) FROM memories_fts"` | Matches memories count |
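The last check in the table can be scripted the same way. This sketch assumes the `memories` and `memories_fts` table names used throughout this plan (`fts_in_sync` is a hypothetical helper, not existing tooling):

```python
import sqlite3

def fts_in_sync(db_path):
    """True when memories and its FTS mirror have the same row count."""
    con = sqlite3.connect(db_path)
    n_mem = con.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
    n_fts = con.execute("SELECT COUNT(*) FROM memories_fts").fetchone()[0]
    con.close()
    return n_mem == n_fts
```

A mismatch usually means inserts bypassed whatever trigger or code path keeps the FTS table updated; rebuild the index before relying on search results.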
### Integration Test (After Phase 4):
1. Send message to Clawdbot: "What do you know about Das?"
2. Verify response includes Genre Universe info
3. Send message: "Remember that Das prefers releasing on Fridays"
4. Search: `python memory-retrieval.py search "Das Friday"`
5. Verify new memory exists
---
## Rollback Plan
If anything goes wrong:
```bash
# 1. Restore from backup
BACKUP_DIR=~/.clawdbot/backups/pre-migration-YYYYMMDD-HHMMSS
# Restore databases
cp "$BACKUP_DIR/main.sqlite" ~/.clawdbot/memory/
cp "$BACKUP_DIR/imessage.sqlite" ~/.clawdbot/memory/
# Restore markdown (if needed)
cp -r "$BACKUP_DIR/memory-markdown/"* ~/.clawdbot/workspace/memory/
# 2. Drop new tables (if needed)
sqlite3 ~/.clawdbot/memory/main.sqlite "
DROP TABLE IF EXISTS memories;
DROP TABLE IF EXISTS memories_fts;
"
# 3. Restart Clawdbot
# (your restart command here)
```
---
## Timeline
| Phase | Duration | Dependency |
|-------|----------|------------|
| Phase 0: Backup | 5 min | None |
| Phase 1: Markdown migration | 15 min | Phase 0 |
| Phase 2: Chunks migration | 10 min | Phase 1 |
| Phase 3: Sessions (optional) | 30-60 min | Phase 2 |
| Phase 4: Runtime integration | 30-60 min | Phase 2 |
| Phase 5: Deprecate old | 5 min | Phase 4 validated |
**Total: 1-2 hours** (excluding Phase 3)
---
## Post-Migration Maintenance
### Weekly (Cron):
```bash
# Add to crontab
0 3 * * 0 cd ~/.clawdbot/workspace && python3 memory-maintenance.py run >> ~/.clawdbot/logs/memory-maintenance.log 2>&1
```
### Monthly:
- Review `python memory-maintenance.py stats`
- Check for memories stuck at low confidence
- Verify per-guild counts are balanced
### Quarterly:
- Full backup
- Review if session indexing is needed
- Consider re-embedding if switching embedding models
---
## Files Reference
| File | Purpose |
|------|---------|
| `migrate-memories.py` | One-time migration script |
| `memory-retrieval.py` | Search/add/supersede API |
| `memory-maintenance.py` | Decay/prune/limits |
| `memory_interface.py` | Runtime integration (create in Phase 4) |
| `MEMORY-MIGRATION-PLAN.md` | This document |
---
## Success Criteria
The migration is complete when:
1. ✅ All markdown files have been processed (key facts extracted)
2. ✅ Old chunks are copied to memories table with embeddings
3. ✅ Search returns relevant results for test queries
4. ✅ Guild scoping works correctly
5. ✅ Clawdbot uses new memory in responses
6. ✅ "Remember this" creates new memories
7. ✅ Weekly maintenance cron is running
8. ✅ Old system files are preserved but not actively used
---
## Questions Before Starting
1. **Do you want to migrate session transcripts?** (Recommended: No, for now)
2. **Which guild should we test first?** (Recommended: Das server - most memories)
3. **When do you want to do the runtime integration?** (Requires Clawdbot restart)