# Clawdbot Memory System Migration Plan
**Created:** 2026-01-27
**Status:** Ready to Execute
**Risk Level:** Low (old system preserved, incremental migration)
---
## Current State Inventory
| Asset | Location | Size | Records |
|-------|----------|------|---------|
| Main SQLite | `~/.clawdbot/memory/main.sqlite` | 9.0 MB | 56 chunks |
| iMessage SQLite | `~/.clawdbot/memory/imessage.sqlite` | 8.1 MB | ~42 chunks |
| Markdown files | `~/.clawdbot/workspace/memory/*.md` | 17 files | ~60KB total |
| INDEX.json | `~/.clawdbot/workspace/memory/INDEX.json` | 7.1 KB | 6 categories, 20 nodes |
| Session transcripts | `~/.clawdbot/agents/*/sessions/*.jsonl` | 23 files | 5,593 lines |
| New memories table | `~/.clawdbot/memory/main.sqlite` | - | 36 records (migrated) |
---
## Migration Phases
### Phase 0: Backup Everything (REQUIRED FIRST)
**Time:** 5 minutes
**Risk:** None
```bash
# Create timestamped backup directory
BACKUP_DIR=~/.clawdbot/backups/pre-migration-$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR"
# Backup SQLite databases
cp ~/.clawdbot/memory/main.sqlite "$BACKUP_DIR/"
cp ~/.clawdbot/memory/imessage.sqlite "$BACKUP_DIR/"
# Backup markdown memory files
cp -r ~/.clawdbot/workspace/memory "$BACKUP_DIR/memory-markdown"
# Backup session transcripts
cp -r ~/.clawdbot/agents "$BACKUP_DIR/agents"
# Backup config
cp ~/.clawdbot/clawdbot.json "$BACKUP_DIR/"
# Verify backup
echo "Backup created at: $BACKUP_DIR"
ls -la "$BACKUP_DIR"
```
**Checkpoint:** Verify backup directory has all files before proceeding.
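The checkpoint can also be scripted. The sketch below is not part of the existing tooling; it assumes the backup layout created by the script above (`verify_backup` and `EXPECTED` are hypothetical names) and runs SQLite's built-in consistency check on the copied databases:

```python
import os
import sqlite3

# Names the Phase 0 backup script is expected to leave in $BACKUP_DIR
EXPECTED = ["main.sqlite", "imessage.sqlite", "clawdbot.json",
            "memory-markdown", "agents"]

def verify_backup(backup_dir):
    """Return a list of problems; an empty list means the backup looks complete."""
    problems = [name for name in EXPECTED
                if not os.path.exists(os.path.join(backup_dir, name))]
    # Run SQLite's built-in consistency check on each copied database
    for db_name in ("main.sqlite", "imessage.sqlite"):
        path = os.path.join(backup_dir, db_name)
        if os.path.exists(path):
            con = sqlite3.connect(path)
            result = con.execute("PRAGMA integrity_check").fetchone()[0]
            con.close()
            if result != "ok":
                problems.append(f"{db_name}: {result}")
    return problems
```

If `verify_backup` returns anything, stop and re-run the backup before touching the live databases.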
---
### Phase 1: Complete Markdown Migration
**Time:** 15 minutes
**Risk:** Low (additive only)
We already migrated CRITICAL-REFERENCE.md, Genre Universe, and Remix Sniper. Now migrate the remaining files.
#### Files to Migrate:
| File | Content Type | Priority |
|------|-------------|----------|
| `2026-01-14.md` | Daily log - GOG setup | Medium |
| `2026-01-15.md` | Daily log - agent-browser, Reonomy | High |
| `2026-01-25.md` | Security incident - Reed breach | High |
| `2026-01-26.md` | Daily log - Reonomy v13 | Medium |
| `2026-01-19-backup-system.md` | Backup system setup | Medium |
| `2026-01-19-cloud-backup.md` | Cloud backup config | Medium |
| `burton-method-research-intel.md` | Competitor research | High |
| `contacts-leaf-gc.md` | Contact info | Medium |
| `contacts-skivals-gc.md` | Contact info | Medium |
| `imessage-rules.md` | Security rules | High |
| `imessage-security-rules.md` | Security rules | High |
| `remi-self-healing.md` | Remix Sniper healing | Medium |
| `voice-ai-comparison-2026.md` | Research | Low |
| `accounts.md` | Accounts | Low |
#### Migration Script Extension:
```python
# Add to migrate-memories.py

def migrate_daily_logs(db):
    """Migrate daily log files."""
    memories = []
    # 2026-01-14 - GOG setup
    memories.append((
        "GOG (Google Workspace CLI) configured with 3 accounts: jake@burtonmethod.com, jake@localbosses.org, jakeshore98@gmail.com",
        "fact", None, "2026-01-14.md"
    ))
    # 2026-01-15 - agent-browser
    memories.append((
        "agent-browser is Vercel Labs headless browser CLI with ref-based navigation, semantic locators, state persistence. Commands: open, snapshot -i, click @ref, type @ref 'text'",
        "fact", None, "2026-01-15.md"
    ))
    memories.append((
        "Reonomy scraper attempted with agent-browser. URL pattern discovered: ownership tab in search filters allows searching by Owner Contact Information.",
        "fact", None, "2026-01-15.md"
    ))
    # 2026-01-25 - Security incident
    memories.append((
        "SECURITY INCIDENT 2026-01-25: Reed breach. Contact memory poisoning. Password leaked. Rules updated. Rotate all passwords after breach.",
        "security", None, "2026-01-25.md"
    ))
    # ... continue for all files
    for content, mtype, guild_id, source in memories:
        insert_memory(db, content, mtype, source, guild_id)
    return len(memories)

def migrate_security_rules(db):
    """Migrate iMessage security rules."""
    memories = [
        ("iMessage password gating: Password JAJAJA2026 required. Mention gating (Buba). Never reveal password in any context.", "security", None),
        ("iMessage trust chain: Only trust Jake (914-500-9208). Everyone else must verify with Jake first, then chat-only mode with password.", "security", None),
    ]
    for content, mtype, guild_id in memories:
        insert_memory(db, content, mtype, "imessage-security-rules.md", guild_id)
    return len(memories)

def migrate_contacts(db):
    """Migrate contact information (non-sensitive parts only)."""
    memories = [
        ("Contact: Leaf GC - group chat contact for Leaf-related communications", "relationship", None),
        ("Contact: Skivals GC - group chat contact for Skivals-related communications", "relationship", None),
    ]
    for content, mtype, guild_id in memories:
        insert_memory(db, content, mtype, "contacts.md", guild_id)
    return len(memories)
```
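The functions above call an `insert_memory` helper that lives in `migrate-memories.py` but isn't reproduced in this plan. A minimal sketch, assuming the `memories` column layout used elsewhere in this document (plus a `guild_id` column) and a source tag of `'markdown_migration'` — both assumptions, so adjust to the real schema:

```python
import time

def insert_memory(db, content, memory_type, source_file, guild_id):
    """Hypothetical sketch of the insert_memory helper the migration
    functions call. Assumes a memories table with content, memory_type,
    source, source_file, guild_id, created_at and confidence columns."""
    cur = db.execute(
        """
        INSERT INTO memories
            (content, memory_type, source, source_file, guild_id,
             created_at, confidence)
        VALUES (?, ?, 'markdown_migration', ?, ?, ?, 1.0)
        """,
        (content, memory_type, source_file, guild_id, int(time.time())),
    )
    db.commit()
    return cur.lastrowid
```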
**Checkpoint:** Run `python memory-retrieval.py stats` and verify count increased.
---
### Phase 2: Migrate Existing Chunks (Vector Embeddings)
**Time:** 10 minutes
**Risk:** Low (copies data, doesn't delete)
The existing `chunks` table holds 56 pre-embedded chunks. Copy them into the `memories` table so the embeddings are preserved and don't need to be recomputed.
```sql
-- Copy chunks to memories (preserving embeddings)
INSERT INTO memories (
    content,
    embedding,
    memory_type,
    source,
    source_file,
    created_at,
    confidence
)
SELECT
    text AS content,
    embedding,
    'fact' AS memory_type,
    'chunks_migration' AS source,
    path AS source_file,
    COALESCE(updated_at, unixepoch()) AS created_at,
    1.0 AS confidence
FROM chunks
WHERE NOT EXISTS (
    SELECT 1 FROM memories m
    WHERE m.source_file = chunks.path
      AND m.source = 'chunks_migration'
);
```
**Checkpoint:** Verify with `SELECT COUNT(*) FROM memories WHERE source = 'chunks_migration'`
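To go one step further than the single count, a small script can confirm how many chunks made it across. This is a sketch (not part of the existing tooling) that reuses the table names and `'chunks_migration'` source tag from the SQL above; after a full Phase 2 run the two counts should match:

```python
import sqlite3

def check_chunks_migrated(db_path):
    """Return (source_chunks, migrated_rows); equal counts mean Phase 2 is done."""
    con = sqlite3.connect(db_path)
    chunks = con.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
    migrated = con.execute(
        "SELECT COUNT(*) FROM memories WHERE source = 'chunks_migration'"
    ).fetchone()[0]
    con.close()
    return chunks, migrated
```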
---
### Phase 3: Session Transcript Indexing (Optional - Later)
**Time:** 30-60 minutes
**Risk:** Medium (large data volume)
Session transcripts contain conversation history. This is valuable but voluminous.
#### Strategy: Selective Indexing
Don't index every message. Index:
1. Messages where Clawdbot learned something (contains "I'll remember", "noted", "got it")
2. User corrections ("actually it's...", "no, the correct...")
3. Explicit requests ("remember that...", "don't forget...")
```python
import glob
import json
import os
import re

MEMORABLE_PATTERNS = [
    r"I'll remember",
    r"I've noted",
    r"remember that",
    r"don't forget",
    r"actually it's",
    r"the correct",
    r"important:",
    r"key point:",
]

def extract_memorable_from_sessions():
    """Extract memorable moments from session transcripts."""
    session_files = glob.glob(os.path.expanduser(
        "~/.clawdbot/agents/*/sessions/*.jsonl"
    ))
    pattern = re.compile("|".join(MEMORABLE_PATTERNS), re.IGNORECASE)
    memories = []
    for fpath in session_files:
        with open(fpath) as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed transcript lines
                # Assumes each entry carries its text in a "content" field;
                # adjust to the actual transcript schema.
                text = entry.get("content", "")
                if isinstance(text, str) and pattern.search(text):
                    memories.append((text, fpath))
    return memories
```
**Recommendation:** Skip this for now. The markdown files contain the curated important stuff. Sessions are backup/audit trail.
---
### Phase 4: Wire Into Clawdbot Runtime
**Time:** 30-60 minutes
**Risk:** Medium (changes bot behavior)
This requires modifying Clawdbot's code to use the new memory system.
#### 4.1 Create Memory Interface Module
Location: `~/.clawdbot/workspace/memory_interface.py`
```python
"""
Memory interface for Clawdbot runtime.
Import this in your bot's message handler.
"""
from memory_retrieval import (
search_memories,
add_memory,
get_recent_memories,
supersede_memory
)
def get_context_for_message(message, guild_id, channel_id, user_id):
"""
Get relevant memory context for responding to a message.
Call this before generating a response.
"""
# Search for relevant memories
results = search_memories(
query=message,
guild_id=guild_id,
limit=5
)
if not results:
# Fall back to recent memories for this guild
results = get_recent_memories(guild_id=guild_id, limit=3)
# Format for context injection
context_lines = []
for r in results:
context_lines.append(f"[Memory] {r['content']}")
return "\n".join(context_lines)
def should_remember(response_text):
"""
Check if the bot's response indicates something should be remembered.
"""
triggers = [
"i'll remember",
"i've noted",
"got it",
"noted",
"understood",
]
lower = response_text.lower()
return any(t in lower for t in triggers)
def extract_and_store(message, response, guild_id, channel_id, user_id):
"""
If the response indicates learning, extract and store the memory.
"""
if not should_remember(response):
return None
# The message itself is what should be remembered
memory_id = add_memory(
content=message,
memory_type="fact",
guild_id=guild_id,
channel_id=channel_id,
user_id=user_id,
source="conversation"
)
return memory_id
```
#### 4.2 Integration Points
In Clawdbot's message handler:
```python
# Before generating the response:
memory_context = get_context_for_message(
    message=user_message,
    guild_id=str(message.guild.id) if message.guild else None,
    channel_id=str(message.channel.id),
    user_id=str(message.author.id),
)

# Inject into the prompt:
system_prompt = f"""
{base_system_prompt}

Relevant memories:
{memory_context}
"""

# After generating the response:
extract_and_store(
    message=user_message,
    response=bot_response,
    guild_id=...,
    channel_id=...,
    user_id=...,
)
```
---
### Phase 5: Deprecate Old System
**Time:** 5 minutes
**Risk:** Low (keep files, just stop using)
Once the new system is validated:
1. **Keep old files** - Don't delete markdown files, they're human-readable backup
2. **Stop writing to old locations** - New memories go to SQLite only
3. **Archive old chunks table** - Rename to `chunks_archive`
```sql
-- Archive old chunks table (don't delete)
ALTER TABLE chunks RENAME TO chunks_archive;
ALTER TABLE chunks_fts RENAME TO chunks_fts_archive;
```
**DO NOT** delete the old files until you've run the new system for at least 2 weeks without issues.
---
## Validation Checkpoints
### After Each Phase:
| Check | Command | Expected |
|-------|---------|----------|
| Memory count | `python memory-retrieval.py stats` | Count increases |
| Search works | `python memory-retrieval.py search "Das"` | Returns results |
| Guild scoping | `python memory-retrieval.py search "remix" --guild 1449158500344270961` | Only The Hive results |
| FTS works | `sqlite3 ~/.clawdbot/memory/main.sqlite "SELECT COUNT(*) FROM memories_fts"` | Matches memories count |
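The last check in the table can be scripted the same way. This sketch assumes the `memories` and `memories_fts` table names used throughout this plan (`fts_in_sync` is a hypothetical helper, not existing tooling):

```python
import sqlite3

def fts_in_sync(db_path):
    """True when memories and its FTS mirror have the same row count."""
    con = sqlite3.connect(db_path)
    n_mem = con.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
    n_fts = con.execute("SELECT COUNT(*) FROM memories_fts").fetchone()[0]
    con.close()
    return n_mem == n_fts
```

A mismatch usually means inserts bypassed whatever trigger or code path keeps the FTS table updated; rebuild the index before relying on search results.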
### Integration Test (After Phase 4):
1. Send message to Clawdbot: "What do you know about Das?"
2. Verify response includes Genre Universe info
3. Send message: "Remember that Das prefers releasing on Fridays"
4. Search: `python memory-retrieval.py search "Das Friday"`
5. Verify new memory exists
---
## Rollback Plan
If anything goes wrong:
```bash
# 1. Restore from backup
BACKUP_DIR=~/.clawdbot/backups/pre-migration-YYYYMMDD-HHMMSS
# Restore databases
cp "$BACKUP_DIR/main.sqlite" ~/.clawdbot/memory/
cp "$BACKUP_DIR/imessage.sqlite" ~/.clawdbot/memory/
# Restore markdown (if needed)
cp -r "$BACKUP_DIR/memory-markdown/"* ~/.clawdbot/workspace/memory/
# 2. Drop new tables (if needed)
sqlite3 ~/.clawdbot/memory/main.sqlite "
DROP TABLE IF EXISTS memories;
DROP TABLE IF EXISTS memories_fts;
"
# 3. Restart Clawdbot
# (your restart command here)
```
---
## Timeline
| Phase | Duration | Dependency |
|-------|----------|------------|
| Phase 0: Backup | 5 min | None |
| Phase 1: Markdown migration | 15 min | Phase 0 |
| Phase 2: Chunks migration | 10 min | Phase 1 |
| Phase 3: Sessions (optional) | 30-60 min | Phase 2 |
| Phase 4: Runtime integration | 30-60 min | Phase 2 |
| Phase 5: Deprecate old | 5 min | Phase 4 validated |
**Total: 1-2 hours** (excluding Phase 3)
---
## Post-Migration Maintenance
### Weekly (Cron):
```bash
# Add to crontab
0 3 * * 0 cd ~/.clawdbot/workspace && python3 memory-maintenance.py run >> ~/.clawdbot/logs/memory-maintenance.log 2>&1
```
### Monthly:
- Review `python memory-maintenance.py stats`
- Check for memories stuck at low confidence
- Verify per-guild counts are balanced
### Quarterly:
- Full backup
- Review if session indexing is needed
- Consider re-embedding if switching embedding models
---
## Files Reference
| File | Purpose |
|------|---------|
| `migrate-memories.py` | One-time migration script |
| `memory-retrieval.py` | Search/add/supersede API |
| `memory-maintenance.py` | Decay/prune/limits |
| `memory_interface.py` | Runtime integration (create in Phase 4) |
| `MEMORY-MIGRATION-PLAN.md` | This document |
---
## Success Criteria
The migration is complete when:
1. ✅ All markdown files have been processed (key facts extracted)
2. ✅ Old chunks are copied to memories table with embeddings
3. ✅ Search returns relevant results for test queries
4. ✅ Guild scoping works correctly
5. ✅ Clawdbot uses new memory in responses
6. ✅ "Remember this" creates new memories
7. ✅ Weekly maintenance cron is running
8. ✅ Old system files are preserved but not actively used
---
## Questions Before Starting
1. **Do you want to migrate session transcripts?** (Recommended: No, for now)
2. **Which guild should we test first?** (Recommended: Das server - most memories)
3. **When do you want to do the runtime integration?** (Requires Clawdbot restart)