
Clawdbot Memory System Migration Plan

Created: 2026-01-27
Status: Ready to Execute
Risk Level: Low (old system preserved, incremental migration)


Current State Inventory

| Asset | Location | Size | Records |
|---|---|---|---|
| Main SQLite | ~/.clawdbot/memory/main.sqlite | 9.0 MB | 56 chunks |
| iMessage SQLite | ~/.clawdbot/memory/imessage.sqlite | 8.1 MB | ~42 chunks |
| Markdown files | ~/.clawdbot/workspace/memory/*.md | 17 files | ~60KB total |
| INDEX.json | ~/.clawdbot/workspace/memory/INDEX.json | 7.1 KB | 6 categories, 20 nodes |
| Session transcripts | ~/.clawdbot/agents/*/sessions/*.jsonl | 23 files | 5,593 lines |
| New memories table | ~/.clawdbot/memory/main.sqlite | - | 36 records (migrated) |

Migration Phases

Phase 0: Backup Everything (REQUIRED FIRST)

Time: 5 minutes Risk: None

# Create timestamped backup directory
BACKUP_DIR=~/.clawdbot/backups/pre-migration-$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR"

# Backup SQLite databases
cp ~/.clawdbot/memory/main.sqlite "$BACKUP_DIR/"
cp ~/.clawdbot/memory/imessage.sqlite "$BACKUP_DIR/"

# Backup markdown memory files
cp -r ~/.clawdbot/workspace/memory "$BACKUP_DIR/memory-markdown"

# Backup session transcripts
cp -r ~/.clawdbot/agents "$BACKUP_DIR/agents"

# Backup config
cp ~/.clawdbot/clawdbot.json "$BACKUP_DIR/"

# Verify backup
echo "Backup created at: $BACKUP_DIR"
ls -la "$BACKUP_DIR"

Checkpoint: Verify backup directory has all files before proceeding.
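This checkpoint can be scripted. A minimal sketch, assuming the backup layout produced by the commands above; the item list mirrors those cp targets and should be adjusted if your paths differ:

```shell
# Verify a backup directory contains every expected item before proceeding.
# The item names mirror the cp commands above -- an assumption, not a spec.
check_backup() {
  dir="$1"
  for item in main.sqlite imessage.sqlite memory-markdown agents clawdbot.json; do
    if [ ! -e "$dir/$item" ]; then
      echo "MISSING: $item" >&2
      return 1
    fi
  done
  echo "Backup OK: $dir"
}
```

Run `check_backup "$BACKUP_DIR"` after the copy commands; a non-zero exit status means something is missing.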


Phase 1: Complete Markdown Migration

Time: 15 minutes Risk: Low (additive only)

We already migrated CRITICAL-REFERENCE.md, Genre Universe, and Remix Sniper. Now migrate the remaining files.

Files to Migrate:

| File | Content Type | Priority |
|---|---|---|
| 2026-01-14.md | Daily log - GOG setup | Medium |
| 2026-01-15.md | Daily log - agent-browser, Reonomy | High |
| 2026-01-25.md | Security incident - Reed breach | High |
| 2026-01-26.md | Daily log - Reonomy v13 | Medium |
| 2026-01-19-backup-system.md | Backup system setup | Medium |
| 2026-01-19-cloud-backup.md | Cloud backup config | Medium |
| burton-method-research-intel.md | Competitor research | High |
| contacts-leaf-gc.md | Contact info | Medium |
| contacts-skivals-gc.md | Contact info | Medium |
| imessage-rules.md | Security rules | High |
| imessage-security-rules.md | Security rules | High |
| remi-self-healing.md | Remix Sniper healing | Medium |
| voice-ai-comparison-2026.md | Research | Low |
| accounts.md | Accounts | Low |

Migration Script Extension:

# Add to migrate-memories.py

def migrate_daily_logs(db):
    """Migrate daily log files."""
    memories = []

    # 2026-01-14 - GOG setup
    memories.append((
        "GOG (Google Workspace CLI) configured with 3 accounts: jake@burtonmethod.com, jake@localbosses.org, jakeshore98@gmail.com",
        "fact", None, "2026-01-14.md"
    ))

    # 2026-01-15 - agent-browser
    memories.append((
        "agent-browser is Vercel Labs headless browser CLI with ref-based navigation, semantic locators, state persistence. Commands: open, snapshot -i, click @ref, type @ref 'text'",
        "fact", None, "2026-01-15.md"
    ))
    memories.append((
        "Reonomy scraper attempted with agent-browser. URL pattern discovered: ownership tab in search filters allows searching by Owner Contact Information.",
        "fact", None, "2026-01-15.md"
    ))

    # 2026-01-25 - Security incident
    memories.append((
        "SECURITY INCIDENT 2026-01-25: Reed breach. Contact memory poisoning. Password leaked. Rules updated. Rotate all passwords after breach.",
        "security", None, "2026-01-25.md"
    ))

    # ... continue for all files

    for content, mtype, guild_id, source in memories:
        insert_memory(db, content, mtype, source, guild_id)

    return len(memories)

def migrate_security_rules(db):
    """Migrate iMessage security rules."""
    memories = [
        ("iMessage password gating: Password JAJAJA2026 required. Mention gating (Buba). Never reveal password in any context.", "security", None),
        ("iMessage trust chain: Only trust Jake (914-500-9208). Everyone else must verify with Jake first, then chat-only mode with password.", "security", None),
    ]
    for content, mtype, guild_id in memories:
        insert_memory(db, content, mtype, "imessage-security-rules.md", guild_id)
    return len(memories)

def migrate_contacts(db):
    """Migrate contact information (non-sensitive parts only)."""
    memories = [
        ("Contact: Leaf GC - group chat contact for Leaf-related communications", "relationship", None),
        ("Contact: Skivals GC - group chat contact for Skivals-related communications", "relationship", None),
    ]
    for content, mtype, guild_id in memories:
        insert_memory(db, content, mtype, "contacts.md", guild_id)
    return len(memories)
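The helper insert_memory is assumed to already exist in migrate-memories.py. For readers following along, a minimal sketch of what it might look like; the column list is inferred from the Phase 2 SQL and is an assumption, not the canonical definition:

```python
import sqlite3

def insert_memory(db, content, memory_type, source_file, guild_id):
    """Insert one memory row; column names inferred from the Phase 2 INSERT (assumption)."""
    db.execute(
        """
        INSERT INTO memories (content, memory_type, source_file, guild_id, source)
        VALUES (?, ?, ?, ?, 'markdown_migration')
        """,
        (content, memory_type, source_file, guild_id),
    )
    db.commit()
```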

Checkpoint: Run python memory-retrieval.py stats and verify count increased.


Phase 2: Migrate Existing Chunks (Vector Embeddings)

Time: 10 minutes Risk: Low (copies data, doesn't delete)

The existing chunks table holds 56 pre-embedded chunks. Copy them into the memories table so the embeddings are preserved rather than recomputed.

-- Copy chunks to memories (preserving embeddings)
INSERT INTO memories (
    content,
    embedding,
    memory_type,
    source,
    source_file,
    created_at,
    confidence
)
SELECT
    text as content,
    embedding,
    'fact' as memory_type,
    'chunks_migration' as source,
    path as source_file,
    COALESCE(updated_at, unixepoch()) as created_at,
    1.0 as confidence
FROM chunks
WHERE NOT EXISTS (
    SELECT 1 FROM memories m
    WHERE m.source_file = chunks.path
    AND m.source = 'chunks_migration'
);

Checkpoint: Verify with SELECT COUNT(*) FROM memories WHERE source = 'chunks_migration'


Phase 3: Session Transcript Indexing (Optional - Later)

Time: 30-60 minutes Risk: Medium (large data volume)

Session transcripts contain conversation history. This is valuable but voluminous.

Strategy: Selective Indexing

Don't index every message. Index:

  1. Messages where Clawdbot learned something (contains "I'll remember", "noted", "got it")
  2. User corrections ("actually it's...", "no, the correct...")
  3. Explicit requests ("remember that...", "don't forget...")

import glob
import json
import os
import re

def extract_memorable_from_sessions():
    """Extract memorable moments from session transcripts."""
    session_files = glob.glob(os.path.expanduser(
        "~/.clawdbot/agents/*/sessions/*.jsonl"
    ))

    # One compiled pattern covering all trigger phrases, case-insensitive
    memorable_patterns = [
        r"I'll remember",
        r"I've noted",
        r"remember that",
        r"don't forget",
        r"actually it's",
        r"the correct",
        r"important:",
        r"key point:",
    ]
    pattern = re.compile("|".join(memorable_patterns), re.IGNORECASE)

    memories = []
    for fpath in session_files:
        with open(fpath) as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed lines instead of a bare except
                # NOTE: the "content" field name is an assumption; adjust to
                # the actual transcript schema.
                text = entry.get("content", "")
                if isinstance(text, str) and pattern.search(text):
                    memories.append((text, fpath))

    return memories

Recommendation: Skip this for now. The markdown files already hold the curated, high-value facts; session transcripts remain available as a backup and audit trail.


Phase 4: Wire Into Clawdbot Runtime

Time: 30-60 minutes Risk: Medium (changes bot behavior)

This requires modifying Clawdbot's code to use the new memory system.

4.1 Create Memory Interface Module

Location: ~/.clawdbot/workspace/memory_interface.py

"""
Memory interface for Clawdbot runtime.
Import this in your bot's message handler.
"""

# NOTE: a hyphenated filename (memory-retrieval.py) is not importable as-is;
# this import assumes the module is available as memory_retrieval.
from memory_retrieval import (
    search_memories,
    add_memory,
    get_recent_memories,
    supersede_memory
)

def get_context_for_message(message, guild_id, channel_id, user_id):
    """
    Get relevant memory context for responding to a message.
    Call this before generating a response.
    """
    # Search for relevant memories
    results = search_memories(
        query=message,
        guild_id=guild_id,
        limit=5
    )

    if not results:
        # Fall back to recent memories for this guild
        results = get_recent_memories(guild_id=guild_id, limit=3)

    # Format for context injection
    context_lines = []
    for r in results:
        context_lines.append(f"[Memory] {r['content']}")

    return "\n".join(context_lines)

def should_remember(response_text):
    """
    Check if the bot's response indicates something should be remembered.
    """
    triggers = [
        "i'll remember",
        "i've noted",
        "got it",
        "noted",
        "understood",
    ]
    lower = response_text.lower()
    return any(t in lower for t in triggers)

def extract_and_store(message, response, guild_id, channel_id, user_id):
    """
    If the response indicates learning, extract and store the memory.
    """
    if not should_remember(response):
        return None

    # The message itself is what should be remembered
    memory_id = add_memory(
        content=message,
        memory_type="fact",
        guild_id=guild_id,
        channel_id=channel_id,
        user_id=user_id,
        source="conversation"
    )

    return memory_id

4.2 Integration Points

In Clawdbot's message handler:

# Before generating response:
memory_context = get_context_for_message(
    message=user_message,
    guild_id=str(message.guild.id) if message.guild else None,
    channel_id=str(message.channel.id),
    user_id=str(message.author.id)
)

# Inject into prompt:
system_prompt = f"""
{base_system_prompt}

Relevant memories:
{memory_context}
"""

# After generating response:
extract_and_store(
    message=user_message,
    response=bot_response,
    guild_id=...,
    channel_id=...,
    user_id=...
)

Phase 5: Deprecate Old System

Time: 5 minutes Risk: Low (keep files, just stop using)

Once the new system is validated:

  1. Keep old files - Don't delete markdown files, they're human-readable backup
  2. Stop writing to old locations - New memories go to SQLite only
  3. Archive old chunks table - Rename to chunks_archive

-- Archive old chunks table (don't delete)
ALTER TABLE chunks RENAME TO chunks_archive;
ALTER TABLE chunks_fts RENAME TO chunks_fts_archive;

DO NOT delete the old files until you've run the new system for at least 2 weeks without issues.


Validation Checkpoints

After Each Phase:

| Check | Command | Expected |
|---|---|---|
| Memory count | python memory-retrieval.py stats | Count increases |
| Search works | python memory-retrieval.py search "Das" | Returns results |
| Guild scoping | python memory-retrieval.py search "remix" --guild 1449158500344270961 | Only The Hive results |
| FTS works | sqlite3 ~/.clawdbot/memory/main.sqlite "SELECT COUNT(*) FROM memories_fts" | Matches memories count |

Integration Test (After Phase 4):

  1. Send message to Clawdbot: "What do you know about Das?"
  2. Verify response includes Genre Universe info
  3. Send message: "Remember that Das prefers releasing on Fridays"
  4. Search: python memory-retrieval.py search "Das Friday"
  5. Verify new memory exists

Rollback Plan

If anything goes wrong:

# 1. Restore from backup
BACKUP_DIR=~/.clawdbot/backups/pre-migration-YYYYMMDD-HHMMSS

# Restore databases
cp "$BACKUP_DIR/main.sqlite" ~/.clawdbot/memory/
cp "$BACKUP_DIR/imessage.sqlite" ~/.clawdbot/memory/

# Restore markdown (if needed)
cp -r "$BACKUP_DIR/memory-markdown/"* ~/.clawdbot/workspace/memory/

# 2. Drop new tables (if needed)
sqlite3 ~/.clawdbot/memory/main.sqlite "
DROP TABLE IF EXISTS memories;
DROP TABLE IF EXISTS memories_fts;
"

# 3. Restart Clawdbot
# (your restart command here)

Timeline

| Phase | Duration | Dependency |
|---|---|---|
| Phase 0: Backup | 5 min | None |
| Phase 1: Markdown migration | 15 min | Phase 0 |
| Phase 2: Chunks migration | 10 min | Phase 1 |
| Phase 3: Sessions (optional) | 30-60 min | Phase 2 |
| Phase 4: Runtime integration | 30-60 min | Phase 2 |
| Phase 5: Deprecate old | 5 min | Phase 4 validated |

Total: 1-2 hours (excluding Phase 3)


Post-Migration Maintenance

Weekly (Cron):

# Add to crontab
0 3 * * 0 cd ~/.clawdbot/workspace && python3 memory-maintenance.py run >> ~/.clawdbot/logs/memory-maintenance.log 2>&1

Monthly:

  • Review python memory-maintenance.py stats
  • Check for memories stuck at low confidence
  • Verify per-guild counts are balanced

Quarterly:

  • Full backup
  • Review if session indexing is needed
  • Consider re-embedding if switching embedding models

Files Reference

| File | Purpose |
|---|---|
| migrate-memories.py | One-time migration script |
| memory-retrieval.py | Search/add/supersede API |
| memory-maintenance.py | Decay/prune/limits |
| memory_interface.py | Runtime integration (create in Phase 4) |
| MEMORY-MIGRATION-PLAN.md | This document |

Success Criteria

The migration is complete when:

  1. All markdown files have been processed (key facts extracted)
  2. Old chunks are copied to memories table with embeddings
  3. Search returns relevant results for test queries
  4. Guild scoping works correctly
  5. Clawdbot uses new memory in responses
  6. "Remember this" creates new memories
  7. Weekly maintenance cron is running
  8. Old system files are preserved but not actively used

Questions Before Starting

  1. Do you want to migrate session transcripts? (Recommended: No, for now)
  2. Which guild should we test first? (Recommended: Das server - most memories)
  3. When do you want to do the runtime integration? (Requires Clawdbot restart)