clawdbot-workspace/buba-memory-system-spec.md

# Buba Memory System — Build Spec for Claude Code

## Goal
Build a searchable memory index for Clawdbot (Buba) that indexes Discord messages, session transcripts, and existing memory files, enabling semantic search over conversation history.

---

## Data Sources

1. **Discord messages** — accessible via Clawdbot's `message` tool (`action=search`, `action=read`)
2. **Session transcripts** — JSONL files in `~/.clawdbot/agents/main/sessions/`
3. **Existing memory files** — `~/.clawdbot/workspace/memory/*.md` and `MEMORY.md`

---

## Core Requirements

### 1. Indexer
- Crawl Discord channels and extract messages (use Clawdbot's message tool or Discord API)
- Parse session JSONL files for conversation history
- Extract: timestamp, author, channel/source, content, message ID
- Chunk long conversations into searchable segments
- Generate embeddings for semantic search (OpenAI `text-embedding-3-small` or local)
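
The parsing and chunking steps above can be sketched as follows. This is a minimal illustration, not the actual transcript schema: the field names `id`, `timestamp`, `role`, and `content` are assumptions about what each JSONL record contains, and `Message`, `parse_session_lines`, and `chunk_messages` are hypothetical names. Chunking here is a simple character budget over consecutive messages.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Message:
    message_id: str
    timestamp: str
    author: str
    source: str
    content: str


def parse_session_lines(lines: Iterable[str], source: str) -> Iterator[Message]:
    """Yield Message records from JSONL lines, skipping blank or malformed ones."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially written or corrupt line
        if not isinstance(record, dict):
            continue
        content = record.get("content")  # assumed field name
        if not isinstance(content, str) or not content:
            continue
        yield Message(
            # Fall back to "source:line" when the record carries no id.
            message_id=str(record.get("id", f"{source}:{line_no}")),
            timestamp=str(record.get("timestamp", "")),
            author=str(record.get("role", "unknown")),
            source=source,
            content=content,
        )


def chunk_messages(messages: Iterable[Message], max_chars: int = 1000) -> list[list[Message]]:
    """Group consecutive messages into chunks of at most max_chars of content.

    A single message longer than max_chars becomes its own chunk (it is not split).
    """
    chunks: list[list[Message]] = []
    current: list[Message] = []
    size = 0
    for msg in messages:
        if current and size + len(msg.content) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(msg)
        size += len(msg.content)
    if current:
        chunks.append(current)
    return chunks
```

In use, each session file would be opened and passed to `parse_session_lines`, with the file name as `source`. The resulting chunks are what would be embedded.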

### 2. Storage
- SQLite (or similar) for message metadata, plus a vector store for embeddings
- Store in `~/.clawdbot/workspace/memory-index/` or similar
- Support incremental updates (track last indexed message/timestamp)
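
One possible shape for the metadata store and the incremental-update bookkeeping is sketched below, using only the standard-library `sqlite3` module. The table and column names (`messages`, `index_state`, `last_timestamp`) and the helper functions are hypothetical. The sketch also assumes timestamps are ISO-8601 UTC strings in a single consistent format, so that string comparison matches chronological order.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    message_id TEXT NOT NULL,
    source     TEXT NOT NULL,
    channel    TEXT,
    author     TEXT,
    timestamp  TEXT NOT NULL,
    content    TEXT NOT NULL,
    PRIMARY KEY (source, message_id)
);
CREATE TABLE IF NOT EXISTS index_state (
    source         TEXT PRIMARY KEY,
    last_timestamp TEXT NOT NULL
);
"""


def open_index(path=":memory:"):
    """Open (or create) the index database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def last_indexed(conn, source):
    """Return the newest timestamp already indexed for a source, or None."""
    row = conn.execute(
        "SELECT last_timestamp FROM index_state WHERE source = ?", (source,)
    ).fetchone()
    return row[0] if row else None


def index_messages(conn, source, rows):
    """Insert rows of (message_id, channel, author, timestamp, content).

    Rows older than the stored cutoff are skipped; rows at the cutoff are
    deduplicated by the primary key. Returns the number of new rows.
    """
    cutoff = last_indexed(conn, source)
    newest = cutoff
    inserted = 0
    with conn:  # one transaction per batch
        for message_id, channel, author, timestamp, content in rows:
            if cutoff is not None and timestamp < cutoff:
                continue
            cur = conn.execute(
                "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?, ?, ?)",
                (message_id, source, channel, author, timestamp, content),
            )
            inserted += cur.rowcount
            if newest is None or timestamp > newest:
                newest = timestamp
        if newest is not None and newest != cutoff:
            conn.execute(
                "INSERT OR REPLACE INTO index_state VALUES (?, ?)", (source, newest)
            )
    return inserted
```

Re-running the indexer over the same batch inserts nothing, which is the property incremental updates need. Embeddings would live in the vector store, keyed by the same `(source, message_id)` pair.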

### 3. Search Interface
- Semantic search (query → top-k relevant messages/chunks)
- Filters: date range, channel, author
- Return: content, source, timestamp, relevance score
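
The search behavior described above can be illustrated with a brute-force cosine-similarity scan. This is only a sketch: a real build would likely delegate ranking to ChromaDB or FAISS, and the `Chunk` type, the `search` signature, and the filter parameter names are assumptions. Date filtering compares ISO-8601 UTC strings, which assumes all timestamps share one format.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Chunk:
    content: str
    source: str
    channel: str
    author: str
    timestamp: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_embedding, chunks, top_k=5, channel=None, author=None,
           start=None, end=None):
    """Return the top_k chunks by cosine similarity after applying filters."""

    def keep(c: Chunk) -> bool:
        return (
            (channel is None or c.channel == channel)
            and (author is None or c.author == author)
            and (start is None or c.timestamp >= start)
            and (end is None or c.timestamp <= end)
        )

    scored = [(cosine(query_embedding, c.embedding), c) for c in chunks if keep(c)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {
            "content": c.content,
            "source": c.source,
            "channel": c.channel,
            "author": c.author,
            "timestamp": c.timestamp,
            "score": round(score, 4),
        }
        for score, c in scored[:top_k]
    ]
```

Filtering before scoring keeps the scan cheap when a filter is selective. With a dedicated vector store, the same filters would typically be passed as metadata constraints instead.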

### 4. Clawdbot Integration
- Either: a CLI tool that Buba can call via `exec`
- Or: extend the `memory_search` tool to query this index
- Emit output in a format Buba can parse and use in responses (e.g. JSON)
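
If the CLI route is chosen, the entry point could look roughly like the sketch below. The program name `memory-search`, the flag names, and the `run_search` placeholder are all hypothetical; the placeholder returns an empty list where a real implementation would query the index.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="memory-search",  # hypothetical command name
        description="Search the memory index and print JSON results.",
    )
    parser.add_argument("query", help="natural-language search query")
    parser.add_argument("--top-k", type=int, default=5, help="number of results")
    parser.add_argument("--channel", help="restrict to one channel")
    parser.add_argument("--author", help="restrict to one author")
    parser.add_argument("--after", help="ISO-8601 lower bound on timestamp")
    parser.add_argument("--before", help="ISO-8601 upper bound on timestamp")
    return parser


def run_search(args: argparse.Namespace) -> list:
    """Placeholder: a real implementation would query the index here."""
    return []


def render(argv) -> str:
    """Parse arguments and return the JSON response as a string."""
    args = build_parser().parse_args(argv)
    return json.dumps({"query": args.query, "results": run_search(args)})


def main(argv=None) -> int:
    print(render(argv))
    return 0
```

A script would end with `raise SystemExit(main())` under the usual `if __name__ == "__main__":` guard. Buba could then invoke it through `exec` and parse the single JSON object written to stdout.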

---

## Nice-to-haves
- Auto-summarization of indexed content
- Entity extraction (projects, people, decisions)
- Deduplication across sources
- Time-aware queries ("what did we discuss last Tuesday")
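
As a rough illustration of the time-aware idea, a phrase like "last Tuesday" can be resolved to a date range and passed to the search filters. The function name `last_weekday_range` is hypothetical. The sketch assumes "last Tuesday" means the most recent Tuesday strictly before today, and that all timestamps are in UTC.

```python
from datetime import date, datetime, time, timedelta, timezone

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def last_weekday_range(weekday_name: str, today: date) -> tuple[str, str]:
    """Return (start, end) ISO-8601 UTC strings covering the most recent
    occurrence of weekday_name strictly before today. The end is exclusive."""
    target = WEEKDAYS.index(weekday_name.lower())  # ValueError if unknown
    # If today is the target weekday, go back a full week rather than zero days.
    days_back = (today.weekday() - target) % 7 or 7
    day = today - timedelta(days=days_back)
    start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)
```

Extracting the weekday name from free-form query text is a separate problem and is not covered here.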

---

## Tech Stack Suggestions
- Python (matches existing PageIndex framework in `~/.clawdbot/workspace/pageindex-framework/`)
- ChromaDB or FAISS for vector storage
- OpenAI embeddings or a local alternative

---

## Test Data Available
- Session files: `~/.clawdbot/agents/main/sessions/*.jsonl`
- Discord history: accessible via Clawdbot message tool
- Existing memory: `~/.clawdbot/workspace/memory/`

---

## Why This Matters
Buba currently relies on manually logging notes to memory files, so any context that is not explicitly written down is lost. With this system, Buba can search the actual conversation history and recall past decisions, preferences, and project context.

---

## Output Format Example (for search results)

```json
{
  "query": "remix scoring algorithm",
  "results": [
    {
      "content": "We decided to weight TikTok velocity at 2x for the remix scoring...",
      "source": "discord",
      "channel": "#general",
      "author": "JakeShore",
      "timestamp": "2026-01-15T14:32:00Z",
      "message_id": "1234567890",
      "score": 0.89
    }
  ]
}
```
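
On the consuming side, the response could be validated before use. The sketch below treats `content`, `source`, `timestamp`, and `score` as required, matching the fields listed under Search Interface, and treats `channel`, `author`, and `message_id` as optional. The function name `parse_search_response` is hypothetical.

```python
import json

# Fields every result is expected to carry, with their accepted types.
REQUIRED_FIELDS = {
    "content": str,
    "source": str,
    "timestamp": str,
    "score": (int, float),
}


def parse_search_response(text: str) -> list:
    """Parse a JSON search response and return its validated results list.

    Raises ValueError if the payload or any result is malformed.
    """
    payload = json.loads(text)
    if not isinstance(payload, dict) or not isinstance(payload.get("results"), list):
        raise ValueError("expected an object with a 'results' list")
    for i, result in enumerate(payload["results"]):
        if not isinstance(result, dict):
            raise ValueError(f"result {i}: expected an object")
        for field, expected in REQUIRED_FIELDS.items():
            if not isinstance(result.get(field), expected):
                raise ValueError(f"result {i}: missing or invalid field {field!r}")
    return payload["results"]
```

Failing loudly on a malformed payload keeps Buba from quoting partial or misattributed results in a response.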

---

## File Locations Reference
- Clawdbot workspace: `~/.clawdbot/workspace/`
- Session transcripts: `~/.clawdbot/agents/main/sessions/`
- PageIndex framework: `~/.clawdbot/workspace/pageindex-framework/`
- Memory files: `~/.clawdbot/workspace/memory/`