diff --git a/memory/2026-02-25-ingest-pipeline-refactoring-kickoff.md b/memory/2026-02-25-ingest-pipeline-refactoring-kickoff.md
new file mode 100644
index 000000000..d0f52683d
--- /dev/null
+++ b/memory/2026-02-25-ingest-pipeline-refactoring-kickoff.md
@@ -0,0 +1,30 @@
+# 2026-02-25 Session Notes
+
+## Ingest Pipeline Refactoring Kickoff
+
+Nicholai initiated work on the `nicholai/ingest-pipeline` branch with a comprehensive refactoring plan for `packages/core/src/ingest/` (5631 LOC, 14 files). The core problem: each extractor wraps its own Ollama HTTP client instead of using the daemon's `LlmProvider` interface, hardcoding ingest to Ollama and blocking Claude Code support.
+
+### Plan Overview
+
+The refactoring has seven major components:
+1. **Move `LlmProvider` interface** from `packages/daemon/src/pipeline/provider.ts` to `packages/core/src/types.ts` for dependency isolation
+2. **Refactor extractors** (`extractor.ts`, `chat-extractor.ts`, `entire-extractor.ts`) to accept `LlmProvider` instead of Ollama config
+3. **Rename `ollama-client.ts`** → `response-parser.ts` after deleting HTTP-specific code
+4. **Unify extractor configs** by consolidating three identical config interfaces into a single `ExtractionOptions` type
+5. **Extract `findGit()`** utility from code/entire parsers into new `git-utils.ts`
+6. **Extract chat utilities** (time-gap batching, thread grouping constants) into new `chat-utils.ts` for slack/discord parsers
+7. **Update `ingestPath()` entry point** to accept `provider: LlmProvider` parameter
+
+### Key Architectural Decisions
+
+- Factory implementations (`createOllamaProvider`, `createClaudeCodeProvider`) remain in daemon; only the interface moves to core
+- Daemon's `llm.ts` singleton stays in daemon
+- Domain-specific logic (prompts, parsing, message filtering) is intentionally NOT deduplicated
+- Backwards compatibility via type aliases for old config interface names
+- Circular dependency avoided by moving interface only, not implementations
+
+### Scope Boundaries
+
+Will NOT change: prompt builders, `parseExtractionResponse()`, format-specific message filtering, `threadToSection()` rendering, sequential processing loops, `index.ts` orchestration, chunker/provenance/parser logic.
+
+Verification plan includes typecheck, build ordering (core → daemon), full test suite, and biome linting.
\ No newline at end of file
diff --git a/memory/memories.db-shm b/memory/memories.db-shm
index dbd15ee80..5f8961189 100644
Binary files a/memory/memories.db-shm and b/memory/memories.db-shm differ
diff --git a/memory/memories.db-wal b/memory/memories.db-wal
index ebc671150..e109d4983 100644
Binary files a/memory/memories.db-wal and b/memory/memories.db-wal differ