2026-02-25T10-06-47_auto_memory/memories.db-wal, memory/2026-02-25-ingestio

Nicholai Vogel 2026-02-25 03:06:47 -07:00
parent 706e487735
commit 1144a04361
3 changed files with 32 additions and 0 deletions


@@ -0,0 +1,32 @@
# 2026-02-25 Session Notes
## Ingestion Pipeline Cherry-Pick Planning
Nicholai outlined a detailed plan to cherry-pick the document ingestion pipeline from PR #25 (web3-identity branch) onto a clean branch off main. The ingestion pipeline at `packages/core/src/ingest/` parses markdown, PDFs, code repositories, Slack/Discord exports, and git history into Signet memories via LLM extraction (Ollama).
### Scope
The cherry-pick copies 14 self-contained files from the ingest directory, plus one migration file that is renumbered from 014 to 013. The pipeline includes parsers for multiple formats (markdown, PDF, code, Discord, Slack, entire.io sessions), a chunker, extractors for LLM processing, and provenance tracking for deduplication.
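The chunker and the provenance-based deduplication can be pictured with a minimal sketch. Everything here is an assumption for illustration: the names `chunkText`, `dedupeChunks`, and `fnv1a` are hypothetical, and the real pipeline may chunk on different boundaries and key provenance on something stronger than a 32-bit hash.

```typescript
// Hypothetical sketch, not the code in packages/core/src/ingest/.

// 32-bit FNV-1a over UTF-16 code units; a cheap illustrative fingerprint.
// A real pipeline would more likely use a cryptographic hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Pack blank-line-separated paragraphs into chunks of at most maxChars.
// A single paragraph longer than maxChars is kept whole as its own chunk.
function chunkText(text: string, maxChars: number): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current === "" ? paragraph : `${current}\n\n${paragraph}`;
    if (current === "" || candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = paragraph;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

// Return only chunks whose fingerprint has not been seen, recording new ones.
function dedupeChunks(chunks: readonly string[], seen: Set<string>): string[] {
  const fresh: string[] = [];
  for (const chunk of chunks) {
    const key = fnv1a(chunk);
    if (seen.has(key)) continue;
    seen.add(key);
    fresh.push(chunk);
  }
  return fresh;
}
```

Passing the same `seen` set across ingestion runs is what makes re-ingesting an unchanged document a no-op in this sketch.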
### Key Fixes Required
Seven code quality fixes were identified from the code review:
1. **Prompt injection protection**: Wrap untrusted content in XML delimiters across three extractor files
2. **Type safety**: Replace inline `db as {...}` casts with a formal `DatabaseLike` interface
3. **PDF parser typing**: Remove `as any` by defining interfaces for pdf-parse v2 API
4. **Non-null assertions**: Replace `!` with explicit guards in slack-parser.ts
5. **Error logging**: Add warn-level logging for silent memory insert failures
6. **Validation**: Add field presence checks before casting Discord/Slack exports
7. **Cleanup**: Remove unused loop variable in markdown-parser.ts
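Several of the fixes above can be sketched together. This is a rough illustration under stated assumptions, not the reviewed code: the `document` tag name, the `memories` table and `content` column, the `DatabaseLike` method shape, and the `SlackMessage` fields are all hypothetical stand-ins.

```typescript
// Fix 1 (sketch): wrap untrusted content in XML-style delimiters.
// `tag` is assumed to be a plain alphanumeric name.
function wrapUntrusted(content: string, tag = "document"): string {
  // Escape any closing tag inside the content so it cannot end the block early.
  const closing = new RegExp(`</${tag}\\s*>`, "gi");
  const safe = content.replace(closing, `&lt;/${tag}&gt;`);
  return `<${tag}>\n${safe}\n</${tag}>`;
}

// Fix 2 (sketch): a formal interface instead of inline `db as {...}` casts.
// The method shape is an assumption modelled on common SQLite wrappers.
interface DatabaseLike {
  prepare(sql: string): { run(...params: unknown[]): unknown };
}

// Fix 5 (sketch): surface insert failures at warn level instead of swallowing them.
function insertMemory(
  db: DatabaseLike,
  content: string,
  warn: (message: string) => void = console.warn,
): boolean {
  try {
    // Hypothetical table and column names.
    db.prepare("INSERT INTO memories (content) VALUES (?)").run(content);
    return true;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warn(`memory insert failed: ${reason}`);
    return false;
  }
}

// Fixes 4 and 6 (sketch): check field presence with a type guard
// rather than casting and using non-null assertions.
interface SlackMessage {
  ts: string;
  text: string;
  user?: string;
}

function isSlackMessage(value: unknown): value is SlackMessage {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.ts === "string" &&
    typeof record.text === "string" &&
    (record.user === undefined || typeof record.user === "string")
  );
}

// Keep only entries that pass validation; malformed entries are dropped.
function parseSlackExport(raw: unknown): SlackMessage[] {
  if (!Array.isArray(raw)) return [];
  return raw.filter(isSlackMessage);
}
```

Delimiters reduce, but do not eliminate, the risk that the model treats ingested text as instructions, so the extractor prompt would still need to state that the wrapped content is data.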
### Next Steps
Implementation plan: create branch `nicholai/ingest-pipeline` off main, copy the files, apply all seven fixes, register the migration, then run build, typecheck, lint, and test to verify.
## Technical Notes
- Migration renumbering: main ends at 012-scheduled-tasks, so ingestion becomes 013
- No package.json changes needed (pdf-parse is optional dynamic import)
- No daemon routes or CLI changes included in this cherry-pick
- Branch names: source is web3-identity, target is nicholai/ingest-pipeline
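
The renumbering rule in the first note (take the highest existing prefix and add one) can be sketched as a small helper. The function name and the three-digit `NNN-` prefix convention are assumptions inferred from `012-scheduled-tasks`; the example list below includes a made-up `001-init` entry.

```typescript
// Hypothetical helper: compute the next zero-padded migration prefix
// from existing migration names of the form "NNN-description".
function nextMigrationPrefix(existing: readonly string[]): string {
  let highest = 0;
  for (const name of existing) {
    const digits = /^(\d{3})-/.exec(name)?.[1];
    // Names that do not match the assumed convention are ignored.
    if (digits !== undefined) {
      highest = Math.max(highest, Number(digits));
    }
  }
  return String(highest + 1).padStart(3, "0");
}

// With main ending at 012-scheduled-tasks, the next prefix is "013".
const next = nextMigrationPrefix(["001-init", "012-scheduled-tasks"]);
```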

Binary file not shown.

Binary file not shown.