2026-02-25T10-06-47_auto_memory/memories.db-wal, memory/2026-02-25-ingestio
This commit is contained in:
parent
706e487735
commit
1144a04361
32
memory/2026-02-25-ingestion-pipeline-cherry-pick-planning.md
Normal file
32
memory/2026-02-25-ingestion-pipeline-cherry-pick-planning.md
Normal file
@ -0,0 +1,32 @@
|
||||
# 2026-02-25 Session Notes
|
||||
|
||||
## Ingestion Pipeline Cherry-Pick Planning
|
||||
|
||||
Nicholai outlined a detailed plan to cherry-pick the document ingestion pipeline from PR #25 (web3-identity branch) onto a clean branch off main. The ingestion pipeline at `packages/core/src/ingest/` parses markdown, PDFs, code repositories, Slack/Discord exports, and git history into Signet memories via LLM extraction (Ollama).
|
||||
|
||||
### Scope
|
||||
|
||||
Copying 14 self-contained files from the ingest directory, plus one migration file to be renumbered from 014 to 013. The pipeline includes parsers for multiple formats (markdown, PDF, code, Discord, Slack, entire.io sessions), a chunker, extractors for LLM processing, and provenance tracking for deduplication.
|
||||
|
||||
### Key Fixes Required
|
||||
|
||||
Seven code quality fixes were identified from the code review:
|
||||
|
||||
1. **Prompt injection protection**: Wrap untrusted content in XML delimiters across three extractor files
|
||||
2. **Type safety**: Replace inline `db as {...}` casts with a formal `DatabaseLike` interface
|
||||
3. **PDF parser typing**: Remove `as any` by defining interfaces for pdf-parse v2 API
|
||||
4. **Non-null assertions**: Replace `!` with explicit guards in slack-parser.ts
|
||||
5. **Error logging**: Add warn-level logging for silent memory insert failures
|
||||
6. **Validation**: Add field presence checks before casting Discord/Slack exports
|
||||
7. **Cleanup**: Remove unused loop variable in markdown-parser.ts
|
||||
|
||||
### Next Steps
|
||||
|
||||
Implementation plan: create branch `nicholai/ingest-pipeline` off main, copy files, apply all 7 fixes, register migration, then build/typecheck/lint/test to verify.
|
||||
|
||||
## Technical Notes
|
||||
|
||||
- Migration renumbering: main ends at 012-scheduled-tasks, so ingestion becomes 013
|
||||
- No package.json changes needed (pdf-parse is optional dynamic import)
|
||||
- No daemon routes or CLI changes included in this cherry-pick
|
||||
- Branch names: source is web3-identity, target is nicholai/ingest-pipeline
|
||||
Binary file not shown.
Binary file not shown.
Loading…
x
Reference in New Issue
Block a user