.agents/memory/2026-02-24-memory-content-size-guardrails-implementation.md


# 2026-02-24 Session Notes
## Memory Content Size Guardrails Implementation
This session accepted and documented a comprehensive plan to add guardrails for memory ingestion and recall in the Signet daemon. Beta testers' OpenClaw agents were manually embedding entire files (10K-100K+ chars) as single memories, bloating the database and overwhelming context windows on recall. Additionally, no validation existed for embedding dimensions, risking silent corruption of the vec0 virtual table (hardcoded to 768-dimensional floats).
The implementation plan spans six files across core, daemon, and pipeline packages:
**Config additions:** New `PipelineGuardrailsConfig` interface added to types.ts with three configurable thresholds (maxContentChars: 500, chunkTargetChars: 300, recallTruncateChars: 500).
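The three thresholds can be sketched as a TypeScript interface. The interface name and defaults come from the note above; the doc comments and the exported defaults object are illustrative, not the actual contents of types.ts:

```typescript
// Sketch of the guardrails config; defaults match the plan's stated values.
export interface PipelineGuardrailsConfig {
  /** Memories longer than this are auto-chunked before insertion. */
  maxContentChars: number;
  /** Target size for each sentence-aware chunk. */
  chunkTargetChars: number;
  /** Recall results longer than this are truncated as a safety net. */
  recallTruncateChars: number;
}

export const defaultGuardrails: PipelineGuardrailsConfig = {
  maxContentChars: 500,
  chunkTargetChars: 300,
  recallTruncateChars: 500,
};
```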
**Auto-chunking logic:** Memories exceeding maxContentChars are split into sentence-aware chunks, deduplicated via content hash, inserted as individual memories linked to a chunk_group entity, and embedded asynchronously. Normal-sized memories flow unchanged.
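A minimal sketch of the chunk-then-dedupe step, assuming sentence boundaries are found with a simple punctuation regex; the function names and splitting heuristic are illustrative, not the pipeline's actual implementation:

```typescript
import { createHash } from "node:crypto";

// Split content into sentence-aware chunks near the target size. A chunk is
// flushed once adding the next sentence would push it past targetChars.
export function chunkContent(content: string, targetChars: number): string[] {
  const sentences = content.match(/[^.!?]+[.!?]+\s*|[^.!?]+$/g) ?? [content];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > targetChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// Deduplicate chunks by content hash before inserting them as memories.
export function dedupeByHash(chunks: string[]): Map<string, string> {
  const seen = new Map<string, string>();
  for (const chunk of chunks) {
    const hash = createHash("sha256").update(chunk).digest("hex");
    if (!seen.has(hash)) seen.set(hash, chunk);
  }
  return seen;
}
```

Each surviving chunk would then be inserted as its own memory, linked to the chunk_group entity, and embedded asynchronously.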
**Dimension validation:** After embedding generation, vectors are validated against cfg.embedding.dimensions before insertion. Mismatches are logged as warnings and skipped to prevent corrupting the vec0 virtual table. This validation is added to three code paths: daemon remember endpoint, document-worker pipeline, and extraction worker.
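The guard shared by those three code paths might look like the following; `validateEmbedding` and its logger parameter are hypothetical names, not the daemon's actual API:

```typescript
// Validate a generated embedding against the configured dimensionality
// (cfg.embedding.dimensions). On mismatch, log a warning and signal the
// caller to skip the insert rather than corrupt the vec0 virtual table.
export function validateEmbedding(
  vector: number[],
  expectedDims: number,
  warn: (msg: string) => void = console.warn,
): boolean {
  if (vector.length !== expectedDims) {
    warn(
      `embedding dimension mismatch: got ${vector.length}, expected ` +
      `${expectedDims}; skipping insert to protect the vec0 table`,
    );
    return false;
  }
  return true;
}
```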
**Recall truncation:** Added as a safety net for existing oversized memories. Results are truncated at recallTruncateChars with metadata indicating original length and truncation status.
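A sketch of the recall-time safety net; the `TruncatedResult` field names are assumptions based on the metadata described above:

```typescript
export interface TruncatedResult {
  content: string;
  truncated: boolean;
  originalLength: number;
}

// Truncate an oversized recall result at the configured limit, recording the
// original length and whether truncation occurred.
export function truncateForRecall(content: string, limit: number): TruncatedResult {
  if (content.length <= limit) {
    return { content, truncated: false, originalLength: content.length };
  }
  return {
    content: content.slice(0, limit),
    truncated: true,
    originalLength: content.length,
  };
}
```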
**Graph linking:** Chunks linked to chunk_group entities via memory_entity_mentions table, allowing existing graph boost logic to surface related chunks during recall.
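The linking step could be sketched as building one `memory_entity_mentions` row per chunk; the row shape and helper name are assumptions, with only the table and entity names taken from the note above:

```typescript
// Assumed shape of a memory_entity_mentions row; actual columns may differ.
interface MentionRow {
  memory_id: string;
  entity_id: string;
}

// Link every chunk memory to the shared chunk_group entity, so the existing
// graph boost logic can surface sibling chunks during recall.
export function buildChunkGroupLinks(
  chunkMemoryIds: string[],
  chunkGroupEntityId: string,
): MentionRow[] {
  return chunkMemoryIds.map((memoryId) => ({
    memory_id: memoryId,
    entity_id: chunkGroupEntityId,
  }));
}
```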
The verification plan covers chunking behavior, entity graph structure, dimension validation, and sibling-chunk surfacing via graph boost.