2026-02-25 Session Notes
Predictive Memory Scorer Spec Review & Feedback Integration
Nicholai received detailed external code review feedback on the predictive memory scorer specification (docs/wip/predictive-memory-scorer.md). The reviewer provided high-level validation of the overall architecture—calling it "exceptionally well-thought-out" and praising the North Star vision ("difference between a tool that remembers and a mind that persists")—while identifying five concrete technical refinements.
Feedback Highlights
The reviewer validated three core design strengths: (1) dynamic baseline weighting via Reciprocal Rank Fusion (RRF) with an exponential moving average (EMA) of success rate ensures graceful degradation; (2) the zero-dependency Rust sidecar keeps binary size negligible and inference sub-millisecond; (3) outcome-driven labels from the continuity scorer avoid hand-labeling overhead.
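A minimal sketch of the first strength, dynamic baseline weighting via RRF with an EMA success rate. All names (`Retriever`, `observe`, `weighted_rrf`) and the exact update rule are illustrative assumptions, not the spec's actual implementation:

```rust
use std::collections::HashMap;

// Illustrative retriever whose fusion weight tracks an EMA of how
// often its results actually proved useful (outcome in [0, 1]).
struct Retriever {
    ema_success: f64,
}

impl Retriever {
    // Update the EMA after observing an outcome; alpha controls how
    // quickly the weight adapts. A degrading retriever fades out
    // gracefully instead of being cut off.
    fn observe(&mut self, outcome: f64, alpha: f64) {
        self.ema_success = alpha * outcome + (1.0 - alpha) * self.ema_success;
    }
}

// Success-weighted Reciprocal Rank Fusion: each retriever's standard
// RRF term 1/(k + rank) is scaled by its EMA weight before summing.
// `rankings[i]` is retriever i's ranked list of memory ids (best first).
fn weighted_rrf(rankings: &[Vec<u32>], weights: &[f64], k: f64) -> Vec<(u32, f64)> {
    let mut scores: HashMap<u32, f64> = HashMap::new();
    for (list, &w) in rankings.iter().zip(weights) {
        for (rank, &id) in list.iter().enumerate() {
            // rank is 0-based; RRF conventionally uses 1-based ranks.
            *scores.entry(id).or_insert(0.0) += w / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(u32, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}
```

Here a retriever that keeps missing sees its `ema_success` decay toward zero, so its votes count for less in the fused ranking without any hard switch.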
Five Technical Refinements Identified
- HashTrick bucket count: Bump from 4,096 to 16,384 buckets to reduce collisions in code-heavy memories; the model stays at roughly 1.1M parameters and still fits in L2 cache
- Listwise loss temperature: Use T < 1.0 to sharpen the soft label distributions; the continuity scorer's soft labels yield a flat P_true distribution that needs aggressive sharpening
- Negative sample filtering: Apply a cosine similarity filter before assigning strict 0.0 labels, so the model is not trained to replicate the baseline's mistakes
- Drift reset strategy: Use a replay buffer (80% recent samples + 20% historical samples selected by continuity score) instead of doubling the learning rate, to avoid catastrophic forgetting
- RRF constant k: Drop from 60 to 10–15 for ~50 candidate memories; k=60 compresses rank variance when the candidate list is that small
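The negative-filtering refinement amounts to dropping would-be negatives that sit too close to a known positive in embedding space. A sketch under assumed names (`cosine`, `filter_negatives`) and an illustrative 0.8 threshold, neither of which comes from the spec:

```rust
// Cosine similarity between two embedding vectors.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (na * nb)
}

// Keep only negatives that are clearly dissimilar from every positive;
// anything above the threshold might be a baseline false negative, and
// labeling it a strict 0.0 would teach the model the baseline's mistake.
fn filter_negatives<'a>(
    negatives: &'a [Vec<f64>],
    positives: &[Vec<f64>],
    threshold: f64,
) -> Vec<&'a Vec<f64>> {
    negatives
        .iter()
        .filter(|n| positives.iter().all(|p| cosine(n, p) < threshold))
        .collect()
}
```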
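The replay-buffer reset could look something like the following deterministic sketch: 80% of a batch from the most recent samples, 20% from historical samples ranked by continuity score. The `Sample` struct, the take-most-recent policy, and the absence of random sampling are all simplifying assumptions:

```rust
// Illustrative training sample: an id plus its continuity score.
struct Sample {
    id: u32,
    continuity: f64,
}

// Build a drift-reset batch: ~80% recent samples plus ~20% historical
// samples with the highest continuity scores, so old but proven
// memories keep appearing and catastrophic forgetting is avoided.
fn replay_batch(recent: &[Sample], historical: &[Sample], batch: usize) -> Vec<u32> {
    let n_recent = (batch * 4) / 5; // 80% recent
    let n_hist = batch - n_recent;  // 20% historical
    // Newest recent samples first (assumes `recent` is in arrival order).
    let mut out: Vec<u32> = recent.iter().rev().take(n_recent).map(|s| s.id).collect();
    // Historical samples chosen by continuity score, highest first.
    let mut hist: Vec<&Sample> = historical.iter().collect();
    hist.sort_by(|a, b| b.continuity.partial_cmp(&a.continuity).unwrap());
    out.extend(hist.iter().take(n_hist).map(|s| s.id));
    out
}
```

A real implementation would likely sample both pools stochastically; the fixed 80/20 split is the part that carries over from the recommendation.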
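Why the k recommendation matters can be checked arithmetically with the standard RRF term (this is just the textbook formula, not project code):

```rust
// Standard RRF contribution for a document at a given 1-based rank.
// With k=60, ranks 1 and 10 score 1/61 vs 1/70 -- barely different.
// With k around 10-15, the same ranks score 1/11 vs 1/20, preserving
// a meaningful gap across a ~50-candidate list.
fn rrf_term(k: f64, rank: f64) -> f64 {
    1.0 / (k + rank)
}
```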
All five recommendations are sound and likely to be incorporated into the specification.