2026-03-01 Session Notes
Incremental Embedding Refresh Tracker Implementation Plan
Received detailed plan for implementing a background polling loop to detect and refresh stale/missing embeddings in the Signet memory pipeline. The tracker runs independently and processes embeddings in small batches to avoid overwhelming the system.
Architecture Decisions
The tracker uses a setTimeout chain instead of setInterval for natural backpressure — each cycle schedules the next after current processing completes, rather than on a fixed timer. This prevents queue buildup if embedding fetches slow down.
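A minimal sketch of the setTimeout chain described above. The names startPolling and TrackerHandle are illustrative assumptions, not part of the plan; the point is only that the next timer is armed after the current cycle settles, so cycles never overlap.

```typescript
// Hypothetical handle type; not taken from the plan.
interface TrackerHandle {
  stop(): Promise<void>;
}

function startPolling(
  runCycle: () => Promise<void>,
  pollMs: number,
): TrackerHandle {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let stopped = false;
  let inFlight: Promise<void> = Promise.resolve();

  const tick = (): void => {
    inFlight = (async () => {
      try {
        await runCycle();
      } catch (err) {
        // A failed cycle is logged; the next tick simply tries again.
        console.error("embedding tracker cycle failed:", err);
      }
      // Arm the next timer only once this cycle has finished.
      if (!stopped) timer = setTimeout(tick, pollMs);
    })();
  };

  timer = setTimeout(tick, pollMs);

  return {
    async stop(): Promise<void> {
      stopped = true;
      if (timer !== undefined) clearTimeout(timer);
      await inFlight; // let a cycle that is already running complete
    },
  };
}
```

With setInterval, a cycle that takes longer than pollMs would be re-entered while still running. Here the effective period is pollMs plus however long the cycle took.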
Core Mechanism
Polling loop:
- Check embedding provider health (uses existing 30s cache)
- Query stale embeddings: missing embeddings, content hash mismatches, or model drift
- Fetch embeddings sequentially with 30s timeout per request
- Batch write successful results in single transaction
- Idempotent via ON CONFLICT(content_hash) DO UPDATE
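The steps above could be sketched as a single cycle function. Every interface and function name here (CycleDeps, StaleRow, runCycle, and so on) is a hypothetical stand-in for the real pipeline APIs, which these notes do not show. The timeout is done with an AbortController, which assumes the provider call honors the signal.

```typescript
// Hypothetical shapes; the real pipeline types are not given in the plan.
interface StaleRow { id: string; content: string; contentHash: string }
interface EmbeddingResult { contentHash: string; vector: number[] }

interface CycleDeps {
  isProviderHealthy(): Promise<boolean>; // assumed to use the cached health check
  queryStale(limit: number): Promise<StaleRow[]>;
  fetchEmbedding(content: string, signal: AbortSignal): Promise<number[]>;
  writeBatch(results: EmbeddingResult[]): Promise<void>; // assumed: one transaction
}

async function runCycle(
  deps: CycleDeps,
  batchSize: number,
  timeoutMs = 30000,
): Promise<number> {
  // Skip the whole cycle while the provider is down; retry next poll.
  if (!(await deps.isProviderHealthy())) return 0;

  const stale = await deps.queryStale(batchSize);
  if (stale.length === 0) return 0; // empty DB or nothing stale

  const results: EmbeddingResult[] = [];
  for (const row of stale) {
    // Sequential fetches, each with its own timeout.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const vector = await deps.fetchEmbedding(row.content, controller.signal);
      results.push({ contentHash: row.contentHash, vector });
    } catch {
      // Failed or timed-out rows stay stale and are picked up again later.
    } finally {
      clearTimeout(timer);
    }
  }

  // Only successful results are written, all in one batch.
  if (results.length > 0) await deps.writeBatch(results);
  return results.length;
}
```

Because the write is an upsert keyed on content_hash, running the same cycle twice over the same rows should leave the table in the same state.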
Configuration
Three parameters in PipelineEmbeddingTrackerConfig:
- enabled (boolean, default true)
- pollMs (default 5000 ms, clamped to 1000–60000 ms)
- batchSize (default 8, clamped to 1–20)
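One way the defaults and clamps could be applied when parsing. The interface fields and ranges come from the list above; the helper clampInt and the function parseTrackerConfig are assumptions for illustration, since the signature of the existing clampPositive helper is not shown in these notes.

```typescript
interface PipelineEmbeddingTrackerConfig {
  enabled: boolean;
  pollMs: number;
  batchSize: number;
}

// Hypothetical helper standing in for the existing clampPositive pattern.
function clampInt(raw: unknown, min: number, max: number, fallback: number): number {
  if (typeof raw !== "number" || !Number.isFinite(raw)) return fallback;
  return Math.min(max, Math.max(min, Math.round(raw)));
}

function parseTrackerConfig(
  raw: Record<string, unknown> = {},
): PipelineEmbeddingTrackerConfig {
  return {
    enabled: typeof raw["enabled"] === "boolean" ? raw["enabled"] : true,
    pollMs: clampInt(raw["pollMs"], 1000, 60000, 5000),
    batchSize: clampInt(raw["batchSize"], 1, 20, 8),
  };
}
```

Missing or non-numeric values fall back to the defaults, and out-of-range values are pulled to the nearest bound rather than rejected.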
Integration Points
- types.ts: Add PipelineEmbeddingTrackerConfig interface to PipelineV2Config
- memory-config.ts: Parse tracker config using existing clampPositive pattern
- embedding-tracker.ts (new, ~200 LOC): Core polling module with graceful shutdown
- daemon.ts: Start tracker after pipeline init, stop before DB cleanup, enhance /api/embeddings/status endpoint
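The daemon.ts ordering (start after pipeline init, stop before DB cleanup) might look roughly like this. All types and function names below are hypothetical; the real daemon, pipeline, and database handles are not described in these notes.

```typescript
// Hypothetical shapes for illustration only.
interface Stoppable { stop(): Promise<void> }
interface Closable { close(): void }

interface DaemonParts {
  initPipeline(): Promise<void>;
  startTracker(): Stoppable;
  db: Closable;
}

// Returns a shutdown function that tears things down in reverse order.
async function runDaemon(parts: DaemonParts): Promise<() => Promise<void>> {
  await parts.initPipeline();           // pipeline must be ready first
  const tracker = parts.startTracker(); // then the tracker starts polling

  return async function shutdown(): Promise<void> {
    await tracker.stop(); // wait for any in-flight cycle to finish
    parts.db.close();     // only then release the database
  };
}
```

Stopping the tracker first matters because an in-flight cycle may still be about to write its batch; closing the DB underneath it would turn a clean shutdown into a failed transaction.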
Edge Cases Handled
Race conditions on concurrent remember/update (idempotent write), provider downtime (retries next cycle), model switching (old embeddings deleted by hash), large backlogs (intentional backpressure at ~100/min), empty DB (returns immediately).
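The ~100/min figure follows from the defaults: 8 embeddings per cycle with a 5000 ms pause between cycles. A small helper (hypothetical, not part of the plan) makes the arithmetic explicit and shows how per-cycle work time lowers the ceiling, since the setTimeout chain adds that time to each period.

```typescript
// Upper bound on embeddings refreshed per minute.
// cycleWorkMs is the time a cycle spends fetching and writing (assumed input).
function maxEmbeddingsPerMinute(
  batchSize: number,
  pollMs: number,
  cycleWorkMs = 0,
): number {
  const cycleMs = pollMs + cycleWorkMs;
  return (batchSize * 60000) / cycleMs;
}

console.log(maxEmbeddingsPerMinute(8, 5000));       // 96
console.log(maxEmbeddingsPerMinute(8, 5000, 1000)); // 80
```

So 96/min is the best case with instant cycles; real throughput sits somewhat below that, which is the intended backpressure for large backlogs.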
Next Steps
Implementation in order: types → memory-config → new embedding-tracker module → daemon wiring. Verification via build, typecheck, lint, then manual testing with stale embeddings.