.agents/memory/2026-03-01-incremental-embedding-refresh-tracker-implementati.md

40 lines
2.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# 2026-03-01 Session Notes
## Incremental Embedding Refresh Tracker Implementation Plan
Received detailed plan for implementing a background polling loop to detect and refresh stale/missing embeddings in the Signet memory pipeline. The tracker runs independently and processes embeddings in small batches to avoid overwhelming the system.
### Architecture Decisions
The tracker uses a **setTimeout chain** instead of setInterval for natural backpressure — each cycle schedules the next after current processing completes, rather than on a fixed timer. This prevents queue buildup if embedding fetches slow down.
### Core Mechanism
Polling loop:
1. Check embedding provider health (uses existing 30s cache)
2. Query stale embeddings: missing embeddings, content hash mismatches, or model drift
3. Fetch embeddings sequentially with 30s timeout per request
4. Batch write successful results in single transaction
5. Idempotent via `ON CONFLICT(content_hash) DO UPDATE`
### Configuration
Three parameters in `PipelineEmbeddingTrackerConfig`:
- `enabled` (boolean, default true)
- `pollMs` (5000ms default, clamped 100060000ms)
- `batchSize` (8 default, clamped 120)
### Integration Points
1. **types.ts** — Add `PipelineEmbeddingTrackerConfig` interface to `PipelineV2Config`
2. **memory-config.ts** — Parse tracker config using existing `clampPositive` pattern
3. **embedding-tracker.ts** (new ~200 LOC) — Core polling module with graceful shutdown
4. **daemon.ts** — Start tracker after pipeline init, stop before DB cleanup, enhance `/api/embeddings/status` endpoint
### Edge Cases Handled
Race conditions on concurrent remember/update (idempotent write), provider downtime (retries next cycle), model switching (old embeddings deleted by hash), large backlogs (intentional backpressure at ~100/min), empty DB (returns immediately).
### Next Steps
Implementation in order: types → memory-config → new embedding-tracker module → daemon wiring. Verification via build, typecheck, lint, then manual testing with stale embeddings.