.agents/memory/2026-03-01-incremental-embedding-refresh-tracker-implementati.md

2026-03-01 Session Notes

Incremental Embedding Refresh Tracker Implementation Plan

Received detailed plan for implementing a background polling loop to detect and refresh stale/missing embeddings in the Signet memory pipeline. The tracker runs independently and processes embeddings in small batches to avoid overwhelming the system.

Architecture Decisions

The tracker uses a setTimeout chain instead of setInterval for natural backpressure — each cycle schedules the next after current processing completes, rather than on a fixed timer. This prevents queue buildup if embedding fetches slow down.
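The setTimeout chain described above could look roughly like the sketch below. It is an illustration only, not the planned module: `startPollingLoop`, `LoopHandle`, and `runCycle` are hypothetical names, and swallowing cycle errors so the loop keeps running is an assumption.

```typescript
// Hypothetical sketch of a setTimeout chain; names are illustrative, not from the plan.
type Cycle = () => Promise<void>;

interface LoopHandle {
  stop(): void;
}

function startPollingLoop(runCycle: Cycle, pollMs: number): LoopHandle {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    if (stopped) return;
    try {
      // The next timer is not armed until this await settles,
      // so two cycles can never overlap.
      await runCycle();
    } catch (err) {
      // Assumption: a failed cycle is logged and retried on the next tick.
      console.error("embedding tracker cycle failed:", err);
    } finally {
      if (!stopped) timer = setTimeout(tick, pollMs);
    }
  };

  timer = setTimeout(tick, pollMs);

  return {
    stop(): void {
      stopped = true;
      if (timer !== undefined) clearTimeout(timer);
    },
  };
}
```

With setInterval, a cycle that takes longer than pollMs would start overlapping with the next one. Here the effective period is pollMs plus the cycle's own duration, which is where the backpressure comes from.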

Core Mechanism

Polling loop:

  1. Check embedding provider health (uses existing 30s cache)
  2. Query stale embeddings: missing embeddings, content hash mismatches, or model drift
  3. Fetch embeddings sequentially with 30s timeout per request
  4. Batch-write successful results in a single transaction
  5. Writes are idempotent via ON CONFLICT(content_hash) DO UPDATE
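One cycle of the loop above might be structured as in the following sketch. All names here (`CycleDeps`, `StaleRow`, `EmbeddingResult`, `runCycle`) are hypothetical, and the data access is injected as plain functions because the notes do not show the real database or provider APIs. The SQL in the comment assumes a SQLite-style table and columns that are not confirmed by the plan.

```typescript
// Hypothetical shapes; none of these names come from the plan.
interface StaleRow {
  contentHash: string;
  content: string;
}

interface EmbeddingResult {
  contentHash: string;
  vector: number[];
}

interface CycleDeps {
  isProviderHealthy(): Promise<boolean>; // step 1 (assumed to be cached upstream)
  findStale(limit: number): Promise<StaleRow[]>; // step 2
  fetchEmbedding(text: string, signal: AbortSignal): Promise<number[]>; // step 3
  // Steps 4 and 5: expected to run one transaction with an upsert such as
  //   INSERT INTO embeddings (content_hash, vector) VALUES (?, ?)
  //   ON CONFLICT(content_hash) DO UPDATE SET vector = excluded.vector
  // (table and column names are assumptions).
  writeBatch(rows: EmbeddingResult[]): Promise<void>;
}

// Returns the number of embeddings written in this cycle.
async function runCycle(
  deps: CycleDeps,
  batchSize: number,
  timeoutMs = 30_000,
): Promise<number> {
  if (!(await deps.isProviderHealthy())) return 0; // provider down: retry next cycle

  const stale = await deps.findStale(batchSize);
  if (stale.length === 0) return 0; // empty DB or nothing stale

  const results: EmbeddingResult[] = [];
  for (const row of stale) {
    // Sequential fetches, each with its own timeout. The timeout only takes
    // effect if fetchEmbedding honors the abort signal.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const vector = await deps.fetchEmbedding(row.content, controller.signal);
      results.push({ contentHash: row.contentHash, vector });
    } catch {
      // Skip this row; it stays stale and is picked up by a later cycle.
    } finally {
      clearTimeout(timer);
    }
  }

  if (results.length > 0) await deps.writeBatch(results);
  return results.length;
}
```

Because failed rows are simply left out of the batch, a partial provider failure still commits the successful results, and the upsert makes a repeated write of the same hash harmless.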

Configuration

Three parameters in PipelineEmbeddingTrackerConfig:

  • enabled (boolean, default true)
  • pollMs (5000ms default, clamped to 1000–60000ms)
  • batchSize (8 default, clamped to 1–20)
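A possible shape for the config and its parsing is sketched below. It assumes the bounds read as 1000 to 60000 ms for pollMs and 1 to 20 for batchSize. `clampInt` and `parseTrackerConfig` are hypothetical stand-ins; the signature of the existing clampPositive helper is not shown in these notes, so it is not reproduced here.

```typescript
// Field names and defaults follow the notes; the helpers are hypothetical.
interface PipelineEmbeddingTrackerConfig {
  enabled: boolean;
  pollMs: number;
  batchSize: number;
}

// Stand-in for the existing clampPositive pattern (real signature unknown).
function clampInt(value: unknown, min: number, max: number, fallback: number): number {
  if (typeof value !== "number" || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.round(value)));
}

function parseTrackerConfig(
  raw: Record<string, unknown> = {},
): PipelineEmbeddingTrackerConfig {
  const enabled = raw["enabled"];
  return {
    enabled: typeof enabled === "boolean" ? enabled : true,
    pollMs: clampInt(raw["pollMs"], 1000, 60000, 5000),
    batchSize: clampInt(raw["batchSize"], 1, 20, 8),
  };
}
```

For example, `parseTrackerConfig({ pollMs: 50, batchSize: 500 })` would yield pollMs 1000 and batchSize 20, and a missing or non-numeric value falls back to the default.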

Integration Points

  1. types.ts — Add PipelineEmbeddingTrackerConfig interface to PipelineV2Config
  2. memory-config.ts — Parse tracker config using existing clampPositive pattern
  3. embedding-tracker.ts (new ~200 LOC) — Core polling module with graceful shutdown
  4. daemon.ts — Start tracker after pipeline init, stop before DB cleanup, enhance /api/embeddings/status endpoint
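The start/stop ordering in daemon.ts could be wired along these lines. This is a sketch of the ordering only: `TrackerHandle`, `DbHandle`, `DaemonParts`, and `startDaemon` are hypothetical, and it assumes the tracker's stop() resolves once any in-flight cycle has finished, so no write is attempted against a closed database.

```typescript
// Hypothetical interfaces; the real daemon.ts types are not shown in the notes.
interface TrackerHandle {
  start(): void;
  stop(): Promise<void>; // assumed to resolve after any in-flight cycle finishes
}

interface DbHandle {
  close(): void;
}

interface DaemonParts {
  initPipeline(): Promise<void>;
  createTracker(): TrackerHandle;
  db: DbHandle;
  trackerEnabled: boolean;
}

// Starts the tracker after pipeline init and returns a shutdown function
// that stops the tracker before the database is closed.
async function startDaemon(parts: DaemonParts): Promise<() => Promise<void>> {
  await parts.initPipeline();

  const tracker = parts.trackerEnabled ? parts.createTracker() : undefined;
  tracker?.start();

  return async function shutdown(): Promise<void> {
    if (tracker) await tracker.stop();
    parts.db.close();
  };
}
```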

Edge Cases Handled

Race conditions on concurrent remember/update (idempotent write), provider downtime (retries next cycle), model switching (old embeddings deleted by hash), large backlogs (intentional backpressure at ~100/min), empty DB (returns immediately).
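The ~100/min figure is consistent with the default settings, on one reading: 8 embeddings per 5000 ms poll gives an upper bound of 96 per minute. The helper below is a hypothetical illustration of that arithmetic, including the slowdown from the time a cycle itself takes under a setTimeout chain.

```typescript
// Hypothetical helper: upper bound on refresh throughput for a setTimeout chain,
// where each period is the poll delay plus the time the cycle itself takes.
function maxEmbeddingsPerMinute(batchSize: number, pollMs: number, cycleMs = 0): number {
  return (batchSize * 60_000) / (pollMs + cycleMs);
}

console.log(maxEmbeddingsPerMinute(8, 5000)); // 96 (defaults, instantaneous cycle)
console.log(maxEmbeddingsPerMinute(8, 5000, 1000)); // 80 (each cycle takes 1s)
```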

Next Steps

Implementation in order: types → memory-config → new embedding-tracker module → daemon wiring. Verification via build, typecheck, lint, then manual testing with stale embeddings.