
Decision caching (LRU / content-hash based) #34

@temp-noob

Description

Why

Every tool call currently goes through the full evaluation pipeline, potentially including an HTTP round-trip to an LLM provider. For agents making 50-200 tool calls per task, many calls are repeated or near-identical (e.g., reading the same set of files).

What

Cache semantic evaluation results to avoid redundant LLM calls.

Acceptance Criteria

  • Cache keyed by content hash of (tool_name, sorted arguments, task_context)
  • LRU cache with configurable max size (default: 256)
  • Configurable TTL (default: 300 seconds)
  • Static checks always run (fast, no caching needed)
  • Only semantic evaluation results are cached
  • Cache hit/miss logged in audit decision
  • Cache stats available via metrics (if #4 "Add OpenTelemetry spans and decision/provider metrics" is implemented)
  • Tests: cache hit, cache miss, TTL expiry, LRU eviction
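
A minimal sketch of what the criteria above could look like, assuming Python. The class and method names (`DecisionCache`, `key`, `get`, `put`) are illustrative, not the project's actual API; the key is a content hash over `(tool_name, sorted arguments, task_context)`, and the LRU/TTL behavior follows the defaults listed above.

```python
import hashlib
import json
import time
from collections import OrderedDict


class DecisionCache:
    """Illustrative LRU + TTL cache for semantic evaluation results."""

    def __init__(self, max_size=256, ttl=300.0):
        self.max_size = max_size
        self.ttl = ttl
        self._entries = OrderedDict()  # key -> (timestamp, decision)

    @staticmethod
    def key(tool_name, arguments, task_context):
        # Content hash of (tool_name, sorted arguments, task_context).
        payload = json.dumps(
            [tool_name, sorted(arguments.items()), task_context],
            sort_keys=True,
            default=str,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None  # cache miss
        ts, decision = entry
        if time.monotonic() - ts > self.ttl:
            del self._entries[key]  # TTL expiry counts as a miss
            return None
        self._entries.move_to_end(key)  # refresh LRU order on hit
        return decision

    def put(self, key, decision):
        self._entries[key] = (time.monotonic(), decision)
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used
```

The cache-hit/miss outcome returned by `get` is what would be recorded in the audit decision; static checks bypass this path entirely.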

Related

Partially addresses #23 (latency budget).
