Latency budget and performance benchmarks are missing #23

@temp-noob

Description

Why

Every tool call goes through: JSON parse → static checks → HTTP round-trip to Ollama/LiteLLM → regex parse → decision. In enforce mode, this adds latency to every single agent action. For a coding agent making 50-200 tool calls per task, even 500ms per call adds 25-100 seconds of overhead.
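The overhead described above is easy to quantify once the pipeline is wrapped in a timer. A minimal benchmark sketch (all names here are illustrative — `evaluate` stands in for whatever function runs the static+semantic path):

```python
import time
import statistics


def percentile(samples, p):
    # Nearest-rank percentile over the sorted latency samples.
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]


def benchmark(evaluate, calls, runs=100):
    """Time `evaluate` over `calls`, `runs` times; report p50/p95/p99 in ms."""
    latencies = []
    for _ in range(runs):
        for call in calls:
            start = time.perf_counter()
            evaluate(call)
            latencies.append((time.perf_counter() - start) * 1000.0)
    return {p: percentile(latencies, p) for p in (50, 95, 99)}
```

Running this against both the static-only and static+semantic paths would produce the p50/p95/p99 numbers asked for below.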

There are no benchmarks, no latency SLOs, no caching of repeated identical calls, and no async/batch evaluation path.
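A content-hash LRU cache for repeated identical calls could look something like this sketch (not the project's API — `evaluate` is a placeholder for the real, slow policy check):

```python
import hashlib
import json
from collections import OrderedDict


class DecisionCache:
    """LRU cache keyed on a content hash of (tool name, arguments)."""

    def __init__(self, evaluate, maxsize=1024):
        self._evaluate = evaluate  # the slow static+semantic check
        self._maxsize = maxsize
        self._store = OrderedDict()
        self.hits = 0

    def decide(self, tool_name, arguments):
        # Canonical JSON keeps semantically identical calls on one key.
        key = hashlib.sha256(
            json.dumps([tool_name, arguments], sort_keys=True).encode()
        ).hexdigest()
        if key in self._store:
            self._store.move_to_end(key)  # LRU touch
            self.hits += 1
            return self._store[key]
        decision = self._evaluate(tool_name, arguments)
        self._store[key] = decision
        if len(self._store) > self._maxsize:
            self._store.popitem(last=False)  # evict least-recently used
        return decision
```

Note the cache is only sound if decisions are deterministic for identical inputs; a TTL would be needed if policy or context can change mid-task.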

Acceptance Criteria

  • Publish p50/p95/p99 latency benchmarks for static-only and static+semantic paths
  • Add decision caching (LRU or content-hash based) for repeated identical tool calls
  • Add async evaluation option for non-blocking advisory mode
  • Document latency budget expectations for enterprise deployments
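For the async advisory path, one possible shape: let the tool call proceed immediately and deliver the verdict out-of-band when it arrives. A sketch under assumed names (`execute`, `evaluate`, and `on_verdict` are all hypothetical):

```python
import asyncio


async def advisory_check(evaluate, tool_name, arguments, on_verdict):
    # Run the slow evaluation off the critical path, in a worker thread.
    verdict = await asyncio.to_thread(evaluate, tool_name, arguments)
    on_verdict(tool_name, verdict)


async def run_tool(execute, evaluate, tool_name, arguments, on_verdict):
    # Start the advisory check, then return the tool result without
    # blocking on it; enforce mode would await the verdict first instead.
    task = asyncio.create_task(
        advisory_check(evaluate, tool_name, arguments, on_verdict)
    )
    result = execute(tool_name, arguments)
    await task  # in advisory mode this could be deferred or batched
    return result
```

This keeps advisory mode's perceived latency near zero while still surfacing every verdict, at the cost of warnings landing after the action has run.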

Enterprise impact

Developer experience is the #1 adoption killer. If IntentGuard makes agents noticeably slower, teams will disable it.
