Why
Every tool call goes through: JSON parse → static checks → HTTP round-trip to Ollama/LiteLLM → regex parse → decision. In enforce mode, this adds latency to every single agent action. For a coding agent making 50-200 tool calls per task, even 500ms per call adds 25-100 seconds of overhead.
There are no benchmarks, no latency SLOs, no caching of repeated identical calls, and no async/batch evaluation path.
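The per-call path described above can be sketched roughly as follows. This is an illustrative reconstruction, not IntentGuard's actual code: the function names (`static_checks`, `query_llm`, `evaluate`) and the stubbed LLM response are assumptions, with the blocking HTTP round-trip replaced by a placeholder.

```python
import json
import re

def static_checks(call: dict) -> bool:
    # Stand-in for the fast, local static checks (names are hypothetical).
    return "rm -rf" not in json.dumps(call)

def query_llm(call: dict) -> str:
    # Placeholder for the blocking HTTP round-trip to Ollama/LiteLLM --
    # in enforce mode this is where most of the per-call latency accrues.
    return "allow"

def evaluate(tool_call_json: str) -> str:
    call = json.loads(tool_call_json)        # 1. JSON parse
    if not static_checks(call):              # 2. static checks
        return "deny"
    verdict_text = query_llm(call)           # 3. HTTP round-trip (blocking)
    match = re.search(r"\b(allow|deny)\b", verdict_text, re.IGNORECASE)
    return match.group(1).lower() if match else "deny"  # 4-5. regex parse -> decision
```

Every step runs synchronously on the agent's critical path, which is why the round-trip cost multiplies directly with the number of tool calls.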
Acceptance Criteria
- Publish p50/p95/p99 latency benchmarks for static-only and static+semantic paths
- Add decision caching (LRU or content-hash based) for repeated identical tool calls
- Add async evaluation option for non-blocking advisory mode
- Document latency budget expectations for enterprise deployments
Enterprise impact
Poor developer experience is the #1 adoption killer. If IntentGuard makes agents noticeably slower, teams will disable it.