diff --git a/blog/openclaw-budget-guard-five-dollar-agent.md b/blog/openclaw-budget-guard-five-dollar-agent.md new file mode 100644 index 0000000..097ed6b --- /dev/null +++ b/blog/openclaw-budget-guard-five-dollar-agent.md @@ -0,0 +1,186 @@ +--- +title: "We Gave Our OpenClaw Agent a $5 Budget and Watched It Adapt" +date: 2026-03-28 +author: Albert Mavashev +tags: [openclaw, budgets, agents, graceful-degradation, model-downgrade, production, cost-control, ai-agent-cost, llm-cost-management] +description: "A representative OpenClaw research session that would have cost $12 is constrained to a $5 Cycles budget. The agent downgrades models, disables expensive tools, self-regulates via prompt hints, and finishes for $4.85." +blog: true +sidebar: false +--- + +# We Gave Our OpenClaw Agent a $5 Budget and Watched It Adapt + +Too many AI agent cost controls are kill switches. Budget runs out, agent dies mid-task, user gets nothing. [Cycles](https://runcycles.io) does something different: it makes the agent *adapt*. + +A research agent running on OpenClaw picks up a complex competitive analysis. It starts with Claude Opus to draft the report, calls web search to find market data, runs code execution to build charts, and iterates. Normal sessions cost $2–4. This one is harder — it needs 3x the usual tool calls. + +Without budget enforcement, the session would have cost roughly $12. The agent doesn't know or care. It calls whatever model and tool the task needs, and the bill arrives later. + +We set a $5 budget using the [`cycles-openclaw-budget-guard`](https://github.com/runcycles/cycles-openclaw-budget-guard) plugin and let it run. It didn't stop. It *adapted*. + +When the session crossed the $1.50 low-budget threshold, the plugin downgraded from Opus to Sonnet. As budget tightened further, it blocked expensive tools like code execution and injected budget hints into the system prompt. The model responded by writing shorter outputs and skipping optional searches. 
The task finished with $0.15 remaining — $4.85 total instead of $12. + +That's the difference between a kill switch and [runtime authority](/blog/what-is-runtime-authority-for-ai-agents). + +> **TL;DR:** Install the plugin, set a budget, and your OpenClaw agent automatically downgrades models, disables expensive tools, and self-regulates when budget gets tight — instead of crashing. + +*Note: The session described below is a representative walkthrough based on real plugin behavior with realistic cost estimates. The numbers, logs, and config are all producible with the plugin — we've simplified the narrative for clarity, but nothing is fabricated.* + + + +## What the logs looked like + +Here's the plugin output from that session, at info level — no debug mode needed: + +``` +Cycles Budget Guard for OpenClaw v0.7.5 + tenant: research-team + defaultModelName: claude-opus-4-20250514 + failClosed: true + lowBudgetThreshold: 150000000 + +Model reserved: claude-opus-4-20250514 (estimate=15000000, remaining=500000000) +Model committed: claude-opus-4-20250514 (cost=15000000 USD_MICROCENTS) +Tool reserved: web_search (estimate=5000000, remaining=485000000) +Tool committed: web_search (cost=5000000 USD_MICROCENTS) +Model reserved: claude-opus-4-20250514 (estimate=15000000, remaining=480000000) +Model committed: claude-opus-4-20250514 (cost=15000000 USD_MICROCENTS) +... +Budget level changed: healthy → low (remaining=150000000) +Budget low — downgrading model claude-opus-4-20250514 → claude-sonnet-4-20250514 +Model reserved: claude-sonnet-4-20250514 (estimate=3000000, remaining=147000000) +... +Tool "code_execution" blocked: cost 10000000 exceeds expensive threshold 5000000 +... +Model committed: claude-sonnet-4-20250514 (cost=3000000 USD_MICROCENTS) +Agent session budget summary: remaining=15000000 spent=485000000 reservations=34 +``` + +Every reservation, commit, downgrade, and block is visible. No digging through provider dashboards. 
This is what AI agent cost management looks like when it's built into the execution lifecycle — not bolted on after the fact. + +## What the agent saw + +When budget first crossed the `lowBudgetThreshold` ($1.50), the plugin triggered model downgrade and tool blocking. Later in the same session, with only 7% of budget remaining, the plugin injected this into the system prompt: + +``` +Budget: 35000000 USD_MICROCENTS remaining. Budget is low — prefer cheaper models +and avoid expensive tools. 7% of budget remaining. Est. ~11 tool calls and +~3 model calls remaining at current rate. Limit responses to 1024 tokens. +``` + +The model responded to this signal by reducing optional web searches, writing tighter prose, and skipping the summary paragraph it usually generates. We did not hardcode any task-specific fallback behavior — the model adapted to the budget constraint on its own, like it adapts to other system prompt instructions. + +This is the part that surprises most teams: **budget-aware agents tend to be more disciplined and less wasteful.** When the model knows resources are limited, it focuses. Fewer tangents, less padding, more direct answers. The prompt hint turns a blunt cost limit into a soft constraint the model can reason about. + +## What the session summary told us + +```json +{ + "remaining": 15000000, + "spent": 485000000, + "costBreakdown": { + "model:claude-opus-4-20250514": { "count": 8, "totalCost": 120000000 }, + "model:claude-sonnet-4-20250514": { "count": 14, "totalCost": 42000000 }, + "tool:web_search": { "count": 9, "totalCost": 45000000 }, + "tool:code_execution": { "count": 3, "totalCost": 30000000 } + }, + "unconfiguredTools": [ + { "name": "read_file", "callCount": 4, "estimatedTotalCost": 4000000 } + ] +} +``` + +Three things jumped out: + +1. **Opus cost $1.20 for 8 calls. Sonnet cost $0.42 for 14 calls.** Sonnet handled nearly twice as many calls for a third of the cost. 
In our testing, output quality was comparable for this type of task. + +2. **Code execution was blocked after 3 calls.** Each call cost $0.10. The `disable_expensive_tools` strategy kicked in at low budget. The agent compensated by describing the analysis in text instead of generating charts. + +3. **`read_file` was unconfigured.** The session summary flagged it — 4 calls using the default estimate. Now we know to add it to `toolBaseCosts`. + +## Three patterns we observed + +Running this config across multiple test sessions, three patterns emerged that changed how we think about LLM cost management. + +### Model downgrade is usually invisible + +Sonnet's output quality for research and analysis tasks is comparable to Opus in most cases. In our test sessions, the downgraded outputs were difficult to distinguish from the Opus-generated ones. The 5x cost reduction was measurable; the quality difference was hard to detect. + +The key is configuring the fallback chain correctly. `"claude-opus-4-20250514": ["claude-sonnet-4-20250514", "claude-haiku-4-5-20251001"]` gives the plugin two steps to try. It picks the cheapest model that fits within the remaining budget. + +### Tool limits catch more bugs than budget limits + +A `toolCallLimits: { "web_search": 20 }` caught a search loop that budget enforcement alone would have allowed to continue. Each search cost $0.05 — cheap individually, but 200 of them would have burned $10 on a single tool. The limit fired at call #21 and the agent adapted by working with the data it already had. + +### The session summary is your tuning guide + +Every session produces a cost breakdown. After a few days, patterns are obvious: which tools are overpriced in your estimates, which models are being downgraded too aggressively, which tools need explicit `toolCallLimits`. The `unconfiguredTools` list is a concrete TODO — no guessing about what to configure next. 
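
As a minimal sketch of how a session summary could be turned into a tuning checklist, the TypeScript below ranks `costBreakdown` entries by total cost and turns each `unconfiguredTools` entry into a TODO. The `SessionSummary` type and the `tuningReport` helper are hypothetical names modeled on the JSON shown above, not part of the plugin's API. The conversion factor (100,000,000 `USD_MICROCENTS` per dollar) is inferred from this post's numbers, where `150000000` corresponds to $1.50.

```typescript
// Hypothetical types modeled on the session summary JSON shown in this post.
interface CostEntry { count: number; totalCost: number }
interface UnconfiguredTool { name: string; callCount: number; estimatedTotalCost: number }
interface SessionSummary {
  remaining: number;
  spent: number;
  costBreakdown: Record<string, CostEntry>;
  unconfiguredTools: UnconfiguredTool[];
}

// Assumption: 1 USD = 100,000,000 USD_MICROCENTS (inferred from 150000000 == $1.50).
const MICROCENTS_PER_USD = 100_000_000;

// Format a microcent amount as a dollar string with two decimals.
function toUsd(microcents: number): string {
  return (microcents / MICROCENTS_PER_USD).toFixed(2);
}

// Build a human-readable report: most expensive entries first, then TODOs.
function tuningReport(summary: SessionSummary): string[] {
  const lines: string[] = [];
  const entries = Object.entries(summary.costBreakdown)
    .sort(([, a], [, b]) => b.totalCost - a.totalCost);
  for (const [key, entry] of entries) {
    const perCall = entry.count > 0 ? entry.totalCost / entry.count : 0;
    lines.push(
      `${key}: ${entry.count} calls, $${toUsd(entry.totalCost)} total, $${toUsd(perCall)} per call`
    );
  }
  for (const tool of summary.unconfiguredTools) {
    lines.push(
      `TODO: add "${tool.name}" to toolBaseCosts (${tool.callCount} calls at default estimate)`
    );
  }
  return lines;
}
```

Fed the summary from this session, the first line of the report would be the Opus entry at $1.20 total and $0.15 per call, followed by the cheaper entries and a TODO for `read_file`.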
+ +## What we'd change + +Three things we learned the hard way: + +**Enable `enableEventLog` from day one.** When a session behaves unexpectedly, the event log tells you exactly what happened — which tools were blocked, when models were downgraded, why a reservation was denied. Without it, you're reading tea leaves from the session summary. + +**Model costs are estimates.** The plugin reserves a fixed amount per Opus call regardless of how many tokens are actually used. A short response costs the same as a long one. The `modelCostEstimator` callback can improve this if you have a proxy that tracks token usage, but out of the box, expect ±20% variance. + +**OpenClaw doesn't pass the model name in hook events.** We had to add `defaultModelName` to the config because the `before_model_resolve` event only contains `{ prompt }`. We've filed a [feature request](https://github.com/openclaw/openclaw/issues/55771) — until it's resolved, set `defaultModelName` to your agent's model. + +## The config that made it work + +```json +{ + "plugins": { + "entries": { + "openclaw-budget-guard": { + "config": { + "cyclesBaseUrl": "${CYCLES_BASE_URL}", + "cyclesApiKey": "${CYCLES_API_KEY}", + "tenant": "research-team", + "defaultModelName": "claude-opus-4-20250514", + "modelFallbacks": { + "claude-opus-4-20250514": ["claude-sonnet-4-20250514", "claude-haiku-4-5-20251001"] + }, + "modelBaseCosts": { + "claude-opus-4-20250514": 15000000, + "claude-sonnet-4-20250514": 3000000, + "claude-haiku-4-5-20251001": 1000000 + }, + "toolBaseCosts": { + "web_search": 5000000, + "code_execution": 10000000, + "read_file": 1000000 + }, + "toolCallLimits": { + "web_search": 20, + "code_execution": 10 + }, + "lowBudgetStrategies": ["downgrade_model", "reduce_max_tokens", "disable_expensive_tools"], + "maxTokensWhenLow": 1024, + "expensiveToolThreshold": 5000000, + "lowBudgetThreshold": 150000000, + "failClosed": true + } + } + } + } +} +``` + +> **New to Cycles?** [Cycles](https://runcycles.io) is an 
open-source runtime authority system for AI agents. It enforces budgets, action limits, and resource boundaries — before execution, not after. The [`cycles-openclaw-budget-guard`](https://github.com/runcycles/cycles-openclaw-budget-guard) plugin brings Cycles to OpenClaw without changing agent logic. See [What is Cycles?](/quickstart/what-is-cycles) to learn more. + +## Try it + +```bash +openclaw plugins install @runcycles/openclaw-budget-guard +``` + +Start with [dry-run mode](/how-to/integrating-cycles-with-openclaw#try-it-without-a-server) to see degradation without a Cycles server. Then [deploy the full stack](/quickstart/deploying-the-full-cycles-stack) and watch your agent adapt instead of crash. Full documentation: [Integrating Cycles with OpenClaw](/how-to/integrating-cycles-with-openclaw). Source: [github.com/runcycles/cycles-openclaw-budget-guard](https://github.com/runcycles/cycles-openclaw-budget-guard). + +## Related reading + +- [Your OpenClaw Agent Has No Spending Limit — Here's How to Fix That](/blog/openclaw-budget-guard-stop-agents-burning-money) — the first post in this series, covering the five problems the plugin solves +- [Your AI Agent Just Burned $6 in 30 Seconds](/blog/runaway-demo-agent-cost-blowup-walkthrough) — step-by-step walkthrough of a runaway agent demo with Cycles +- [AI Agent Budget Control: Enforce Hard Spend Limits](/blog/ai-agent-budget-control-enforce-hard-spend-limits) — why cost control must happen before execution +- [Degradation Paths in Cycles](/how-to/how-to-think-about-degradation-paths-in-cycles-deny-downgrade-disable-or-defer) — deny, downgrade, disable, or defer +- [How Much Do AI Agents Cost?](/blog/how-much-do-ai-agents-cost) — the economics of agent execution diff --git a/blog/openclaw-plugin-lessons-learned.md b/blog/openclaw-plugin-lessons-learned.md new file mode 100644 index 0000000..c94c8d2 --- /dev/null +++ b/blog/openclaw-plugin-lessons-learned.md @@ -0,0 +1,175 @@ +--- +title: "Five Lessons from Building 
a Production OpenClaw Plugin" +date: 2026-03-28 +author: Albert Mavashev +tags: [openclaw, plugins, engineering, hooks, workarounds, developer-experience, production, openclaw-plugin-development] +description: "We built a budget enforcement plugin for OpenClaw and hit five undocumented behaviors — including the discovery that you can't actually block a model call. Here are the workarounds we shipped and the feature requests we filed." +blog: true +sidebar: false +--- + +# Five Lessons from Building a Production OpenClaw Plugin + +We built a non-trivial [budget enforcement plugin](https://github.com/runcycles/cycles-openclaw-budget-guard) for OpenClaw and ran into several behaviors that were not obvious from the public plugin surface: missing model metadata, no clean way to block model calls, install-time config validation traps, and a security-scanner false positive. The most surprising discovery: OpenClaw's `before_model_resolve` hook has no way to prevent a model call — we had to redirect to a fake model name to force a provider-side rejection. + +This post is a practical writeup of the five issues that mattered most, the workarounds we shipped, and the feature requests we filed. + +*None of this is a complaint about OpenClaw. The platform is well-designed and the hook lifecycle is the right abstraction. These are field notes from building a production plugin, shared so other developers don't have to rediscover the same things.* + + + +## Lesson 1: The model name isn't in the model resolve event + +The `before_model_resolve` hook is called before the LLM provider is invoked. You'd expect the event to include which model is being resolved. It doesn't. + +```typescript +// What we expected +interface BeforeModelResolveEvent { + model: string; + prompt: string; +} + +// What OpenClaw actually passes +interface BeforeModelResolveEvent { + prompt: string; // that's it +} +``` + +We discovered this by logging `Object.keys(event)` — which returned `["prompt"]`. 
No `model`, `modelId`, `modelName`, `model_id`, or any variant. + +**Why it matters:** Our plugin needs the model name to look up per-model cost estimates, apply fallback chains (Opus → Sonnet → Haiku), and track per-model spend in the session summary. Without it, budget enforcement for models is blind. + +**Workaround:** We added a `defaultModelName` config property and a multi-source auto-detection chain that checks `api.config`, `api.pluginConfig`, and several nested paths: + +```typescript +const eventModel = event.model + ?? (event as Record<string, unknown>).modelId + ?? (event as Record<string, unknown>).modelName + ?? (ctx.metadata as Record<string, unknown>)?.model + ?? config.defaultModelName; +``` + +If none of those resolve, the plugin logs the available keys at info level so operators can configure `defaultModelName`: + +``` +before_model_resolve: cannot determine model name. +Event keys: [prompt]. Metadata keys: []. +Set defaultModelName in plugin config. +``` + +**Feature request:** [openclaw/openclaw#55771](https://github.com/openclaw/openclaw/issues/55771) asks OpenClaw to include `model` and `provider` in the `before_model_resolve` event. + +## Lesson 2: You can't cleanly block a model call + +OpenClaw's `before_tool_call` hook has clean blocking semantics: + +```typescript +// Tool hooks support this; it works perfectly +return { block: true, blockReason: "Budget exhausted" }; +``` + +The `before_model_resolve` hook has no equivalent. The return type only supports `{ modelOverride?, providerOverride? }`. There is no `block` field and no `shouldStop` policy in the hook runner. + +When our plugin throws `BudgetExhaustedError`, OpenClaw catches it (the default `catchErrors: true` behavior), logs "handler failed," and proceeds with the model call. The agent gets a response. Budget enforcement is bypassed. + +**Workaround:** We redirect to a non-existent model. 
When budget is exhausted, the plugin returns: + +```typescript +return { modelOverride: "__cycles_budget_exhausted__" }; +``` + +OpenClaw passes this to the LLM provider, which rejects it (`model not found`). The provider rejects the call before generation, so the agent produces no response. The user sees: + +``` +⚠ Agent failed before reply: Unknown model: openai/__cycles_budget_exhausted__ +``` + +Not pretty, but the budget is enforced. The model call costs nothing because the provider never executes it. + +**Feature request:** We've asked for `block` support in `before_model_resolve`, matching the `before_tool_call` pattern. + +## Lesson 3: Your plugin initializes multiple times + +A smaller but confusing runtime behavior: OpenClaw calls the plugin's default export once per internal channel or worker — typically 4–5 times on startup. Each instance gets its own isolated state, which is correct for concurrency. But our startup banner printed 5 times and it looked broken. + +**Workaround:** A module-level `startupBannerShown` flag shows the full config banner once; subsequent inits get a one-liner with a sequential instance counter: `Cycles Budget Guard initialized (tenant=cyclist, dryRun=false, instance=3)`. + +## Lesson 4: process.env triggers a security warning + +OpenClaw's plugin installer scans the bundled `dist/index.js` for dangerous code patterns. Our plugin read `process.env.CYCLES_API_KEY` as a config fallback, and the same bundle contained `fetch()` calls for webhook delivery and OTLP metrics. + +The scanner flagged this combination: + +``` +WARNING: Plugin "openclaw-budget-guard" contains dangerous code patterns: +Environment variable access combined with network send — possible +credential harvesting +``` + +This is a false positive — we read the API key to authenticate with the Cycles server, not to exfiltrate it. But users see "dangerous code patterns" during `openclaw plugins install` and understandably hesitate. 
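
To see why this combination trips a scanner, consider a naive co-occurrence heuristic. The sketch below is purely illustrative: the regular expressions and the `flagsCredentialHarvesting` name are assumptions, and OpenClaw's actual scanner may work quite differently. It only shows how a bundle that reads an environment variable and also calls `fetch()` anywhere can be flagged regardless of intent.

```typescript
// Illustrative only: a naive co-occurrence check, NOT OpenClaw's real scanner.
const ENV_ACCESS = /\bprocess\.env\b/;   // any read of process.env
const NETWORK_SEND = /\bfetch\s*\(/;     // any call to fetch(...)

// Returns true when both patterns appear anywhere in the bundle source,
// without checking whether the env value ever reaches the network call.
function flagsCredentialHarvesting(bundleSource: string): boolean {
  return ENV_ACCESS.test(bundleSource) && NETWORK_SEND.test(bundleSource);
}
```

A check like this has no notion of data flow, which is why a legitimate API-key read and an unrelated webhook call in the same bundle can look identical to exfiltration.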
+ +**Workaround:** We removed all `process.env` access from the plugin. Both `cyclesBaseUrl` and `cyclesApiKey` are now required in the plugin config. For secrets management, we document OpenClaw's built-in env var interpolation: + +```json +{ + "cyclesBaseUrl": "${CYCLES_BASE_URL}", + "cyclesApiKey": "${CYCLES_API_KEY}" +} +``` + +OpenClaw resolves `${...}` before passing config to the plugin, so the env var access happens in OpenClaw's trusted code, not in the scanned plugin bundle. + +Verification: `grep -c process.env dist/index.js` returns `0`. + +## Lesson 5: The plugin contract has undocumented rules + +Several behaviors of the OpenClaw plugin system are not documented but are critical to get right: + +**`api.pluginConfig` vs `api.config`:** Your plugin config is on `api.pluginConfig` (from `plugins.entries.<id>.config` in `openclaw.json`). We initially read `api.config`, which is the *full system config*, and couldn't figure out why our settings were always undefined. + +**Manifest `id` derivation:** The `id` field in `openclaw.plugin.json` must match what OpenClaw derives from the npm package name. For `@runcycles/openclaw-budget-guard`, OpenClaw strips the scope and gets `openclaw-budget-guard`. Our manifest originally said `cycles-openclaw-budget-guard`, which produced a mismatch warning on every load. + +**Config validation timing:** If your `configSchema` includes `required` fields, OpenClaw validates during `openclaw plugins install`, before the user has written any config. We had `required: ["tenant"]`, which crashed the install. Fix: remove `required` from the schema and validate at runtime in your `resolveConfig()`. + +**Install-time loading:** OpenClaw loads and executes the plugin during install to inspect it. If your plugin throws on missing config, the install fails with a confusing error. 
Wrap your initialization in try/catch and log a friendly message: + +```typescript +try { + config = resolveConfig(raw); +} catch (err) { + api.logger.warn(`[openclaw-budget-guard] Skipping registration: ${err.message}`); + return; +} +``` + +## What OpenClaw gets right + +This post focuses on rough edges, but the foundation is solid: + +- **The 5-hook lifecycle is well-designed.** `before_model_resolve` → `before_prompt_build` → `before_tool_call` → `after_tool_call` → `agent_end` covers the full agent execution lifecycle. You can build meaningful enforcement without modifying agent code. +- **`before_tool_call` blocking is clean.** `{ block: true, blockReason }` with `shouldStop` is exactly the right pattern. We just want the same for model calls. +- **Plugin isolation per channel is correct.** Each channel gets its own plugin instance with its own state. No shared-state bugs across concurrent sessions. +- **`api.logger` integration works well.** Plugin log output appears in OpenClaw's log stream with proper prefixes and levels. +- **The install/enable flow is simple.** `openclaw plugins install` + `openclaw plugins enable` — two commands and you're running. + +## What we'd like to see + +These are filed or planned feature requests: + +1. **`block` support in `before_model_resolve`** — same pattern as `before_tool_call` +2. **Model name in `before_model_resolve` event** — `event.model` and `event.provider` ([#55771](https://github.com/openclaw/openclaw/issues/55771)) +3. **`after_model_call` hook** — with `tokensInput`, `tokensOutput`, `latencyMs` for actual cost tracking +4. **Channel/worker ID on the `api` object** — so plugins can differentiate instances in logs +5. 
**Plugin contract documentation** — `api.pluginConfig` vs `api.config`, manifest `id` rules, config validation timing, install-time behavior + +## Build your own + +If you're building an OpenClaw plugin, start with our source as a reference: [github.com/runcycles/cycles-openclaw-budget-guard](https://github.com/runcycles/cycles-openclaw-budget-guard). The patterns for config resolution, hook registration, state management, and error handling are all used in our released plugin. + +Full integration guide: [Integrating Cycles with OpenClaw](/how-to/integrating-cycles-with-openclaw) + +## Related reading + +- [We Gave Our OpenClaw Agent a $5 Budget and Watched It Adapt](/blog/openclaw-budget-guard-five-dollar-agent) — what graceful degradation looks like in practice +- [Your OpenClaw Agent Has No Spending Limit](/blog/openclaw-budget-guard-stop-agents-burning-money) — the five problems the plugin solves +- [Action Authority: Controlling What Agents Do](/concepts/action-authority-controlling-what-agents-do) — why cost limits alone aren't enough