Skip to content

fix(circuit-breaker): preserve recovery timeout when recordFailure called in OPEN state#470

Open
nikolasdehor wants to merge 1 commit intoSynkraAI:mainfrom
nikolasdehor:fix/circuit-breaker-open-state-timeout
Open

fix(circuit-breaker): preserve recovery timeout when recordFailure called in OPEN state#470
nikolasdehor wants to merge 1 commit intoSynkraAI:mainfrom
nikolasdehor:fix/circuit-breaker-open-state-timeout

Conversation

@nikolasdehor
Copy link
Contributor

@nikolasdehor nikolasdehor commented Feb 23, 2026

Resumo

  • Corrige recordFailure() no CircuitBreaker que sobrescrevia o timeout de recuperação quando chamado no estado OPEN
  • Antes: cada falha no estado OPEN reiniciava recoveryTimeout, impedindo que o circuito transitasse para HALF_OPEN
  • Depois: falhas no estado OPEN são contadas mas o timeout original é preservado

Plano de teste

  • Testes passam localmente
  • Teste de regressão confirma que timeout não é reiniciado em estado OPEN

Copilot AI review requested due to automatic review settings February 23, 2026 01:09
@vercel
Copy link

vercel bot commented Feb 23, 2026

@nikolasdehor is attempting to deploy a commit to the Pedro Valério Lopez's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link

coderabbitai bot commented Feb 23, 2026

Walkthrough

Fixes a bug in the CircuitBreaker module's recordFailure() method to prevent the recovery timeout from being reset while the circuit is in OPEN state, ensuring proper state transitions. Includes a comprehensive test suite validating the state machine behavior and edge cases.

Changes

Cohort / File(s) Summary
CircuitBreaker Bug Fix
.aios-core/core/ids/circuit-breaker.js
Modified recordFailure() to only update _lastFailureTime during CLOSED and HALF_OPEN states, preventing recovery timeout reset while OPEN. Addresses regression where failures during OPEN state indefinitely delayed transition to HALF_OPEN.
CircuitBreaker Test Suite
tests/core/ids/circuit-breaker.test.js
Added comprehensive Jest test suite covering full state machine cycles (CLOSED → OPEN → HALF_OPEN → CLOSED), default and custom thresholds, state transitions, probe behavior in HALF_OPEN, timeout preservation in OPEN state, and edge cases including regression #469.
Manifest Updates
.aios-core/install-manifest.yaml
Updated generated timestamp and file integrity hashes reflecting content changes across the module.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

core, tests, ci/cd

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the specific fix: preserving recovery timeout when recordFailure is called in OPEN state, matching the core bug being addressed.
Linked Issues check ✅ Passed Code changes implement all objectives from #469: preventing _lastFailureTime update in STATE_OPEN, updating it only during transitions, making recordFailure a no-op in STATE_OPEN, and comprehensive tests verify the fix.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the circuit-breaker bug: modifications to recordFailure logic, comprehensive test coverage, and manifest hash updates reflecting the code changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug in the CircuitBreaker implementation where calling recordFailure() while the circuit is in the OPEN state would indefinitely reset the recovery timeout, preventing the circuit from ever transitioning to HALF_OPEN. The fix moves the _lastFailureTime update from the top of recordFailure() into the state-specific branches (CLOSED and HALF_OPEN only), making recordFailure() a no-op when the circuit is OPEN.

Changes:

  • Fixed recordFailure() to preserve the recovery timeout when called in OPEN state
  • Added comprehensive unit tests for CircuitBreaker with specific regression test for bug #469
  • Updated install manifest with file hashes and metadata

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
.aios-core/core/ids/circuit-breaker.js Fixed recordFailure() to only update _lastFailureTime and _failureCount in CLOSED and HALF_OPEN states, making it a no-op in OPEN state
tests/core/ids/circuit-breaker.test.js Added comprehensive test suite covering all states, transitions, and edge cases including regression test for issue #469
.aios-core/install-manifest.yaml Updated auto-generated manifest with new hashes and timestamps

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

#
version: 4.2.13
generated_at: "2026-02-17T12:48:43.195Z"
generated_at: "2026-02-23T01:09:10.561Z"
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manifest shows size changes for many files that don't appear to be part of this PR (e.g., multiple development/scripts files, templates, etc.). While the manifest is auto-generated, the PR description and changes only mention circuit-breaker.js and its tests. This suggests either:

  1. The manifest generation included unrelated changes from the working directory
  2. There are unpublished formatting/whitespace changes across many files
  3. The manifest generation logic has changed

Please verify that only the intended files (circuit-breaker.js and its test file) were modified in this PR, and regenerate the manifest if needed to ensure it only reflects the changes in this PR.

Copilot uses AI. Check for mistakes.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/core/ids/circuit-breaker.test.js (1)

218-228: Consider adding a _failureCount assertion for completeness.

The recordFailure in HALF_OPEN now increments _failureCount (line 101 of the source). The test at lines 223–228 checks _successCount and totalTrips, but doesn't verify _failureCount was incremented. A minor addition for full coverage of the new behavior:

💡 Suggested addition
     test('recordFailure in HALF_OPEN resets successCount and increments totalTrips', () => {
       const tripsBefore = cb.getStats().totalTrips;
+      const failuresBefore = cb.getStats().failureCount;
       cb.recordFailure();
       expect(cb._successCount).toBe(0);
       expect(cb.getStats().totalTrips).toBe(tripsBefore + 1);
+      expect(cb.getStats().failureCount).toBe(failuresBefore + 1);
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/ids/circuit-breaker.test.js` around lines 218 - 228, Add an
assertion to the existing test "recordFailure in HALF_OPEN resets successCount
and increments totalTrips" to also verify that cb._failureCount increments by
one after calling recordFailure(); locate the test block using the test name or
the cb.recordFailure() call and add an expectation that cb._failureCount equals
its previous value plus one (capture previous value before calling
recordFailure()), keeping the existing assertions for cb._successCount and
cb.getStats().totalTrips.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/core/ids/circuit-breaker.test.js`:
- Around line 218-228: Add an assertion to the existing test "recordFailure in
HALF_OPEN resets successCount and increments totalTrips" to also verify that
cb._failureCount increments by one after calling recordFailure(); locate the
test block using the test name or the cb.recordFailure() call and add an
expectation that cb._failureCount equals its previous value plus one (capture
previous value before calling recordFailure()), keeping the existing assertions
for cb._successCount and cb.getStats().totalTrips.

coderabbitai[bot]
coderabbitai bot previously approved these changes Feb 23, 2026
@nikolasdehor
Copy link
Contributor Author

@Pedrovaleriolopez @oalanicolas, esse fix do circuit-breaker (issue #469) tá aberto desde 23/fev. São quase 3 semanas. Estamos entre os contribuidores mais ativos do projeto — 24 PRs abertos, reportamos dezenas de issues, criamos módulos inteiros. Gostaríamos de um retorno, mesmo que seja pra pedir mudanças.

…lled in OPEN state

When the circuit was OPEN, recordFailure() unconditionally updated
_lastFailureTime, resetting the recovery timeout on every call. This
prevented the circuit from ever transitioning to HALF_OPEN in systems
that continued to record failures while the circuit was open.

Fix: only update _lastFailureTime when transitioning to OPEN (from
CLOSED or HALF_OPEN states). When already OPEN, recordFailure is a no-op.

Closes SynkraAI#469
@nikolasdehor nikolasdehor force-pushed the fix/circuit-breaker-open-state-timeout branch from 02042b2 to 8139235 Compare March 13, 2026 00:48
@nikolasdehor
Copy link
Contributor Author

Rebased on latest main and resolved the install-manifest.yaml conflict. Fixed an incorrect module path in the test file as well. All 29 tests pass. This PR fixes #469 — when recordFailure() was called while the circuit breaker was already in OPEN state, it was resetting the recovery timeout, which meant the breaker could never transition to HALF_OPEN for probe attempts. Ready for review.

@github-actions github-actions bot added area: agents Agent system related area: workflows Workflow system related squad mcp type: test Test coverage and quality area: core Core framework (.aios-core/core/) area: installer Installer and setup (packages/installer/) area: synapse SYNAPSE context engine area: cli CLI tools (bin/, packages/aios-pro-cli/) area: pro Pro features (pro/) area: health-check Health check system area: docs Documentation (docs/) labels Mar 13, 2026
@github-actions github-actions bot added the area: devops CI/CD, GitHub Actions (.github/) label Mar 13, 2026
@codecov
Copy link

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@nikolasdehor
Copy link
Contributor Author

@Pedrovaleriolopez, solicitando review deste PR. Aberto desde 23/fev, corrige a issue #469 (CircuitBreaker resetando lastFailureTime em estado OPEN, impedindo transição para HALF_OPEN sob carga).

O PR duplicado #584 (aberto 11/mar) inclui a mesma correção. Este PR está MERGEABLE e pronto para merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: agents Agent system related area: cli CLI tools (bin/, packages/aios-pro-cli/) area: core Core framework (.aios-core/core/) area: devops CI/CD, GitHub Actions (.github/) area: docs Documentation (docs/) area: health-check Health check system area: installer Installer and setup (packages/installer/) area: pro Pro features (pro/) area: synapse SYNAPSE context engine area: workflows Workflow system related mcp squad type: test Test coverage and quality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants