feat(resilience): Agent Immortality Protocol — agentes que nunca morrem (#568) by nikolasdehor · Pull Request #590 · SynkraAI/aiox-core

nikolasdehor · 2026-03-12T18:33:27Z

Summary

Heartbeat monitoring with failure detection
State snapshots for recovery
Crash detection with automatic revival
Behavioral fingerprint and health score
Cascade protection to prevent chained failures

Tests

126 unit tests passing

Reopens PR #576 (closed accidentally). Resolves issue #568.

Summary by CodeRabbit

New Features
- Added automatic agent recovery system with crash detection and periodic state snapshots enabling self-healing capabilities
- Introduced comprehensive health monitoring, behavioral anomaly detection, and cascade failure protection

…g for agents (SynkraAI#568) Implements the complete agent immortality protocol with heartbeat, snapshots, crash detection, auto-revival, behavioral fingerprinting, cascade protection, and a composite health score. 126 unit tests.

vercel · 2026-03-12T18:33:31Z

@nikolasdehor is attempting to deploy a commit to the Pedro Valério Lopez's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-03-12T18:33:47Z

Walkthrough

This PR introduces the Agent Immortality Protocol, a resilience mechanism enabling agent self-healing and state recovery through heartbeat monitoring, periodic state snapshots, crash detection, and auto-revival capabilities. A compatibility wrapper is provided for backward compatibility, alongside a comprehensive test suite covering all functionality.

Changes

Cohort / File(s)	Summary
Agent Immortality Protocol Implementation `.aiox-core/core/resilience/agent-immortality.js`	Core module introducing the `AgentImmortalityProtocol` class with agent lifecycle management, heartbeat monitoring, snapshot creation and persistence, crash detection, auto-revival, behavioral fingerprinting with anomaly detection, health scoring, cascade protection via dependency tracking, and event emission for key milestones.
Compatibility Wrapper `.aios-core/core/resilience/agent-immortality.js`	Retro-compatibility module that re-exports the canonical implementation from `.aiox-core/core/resilience/agent-immortality`, enabling existing import paths to resolve correctly without code duplication.
Manifest Update `.aiox-core/install-manifest.yaml`	Timestamp refresh, registration of new core/resilience/agent-immortality.js file, removal of development/tasks/review-prs.md entry, and size value updates across manifest entries.
Test Suite `tests/core/resilience/agent-immortality.test.js`	Comprehensive unit test coverage exercising exports, constructor behavior, agent lifecycle, monitoring, heartbeats, snapshots, revival mechanisms, health scoring, fingerprinting, anomaly detection, cascade protection, persistence, and error handling.

Sequence Diagram

sequenceDiagram
    participant Agent
    participant Protocol as AgentImmortalityProtocol
    participant Monitor as Heartbeat Monitor
    participant Snapshots as Snapshot Persistence
    participant Disk as Disk Storage

    Agent->>Protocol: registerAgent(agentId)
    Protocol->>Protocol: Initialize agent state
    
    Protocol->>Monitor: startMonitoring(agentId)
    Monitor->>Monitor: Start interval check
    
    loop Periodic Heartbeats
        Agent->>Protocol: heartbeat(agentId, stateData)
        Protocol->>Protocol: Update fingerprint & health
        Protocol->>Snapshots: createSnapshot(agentId)
        Snapshots->>Disk: Persist snapshot
    end
    
    loop Monitor Check Interval
        Monitor->>Protocol: Check last heartbeat
        alt Heartbeat missed beyond grace
            Protocol->>Protocol: Mark agent as DEAD
            Protocol->>Protocol: Emit death-detected event
            Protocol->>Protocol: Queue auto-revival
            Protocol->>Snapshots: getLatestSnapshot(agentId)
            Snapshots->>Disk: Load snapshot
            Protocol->>Agent: reviveAgent(agentId)
            Agent->>Agent: Restore from snapshot
            Agent->>Protocol: heartbeat (resumed)
        else Recent heartbeat
            Protocol->>Protocol: Update health score
        end
    end
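
The "Monitor Check Interval" branch of the diagram can be sketched as a pure classification step. This is only an illustration under stated assumptions, not the PR's implementation: `classifyAgent`, `checkAgents`, `intervalMs`, and `graceMs` are hypothetical names, and the status strings are placeholders for the `AgentStatus` values referenced in the review.

```javascript
// Hypothetical sketch of the monitor check; not the PR's implementation.
// The thresholds (intervalMs, graceMs) and status strings are assumptions.
const AgentStatus = Object.freeze({ ALIVE: 'alive', SUSPECT: 'suspect', DEAD: 'dead' });

// Classify one agent by how long it has been silent.
function classifyAgent(lastHeartbeat, now, { intervalMs, graceMs }) {
  const silence = now - lastHeartbeat;
  if (silence <= intervalMs) return AgentStatus.ALIVE;
  if (silence <= intervalMs + graceMs) return AgentStatus.SUSPECT;
  return AgentStatus.DEAD;
}

// Update every agent's status and return the ids that should be revived.
function checkAgents(agents, now, config) {
  const dead = [];
  for (const [id, agent] of agents) {
    agent.status = classifyAgent(agent.lastHeartbeat, now, config);
    if (agent.status === AgentStatus.DEAD) dead.push(id);
  }
  return dead;
}
```

Keeping the classification free of timers makes the check easy to test with explicit timestamps; a real monitor would call something like `checkAgents` from its interval callback and queue revival for the returned ids.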

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly describes the main change: introduction of an Agent Immortality Protocol for resilience. It accurately reflects the primary addition of a comprehensive agent lifecycle management system with heartbeat monitoring, state recovery, and health scoring.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

.aios-core/core/resilience/agent-immortality.js (1)
1-2: Prefer the project's absolute import form for this shim.

This wrapper hardcodes a repo-relative hop into .aiox-core, so the compatibility layer depends on the current directory layout. Re-export from the package's absolute internal path instead of ../../../....

As per coding guidelines, "Use absolute imports instead of relative imports in all code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.aios-core/core/resilience/agent-immortality.js around lines 1 - 2, The shim
currently uses a relative require in module.exports
(require('../../../.aiox-core/core/resilience/agent-immortality')) which couples
it to repo layout; change the export to re-export the package's absolute
internal path instead (use the package's absolute import for the internal
module) so module.exports =
require('<package-name>/core/resilience/agent-immortality') (replace
<package-name> with the actual package identifier) to follow the project's
absolute-import guideline.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.aiox-core/core/resilience/agent-immortality.js:
- Around line 672-696: The saved JSON only contains summary metadata and
loadState() must rebuild in-memory structures so recovery works; modify
loadState to, when reading the saved payload, rehydrate this.agents (creating
Agent objects or restoring their full state fields including snapshots array,
revivalHistory array, lastSnapshot, registeredAt, lastHeartbeat, errorCount,
revivalCount, healthScore, and per-agent config/fingerprint baselines),
repopulate this._dependencies from data.dependencies, and restore any snapshot
storage indexes/refs so reviveAgent(agentId) can find snapshots and history;
update any helper methods (e.g., _calculateHealthScore) to accept restored
agents and ensure reviveAgent, snapshot handling, and fingerprint baseline
lookup use the rehydrated agent instances instead of the summary JSON.
- Around line 599-610: declareDependency currently allows creating transitive
cycles (e.g., adding A->B when B already depends on A); before pushing
dependsOnId into this._dependencies for agentId, call the internal traversal
(e.g., _findDependents(agentId)) to see if its result already includes
dependsOnId and if so throw an Error like "Declaring this dependency would
create a cycle"; update declareDependency to perform this check and refuse the
mutation so getCascadeRisk and dependents remain correct.
- Around line 663-700: saveState currently assigns a rejected promise to
this._saveQueue which leaves the queue permanently rejected; fix saveState by
wrapping the chained async callback (the function passed to
this._saveQueue.then) in a try/catch: perform the directory creation, data
prepare and fs.writeFileSync inside try, and in catch reset the queue to a
resolved promise (e.g. this._saveQueue = Promise.resolve()) so future calls to
saveState/_persistSnapshot can continue, then rethrow or log the error to
preserve error visibility; reference the saveState method and the
this._saveQueue field when making this change.

In @.aiox-core/install-manifest.yaml:
- Around line 1039-1042: The install manifest is missing the compatibility shim
entry for the retrocompat file; add a manifest entry for
".aios-core/core/resilience/agent-immortality.js" (matching the actual shim file
you added) alongside the existing "core/resilience/agent-immortality.js" entry,
supplying the correct sha256 hash, size and type so brownfield upgrades will
install the shim path and preserve existing imports.

In `@tests/core/resilience/agent-immortality.test.js`:
- Around line 974-987: The test is a no-op because it never asserts that the
cascade event or status changes actually occurred; update the spec around
protocol.registerAgent, protocol.declareDependency, protocol.getCascadeRisk, and
the Events.CASCADE_RISK listener to assert observable behavior: attach the
handler via protocol.on(Events.CASCADE_RISK, handler) and
expect(handler).toHaveBeenCalledTimes(1) (and/or toHaveBeenCalledWith(...)
validating payload), and for crash-detection/revival cases advance timers and
then assert agent.status (AgentStatus.DEAD or AgentStatus.ALIVE) and any emitted
Events (e.g., revive/crash events) were fired so the test fails if core
failure/recovery paths regress.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 531d113d-b04f-4ff2-915a-4e0f3c954dcd

📥 Commits

Reviewing files that changed from the base of the PR and between f74e3e7 and f943268.

📒 Files selected for processing (4)

.aios-core/core/resilience/agent-immortality.js
.aiox-core/core/resilience/agent-immortality.js
.aiox-core/install-manifest.yaml
tests/core/resilience/agent-immortality.test.js

coderabbitai · 2026-03-12T18:45:48Z

.aiox-core/core/resilience/agent-immortality.js

+  declareDependency(agentId, dependsOnId) {
+    this._assertAgentExists(agentId);
+    this._assertAgentExists(dependsOnId);
+
+    if (agentId === dependsOnId) {
+      throw new Error('An agent cannot depend on itself');
+    }
+
+    const deps = this._dependencies.get(agentId) ?? [];
+    if (!deps.includes(dependsOnId)) {
+      deps.push(dependsOnId);
+      this._dependencies.set(agentId, deps);


⚠️ Potential issue | 🟠 Major

Reject transitive dependency cycles here.

declareDependency('A', 'B') currently allows the reverse edge when B already depends on A. In that state, _findDependents('A') returns ['B', 'A'], so the target becomes its own dependent and getCascadeRisk() overstates cascade risk.

Suggested guard

     if (agentId === dependsOnId) {
       throw new Error('An agent cannot depend on itself');
     }
+    if (this._findDependents(agentId).includes(dependsOnId)) {
+      throw new Error(
+        `Declaring "${agentId}" -> "${dependsOnId}" would create a dependency cycle`
+      );
+    }

     const deps = this._dependencies.get(agentId) ?? [];

As per coding guidelines, "Check for proper input validation on public API methods".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.aiox-core/core/resilience/agent-immortality.js around lines 599 - 610, declareDependency currently allows creating transitive cycles (e.g., adding A->B when B already depends on A); before pushing dependsOnId into this._dependencies for agentId, call the internal traversal (e.g., _findDependents(agentId)) to see if its result already includes dependsOnId and if so throw an Error like "Declaring this dependency would create a cycle"; update declareDependency to perform this check and refuse the mutation so getCascadeRisk and dependents remain correct.
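
To make the direction of the check concrete, here is a minimal, self-contained model of the guard. It assumes a `Map` from agent id to the list of ids it depends on; `findDependents` and this standalone `declareDependency` are stand-ins written for the sketch and may differ from the module's internal `_findDependents`.

```javascript
// Hypothetical model of the dependency map; not the module's actual code.
// dependencies: Map<agentId, string[]> where the array lists what agentId depends on.

// Collect every agent that depends on targetId, directly or transitively.
function findDependents(dependencies, targetId) {
  const found = [];
  const seen = new Set();
  const queue = [targetId];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const [id, deps] of dependencies) {
      if (deps.includes(current) && !seen.has(id)) {
        seen.add(id);
        found.push(id);
        queue.push(id);
      }
    }
  }
  return found;
}

function declareDependency(dependencies, agentId, dependsOnId) {
  if (agentId === dependsOnId) {
    throw new Error('An agent cannot depend on itself');
  }
  // If dependsOnId already depends on agentId, the new edge would close a cycle.
  if (findDependents(dependencies, agentId).includes(dependsOnId)) {
    throw new Error(`Declaring "${agentId}" -> "${dependsOnId}" would create a dependency cycle`);
  }
  const deps = dependencies.get(agentId) ?? [];
  if (!deps.includes(dependsOnId)) {
    deps.push(dependsOnId);
    dependencies.set(agentId, deps);
  }
}
```

With B depending on A and C depending on B, declaring A -> C is rejected because C is already among A's dependents, while declaring C -> A is still allowed.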

coderabbitai · 2026-03-12T18:45:49Z

.aiox-core/core/resilience/agent-immortality.js

+  async saveState() {
+    this._saveQueue = this._saveQueue.then(async () => {
+      const filePath = path.resolve(this.projectRoot, this.config.stateFile);
+      const dir = path.dirname(filePath);
+
+      if (!fs.existsSync(dir)) {
+        fs.mkdirSync(dir, { recursive: true });
+      }
+
+      const data = {
+        schemaVersion: this.config.schemaVersion,
+        savedAt: new Date().toISOString(),
+        agents: {},
+        dependencies: {},
+      };
+
+      for (const [id, agent] of this.agents.entries()) {
+        data.agents[id] = {
+          id: agent.id,
+          status: agent.status,
+          registeredAt: agent.registeredAt,
+          lastHeartbeat: agent.lastHeartbeat,
+          lastSnapshot: agent.lastSnapshot,
+          errorCount: agent.errorCount,
+          snapshotCount: agent.snapshots.length,
+          revivalCount: agent.revivalHistory.length,
+          healthScore: this._calculateHealthScore(agent),
+        };
+      }
+
+      for (const [id, deps] of this._dependencies.entries()) {
+        data.dependencies[id] = [...deps];
+      }
+
+      fs.writeFileSync(filePath, JSON.stringify(data, null, 2), 'utf-8');
+    });
+
+    await this._saveQueue;


⚠️ Potential issue | 🟠 Major

A single saveState() failure bricks later persistence.

saveState() assigns the raw chained promise back to _saveQueue and awaits it, but never resets the queue on rejection. After one filesystem error, _saveQueue stays rejected forever; every later saveState() and _persistSnapshot() call chains off that rejected promise and is skipped.

One way to keep the queue usable after a failed write

   async saveState() {
-    this._saveQueue = this._saveQueue.then(async () => {
+    const op = this._saveQueue.catch(() => {}).then(async () => {
       const filePath = path.resolve(this.projectRoot, this.config.stateFile);
       const dir = path.dirname(filePath);
       ...
       fs.writeFileSync(filePath, JSON.stringify(data, null, 2), 'utf-8');
     });
-
-    await this._saveQueue;
+    this._saveQueue = op.catch(() => {});
+    await op;
   }

As per coding guidelines, "Verify error handling is comprehensive with proper try/catch and error context".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.aiox-core/core/resilience/agent-immortality.js around lines 663 - 700, saveState currently assigns a rejected promise to this._saveQueue which leaves the queue permanently rejected; fix saveState by wrapping the chained async callback (the function passed to this._saveQueue.then) in a try/catch: perform the directory creation, data prepare and fs.writeFileSync inside try, and in catch reset the queue to a resolved promise (e.g. this._saveQueue = Promise.resolve()) so future calls to saveState/_persistSnapshot can continue, then rethrow or log the error to preserve error visibility; reference the saveState method and the this._saveQueue field when making this change.
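
The queue behavior described in this comment can be reproduced in isolation. The following is a rough sketch of a serial promise queue that stays usable after a failed task; `SerialQueue` and `enqueue` are hypothetical names for illustration, and only the idea of a `_saveQueue`-style chained promise comes from the review.

```javascript
// Hypothetical serial queue; the names are made up for this sketch.
class SerialQueue {
  constructor() {
    // The tail never holds a rejected promise, so chaining is always safe.
    this._tail = Promise.resolve();
  }

  enqueue(task) {
    // Start the task once the previous one has settled.
    const op = this._tail.then(() => task());
    // Store a rejection-free tail so later tasks still run after a failure.
    this._tail = op.catch(() => {});
    // Return the raw operation so the caller still observes the error.
    return op;
  }
}
```

The key design choice is separating the promise returned to the caller (which may reject) from the promise stored as the queue tail (which never does), so one filesystem error cannot block every later write.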

coderabbitai · 2026-03-12T18:45:49Z

.aiox-core/core/resilience/agent-immortality.js

+      const data = {
+        schemaVersion: this.config.schemaVersion,
+        savedAt: new Date().toISOString(),
+        agents: {},
+        dependencies: {},
+      };
+
+      for (const [id, agent] of this.agents.entries()) {
+        data.agents[id] = {
+          id: agent.id,
+          status: agent.status,
+          registeredAt: agent.registeredAt,
+          lastHeartbeat: agent.lastHeartbeat,
+          lastSnapshot: agent.lastSnapshot,
+          errorCount: agent.errorCount,
+          snapshotCount: agent.snapshots.length,
+          revivalCount: agent.revivalHistory.length,
+          healthScore: this._calculateHealthScore(agent),
+        };
+      }
+
+      for (const [id, deps] of this._dependencies.entries()) {
+        data.dependencies[id] = [...deps];
+      }
+


⚠️ Potential issue | 🟠 Major

loadState() doesn't restore anything the protocol needs for recovery.

The serialized payload only keeps summary metadata, and loadState() just returns that JSON without rebuilding this.agents, this._dependencies, snapshots, revival history, fingerprint baselines, or per-agent config. After a process restart, reviveAgent() still has no in-memory snapshot/history to work with, so the advertised disk-backed recovery path never actually comes back online.

Also applies to: 707-719

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.aiox-core/core/resilience/agent-immortality.js around lines 672 - 696, The saved JSON only contains summary metadata and loadState() must rebuild in-memory structures so recovery works; modify loadState to, when reading the saved payload, rehydrate this.agents (creating Agent objects or restoring their full state fields including snapshots array, revivalHistory array, lastSnapshot, registeredAt, lastHeartbeat, errorCount, revivalCount, healthScore, and per-agent config/fingerprint baselines), repopulate this._dependencies from data.dependencies, and restore any snapshot storage indexes/refs so reviveAgent(agentId) can find snapshots and history; update any helper methods (e.g., _calculateHealthScore) to accept restored agents and ensure reviveAgent, snapshot handling, and fingerprint baseline lookup use the rehydrated agent instances instead of the summary JSON.
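
One possible shape for a round trip that actually restores recovery data is sketched below. It is an assumption-laden illustration: `serialize` and `rehydrate` are hypothetical helpers, and the payload shape (full `snapshots` and `revivalHistory` arrays rather than counts) is a guess at what a recoverable state file would need, not the module's current format.

```javascript
// Hypothetical helpers; the payload shape is an assumption for illustration.

// Persist full per-agent arrays instead of summary counts.
function serialize(agents, dependencies) {
  const data = { agents: {}, dependencies: {} };
  for (const [id, agent] of agents) {
    data.agents[id] = {
      ...agent,
      snapshots: [...agent.snapshots],
      revivalHistory: [...agent.revivalHistory],
    };
  }
  for (const [id, deps] of dependencies) {
    data.dependencies[id] = [...deps];
  }
  return JSON.stringify(data);
}

// Rebuild the in-memory Maps so revival can find snapshots after a restart.
function rehydrate(json) {
  const data = JSON.parse(json);
  const agents = new Map();
  for (const [id, saved] of Object.entries(data.agents ?? {})) {
    agents.set(id, {
      ...saved,
      // Tolerate older payloads that stored only counts.
      snapshots: Array.isArray(saved.snapshots) ? saved.snapshots : [],
      revivalHistory: Array.isArray(saved.revivalHistory) ? saved.revivalHistory : [],
    });
  }
  const dependencies = new Map(Object.entries(data.dependencies ?? {}));
  return { agents, dependencies };
}
```

A real implementation would also need to decide how to handle schema version mismatches and whether restored agents should resume monitoring automatically; both are left out here.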

coderabbitai · 2026-03-12T18:45:49Z

.aiox-core/install-manifest.yaml

+  - path: core/resilience/agent-immortality.js
+    hash: sha256:89ae4bac066088e76071cfc9b391418e9eba804bcc2b2f943edb1ce38974735c
+    type: core
+    size: 37573


⚠️ Potential issue | 🟠 Major

The compatibility shim is missing from the install manifest.

This manifest adds core/resilience/agent-immortality.js, but the new retrocompat file at .aios-core/core/resilience/agent-immortality.js is not listed anywhere in the generated manifest. Brownfield upgrades driven by this file will install the canonical module without the shim, so existing imports still break.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.aiox-core/install-manifest.yaml around lines 1039 - 1042, The install manifest is missing the compatibility shim entry for the retrocompat file; add a manifest entry for ".aios-core/core/resilience/agent-immortality.js" (matching the actual shim file you added) alongside the existing "core/resilience/agent-immortality.js" entry, supplying the correct sha256 hash, size and type so brownfield upgrades will install the shim path and preserve existing imports.
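
For reference, the four fields shown in the manifest excerpt can be derived from a file's contents with Node's built-in `crypto` module. This is a sketch under assumptions: `buildManifestEntry` is a hypothetical helper, `size` is assumed to be the byte length, and the project's own manifest generator (not shown in this PR) may compute or order the fields differently.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: builds an entry with the fields visible in the
// manifest excerpt (path, hash, type, size). Size is assumed to be bytes.
function buildManifestEntry(relativePath, contents, type = 'core') {
  const buffer = Buffer.isBuffer(contents) ? contents : Buffer.from(contents, 'utf-8');
  const digest = crypto.createHash('sha256').update(buffer).digest('hex');
  return {
    path: relativePath,
    hash: `sha256:${digest}`,
    type,
    size: buffer.length,
  };
}
```

In practice the contents would come from reading the shim file from disk, and the resulting entry would be written by whatever tool regenerates the manifest rather than edited by hand.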

coderabbitai · 2026-03-12T18:45:49Z

tests/core/resilience/agent-immortality.test.js

+    it('should emit cascade-risk for high/critical risks', () => {
+      protocol.registerAgent('agent-1');
+      protocol.registerAgent('agent-2');
+      protocol.declareDependency('agent-2', 'agent-1');
+
+      const agent = protocol.agents.get('agent-1');
+      agent.status = AgentStatus.DEAD;
+
+      const handler = jest.fn();
+      protocol.on(Events.CASCADE_RISK, handler);
+      protocol.getCascadeRisk('agent-1');
+      // 1 dependent + dead = high (nao critical)
+      // high emite cascade-risk
+    });


⚠️ Potential issue | 🟠 Major

Several of these tests are currently no-ops.

The cascade-risk case never asserts that the handler fired, and the crash-detection/revival cases only advance timers or read locals without checking status or emitted events. Those tests will stay green even if the core failure/recovery path regresses.

Example assertions to make these cases observable

       protocol.on(Events.CASCADE_RISK, handler);
       protocol.getCascadeRisk('agent-1');
-      // 1 dependent + dead = high (nao critical)
-      // high emite cascade-risk
+      expect(handler).toHaveBeenCalledTimes(1);
+      expect(handler.mock.calls[0][0]).toMatchObject({
+        agentId: 'agent-1',
+        riskLevel: 'high',
+      });
@@
       jest.advanceTimersByTime(2000);
-
-      const agent = protocol.agents.get('agent-1');
-      // O agente pode estar como SUSPECT
+      expect(protocol.agents.get('agent-1').status).toBe(AgentStatus.SUSPECT);
@@
       await Promise.resolve();
       await Promise.resolve();
+      expect(revivalHandler).toHaveBeenCalledTimes(1);
+      expect(protocol.agents.get('agent-1').status).toBe(AgentStatus.ALIVE);

As per coding guidelines, "Verify test coverage exists for new/modified functions".

Also applies to: 1020-1059

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/core/resilience/agent-immortality.test.js` around lines 974 - 987, The test is a no-op because it never asserts that the cascade event or status changes actually occurred; update the spec around protocol.registerAgent, protocol.declareDependency, protocol.getCascadeRisk, and the Events.CASCADE_RISK listener to assert observable behavior: attach the handler via protocol.on(Events.CASCADE_RISK, handler) and expect(handler).toHaveBeenCalledTimes(1) (and/or toHaveBeenCalledWith(...) validating payload), and for crash-detection/revival cases advance timers and then assert agent.status (AgentStatus.DEAD or AgentStatus.ALIVE) and any emitted Events (e.g., revive/crash events) were fired so the test fails if core failure/recovery paths regress.

coderabbitai bot requested changes Mar 12, 2026


This was referenced Mar 13, 2026

[ARCHITECTURE] Project Autogenesis: Automated Resilience & Dynamic Scaling #493

Open

[EPIC] Agent Immortality Protocol: Self-Healing, Memory Persistence & Evolutionary Learning System #482

Open

nikolasdehor wants to merge 1 commit into SynkraAI:main from nikolasdehor:feat/agent-immortality-protocol
