Skip to content

Comments

sdk: fail fast when spawn manager is disabled#387

Open
willwashburn wants to merge 6 commits intomainfrom
fix/sdk-standalone-spawn-defaults
Open

sdk: fail fast when spawn manager is disabled#387
willwashburn wants to merge 6 commits intomainfrom
fix/sdk-standalone-spawn-defaults

Conversation

@willwashburn
Copy link
Member

@willwashburn willwashburn commented Feb 8, 2026

Summary

  • make RelayClient reject pending spawn/release/send-input/list-workers operations immediately when daemon returns ERROR with "SpawnManager not enabled"
  • surface server ERROR frames through onError so callers can handle daemon-side failures directly
  • add SDK client regression test to ensure spawn promise rejects quickly instead of timing out

Why

In standalone mode with spawn manager disabled, client.spawn() currently waits until timeout even though the daemon has already returned an explicit error. This change is non-breaking and improves developer feedback/latency.

Validation

  • npm --prefix ~/Projects/relay/packages/sdk test
  • all tests pass (77/77)

Open with Devin

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

Open in Devin Review

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the Node SDK RelayClient to fail fast on daemon-side “SpawnManager not enabled” errors and to propagate server ERROR frames to SDK consumers via onError, with added regression tests to prevent spawn calls from timing out unnecessarily in this scenario.

Changes:

  • Detect “SpawnManager not enabled” daemon ERROR frames and immediately reject pending spawn/release/send-input/list-workers operations.
  • Invoke RelayClient.onError for server-side ERROR frames.
  • Add tests verifying spawn() rejects quickly and clears pending spawn/ready bookkeeping when SpawnManager is disabled.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
packages/sdk/src/client.ts Adds daemon-error detection to fail fast on disabled SpawnManager and surfaces server errors via onError; introduces clearPendingAgentReady().
packages/sdk/src/client.test.ts Adds regression tests ensuring spawn rejects promptly and pending state is cleared when SpawnManager is disabled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

this.rejectPendingReleases(err);
this.rejectPendingSendInputs(err);
this.rejectPendingListWorkers(err);
this.clearPendingAgentReady();
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clearPendingAgentReady() clears all pendingAgentReady entries when the daemon reports SpawnManager is disabled. This can break unrelated waitForAgentReady() callers by removing their waiter and cancelling its timeout, leaving the returned promise permanently pending. Prefer cleaning up only the waiter(s) associated with the rejected spawn(s), or reject the relevant pendingAgentReady entries with the same error while ensuring any internal, un-awaited ready promises from spawn(waitForReady) won’t produce unhandled rejections (e.g., by handling cleanup in spawn() when the spawn promise rejects).

Suggested change
this.clearPendingAgentReady();

Copilot uses AI. Check for mistakes.
Comment on lines 1990 to 1997
for (const [agentName, pending] of this.pendingAgentReady.entries()) {
clearTimeout(pending.timeoutHandle);
this.pendingAgentReady.delete(agentName);
}
}

private scheduleReconnect(): void {
this.setState('BACKOFF');
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clearPendingAgentReady() deletes entries without calling pending.reject(...). If any consumer is awaiting waitForAgentReady(), this will leave their promise hanging forever (and the timeout has been cleared). Consider removing this helper and using rejectPendingAgentReady(err) (or otherwise settling the promises) so callers deterministically receive an error instead of a never-resolving promise.

Suggested change
for (const [agentName, pending] of this.pendingAgentReady.entries()) {
clearTimeout(pending.timeoutHandle);
this.pendingAgentReady.delete(agentName);
}
}
private scheduleReconnect(): void {
this.setState('BACKOFF');
this.rejectPendingAgentReady(new Error('Pending agent readiness wait was cleared.'));
}
private scheduleReconnect(): void {
this.setState('BACKOFF');
private scheduleReconnect(): void {
this.setState('BACKOFF');

Copilot uses AI. Check for mistakes.
Comment on lines 1881 to 1884
// Surface server-side ERROR frames to SDK consumers.
if (this.onError) {
this.onError(new Error(errorMessage));
}
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

onError is now used to surface server ERROR frames, but the callback only receives a generic Error with the message, losing structured context like code, fatal, and the server frame id. To let callers handle daemon-side failures “directly”, consider attaching the ErrorPayload (and possibly the full envelope) via Error’s cause option or a dedicated error class with code/fatal fields.

Copilot uses AI. Check for mistakes.
Comment on lines +192 to +223
it('rejects pending spawn when SpawnManager is disabled', async () => {
const client = new RelayClient({ reconnect: false, quiet: true });
(client as any)._state = 'READY';
const sendMock = vi.fn().mockReturnValue(true);
(client as any).send = sendMock;

const spawnPromise = client.spawn(
{
name: 'Worker',
cli: 'codex',
task: 'Do work',
},
10000
);

const errorEnvelope: Envelope<ErrorPayload> = {
v: 1,
type: 'ERROR',
id: 'err-spawn-disabled',
ts: Date.now(),
payload: {
code: 'INTERNAL' as any,
message: 'SpawnManager not enabled. Configure spawnManager: true in daemon config.',
fatal: false,
},
};

(client as any).processFrame(errorEnvelope);

await expect(spawnPromise).rejects.toThrow('SpawnManager not enabled');
expect((client as any).pendingSpawns.size).toBe(0);
});
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New behavior surfaces server ERROR frames via onError, but there’s no regression test asserting that onError is invoked (and ideally that it exposes enough context to be actionable). Adding a focused test here would help prevent accidental removal/behavior drift of this new public-facing signal.

Copilot uses AI. Check for mistakes.
khaliqgant
khaliqgant previously approved these changes Feb 8, 2026
Copy link
Collaborator

@khaliqgant khaliqgant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Devin comments look worth addressing imo

@willwashburn
Copy link
Member Author

@copilot open a new pull request to apply changes based on the comments in this thread

Copy link

Copilot AI commented Feb 8, 2026

@willwashburn I've opened a new pull request, #388, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 3 commits February 8, 2026 20:44
…layServerError with structured context, add onError test, handle unhandled rejections

Co-authored-by: willwashburn <957608+willwashburn@users.noreply.github.com>
Co-authored-by: willwashburn <957608+willwashburn@users.noreply.github.com>
sdk: properly reject pending agent-ready promises and surface structured server errors
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

⚠️ 1 issue in files not directly in the diff

⚠️ RelayServerError not exported from SDK public API, so consumers cannot type-check it (packages/sdk/src/index.ts:176-184)

The PR introduces RelayServerError as a structured error that is passed to onError callbacks in handleErrorFrame (packages/sdk/src/client.ts:1894-1901), but the class is never exported from the SDK's public entry point at packages/sdk/src/index.ts:176-184.

Root Cause and Impact

The onError callback (packages/sdk/src/client.ts:305) is typed as (error: Error) => void. When a server ERROR frame arrives, handleErrorFrame now creates a RelayServerError (with .code, .fatal, .envelope properties) and passes it to onError. However, packages/sdk/src/index.ts only re-exports the older error classes:

export {
  RelayError,
  DaemonNotRunningError,
  AgentNotFoundError,
  TimeoutError,
  ConnectionError,
  ChannelNotFoundError,
  SpawnError,
} from './errors.js';

RelayServerError is missing. SDK consumers who want to distinguish server-side errors from transport errors in their onError handler cannot do instanceof RelayServerError because the class is not importable from @agent-relay/sdk or @agent-relay/sdk/errors. They would need to resort to fragile checks like error.name === 'RelayServerError' or duck-typing for code/fatal properties.

Impact: Consumers receiving the new structured error cannot properly type-narrow it, undermining the stated goal of letting "callers handle daemon-side failures directly."

View 7 additional findings in Devin Review.

Open in Devin Review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants