sdk: fail fast when spawn manager is disabled#387
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the Node SDK RelayClient to fail fast on daemon-side “SpawnManager not enabled” errors and to propagate server ERROR frames to SDK consumers via onError, with added regression tests to prevent spawn calls from timing out unnecessarily in this scenario.
Changes:
- Detect “SpawnManager not enabled” daemon
ERRORframes and immediately reject pending spawn/release/send-input/list-workers operations. - Invoke
RelayClient.onErrorfor server-sideERRORframes. - Add tests verifying
spawn()rejects quickly and clears pending spawn/ready bookkeeping when SpawnManager is disabled.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
packages/sdk/src/client.ts |
Adds daemon-error detection to fail fast on disabled SpawnManager and surfaces server errors via onError; introduces clearPendingAgentReady(). |
packages/sdk/src/client.test.ts |
Adds regression tests ensuring spawn rejects promptly and pending state is cleared when SpawnManager is disabled. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
packages/sdk/src/client.ts
Outdated
| this.rejectPendingReleases(err); | ||
| this.rejectPendingSendInputs(err); | ||
| this.rejectPendingListWorkers(err); | ||
| this.clearPendingAgentReady(); |
There was a problem hiding this comment.
clearPendingAgentReady() clears all pendingAgentReady entries when the daemon reports SpawnManager is disabled. This can break unrelated waitForAgentReady() callers by removing their waiter and cancelling its timeout, leaving the returned promise permanently pending. Prefer cleaning up only the waiter(s) associated with the rejected spawn(s), or reject the relevant pendingAgentReady entries with the same error while ensuring any internal, un-awaited ready promises from spawn(waitForReady) won’t produce unhandled rejections (e.g., by handling cleanup in spawn() when the spawn promise rejects).
| this.clearPendingAgentReady(); |
packages/sdk/src/client.ts
Outdated
| for (const [agentName, pending] of this.pendingAgentReady.entries()) { | ||
| clearTimeout(pending.timeoutHandle); | ||
| this.pendingAgentReady.delete(agentName); | ||
| } | ||
| } | ||
|
|
||
| private scheduleReconnect(): void { | ||
| this.setState('BACKOFF'); |
There was a problem hiding this comment.
clearPendingAgentReady() deletes entries without calling pending.reject(...). If any consumer is awaiting waitForAgentReady(), this will leave their promise hanging forever (and the timeout has been cleared). Consider removing this helper and using rejectPendingAgentReady(err) (or otherwise settling the promises) so callers deterministically receive an error instead of a never-resolving promise.
| for (const [agentName, pending] of this.pendingAgentReady.entries()) { | |
| clearTimeout(pending.timeoutHandle); | |
| this.pendingAgentReady.delete(agentName); | |
| } | |
| } | |
| private scheduleReconnect(): void { | |
| this.setState('BACKOFF'); | |
| this.rejectPendingAgentReady(new Error('Pending agent readiness wait was cleared.')); | |
| } | |
| private scheduleReconnect(): void { | |
| this.setState('BACKOFF'); | |
| private scheduleReconnect(): void { | |
| this.setState('BACKOFF'); |
| // Surface server-side ERROR frames to SDK consumers. | ||
| if (this.onError) { | ||
| this.onError(new Error(errorMessage)); | ||
| } |
There was a problem hiding this comment.
onError is now used to surface server ERROR frames, but the callback only receives a generic Error with the message, losing structured context like code, fatal, and the server frame id. To let callers handle daemon-side failures “directly”, consider attaching the ErrorPayload (and possibly the full envelope) via Error’s cause option or a dedicated error class with code/fatal fields.
| it('rejects pending spawn when SpawnManager is disabled', async () => { | ||
| const client = new RelayClient({ reconnect: false, quiet: true }); | ||
| (client as any)._state = 'READY'; | ||
| const sendMock = vi.fn().mockReturnValue(true); | ||
| (client as any).send = sendMock; | ||
|
|
||
| const spawnPromise = client.spawn( | ||
| { | ||
| name: 'Worker', | ||
| cli: 'codex', | ||
| task: 'Do work', | ||
| }, | ||
| 10000 | ||
| ); | ||
|
|
||
| const errorEnvelope: Envelope<ErrorPayload> = { | ||
| v: 1, | ||
| type: 'ERROR', | ||
| id: 'err-spawn-disabled', | ||
| ts: Date.now(), | ||
| payload: { | ||
| code: 'INTERNAL' as any, | ||
| message: 'SpawnManager not enabled. Configure spawnManager: true in daemon config.', | ||
| fatal: false, | ||
| }, | ||
| }; | ||
|
|
||
| (client as any).processFrame(errorEnvelope); | ||
|
|
||
| await expect(spawnPromise).rejects.toThrow('SpawnManager not enabled'); | ||
| expect((client as any).pendingSpawns.size).toBe(0); | ||
| }); |
There was a problem hiding this comment.
New behavior surfaces server ERROR frames via onError, but there’s no regression test asserting that onError is invoked (and ideally that it exposes enough context to be actionable). Adding a focused test here would help prevent accidental removal/behavior drift of this new public-facing signal.
khaliqgant
left a comment
There was a problem hiding this comment.
Looks good. Devin comments look worth addressing imo
|
@copilot open a new pull request to apply changes based on the comments in this thread |
|
@willwashburn I've opened a new pull request, #388, to work on those changes. Once the pull request is ready, I'll request review from you. |
…layServerError with structured context, add onError test, handle unhandled rejections Co-authored-by: willwashburn <957608+willwashburn@users.noreply.github.com>
Co-authored-by: willwashburn <957608+willwashburn@users.noreply.github.com>
sdk: properly reject pending agent-ready promises and surface structured server errors
There was a problem hiding this comment.
Devin Review found 1 new potential issue.
⚠️ 1 issue in files not directly in the diff
⚠️ RelayServerError not exported from SDK public API, so consumers cannot type-check it (packages/sdk/src/index.ts:176-184)
The PR introduces RelayServerError as a structured error that is passed to onError callbacks in handleErrorFrame (packages/sdk/src/client.ts:1894-1901), but the class is never exported from the SDK's public entry point at packages/sdk/src/index.ts:176-184.
Root Cause and Impact
The onError callback (packages/sdk/src/client.ts:305) is typed as (error: Error) => void. When a server ERROR frame arrives, handleErrorFrame now creates a RelayServerError (with .code, .fatal, .envelope properties) and passes it to onError. However, packages/sdk/src/index.ts only re-exports the older error classes:
export {
RelayError,
DaemonNotRunningError,
AgentNotFoundError,
TimeoutError,
ConnectionError,
ChannelNotFoundError,
SpawnError,
} from './errors.js';RelayServerError is missing. SDK consumers who want to distinguish server-side errors from transport errors in their onError handler cannot do instanceof RelayServerError because the class is not importable from @agent-relay/sdk or @agent-relay/sdk/errors. They would need to resort to fragile checks like error.name === 'RelayServerError' or duck-typing for code/fatal properties.
Impact: Consumers receiving the new structured error cannot properly type-narrow it, undermining the stated goal of letting "callers handle daemon-side failures directly."
View 7 additional findings in Devin Review.
Summary
RelayClientreject pending spawn/release/send-input/list-workers operations immediately when daemon returnsERRORwith "SpawnManager not enabled"ERRORframes throughonErrorso callers can handle daemon-side failures directlyWhy
In standalone mode with spawn manager disabled,
client.spawn()currently waits until timeout even though the daemon has already returned an explicit error. This change is non-breaking and improves developer feedback/latency.Validation
npm --prefix ~/Projects/relay/packages/sdk test