fix(onboard): add timeout to spawnSync calls to prevent hung processes by latenighthackathon · Pull Request #1069 · NVIDIA/NemoClaw

latenighthackathon · 2026-03-29T19:52:52Z

Summary

All spawnSync calls in onboard.js lacked a timeout option. A hung curl or stalled download would freeze the entire wizard indefinitely. Added process-level timeouts and timeout-specific error messaging.

Related Issue

Closes #1017

Changes

30s timeout on curl endpoint probes (10s buffer over curl --max-time 20)
10min timeout on ollama model downloads
5min timeout on install-openshell.sh
SIGTERM detection in pullOllamaModel() for timeout-specific error message
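The pattern behind these changes can be sketched roughly as follows. This is a minimal illustration, not the code from onboard.js: `runWithTimeout` is a hypothetical helper name, and the URL in the commented example is a placeholder. It relies only on the documented `timeout` option of `spawnSync`, which ends the child with the kill signal (SIGTERM by default) and sets `result.error.code` to `"ETIMEDOUT"`.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper (not from onboard.js): run a command synchronously with a
// process-level timeout and report whether that timeout fired.
function runWithTimeout(command, args, timeoutMs) {
  const result = spawnSync(command, args, {
    encoding: "utf8",
    timeout: timeoutMs, // child is killed with SIGTERM (the default killSignal) once this elapses
  });
  // When the timeout fires, spawnSync sets result.error with code "ETIMEDOUT".
  const timedOut = Boolean(result.error && result.error.code === "ETIMEDOUT");
  return { ok: result.status === 0, timedOut, status: result.status, signal: result.signal };
}

// Example shape of a probe: curl's own limit (--max-time 20) sits below the
// 30s process timeout, so the process timeout only acts as a safety net.
// The URL is a placeholder.
// runWithTimeout("curl", ["-sS", "--max-time", "20", "https://example.invalid/v1/models"], 30_000);
```

Keeping the process timeout above curl's own limit means curl normally reports its own error first, and the outer timeout only fires if curl itself stalls.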

Type of Change

Code change for a new feature, bug fix, or refactor.
Code change with doc updates.
Doc only. Prose changes without code sample modifications.
Doc only. Includes code sample changes.

Testing

npx prek run --all-files passes (or equivalently make check).
npm test passes.
make docs builds without warnings. (for doc-only changes)

Checklist

General

I have read and followed the contributing guide.

Code Changes

Formatters applied.
No secrets, API keys, or credentials committed.

Summary by CodeRabbit

Bug Fixes
- Added execution time limits to remote operations and model setup tasks.
- Improved error handling and reporting for operations that exceed designated time limits.

coderabbitai · 2026-03-29T19:53:04Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cf5c7d75-c8cd-4ff4-b5b1-62a7e193e132

📥 Commits

Reviewing files that changed from the base of the PR and between 5624521 and f3460d7.

📒 Files selected for processing (1)

bin/lib/onboard.js

🚧 Files skipped from review as they are similar to previous changes (1)

bin/lib/onboard.js

📝 Walkthrough

Walkthrough

Added explicit timeout options to several spawnSync calls in the onboarding script: remote probes and model-list fetches now use 30s timeouts; local long-running operations use longer timeouts (pullOllamaModel: 10m, installOpenshell: 5m). pullOllamaModel returns false on SIGTERM.

Changes

Cohort / File(s)	Summary
Onboarding timeouts `bin/lib/onboard.js`	Added `timeout: 30_000` for remote endpoint probes and model-list fetches (OpenAI-like, Anthropic, NVIDIA). Increased timeouts for local long-running operations: `pullOllamaModel()` uses `timeout: 600_000` and returns `false` when terminated by `SIGTERM`; `installOpenshell()` uses `timeout: 300_000`. No other logic changes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A rabbit taps a stopwatch bright, 🐇
Probes that wait now end polite,
Long pulls linger, timed just so,
No more freezes in the flow,
Hops resume — onward we go! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely describes the main change: adding timeout parameters to spawnSync calls to prevent hung processes.
Linked Issues check	✅ Passed	The PR implements all coding requirements from issue `#1017`: adds timeout parameters to all spawnSync calls (30s for probes, 10m for ollama, 5m for installer) and detects SIGTERM for timeout-specific error messaging.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to issue `#1017`: only adds timeout options to spawnSync calls and timeout-specific error handling in pullOllamaModel().


coderabbitai

🧹 Nitpick comments (1)

bin/lib/onboard.js (1)

1227-1234: 10-minute timeout added for model pulls.

The 600-second timeout prevents indefinite hanging during ollama pull operations. When a timeout occurs, the function returns false, and the caller displays an error message.

Consider: The error message at Line 1245 doesn't distinguish between timeout and other failures (network errors, invalid model name, etc.). Users might benefit from timeout-specific guidance.

💡 Optional: Improve timeout diagnostics

Add a check for result.signal to provide timeout-specific error messages:

 function pullOllamaModel(model) {
   const result = spawnSync("bash", ["-c", `ollama pull ${shellQuote(model)}`], {
     cwd: ROOT,
     encoding: "utf8",
     stdio: "inherit",
     timeout: 600_000,
     env: { ...process.env },
   });
-  return result.status === 0;
+  if (result.signal === 'SIGTERM' && result.status === null) {
+    return { ok: false, timeout: true };
+  }
+  return { ok: result.status === 0 };
 }

Then update the caller (Line 1241) to check for timeout and provide specific guidance:

-    if (!pullOllamaModel(model)) {
+    const pullResult = pullOllamaModel(model);
+    if (!pullResult.ok) {
+      if (pullResult.timeout) {
+        return {
+          ok: false,
+          message: `Timed out pulling Ollama model '${model}' after 10 minutes. ` +
+            `Large models may need more time. Try running 'ollama pull ${model}' manually, or choose a smaller model.`
+        };
+      }
       return {

Note: For very large models (>20GB) on slower connections, the 10-minute timeout might be tight, though this is an edge case.
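To put a rough number on "tight", the sketch below computes the sustained download rate needed to finish a pull inside a given timeout. It is back-of-the-envelope arithmetic only: `requiredMbitPerSecond` is a hypothetical name, sizes are assumed to be decimal gigabytes, and protocol overhead is ignored.

```javascript
// Minimum sustained download rate (in Mbit/s) needed to transfer sizeGB
// within timeoutMs. Assumes decimal gigabytes and no protocol overhead.
function requiredMbitPerSecond(sizeGB, timeoutMs) {
  const bits = sizeGB * 1e9 * 8; // gigabytes to bits
  const seconds = timeoutMs / 1000;
  return bits / seconds / 1e6; // bits per second to megabits per second
}

// A 20 GB model within the 10-minute (600_000 ms) limit:
console.log(requiredMbitPerSecond(20, 600_000).toFixed(0)); // prints "267"
```

Under these assumptions a 20 GB model needs roughly 267 Mbit/s sustained for the full ten minutes.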

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 1227 - 1234, The current spawnSync("bash",
["-c", `ollama pull ${shellQuote(model)}`]) invocation only returns a boolean
via result.status === 0, which loses timeout vs other-failure diagnostics;
change the function to detect and propagate timeout-specific info by inspecting
the spawnSync result (check result.signal and result.error if present, as well
as result.status) and return or throw a value that distinguishes a timeout
(e.g., result.signal === 'SIGTERM' or a custom status). Then update the caller
that currently treats a false return from this function to check for that
timeout indicator and show a timeout-specific error/help message (suggest
increasing timeout or checking network) while preserving the existing generic
error path for other failures; reference spawnSync, result.signal,
result.status, and the "ollama pull" command to locate the relevant code.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2b26a543-25d5-4344-8342-eef25264cdba

📥 Commits

Reviewing files that changed from the base of the PR and between eb4ba8c and 0f928f7.

📒 Files selected for processing (1)

bin/lib/onboard.js

All spawnSync calls in the onboarding wizard lacked a timeout option, meaning a hung curl or stalled download would freeze the entire wizard with no recovery path. Add process-level timeouts as a safety net:

- 30s for curl endpoint probes (10s buffer over curl --max-time 20)
- 10min for ollama model downloads
- 5min for install-openshell.sh execution

Closes NVIDIA#1017

When spawnSync kills the process due to timeout, result.signal is SIGTERM. Surface a specific error message so users know the 10-minute limit was hit, rather than seeing the generic pull failure message. Addresses CodeRabbit review nitpick.
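The check described in this commit message might look something like the sketch below. The helper names and message wording are hypothetical and are not taken from onboard.js. Note that a SIGTERM sent by something other than the timeout looks the same through `result.signal`; checking `result.error.code === "ETIMEDOUT"` as well is the stricter test.

```javascript
// Hypothetical helpers for illustration; names are not from onboard.js.

// Classify a spawnSync() result object as "ok", "timeout", or "failed".
function classifySpawnResult(result) {
  if (result.status === 0) return "ok";
  // Node reports a timeout as error.code "ETIMEDOUT"; the child is ended with
  // the kill signal (SIGTERM by default), so its exit status is null.
  const timedOutError = Boolean(result.error && result.error.code === "ETIMEDOUT");
  const killedByTerm = result.status === null && result.signal === "SIGTERM";
  if (timedOutError || killedByTerm) return "timeout";
  return "failed";
}

// Build a user-facing message for a failed pull (wording is illustrative).
function pullFailureMessage(model, result, timeoutMinutes) {
  if (classifySpawnResult(result) === "timeout") {
    return `Timed out pulling Ollama model '${model}' after ${timeoutMinutes} minutes.`;
  }
  return `Failed to pull Ollama model '${model}'.`;
}
```

A caller can keep returning `false` on any failure, as the PR does, and use the classification only to choose which message to print.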

coderabbitai bot reviewed Mar 29, 2026


latenighthackathon added 2 commits March 29, 2026 17:11

latenighthackathon force-pushed the fix/onboard-spawnsync-timeout branch from 5624521 to f3460d7 Compare March 29, 2026 22:12

latenighthackathon wants to merge 2 commits into NVIDIA:main from latenighthackathon:fix/onboard-spawnsync-timeout
