fix(onboard): add timeout to spawnSync calls to prevent hung processes#1069
fix(onboard): add timeout to spawnSync calls to prevent hung processes#1069latenighthackathon wants to merge 2 commits intoNVIDIA:mainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdded explicit Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
bin/lib/onboard.js (1)
1227-1234: 10-minute timeout added for model pulls.The 600-second timeout prevents indefinite hanging during
ollama pulloperations. When a timeout occurs, the function returnsfalse, and the caller displays an error message.Consider: The error message at Line 1245 doesn't distinguish between timeout and other failures (network errors, invalid model name, etc.). Users might benefit from timeout-specific guidance.
💡 Optional: Improve timeout diagnostics
Add a check for
result.signalto provide timeout-specific error messages:function pullOllamaModel(model) { const result = spawnSync("bash", ["-c", `ollama pull ${shellQuote(model)}`], { cwd: ROOT, encoding: "utf8", stdio: "inherit", timeout: 600_000, env: { ...process.env }, }); - return result.status === 0; + if (result.signal === 'SIGTERM' && result.status === null) { + return { ok: false, timeout: true }; + } + return { ok: result.status === 0 }; }Then update the caller (Line 1241) to check for timeout and provide specific guidance:
- if (!pullOllamaModel(model)) { + const pullResult = pullOllamaModel(model); + if (!pullResult.ok) { + if (pullResult.timeout) { + return { + ok: false, + message: `Timed out pulling Ollama model '${model}' after 10 minutes. ` + + "Large models may need more time. Try: ollama pull ${model} manually, or choose a smaller model." + }; + } return {Note: For very large models (>20GB) on slower connections, the 10-minute timeout might be tight, though this is an edge case.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 1227 - 1234, The current spawnSync("bash", ["-c", `ollama pull ${shellQuote(model)}`]) invocation only returns a boolean via result.status === 0, which loses timeout vs other-failure diagnostics; change the function to detect and propagate timeout-specific info by inspecting the spawnSync result (check result.signal and result.error if present, as well as result.status) and return or throw a value that distinguishes a timeout (e.g., result.signal === 'SIGTERM' or a custom status). Then update the caller that currently treats a false return from this function to check for that timeout indicator and show a timeout-specific error/help message (suggest increasing timeout or checking network) while preserving the existing generic error path for other failures; reference spawnSync, result.signal, result.status, and the "ollama pull" command to locate the relevant code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@bin/lib/onboard.js`:
- Around line 1227-1234: The current spawnSync("bash", ["-c", `ollama pull
${shellQuote(model)}`]) invocation only returns a boolean via result.status ===
0, which loses timeout vs other-failure diagnostics; change the function to
detect and propagate timeout-specific info by inspecting the spawnSync result
(check result.signal and result.error if present, as well as result.status) and
return or throw a value that distinguishes a timeout (e.g., result.signal ===
'SIGTERM' or a custom status). Then update the caller that currently treats a
false return from this function to check for that timeout indicator and show a
timeout-specific error/help message (suggest increasing timeout or checking
network) while preserving the existing generic error path for other failures;
reference spawnSync, result.signal, result.status, and the "ollama pull" command
to locate the relevant code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2b26a543-25d5-4344-8342-eef25264cdba
📒 Files selected for processing (1)
bin/lib/onboard.js
All spawnSync calls in the onboarding wizard lacked a timeout option, meaning a hung curl or stalled download would freeze the entire wizard with no recovery path. Add process-level timeouts as a safety net: - 30s for curl endpoint probes (10s buffer over curl --max-time 20) - 10min for ollama model downloads - 5min for install-openshell.sh execution Closes NVIDIA#1017
When spawnSync kills the process due to timeout, result.signal is SIGTERM. Surface a specific error message so users know the 10-minute limit was hit, rather than seeing the generic pull failure message. Addresses CodeRabbit review nitpick.
5624521 to
f3460d7
Compare
Summary
All spawnSync calls in onboard.js lacked a timeout option. A hung curl or stalled download would freeze the entire wizard indefinitely. Added process-level timeouts and timeout-specific error messaging.
Related Issue
Closes #1017
Changes
Type of Change
Testing
npx prek run --all-filespasses (or equivalentlymake check).npm testpasses.make docsbuilds without warnings. (for doc-only changes)Checklist
General
Code Changes
Summary by CodeRabbit