feat(pathfinder): support "cuda"/"nvml" driver libs and reject unsupported libnames#1602
Conversation
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
182be13 to
058f824
Compare
…orted libnames
Add support for loading NVIDIA driver libraries ("cuda", "nvml") via
load_nvidia_dynamic_lib(). These are part of the display driver (not the
CTK) and use a simplified system-search-only path, skipping site-packages,
conda, CUDA_HOME, and canary probe steps.
Also reject unrecognized libnames with a ValueError instead of silently
falling through to system search, which could produce surprising results
for unsupported libs like "cupti".
Closes NVIDIA#1288, closes NVIDIA#1564.
Co-authored-by: Cursor <cursoragent@cursor.com>
058f824 to
902a7f5
Compare
|
/ok to test |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
Adds explicit support in cuda.pathfinder for loading NVIDIA driver libraries ("cuda", "nvml") via a system-search-only path, and hard-fails on unknown library names to avoid surprising implicit loads.
Changes:
- Add platform-specific driver library name mappings (
libcuda.so.1/nvcuda.dll,libnvidia-ml.so.1/nvml.dll) to the supported-lib registries. - Introduce a dedicated driver-only loading path (skip wheel/conda/CUDA_HOME/canary) and validate
libnameagainst a platform-aware supported set. - Expand tests to cover driver lib dispatch behavior and unsupported-libname validation.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
cuda_pathfinder/cuda/pathfinder/_dynamic_libs/supported_nvidia_libs.py |
Adds driver-library mappings and folds them into the platform registries. |
cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py |
Adds supported-name validation and a driver-only load path that bypasses the CTK cascade. |
cuda_pathfinder/tests/test_driver_lib_loading.py |
New targeted unit tests for driver-only behavior and dispatch rules. |
cuda_pathfinder/tests/test_load_nvidia_dynamic_lib.py |
Updates tests to use mocker and adds unsupported-libname validation tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py
Show resolved
Hide resolved
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
/ok to test |
rwgk
left a comment
There was a problem hiding this comment.
Mostly nits, but we need real tests ("all_must_work") or we'll be blind to regressions.
| # CUDA_HOME). Note the non-standard naming: "cuda" → libcuda.so.1, | ||
| # "nvml" → libnvidia-ml.so.1. |
There was a problem hiding this comment.
| # CUDA_HOME). Note the non-standard naming: "cuda" → libcuda.so.1, | |
| # "nvml" → libnvidia-ml.so.1. | |
| # CUDA_HOME). |
There was a problem hiding this comment.
(seems entirely obvious; avoids duplicating names in comment that may go stale)
| @pytest.mark.parametrize("libname", ["cuda", "nvml"]) | ||
| def test_driver_only_libnames_contains(libname): | ||
| assert libname in _DRIVER_ONLY_LIBNAMES | ||
|
|
||
|
|
||
| @pytest.mark.parametrize("libname", ["cudart", "nvrtc", "cublas", "nvvm"]) | ||
| def test_driver_only_libnames_excludes_ctk_libs(libname): | ||
| assert libname not in _DRIVER_ONLY_LIBNAMES |
There was a problem hiding this comment.
Best removed: these just seem to exercise in and not in based on made-up lists of specific libnames.
| assert libname not in _DRIVER_ONLY_LIBNAMES | ||
|
|
||
|
|
||
| # --------------------------------------------------------------------------- |
There was a problem hiding this comment.
We need some real tests; mocks are great for hard-to-reach-branch coverage, but other than that, they give a false sense of "oh this works".
We should have tests for real loading, and track them with INFOs, so that we can validate that it actually works by inspecting the test logs (CI, QA).
We should have at least some all_must_work on platforms where we're sure it must work. Without, we'll be blind to regressions.
I realize this is unusual, but that's the nature of a piece of software with the sole purpose of adapting to highly diverse environments that we don't control.
There was a problem hiding this comment.
Can we avoid the info logging stuff until we merge #1593? I'd like to get that in so I don't have to redo related changes from that PR.
There was a problem hiding this comment.
Ok, nevermind about the INFO logging, I'll just do it here. It's more important to just get this feature in.
| load_nvidia_dynamic_lib("cudart") | ||
|
|
||
|
|
||
| @pytest.mark.parametrize("libname", ["bogus", "not_a_real_lib", "cupti"]) |
There was a problem hiding this comment.
Do we need to include "cupti" here? I'd rather stay clear of mentioning any real names here, if for nothing else but avoiding head scratching when searching the codebase for cupti.
There was a problem hiding this comment.
Not really, fixes incoming.
Use only obviously fake names to prevent confusion when searching the codebase for real library names like "cupti". Co-authored-by: Cursor <cursoragent@cursor.com>
The duplicated soname/DLL names in the comment can drift from the dict values below; the dict itself is the source of truth. Co-authored-by: Cursor <cursoragent@cursor.com>
…ibnames These just exercise Python's `in` operator against a hardcoded list and don't provide meaningful coverage. Co-authored-by: Cursor <cursoragent@cursor.com>
Run "cuda" and "nvml" through the actual OS loader in spawned child processes, following the same pattern as test_load_nvidia_dynamic_lib. Results are tracked via INFO summary lines for CI/QA log inspection. Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
||
|
|
||
| @pytest.mark.parametrize("libname", sorted(_DRIVER_ONLY_LIBNAMES)) | ||
| def test_real_load_driver_lib(info_summary_append, libname): |
|
/ok to test ba23547 |
|
It is working as expected: Using logs_57373437624.zip (1st attempt, two jobs are missing): There are The lists below cover |
This comment has been minimized.
This comment has been minimized.
|
Summary
load_nvidia_dynamic_lib()"cuda","nvml"#1288: Add support forload_nvidia_dynamic_lib("cuda")andload_nvidia_dynamic_lib("nvml"). These are NVIDIA driver libraries (not CTK) with non-standard naming (libcuda.so.1/nvcuda.dll,libnvidia-ml.so.1/nvml.dll). They use a simplified system-search-only path, skipping site-packages, conda, CUDA_HOME, and canary probe.ValueErrorinstead of silently falling through to system search. The error message includes the full list of supported names for discoverability.Breaking change
load_nvidia_dynamic_lib()now raisesValueErrorfor unrecognized libnames. Any downstream code passing names like"cupti"that happened to work via system search will break. Adding CUPTI as a supported lib (#1572) before the next release would restore that path through the proper search cascade.Changes
supported_nvidia_libs.py— newSUPPORTED_LINUX_SONAMES_DRIVER/SUPPORTED_WINDOWS_DLLS_DRIVERdictsload_nvidia_dynamic_lib.py—_DRIVER_ONLY_LIBNAMES,_load_driver_lib_no_cache(),_ALL_SUPPORTED_LIBNAMESvalidation, updated docstringtests/test_driver_lib_loading.py— 10 new tests for driver lib flowtests/test_load_nvidia_dynamic_lib.py— 2 new tests for validation, migrated tomockerCloses #1288, closes #1564.
Made with Cursor