Add Device context manager for temporary device switching #1597
Draft
Andy-Jost wants to merge 1 commit into NVIDIA:main
Closes NVIDIA#1586. Adds __enter__/__exit__ to Device so it can be used as a context manager that saves the current CUDA context on entry and restores it on exit. Uses cuCtxGetCurrent/cuCtxSetCurrent (not push/pop) for interoperability with the runtime API. Saved contexts are stored on a per-thread stack (_tls._ctx_stack) so nested and reentrant usage works correctly. Also adds teardown to mempool_device_x2/x3 fixtures to clean up residual contexts between tests. Co-authored-by: Cursor <cursoragent@cursor.com>
/ok to test f02b730
Summary
Closes #1586. Adds `__enter__`/`__exit__` to `Device` so it can be used as a context manager that temporarily activates a device and restores the previous CUDA context on exit.

Changes
- `cuda/core/_device.pyx`: Added `__enter__` and `__exit__` methods to `Device`. On enter, queries the current context via `cuCtxGetCurrent` and saves it on a per-thread stack (`_tls._ctx_stack`), then calls `set_current()`. On exit, restores the saved context via `cuCtxSetCurrent`. Uses peek-then-pop ordering so the stack is not corrupted if `cuCtxSetCurrent` raises.
- `tests/test_device.py`: Added 12 tests covering basic usage, context restoration, exception safety, same-device nesting, deep nesting, multi-GPU nesting, `set_current()` inside a `with` block, device usability after exit, device initialization, and thread safety (3 threads on 3 GPUs).
- `tests/conftest.py`: Added teardown to the `mempool_device_x2` and `mempool_device_x3` fixtures to clean up residual contexts between tests.

Design
- `__enter__` queries the actual CUDA driver state rather than maintaining a Python-side device cache. This ensures correct interoperability with other libraries (PyTorch, CuPy) that use `cudaSetDevice`/`cuCtxSetCurrent`.
- Saved contexts are stored on a per-thread stack (`_tls._ctx_stack`, not on the `Device` singleton), so nested and reentrant usage works correctly.
- `cuCtxGetCurrent`/`cuCtxSetCurrent`: Consistent with `set_current()` and the runtime API model. Does not use `cuCtxPushCurrent`/`cuCtxPopCurrent`.

Test Coverage
All tests pass locally on single-GPU (L40) and multi-GPU (3x RTX PRO 6000 Blackwell) machines. Stress-tested with 20 randomized iterations via `pytest-repeat` + `pytest-randomly` with no ordering issues.

Made with Cursor
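The peek-then-pop ordering mentioned under Changes can be illustrated in isolation. This is a sketch of one reading of that claim, not code from the PR: the function names, the `restore` callback, and the `"ctx-A"` handle are all hypothetical, and `failing_restore` simulates `cuCtxSetCurrent` raising.

```python
def exit_peek_then_pop(stack, restore):
    """Restore first, and drop the saved entry only if the restore succeeded."""
    saved = stack[-1]  # peek: the entry stays on the stack for now
    restore(saved)     # may raise
    stack.pop()


def exit_pop_then_restore(stack, restore):
    """Naive ordering: the saved entry is gone even if the restore fails."""
    saved = stack.pop()
    restore(saved)     # may raise, and `saved` is then lost


def failing_restore(ctx):
    # Simulates the driver call failing while restoring `ctx`.
    raise RuntimeError(f"could not restore {ctx}")


def stack_after_failed_exit(exit_fn):
    """Run one failed exit and return what is left on the stack."""
    stack = ["ctx-A"]
    try:
        exit_fn(stack, failing_restore)
    except RuntimeError:
        pass
    return stack
```

With `exit_peek_then_pop`, a failed restore leaves `"ctx-A"` on the stack, so the saved context is still known afterwards. With `exit_pop_then_restore`, the stack is already empty when the error surfaces.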