[UR][CUDA] Fix urUSMContextMemcpyExp synchronization issue #21602

Draft
kekaczma wants to merge 1 commit into sycl from fix-cuda-usm-context-memcpy

@kekaczma
cuMemcpy is synchronous with respect to the host, but it does not synchronize with device operations in other streams. This can lead to race conditions where urUSMContextMemcpyExp reads stale data if there are pending operations on the source or destination buffers.

The issue manifests as sporadic test failures in CI where host_mem reads as 0 instead of the expected value (42), indicating the copy happened before the fill operations completed.

Fix: Add cuCtxSynchronize() before cuMemcpy to ensure all pending device operations in the context have completed. This guarantees data consistency at the cost of a context-wide synchronization.
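
The ordering problem and the fix can be illustrated with a host-only model that needs no GPU. This is a sketch under stated assumptions, not the adapter code: `FakeContext`, `copyWithoutSync`, `copyWithSync`, and `runDemo` are hypothetical names, `synchronize()` stands in for cuCtxSynchronize(), and a plain atomic load stands in for cuMemcpy. The fill is held behind a gate so that the "copy before fill" case is reproduced deterministically rather than sporadically.

```cpp
#include <atomic>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA context whose streams run work asynchronously.
class FakeContext {
public:
  // Enqueue work that runs on another thread, like a fill submitted to a stream.
  void enqueue(std::function<void()> work) {
    pending_.push_back(std::async(std::launch::async, std::move(work)));
  }

  // Analogue of cuCtxSynchronize(): block until all enqueued work has finished.
  void synchronize() {
    for (auto &task : pending_)
      task.get();
    pending_.clear();
  }

private:
  std::vector<std::future<void>> pending_;
};

// Analogue of a bare cuMemcpy: reads the buffer without waiting for pending work.
int copyWithoutSync(const std::atomic<int> &src) { return src.load(); }

// Analogue of the fix: wait for the whole context, then copy.
int copyWithSync(FakeContext &ctx, const std::atomic<int> &src) {
  ctx.synchronize();
  return src.load();
}

struct Result {
  int stale; // value seen by the unsynchronized copy
  int fresh; // value seen by the synchronized copy
};

Result runDemo() {
  FakeContext ctx;
  std::atomic<int> buffer{0};

  // The gate keeps the fill pending until it is released, which models a
  // fill that has been enqueued but has not yet executed on the device.
  std::promise<void> gate;
  std::shared_future<void> released = gate.get_future().share();

  ctx.enqueue([&buffer, released] {
    released.wait();
    buffer.store(42);
  });

  int stale = copyWithoutSync(buffer); // fill still pending, so this reads 0
  gate.set_value();                    // let the fill proceed
  int fresh = copyWithSync(ctx, buffer); // waits for the fill, so this reads 42
  return {stale, fresh};
}
```

In this model the unsynchronized copy sees 0 and the synchronized copy sees 42, which mirrors the failing and passing outcomes described for the conformance test.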

Since urUSMContextMemcpyExp is not performance-critical and should provide strong consistency guarantees, this trade-off is acceptable.

Fixes #19688

Test: exp_usm_context_memcpy/urUSMContextMemcpyExpTestDevice.Success now passes consistently on CUDA.


Development

Successfully merging this pull request may close these issues.

[UR][CUDA] precommit conformance test exp_usm_context_memcpy/urUSMContextMemcpyExp failing
