Fix missing wait_on_cuda_event_record_corr_id in Event Synchronize activities #1318
Open
jiannanWang wants to merge 8 commits into pytorch:main
@jiannanWang has imported this pull request. If you are a Meta employee, you can view this in D97553834.
CUPTI provides no ordering guarantees for activity records in its buffers. SYNCHRONIZATION records (Event Synchronize, Stream Wait Event) can appear before the CUDA_EVENT record that populates the waitEventMap() with the source stream and correlation ID.
The existing code performed the waitEventMap() lookup eagerly, at the moment each SYNCHRONIZATION record was processed. Stream Wait Event was already deferred (for stream-filtering purposes), so it happened to work, but Event Synchronize was logged immediately. When its CUDA_EVENT record hadn't been seen yet, the lookup found nothing, and wait_on_cuda_event_record_corr_id and wait_on_stream were left at -1.
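To make the ordering problem concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `Record`, `SyncInfo`, `EventSource`, `resolveEagerly`, and `resolveDeferred` are hypothetical names invented for illustration, and the model assumes each event ID is recorded only once per buffer. It only contrasts an eager lookup with one that is deferred until every record in the buffer has been seen.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for CUPTI activity records.
enum class Kind { CudaEvent, EventSynchronize };

struct Record {
  Kind kind;
  uint32_t eventId;
  int64_t streamId;      // meaningful for CudaEvent records only
  int64_t correlationId; // meaningful for CudaEvent records only
};

// What the profiler would log for one synchronization activity.
struct SyncInfo {
  uint32_t eventId;
  int64_t waitOnStream = -1;  // stays -1 if the event was never found
  int64_t waitOnCorrId = -1;
};

struct EventSource {
  int64_t streamId;
  int64_t correlationId;
};
using WaitEventMap = std::unordered_map<uint32_t, EventSource>;

// Fill in the wait_on_* fields if the event is known.
static void fillFromMap(const WaitEventMap& map, SyncInfo& info) {
  auto it = map.find(info.eventId);
  if (it != map.end()) {
    info.waitOnStream = it->second.streamId;
    info.waitOnCorrId = it->second.correlationId;
  }
}

// Eager: resolve each sync record as soon as it is processed.
// A sync record that precedes its CudaEvent record keeps -1.
std::vector<SyncInfo> resolveEagerly(const std::vector<Record>& buffer) {
  WaitEventMap map;
  std::vector<SyncInfo> out;
  for (const auto& r : buffer) {
    if (r.kind == Kind::CudaEvent) {
      map[r.eventId] = {r.streamId, r.correlationId};
    } else {
      SyncInfo info{r.eventId};
      fillFromMap(map, info);
      out.push_back(info);
    }
  }
  return out;
}

// Deferred: collect everything first, then resolve once the map is
// complete, so record order within the buffer no longer matters.
std::vector<SyncInfo> resolveDeferred(const std::vector<Record>& buffer) {
  WaitEventMap map;
  std::vector<SyncInfo> pending;
  for (const auto& r : buffer) {
    if (r.kind == Kind::CudaEvent) {
      map[r.eventId] = {r.streamId, r.correlationId};
    } else {
      pending.push_back(SyncInfo{r.eventId});
    }
  }
  for (auto& info : pending) {
    fillFromMap(map, info);
  }
  return pending;
}
```

With a buffer that holds an `EventSynchronize` record for event 7 followed by the `CudaEvent` record for event 7, `resolveEagerly` leaves both fields at -1, while `resolveDeferred` fills them in from the later record. In a real profiler an event can be recorded more than once, so a deferred approach also has to decide which recording a given wait refers to; this sketch ignores that.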
Fix:
Test
SyncEventCorrIdOutOfOrder: places SYNCHRONIZATION records before their CUDA_EVENT record in the mock buffer and verifies that wait_on_cuda_event_record_corr_id and wait_on_stream are correctly populated for both sync types (Event Synchronize and Stream Wait Event).