Introduce benchmark framework using CUDA events #157
Open
mcgibbon wants to merge 3 commits into NVIDIA:main from
Introduce a torch_harmonics.benchmark subpackage with:
- Timer infrastructure (CUDATimer, NullTimer, CPUEventPair) for GPU event-based and CPU wall-clock timing
- BenchmarkABC base class with registry via @register_benchmark
- CLI runner (python -m torch_harmonics.benchmark) that saves JSON results
- RealSHT and InverseRealSHT benchmarks at 1-degree resolution

Also add benchmark_results to .gitignore.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
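The registry pattern named in the first commit can be sketched roughly as follows. This is not the PR's code: only the names `BenchmarkABC` and `register_benchmark` come from the commit message, while the `BENCHMARKS` dict, the `setup`/`run` hooks, and the `ExampleSum` benchmark are hypothetical and used here purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type

# Hypothetical module-level registry mapping benchmark names to classes.
BENCHMARKS: Dict[str, Type["BenchmarkABC"]] = {}


class BenchmarkABC(ABC):
    """Base class: subclasses build inputs in setup() and do the timed work in run()."""

    def setup(self) -> None:
        # Optional hook for allocating inputs/modules outside the timed region.
        pass

    @abstractmethod
    def run(self) -> None:
        """The operation to be timed."""


def register_benchmark(name: str) -> Callable[[Type[BenchmarkABC]], Type[BenchmarkABC]]:
    """Class decorator that records a benchmark class under `name`."""

    def decorator(cls: Type[BenchmarkABC]) -> Type[BenchmarkABC]:
        if name in BENCHMARKS:
            raise ValueError(f"benchmark {name!r} is already registered")
        if not issubclass(cls, BenchmarkABC):
            raise TypeError(f"{cls.__name__} must subclass BenchmarkABC")
        BENCHMARKS[name] = cls
        return cls

    return decorator


# Hypothetical example benchmark, registered at import time.
@register_benchmark("example_sum")
class ExampleSum(BenchmarkABC):
    def setup(self) -> None:
        self.data = list(range(1000))

    def run(self) -> None:
        self.result = sum(self.data)
```

With this shape, a CLI runner only needs to import the modules that define benchmarks and then iterate over `BENCHMARKS`, so adding a benchmark means writing one decorated class.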
Register a disco_conv_s2_torch_1deg benchmark at 1-degree resolution (B=4, 4 channels, 180x360) using the non-optimized torch contraction path, which does not require the custom CUDA extension. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce hardware.py with a device-name-to-scale-factor lookup table so benchmark batch sizes adapt to different GPUs. Base batch sizes are tuned for Tesla T4 (factor 1.0). Unknown devices default to 1.0 with a warning to add an entry for their hardware. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
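A device-name lookup like the one described for hardware.py might look like the sketch below. The function names and the rounding rule are assumptions; the only table entry is the Tesla T4 baseline (1.0) stated in the commit message, and the 1.0 default with a warning mirrors the described behavior for unknown devices.

```python
import warnings

# Hypothetical lookup table: device name -> batch-size scale factor.
# Only the Tesla T4 baseline comes from the commit message; add entries
# measured on your own hardware.
BATCH_SIZE_FACTORS = {
    "Tesla T4": 1.0,
}


def get_batch_size_factor(device_name: str) -> float:
    """Return the scale factor for `device_name`, defaulting to 1.0 with a warning."""
    try:
        return BATCH_SIZE_FACTORS[device_name]
    except KeyError:
        warnings.warn(
            f"No batch size factor for device {device_name!r}; using 1.0. "
            "Consider adding an entry for this hardware.",
            stacklevel=2,
        )
        return 1.0


def scaled_batch_size(base_batch_size: int, device_name: str) -> int:
    """Scale a T4-tuned batch size by the device factor, never going below 1."""
    return max(1, round(base_batch_size * get_batch_size_factor(device_name)))
```

On a CUDA machine the device name would typically come from `torch.cuda.get_device_name()`; it is passed in explicitly here so the lookup can be exercised without a GPU.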
mcgibbon (Contributor, Author) commented:
@azrael417 you may find this helpful for checking the SHT timings on your hardware for #155. You'll want to add new batch-size scaling factors so the benchmarks fully occupy your hardware. I tried to make it straightforward to add new benchmarks. The entrypoint writes JSON files labelled with the git tag under benchmark_results/ in the directory you run it from (the location can be changed with a flag).
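The comment only says the output files are "git-tag labelled", so the following is one possible sketch, assuming `git describe --tags --always --dirty` as the label source. `git_label`, `save_results`, and the JSON layout are hypothetical, and since the actual flag name isn't given, the output directory is simply a parameter here.

```python
import json
import subprocess
from pathlib import Path


def git_label() -> str:
    """Describe HEAD via `git describe`, or return 'unknown' outside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "describe", "--tags", "--always", "--dirty"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git missing (OSError) or not a repository (non-zero exit).
        return "unknown"
    # Tags may contain "/", which would otherwise create subdirectories.
    return out.stdout.strip().replace("/", "_") or "unknown"


def save_results(results: dict, output_dir: str = "benchmark_results") -> Path:
    """Write `results` to <output_dir>/<git label>.json and return the path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{git_label()}.json"
    path.write_text(json.dumps(results, indent=2, sort_keys=True))
    return path
```

Labelling files by commit makes it easy to diff timings before and after a change, at the cost of overwriting results when the same commit is benchmarked twice.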
Collaborator commented:
Hello Jeremy, thanks for putting this together.
This PR adds timing for the SHT and for the torch implementation of DISCO convolution through a new benchmarking framework, run with `python -m torch_harmonics.benchmark`. This is largely taken from the implementation I authored and we used in https://github.com/ai2cm/ace.
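CUDA kernel launches are asynchronous, so timing GPU work with a host clock alone can under-report unless the host synchronizes first; recording CUDA events on the stream avoids that. A minimal sketch of event-based timing is below, assuming PyTorch's `torch.cuda.Event(enable_timing=True)` API. The `CUDAEventTimer` class is hypothetical (it is not the PR's `CUDATimer`), and its wall-clock fallback when CUDA or PyTorch is unavailable is an assumption added so the sketch runs anywhere.

```python
import time
from typing import Optional

try:
    import torch
except ImportError:  # allow the sketch to run where PyTorch is absent
    torch = None


class CUDAEventTimer:
    """Time a region with CUDA events; fall back to wall-clock time without CUDA."""

    def __init__(self) -> None:
        self.use_cuda = torch is not None and torch.cuda.is_available()
        self.elapsed_ms: Optional[float] = None

    def __enter__(self) -> "CUDAEventTimer":
        if self.use_cuda:
            self._start = torch.cuda.Event(enable_timing=True)
            self._end = torch.cuda.Event(enable_timing=True)
            # Recorded on the current stream, so it marks when queued GPU work
            # reaches this point rather than when the host reaches it.
            self._start.record()
        else:
            self._t0 = time.perf_counter()
        return self

    def __exit__(self, *exc) -> None:
        if self.use_cuda:
            self._end.record()
            # Block until the end event completes so elapsed_time is valid.
            self._end.synchronize()
            self.elapsed_ms = self._start.elapsed_time(self._end)
        else:
            self.elapsed_ms = (time.perf_counter() - self._t0) * 1e3
```

Usage would be along the lines of `with CUDAEventTimer() as t: y = sht(x)` followed by reading `t.elapsed_ms`, where `sht` and `x` stand in for a transform module and its input; in practice a few warm-up iterations before timing help exclude one-time setup costs.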