Skip to content

Apply hooks in install_cuda_and_libraries#165

Open
casparvl wants to merge 1 commit intoEESSI:mainfrom
casparvl:apply_hooks_to_cuda_installs
Open

Apply hooks in install_cuda_and_libraries#165
casparvl wants to merge 1 commit intoEESSI:mainfrom
casparvl:apply_hooks_to_cuda_installs

Conversation

@casparvl
Copy link
Contributor

@casparvl casparvl commented Feb 19, 2026

Make sure the hooks are applied in install_cuda_and_libraries, but that the directory check is skipped, as we INTEND to install in an exceptional (i.e. non-accelerator) path.

This solves an issue I had running install_cuda_and_libraries.sh on an H100 system. It would cause the CUDA sanity check to fail since the hooks weren't being applied that would turn the 9.0a CC into a 9.0.

Originally, we skipped the hooks alltogether. But the real solution is to apply the hooks, but make sure that the hook that does the installation path check is skipped.

…at the directory check is skipped, as we INTEND to install in an exceptional (i.e. non-accelerator) path
@casparvl casparvl marked this pull request as ready for review February 19, 2026 21:55
@ocaisa
Copy link
Member

ocaisa commented Feb 19, 2026

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-deucalion for:arch=aarch64/a64fx
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-deucalion for:arch=aarch64/a64fx

@eessi-bot-deucalion
Copy link

eessi-bot-deucalion bot commented Feb 19, 2026

New job on instance eessi-bot-deucalion for repository eessi.io-2023.06-software
Building on: a64fx
Building for: aarch64/a64fx
Job dir: /home/eessibot/new-bot/jobs/2026.02/pr_165/974648

date job status comment
Feb 19 22:09:15 UTC 2026 submitted job id 974648 awaits release by job manager
Feb 19 22:09:38 UTC 2026 released job awaits launch by Slurm scheduler
Feb 19 22:10:41 UTC 2026 running job 974648 is running
Feb 19 22:20:08 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-974648.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2023.06-software-linux-aarch64-a64fx-17715391760.tar.zstsize: 0 MiB (30242 bytes)
entries: 2
modules under 2023.06/software/linux/aarch64/a64fx/modules/all
no module files in tarball
software under 2023.06/software/linux/aarch64/a64fx/software
no software packages in tarball
reprod directories under 2023.06/software/linux/aarch64/a64fx/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/a64fx
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Feb 19 22:20:09 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] ( 1/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 2/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 3/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 4/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed
[ OK ] ( 5/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:a64fx+default
P: perf: 580.952 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:a64fx+default
P: perf: 526.343 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:a64fx+default
P: latency: 1.69 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:a64fx+default
P: latency: 1.71 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:a64fx+default
P: bandwidth: 8762.93 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:a64fx+default
P: bandwidth: 8708.05 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 6/10 test case(s) from 10 check(s) (0 failure(s), 4 skipped, 0 aborted)
Details
✅ job output file slurm-974648.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-deucalion
Copy link

eessi-bot-deucalion bot commented Feb 19, 2026

New job on instance eessi-bot-deucalion for repository eessi.io-2025.06-software
Building on: a64fx
Building for: aarch64/a64fx
Job dir: /home/eessibot/new-bot/jobs/2026.02/pr_165/974650

date job status comment
Feb 19 22:09:22 UTC 2026 submitted job id 974650 awaits release by job manager
Feb 19 22:09:35 UTC 2026 released job awaits launch by Slurm scheduler
Feb 19 22:10:44 UTC 2026 running job 974650 is running
Feb 19 22:17:03 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-974650.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-a64fx-17715391230.tar.zstsize: 0 MiB (30246 bytes)
entries: 2
modules under 2025.06/software/linux/aarch64/a64fx/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/a64fx/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/a64fx/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/a64fx
2025.06/init/easybuild/eb_hooks.py
2025.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Feb 19 22:17:03 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:a64fx+default [Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed]
[ SKIP ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:a64fx+default [Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed]
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:a64fx+default
P: latency: 0.89 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:a64fx+default
P: bandwidth: 7909.23 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 2/4 test case(s) from 4 check(s) (0 failure(s), 2 skipped, 0 aborted)
Details
✅ job output file slurm-974650.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@ocaisa
Copy link
Member

ocaisa commented Feb 19, 2026

@casparvl This looks fine to me, can you confirm here that it works in real life

Copy link
Member

@ocaisa ocaisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

# Store MODULEPATH so it can be restored at the end of each loop iteration
SAVE_MODULEPATH=${MODULEPATH}

for EASYSTACK_FILE in ${TOPDIR}/easystacks/eessi-*CUDA*.yml; do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know how you managed to get this to work, that path is not correct...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be

${TOPDIR}/easystacks/${EESSI_VERSION}/eessi-*CUDA*.yml

(I think)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments