Add parallel tuning on multiple remote GPUs using Ray #328
Conversation
The current parallel runner works. I've been able to run on multiple GPUs on DAS6-VU and DAS6-Leiden. There are several remaining problems.
I have resolved a couple of issues with sequential tuning that had arisen from the changes to the 'eval_all' costfunc. I haven't actually tested the parallel runner yet, but there are a couple of other things I need to attend to first.
Force-pushed from 07b599f to 40a956e
Force-pushed from 2418097 to 2b61ddb
Force-pushed from e1b8fd9 to a1c87db
This branch is almost ready for merging. The only big remaining issue is that the timings are not tracked correctly.
Sometimes with Python you run into an error and think: 'How on Earth has this error not surfaced years ago?' It seems that observers has defaulted to None instead of an empty list since forever, and miraculously it was never an issue. The code responsible for replacing a None value with an empty list is currently hidden in (and duplicated across) the backends, and because there is now code that (rightfully so) assumes observers is a list just before the backends are created, this suddenly breaks. The real issue is of course that the backends have somehow become responsible for sanitizing user input, which is not what a backend should do.
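As a rough illustration of the kind of fix meant here (the helper name and its placement are assumptions, not the code in this PR), the normalization could live in one place, before any backend is constructed:

```python
def normalize_observers(observers):
    """Treat `observers=None` as 'no observers' and always return a list.

    Hypothetical helper: doing this once, before the backend is created,
    lets every backend assume it receives a list and removes the
    duplicated None-checks from the individual backends.
    """
    return [] if observers is None else list(observers)
```

That keeps input sanitization at the user-facing interface, where the comment above argues it belongs.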



Working on a simple parallel runner that uses Ray to distribute the benchmarking of different configurations to remote Ray workers.
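A minimal sketch of the idea, assuming a Ray cluster with GPU workers is reachable; `benchmark_config`, the example parameters, and the resource settings are placeholders rather than the runner implemented in this PR:

```python
import time

import ray

ray.init()  # or ray.init(address="auto") to join an existing Ray cluster


@ray.remote(num_gpus=1)  # each task gets exclusive use of one remote GPU
def benchmark_config(params):
    """Placeholder worker: a real runner would compile the kernel with
    `params` and time it on the GPU assigned to this Ray worker."""
    start = time.perf_counter()
    # ... compile and run the kernel configuration here ...
    return {"params": params, "time": time.perf_counter() - start}


# Launch one benchmarking task per configuration; Ray schedules the tasks
# across whatever GPUs the cluster has available.
configs = [{"block_size_x": b} for b in (32, 64, 128, 256)]
futures = [benchmark_config.remote(c) for c in configs]
results = ray.get(futures)
print(min(results, key=lambda r: r["time"]))
```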