Add parallel tuning on multiple remote GPUs using Ray #328
Conversation
The current parallel runner works. I've been able to run on multiple GPUs on DAS6-VU and DAS6-Leiden. There are several remaining problems.
I have resolved a couple of issues with sequential tuning that had arisen from the changes to the 'eval_all' costfunc. I haven't actually tested the parallel runner yet, but there are a couple of other things I need to attend to first.
Force-pushed from 07b599f to 40a956e
Force-pushed from 2418097 to 2b61ddb
Force-pushed from e1b8fd9 to a1c87db
This branch is almost ready for merging. The only big remaining issue is that the timings are not tracked correctly.
Sometimes with Python you run into an error and think: 'How on Earth has this error not surfaced years ago?' It seems that observers has defaulted to None instead of an empty list since forever, and miraculously it was never an issue. The code responsible for replacing a None value with an empty list is currently hidden in (and duplicated across) the backends, and because there is now code that (rightfully so) assumes observers is a list just before the backends are created, this suddenly breaks. The real issue is of course that the backends have somehow become responsible for sanitizing user input, which is not what a backend should do.
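As a rough illustration of the kind of fix meant here (the helper name and its placement are assumptions, not the code in this PR), the normalization could live in one place, before any backend is constructed:

```python
def normalize_observers(observers):
    """Treat `observers=None` as 'no observers' and always return a list.

    Hypothetical helper: doing this once, before the backend is created,
    lets every backend assume it receives a list and removes the
    duplicated None-checks from the individual backends.
    """
    return [] if observers is None else list(observers)
```

That keeps input sanitization at the user-facing interface, where the comment above argues it belongs.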



Working on a simple parallel runner that uses Ray to distribute the benchmarking of different configurations to remote Ray workers.
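A minimal sketch of the idea, assuming a Ray cluster with GPU workers is reachable; `benchmark_config`, the example parameters, and the resource settings are placeholders rather than the runner implemented in this PR:

```python
import time

import ray

ray.init()  # or ray.init(address="auto") to join an existing Ray cluster


@ray.remote(num_gpus=1)  # each task gets exclusive use of one remote GPU
def benchmark_config(params):
    """Placeholder worker: a real runner would compile the kernel with
    `params` and time it on the GPU assigned to this Ray worker."""
    start = time.perf_counter()
    # ... compile and run the kernel configuration here ...
    return {"params": params, "time": time.perf_counter() - start}


# Launch one benchmarking task per configuration; Ray schedules the tasks
# across whatever GPUs the cluster has available.
configs = [{"block_size_x": b} for b in (32, 64, 128, 256)]
futures = [benchmark_config.remote(c) for c in configs]
results = ray.get(futures)
print(min(results, key=lambda r: r["time"]))
```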