ALCF Reconstruction and Segmentation for bl832 #114
davramov wants to merge 56 commits into als-computing:main from
Conversation
- …t for reconstruction on ALCF
- …t for segmentation on ALCF. Still needs to be configured for GPU and the environment with dependencies
- …e for the TomographyController. Turning off TIFF to ZARR on ALCF for the demo
- …, and segmented data
3b2c506 to 7599f2e
- …ant to collapse these into the final version, but for testing purposes I'm leaving both codes
- …olaris that scales well to multiple GPU nodes
- … its own subfolder
- …ctions for each segmentation step
- …sly separated from dino segmentation
- …n, multiresolution flows
- … nature comms paper
xiaoyachong left a comment
Thanks for your PR. I’ve left a few comments. I didn’t repeat similar issues in nersc.py, but they follow the same pattern. We could address the generalization aspect in a future PR to make ALCF support different segmentation routines.
```python
    return result

@staticmethod
def _segmentation_sam3_wrapper(
```
Could we also define a _segmentation_sam3_wrapper() alongside the segmentation_sam3 function in nersc.py? This separation would make the code clearer and improve readability.
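To illustrate the suggested split, here is a minimal sketch of the wrapper/function separation. The class name, argument list, and return value are illustrative placeholders, not the actual nersc.py API:

```python
def segmentation_sam3(input_path: str, output_path: str) -> bool:
    """Core SAM3 segmentation logic, importable and unit-testable
    on its own (placeholder body for illustration)."""
    return True


class NERSCTomographyController:
    @staticmethod
    def _segmentation_sam3_wrapper(input_path: str, output_path: str) -> bool:
        """Thin entry point registered with the compute service;
        delegates all real work to segmentation_sam3."""
        return segmentation_sam3(input_path, output_path)
```

Keeping the wrapper one line long makes it obvious which function to test directly and which one is only a registration shim.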
```python
    master_addr = "localhost"
    print("No PBS_NODEFILE, single node mode")
    ...
    venv_path = "/eagle/SYNAPS-I/segmentation/env"
```
Shall we add venv_path to the function’s argument list?
```python
@staticmethod
def _reconstruct_wrapper(
    rundir: str = "/eagle/IRIProd/ALS/data/raw",
```
It looks like reconstruct calls _reconstruct_wrapper_multinode, so _reconstruct_wrapper may no longer be used. Should we remove it, or merge the two into a single reconstruct wrapper?
```python
if dino_success and sam3_success:
    logger.info("Running segmentation combination (SAM3+DINO).")
    combine_success = tomography_controller.combine_segmentations(
```
alcf_forge_recon_multisegment_flow calls combine_segmentations directly on the controller, rather than through alcf_combine_segmentations_task. That seems inconsistent with nersc.py. Should we use the same pattern here?
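The nersc.py convention could be mirrored with a thin task that delegates to the controller. In the real code the task function would carry Prefect's `@task` decorator; it is shown undecorated here (with a stand-in controller) so only the delegation pattern is illustrated:

```python
def alcf_combine_segmentations_task(controller, dino_path, sam3_path):
    """Task-level wrapper: the flow calls this instead of reaching
    into the controller directly, matching the nersc.py pattern."""
    return controller.combine_segmentations(dino_path, sam3_path)


class FakeController:
    """Stand-in for TomographyController, for illustration only."""
    def combine_segmentations(self, dino_path, sam3_path):
        return f"combined {dino_path} + {sam3_path}"
```

Routing every controller call through a task also gives each step its own retries and observability in the Prefect UI.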
```python
# STEP 2B: Run the Tiff to Zarr Globus Flow
logger.info(f"Starting ALCF tiff to zarr flow for {file_path=}")
alcf_multi_res_success = tomography_controller.build_multi_resolution(
```
It seems the new alcf_forge_recon_segment_flow has removed the Zarr conversion. Is that intentional?
```yaml
entrypoint: orchestration/flows/bl832/alcf.py:alcf_forge_recon_multisegment_flow
work_pool:
  name: alcf_recon_flow_pool
  work_queue_name: alcf_forge_recon_segment_flow_queue
```
It seems that alcf_forge_recon_multisegment_flow and alcf_forge_recon_segment_flow share the same queue name.
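One way to disambiguate, assuming the deployment schema shown in the diff, is to give each flow its own queue (the `_multisegment_` queue name below is a suggestion, not an existing value):

```yaml
- entrypoint: orchestration/flows/bl832/alcf.py:alcf_forge_recon_segment_flow
  work_pool:
    name: alcf_recon_flow_pool
    work_queue_name: alcf_forge_recon_segment_flow_queue
- entrypoint: orchestration/flows/bl832/alcf.py:alcf_forge_recon_multisegment_flow
  work_pool:
    name: alcf_recon_flow_pool
    work_queue_name: alcf_forge_recon_multisegment_flow_queue
```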
This file seems to be intended for the manuscript table, so it includes some hardcoded logic.
```yaml
worker_init: "module use /soft/modulefiles; module load conda; conda activate /eagle/SYNAPS-I/reconstruction/env/tomopy; export PATH=$PATH:/eagle/SYNAPSE-I/; cd $HOME/.globus_compute/globus_compute_reconstruction"
# ...
walltime: 00:60:00 # Jobs will end after 60 minutes
nodes_per_block: 2 # All jobs will have 1 node
```
The comment does not match nodes_per_block and max_blocks.
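For instance, the inline comments could be brought in line with the requested resources (a sketch; adjust to whichever values are actually intended):

```yaml
walltime: 00:60:00   # Jobs will end after 60 minutes
nodes_per_block: 2   # Each block requests 2 nodes
```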
```yaml
scheduler_options: "#PBS -l filesystems=home:eagle"
# ...
# Node setup: activate necessary conda environment and such
worker_init: "module use /soft/modulefiles; module load conda; conda activate /eagle/SYNAPS-I/reconstruction/env/tomopy; export PATH=$PATH:/eagle/SYNAPSE-I/; cd $HOME/.globus_compute/globus_compute_reconstruction"
```
There’s a typo in worker_init: “SYNAPSE-I” should be “SYNAPS-I.”
```python
)
...
# Prune TIFFs from ALCF scratch/segmentation
if alcf_segmentation_success:
```
Shall we check both alcf_segmentation_success and segment_transfer_success here?
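A minimal sketch of gating the prune on both flags; the helper function and its name are illustrative, not part of the PR:

```python
def should_prune(alcf_segmentation_success: bool,
                 segment_transfer_success: bool) -> bool:
    """Only prune once the data has both been produced and transferred
    off ALCF scratch, so a failed transfer never loses the only copy."""
    return alcf_segmentation_success and segment_transfer_success
```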
This PR adds segmentation logic to the HPCTomographyController, a Prefect task for segmentation, and an updated Globus endpoint configuration.