
ALCF Reconstruction and Segmentation for bl832 #114

Open
davramov wants to merge 56 commits into als-computing:main from davramov:alcf_recon_segmentation

Conversation

@davramov (Contributor)

This PR adds segmentation logic to the HPCTomographyController, a Prefect task for segmentation, and an updated Globus endpoint configuration.

…t for segmentation on ALCF. Still needs to be configured for GPU and the environment with dependencies
…e for the TomographyController. Turning off TIFF to ZARR on ALCF for the demo
@davramov davramov force-pushed the alcf_recon_segmentation branch from 3b2c506 to 7599f2e on January 28, 2026 19:50
@davramov davramov mentioned this pull request Feb 2, 2026
@davramov davramov marked this pull request as ready for review March 13, 2026 21:35
@davramov davramov requested review from xiaoyachong March 13, 2026 21:36
@xiaoyachong (Contributor) left a comment


Thanks for your PR. I’ve left a few comments. I didn’t repeat similar issues in nersc.py, but they follow the same pattern. We could address the generalization aspect in a future PR to make ALCF support different segmentation routines.

return result

@staticmethod
def _segmentation_sam3_wrapper(

Could we also define a _segmentation_sam3_wrapper() alongside the segmentation_sam3 function in nersc.py? This separation would make the code clearer and improve readability.
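A hypothetical sketch of the suggested split for nersc.py: keep the segmentation logic in a standalone function and add a thin static wrapper on the controller, mirroring the ALCF structure. Names, signatures, and bodies here are illustrative, not the project's actual API.

```python
from typing import Any


def segmentation_sam3(input_path: str, output_path: str, **kwargs: Any) -> bool:
    """Placeholder for the SAM3 segmentation routine."""
    # The real segmentation logic would run here.
    return True


class NERSCTomographyController:
    @staticmethod
    def _segmentation_sam3_wrapper(input_path: str, output_path: str, **kwargs: Any) -> bool:
        # Thin pass-through, so the controller exposes a single registered
        # callable while the logic lives in the standalone function.
        return segmentation_sam3(input_path, output_path, **kwargs)
```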

master_addr = "localhost"
print("No PBS_NODEFILE, single node mode")

venv_path = "/eagle/SYNAPS-I/segmentation/env"

Shall we add venv_path to the function’s argument list?


@staticmethod
def _reconstruct_wrapper(
rundir: str = "/eagle/IRIProd/ALS/data/raw",

It looks like reconstruct calls _reconstruct_wrapper_multinode, so _reconstruct_wrapper may no longer be used. Should we remove it, or merge the two into a single reconstruct wrapper?


if dino_success and sam3_success:
logger.info("Running segmentation combination (SAM3+DINO).")
combine_success = tomography_controller.combine_segmentations(

alcf_forge_recon_multisegment_flow calls combine_segmentations directly on the controller, rather than through alcf_combine_segmentations_task. That seems inconsistent with nersc.py. Should we use the same pattern here?
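A sketch of the task-based pattern the comment suggests. A plain decorator stands in for Prefect's @task so the example is self-contained; in the real flow this would be a Prefect task, as in nersc.py.

```python
import functools


def task(fn):  # stand-in for prefect.task, for illustration only
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


@task
def alcf_combine_segmentations_task(tomography_controller, file_path: str) -> bool:
    # The flow calls this task rather than invoking
    # tomography_controller.combine_segmentations directly, keeping
    # alcf_forge_recon_multisegment_flow consistent with the NERSC flow.
    return tomography_controller.combine_segmentations(file_path=file_path)
```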


# STEP 2B: Run the Tiff to Zarr Globus Flow
logger.info(f"Starting ALCF tiff to zarr flow for {file_path=}")
alcf_multi_res_success = tomography_controller.build_multi_resolution(

It seems the new alcf_forge_recon_segment_flow has removed the Zarr conversion. Is that intentional?

entrypoint: orchestration/flows/bl832/alcf.py:alcf_forge_recon_multisegment_flow
work_pool:
name: alcf_recon_flow_pool
work_queue_name: alcf_forge_recon_segment_flow_queue

It seems that alcf_forge_recon_multisegment_flow and alcf_forge_recon_segment_flow share the same queue name.
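One way to disambiguate is to give each deployment its own queue; the renamed queue below is illustrative, not prescriptive.

```yaml
# alcf_forge_recon_segment_flow deployment keeps its queue:
work_queue_name: alcf_forge_recon_segment_flow_queue

# alcf_forge_recon_multisegment_flow deployment gets a distinct one, e.g.:
# work_queue_name: alcf_forge_recon_multisegment_flow_queue
```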


This file seems to be intended for the manuscript table, so it includes some hardcoded logic.

worker_init: "module use /soft/modulefiles; module load conda; conda activate /eagle/SYNAPS-I/reconstruction/env/tomopy; export PATH=$PATH:/eagle/SYNAPSE-I/; cd $HOME/.globus_compute/globus_compute_reconstruction"

walltime: 00:60:00 # Jobs will end after 60 minutes
nodes_per_block: 2 # All jobs will have 1 node

The comment does not match nodes_per_block and max_blocks.

scheduler_options: "#PBS -l filesystems=home:eagle"

# Node setup: activate necessary conda environment and such
worker_init: "module use /soft/modulefiles; module load conda; conda activate /eagle/SYNAPS-I/reconstruction/env/tomopy; export PATH=$PATH:/eagle/SYNAPSE-I/; cd $HOME/.globus_compute/globus_compute_reconstruction"

There’s a typo in worker_init: “SYNAPSE-I” should be “SYNAPS-I.”

)

# Prune TIFFs from ALCF scratch/segmentation
if alcf_segmentation_success:

Shall we check both alcf_segmentation_success and segment_transfer_success before pruning?
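A minimal sketch of the suggested guard: prune ALCF scratch only when segmentation succeeded AND its outputs were transferred off scratch. (should_prune is a hypothetical helper, not part of the PR.)

```python
def should_prune(alcf_segmentation_success: bool, segment_transfer_success: bool) -> bool:
    # Require both flags so TIFFs are never deleted before a confirmed transfer.
    return alcf_segmentation_success and segment_transfer_success
```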
