Merged
72 commits
e09c193
initial commit with multinode support
davramov Jan 6, 2026
21244e5
adding logic for determining qos based on number of nodes requested
davramov Jan 6, 2026
77d15cc
Adding specific tag for microct image (for more efficient caching on …
davramov Jan 6, 2026
9e90a26
Adding reconstruct_multinode() method
davramov Jan 15, 2026
31ed453
Making the cancel_sfapi_job.py script more useful
davramov Jan 15, 2026
a88fd45
in setup.cfg, adding a new section for flake8 to ignore the reconstru…
davramov Jan 15, 2026
3691601
Adding nersc_recon_num_nodes = 4 to Config832, which is used in bl832…
davramov Jan 15, 2026
1f48743
separating single node (production) nersc reconstruction flow from th…
davramov Jan 28, 2026
8ad972f
Making a separate deployment for the nersc multinode reconstruction flow
davramov Jan 28, 2026
81dd6c9
Creating option to turn on/off the nersc multinode reconstruction flo…
davramov Jan 28, 2026
141a5a6
Updating segmentation to use inference_v4.
davramov Feb 9, 2026
b4cab67
removing comments. segmentation still isn't working
davramov Feb 9, 2026
271d2bf
this configuration worked with 1 node for segmentation, testing with …
davramov Feb 10, 2026
9171ea5
adding nersc_forge_recon_segment_flow to prefect.yaml for deployment
davramov Feb 10, 2026
b5bd66d
removing comments
davramov Feb 10, 2026
3a5b1d2
making config.nersc_recon_num_nodes to set number of nodes for segmen…
davramov Feb 10, 2026
9df2f97
Using the amsc006 reservation for recon+segmentation
davramov Feb 10, 2026
340fcb2
Configuring to use all the nodes in the reservation
davramov Feb 11, 2026
f2f8806
num_nodes fix
davramov Feb 11, 2026
347816d
changing segmentation confidence from 0.5 to 0.2
davramov Feb 11, 2026
dd07993
Setting patch-size=400 and confidence=0.5
davramov Feb 12, 2026
54d832c
confidence=0.2
davramov Feb 12, 2026
e9336dc
Adding prefect variable to override defaults for segmentation
davramov Feb 12, 2026
567899e
updating segmentation to v5
davramov Feb 12, 2026
68490df
new checkpoint
davramov Feb 13, 2026
290a983
adding checkpoint as part of the segmentation variable
davramov Feb 13, 2026
1d5b8b4
adding support for a list of confidence scores that map to the prompt…
davramov Feb 13, 2026
b8d6cee
updating nersc flows with multisegmentation flows
davramov Feb 20, 2026
9ae81fc
updating nersc recon reservation name
davramov Feb 20, 2026
5532d12
updating reservation name
davramov Feb 20, 2026
4dea9a7
adjusting node numbers
davramov Feb 20, 2026
d37213f
adjusting path for combine step scripts
davramov Feb 20, 2026
e88ad2d
Update prompt list for latest sam3 version
davramov Feb 20, 2026
7e1562c
transferring segmented results to data832 as they complete rather tha…
davramov Feb 20, 2026
cd239b3
making sam3 results go into its own folder so it's not messy
davramov Feb 20, 2026
0b4c709
using the latest segmentation versions
davramov Feb 20, 2026
3461d17
using combine_sam_dino_v2
davramov Feb 21, 2026
a869d02
removing cellpose from the multiseg workflow and increasing the numbe…
davramov Feb 21, 2026
790e1e0
updating default number of nodes for sam3 segmentation
davramov Feb 21, 2026
a5d92c8
adding extract_regions task to the multiseg flow
davramov Feb 21, 2026
f0a91fd
removing some commented code
davramov Feb 21, 2026
ba6dd58
copy recon/segment results from prscratch to cfs when the flow is done
davramov Feb 21, 2026
8e588de
using new code
davramov Feb 21, 2026
42dce23
shortened prompt list
davramov Feb 21, 2026
0a1682f
fixing combine step
davramov Feb 21, 2026
51c85f9
reservation
davramov Feb 24, 2026
2aebe99
fixing combine step reservation (CPU)
davramov Feb 24, 2026
5e1778c
linting
davramov Mar 13, 2026
dd78a6e
removing cellpose
davramov Mar 13, 2026
b73bfe5
removing extract_regions flow (replaced by the combine step)
davramov Mar 13, 2026
58fe7c5
renaming segmentation flows/tasks to segmentation_sam3 to differentiate…
davramov Mar 13, 2026
827d7c7
removing commented code
davramov Mar 13, 2026
6e03983
removing multiresolution multinode optimization efforts from this PR
davramov Mar 13, 2026
eee3733
Adding pytests for bl832/nersc.py: reconstruction, segmentation, mult…
davramov Mar 13, 2026
87e107d
Adding pytests for bl832/nersc.py: reconstruction, segmentation, mult…
davramov Mar 13, 2026
9509447
Moving recon/segmentation num_nodes configuration to config.yaml
davramov Mar 18, 2026
d637f5e
Replacing the original reconstruct code with the multinode version th…
davramov Mar 18, 2026
35c8019
removing the sam3 forge segmentation flow, and renaming the nersc_for…
davramov Mar 18, 2026
8c25418
if to elif
davramov Mar 18, 2026
1ea50f6
setting nersc account for slurm based on config settings
davramov Mar 18, 2026
28ce316
Loading cpus-per-task from config for reconstruction slurm submission
davramov Mar 18, 2026
1e97c55
setting sam3 checkpoint/conda/model/vocab paths in config
davramov Mar 18, 2026
664671a
adding prompts to config
davramov Mar 18, 2026
5f694d2
Adding dino and combine segmentations settings to config
davramov Mar 18, 2026
0e66b2f
Updating pytests
davramov Mar 20, 2026
8726dda
Moving the rest of job submission variables to config.yml, created a …
davramov Mar 23, 2026
b0844b3
updating pytests
davramov Mar 23, 2026
e5be7d5
removing commented code
davramov Mar 23, 2026
b7b785a
updating prefect.yaml
davramov Mar 23, 2026
836c9e8
including script_name as part of config for sam3/dino/combine
davramov Mar 23, 2026
0be1eeb
renaming DINO references to DINOv3
davramov Apr 1, 2026
d696931
Updating pytest for nersc bc of the DINO -> DINOv3 naming changes
davramov Apr 1, 2026
90 changes: 88 additions & 2 deletions config.yml
@@ -46,6 +46,7 @@ globus:
uri: beegfs.als.lbl.gov
uuid: d33b5d6e-1603-414e-93cb-bcb732b7914a
name: bl733-beegfs-data

# 8.3.2 ENDPOINTS

spot832:
@@ -148,8 +149,8 @@ harbor_images832:
multires_image: tomorecon_nersc_mpi_hdf5@sha256:cc098a2cfb6b1632ea872a202c66cb7566908da066fd8f8c123b92fa95c2a43c

ghcr_images832:
-  recon_image: ghcr.io/als-computing/microct:master
-  multires_image: ghcr.io/als-computing/microct:master
+  recon_image: ghcr.io/als-computing/microct@sha256:1fdfb786726ee03301d624319e3d16702045072f38e2b0cca9d6237e5ab3f5ff
+  multires_image: ghcr.io/als-computing/microct@sha256:1fdfb786726ee03301d624319e3d16702045072f38e2b0cca9d6237e5ab3f5ff

prefect:
deployments:
@@ -158,3 +159,88 @@ prefect:

scicat:
jobs_api_url: https://dataportal.als.lbl.gov/api/ingest/jobs

+hpc_submission_settings832:
+  nersc_reconstruction:
+    # ── SLURM resource allocation ─────────────────────────────────────────────
+    qos: realtime
+    account: als
+    reservation: ""
+    num_nodes: 4
+    cpus-per-task: 128
+    walltime: "0:30:00"
+  nersc_multiresolution:
+    # ── SLURM resource allocation ─────────────────────────────────────────────
+    qos: realtime
+    account: als
+    reservation: ""
+    cpus-per-task: 128
+    walltime: "0:15:00"
+  nersc_segmentation_sam3:
+    # ── SLURM resource allocation ─────────────────────────────────────────────
+    qos: regular
+    account: als
+    constraint: gpu
+    reservation: ""
+    num_nodes: 4
+    ntasks-per-node: 1
+    gpus-per-node: 4
+    cpus-per-task: 128
+    walltime: "00:59:00"
+    # ── Inference parameters ──────────────────────────────────────────────────
+    script_name: "src/inference_v6.py"
+    batch_size: 1
+    patch_size: 400
+    confidence:
+      - 0.5
+    overlap: 0.25
+    prompts:
+      - "Phloem Fibers"
+      - "Hydrated Xylem vessels"
+      - "Air-based Pith cells"
+      - "Dehydrated Xylem vessels"
+    # ── Paths ─────────────────────────────────────────────────────────────────
+    cfs_path: /global/cfs/cdirs/als/data_mover/8.3.2
+    conda_env_path: /global/cfs/cdirs/als/data_mover/8.3.2/envs/sam3-py311
+    seg_scripts_dir: /global/cfs/cdirs/als/data_mover/8.3.2/tomography_segmentation_scripts/inference_latest/forge_feb_seg_model_demo/
+    checkpoints_dir: /global/cfs/cdirs/als/data_mover/8.3.2/tomography_segmentation_scripts/sam3_finetune/sam3/
+    bpe_path: /global/cfs/cdirs/als/data_mover/8.3.2/tomography_segmentation_scripts/sam3_finetune/sam3/bpe_simple_vocab_16e6.txt.gz
+    original_checkpoint_path: /global/cfs/cdirs/als/data_mover/8.3.2/tomography_segmentation_scripts/sam3_finetune/sam3/sam3.pt
+    finetuned_checkpoint_path: /global/cfs/cdirs/als/data_mover/8.3.2/tomography_segmentation_scripts/sam3_finetune/sam3/checkpoint_v6.pt
+  nersc_segmentation_dinov3:
+    # ── SLURM resource allocation ─────────────────────────────────────────────
+    qos: regular
+    account: als
+    constraint: gpu
+    reservation: ""
+    num_nodes: 4
+    ntasks-per-node: 1
+    nproc_per_node: 4
+    gpus-per-node: 4
+    cpus-per-task: 128
+    walltime: "00:59:00"
+    # ── Inference parameters ──────────────────────────────────────────────────
+    script_name: "src.inference_dino_v1"
+    batch_size: 4
+    # ── Paths ─────────────────────────────────────────────────────────────────
+    cfs_path: /global/cfs/cdirs/als/data_mover/8.3.2
+    conda_env_path: /global/cfs/cdirs/als/data_mover/8.3.2/envs/dino_demo
+    seg_scripts_dir: /global/cfs/cdirs/als/data_mover/8.3.2/tomography_segmentation_scripts/inference_v5_multiseg/forge_feb_seg_model_demo/
+    dino_checkpoint_path: /global/cfs/cdirs/als/data_mover/8.3.2/tomography_segmentation_scripts/dino/best.ckpt
+  nersc_combine_segmentations:
+    # ── SLURM resource allocation ─────────────────────────────────────────────
+    qos: regular
+    account: als
+    constraint: cpu
+    reservation: ""
+    num_nodes: 4
+    ntasks: 128
+    cpus-per-task: 1
+    walltime: "01:00:00"
+    # ── Combination parameters ────────────────────────────────────────────────
+    script_name: "src.combine_sam_dino_v3"
+    dilate_px: 5
+    # ── Paths ─────────────────────────────────────────────────────────────────
+    cfs_path: /global/cfs/cdirs/als/data_mover/8.3.2
+    conda_env_path: /global/cfs/cdirs/als/data_mover/8.3.2/envs/dino_demo
+    seg_scripts_dir: /global/cfs/cdirs/als/data_mover/8.3.2/tomography_segmentation_scripts/inference_latest/forge_feb_seg_model_demo
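
Reviewer note: the code that consumes hpc_submission_settings832 is not part of this config.yml diff, so the following is only a rough sketch of how one of these sections could be turned into Slurm directives. The function name build_sbatch_header and the SBATCH_FLAGS mapping are hypothetical and are not taken from the PR; the sbatch option names themselves are standard Slurm flags. The sketch assumes that an empty reservation string means "no reservation" and should be omitted from the job script.

```python
from typing import Any, List, Mapping

# Hypothetical mapping from config.yml keys to sbatch long options.
# The option names are standard Slurm flags; the mapping is an assumption,
# not the PR's actual implementation.
SBATCH_FLAGS = {
    "qos": "--qos",
    "account": "--account",
    "constraint": "--constraint",
    "reservation": "--reservation",
    "num_nodes": "--nodes",
    "ntasks": "--ntasks",
    "ntasks-per-node": "--ntasks-per-node",
    "gpus-per-node": "--gpus-per-node",
    "cpus-per-task": "--cpus-per-task",
    "walltime": "--time",
}


def build_sbatch_header(settings: Mapping[str, Any]) -> List[str]:
    """Return '#SBATCH' lines for the recognised, non-empty settings."""
    lines = []
    for key, flag in SBATCH_FLAGS.items():
        value = settings.get(key)
        # Skip keys that are absent and empty strings such as reservation: ""
        if value is None or value == "":
            continue
        lines.append(f"#SBATCH {flag}={value}")
    return lines


# Values copied from the nersc_reconstruction section of config.yml.
nersc_reconstruction = {
    "qos": "realtime",
    "account": "als",
    "reservation": "",
    "num_nodes": 4,
    "cpus-per-task": 128,
    "walltime": "0:30:00",
}

header = build_sbatch_header(nersc_reconstruction)
print("\n".join(header))
```

Keys that are not Slurm options (script_name, prompts, the path entries, nproc_per_node) are ignored by this sketch; in practice they would presumably be passed to the job body rather than the header.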