Sub-batches with equal flux in photon pooling #513

Open
welucas2 wants to merge 9 commits into main from u/welucas2/equal-flux-subbatches

@welucas2
Collaborator

Currently, the sub-batching in photon pooling is performed object by object. So, for example, if there are 100 objects in the batch and 20 sub-batches, then the first five objects go in the first sub-batch, the next five in the second, and so on. This means that if the objects vary a lot in brightness (and they do), then the flux per sub-batch will also vary a lot. In a typical run with a setup provided by Jim, I found that most sub-batches contained a few thousand photons, while a handful contained several tens of millions. This leads to regular memory spikes when those sub-batches are processed in each new photon pool.
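To make the imbalance concrete, here is a minimal sketch of count-based sub-batching. It is not the code in this repository: `split_by_object_count` is a hypothetical helper, and the fluxes are made-up numbers chosen only to mimic "many faint objects plus one very bright one".

```python
def split_by_object_count(fluxes, n_subbatches):
    """Hypothetical helper: contiguous sub-batches with (nearly) equal object counts."""
    n = len(fluxes)
    # Boundaries that divide n objects into n_subbatches contiguous runs.
    bounds = [i * n // n_subbatches for i in range(n_subbatches + 1)]
    return [fluxes[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]


# Illustrative values only: 99 faint objects and one very bright object.
fluxes = [1e3] * 99 + [5e7]
count_batches = split_by_object_count(fluxes, 20)
count_totals = [sum(batch) for batch in count_batches]

# Every sub-batch holds five objects, but the one containing the bright
# object carries about four orders of magnitude more flux than the rest.
print(min(count_totals), max(count_totals))
```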

This PR smooths those memory spikes out by creating sub-batches with roughly equal fluxes. This is an instance of the bin packing problem with fragmentation: our goal is to split the total flux in the pool evenly across the sub-batches, and to achieve this we have to be able to fragment those extremely bright objects across however many sub-batches are needed.

At the same time, each object fragmentation means another object lookup, and we had determined previously that with photon pooling these can mount up and become quite costly. For this reason, I'm allowing the flux in each sub-batch to vary slightly: an object may fill a sub-batch beyond the limit, up to 105% of the expected per-sub-batch flux, if that prevents a fragmentation.
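The idea described above can be sketched as a greedy fill. This is an illustration under stated assumptions, not the implementation in this PR: `split_by_flux` and its `tolerance` parameter are hypothetical names, objects are taken in their given order, the target is the total flux divided by the number of sub-batches, and zero-flux objects are simply skipped.

```python
def split_by_flux(fluxes, n_subbatches, tolerance=1.05):
    """Hypothetical sketch: greedy flux-balanced sub-batching with fragmentation.

    Returns a list of sub-batches, each a list of (object_index, flux_fragment).
    An object is placed whole if that keeps the sub-batch within
    tolerance * target; otherwise it is fragmented at the target.
    """
    target = sum(fluxes) / n_subbatches
    limit = tolerance * target
    batches = [[]]
    filled = 0.0
    for idx, flux in enumerate(fluxes):
        remaining = float(flux)
        while remaining > 0.0:
            on_last = len(batches) == n_subbatches
            if not on_last and filled >= target:
                # Current sub-batch has reached its share: open the next one.
                batches.append([])
                filled = 0.0
                continue
            if on_last or filled + remaining <= limit:
                # Fits (possibly slightly over target): no fragmentation needed.
                batches[-1].append((idx, remaining))
                filled += remaining
                remaining = 0.0
            else:
                # Too bright to fit: top this sub-batch up to the target and
                # carry the rest of the object into the next sub-batch.
                piece = target - filled
                batches[-1].append((idx, piece))
                filled = target
                remaining -= piece
    return batches


# With the 5% allowance, the second object is kept whole (52 <= 52.5).
with_slack = split_by_flux([40, 12, 48], 2)
# With no allowance, the same object is split across both sub-batches.
no_slack = split_by_flux([40, 12, 48], 2, tolerance=1.0)
# One very bright object is fragmented across all four sub-batches.
bright = split_by_flux([10, 10, 980], 4)
bright_totals = [sum(f for _, f in batch) for batch in bright]
```

In this sketch every sub-batch except the last is closed only once it holds at least the target flux, so the last one receives whatever remains and never exceeds the target. A consequence of the allowance is that fewer than `n_subbatches` sub-batches may end up being used.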

This is ready to go, but it needs #511 to be merged first, after which this branch should be rebased onto it, so for now I'm leaving this as a draft.

@welucas2 welucas2 force-pushed the u/welucas2/equal-flux-subbatches branch from 04a30a2 to 02b52ba March 26, 2026 10:46
@welucas2
Collaborator Author

This is ready for review, though CI failed earlier today during Conda setup: `CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/linux-64/repodata.json>`. Could someone with permission please re-run CI to see whether it gets past this?

@welucas2 welucas2 marked this pull request as ready for review March 26, 2026 15:15