Perf: Optimize Network.run loop via unswitching and int tracking #1752

Open
ayush4874 wants to merge 1 commit into brian-team:master from ayush4874:perf/optimize-network-run

@ayush4874
Contributor

I noticed the main simulation loop in Network.run had some overhead from repeated checks that don't change during a run (like single_clock and device).

I refactored it to split the "fast path" (single clock) from the multi-clock logic.
Also, indexing timestep[0] returns a NumPy scalar, which is surprisingly slow to read in a hot loop. I replaced it with a local integer counter that is used only for the loop condition.
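
To make the counter change concrete, here is a minimal sketch rather than the code from the PR. It assumes NumPy is installed; the function names and the one-element `timestep` array are hypothetical stand-ins for the clock state that Network.run reads.

```python
import timeit

import numpy as np


def run_with_array_counter(n_steps):
    # The loop condition indexes a one-element array on every iteration,
    # which creates a NumPy scalar each time.
    timestep = np.zeros(1, dtype=np.int64)
    while timestep[0] < n_steps:
        timestep[0] += 1
    return int(timestep[0])


def run_with_local_counter(n_steps):
    # Same loop, but the condition uses a plain Python int that is kept
    # in step with the array.
    timestep = np.zeros(1, dtype=np.int64)
    step = 0
    while step < n_steps:
        timestep[0] += 1  # the array is still updated for other readers
        step += 1
    return int(timestep[0])


def compare(n_steps=100_000):
    # Returns (array_counter_seconds, local_counter_seconds) for one run each.
    t_array = timeit.timeit(lambda: run_with_array_counter(n_steps), number=1)
    t_local = timeit.timeit(lambda: run_with_local_counter(n_steps), number=1)
    return t_array, t_local
```

Calling `compare()` gives a rough idea of the difference on a given machine. The local counter is only safe as long as nothing else changes the array during the loop, which is an assumption of this sketch.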

Benchmarks (1M steps, pure Python):

  • Before: ~7.99s
  • After: ~7.59s
  • Speedup: ~5%

Verified that stop() and NetworkOperation still trigger correctly with the new logic.

@ayush4874
Contributor Author

@mstimberg can you also have a look here?

@mstimberg
Member

Hi @ayush4874, thanks for the PR, and apologies that I did not reply earlier. I appreciate the reasoning behind the PR, but I am hesitant to accept it in its current form. I agree that checking single_clock every time step is wasteful, but I am not sure that fixing this outweighs the increased code complexity and redundancy. For example, if we wanted to change the way we profile code objects, we would now have to make the same change in two places. I don't know what kind of network you used for measuring the 5% gain, but given that complex networks spend a lot of time in the actual computation, this will probably not be noticeable in practice. Having said that, the repeated if single_clock checks were ugly before, so I'd be happy to get rid of them. What would you think of an approach more like the following? It is shown here for the advance function, but it would work similarly for the other places that previously had the check.

if single_clock:
    advance = clock.advance
else:
    def advance():
        for c in clocks:
            c.advance()
...
while ...:
    ...
    advance()

Again, not quite sure about the best approach here, but generally speaking I'd prefer a clean/readable solution over an optimized one (if it is only about a few percentage points in edge cases).
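
For readers who want to run the dispatch idea in isolation, the following is a self-contained sketch under stated assumptions. `SimpleClock` and `run_steps` are hypothetical names, not Brian classes or functions, and the multi-clock branch simply advances every clock, as in the snippet above.

```python
class SimpleClock:
    """Hypothetical stand-in for a simulation clock (not Brian's Clock)."""

    def __init__(self):
        self.timestep = 0

    def advance(self):
        self.timestep += 1


def run_steps(clocks, n_steps):
    # Decide once, before the loop, which callable advances the clocks.
    if len(clocks) == 1:
        advance = clocks[0].advance  # bound method, no per-step branching
    else:
        def advance():
            for c in clocks:
                c.advance()

    # The loop body is identical for both cases.
    for _ in range(n_steps):
        advance()


single = [SimpleClock()]
run_steps(single, 5)

several = [SimpleClock(), SimpleClock(), SimpleClock()]
run_steps(several, 3)
```

The branch on the number of clocks is evaluated once, so the loop itself exists only in one place and any later change to it (profiling, for instance) needs to be made only once.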

@ayush4874 force-pushed the perf/optimize-network-run branch 4 times, most recently from 1b26c1d to a7c367d (February 11, 2026 16:39)
@ayush4874 force-pushed the perf/optimize-network-run branch from a7c367d to 0aacbf2 (February 11, 2026 16:55)
@ayush4874
Contributor Author

Hi @mstimberg, I've updated the PR. I switched to the function dispatch approach you suggested (for both advance and update), so the loop is much cleaner now.

To keep performance up, I pre-bound obj.run in the setup phase to avoid repeated attribute lookups inside the loop. It’s consistently benchmarking faster than the original code (~4% speedup on my end).
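
As an illustration of the pre-binding described above, here is a minimal sketch and not the PR's actual code. `CountingObject` and `simulate` are hypothetical names; the only assumption is that each object exposes a run() method.

```python
class CountingObject:
    """Hypothetical object with a run() method that counts its calls."""

    def __init__(self):
        self.calls = 0

    def run(self):
        self.calls += 1


def simulate(objects, n_steps):
    # Look up each bound method once during setup instead of doing an
    # attribute lookup on every object in every time step.
    run_methods = [obj.run for obj in objects]

    for _ in range(n_steps):
        for run in run_methods:
            run()


objs = [CountingObject(), CountingObject()]
simulate(objs, 10)
```

One trade-off to keep in mind: if an object's run attribute is replaced while the loop is executing, the pre-bound list keeps calling the old method.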

@mstimberg
Member

Thanks again @ayush4874 – same comment as for #1766 :-) Looks good at first sight, but I will do a more thorough review at the beginning of next week.
