Extending benchmarking to allow pre-schedules #65
Open
Conversation
Pull request overview
Extends the Lighthouse workload/benchmarking interface to support applying multiple transform schedules, enabling “pre-schedules” to run before the benchmark wrapper is emitted (to accommodate signature changes from sharding/partitioning).
Changes:
- Rename the workload API from `schedule_module()` to `schedule_modules()` and apply schedules sequentially during lowering.
- Update the benchmarking flow to optionally apply the first schedule before emitting the benchmark wrapper, then apply the remaining schedules.
- Update examples (XeGPU and mlp-mpi) to return schedule lists; adjust mlp-mpi payload/signature and pipeline to support benchmarking.
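The renamed interface can be sketched in plain Python. `Workload`, `lower_payload`, and `apply_schedule` here are simplified stand-ins for the Lighthouse classes, not the actual implementation; the point is only the list contract and the sequential application:

```python
from typing import Optional


class Workload:
    """Simplified stand-in for the Lighthouse workload interface."""

    def schedule_modules(
        self, stop_at_stage: Optional[str] = None, parameters: Optional[dict] = None
    ) -> list:
        # Subclasses return one or more transform schedules; workloads with a
        # single schedule wrap it in a one-element list.
        raise NotImplementedError

    def lower_payload(self, payload, apply_schedule):
        # Schedules must come back as a list and are applied in order.
        schedules = self.schedule_modules()
        assert isinstance(schedules, list)
        for schedule in schedules:
            payload = apply_schedule(payload, schedule)
        return payload
```

A workload that returns a bare module instead of a list trips the `isinstance` assertion, which is exactly the failure mode the review comment below flags for `examples/workload/example.py`.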
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| lighthouse/workload/workload.py | Updates the Workload interface to return multiple schedules and applies them in order during lowering. |
| lighthouse/workload/runner.py | Updates benchmark() to support pre-schedules before emitting the benchmark wrapper; adds an execution print. |
| examples/xegpu/mlp.py | Migrates to schedule_modules() returning a single schedule in a list. |
| examples/xegpu/matmul.py | Migrates to schedule_modules() returning a single schedule in a list. |
| examples/workload/example.py | Migrates to schedule_modules() and returns a list, but still has an early return path that returns a single module. |
| examples/mlp-mpi/mlp_weight_stationary.py | Adjusts payload signature to take an explicit destination argument and uses bufferization materialization into destination. |
| examples/mlp-mpi/mlp-mpi.py | Switches example driver to use benchmark() and splits schedule into pre/main schedules for signature-sensitive benchmarking. |
Comments suppressed due to low confidence (1)
examples/workload/example.py:151
`schedule_modules` is now expected to return `list[ir.Module]`, but this implementation still annotates `-> ir.Module` and (more importantly) returns a bare `schedule_module` when `stop_at_stage == "bufferized"` (line 151). This will violate the new interface and will fail at runtime due to the `assert isinstance(schedule_modules, list)` in `Workload.lower_payload`. Update the return annotation and make the early return return a list as well (or restructure to avoid returning from inside the insertion-point block).
```python
def schedule_modules(
    self, stop_at_stage: Optional[str] = None, parameters: Optional[dict] = None
) -> ir.Module:
    schedule_module = ir.Module.create()
    schedule_module.operation.attributes["transform.with_named_sequence"] = (
        ir.UnitAttr.get()
    )
    with ir.InsertionPoint(schedule_module.body):
        named_sequence = transform.named_sequence(
            "__transform_main",
            [transform.AnyOpType.get()],
            [],
            arg_attrs=[{"transform.readonly": ir.UnitAttr.get()}],
        )
        with ir.InsertionPoint(named_sequence.body):
            anytype = transform.AnyOpType.get()
            func = match(named_sequence.bodyTarget, ops={"func.func"})
            mod = transform.get_parent_op(
                anytype,
                func,
                op_name="builtin.module",
                deduplicate=True,
            )
            mod = apply_registered_pass(mod, "one-shot-bufferize")
            mod = apply_registered_pass(mod, "convert-linalg-to-loops")
            transform.apply_cse(mod)
            canonicalize(mod)
            if stop_at_stage == "bufferized":
                transform.YieldOp()
                return schedule_module
```
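The shape of the fix the review asks for can be illustrated without the MLIR bindings. The dictionary below is a stand-in for `ir.Module`, and the pass names are copied from the quoted code; the essential changes are the `list` return annotation and wrapping the early return in a list:

```python
from typing import Optional


def schedule_modules(stop_at_stage: Optional[str] = None) -> list:
    # Stand-in for ir.Module.create(); a real schedule holds transform ops.
    schedule_module = {"passes": []}
    schedule_module["passes"] += ["one-shot-bufferize", "convert-linalg-to-loops"]
    if stop_at_stage == "bufferized":
        # The early-exit path now also honors the list-returning contract.
        return [schedule_module]
    # Hypothetical later stages that the full example would append here.
    schedule_module["passes"].append("later-stage-passes")
    return [schedule_module]
```

Both exits now return `list`, so the `assert isinstance(schedule_modules, list)` in `Workload.lower_payload` passes regardless of `stop_at_stage`.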
Sharding modifies function signatures to make them operate on partitions, not the whole tensor.
The benchmark utility extracted the payload's signature from the unpartitioned IR and created the wrapper function based on that.
This of course breaks when the wrapper tries to call the partitioned function with global shapes.
This PR allows workloads to return more than one schedule. If so, the benchmark will apply the first, add the benchmark wrapper and finally apply all the remaining schedules.
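That flow can be sketched in plain Python; `apply_schedule` and `emit_benchmark_wrapper` are hypothetical stand-ins for the runner's internals, not the actual `benchmark()` signature:

```python
def benchmark(payload, schedules, apply_schedule, emit_benchmark_wrapper):
    """Apply a pre-schedule (if any) before the wrapper is emitted."""
    assert isinstance(schedules, list) and schedules
    if len(schedules) > 1:
        # Pre-schedule: e.g. sharding, which changes the payload signature.
        payload = apply_schedule(payload, schedules[0])
        schedules = schedules[1:]
    # The wrapper now sees the (possibly partitioned) signature.
    payload = emit_benchmark_wrapper(payload)
    for schedule in schedules:
        payload = apply_schedule(payload, schedule)
    return payload
```

With a single schedule this degenerates to the old behavior: wrapper first, then the one schedule.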
The mlp-mpi example can now be properly benchmarked. To make this work, its payload function was adjusted to accept a destination argument instead of returning a tensor, and the workload now provides two schedules. All other examples simply return a list containing a single schedule.
Requires a fix in MLIR to make mlp-mpi.py pass.