
Extending benchmarking to allow pre-schedules#65

Open
fschlimb wants to merge 3 commits into llvm:main from fschlimb:mult_sched

Conversation


@fschlimb fschlimb commented Mar 6, 2026

Sharding modifies function signatures to make them operate on partitions, not the whole tensor.

The benchmark utility extracted the payload's signature from the unpartitioned IR and created the wrapper function based on that.
This of course breaks when the wrapper tries to call the partitioned function with global shapes.

This PR allows workloads to return more than one schedule. If they do, the benchmark will apply the first schedule, add the benchmark wrapper, and finally apply all the remaining schedules.

The mlp-mpi example can now be properly benchmarked. To enable this, its payload function was adjusted to accept a destination argument instead of returning a tensor, and it now provides two schedules. All other examples simply return a list with a single schedule.

Requires a fix in MLIR to make mlp-mpi.py pass.
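The flow described above can be sketched in plain Python. Note this is an illustrative stand-in, not the actual Lighthouse API: `apply_schedule`, `emit_benchmark_wrapper`, and the list-of-steps "IR" are all hypothetical.

```python
# Illustrative sketch of the multi-schedule benchmark flow: apply the first
# schedule, emit the benchmark wrapper, then apply the remaining schedules.
# The "IR" is modeled as a list of applied steps; all names are hypothetical.

def apply_schedule(ir, schedule):
    # Stand-in for transform-dialect interpretation of one schedule module.
    return ir + [schedule]

def emit_benchmark_wrapper(ir):
    # The wrapper is created *after* the first schedule ran, so it sees
    # the partitioned (post-sharding) function signature.
    return ir + ["benchmark_wrapper"]

def benchmark(payload_ir, schedules):
    # Workloads now return a list of schedules; the first one is the
    # "pre-schedule" that may change function signatures.
    assert isinstance(schedules, list) and schedules
    first, *rest = schedules
    ir = apply_schedule(list(payload_ir), first)
    ir = emit_benchmark_wrapper(ir)
    for schedule in rest:
        ir = apply_schedule(ir, schedule)
    return ir

print(benchmark([], ["shard", "lower-to-llvm"]))
# → ['shard', 'benchmark_wrapper', 'lower-to-llvm']
```

A workload with a single schedule degenerates to the old behavior: the schedule runs, then the wrapper is added, and no further schedules are applied.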

Copilot AI left a comment


Pull request overview

Extends the Lighthouse workload/benchmarking interface to support applying multiple transform schedules, enabling “pre-schedules” to run before the benchmark wrapper is emitted (to accommodate signature changes from sharding/partitioning).

Changes:

  • Rename workload API from schedule_module() to schedule_modules() and apply schedules sequentially during lowering.
  • Update benchmarking flow to optionally apply the first schedule before emitting the benchmark wrapper, then apply remaining schedules.
  • Update examples (XeGPU and mlp-mpi) to return schedule lists; adjust mlp-mpi payload/signature and pipeline to support benchmarking.
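On the workload side, the rename mostly amounts to wrapping what `schedule_module()` used to return in a list. A minimal sketch, with strings standing in for `ir.Module` and both class names hypothetical:

```python
# Hypothetical workloads under the renamed interface: schedule_modules()
# returns a list of schedules instead of a single schedule.

class SingleScheduleWorkload:
    def _build_schedule(self):
        return "transform-schedule"  # placeholder for an ir.Module

    def schedule_modules(self):
        # previously: def schedule_module(self): return self._build_schedule()
        return [self._build_schedule()]

class ShardedWorkload:
    def schedule_modules(self):
        # The partitioning pre-schedule comes first; the benchmark wrapper
        # is emitted after it, then the remaining schedules are applied.
        return ["shard-schedule", "lowering-schedule"]
```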

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file

  • lighthouse/workload/workload.py: Updates the Workload interface to return multiple schedules and applies them in order during lowering.
  • lighthouse/workload/runner.py: Updates benchmark() to support pre-schedules before emitting the benchmark wrapper; adds an execution print.
  • examples/xegpu/mlp.py: Migrates to schedule_modules() returning a single schedule in a list.
  • examples/xegpu/matmul.py: Migrates to schedule_modules() returning a single schedule in a list.
  • examples/workload/example.py: Migrates to schedule_modules() and returns a list, but still has an early return path that returns a single module.
  • examples/mlp-mpi/mlp_weight_stationary.py: Adjusts the payload signature to take an explicit destination argument and uses bufferization materialization into the destination.
  • examples/mlp-mpi/mlp-mpi.py: Switches the example driver to use benchmark() and splits the schedule into pre/main schedules for signature-sensitive benchmarking.
Comments suppressed due to low confidence (1)

examples/workload/example.py:151

  • schedule_modules is now expected to return list[ir.Module], but this implementation still annotates -> ir.Module and (more importantly) returns a bare schedule_module when stop_at_stage == "bufferized" (line 151). This will violate the new interface and will fail at runtime due to the assert isinstance(schedule_modules, list) in Workload.lower_payload. Update the return annotation and make the early return return a list as well (or restructure to avoid returning from inside the insertion-point block).
    def schedule_modules(
        self, stop_at_stage: Optional[str] = None, parameters: Optional[dict] = None
    ) -> ir.Module:
        schedule_module = ir.Module.create()
        schedule_module.operation.attributes["transform.with_named_sequence"] = (
            ir.UnitAttr.get()
        )
        with ir.InsertionPoint(schedule_module.body):
            named_sequence = transform.named_sequence(
                "__transform_main",
                [transform.AnyOpType.get()],
                [],
                arg_attrs=[{"transform.readonly": ir.UnitAttr.get()}],
            )
            with ir.InsertionPoint(named_sequence.body):
                anytype = transform.AnyOpType.get()
                func = match(named_sequence.bodyTarget, ops={"func.func"})
                mod = transform.get_parent_op(
                    anytype,
                    func,
                    op_name="builtin.module",
                    deduplicate=True,
                )
                mod = apply_registered_pass(mod, "one-shot-bufferize")
                mod = apply_registered_pass(mod, "convert-linalg-to-loops")
                transform.apply_cse(mod)
                canonicalize(mod)

                if stop_at_stage == "bufferized":
                    transform.YieldOp()
                    return schedule_module
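The suggested fix boils down to annotating the return type as a list and wrapping the early return. A self-contained sketch (plain Python, so it runs without MLIR; a string stands in for the transform ir.Module, and build_bufferization_schedule is a hypothetical helper for the construction shown above):

```python
from typing import Optional

def build_bufferization_schedule() -> str:
    # Placeholder for the transform-dialect schedule construction.
    return "bufferize-schedule"

def schedule_modules(stop_at_stage: Optional[str] = None) -> list:
    schedule_module = build_bufferization_schedule()
    if stop_at_stage == "bufferized":
        # Early return now wrapped in a list, satisfying the
        # assert isinstance(schedule_modules, list) check in lower_payload.
        return [schedule_module]
    # ... later lowering stages would extend the schedule here ...
    return [schedule_module]
```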
