26 changes: 24 additions & 2 deletions CHANGELOG.md
@@ -4,12 +4,34 @@ All notable changes to microbench are documented here.

## [Unreleased]

### New features

- **`MBResourceUsage` mixin**: captures POSIX `getrusage()` data — user and
system CPU time, peak RSS (in bytes, normalised across platforms), minor and
major page faults, block I/O operations, and voluntary/involuntary context
switches. Results are stored as a **list** in `resource_usage`, one entry per
timed iteration, aligned index-for-index with `call.durations` (and, in CLI
mode, `call.returncode`). Added as a **default CLI mixin** so every CLI run
captures it automatically.

  - *CLI mode*: on POSIX, uses `os.wait4()` to capture the exact rusage of each
    individual child process as reported by the kernel, including a reliable
    `maxrss` per iteration regardless of the `--iterations` or `--warmup` count.
  - *Python API mode*: uses `RUSAGE_SELF` with a before/after delta around each
    individual call, giving one entry per timed iteration aligned with
    `call.durations`. Warmup calls are excluded. `maxrss` is omitted because it
    is the lifetime high-water mark of the whole process, not a per-call value.
    Use `MBPeakMemory` for per-call peak memory.

  On platforms where the stdlib `resource` module is unavailable, the mixin
  records an empty list without raising an error.

### Enhancements

- **`--mixin defaults` keyword** (CLI): `defaults` can be used as a mixin
name to expand to the standard default set (`python-info`, `host-info`,
`slurm-info`, `loaded-modules`, `working-dir`, `resource-usage`). This
makes it easy to add one or more extra mixins without listing all six
defaults explicitly:
`microbench --mixin defaults file-hash -- ./job.sh`.

- **`file-hash` mixin — automatic argument file scanning** (CLI): the
22 changes: 17 additions & 5 deletions docs/cli.md
@@ -117,11 +117,13 @@ Every record contains the standard `mb.*` and `call.*` fields plus:
## Default mixins

When no `--mixin` is specified, `python-info`, `host-info`, `slurm-info`,
`loaded-modules`, `working-dir`, and `resource-usage` are included
automatically, capturing the Python interpreter version, prefix, and
executable path; hostname and operating system; all `SLURM_*` environment
variables; the loaded Lmod/Environment Modules software stack; the current
working directory; and POSIX resource usage (CPU time, peak RSS, page faults,
block I/O, and context switches) for the benchmarked subprocess.
All six degrade gracefully or produce stable values outside their respective
environments.

Mixin names use a short kebab-case form without the `MB` prefix
@@ -236,6 +238,15 @@ microbench \
-- ./run_simulation.sh
```

### `resource-usage` options

`resource-usage` has no CLI flags. It is included in the defaults and records
POSIX `getrusage()` data automatically for each timed iteration of the
benchmarked subprocess, one dict per iteration.

On platforms where the Python `resource` module is unavailable,
`resource-usage` records an empty `resource_usage` list and does not raise an
error.
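For readers curious about the mechanism, here is a minimal, standalone sketch of per-child capture with `os.wait4()`. It is not microbench's implementation: `run_with_rusage` is a hypothetical helper, and the sketch assumes a POSIX system running Python 3.9+ (for `os.waitstatus_to_exitcode`).

```python
import os
import subprocess
import sys


def run_with_rusage(argv):
    """Run argv once and return (exit_code, rusage) for that child alone.

    Hypothetical helper for illustration; POSIX only, Python 3.9+.
    """
    proc = subprocess.Popen(argv)
    # Reap the child directly: os.wait4() returns the kernel's rusage
    # for exactly this process, not an aggregate over all children.
    _, status, rusage = os.wait4(proc.pid, 0)
    exit_code = os.waitstatus_to_exitcode(status)
    # The child is already reaped, so record its exit code on the Popen
    # object; otherwise Popen would later try to wait for it again.
    proc.returncode = exit_code
    return exit_code, rusage


if __name__ == "__main__":
    code, ru = run_with_rusage([sys.executable, "-c", "sum(range(10**6))"])
    print(code, ru.ru_utime, ru.ru_stime, ru.ru_maxrss)
```

The raw `ru_maxrss` printed here is in platform units (kilobytes on Linux, bytes on macOS), which is why the recorded value is normalised to bytes.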

## Capture failures

Metadata capture failures (e.g. `nvidia-smi` not installed on this node,
@@ -289,6 +300,7 @@ With 10 iterations and 2 warmup runs, the record contains:
- `call.durations` — list of 10 wall-clock durations in seconds
- `call.returncode` — list of 10 exit codes (one per timed iteration)
- `call.stdout` / `call.stderr` — list of 10 captured strings, if `--stdout`/`--stderr` is used
- `resource_usage` — list of 10 per-iteration rusage dicts (when `resource-usage` mixin is active)

Warmup runs are excluded from all of these lists. The process exits with
the first non-zero return code, if present, so any failing iteration
173 changes: 170 additions & 3 deletions docs/user-guide/mixins.md
@@ -19,9 +19,9 @@ to list all available mixins with descriptions:
microbench --show-mixins
```

By default, `python-info`, `host-info`, `slurm-info`, `loaded-modules`,
`working-dir`, and `resource-usage` are included automatically. Specifying
`--mixin` replaces the defaults entirely. Use `--no-mixin` to disable all mixins:

```bash
# Only peak-memory — no host info or SLURM
@@ -70,6 +70,7 @@ combine any number of microbench mixins without conflicts, and their
| `MBLoadedModules` | `loaded-modules` *(default)* | `loaded_modules` dict mapping module name to version (empty dict if no Lmod/Environment Modules are loaded) | — |
| `MBWorkingDir` | `working-dir` *(default)* | `call.working_dir` — absolute path of the working directory at benchmark time | — |
| `MBCgroupLimits` | `cgroup-limits` | `cgroups` dict with `cpu_cores_limit`, `memory_bytes_limit`, `version` (empty dict if not on Linux or cgroup fs unavailable) | Linux only |
| `MBResourceUsage` | `resource-usage` *(default)* | `resource_usage` list of dicts with CPU times, peak RSS, page faults, I/O ops, and context switches (`[]` when the stdlib `resource` module is unavailable) | POSIX only (stdlib) |
| `MBGitInfo` | `git-info` | `git` dict with `repo`, `commit`, `branch`, `dirty` | `git` ≥ 2.11 on PATH |
| `MBGlobalPackages` | Python only | `python.loaded_packages` for every package in the caller's global scope | — |
| `MBInstalledPackages` | `installed-packages` | `python.installed_packages` (and optionally `python.installed_package_paths`) for every installed package | — |
@@ -338,6 +339,172 @@ platforms or when the cgroup filesystem is unavailable.
scheduler metadata (job ID, node list, etc.) while `MBCgroupLimits` captures
the kernel-enforced resource limits.

### `MBResourceUsage`

Captures POSIX [`getrusage(2)`](https://man7.org/linux/man-pages/man2/getrusage.2.html)
data (CPU time, peak RSS in CLI mode, page faults, block I/O operations, and
context switches) using only the Python standard library (the `resource`
module, plus `os.wait4()` in CLI mode). No extra dependencies are required.

**Modes**

- **CLI mode**: on POSIX, uses `os.wait4()` to get the exact rusage of each
child process as reported by the kernel — one dict per timed iteration,
aligned index-for-index with `call.durations`.
`maxrss` is the child's own peak RSS.
- **Python API mode**: uses `RUSAGE_SELF` — one dict per timed iteration,
each a before/after delta around that single call (aligned index-for-index
with `call.durations`). Warmup calls are excluded.
`maxrss` is **omitted** — `RUSAGE_SELF.maxrss` is a lifetime process
high-water mark that reflects the peak since the interpreter started,
not just since the decorated function was called, making it unreliable
for function-level measurement.

On platforms where the stdlib `resource` module is unavailable, `resource_usage`
is recorded as an empty list.

```python
from microbench import MicroBench, MBResourceUsage

class Bench(MicroBench, MBResourceUsage):
    pass

bench = Bench()

@bench
def work():
    return list(range(1_000_000))

work()
```

Python API record (one entry per timed iteration, no `maxrss`):

```json
{
  "resource_usage": [
    {
      "utime": 0.052,
      "stime": 0.003,
      "minflt": 1024,
      "majflt": 0,
      "inblock": 0,
      "oublock": 0,
      "nvcsw": 2,
      "nivcsw": 1
    }
  ]
}
```

CLI record with `--iterations 2` (one entry per iteration, includes `maxrss`):

```json
{
  "resource_usage": [
    {
      "utime": 0.068,
      "stime": 0.029,
      "maxrss": 11386880,
      "minflt": 621,
      "majflt": 0,
      "inblock": 0,
      "oublock": 0,
      "nvcsw": 1,
      "nivcsw": 2
    },
    {
      "utime": 0.071,
      "stime": 0.031,
      "maxrss": 11386880,
      "minflt": 618,
      "majflt": 0,
      "inblock": 0,
      "oublock": 0,
      "nvcsw": 1,
      "nivcsw": 3
    }
  ]
}
```
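Because each entry lines up with the matching element of `call.durations`, derived per-iteration metrics are easy to compute. As one illustration, this sketch uses a hypothetical `cpu_utilisation` helper that takes both lists as plain Python values and divides CPU time by wall-clock time for each timed iteration. A ratio well below 1 suggests time spent waiting (for example on I/O), while values above 1 can appear for multithreaded workloads.

```python
import math


def cpu_utilisation(durations, resource_usage):
    """Return (utime + stime) / wall-clock time for each timed iteration.

    durations: list of wall-clock seconds (call.durations).
    resource_usage: list of rusage dicts aligned index-for-index with it.
    An empty resource_usage (no `resource` module) yields an empty list.
    """
    if not resource_usage:
        return []
    if len(durations) != len(resource_usage):
        raise ValueError("durations and resource_usage are not aligned")
    return [
        # Guard against a zero-length duration instead of dividing by zero.
        (ru["utime"] + ru["stime"]) / wall if wall > 0 else math.nan
        for wall, ru in zip(durations, resource_usage)
    ]
```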

| Field | Modes | Description |
|---|---|---|
| `utime` | Both | User CPU time in seconds (float) |
| `stime` | Both | System CPU time in seconds (float) |
| `maxrss` | CLI only | Peak RSS in bytes (int) — see platform notes |
| `minflt` | Both | Minor page faults — pages reclaimed without I/O (int) |
| `majflt` | Both | Major page faults — pages requiring disk I/O (int) |
| `inblock` | Both | Block input operations (int) — see platform notes |
| `oublock` | Both | Block output operations (int) — see platform notes |
| `nvcsw` | Both | Voluntary context switches (int) |
| `nivcsw` | Both | Involuntary context switches (int) |

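In Python API mode every field is a before/after **delta** around a single
call, so it reflects only the benchmarked work. In CLI mode each entry is the
kernel's rusage for that iteration's child process alone, so no delta is
needed. `utime`, `stime`, `minflt`, `nvcsw`, and `nivcsw` are the most
reliable fields across platforms.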
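The per-call deltas of Python API mode can be reproduced with the standard library alone. The sketch below is illustrative rather than microbench's own code: `measure`, `FIELDS`, and `_snapshot` are hypothetical names, and real numbers assume a POSIX system. It snapshots `RUSAGE_SELF` before and after each timed call, runs but does not record warmup calls, leaves `maxrss` out, and returns an empty list when `resource` cannot be imported.

```python
try:
    import resource  # stdlib, POSIX only
except ImportError:  # e.g. Windows: fall back to recording nothing
    resource = None

# Counters captured per call. maxrss is deliberately excluded: for
# RUSAGE_SELF it is a lifetime high-water mark, not a per-call value.
FIELDS = ("utime", "stime", "minflt", "majflt",
          "inblock", "oublock", "nvcsw", "nivcsw")


def _snapshot():
    """Read the current process's cumulative counters into a dict."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return {f: getattr(ru, "ru_" + f) for f in FIELDS}


def measure(func, iterations=3, warmup=1):
    """Call func warmup + iterations times; return one delta per timed call."""
    for _ in range(warmup):
        func()  # warmup calls run but are not recorded
    usage = []
    for _ in range(iterations):
        if resource is None:
            func()  # still run the work, but there is nothing to record
            continue
        before = _snapshot()
        func()
        after = _snapshot()
        usage.append({f: after[f] - before[f] for f in FIELDS})
    return usage
```

On a POSIX system, `measure(lambda: list(range(1_000_000)), iterations=5)` returns five dicts with the same keys as the Python API record.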

#### Platform notes and known quirks

**`maxrss` — CLI mode with `os.wait4()` (all POSIX)**

`os.wait4()` returns the exact rusage of each individual child process as
reported by the kernel. `maxrss` is the child's own peak RSS, accurate
regardless of iteration count or warmup. Values are normalised to bytes
(Linux reports kilobytes; macOS already reports bytes).
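A minimal sketch of that normalisation, using a hypothetical `maxrss_to_bytes` helper. It treats every platform other than macOS as reporting kilobytes; that holds for Linux but is an assumption elsewhere.

```python
import sys


def maxrss_to_bytes(raw_maxrss, platform=sys.platform):
    """Normalise a raw ru_maxrss value to bytes.

    macOS ("darwin") already reports bytes; Linux reports kilobytes.
    Treating every other platform like Linux is an assumption of this sketch.
    """
    if platform == "darwin":
        return raw_maxrss
    return raw_maxrss * 1024
```

For example, a Linux `ru_maxrss` of 11120 kilobytes becomes 11120 * 1024 = 11386880 bytes.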

**`maxrss` — Python API mode (`RUSAGE_SELF`)**

`RUSAGE_SELF.maxrss` is a lifetime high-water mark for the Python interpreter
process. It is intentionally omitted. Use
[`MBPeakMemory`](#mbpeakmemory) if you need per-call peak memory tracking.

**`inblock` / `oublock` — macOS**

These counters are **almost always zero on macOS**, even for substantial file
I/O. The macOS unified buffer cache charges block I/O to the *first* process
that touches each page; subsequent reads and writes to cached pages are not
counted against the process that performed them. In practice, nearly all file
I/O is served from the cache and the counters never increment.

This is a macOS kernel accounting limitation. It is documented in the
[`getrusage(2)` man page](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/getrusage.2.html):
*"The numbers ru_inblock and ru_oublock account only for real I/O; data
supplied by the caching mechanism is charged only to the first process to
read or write the data."*

**`inblock` / `oublock` — Linux**

On Linux these counters increment only for I/O that actually reaches the
block device, such as cold-cache reads (the first access to a file since its
pages were last evicted from the page cache) or `O_DIRECT` I/O. Reads served
from a warm page cache show zero. To measure true cold-cache I/O, drop the
page cache before benchmarking (as root: `sync; echo 3 > /proc/sys/vm/drop_caches`).

**`majflt` — macOS**

Major page faults are rare on macOS because the unified buffer cache handles
most page-in activity. Zero is normal.

**`utime`, `stime`, `minflt`, `nvcsw`, `nivcsw`**

These are the most reliable fields across both Linux and macOS and are
usually non-zero for any non-trivial workload.

!!! note "Non-POSIX platforms"
    When the Python `resource` module is unavailable, `resource_usage` is
    recorded as an empty list without raising an error.

**CLI:** `resource-usage` is a default mixin — no flags needed:

```bash
# Included automatically
microbench --outfile results.jsonl -- ./run_simulation.sh

# Explicit, if defaults have been overridden
microbench --mixin resource-usage -- ./run_simulation.sh
```
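With `--outfile results.jsonl`, each line of the file is one JSON record. The sketch below pulls the `resource_usage` lists back out, assuming the top-level `resource_usage` key shown in the example records; `load_resource_usage` and `peak_maxrss_bytes` are hypothetical helpers, not part of microbench.

```python
import json


def load_resource_usage(path):
    """Yield the resource_usage list from every record in a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                record = json.loads(line)
                # A missing key (mixin disabled) is treated as an empty list.
                yield record.get("resource_usage", [])


def peak_maxrss_bytes(path):
    """Largest per-iteration maxrss across all records, or None.

    Only CLI records carry maxrss; Python API records omit it.
    """
    peaks = [
        entry["maxrss"]
        for usage in load_resource_usage(path)
        for entry in usage
        if "maxrss" in entry
    ]
    return max(peaks) if peaks else None
```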

## Code provenance

### `MBGitInfo`
2 changes: 2 additions & 0 deletions microbench/__init__.py
@@ -43,6 +43,7 @@
MBCgroupLimits,
MBHostInfo,
MBLoadedModules,
MBResourceUsage,
MBSlurmInfo,
MBWorkingDir,
)
@@ -77,6 +78,7 @@
'MBCgroupLimits',
'MBGitInfo',
'MBFileHash',
'MBResourceUsage',
'MBGlobalPackages',
'MBInstalledPackages',
'MBCondaPackages',