1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/release_checklist.yml
@@ -26,6 +26,7 @@ body:
- label: "Finalize the doc update, including release notes (\"Note: Touching docstrings/type annotations in code is OK during code freeze, apply your best judgement!\")"
- label: Update the docs for the new version
- label: Create a public release tag
- label: Wait for the tag-triggered CI run to complete, and use that run ID for release workflows
- label: If any code change happens, rebuild the wheels from the new tag
- label: Update the conda recipe & release conda packages
- label: Upload conda packages to nvidia channel
6 changes: 6 additions & 0 deletions .github/workflows/ci.yml
@@ -15,6 +15,12 @@ on:
branches:
- "pull-request/[0-9]+"
- "main"
tags:
# Build release artifacts from tag refs so setuptools-scm resolves exact
# release versions instead of .dev+local variants.
- "v*"
- "cuda-core-v*"
- "cuda-pathfinder-v*"
Comment on lines +18 to +23
Member

@leofang leofang Feb 12, 2026

Based on offline discussion, we have three candidate solutions:

  1. Add another workflow that is a clone of "CI" (the two can share most of their code) and runs on tags rather than on every push. Then update the lookup-run-id script to look for artifacts in that new workflow. Those runs should always be guaranteed to have "releasable" versions.
    • This PR currently implements Solution 1.
    • I do not like this solution because I’ve seen several cases where we needed to walk back from a tag (i.e. re-tag) for various reasons, so I’d really like to separate tagging from releasing. Solution 1 couples them together.
    • For wheels that were already tested on main, rebuilding and testing again adds unnecessary delay before the packages are pushed out.
  2. Make building the wheel an actual dependency of the release workflow. The current approach of "assume it's already been done and look for it" seems pretty brittle.
    • This is what numba-cuda uses today (trigger release workflow -> tag -> rebuild wheels -> push out without tests). But I would like to unify our treatment across repos in the future and move away from it. Because we rely heavily on GHA, we cannot guarantee that the wheels built at release time are bitwise-identical to those built (and tested) on the main branch; the infra could change asynchronously behind our back. We should not rebuild, IMHO.
  3. Add a tagging workflow that pushes an empty commit to main plus a git tag.
    • Automates tagging (instead of manually pushing a tag to upstream, which can be nerve-racking)
    • Rebuilds and tests on main
    • Decoupled from the release workflow (requires manual triggering)

Collaborator Author

Generally: the simplest solution that meets all requirements is the best one. Neither 1 (cloning/duplication) nor 2 (convoluting) sounds like the right direction.

Regarding 3, it sounds like training wheels (no pun intended). Do we need them for tagging? Isn't this PR already very close to 3?

Member

You might be right. I probably should rest and resume tomorrow...

Contributor

This solution seems fine, except it means we only get CI on tagged commits on main. The "tags" metadata here acts as a filter, so this means "on branch main, only run this when there is a tag matching the pattern". I think we want /both/: CI on every commit to main (which is useful for development, and people do like to have "development snapshots" to download) and CI again on the tagged commits. That's why I suggested in (1) that we need to /clone/ the existing CI workflow to trigger on tags, so that tagged releases also get built. And when I say "clone", it doesn't have to be literally copy-and-paste: GHA has various ways to reuse code.

Contributor

Actually, I stand corrected. I just experimented on my own fork, and it does look like pushing a tag causes the same workflow to run on the same commit. The cancellation policy we have in place gets in the way, but we can remove that. So this does seem like a good approach.

Collaborator Author

I think we can keep the current cancellation policy, based on this Cursor-generated explanation (it gave me something similar yesterday, which it then distilled into the Auto-cancellation behavior section in the PR description):

on.push.branches and on.push.tags are additive (OR), so we still run CI on every push to main, and we also run CI on matching tag pushes.
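
The OR behavior described above can be approximated in a few lines of Python. This is only a rough sketch: `push_triggers_ci` is a hypothetical helper, `fnmatch` is a stand-in for GitHub's own filter-pattern matching (which has different rules, for example around `/`), and the `pull-request/[0-9]+` branch pattern is left out because `fnmatch` does not support the `+` quantifier.

```python
from fnmatch import fnmatchcase

# Patterns taken from the ci.yml hunk; "pull-request/[0-9]+" is omitted on purpose.
BRANCH_FILTERS = ["main"]
TAG_FILTERS = ["v*", "cuda-core-v*", "cuda-pathfinder-v*"]


def push_triggers_ci(ref: str) -> bool:
    """Return True if a push to `ref` matches the branch filter OR the tag filter."""
    if ref.startswith("refs/heads/"):
        name = ref[len("refs/heads/"):]
        return any(fnmatchcase(name, pattern) for pattern in BRANCH_FILTERS)
    if ref.startswith("refs/tags/"):
        name = ref[len("refs/tags/"):]
        return any(fnmatchcase(name, pattern) for pattern in TAG_FILTERS)
    return False


for ref in (
    "refs/heads/main",
    "refs/tags/v13.0.0",
    "refs/tags/cuda-core-v0.3.0",
    "refs/heads/feature-x",
    "refs/tags/nightly",
):
    print(f"{ref}: {push_triggers_ci(ref)}")
```

Under these assumptions, a push to main and a push of a matching tag each qualify on their own; neither filter restricts the other.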

Concurrency is currently:

group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}

with cancel-in-progress: true.

Since github.ref differs (refs/heads/main vs refs/tags/<tag>), cancellations are scoped per ref:

  • new main push cancels older main run
  • tag run does not cancel main run
  • new main run does not cancel tag run
  • different tags do not cancel each other

So we keep the benefit of pruning stale branch CI without hurting tag-triggered release builds.
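
To make the per-ref scoping concrete, here is a minimal sketch that evaluates the group expression quoted above as a plain string. The `concurrency_group` function is hypothetical, and the workflow name "CI" and the tag names are example values. It only models how the expression expands; it says nothing about how GitHub actually schedules or cancels runs.

```python
def concurrency_group(workflow: str, ref: str, event_name: str) -> str:
    """Mimic `${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}`."""
    return f"{workflow}-{ref}-{event_name}"


main_push = concurrency_group("CI", "refs/heads/main", "push")
tag_push = concurrency_group("CI", "refs/tags/v13.0.0", "push")
other_tag_push = concurrency_group("CI", "refs/tags/cuda-core-v0.3.0", "push")

# Runs can only cancel each other when their group strings are equal.
print(main_push)
print(tag_push)
print("main vs tag share a group:", main_push == tag_push)
print("two tags share a group:", tag_push == other_tag_push)
```

If the expression expands this way, two pushes to main share a group while a tag push gets its own, which is the premise behind the bullet list above.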

Contributor

Experimentally, that doesn't seem to be how it works. In my testing on my own fork, the tag-triggered run canceled the branch-triggered run. But I'm fine with merging this, experimenting, and changing the cancellation config later if necessary.

Member

Canceled runs can be manually re-triggered. I am comfortable with merging this PR.

schedule:
# every 24 hours at midnight UTC
- cron: "0 0 * * *"
3 changes: 3 additions & 0 deletions .github/workflows/release-upload.yml
@@ -79,6 +79,9 @@ jobs:
# Use the shared script to download wheels
./ci/tools/download-wheels "${{ inputs.run-id }}" "${{ inputs.component }}" "${{ github.repository }}" "release/wheels"

# Validate that release wheels match the expected version from tag.
./ci/tools/validate-release-wheels "${{ inputs.git-tag }}" "${{ inputs.component }}" "release/wheels"

# Upload wheels to the release
if [[ -d "release/wheels" && $(ls -A release/wheels 2>/dev/null | wc -l) -gt 0 ]]; then
echo "Uploading wheels to release ${{ inputs.git-tag }}"
8 changes: 6 additions & 2 deletions .github/workflows/release.yml
@@ -24,7 +24,7 @@ on:
required: true
type: string
run-id:
- description: "The GHA run ID that generated validated artifacts (optional - will be auto-detected from git tag if not provided)"
+ description: "The GHA run ID that generated validated artifacts (optional - auto-detects successful tag-triggered CI run for git-tag)"
required: false
type: string
default: ""
@@ -64,7 +64,7 @@ jobs:
echo "Using provided run ID: ${{ inputs.run-id }}"
echo "run-id=${{ inputs.run-id }}" >> $GITHUB_OUTPUT
else
- echo "Auto-detecting run ID for tag: ${{ inputs.git-tag }}"
+ echo "Auto-detecting successful tag-triggered run ID for tag: ${{ inputs.git-tag }}"
RUN_ID=$(./ci/tools/lookup-run-id "${{ inputs.git-tag }}" "${{ github.repository }}")
echo "Auto-detected run ID: $RUN_ID"
echo "run-id=$RUN_ID" >> $GITHUB_OUTPUT
@@ -165,6 +165,10 @@ jobs:
run: |
./ci/tools/download-wheels "${{ needs.determine-run-id.outputs.run-id }}" "${{ inputs.component }}" "${{ github.repository }}" "dist"

- name: Validate wheel versions for release tag
run: |
./ci/tools/validate-release-wheels "${{ inputs.git-tag }}" "${{ inputs.component }}" "dist"

- name: Publish package distributions to PyPI
if: ${{ inputs.wheel-dst == 'pypi' }}
uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e # v1.13.0
33 changes: 19 additions & 14 deletions ci/tools/lookup-run-id
@@ -5,7 +5,7 @@
# SPDX-License-Identifier: Apache-2.0

# A utility script to find the GitHub Actions workflow run ID for a given git tag.
- # This script looks for the CI workflow run that corresponds to the commit of the given tag.
+ # This script requires a successful CI run that was triggered by the tag push.

set -euo pipefail

@@ -54,16 +54,16 @@ fi
echo "Resolved tag '${GIT_TAG}' to commit: ${COMMIT_SHA}" >&2

# Find workflow runs for this commit
- echo "Searching for '${WORKFLOW_NAME}' workflow runs for commit: ${COMMIT_SHA}" >&2
+ echo "Searching for '${WORKFLOW_NAME}' workflow runs for commit: ${COMMIT_SHA} (tag: ${GIT_TAG})" >&2

- # Get workflow runs for the commit, filter by workflow name and successful status
+ # Get completed workflow runs for this commit.
RUN_DATA=$(gh run list \
--repo "${REPOSITORY}" \
--commit "${COMMIT_SHA}" \
--workflow "${WORKFLOW_NAME}" \
--status completed \
- --json databaseId,workflowName,status,conclusion,headSha \
- --limit 10)
+ --json databaseId,workflowName,status,conclusion,headSha,headBranch,event,createdAt,url \
+ --limit 50)

if [[ -z "${RUN_DATA}" || "${RUN_DATA}" == "[]" ]]; then
echo "Error: No completed '${WORKFLOW_NAME}' workflow runs found for commit ${COMMIT_SHA}" >&2
@@ -72,16 +72,21 @@ if [[ -z "${RUN_DATA}" || "${RUN_DATA}" == "[]" ]]; then
exit 1
fi

- # Filter for successful runs (conclusion = success) and extract the run ID from the first one
- RUN_ID=$(echo "${RUN_DATA}" | jq -r '.[] | select(.conclusion == "success") | .databaseId' | head -1)
-
- if [[ -z "${RUN_ID}" || "${RUN_ID}" == "null" ]]; then
- echo "Error: No successful '${WORKFLOW_NAME}' workflow runs found for commit ${COMMIT_SHA}" >&2
- echo "Available workflow runs for this commit:" >&2
- gh run list --repo "$REPOSITORY" --commit "${COMMIT_SHA}" --limit 10 || true
+ # Filter for successful push runs from the tag ref.
+ RUN_ID=$(echo "${RUN_DATA}" | jq -r --arg tag "${GIT_TAG}" '
+ map(select(.conclusion == "success" and .event == "push" and .headBranch == $tag))
+ | sort_by(.createdAt)
+ | reverse
+ | .[0].databaseId // empty
+ ')
+
+ if [[ -z "${RUN_ID}" ]]; then
+ echo "Error: No successful '${WORKFLOW_NAME}' workflow runs found for tag '${GIT_TAG}'." >&2
+ echo "This release workflow now requires artifacts from a tag-triggered CI run." >&2
+ echo "If you just pushed the tag, wait for CI on that tag to finish and retry." >&2
echo "" >&2
- echo "Completed runs with their conclusions:" >&2
- echo "${RUN_DATA}" | jq -r '.[] | "\(.databaseId): \(.conclusion)"' >&2
+ echo "Completed runs for commit ${COMMIT_SHA}:" >&2
+ echo "${RUN_DATA}" | jq -r '.[] | "\(.databaseId): event=\(.event // "null"), headBranch=\(.headBranch // "null"), conclusion=\(.conclusion // "null"), status=\(.status // "null"), createdAt=\(.createdAt // "null")"' >&2
exit 1
fi
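
For readers less familiar with jq, the new selection rule can be sketched in Python. This is an illustration only, not part of the script: `pick_run_id` is a hypothetical name and the run records below are made-up sample data shaped like the `--json` fields requested above. It assumes `createdAt` values are ISO 8601 UTC strings, so sorting them as plain strings orders them by time.

```python
from __future__ import annotations


def pick_run_id(runs: list[dict], tag: str) -> int | None:
    """Return the newest successful push run whose headBranch equals the tag."""
    candidates = [
        run
        for run in runs
        if run.get("conclusion") == "success"
        and run.get("event") == "push"
        and run.get("headBranch") == tag
    ]
    if not candidates:
        return None  # corresponds to jq's `// empty`
    # Equivalent to `sort_by(.createdAt) | reverse | .[0]`.
    newest = sorted(candidates, key=lambda run: run["createdAt"])[-1]
    return newest["databaseId"]


# Hypothetical sample data: same commit, built from main and from the tag.
SAMPLE_RUNS = [
    {"databaseId": 101, "event": "push", "headBranch": "main",
     "conclusion": "success", "createdAt": "2026-02-12T10:00:00Z"},
    {"databaseId": 102, "event": "push", "headBranch": "v13.0.0",
     "conclusion": "failure", "createdAt": "2026-02-12T11:00:00Z"},
    {"databaseId": 103, "event": "push", "headBranch": "v13.0.0",
     "conclusion": "success", "createdAt": "2026-02-12T12:00:00Z"},
]

print(pick_run_id(SAMPLE_RUNS, "v13.0.0"))
print(pick_run_id(SAMPLE_RUNS, "v99.0.0"))
```

The successful run from main is skipped even though it built the same commit, which is the behavioral change this hunk introduces.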

127 changes: 127 additions & 0 deletions ci/tools/validate-release-wheels
@@ -0,0 +1,127 @@
#!/usr/bin/env python3

# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0

"""Validate downloaded release wheels against the requested release tag."""

from __future__ import annotations

import argparse
import re
import sys
from collections import defaultdict
from pathlib import Path

COMPONENT_TO_DISTRIBUTIONS: dict[str, set[str]] = {
    "cuda-core": {"cuda_core"},
    "cuda-bindings": {"cuda_bindings"},
    "cuda-pathfinder": {"cuda_pathfinder"},
    "cuda-python": {"cuda_python"},
    "all": {"cuda_core", "cuda_bindings", "cuda_pathfinder", "cuda_python"},
}

TAG_PATTERNS = (
    re.compile(r"^v(?P<version>\d+\.\d+\.\d+)"),
    re.compile(r"^cuda-core-v(?P<version>\d+\.\d+\.\d+)"),
    re.compile(r"^cuda-pathfinder-v(?P<version>\d+\.\d+\.\d+)"),
)


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description=(
            "Validate that wheel versions match the release tag. "
            "This rejects dev/local wheel versions for release uploads."
        )
    )
    parser.add_argument("git_tag", help="Release git tag (for example: v13.0.0)")
    parser.add_argument("component", choices=sorted(COMPONENT_TO_DISTRIBUTIONS.keys()))
    parser.add_argument("wheel_dir", help="Directory containing wheel files")
    return parser.parse_args()


def version_from_tag(tag: str) -> str:
    for pattern in TAG_PATTERNS:
        match = pattern.match(tag)
        if match:
            return match.group("version")
    raise ValueError(
        "Unsupported git tag format "
        f"{tag!r}; expected tags beginning with vX.Y.Z, cuda-core-vX.Y.Z, "
        "or cuda-pathfinder-vX.Y.Z."
    )


def parse_wheel_dist_and_version(path: Path) -> tuple[str, str]:
    # Wheel name format starts with: {distribution}-{version}-...
    parts = path.stem.split("-")
    if len(parts) < 5:
        raise ValueError(f"Invalid wheel filename format: {path.name}")
    return parts[0], parts[1]


def main() -> int:
    args = parse_args()
    expected_version = version_from_tag(args.git_tag)
    expected_distributions = COMPONENT_TO_DISTRIBUTIONS[args.component]
    wheel_dir = Path(args.wheel_dir)

    wheels = sorted(wheel_dir.glob("*.whl"))
    if not wheels:
        print(f"Error: No wheel files found in {wheel_dir}", file=sys.stderr)
        return 1

    seen_versions: dict[str, set[str]] = defaultdict(set)
    errors: list[str] = []

    for wheel in wheels:
        try:
            distribution, version = parse_wheel_dist_and_version(wheel)
        except ValueError as exc:
            errors.append(str(exc))
            continue

        if distribution not in expected_distributions:
            continue

        seen_versions[distribution].add(version)

        if ".dev" in version or "+" in version:
            errors.append(
                f"{wheel.name}: wheel version {version!r} contains dev/local markers "
                "(.dev or +), which is not allowed for release uploads."
            )

        if version != expected_version:
            errors.append(
                f"{wheel.name}: wheel version {version!r} does not match expected "
                f"release version {expected_version!r} from git tag {args.git_tag!r}."
            )

    missing_distributions = sorted(expected_distributions - set(seen_versions))
    if missing_distributions:
        errors.append("Missing expected component wheels in download set: " + ", ".join(missing_distributions))

    for distribution, versions in sorted(seen_versions.items()):
        if len(versions) > 1:
            errors.append(
                f"Expected one release version for {distribution}, found multiple: " + ", ".join(sorted(versions))
            )

    if errors:
        print("Wheel validation failed:", file=sys.stderr)
        for error in errors:
            print(f"  - {error}", file=sys.stderr)
        return 1

    print(
        "Validated release wheels for component "
        f"{args.component} at version {expected_version} from tag {args.git_tag}."
    )
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
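
As a quick illustration of what the validator accepts and rejects, the sketch below condenses its tag and filename checks into a standalone snippet. It is not the script itself: `TAG_RE` folds the three tag patterns into one expression, `dist_and_version` and `problems` are hypothetical helper names, and both wheel filenames are made-up examples.

```python
from __future__ import annotations

import re
from pathlib import Path

# Single-regex condensation of the three TAG_PATTERNS (an assumption of this sketch).
TAG_RE = re.compile(r"^(?:cuda-core-|cuda-pathfinder-)?v(?P<version>\d+\.\d+\.\d+)")


def version_from_tag(tag: str) -> str:
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"Unsupported git tag format {tag!r}")
    return match.group("version")


def dist_and_version(filename: str) -> tuple[str, str]:
    # Wheel names start with {distribution}-{version}-...
    parts = Path(filename).stem.split("-")
    if len(parts) < 5:
        raise ValueError(f"Invalid wheel filename format: {filename}")
    return parts[0], parts[1]


def problems(filename: str, tag: str) -> list[str]:
    """List the reasons a wheel would be rejected for the given release tag."""
    expected = version_from_tag(tag)
    _, version = dist_and_version(filename)
    found = []
    if ".dev" in version or "+" in version:
        found.append("dev/local marker")
    if version != expected:
        found.append("version mismatch")
    return found


# Hypothetical filenames: one built from a tag ref, one built from a branch ref.
RELEASE_WHEEL = "cuda_core-0.3.0-cp312-cp312-manylinux_2_28_x86_64.whl"
DEV_WHEEL = "cuda_core-0.3.1.dev5+g1a2b3c4-cp312-cp312-manylinux_2_28_x86_64.whl"

print(problems(RELEASE_WHEEL, "cuda-core-v0.3.0"))
print(problems(DEV_WHEEL, "cuda-core-v0.3.0"))
```

A wheel built from a branch ref would typically carry a setuptools-scm `.dev`/`+local` version like the second filename, which is why it fails both checks.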