Skip to content

Improve handling of the NVTE_CUDA_ARCHS#2665

Open
ptrendx wants to merge 7 commits intoNVIDIA:mainfrom
ptrendx:pr_get_arch_cmake2
Open

Improve handling of the NVTE_CUDA_ARCHS#2665
ptrendx wants to merge 7 commits intoNVIDIA:mainfrom
ptrendx:pr_get_arch_cmake2

Conversation

@ptrendx
Copy link
Member

@ptrendx ptrendx commented Feb 9, 2026

Description

Improve handling of the NVTE_CUDA_ARCHS env variable:

  • add the regular architectures to the build of the sources with specific architectures to enable some support for GPU architectures in the family that were not specialized directly.
  • automatically add sm75 to the build in case the CMAKE_CUDA_ARCHITECTURES becomes empty (which currently should only happen when cmake < 4.0.2 and sm120 is the only selected architecture)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
@ptrendx ptrendx requested a review from ksivaman February 9, 2026 23:09
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 9, 2026

Greptile Summary

This PR improves handling of the NVTE_CUDA_ARCHS environment variable by restructuring how GPU architectures are processed in the CMake build system:

  • For arch 100/101: The generic architecture now stays in CMAKE_CUDA_ARCHITECTURES (instead of being moved to NVTE_GENERIC_ARCHS), so arch-specific sources are compiled for both the generic arch and the arch-specific variant (e.g., sm_100 + sm_100a). This enables fallback support for GPUs in the same family that don't have a dedicated specific compilation.
  • For arch 110/120: Adds CMake 4.0.2+ native f suffix support (e.g., 110f, 120f placed directly in CMAKE_CUDA_ARCHITECTURES), eliminating the need for manual --generate-code workarounds when the CMake version supports it.
  • Empty arch fallback: Adds sm_75 to CMAKE_CUDA_ARCHITECTURES when it becomes empty on CMake < 4.0.2 (e.g., when sm_120 is the only selected architecture and it gets removed during specific-arch processing).
  • ptx.cuh changes: Introduces NVTE_HAS_ARCH_SPECIFIC_TARGETS compile definition that converts static_assert failures into runtime return false in the arch-compatibility checks, allowing generic and specific compilation to coexist in the same build.

Confidence Score: 4/5

  • This PR is a build system improvement with reasonable safety — the logic is correct for the supported configurations, and the changes are consistent with the described intent.
  • The architecture routing logic is well-structured with correct CMake version branching. The unconditional NVTE_ARCH_SPECIFIC_TARGETS=TRUE is safe because the compatibility checks in ptx.cuh already guard against non-matching architectures via outer if-constexpr conditions. The sm_75 fallback addresses a real edge case. Minor concern is that the asymmetric handling between arch 100/101 (kept in CMAKE_CUDA_ARCHITECTURES) and arch 110/120 (removed/replaced) could be confusing for future maintainers, but it's justified by the different suffix semantics (a=arch-specific vs f=family-specific).
  • No files require special attention — both changed files are consistent with the described behavior.

Important Files Changed

Filename Overview
transformer_engine/common/CMakeLists.txt Restructures CUDA architecture handling: keeps generic archs in CMAKE_CUDA_ARCHITECTURES for arch 100/101 (to provide family fallback), adds CMake 4.0.2+ native 'f' suffix support for 110/120, and adds an sm_75 fallback for empty arch lists on older CMake. No significant issues found.
transformer_engine/common/util/ptx.cuh Adds NVTE_HAS_ARCH_SPECIFIC_TARGETS preprocessor guard to convert compile-time static_assert into runtime false return in ArchSpecific and FamilySpecific compatible() methods. This allows generic arch compilations to coexist with specific variants without triggering build errors.

Flowchart

flowchart TD
    A[CMAKE_CUDA_ARCHITECTURES input] --> B{Arch 100 found?}
    B -->|Yes| C["Keep 100 in CMAKE_CUDA_ARCHITECTURES\nAppend 100a to NVTE_SPECIFIC_ARCHS\n(+103a if CUDA≥12.9)"]
    B -->|No| D{Arch 101 found?}
    C --> D
    D -->|Yes| E["Keep 101 in CMAKE_CUDA_ARCHITECTURES\nAppend 101a to NVTE_SPECIFIC_ARCHS"]
    D -->|No| F{Arch 110 found?}
    E --> F
    F -->|Yes| G{CMake ≥ 4.0.2?}
    G -->|Yes| H["Remove 110, add 110f\nto CMAKE_CUDA_ARCHITECTURES"]
    G -->|No| I["Remove 110\nAdd 110 to NVTE_GENERIC_ARCHS\nAdd 110f to NVTE_SPECIFIC_ARCHS"]
    F -->|No| J{Arch 120 found?}
    H --> J
    I --> J
    J -->|Yes| K["Remove 120 from CMAKE_CUDA_ARCHITECTURES"]
    K --> L{CMake ≥ 4.0.2?}
    L -->|Yes| M{CUDA ≥ 12.9?}
    L -->|No| N{CUDA ≥ 12.9?}
    M -->|Yes| O["Add 120f to CMAKE_CUDA_ARCHITECTURES"]
    M -->|No| P["Add 120 to NVTE_GENERIC_ARCHS\nAdd 120a to NVTE_SPECIFIC_ARCHS"]
    N -->|Yes| Q["Add 120 to NVTE_GENERIC_ARCHS\nAdd 120f to NVTE_SPECIFIC_ARCHS"]
    N -->|No| R["Add 120 to NVTE_GENERIC_ARCHS\nAdd 120a to NVTE_SPECIFIC_ARCHS"]
    J -->|No| S{CMake < 4.0.2 AND\nCMAKE_CUDA_ARCHITECTURES empty?}
    O --> S
    P --> S
    Q --> S
    R --> S
    S -->|Yes| T["Warning + set CMAKE_CUDA_ARCHITECTURES=75"]
    S -->|No| U["Set NVTE_ARCH_SPECIFIC_TARGETS=TRUE\nDefine NVTE_HAS_ARCH_SPECIFIC_TARGETS=1\non arch-specific sources"]
    T --> U
Loading

Last reviewed commit: 1d89a57

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 43 to 56
@@ -54,31 +52,57 @@ endif()
# Check for architecture 101 (if we see this we are in toolkit <= 12.9)
list(FIND CMAKE_CUDA_ARCHITECTURES "101" arch_101_index)
if(NOT arch_101_index EQUAL -1)
list(REMOVE_ITEM CMAKE_CUDA_ARCHITECTURES "101")
list(APPEND NVTE_GENERIC_ARCHS "101")
list(APPEND NVTE_SPECIFIC_ARCHS "101a")
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arch 100/101 not removed

When "100" / "101" are present in CMAKE_CUDA_ARCHITECTURES, this block only appends 100a/101a to NVTE_SPECIFIC_ARCHS but never removes the base arch from CMAKE_CUDA_ARCHITECTURES nor adds it to NVTE_GENERIC_ARCHS. As a result, the build will still compile all sources for sm_100 / sm_101 (via CMAKE_CUDA_ARCHITECTURES) while also compiling arch-specific sources for sm_100a / sm_101a (via --generate-code), which defeats the “generic + specific” split and can cause unexpected extra build work / incorrect targeting. This differs from the 110/120 handling below, which removes the base arch and routes it into the explicit lists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptileai That is because arches 110/120 are routed to the "f" (like 120f) variant where the entire family is covered, whereas in the case of 100, it routes to "a" variants (like 100a) - this is intentional behavior, but in order to support different Arches in the same family, we need to still compile for the regular 100 then.

Comment on lines +93 to 101
if(CMAKE_VERSION VERSION_LESS 4.0.2)
if(NOT CMAKE_CUDA_ARCHITECTURES)
message(WARNING
"CMAKE_CUDA_ARCHITECTURES is empty after replacing arch-specific targets. "
"Please upgrade to CMake 4.0.2+ for native 'f' architecture support. "
"Adding sm_75 target in addition to the specified target to avoid configuration "
"errors - this will result in longer build time, but does not affect correctness.")
set(CMAKE_CUDA_ARCHITECTURES 75)
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fallback wipes user archs

On CMake < 4.0.2, if the arch list becomes empty you set(CMAKE_CUDA_ARCHITECTURES 75) (dropping the user’s originally requested arch entirely). Given the warning text says “add sm_75 target in addition to the specified target”, this looks like a functional mismatch: the fallback should preserve the originally selected arch(es) and append 75, not replace everything with 75.

Copy link
Member Author

@ptrendx ptrendx Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptileai If the list is empty, then setting is the same as appending.

@ptrendx
Copy link
Member Author

ptrendx commented Feb 11, 2026

/te-ci

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +104 to 106
set(NVTE_ARCH_SPECIFIC_TARGETS TRUE)

# cuDNN frontend API
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always-on arch-specific define

set(NVTE_ARCH_SPECIFIC_TARGETS TRUE) makes NVTE_HAS_ARCH_SPECIFIC_TARGETS=1 get added to all transformer_engine_cuda_arch_specific_sources (lines ~247-255) even when NVTE_SPECIFIC_ARCHS is empty (e.g., user selects only 70/80/89/90). That flips the behavior in util/ptx.cuh to skip the compile-time static_assert that catches “compiled generic but using arch/family-specific features”, so those mistakes will now silently pass in exactly the builds where no arch-specific codegen flags are being injected for these sources.

@ptrendx
Copy link
Member Author

ptrendx commented Feb 12, 2026

/te-ci

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

endif()
endif()

set(NVTE_ARCH_SPECIFIC_TARGETS TRUE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unconditional flag disables compile-time safety checks. NVTE_ARCH_SPECIFIC_TARGETS is set to TRUE even when NVTE_SPECIFIC_ARCHS is empty (e.g., user builds only sm_70/80/89/90). This causes NVTE_HAS_ARCH_SPECIFIC_TARGETS=1 to be defined for all arch-specific sources (lines 249-258), which disables the static_assert in ptx.cuh lines 34-38 and 56-60. The compile-time check that catches misuse of arch/family-specific features in generic builds is now always off, even when no arch-specific code generation is happening.

Suggested change
set(NVTE_ARCH_SPECIFIC_TARGETS TRUE)
if(NVTE_SPECIFIC_ARCHS)
set(NVTE_ARCH_SPECIFIC_TARGETS TRUE)
endif()

@ptrendx
Copy link
Member Author

ptrendx commented Feb 18, 2026

/te-ci

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments