Improve handling of the NVTE_CUDA_ARCHS by ptrendx · Pull Request #2665 · NVIDIA/TransformerEngine

ptrendx · 2026-02-09T23:09:22Z

Description

Improve handling of the NVTE_CUDA_ARCHS env variable:

add the regular architectures to the build of the sources with specific architectures to enable some support for GPU architectures in the family that were not specialized directly.
automatically add sm75 to the build in case the CMAKE_CUDA_ARCHITECTURES becomes empty (which currently should only happen when cmake < 4.0.2 and sm120 is the only selected architecture)

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Change A
Change B

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

greptile-apps · 2026-02-09T23:20:51Z

Greptile Summary

This PR improves handling of the NVTE_CUDA_ARCHS environment variable by restructuring how GPU architectures are processed in the CMake build system:

For arch 100/101: The generic architecture now stays in CMAKE_CUDA_ARCHITECTURES (instead of being moved to NVTE_GENERIC_ARCHS), so arch-specific sources are compiled for both the generic arch and the arch-specific variant (e.g., sm_100 + sm_100a). This enables fallback support for GPUs in the same family that don't have a dedicated specific compilation.
For arch 110/120: Adds CMake 4.0.2+ native f suffix support (e.g., 110f, 120f placed directly in CMAKE_CUDA_ARCHITECTURES), eliminating the need for manual --generate-code workarounds when the CMake version supports it.
Empty arch fallback: Adds sm_75 to CMAKE_CUDA_ARCHITECTURES when it becomes empty on CMake < 4.0.2 (e.g., when sm_120 is the only selected architecture and it gets removed during specific-arch processing).
ptx.cuh changes: Introduces NVTE_HAS_ARCH_SPECIFIC_TARGETS compile definition that converts static_assert failures into runtime return false in the arch-compatibility checks, allowing generic and specific compilation to coexist in the same build.

Confidence Score: 4/5

This PR is a build system improvement with reasonable safety — the logic is correct for the supported configurations, and the changes are consistent with the described intent.
The architecture routing logic is well-structured with correct CMake version branching. The unconditional NVTE_ARCH_SPECIFIC_TARGETS=TRUE is safe because the compatibility checks in ptx.cuh already guard against non-matching architectures via outer if-constexpr conditions. The sm_75 fallback addresses a real edge case. Minor concern is that the asymmetric handling between arch 100/101 (kept in CMAKE_CUDA_ARCHITECTURES) and arch 110/120 (removed/replaced) could be confusing for future maintainers, but it's justified by the different suffix semantics (a=arch-specific vs f=family-specific).
No files require special attention — both changed files are consistent with the described behavior.

Important Files Changed

Filename	Overview
transformer_engine/common/CMakeLists.txt	Restructures CUDA architecture handling: keeps generic archs in CMAKE_CUDA_ARCHITECTURES for arch 100/101 (to provide family fallback), adds CMake 4.0.2+ native 'f' suffix support for 110/120, and adds an sm_75 fallback for empty arch lists on older CMake. No significant issues found.
transformer_engine/common/util/ptx.cuh	Adds NVTE_HAS_ARCH_SPECIFIC_TARGETS preprocessor guard to convert compile-time static_assert into runtime false return in ArchSpecific and FamilySpecific compatible() methods. This allows generic arch compilations to coexist with specific variants without triggering build errors.

Flowchart

flowchart TD
    A[CMAKE_CUDA_ARCHITECTURES input] --> B{Arch 100 found?}
    B -->|Yes| C["Keep 100 in CMAKE_CUDA_ARCHITECTURES\nAppend 100a to NVTE_SPECIFIC_ARCHS\n(+103a if CUDA≥12.9)"]
    B -->|No| D{Arch 101 found?}
    C --> D
    D -->|Yes| E["Keep 101 in CMAKE_CUDA_ARCHITECTURES\nAppend 101a to NVTE_SPECIFIC_ARCHS"]
    D -->|No| F{Arch 110 found?}
    E --> F
    F -->|Yes| G{CMake ≥ 4.0.2?}
    G -->|Yes| H["Remove 110, add 110f\nto CMAKE_CUDA_ARCHITECTURES"]
    G -->|No| I["Remove 110\nAdd 110 to NVTE_GENERIC_ARCHS\nAdd 110f to NVTE_SPECIFIC_ARCHS"]
    F -->|No| J{Arch 120 found?}
    H --> J
    I --> J
    J -->|Yes| K["Remove 120 from CMAKE_CUDA_ARCHITECTURES"]
    K --> L{CMake ≥ 4.0.2?}
    L -->|Yes| M{CUDA ≥ 12.9?}
    L -->|No| N{CUDA ≥ 12.9?}
    M -->|Yes| O["Add 120f to CMAKE_CUDA_ARCHITECTURES"]
    M -->|No| P["Add 120 to NVTE_GENERIC_ARCHS\nAdd 120a to NVTE_SPECIFIC_ARCHS"]
    N -->|Yes| Q["Add 120 to NVTE_GENERIC_ARCHS\nAdd 120f to NVTE_SPECIFIC_ARCHS"]
    N -->|No| R["Add 120 to NVTE_GENERIC_ARCHS\nAdd 120a to NVTE_SPECIFIC_ARCHS"]
    J -->|No| S{CMake < 4.0.2 AND\nCMAKE_CUDA_ARCHITECTURES empty?}
    O --> S
    P --> S
    Q --> S
    R --> S
    S -->|Yes| T["Warning + set CMAKE_CUDA_ARCHITECTURES=75"]
    S -->|No| U["Set NVTE_ARCH_SPECIFIC_TARGETS=TRUE\nDefine NVTE_HAS_ARCH_SPECIFIC_TARGETS=1\non arch-specific sources"]
    T --> U

_{Last reviewed commit: 1d89a57}

greptile-apps

_{2 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-09T23:20:54Z

transformer_engine/common/CMakeLists.txt

@@ -54,31 +52,57 @@ endif()
 # Check for architecture 101 (if we see this we are in toolkit <= 12.9)
 list(FIND CMAKE_CUDA_ARCHITECTURES "101" arch_101_index)
 if(NOT arch_101_index EQUAL -1)
-  list(REMOVE_ITEM CMAKE_CUDA_ARCHITECTURES "101")
-  list(APPEND NVTE_GENERIC_ARCHS "101")
  list(APPEND NVTE_SPECIFIC_ARCHS "101a")
 endif()


Arch 100/101 not removed

When "100" / "101" are present in CMAKE_CUDA_ARCHITECTURES, this block only appends 100a/101a to NVTE_SPECIFIC_ARCHS but never removes the base arch from CMAKE_CUDA_ARCHITECTURES nor adds it to NVTE_GENERIC_ARCHS. As a result, the build will still compile all sources for sm_100 / sm_101 (via CMAKE_CUDA_ARCHITECTURES) while also compiling arch-specific sources for sm_100a / sm_101a (via --generate-code), which defeats the “generic + specific” split and can cause unexpected extra build work / incorrect targeting. This differs from the 110/120 handling below, which removes the base arch and routes it into the explicit lists.

@greptileai That is because arches 110/120 are routed to the "f" (like 120f) variant where the entire family is covered, whereas in the case of 100, it routes to "a" variants (like 100a) - this is intentional behavior, but in order to support different Arches in the same family, we need to still compile for the regular 100 then.

greptile-apps · 2026-02-09T23:20:56Z

transformer_engine/common/CMakeLists.txt

+if(CMAKE_VERSION VERSION_LESS 4.0.2)
+  if(NOT CMAKE_CUDA_ARCHITECTURES)
+    message(WARNING
+            "CMAKE_CUDA_ARCHITECTURES is empty after replacing arch-specific targets. "
+            "Please upgrade to CMake 4.0.2+ for native 'f' architecture support. "
+            "Adding sm_75 target in addition to the specified target to avoid configuration "
+            "errors - this will result in longer build time, but does not affect correctness.")
+    set(CMAKE_CUDA_ARCHITECTURES 75)
  endif()


Fallback wipes user archs

On CMake < 4.0.2, if the arch list becomes empty you set(CMAKE_CUDA_ARCHITECTURES 75) (dropping the user’s originally requested arch entirely). Given the warning text says “add sm_75 target in addition to the specified target”, this looks like a functional mismatch: the fallback should preserve the originally selected arch(es) and append 75, not replace everything with 75.

@greptileai If the list is empty, then setting is the same as appending.

ptrendx · 2026-02-11T20:55:25Z

/te-ci

greptile-apps

_{2 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-11T21:08:45Z

transformer_engine/common/CMakeLists.txt

+set(NVTE_ARCH_SPECIFIC_TARGETS TRUE)
+
 # cuDNN frontend API


Always-on arch-specific define

set(NVTE_ARCH_SPECIFIC_TARGETS TRUE) makes NVTE_HAS_ARCH_SPECIFIC_TARGETS=1 get added to all transformer_engine_cuda_arch_specific_sources (lines ~247-255) even when NVTE_SPECIFIC_ARCHS is empty (e.g., user selects only 70/80/89/90). That flips the behavior in util/ptx.cuh to skip the compile-time static_assert that catches “compiled generic but using arch/family-specific features”, so those mistakes will now silently pass in exactly the builds where no arch-specific codegen flags are being injected for these sources.

ptrendx · 2026-02-12T19:15:32Z

/te-ci

greptile-apps

_{2 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-12T19:18:16Z

transformer_engine/common/CMakeLists.txt

  endif()
 endif()

+set(NVTE_ARCH_SPECIFIC_TARGETS TRUE)


Unconditional flag disables compile-time safety checks. NVTE_ARCH_SPECIFIC_TARGETS is set to TRUE even when NVTE_SPECIFIC_ARCHS is empty (e.g., user builds only sm_70/80/89/90). This causes NVTE_HAS_ARCH_SPECIFIC_TARGETS=1 to be defined for all arch-specific sources (lines 249-258), which disables the static_assert in ptx.cuh lines 34-38 and 56-60. The compile-time check that catches misuse of arch/family-specific features in generic builds is now always off, even when no arch-specific code generation is happening.

Suggested change

set(NVTE_ARCH_SPECIFIC_TARGETS TRUE)

if(NVTE_SPECIFIC_ARCHS)

set(NVTE_ARCH_SPECIFIC_TARGETS TRUE)

endif()

ptrendx · 2026-02-18T01:41:49Z

/te-ci

greptile-apps

_{2 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

ptrendx added 4 commits January 26, 2026 15:57

Trial and error

8f9804a

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

Should be ok

09e6ca8

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

Fix one issue and add better message to the fallback path

5740300

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

Fixes

71041ae

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

ptrendx requested a review from ksivaman February 9, 2026 23:09

greptile-apps bot reviewed Feb 9, 2026

View reviewed changes

Merge branch 'main' into pr_get_arch_cmake2

495739c

greptile-apps bot reviewed Feb 11, 2026

View reviewed changes

Merge branch 'main' into pr_get_arch_cmake2

18f5a3c

greptile-apps bot reviewed Feb 12, 2026

View reviewed changes

Merge branch 'main' into pr_get_arch_cmake2

1d89a57

greptile-apps bot reviewed Feb 18, 2026

View reviewed changes

Conversation

ptrendx commented Feb 9, 2026

Description

Type of change

Changes

Checklist:

Uh oh!

greptile-apps bot commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

ptrendx Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

ptrendx Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ptrendx commented Feb 11, 2026

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

ptrendx commented Feb 12, 2026

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

ptrendx commented Feb 18, 2026

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

greptile-apps bot commented Feb 9, 2026 •

edited

Loading

ptrendx Feb 10, 2026 •

edited

Loading