LessUp/modern-ai-kernels

TensorCraft-HPC

TensorCraft-HPC is a modern C++/CUDA AI kernel library for studying and validating GEMM, attention, convolution, normalization, sparse operators, and quantization.

Repository Overview

  • Header-first kernel library under include/tensorcraft/
  • Python bindings in src/python_ops/
  • Tests in tests/
  • Benchmarks in benchmarks/
  • Project docs on GitHub Pages

Quick Start

Recommended on a CUDA development machine:

cmake --preset dev
cmake --build --preset dev --parallel 2
ctest --preset dev --output-on-failure
python -m pip install -e .
python -c "import tensorcraft_ops as tc; print(tc.__version__)"

Build Presets

  • dev: recommended day-to-day CUDA development preset; single architecture, tests on, Python on
  • python-dev: lighter CUDA preset focused on building tensorcraft_ops
  • release: heavier full build, including benchmarks
  • cpu-smoke: CPU-only configure/install smoke validation; tests and Python bindings are disabled
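
The choice between a CUDA preset and the CPU-only preset can be scripted. The following is a minimal sketch, assuming the only signal needed is whether an executable nvcc exists at the path this README names; `pick_preset` is a hypothetical helper and is not part of the repository.

```python
import os


def pick_preset(nvcc_path: str = "/usr/local/cuda/bin/nvcc") -> str:
    """Return "dev" when an executable nvcc exists at nvcc_path, else "cpu-smoke".

    Hypothetical helper: the preset names come from the list above, but the
    selection rule is an assumption, not project policy.
    """
    has_nvcc = os.path.isfile(nvcc_path) and os.access(nvcc_path, os.X_OK)
    return "dev" if has_nvcc else "cpu-smoke"


if __name__ == "__main__":
    preset = pick_preset()
    # Print the configure command instead of running it.
    print(f"cmake --preset {preset}")
```

The script only prints the command, so it is safe to run on a machine without CUDA or CMake installed.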

Build Notes

  • This repository targets a local CUDA 12.8 toolkit, with the compiler at /usr/local/cuda/bin/nvcc
  • CMake presets and Python builds pin CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
  • If CUDA is unavailable, CMake disables tests, benchmarks, and Python bindings automatically
  • If the build strains the machine (memory or CPU), prefer dev or python-dev, keep --parallel low, and set a single CMAKE_CUDA_ARCHITECTURES value for your GPU
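
One way to find a single CMAKE_CUDA_ARCHITECTURES value is to read the GPU's compute capability and drop the dot (8.6 becomes 86). The sketch below assumes a driver recent enough that `nvidia-smi` supports the `compute_cap` query field, and assumes the preset does not already force a different architecture. The helper names are hypothetical and do not come from this repository.

```python
import shutil
import subprocess
from typing import List, Optional


def arch_from_compute_cap(compute_cap: str) -> str:
    """Convert a compute capability such as "8.6" to a CMake value such as "86"."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def detect_compute_cap() -> Optional[str]:
    """Ask nvidia-smi for the first GPU's compute capability, or return None."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        # Covers older drivers that reject the query, timeouts, and exec errors.
        return None
    lines = [line.strip() for line in out.splitlines() if line.strip()]
    return lines[0] if lines else None


def configure_command(preset: str, arch: str) -> List[str]:
    """Build (but do not run) a configure command pinned to one architecture."""
    return ["cmake", "--preset", preset, f"-DCMAKE_CUDA_ARCHITECTURES={arch}"]


if __name__ == "__main__":
    cap = detect_compute_cap()
    if cap is None:
        print("No GPU detected; set CMAKE_CUDA_ARCHITECTURES by hand.")
    else:
        print(" ".join(configure_command("dev", arch_from_compute_cap(cap))))
```

The command is printed, not executed, so you can review it before configuring.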

Python Bindings

The pybind11 module is exposed as tensorcraft_ops.

python -m pip install -e .
python -c "import tensorcraft_ops as tc; print(tc.__version__)"
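
When the one-line check fails, a slightly longer probe gives a clearer message. This is a sketch only: it relies on the module name `tensorcraft_ops` and the `__version__` attribute shown above, and `module_version` is a hypothetical helper, not part of the bindings.

```python
import importlib
from typing import Optional


def module_version(name: str) -> Optional[str]:
    """Return the module's __version__ (or "unknown"), or None if import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        # Also raised when a compiled extension cannot load its shared libraries.
        return None
    return str(getattr(module, "__version__", "unknown"))


if __name__ == "__main__":
    version = module_version("tensorcraft_ops")
    if version is None:
        print("tensorcraft_ops is not importable; run `python -m pip install -e .` first")
    else:
        print(f"tensorcraft_ops {version}")
```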

Docs

  • Project docs: https://lessup.github.io/modern-ai-kernels/
  • Installation: docs/INSTALL.md
  • Troubleshooting: docs/TROUBLESHOOTING.md
  • Contribution workflow: CONTRIBUTING.md

License

MIT License

About

Modern AI Kernel Library (CUDA C++17/20): Elementwise, GEMM, Attention, Conv2D, Sparse, Fusion & Quantization
