TensorCraft-HPC is a modern C++/CUDA AI kernel library for studying and validating GEMM, attention, convolution, normalization, sparse operators, and quantization.
- Header-first kernel library under include/tensorcraft/
- Python bindings in src/python_ops/
- Tests in tests/
- Benchmarks in benchmarks/
- Project docs on GitHub Pages
Recommended on a CUDA development machine:
cmake --preset dev
cmake --build --preset dev --parallel 2
ctest --preset dev --output-on-failure
python -m pip install -e .
python -c "import tensorcraft_ops as tc; print(tc.__version__)"

Build presets:
- dev: recommended day-to-day CUDA development preset; single architecture, tests on, Python on
- python-dev: lighter CUDA preset focused on building tensorcraft_ops
- release: heavier full build, including benchmarks
- cpu-smoke: CPU-only configure/install smoke validation; tests and Python bindings are disabled
- This repository targets the local CUDA 12.8 toolkit at /usr/local/cuda/bin/nvcc
- CMake presets and Python builds pin CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
- If CUDA is unavailable, CMake disables tests, benchmarks, and Python bindings automatically
- If build pressure is high, prefer dev or python-dev, keep --parallel low, and set a single CMAKE_CUDA_ARCHITECTURES value for your GPU
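
The notes above suggest a simple preflight decision: use a CUDA preset when nvcc is present at the pinned path, fall back to cpu-smoke otherwise, and pass a single architecture value when building for one GPU. A minimal sketch of that decision follows. The helper names `choose_preset` and `configure_command` are hypothetical and not part of this repository, and the architecture value `86` is only an example. The sketch also assumes that a `-D` option given on the command line takes precedence over the preset's cache variables.

```python
import os
import shlex
from typing import List, Optional

# Path that the README says the presets pin CMAKE_CUDA_COMPILER to.
NVCC_PATH = "/usr/local/cuda/bin/nvcc"


def choose_preset(nvcc_path: str = NVCC_PATH) -> str:
    """Return 'dev' if nvcc exists and is executable, otherwise 'cpu-smoke'."""
    if os.path.isfile(nvcc_path) and os.access(nvcc_path, os.X_OK):
        return "dev"
    return "cpu-smoke"


def configure_command(preset: str, cuda_arch: Optional[str] = None) -> List[str]:
    """Build the cmake configure argv for a preset.

    cuda_arch is a single architecture such as '86' (example value only).
    It is ignored for the CPU-only preset.
    """
    cmd = ["cmake", "--preset", preset]
    if cuda_arch is not None and preset != "cpu-smoke":
        cmd.append(f"-DCMAKE_CUDA_ARCHITECTURES={cuda_arch}")
    return cmd


if __name__ == "__main__":
    preset = choose_preset()
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(configure_command(preset, cuda_arch="86")))
```

Printing the command rather than executing it keeps the helper safe to run on machines without CMake or CUDA. Pick the architecture value that matches your own GPU.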
The pybind11 module is exposed as tensorcraft_ops.
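
Because tensorcraft_ops is a compiled extension, importing it fails if the bindings were not built, for example under cpu-smoke or when CUDA is unavailable. Once the install step has run, a slightly more defensive check than a bare import might look like the sketch below. The helper `bindings_version` is a hypothetical name used for illustration. Only the module name and its `__version__` attribute come from this README.

```python
import importlib
from typing import Optional


def bindings_version(module_name: str = "tensorcraft_ops") -> Optional[str]:
    """Return the module's __version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Fall back to None if the module defines no __version__ attribute.
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    version = bindings_version()
    if version is None:
        print("tensorcraft_ops is not importable; were the Python bindings built?")
    else:
        print(f"tensorcraft_ops {version}")
```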
python -m pip install -e .
python -c "import tensorcraft_ops as tc; print(tc.__version__)"

- Project docs: https://lessup.github.io/modern-ai-kernels/
- Installation: docs/INSTALL.md
- Troubleshooting: docs/TROUBLESHOOTING.md
- Contribution workflow: CONTRIBUTING.md
MIT License