
OpenAI OSS models on CUDA #45

@obriensystems

Description

@obriensystems

Here: first a P1 Gen 6 laptop with a 12 G RTX 3500 Ada, then an RTX A6000 48 G card.
Verify Gemma 2 first (it fits in 12 G).

ObrienlabsDev/blog#133


import os

# Default is dual GPU (split across the PCIe or NVLink bus) - slower for a small model
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
# Pin a specific GPU - the model must fit entirely in VRAM:
# RTX 3500 Ada = 12 G, A4000 = 16 G, A4500 = 20 G, A6000 = 48 G,
# RTX 4000 Ada = 20 G, RTX 5000 Ada = 32 G, RTX 6000 Ada = 48 G
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from datetime import datetime

from transformers import AutoTokenizer, AutoModelForCausalLM

# Read the Hugging Face access token from the environment instead of hard-coding it,
# e.g. access_token = 'hf_cfTP...XCQqH'
access_token = os.environ.get("HF_TOKEN")

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", token=access_token)
# GPU
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", token=access_token)
# CPU
# model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b", token=access_token)

input_text = "how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process."
time_start = datetime.now().strftime("%H:%M:%S")
print("generate start: ", time_start)

# GPU
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
# CPU
# input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids, max_new_tokens=10000)
print(tokenizer.decode(outputs[0]))

time_end = datetime.now().strftime("%H:%M:%S")
print("generate end: ", time_end)
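As a rough rule of thumb for the VRAM table in the comments above: the weights alone need roughly parameter count × bytes per parameter, before the KV cache and activations are added on top. A minimal sketch (the ~2.5 B parameter figure for Gemma 2B is approximate):

```python
def weight_footprint_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate VRAM for model weights alone (excludes KV cache / activations)."""
    return n_params * bytes_per_param / 1024**3

# Gemma 2B (~2.5e9 params) in bf16 (2 bytes per param)
print(round(weight_footprint_gib(2.5e9), 1))  # ~4.7 GiB -> fits in a 12 G RTX 3500 Ada
```

The same estimate explains why larger models need the A6000: a 13 B model in bf16 is ~24 GiB of weights before any overhead.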

$ pip install transformers
Collecting transformers
  Using cached transformers-4.55.0-py3-none-any.whl.metadata (39 kB)
Collecting filelock (from transformers)
  Using cached filelock-3.18.0-py3-none-any.whl.metadata (2.9 kB)
Collecting huggingface-hub<1.0,>=0.34.0 (from transformers)
  Using cached huggingface_hub-0.34.3-py3-none-any.whl.metadata (14 kB)
Requirement already satisfied: numpy>=1.17 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from transformers) (1.26.4)
Requirement already satisfied: packaging>=20.0 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from transformers) (24.2)
Collecting pyyaml>=5.1 (from transformers)
  Using cached PyYAML-6.0.2-cp312-cp312-win_amd64.whl.metadata (2.1 kB)
Collecting regex!=2019.12.17 (from transformers)
  Using cached regex-2025.7.34-cp312-cp312-win_amd64.whl.metadata (41 kB)
Requirement already satisfied: requests in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from transformers) (2.32.3)
Collecting tokenizers<0.22,>=0.21 (from transformers)
  Using cached tokenizers-0.21.4-cp39-abi3-win_amd64.whl.metadata (6.9 kB)
Collecting safetensors>=0.4.3 (from transformers)
  Using cached safetensors-0.6.1-cp38-abi3-win_amd64.whl.metadata (4.1 kB)
Collecting tqdm>=4.27 (from transformers)
  Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting fsspec>=2023.5.0 (from huggingface-hub<1.0,>=0.34.0->transformers)
  Using cached fsspec-2025.7.0-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from huggingface-hub<1.0,>=0.34.0->transformers) (4.12.2)
Collecting colorama (from tqdm>=4.27->transformers)
  Using cached colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from requests->transformers) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from requests->transformers) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from requests->transformers) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from requests->transformers) (2024.8.30)
Using cached transformers-4.55.0-py3-none-any.whl (11.3 MB)
Using cached huggingface_hub-0.34.3-py3-none-any.whl (558 kB)
Using cached PyYAML-6.0.2-cp312-cp312-win_amd64.whl (156 kB)
Using cached regex-2025.7.34-cp312-cp312-win_amd64.whl (275 kB)
Using cached safetensors-0.6.1-cp38-abi3-win_amd64.whl (320 kB)
Using cached tokenizers-0.21.4-cp39-abi3-win_amd64.whl (2.5 MB)
Using cached tqdm-4.67.1-py3-none-any.whl (78 kB)
Using cached filelock-3.18.0-py3-none-any.whl (16 kB)
Using cached fsspec-2025.7.0-py3-none-any.whl (199 kB)
Using cached colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Installing collected packages: safetensors, regex, pyyaml, fsspec, filelock, colorama, tqdm, huggingface-hub, tokenizers, transformers
Successfully installed colorama-0.4.6 filelock-3.18.0 fsspec-2025.7.0 huggingface-hub-0.34.3 pyyaml-6.0.2 regex-2025.7.34 safetensors-0.6.1 tokenizers-0.21.4 tqdm-4.67.1 transformers-4.55.0

[notice] A new release of pip is available: 24.3.1 -> 25.2
[notice] To update, run: python.exe -m pip install --upgrade pip
(venv-cuda)
micha@p1gen6 MINGW64 /c/wse_github/ObrienlabsDev/machine-learning/environments/windows/src/google-gemma (main)


$ pip install torch
Collecting torch
  Using cached torch-2.8.0-cp312-cp312-win_amd64.whl.metadata (30 kB)
Requirement already satisfied: filelock in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch) (3.18.0)
Requirement already satisfied: typing-extensions>=4.10.0 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch) (4.12.2)
Collecting sympy>=1.13.3 (from torch)
  Using cached sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch)
  Using cached networkx-3.5-py3-none-any.whl.metadata (6.3 kB)
Collecting jinja2 (from torch)
  Using cached jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB)
Requirement already satisfied: fsspec in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch) (2025.7.0)
Requirement already satisfied: setuptools in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch) (75.6.0)
Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch)
  Using cached mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Requirement already satisfied: MarkupSafe>=2.0 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from jinja2->torch) (3.0.2)
Using cached torch-2.8.0-cp312-cp312-win_amd64.whl (241.3 MB)
Using cached sympy-1.14.0-py3-none-any.whl (6.3 MB)
Using cached jinja2-3.1.6-py3-none-any.whl (134 kB)
Using cached networkx-3.5-py3-none-any.whl (2.0 MB)
Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: mpmath, sympy, networkx, jinja2, torch
Successfully installed jinja2-3.1.6 mpmath-1.3.0 networkx-3.5 sympy-1.14.0 torch-2.8.0


$ pip install accelerate
Collecting accelerate
  Downloading accelerate-1.10.0-py3-none-any.whl.metadata (19 kB)
Requirement already satisfied: numpy<3.0.0,>=1.17 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from accelerate) (1.26.4)
Requirement already satisfied: packaging>=20.0 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from accelerate) (24.2)
Collecting psutil (from accelerate)
  Using cached psutil-7.0.0-cp37-abi3-win_amd64.whl.metadata (23 kB)
Requirement already satisfied: pyyaml in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from accelerate) (6.0.2)
Requirement already satisfied: torch>=2.0.0 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from accelerate) (2.8.0)
Requirement already satisfied: huggingface_hub>=0.21.0 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from accelerate) (0.34.3)
Requirement already satisfied: safetensors>=0.4.3 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from accelerate) (0.6.1)
Requirement already satisfied: filelock in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from huggingface_hub>=0.21.0->accelerate) (3.18.0)
Requirement already satisfied: fsspec>=2023.5.0 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from huggingface_hub>=0.21.0->accelerate) (2025.7.0)
Requirement already satisfied: requests in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from huggingface_hub>=0.21.0->accelerate) (2.32.3)
Requirement already satisfied: tqdm>=4.42.1 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from huggingface_hub>=0.21.0->accelerate) (4.67.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from huggingface_hub>=0.21.0->accelerate) (4.12.2)
Requirement already satisfied: sympy>=1.13.3 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch>=2.0.0->accelerate) (1.14.0)
Requirement already satisfied: networkx in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch>=2.0.0->accelerate) (3.5)
Requirement already satisfied: jinja2 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch>=2.0.0->accelerate) (3.1.6)
Requirement already satisfied: setuptools in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch>=2.0.0->accelerate) (75.6.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from sympy>=1.13.3->torch>=2.0.0->accelerate) (1.3.0)
Requirement already satisfied: colorama in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from tqdm>=4.42.1->huggingface_hub>=0.21.0->accelerate) (0.4.6)
Requirement already satisfied: MarkupSafe>=2.0 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from jinja2->torch>=2.0.0->accelerate) (3.0.2)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from requests->huggingface_hub>=0.21.0->accelerate) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from requests->huggingface_hub>=0.21.0->accelerate) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from requests->huggingface_hub>=0.21.0->accelerate) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from requests->huggingface_hub>=0.21.0->accelerate) (2024.8.30)
Downloading accelerate-1.10.0-py3-none-any.whl (374 kB)
Using cached psutil-7.0.0-cp37-abi3-win_amd64.whl (244 kB)
Installing collected packages: psutil, accelerate
Successfully installed accelerate-1.10.0 psutil-7.0.0

https://pytorch.org/get-started/locally/

pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu128


$ nvidia-smi
Thu Aug  7 13:02:10 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 573.22                 Driver Version: 573.22         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 3500 Ada Gene...  WDDM  |   00000000:01:00.0 Off |                  Off |
| N/A   48C    P8              2W /  100W |     114MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            5572    C+G   ...munity\Common7\IDE\devenv.exe      N/A      |
|    0   N/A  N/A           14796    C+G   C:\Windows\explorer.exe               N/A      |
+-----------------------------------------------------------------------------------------+
(venv-cuda)
micha@p1gen6 MINGW64 /c/wse_github/ObrienlabsDev/machine-learning/environments/windows/src/google-gemma (main)
$ pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu128
Looking in indexes: https://download.pytorch.org/whl/cu128
Requirement already satisfied: torch in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (2.8.0)
Collecting torchvision
  Downloading https://download.pytorch.org/whl/cu128/torchvision-0.23.0%2Bcu128-cp312-cp312-win_amd64.whl.metadata (6.3 kB)
Requirement already satisfied: filelock in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch) (3.18.0)
Requirement already satisfied: typing-extensions>=4.10.0 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch) (4.12.2)
Requirement already satisfied: sympy>=1.13.3 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch) (1.14.0)
Requirement already satisfied: networkx in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch) (3.5)
Requirement already satisfied: jinja2 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch) (3.1.6)
Requirement already satisfied: fsspec in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch) (2025.7.0)
Requirement already satisfied: setuptools in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torch) (75.6.0)
Requirement already satisfied: numpy in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from torchvision) (1.26.4)
Collecting torch
  Downloading https://download.pytorch.org/whl/cu128/torch-2.8.0%2Bcu128-cp312-cp312-win_amd64.whl.metadata (29 kB)
Collecting pillow!=8.3.*,>=5.3.0 (from torchvision)
  Downloading https://download.pytorch.org/whl/pillow-11.0.0-cp312-cp312-win_amd64.whl.metadata (9.3 kB)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from sympy>=1.13.3->torch) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\venv-cuda\lib\site-packages (from jinja2->torch) (3.0.2)
Downloading https://download.pytorch.org/whl/cu128/torchvision-0.23.0%2Bcu128-cp312-cp312-win_amd64.whl (7.5 MB)
   ---------------------------------------- 7.5/7.5 MB 51.6 MB/s eta 0:00:00
Downloading https://download.pytorch.org/whl/cu128/torch-2.8.0%2Bcu128-cp312-cp312-win_amd64.whl (3461.4 MB)
   ---------------------------------------- 3.5/3.5 GB 88.6 MB/s eta 0:00:00
Downloading https://download.pytorch.org/whl/pillow-11.0.0-cp312-cp312-win_amd64.whl (2.6 MB)
   ---------------------------------------- 2.6/2.6 MB 74.4 MB/s eta 0:00:00
Installing collected packages: pillow, torch, torchvision
  Attempting uninstall: torch
    Found existing installation: torch 2.8.0
    Uninstalling torch-2.8.0:
      Successfully uninstalled torch-2.8.0
Successfully installed pillow-11.0.0 torch-2.8.0+cu128 torchvision-0.23.0+cu128

[notice] A new release of pip is available: 24.3.1 -> 25.2
[notice] To update, run: python.exe -m pip install --upgrade pip
(venv-cuda)

micha@p1gen6 MINGW64 /c/wse_github/ObrienlabsDev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
2025-08-07 13:07:43.649660: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-08-07 13:07:44.170615: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
Loading checkpoint shards: 100%|##########| 2/2 [00:02<00:00,  1.02s/it]
generate srt:  13:07:47
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, we need to understand what the beta and r process are. The beta process is a type of nuclear reaction that occurs in stars when a neutron is converted into a proton, releasing a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. The r process is a type of nuclear reaction that occurs in supernovae and neutron stars. It involves the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of elements heavier than iron that are not produced by the beta process. Now, let's consider how gold is made in these processes. In the beta process, gold is produced by the conversion of a neutron into a proton, followed by the emission of a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. However, the r process is responsible for the production of gold in particular. In supernovae and neutron stars, gold is produced by the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of gold in the universe.

Step 2/2
Therefore, the ratio of gold created during the beta and r process depends on the ratio of the number of neutrons to protons in the star. If the ratio is high, more gold will be produced by the beta process, and if the ratio is low, more gold will be produced by the r process. However, the exact ratio of gold created by these processes is not known, as it depends on the specific conditions of the star and the supernova or neutron star.<eos>
generate end:  13:07:58
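The script records `time_start` and `time_end` but never prints the difference; from the timestamps above the generation took about 11 s of wall clock. A small helper to compute that from the logged HH:MM:SS strings (a sketch; it assumes both timestamps fall on the same day):

```python
from datetime import datetime

def elapsed_seconds(start: str, end: str, fmt: str = "%H:%M:%S") -> int:
    """Wall-clock seconds between two same-day timestamps in the given format."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

print(elapsed_seconds("13:07:47", "13:07:58"))  # 11
```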
(venv-cuda)



