Constant and global device arrays and scopes #6
GaffaSnobb started this conversation in General
In my shell model solver, there are a lot of arrays whose values are calculated in the initial stage of the program and stay constant for the rest of the run time. These values sound well suited for `__constant__` device arrays. The initial problem is that the size of a `__constant__` array must be known at compile time. This is problematic because the lengths of the arrays depend on parameters of the calculation, like the model space of the interaction. For many of the arrays in question, this issue can be somewhat mitigated by the fact that there is a relatively small upper limit to their sizes. A solution is then to allocate the arrays to accommodate the largest possible size. As of now, the sizes are (parameters.hpp):

The number of orbitals corresponds to the sdpf-sdg model space, which has 12 proton orbitals and 12 neutron orbitals.

Now to the programming part. Initially I tried to define the `__constant__` device arrays in a separate header to make them available across translation units, declaring them in some common header file like:

With CUDA I would then use the `-rdc=true` flag to allow relocatable device code, but I have not managed to make this work with HIP. I have therefore decided to let the `__constant__` arrays live only in the file scope of `hamiltonian_device.cpp`. Normally I dislike defining things in the global or file scope, but defining the `__constant__` arrays in a function scope and then passing the values as arguments to the kernels proved to be difficult, if not impossible, so file scope it is.

Now! One notable array which is constant during the entire run time is the array containing the $M$ basis states. It has the added complexity of being fricken huge. Well, not always, since its size depends on the number of valence particles, but for most interesting calculations it will hold millions or even billions of states.
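To make the size dependence concrete, here is a small host-side sketch that counts $M$-scheme Slater determinants for a given number of identical valence nucleons and a given total $2M$, using a 0/1 subset-sum count over the single-particle $m$-substates. It is a hypothetical illustration, not the solver's code: `count_m_scheme_states` and the toy sd-shell substate list `sd_shell_two_m` are assumptions, and a real proton-neutron basis would combine the proton and neutron counts for every split of $M$.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Toy example (assumption): 2m values of the sd shell, i.e. 0d5/2, 1s1/2 and
// 0d3/2, giving 12 single-particle m-substates. Storing 2m keeps everything integer.
const std::vector<int> sd_shell_two_m = {-5, -3, -1, 1, 3, 5,  // 0d5/2
                                         -1, 1,                 // 1s1/2
                                         -3, -1, 1, 3};         // 0d3/2

// Hypothetical helper: count the Slater determinants with n_particles identical
// fermions in the substates `two_m` (at most one particle per substate) whose
// total projection equals two_M = 2M. Uses a 0/1 subset-sum count.
std::uint64_t count_m_scheme_states(const std::vector<int>& two_m,
                                    int n_particles, int two_M) {
    if (n_particles < 0) return 0;
    int max_abs = 0;  // largest reachable |2M|
    for (int tm : two_m) max_abs += std::abs(tm);
    const int width = 2 * max_abs + 1;  // sums range over [-max_abs, max_abs]

    // ways[k][s + max_abs]: number of k-particle configurations with total 2M = s.
    std::vector<std::vector<std::uint64_t>> ways(
        n_particles + 1, std::vector<std::uint64_t>(width, 0));
    ways[0][max_abs] = 1;  // the empty configuration

    for (int tm : two_m) {
        // Walk k downwards so each substate is occupied at most once (Pauli).
        for (int k = n_particles; k >= 1; --k) {
            for (int s = 0; s < width; ++s) {
                const int prev = s - tm;
                if (prev >= 0 && prev < width) ways[k][s] += ways[k - 1][prev];
            }
        }
    }

    const int index = two_M + max_abs;
    if (index < 0 || index >= width) return 0;
    return ways[n_particles][index];
}
```

For two particles in this toy shell, `count_m_scheme_states(sd_shell_two_m, 2, 0)` gives 14, which matches the number of two-particle $J$ states of identical nucleons in the sd shell. The count grows combinatorially with the number of substates and particles, which is how larger model spaces end up with millions of basis states.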
The size is problematic in two ways. For Nvidia GPUs, the `__constant__` memory is very fast but quite small, on the order of tens of kB (64 kB on current Nvidia GPUs). This is most certainly too little for the basis states, and likely too little for the other `__constant__` arrays: 5688*sizeof(double) = 45504 B, which fills up more than half of the Nvidia `__constant__` memory mentioned above. However, `__constant__` memory seems to work a bit differently in the AMD universe. On my 7900 XTX I have the following stats:

These stats suggest that the `__constant__` memory is far larger, although it does not cover the entire VRAM of the GPU. Anyway, all the `__constant__` arrays except for the basis states will easily fit into that memory, case closed.

It is not possible to determine the size of the basis states array without some pre-calculation that depends on the parameters of the problem. I have therefore decided to put the basis states in global device memory. However, I am not able to declare the device pointer in file scope as I did with the `__constant__` arrays. Consider the following minimal (non)working example:

This approach would work if `device_array` were a `__constant__` array. The solution for now is to pass all the non-`__constant__` arrays as function arguments.
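Returning to the `__constant__` arrays, the upper-bound strategy and the memory-budget arithmetic can be sketched on the host side. This is a hypothetical illustration rather than the solver's code: `FixedCapacityArray`, `n_orbitals_max` and `constant_memory_bytes` are made-up names, and the real capacities in parameters.hpp are not reproduced. On the device side, each array would presumably be declared as `__constant__ double name[n_orbitals_max];` at file scope in `hamiltonian_device.cpp` and uploaded with something like `hipMemcpyToSymbol(HIP_SYMBOL(name), arr.values.data(), arr.bytes())`; that part needs hipcc and is left out so the sketch compiles with a plain C++ compiler.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical capacity: 12 proton + 12 neutron orbitals (sdpf-sdg). The real
// upper bounds live in parameters.hpp and are not reproduced here.
constexpr std::size_t n_orbitals_max = 24;

// Nvidia __constant__ memory size used in the arithmetic above (64 kB).
constexpr std::size_t constant_memory_bytes = 64 * 1024;

// Host-side stand-in for one fixed-size __constant__ array. The storage is sized
// for the largest supported model space; only the first `size` entries are used.
template <typename T, std::size_t Capacity>
struct FixedCapacityArray {
    std::array<T, Capacity> values{};
    std::size_t size = 0;

    // Copy runtime-sized data into the fixed storage, rejecting model spaces
    // that exceed the compile-time capacity.
    void assign(const std::vector<T>& data) {
        if (data.size() > Capacity) {
            throw std::length_error("model space exceeds compile-time capacity");
        }
        std::copy(data.begin(), data.end(), values.begin());
        size = data.size();
    }

    // Bytes the corresponding device symbol occupies (always the full capacity).
    static constexpr std::size_t bytes() { return Capacity * sizeof(T); }
};

// Fraction of the constant-memory budget taken up by `bytes` bytes.
constexpr double constant_memory_fraction(std::size_t bytes) {
    return static_cast<double>(bytes) / static_cast<double>(constant_memory_bytes);
}
```

For example, `constant_memory_fraction(5688 * sizeof(double))` evaluates to about 0.69, the "more than half" quoted above, while `assign` throws `std::length_error` if a model space ever exceeds the chosen capacity instead of silently overrunning the fixed-size array.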