Thoughts about bit-representation of particle configurations #1
GaffaSnobb started this conversation in Ideas
Sub-state index representation
In second quantisation we describe a quantum mechanical system in terms of occupation numbers, stating how many particles occupy each state rather than specifying the state of each particle. This is a more compact and often more manageable way to describe many-body systems, and it aligns well with how a computer represents data. Let's say that we have two particles in a system, occupying two specific sub-states which we'll label 3 and 5. In second quantisation we represent the wavefunction of such a system by using creation and annihilation operators:

$$| \Psi \rangle = c_3^\dagger c_5^\dagger | \text{core} \rangle$$
If we want to represent $| \Psi \rangle$ on a computer we might store only which sub-states are occupied, like

`[3, 5]`
This is currently how `kshell-cpp` stores particle configurations. I think that it is quite expensive to insert and delete elements from a `std::vector`, though this approach has the benefit of being a very intuitive way of implementing creation and annihilation operators. Additionally, it is important to know the order of the particles. While the initial choice of particle placement is arbitrary, changing the order of the two operators will introduce a sign shift. This is because of the anti-commutation relations of the creation and annihilation operators

$$\{ c_i, c_j \} = 0, \qquad \{ c_i, c_j^\dagger \} = \delta_{ij}, \qquad \{ c_i^\dagger, c_j^\dagger \} = 0$$

where the last of the three is the relevant one in this case. This means that every time we want to remove or add a particle sub-state in the `psi` list we have to consider the order in which the particles are put into the state (the order of the creation operators). The convention I have chosen is to put the particles into the state in order of increasing sub-state index, building all the basis states this way, and choosing positive sign. If we choose to add a particle in sub-state 2 the operation is trivial, since the operators are already ordered by increasing sub-state index:

$$c_2^\dagger | \Psi \rangle = c_2^\dagger c_3^\dagger c_5^\dagger | \text{core} \rangle$$

However, if we add a particle to sub-state 4, one anti-commutation is needed to restore the ordering, which introduces a minus sign:

$$c_4^\dagger | \Psi \rangle = c_4^\dagger c_3^\dagger c_5^\dagger | \text{core} \rangle = - c_3^\dagger c_4^\dagger c_5^\dagger | \text{core} \rangle$$
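This vector-based bookkeeping might be sketched as follows. The function name and signature are my own illustration, not necessarily how `kshell-cpp` implements it:

```cpp
#include <algorithm>
#include <vector>

// Apply a creation operator for `substate` to a configuration stored as a
// sorted vector of occupied sub-state indices (increasing index convention).
// Returns the sign (+1 or -1) picked up from anti-commuting the operator
// into place, or 0 if the sub-state is already occupied (Pauli principle).
int create(std::vector<unsigned>& psi, unsigned substate)
{
    // Find the position that keeps the vector sorted by increasing index.
    auto pos = std::lower_bound(psi.begin(), psi.end(), substate);
    if (pos != psi.end() && *pos == substate) return 0;  // already occupied

    // Each creation operator we anti-commute past contributes a factor -1.
    int n_swaps = static_cast<int>(pos - psi.begin());
    psi.insert(pos, substate);  // O(n) element shuffling: the expensive part
    return (n_swaps % 2 == 0) ? 1 : -1;
}
```

For `psi = [3, 5]`, `create(psi, 2)` returns +1 while `create(psi, 4)` returns -1, matching the operator algebra above.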
Bit-representation
Since the occupation of each sub-state only requires a single bit (occupied or not), we might look to other more memory-efficient and more performant ways of storing the wavefunctions. A "bit-like" representation of `[3, 5]` might look like

`[0, 0, 0, 1, 0, 1]`
Here, a 0 at position $i$ means that sub-state $i$ is un-occupied while a 1 means that it is occupied. Of course, storing each sub-state as an individual integer in some list / vector data structure is unnecessary since we only need to differentiate between occupied and un-occupied. We might therefore use bit-manipulation on an appropriately sized primitive data type, for example an `int`, or we might use C++'s `std::bitset`. If using a primitive data type, we are limited to the bit size of said data type. Consider a large model space of the $sd$, $pf$, and $sdg$ major orbitals. There are a total of 62 distinct sub-states (all the $j$ projections for all of the orbitals combined) for both protons and neutrons, totalling 124 sub-states, meaning that a 128 bit integer is appropriate. For the following example, let us reduce to 16 possible sub-states represented by an `unsigned short` in C++.

It is easy to make a simple framework for applying creation operators to the empty (vacuum) state. We now see that our previous basis state `[3, 5]` can be represented as the number 40, since setting bits 3 and 5 gives $2^3 + 2^5 = 40$. Adding or removing particles (applying creation and annihilation operators) is as easy as flipping the correct bit. However, we need to be mindful of the order of operations because we might create a sign in the process.
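A minimal sketch of such a framework (my own illustration, ignoring the sign bookkeeping for the moment):

```cpp
#include <cstdint>

// A 16-sub-state configuration: bit i set means sub-state i is occupied.
using State = std::uint16_t;  // unsigned short on common platforms

constexpr State vacuum = 0;   // the empty (vacuum) state

// Apply a creation operator for `substate` by setting the corresponding bit.
// Sign bookkeeping is deliberately omitted in this sketch.
inline State create(State state, unsigned substate)
{
    return state | static_cast<State>(1u << substate);
}
```

With this, `create(create(vacuum, 3), 5)` yields `0b0000000000101000`, i.e. the number 40.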
According to the good ole' Internet, `std::bitset` is more user-friendly than, and equally fast as, bit-flipping primitive data types if used correctly. I will therefore start with `std::bitset` and possibly benchmark it against a primitive data type later.

Speed difference
Perhaps not surprisingly, using a bit-representation with `std::bitset` and flipping bits is a LOT faster than inserting into and removing from a `std::vector`. To apply creation and annihilation operators, we simply need to flip a bit instead of resizing a vector. My current implementation is as follows. Note that I am using
`state[bit_to_reset] = 0` instead of `state.reset(bit_to_reset)` because the latter checks whether the index is out-of-bounds and thus has a slower execution time. The same can be said for using `state[i]` instead of `state.test(i)`. As you might also see from the code example above, there's a choice of using either `if (state[i]) count++` or `count += state[i]`, which has the same end result since the values are either 1 or 0. Using the if-statement reduces the execution time slightly, but creating a branch might prove a disadvantage when I soon start testing GPU acceleration of the code.

The above function does the job of applying an annihilation operator, which is the same as setting the correct bit to zero, i.e. resetting the bit, but it also has to count how many bits are set before the bit which will be reset. This is because the annihilation operator has to be moved so that it is applied next to the creation operator of the same bit, and changing positions of operators creates signs which must be accounted for. Since the sign is always calculated by taking -1 to the power of a non-negative integer, we might get some speed-up by using the fact that -1 to an even power is 1, and to an odd power is -1:
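Putting the pieces described above together, the annihilation step might look like this. This is a sketch of the approach, not the verbatim `kshell-cpp` code; the parity of the count is checked via its lowest bit:

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t n_substates = 16;

// Apply an annihilation operator for sub-state `bit_to_reset`.
// Returns the sign (+1 or -1) from anti-commuting the operator into place,
// or 0 if the sub-state is empty (the state is annihilated).
int annihilate(std::bitset<n_substates>& state, std::size_t bit_to_reset)
{
    if (!state[bit_to_reset]) return 0;  // acting on an empty sub-state

    // Count set bits below the target bit; each costs one anti-commutation.
    int count = 0;
    for (std::size_t i = 0; i < bit_to_reset; i++)
        if (state[i]) count++;  // state[i] skips the bounds check of state.test(i)

    state[bit_to_reset] = 0;    // faster than state.reset(bit_to_reset)

    // (-1)^count without std::pow: even count -> +1, odd count -> -1.
    return (count & 1) ? -1 : 1;
}
```

For the state 40 (sub-states 3 and 5 occupied), annihilating sub-state 5 gives sign -1 because one set bit (sub-state 3) lies below it.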
As you see, there are three ways of calculating this exponent: using `std::pow`, checking the least significant bit to see if the exponent is odd or even, or using the modulo operator to check if the exponent is divisible by two. With `-O0`, blasting the three methods with some millions of calculations gave the times (19.54375, 19.70725, 19.9485) s respectively, which was a bit surprising to me since I expected the very generalised `std::pow` to perform slightly worse than a tailored solution, but it turns out that it performs almost identically to the other implementations. The same test with `-Ofast` gave (1.542, 1.4765, 1.475) s, which is almost identical too, but with just maybe the tiniest little advantage to the two tailored solutions. I will test this more on GPU.

Using a primitive data type instead of `std::bitset`
std::bitsetSo,
std::bitsethas some conveniences, likebitset.set,bitset.reset,bitset.test, etc. But since these methods are doing boundary checks, they turn out to be a bit slower than just indexing the bitset. And if I'm not using the conveniences of a bitset, I might check out a primitive data type instead.An
unsigned long longhas 64 bits (I dont know if theunsignedpart is important but thats what I'm using). Thesd-pf-sdgmodel space has 62 m-substates, so it seems sufficient to use (one for protons, one for neutrons). Instead of iterating to the bit I want to set (or reset), I can create a bitmask and set the bit I want to:If I want to set a bit and count how many bits before that bit is set, which amounts to ordering the creation operators before letting them act, I can do
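A sketch of what this could look like, alongside the bitset version it is compared against (function names are my own; `__builtin_popcountll` is a GCC/Clang builtin):

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Primitive version: build a bitmask, count the set bits below the target
// bit with a single popcount of the masked-off lower bits, then set the bit.
int set_and_count(std::uint64_t& state, unsigned bit)
{
    const std::uint64_t mask = 1ULL << bit;
    // mask - 1 has exactly the bits below `bit` set, so this popcount is the
    // number of creation operators the new operator must anti-commute past.
    const int count = __builtin_popcountll(state & (mask - 1));
    state |= mask;  // set the bit
    return count;
}

// std::bitset version: the same operation, but the bits below the target
// are counted with an explicit loop.
int set_and_count(std::bitset<64>& state, std::size_t bit)
{
    int count = 0;
    for (std::size_t i = 0; i < bit; i++)
        if (state[i]) count++;
    state[bit] = 1;
    return count;
}
```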
while with the `std::bitset` I would do the same by indexing the bitset and counting the lower bits in an explicit loop. The assembly for these two solutions is quite different. Now, I'm not gonna pretend to be an expert on assembly code – because I'm not – but we can see that the two solutions produce more or less the same assembly for the first six lines, except that the bitset solution reserves 64 bits on the stack while the primitive solution reserves 32. But more relevant is that the bitset solution creates a loop in the assembly code for counting the bits, while the primitive solution uses `sal` for shift, `and` for masking, `or` for setting a bit, and `__builtin_popcountll`, which likely translates directly to a highly optimized CPU instruction for counting set bits. Overall, there are fewer assembly instructions in the primitive solution, and some benchmarking shows that it is approximately 30% faster than the bitset solution.