pick_nbnxn_kernel needs reworking
In https://gerrit.gromacs.org/#/c/1695/4/src/mdlib/forcerec.c I mention how I think the initialization of Verlet GPU settings can be made much clearer. We need this because of the complex interaction with GPU settings, emulation, hybrid, etc.
Berk, can you please comment on the reorganization I describe there, and continue the discussion here?
Basically, I think we should use a different way of constructing bUseGPU, which will lead to simpler functionality for pick_nbnxn_kernel(), which should lead to clearer variable naming. I am not sure that "bUseGPU" actually reflects its function (does it mean run on a GPU, or just use 8x8x8 lists?), but you are the judge of that.
There are six(?) things the kernel picking code wants to be able to achieve. The necessary conditions are in brackets afterwards.
1) set up 8x8x8 lists and CUDA for GPU and run non-bonded on a GPU (if we have a GPU and nothing's set, use it)
2) set up 8x8x8 lists and run non-bonded on the CPU (GMX_EMULATE_GPU and not GMX_NO_NONBONDED)
3) set up 8x8x8 lists and do not run non-bonded anywhere (GMX_EMULATE_GPU and GMX_NO_NONBONDED, which is a combination useful to document for users to assess whether GPUs might be worthwhile)
4) set up CPU-suitable lists and run non-bonded on the CPU (no GPU found, not GMX_EMULATE_GPU and not GMX_NO_NONBONDED)
5) set up CPU-suitable lists and do not run non-bonded anywhere (no GPU found, not GMX_EMULATE_GPU and GMX_NO_NONBONDED)
6) manage hybrid GPU mode somehow? (NFI)
So we need booleans for
bCanUseGPU = hwinfo->bCanUseGPU
bEmulateGPU = getenv("GMX_EMULATE_GPU") != NULL
bNoNonbonded = getenv("GMX_NO_NONBONDED") != NULL
Those are easily defined above. Now we can use those in simple combinations to detect 1-5 above, call the right routines and set the value of bUseGPU appropriately. With good planning, we can avoid the complex OR-fest and composite booleans that currently exist. For example, 1) and 4) should be the fall-back paths once we've checked for all the weird things we might have been asked to do. 6) seems to be detected by the calling routine?
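To make the idea concrete, here is a minimal sketch of how the three booleans could map onto cases 1-5. The enum, its values and both function names are hypothetical and do not exist in the code base; hybrid mode (6) is left out. The combination "GPU available, GMX_NO_NONBONDED set, no emulation requested" is not covered by the list above, so the sketch simply assumes it falls through to case 1.

```c
#include <assert.h>

/* Hypothetical names throughout; none of this is existing GROMACS API. */
typedef enum {
    NB_SETUP_GPU,           /* case 1: 8x8x8 lists, non-bonded on the GPU */
    NB_SETUP_EMULATE,       /* case 2: 8x8x8 lists, non-bonded on the CPU */
    NB_SETUP_EMULATE_NO_NB, /* case 3: 8x8x8 lists, no non-bonded at all  */
    NB_SETUP_CPU,           /* case 4: CPU lists, non-bonded on the CPU   */
    NB_SETUP_CPU_NO_NB      /* case 5: CPU lists, no non-bonded at all    */
} nb_setup_t;

/* Check the unusual requests first; cases 1 and 4 are the fall-backs. */
nb_setup_t choose_nb_setup(int bCanUseGPU, int bEmulateGPU, int bNoNonbonded)
{
    if (bEmulateGPU)
    {
        return bNoNonbonded ? NB_SETUP_EMULATE_NO_NB : NB_SETUP_EMULATE;
    }
    if (!bCanUseGPU && bNoNonbonded)
    {
        return NB_SETUP_CPU_NO_NB;
    }
    /* Assumption: a GPU with GMX_NO_NONBONDED set also ends up here (case 1). */
    return bCanUseGPU ? NB_SETUP_GPU : NB_SETUP_CPU;
}

/* Separates "uses 8x8x8 lists" from "runs on a GPU", which bUseGPU conflates. */
int nb_setup_uses_8x8x8_lists(nb_setup_t setup)
{
    assert(setup >= NB_SETUP_GPU && setup <= NB_SETUP_CPU_NO_NB);
    return setup == NB_SETUP_GPU ||
           setup == NB_SETUP_EMULATE ||
           setup == NB_SETUP_EMULATE_NO_NB;
}
```

With something like this, GMX_NO_NONBONDED on its own in a CPU-only run yields case 5 and never touches the 8x8x8 list setup.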
There's currently a bug (both before and after Szilard's patch) where GMX_NO_NONBONDED in a CPU-only mdrun sets bEmulateGPU, which triggers the 8x8x8 lists. That's benign but wrong, and the warning "Emulating a GPU run on the CPU (slow)" confuses a user who has never even heard of GPUs and just set GMX_NO_NONBONDED in a rerun with a Verlet .mdp for the usual purpose of playing with their bonded interactions or something.
I think that it makes the most sense to refactor pick_nbnxn_kernel into two functions. In init_nb_verlet, the "local" case calls both functions in succession, and the "non-local" hybrid case uses only the function with the second half of the logic currently in pick_nbnxn_kernel.
#4 Updated by Szilárd Páll over 6 years ago
There are a few (hopefully) minor things I would like to consider. After 4.6 I plan to work on some improvements that are minor code-wise but possibly quite important performance-wise, and which could hopefully be considered for inclusion as optimizations during the 4.6.x series.
- using multiple GPUs from a single process - would help avoid the large DD performance hit with a small number of GPUs => small number of domains/processes.
- CPU-GPU non-bonded task splitting & load balancing: all the infrastructure is there as this works already with the hybrid scheme (but that's a static task splitting);
- multiple processes per GPU in a more flexible manner: M processes using N GPUs, where M % N != 0 (might require more code, which might push it to 5.0).
It would be of great help if the reworked pick_nbnxn_kernel() and related code would consider the above aspects and facilitate adding them later on.
#9 Updated by Szilárd Páll over 6 years ago
I still don't see any issue mentioned by Mark. He describes the use cases that do work with the current code - or at least did work last time I checked/worked with that code. There is no requirement that GMX_NO_NONBONDED alone always triggers GPU emulation, but rather:
- with -nb gpu (or any other means through which GPU acceleration is enabled) + GMX_NO_NONBONDED set => switch to GPU emulation to avoid calling the data management functions
- with mdrun compiled without GPU emulation, setting GMX_EMULATE_GPU should trigger the emulation code-path.
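Read as a single predicate, the two rules above might look like the sketch below. The function and parameter names are hypothetical, and "bGPUEnabled" stands for GPU acceleration having been enabled by any means (e.g. -nb gpu); this is an illustration of the intended behaviour, not the current code.

```c
#include <assert.h>

/* Hypothetical helper, not existing GROMACS code.
 * bGPUEnabled:     GPU acceleration was enabled (e.g. through -nb gpu).
 * bEmulateEnv:     GMX_EMULATE_GPU is set in the environment.
 * bNoNonbondedEnv: GMX_NO_NONBONDED is set in the environment. */
int use_gpu_emulation(int bGPUEnabled, int bEmulateEnv, int bNoNonbondedEnv)
{
    return bEmulateEnv || (bGPUEnabled && bNoNonbondedEnv);
}

/* Truth-table check of the rules stated in the list above. */
void check_emulation_rules(void)
{
    assert(use_gpu_emulation(1, 0, 1) == 1); /* GPU enabled + GMX_NO_NONBONDED */
    assert(use_gpu_emulation(0, 1, 0) == 1); /* GMX_EMULATE_GPU on its own     */
    assert(use_gpu_emulation(0, 0, 1) == 0); /* GMX_NO_NONBONDED alone, no GPU */
    assert(use_gpu_emulation(1, 0, 0) == 0); /* plain GPU run                  */
}
```

Under this reading, GMX_NO_NONBONDED in a run without GPU acceleration does not switch on emulation.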