Bonded GPU kernel performance regression with 2020
A set of benchmark tests with large systems using GROMACS versions 2019.5 and 2020 showed that performance with 2020 drops to about 2/3 of that with 2019.5. Interestingly, according to nvidia-smi, the GPU usage is about 20% higher with the 2020 version.
Apparently, the regression affects the energy calculation steps, where the GPU bonded computation got significantly slower (as a side effect of optimizations that mainly targeted the force-only kernels).
All log files of the benchmarks can be found at the following link:
The same link also provides the setup files (.tpr/.gro/.top/.mdp in `C60xh.7z`) and the scripts that start the benchmarks (`runfiles.7z`).
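For anyone comparing the log files, here is a minimal sketch of how the ns/day figures could be pulled out and turned into a ratio. It assumes the usual `Performance:` summary line near the end of an mdrun log (ns/day first, hour/ns second). The function names and the numbers in the usage note are made up for illustration and are not taken from the actual benchmark logs.

```python
import re

# Matches the summary line near the end of an mdrun log, e.g.
# "Performance:       30.000        0.800"   (ns/day, hour/ns)
PERF_RE = re.compile(r"^Performance:\s+([0-9.]+)\s+([0-9.]+)", re.MULTILINE)


def ns_per_day(log_text: str) -> float:
    """Return the ns/day value from the text of an mdrun log file."""
    match = PERF_RE.search(log_text)
    if match is None:
        raise ValueError("no 'Performance:' line found in log text")
    return float(match.group(1))


def relative_performance(old_log: str, new_log: str) -> float:
    """Ratio of new ns/day to old ns/day (values below 1.0 mean a slowdown)."""
    return ns_per_day(new_log) / ns_per_day(old_log)
```

With two hypothetical logs reporting 30.000 and 20.000 ns/day, `relative_performance` returns roughly 0.667, i.e. the kind of "about 2/3" drop described above.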
Some background info on the benchmarks:
- System contains about 2.1 million atoms.
- Hardware: 2x Intel Xeon Gold 6134 („Skylake“) 3.2 GHz = 16 cores + SMT; 4x NVIDIA Tesla V100
(similar results with a less significant performance drop (~15%) on a different machine: 2 or 4 nodes, each with [2x Intel Xeon 2660v2 („Ivy Bridge“) 2.2 GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
- Several combinations of -ntmpi, -ntomp, -bonded and -pme were tried to find the optimal set. However, the performance drop persists for all of them.
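
To illustrate the kind of option sweep meant in the last point, here is a rough sketch that only builds the `gmx mdrun` command lines without running anything. It is not the script from `runfiles.7z`. The file name `topol.tpr`, the output name pattern and the rank/thread counts are placeholders. Not every combination is necessarily accepted by mdrun (as far as I know, PME on the GPU with several ranks needs a single separate PME rank), so treat the list as a starting point.

```python
from itertools import product


def mdrun_commands(tpr, ntmpi_values, ntomp_values, offload=("cpu", "gpu")):
    """Build one 'gmx mdrun' command line per option combination.

    Only flags named in this report (-ntmpi, -ntomp, -bonded, -pme) are
    varied; -nb gpu is kept fixed. Output names are hypothetical.
    """
    commands = []
    for ntmpi, ntomp, bonded, pme in product(
        ntmpi_values, ntomp_values, offload, offload
    ):
        name = f"bench_ntmpi{ntmpi}_ntomp{ntomp}_bonded-{bonded}_pme-{pme}"
        commands.append(
            f"gmx mdrun -s {tpr} -deffnm {name} "
            f"-ntmpi {ntmpi} -ntomp {ntomp} "
            f"-nb gpu -bonded {bonded} -pme {pme}"
        )
    return commands


if __name__ == "__main__":
    # Placeholder values; adjust to the node layout being tested.
    for cmd in mdrun_commands("topol.tpr", [4, 8], [4, 8]):
        print(cmd)
```

Running each printed command with both the 2019.5 and the 2020 binaries gives one pair of logs per combination to compare.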
Please let me know if you need further information.