Bug #1570

ver 4.6.5 for GPU gives incorrect results

Added by Marcin Nowosielski over 3 years ago. Updated about 3 years ago.

Category: preprocessing (pdb2gmx,grompp)



We have a serious problem with GROMACS 4.6.5 for GPU, which produces incorrect output. Compared with the CPU version, the calculated energies are totally different (see the attached files from the official GROMACS GPU benchmark), which results in a fast collapse to a random-coil structure. The system we are running on has 2 × 8-core 2.5 GHz Intel Xeon E5-2450 CPUs and 2 × NVIDIA Tesla K40m GPUs.

Could you help us with that?

Computational Chemistry Group
UvA, Amsterdam

dhfr.tar.gz (5.18 MB) Marcin Nowosielski, 07/30/2014 09:29 AM

Associated revisions

Revision 92a2eb1a (diff)
Added by Berk Hess over 3 years ago

Check for implicit solvent + Verlet scheme

Fixed #1570

Change-Id: I8734c2dc99d3bc3e0a79ae043d86854446f3b495


#1 Updated by Justin Lemkul over 3 years ago

The CPU run reports version 4.6.4, while the GPU version reports 4.6.5; at least do a comparison with a consistent version. Many bugs get fixed between minor releases. Better yet, try again with 4.6.6 or 5.0; it is much more effective to troubleshoot the current release than it is an old version whose issues may have already been resolved.

#2 Updated by Marcin Nowosielski over 3 years ago

In addition to 4.6.5 we have tested 4.5.7 and the latest 5.0. All of them give the wrong result.

Focusing on 5.0, there are two major issues:

1) Incorrect GB Polarization and Nonpolar Sol. energies (wrong by orders of magnitude)

2) A nonzero pressure (either positive or negative) is reported with pcoupl = No

To be honest, this is disturbing, to say the least.
Don't get me wrong, I appreciate your work a lot, and use GROMACS all the time.

Nevertheless, how can it be that all three versions are giving such huge errors (i.e. they do not work)?
Are we so unlucky with the system configuration?


#3 Updated by Justin Lemkul over 3 years ago

  • Status changed from New to Rejected

Well, to be fair, we advertise that implicit + GPU does not work:

The previous use of implicit solvent on GPU was totally reliant upon the OpenMM interface, which is no longer supported. It would be nice if the implicit code could be made more robust and work with GPU, but at present there are no developers with time to do it. There have been a number of discussions about this, but time is limited and implicit + GPU is a very low priority. We should probably just issue a fatal error in mdrun if anyone tries to do this.

#4 Updated by Roland Schulz over 3 years ago

  • Status changed from Rejected to In Progress

I agree we should add a fatal error for implicit+GPU. In fact I think it is important enough to not close the bug until we have done so.

#5 Updated by Roland Schulz over 3 years ago

  • Status changed from In Progress to Accepted

#6 Updated by Thomas Geenen over 3 years ago

The GROMACS runs that gave the wrong results were performed on our system.
From the release notes of version 4.6, I understand that version 4.5 with OpenMM should be able to compute correct results.
Marcin also ran a model with this version of GROMACS and found that those results are incorrect as well.

We built version 4.5 of GROMACS with these settings:

export OPENMM_ROOT_DIR=/hpc/sw/gromacs-${ver}-gpu-sp/openmm
cmake -DCMAKE_INSTALL_PREFIX=$installdir \
-DGMX_MPI=no \
-DGMX_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR=/hpc/sw/cuda/5.5/ \
../$dir | tee ../log.cmake

Should we expect correct results from such a build for implicit-solvent calculations such as the GPU benchmark examples impl 1nm and 2nm?


#7 Updated by Justin Lemkul over 3 years ago

The benchmarks were created using very old versions of everything, CUDA 3.0 and OpenMM 2.0, IIRC. Any results you get will almost certainly be sensitive to changes. The combination of Gromacs 4.5 + OpenMM 2.0 had lots of bugs and missing features (on both the Gromacs and OpenMM sides), so I would expect that only the same combination of software versions might generate the same exact results. How "wrong" are your results from 4.5? Side-by-side comparisons are needed here.

#8 Updated by Marcin Nowosielski over 3 years ago


Together with Thomas we have tried a whole bunch of things.

The setup has been taken from the official benchmark: dhfr-impl-2nm.bench (with amber96ff)

The results


4.5 on CPU :

  Step           Time         Lambda
   0        0.00000        0.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.GB Polarization  Nonpolar Sol.
    1.74631e+03    3.15965e+03    4.17787e+01   -1.27414e+04    1.48139e+02
          LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)      Potential
    2.46211e+03    2.90393e+04   -5.17308e+03   -4.22534e+04   -2.35706e+04
    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
    8.98330e+01   -2.34807e+04    4.37601e+00    0.00000e+00    8.09423e-06
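For side-by-side comparisons it can help to read these energy blocks programmatically. The following is a minimal sketch, assuming the log prints names and values in fixed 15-character columns (inferred from the listing above, where "Improper Dih." and "GB Polarization" run together). The function name `parse_energy_block` is hypothetical and not part of GROMACS.

```python
FIELD_WIDTH = 15  # assumption: column width inferred from the listing above


def parse_energy_block(lines, width=FIELD_WIDTH):
    """Pair each header line with the value line below it, field by field."""
    energies = {}
    rows = [ln.rstrip() for ln in lines if ln.strip()]
    # Rows alternate: names, values, names, values, ...
    for header, values in zip(rows[0::2], rows[1::2]):
        names = [header[i:i + width].strip() for i in range(0, len(header), width)]
        nums = [float(values[i:i + width]) for i in range(0, len(values), width)]
        energies.update(zip(names, nums))
    return energies


# The 4.5 CPU block from this report, copied verbatim.
block = """\
          Angle    Proper Dih.  Improper Dih.GB Polarization  Nonpolar Sol.
    1.74631e+03    3.15965e+03    4.17787e+01   -1.27414e+04    1.48139e+02
          LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)      Potential
    2.46211e+03    2.90393e+04   -5.17308e+03   -4.22534e+04   -2.35706e+04
    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
    8.98330e+01   -2.34807e+04    4.37601e+00    0.00000e+00    8.09423e-06
"""

energies = parse_energy_block(block.splitlines())
print(energies["GB Polarization"], energies["Potential"])
```

Slicing by width rather than splitting on whitespace keeps multi-word names such as "Pressure (bar)" intact.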


4.5 for GPU:

Energies (kJ/mol)
      Potential    Kinetic En.   Total Energy    Temperature   Constr. rmsd
   -1.86574e+04    6.36140e+03   -1.22960e+04    3.09693e+02    2.10775e-06

Error (potential) ca. 20%


4.6 for GPU

          Angle    Proper Dih.  Improper Dih.GB Polarization  Nonpolar Sol.
    1.74632e+03    3.15965e+03    4.17922e+01   -6.47831e+04    8.76577e+03
          LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)      Potential
    2.46211e+03    2.90393e+04   -5.15137e+03   -3.66485e+04   -6.13680e+04
    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
    1.08079e+02   -6.12599e+04    5.26483e+00    2.99131e+01    7.35318e-06

Error (potential) ca. 160%


5.0 for GPU

          Angle    Proper Dih.  Improper Dih.GB Polarization  Nonpolar Sol.
    5.51520e+03    4.10883e+03    5.02242e+02   -6.74749e+04    8.43644e+03
          LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)      Potential
    3.03242e+03    3.23932e+04   -4.38033e+03   -5.49163e+04   -7.27832e+04
    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
    6.13076e+03   -6.66524e+04    2.98646e+02   -1.23727e+01    2.36228e-05

Error (potential) ca. 208%

Taking into account that the folded state is only slightly more stable than the unfolded one (< 100 kJ/mol), none of the versions actually works (confirmed by analyzing trajectories).

Now for the good news: with infinite cut-offs, version 4.5 works well:

CPU Potential: -2.43293e+04
GPU Potential: -2.43574e+04
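The error percentages quoted above can be reproduced from the Potential values in the listings. This is a small sketch of that arithmetic, assuming the error is the unsigned deviation from the 4.5 CPU potential divided by the magnitude of that reference. The helper name is illustrative only.

```python
def relative_error_percent(value, reference):
    """Unsigned deviation of `value` from `reference`, in percent."""
    return abs(value - reference) / abs(reference) * 100.0


# Potential energies (kJ/mol) copied from the listings in this comment.
CPU_REFERENCE = -2.35706e+04  # 4.5 on CPU
gpu_potentials = {
    "4.5 for GPU": -1.86574e+04,
    "4.6 for GPU": -6.13680e+04,
    "5.0 for GPU": -7.27832e+04,
}

errors = {
    label: relative_error_percent(value, CPU_REFERENCE)
    for label, value in gpu_potentials.items()
}

# Infinite cut-offs with 4.5: GPU potential against the matching CPU run.
infinite_cpu = -2.43293e+04
infinite_gpu = -2.43574e+04
infinite_error = relative_error_percent(infinite_gpu, infinite_cpu)
infinite_gap = abs(infinite_gpu - infinite_cpu)  # absolute gap in kJ/mol

for label, err in errors.items():
    print(f"{label}: {err:.1f}%")
print(f"4.5 infinite cut-offs: {infinite_error:.2f}% ({infinite_gap:.1f} kJ/mol)")
```

This gives roughly 20.8%, 160.4% and 208.8% for the three GPU runs, in line with the figures quoted above, and about 0.12% (28.1 kJ/mol) for the infinite cut-off case.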

There is only one small note in the 4.6 release notes suggesting that implicit solvent calculations may not work correctly, and none for 5.0. For 4.5, all configurations were supposed to work.
Reading the manuals (one small table on page 21 for version 5.0) and looking at the benchmarks, one gets the impression that it works, especially since with explicit solvent the speedup is marginal and may not justify the cost of moving to GPU at all.

Please be more specific; it is great software after all.

Getting back to bug hunting in my own code.


#9 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related patchset '1' for Issue #1570.
Uploader: Berk Hess
Change-Id: I8734c2dc99d3bc3e0a79ae043d86854446f3b495
Gerrit URL:

#10 Updated by Berk Hess about 3 years ago

  • Category set to preprocessing (pdb2gmx,grompp)
  • Status changed from Accepted to Resolved
  • Priority changed from High to Normal
  • Target version set to 5.0.2

The incorrect GB results with GPU were due to a missing check for GB with the Verlet scheme in grompp. A grompp check has been added for 5.0.2.
We would like to support GB with the new SIMD and GPU kernels, but we currently don't have time to do this (we will need it for 6.0, where the group cut-off scheme will no longer be present).

Were there more issues here?
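Until a release with the grompp check is installed, a similar guard can be run on the .mdp file before submitting a job. The sketch below is not the code from the revision. It is a standalone Python illustration, assuming the relevant options are `implicit-solvent` and `cutoff-scheme` and that hyphens and underscores in option names are interchangeable. Treating an unset `cutoff-scheme` as Verlet is a deliberately conservative assumption of this sketch, and both function names are hypothetical.

```python
def parse_mdp(text):
    """Read 'key = value' pairs from .mdp text; ';' starts a comment."""
    params = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalise so 'cutoff_scheme' and 'cutoff-scheme' map to one key.
        params[key.strip().lower().replace("_", "-")] = value.strip().lower()
    return params


def check_implicit_solvent(params, default_scheme="verlet"):
    """Refuse implicit solvent combined with the Verlet cut-off scheme.

    default_scheme is an assumption of this sketch: an unset cutoff-scheme
    is treated as Verlet so that the check errs on the safe side.
    """
    implicit = params.get("implicit-solvent", "no")
    scheme = params.get("cutoff-scheme", default_scheme)
    if implicit != "no" and scheme == "verlet":
        raise ValueError(
            "implicit solvent is not supported with cutoff-scheme = Verlet"
        )


mdp_text = """\
; hypothetical input for illustration
implicit-solvent = GBSA
cutoff-scheme    = Verlet
"""

try:
    check_implicit_solvent(parse_mdp(mdp_text))
    rejected = False
except ValueError as err:
    rejected = True
    print(f"Fatal: {err}")
```

Raising an error here, rather than printing a warning, mirrors the fatal-error approach proposed earlier in this thread: the run stops before any incorrect energies are produced.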

#11 Updated by Marcin Nowosielski about 3 years ago

That is awesome.
No more issues for the moment and I am keeping my fingers crossed for new releases.


#12 Updated by Mark Abraham about 3 years ago

  • Status changed from Resolved to Closed
  • Target version deleted (5.0.2)
