Bug #2649
Virial calculation necessary for correct energy calculation on GPU
Description
In the current implementation, the energy in do_md() is never calculated without the virial (see below for the respective code snippet [md.cpp line 1002+]). This should therefore not have any implications on the current program. In the scope of #1868 (rewriting of rerun, which in many cases will not require calculation of the force virial), however, I tried to remove the GMX_FORCE_VIRIAL
flag and found significantly different energies, however only when running on a GPU. Example (900 water molecules, first column: MD, second column: Rerun using GMX_FORCE_VIRIAL
, third column: Rerun without GMX_FORCE_VIRIAL
):
CPU
-- STEP 0 Component MD Rerun-Vir Rerun-noVir Bond 7.75685e+01 7.75685e+01 7.75685e+01 Angle 1.20884e+01 1.20884e+01 1.20884e+01 LJ (SR) 8.02172e+03 8.02172e+03 8.02172e+03 Coulomb (SR) -4.19559e+04 -4.19559e+04 -4.19559e+04 Coul. recip. 2.12264e+02 2.12264e+02 2.12264e+02 LJ recip. -2.56284e+03 -2.56284e+03 -2.56284e+03 Potential -3.61951e+04 -3.61951e+04 -3.61951e+04 -- STEP 100 Bond 1.24078e+03 1.24078e+03 1.24078e+03 Angle 2.03029e+02 2.03029e+02 2.03029e+02 LJ (SR) 8.78162e+03 8.78162e+03 8.78162e+03 Coulomb (SR) -4.48843e+04 -4.48843e+04 -4.48843e+04 Coul. recip. 2.27078e+02 2.27078e+02 2.27078e+02 LJ recip. -2.56304e+03 -2.56304e+03 -2.56304e+03 Potential -3.69948e+04 -3.69948e+04 -3.69948e+04 -- STEP 200 Bond 3.96066e+03 3.96066e+03 3.96066e+03 Angle 5.03959e+02 5.03959e+02 5.03959e+02 LJ (SR) 9.11253e+03 9.11253e+03 9.11253e+03 Coulomb (SR) -4.70180e+04 -4.70180e+04 -4.70180e+04 Coul. recip. 2.42462e+02 2.42462e+02 2.42462e+02 LJ recip. -2.56513e+03 -2.56513e+03 -2.56513e+03 Potential -3.57635e+04 -3.57635e+04 -3.57635e+04
GPU
-- STEP 0 Component MD Rerun-Vir Rerun-noVir Bond 7.75685e+01 7.75685e+01 7.75685e+01 Angle 1.20884e+01 1.20884e+01 1.20884e+01 LJ (SR) 8.02171e+03 8.02171e+03 8.02171e+03 Coulomb (SR) -4.19558e+04 -4.19558e+04 -4.19558e+04 Coul. recip. 2.12264e+02 2.12264e+02 0.00000e+00 LJ recip. -2.56284e+03 -2.56284e+03 0.00000e+00 Potential -3.61950e+04 -3.61950e+04 -3.38444e+04 -- STEP 100 Component Energy Energy Energy Bond 1.24078e+03 1.24078e+03 1.24078e+03 Angle 2.03029e+02 2.03029e+02 2.03029e+02 LJ (SR) 8.41424e+03 8.78162e+03 1.68033e+04 Coulomb (SR) -4.48465e+04 -4.48840e+04 -8.68398e+04 Coul. recip. 1.89358e+02 2.27112e+02 0.00000e+00 LJ recip. -2.19562e+03 -2.56305e+03 0.00000e+00 Potential -3.69947e+04 -3.69946e+04 -6.85927e+04 -- STEP 200 Component Energy Energy Energy Bond 3.96060e+03 3.96060e+03 3.96060e+03 Angle 5.03893e+02 5.03893e+02 5.03893e+02 LJ (SR) 7.81235e+03 9.11248e+03 2.59158e+04 Coulomb (SR) -4.68806e+04 -4.70174e+04 -1.33857e+05 Coul. recip. 1.06147e+02 2.42568e+02 0.00000e+00 LJ recip. -1.26499e+03 -2.56513e+03 0.00000e+00 Potential -3.57626e+04 -3.57630e+04 -1.03477e+05
Code snippet (md.cpp line 1002+): GMX_FORCE_ENERGY
was never used without GMX_FORCE_VIRIAL
if (EI_VV(ir->eI) && (!bInitStep))
{
/* for vv, the first half of the integration actually corresponds
to the previous step. bCalcEner is only required to be evaluated on the 'next' step,
but the virial needs to be calculated on both the current step and the 'next' step. Future
reorganization may be able to get rid of one of the bCalcVir=TRUE steps. */
/* TODO: This is probably not what we want, we will write to energy file one step after nstcalcenergy steps. */
bCalcEnerStep = do_per_step(step - 1, ir->nstcalcenergy);
bCalcVir = bCalcEnerStep ||
(ir->epc != epcNO && (do_per_step(step, ir->nstpcouple) || do_per_step(step-1, ir->nstpcouple)));
}
else
{
bCalcEnerStep = do_per_step(step, ir->nstcalcenergy);
bCalcVir = bCalcEnerStep ||
(ir->epc != epcNO && do_per_step(step, ir->nstpcouple));
}
bCalcEner = bCalcEnerStep;
if (do_ene || do_log || bDoReplEx)
{
bCalcVir = TRUE;
bCalcEner = TRUE;
}
/* Do we need global communication ? */
bGStat = (bCalcVir || bCalcEner || bStopCM ||
do_per_step(step, nstglobalcomm) ||
(EI_VV(ir->eI) && inputrecNvtTrotter(ir) && do_per_step(step-1, nstglobalcomm)));
force_flags = (GMX_FORCE_STATECHANGED |
((inputrecDynamicBox(ir) || bRerunMD) ? GMX_FORCE_DYNAMICBOX : 0) |
GMX_FORCE_ALLFORCES |
(bCalcVir ? GMX_FORCE_VIRIAL : 0) |
(bCalcEner ? GMX_FORCE_ENERGY : 0) |
(bDoFEP ? GMX_FORCE_DHDL : 0)
);
Related issues
History
#1 Updated by Mark Abraham over 2 years ago
That's probably some of the "what are we calculating and which buffers do we clear" logic in PME on GPU code going wrong.
#2 Updated by Berk Hess over 2 years ago
I looks to me like the issue is on line 910 of sim_util.cpp:
pmeFlags |= (flags & GMX_FORCE_VIRIAL) ? GMX_PME_CALC_ENER_VIR : 0;
I would suggest to change this to & (GMX_FORCE_VIRIAL | GMX_FORCE_ENERGY).
#3 Updated by Mark Abraham over 2 years ago
Berk Hess wrote:
I looks to me like the issue is on line 910 of sim_util.cpp:
pmeFlags |= (flags & GMX_FORCE_VIRIAL) ? GMX_PME_CALC_ENER_VIR : 0;I would suggest to change this to & (GMX_FORCE_VIRIAL | GMX_FORCE_ENERGY).
That looks like a useful improvement. Hopefully that will work now.
#4 Updated by Pascal Merz over 2 years ago
Berk Hess wrote:
I looks to me like the issue is on line 910 of sim_util.cpp:
pmeFlags |= (flags & GMX_FORCE_VIRIAL) ? GMX_PME_CALC_ENER_VIR : 0;I would suggest to change this to & (GMX_FORCE_VIRIAL | GMX_FORCE_ENERGY).
That alone is not solving it, unfortunately. I played around a bit trying to find a place where the flag seems to have a (gpu-only) influence, but no luck yet.
#5 Updated by Paul Bauer about 2 years ago
- Target version changed from 2019 to 2020
Changed target for now.
#6 Updated by Berk Hess about 2 years ago
- Target version changed from 2020 to 2019
There is also in sim_util.cpp in launchPmeGpuSpread:
pmeFlags |= (flags & GMX_FORCE_VIRIAL) ? GMX_PME_CALC_ENER_VIR : 0;
Does fixing that as well fix this issue?
#7 Updated by Paul Bauer about 2 years ago
- Target version changed from 2019 to future
retargeted, as this doesn't affect 2019
#8 Updated by Pascal Merz 11 months ago
- Related to Bug #3400: Rerun does not calculate reciprocal energies added