Project

General

Profile

Bug #2649

Virial calculation necessary for correct energy calculation on GPU

Added by Pascal Merz about 1 year ago. Updated 11 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

In the current implementation, the energy in do_md() is never calculated without the virial (see below for the respective code snippet [md.cpp line 1002+]). This should therefore not have any implications on the current program. In the scope of #1868 (rewriting of rerun, which in many cases will not require calculation of the force virial), however, I tried to remove the GMX_FORCE_VIRIAL flag and found significantly different energies, however only when running on a GPU. Example (900 water molecules, first column: MD, second column: Rerun using GMX_FORCE_VIRIAL, third column: Rerun without GMX_FORCE_VIRIAL):

CPU

-- STEP 0
               Component            MD       Rerun-Vir     Rerun-noVir
                    Bond   7.75685e+01     7.75685e+01     7.75685e+01
                   Angle   1.20884e+01     1.20884e+01     1.20884e+01
                 LJ (SR)   8.02172e+03     8.02172e+03     8.02172e+03
            Coulomb (SR)  -4.19559e+04    -4.19559e+04    -4.19559e+04
            Coul. recip.   2.12264e+02     2.12264e+02     2.12264e+02
               LJ recip.  -2.56284e+03    -2.56284e+03    -2.56284e+03
               Potential  -3.61951e+04    -3.61951e+04    -3.61951e+04
-- STEP 100                                                                   
                    Bond   1.24078e+03     1.24078e+03     1.24078e+03
                   Angle   2.03029e+02     2.03029e+02     2.03029e+02
                 LJ (SR)   8.78162e+03     8.78162e+03     8.78162e+03
            Coulomb (SR)  -4.48843e+04    -4.48843e+04    -4.48843e+04
            Coul. recip.   2.27078e+02     2.27078e+02     2.27078e+02
               LJ recip.  -2.56304e+03    -2.56304e+03    -2.56304e+03
               Potential  -3.69948e+04    -3.69948e+04    -3.69948e+04
-- STEP 200                                                                   
                    Bond   3.96066e+03     3.96066e+03     3.96066e+03
                   Angle   5.03959e+02     5.03959e+02     5.03959e+02
                 LJ (SR)   9.11253e+03     9.11253e+03     9.11253e+03
            Coulomb (SR)  -4.70180e+04    -4.70180e+04    -4.70180e+04
            Coul. recip.   2.42462e+02     2.42462e+02     2.42462e+02
               LJ recip.  -2.56513e+03    -2.56513e+03    -2.56513e+03
               Potential  -3.57635e+04    -3.57635e+04    -3.57635e+04

GPU

-- STEP 0
               Component            MD       Rerun-Vir     Rerun-noVir
                    Bond   7.75685e+01     7.75685e+01     7.75685e+01
                   Angle   1.20884e+01     1.20884e+01     1.20884e+01
                 LJ (SR)   8.02171e+03     8.02171e+03     8.02171e+03
            Coulomb (SR)  -4.19558e+04    -4.19558e+04    -4.19558e+04
            Coul. recip.   2.12264e+02     2.12264e+02     0.00000e+00
               LJ recip.  -2.56284e+03    -2.56284e+03     0.00000e+00
               Potential  -3.61950e+04    -3.61950e+04    -3.38444e+04
-- STEP 100                                                                   
               Component        Energy          Energy          Energy
                    Bond   1.24078e+03     1.24078e+03     1.24078e+03
                   Angle   2.03029e+02     2.03029e+02     2.03029e+02
                 LJ (SR)   8.41424e+03     8.78162e+03     1.68033e+04
            Coulomb (SR)  -4.48465e+04    -4.48840e+04    -8.68398e+04
            Coul. recip.   1.89358e+02     2.27112e+02     0.00000e+00
               LJ recip.  -2.19562e+03    -2.56305e+03     0.00000e+00
               Potential  -3.69947e+04    -3.69946e+04    -6.85927e+04
-- STEP 200                                                                   
               Component        Energy          Energy          Energy
                    Bond   3.96060e+03     3.96060e+03     3.96060e+03
                   Angle   5.03893e+02     5.03893e+02     5.03893e+02
                 LJ (SR)   7.81235e+03     9.11248e+03     2.59158e+04
            Coulomb (SR)  -4.68806e+04    -4.70174e+04    -1.33857e+05
            Coul. recip.   1.06147e+02     2.42568e+02     0.00000e+00
               LJ recip.  -1.26499e+03    -2.56513e+03     0.00000e+00
               Potential  -3.57626e+04    -3.57630e+04    -1.03477e+05

Code snippet (md.cpp line 1002+): GMX_FORCE_ENERGY was never used without GMX_FORCE_VIRIAL

if (EI_VV(ir->eI) && (!bInitStep))
{
    /* for vv, the first half of the integration actually corresponds
       to the previous step.  bCalcEner is only required to be evaluated on the 'next' step,
       but the virial needs to be calculated on both the current step and the 'next' step. Future
       reorganization may be able to get rid of one of the bCalcVir=TRUE steps. */

    /* TODO: This is probably not what we want, we will write to energy file one step after nstcalcenergy steps. */
    bCalcEnerStep = do_per_step(step - 1, ir->nstcalcenergy);
    bCalcVir      = bCalcEnerStep ||
        (ir->epc != epcNO && (do_per_step(step, ir->nstpcouple) || do_per_step(step-1, ir->nstpcouple)));
}
else
{
    bCalcEnerStep = do_per_step(step, ir->nstcalcenergy);
    bCalcVir      = bCalcEnerStep ||
        (ir->epc != epcNO && do_per_step(step, ir->nstpcouple));
}
bCalcEner = bCalcEnerStep;

if (do_ene || do_log || bDoReplEx)
{
    bCalcVir  = TRUE;
    bCalcEner = TRUE;
}

/* Do we need global communication ? */
bGStat = (bCalcVir || bCalcEner || bStopCM ||
          do_per_step(step, nstglobalcomm) ||
          (EI_VV(ir->eI) && inputrecNvtTrotter(ir) && do_per_step(step-1, nstglobalcomm)));

force_flags = (GMX_FORCE_STATECHANGED |
               ((inputrecDynamicBox(ir) || bRerunMD) ? GMX_FORCE_DYNAMICBOX : 0) |
               GMX_FORCE_ALLFORCES |
               (bCalcVir ? GMX_FORCE_VIRIAL : 0) |
               (bCalcEner ? GMX_FORCE_ENERGY : 0) |
               (bDoFEP ? GMX_FORCE_DHDL : 0)
               );

History

#1 Updated by Mark Abraham about 1 year ago

That's probably some of the "what are we calculating and which buffers do we clear" logic in PME on GPU code going wrong.

#2 Updated by Berk Hess about 1 year ago

I looks to me like the issue is on line 910 of sim_util.cpp:
pmeFlags |= (flags & GMX_FORCE_VIRIAL) ? GMX_PME_CALC_ENER_VIR : 0;

I would suggest to change this to & (GMX_FORCE_VIRIAL | GMX_FORCE_ENERGY).

#3 Updated by Mark Abraham about 1 year ago

Berk Hess wrote:

I looks to me like the issue is on line 910 of sim_util.cpp:
pmeFlags |= (flags & GMX_FORCE_VIRIAL) ? GMX_PME_CALC_ENER_VIR : 0;

I would suggest to change this to & (GMX_FORCE_VIRIAL | GMX_FORCE_ENERGY).

That looks like a useful improvement. Hopefully that will work now.

#4 Updated by Pascal Merz about 1 year ago

Berk Hess wrote:

I looks to me like the issue is on line 910 of sim_util.cpp:
pmeFlags |= (flags & GMX_FORCE_VIRIAL) ? GMX_PME_CALC_ENER_VIR : 0;

I would suggest to change this to & (GMX_FORCE_VIRIAL | GMX_FORCE_ENERGY).

That alone is not solving it, unfortunately. I played around a bit trying to find a place where the flag seems to have a (gpu-only) influence, but no luck yet.

#5 Updated by Paul Bauer 12 months ago

  • Target version changed from 2019 to 2020

Changed target for now.

#6 Updated by Berk Hess 12 months ago

  • Target version changed from 2020 to 2019

There is also in sim_util.cpp in launchPmeGpuSpread:
pmeFlags |= (flags & GMX_FORCE_VIRIAL) ? GMX_PME_CALC_ENER_VIR : 0;

Does fixing that as well fix this issue?

#7 Updated by Paul Bauer 11 months ago

  • Target version changed from 2019 to future

retargeted, as this doesn't affect 2019

Also available in: Atom PDF