Task #1781

Updated by Mark Abraham over 4 years ago

mdrun counter reset does not interact well with PME tuning. We should probably delay the counter reset until after tuning completes, and provide a better interface, e.g. @mdrun -benchmarksteps 1000@ and/or @mdrun -benchmarktime 0.05@.
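
A rough sketch of the "delay the reset until tuning completes" idea, written in Python for brevity rather than in mdrun's C++. The function name, its arguments, and the choice to align the reset to a multiple of nstlist are all assumptions made for illustration; none of this is existing GROMACS code.

```python
def first_reset_step(requested_step, tuning_end_step, nstlist):
    """Return the step at which the counters would be reset.

    Hypothetical policy (an assumption, not mdrun behaviour):
    - never reset before PME tuning has finished, and
    - only reset on a multiple of nstlist, so that a reset cannot fall
      in the middle of a neighbour-list interval.
    """
    if nstlist <= 0:
        raise ValueError("nstlist must be positive")
    # Do not reset earlier than the user asked, nor before tuning ends.
    earliest = max(requested_step, tuning_end_step)
    # Round up to the next multiple of nstlist (ceiling division).
    return -(-earliest // nstlist) * nstlist


if __name__ == "__main__":
    # Illustrative numbers only; nstlist=40 is a made-up value.
    print(first_reset_step(requested_step=500, tuning_end_step=560, nstlist=40))
```

With such a policy, a reset requested at step 500 would simply be postponed when tuning is still running, instead of landing inside the tuning phase.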

See discussion beginning at comment 6.

Original issue follows:

The following PME LB error can be triggered by some runs:

<pre>
NOTE: DLB will not turn on during the first phase of PME tuning

starting mdrun 'Water'
1000 steps, 2.0 ps.
step 80: timed with pme grid 200 100 100, coulomb cutoff 1.000: 1989.6 M-cycles
step 160: timed with pme grid 168 84 84, coulomb cutoff 1.187: 2006.8 M-cycles
step 240: timed with pme grid 144 72 72, coulomb cutoff 1.384: 2664.3 M-cycles
step 320: timed with pme grid 160 80 80, coulomb cutoff 1.246: 2205.9 M-cycles
step 400: timed with pme grid 168 84 84, coulomb cutoff 1.187: 2016.8 M-cycles
step 480: timed with pme grid 192 96 96, coulomb cutoff 1.038: 2097.2 M-cycles
optimal pme grid 200 100 100, coulomb cutoff 1.000

step 500: resetting all time and cycle counters

-------------------------------------------------------
Program gmx mdrun, VERSION 5.1-rc1
Source code file: /var/data0/sandbox/gromacs/bmdir/data/source/src/gromacs/ewald/pme-load-balancing.cpp, line: 927

Software inconsistency error:
pme_loadbal_do called at an interval != nstlist
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

-------------------------------------------------------
Program gmx mdrun, VERSION 5.1-rc1
Source code file: /var/data0/sandbox/gromacs/bmdir/data/source/src/gromacs/ewald/pme-load-balancing.cpp, line: 927

Software inconsistency error:
pme_loadbal_do called at an interval != nstlist
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
</pre>

Detected by the NVIDIA Perflab team, who note that:
> I’ve also noticed that unless this line pops up
> NOTE: DLB can now turn on, when beneficial
> Before
> step 500: resetting all time and cycle counters
> Then I get the pme_loadbal_do error.

The original log and standard output are attached; the input is a 384k water box (but that is likely not relevant).
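
To check other logs for the ordering described in the quote above, something like the following Python sketch could be used. It relies only on the two message strings quoted in this issue; the function name is made up here, and the assumption that this ordering is a sufficient indicator of the failure comes from the observation above, not from the mdrun source.

```python
import re

# Message texts as they appear in the output quoted in this issue.
RESET_RE = re.compile(r"step (\d+): resetting all time and cycle counters")
DLB_NOTE = "DLB can now turn on, when beneficial"


def reset_precedes_dlb_note(log_text):
    """Return True if a counter reset is logged before the DLB note.

    Also returns True when the reset is logged and the DLB note never
    appears before it. Returns False when the log has no counter reset.
    """
    seen_dlb_note = False
    for line in log_text.splitlines():
        if DLB_NOTE in line:
            seen_dlb_note = True
        elif RESET_RE.search(line):
            # First reset found: report whether the note came earlier.
            return not seen_dlb_note
    return False


if __name__ == "__main__":
    sample = (
        "optimal pme grid 200 100 100, coulomb cutoff 1.000\n"
        "\n"
        "step 500: resetting all time and cycle counters\n"
    )
    print(reset_precedes_dlb_note(sample))
```

A True result flags a run that matches the pattern reported by the Perflab team; it does not by itself prove that the pme_loadbal_do error will occur.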
