Bug #2802

Force counter contains up to few % time unaccounted for

Added by Szilárd Páll about 1 month ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee: -
Category: mdrun
Target version: -
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

With all bonded, nonbonded, and PME interactions in the system offloaded, the cycle counters still record non-negligible work under "Force". For example, with GluCL on a 12-core + GV 100 system running OpenMP-only:
gmx mdrun -ntmpi 1 -ntomp 12 -v -quiet -noconfout -npme 0 -pin on -nsteps 10000 -resetstep 8000 -pinstride 2 -nb gpu -pme gpu -tunepme -bonded gpu

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 12 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1   12         21       0.145          5.037   3.9
 Launch GPU ops.        1   12       4002       0.260          9.068   7.1
 Force                  1   12       2001       0.038          1.335   1.0
 Wait PME GPU gather    1   12       2001       0.771         26.885  21.0
 Reduce GPU PME F       1   12       2001       0.050          1.752   1.4
 Wait GPU NB local      1   12       2001       1.532         53.383  41.8
 NB X/F buffer ops.     1   12       3981       0.283          9.867   7.7
 Update                 1   12       2001       0.172          5.997   4.7
 Constraints            1   12       2001       0.295         10.289   8.1
 Rest                                           0.118          4.111   3.2
-----------------------------------------------------------------------------
 Total                                          3.665        127.724 100.0
-----------------------------------------------------------------------------
 Breakdown of PP computation
-----------------------------------------------------------------------------
 NS grid local          1   12         21       0.026          0.918   0.7
 NS search local        1   12         21       0.101          3.529   2.8
 Launch NB GPU tasks    1   12       2001       0.044          1.531   1.2
 Launch PME GPU task    1   12       2001       0.145          5.064   4.0
 NB X buffer ops.       1   12       1980       0.135          4.698   3.7
 NB F buffer ops.       1   12       2001       0.148          5.164   4.0
-----------------------------------------------------------------------------
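
To put the "Force" row in perspective, the short calculation below converts it into a per-call cost and a share of total wall time. It is only a back-of-the-envelope sketch: the three numbers are copied from the "Force" and "Total" rows above, and the helper names are made up for this illustration.

```python
# Values copied from the "Force" and "Total" rows of the table above.
force_wall_s = 0.038   # wall time attributed to "Force", in seconds
force_calls = 2001     # call count of the "Force" counter
total_wall_s = 3.665   # total wall time of the measured interval, in seconds


def per_call_us(wall_s, calls):
    """Average wall time per call, in microseconds."""
    return wall_s / calls * 1e6


def share_percent(wall_s, total_s):
    """Share of the total wall time, in percent."""
    return 100.0 * wall_s / total_s


print(f"Force: {per_call_us(force_wall_s, force_calls):.1f} us/call, "
      f"{share_percent(force_wall_s, total_wall_s):.1f} % of wall time")
# prints: Force: 19.0 us/call, 1.0 % of wall time
```

So in this run the counter accumulates roughly 19 microseconds per step even though no force work is expected on the CPU.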

History

#1 Updated by Berk Hess about 1 month ago

How much?
We are not calculating anything and, AFAIK, not reducing either. So the only remaining work should be loops over bonded types and maybe parts of the reduction buffer code that only check bools. These could still cost some time, and we could skip them.
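
The idea of skipping those loops can be sketched as follows. This is not GROMACS code: the `InteractionList` type, the `on_cpu` flag, and the `have_cpu_work` argument are hypothetical names chosen for illustration, and the sketch only assumes that whether any CPU work exists can be determined once, outside the per-step path.

```python
from dataclasses import dataclass


@dataclass
class InteractionList:
    # Hypothetical stand-in for one listed-interaction type.
    name: str
    on_cpu: bool   # False when this type is offloaded to the GPU
    count: int     # number of interactions of this type


def cpu_force_pass(lists):
    """Visit every type and test its flag.

    Returns (flag checks performed, interactions computed on the CPU).
    """
    checks = 0
    computed = 0
    for il in lists:
        checks += 1
        if il.on_cpu:
            computed += il.count
    return checks, computed


def cpu_force_pass_with_early_out(lists, have_cpu_work):
    """Same pass, but skipped entirely when nothing is assigned to the CPU."""
    if not have_cpu_work:
        return 0, 0
    return cpu_force_pass(lists)


# Everything offloaded: the plain pass still touches each type once per step.
lists = [
    InteractionList("bonds", on_cpu=False, count=1000),
    InteractionList("angles", on_cpu=False, count=2000),
    InteractionList("dihedrals", on_cpu=False, count=3000),
]
have_cpu_work = any(il.on_cpu for il in lists)  # evaluated once, not every step

print(cpu_force_pass(lists))                                # (3, 0)
print(cpu_force_pass_with_early_out(lists, have_cpu_work))  # (0, 0)
```

Whether such flag checks can account for the measured time is exactly what the question above is probing; the sketch only shows what "skipping them" would look like.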

#2 Updated by Szilárd Páll about 1 month ago

  • Subject changed from "Listed force call not skipped when all bondeds are offloaded" to "Force counter contains up to few % time unaccoutned for"
  • Description updated

#3 Updated by Szilárd Páll about 1 month ago

  • Subject changed from "Force counter contains up to few % time unaccoutned for" to "Force counter contains up to few % time unaccounted for"

#4 Updated by Paul Bauer about 1 month ago

Is this the issue that got fixed by https://gerrit.gromacs.org/#/c/8822/?

#5 Updated by Szilárd Páll about 1 month ago

No, it did not get fixed. The title was updated because, contrary to what I originally thought, the issue is likely not related to bondeds; something else in the force counter is taking time when it should not.
