md-single-rank.log

Mark Abraham, 03/09/2017 05:05 PM

Log file opened on Thu Mar  9 16:28:25 2017
Host: tcbl02.scilifelab.se  pid: 27617  rank ID: 0  number of ranks:  1
           :-) GROMACS - gmx mdrun, 2016.3-dev-20170307-942b2dc4b (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov  Herman J.C. Berendsen    Par Bjelkmar
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra    Gerrit Groenhof
 Christoph Junghans   Anca Hamuraru    Vincent Hindriksen Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson
  Justin A. Lemkul   Magnus Lundborg   Pieter Meulenhoff    Erik Marklund
   Teemu Murtola       Szilard Pall       Sander Pronk      Roland Schulz
  Alexey Shvetsov     Michael Shirts     Alfons Sijbers     Peter Tieleman
  Teemu Virolainen  Christian Wennberg    Maarten Wolf
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, version 2016.3-dev-20170307-942b2dc4b
Executable:   /home/marklocal/git/r2016/build-cmake-gcc-gpu-release/install/bin/gmx
Data prefix:  /home/marklocal/git/r2016/build-cmake-gcc-gpu-release/install
Working dir:  /home/marklocal/redmines/redmine-2125
Command line:
  gmx mdrun -s md -notunepme -pin on -ntmpi 1 -v -nsteps 1000
    
GROMACS version:    2016.3-dev-20170307-942b2dc4b
GIT SHA1 hash:      942b2dc4b2c9f8704002428345df3bf893d46253
Branched from:      e5242964a563c81e91e04ceea94c141a1772c1d1 (1 newer local commits)
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:        CUDA
SIMD instructions:  AVX_256
FFT library:        fftw-3.3.6-pl1-sse2-avx
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      hwloc-1.11.0
Tracing support:    disabled
Built on:           Tue Jun 28 14:49:02 CEST 2016
Built by:           marklocal@tcbl02.scilifelab.se [CMAKE]
Build OS/arch:      Linux 4.6.3-1-ARCH x86_64
Build CPU vendor:   Intel
Build CPU brand:    Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Build CPU family:   6   Model: 58   Stepping: 9
Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /home/marklocal/progs/bin/gcc-4.9 GNU 4.9.3
C compiler flags:    -mavx    -Wundef -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds
C++ compiler:       /home/marklocal/progs/bin/g++-4.9 GNU 4.9.3
C++ compiler flags:  -mavx    -std=c++0x  -Wundef -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds
CUDA compiler:      /opt/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built on Sun_Sep__4_22:14:01_CDT_2016;Cuda compilation tools, release 8.0, V8.0.44
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_30,code=compute_30;-use_fast_math;-D_FORCE_INLINES;;-Xcompiler;,-mavx,,,,,,-Wundef,-Wextra,-Wno-missing-field-initializers,-Wpointer-arith,-Wall,-Wno-unused-function,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,-Wno-array-bounds,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,-Wno-array-bounds,;
CUDA driver:        8.0
CUDA runtime:       8.0
    
Running on 1 node with total 4 cores, 8 logical cores, 2 compatible GPUs
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
    Family: 6   Model: 58   Stepping: 9
    Features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    SIMD instructions most likely to fit this hardware: AVX_256
    SIMD instructions selected at GROMACS compile time: AVX_256

  Hardware topology: Full, with devices
    Sockets, cores, and logical processors:
      Socket  0: [   0   4] [   1   5] [   2   6] [   3   7]
    Numa nodes:
      Node  0 (16779456512 bytes mem):   0   1   2   3   4   5   6   7
      Latency:
               0
         0  1.00
    Caches:
      L1: 32768 bytes, linesize 64 bytes, assoc. 8, shared 2 ways
      L2: 262144 bytes, linesize 64 bytes, assoc. 8, shared 2 ways
      L3: 8388608 bytes, linesize 64 bytes, assoc. 16, shared 8 ways
    PCI devices:
      0000:01:00.0  Id: 10de:1183  Class: 0x0300  Numa: 0
      0000:02:00.0  Id: 10de:1401  Class: 0x0300  Numa: 0
      0000:03:00.0  Id: 10ec:8168  Class: 0x0200  Numa: 0
      0000:00:1f.2  Id: 8086:1e02  Class: 0x0106  Numa: 0
  GPU info:
    Number of GPUs detected: 2
    #0: NVIDIA GeForce GTX 960, compute cap.: 5.2, ECC:  no, stat: compatible
    #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat: compatible
    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
    
Changing nstlist from 20 to 40, rlist from 1.137 to 1.22

Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.01
   nsteps                         = 1000000000
   init-step                      = 0
   simulation-part                = 1
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = 2210354616
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 0
   nstvout                        = 0
   nstfout                        = 0
   nstlog                         = 100000
   nstcalcenergy                  = 100
   nstenergy                      = 1000
   nstxout-compressed             = 100000
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 40
   ns-type                        = Grid
   pbc                            = xyz
   periodic-molecules             = false
   verlet-buffer-tolerance        = 0.005
   rlist                          = 1.22
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 1.1
   epsilon-r                      = 15
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 1.1
   DispCorr                       = No
   table-extension                = 1
   fourierspacing                 = 0.22
   fourier-nx                     = 144
   fourier-ny                     = 160
   fourier-nz                     = 160
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   implicit-solvent               = No
   gb-algorithm                   = Still
   nstgbradii                     = 1
   rgbradii                       = 1
   gb-epsilon-solvent             = 80
   gb-saltconc                    = 0
   gb-obc-alpha                   = 1
   gb-obc-beta                    = 0.8
   gb-obc-gamma                   = 4.85
   gb-dielectric-offset           = 0.009
   sa-algorithm                   = Ace-approximation
   sa-surface-tension             = 2.05016
   tcoupl                         = V-rescale
   nsttcouple                     = 20
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = false
   pcoupl                         = Parrinello-Rahman
   pcoupltype                     = Semiisotropic
   nstpcouple                     = 20
   tau-p                          = 12
   compressibility (3x3):
      compressibility[    0]={ 3.00000e-04,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  3.00000e-04,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  3.00000e-04}
   ref-p (3x3):
      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
   refcoord-scaling               = All
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = false
   QMconstraints                  = 0
   QMMMscheme                     = 0
   MMChargeScaleFactor            = 1
qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = false
   Shake-SOR                      = false
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = false
   rotation                       = false
   interactiveMD                  = false
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = false
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = false
   E-x:
      n = 0
   E-xt:
      n = 0
   E-y:
      n = 0
   E-yt:
      n = 0
   E-z:
      n = 0
   E-zt:
      n = 0
   swapcoords                     = no
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
grpopts:
   nrdf:      449086     3047.99      164021
   ref-t:         310         310         310
   tau-t:           1           1           1
annealing:          No          No          No
annealing-npoints:           0           0           0
   acc:	           0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0
    
Overriding nsteps with value passed on the command line: 1000 steps, 10 ps

Using 1 MPI thread
Using 8 OpenMP threads

2 compatible GPUs are present, with IDs 0,1
1 GPU auto-selected for this run.
Mapping of GPU ID to the 1 PP rank in this node: 0

NOTE: potentially sub-optimal launch configuration, gmx mdrun started with less
      PP thread-MPI thread than GPUs available.
      Each PP thread-MPI thread can use only one GPU, 1 GPU will be used.

Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.352179 nm for Ewald
Cut-off's:   NS: 1.22   Coulomb: 1.1   LJ: 1.1
System total charge: 0.000
Potential shift: LJ r^-12: -3.186e-01 r^-6: -5.645e-01, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 9.79e-04 size: 1126

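The Ewald Gaussian width and the Lennard-Jones potential-shift constants printed above can be reproduced from the rcoulomb, rvdw and ewald-rtol values in the input parameters. The Python sketch below assumes that the splitting coefficient beta is chosen so that erfc(beta * rcoulomb) equals ewald-rtol (found here by bisection), and that the plain cut-off LJ shifts are -1/rvdw^12 and -1/rvdw^6. Both are our reading, checked only against the numbers in this log (1/beta comes out at about 0.35218 nm); the bracketing and iteration count are our own choices, not the GROMACS source.

```python
import math


def ewald_coefficient(rc: float, rtol: float) -> float:
    """Return beta (1/nm) such that erfc(beta * rc) == rtol.

    Assumed criterion: the real-space Ewald term has decayed to the
    relative tolerance rtol at the cut-off rc. Solved by bisection.
    """
    high = 5.0
    while math.erfc(high * rc) > rtol:  # widen the bracket if needed
        high *= 2.0
    low = 0.0
    for _ in range(60):  # erfc decreases monotonically, so bisection converges
        mid = 0.5 * (low + high)
        if math.erfc(mid * rc) > rtol:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Values copied from the Input Parameters section of this log.
rcoulomb = 1.1    # nm
ewald_rtol = 1e-5
rvdw = 1.1        # nm

beta = ewald_coefficient(rcoulomb, ewald_rtol)
print(f"Gaussian width (1/beta): {1.0 / beta:.5f} nm")

# Assumed form of the plain cut-off LJ potential shifts: -1/rvdw^12 and -1/rvdw^6.
print(f"LJ r^-12: {-rvdw**-12:.3e} r^-6: {-rvdw**-6:.3e}")
# prints: LJ r^-12: -3.186e-01 r^-6: -5.645e-01
```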
    
Using GPU 8x8 non-bonded kernels

Using full Lennard-Jones parameter combination matrix

Removing pbc first time
Pinning threads with an auto-selected logical core stride of 1

Initializing LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess
P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 116-122
-------- -------- --- Thank You --- -------- --------

The number of constraints is 672
288 constraints are involved in constraint triangles,
will apply an additional matrix expansion of order 4 for couplings
between constraints inside triangles
Intra-simulation communication will occur every 20 steps.
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------

There are: 205610 Atoms

Constraining the starting coordinates (step 0)

Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 4.38e-06
Initial temperature: 310.48 K

Started mdrun on rank 0 Thu Mar  9 16:28:28 2017
           Step           Time
              0        0.00000

   Energies (kJ/mol)
           Bond          Angle       G96Angle    Proper Dih.  Improper Dih.
    8.88957e+04    1.00163e+03    4.26538e+04    2.56100e+02    1.04703e+03
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -5.06150e+06   -1.12951e+05    5.38024e+04   -4.98679e+06    7.96860e+05
   Total Energy    Temperature Pressure (bar)   Constr. rmsd
   -4.18993e+06    3.11091e+02    7.44866e-01    4.47193e-06

           Step           Time
           1000       10.00000

Writing checkpoint, step 1000 at Thu Mar  9 16:29:30 2017

   Energies (kJ/mol)
           Bond          Angle       G96Angle    Proper Dih.  Improper Dih.
    8.90240e+04    9.93195e+02    4.29161e+04    2.66265e+02    1.02039e+03
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -5.06646e+06   -1.12980e+05    5.37696e+04   -4.99146e+06    7.92126e+05
   Total Energy    Temperature Pressure (bar)   Constr. rmsd
   -4.19933e+06    3.09243e+02   -2.88894e+00    4.81347e-06

	<======  ###############  ==>
	<====  A V E R A G E S  ====>
	<==  ###############  ======>

	Statistics over 1001 steps using 11 frames

   Energies (kJ/mol)
           Bond          Angle       G96Angle    Proper Dih.  Improper Dih.
    8.91310e+04    9.76019e+02    4.24611e+04    2.70589e+02    1.04681e+03
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -5.06406e+06   -1.12948e+05    5.37875e+04   -4.98933e+06    7.93563e+05
   Total Energy    Temperature Pressure (bar)   Constr. rmsd
   -4.19577e+06    3.09804e+02    3.64493e+00    0.00000e+00

          Box-X          Box-Y          Box-Z
    2.90796e+01    2.90796e+01    2.81272e+01

   Total Virial (kJ/mol)
    2.61220e+05   -4.36089e+02   -1.83690e+03
   -4.36363e+02    2.63533e+05    9.06110e+02
   -1.83613e+03    9.05252e+02    2.60982e+05

   Pressure (bar)
    4.59721e+00    6.04770e-01    2.47167e+00
    6.05153e-01    8.58022e-01   -8.68098e-01
    2.47061e+00   -8.66900e-01    5.47955e+00

        T-W_ION          T-DNA       T-Lipids
    3.09848e+02    3.09544e+02    3.09688e+02

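The temperatures in the energy blocks above are consistent with equipartition, Ekin = 0.5 * Ndf * kB * T, where Ndf is the sum of the three nrdf values listed under grpopts. A minimal Python check, assuming kB = 0.0083144621 kJ/(mol K); with the rounded values printed in the log it lands within about 0.01 K of the logged 311.091 K (step 0), 309.243 K (step 1000) and 309.804 K (average). The per-group temperatures cannot be checked this way, because the per-group kinetic energies are not printed.

```python
# Degrees of freedom of the three temperature-coupling groups (grpopts: nrdf).
NRDF = [449086, 3047.99, 164021]
# Assumed value of Boltzmann's constant in kJ/(mol K).
BOLTZ = 0.0083144621


def temperature(ekin_kj_mol: float, ndf: float) -> float:
    """Solve Ekin = 0.5 * ndf * BOLTZ * T for T in kelvin."""
    return 2.0 * ekin_kj_mol / (ndf * BOLTZ)


ndf_total = sum(NRDF)
# Kinetic energies (kJ/mol) copied from the energy blocks of this log.
samples = {"step 0": 7.96860e+05, "step 1000": 7.92126e+05, "average": 7.93563e+05}
for label, ekin in samples.items():
    print(f"{label:>9}: T = {temperature(ekin, ndf_total):.2f} K")
```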
    
	M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 Pair Search distance check             362.061824        3258.556     0.1
 NxN Ewald Elec. + LJ [F]             46272.485248     3053984.026    67.2
 NxN Ewald Elec. + LJ [V&F]             514.348416       55035.281     1.2
 Calc Weights                           617.446830       22228.086     0.5
 Spread Q Bspline                     13172.199040       26344.398     0.6
 Gather F Bspline                     13172.199040       79033.194     1.7
 3D-FFT                              160989.470642     1287915.765    28.4
 Solve PME                               23.063040        1476.035     0.0
 Shift-X                                  5.345860          32.075     0.0
 Bonds                                   76.354278        4504.902     0.1
 Angles                                  39.151112        6577.387     0.1
 Propers                                  0.552552         126.534     0.0
 Impropers                                0.744744         154.907     0.0
 Virial                                  10.488405         188.791     0.0
 Stop-CM                                  2.467320          24.673     0.0
 Calc-Ekin                               20.972220         566.250     0.0
 Lincs                                    0.674016          40.441     0.0
 Lincs-Mat                               10.399104          41.596     0.0
 Constraint-V                             1.346688          10.774     0.0
 Constraint-Vir                           0.034272           0.823     0.0
-----------------------------------------------------------------------------
 Total                                                 4541544.495   100.0
-----------------------------------------------------------------------------

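Each % Flops entry is that row's M-Flops divided by the Total row. Recomputing the four largest rows in Python (values copied from the table) shows that the NxN nonbonded force kernel and the PME 3D-FFT together account for about 95.6% of the counted floating-point work.

```python
# Largest rows of the M-Flops table, copied from this log.
mflops = {
    "NxN Ewald Elec. + LJ [F]": 3053984.026,
    "3D-FFT": 1287915.765,
    "Gather F Bspline": 79033.194,
    "NxN Ewald Elec. + LJ [V&F]": 55035.281,
}
total_mflops = 4541544.495  # "Total" row

for name, value in mflops.items():
    print(f"{name:<28} {100.0 * value / total_mflops:5.1f} %")

share = 100.0 * (mflops["NxN Ewald Elec. + LJ [F]"] + mflops["3D-FFT"]) / total_mflops
print(f"nonbonded [F] + 3D-FFT: {share:.1f} %")
```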
    
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 8 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1    8         26       0.369         10.053   0.6
 Launch GPU ops.        1    8       1001       0.134          3.648   0.2
 Force                  1    8       1001       2.326         63.444   3.7
 PME mesh               1    8       1001      54.042       1474.282  86.0
 Wait GPU local         1    8       1001       0.027          0.745   0.0
 NB X/F buffer ops.     1    8       1976       1.494         40.758   2.4
 Write traj.            1    8          2       0.675         18.412   1.1
 Update                 1    8       1001       2.380         64.926   3.8
 Constraints            1    8       1001       0.733         19.986   1.2
 Rest                                           0.649         17.703   1.0
-----------------------------------------------------------------------------
 Total                                         62.828       1713.957 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread/gather      1    8       2002      17.802        485.634  28.3
 PME 3D-FFT             1    8       2002      34.191        932.735  54.4
 PME solve Elec         1    8       1001       1.787         48.739   2.8
-----------------------------------------------------------------------------

 GPU timings
-----------------------------------------------------------------------------
 Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                         26       0.093        3.594     1.4
 X / q H2D                           1001       2.280        2.278    34.6
 Nonbonded F kernel                   970       2.595        2.675    39.3
 Nonbonded F+ene k.                     5       0.019        3.740     0.3
 Nonbonded F+prune k.                  20       0.069        3.432     1.0
 Nonbonded F+ene+prune k.               6       0.027        4.511     0.4
 F D2H                               1001       1.512        1.510    22.9
-----------------------------------------------------------------------------
 Total                                          6.595        6.588   100.0
-----------------------------------------------------------------------------

Average per-step force GPU/CPU evaluation time ratio: 6.588 ms/56.311 ms = 0.117
For optimal performance this ratio should be close to 1!

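The GPU/CPU ratio above can be reproduced from the timing tables. The numbers are consistent with the CPU side being the Force plus PME mesh wall time and the GPU side being the GPU timings Total, each divided by the 1001 steps; this attribution is our reading of the log, not taken from the GROMACS source. From the rounded table values the CPU side comes out at 56.312 ms rather than the logged 56.311 ms. Treating the ">25% less load" wording of the GROMACS note as a ratio threshold of 0.75 is likewise an assumption.

```python
STEPS = 1001  # call count of "Force" and "PME mesh" in the cycle table

# Wall times in seconds, copied from the cycle and GPU timing tables.
force_s = 2.326
pme_mesh_s = 54.042
gpu_total_s = 6.595

# Assumed: CPU-side force time = Force + PME mesh; GPU side = GPU timings Total.
cpu_ms_per_step = 1e3 * (force_s + pme_mesh_s) / STEPS
gpu_ms_per_step = 1e3 * gpu_total_s / STEPS
ratio = gpu_ms_per_step / cpu_ms_per_step

print(f"{gpu_ms_per_step:.3f} ms/{cpu_ms_per_step:.3f} ms = {ratio:.3f}")

# Assumed reading of the ">25% less load" note: flag ratios below 0.75.
if ratio < 0.75:
    print("GPU has >25% less load than the CPU")
```

In this run the PME mesh alone costs about 54 ms of the roughly 56 ms CPU-side step, which is why the ratio sits so far below 1.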
    
NOTE: The GPU has >25% less load than the CPU. This imbalance causes
      performance loss.

               Core t (s)   Wall t (s)        (%)
       Time:      502.620       62.828      800.0
                 (ns/day)    (hour/ns)
Performance:       13.766        1.743
Finished mdrun on rank 0 Thu Mar  9 16:29:31 2017
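The Time and Performance lines can be checked by hand. This minimal Python sketch assumes the simulated time is the 1001 counted steps (compare "Statistics over 1001 steps") times dt = 0.01 ps, divided by the wall time. That assumption reproduces the logged 13.766 ns/day; counting only 1000 steps would give 13.752 ns/day instead.

```python
SECONDS_PER_DAY = 86400.0

steps_counted = 1001   # assumed: step 0 through step 1000 inclusive
dt_ps = 0.01           # "dt" from the input parameters
wall_s = 62.828        # "Wall t (s)"
core_s = 502.620       # "Core t (s)"

simulated_ns = steps_counted * dt_ps * 1e-3
ns_per_day = simulated_ns * SECONDS_PER_DAY / wall_s
hours_per_ns = 24.0 / ns_per_day
core_pct = 100.0 * core_s / wall_s  # 8 busy OpenMP threads give about 800 %

print(f"Performance: {ns_per_day:.3f} ns/day  {hours_per_ns:.3f} hour/ns")
print(f"Core/wall: {core_pct:.1f} %")
```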