test_1x16_tune_1.log

Szilárd Páll, 01/13/2016 06:10 PM

 
Log file opened on Wed Jan 13 17:35:22 2016
Host: tcbs14  pid: 4117  rank ID: 0  number of ranks:  1
        :-) GROMACS - gmx mdrun, VERSION 5.1.2-dev-20160113-8b14e14 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov  Herman J.C. Berendsen    Par Bjelkmar
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra   Sebastian Fritsch
  Gerrit Groenhof   Christoph Junghans   Anca Hamuraru    Vincent Hindriksen
 Dimitrios Karkoulis    Peter Kasson        Jiri Kraus      Carsten Kutzner
    Per Larsson      Justin A. Lemkul   Magnus Lundborg   Pieter Meulenhoff
   Erik Marklund      Teemu Murtola       Szilard Pall       Sander Pronk
   Roland Schulz     Alexey Shvetsov     Michael Shirts     Alfons Sijbers
   Peter Tieleman    Teemu Virolainen  Christian Wennberg    Maarten Wolf
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, VERSION 5.1.2-dev-20160113-8b14e14
Executable:   /nethome/pszilard-projects/gromacs/gromacs-5.1/build_gcc48_hsw_cuda65/bin/gmx
Data prefix:  /nethome/pszilard-projects/gromacs/gromacs-5.1 (source tree)
Command line:
  gmx mdrun -quiet -v -resethway -noconfout -pin on -ntmpi 1 -ntomp 16 -nsteps 10000 -g test_1x16_tune_1 -tunepme
    
GROMACS version:    VERSION 5.1.2-dev-20160113-8b14e14
GIT SHA1 hash:      8b14e14f4a18193eacc86a2da9a4d812df0e03eb
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:        enabled
OpenCL support:     disabled
invsqrt routine:    gmx_software_invsqrt(x)
SIMD instructions:  AVX2_256
FFT library:        fftw-3.3.4-sse2-avx
RDTSCP usage:       enabled
C++11 compilation:  disabled
TNG support:        enabled
Tracing support:    disabled
Built on:           Mon Sep 14 15:56:07 CEST 2015
Built by:           pszilard@tcbs14 [CMAKE]
Build OS/arch:      Linux 3.13.0-63-generic x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
Build CPU family:   6   Model: 63   Stepping: 2
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/gcc-4.8 GNU 4.8.1
C compiler flags:    -march=core-avx2    -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds
C++ compiler:       /usr/bin/g++-4.8 GNU 4.8.1
C++ compiler flags:  -march=core-avx2    -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds
Boost version:      1.55.0 (internal)
CUDA compiler:      /opt/tcbsys/cuda/6.5/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2014 NVIDIA Corporation;Built on Wed_Aug_27_10:36:36_CDT_2014;Cuda compilation tools, release 6.5, V6.5.16
CUDA compiler flags:-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-use_fast_math;-Xptxas;-dlcm=ca; ;-march=core-avx2;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;-Wno-array-bounds;
CUDA driver:        7.50
CUDA runtime:       6.50

    
Running on 1 node with total 8 cores, 16 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
    Vendor: GenuineIntel
    Brand:  Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
    Family:  6  model: 63  stepping:  2
    CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    SIMD instructions most likely to fit this hardware: AVX2_256
    SIMD instructions selected at GROMACS compile time: AVX2_256
  GPU info:
    Number of GPUs detected: 1
    #0: NVIDIA Quadro M6000, compute cap.: 5.2, ECC:  no, stat: compatible

    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

    
For optimal performance with a GPU nstlist (now 10) should be larger.
The optimum depends on your CPU and GPU resources.
You might want to try several nstlist values.
Changing nstlist from 10 to 40, rlist from 0.9 to 0.996
    
Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.002
   nsteps                         = 10000
   init-step                      = 0
   simulation-part                = 1
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = 4200386634
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 0
   nstvout                        = 0
   nstfout                        = 0
   nstlog                         = 0
   nstcalcenergy                  = 100
   nstenergy                      = 500
   nstxout-compressed             = 0
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 40
   ns-type                        = Grid
   pbc                            = xyz
   periodic-molecules             = FALSE
   verlet-buffer-tolerance        = 0.005
   rlist                          = 0.996
   rlistlong                      = 0.996
   nstcalclr                      = 10
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 0.9
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 0.9
   DispCorr                       = No
   table-extension                = 1
   fourierspacing                 = 0.1125
   fourier-nx                     = 56
   fourier-ny                     = 56
   fourier-nz                     = 56
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   implicit-solvent               = No
   gb-algorithm                   = Still
   nstgbradii                     = 1
   rgbradii                       = 1
   gb-epsilon-solvent             = 80
   gb-saltconc                    = 0
   gb-obc-alpha                   = 1
   gb-obc-beta                    = 0.8
   gb-obc-gamma                   = 4.85
   gb-dielectric-offset           = 0.009
   sa-algorithm                   = Ace-approximation
   sa-surface-tension             = 2.05016
   tcoupl                         = V-rescale
   nsttcouple                     = 10
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = FALSE
   pcoupl                         = No
   pcoupltype                     = Isotropic
   nstpcouple                     = -1
   tau-p                          = 1
   compressibility (3x3):
      compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   ref-p (3x3):
      ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord-scaling               = No
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = FALSE
   QMconstraints                  = 0
   QMMMscheme                     = 0
   MMChargeScaleFactor            = 1
qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = FALSE
   Shake-SOR                      = FALSE
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = FALSE
   rotation                       = FALSE
   interactiveMD                  = FALSE
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = FALSE
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = FALSE
   E-x:
      n = 0
   E-xt:
      n = 0
   E-y:
      n = 0
   E-yt:
      n = 0
   E-z:
      n = 0
   E-zt:
      n = 0
   swapcoords                     = no
   adress                         = FALSE
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
grpopts:
   nrdf:       48056
   ref-t:         300
   tau-t:         0.1
annealing:          No
annealing-npoints:           0
   acc:	           0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0

    
Overriding nsteps with value passed on the command line: 10000 steps, 20 ps

Using 1 MPI thread
Using 16 OpenMP threads

1 compatible GPU is present, with ID 0
1 GPU auto-selected for this run.
Mapping of GPU ID to the 1 PP rank in this node: 0

Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's:   NS: 0.996   Coulomb: 0.9   LJ: 0.9
System total charge: 0.000
Generated table with 998 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 998 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 998 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 8.85e-04 size: 1018

    
NOTE: GROMACS was configured without NVML support hence it can not exploit
      application clocks of the detected Quadro M6000 GPU to improve performance.
      Recompile with the NVML library (compatible with the driver used) or set application clocks manually.

    
Using GPU 8x8 non-bonded kernels

Removing pbc first time
Pinning threads with an auto-selected logical core stride of 1

Initializing LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------

The number of constraints is 2053

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------

There are: 24040 Atoms

Constraining the starting coordinates (step 0)

Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 1.20e-05
Initial temperature: 297.8 K
    
Started mdrun on rank 0 Wed Jan 13 17:35:23 2016
           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.44103e+03    5.70375e+03    2.50388e+02    2.00472e+03    1.68037e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.16574e+04   -3.84143e+05    3.38823e+03   -3.09894e+05    5.99548e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.49939e+05   -2.49939e+05    3.00104e+02   -3.53119e+02    2.74372e-05
    
step   80: timed with pme grid 56 56 56, coulomb cutoff 0.900: 132.4 M-cycles
step  160: timed with pme grid 48 48 48, coulomb cutoff 1.046: 117.0 M-cycles
step  240: timed with pme grid 44 44 44, coulomb cutoff 1.141: 134.0 M-cycles
step  320: timed with pme grid 48 48 48, coulomb cutoff 1.046: 118.1 M-cycles
step  400: timed with pme grid 52 52 52, coulomb cutoff 0.966: 123.6 M-cycles
              optimal pme grid 48 48 48, coulomb cutoff 1.046
    
step 5000: resetting all time and cycle counters

Restarted time on rank 0 Wed Jan 13 17:35:29 2016
           Step           Time         Lambda
          10000       20.00000        0.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.58241e+03    5.59258e+03    2.72785e+02    2.02947e+03    1.66522e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.29553e+04   -3.84741e+05    1.95844e+03   -3.10698e+05    5.96130e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.51085e+05   -2.50355e+05    2.98393e+02   -1.42444e+02    2.77777e-05
    
	<======  ###############  ==>
	<====  A V E R A G E S  ====>
	<==  ###############  ======>

	Statistics over 10001 steps using 101 frames

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.51721e+03    5.57610e+03    2.56936e+02    2.08051e+03    1.67647e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.21277e+04   -3.83586e+05    1.98749e+03   -3.10275e+05    5.99803e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.50295e+05   -2.50147e+05    3.00232e+02   -2.11692e+02    0.00000e+00

   Total Virial (kJ/mol)
    2.16029e+04    1.64406e+00   -1.43081e+02
    2.55019e+00    2.17040e+04   -9.07856e-02
   -1.41324e+02   -5.73709e-01    2.14018e+04

   Pressure (bar)
   -2.11569e+02    1.22313e-01    1.94469e+01
    6.08956e-04   -2.33601e+02    2.29045e+00
    1.92110e+01    2.35532e+00   -1.89907e+02

    
       P P   -   P M E   L O A D   B A L A N C I N G

 PP/PME load balancing changed the cut-off and PME settings:
           particle-particle                    PME
            rcoulomb  rlist            grid      spacing   1/beta
   initial  0.900 nm  0.996 nm      56  56  56   0.112 nm  0.288 nm
   final    1.046 nm  1.142 nm      48  48  48   0.131 nm  0.335 nm
 cost-ratio           1.51             0.63
 (note that these numbers concern only part of the total PP and PME load)

    
	M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 Pair Search distance check             401.090672        3609.816     0.0
 NxN Ewald Elec. + LJ [F]            109436.909824     7222836.048    95.1
 NxN Ewald Elec. + LJ [V&F]            1127.396672      120631.444     1.6
 1,4 nonbonded interactions              26.710341        2403.931     0.0
 Calc Weights                           360.672120       12984.196     0.2
 Spread Q Bspline                      7694.338560       15388.677     0.2
 Gather F Bspline                      7694.338560       46166.031     0.6
 3D-FFT                               18533.265912      148266.127     2.0
 Solve PME                               11.522304         737.427     0.0
 Shift-X                                  3.029040          18.174     0.0
 Angles                                  18.523704        3111.982     0.0
 Propers                                 27.915582        6392.668     0.1
 Impropers                                2.110422         438.968     0.0
 Virial                                   1.228335          22.110     0.0
 Stop-CM                                  1.226040          12.260     0.0
 Calc-Ekin                               24.064040         649.729     0.0
 Lincs                                   10.267053         616.023     0.0
 Lincs-Mat                              222.284448         889.138     0.0
 Constraint-V                           130.596114        1044.769     0.0
 Constraint-Vir                           1.227111          29.451     0.0
 Settle                                  36.687336       11850.010     0.2
-----------------------------------------------------------------------------
 Total                                                 7598098.981   100.0
-----------------------------------------------------------------------------

    
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 16 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1   16        126       0.151          7.270   3.2
 Launch GPU ops.        1   16       5001       0.175          8.399   3.7
 Force                  1   16       5001       0.540         25.943  11.3
 PME mesh               1   16       5001       2.219        106.628  46.6
 Wait GPU local         1   16       5001       0.712         34.222  14.9
 NB X/F buffer ops.     1   16       9876       0.116          5.564   2.4
 Update                 1   16       5001       0.199          9.563   4.2
 Constraints            1   16       5001       0.590         28.340  12.4
 Rest                                           0.064          3.051   1.3
-----------------------------------------------------------------------------
 Total                                          4.766        228.980 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread/gather      1   16      10002       1.440         69.203  30.2
 PME 3D-FFT             1   16      10002       0.697         33.483  14.6
 PME solve Elec         1   16       5001       0.069          3.331   1.5
-----------------------------------------------------------------------------
 Breakdown of PP computation
-----------------------------------------------------------------------------
 NS grid local          1   16        126       0.029          1.387   0.6
 NS search local        1   16        126       0.113          5.441   2.4
 Bonded F               1   16       5001       0.311         14.944   6.5
 Listed buffer ops.     1   16       5001       0.164          7.867   3.4
 NB X buffer ops.       1   16       4875       0.056          2.672   1.2
 NB F buffer ops.       1   16       5001       0.060          2.866   1.3
-----------------------------------------------------------------------------
    
 GPU timings
-----------------------------------------------------------------------------
 Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                        126       0.015        0.120     0.4
 X / q H2D                           5001       0.199        0.040     5.6
 Nonbonded F kernel                  4850       3.022        0.623    85.5
 Nonbonded F+ene k.                    25       0.023        0.934     0.7
 Nonbonded F+prune k.                 100       0.082        0.818     2.3
 Nonbonded F+ene+prune k.              26       0.029        1.131     0.8
 F D2H                               5001       0.161        0.032     4.6
-----------------------------------------------------------------------------
 Total                                          3.532        0.706   100.0
-----------------------------------------------------------------------------
    
Force evaluation time GPU/CPU: 0.706 ms/0.552 ms = 1.280
For optimal performance this ratio should be close to 1!


NOTE: The GPU has >20% more load than the CPU. This imbalance causes
      performance loss, consider using a shorter cut-off and a finer PME grid.

               Core t (s)   Wall t (s)        (%)
       Time:       76.001        4.766     1594.7
                 (ns/day)    (hour/ns)
Performance:      181.320        0.132
Finished mdrun on rank 0 Wed Jan 13 17:35:34 2016
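
The summary figures at the end of the log can be cross-checked from the tables above it. The following is a minimal Python sketch, not part of the GROMACS output: the constant names are illustrative, and treating the CPU-side force time as the sum of the "Force" and "PME mesh" rows is an assumption that happens to be consistent with the 0.552 ms printed in the log. Because of -resethway, the counters cover only the 5001 steps after the reset at step 5000.

```python
# Values copied from the log above; the constant names are illustrative only.
DT_PS = 0.002          # "dt" under Input Parameters, in ps
STEPS_TIMED = 5001     # call count of the per-step rows after the reset at step 5000
WALL_S = 4.766         # "Wall t (s)" on the Time line
GPU_TOTAL_S = 3.532    # "Total" row of the GPU timings table
CPU_FORCE_S = 0.540    # "Force" row of the cycle accounting table
CPU_PME_S = 2.219      # "PME mesh" row of the cycle accounting table


def ns_per_day(steps, dt_ps, wall_s):
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = steps * dt_ps / 1000.0
    return simulated_ns * 86400.0 / wall_s


perf = ns_per_day(STEPS_TIMED, DT_PS, WALL_S)
hours_per_ns = 24.0 / perf

# Per-step force evaluation times in ms.
# Assumption: CPU side = Force + PME mesh (matches the logged 0.552 ms).
gpu_ms = 1000.0 * GPU_TOTAL_S / STEPS_TIMED
cpu_ms = 1000.0 * (CPU_FORCE_S + CPU_PME_S) / STEPS_TIMED
ratio = gpu_ms / cpu_ms

print(f"Performance: {perf:.3f} ns/day, {hours_per_ns:.3f} hour/ns")
print(f"GPU/CPU force time: {gpu_ms:.3f} ms / {cpu_ms:.3f} ms = {ratio:.3f}")
```

To the precision printed in the log, these agree with the reported 181.320 ns/day, 0.132 hour/ns, and the GPU/CPU ratio of 1.280.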