test_1x16_tune_5.log

Szilárd Páll, 01/13/2016 06:06 PM

 
Log file opened on Wed Jan 13 17:36:19 2016
Host: tcbs14  pid: 4213  rank ID: 0  number of ranks:  1
        :-) GROMACS - gmx mdrun, VERSION 5.1.2-dev-20160113-8b14e14 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov  Herman J.C. Berendsen    Par Bjelkmar
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra   Sebastian Fritsch
  Gerrit Groenhof   Christoph Junghans   Anca Hamuraru    Vincent Hindriksen
 Dimitrios Karkoulis    Peter Kasson        Jiri Kraus      Carsten Kutzner
    Per Larsson      Justin A. Lemkul   Magnus Lundborg   Pieter Meulenhoff
   Erik Marklund      Teemu Murtola       Szilard Pall       Sander Pronk
   Roland Schulz     Alexey Shvetsov     Michael Shirts     Alfons Sijbers
   Peter Tieleman    Teemu Virolainen  Christian Wennberg    Maarten Wolf
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

    
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, VERSION 5.1.2-dev-20160113-8b14e14
Executable:   /nethome/pszilard-projects/gromacs/gromacs-5.1/build_gcc48_hsw_cuda65/bin/gmx
Data prefix:  /nethome/pszilard-projects/gromacs/gromacs-5.1 (source tree)
Command line:
  gmx mdrun -quiet -v -resethway -noconfout -pin on -ntmpi 1 -ntomp 16 -nsteps 10000 -g test_1x16_tune_5 -tunepme

    
GROMACS version:    VERSION 5.1.2-dev-20160113-8b14e14
GIT SHA1 hash:      8b14e14f4a18193eacc86a2da9a4d812df0e03eb
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:        enabled
OpenCL support:     disabled
invsqrt routine:    gmx_software_invsqrt(x)
SIMD instructions:  AVX2_256
FFT library:        fftw-3.3.4-sse2-avx
RDTSCP usage:       enabled
C++11 compilation:  disabled
TNG support:        enabled
Tracing support:    disabled
Built on:           Mon Sep 14 15:56:07 CEST 2015
Built by:           pszilard@tcbs14 [CMAKE]
Build OS/arch:      Linux 3.13.0-63-generic x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
Build CPU family:   6   Model: 63   Stepping: 2
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/gcc-4.8 GNU 4.8.1
C compiler flags:    -march=core-avx2    -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds
C++ compiler:       /usr/bin/g++-4.8 GNU 4.8.1
C++ compiler flags:  -march=core-avx2    -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds
Boost version:      1.55.0 (internal)
CUDA compiler:      /opt/tcbsys/cuda/6.5/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2014 NVIDIA Corporation;Built on Wed_Aug_27_10:36:36_CDT_2014;Cuda compilation tools, release 6.5, V6.5.16
CUDA compiler flags:-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-use_fast_math;-Xptxas;-dlcm=ca; ;-march=core-avx2;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;-Wno-array-bounds;
CUDA driver:        7.50
CUDA runtime:       6.50

    
Running on 1 node with total 8 cores, 16 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
    Vendor: GenuineIntel
    Brand:  Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
    Family:  6  model: 63  stepping:  2
    CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    SIMD instructions most likely to fit this hardware: AVX2_256
    SIMD instructions selected at GROMACS compile time: AVX2_256
  GPU info:
    Number of GPUs detected: 1
    #0: NVIDIA Quadro M6000, compute cap.: 5.2, ECC:  no, stat: compatible

    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

    
For optimal performance with a GPU nstlist (now 10) should be larger.
The optimum depends on your CPU and GPU resources.
You might want to try several nstlist values.
Changing nstlist from 10 to 40, rlist from 0.9 to 0.996

    
Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.002
   nsteps                         = 10000
   init-step                      = 0
   simulation-part                = 1
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = 4200386634
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 0
   nstvout                        = 0
   nstfout                        = 0
   nstlog                         = 0
   nstcalcenergy                  = 100
   nstenergy                      = 500
   nstxout-compressed             = 0
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 40
   ns-type                        = Grid
   pbc                            = xyz
   periodic-molecules             = FALSE
   verlet-buffer-tolerance        = 0.005
   rlist                          = 0.996
   rlistlong                      = 0.996
   nstcalclr                      = 10
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 0.9
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 0.9
   DispCorr                       = No
   table-extension                = 1
   fourierspacing                 = 0.1125
   fourier-nx                     = 56
   fourier-ny                     = 56
   fourier-nz                     = 56
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   implicit-solvent               = No
   gb-algorithm                   = Still
   nstgbradii                     = 1
   rgbradii                       = 1
   gb-epsilon-solvent             = 80
   gb-saltconc                    = 0
   gb-obc-alpha                   = 1
   gb-obc-beta                    = 0.8
   gb-obc-gamma                   = 4.85
   gb-dielectric-offset           = 0.009
   sa-algorithm                   = Ace-approximation
   sa-surface-tension             = 2.05016
   tcoupl                         = V-rescale
   nsttcouple                     = 10
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = FALSE
   pcoupl                         = No
   pcoupltype                     = Isotropic
   nstpcouple                     = -1
   tau-p                          = 1
   compressibility (3x3):
      compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   ref-p (3x3):
      ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord-scaling               = No
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = FALSE
   QMconstraints                  = 0
   QMMMscheme                     = 0
   MMChargeScaleFactor            = 1
qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = FALSE
   Shake-SOR                      = FALSE
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = FALSE
   rotation                       = FALSE
   interactiveMD                  = FALSE
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = FALSE
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = FALSE
   E-x:
      n = 0
   E-xt:
      n = 0
   E-y:
      n = 0
   E-yt:
      n = 0
   E-z:
      n = 0
   E-zt:
      n = 0
   swapcoords                     = no
   adress                         = FALSE
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
grpopts:
   nrdf:       48056
   ref-t:         300
   tau-t:         0.1
annealing:          No
annealing-npoints:           0
   acc:	           0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0

    
Overriding nsteps with value passed on the command line: 10000 steps, 20 ps

Using 1 MPI thread
Using 16 OpenMP threads

1 compatible GPU is present, with ID 0
1 GPU auto-selected for this run.
Mapping of GPU ID to the 1 PP rank in this node: 0

Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's:   NS: 0.996   Coulomb: 0.9   LJ: 0.9
System total charge: 0.000
Generated table with 998 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 998 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 998 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 8.85e-04 size: 1018
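
The Gaussian width and the LJ potential-shift constants reported above can be reproduced from the input parameters. The following is a minimal sketch, assuming that the Ewald coefficient beta is chosen so that erfc(beta * rcoulomb) equals ewald-rtol, and that the LJ shifts are simply the negated r^-12 and r^-6 terms evaluated at rvdw. The helper name `ewald_beta` is hypothetical and is not a GROMACS function.

```python
import math

def ewald_beta(rc, rtol):
    """Solve erfc(beta * rc) = rtol for beta by bisection (illustrative helper)."""
    lo, hi = 0.0, 20.0 / rc          # erfc decreases monotonically as beta grows
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid * rc) > rtol:
            lo = mid                 # beta too small: erfc is still above the tolerance
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Values taken from the "Input Parameters" section of this log.
rcoulomb, rvdw, ewald_rtol = 0.9, 0.9, 1e-5

beta = ewald_beta(rcoulomb, ewald_rtol)
gaussian_width = 1.0 / beta          # log reports 0.288146 nm

# Assumed form of the shift constants: -rc^-12 and -rc^-6.
shift_r12 = -(rvdw ** -12)           # log reports -3.541e+00
shift_r6 = -(rvdw ** -6)             # log reports -1.882e+00

print(f"1/beta = {gaussian_width:.6f} nm")
print(f"LJ shift r^-12 = {shift_r12:.3e}, r^-6 = {shift_r6:.3e}")
```

Under the same assumption, the final tuned cut-off of 1.046 nm gives a width near 0.335 nm, which agrees with the "1/beta" column of the PP-PME load balancing table later in the log.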

    
NOTE: GROMACS was configured without NVML support hence it can not exploit
      application clocks of the detected Quadro M6000 GPU to improve performance.
      Recompile with the NVML library (compatible with the driver used) or set application clocks manually.


Using GPU 8x8 non-bonded kernels

Removing pbc first time
Pinning threads with an auto-selected logical core stride of 1

Initializing LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------

The number of constraints is 2053

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------

There are: 24040 Atoms

Constraining the starting coordinates (step 0)

Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 1.20e-05
Initial temperature: 297.8 K

    
Started mdrun on rank 0 Wed Jan 13 17:36:20 2016
           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.44103e+03    5.70375e+03    2.50388e+02    2.00472e+03    1.68037e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.16574e+04   -3.84143e+05    3.38823e+03   -3.09894e+05    5.99548e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.49939e+05   -2.49939e+05    3.00104e+02   -3.53122e+02    2.74372e-05
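
The temperature in the energy block above follows from the kinetic energy and the number of degrees of freedom (nrdf under grpopts). A small sketch of that check, assuming the usual equipartition relation Ekin = 0.5 * nrdf * kB * T; the Boltzmann constant value is supplied here and does not appear in the log.

```python
# Boltzmann constant in kJ/(mol K); assumed value, not taken from the log.
KB = 0.0083144626

nrdf = 48056        # degrees of freedom, "nrdf" under grpopts
ekin = 5.99548e4    # "Kinetic En." at step 0, in kJ/mol

# Equipartition: Ekin = 0.5 * nrdf * kB * T, solved for T.
temperature = 2.0 * ekin / (nrdf * KB)

print(f"T = {temperature:.3f} K")   # log reports 3.00104e+02 K at step 0
```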

    
step   80: timed with pme grid 56 56 56, coulomb cutoff 0.900: 123.8 M-cycles
step  160: timed with pme grid 48 48 48, coulomb cutoff 1.046: 117.1 M-cycles
step  240: timed with pme grid 44 44 44, coulomb cutoff 1.141: 133.1 M-cycles
step  320: timed with pme grid 48 48 48, coulomb cutoff 1.046: 118.2 M-cycles
step  400: timed with pme grid 52 52 52, coulomb cutoff 0.966: 122.4 M-cycles
step  480: timed with pme grid 56 56 56, coulomb cutoff 0.900: 123.7 M-cycles
              optimal pme grid 48 48 48, coulomb cutoff 1.046
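
The tuning lines above can be parsed to see which setup was fastest. This sketch only takes the minimum of the timings printed in this log; the tuner inside mdrun is more elaborate, so the selection rule here is an assumption that happens to agree with the reported optimum.

```python
import re

# Timing lines copied from this log.
tuning_lines = """\
step   80: timed with pme grid 56 56 56, coulomb cutoff 0.900: 123.8 M-cycles
step  160: timed with pme grid 48 48 48, coulomb cutoff 1.046: 117.1 M-cycles
step  240: timed with pme grid 44 44 44, coulomb cutoff 1.141: 133.1 M-cycles
step  320: timed with pme grid 48 48 48, coulomb cutoff 1.046: 118.2 M-cycles
step  400: timed with pme grid 52 52 52, coulomb cutoff 0.966: 122.4 M-cycles
step  480: timed with pme grid 56 56 56, coulomb cutoff 0.900: 123.7 M-cycles
""".splitlines()

pattern = re.compile(
    r"pme grid (\d+) (\d+) (\d+), coulomb cutoff ([\d.]+): ([\d.]+) M-cycles"
)

timings = {}
for line in tuning_lines:
    m = pattern.search(line)
    if m is None:
        continue
    grid = tuple(int(g) for g in m.group(1, 2, 3))
    cutoff, mcycles = float(m.group(4)), float(m.group(5))
    key = (grid, cutoff)
    # Keep the lowest timing seen for each (grid, cutoff) setup.
    timings[key] = min(mcycles, timings.get(key, float("inf")))

best = min(timings, key=timings.get)
print("fastest setup:", best, "at", timings[best], "M-cycles")
```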

    
step 5000: resetting all time and cycle counters

Restarted time on rank 0 Wed Jan 13 17:36:26 2016
           Step           Time         Lambda
          10000       20.00000        0.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.64193e+03    5.55978e+03    2.47517e+02    2.05754e+03    1.67330e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.18313e+04   -3.83297e+05    2.02986e+03   -3.10196e+05    5.97392e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.50457e+05   -2.50371e+05    2.99025e+02   -4.38195e+02    2.88291e-05

	<======  ###############  ==>
	<====  A V E R A G E S  ====>
	<==  ###############  ======>

	Statistics over 10001 steps using 101 frames

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.48499e+03    5.59426e+03    2.63861e+02    2.08071e+03    1.68182e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.21315e+04   -3.83617e+05    2.00659e+03   -3.10237e+05    5.99538e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.50284e+05   -2.50146e+05    3.00099e+02   -2.02113e+02    0.00000e+00

   Total Virial (kJ/mol)
    2.14504e+04    9.68297e+01   -1.62449e+00
    9.73269e+01    2.14502e+04   -9.18374e+01
   -3.62943e+00   -9.24507e+01    2.15676e+04

   Pressure (bar)
   -1.91677e+02   -1.46230e+01   -5.33324e-01
   -1.46898e+01   -2.05441e+02    9.64885e+00
   -2.64037e-01    9.73122e+00   -2.09222e+02

    
       P P   -   P M E   L O A D   B A L A N C I N G

 PP/PME load balancing changed the cut-off and PME settings:
           particle-particle                    PME
            rcoulomb  rlist            grid      spacing   1/beta
   initial  0.900 nm  0.996 nm      56  56  56   0.112 nm  0.288 nm
   final    1.046 nm  1.142 nm      48  48  48   0.131 nm  0.335 nm
 cost-ratio           1.51             0.63
 (note that these numbers concern only part of the total PP and PME load)
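
The two cost-ratio figures in the table above are numerically consistent with simple cubic scaling. The sketch below assumes that the particle-particle cost scales with the pair-list volume (rlist cubed) and the PME cost with the number of grid points; these scaling rules are inferred from the reported numbers rather than stated in the log.

```python
# Values from the load balancing table in this log.
rlist_initial, rlist_final = 0.996, 1.142   # nm
grid_initial, grid_final = 56, 48           # PME grid points per dimension

# Assumed: pair-interaction work grows with the pair-list volume, i.e. rlist^3.
pp_cost_ratio = (rlist_final / rlist_initial) ** 3

# Assumed: PME mesh work grows with the total number of grid points.
pme_cost_ratio = (grid_final / grid_initial) ** 3

print(f"PP cost ratio:  {pp_cost_ratio:.2f}")   # log reports 1.51
print(f"PME cost ratio: {pme_cost_ratio:.2f}")  # log reports 0.63
```

Read this way, the tuning moved roughly 50% more short-range work onto the GPU in exchange for a bit over a third less mesh work on the CPU.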

    
	M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 Pair Search distance check             400.637920        3605.741     0.0
 NxN Ewald Elec. + LJ [F]            109431.129728     7222454.562    95.1
 NxN Ewald Elec. + LJ [V&F]            1127.540288      120646.811     1.6
 1,4 nonbonded interactions              26.710341        2403.931     0.0
 Calc Weights                           360.672120       12984.196     0.2
 Spread Q Bspline                      7694.338560       15388.677     0.2
 Gather F Bspline                      7694.338560       46166.031     0.6
 3D-FFT                               18533.265912      148266.127     2.0
 Solve PME                               11.522304         737.427     0.0
 Shift-X                                  3.029040          18.174     0.0
 Angles                                  18.523704        3111.982     0.0
 Propers                                 27.915582        6392.668     0.1
 Impropers                                2.110422         438.968     0.0
 Virial                                   1.228335          22.110     0.0
 Stop-CM                                  1.226040          12.260     0.0
 Calc-Ekin                               24.064040         649.729     0.0
 Lincs                                   10.267053         616.023     0.0
 Lincs-Mat                              222.284448         889.138     0.0
 Constraint-V                           130.596114        1044.769     0.0
 Constraint-Vir                           1.227111          29.451     0.0
 Settle                                  36.687336       11850.010     0.2
-----------------------------------------------------------------------------
 Total                                                 7597728.787   100.0
-----------------------------------------------------------------------------

    
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 16 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1   16        126       0.152          7.285   3.2
 Launch GPU ops.        1   16       5001       0.175          8.384   3.7
 Force                  1   16       5001       0.540         25.926  11.4
 PME mesh               1   16       5001       2.227        106.993  46.9
 Wait GPU local         1   16       5001       0.691         33.182  14.5
 NB X/F buffer ops.     1   16       9876       0.117          5.621   2.5
 Update                 1   16       5001       0.202          9.717   4.3
 Constraints            1   16       5001       0.587         28.212  12.4
 Rest                                           0.063          3.013   1.3
-----------------------------------------------------------------------------
 Total                                          4.754        228.333 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread/gather      1   16      10002       1.439         69.106  30.3
 PME 3D-FFT             1   16      10002       0.704         33.793  14.8
 PME solve Elec         1   16       5001       0.072          3.466   1.5
-----------------------------------------------------------------------------
 Breakdown of PP computation
-----------------------------------------------------------------------------
 NS grid local          1   16        126       0.029          1.386   0.6
 NS search local        1   16        126       0.114          5.465   2.4
 Bonded F               1   16       5001       0.306         14.702   6.4
 Listed buffer ops.     1   16       5001       0.169          8.129   3.6
 NB X buffer ops.       1   16       4875       0.056          2.713   1.2
 NB F buffer ops.       1   16       5001       0.060          2.881   1.3
-----------------------------------------------------------------------------

    
 GPU timings
-----------------------------------------------------------------------------
 Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                        126       0.015        0.120     0.4
 X / q H2D                           5001       0.200        0.040     5.7
 Nonbonded F kernel                  4850       3.021        0.623    85.5
 Nonbonded F+ene k.                    25       0.023        0.935     0.7
 Nonbonded F+prune k.                 100       0.082        0.819     2.3
 Nonbonded F+ene+prune k.              26       0.029        1.131     0.8
 F D2H                               5001       0.162        0.032     4.6
-----------------------------------------------------------------------------
 Total                                          3.533        0.706   100.0
-----------------------------------------------------------------------------

Force evaluation time GPU/CPU: 0.706 ms/0.553 ms = 1.277
For optimal performance this ratio should be close to 1!
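
The GPU/CPU ratio above can be recovered from the accounting tables. In this sketch the CPU side is taken to be the "Force" plus "PME mesh" wall times, which is an assumption that reproduces the 0.553 ms figure; the 1.2 threshold mirrors the ">20% more load" wording of the note that follows in the log.

```python
steps = 5001          # call count of "Force" and "PME mesh" after the counter reset

cpu_force_s = 0.540   # "Force" wall time (s), cycle accounting table
cpu_pme_s = 2.227     # "PME mesh" wall time (s), cycle accounting table
gpu_total_s = 3.533   # "Total" wall time (s), GPU timings table

# Per-step times in milliseconds.
cpu_ms = (cpu_force_s + cpu_pme_s) / steps * 1e3
gpu_ms = gpu_total_s / steps * 1e3

ratio = gpu_ms / cpu_ms
gpu_overloaded = ratio > 1.2   # GPU carries more than 20% extra load

print(f"GPU {gpu_ms:.3f} ms / CPU {cpu_ms:.3f} ms = {ratio:.3f}")
print("GPU overloaded:", gpu_overloaded)
```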

    
NOTE: The GPU has >20% more load than the CPU. This imbalance causes
      performance loss, consider using a shorter cut-off and a finer PME grid.

               Core t (s)   Wall t (s)        (%)
       Time:       75.955        4.754     1597.8
                 (ns/day)    (hour/ns)
Performance:      181.791        0.132
Finished mdrun on rank 0 Wed Jan 13 17:36:31 2016
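
The performance summary can be approximated from the time step, the timed step count and the wall time. This is a sketch under the assumption that 5001 steps were timed after the counter reset at step 5000 (the call count shown in the accounting tables). Because the wall time is printed to only three decimals, the result matches the log to about four significant digits rather than exactly.

```python
dt_ps = 0.002        # "dt" from the input parameters, in ps
steps_timed = 5001   # assumed: steps counted after the reset at step 5000
wall_s = 4.754       # "Wall t (s)"
core_s = 75.955      # "Core t (s)"

simulated_ns = steps_timed * dt_ps / 1000.0
ns_per_day = simulated_ns * 86400.0 / wall_s     # log reports 181.791
hours_per_ns = 24.0 / ns_per_day                 # log reports 0.132
cpu_utilization_pct = 100.0 * core_s / wall_s    # log reports 1597.8

print(f"{ns_per_day:.1f} ns/day, {hours_per_ns:.3f} hour/ns")
print(f"core/wall = {cpu_utilization_pct:.1f} %")
```

A core-to-wall figure near 1600% is what one would expect when all 16 OpenMP threads stay busy for the whole timed interval.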