test_1x16_notune_5.log

Szilárd Páll, 01/13/2016 06:06 PM

 
Log file opened on Wed Jan 13 17:37:24 2016
Host: tcbs14  pid: 4332  rank ID: 0  number of ranks:  1
        :-) GROMACS - gmx mdrun, VERSION 5.1.2-dev-20160113-8b14e14 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov  Herman J.C. Berendsen    Par Bjelkmar
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra   Sebastian Fritsch
  Gerrit Groenhof   Christoph Junghans   Anca Hamuraru    Vincent Hindriksen
 Dimitrios Karkoulis    Peter Kasson        Jiri Kraus      Carsten Kutzner
    Per Larsson      Justin A. Lemkul   Magnus Lundborg   Pieter Meulenhoff
   Erik Marklund      Teemu Murtola       Szilard Pall       Sander Pronk
   Roland Schulz     Alexey Shvetsov     Michael Shirts     Alfons Sijbers
   Peter Tieleman    Teemu Virolainen  Christian Wennberg    Maarten Wolf
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, VERSION 5.1.2-dev-20160113-8b14e14
Executable:   /nethome/pszilard-projects/gromacs/gromacs-5.1/build_gcc48_hsw_cuda65/bin/gmx
Data prefix:  /nethome/pszilard-projects/gromacs/gromacs-5.1 (source tree)
Command line:
  gmx mdrun -quiet -v -resethway -noconfout -pin on -ntmpi 1 -ntomp 16 -nsteps 10000 -g test_1x16_notune_5 -notunepme

GROMACS version:    VERSION 5.1.2-dev-20160113-8b14e14
GIT SHA1 hash:      8b14e14f4a18193eacc86a2da9a4d812df0e03eb
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:        enabled
OpenCL support:     disabled
invsqrt routine:    gmx_software_invsqrt(x)
SIMD instructions:  AVX2_256
FFT library:        fftw-3.3.4-sse2-avx
RDTSCP usage:       enabled
C++11 compilation:  disabled
TNG support:        enabled
Tracing support:    disabled
Built on:           Mon Sep 14 15:56:07 CEST 2015
Built by:           pszilard@tcbs14 [CMAKE]
Build OS/arch:      Linux 3.13.0-63-generic x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
Build CPU family:   6   Model: 63   Stepping: 2
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/gcc-4.8 GNU 4.8.1
C compiler flags:    -march=core-avx2    -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds
C++ compiler:       /usr/bin/g++-4.8 GNU 4.8.1
C++ compiler flags:  -march=core-avx2    -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds
Boost version:      1.55.0 (internal)
CUDA compiler:      /opt/tcbsys/cuda/6.5/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2014 NVIDIA Corporation;Built on Wed_Aug_27_10:36:36_CDT_2014;Cuda compilation tools, release 6.5, V6.5.16
CUDA compiler flags:-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-use_fast_math;-Xptxas;-dlcm=ca; ;-march=core-avx2;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;-Wno-array-bounds;
CUDA driver:        7.50
CUDA runtime:       6.50


Running on 1 node with total 8 cores, 16 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
    Vendor: GenuineIntel
    Brand:  Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
    Family:  6  model: 63  stepping:  2
    CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    SIMD instructions most likely to fit this hardware: AVX2_256
    SIMD instructions selected at GROMACS compile time: AVX2_256
  GPU info:
    Number of GPUs detected: 1
    #0: NVIDIA Quadro M6000, compute cap.: 5.2, ECC:  no, stat: compatible


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------


For optimal performance with a GPU nstlist (now 10) should be larger.
The optimum depends on your CPU and GPU resources.
You might want to try several nstlist values.
Changing nstlist from 10 to 40, rlist from 0.9 to 0.996

Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.002
   nsteps                         = 10000
   init-step                      = 0
   simulation-part                = 1
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = 4200386634
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 0
   nstvout                        = 0
   nstfout                        = 0
   nstlog                         = 0
   nstcalcenergy                  = 100
   nstenergy                      = 500
   nstxout-compressed             = 0
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 40
   ns-type                        = Grid
   pbc                            = xyz
   periodic-molecules             = FALSE
   verlet-buffer-tolerance        = 0.005
   rlist                          = 0.996
   rlistlong                      = 0.996
   nstcalclr                      = 10
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 0.9
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 0.9
   DispCorr                       = No
   table-extension                = 1
   fourierspacing                 = 0.1125
   fourier-nx                     = 56
   fourier-ny                     = 56
   fourier-nz                     = 56
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   implicit-solvent               = No
   gb-algorithm                   = Still
   nstgbradii                     = 1
   rgbradii                       = 1
   gb-epsilon-solvent             = 80
   gb-saltconc                    = 0
   gb-obc-alpha                   = 1
   gb-obc-beta                    = 0.8
   gb-obc-gamma                   = 4.85
   gb-dielectric-offset           = 0.009
   sa-algorithm                   = Ace-approximation
   sa-surface-tension             = 2.05016
   tcoupl                         = V-rescale
   nsttcouple                     = 10
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = FALSE
   pcoupl                         = No
   pcoupltype                     = Isotropic
   nstpcouple                     = -1
   tau-p                          = 1
   compressibility (3x3):
      compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   ref-p (3x3):
      ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord-scaling               = No
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = FALSE
   QMconstraints                  = 0
   QMMMscheme                     = 0
   MMChargeScaleFactor            = 1
qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = FALSE
   Shake-SOR                      = FALSE
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = FALSE
   rotation                       = FALSE
   interactiveMD                  = FALSE
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = FALSE
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = FALSE
   E-x:
      n = 0
   E-xt:
      n = 0
   E-y:
      n = 0
   E-yt:
      n = 0
   E-z:
      n = 0
   E-zt:
      n = 0
   swapcoords                     = no
   adress                         = FALSE
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
grpopts:
   nrdf:       48056
   ref-t:         300
   tau-t:         0.1
annealing:          No
annealing-npoints:           0
   acc:	           0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0


Overriding nsteps with value passed on the command line: 10000 steps, 20 ps

Using 1 MPI thread
Using 16 OpenMP threads

1 compatible GPU is present, with ID 0
1 GPU auto-selected for this run.
Mapping of GPU ID to the 1 PP rank in this node: 0

Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's:   NS: 0.996   Coulomb: 0.9   LJ: 0.9
System total charge: 0.000
Generated table with 998 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 998 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 998 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 8.85e-04 size: 1018


NOTE: GROMACS was configured without NVML support hence it can not exploit
      application clocks of the detected Quadro M6000 GPU to improve performance.
      Recompile with the NVML library (compatible with the driver used) or set application clocks manually.


Using GPU 8x8 non-bonded kernels

Removing pbc first time
Pinning threads with an auto-selected logical core stride of 1

Initializing LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------

The number of constraints is 2053

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------

There are: 24040 Atoms

Constraining the starting coordinates (step 0)

Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 1.20e-05
Initial temperature: 297.8 K

Started mdrun on rank 0 Wed Jan 13 17:37:25 2016
           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.44103e+03    5.70375e+03    2.50388e+02    2.00472e+03    1.68037e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.16574e+04   -3.84143e+05    3.38823e+03   -3.09894e+05    5.99548e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.49939e+05   -2.49939e+05    3.00104e+02   -3.53152e+02    2.74372e-05


step 5000: resetting all time and cycle counters

Restarted time on rank 0 Wed Jan 13 17:37:30 2016
           Step           Time         Lambda
          10000       20.00000        0.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.58476e+03    5.58935e+03    2.43991e+02    2.08611e+03    1.69315e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.22963e+04   -3.85657e+05    3.49717e+03   -3.10428e+05    5.94818e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.50946e+05   -2.50372e+05    2.97737e+02   -2.83088e+02    2.85865e-05

	<======  ###############  ==>
	<====  A V E R A G E S  ====>
	<==  ###############  ======>

	Statistics over 10001 steps using 101 frames

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.47830e+03    5.59856e+03    2.59648e+02    2.08591e+03    1.68327e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.21674e+04   -3.85226e+05    3.45553e+03   -3.10348e+05    5.99840e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.50364e+05   -2.50160e+05    3.00251e+02   -2.13938e+02    0.00000e+00

   Total Virial (kJ/mol)
    2.16720e+04    4.09824e+01   -8.46002e+01
    4.08116e+01    2.15351e+04    7.10434e+01
   -8.39866e+01    7.12612e+01    2.15554e+04

   Pressure (bar)
   -2.21857e+02   -5.57502e+00    7.41399e+00
   -5.55208e+00   -2.07590e+02   -1.19342e+01
    7.33157e+00   -1.19634e+01   -2.12367e+02


	M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 Pair Search distance check             372.519600        3352.676     0.1
 NxN Ewald Elec. + LJ [F]             79468.710592     5244934.899    92.3
 NxN Ewald Elec. + LJ [V&F]             818.982464       87631.124     1.5
 1,4 nonbonded interactions              26.710341        2403.931     0.0
 Calc Weights                           360.672120       12984.196     0.2
 Spread Q Bspline                      7694.338560       15388.677     0.3
 Gather F Bspline                      7694.338560       46166.031     0.8
 3D-FFT                               30602.049186      244816.393     4.3
 Solve PME                               15.683136        1003.721     0.0
 Shift-X                                  3.029040          18.174     0.0
 Angles                                  18.523704        3111.982     0.1
 Propers                                 27.915582        6392.668     0.1
 Impropers                                2.110422         438.968     0.0
 Virial                                   1.228335          22.110     0.0
 Stop-CM                                  1.226040          12.260     0.0
 Calc-Ekin                               24.064040         649.729     0.0
 Lincs                                   10.267053         616.023     0.0
 Lincs-Mat                              222.284448         889.138     0.0
 Constraint-V                           130.596114        1044.769     0.0
 Constraint-Vir                           1.227111          29.451     0.0
 Settle                                  36.687336       11850.010     0.2
-----------------------------------------------------------------------------
 Total                                                 5683756.931   100.0
-----------------------------------------------------------------------------


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 16 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1   16        126       0.140          6.732   3.1
 Launch GPU ops.        1   16       5001       0.173          8.326   3.8
 Force                  1   16       5001       0.536         25.729  11.8
 PME mesh               1   16       5001       2.695        129.434  59.2
 Wait GPU local         1   16       5001       0.041          1.959   0.9
 NB X/F buffer ops.     1   16       9876       0.115          5.520   2.5
 Update                 1   16       5001       0.199          9.569   4.4
 Constraints            1   16       5001       0.594         28.510  13.0
 Rest                                           0.062          2.993   1.4
-----------------------------------------------------------------------------
 Total                                          4.556        218.774 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread/gather      1   16      10002       1.540         73.956  33.8
 PME 3D-FFT             1   16      10002       1.004         48.205  22.0
 PME solve Elec         1   16       5001       0.139          6.665   3.0
-----------------------------------------------------------------------------
 Breakdown of PP computation
-----------------------------------------------------------------------------
 NS grid local          1   16        126       0.029          1.391   0.6
 NS search local        1   16        126       0.102          4.879   2.2
 Bonded F               1   16       5001       0.301         14.442   6.6
 Listed buffer ops.     1   16       5001       0.168          8.082   3.7
 NB X buffer ops.       1   16       4875       0.056          2.668   1.2
 NB F buffer ops.       1   16       5001       0.059          2.830   1.3
-----------------------------------------------------------------------------

 GPU timings
-----------------------------------------------------------------------------
 Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                        126       0.014        0.109     0.5
 X / q H2D                           5001       0.199        0.040     7.6
 Nonbonded F kernel                  4850       2.154        0.444    82.0
 Nonbonded F+ene k.                    25       0.017        0.680     0.6
 Nonbonded F+prune k.                 100       0.058        0.579     2.2
 Nonbonded F+ene+prune k.              26       0.021        0.822     0.8
 F D2H                               5001       0.162        0.032     6.2
-----------------------------------------------------------------------------
 Total                                          2.625        0.525   100.0
-----------------------------------------------------------------------------

Force evaluation time GPU/CPU: 0.525 ms/0.646 ms = 0.812
For optimal performance this ratio should be close to 1!

               Core t (s)   Wall t (s)        (%)
       Time:       72.644        4.556     1594.6
                 (ns/day)    (hour/ns)
Performance:      189.689        0.127
Finished mdrun on rank 0 Wed Jan 13 17:37:35 2016
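
The summary figures at the end of the log can be cross-checked against the timing tables above them. The following Python sketch is not part of the mdrun output; it only redoes the arithmetic with values copied from this log. The function and constant names are illustrative. Treating the CPU force time as the sum of the "Force" and "PME mesh" rows is an assumption inferred from the numbers (it reproduces the logged 0.646 ms), not something the log states. Small differences from the logged values are expected because the log prints rounded wall times.

```python
# Values copied from this log (counters were reset at step 5000).
STEPS_TIMED = 5001      # call count of the "Force" row
DT_PS = 0.002           # dt from the input parameters, in ps
WALL_S = 4.556          # "Total" wall time of the cycle accounting table
CORE_S = 72.644         # "Core t (s)" from the final summary
GPU_TOTAL_S = 2.625     # "Total" of the GPU timings table
CPU_FORCE_S = 0.536     # "Force" row
CPU_PME_S = 2.695       # "PME mesh" row


def ns_per_day(steps, dt_ps, wall_s):
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = steps * dt_ps / 1000.0
    return simulated_ns * 86400.0 / wall_s


def gpu_cpu_ratio(gpu_s, cpu_force_s, cpu_pme_s):
    """GPU/CPU force time ratio.

    Assumption: the CPU side is Force + PME mesh; this is inferred
    from the logged numbers, not taken from GROMACS documentation.
    """
    return gpu_s / (cpu_force_s + cpu_pme_s)


perf = ns_per_day(STEPS_TIMED, DT_PS, WALL_S)
hours_per_ns = 24.0 / perf
ratio = gpu_cpu_ratio(GPU_TOTAL_S, CPU_FORCE_S, CPU_PME_S)
cpu_ms_per_step = 1000.0 * (CPU_FORCE_S + CPU_PME_S) / STEPS_TIMED
gpu_ms_per_step = 1000.0 * GPU_TOTAL_S / STEPS_TIMED
core_util_percent = 100.0 * CORE_S / WALL_S

print(f"performance : {perf:.3f} ns/day, {hours_per_ns:.3f} hour/ns")
print(f"GPU/CPU     : {gpu_ms_per_step:.3f} ms / {cpu_ms_per_step:.3f} ms = {ratio:.3f}")
print(f"core usage  : {core_util_percent:.1f} %")
```

A ratio below 1 means the GPU finishes its non-bonded work before the CPU finishes its own force work; in this run the PME mesh row accounts for most of the CPU wall time.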