_test3.log

Szilárd Páll, 12/16/2015 03:27 PM

 
Log file opened on Wed Dec 16 15:23:16 2015
Host: tcbs21  pid: 13912  rank ID: 0  number of ranks:  1
     :-) GROMACS - gmx mdrun, VERSION 5.2-dev-20151215-ccf04b2-unknown (-:
    
                            GROMACS is written by:
     Emile Apol      Rossen Apostolov  Herman J.C. Berendsen    Par Bjelkmar   
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra    Gerrit Groenhof  
 Christoph Junghans   Anca Hamuraru    Vincent Hindriksen Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson    
  Justin A. Lemkul   Magnus Lundborg   Pieter Meulenhoff    Erik Marklund   
   Teemu Murtola       Szilard Pall       Sander Pronk      Roland Schulz   
  Alexey Shvetsov     Michael Shirts     Alfons Sijbers     Peter Tieleman  
  Teemu Virolainen  Christian Wennberg    Maarten Wolf   
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
    
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
    
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
    
GROMACS:      gmx mdrun, VERSION 5.2-dev-20151215-ccf04b2-unknown
Executable:   /nethome/pszilard-projects/gromacs/tmp/gromacs-master_multi-NB/build_sb_gcc48_cuda75/bin/gmx
Data prefix:  /nethome/pszilard/projects/gromacs/tmp/gromacs-master_multi-NB (source tree)
Command line:
  gmx mdrun -quiet -v -resethway -noconfout -pin on -nsteps 10000 -s topol -ntmpi 1 -ntomp 12 -g _test3
    
GROMACS version:    VERSION 5.2-dev-20151215-ccf04b2-unknown
GIT SHA1 hash:      ccf04b2a5c9009eb0ba89f170bab998e602590d8
Branched from:      unknown
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:        enabled
OpenCL support:     disabled
invsqrt routine:    gmx_software_invsqrt(x)
SIMD instructions:  AVX_256
FFT library:        fftw-3.3.4-sse2-avx
RDTSCP usage:       enabled
TNG support:        enabled
Tracing support:    disabled
Built on:           Mon Dec 14 21:07:30 CET 2015
Built by:           pszilard@tcbs21 [CMAKE]
Build OS/arch:      Linux 3.13.0-71-generic x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
Build CPU family:   6   Model: 45   Stepping: 7
Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/gcc-4.8 GNU 4.8.1
C compiler flags:    -mavx    -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds 
C++ compiler:       /usr/bin/g++-4.8 GNU 4.8.1
C++ compiler flags:  -mavx   -std=c++0x  -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds 
CUDA compiler:      /opt/tcbsys/cuda/7.5/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2015 NVIDIA Corporation;Built on Tue_Aug_11_14:27:32_CDT_2015;Cuda compilation tools, release 7.5, V7.5.17
CUDA compiler flags:-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_52,code=sm_52;-use_fast_math;-ccbin=/usr/bin/gcc-4.8;;;-Xcompiler;,-mavx,,,,,-Wextra,-Wno-missing-field-initializers,-Wpointer-arith,-Wall,-Wno-unused-function,-fopenmp;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,-Wno-array-bounds,; 
CUDA driver:        7.50
CUDA runtime:       7.50
    
    
Running on 1 node with total 12 cores, 24 logical cores, 2 compatible GPUs
Hardware detected:
  CPU info:
    Vendor: GenuineIntel
    Brand:  Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
    Family:  6  model: 45  stepping:  7
    CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    SIMD instructions most likely to fit this hardware: AVX_256
    SIMD instructions selected at GROMACS compile time: AVX_256
  GPU info:
    Number of GPUs detected: 2
    #0: NVIDIA Tesla K20c, compute cap.: 3.5, ECC: yes, stat: compatible
    #1: NVIDIA Tesla K20c, compute cap.: 3.5, ECC: yes, stat: compatible
    
    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------
    
    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------
    
    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------
    
    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------
    
    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------
    
    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------
    
    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
    
    
For optimal performance with a GPU nstlist (now 10) should be larger.
The optimum depends on your CPU and GPU resources.
You might want to try several nstlist values.
Changing nstlist from 10 to 40, rlist from 0.9 to 0.996
    
Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.002
   nsteps                         = 10000
   init-step                      = 0
   simulation-part                = 1
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = 4200386634
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 0
   nstvout                        = 0
   nstfout                        = 0
   nstlog                         = 0
   nstcalcenergy                  = 100
   nstenergy                      = 500
   nstxout-compressed             = 0
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 40
   ns-type                        = Grid
   pbc                            = xyz
   periodic-molecules             = FALSE
   verlet-buffer-tolerance        = 0.005
   rlist                          = 0.996
   rlistlong                      = 0.996
   nstcalclr                      = 10
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 0.9
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 0.9
   DispCorr                       = No
   table-extension                = 1
   fourierspacing                 = 0.1125
   fourier-nx                     = 56
   fourier-ny                     = 56
   fourier-nz                     = 56
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   implicit-solvent               = No
   gb-algorithm                   = Still
   nstgbradii                     = 1
   rgbradii                       = 1
   gb-epsilon-solvent             = 80
   gb-saltconc                    = 0
   gb-obc-alpha                   = 1
   gb-obc-beta                    = 0.8
   gb-obc-gamma                   = 4.85
   gb-dielectric-offset           = 0.009
   sa-algorithm                   = Ace-approximation
   sa-surface-tension             = 2.05016
   tcoupl                         = V-rescale
   nsttcouple                     = 10
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = FALSE
   pcoupl                         = No
   pcoupltype                     = Isotropic
   nstpcouple                     = -1
   tau-p                          = 1
   compressibility (3x3):
      compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   ref-p (3x3):
      ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord-scaling               = No
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = FALSE
   QMconstraints                  = 0
   QMMMscheme                     = 0
   MMChargeScaleFactor            = 1
qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = FALSE
   Shake-SOR                      = FALSE
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = FALSE
   rotation                       = FALSE
   interactiveMD                  = FALSE
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = FALSE
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = FALSE
   E-x:
      n = 0
   E-xt:
      n = 0
   E-y:
      n = 0
   E-yt:
      n = 0
   E-z:
      n = 0
   E-zt:
      n = 0
   swapcoords                     = no
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
grpopts:
   nrdf:       48056
   ref-t:         300
   tau-t:         0.1
annealing:          No
annealing-npoints:           0
   acc:	           0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0
    
    
Overriding nsteps with value passed on the command line: 10000 steps, 20 ps
    
Using 1 MPI thread
Using 12 OpenMP threads 
    
2 compatible GPUs are present, with IDs 0,1
1 GPU auto-selected for this run.
Number of tasks per PP rank: 1
Mapping of GPU ID to the 1 PP rank in this node: 0
    
    
NOTE: potentially sub-optimal launch configuration, gmx mdrun started with less
      non-bonded tasks across PP thread-MPI thread than GPUs available.
      Each PP thread-MPI thread is requested to use 1 GPU, 1 GPU will be used.
    
Will do PME sum in reciprocal space for electrostatic interactions.
    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen 
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------
    
Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's:   NS: 0.996   Coulomb: 0.9   LJ: 0.9
System total charge: 0.000
Generated table with 998 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 998 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 998 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 8.85e-04 size: 1018
    
    
NOTE: GROMACS was configured without NVML support hence it can not exploit
      application clocks of the detected Tesla K20c GPU to improve performance.
      Recompile with the NVML library (compatible with the driver used) or set application clocks manually.
    
    
Using GPU 8x8 non-bonded kernels
    
Removing pbc first time
Pinning threads with an auto-selected logical core stride of 2
    
Initializing LINear Constraint Solver
    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------
    
The number of constraints is 2053
    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------
    
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest
    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------
    
There are: 24040 Atoms
    
Constraining the starting coordinates (step 0)
    
Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 1.20e-05
Initial temperature: 297.8 K
    
Started mdrun on rank 0 Wed Dec 16 15:23:18 2015
           Step           Time
              0        0.00000
    
   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.44103e+03    5.70375e+03    2.50388e+02    2.00472e+03    1.68037e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.16575e+04   -3.84143e+05    3.38823e+03   -3.09894e+05    5.99548e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.49939e+05   -2.49939e+05    3.00104e+02   -3.53129e+02    2.74409e-05
    
step   80: timed with pme grid 56 56 56, coulomb cutoff 0.900: 153.9 M-cycles
step  160: timed with pme grid 48 48 48, coulomb cutoff 1.046: 150.9 M-cycles
step  240: timed with pme grid 44 44 44, coulomb cutoff 1.141: 168.6 M-cycles
step  320: timed with pme grid 40 40 40, coulomb cutoff 1.255: 192.3 M-cycles
step  400: timed with pme grid 42 42 42, coulomb cutoff 1.196: 173.7 M-cycles
step  480: timed with pme grid 44 44 44, coulomb cutoff 1.141: 162.4 M-cycles
step  560: timed with pme grid 48 48 48, coulomb cutoff 1.046: 144.7 M-cycles
step  640: timed with pme grid 52 52 52, coulomb cutoff 0.966: 133.3 M-cycles
step  720: timed with pme grid 48 48 48, coulomb cutoff 1.046: 146.9 M-cycles
step  800: timed with pme grid 52 52 52, coulomb cutoff 0.966: 130.5 M-cycles
step  880: timed with pme grid 48 48 48, coulomb cutoff 1.046: 148.4 M-cycles
step  960: timed with pme grid 52 52 52, coulomb cutoff 0.966: 129.9 M-cycles
              optimal pme grid 52 52 52, coulomb cutoff 0.966
    
step 5000: resetting all time and cycle counters
    
Restarted time on rank 0 Wed Dec 16 15:23:30 2015
           Step           Time
          10000       20.00000
    
   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.41166e+03    5.58003e+03    2.67546e+02    2.09612e+03    1.66252e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.25887e+04   -3.84838e+05    2.69867e+03   -3.10570e+05    5.98998e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.50670e+05   -2.50374e+05    2.99829e+02   -1.75286e+02    2.75183e-05
    
	<======  ###############  ==>
	<====  A V E R A G E S  ====>
	<==  ###############  ======>
    
	Statistics over 10001 steps using 101 frames
    
   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.45182e+03    5.58001e+03    2.57706e+02    2.08478e+03    1.68188e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.20979e+04   -3.83936e+05    2.59784e+03   -3.10047e+05    5.99351e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.50112e+05   -2.50151e+05    3.00006e+02   -1.83562e+02    0.00000e+00
    
   Total Virial (kJ/mol)
    2.12579e+04    4.14745e+01    1.20121e+02
    4.15914e+01    2.12142e+04    5.61588e+01
    1.20064e+02    5.53039e+01    2.15631e+04
    
   Pressure (bar)
   -1.69704e+02   -4.77370e+00   -1.72608e+01
   -4.78941e+00   -1.65949e+02   -9.46455e+00
   -1.72531e+01   -9.34973e+00   -2.15033e+02
    
    
       P P   -   P M E   L O A D   B A L A N C I N G
    
 PP/PME load balancing changed the cut-off and PME settings:
           particle-particle                    PME
            rcoulomb  rlist            grid      spacing   1/beta
   initial  0.900 nm  0.996 nm      56  56  56   0.112 nm  0.288 nm
   final    0.966 nm  1.062 nm      52  52  52   0.121 nm  0.309 nm
 cost-ratio           1.21             0.80
 (note that these numbers concern only part of the total PP and PME load)
    
    
	M E G A - F L O P S   A C C O U N T I N G
    
 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only
    
 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 Pair Search distance check             385.499568        3469.496     0.1
 NxN Ewald Elec. + LJ [F]             92663.234048     6115773.447    93.8
 NxN Ewald Elec. + LJ [V&F]             954.844352      102168.346     1.6
 1,4 nonbonded interactions              26.710341        2403.931     0.0
 Calc Weights                           360.672120       12984.196     0.2
 Spread Q Bspline                      7694.338560       15388.677     0.2
 Gather F Bspline                      7694.338560       46166.031     0.7
 3D-FFT                               24050.629164      192405.033     3.0
 Solve PME                               13.522704         865.453     0.0
 Shift-X                                  3.029040          18.174     0.0
 Angles                                  18.523704        3111.982     0.0
 Propers                                 27.915582        6392.668     0.1
 Impropers                                2.110422         438.968     0.0
 Virial                                   1.228335          22.110     0.0
 Stop-CM                                  1.226040          12.260     0.0
 Calc-Ekin                               24.064040         649.729     0.0
 Lincs                                   10.267053         616.023     0.0
 Lincs-Mat                              222.284448         889.138     0.0
 Constraint-V                           130.596114        1044.769     0.0
 Constraint-Vir                           1.227111          29.451     0.0
 Settle                                  36.687336       11850.010     0.2
-----------------------------------------------------------------------------
 Total                                                 6516699.893   100.0
-----------------------------------------------------------------------------
    
    
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
    
On 1 MPI rank, each using 12 OpenMP threads
    
 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1   12        126       0.241          5.777   2.9
 Launch GPU ops.        1   12       5001       0.374          8.981   4.6
 Force                  1   12       5001       0.860         20.632  10.5
 PME mesh               1   12       5001       4.790        114.971  58.3
 Wait GPU local         1   12       5001       0.160          3.844   1.9
 NB X/F buffer ops.     1   12       9876       0.299          7.182   3.6
 Update                 1   12       5001       0.266          6.393   3.2
 Constraints            1   12       5001       1.133         27.195  13.8
 Rest                                           0.091          2.186   1.1
-----------------------------------------------------------------------------
 Total                                          8.214        197.160 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread/gather      1   12      10002       3.001         72.032  36.5
 PME 3D-FFT             1   12      10002       1.603         38.478  19.5
 PME solve Elec         1   12       5001       0.161          3.861   2.0
-----------------------------------------------------------------------------
 Breakdown of PP computation
-----------------------------------------------------------------------------
 NS grid local          1   12        126       0.055          1.314   0.7
 NS search local        1   12        126       0.171          4.108   2.1
 Bonded F               1   12       5001       0.484         11.608   5.9
 Listed buffer ops.     1   12       5001       0.033          0.796   0.4
 NB X buffer ops.       1   12       4875       0.138          3.321   1.7
 NB F buffer ops.       1   12       5001       0.160          3.830   1.9
-----------------------------------------------------------------------------
    
 GPU timings
-----------------------------------------------------------------------------
 Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                        126       0.026        0.209     0.4
 X / q H2D                           5001       0.376        0.075     6.4
 Nonbonded F kernel                  4850       5.024        1.036    85.1
 Nonbonded F+ene k.                    25       0.035        1.419     0.6
 Nonbonded F+prune k.                 100       0.130        1.296     2.2
 Nonbonded F+ene+prune k.              26       0.045        1.725     0.8
 F D2H                               5001       0.270        0.054     4.6
-----------------------------------------------------------------------------
 Total                                          5.906        1.181   100.0
-----------------------------------------------------------------------------
    
Force evaluation time GPU/CPU: 1.181 ms/1.130 ms = 1.045
For optimal performance this ratio should be close to 1!
    
               Core t (s)   Wall t (s)        (%)
       Time:       98.471        8.214     1198.8
                 (ns/day)    (hour/ns)
Performance:      105.205        0.228
Finished mdrun on rank 0 Wed Dec 16 15:23:38 2015