test_1x12_1task_noT.log

wrong tuning - Szilárd Páll, 12/16/2015 03:22 PM

 
Log file opened on Wed Dec 16 15:09:17 2015
Host: tcbs21  pid: 13341  rank ID: 0  number of ranks:  1
     :-) GROMACS - gmx mdrun, VERSION 5.2-dev-20151215-ccf04b2-unknown (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov  Herman J.C. Berendsen    Par Bjelkmar
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra    Gerrit Groenhof
 Christoph Junghans   Anca Hamuraru    Vincent Hindriksen Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson
  Justin A. Lemkul   Magnus Lundborg   Pieter Meulenhoff    Erik Marklund
   Teemu Murtola       Szilard Pall       Sander Pronk      Roland Schulz
  Alexey Shvetsov     Michael Shirts     Alfons Sijbers     Peter Tieleman
  Teemu Virolainen  Christian Wennberg    Maarten Wolf
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, VERSION 5.2-dev-20151215-ccf04b2-unknown
Executable:   /nethome/pszilard-projects/gromacs/tmp/gromacs-master_multi-NB/build_sb_gcc48_cuda75/bin/gmx
Data prefix:  /nethome/pszilard/projects/gromacs/tmp/gromacs-master_multi-NB (source tree)
Command line:
  gmx mdrun -quiet -v -resethway -noconfout -pin on -nsteps 10000 -s topol -ntmpi 1 -ntomp 12 -g test_1x12_1task_noT -gpu_id 0

    
GROMACS version:    VERSION 5.2-dev-20151215-ccf04b2-unknown
GIT SHA1 hash:      ccf04b2a5c9009eb0ba89f170bab998e602590d8
Branched from:      unknown
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:        enabled
OpenCL support:     disabled
invsqrt routine:    gmx_software_invsqrt(x)
SIMD instructions:  AVX_256
FFT library:        fftw-3.3.4-sse2-avx
RDTSCP usage:       enabled
TNG support:        enabled
Tracing support:    disabled
Built on:           Mon Dec 14 21:07:30 CET 2015
Built by:           pszilard@tcbs21 [CMAKE]
Build OS/arch:      Linux 3.13.0-71-generic x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
Build CPU family:   6   Model: 45   Stepping: 7
Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/gcc-4.8 GNU 4.8.1
C compiler flags:    -mavx    -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds
C++ compiler:       /usr/bin/g++-4.8 GNU 4.8.1
C++ compiler flags:  -mavx   -std=c++0x  -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds
CUDA compiler:      /opt/tcbsys/cuda/7.5/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2015 NVIDIA Corporation;Built on Tue_Aug_11_14:27:32_CDT_2015;Cuda compilation tools, release 7.5, V7.5.17
CUDA compiler flags:-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_52,code=sm_52;-use_fast_math;-ccbin=/usr/bin/gcc-4.8;;;-Xcompiler;,-mavx,,,,,-Wextra,-Wno-missing-field-initializers,-Wpointer-arith,-Wall,-Wno-unused-function,-fopenmp;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,-Wno-array-bounds,;
CUDA driver:        7.50
CUDA runtime:       7.50

    

    
Running on 1 node with total 12 cores, 24 logical cores, 2 compatible GPUs
Hardware detected:
  CPU info:
    Vendor: GenuineIntel
    Brand:  Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
    Family:  6  model: 45  stepping:  7
    CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    SIMD instructions most likely to fit this hardware: AVX_256
    SIMD instructions selected at GROMACS compile time: AVX_256
  GPU info:
    Number of GPUs detected: 2
    #0: NVIDIA Tesla K20c, compute cap.: 3.5, ECC: yes, stat: compatible
    #1: NVIDIA Tesla K20c, compute cap.: 3.5, ECC: yes, stat: compatible

    

    
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

    

    
For optimal performance with a GPU nstlist (now 10) should be larger.
The optimum depends on your CPU and GPU resources.
You might want to try several nstlist values.
Changing nstlist from 10 to 40, rlist from 0.9 to 0.996

    
Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.002
   nsteps                         = 10000
   init-step                      = 0
   simulation-part                = 1
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = 4200386634
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 0
   nstvout                        = 0
   nstfout                        = 0
   nstlog                         = 0
   nstcalcenergy                  = 100
   nstenergy                      = 500
   nstxout-compressed             = 0
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 40
   ns-type                        = Grid
   pbc                            = xyz
   periodic-molecules             = FALSE
   verlet-buffer-tolerance        = 0.005
   rlist                          = 0.996
   rlistlong                      = 0.996
   nstcalclr                      = 10
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 0.9
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 0.9
   DispCorr                       = No
   table-extension                = 1
   fourierspacing                 = 0.1125
   fourier-nx                     = 56
   fourier-ny                     = 56
   fourier-nz                     = 56
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   implicit-solvent               = No
   gb-algorithm                   = Still
   nstgbradii                     = 1
   rgbradii                       = 1
   gb-epsilon-solvent             = 80
   gb-saltconc                    = 0
   gb-obc-alpha                   = 1
   gb-obc-beta                    = 0.8
   gb-obc-gamma                   = 4.85
   gb-dielectric-offset           = 0.009
   sa-algorithm                   = Ace-approximation
   sa-surface-tension             = 2.05016
   tcoupl                         = V-rescale
   nsttcouple                     = 10
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = FALSE
   pcoupl                         = No
   pcoupltype                     = Isotropic
   nstpcouple                     = -1
   tau-p                          = 1
   compressibility (3x3):
      compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   ref-p (3x3):
      ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord-scaling               = No
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = FALSE
   QMconstraints                  = 0
   QMMMscheme                     = 0
   MMChargeScaleFactor            = 1
qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = FALSE
   Shake-SOR                      = FALSE
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = FALSE
   rotation                       = FALSE
   interactiveMD                  = FALSE
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = FALSE
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = FALSE
   E-x:
      n = 0
   E-xt:
      n = 0
   E-y:
      n = 0
   E-yt:
      n = 0
   E-z:
      n = 0
   E-zt:
      n = 0
   swapcoords                     = no
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
grpopts:
   nrdf:       48056
   ref-t:         300
   tau-t:         0.1
annealing:          No
annealing-npoints:           0
   acc:	           0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0

    

    
Overriding nsteps with value passed on the command line: 10000 steps, 20 ps

Using 1 MPI thread
Using 12 OpenMP threads

1 GPU user-selected for this run.
Number of tasks per PP rank: 1
Mapping of GPU ID to the 1 PP rank in this node: 0

Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's:   NS: 0.996   Coulomb: 0.9   LJ: 0.9
System total charge: 0.000
Generated table with 998 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 998 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 998 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 998 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 8.85e-04 size: 1018


NOTE: GROMACS was configured without NVML support hence it can not exploit
      application clocks of the detected Tesla K20c GPU to improve performance.
      Recompile with the NVML library (compatible with the driver used) or set application clocks manually.


Using GPU 8x8 non-bonded kernels

Removing pbc first time
Pinning threads with an auto-selected logical core stride of 2

Initializing LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------

The number of constraints is 2053

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------

There are: 24040 Atoms

Constraining the starting coordinates (step 0)

Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 1.20e-05
Initial temperature: 297.8 K

    
Started mdrun on rank 0 Wed Dec 16 15:09:19 2015
           Step           Time
              0        0.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.44103e+03    5.70375e+03    2.50388e+02    2.00472e+03    1.68037e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.16575e+04   -3.84143e+05    3.38823e+03   -3.09894e+05    5.99548e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.49939e+05   -2.49939e+05    3.00104e+02   -3.53127e+02    2.74409e-05

step   80: timed with pme grid 56 56 56, coulomb cutoff 0.900: 167.6 M-cycles
step  160: timed with pme grid 48 48 48, coulomb cutoff 1.046: 149.1 M-cycles
step  240: timed with pme grid 44 44 44, coulomb cutoff 1.141: 169.0 M-cycles
step  320: timed with pme grid 48 48 48, coulomb cutoff 1.046: 145.6 M-cycles
step  400: timed with pme grid 48 48 48, coulomb cutoff 1.046: 143.2 M-cycles
              optimal pme grid 48 48 48, coulomb cutoff 1.046
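The timing lines above can be compared mechanically when checking a suspected tuning problem. The following is a minimal sketch, assuming the `step N: timed with pme grid ...` line format shown in this log; `parse_pme_timings` and `fastest_setup` are hypothetical helper names and are not part of GROMACS. Picking the lowest timing per setup is an assumption about how to read these lines, not a description of mdrun's selection logic.

```python
import re

# Pattern for the PME tuning lines as they appear in this log (assumed format).
TIMING_RE = re.compile(
    r"step\s+(\d+): timed with pme grid (\d+) (\d+) (\d+), "
    r"coulomb cutoff ([\d.]+): ([\d.]+) M-cycles"
)


def parse_pme_timings(log_text):
    """Return a list of (step, grid, cutoff_nm, mcycles) tuples."""
    timings = []
    for m in TIMING_RE.finditer(log_text):
        step = int(m.group(1))
        grid = (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        timings.append((step, grid, float(m.group(5)), float(m.group(6))))
    return timings


def fastest_setup(timings):
    """Return (grid, cutoff_nm, mcycles) for the setup with the lowest timing."""
    best = {}
    for _step, grid, cutoff, mcycles in timings:
        key = (grid, cutoff)
        # Keep the best (lowest) timing seen for each grid/cut-off pair.
        best[key] = min(mcycles, best.get(key, mcycles))
    (grid, cutoff), mcycles = min(best.items(), key=lambda kv: kv[1])
    return grid, cutoff, mcycles


# The five timing lines copied from this log.
SAMPLE = """\
step   80: timed with pme grid 56 56 56, coulomb cutoff 0.900: 167.6 M-cycles
step  160: timed with pme grid 48 48 48, coulomb cutoff 1.046: 149.1 M-cycles
step  240: timed with pme grid 44 44 44, coulomb cutoff 1.141: 169.0 M-cycles
step  320: timed with pme grid 48 48 48, coulomb cutoff 1.046: 145.6 M-cycles
step  400: timed with pme grid 48 48 48, coulomb cutoff 1.046: 143.2 M-cycles
"""

if __name__ == "__main__":
    print(fastest_setup(parse_pme_timings(SAMPLE)))
```

For this log the lowest timing belongs to the 48 48 48 grid with a 1.046 nm cut-off, which matches the setup reported as optimal.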

    
step 5000: resetting all time and cycle counters

Restarted time on rank 0 Wed Dec 16 15:09:29 2015
           Step           Time
          10000       20.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.58275e+03    5.60380e+03    2.96228e+02    2.06968e+03    1.66270e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.25107e+04   -3.84016e+05    1.93813e+03   -3.10388e+05    5.97256e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.50663e+05   -2.50367e+05    2.98957e+02   -2.23611e+02    2.65203e-05

	<======  ###############  ==>
	<====  A V E R A G E S  ====>
	<==  ###############  ======>

	Statistics over 10001 steps using 101 frames

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.50231e+03    5.60528e+03    2.60447e+02    2.08015e+03    1.68064e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    4.21299e+04   -3.83599e+05    1.97887e+03   -3.10236e+05    5.99816e+04
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -2.50254e+05   -2.50144e+05    3.00238e+02   -2.16515e+02    0.00000e+00

   Total Virial (kJ/mol)
    2.14860e+04    1.36786e+02   -7.52152e+01
    1.37741e+02    2.17058e+04   -2.12420e+02
   -7.48880e+01   -2.12187e+02    2.16259e+04

   Pressure (bar)
   -1.95481e+02   -1.74552e+01    7.82176e+00
   -1.75835e+01   -2.31468e+02    2.44576e+01
    7.77782e+00    2.44263e+01   -2.22597e+02

    

    
       P P   -   P M E   L O A D   B A L A N C I N G

 PP/PME load balancing changed the cut-off and PME settings:
           particle-particle                    PME
            rcoulomb  rlist            grid      spacing   1/beta
   initial  0.900 nm  0.996 nm      56  56  56   0.112 nm  0.288 nm
   final    1.046 nm  1.142 nm      48  48  48   0.131 nm  0.335 nm
 cost-ratio           1.51             0.63
 (note that these numbers concern only part of the total PP and PME load)
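The cost-ratio row is numerically consistent with simple cubic scaling of the values in the table: the pair-list volume grows with rlist cubed, and the PME work with the number of grid points. The sketch below only checks that arithmetic. That mdrun derives the ratios exactly this way is an assumption, and `pp_cost_ratio` and `pme_cost_ratio` are hypothetical helper names.

```python
def pp_cost_ratio(rlist_initial_nm, rlist_final_nm):
    """Ratio of pair-list volumes, assuming cost scales with rlist cubed."""
    return (rlist_final_nm / rlist_initial_nm) ** 3


def pme_cost_ratio(grid_initial, grid_final):
    """Ratio of PME grid point counts (nx * ny * nz)."""
    points_initial = grid_initial[0] * grid_initial[1] * grid_initial[2]
    points_final = grid_final[0] * grid_final[1] * grid_final[2]
    return points_final / points_initial


if __name__ == "__main__":
    # Values taken from the "initial" and "final" rows of the table above.
    print(round(pp_cost_ratio(0.996, 1.142), 2))
    print(round(pme_cost_ratio((56, 56, 56), (48, 48, 48)), 2))
```

Rounded to two decimals, both values agree with the 1.51 and 0.63 printed in the log.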

    

    
	M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 Pair Search distance check             401.414112        3612.727     0.0
 NxN Ewald Elec. + LJ [F]            109528.128000     7228856.448    95.1
 NxN Ewald Elec. + LJ [V&F]            1128.398272      120738.615     1.6
 1,4 nonbonded interactions              26.710341        2403.931     0.0
 Calc Weights                           360.672120       12984.196     0.2
 Spread Q Bspline                      7694.338560       15388.677     0.2
 Gather F Bspline                      7694.338560       46166.031     0.6
 3D-FFT                               18533.265912      148266.127     1.9
 Solve PME                               11.522304         737.427     0.0
 Shift-X                                  3.029040          18.174     0.0
 Angles                                  18.523704        3111.982     0.0
 Propers                                 27.915582        6392.668     0.1
 Impropers                                2.110422         438.968     0.0
 Virial                                   1.228335          22.110     0.0
 Stop-CM                                  1.226040          12.260     0.0
 Calc-Ekin                               24.064040         649.729     0.0
 Lincs                                   10.267053         616.023     0.0
 Lincs-Mat                              222.284448         889.138     0.0
 Constraint-V                           130.596114        1044.769     0.0
 Constraint-Vir                           1.227111          29.451     0.0
 Settle                                  36.687336       11850.010     0.2
-----------------------------------------------------------------------------
 Total                                                 7604229.463   100.0
-----------------------------------------------------------------------------

    

    
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 12 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1   12        126       0.248          5.964   2.8
 Launch GPU ops.        1   12       5001       0.295          7.075   3.3
 Force                  1   12       5001       0.852         20.456   9.5
 PME mesh               1   12       5001       4.119         98.865  45.7
 Wait GPU local         1   12       5001       1.731         41.552  19.2
 NB X/F buffer ops.     1   12       9876       0.303          7.262   3.4
 Update                 1   12       5001       0.268          6.444   3.0
 Constraints            1   12       5001       1.100         26.400  12.2
 Rest                                           0.094          2.250   1.0
-----------------------------------------------------------------------------
 Total                                          9.010        216.269 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread/gather      1   12      10002       2.874         68.987  31.9
 PME 3D-FFT             1   12      10002       1.102         26.449  12.2
 PME solve Elec         1   12       5001       0.119          2.855   1.3
-----------------------------------------------------------------------------
 Breakdown of PP computation
-----------------------------------------------------------------------------
 NS grid local          1   12        126       0.055          1.316   0.6
 NS search local        1   12        126       0.179          4.307   2.0
 Bonded F               1   12       5001       0.482         11.579   5.4
 Listed buffer ops.     1   12       5001       0.031          0.739   0.3
 NB X buffer ops.       1   12       4875       0.139          3.337   1.5
 NB F buffer ops.       1   12       5001       0.162          3.892   1.8
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:      107.972        9.010     1198.3
                 (ns/day)    (hour/ns)
Performance:       95.908        0.250
Finished mdrun on rank 0 Wed Dec 16 15:09:38 2015
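
The performance figures can be cross-checked from numbers printed elsewhere in this log. A minimal sketch, assuming that the 5001 steps counted after the reset at step 5000 (the call count in the accounting table), the 0.002 ps time step and the 9.010 s wall time are the relevant inputs; `ns_per_day` and `hours_per_ns` are hypothetical helper names. The wall time is only printed to three decimals, so the result should agree with the logged value only to within rounding.

```python
SECONDS_PER_DAY = 86400.0


def ns_per_day(n_steps, dt_ps, wall_s):
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = n_steps * dt_ps / 1000.0  # convert ps to ns
    return simulated_ns / wall_s * SECONDS_PER_DAY


def hours_per_ns(ns_day):
    """Wall-clock hours needed to simulate one nanosecond."""
    return 24.0 / ns_day


if __name__ == "__main__":
    rate = ns_per_day(n_steps=5001, dt_ps=0.002, wall_s=9.010)
    print(f"{rate:.3f} ns/day, {hours_per_ns(rate):.3f} hour/ns")
```

The computed rate differs from the logged 95.908 ns/day by less than 0.01, which is within what the rounded wall time allows.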