Bug #3097

nbnxm grid issue with regressiontest complex/nbnxn_rzero with gpubufferops path

Added by Mark Abraham about 1 month ago. Updated 25 days ago.

Status: New
Priority: Normal
Assignee: -
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

The nightly tests of master HEAD are running the gpubufferops and gpucomm matrices, and in them we see failures of complex/nbnxn_rzero, e.g. at http://jenkins.gromacs.org/job/Gromacs_Nightly_master/709/

I couldn't reproduce it locally with nbnxn_rzero (obviously with GMX_USE_GPU_BUFFER_OPS=1).

It looks like the failure is after step 19 of a 20-step run.

mdrun.out:
             :-) GROMACS - gmx mdrun, 2020-dev-20190917-003f96f (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar      Christian Blau   Viacheslav Bolnykh     Kevin Boyd    
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra       Alan Gray     
  Gerrit Groenhof     Anca Hamuraru    Vincent Hindriksen  M. Eric Irrgang  
  Aleksei Iupinov   Christoph Junghans     Joe Jordan     Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson    
  Justin A. Lemkul    Viveca Lindahl    Magnus Lundborg     Erik Marklund   
    Pascal Merz     Pieter Meulenhoff    Teemu Murtola       Szilard Pall   
    Sander Pronk      Roland Schulz      Michael Shirts    Alexey Shvetsov  
   Alfons Sijbers     Peter Tieleman      Jon Vincent      Teemu Virolainen 
 Christian Wennberg    Maarten Wolf   
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2018, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, version 2020-dev-20190917-003f96f
Executable:   /home/jenkins/workspace/Matrix_OnDemand/17187d3a/gromacs/bin/gmx
Data prefix:  /home/jenkins/workspace/Matrix_OnDemand/17187d3a/gromacs (source tree)
Working dir:  /mnt/workspace/Matrix_OnDemand/17187d3a/regressiontests/complex/nbnxn_rzero
Command line:
  gmx mdrun -ntmpi 4 -gpu_id 1 -notunepme

NOTE: This run uses the 'GPU buffer ops' feature, enabled by the GMX_USE_GPU_BUFFER_OPS environment variable.
Compiled SIMD: None, but for this host/run AVX2_256 might be better (see log).

The current CPU can measure timings more accurately than the code in
gmx mdrun was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding gmx mdrun with the GMX_USE_RDTSCP=ON CMake option.
Reading file topol.tpr, VERSION 2020-dev-20190917-003f96f (single precision)
Can not increase nstlist because an NVE ensemble is used

Using 4 MPI threads
Using 2 OpenMP threads per tMPI thread

On host bs-nix1 1 GPU selected for this run.
Mapping of GPU IDs to the 4 GPU tasks in the 4 ranks on this node:
  PP:1,PP:1,PP:1,PP:1
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
starting mdrun 'dipoles'
20 steps,      0.0 ps.

-------------------------------------------------------
Program:     gmx mdrun, version 2020-dev-20190917-003f96f
Source file: src/gromacs/nbnxm/grid.cpp (line 1253)
MPI rank:    3 (out of 4)

Fatal error:
grid cell cx -2147483648 cy -2147483648 out of range (max 1 1)
atom nan nan nan, grid->c0 1.650000 0.000000

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

--------------------------------
md.log:
             :-) GROMACS - gmx mdrun, 2020-dev-20190917-003f96f (-:

GROMACS:      gmx mdrun, version 2020-dev-20190917-003f96f
Executable:   /home/jenkins/workspace/Matrix_OnDemand/17187d3a/gromacs/bin/gmx
Data prefix:  /home/jenkins/workspace/Matrix_OnDemand/17187d3a/gromacs (source tree)
Working dir:  /mnt/workspace/Matrix_OnDemand/17187d3a/regressiontests/complex/nbnxn_rzero
Process ID:   2977
Command line:
  gmx mdrun -ntmpi 4 -gpu_id 1 -notunepme

GROMACS version:    2020-dev-20190917-003f96f
GIT SHA1 hash:      003f96f6fb30315252094f57a23a3d33a4f48864
Branched from:      3329a50b64be16240ce188cb599687c2d399a4a1 (333 newer local commits)
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  NONE
FFT library:        fftw-3.3.3-sse2-avx
RDTSCP usage:       disabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
C compiler:         /home/jenkins/bin/gcc-7 GNU 7.3.0
C compiler flags:     -O3 
C++ compiler:       /home/jenkins/bin/g++-7 GNU 7.3.0
C++ compiler flags:   -O3 
CUDA compiler:      /opt/cuda_10.0/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on Sat_Aug_25_21:08:01_CDT_2018;Cuda compilation tools, release 10.0, V10.0.130
CUDA compiler flags:-std=c++14;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_50,code=compute_50;-gencode;arch=compute_52,code=compute_52;-gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;; ;-O3;
CUDA driver:        10.10
CUDA runtime:       10.0

NOTE: This run uses the 'GPU buffer ops' feature, enabled by the GMX_USE_GPU_BUFFER_OPS environment variable.

Running on 1 node with total 4 cores, 8 logical cores, 2 compatible GPUs
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
    Family: 6   Model: 60   Stepping: 3
    Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
  Hardware topology: Basic
    Sockets, cores, and logical processors:
      Socket  0: [   0   4] [   1   5] [   2   6] [   3   7]
  GPU info:
    Number of GPUs detected: 2
    #0: NVIDIA GeForce GT 640, compute cap.: 3.0, ECC:  no, stat: compatible
    #1: NVIDIA GeForce GT 640, compute cap.: 3.0, ECC:  no, stat: compatible

Highest SIMD level requested by all nodes in run: AVX2_256
SIMD instructions selected at compile time:       None
This program was compiled for different hardware than you are running on,
which could influence performance.

The current CPU can measure timings more accurately than the code in
gmx mdrun was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding gmx mdrun with the GMX_USE_RDTSCP=ON CMake option.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.002
   nsteps                         = 20
   init-step                      = 0
   simulation-part                = 1
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = 1993
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 20
   nstvout                        = 20
   nstfout                        = 20
   nstlog                         = 0
   nstcalcenergy                  = 20
   nstenergy                      = 20
   nstxout-compressed             = 0
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 20
   pbc                            = xyz
   periodic-molecules             = false
   verlet-buffer-tolerance        = -1
   rlist                          = 1.08
   coulombtype                    = Reaction-Field
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 1
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 1
   DispCorr                       = No
   table-extension                = 1
   fourierspacing                 = 0.12
   fourier-nx                     = 0
   fourier-ny                     = 0
   fourier-nz                     = 0
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   tcoupl                         = No
   nsttcouple                     = -1
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = false
   pcoupl                         = No
   pcoupltype                     = Isotropic
   nstpcouple                     = -1
   tau-p                          = 1
   compressibility (3x3):
      compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   ref-p (3x3):
      ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord-scaling               = No
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = false
   QMconstraints                  = 0
   QMMMscheme                     = 0
   MMChargeScaleFactor            = 1
qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = false
   Shake-SOR                      = false
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = false
   awh                            = false
   rotation                       = false
   interactiveMD                  = false
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = false
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = false
   swapcoords                     = no
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
   applied-forces:
     electric-field:
       x:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
       y:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
       z:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
     density-guided-simulation:
       active                     = false
       group                      = protein
       similarity-measure         = inner-product
       amplitude-method           = unity
       force-constant             = 1e+09
       gaussian-transform-spreading-width = 0.2
       gaussian-transform-spreading-range-in-multiples-of-width = 4
       reference-density-filename = reference.mrc
       nst                        = 1
       normalize-densities        = true
grpopts:
   nrdf:           9
   ref-t:           0
   tau-t:           0
annealing:          No
annealing-npoints:           0
   acc:               0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0

Can not increase nstlist because an NVE ensemble is used

Initializing Domain Decomposition on 4 ranks
Dynamic load balancing: locked
Minimum cell size due to atom displacement: 0.080 nm
Initial maximum distances in bonded interactions:
    two-body bonded interactions: 0.038 nm, Bond, atoms 3 4
Minimum cell size due to bonded interactions: 0.000 nm
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 4 cells with a minimum initial size of 0.100 nm
The maximum allowed number of cells is: X 21 Y 21 Z 21
Domain decomposition grid 4 x 1 x 1, separate PME ranks 0
Domain decomposition rank 0, coordinates 0 0 0

The initial number of communication pulses is: X 2
The initial domain decomposition cell size is: X 0.55 nm

The maximum allowed distance for atoms involved in interactions is:
                 non-bonded interactions           1.080 nm
            two-body bonded interactions  (-rdd)   1.080 nm
          multi-body bonded interactions  (-rdd)   0.550 nm

When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 3
The minimum size for domain decomposition cells is 0.360 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.65
The maximum allowed distance for atoms involved in interactions is:
                 non-bonded interactions           1.080 nm
            two-body bonded interactions  (-rdd)   1.080 nm
          multi-body bonded interactions  (-rdd)   0.360 nm

Using 4 MPI threads
Using 2 OpenMP threads per tMPI thread

On host bs-nix1 1 GPU selected for this run.
Mapping of GPU IDs to the 4 GPU tasks in the 4 ranks on this node:
  PP:1,PP:1,PP:1,PP:1
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
Pinning threads with an auto-selected logical core stride of 1
System total charge: 0.000
Reaction-Field:
epsRF = 0, rc = 1, krf = 0.5, crf = 1.5, epsfac = 138.935
The electrostatics potential has its minimum at r = 1
Potential shift: LJ r^-12: -1.000e+00 r^-6: -1.000e+00

Using GPU 8x8 nonbonded short-range kernels

Using a 8x8 pair-list setup:
  updated every 20 steps, buffer 0.080 nm, rlist 1.080 nm

Using full Lennard-Jones parameter combination matrix

Removing pbc first time

Linking all bonded interactions to atoms

Intra-simulation communication will occur every 20 steps.
There are: 4 Atoms
Atom distribution over 4 domains: av 1 stddev 1 min 0 max 2
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest
Initial temperature: 0 K

Started mdrun on rank 0 Wed Sep 18 04:14:41 2019

           Step           Time
              0        0.00000

   Energies (kJ/mol)
           Bond        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.
    3.70250e+00   -1.25800e-02   -1.02921e-01    3.58700e+00    3.51700e-03
   Total Energy    Temperature Pressure (bar)
    3.59052e+00    9.39996e-02   -3.80053e+00

DD  step 19 load imb.: force 136.9%

-------------------------------------------------------
Program:     gmx mdrun, version 2020-dev-20190917-003f96f
Source file: src/gromacs/nbnxm/grid.cpp (line 1253)
MPI rank:    3 (out of 4)

Fatal error:
grid cell cx -2147483648 cy -2147483648 out of range (max 1 1)
atom nan nan nan, grid->c0 1.650000 0.000000

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

--------------------------------

History

#1 Updated by Berk Hess about 1 month ago

This is not a grid issue. The coordinates of an atom are NaN.
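
The error text is consistent with this: the atom position is printed as "nan nan nan", and cx/cy of -2147483648 equal INT_MIN, which is what x86 float-to-int conversion typically produces for a NaN (formally undefined behaviour in C++). A minimal sketch of a check that would flag such an atom before it reaches the grid is below. `Vec3` and `firstNonFiniteAtom` are hypothetical names used for illustration only; they are not GROMACS types or functions, and this is not how grid.cpp does its range check.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 3D coordinate; not the GROMACS RVec type.
using Vec3 = std::array<float, 3>;

// Returns the index of the first atom that has a NaN or infinite coordinate,
// or -1 when every coordinate of every atom is finite.
int firstNonFiniteAtom(const std::vector<Vec3>& x)
{
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        for (float c : x[i])
        {
            if (!std::isfinite(c))
            {
                return static_cast<int>(i);
            }
        }
    }
    return -1;
}
```

Running a check like this on the coordinates after each update step, rather than only at gridding time, would narrow down the step at which the NaN first appears.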

#2 Updated by Alan Gray about 1 month ago

I also can't reproduce it locally; I tried running it multiple times and it passes consistently.

#3 Updated by Paul Bauer 25 days ago

  • Target version changed from 2020-beta1 to 2020-beta2

Bumped to the next beta.
