Bug #1609

Updated by Mark Abraham about 5 years ago

(copied from gmx-users)

Hi,

I see where the problem is.
There is an initial check in g_tune_pme to make sure that parallel
runs can be executed at all. This check is run with the automatic
number of PME-only ranks, which is 11 for your input file.
Unfortunately, this leaves 48 - 11 = 37 PP ranks, and since 37 is
prime, no domain decomposition can be found.

At some point in the past we discussed that this could happen
and that it should be fixed. I will open a bug entry.

Thanks,
Carsten

On 29 Sep 2014, at 15:36, Ebert Maximilian <m.ebert@umontreal.ca> wrote:

Hi,

This is the command:

setenv MDRUN mdrun_mpi

g_tune_pme_mpi -np 48 -s ../eq_nvt/1ZG4_nvt.tpr -launch

Here is the output of perf.out:

<pre>
------------------------------------------------------------

P E R F O R M A N C E R E S U L T S

------------------------------------------------------------
g_tune_pme_mpi for Gromacs VERSION 5.0.1
Number of ranks : 48
The mpirun command is : mpirun
Passing # of ranks via : -np
The mdrun command is : mdrun_mpi
mdrun args benchmarks : -resetstep 100 -o bench.trr -x bench.xtc -cpo bench.cpt -c bench.gro -e bench.edr -g bench.log
Benchmark steps : 1000
dlb equilibration steps : 100
mdrun args at launchtime:
Repeats for each test : 2
Input file : ../eq_nvt/1ZG4_nvt.tpr
PME/PP load estimate : 0.151964
Number of particles : 39489
Coulomb type : PME
Grid spacing x y z : 0.114561 0.114561 0.114561
Van der Waals type : Cut-off

Will try these real/reciprocal workload settings:
 No.   scaling  rcoulomb  nkx  nky  nkz   spacing      rvdw  tpr file
   0  1.000000  1.200000   72   72   72  0.120000  1.200000  ../eq_nvt/1ZG4_nvt_bench00.tpr
   1  1.100000  1.320000   64   64   64  0.132000  1.320000  ../eq_nvt/1ZG4_nvt_bench01.tpr
   2  1.200000  1.440000   60   60   60  0.144000  1.440000  ../eq_nvt/1ZG4_nvt_bench02.tpr

Note that in addition to the Coulomb radius and the Fourier grid
other input settings were also changed (see table above).
Please check if the modified settings are appropriate.

Individual timings for input file 0 (../eq_nvt/1ZG4_nvt_bench00.tpr):
PME ranks Gcycles ns/day PME/f Remark

------------------------------------------------------------
Cannot run the benchmark simulations! Please check the error message of
mdrun for the source of the problem. Did you provide a command line
argument that neither g_tune_pme nor mdrun understands? Offending command:

mpirun -np 48 mdrun_mpi -npme 11 -s ../eq_nvt/1ZG4_nvt_bench00.tpr -resetstep 100 -o bench.trr -x bench.xtc -cpo bench.cpt -c bench.gro -e bench.edr -g bench.log -nsteps 1 -quiet

</pre>



and here are parts of the bench.log:

<pre>
Log file opened on Mon Sep 29 08:56:38 2014
Host: node-e1-67 pid: 24470 rank ID: 0 number of ranks: 48
GROMACS: gmx mdrun, VERSION 5.0.1

GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
Peter Tieleman Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2014, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS: gmx mdrun, VERSION 5.0.1
Executable: /home/apps/Logiciels/gromacs/gromacs-5.0.1/bin/gmx_mpi
Library dir: /home/apps/Logiciels/gromacs/gromacs-5.0.1/share/gromacs/top
Command line:
mdrun_mpi -npme 11 -s ../eq_nvt/1ZG4_nvt_bench00.tpr -resetstep 100 -o bench.trr -x bench.xtc -cpo bench.cpt -c bench.gro -e bench.edr -g bench.log -nsteps 1 -quiet

Gromacs version: VERSION 5.0.1
Precision: single
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled
GPU support: disabled
invsqrt routine: gmx_software_invsqrt(x)
SIMD instructions: SSE4.1
FFT library: fftw-3.3.3-sse2
RDTSCP usage: enabled
C++11 compilation: enabled
TNG support: enabled
Tracing support: disabled
Built on: Tue Sep 23 09:58:07 EDT 2014
Built by: rqchpbib@briaree1 [CMAKE]
Build OS/arch: Linux 2.6.32-71.el6.x86_64 x86_64
Build CPU vendor: GenuineIntel
Build CPU brand: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
Build CPU family: 6 Model: 44 Stepping: 2
Build CPU features: aes apic clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
C compiler: /RQusagers/apps/Logiciels/gcc/4.8.1/bin/gcc GNU 4.8.1
C compiler flags: -msse4.1 -Wno-maybe-uninitialized -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds -O3 -DNDEBUG
C++ compiler: /RQusagers/apps/Logiciels/gcc/4.8.1/bin/g++ GNU 4.8.1
C++ compiler flags: -msse4.1 -std=c++0x -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds -O3 -DNDEBUG
Boost version: 1.55.0 (internal)

....
Initializing Domain Decomposition on 48 ranks

-------------------------------------------------------
Program mdrun_mpi, VERSION 5.0.1
Source code file: /RQusagers/rqchpbib/stubbsda/gromacs-5.0.1/src/gromacs/mdlib/domdec_setup.c, line: 728

Fatal error:
The number of ranks you selected (37) contains a large prime factor 37. In most cases this will lead to bad performance. Choose a number with smaller prime factors or set the decomposition (option -dd) manually.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
</pre>
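
To make the arithmetic behind the fatal error above easier to check, here is a minimal Python sketch. It is not GROMACS code and does not reproduce the actual test in domdec_setup.c; the function names `largest_prime_factor` and `pp_rank_report` are hypothetical, and the candidate PME rank counts other than 11 are chosen only for illustration. It simply computes, for a total of 48 ranks, how many PP ranks remain for a given number of PME-only ranks and what the largest prime factor of that PP count is.

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (n >= 2) by trial division."""
    largest = 1
    factor = 2
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    if n > 1:
        # Whatever is left after dividing out small factors is itself prime.
        largest = n
    return largest


def pp_rank_report(total_ranks: int, npme_candidates):
    """Map each PME-only rank count to (PP ranks, largest prime factor of PP ranks).

    Illustration only: the threshold mdrun uses to call a factor "large"
    is not modelled here.
    """
    report = {}
    for npme in npme_candidates:
        pp = total_ranks - npme
        report[npme] = (pp, largest_prime_factor(pp))
    return report


if __name__ == "__main__":
    # 48 total ranks as in the report; 11 is the automatic -npme value from perf.out.
    for npme, (pp, lpf) in pp_rank_report(48, [8, 11, 12, 16]).items():
        print(f"npme={npme:2d}  PP ranks={pp:2d}  largest prime factor={lpf}")
```

With 11 PME-only ranks the 37 remaining PP ranks are prime, which matches the error message; the other illustrative values leave PP counts (40, 36, 32) whose prime factors are all small.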
