Bug #3336

DD error can't be avoided with the suggested manual decomposition option

Added by Szilárd Páll 7 months ago. Updated 6 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

$ $gmx mdrun $optsn -nsteps 10000 -resetstep 8000 -ntmpi 14 -npme 0 -dd 7 2 1
         :-) GROMACS - gmx mdrun, 2020.1-dev-20200115-0fdb424-dirty (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar      Christian Blau   Viacheslav Bolnykh     Kevin Boyd    
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra       Alan Gray     
  Gerrit Groenhof     Anca Hamuraru    Vincent Hindriksen  M. Eric Irrgang  
  Aleksei Iupinov   Christoph Junghans     Joe Jordan     Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson    
  Justin A. Lemkul    Viveca Lindahl    Magnus Lundborg     Erik Marklund   
    Pascal Merz     Pieter Meulenhoff    Teemu Murtola       Szilard Pall   
    Sander Pronk      Roland Schulz      Michael Shirts    Alexey Shvetsov  
   Alfons Sijbers     Peter Tieleman      Jon Vincent      Teemu Virolainen 
 Christian Wennberg    Maarten Wolf      Artem Zhmurov   
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2019, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, version 2020.1-dev-20200115-0fdb424-dirty
Executable:   /nethome/pszilard-projects/gromacs/gromacs-20/build_AVX512_256_gcc8_cuda10.1/bin/gmx
Data prefix:  /nethome/pszilard/projects/gromacs/gromacs-20 (source tree)
Working dir:  /nethome/pszilard-projects/gromacs/bench/LUMI-bench/aqp_ensemble/test_dev-purley01
Command line:
  gmx mdrun -v -noconfout -pin on -nsteps 10000 -resetstep 8000 -ntmpi 14 -npme 0 -dd 7 2 1

Back Off! I just backed up md.log to ./#md.log.2#
Reading file topol.tpr, VERSION 2020.1-dev-20200120-4cebec1 (single precision)
Overriding nsteps with value passed on the command line: 10000 steps, 25 ps
Changing nstlist from 40 to 100, rlist from 1.2 to 1.287

-------------------------------------------------------
Program:     gmx mdrun, version 2020.1-dev-20200115-0fdb424-dirty
Source file: src/gromacs/domdec/domdec_setup.cpp (line 784)
MPI rank:    8 (out of 14)

Fatal error:
The number of ranks selected for particle-particle work (14) contains a large
prime factor 7. In most cases this will lead to bad performance. Choose a
number with smaller prime factors or set the decomposition (option -dd)
manually.

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Associated revisions

Revision b4bf4c08 (diff)
Added by Berk Hess 7 months ago

Fix DD rank count prime check

The domain decomposition would refuse to run with large prime factors
in the MPI rank count even when the grid was specified by the user.

Fixes #3336

Change-Id: I92f20ce18f314db68890650e76741b0ee70c05df
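The behaviour described in the commit message can be sketched as follows. This is an illustrative Python model, not the code in src/gromacs/domdec/domdec_setup.cpp: the function names, the `user_grid` argument and the `max_allowed_factor` threshold are all hypothetical, and the actual criterion GROMACS uses to decide that a prime factor is "large" is not shown in this report. The point is only that the prime-factor check is skipped when the user supplies the grid.

```python
from typing import Optional, Tuple


def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n by trial division (1 if n < 2)."""
    largest = 1
    factor = 2
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    if n > 1:
        # Whatever remains after dividing out small factors is itself prime.
        largest = n
    return largest


def check_rank_count(
    num_pp_ranks: int,
    user_grid: Optional[Tuple[int, int, int]] = None,
    max_allowed_factor: int = 5,  # hypothetical threshold, NOT the GROMACS rule
) -> None:
    """Raise ValueError if the rank count is rejected; return None otherwise."""
    if user_grid is not None:
        nx, ny, nz = user_grid
        if nx * ny * nz != num_pp_ranks:
            raise ValueError("grid does not match the number of PP ranks")
        # The user chose the decomposition explicitly: skip the prime check.
        return
    if largest_prime_factor(num_pp_ranks) > max_allowed_factor:
        raise ValueError(
            f"rank count {num_pp_ranks} contains a large prime factor "
            f"{largest_prime_factor(num_pp_ranks)}"
        )


# Mirrors the report: 14 PP ranks with a manually chosen 7 x 2 x 1 grid.
check_rank_count(14, user_grid=(7, 2, 1))
```

Under these assumptions, `check_rank_count(14)` without a grid still raises, while the call with `user_grid=(7, 2, 1)` passes, which is the outcome the fix aims for.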

History

#1 Updated by Berk Hess 7 months ago

  • Category set to mdrun
  • Status changed from New to Fix uploaded
  • Assignee set to Berk Hess
  • Target version set to 2020.1

#2 Updated by Berk Hess 7 months ago

  • Status changed from Fix uploaded to Resolved

#3 Updated by Szilárd Páll 7 months ago

Berk Hess wrote:

Applied in changeset b4bf4c088ec42d73fb9fa7a3412c3ef294e361f0.

By the way, @Berk: we discussed allowing the factor 7, given that in our experience it is relatively common to run into rank counts that contain it, especially with a number of 28-core CPUs on the market. I guess that change didn't make it into a release? Should we reconsider?
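To make the 28-core case concrete, the short Python sketch below lists every three-integer grid whose product equals a given rank count, which is the form the -dd option takes in the command line above. The helper name `dd_grids` is hypothetical, and the sketch says nothing about which grid performs best for a given system; it only shows that any decomposition of 14 or 28 ranks necessarily has a dimension that is a multiple of 7.

```python
from typing import List, Tuple


def dd_grids(num_ranks: int) -> List[Tuple[int, int, int]]:
    """Enumerate all ordered (nx, ny, nz) with nx * ny * nz == num_ranks."""
    grids = []
    for nx in range(1, num_ranks + 1):
        if num_ranks % nx != 0:
            continue
        rest = num_ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny != 0:
                continue
            grids.append((nx, ny, rest // ny))
    return grids


for ranks in (14, 28):
    grids = dd_grids(ranks)
    # Every grid must place the prime factor 7 in one of its dimensions.
    assert all(any(d % 7 == 0 for d in g) for g in grids)
    print(ranks, "ranks:", len(grids), "possible grids, e.g.", grids[-3:])
```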

#4 Updated by Paul Bauer 6 months ago

  • Status changed from Resolved to Closed
