Bug #2141

Each new invocation of mdrun needs its own settings to use MPI on 28 cores

Added by Ahmet Yildirim over 2 years ago. Updated about 1 year ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: -
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Hi,

I was trying to do a free energy calculation in GROMACS, but I came across a "Software inconsistency" error. At each stage (em, nvt, npt and md) mdrun needs its own settings to use MPI on 28 cores, otherwise it gives the error shown below; at each simulation stage I had to use different -npme and -dd values. Is that a bug? I couldn't find any values that work for the md step at all. I ran the tests below on GROMACS 2016.2 and 2016.3.

The input files of the complex structures are attached.

For DNA+ligand complex:
mpirun -n 28 gmx_mpi mdrun -ntomp 1 -dd 5 4 1 -npme 8 works for the em step but not for the nvt, npt and md steps.
mpirun -n 28 gmx_mpi mdrun -ntomp 1 -dd 3 3 2 -npme 10 works for the nvt and npt steps.
Neither of the above commands nor any of the following work for the md step:
mpirun -n 28 gmx_mpi mdrun -ntomp 7 -dd 2 2 1
mpirun -n 28 gmx_mpi mdrun -ntomp 7 -npme 4
mpirun -n 28 gmx_mpi mdrun -ntomp 4 -npme 7

For protein+ligand complex:
mpirun -np 28 gmx_mpi mdrun $md -dd 3 3 2 -npme 10 works for the em, nvt and npt steps but not for the md step.
I also tried all of the commands listed above for the DNA complex, but none of them work for the md step of this protein+ligand complex.
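
For reference, here is a sketch of the rank accounting behind the settings above (assuming one MPI rank per core on the 28-core machine):

# 28 MPI ranks = PP (domain decomposition) ranks + PME ranks
# -dd 5 4 1 -npme 8  ->  5*4*1 = 20 PP ranks + 8 PME ranks  = 28
# -dd 3 3 2 -npme 10 ->  3*3*2 = 18 PP ranks + 10 PME ranks = 28
# with -ntomp 1 each rank runs a single OpenMP thread, so 28 ranks use 28 cores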

Error:
Program: gmx mdrun, version 2016.3
Source file: src/gromacs/domdec/domdec_topology.cpp (line 238)
MPI rank: 17 (out of 28)

Software inconsistency error:
Some interactions seem to be assigned multiple times

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

-------------------------------------------------------
Program: gmx mdrun, version 2016.3
Source file: src/gromacs/domdec/domdec_topology.cpp (line 238)
MPI rank: 23 (out of 28)
...

dna.tar.gz (431 KB) - Ahmet Yildirim, 03/17/2017 01:41 PM
protein.tar.gz (1.04 MB) - Ahmet Yildirim, 03/17/2017 01:41 PM

Associated revisions

Revision 5ad6b516 (diff)
Added by Berk Hess over 2 years ago

Made duplicate atoms in bondeds an error

Having duplicate atom indices in bonded interactions used to trigger only
a warning. But since in nearly all cases this will lead to issues,
it is now an error, except for angle restraints, where it can be
useful, so there it is now a note.

Refs #2141.

Change-Id: I359257cc1685a8944d6bada74523d6c8fea62126

History

#1 Updated by Mark Abraham over 2 years ago

  • Status changed from New to Feedback wanted

I tried gmx grompp -p complex -c complex -f em.mdp -o em on the protein, but got

WARNING 1 [file complex.top, line 57]:
  Duplicate atom index (3605) in angle_restraints

WARNING 2 [file complex.top, line 58]:
  Duplicate atom index (1) in angle_restraints

Coupling 1 copies of molecule type 'LIG'
Removing all charge groups because cutoff-scheme=Verlet
Analysing residue names:
There are:     1      Other residues
There are:   247    Protein residues
There are: 11594      Water residues
There are:    78        Ion residues
Analysing residues not classified as Protein/DNA/RNA/Water and splitting into groups...
Analysing Protein...
Analysing residues not classified as Protein/DNA/RNA/Water and splitting into groups...
Number of degrees of freedom in T-Coupling group rest is 116718.00
Calculating fourier grid dimensions for X Y Z
Using a fourier grid of 72x72x72, spacing 0.114 0.114 0.114
Estimate for the relative computational load of the PME mesh part: 0.40
This run will generate roughly 22 Mb of data

There was 1 note

There were 2 warnings

-------------------------------------------------------
Program:     gmx grompp, version 2016.3
Source file: src/gromacs/gmxpreprocess/grompp.cpp (line 2325)

Fatal error:
Too many warnings (2).
If you are sure all warnings are harmless, use the -maxwarn option.

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

I can't be sure this is related to your symptoms, but knowing whether we have a simulation that is warning-free and stable in a single-domain case seems like something we should establish before investigating further :-)

#2 Updated by Ahmet Yildirim over 2 years ago

Hi,

If you use gmx grompp -p complex -c complex.gro -f em.mdp -o em.tpr -maxwarn 2, then you can run mdrun smoothly :-) I get these two warnings in the em step (3 warnings in the nvt, npt and md steps), but that shouldn't be a reason for having to set up mdrun differently for each simulation stage, should it?
Please find the script (eqA.sh) used to submit the jobs in the attached dna.tar.gz file. It shows exactly what I did.

#3 Updated by Berk Hess over 2 years ago

The multiple assignment error is likely due to the indices appearing multiple times. We might need to turn a warning into an error somewhere to prevent users from running with invalid setups.

#4 Updated by Berk Hess over 2 years ago

I just realized that angle restraints are probably the only case where it can be useful to use one atom index twice. I need to think about whether this can cause a multiple assignment error in the domain decomposition.
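
As an illustration (a hypothetical snippet, not taken from the attached topologies; indices and parameters are made up), a repeated index in [ angle_restraints ] can be intentional because the restraint acts on the angle between two vectors, which may share an atom:

[ angle_restraints ]
;   ai    aj    ak    al  funct  theta0      kc  mult
     1  3605  3605  2000      1    90.0  1000.0     1
; atom 3605 appears as both aj and ak, so this restrains the angle between
; the vectors 1->3605 and 3605->2000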

#5 Updated by Eric Irrgang over 2 years ago

Regarding multiple assignment: this discussion reminded me of a document linked from one of the tutorials on the GROMACS website: http://local.biochemistry.utoronto.ca/pomes//files/lipidCombinationRules.pdf

I'm not sure how widespread this use case is, and it may rely on undocumented or unintended behavior, but is it relevant to Berk's considerations?

#6 Updated by Berk Hess over 2 years ago

That is a parameter assignment issue and has nothing to do with the error message here.

#7 Updated by Gerrit Code Review Bot over 2 years ago

Gerrit received a related patchset '1' for Issue #2141.
Uploader: Berk Hess ()
Change-Id: gromacs~master~I359257cc1685a8944d6bada74523d6c8fea62126
Gerrit URL: https://gerrit.gromacs.org/6609

#8 Updated by Ahmet Yildirim over 2 years ago

I don't think the issue is about multiple assignment (using one atom index twice in the intermolecular_interactions part). To check whether the intermolecular_interactions part is the cause, I also ran the solvation free energy simulations (ligand decoupling from solution, see https://redmine.gromacs.org/issues/2165#change-14521) with MPI; their topology contains no intermolecular interactions at all (i.e. no multiple assignment warning). Unfortunately GROMACS gave the domain decomposition error again, even with couple-intramol=yes. So the issue is not multiple assignment. For instance, I also tested the following intermolecular_interactions part (see the angles section, which has no multiple assignment) for the ligand decoupling from the complex, but it did not solve the issue either.

The only solution that many tests have shown me is to use a single (thread-)MPI rank, for instance gmx mdrun -ntmpi 1 -ntomp 14 -v -deffnm md (see also the note after the topology excerpt below). That works for both ligand decoupling from the complex and from solution, as well as for all steps (em, nvt, npt and md). -ntmpi 1 -ntomp 28 does not work, even though the PC has 28 cores with hyperthreading (14+14).

[ intermolecular_interactions ]
[ bonds ]
; ai aj type bA kA bB kB
629 3 6 0.597 0.0 0.597 4184.0

[ angles ]
; ai aj ak type thA fcA thB fcB
281 629 3 1 37.5 0.0 37.5 41.84
629 3 21 1 121.5 0.0 121.5 41.84

[ dihedrals ]
; ai aj ak al type thA fcA thB fcB
249 281 629 3 2 -147.4 0.0 -147.4 41.84
281 629 3 21 2 -60.5 0.0 -60.5 41.84
629 3 21 16 2 -153.9 0.0 -153.9 41.84
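
For reference (a sketch; the core count is taken from the 14+14 hyperthreaded machine described above), the accounting for the only setup that worked for me across all stages:

# 1 thread-MPI rank x 14 OpenMP threads = 14 physical cores;
# a single rank means a single domain, so no domain decomposition is done
gmx mdrun -ntmpi 1 -ntomp 14 -v -deffnm md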

#9 Updated by Mark Abraham about 1 year ago

  • Target version set to 2019

We should follow up and decide on any action for the 2019 release.

#10 Updated by Berk Hess about 1 year ago

  • Status changed from Feedback wanted to Rejected

The three OpenMP setups
mpirun -n 28 gmx_mpi mdrun -ntomp 7 -dd 2 2 1
mpirun -n 28 gmx_mpi mdrun -ntomp 7 -npme 4
mpirun -n 28 gmx_mpi mdrun -ntomp 4 -npme 7
are invalid, because you are specifying 28 ranks, not cores.

Your system is difficult to set up, because you have long-range listed/bonded pair interactions for free-energy calculations, so your domains need to be sufficiently large.
What works for me is e.g.
mpirun -n 14 gmx_mpi mdrun -ntomp 2 -npme 6

There is the separate issue that having mdrun automatically set the decomposition on 28 ranks does not work because of the large prime factor 7. But since that is not the issue here, I'm closing this issue.
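
For completeness, a sketch of the thread accounting behind this (assuming a 28-core node, as above):

# invalid: 28 MPI ranks x 7 OpenMP threads = 196 threads requested on 28 cores
mpirun -n 28 gmx_mpi mdrun -ntomp 7 -npme 4
# works here: 14 ranks x 2 threads = 28 threads; 8 PP ranks + 6 PME ranks = 14
mpirun -n 14 gmx_mpi mdrun -ntomp 2 -npme 6
# note: 28 = 2 x 2 x 7, and the prime factor 7 is what makes an automatic
# domain decomposition over 28 ranks awkward (the separate issue mentioned above)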
