Bug #2982

GROMACS with PLUMED MPI segfault

Added by Sergei Morozov 29 days ago. Updated 27 days ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Affected version - extra info:
GROMACS 2018.6 patched with Plumed 2.5.1
Affected version:
Difficulty:
uncategorized

Description

Hello,

I've built GROMACS 2018.6 patched with PLUMED using EasyBuild. I'm trying to run exercise 4 of the CINECA tutorial (https://www.plumed.org/doc-v2.5/user-doc/html/cineca.html).

gmx_mpi mdrun -s topol.tpr -plumed plumed.dat -nsteps 500000

runs fine, but

mpirun -np 2 gmx_mpi mdrun -s topol.tpr -plumed plumed.dat -multi 2 -nsteps 500000

fails with the following errors:

[kfhl160@seskscpg009 SCRIPTS]$ mpirun -np 2 gmx_mpi mdrun -s ../SETUP/topol.tpr -plumed plumed.dat -multi 2 -nsteps 500000
--------------------------------------------------------------------------
WARNING: Open MPI will create a shared memory backing file in a
directory that appears to be mounted on a network filesystem.
Creating the shared memory backup file on a network file system, such
as NFS or Lustre is not recommended -- it may cause excessive network
traffic to your file servers and/or cause shared memory traffic in
Open MPI to be much slower than expected.

You may want to check what the typical temporary directory is on your
node.  Possible sources of the location of this temporary directory
include the $TEMPDIR, $TEMP, and $TMP environment variables.

Note, too, that system administrators can set a list of filesystems
where Open MPI is disallowed from creating temporary files by setting
the MCA parameter "orte_no_session_dir".

  Local host: seskscpg009.prim.scp
  Filename:   /scratch/kfhl160/openmpi-sessions-486452227@seskscpg009_0/5849/1/shared_mem_pool.seskscpg009

You can set the MCA paramter shmem_mmap_enable_nfs_warning to 0 to
disable this message.
--------------------------------------------------------------------------
                      :-) GROMACS - gmx mdrun, 2018.6 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar    Aldert van Buuren   Rudi van Drunen     Anton Feenstra
  Gerrit Groenhof    Aleksei Iupinov   Christoph Junghans   Anca Hamuraru
 Vincent Hindriksen Dimitrios Karkoulis    Peter Kasson        Jiri Kraus
  Carsten Kutzner      Per Larsson      Justin A. Lemkul    Viveca Lindahl
  Magnus Lundborg   Pieter Meulenhoff    Erik Marklund      Teemu Murtola
    Szilard Pall       Sander Pronk      Roland Schulz     Alexey Shvetsov
   Michael Shirts     Alfons Sijbers     Peter Tieleman    Teemu Virolainen
 Christian Wennberg    Maarten Wolf
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, version 2018.6
Executable:   /opt/scp/software/GROMACS/2018.6_GPU-foss-2017a-mpi/bin/gmx_mpi
Data prefix:  /opt/scp/software/GROMACS/2018.6_GPU-foss-2017a-mpi
Working dir:  /home/kfhl160/gromacstest1/cineca/SCRIPTS
Command line:
  gmx_mpi mdrun -s ../SETUP/topol.tpr -plumed plumed.dat -multi 2 -nsteps 500000

Back Off! I just backed up md0.log to ./#md0.log.8#

Back Off! I just backed up md1.log to ./#md1.log.8#
+++ Loading the PLUMED kernel runtime +++
+++ PLUMED_KERNEL="/opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so" +++
+++ Loading the PLUMED kernel runtime +++
+++ PLUMED_KERNEL="/opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so" +++
+++ Loading the PLUMED kernel runtime +++
+++ PLUMED_KERNEL="/opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so" +++
+++ Loading the PLUMED kernel runtime +++
+++ PLUMED_KERNEL="/opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so" +++
Reading file ../SETUP/topol1.tpr, VERSION 4.6.7 (single precision)
Note: file tpx version 83, software tpx version 112
NOTE: GPU(s) found, but the current simulation can not use GPUs
      To use a GPU, set the mdp option: cutoff-scheme = Verlet

Overriding nsteps with value passed on the command line: 500000 steps, 1e+03 ps
Reading file ../SETUP/topol0.tpr, VERSION 4.6.7 (single precision)
Note: file tpx version 83, software tpx version 112
NOTE: GPU(s) found, but the current simulation can not use GPUs
      To use a GPU, set the mdp option: cutoff-scheme = Verlet

Overriding nsteps with value passed on the command line: 500000 steps, 1e+03 ps

This is simulation 1 out of 2 running as a composite GROMACS
multi-simulation job. Setup for this simulation:

Using 1 MPI process

This is simulation 0 out of 2 running as a composite GROMACS
multi-simulation job. Setup for this simulation:

Using 1 MPI process

Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity

Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity

NOTE: This file uses the deprecated 'group' cutoff_scheme. This will be
removed in a future release when 'verlet' supports all interaction forms.

NOTE: This file uses the deprecated 'group' cutoff_scheme. This will be
removed in a future release when 'verlet' supports all interaction forms.

Back Off! I just backed up traj_comp1.xtc to ./#traj_comp1.xtc.6#

Back Off! I just backed up traj_comp0.xtc to ./#traj_comp0.xtc.6#

Back Off! I just backed up ener1.edr to ./#ener1.edr.6#

Back Off! I just backed up ener0.edr to ./#ener0.edr.6#
starting mdrun 'alanine dipeptide in vacuum'
500000 steps,   1000.0 ps.
starting mdrun 'alanine dipeptide in vacuum'
500000 steps,   1000.0 ps.
[seskscpg009:90377] *** Process received signal ***
[seskscpg009:90376] *** Process received signal ***
[seskscpg009:90376] Signal: Segmentation fault (11)
[seskscpg009:90376] Signal code: Address not mapped (1)
[seskscpg009:90376] Failing at address: 0x30
[seskscpg009:90377] Signal: Segmentation fault (11)
[seskscpg009:90377] Signal code: Address not mapped (1)
[seskscpg009:90377] Failing at address: 0x30
[seskscpg009:90376] [ 0] [seskscpg009:90377] [ 0] /lib64/libpthread.so.0(+0xf6d0)[0x7f65825396d0]
[seskscpg009:90376] [ 1] /lib64/libpthread.so.0(+0xf6d0)[0x7f8897b1e6d0]
[seskscpg009:90377] [ 1] /opt/scp/software/OpenMPI/2.0.2-GCC-6.3.0-2.27/lib/libmpi.so.20(MPI_Allreduce+0x1a4)[0x7f657aa95f24]
[seskscpg009:90376] [ 2] /opt/scp/software/OpenMPI/2.0.2-GCC-6.3.0-2.27/lib/libmpi.so.20(MPI_Allreduce+0x1a4)[0x7f889007af24]
[seskscpg009:90377] [ 2] /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so(_ZN4PLMD4GREX3cmdERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPv+0xb78)[0x7f65639d0198]
[seskscpg009:90376] [ 3] /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so(_ZN4PLMD4GREX3cmdERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPv+0xb78)[0x7f8885258198]
[seskscpg009:90377] [ 3] /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so(_ZN4PLMD10PlumedMain3cmdERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPv+0x1c66)[0x7f65639df1e6]
[seskscpg009:90376] [ 4] /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so(_ZN4PLMD10PlumedMain3cmdERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPv+0x1c66)[0x7f88852671e6]
[seskscpg009:90377] [ 4] /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so(plumed_plumedmain_cmd+0x5d)[0x7f65639ef37d]
[seskscpg009:90376] [ 5] gmx_mpi[0x4159af]
[seskscpg009:90376] [ 6] gmx_mpi[0x438095]
[seskscpg009:90376] [ 7] gmx_mpi[0x41faae]
[seskscpg009:90376] [ 8] /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so(plumed_plumedmain_cmd+0x5d)[0x7f888527737d]
[seskscpg009:90377] [ 5] gmx_mpi[0x4159af]
[seskscpg009:90377] [ 6] gmx_mpi[0x438095]
[seskscpg009:90377] [ 7] gmx_mpi[0x4204e2]
[seskscpg009:90376] [ 9] gmx_mpi[0x444cdb]
[seskscpg009:90376] [10] gmx_mpi[0x40fbcc]
[seskscpg009:90376] [11] gmx_mpi[0x41faae]
[seskscpg009:90377] [ 8] gmx_mpi[0x4204e2]
[seskscpg009:90377] [ 9] gmx_mpi[0x444cdb]
[seskscpg009:90377] [10] gmx_mpi[0x40fbcc]
[seskscpg009:90377] [11] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f6579baa445]
[seskscpg009:90376] [12] gmx_mpi[0x412dee]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f888f18f445]
[seskscpg009:90377] [12] gmx_mpi[0x412dee]
[seskscpg009:90377] *** End of error message ***
[seskscpg009:90376] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node seskscpg009 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[seskscpg009.prim.scp:90335] 3 more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs
[seskscpg009.prim.scp:90335] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
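
Note that the shared-memory message at the top of the output is only a warning about the temporary directory being on a network filesystem; as the message itself says, it can be silenced via the named MCA parameter, e.g. (a sketch, not run for this report):

mpirun --mca shmem_mmap_enable_nfs_warning 0 -np 2 gmx_mpi mdrun -s ../SETUP/topol.tpr -plumed plumed.dat -multi 2 -nsteps 500000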

Could you advise what could be wrong in the installation? The PLUMED version is 2.5.1.
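
For reference, one quick installation check (a sketch using the paths from the output above, not something that was run for this report) is to confirm that gmx_mpi and libplumedKernel.so resolve to the same Open MPI library, since the backtrace fails inside MPI_Allreduce called from the PLUMED kernel:

ldd /opt/scp/software/GROMACS/2018.6_GPU-foss-2017a-mpi/bin/gmx_mpi | grep libmpi
ldd /opt/scp/software/PLUMED/2.5.1-foss-2017a/lib/libplumedKernel.so | grep libmpi
mpirun --version

The two ldd lines should show the same libmpi.so.20 (the backtrace above resolves it to OpenMPI/2.0.2-GCC-6.3.0-2.27), and mpirun --version should report that same Open MPI release.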

History

#1 Updated by Berk Hess 27 days ago

  • Status changed from New to Rejected

This is a question for the PLUMED list/developers, not GROMACS.

#2 Updated by Mark Abraham 27 days ago

  • Description updated (diff)

#3 Updated by Sergei Morozov 27 days ago

Thank you for the confirmation. I raised the question with the PLUMED developers, but they can't help either. It might be an issue with Open MPI.
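
If Open MPI itself is suspect, a minimal two-rank MPI_Allreduce test is one way to narrow it down, since the backtrace fails inside MPI_Allreduce called from PLMD::GREX. The sketch below was not run as part of this report and assumes mpicc and mpirun from the same OpenMPI/2.0.2 module are on the PATH:

cat > allreduce_test.c << 'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank = 0, sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* Sum the ranks across all processes; with -np 2 the result is 1. */
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("rank %d sees sum %d\n", rank, sum);
    MPI_Finalize();
    return 0;
}
EOF
mpicc allreduce_test.c -o allreduce_test
mpirun -np 2 ./allreduce_test

If this runs cleanly, the problem is more likely in how the PLUMED multi-replica (GREX) communicator is set up than in Open MPI itself.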
