gmxapi fails with MPI build of GROMACS 2020
When I build GROMACS 2020 in release mode with MPI enabled and try to use it with gmxapi, I get the following error:
Python 3.7.4+ (default, Sep 4 2019, 08:03:05)
[GCC 9.2.1 20190827] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gmxapi as gmx
>>> test = gmx.mdrun("/home/acmnpv/data/gerrit/refactor/src/gromacs/trajectoryanalysis/tests/clustsize.tpr")
>>> output = test.run()
-------------------------------------------------------
Program: GROMACS, version 2020-dev-20190911-2e9e083d6a
Source file: src/gromacs/gmxlib/network.cpp (line 70)
*** The MPI_Comm_size() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[debian-xps13:19677] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
GROMACS CMake flags
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=clang++-7 -DCMAKE_C_COMPILER=clang-7 -DGMXAPI=ON -DGMX_HWLOC=AUTO -DGMX_USE_RDTSCP=DETECT -DGMX_MPI=ON '-DCMAKE_CXX_LINK_FLAGS=-Wl,-rpath,/usr/bin/../lib64 -L/usr/bin/../lib64' -DGMX_GPLUSPLUS_PATH=/usr/bin/gcc-5 -DCMAKE_INSTALL_PREFIX=/home/acmnpv/data/gerrit/refactor/install-clang-7-simd-mpi -DGMX_GPU=OFF
#2 Updated by Eric Irrgang over 1 year ago
I have not been able to reproduce this error, but I was able to produce a similar error when executing Python with mpiexec. It seems like the gmx_mpi binary tries to initialize MPI, but runs into trouble because it is a subprocess of a subprocess of mpiexec. I don't know how to resolve that easily, other than to disallow gmx_mpi when wrapping command line tools. (Thoughts?) It would also make sense to replace the use case of wrapped gmx command lines with direct C++ access to the tool launcher.
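To make the "disallow gmx_mpi" option concrete, here is a minimal sketch of what such a guard might look like on the Python side. It is not gmxapi code: the function name reject_mpi_binary is hypothetical, and the check assumes the MPI-enabled binary can be recognized by the "_mpi" name suffix, which a custom build could change.

```python
import os


def reject_mpi_binary(executable, mpi_suffix="_mpi"):
    """Return `executable` unchanged unless its name looks MPI-enabled.

    Hypothetical helper, not part of gmxapi. Assumes the MPI build is
    identifiable by a "_mpi" suffix on the binary name (e.g. gmx_mpi).
    """
    name = os.path.basename(executable)
    if name.endswith(mpi_suffix):
        raise ValueError(
            f"{name!r} appears to be an MPI-enabled binary; "
            "wrapping it as a command line tool is not supported."
        )
    return executable
```

A name-based check is only a heuristic. A more robust guard would have to ask the binary itself how it was built, which this sketch does not attempt.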
In the case of mdrun specifically, I thought I had a RAII/sentinel object in place as part of the simulation launcher to make sure MPI was properly initialized and deinitialized, but there has been some tinkering in that machinery, and we never resolved the issue of MPI communicator sharing. As such, behavior is undefined when launching a script with more than one MPI rank and an MPI-enabled libgromacs. A single MPI rank should work, though, so we should figure out what's going on if it doesn't.
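For readers unfamiliar with the RAII/sentinel idea mentioned above, the following is a rough Python analogue, not the actual libgromacs implementation. The FakeMPI class and the mpi_sentinel name are hypothetical stand-ins so the sketch runs without an MPI installation. With mpi4py, the corresponding calls would be MPI.Is_initialized(), MPI.Init() and MPI.Finalize().

```python
from contextlib import contextmanager


class FakeMPI:
    """Hypothetical stand-in for an MPI binding; only tracks state."""

    def __init__(self, initialized=False):
        self.initialized = initialized
        self.finalized = False

    def Is_initialized(self):
        return self.initialized

    def Init(self):
        self.initialized = True

    def Finalize(self):
        self.finalized = True


@contextmanager
def mpi_sentinel(mpi):
    """Initialize MPI on entry if needed; finalize on exit only if we did."""
    owns_mpi = not mpi.Is_initialized()
    if owns_mpi:
        mpi.Init()
    try:
        yield mpi
    finally:
        # Leave MPI alone if the caller (e.g. mpiexec + mpi4py) set it up.
        if owns_mpi:
            mpi.Finalize()


# Usage: MPI is guaranteed to be initialized inside the block.
mpi = FakeMPI()
with mpi_sentinel(mpi):
    assert mpi.Is_initialized()
```

The point of the ownership flag is that code running inside an already initialized MPI environment must neither call Init a second time nor finalize MPI on behalf of its caller. This sketch does not address how a communicator would be shared between the script and the library.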