Bug #2095

Updated by Mark Abraham about 3 years ago

I compiled GROMACS (git master branch and the 2016.1 release) with the following settings (a matching CMake sketch is shown after the list):

+ GCC 5.2.0 / GCC 4.9.2
+ OpenMpi 2.0.1 / Mpich 3.2
+ OpenMP enabled
+ FFTW 3.3.5
+ AVX2_256
+ CUDA 7.5
+ CUDA_HOST_COMPILER set to GCC 4.9.2
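
Roughly, those settings correspond to a configure invocation like the one below. This is only a sketch: the compiler paths, install prefix and exact option values are placeholders, not the precise command I used.

<pre>
# Sketch only -- paths and prefix are placeholders, not the exact configure line.
CC=gcc CXX=g++ cmake .. \
    -DGMX_MPI=ON \
    -DGMX_OPENMP=ON \
    -DGMX_SIMD=AVX2_256 \
    -DGMX_GPU=ON \
    -DCUDA_HOST_COMPILER=/path/to/gcc-4.9.2/bin/g++ \
    -DGMX_FFT_LIBRARY=fftw3 \
    -DCMAKE_INSTALL_PREFIX=$HOME/opt2
</pre>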

In my position restraint topology files, I applied flat-bottom position restraints to three atoms.
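
Flat-bottom position restraints use function type 2 in the [ position_restraints ] directive; the sketch below is purely illustrative -- the atom indices, geometry code, radius and force constant are placeholders, not the values from my files (those are in the attached tpr).

<pre>
[ position_restraints ]
; ai  funct  g   r (nm)  k (kJ mol^-1 nm^-2)   ; funct 2 = flat-bottomed restraint
   1     2   1   0.5     1000
   2     2   1   0.5     1000
   3     2   1   0.5     1000
</pre>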

But when I started my GROMACS job using

<pre>
mpirun -np 4 gmx_mpi mdrun ...
</pre>
OpenMPI reported a segmentation fault:

<pre>
[gpu072:50339] *** Process received signal ***
[gpu072:50339] Signal: Segmentation fault (11)
[gpu072:50339] Signal code: Address not mapped (1)
[gpu072:50339] Failing at address: (nil)
[gpu072:50338] *** Process received signal ***
[gpu072:50338] Signal: Segmentation fault (11)
[gpu072:50338] Signal code: Address not mapped (1)
[gpu072:50338] Failing at address: (nil)
[gpu072:50339] [ 0] /lib64/libpthread.so.0(+0xf790)[0x2aaaaf001790]
[gpu072:50339] [ 1] [gpu072:50338] [ 0] /lib64/libpthread.so.0(+0xf790)[0x2aaaaf001790]
[gpu072:50338] [ 1] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(+0x49662b)[0x2aaaab16362b]
[gpu072:50339] [ 2] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(+0x49662b)[0x2aaaab16362b]
[gpu072:50338] [ 2] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(+0x497fe2)[0x2aaaab164fe2]
[gpu072:50339] [ 3] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(+0x497fe2)[0x2aaaab164fe2]
[gpu072:50338] [ 3] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(_Z17dd_make_local_topP12gmx_domdec_tP18gmx_domdec_zones_tiPA3_fPfPiP10t_forcerecS4_P11gmx_vsite_tPK10gmx_mtop_tP14gmx_localtop_t+0x354)[0x2aaaab1654bd]
[gpu072:50339] [ 4] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(_Z17dd_make_local_topP12gmx_domdec_tP18gmx_domdec_zones_tiPA3_fPfPiP10t_forcerecS4_P11gmx_vsite_tPK10gmx_mtop_tP14gmx_localtop_t+0x354)[0x2aaaab1654bd]
[gpu072:50338] [ 4] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(_Z19dd_partition_systemP8_IO_FILElP9t_commreciiP7t_statePK10gmx_mtop_tPK10t_inputrecS4_PSt6vectorIN3gmx11BasicVectorIfEESaISE_EEP9t_mdatomsP14gmx_localtop_tP10t_forcerecP11gmx_vsite_tP10gmx_constrP6t_nrnbP13gmx_wallcyclei+0x1464)[0x2aaaab15c890]
[gpu072:50339] [ 5] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(_Z19dd_partition_systemP8_IO_FILElP9t_commreciiP7t_statePK10gmx_mtop_tPK10t_inputrecS4_PSt6vectorIN3gmx11BasicVectorIfEESaISE_EEP9t_mdatomsP14gmx_localtop_tP10t_forcerecP11gmx_vsite_tP10gmx_constrP6t_nrnbP13gmx_wallcyclei+0x1464)[0x2aaaab15c890]
[gpu072:50338] [ 5] gmx_mpi[0x429f6e]
[gpu072:50339] [ 6] gmx_mpi[0x423b91]
[gpu072:50339] [ 7] gmx_mpi[0x429f6e]
[gpu072:50338] [ 6] gmx_mpi[0x423b91]
[gpu072:50338] [ 7] gmx_mpi[0x428150]
[gpu072:50339] [ 8] gmx_mpi[0x428150]
[gpu072:50338] [ 8] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(+0x452977)[0x2aaaab11f977]
[gpu072:50339] [ 9] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(+0x452977)[0x2aaaab11f977]
[gpu072:50338] [ 9] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x38d)[0x2aaaab12142d]
[gpu072:50339] [10] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x38d)[0x2aaaab12142d]
[gpu072:50338] [10] gmx_mpi[0x41941c]
[gpu072:50338] [11] gmx_mpi[0x41941c]
[gpu072:50339] [11] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2aaaaf22dd5d]
[gpu072:50338] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2aaaaf22dd5d]
[gpu072:50339] [12] gmx_mpi[0x419299]
[gpu072:50338] *** End of error message ***
gmx_mpi[0x419299]
[gpu072:50339] *** End of error message ***
</pre>

The stack trace shows that the segfault occurs in dd_make_local_top() (declared in domdec.h), called from dd_partition_system() during domain-decomposition repartitioning.

However, when I removed mpirun, i.e. when I ran the same tpr with a single process and multiple OpenMP threads, I did not get any segfault.
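
In other words, an invocation along these lines runs fine; the thread count and file name below are just placeholders.

<pre>
# Single rank, OpenMP threading only -- no domain decomposition.
gmx_mpi mdrun -ntomp 8 -s topol.tpr
</pre>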

I have attached a tpr file that triggers this segfault.
