Bug #1098
mdrun randomly hangs for 32-bit builds running on >1 thread on OS X
Description
mdrun simply continues to run (using 100% CPU) without ever finishing for several of the regressiontests. It happens randomly, and sometimes the tests pass. When restricting it to a single thread, the problem disappears.
64-bit builds work fine.
This could be a sign of an error in thread-MPI, or the domain decomposition, that is sensitive to 32-vs-64 bits, and it might also affect other 32-bit platforms.
History
#1 Updated by Erik Lindahl about 8 years ago
The bug seems to disappear when compiling against and running with a 32-bit build of Open MPI instead of thread-MPI. This points to a 32-bit bug in thread-MPI.
#2 Updated by Sander Pronk about 8 years ago
Stupid question: how does one force 32-bit compilation?
#3 Updated by Erik Lindahl about 8 years ago
On OS X the easiest solution is to set CMAKE_OSX_ARCHITECTURES=i386 on the command line when calling cmake (it has to be set before any tests are run).
On Linux (and OS X) it can be enabled by adding "-m32" to CFLAGS, but many Linux distros no longer install the 32-bit libraries by default.
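As a rough sketch of the two options above: the function name configure_32bit and its source-directory argument are made up for illustration; only CMAKE_OSX_ARCHITECTURES=i386 and the "-m32" flag come from this comment. It is meant to be called from a fresh, empty build directory so the setting is in place before cmake runs its compiler tests.

```shell
# Sketch only: configure a 32-bit build from an empty build directory.
# configure_32bit and its argument are hypothetical names for illustration.
configure_32bit() {
    src_dir="$1"
    if [ "$(uname -s)" = "Darwin" ]; then
        # OS X: select the i386 architecture on the cmake command line,
        # so it is in effect before cmake runs any compiler tests.
        cmake -DCMAKE_OSX_ARCHITECTURES=i386 "$src_dir"
    else
        # Linux: pass -m32 through CFLAGS. This needs the 32-bit
        # libraries, which many distros do not install by default.
        CFLAGS="-m32" cmake "$src_dir"
    fi
}
```

Usage would look like `configure_32bit /path/to/source`, where the path is whatever your checkout happens to be.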
#4 Updated by Sander Pronk about 8 years ago
I can't seem to reproduce this with llvm on Mountain Lion - what OS version & compiler is this?
#5 Updated by Erik Lindahl about 8 years ago
Mountain Lion 10.8.2, Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn).
Header of a log file attached for more information.
I'm seeing the problem on a quad-core MacBook Pro. The problem occurs more frequently with 8 threads than with 2, but after running the simple tests 5-10 times I got it to hang with 2 threads too. However, note that it is NOT deterministic - many executions work fine, and then it suddenly hangs again.
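Since the hang is intermittent, one way to catch it is to repeat the same run with a time limit and stop at the first run that does not exit. The sketch below is an assumption about how one might do that, not what was actually used here; the function names, the time limit, and the use of status 124 to mean "killed after timeout" are all illustrative choices. A command that itself exits with 124 would be misreported as a hang.

```shell
# Sketch only: run a command and kill it if it exceeds a time limit.
# Returns the command's exit status, or 124 if it had to be killed.
run_with_timeout() {
    limit="$1"; shift
    "$@" &
    pid=$!
    elapsed=0
    # Poll once per second while the process is still alive.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            kill -9 "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # Process exited on its own: report its status.
    wait "$pid"
}

# Repeat a command up to N times; stop and report at the first hang.
repeat_until_hang() {
    runs="$1"; limit="$2"; shift 2
    i=1
    while [ "$i" -le "$runs" ]; do
        run_with_timeout "$limit" "$@" && status=0 || status=$?
        if [ "$status" -eq 124 ]; then
            echo "run $i hung (no exit after ${limit}s)"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no hang in $runs runs"
}
```

For example, `repeat_until_hang 10 600 <mdrun command for one of the simple tests>` would try ten runs with a ten-minute limit each; both numbers are placeholders and should be set well above the normal run time of the test.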
#6 Updated by Sander Pronk about 8 years ago
Fixed in https://gerrit.gromacs.org/#/c/1970/
#7 Updated by Erik Lindahl about 8 years ago
That fix solves all my issues.
#8 Updated by Roland Schulz about 8 years ago
- Status changed from New to Closed