mdrun crashes in serial, but not in parallel
I have a system that requires some steps to be run in serial, specifically L-BFGS minimization, and the serial run crashes with a segmentation fault. Steepest descents minimization also exits with a SEGV in serial, but runs successfully with mdrun -nt 2 -pd. I ran gdb on mdrun (latest release-4-5-patches) and got:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00883064
0x00327ef5 in recur ()
#0 0x00327ef5 in recur ()
#1 0x00039000 in ?? ()
Cannot access memory at address 0x5
The original workflow I used called for the free energy code to be turned on, but turning the free energy settings off makes no difference, so I don't think that the free energy code is the problem. The SEGV is reproducible in 4.5.3 and the latest release-4-5-patches (VERSION 4.5.3-dev-20110226-f158835).
I am attaching two .tpr files that reproduce the problem. Compiling with or without threads makes no difference; the non-threaded serial mdrun crashes in the same way.
#4 Updated by Justin Lemkul over 9 years ago
- Status changed from In Progress to Closed
Berk Hess wrote:
I can't reproduce this with the current release-4-5-patches git
Version: VERSION 4.5.3-dev-20110307-c03cfc2
GIT SHA1 hash: c03cfc2b26926090bc175966228b28c7cd3a6f33
I think the problem is not actually in Gromacs, but in FFTW. Installing FFTW-3.2.2 and linking the latest release-4-5-patches against those libraries results in a stable run in serial. I never considered FFTW as the cause; I've always linked against 3.0.1 for reasons of continuity. Is there any value in specifying minimum version requirements for Gromacs now? It seems that there are several compiler- and library-related issues.
I'll go ahead and close this bug report, since it seems that it's not actually Gromacs' fault (sorry for the false alarm!), but it might be worthwhile to come up with a list of real requirements, maybe for the next release. Based on some of the problems I've seen posted, it looks like the move to C++ will also require certain minimum compiler versions.
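As a rough illustration of what a minimum-version check could look like, here is a small Python sketch. It is not part of Gromacs or FFTW. The function names, the example version strings, and the choice of 3.2.2 as the threshold are all assumptions for illustration; this thread only establishes that 3.0.1 crashed and 3.2.2 ran for this one system.

```python
import re

# Assumed threshold for illustration only: the thread shows 3.0.1 crashing
# and 3.2.2 working, not that 3.2.2 is the true minimum.
MIN_FFTW = (3, 2, 2)


def parse_version(text):
    """Extract the first dotted numeric version from a string like 'fftw-3.2.2-sse2'."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(text, minimum=MIN_FFTW):
    """Return True if the version found in `text` is at least `minimum`."""
    version = parse_version(text)
    # Pad with zeros so that (3, 2) compares as (3, 2, 0).
    width = max(len(version), len(minimum))
    padded_version = version + (0,) * (width - len(version))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_version >= padded_minimum


if __name__ == "__main__":
    # Hypothetical version strings, not output captured from a real build.
    for candidate in ("fftw-3.0.1", "fftw-3.2.2-sse2"):
        status = "ok" if meets_minimum(candidate) else "too old"
        print(candidate, status)
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "3.10" sorting before "3.2".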
#6 Updated by Justin Lemkul over 9 years ago
Berk Hess wrote:
Does this maybe also fix the free-energy problem you reported on gmx-users?
I am working on testing that now; I suspect it might be the issue. I will report back via gmx-users when I have an answer. Our cluster is a bit backlogged right now, but I should have some runs going in the next day or so.