
Bug #715

mdrun crashes in serial, but not in parallel

Added by Justin Lemkul over 8 years ago. Updated over 8 years ago.

Status: Closed
Priority: High
Assignee:
Category: mdrun
Target version:
Affected version - extra info: 4.5.3
Affected version:
Difficulty: uncategorized

Description

I have a system that requires some processes to be carried out in serial, specifically L-BFGS minimization. In serial it crashes with a segmentation fault. I also tested the system with steepest descent, which likewise exits with a SEGV in serial but runs successfully with mdrun -nt 2 -pd. I ran gdb on mdrun (latest release-4-5-patches) and got:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00883064
0x00327ef5 in recur ()
(gdb) bt
#0 0x00327ef5 in recur ()
#1 0x00039000 in ?? ()
Cannot access memory at address 0x5

The original workflow I used called for the free energy code to be turned on, but turning these settings off makes no difference, so I don't think that the free energy code is the problem. The SEGV is reproducible in 4.5.3 and the latest release-4-5-patches (VERSION 4.5.3-dev-20110226-f158835).

I am attaching two .tpr files that reproduce the problem. Compiling with or without threads makes no difference: a serial mdrun built without thread support crashes in the same way.

test_em_steep.tpr (82 KB), Justin Lemkul, 03/01/2011 02:09 AM
test_em_steep_noFEP.tpr (73.1 KB), Justin Lemkul, 03/01/2011 02:09 AM

History

#1 Updated by Justin Lemkul over 8 years ago

  • Assignee deleted (Berk Hess)

#2 Updated by Berk Hess over 8 years ago

I can't reproduce this with the current release-4-5-patches git
Version: VERSION 4.5.3-dev-20110307-c03cfc2
GIT SHA1 hash: c03cfc2b26926090bc175966228b28c7cd3a6f33

Berk

#3 Updated by Berk Hess over 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Berk Hess

#4 Updated by Justin Lemkul over 8 years ago

  • Status changed from In Progress to Closed

Berk Hess wrote:

> I can't reproduce this with the current release-4-5-patches git
> Version: VERSION 4.5.3-dev-20110307-c03cfc2
> GIT SHA1 hash: c03cfc2b26926090bc175966228b28c7cd3a6f33
>
> Berk

I think the problem is not actually in Gromacs, but in FFTW. Installing FFTW-3.2.2 and linking the latest release-4-5-patches against those libraries results in a stable run in serial. I guess I never considered FFTW as the cause; I've always linked against 3.0.1 for reasons of continuity. Is there any value in specifying minimum version requirements for Gromacs now? It seems that there are several compiler- and library-related issues.

I'll go ahead and close this bug report, too, since it seems that it's not actually Gromacs' fault (sorry for the false alarm!), but it might be worthwhile to come up with a list of real requirements, maybe whenever the next release comes up. Based on some of the problems I've seen posted, it looks like the move to C++ will also require certain minimum compiler versions.
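A minimum-version requirement like the one suggested above could, for example, be backed by a runtime check of the linked FFTW version. The following is a minimal sketch in C under stated assumptions, not anything Gromacs actually ships: `parse_fftw_version` and `fftw_version_at_least` are hypothetical helper names, the `"fftw-X.Y.Z"` string format (possibly with a suffix) is an assumption about what FFTW 3 reports, and the 3.2.2 threshold only reflects this report (3.0.1 crashed, 3.2.2 ran), not a verified minimum.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse "fftw-X.Y.Z[-suffix]" or "X.Y.Z" into v[0..2].
 * A missing patch number (e.g. "3.2") is treated as 0.
 * Returns 0 on success, -1 if fewer than two numbers could be read. */
int parse_fftw_version(const char *s, int v[3])
{
    int n;

    if (strncmp(s, "fftw-", 5) == 0) {
        s += 5;                       /* skip the assumed "fftw-" prefix */
    }
    v[0] = v[1] = v[2] = 0;
    /* sscanf stops at the first character that does not match the
     * format, so a trailing suffix such as "-sse" is ignored. */
    n = sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]);
    return (n >= 2) ? 0 : -1;
}

/* Hypothetical helper: returns 1 if `have` >= `need`, 0 if `have` is
 * older, and -1 if either string cannot be parsed. */
int fftw_version_at_least(const char *have, const char *need)
{
    int a[3], b[3], i;

    if (parse_fftw_version(have, a) != 0 || parse_fftw_version(need, b) != 0) {
        return -1;
    }
    for (i = 0; i < 3; i++) {         /* compare major, minor, patch in order */
        if (a[i] != b[i]) {
            return a[i] > b[i];
        }
    }
    return 1;                         /* versions are equal */
}
```

In a program linked against single-precision FFTW 3, the `have` argument could be the library's exported `fftwf_version` string (declared in `fftw3.h`), e.g. `fftw_version_at_least(fftwf_version, "3.2.2")`, with a warning printed when it returns 0. A configure-time check would catch an old FFTW earlier, but a runtime check also covers the case where a different shared library is picked up at run time.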

#5 Updated by Berk Hess over 8 years ago

Does this maybe also fix the free-energy problem you reported on gmx-users?

Berk

#6 Updated by Justin Lemkul over 8 years ago

Berk Hess wrote:

> Does this maybe also fix the free-energy problem you reported on gmx-users?
>
> Berk

I am working on testing that now; I suspect it might be the issue. I will report back via gmx-users when I have an answer. Our cluster is a bit backlogged right now, but I should have some runs going in the next day or so.
