Bug #178

mdrun crashes with segmentation fault when using harmonic position restraints

Added by Peter Stern about 12 years ago. Updated 7 months ago.

Status: Closed
Priority: Normal
Assignee: Erik Lindahl
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Created an attachment (id=258)
Input to mdrun

I built gromacs-3.3.2 with gcc 4.2.2, linked against fftw-3.1.2, on a Red Hat system running the 2.4.21-4.ELsmp kernel.
mdrun works when I don't use harmonic position restraints, but it crashes with a segmentation fault when these restraints are used.
The same versions of gromacs and fftw, built on an AMD Opteron with gcc 4.1.0 (SUSE Linux) and the 2.6.16.21-0.8-smp kernel, work fine with or without the position restraints.

We ran mdrun -v, and the last line in md.log is:
Large VCM: 0.00000, 0.00000, 5563405426657770979983360.00000, T-cm: inf

The topol.tpr file which results in a segmentation fault is attached.
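For readers unfamiliar with the log line: VCM is the center-of-mass velocity. It is a mass-weighted average over atoms, so a single atom with an enormous or infinite velocity is enough to make it blow up. Below is a minimal C sketch of that calculation plus a finiteness check. It assumes plain arrays of masses and velocities. vec3, com_velocity and vec3_is_finite are hypothetical names, not GROMACS functions:

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical 3-vector type used only for this sketch. */
typedef struct { double x, y, z; } vec3;

/* Mass-weighted average velocity of n particles (hypothetical helper).
 * Assumes n > 0 and a nonzero total mass. */
vec3 com_velocity(const double *mass, const vec3 *v, size_t n)
{
    vec3 p = {0.0, 0.0, 0.0};   /* total momentum */
    double mtot = 0.0;
    for (size_t i = 0; i < n; i++) {
        p.x += mass[i] * v[i].x;
        p.y += mass[i] * v[i].y;
        p.z += mass[i] * v[i].z;
        mtot += mass[i];
    }
    vec3 vcm = {p.x / mtot, p.y / mtot, p.z / mtot};
    return vcm;
}

/* Returns 1 if every component is finite, 0 if any is inf or NaN. */
int vec3_is_finite(vec3 a)
{
    return isfinite(a.x) && isfinite(a.y) && isfinite(a.z);
}
```

With masses {1, 3} and velocities (1, 0, 0) and (-1, 0, 2), com_velocity returns (-0.5, 0, 1.5). Setting one velocity component to INFINITY makes vec3_is_finite return 0.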

Attachment: topol.tpr (2.05 MB), Input to mdrun, Peter Stern, 11/18/2007 10:58 AM

History

#1 Updated by David van der Spoel about 12 years ago

After how much time does it crash?

I get

step 400, will finish at Wed Dec 26 13:35:44 2007

and I don't want to wait until Christmas...

#2 Updated by Peter Stern about 12 years ago

For me it crashes immediately, after a few seconds. I get:
Getting Loaded...
Reading file topol.tpr, VERSION 3.3.2 (single precision)
Loaded with Money

starting mdrun 'Charged plates in water'
20000000 steps, 40000.0 ps.

step 0Segmentation fault

I only get the message you quote with a file that works (one without restraints).

#3 Updated by Erik Lindahl about 12 years ago

Hi Peter,

We've had some issues with a possible compiler bug in some recent gcc versions that we still haven't been able to isolate.

Since it sounds as if you're building this yourself, could you try to (in order of increasing difficulty/work):

1. disable the assembly kernels (configure with --disable-ia32-sse, or set the environment variable NOASSEMBLYLOOPS to anything when running)
2. run it in a debugger to see where it crashes
3. recompile with a debug flag, e.g.: ./configure CFLAGS="-O3 -g"
4. run it in a debugger and see why it crashes
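Setting NOASSEMBLYLOOPS "to anything" in step 1 means that only the presence of the variable matters, not its value. Below is a sketch of that pattern using the standard getenv. use_assembly_loops is a hypothetical name, and this code is not taken from the GROMACS source:

```c
#include <stdlib.h>

/* Returns 0 (fall back to plain C loops) when NOASSEMBLYLOOPS is set to any
 * value, including an empty string; returns 1 otherwise.
 * Hypothetical helper illustrating a presence-only environment check. */
int use_assembly_loops(void)
{
    return getenv("NOASSEMBLYLOOPS") == NULL;
}
```

From a Bourne-style shell, this check would be triggered by, for example, NOASSEMBLYLOOPS=1 mdrun -v.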

Cheers,

Erik

#4 Updated by David van der Spoel about 12 years ago

Maybe it is a compilation issue indeed. I've tried it on

Ubuntu 7.10 64 bit (Ubuntu-provided version)
Mac OS X 10.5 (gcc 4.0.1)
CentOS 64 bit and 32 bit binary

and it works everywhere :(

#5 Updated by Peter Stern about 12 years ago

Dear Erik:

Here is where it crashes:
Program received signal SIGSEGV, Segmentation fault.
0x080ae4fd in spread_on_grid ()

Regards,
Peter

#6 Updated by Erik Lindahl about 12 years ago

Hi,

OK, the crash probably happens either because the input data is incorrect, or because the compiler is doing something nasty with rounding.

In either case, it would help a lot to recompile with debug flags so we get an idea about values of local variables where it crashes!
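The following illustrates the "incorrect input data" case. spread_on_grid distributes charges onto the PME grid, which requires turning each coordinate into an integer grid index. If a coordinate has already become inf or NaN, converting it to int is undefined behaviour in C. That conversion can produce an index far outside the grid array, and writing there is a typical way to get SIGSEGV.

Below is a hedged sketch with a guard. It assumes a single nearest-lower grid point per coordinate, whereas the real PME code spreads each charge over several points with B-spline weights. grid_index and its 1e9 cutoff are hypothetical:

```c
#include <math.h>

/* Map coordinate x in a periodic box of length box to a grid index in
 * [0, n). Returns -1 for non-finite input, because casting inf/NaN to an
 * integer is undefined behaviour in C. Coordinates more than 1e9 box
 * lengths away are also rejected as corrupt (hypothetical cutoff).
 * Hypothetical helper, not the GROMACS spread_on_grid code. */
int grid_index(double x, double box, int n)
{
    double frac = x / box;
    if (!isfinite(frac) || frac > 1e9 || frac < -1e9) {
        return -1;
    }
    long long whole = (long long)frac;  /* truncates toward zero */
    frac -= (double)whole;              /* now in (-1, 1) */
    if (frac < 0.0) {
        frac += 1.0;                    /* wrap into [0, 1] */
    }
    int i = (int)(frac * n);
    if (i >= n) {                       /* rounding can land exactly on 1.0 */
        i = n - 1;
    }
    return i;
}
```

For a 4.0 nm box with 40 grid points, x = -1.0 wraps to index 30, while INFINITY or NAN returns -1 instead of an out-of-bounds index.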

#7 Updated by Peter Stern about 12 years ago

Dear David:

Yes, it works just fine for me using the gcc 4.1.0 compiler on an x86_64 system
(SLES 10) as I already said. I also tried it earlier on the ia32 system using
gcc 4.0.0 and got the same error.

I can compile it with -g and try to see why it crashes.

Regards,
Peter

#8 Updated by David van der Spoel about 12 years ago

I have reproduced the problem with gcc 4.1.2 on Fedora 6.
Debugging now.

#9 Updated by David van der Spoel about 12 years ago

The error lies somewhere in the position restraint routine; that is where spurious forces and energies suddenly appear.

A simple workaround is to use
./configure [flags]
make CFLAGS="-O2" install

If you are worried about performance then it suffices to compile only bondfree with -O2 instead of the default -O3.

#10 Updated by Erik Lindahl about 12 years ago

I've just looked through the position restraint code (~25 lines), and unfortunately I cannot find any obvious bug there. Given the compiler- and optimization-dependent behaviour, this is indeed probably a compiler bug.

I'll see if we can track down and work around this bug, so I'm changing resolution to "LATER" for now.

David, did you have a Fedora 6 machine on the network in Uppsala I could log on to?

Cheers,

Erik
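For reference, a harmonic position restraint is the potential V = 1/2 k (x - x0)^2 per restrained dimension, with force F = -k (x - x0). Below is a minimal sketch of one atom's contribution. It assumes separate force constants per dimension and ignores periodic boundaries and lambda coupling. posres_energy is a hypothetical name, not the GROMACS routine. For finite inputs every term is finite, which is consistent with finding no obvious bug in the source itself:

```c
/* Harmonic position restraint on one atom (hypothetical helper, no PBC):
 *   V    = 1/2 * sum_d k[d] * (x[d] - x0[d])^2
 *   F[d] = -k[d] * (x[d] - x0[d])
 * Adds the restraint force to f and returns the restraint energy. */
double posres_energy(const double x[3], const double x0[3],
                     const double k[3], double f[3])
{
    double v = 0.0;
    for (int d = 0; d < 3; d++) {
        double dx = x[d] - x0[d];   /* displacement from reference position */
        v += 0.5 * k[d] * dx * dx;
        f[d] += -k[d] * dx;
    }
    return v;
}
```

For x = (1.5, 2.0, 3.0), x0 = (1.0, 2.0, 3.25) and k = 1000 in each dimension, this returns V = 156.25 and adds (-500, 0, 250) to f.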

#11 Updated by David van der Spoel about 12 years ago

Not really, it is in VMWare on my laptop. Useful, that...

I suspect it might work if we could locally turn off inlining of the harmonic routine.
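GCC can keep one function out of line with the noinline attribute. That would be one way to try this without lowering optimization for the whole file. harmonic_1d and the NOINLINE macro below are hypothetical, and whether this actually avoids the miscompilation has not been tested:

```c
/* __attribute__((noinline)) is a GCC extension (Clang accepts it too).
 * For other compilers NOINLINE expands to nothing, so they may still
 * inline the function. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* 1-D harmonic term (hypothetical): returns V = 1/2 k dx^2 and stores
 * the force -k dx in *force. Kept out of line by NOINLINE on GCC. */
static NOINLINE double harmonic_1d(double k, double dx, double *force)
{
    *force = -k * dx;
    return 0.5 * k * dx * dx;
}
```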

#12 Updated by David van der Spoel about 12 years ago

Maybe configure can test for the compiler version and turn down the optimization for gcc versions >= 4.1
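Such a check could be based on GCC's predefined version macros. Below is a hedged sketch of a compile-time probe; gcc_at_least_4_1 is a hypothetical name. Peter reports the same error with gcc 4.0.0, so a cutoff at 4.1 might not catch every affected compiler:

```c
/* Returns 1 when compiled by GCC 4.1 or newer, 0 otherwise.
 * __GNUC__ and __GNUC_MINOR__ are GCC's predefined version macros.
 * Other compilers (e.g. Clang, ICC) also define __GNUC__, so a real
 * configure check would need to rule them out explicitly. */
int gcc_at_least_4_1(void)
{
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1))
    return 1;
#else
    return 0;
#endif
}
```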

#13 Updated by Peter Stern about 12 years ago

I tried compiling just bondfree with -O2 but that didn't work.
Using CFLAGS="-O2" did solve the problem, however. So I guess that there are some other routines involved here which need to be compiled with -O2 besides just bondfree.

#14 Updated by Peter Stern about 12 years ago

OK. I got it to work after compiling only bondfree with -O2.
I guess I didn't do it right the first time. Thanks again.
