Bug #528

XDR Crash Windows ICC+64bit

Added by Kyle Beauchamp about 9 years ago. Updated about 9 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Erik Lindahl
Category:
mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized

Description

This bug was produced using a build from Aug. 25. The file gmx_system_xdr.c has not been modified recently.

When running a specific TPR file (PME), I get an unhandled exception (access violation) crash under Windows. For some reason, this crash only seems to appear in 64-bit ICC builds on Windows. I suspect that the bug may also be present in 64-bit MSVC builds, but perhaps compiler differences are preventing the crash from occurring.

I'm emailing the TPR file to Erik to avoid posting it publicly. It's not super-sensitive, but we don't want it available via link. With his approval, he or I can email it to others as necessary.

I built a debug mdrun.exe and traced the crash to the following function:

bool_t xdr_float(xdrs, fp)

Line 489 appears to be the culprit:

    case XDR_ENCODE:
        tmp = *(xdr_int32_t *)fp;
        return (xdr_putint32(xdrs, &tmp));
        break;
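
For readers unfamiliar with this branch, here is a minimal sketch of what the encode step does. It is not the GROMACS source: encode_float_be is a hypothetical stand-in, and it assumes a 4-byte IEEE 754 float. The float's bit pattern is reinterpreted as a 32-bit integer and stored in big-endian (XDR) byte order. Because the first statement dereferences fp, an access violation on that line suggests the caller passed an invalid pointer rather than that the encoding logic is wrong.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the XDR_ENCODE branch (not GROMACS code).
 * Copies the float's bit pattern into a 32-bit integer and stores it
 * in big-endian (XDR) byte order.
 * Returns 1 on success, 0 if either pointer is NULL. */
static int encode_float_be(const float *fp, unsigned char out[4])
{
    uint32_t bits;

    if (fp == NULL || out == NULL) {
        /* The cast in the original would typically fault here instead. */
        return 0;
    }

    /* Same bits as *(xdr_int32_t *)fp, without the pointer cast. */
    memcpy(&bits, fp, sizeof bits);

    out[0] = (unsigned char)(bits >> 24);
    out[1] = (unsigned char)(bits >> 16);
    out[2] = (unsigned char)(bits >> 8);
    out[3] = (unsigned char)(bits);
    return 1;
}
```

With this sketch, encoding 1.0f yields the bytes 3f 80 00 00, while a NULL input is rejected instead of crashing. The NULL check only catches one kind of bad pointer; a dangling or out-of-range pointer would still fault, which is why a stack trace showing the caller is the more useful diagnostic.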

Here is the tail of the md.log file--it shows that we die before the second log write:

There are: 27330 Atoms
There are: 1307 VSites
Max number of connections per atom is 106
Total number of connections is 170086
Max number of graph edges per atom is 8
Total number of graph edges is 43488
Initial temperature: 298.018 K

Started mdrun on node 0 Thu Aug 26 17:22:58 2010

           Step           Time         Lambda
       80000000   400000.00000        0.00000

Grid: 10 x 8 x 17 cells
Energies (kJ/mol)
          Angle    Proper Dih. Ryckaert-Bell.  Improper Dih.          LJ-14
   1.21135e+004   3.39489e+003   1.06416e+004   5.72889e+002   9.46204e+003
     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
   7.67715e+003   1.57743e+004  -3.66435e+005  -1.37424e+005  -4.44222e+005
    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
   6.77188e+004  -3.76503e+005   2.98410e+002  -1.69390e+001   8.12903e-005

Writing checkpoint, step 80003100 at Thu Aug 26 17:37:59 2010

History

#1 Updated by Kyle Beauchamp about 9 years ago

P.S. It should take about 10 minutes to crash (running with -nt 1 on an AMD 1090T).

#2 Updated by Teemu Murtola about 9 years ago

This may be related to bug #522, which was fixed on Aug 26. Could you try with the most recent version?

PS. In addition to the function where it crashes, a stack trace usually provides useful information, in particular if the crash is in a low-level routine like here. If the crashing function was called from do_cpte_reals_low(), it is most probably bug #522.

#3 Updated by Kyle Beauchamp about 9 years ago

It looks like yesterday's patch fixed this problem. Thanks!

#4 Updated by Teemu Murtola about 9 years ago

Marking as fixed as the bug no longer appears with the newest version.
