Bug #512

Windows 64bit crashes in PME and RF

Added by Kyle Beauchamp about 9 years ago. Updated about 9 years ago.

Status: Closed
Priority: Normal
Assignee: Erik Lindahl
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Created an attachment (id=517)
md.log file

Tested using the Aug. 19 GIT.

This is possibly related to http://bugzilla.gromacs.org/show_bug.cgi?id=506

I tested builds using both ICC and MSVC, so this problem may be a small assembly bug. Also, the place where my md.log stops reminds me of earlier assembly bugs.

If I run an explicit solvent system (I tried 3 different systems, and RF and PME), mdrun crashes around the time of the first MD step (I can't give an exact crash time because the md.log write buffers aren't flushed very often).

Sometimes I get an "Unhandled win32 exception" error message, and the debugger then also reports an access violation. I haven't done much else with this yet.

If I change back to a 32-bit mdrun, things run fine. My GBSA TPR files also seem to work fine in 64-bit mode, so perhaps one of the GBSA ASM bug fixes will also apply to the PME / RF kernels.

I am attaching a md.log file and a broken tpr.

vhp-RF.log (8.03 KB) - md.log file - Kyle Beauchamp, 08/21/2010 09:21 PM
vhp-rf.tpr (153 KB) - TPR File RF Villin - Kyle Beauchamp, 08/21/2010 09:23 PM

History

#1 Updated by Kyle Beauchamp about 9 years ago

Created an attachment (id=518)
TPR File RF Villin

#2 Updated by Kyle Beauchamp about 9 years ago

Note that the mdp settings in this file are likely not perfect (poor choices of cutoffs etc), but the other test cases I used were better. I included this test case because the other tpr files are from other students' research.

#3 Updated by Kyle Beauchamp about 9 years ago

Another note: I tried forcing -nt 1, which did not solve the problem.

#4 Updated by Erik Lindahl about 9 years ago

Things seem to work fine with cmake on OS X (which uses the same ASM files), so I'm not sure if this is an ASM bug. What happens if you set GMX_NOOPTIMIZEDKERNELS (to any value)?
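For readers unfamiliar with "set to any value": such a switch is normally a presence test on the environment variable, so even a value of 0 turns it on. The following is a minimal sketch of that idea in C, not the GROMACS source. The helper names (`value_means_set`, `env_flag_is_set`, `kernel_path`) are hypothetical, and the fallback to plain C kernels is an assumption drawn from the variable's name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Presence test: a variable counts as set whatever its value, even "0". */
bool value_means_set(const char *value)
{
    return value != NULL;
}

/* True if the named variable exists in the process environment. */
bool env_flag_is_set(const char *name)
{
    assert(name != NULL);
    return value_means_set(getenv(name));
}

/* Hypothetical helper: pick a kernel path from the flag.
 * Assumption: disabling the optimized kernels falls back to plain C loops. */
const char *kernel_path(void)
{
    return env_flag_is_set("GMX_NOOPTIMIZEDKERNELS")
               ? "plain C kernels"
               : "optimized assembly kernels";
}
```

If a run that crashes with the flag unset succeeds with it set, the fault most likely sits in the optimized kernels or in how they are called, rather than in the rest of mdrun.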

#5 Updated by Kyle Beauchamp about 9 years ago

My 64-bit builds work fine when I "set GMX_NOOPTIMIZEDKERNELS=1":

Started mdrun on node 0 Sun Aug 22 12:02:50 2010

           Step           Time         Lambda
              0        0.00000        0.00000

Grid: 15 x 15 x 12 cells
Energies (kJ/mol)
          Angle    Proper Dih. Ryckaert-Bell.  Improper Dih.          LJ-14
   4.51437e+004   3.96956e+003   2.05713e+004   7.64538e+002   1.74526e+004
     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
   3.17474e+004   5.05846e+004  -1.06049e+007  -3.49172e+006  -1.39264e+007
    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
   1.49276e+005  -1.37771e+007   3.04706e+002   3.28313e+002   1.79021e-005

Received the INT signal, stopping at the next NS step

           Step           Time         Lambda
             60        0.12000        0.00000

Writing checkpoint, step 60 at Sun Aug 22 12:03:51 2010

Energies (kJ/mol)
          Angle    Proper Dih. Ryckaert-Bell.  Improper Dih.          LJ-14
   4.45849e+004   3.99212e+003   2.03211e+004   7.87070e+002   1.72037e+004
     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
   3.22196e+004   4.87817e+004  -1.06041e+007  -3.49165e+006  -1.39279e+007
    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
   1.45333e+005  -1.37825e+007   2.96657e+002  -1.76953e+002   1.79021e-005

#6 Updated by Erik Lindahl about 9 years ago

I assume you're using the release-4-5-patches branch? I don't merge every incremental commit to master yet - we'll do that when we release.

#7 Updated by Kyle Beauchamp about 9 years ago

Sorry, I bet that's it. I guess I assumed that the patches got merged.

#8 Updated by Kyle Beauchamp about 9 years ago

I'll rebuild with today's master, since it appears you just merged in all the patches.

#9 Updated by Erik Lindahl about 9 years ago

I'd still recommend actually checking it for release-4-5-patches, since that's the code that will be in the release :-)

#10 Updated by Kyle Beauchamp about 9 years ago

Hi,

I don't think that was it. I think I'm building with the correct release-4-5-patches branch, but I still see the same crash.

Here's what I did:

git clone git://git.gromacs.org/gromacs.git gromacs-aug22
cd gromacs-aug22
git checkout --track -b release-4-5-patches origin/release-4-5-patches

Then I built, but things seem to die with the same problem as before. Let me know if I'm not building correctly (in which case my patch for Bug 515 could have problems too).

#11 Updated by Erik Lindahl about 9 years ago

Apparently Microsoft, in their divine wisdom, has decided to use their own application binary interface (ABI) on x86-64, with a calling convention that differs from the System V AMD64 ABI (which all other operating systems use).

I'll see what I can do...
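To make the mismatch concrete: the System V AMD64 convention passes the first six integer or pointer arguments in rdi, rsi, rdx, rcx, r8, r9, while the Microsoft x64 convention passes only the first four, in rcx, rdx, r8, r9, and also expects the caller to reserve 32 bytes of shadow space. The sketch below just encodes those two tables. The names (`abi_t`, `int_arg_register`, `args_agree`, `shadow_space_bytes`) are made up for illustration and do not come from GROMACS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { ABI_SYSV_AMD64, ABI_WIN64 } abi_t;

/* Register holding the index-th (0-based) integer/pointer argument,
 * or NULL when that argument is passed on the stack instead. */
const char *int_arg_register(abi_t abi, int index)
{
    static const char *const sysv[]  = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    static const char *const win64[] = { "rcx", "rdx", "r8", "r9" };

    assert(index >= 0);
    if (abi == ABI_SYSV_AMD64) {
        return index < 6 ? sysv[index] : NULL;
    }
    return index < 4 ? win64[index] : NULL;
}

/* 1 if both conventions deliver argument `index` in the same register. */
int args_agree(int index)
{
    const char *a = int_arg_register(ABI_SYSV_AMD64, index);
    const char *b = int_arg_register(ABI_WIN64, index);
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* Bytes of shadow space the caller reserves on the stack before a call. */
int shadow_space_bytes(abi_t abi)
{
    return abi == ABI_WIN64 ? 32 : 0;
}
```

None of the first four arguments lands in the same register under both conventions, so a hand-written kernel that reads its pointers from the System V registers would pick up unrelated values when called from Windows code. That would be consistent with the access violation reported above.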

#12 Updated by Erik Lindahl about 9 years ago

Fixed in commit 0b8f869fe63dfe2bbb76d6f3e25c5823db2a074d, for both single and double.

The Intel-syntax assembly kernels now contain %ifdefs for the win64 output format and adjust the binary call sequence accordingly.
