Bug #1544

Most tests fail after successful compilation when GMX_THREAD_MPI=ON

Added by Hector Martinez-Seara Monne over 5 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
High
Assignee:
-
Category:
mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized

Description

Hi,
I have three identical new computers with Intel(R) Core(TM) i7-4770 CPUs on which GROMACS compiles successfully, but most of the tests then fail with a core dump.

Start testing: Jul 01 09:32 EEST
----------------------------------------------------------
1/6 Testing: regressiontests/simple
1/6 Test: regressiontests/simple
Command: "/usr/bin/perl" "/home/hector/gromacs/gromacs-4.6.5-complete/src/single/tests/regressiontests-4.6.5/gmxtest.pl" "simple" "-crosscompile" "-noverbose" "-nosuffix" 
Directory: /home/hector/gromacs/gromacs-4.6.5-complete/src/single/tests
"regressiontests/simple" start time: Jul 01 09:32 EEST
Output:
----------------------------------------------------------

Abnormal return value for ' mdrun    -notunepme -table ../table -tablep ../tablep >mdrun.out 2>&1' was 139
No mdrun output files.
FAILED. Check mdrun.out, md.log files in angles125
sh: line 1: 10910 Segmentation fault      (core dumped) mdrun -notunepme -table ../table -tablep ../tablep > mdrun.out 2>&1
sh: line 1: 10940 Segmentation fault      (core dumped) mdrun -notunepme -table ../table -tablep ../tablep > mdrun.out 2>&1

Abnormal return value for ' mdrun    -notunepme -table ../table -tablep ../tablep >mdrun.out 2>&1' was 139
No mdrun output files.
FAILED. Check mdrun.out, md.log files in bonds125
sh: line 1: 10958 Segmentation fault      (core dumped) mdrun -notunepme -table ../table -tablep ../tablep > mdrun.out 2>&1

...
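For readers unfamiliar with the return value 139 in the log above: under the common shell convention, a status above 128 means the process was killed by signal number (status - 128), so 139 corresponds to signal 11. The following is only a small sketch of that arithmetic; the helper name `describe_exit_status` is hypothetical and not part of gmxtest.pl or GROMACS.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status (hypothetical helper).

    Assumes the usual shell convention: a status above 128 means the
    child was terminated by signal number (status - 128).
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal {}".format(signum)
        return "killed by {}".format(name)
    return "exited with code {}".format(status)


if __name__ == "__main__":
    # 139 is the value reported for mdrun in the test log.
    print(describe_exit_status(139))
```

On Linux, signal 11 is SIGSEGV, which matches the "Segmentation fault (core dumped)" lines printed by the shell.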

This is a typical backtrace from the resulting core dumps:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `mdrun -notunepme -table ../table -tablep ../tablep'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fa36cc4beb8 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
(gdb) backtrace
#0  0x00007fa36cc4beb8 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
#1  0x00007fa36dd1b1d2 in tMPI_Thread_mutex_lock () from /home/hector/gromacs/gromacs-4.6.5-complete/src/single/src/gmxlib/libgmx.so.8
#2  0x00007fa36dd22c38 in tMPI_Comm_alloc () from /home/hector/gromacs/gromacs-4.6.5-complete/src/single/src/gmxlib/libgmx.so.8
#3  0x00007fa36dd26c39 in tMPI_Start_threads () from /home/hector/gromacs/gromacs-4.6.5-complete/src/single/src/gmxlib/libgmx.so.8
#4  0x00007fa36dd274f1 in tMPI_Init_fn () from /home/hector/gromacs/gromacs-4.6.5-complete/src/single/src/gmxlib/libgmx.so.8
#5  0x000000000040f578 in mdrunner ()
#6  0x000000000043aed4 in cmain ()
#7  0x00007fa36d4d8000 in __libc_start_main () from /usr/lib/libc.so.6
#8  0x000000000040788e in _start ()

The tests also pass when GMX_THREAD_MPI=OFF is used in cmake. Everything points to a problem in tMPI_Thread_mutex_lock() on machines with Intel(R) Core(TM) i7-4770 CPUs. Computers with an Intel(R) Core(TM) i7 CPU X 980, an Intel(R) Core(TM) i7-2600 CPU, or an Intel(R) Core(TM) i7-3770 CPU do not show the problem. All computers run fully updated Arch Linux:
- Linux 3.15.2-1-ARCH x86_64 GNU/Linux
- gcc-4.9.0
- cmake 3.0.0
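One possible way to compare the affected and unaffected machines: the top frame of the backtrace, `__lll_unlock_elision`, belongs to glibc's lock-elision code path, which relies on Intel TSX. It is an assumption here, not something confirmed in this report, that the i7-4770 machines advertise the TSX CPU flags (`hle`, `rtm`) while the others do not. The sketch below just lists those flags from `/proc/cpuinfo`; the function names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Only the first "flags" line is used; this sketch assumes
            # all cores report the same flags.
            return set(value.split())
    return set()


def tsx_flags(cpuinfo_text):
    """Return which TSX-related flags (hle, rtm) are present."""
    return cpu_flags(cpuinfo_text) & {"hle", "rtm"}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            found = tsx_flags(handle.read())
    except OSError:
        # Not on Linux, or /proc is unavailable.
        found = set()
    print("TSX flags present:", sorted(found) if found else "none")
```

Running this on each machine would show whether the presence of these flags lines up with the machines where the tests crash.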

I have seen the same behavior with GROMACS 4.6.5 and 4.6.3 on the affected systems.

By the way, I have seen the following bug, which might be related: Issue #1533

History

#1 Updated by Hector Martinez-Seara Monne over 5 years ago

I have tested the git branch "release-4.6" and the problem seems to be gone, so I'm not sure it is worth investigating further.

#2 Updated by Roland Schulz over 5 years ago

  • Status changed from New to Closed

#1533 is limited to 32bit. Did you compile for 32bit?

Either way I'll close it for now because you said it is solved in release-4-6. Please reopen if you encounter it again.

#3 Updated by Hector Martinez-Seara Monne over 5 years ago

I compiled for 64-bit. Anyway, as I said, with 4.6.6-dev-20140629-5886961 the problem is gone.
