Bug #1544
Most tests fail after successful compilation when GMX_THREAD_MPI=ON
Description
Hi,
I have three identical new computers with Intel(R) Core(TM) i7-4770 CPUs on which, after GROMACS compiles successfully, most of the tests fail with a core dump.
Start testing: Jul 01 09:32 EEST
----------------------------------------------------------
1/6 Testing: regressiontests/simple
1/6 Test: regressiontests/simple
Command: "/usr/bin/perl" "/home/hector/gromacs/gromacs-4.6.5-complete/src/single/tests/regressiontests-4.6.5/gmxtest.pl" "simple" "-crosscompile" "-noverbose" "-nosuffix"
Directory: /home/hector/gromacs/gromacs-4.6.5-complete/src/single/tests
"regressiontests/simple" start time: Jul 01 09:32 EEST
Output:
----------------------------------------------------------
Abnormal return value for ' mdrun -notunepme -table ../table -tablep ../tablep >mdrun.out 2>&1' was 139
No mdrun output files.
FAILED. Check mdrun.out, md.log files in angles125
sh: line 1: 10910 Segmentation fault (core dumped) mdrun -notunepme -table ../table -tablep ../tablep > mdrun.out 2>&1
sh: line 1: 10940 Segmentation fault (core dumped) mdrun -notunepme -table ../table -tablep ../tablep > mdrun.out 2>&1
Abnormal return value for ' mdrun -notunepme -table ../table -tablep ../tablep >mdrun.out 2>&1' was 139
No mdrun output files.
FAILED. Check mdrun.out, md.log files in bonds125
sh: line 1: 10958 Segmentation fault (core dumped) mdrun -notunepme -table ../table -tablep ../tablep > mdrun.out 2>&1
...
This is a typical backtrace from the resulting core dumps:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `mdrun -notunepme -table ../table -tablep ../tablep'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fa36cc4beb8 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
(gdb) backtrace
#0 0x00007fa36cc4beb8 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
#1 0x00007fa36dd1b1d2 in tMPI_Thread_mutex_lock () from /home/hector/gromacs/gromacs-4.6.5-complete/src/single/src/gmxlib/libgmx.so.8
#2 0x00007fa36dd22c38 in tMPI_Comm_alloc () from /home/hector/gromacs/gromacs-4.6.5-complete/src/single/src/gmxlib/libgmx.so.8
#3 0x00007fa36dd26c39 in tMPI_Start_threads () from /home/hector/gromacs/gromacs-4.6.5-complete/src/single/src/gmxlib/libgmx.so.8
#4 0x00007fa36dd274f1 in tMPI_Init_fn () from /home/hector/gromacs/gromacs-4.6.5-complete/src/single/src/gmxlib/libgmx.so.8
#5 0x000000000040f578 in mdrunner ()
#6 0x000000000043aed4 in cmain ()
#7 0x00007fa36d4d8000 in __libc_start_main () from /usr/lib/libc.so.6
#8 0x000000000040788e in _start ()
Also, the tests pass when GMX_THREAD_MPI=OFF is used in cmake. Everything points to some problem in tMPI_Thread_mutex_lock() on machines with Intel(R) Core(TM) i7-4770 CPUs. Computers with an Intel(R) Core(TM) i7 CPU X 980, Intel(R) Core(TM) i7-2600 CPU, or Intel(R) Core(TM) i7-3770 CPU do not show the problem. All computers run a fully updated Arch Linux:
- Linux 3.15.2-1-ARCH x86_64 GNU/Linux
- gcc-4.9.0
- cmake 3.0.0
I have seen the same behavior with GROMACS 4.6.5 and 4.6.3 on the affected systems.
By the way, I have seen the following bug, which might be related: Issue #1533
History
#1 Updated by Hector Martinez-Seara Monne over 6 years ago
I have tested the git branch "release-4.6" and the problem seems to be gone, so I'm not sure it is worth investigating further.
#2 Updated by Roland Schulz over 6 years ago
- Status changed from New to Closed
#1533 is limited to 32-bit. Did you compile for 32-bit?
Either way, I'll close it for now because you said it is solved in release-4-6. Please reopen if you encounter it again.
#3 Updated by Hector Martinez-Seara Monne over 6 years ago
I compiled for 64-bit. Anyway, as I said, with 4.6.6-dev-20140629-5886961 the problem is gone.