Bug #1254

Updated by Szilárd Páll over 7 years ago

Based on some strange mdrun runtime behavior, including race conditions and segv-s, it is quite likely that memory corruption is occurring in parallel mdrun runs, which might be related to affinity setting.

# The message "Pinning threads with a logical core stride of..." is often missing from the log file even when @-pinstride@ is not set on the command line. This could only happen if the memory holding the stride gets overwritten (see gmx_thread_affinity.c:133);
# Valgrind reports use of uninitialized values (see attached);
# With MPI builds, race conditions and segv-s have been observed in some cases. I've managed to repro on the AMD compute machines with a 192k atom water system as well as a protein system Anders G. is working with (see in /nethome/anders/VSDbox/kv21_vsd-GPU_testing/*crash).
-This bug seems to not be related to the affinity setting issue.-
UPDATE2: It is not related, the deadlock is reproducible even with commit:e5d22a35.