Bug #1254

Updated by Szilárd Páll over 7 years ago

Based on some strange mdrun runtime behavior, including race conditions and segfaults, it is quite likely that memory corruption is occurring in parallel mdrun runs, and it might be related to affinity setting.

Symptoms:
# The message "Pinning threads with a logical core stride of..." is often missing from the log file even if @-pinstride@ is not set on the command line - this could only happen if the memory holding the stride gets overwritten (see "gmx_thread_affinity.c:133":http://redmine.gromacs.org/projects/gromacs/repository/revisions/release-4-6/entry/src/gmxlib/gmx_thread_affinity.c#L133 );
# Valgrind reports use of uninitialized values (see attached);
# -With MPI builds, race conditions and segfaults have been observed in some cases. I've managed to reproduce them on the tcbs2x.theophys.kth.se AMD compute machines with a 192k atom water system as well as a protein system Anders G. is working with (see in /nethome/anders/VSDbox/kv21_vsd-GPU_testing/*crash).- This bug seems not to be related to the affinity setting issue.
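
The reasoning behind the first symptom can be illustrated with a small sketch. It is not the GROMACS code: the struct @hw_opt_sketch@, its fields, and both functions are hypothetical names, and the rule "log the stride message only when the stored stride is still 0" is an assumption based on the description above. Under that assumption, a stray write to the memory holding the stride is enough to make the message disappear even though @-pinstride@ was never given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the options struct; NOT the GROMACS definition. */
typedef struct {
    char label[8];    /* unrelated neighbouring field */
    int  pin_stride;  /* assumption: 0 means -pinstride was not given */
} hw_opt_sketch;

/* Assumption: the stride message is logged only when the stride is
 * auto-selected, i.e. when the stored value is still 0. */
static int would_log_stride(const hw_opt_sketch *opt)
{
    assert(opt != NULL);
    return opt->pin_stride == 0;
}

/* Simulate a stray write landing on the stride field. Writing through the
 * object's own bytes keeps this sketch well-defined C; a real corruption
 * would come from an out-of-bounds or uninitialized access elsewhere. */
static void simulate_stray_write(hw_opt_sketch *opt)
{
    assert(opt != NULL);
    memset((char *)opt + offsetof(hw_opt_sketch, pin_stride),
           0x01, sizeof opt->pin_stride);
}
```

With a zero-initialized struct, @would_log_stride@ returns 1; after @simulate_stray_write@ it returns 0, which matches the observed missing log line.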