(thread-) MPI setup hanging on bs_jetson_tk1
Running gmx mdrun leads to a hang during the thread-MPI setup phase since commit https://gerrit.gromacs.org/c/gromacs/+/11197
#2 Updated by Szilárd Páll 2 months ago
Reproduced. The issue seems to be that some tMPI ranks are stuck in the MPI_Scan() call in analyzeThreadsOnThisNode(), while the other is already in the next bcast collective.
Here's the backtrace:
$ gdb --args gmx mdrun -ntmpi 2 -ntomp 1 -nb cpu -notunepme -s topol
[...]
(gdb) bt
#0 tMPI_Event_wait (ev=0x398c8) at /nethome/pszilard/gromacs-master/src/external/thread_mpi/src/event.cpp:71
#1 0xb6cc9bdc in tMPI_Wait_for_others (cev=0x36fd4, myrank=0) at /nethome/pszilard/gromacs-master/src/external/thread_mpi/src/collective.cpp:522
#2 0xb6cc8ebc in tMPI_Bcast (buffer=0xbeffe9a0, count=8, datatype=0xb6fcbb24 <tmpi_byte>, root=0, comm=0x396b8) at /nethome/pszilard/gromacs-master/src/external/thread_mpi/src/bcast.cpp:98
#3 0xb6ae3b5a in gmx_bcast_sim (nbytes=8, b=0xbeffe9a0, cr=0x5ad90) at /nethome/pszilard/gromacs-master/src/gromacs/gmxlib/network.cpp:286
#4 0xb6bfcdb2 in gmx::Mdrunner::mdrunner (this=0xbeffede8) at /nethome/pszilard/gromacs-master/src/gromacs/mdrun/runner.cpp:1225
#5 0x00014e4a in gmx::gmx_mdrun (argc=12, argv=0xbefff608) at /nethome/pszilard/gromacs-master/src/programs/mdrun/mdrun.cpp:276
#6 0xb671bd98 in gmx::(anonymous namespace)::CMainCommandLineModule::run (this=0x2e148, argc=12, argv=0xbefff608) at /nethome/pszilard/gromacs-master/src/gromacs/commandline/cmdlinemodulemanager.cpp:133
#7 0xb671d034 in gmx::CommandLineModuleManager::run (this=0xbefff48c, argc=12, argv=0xbefff608) at /nethome/pszilard/gromacs-master/src/gromacs/commandline/cmdlinemodulemanager.cpp:589
#8 0x00012b7c in main (argc=13, argv=0xbefff604) at /nethome/pszilard/gromacs-master/src/programs/gmx.cpp:60
A simple explanation could be that the refactoring broke things and the hw_opt.thread_affinity != threadaffOFF check now evaluates differently on the two ranks (affinities get turned off because, for some reason, the mask on this slave is 0x1 by default).
Can't dig deeper ATM, but if anybody wants to debug here's a binary you can grab on dev-jetson01:
#3 Updated by Szilárd Páll 2 months ago
A few more tests:
$ gmx mdrun -ntmpi 2 -ntomp 1 -nb cpu -notunepme -s topol -pin off
$ gmx mdrun -ntmpi 2 -ntomp 1 -nb cpu -notunepme -s topol -pin on
Both work to some extent (they crash due to an unrelated issue).
The latter actually fails to set affinities, but this may be a peculiarity on the ARM board.
This seems to support my hypothesis above that the thread affinity check does not evaluate to the same value on all ranks.
#6 Updated by Szilárd Páll about 2 months ago
Mark Abraham wrote:
I didn't spot anything that suggested the code is a problem :-( I suggest we revert my change (on Monday!)
The issue was indeed the conditional on line 1194 that I pointed at. After the refactoring, that conditional communication became a path that is triggered with tMPI. Because the kernel on the Jetson TK1 seems to change the system-wide affinity mask based on load, different ranks detected different masks and therefore ended up with different values of hw_opt.thread_affinity.
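To illustrate the failure mode described above, here is a minimal sketch with no MPI involved. It models each rank's locally detected setting and shows why a collective guarded by a rank-local value can hang. All names here (Pinning, detectPinning, ranksAgree, agreeOnPinning) are hypothetical and are not GROMACS code. The rule "pinning stays on only if the inherited mask covers all hardware threads" and the count of 4 hardware threads are assumptions made for the example. The consensus step is one possible way to make such a conditional safe, not necessarily the fix that was applied.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the thread-affinity setting (not the GROMACS enum).
enum class Pinning { Off, On };

// Assumed rank-local rule: keep pinning on only if the inherited affinity
// mask covers every hardware thread, i.e. nothing external restricted it.
Pinning detectPinning(std::uint32_t affinityMask, unsigned numHwThreads)
{
    const std::uint32_t fullMask =
        (numHwThreads >= 32) ? 0xFFFFFFFFu : ((1u << numHwThreads) - 1u);
    return (affinityMask & fullMask) == fullMask ? Pinning::On : Pinning::Off;
}

// True if every rank would take the same branch of a conditional that guards
// a collective call. If this is false, some ranks enter the collective and
// wait forever for ranks that skipped it.
bool ranksAgree(const std::vector<Pinning>& perRank)
{
    return std::adjacent_find(perRank.begin(), perRank.end(),
                              std::not_equal_to<Pinning>()) == perRank.end();
}

// Consensus stand-in for an all-reduce: pinning stays on only if every rank
// detected On, so all ranks end up with the same value before branching.
Pinning agreeOnPinning(const std::vector<Pinning>& perRank)
{
    const bool allOn = std::all_of(perRank.begin(), perRank.end(),
                                   [](Pinning p) { return p == Pinning::On; });
    return allOn ? Pinning::On : Pinning::Off;
}
```

With one rank seeing a full mask (0xF) and the other seeing 0x1, the two locally detected values differ, which is the situation in which only some ranks would reach the guarded collective. After the consensus step both ranks hold the same value and take the same branch.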