COMM Removal Failure in GROMACS 2020.1
This is the same issue reported in the following thread: https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2020-March/128660.html
Summary: GROMACS 2020.1 fails to remove center-of-mass motion for a protein-membrane system. Within a short (1 ns) simulation the membrane translates several nm in the xy plane (the plane of the membrane), as measured with gmx traj and confirmed by visual inspection.
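A drift like this can be quantified from the center-of-mass trace that gmx traj writes as an .xvg file. The sketch below assumes a single-group file whose data columns are time, x, y, z (as written by something like `gmx traj -com -ox`). It skips the `#` comment and `@` directive lines of the XVG format and reports the lateral displacement between the first and last frame. It ignores periodic wrapping, and the sample values in the usage note are synthetic, not data from this report.

```python
def read_xvg(text):
    """Parse XVG text into rows of floats, skipping '#' comments and '@' directives."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        rows.append([float(value) for value in line.split()])
    return rows


def xy_drift(rows):
    """Return (elapsed time, lateral xy displacement) between the first and last frame.

    Assumes the columns are: time, x, y, z (single COM group).
    Periodic boundary wrapping is NOT handled here.
    """
    t0, x0, y0 = rows[0][:3]
    t1, x1, y1 = rows[-1][:3]
    return t1 - t0, ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5


if __name__ == "__main__":
    # Synthetic two-frame example (time in ps, coordinates in nm).
    sample = """# synthetic example, not data from this report
@    title "Center of mass"
0      1.000  1.000  5.000
1000   4.000  5.000  5.000
"""
    elapsed, drift = xy_drift(read_xvg(sample))
    print(f"xy drift of {drift:.2f} nm over {elapsed:.0f} ps")
```

With a working COMM removal, the xy drift of the membrane group should stay close to zero rather than reaching several nm over 1 ns.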
The files are attached as a compressed folder. The system is somewhat large, but because the same error did not appear in a pure membrane system (no protein), I needed to test with the full system. The error is visible after 1 ns, so long simulations are not required. The system was built with CHARMM-GUI using the CHARMM36 force field.
NOTE: I also tried the system with only two COMM groups (i.e., I merged the protein and membrane groups into a single COMM group), but this did not resolve the issue.
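For context, with `comm-mode = Linear` GROMACS removes, every `nstcomm` steps, the mass-weighted mean velocity of each group listed in `comm-grps`. The sketch below illustrates that per-group operation in plain Python; it is a simplified model for explanation, not the GROMACS implementation, and the list-based representation of masses, velocities and group labels (and the group names in the test) are hypothetical. Merging protein and membrane into one group, as tried above, means a single COM velocity is subtracted from both.

```python
def remove_com_velocity(masses, velocities, groups):
    """Linear COM motion removal: subtract each group's mass-weighted mean velocity.

    masses:     list of atom masses
    velocities: list of (vx, vy, vz) tuples, one per atom
    groups:     list of group labels, one per atom (hypothetical representation)
    Returns a new list of velocity tuples; the inputs are not modified.
    """
    # Accumulate total mass and total momentum per group.
    totals = {}
    for m, v, g in zip(masses, velocities, groups):
        mass_sum, momentum = totals.get(g, (0.0, [0.0, 0.0, 0.0]))
        totals[g] = (mass_sum + m, [momentum[i] + m * v[i] for i in range(3)])

    # COM velocity of each group = total momentum / total mass.
    vcom = {g: [p[i] / mass_sum for i in range(3)] for g, (mass_sum, p) in totals.items()}

    # Subtract the group's COM velocity from every atom in that group.
    return [tuple(v[i] - vcom[g][i] for i in range(3)) for v, g in zip(velocities, groups)]
```

After this step the total momentum of every group is zero, so no group should translate as a whole; the report above shows the membrane doing exactly that.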
Description of files:
(slurm file for compiling tpr and running simulation)
(End point from previous restrained npt equilibration)
(topology files and index file)
(tpr file and simulation output)
(xtc file every 100 ps to avoid uploading a huge file)
(com tracking for protein embedded in membrane)
- Category set to mdrun
- Status changed from New to Feedback wanted
A run on a single GPU without the GMX_GPU_DD_COMMS environment variable does not show this issue.
My guess is that this is caused by the experimental GMX_GPU_DD_COMMS feature. Could you try without this environment variable to check if that causes the issue?
Daniel, thanks very much for reporting this. To help us isolate the issue, could you also try exactly the same settings as your original run, but without GMX_FORCE_UPDATE_DEFAULT_GPU set? That will still enable the new GPU communication features, but not the new GPU update feature.
#3 Updated by Daniel Kozuch 8 months ago
Thanks for the replies. It looks like the COMM issue is resolved if I don't set GMX_FORCE_UPDATE_DEFAULT_GPU, although there is a small performance hit (roughly 10%).
If I run without the GMX_GPU_DD_COMMS feature (while using GMX_GPU_PME_PP_COMMS and GMX_FORCE_UPDATE_DEFAULT_GPU on 4 GPUs), I get an immediate segmentation fault.
Thanks Daniel - just to be sure, do I understand correctly that:
- the code works as expected with:
  - no experimental features set, or
  - the GMX_GPU_DD_COMMS and GMX_GPU_PME_PP_COMMS variables set;
- it fails to remove center-of-mass motion with:
  - the GMX_GPU_DD_COMMS, GMX_GPU_PME_PP_COMMS and GMX_FORCE_UPDATE_DEFAULT_GPU variables set;
- it crashes with a segmentation fault with:
  - the GMX_GPU_PME_PP_COMMS and GMX_FORCE_UPDATE_DEFAULT_GPU variables set?
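To rerun these cases reproducibly, it helps to build each environment from a clean slate so that none of the experimental variables leaks in from the parent shell. The sketch below encodes the configurations listed above. It assumes that setting a variable to any value (here "1") enables the feature, and the configuration names, the `env_for` helper and the `md` file prefix in the usage note are hypothetical.

```python
import os

# Experimental GROMACS 2020 GPU features discussed in this thread.
EXPERIMENTAL = ("GMX_GPU_DD_COMMS", "GMX_GPU_PME_PP_COMMS", "GMX_FORCE_UPDATE_DEFAULT_GPU")

# Hypothetical names for the configurations; comments give the behavior reported above.
CONFIGS = {
    "none": [],                                                  # works as expected
    "dd_pme_pp": ["GMX_GPU_DD_COMMS", "GMX_GPU_PME_PP_COMMS"],   # works as expected
    "dd_pme_pp_update": list(EXPERIMENTAL),                      # COM motion not removed
    "pme_pp_update": ["GMX_GPU_PME_PP_COMMS",
                      "GMX_FORCE_UPDATE_DEFAULT_GPU"],           # segmentation fault
}


def env_for(config, base=None):
    """Return a copy of `base` (default: os.environ) with exactly the chosen variables set."""
    env = dict(os.environ if base is None else base)
    for name in EXPERIMENTAL:
        env.pop(name, None)  # start clean so nothing leaks from the parent shell
    for name in CONFIGS[config]:
        env[name] = "1"  # assumption: presence of the variable enables the feature
    return env
```

A run could then be launched with, for example, `subprocess.run(["gmx", "mdrun", "-deffnm", "md"], env=env_for("dd_pme_pp"), check=True)`, keeping all other mdrun options identical between configurations.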