Bug #1853

LINCS: (erroneous?) assertion with multiple tMPI ranks and multiple OpenMP threads

Added by Thomas Ullmann over 3 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
mdrun
Target version:
Affected version - extra info:
>= v5.1, commit b23fad4be3871942cc5e4cf9a6dac311f51005bb, Change-Id: Ibbafd9c10f51d35a87e9784a0650d849c0d1c1e5
Affected version:
Difficulty:
uncategorized

Description

When running with more than one tMPI rank and more than one OpenMP thread per rank,
I get error messages of the following type with gmx mdrun versions >5.0:

gmx_d: /home/tullman/GROMACS/gromacs.git/src/gromacs/mdlib/clincs.cpp:1417: void set_lincs_matrix_task(gmx_lincsdata*, lincs_task_t*, const real*, int*): Assertion `k >= li_task->b0 && k < li_task->b1' failed.
Abort

The error messages are due to these two assertions in clincs.cpp:

1413 /* If we are using multiple tasks for LINCS,
1414 * the calls to check_assign_triangle should have
1415 * put all constraints in the triangle in our task.
1416 */
1417 assert(k >= li_task->b0 && k < li_task->b1);
1418 assert(kk >= li_task->b0 && kk < li_task->b1);

The assertions were introduced in this commit:
commit b23fad4be3871942cc5e4cf9a6dac311f51005bb
Change-Id: Ibbafd9c10f51d35a87e9784a0650d849c0d1c1e5
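
To make the asserted condition concrete, here is a minimal sketch of the range check, not the actual GROMACS code. The struct TaskRange and the functions ownsConstraint and triangleInOneTask are hypothetical names for illustration; only the fields b0 and b1 and the comparison itself are taken from the assertions quoted above. The assumption is that each LINCS task owns the half-open range of constraint indices [b0, b1).

```cpp
// Hypothetical stand-in for the per-thread LINCS task (not the GROMACS type).
// Assumption: a task owns the half-open range of constraint indices [b0, b1).
struct TaskRange
{
    int b0; // first constraint index owned by the task
    int b1; // one past the last constraint index owned by the task
};

// Same condition as in the two assertions: is constraint k inside the task's range?
bool ownsConstraint(const TaskRange& task, int k)
{
    return k >= task.b0 && k < task.b1;
}

// True only if all three constraints of a triangle fall into the same task,
// which is what the code comment above the assertions expects.
bool triangleInOneTask(const TaskRange& task, int k0, int k1, int k2)
{
    return ownsConstraint(task, k0) && ownsConstraint(task, k1) && ownsConstraint(task, k2);
}
```

For example, with a task owning constraints 0 to 3 (b0 = 0, b1 = 4), a triangle made of constraints 1, 2 and 3 passes the check, while a triangle made of constraints 2, 3 and 4 straddles the task boundary and would trip the assertion.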

Commenting out the assertions seems to let the simulation run just fine,
judging from a quick look at the trajectory and some observables from
a short (200 ps) run. This run was done with a GPU build, but the
problem also occurs when not compiling for GPUs, and on two different machines.
Only thread-MPI + OpenMP builds were tested so far.

Versions >=5.1 run without problems with the same input with a single rank
and any number of OpenMP threads, or two ranks with a single OpenMP thread.
Versions 4.x run without problems for any number of (t)MPI ranks and OpenMP
threads.

Could the conditions for the assertion be inapplicable for >=2 ranks and >=2
OpenMP threads?

channel.gro (18.8 MB), Thomas Ullmann, 11/12/2015 01:56 AM
channel.tpr (19.1 MB), Thomas Ullmann, 11/12/2015 01:56 AM
run-gmx-mdrun.csh (964 Bytes), Thomas Ullmann, 11/12/2015 01:57 AM
mdout.mdp (11.4 KB), Thomas Ullmann, 11/12/2015 02:01 AM
channel.assertions_commented_out.log (68.2 KB), Thomas Ullmann, 11/12/2015 02:44 AM

Associated revisions

Revision a6f7cdb2 (diff)
Added by Berk Hess over 3 years ago

Fix LINCS triangle constraint thread issue

With triangle constraints, an OpenMP thread barrier could be missing
under certain conditions. A debug build produced an assertion failure.
This was unlikely to happen without DD, but quite likely with DD.
This bug will have had minimal effect on the results.

Fixes #1853

Change-Id: I6ad1a75ce89ef1e753a7567868a76c2fd891f729
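
As a generic illustration of why a missing OpenMP barrier matters, and not a reproduction of the LINCS code or of this fix, the sketch below runs two phases in one parallel region. The function twoPhaseSum and its arrays are hypothetical. In the second phase each index reads a value that another thread may have written in the first phase, so all threads have to finish phase 1 before any thread starts phase 2.

```cpp
#include <vector>

// Generic two-phase update (illustrative only, not GROMACS code).
// Phase 1: every index writes its own slot of a.
// Phase 2: every index reads its own slot and its neighbour's slot,
//          which may have been written by a different thread.
std::vector<int> twoPhaseSum(int n)
{
    std::vector<int> a(n), out(n);
#pragma omp parallel
    {
#pragma omp for
        for (int i = 0; i < n; i++)
        {
            a[i] = i * i;
        }
        // The implicit barrier at the end of the "omp for" above guarantees
        // that all of a is written before phase 2 starts. Adding "nowait"
        // to that loop would drop the barrier and introduce a data race.
#pragma omp for
        for (int i = 0; i < n; i++)
        {
            out[i] = a[i] + a[(i + 1) % n];
        }
    }
    return out;
}
```

The sketch compiles with or without OpenMP enabled; without it the pragmas are ignored and the two loops simply run one after the other. With a single thread there is no other thread to wait for, which fits the observation that runs with one OpenMP thread per rank did not hit the problem.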

History

#1 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related patchset '1' for Issue #1853.
Uploader: Berk Hess
Change-Id: I6ad1a75ce89ef1e753a7567868a76c2fd891f729
Gerrit URL: https://gerrit.gromacs.org/5345

#2 Updated by Berk Hess over 3 years ago

  • Status changed from New to Resolved

#3 Updated by Erik Lindahl about 3 years ago

  • Status changed from Resolved to Closed

#4 Updated by Mark Abraham almost 3 years ago

  • Target version changed from 5.x to 5.1.3
