Project

General

Profile

Bug #3081

segmentation fault from ~LegacyMdrunOptions()

Added by Eric Irrgang 5 months ago. Updated 4 months ago.

Status:
Closed
Priority:
High
Assignee:
Category:
core library
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

Only seems to occur in simulations launched through Python gmxapi. The problem was not obvious (only occurred in MPI-enabled GROMACS) before https://gerrit.gromacs.org/c/gromacs/+/12982

Initial investigation indicates done_commrec is either called multiple times or lacks sufficient checking for a nullptr. The lifetime, ownership, and copy safeness of LegacyMdrunOptions are not well specified, and it is not clear that ownership of a commrec instance is transferred to the LegacyMdrunOptions instance.

Two possible solutions that work in initial testing are to (a) add nullptr checks to done_commrec, or (b) to ~LegacyMdrunOptions.

Suggestions on how to manage commrec lifetime would be appreciated.

Otherwise, I will do a little more testing and submit a conservative change that adds checking in both places.

Associated revisions

Revision a363c7cc (diff)
Added by Mark Abraham 5 months ago

Use more RAII semantics with t_commrec

This permitted simplifying building the Mdrunner

Fixes #3081

Change-Id: I4ac5fb2017d960d3d0cfd0103e1d7232da9f1c3a

Revision 51973164 (diff)
Added by Mark Abraham 5 months ago

Use more RAII semantics with t_commrec

This permitted simplifying building the Mdrunner

Fixes #3081

Change-Id: I4ac5fb2017d960d3d0cfd0103e1d7232da9f1c3a

Revision 2ec3cd52 (diff)
Added by Mark Abraham 4 months ago

Improve MiMiC test implementation

These tests were not using callGrompp in the intended way, and
inadvertently relying on behavior that might change when follow-up
work on #3081 changes mdrun thread-MPI initialization behavior

Change-Id: Icdbb6049a7b609ddaabc8e073f87e8bbb00da4f2

Revision e39949c7 (diff)
Added by Mark Abraham 4 months ago

Removed dependency on commrec of mdrun setup

Changes no functionality.

Setup is now parameterized directly on MPI_COMM_WORLD, which we will
want later for letting library-based callers pass in an
MPI_Communicator. This permits commrec to be initialized later, once
the threads have been spawned for the thread-MPI ranks.

The initialization of multi-simulations moves from LegacyMdrunOptions
to SimulationContext, which is more appropriate for
ensemble-parallelism established directly by the user.

Before the decision about the duty of a rank, there is no difference
between MASTER and SIMMASTER, so several calls to macros
taking a t_commrec pointer are replaced by booleans. Introduced
findIsSimulationMasterRank to compute that value. This eliminates
early use of t_commrec that has necessitated other hacks and
workarounds.

Removed redundant check for replica exchange when the number of multi
simulations is less than two, because gmx_multisim_t constructor
already prohibits that.

Resolves several TODO items and improves modularity, too.

Refs #2587, #2605, #3081

Change-Id: I48bd3b713bc181b5c1e4cbcd648706a9f00eab96

History

#1 Updated by Mark Abraham 5 months ago

https://gerrit.gromacs.org/c/gromacs/+/13047 proposes a fix for proper lifetime semantics for t_commrec in all cases, but I need Eric's guidance on correctness for the gmxapi use case

#2 Updated by Mark Abraham 5 months ago

  • Status changed from New to Fix uploaded
  • Assignee set to Mark Abraham

#3 Updated by Mark Abraham 5 months ago

  • Status changed from Fix uploaded to Resolved

#4 Updated by Mark Abraham 5 months ago

#5 Updated by Paul Bauer 4 months ago

  • Status changed from Resolved to Closed

#6 Updated by Mark Abraham 4 months ago

  • Status changed from Closed to In Progress

Eric gave some feedback at https://gerrit.gromacs.org/c/gromacs/+/13047#message-818c7e2c98945ed1994b331ef1b2059c0dd4f50c which I will try to address in a follow-up patch, but perhaps too complicated for beta phase

#7 Updated by Mark Abraham 4 months ago

  • Target version changed from 2020-beta1 to 2020-beta2

#8 Updated by Mark Abraham 4 months ago

Mark Abraham wrote:

Eric gave some feedback at https://gerrit.gromacs.org/c/gromacs/+/13047#message-818c7e2c98945ed1994b331ef1b2059c0dd4f50c which I will try to address in a follow-up patch, but perhaps too complicated for beta phase

Now at https://gerrit.gromacs.org/c/gromacs/+/11198 (plus a few clean up patches already integrated)

#9 Updated by Paul Bauer 4 months ago

is this still an issue here

#10 Updated by Mark Abraham 4 months ago

  • Status changed from In Progress to Resolved

Paul Bauer wrote:

is this still an issue here

No, the issue is long solved, and the subsequent cleanup is also in.

#11 Updated by Paul Bauer 4 months ago

  • Status changed from Resolved to Closed

Also available in: Atom PDF