Project

General

Profile

Bug #3081

segmentation fault from ~LegacyMdrunOptions()

Added by Eric Irrgang about 1 month ago. Updated about 1 hour ago.

Status:
Closed
Priority:
High
Assignee:
Category:
core library
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

Only seems to occur in simulations launched through Python gmxapi. The problem was not obvious (only occurred in MPI-enabled GROMACS) before https://gerrit.gromacs.org/c/gromacs/+/12982

Initial investigation indicates done_commrec is either called multiple times or lacks sufficient checking for a nullptr. The lifetime, ownership, and copy safeness of LegacyMdrunOptions are not well specified, and it is not clear that ownership of a commrec instance is transferred to the LegacyMdrunOptions instance.

Two possible solutions that work in initial testing are to (a) add nullptr checks to done_commrec, or (b) to ~LegacyMdrunOptions.

Suggestions on how to manage commrec lifetime would be appreciated.

Otherwise, I will do a little more testing and submit a conservative change that adds checking in both places.

Associated revisions

Revision a363c7cc (diff)
Added by Mark Abraham about 1 month ago

Use more RAII semantics with t_commrec

This permitted simplifying building the Mdrunner

Fixes #3081

Change-Id: I4ac5fb2017d960d3d0cfd0103e1d7232da9f1c3a

Revision 51973164 (diff)
Added by Mark Abraham about 1 month ago

Use more RAII semantics with t_commrec

This permitted simplifying building the Mdrunner

Fixes #3081

Change-Id: I4ac5fb2017d960d3d0cfd0103e1d7232da9f1c3a

Revision 2ec3cd52 (diff)
Added by Mark Abraham 19 days ago

Improve MiMiC test implementation

These tests were not using callGrompp in the intended way, and
inadvertently relying on behavior that might change when follow-up
work on #3081 changes mdrun thread-MPI initialization behavior

Change-Id: Icdbb6049a7b609ddaabc8e073f87e8bbb00da4f2

Revision e39949c7 (diff)
Added by Mark Abraham 15 days ago

Removed dependency on commrec of mdrun setup

Changes no functionality.

Setup is now parameterized directly on MPI_COMM_WORLD, which we will
want later for letting library-based callers pass in an
MPI_Communicator. This permits commrec to be initialized later, once
the threads have been spawned for the thread-MPI ranks.

The initialization of multi-simulations moves from LegacyMdrunOptions
to SimulationContext, which is more appropriate for
ensemble-parallelism established directly by the user.

Before the decision about the duty of a rank, there is no difference
between MASTER and SIMMASTER, so several calls to macros
taking a t_commrec pointer are replaced by booleans. Introduced
findIsSimulationMasterRank to compute that value. This eliminates
early use of t_commrec that has necessitated other hacks and
workarounds.

Removed redundant check for replica exchange when the number of multi
simulations is less than two, because gmx_multisim_t constructor
already prohibits that.

Resolves several TODO items and improves modularity, too.

Refs #2587, #2605, #3081

Change-Id: I48bd3b713bc181b5c1e4cbcd648706a9f00eab96

History

#1 Updated by Mark Abraham about 1 month ago

https://gerrit.gromacs.org/c/gromacs/+/13047 proposes a fix for proper lifetime semantics for t_commrec in all cases, but I need Eric's guidance on correctness for the gmxapi use case

#2 Updated by Mark Abraham about 1 month ago

  • Status changed from New to Fix uploaded
  • Assignee set to Mark Abraham

#3 Updated by Mark Abraham about 1 month ago

  • Status changed from Fix uploaded to Resolved

#5 Updated by Paul Bauer 29 days ago

  • Status changed from Resolved to Closed

#6 Updated by Mark Abraham 19 days ago

  • Status changed from Closed to In Progress

Eric gave some feedback at https://gerrit.gromacs.org/c/gromacs/+/13047#message-818c7e2c98945ed1994b331ef1b2059c0dd4f50c which I will try to address in a follow-up patch, but perhaps too complicated for beta phase

#7 Updated by Mark Abraham 19 days ago

  • Target version changed from 2020-beta1 to 2020-beta2

#8 Updated by Mark Abraham 18 days ago

Mark Abraham wrote:

Eric gave some feedback at https://gerrit.gromacs.org/c/gromacs/+/13047#message-818c7e2c98945ed1994b331ef1b2059c0dd4f50c which I will try to address in a follow-up patch, but perhaps too complicated for beta phase

Now at https://gerrit.gromacs.org/c/gromacs/+/11198 (plus a few clean up patches already integrated)

#9 Updated by Paul Bauer 2 days ago

is this still an issue here

#10 Updated by Mark Abraham 1 day ago

  • Status changed from In Progress to Resolved

Paul Bauer wrote:

is this still an issue here

No, the issue is long solved, and the subsequent cleanup is also in.

#11 Updated by Paul Bauer about 1 hour ago

  • Status changed from Resolved to Closed

Also available in: Atom PDF