Bug #2436

Attempt to read file pointer fplog = nullptr with AWH

Added by Viveca Lindahl over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

The mdp option 'awh1-equilibrate-histogram=yes' combined with multiple ranks per simulation leads to a segmentation fault. Running a debugger on a core file shows an attempt to write to a null fplog when equilibrateHistogram=True.

ddt.png (280 KB): ddt debugger window. Viveca Lindahl, 03/05/2018 07:54 PM
slurm-2676613.out (2.48 KB): slurm output. Viveca Lindahl, 03/05/2018 08:01 PM

Associated revisions

Revision 1356e68d (diff)
Added by Viveca Lindahl over 1 year ago

Add checks for non-null log file pointer in AWH

Mdp-option 'awh1-equilibrate-histogram=yes' and certain run setups could
lead to a segmentation fault due to fplog being null.

Fixes #2436

Change-Id: I0110325a8624c9d434af5eeeccace29814d40f2d
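
The kind of check the commit message describes can be sketched as follows. This is a minimal illustration, not the code from revision 1356e68d: the function name `logHistogramEquilibrated`, the message text, and the `time` parameter are all hypothetical. It assumes, as the report suggests, that fplog can be null on some ranks when a simulation runs on more than one rank.

```cpp
#include <cstdio>

// Hypothetical helper (not part of the GROMACS API): write a note to the
// log file only if a log file is actually open on this rank.
// Returns true if the message was written, false if fplog was null.
bool logHistogramEquilibrated(std::FILE* fplog, double time)
{
    if (fplog == nullptr)
    {
        // No log file on this rank: skip the write instead of dereferencing null.
        return false;
    }
    std::fprintf(fplog, "awh: equilibrated histogram at t = %g ps\n", time);
    return true;
}
```

Calling `logHistogramEquilibrated(nullptr, 10.0)` then returns false without touching the pointer, while passing an open stream such as `stdout` writes the message and returns true.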

History

#1 Updated by Viveca Lindahl over 1 year ago

In this case the run command was:

aprun -cc none -n 32 -N 32 $gmx mdrun -pin on -quiet -v -stepout 1000 -nstlist 40 -dlb no -npme 2 -ntomp 2 -cpi -maxh 0.25 -multidir walker-0

#3 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '2' for Issue #2436.
Uploader: Viveca Lindahl
Change-Id: gromacs~release-2018~I0110325a8624c9d434af5eeeccace29814d40f2d
Gerrit URL: https://gerrit.gromacs.org/7652

#4 Updated by Anonymous over 1 year ago

  • Status changed from New to Resolved

#5 Updated by Mark Abraham over 1 year ago

It looks like we probably tested AWH with multiple ranks per simulation, but not with -multidir? AFAIK -multidir walker-0 shouldn't run because there's only one replica.

#6 Updated by Viveca Lindahl over 1 year ago

Mark Abraham wrote:

It looks like we probably tested AWH with multiple ranks per simulation, but not with -multidir? AFAIK -multidir walker-0 shouldn't run because there's only one replica.

I'm guessing it's the 'awh1-equilibrate-histogram=yes' option that isn't tested. The bug is not related to the single-walker multidir afaiu. That is a somewhat "degenerate" case, but it should surely not be disallowed (fail), and it is convenient for automatically generating a run setup.

#7 Updated by Mark Abraham over 1 year ago

  • Status changed from Resolved to Closed

OK, I'm not sure how we should test that more effectively?
