Feature #1083

Improve collective error/warning/note handling in mdrun

Added by Erik Lindahl over 6 years ago. Updated almost 3 years ago.

Status:
In Progress
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Difficulty:
uncategorized

Description

"Reduce" messages common to multiple nodes to as few as possible. This applies both to notes (e.g. the note on using OMP_NUM_THREADS to set the number of OpenMP threads) and to error messages that occur on multiple nodes (e.g. no GPU found on M out of the N nodes in use). This would also provide an easy and elegant way to check hardware consistency across all nodes (e.g. to detect an inhomogeneous hardware setup).
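
To make the request concrete, here is a minimal sketch of the kind of reduction described above. It is not GROMACS code: the function name `summarizeMessages` and the input layout (one list of message strings per rank, already gathered on the master) are assumptions made for illustration, and the actual gathering step (e.g. over MPI) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of GROMACS. perRankMessages[r] holds the
// notes/warnings emitted by rank r; the sketch assumes they have already
// been collected on one rank. Returns one summary line per distinct text.
std::vector<std::string>
summarizeMessages(const std::vector<std::vector<std::string>>& perRankMessages)
{
    assert(!perRankMessages.empty());
    const std::size_t numRanks = perRankMessages.size();

    // Count on how many ranks each distinct message occurs. std::map keeps
    // the summary order deterministic (sorted by message text).
    std::map<std::string, std::size_t> rankCount;
    for (const auto& messages : perRankMessages)
    {
        // De-duplicate within a rank so a repeated message counts once.
        const std::set<std::string> unique(messages.begin(), messages.end());
        for (const auto& text : unique)
        {
            ++rankCount[text];
        }
    }

    std::vector<std::string> summary;
    for (const auto& entry : rankCount)
    {
        if (entry.second == numRanks)
        {
            summary.push_back(entry.first + " (all " + std::to_string(numRanks) + " ranks)");
        }
        else
        {
            summary.push_back(entry.first + " (" + std::to_string(entry.second) + " of "
                              + std::to_string(numRanks) + " ranks)");
        }
    }
    return summary;
}
```

With this shape, a note that every rank emits is printed once, while a message seen on only some ranks carries the "M of N" count, which is what exposes an inhomogeneous hardware setup.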

Associated revisions

Revision 69e15959 (diff)
Added by Mark Abraham over 6 years ago

Quiet stderr output, particularly for multi-simulations

  • removed printing of DD info to stderr
  • printed multi-simulation information only once in places where
    repetition is clearly redundant (Some repetition remains,
    but only from each simulation master.)
  • add option to not print result of multi-simulation check
    if it passed, so that we don't have to print things to
    stderr/stdout just because the .log file is not yet open
  • printing of diagnostics about the number of MPI
    processes present when mdrun starts only goes (once) to
    each debug file, and not to stderr
  • reduced printing of diagnostics about the number of OpenMP
    threads; now goes to stderr only on SIMMASTER, or
    once to each debug file
  • clarified errors and informational messages about selecting
    the number of OpenMP threads

Fixes #1078, refs #1083

Change-Id: If782259bcd62ddd9be325393930080b70c5cfb4e

Revision fa136061 (diff)
Added by Teemu Murtola about 3 years ago

Make thread affinity failures always end up in log

Remove calls to md_print_warn(NULL, fplog, ...), which were used for
cases where only some of the ranks could fail. But if non-master rank
failed, the error went only to stderr, not into the log file. Make this
work more uniformly such that the error always ends up in the log file.
The approach could possibly be generalized (it is now local to
threadaffinity.cpp, and only works for warnings where the text is the
same on each rank), but that is probably easier after the logging is
using C++.

Add some trailing newlines for consistent output from md_print_warn().

This also makes all md_print_warn/info calls use the same pattern, which
makes things easier to understand, and allows replacing them with a
simple object.

Related to #1083.

Change-Id: I03a3524ed883bed0c5b039836e9d1741c672d97d
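
The case this commit handles, a warning whose text is identical on every rank, needs only a per-rank flag to be combined. The sketch below illustrates that idea and is not the code in threadaffinity.cpp: `formatCollectiveWarning` is a hypothetical name, and the vector of flags stands in for what would, in an MPI run, be a reduction of one integer per rank (for example with `MPI_Reduce` and `MPI_SUM`).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of GROMACS. failedOnRank[r] is true if
// rank r hit the condition. Returns the single line the master rank would
// write to both the log file and stderr, or an empty string if no rank failed.
std::string formatCollectiveWarning(const std::vector<bool>& failedOnRank,
                                    const std::string&       text)
{
    assert(!failedOnRank.empty());

    // Stand-in for a sum-reduction of a 0/1 flag over all ranks.
    const auto numFailed = std::count(failedOnRank.begin(), failedOnRank.end(), true);
    if (numFailed == 0)
    {
        return std::string();
    }
    return "WARNING: " + text + " (on " + std::to_string(numFailed) + " of "
           + std::to_string(failedOnRank.size()) + " ranks)\n";
}
```

Because only the master formats and writes the line, the warning reaches the log file even when the failure occurred on non-master ranks alone.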

History

#1 Updated by Mark Abraham over 6 years ago

Agree. We want this for 4.6.

#2 Updated by Mark Abraham over 6 years ago

  • Status changed from New to In Progress

Some incidental progress in https://gerrit.gromacs.org/2028

#3 Updated by Erik Lindahl over 6 years ago

  • Target version changed from 4.6 to 4.6.1

#4 Updated by Mark Abraham about 6 years ago

  • Target version deleted (4.6.1)

#5 Updated by Rossen Apostolov over 5 years ago

can we close this one?

#6 Updated by Mark Abraham over 5 years ago

  • Target version set to 5.x

Rossen Apostolov wrote:

can we close this one?

Not yet

#7 Updated by Gerrit Code Review Bot about 3 years ago

Gerrit received a related patchset '2' for Issue #1083.
Uploader: Teemu Murtola
Change-Id: I03a3524ed883bed0c5b039836e9d1741c672d97d
Gerrit URL: https://gerrit.gromacs.org/5457

#8 Updated by Mark Abraham almost 3 years ago

  • Target version deleted (5.x)
