Task #2344

Agree on standards for different types of output and log files

Added by Szilárd Páll 4 months ago. Updated 19 days ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: mdrun
Target version:
Difficulty: uncategorized

Description

If the user explicitly requests verbosity, it is not reasonable to omit the detection output. Without it, the task assignment output makes very little sense. It is also diagnostic output that has previously been printed and, unless I missed it, there has been no consensus on removing it, nor a convincing amount of user feedback that it is not wanted (i.e. nobody complained).


Related issues

Related to GROMACS - Task #1505: improve handling of logging (New)
Related to GROMACS - Feature #1209: Reduce tools verbosity (Closed, 2013-03-25)
Related to GROMACS - Bug #1889: mdrun -cpi file presence dilemma (Rejected)

History

#1 Updated by Szilárd Páll 4 months ago

  • Description updated (diff)

#2 Updated by Erik Lindahl 4 months ago

Verbose does not mean that a copy of the logfile should be echoed to stderr.

The information is present in the logfile.

#3 Updated by Erik Lindahl 4 months ago

NB: Several of US have complained that there is FAR too much irrelevant diagnostic information printed to stderr.

The usual reason for the verbose flag is that the user wants to see an interactive update on how the simulation is proceeding (i.e., the step counter), not that all diagnostic information should be on stderr. In particular, we most definitely do not have any rule that we print everything until users complain.

#4 Updated by Szilárd Páll 4 months ago

  • Affected version changed from 2016.1 to 2018-beta1

#5 Updated by Szilárd Páll 4 months ago

  • Description updated (diff)

#6 Updated by Mark Abraham 4 months ago

In the short term, I don't think hardware or parallelism report or summary should ever go to a terminal output file, because that's content that isn't useful without all the other things that are present in a log file.

In the medium term (e.g. to discuss with gmx-developers and then users after release), I think it makes sense to do something like

  • to stderr: nothing except information about an actionable error
  • to stdout: 1-2 lines (first line: gmx mdrun, detailed version number, maybe 20 chars of running on x nodes, y ranks and z gpus; second line: Simulation completed/halted because of -maxh/SIGTERM/etc. after x elapsed time, rate y ns/day); if stdout is a TTY then output the progress meter before replacing it with the 2nd line (any alternatives?)
  • to log file: roughly as now, full hardware and version report, warnings, notes, errors, nstlog output, note upon steps when checkpointing, note about why we are stopping when we stop, performance summary with pointer to the perf log, truncation upon appending as now
  • to perf log file: full hardware and version report, lots of breakdown about what is going on and why, no pressure to limit the number of lines, no cryptic '!' symbols to suggest maybe there's an issue (do we even document anything about that with DLB?), full performance report, no need to truncate upon appending
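The split proposed in the list above can be sketched as a small routing table. This is only an illustrative Python sketch, not GROMACS code: the names `Kind`, `DESTINATIONS`, `destination` and `show_progress_meter` are hypothetical, and the message kinds are a simplification of the items in the list.

```python
import enum
import sys


class Kind(enum.Enum):
    ACTIONABLE_ERROR = "actionable error"
    RUN_SUMMARY = "run summary"            # the 1-2 line start/finish summary
    HARDWARE_REPORT = "hardware report"    # full hardware and version report
    PERFORMANCE_DETAIL = "performance detail"


# Hypothetical mapping of message kinds to the streams suggested above.
DESTINATIONS = {
    Kind.ACTIONABLE_ERROR: ("stderr",),
    Kind.RUN_SUMMARY: ("stdout",),
    Kind.HARDWARE_REPORT: ("log", "perflog"),
    Kind.PERFORMANCE_DETAIL: ("perflog",),
}


def destination(kind):
    """Return the tuple of streams a message of this kind should go to."""
    return DESTINATIONS[kind]


def show_progress_meter(stream=sys.stdout):
    """Draw the progress meter only when the stream is an interactive terminal."""
    return stream.isatty()
```

With a table like this, nothing reaches the terminal unless its kind is explicitly mapped to stdout or stderr, which would keep the hardware report in the log files by construction.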

gmx mdrun -h etc. writes to stdout because the user asked for the help. Usage descriptions that help the user understand why gmx mdrun -defnm doesn't work go to stderr.

gmx mdrun -quiet goes away because that's now the default-and-only option (but we don't give an error if some old script still uses it)
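One way to accept an obsolete flag without complaining is to keep parsing it and then ignore the value. The sketch below uses Python's `argparse` purely for illustration; it is not how the gmx command line is implemented, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="mdrun-sketch")
    parser.add_argument("-v", action="store_true", help="verbose progress output")
    # Still accepted so that old scripts keep working, but hidden from
    # the help text and never consulted afterwards.
    parser.add_argument("-quiet", action="store_true", help=argparse.SUPPRESS)
    return parser


# An old script passing -quiet parses without any error.
args = build_parser().parse_args(["-quiet"])
```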

I don't have a vision for what would be useful for users in gmx mdrun -v, but I think it should be something that they couldn't get by reading the normal log file online/later.

I could buy gmx mdrun -v having additional behaviour if GMX_DEVELOPER_BUILD was on, but I think there is value in us dog-fooding our code the same way we expect users to use it (e.g. read the log file(s)).

Useful background thoughts: https://www.jstorimer.com/blogs/workingwithcode/7766119-when-to-use-stderr-instead-of-stdout

#7 Updated by Mark Abraham 4 months ago

  • Related to Task #1505: improve handling of logging added

#8 Updated by Mark Abraham 4 months ago

#9 Updated by Berk Hess 4 months ago

I agree with Szilard that we should not be printing task assignments etc. to stderr when we have not printed how many cores we have and what GPUs the GPU IDs refer to. We should either print a complete, intelligible picture or print nothing.

On a more general note, I don't think a beta phase is the right time to remove lots of output from stderr and log. We should discuss our strategy here more broadly, and especially with a wider range of user types, before making such user-facing decisions.

#10 Updated by Erik Lindahl 4 months ago

  • Tracker changed from Bug to Task
  • Subject changed from mdrun verbose output is missing hardware summary to Agree on standards for different types of output and log files
  • Target version set to 2019
  • Affected version deleted (2018-beta1)

I don't think anybody is arguing that we need to remove more stuff. The original changes Szilard refers to were uploaded at the beginning of November and went in well before beta1.

I agree that we should keep the rest of the discussion for 2019, and have changed topic accordingly.

Maybe a better way to think about it (to avoid flooding stdout) is to decide that each module/functionality (e.g. hardware assignment or domain decomposition) can use one line of output to stdout for notes, and another one for warnings - and then it is up to each module what information to prioritise.
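A minimal sketch of that per-module budget, in Python for illustration only: each module may hold one note and one warning for the terminal, and a numeric priority chosen by the module decides which line survives. The class name `TerminalBudget`, its methods and the priority scheme are all assumptions, not existing GROMACS interfaces.

```python
class TerminalBudget:
    """Keep at most one note and one warning per module for terminal output."""

    def __init__(self):
        # Maps (module, level) to the (priority, text) pair currently held.
        self._lines = {}

    def offer(self, module, level, text, priority=0):
        if level not in ("note", "warning"):
            raise ValueError(f"unknown level: {level}")
        key = (module, level)
        current = self._lines.get(key)
        # The module decides what matters by assigning priorities;
        # a strictly higher priority replaces the line already held.
        if current is None or priority > current[0]:
            self._lines[key] = (priority, text)

    def lines(self):
        """Return the surviving lines, ordered by module and then level."""
        return [f"{module} {level}: {text}"
                for (module, level), (_, text) in sorted(self._lines.items())]
```

Everything a module offers beyond its one note and one warning would still go to the log file; the budget only limits what reaches the terminal.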

#11 Updated by Mark Abraham 3 months ago

I will shortly propose some patches that stop checkpoint restarts writing to stdout/stderr when things are proceeding normally. That should go to the log file. The mental model of "mdrun is doing what I asked for unless I hear from it" is useful. So "gmx mdrun -cpi" will either give a fatal error if the checkpoint file isn't found, or proceed with having read it and started from it. That means not having fancy guesses about what might be an acceptable fallback, because that's the kind of thing we are tempted to write to a terminal.
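The intended -cpi behaviour can be sketched as follows. This is an illustrative Python sketch rather than the actual mdrun code, and `resolve_start` and `CheckpointNotFound` are hypothetical names: an explicitly requested checkpoint is either found and used, or the run stops with a fatal error, with no silent fallback.

```python
import os


class CheckpointNotFound(Exception):
    """Raised when an explicitly requested checkpoint file is missing."""


def resolve_start(cpi_path=None):
    """Decide how the run starts; never guess at a fallback."""
    if cpi_path is None:
        # No checkpoint was requested, so a fresh start is what the user asked for.
        return "fresh start"
    if not os.path.isfile(cpi_path):
        # The user asked for a restart; continuing anyway would be a guess.
        raise CheckpointNotFound(f"checkpoint file not found: {cpi_path}")
    return f"restart from {cpi_path}"
```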

#12 Updated by Mark Abraham 2 months ago

As discussed at https://gerrit.gromacs.org/c/7595, it would be useful to be able to query the gmx_device_info_t.stat field (or some error string generated during GPU detection) to get appropriate context for hardware printing and/or useful error messages.

#13 Updated by Mark Abraham about 2 months ago

Mark Abraham wrote:

I will shortly propose some patches that stop checkpoint restarts writing to stdout/stderr when things are proceeding normally. That should go to the log file. The mental model of "mdrun is doing what I asked for unless I hear from it" is useful. So "gmx mdrun -cpi" will either give a fatal error if the checkpoint file isn't found, or proceed with having read it and started from it. That means not having fancy guesses about what might be an acceptable fallback, because that's the kind of thing we are tempted to write to a terminal.

On gmx-users on March 1, 2018 someone ran into a crash that looks like their explicit mdrun -cpi didn't find the checkpoint, and grompp built a tpr with zero velocities, so mdrun with P-R coupling exploded. That's not a good advertisement for the behaviour where mdrun -cpi doesn't find a checkpoint file and decides to go ahead anyway.

#14 Updated by Mark Abraham about 2 months ago

  • Related to Bug #1889: mdrun -cpi file presence dilemma added

#15 Updated by Mark Abraham 19 days ago

  • Status changed from New to Resolved

Standards have been agreed - feedback on the writeup at #1505 is welcome
