Bug #2869

GPU detection error only issued as a note to the log

Added by Szilárd Páll 8 months ago. Updated 8 months ago.

Status: New
Priority: Normal
Assignee: -
Category: mdrun
Target version: -
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Given that GPU acceleration can provide a performance improvement of more than 5x, it seems strange to silently fall back to a CPU run without even telling the user that the behavior they likely expect (what the automatic default would have been had the detection not failed) was overridden by an error; e.g. a CUDA 9.2 build with a CUDA 9.1 compatible driver:

gmx mdrun -quiet -v

Back Off! I just backed up md.log to ./#md.log.15#
Compiled SIMD: AVX_256, but for this host/run AVX_128_FMA might be better (see
log).
Reading file topol.tpr, VERSION 4.6.2-dev-20130306-873b985-dirty (single precision)
Note: file tpx version 83, software tpx version 116
Changing nstlist from 10 to 40, rlist from 0.9 to 1.005

Using 1 MPI thread
Using 8 OpenMP threads 

Back Off! I just backed up ener.edr to ./#ener.edr.15#
starting mdrun 'Water'
-1 steps, infinite ps.
step 0^C

Received the INT signal, stopping within 40 steps

There is no indication whatsoever that something is wrong; most users won't even notice, considering that the detection output was also removed.
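
To illustrate the behavior being asked for, here is a minimal sketch in Python. It is not GROMACS code: `GpuDetection`, `select_run_mode`, and the message texts are all hypothetical names made up for this example. The only point it makes is that a failed detection could be written to the console as well as the log, while the "no GPU found" case can remain a log-only note.

```python
import io
import sys
from dataclasses import dataclass
from typing import Optional, TextIO


@dataclass
class GpuDetection:
    """Hypothetical detection result; not an actual GROMACS type."""
    num_compatible: int = 0
    error: Optional[str] = None  # set only when detection itself failed


def select_run_mode(result: GpuDetection, log: TextIO, console: TextIO) -> str:
    """Return "gpu" or "cpu", reporting a failed detection on the console too."""
    if result.error is not None:
        # Detection failed: the automatic default was overridden by an error,
        # so tell the user directly instead of only noting it in the log.
        msg = ("WARNING: GPU detection failed (" + result.error + "); "
               "falling back to a CPU-only run.")
        log.write(msg + "\n")
        console.write(msg + "\n")
        return "cpu"
    if result.num_compatible > 0:
        log.write(f"Using {result.num_compatible} compatible GPU(s).\n")
        return "gpu"
    # Detection worked but found nothing usable: a log note is enough here.
    log.write("Note: no compatible GPUs detected.\n")
    return "cpu"


if __name__ == "__main__":
    # Illustrative input only; the error text is an assumption.
    log = io.StringIO()
    failed = GpuDetection(error="driver too old for the CUDA runtime")
    print(select_run_mode(failed, log, sys.stderr))
```

Whether such a warning should also be shown under "-quiet" is a separate design question that this sketch does not try to answer.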

History

#1 Updated by Erik Lindahl 8 months ago

Hi,

What is the output without the "-quiet" flag?

I would definitely not call it a bug that we don't write non-critical information when the user is explicitly telling us to be quiet - the bug then is rather the 10 non-critical lines we still write.

#2 Updated by Szilárd Páll 8 months ago

Erik Lindahl wrote:

> Hi,
>
> What is the output without the "-quiet" flag?

The same. "quiet" does nothing but remove the header.

> I would definitely not call it a bug that we don't write non-critical information when the user is explicitly telling us to be quiet - the bug then is rather the 10 non-critical lines we still write.

"quiet" doesn't do much, and for the purposes of this issue it is irrelevant. That the verbose output should stay quiet and not contain a report on a runtime condition that effectively leads to a >=5x performance loss sounds pretty strange.

However, I just wanted to note the issue (based on confused user reports, see gmx-users); I'll let others figure out what is a useful/relevant UI/UX element. I prefer not to start a discussion on whether -v should only do rolling counters, as originally implemented, and all other verbose output is the bug.