Bug #1476

Odd behavior with verlet-buffer-drift

Added by Roland Schulz over 5 years ago. Updated over 3 years ago.

Status: New
Priority: Low
Assignee: -
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

After change a586b4168d35, the nbnxn_vsite test failed with nstlist 25 (the default for GPU runs). This was fixed for 4.6 by regressiontest change 92d615e56929. Two things are odd about this when running the Gromacs version with the change but without the fix in the tests:
  • md.log reports "The maximum allowed number of cells is: X 3 Y 3 Z 2", but then reports "There is no domain decomposition for 2 nodes". So something is wrong with the log output.
  • It seems odd that we automatically change nstlist to a value which then causes an error. It would make more sense to increase nstlist only as far as still permits a valid domain decomposition for the requested number of nodes.
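The second point can be sketched as follows. This is a minimal illustration, not GROMACS code: pick_nstlist, toy_dd_is_valid and all numbers are hypothetical, and the real validity check depends on the box shape, cut-offs and rank count, so it is passed in as a predicate rather than modelled here.

```python
from typing import Callable, Iterable, Optional


def pick_nstlist(
    candidates: Iterable[int],
    n_ranks: int,
    dd_is_valid: Callable[[int, int], bool],
) -> Optional[int]:
    """Return the largest candidate nstlist for which dd_is_valid(nstlist,
    n_ranks) holds, or None if no candidate allows a decomposition."""
    # Try the largest values first, so the first hit is the best one.
    for nstlist in sorted(candidates, reverse=True):
        if dd_is_valid(nstlist, n_ranks):
            return nstlist
    return None


def toy_dd_is_valid(nstlist: int, n_ranks: int) -> bool:
    """Hypothetical 1-D stand-in for the real check; all numbers are invented."""
    box_length = 6.0                     # nm, made up
    min_cell_size = 1.0 + 0.1 * nstlist  # nm, made-up growth with nstlist
    # Valid only if each of the n_ranks cells is at least min_cell_size wide.
    return box_length / n_ranks >= min_cell_size


# With 2 ranks the toy model rejects 40 and 25 and falls back to 10.
print(pick_nstlist([10, 25, 40], 2, toy_dd_is_valid))
```

In mdrun the difficulty noted in the comments below remains: the nstlist override happens before the DD is initialized, so a predicate like this is not readily available at that point.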

History

#1 Updated by Roland Schulz over 5 years ago

  • Description updated (diff)

#2 Updated by Szilárd Páll over 5 years ago

Roland Schulz wrote:

  • It seems odd that we automatically change nstlist to a value which then causes an error. It would make more sense to increase nstlist only as far as still permits a valid domain decomposition for the requested number of nodes.

Indeed, this can happen; IIRC I've seen it too.

It should be possible to check the effect of nstlist on DD and avoid failures (unlike avoiding slowdowns, which also happen), although it may require some complex logic, as the nstlist override happens before the DD initialization.

#3 Updated by Szilárd Páll over 5 years ago

  • Category set to mdrun

#4 Updated by Berk Hess over 5 years ago

This message is not contradictory. It prints the maximum number of cells. In triclinic boxes with relatively large cut-offs, 2 domains in x or y might not work, whereas 1 or 3 will. This is a bit counterintuitive, but unfortunately this is the case.
It would be nice to prevent the nstlist increase in such cases, but since that is done before the DD is initialized, this is hard. A better practical solution for this issue would be to allow one domain to utilize multiple GPUs. We are working on this.

#5 Updated by Berk Hess over 5 years ago

  • Priority changed from Normal to Low
  • Target version set to future

#6 Updated by Erik Lindahl over 3 years ago

I would agree with Roland that the message itself is highly inconsistent, even though the code might not be. If we haven't done any work on this, could we at least update the error message to be more explicit and convey that even 2 nodes might not work in some cases?
