Bug #1476
Odd behavior with verlet-buffer-drift
Description
- The output in md.log is "The maximum allowed number of cells is: X 3 Y 3 Z 2", but mdrun then fails with "There is no domain decomposition for 2 nodes". So something seems wrong with the log output.
- It seems odd that we automatically change nstlist to a value which then causes an error. It would make more sense to increase nstlist at most as far as still allows a valid domain decomposition for the requested number of nodes (sketched below).
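
A minimal sketch of the proposed capping, in C++ (hypothetical; this is not the GROMACS implementation, and the buffer model and feasibility check are placeholder assumptions): roll the automatic nstlist increase back toward the user's value until the buffered cut-off again admits a decomposition.

    #include <cstdio>

    // Placeholder model: the list cut-off grows with nstlist, since particles
    // drift further between neighbor-list updates and need a larger buffer.
    static double bufferedCutoff(double rcBase, int nstlist)
    {
        return rcBase + 0.01 * nstlist; // nm; purely illustrative numbers
    }

    // Placeholder check: whether any DD grid over nnodes fits this cut-off;
    // the real check also depends on box shape, PME ranks, etc.
    static bool ddFeasible(int nnodes, double boxLen, double rc)
    {
        return nnodes == 1 || rc <= boxLen / 2; // crude stand-in
    }

    // Back the automatic increase off toward the user's value until DD works.
    static int capNstlist(int nstlistWanted, int nstlistUser,
                          int nnodes, double boxLen, double rcBase)
    {
        int nstlist = nstlistWanted;
        while (nstlist > nstlistUser
               && !ddFeasible(nnodes, boxLen, bufferedCutoff(rcBase, nstlist)))
        {
            nstlist -= 5;
        }
        return nstlist;
    }

    int main()
    {
        // With these illustrative numbers, the increase to 40 is rolled back
        // to 20, the largest value that still decomposes over 2 nodes.
        std::printf("chosen nstlist: %d\n", capNstlist(40, 10, 2, 6.0, 2.8));
        return 0;
    }

The point is only the control flow: the increase is rolled back before it can produce a fatal DD error, and the result is never worse than what the user asked for.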
History
#1 Updated by Roland Schulz almost 7 years ago
- Description updated (diff)
#2 Updated by Szilárd Páll almost 7 years ago
Roland Schulz wrote:
- It seems odd that we automatically change nstlist to a value which then causes an error. It would make more sense to increase nstlist at most as far as still allows a valid domain decomposition for the requested number of nodes.
Indeed, this can happen; IIRC I've seen it too.
It should be possible to account for the effect of nstlist on DD and avoid failures (unlike slowdowns, which also happen and are harder to avoid), although it may require some complex logic, as the nstlist override happens before the DD initialization.
#3 Updated by Szilárd Páll almost 7 years ago
- Category set to mdrun
#4 Updated by Berk Hess over 6 years ago
This message is not contradictory: it prints the maximum number of cells per dimension. In triclinic boxes with relatively large cut-offs, 2 domains along x or y might not work, whereas 1 or 3 will. This is a bit counterintuitive, but unfortunately that is the case.
It would be nice to prevent the nstlist increase in such cases, but since that is done before the DD is initialized, this is hard. A better practical solution for this issue would be to allow one domain to utilize multiple GPUs. We are working on this.
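
Berk's point about 2 domains failing where 1 or 3 work can be made concrete with a toy calculation. Assuming (as I read the DD scheme; treat this as an assumption) that a domain can communicate over at most ncells - 1 neighbor cells per dimension, the reachable halo distance is (ncells - 1)/ncells of the box length, so a cut-off between L/2 and 2L/3 fits with 3 cells but not with 2:

    #include <cstdio>

    // True if 'ncells' domains along a box edge of length 'boxLen' can cover
    // a cut-off 'rc' under the at-most-(ncells-1)-pulses assumption above.
    static bool ddDimFeasible(int ncells, double boxLen, double rc)
    {
        if (ncells == 1)
        {
            return true; // no decomposition along this dimension; PBC covers rc
        }
        const double maxHalo = boxLen * (ncells - 1) / ncells; // pulses * cell size
        return rc <= maxHalo;
    }

    int main()
    {
        const double boxLen = 6.0; // nm, illustrative
        const double rc     = 3.5; // nm; boxLen/2 < rc <= 2*boxLen/3
        for (int n = 1; n <= 3; ++n)
        {
            std::printf("%d cell(s): %s\n", n,
                        ddDimFeasible(n, boxLen, rc) ? "ok" : "fails");
        }
        return 0;
    }

With these numbers this prints ok / fails / ok for 1 / 2 / 3 cells, matching the counterintuitive behavior described above.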
#5 Updated by Berk Hess over 6 years ago
- Priority changed from Normal to Low
- Target version set to future
#6 Updated by Erik Lindahl almost 5 years ago
I would agree with Roland that the message itself is highly inconsistent, even though the code might not be. If we haven't done any work on this, could we at least update the error message to be more explicit and convey that even 2 nodes might not work in some cases?
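
A hedged sketch of wording along those lines (illustrative values, plain fprintf; the real code would go through GROMACS' own error-reporting path, and the variable names are invented here):

    #include <cstdio>

    int main()
    {
        // Illustrative values standing in for what mdrun would have computed.
        const int    nnodes       = 2;
        const double cellsizeMin  = 3.0; // nm
        const int    ncellsMax[3] = { 3, 3, 2 };

        std::fprintf(stderr,
                     "There is no domain decomposition for %d ranks that satisfies "
                     "the minimum cell size of %g nm.\n"
                     "Note: the maximum allowed number of cells (X %d Y %d Z %d) is "
                     "an upper bound only; with triclinic boxes and large cut-offs "
                     "some smaller counts can also be invalid, e.g. 2 cells along a "
                     "dimension may fail where 1 or 3 work.\n",
                     nnodes, cellsizeMin,
                     ncellsMax[0], ncellsMax[1], ncellsMax[2]);
        return 0;
    }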