Bug #2315

Separate PME ranks are not assigned since e87a53

Added by Erik Lindahl over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Since the new task assignment code was committed, GROMACS no longer assigns separate PME ranks when running on 80 hardware threads (dev-purley02).

Instead of using 24 PME ranks, the domain decomposition spans all 80 ranks, and no message about assigning PME ranks is printed.


Related issues

Related to GROMACS - Bug #2321: mdrun exits with buffer registering error on non-GPU host (Closed)

Associated revisions

Revision a0d89bf8 (diff)
Added by Mark Abraham over 1 year ago

Fix mdrun -nb auto -pme auto when GPUs are absent

The logic was flawed such that GPUs were "selected" for use even
though none had been detected. That led to the GPU behaviour of
avoiding using separate PME ranks.

Also made a minor fix to the logic for emulation. The new
interpretation of mdrun -gpu_id does not need to trigger an error when
GPU IDs have been supplied along with the emulation environment
variable.

Fixes #2315

Change-Id: I68da27c9bfef9f73b9dae4f04f196066d2efb1e2

History

#1 Updated by Berk Hess over 1 year ago

Is that without GPUs or with? With GPUs you never get PME ranks by default.

#2 Updated by Erik Lindahl over 1 year ago

Without - the host in question is dev-purley02, with 2*20 cores (80 threads).

#3 Updated by Berk Hess over 1 year ago

  • Status changed from New to Feedback wanted

On my machine I do get PME ranks automatically when running with -ntmpi 80.
Could you post a log file?

#4 Updated by Mark Abraham over 1 year ago

I thought I'd reproduced this on one of the purley nodes, but maybe that is the jetlag talking. Will investigate further.

#5 Updated by Berk Hess over 1 year ago

I just found out what caused the difference. I had to run with -nb cpu on my machine to avoid GPUs and allow automatic PME rank assignment. On purley, too, you get PME ranks when using -nb cpu, but not without it.

#6 Updated by Berk Hess over 1 year ago

  • Status changed from Feedback wanted to Accepted

#7 Updated by Mark Abraham over 1 year ago

  • Category set to mdrun
  • Assignee set to Mark Abraham
  • Target version set to 2018-beta2

#8 Updated by Mark Abraham over 1 year ago

  • Status changed from Accepted to In Progress

Auto mode needs to respond to whether GPUs have been detected, and currently does not. I thought that was too hard to do (because compatibleGpus is local to the node, and the decision needs to be consistent across all ranks on all nodes), but hwinfo->ngpu_compatible_tot is useful for the purpose.
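
A minimal sketch of the decision described above, not the actual mdrun code. Only ngpu_compatible_tot is a name taken from this discussion; TaskTarget, HardwareInfo, useGpuForNonbonded and mayAutoAssignSeparatePmeRanks are hypothetical names introduced for illustration. It assumes the total count of compatible GPUs is identical on every rank, so each rank resolves "auto" to the same answer.

```cpp
// Hypothetical stand-in for the user's -nb choice (auto, cpu or gpu).
enum class TaskTarget
{
    Auto,
    Cpu,
    Gpu
};

// Hypothetical stand-in for the hardware info structure. Only the field
// name ngpu_compatible_tot comes from the discussion; it is assumed to
// hold the same value on all ranks.
struct HardwareInfo
{
    int ngpu_compatible_tot;
};

// Resolve the nonbonded target. In auto mode, GPUs are selected only if
// at least one compatible GPU was actually detected.
bool useGpuForNonbonded(TaskTarget nbTarget, const HardwareInfo& hwinfo)
{
    switch (nbTarget)
    {
        case TaskTarget::Cpu: return false;
        case TaskTarget::Gpu: return true;
        default: return hwinfo.ngpu_compatible_tot > 0;
    }
}

// Separate PME ranks are not chosen by default when GPUs are in use, so
// automatic assignment is only considered on the CPU-only path.
bool mayAutoAssignSeparatePmeRanks(TaskTarget nbTarget, const HardwareInfo& hwinfo)
{
    return !useGpuForNonbonded(nbTarget, hwinfo);
}
```

With this logic, auto mode on a host with no detected GPUs takes the same path as -nb cpu, which matches the behaviour reported in comment #5.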

#9 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '1' for Issue #2315.
Uploader: Mark Abraham
Change-Id: gromacs~release-2018~I68da27c9bfef9f73b9dae4f04f196066d2efb1e2
Gerrit URL: https://gerrit.gromacs.org/7288

#10 Updated by Mark Abraham over 1 year ago

  • Status changed from In Progress to Fix uploaded

#11 Updated by Mark Abraham over 1 year ago

  • Status changed from Fix uploaded to Resolved

#12 Updated by Mark Abraham over 1 year ago

  • Status changed from Resolved to Closed

#13 Updated by Mark Abraham over 1 year ago

  • Related to Bug #2321: mdrun exits with buffer registering error on non-GPU host added
