Bug #3163

Feature #2816: GPU offload / optimization for update & constraints, buffer ops and multi-GPU communication

Feature #2888: CUDA Update and Constraints module

gpuupdate / task assignment stabilization

Added by Szilárd Páll 10 months ago. Updated 8 months ago.

Status: Closed
Priority: High
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

The GPU update flag and its frontend infrastructure left behind some stale code and inconsistencies that need to be resolved (in rough order of priority):
  • the update task assignment needs to:
    • consider all cases when it should not be enabled (e.g. assertion on buffer ops, vsites, etc.)
    • consider dependencies like buffer ops (and enable these probably during SimulationWorkload init)
    • consider how to treat a partial command-line specification, e.g. gmx mdrun -update gpu; should this trigger the regular code path (with bondeds not offloaded)?
  • despite having an -update mdrun flag we still have references to the GMX_UPDATE_CONSTRAIN_GPU env var

Associated revisions

Revision d74a2fea (diff)
Added by Artem Zhmurov 10 months ago

Add environment variable that changes the meaning of '-update auto'

This change creates 'GMX_FORCE_UPDATE_DEFAULT_GPU', that changes the
default behavior of '-update' option to 'gpu'. Also changed the
gpuupdate Jenkins trigger to set this environment variable.

Refs. #3163.

Change-Id: I4463de3266d97c5f91bac65d3d997cf564e6e880
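
As an illustration of the behavior this commit message describes, here is a minimal sketch of how an environment variable could change what '-update auto' resolves to. Only the variable name GMX_FORCE_UPDATE_DEFAULT_GPU comes from the commit message; the enum, the function names, and the assumption that any value counts as "set" are hypothetical and are not taken from the GROMACS sources.

```cpp
#include <cstdlib>

// Hypothetical three-state option mirroring '-update auto|cpu|gpu'.
enum class UpdateTarget
{
    Auto,
    Cpu,
    Gpu
};

// Resolve the effective request: an explicit 'cpu' or 'gpu' is kept as is,
// while 'auto' becomes 'gpu' only when the override is active.
// 'auto' without the override stays 'auto' and is left to later decision logic.
UpdateTarget resolveUpdateTarget(UpdateTarget requested, bool forceGpuDefault)
{
    if (requested == UpdateTarget::Auto && forceGpuDefault)
    {
        return UpdateTarget::Gpu;
    }
    return requested;
}

// Assumption: the variable only needs to be present; its value is ignored.
bool forceGpuDefaultFromEnvironment()
{
    return std::getenv("GMX_FORCE_UPDATE_DEFAULT_GPU") != nullptr;
}
```

In this sketch a caller would pass forceGpuDefaultFromEnvironment() as the second argument; keeping the environment lookup separate makes the resolution step easy to test.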

Revision f8a8252c (diff)
Added by Artem Zhmurov 8 months ago

Allow using GPU update with DD and update groups

The GPU update can now be enabled for the supported DD cases
with the GMX_FORCE_UPDATE_DEFAULT_GPU environment variable. Also
added checks on whether the SHAKE algorithm was requested,
since SHAKE is not supported by the GPU update.

Refs. #3226, #3163.

Change-Id: I57e3ad3b8a571ec244989e888afd5cfcbaf9b75e

History

#1 Updated by Szilárd Páll 10 months ago

PS: this is a blocker for beta2 and we need action on it ASAP. We also need to make sure the regressiontests and unit tests all work correctly in all cases with -update gpu.

Who can work on this?

#3 Updated by Szilárd Páll 10 months ago

  • Description updated (diff)
  • Assignee set to Artem Zhmurov

#4 Updated by Szilárd Páll 10 months ago

  • Description updated (diff)

#5 Updated by Szilárd Páll 10 months ago

  • Description updated (diff)

#6 Updated by Szilárd Páll 10 months ago

  • Description updated (diff)

#7 Updated by Szilárd Páll 10 months ago

  • Description updated (diff)

#8 Updated by Szilárd Páll 10 months ago

  • Parent task set to #2888

#9 Updated by Szilárd Páll 10 months ago

  • Target version changed from 2020-beta2 to 2020-beta3

We have worked around the urgent need for this, so we will re-assess the defaults for beta3/rc. Bumping.

#10 Updated by Szilárd Páll 9 months ago

Needs an update.

#11 Updated by Szilárd Páll 9 months ago

  • Description updated (diff)

#12 Updated by Szilárd Páll 9 months ago

  • Description updated (diff)

#13 Updated by Paul Bauer 9 months ago

  • Target version changed from 2020-beta3 to 2020-rc1

bump

#14 Updated by Artem Zhmurov 8 months ago

Although there are still many things to do with regard to task assignment, I think this issue can be closed, because:
1. All unsupported cases are dealt with in decideWhetherToUseGpuForUpdate(...)
2. Currently, the update is on the GPU by default with a single rank in supported cases. It is never on the GPU for multi-rank cases, unless forced by the environment variable.
Objections? Suggestions?
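
The default behavior summarized in the two points above can be sketched roughly as follows. This is not the actual decideWhetherToUseGpuForUpdate(...) code or its signature; the struct, its fields, and the function name are hypothetical, and "supported case" is collapsed into a single flag (for example, it would be false when SHAKE was requested).

```cpp
// Hypothetical inputs for the default ('-update auto') decision.
struct UpdateInputs
{
    bool isSupportedCase; // false for unsupported setups, e.g. SHAKE requested
    int  numRanks;        // number of ranks in the run
    bool forcedByEnvVar;  // GMX_FORCE_UPDATE_DEFAULT_GPU is set
};

// Returns true when the update would run on the GPU by default.
bool useGpuForUpdateByDefault(const UpdateInputs& in)
{
    if (!in.isSupportedCase)
    {
        return false; // unsupported cases never use the GPU update
    }
    if (in.numRanks == 1)
    {
        return true; // single rank: on the GPU by default
    }
    return in.forcedByEnvVar; // multi-rank: only when forced by the environment variable
}
```

The sketch covers only the default path; how an explicit -update gpu or -update cpu request interacts with these rules is not stated in the comment and is left out.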

#15 Updated by Paul Bauer 8 months ago

  • Status changed from New to Resolved

thanks @Artem

#16 Updated by Paul Bauer 8 months ago

  • Status changed from Resolved to Closed
