Project

General

Profile

Feature #3160

Task #3370: Further improvements to GPU Buffer Ops and Comms

implement direct comm for different src/target memory spaces

Added by Szilárd Páll about 1 year ago. Updated 10 months ago.

Status:
Fix uploaded
Priority:
High
Assignee:
Category:
mdrun
Difficulty:
hard
Close

Description

Instead of always copying data to the GPU and doing GPU<->GPU direct comm, implement use cases where data resides on the CPU and is transferred asa direct CPU->GPU or GPU->CPU copy; specifically:
  • extend to support case where PME is on CPU and PP is on GPU.
  • extend to case where the force reduction is the CPU and a PME rank uses GPU.

This would simplify code and likely be more performant too.

Associated revisions

Revision 6ecc09b0 (diff)
Added by Artem Zhmurov about 1 year ago

Add haveCpuLocalForces flag to DomainLifetimeWorkload

The flag haveCpuLocalForces is moved to DomainLifetimeWorkload.
The name and meaning is also slightly changed, haveCpuLocalForceWork
now now indicates whether any forces in the local domain are computed
on the CPU.
Consequently, the currently redundant PME f reduction conditional has
been removed from the haveCpuLocalForceWork initialization. The
relationship between local force work and existence of local forces on
the CPU however needs to be clarified in relationship to #3160 and how
the PP-PME direct communication is improved/simplified.

Change-Id: Idaed4da41bf5b72435c47bab8aeb7130b36b03c7

Revision 1953fd66 (diff)
Added by Alan Gray 12 months ago

Fix conditional on assertion for combining GPU Update and GPU coordinate send

Fixes a problem introduced in 9a3c2ce312c1e04634d636655388a8aaad53c8d1
which unintentionally blocks any use of GPU update with PME-PP
Communication because !stepWork.doNeighborSearch is missing from the
condition on the assertion.

Related to #3159 and #3160 which, when implemented, will remove the
restriction on coordinate send originating from the GPU (on non search steps).

Change-Id: Ie9b1102386b2c1bc927c09e4d8ead2bd2321320d

History

#1 Updated by Szilárd Páll about 1 year ago

  • Priority changed from Normal to High

#2 Updated by Alan Gray about 1 year ago

  • Description updated (diff)

extend to support case where PME is on CPU and PP is on GPU.

Have code offline that almost works, just failing one local freeenergy test which I'm looking into.

extend to case where the force reduction is the CPU and a PME rank uses GPU.

This already works in master branch and is used for search and virial steps.

#3 Updated by Alan Gray about 1 year ago

  • Status changed from New to In Progress
  • Target version changed from 2020 to 2020-beta3

#4 Updated by Alan Gray about 1 year ago

extend to support case where PME is on CPU and PP is on GPU.

Have code offline that almost works, just failing one local freeenergy test which I'm looking into.

Now passing tests locally with -nt 2 -npme 1, but have hit more issues with -nt 4 -npme 1. It's really tricky to get this working in all cases. E.g. for nbnxn-ljpme-LB it hangs, because each the PP task we have

        if (flags & PP_PME_COORD)
        {
            if (reinitGpuPmePpComms)
            {
                fr->pmePpCommGpu->reinit(n);
            }
        }

but for this test case 1 out of the 3 PP tasks (flags & PP_PME_COORD)==false while the other 2 have it as true (so the false case causes a hang when the PME task tries to communicate with it). Why would that be the case?

#5 Updated by Szilárd Páll about 1 year ago

Do you mean that one of the ranks never calls gmx_pme_send_coordinates()? I don't see in the code where could such a conditionality arise.

#7 Updated by Alan Gray about 1 year ago

Sorry, what I said above wan't quite right. One PP task wasn't taking the path not because of the pasted condition but instead due to another if statement, because the number of atoms to send (n) was equal to zero for that task. I refactored to fix the issue. Proposed change supporting PME on CPU is now at
https://gerrit.gromacs.org/c/gromacs/+/14223

#8 Updated by Paul Bauer 12 months ago

  • Target version changed from 2020-beta3 to 2021-infrastructure-stable
  • Difficulty hard added
  • Difficulty deleted (uncategorized)

I think this should go to 2021 instead

#9 Updated by Alan Gray 12 months ago

  • Status changed from In Progress to Fix uploaded

#10 Updated by Alan Gray 10 months ago

  • Status changed from Fix uploaded to Closed

#11 Updated by Alan Gray 10 months ago

  • Status changed from Closed to Fix uploaded
  • Parent task changed from #2891 to #3370

Re-opening and moving to subtask of #3370, so we don't lose the discussion.

Also available in: Atom PDF