Feature #2888
Feature #2816: GPU offload / optimization for update&constraints, buffer ops and multi-gpu communication
CUDA Update and Constraints module
Description
- LINCS for non-water constraints.
- SETTLE for water constraints.
- Leap-Frog integrator.
- Merge of the three into a single module.
- Remove the scaffolding from LINCS, SETTLE and Leap-Frog:
  - Coordinates, velocities, forces management.
  - PBC management.
  - Virial reduction (moved to #3114).
- Update tests.
- Remove Impl.
- Template computeVirial and updateVelocities.
- Remove the D2D copy for the coordinates (xp->x).
Subtasks
Related issues
Associated revisions
Combine CUDA Leap-Frog, LINCS and SETTLE. II.
Stand-alone CUDA implementations of Leap-Frog, LINCS
and SETTLE required additional scaffolding for integration
and testing. The most prominent part of this is the
management of coordinates, velocities and forces, which
is removed in this commit. Management of periodic boundary
conditions and virial reduction will be removed in
following commits.
Change-Id: I4c65a6c7088fd8059f4e7fa3cb4637cb2af79ebc
Memory management fixes in CUDA version of LINCS
This fix is to prepare LINCS to run with DD.
1. The masses array size depends on the current number of atoms
rather than on the number of constraints.
2. The size of other arrays should be based on the number of
threads launched on the GPU, which include padding added to
align coupled constraints with the thread blocks. Also
renamed variable according to conventions.
Change-Id: I20cb53ebc6da6a1ff2ee1e385613b27c4a01d11f
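
The two sizing rules above can be illustrated with a minimal host-side C++ sketch. The names (`LincsBufferSizes`, `computeLincsBufferSizes`, `threadsPerBlock`) are hypothetical and not taken from the GROMACS sources. The padding is also simplified here to rounding the constraint count up to a whole number of thread blocks, whereas the real padding that aligns coupled constraints with thread blocks is more involved.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container for the two kinds of buffer sizes discussed above.
struct LincsBufferSizes
{
    std::size_t massesSize;        // one entry per atom
    std::size_t perConstraintSize; // one entry per launched GPU thread (padding included)
};

// Simplifying assumption: padding only rounds the number of constraints up
// to a whole number of thread blocks.
inline LincsBufferSizes computeLincsBufferSizes(std::size_t numAtoms,
                                                std::size_t numConstraints,
                                                std::size_t threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    const std::size_t numBlocks = (numConstraints + threadsPerBlock - 1) / threadsPerBlock;
    return { numAtoms, numBlocks * threadsPerBlock };
}
```

With 1000 atoms, 300 constraints and 256 threads per block, this gives a masses buffer of 1000 entries and per-constraint buffers of 512 entries.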
Remove PImpl scaffolding from CUDA version of LINCS
The CUDA implementation of LINCS was initially introduced as a
stand-alone feature. This required hiding CUDA-specific variables
and subroutines in the private implementation subclass. Since
LINCS is now a part of the Update and Constraints module, this is no
longer required and can be removed.
Change-Id: I9698224d4702dfb8d99106999335c62e83a511df
Remove PImpl scaffolding from CUDA version of SETTLE
The GPU version of SETTLE was implemented as a class with a private
implementation so that it could be initialized on
non-CUDA hosts. Now, the implementation can be hidden
inside the Update and Constraints PImpl, so the CUDA-specific
types and calls can be exposed in SETTLE and the
private implementation is no longer needed there.
Change-Id: I4c78f2629be34b42bb5f4f7d34970c3e41515691
Refactor Leap-Frog tests and connect them to CPU version
This introduces a test data object and runners to the Leap-Frog
tests, which are now connected to the CPU version of Leap-Frog.
This also makes it possible to include tests based on reference
values, which are needed to make sure that the temperature and/or
pressure control works correctly in new implementations.
Change-Id: Id2d934c43138889ad178a94126cab4da2895bb5a
Refactoring of the SETTLE tests
The current tests for the CUDA version of SETTLE were a quick
addition to the old tests, with a direct comparison of the GPU
implementation with the original CPU-based implementation.
This commit rearranges the test structure, making it possible
to apply the same set of tests to both implementations. There
are no changes to the tests themselves. Currently, the comparison tests
will run twice and will dry-run on CUDA builds without CUDA-capable
devices.
TODO: Add comparison with pre-computed values for coordinates,
velocities and virial. Remove the CPU vs GPU comparison
tests.
Change-Id: Ifcb6af9af6c93787b919b785348f9f4547b6c267
Prepare Update and Constraints for Domain Decomposition
The initial GPU-based version of update and constraints was not
designed to run with domain decomposition (DD). This introduces a
couple of fixes to the memory management that should allow the
module to work with DD enabled. The memory buffers are now
re-allocated at the set(...) stage if needed.
Change-Id: I155884f5797252cf048a6400a2dd7b042d355b7e
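
The re-allocation at the set(...) stage can be sketched on the host as follows. This is only an illustration under stated assumptions: `AtomBuffer` is a hypothetical class, a `std::vector` stands in for a device buffer, and the policy of growing only when the new atom count exceeds the allocated size (and never shrinking) is an assumption rather than something the commit message specifies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device buffer owned by the module.
class AtomBuffer
{
public:
    // Called whenever the local atom count may have changed, e.g. after
    // a new domain decomposition. Returns true if memory was re-allocated.
    bool set(std::size_t numAtoms)
    {
        bool reallocated = false;
        if (numAtoms > data_.size())
        {
            // Assumption: grow only when the current allocation is too small.
            data_.resize(numAtoms);
            reallocated = true;
        }
        numAtoms_ = numAtoms;
        return reallocated;
    }

    std::size_t numAtoms() const { return numAtoms_; }
    std::size_t allocatedSize() const { return data_.size(); }

private:
    std::vector<float> data_;
    std::size_t        numAtoms_ = 0;
};
```

Calling `set(100)` and then `set(80)` re-allocates only the first time, while a later `set(150)` triggers another re-allocation.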
Make use of reference data in SETTLE tests
As a temporary measure, the CPU and GPU versions of SETTLE
were tested against each other. Making use of the reference
data framework allows them to be tested against precomputed values.
Now, the final positions, velocities and virial are properly
tested in the CPU and, if available, GPU versions.
Change-Id: I8e54e1a741263b8bf9774a21141c527f58130fa9
Add temperature coupling to CUDA Leap-Frog integrator
Velocity rescaling is added to the integrator. None, one,
or multiple rescaling groups are supported. Tests that
include temperature coupling are added.
Nose-Hoover temperature control is not implemented.
Change-Id: I1e6850eeb74de829554260fba6a6e6c1c63ceb46
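
A Leap-Frog step with per-group velocity rescaling can be sketched on the host as below. This is a minimal illustration, not the CUDA kernel from the commit: the function name, the `Vec3` type and the argument layout are hypothetical, and the update rule v' = lambda * v + f * dt / m, x' = x + v' * dt is the textbook form assumed here. An empty `lambdas` vector means no temperature coupling, a single entry means one group, and several entries are indexed through `groupOfAtom`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3
{
    float x, y, z;
};

// Hypothetical host-side Leap-Frog step with velocity rescaling.
inline void leapFrogStep(std::vector<Vec3>&        x,
                         std::vector<Vec3>&        v,
                         const std::vector<Vec3>&  f,
                         const std::vector<float>& inverseMasses,
                         const std::vector<float>& lambdas,
                         const std::vector<int>&   groupOfAtom,
                         float                     dt)
{
    assert(v.size() == x.size() && f.size() == x.size());
    assert(inverseMasses.size() == x.size());
    assert(lambdas.size() <= 1 || groupOfAtom.size() == x.size());

    for (std::size_t i = 0; i < x.size(); i++)
    {
        // Select the scaling factor: none, one, or per-group.
        float lambda = 1.0f;
        if (lambdas.size() == 1)
        {
            lambda = lambdas[0];
        }
        else if (lambdas.size() > 1)
        {
            lambda = lambdas[groupOfAtom[i]];
        }

        const float accelScale = inverseMasses[i] * dt;
        v[i].x = lambda * v[i].x + f[i].x * accelScale;
        v[i].y = lambda * v[i].y + f[i].y * accelScale;
        v[i].z = lambda * v[i].z + f[i].z * accelScale;

        x[i].x += v[i].x * dt;
        x[i].y += v[i].y * dt;
        x[i].z += v[i].z * dt;
    }
}
```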
Add Parrinello-Rahman pressure coupling to CUDA Leap-Frog integrator
The isotropic Parrinello-Rahman pressure coupling scheme is now
added to the CUDA version of the Leap-Frog integrator. The
Leap-Frog tests are updated to check the new addition.
Change-Id: Icf42667621c16a994e68baf5158ea4abac387928
Eliminate D2D copy in update constraints
The intermediate coordinates (x' or xp) are only needed inside
the update-constraints module (for the constraints algorithms)
and never used outside. Hence, the xp variable can be used to
save the coordinates before update, while x stores the final
coordinates. This way, there is no need to make a D2D xp->x
copy after applying the constraints, since x will have the
correct data.
Change-Id: I363b633976a236a8e2bf2137c21d3bf0a765cb06
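
The swapped roles of x and xp can be sketched as follows. This is a simplified host-side illustration with hypothetical names, one-dimensional coordinates and no constraining step: the integrator saves the old coordinate into xp and writes the new one directly into x, so nothing has to be copied back afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical update step: xp receives the coordinates before the update,
// x receives the updated coordinates. A constraint algorithm (not shown)
// could then use xp as its reference and correct x in place.
inline void updateWithoutFinalCopy(std::vector<float>&       x,
                                   std::vector<float>&       xp,
                                   const std::vector<float>& v,
                                   float                     dt)
{
    assert(xp.size() == x.size() && v.size() == x.size());
    for (std::size_t i = 0; i < x.size(); i++)
    {
        const float xOld = x[i];
        xp[i]            = xOld;             // save pre-update coordinate
        x[i]             = xOld + v[i] * dt; // final coordinate lives in x
    }
}
```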
Link GPU force producer and consumer tasks
The GPU event synchronizer that indicates that forces are ready
for consumption is now passed to the GPU update-constraints.
The update-constraints enqueues a wait on the event in the update
stream before performing numerical integration and constraining.
Note that the event is conditionally returned by the
StatePropagatorDataGpu and indicates that either the reduction of
forces on the GPU or the H2D copy is done, depending on the offload
scenario on the current timestep.
Change-Id: Ic12b0c55b75ec5f0c31ce500a2760fb4d5cf3b91
History
#1 Updated by Artem Zhmurov almost 2 years ago
- Related to Feature #2885: CUDA version of LINCS added
#2 Updated by Artem Zhmurov almost 2 years ago
- Related to Feature #2886: CUDA version of SETTLE added
#3 Updated by Artem Zhmurov almost 2 years ago
- Related to Feature #2887: CUDA version of Leap Frog algorithm added
#4 Updated by Artem Zhmurov almost 2 years ago
- Description updated (diff)
I have LINCS with some tests for it, SETTLE with some tests for it and the Leap-Frog integrator with some tests for it. Now I am combining them into one "Update and Constrain" module. Any ideas for a test that would verify that the merge was successful?
#5 Updated by Artem Zhmurov almost 2 years ago
- Subject changed from CUDA GPU-only loop to CUDA Update and Constraints module
- Description updated (diff)
#6 Updated by Gerrit Code Review Bot almost 2 years ago
Gerrit received a related patchset '4' for Issue #2888.
Uploader: Artem Zhmurov (zhmurov@gmail.com)
Change-Id: gromacs~master~I8730aad0ecaa0230686fe89d1157b0da2f01f7bc
Gerrit URL: https://gerrit.gromacs.org/9329
#7 Updated by Artem Zhmurov almost 2 years ago
- Description updated (diff)
#8 Updated by Artem Zhmurov almost 2 years ago
- Description updated (diff)
#9 Updated by Artem Zhmurov over 1 year ago
- Description updated (diff)
#10 Updated by Artem Zhmurov over 1 year ago
- Description updated (diff)
#11 Updated by Szilárd Páll over 1 year ago
- Related to Task #2936: introduce check that CPU-GPU transfers/assignments are made between compatible types added
#12 Updated by Artem Zhmurov over 1 year ago
- Description updated (diff)
#13 Updated by Artem Zhmurov over 1 year ago
- Description updated (diff)
#14 Updated by Artem Zhmurov over 1 year ago
- Description updated (diff)
#15 Updated by Szilárd Páll over 1 year ago
- Related to Task #3171: schedule CPU H2D force contribution in separate stream added
#16 Updated by Szilárd Páll about 1 year ago
- Related to Task #3195: assess nightly master failures added
#17 Updated by Szilárd Páll about 1 year ago
- Related to Task #3220: change rolling pruning scheduling with GPU update added
#18 Updated by Artem Zhmurov about 1 year ago
- Description updated (diff)
- Status changed from New to Resolved
#19 Updated by Artem Zhmurov about 1 year ago
- Status changed from Resolved to In Progress
#20 Updated by Artem Zhmurov about 1 year ago
- Target version changed from 2020 to 2021-infrastructure-stable
Some of the improvements are now targeted to the next version.
#21 Updated by Artem Zhmurov about 1 year ago
- Status changed from In Progress to Closed
The initial CUDA implementation of update-constraints is done. Further improvements are listed in #3114
Combine CUDA Leap-Frog, LINCS and SETTLE. I.
This is the first step in combining constraints and integrator
into the "UpdateAndConstraints" module. The initial merge does not
imply any performance optimisation or code clean-up. Hence, this
patch keeps all the temporary infrastructure that was built
around SETTLE, LINCS and Leap-Frog to allow them to function as
separate units. In the following commits, this infrastructure
will be removed and these three implementations will be more closely
integrated. To enable, set the GMX_UPDATE_CONSTRAIN_GPU environment
variable. Note that the environment variables GMX_LINCS_GPU,
GMX_SETTLE_GPU and GMX_INTEGRATE_GPU will no longer work.
Refs #2816, #2888
Change-Id: I8730aad0ecaa0230686fe89d1157b0da2f01f7bc