Feature #2817

Feature #2816: Device-side update & constraints, buffer ops and multi-GPU comms

GPU X/F buffer ops

Added by Szilárd Páll 28 days ago. Updated 28 days ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
mdrun
Target version:
-
Difficulty:
uncategorized

Description

Implement the transforms between the native and the nbat/nbnxn layouts (the "buffer ops") in the nbnxn_gpu module (cf. nbnxn_atomdata_add_nbat_f_to_f and nbnxn_atomdata_copy_x_to_nbat_x).
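
As an illustration of the coordinate transform, here is a minimal host-side reference sketch in C++. It assumes the transform is a gather through an index array in which padding slots are marked with -1, and that the target layout packs four floats per slot. The names copyXToNbatX, slotToAtom, fillerCoord, Float3 and Float4 are hypothetical and are not taken from the GROMACS sources. The loop body corresponds to the work one GPU thread would do for one slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain structs standing in for the native and packed layouts.
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, q; };

// Gather coordinates from native atom order into slot order.
// slotToAtom[slot] is the native index of the atom stored in that slot,
// or -1 for a padding slot (assumption made for this sketch).
// Only x/y/z are written; the fourth component (q) is left untouched.
void copyXToNbatX(const std::vector<Float3>& x,
                  const std::vector<int>&    slotToAtom,
                  float                      fillerCoord,
                  std::vector<Float4>&       xq)
{
    assert(xq.size() == slotToAtom.size());
    // On a GPU, each iteration would map to one thread.
    for (std::size_t slot = 0; slot < slotToAtom.size(); ++slot)
    {
        const int atom = slotToAtom[slot];
        if (atom >= 0)
        {
            assert(static_cast<std::size_t>(atom) < x.size());
            xq[slot].x = x[atom].x;
            xq[slot].y = x[atom].y;
            xq[slot].z = x[atom].z;
        }
        else
        {
            // Padding slot: write a filler coordinate.
            xq[slot].x = fillerCoord;
            xq[slot].y = fillerCoord;
            xq[slot].z = fillerCoord;
        }
    }
}
```

With such a transform running on the device, only the native-layout x array would need to be transferred; the index array changes only when the search data changes.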

Role and scope:
  • Implementing the coordinate (X) transform on the GPU will allow transferring only the native layout. While this may not make the code faster (considering the CUDA API overheads and that the "extra" H2D transfer is typically overlapped), it does remove the CPU from the critical path in nonbonded communication, which benefits scaling and direct GPU-GPU communication.
  • Similarly, a force layout transform kernel will allow direct force communication. Multiple flavors and implementation strategies are to be considered:
    - only transform (e.g. if no other force compute on the GPU)
    - transform + accumulate (accumulate with other force compute outputs)
    - consider an inline transform function for on-the-fly transform within the nonbonded kernel; at high parallelization in particular, the performance hit in the nonbonded kernel may be less than the cost of launching an extra kernel.
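
The first two force-transform flavors listed above can be sketched as a host-side reference in C++. The sketch assumes the transform is a per-atom lookup through an index array from native atom order into the nbat force buffer. The names reduceNbatForces, atomToSlot, ForceOp and Force3 are hypothetical and are not taken from the GROMACS sources. The loop body corresponds to the work of one GPU thread per atom.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain struct for a force vector.
struct Force3 { float x, y, z; };

// The two flavors: overwrite the output, or add to forces already there.
enum class ForceOp { Overwrite, Accumulate };

// Transform forces from slot (nbat) order back to native atom order.
// atomToSlot[atom] is the slot holding the force for that native atom
// (assumption made for this sketch).
void reduceNbatForces(const std::vector<Force3>& fNbat,
                      const std::vector<int>&    atomToSlot,
                      ForceOp                    op,
                      std::vector<Force3>&       f)
{
    assert(f.size() == atomToSlot.size());
    // On a GPU, each iteration would map to one thread.
    for (std::size_t atom = 0; atom < f.size(); ++atom)
    {
        const int slot = atomToSlot[atom];
        assert(slot >= 0 && static_cast<std::size_t>(slot) < fNbat.size());
        const Force3& in = fNbat[slot];
        if (op == ForceOp::Accumulate)
        {
            // Transform + accumulate with other force compute outputs.
            f[atom].x += in.x;
            f[atom].y += in.y;
            f[atom].z += in.z;
        }
        else
        {
            // Only transform: no other force contribution to preserve.
            f[atom] = in;
        }
    }
}
```

Because each native atom is written by exactly one iteration, this form needs no atomic operations; the inline, on-the-fly variant inside the nonbonded kernel would have different trade-offs and is not covered by this sketch.
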
Related TODOs:
  • need to improve/resolve the ownership of GPU inputs/outputs
  • pinning of search data that is currently not pinned/pinnable

History

#1 Updated by Szilárd Páll 28 days ago

  • Description updated (diff)
