Task #2675: bonded CUDA offload task
Clean up organization of bonded CUDA module
At https://gerrit.gromacs.org/#/c/8597/ patch set 4, Szilard noted:
As discussed offline, this is not so much coordination, but more ownership / inclusion of the data structure in source files that have nothing to do with most of the data members of this struct.
The only thing that happens outside of the bonded GPU module is the creation of the host-side iList. This could be done by moving the ownership of that data and passing the generated list to bonded_gpu_init() (much like pair search generates the pair list that gets passed to the GPU module for initialization of the device-side pair list), or alternatively by using a getter in assign_bondeds_to_gpu().
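A minimal sketch of the ownership transfer suggested above, assuming the caller builds the host-side list and hands it over by value. HostInteractionList and BondedGpuState are hypothetical names, and this bonded_gpu_init() signature is an illustration, not the actual GROMACS one.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical host-side interaction list, generated outside the bonded
// module (analogous to the pair list produced by pair search).
struct HostInteractionList
{
    std::vector<int> atomIndices;
};

// Hypothetical module-internal state that owns the list after init.
struct BondedGpuState
{
    HostInteractionList hostList;
};

// Sketch of an init function that takes ownership of the generated list.
// The real bonded_gpu_init() signature may differ.
std::unique_ptr<BondedGpuState> bonded_gpu_init(HostInteractionList list)
{
    auto state      = std::make_unique<BondedGpuState>();
    state->hostList = std::move(list);
    // A real implementation would also allocate and fill device buffers here.
    return state;
}
```

Because the list is moved in, no code outside the module needs to keep a reference to the module's data structure after initialization.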
src/gromacs/mdlib/forcerec.cpp PS3, Line 3084:
As discussed offline, the allocation/init of gpuBondedLists could be moved into the init call here
This seems better suited for a bonded_gpu_free() function in the bonded_gpu module, which would avoid exposing the declaration of GpuBondedLists.
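A sketch of the opaque-type pattern this suggests, under the assumption that the header only forward-declares GpuBondedLists and the module's source file defines it along with create/query/free functions. bonded_gpu_create() and the member numInteractions are hypothetical; both "files" are shown in one block so it compiles standalone.

```cpp
#include <cassert>

// ---- What the header would expose (hypothetical signatures) ----
struct GpuBondedLists; // opaque: declared, never defined, in the header
GpuBondedLists* bonded_gpu_create();
bool            bonded_gpu_have_interactions(const GpuBondedLists* lists);
void            bonded_gpu_free(GpuBondedLists* lists);

// ---- What only the bonded GPU module's source file would contain ----
struct GpuBondedLists
{
    int numInteractions = 0; // stands in for the real data members
};

GpuBondedLists* bonded_gpu_create()
{
    return new GpuBondedLists{};
}

// Cannot be an inline function in the header, since callers there only
// see an incomplete type.
bool bonded_gpu_have_interactions(const GpuBondedLists* lists)
{
    return lists != nullptr && lists->numInteractions > 0;
}

void bonded_gpu_free(GpuBondedLists* lists)
{
    delete lists; // deleting nullptr is a no-op
}
```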
I agree that there are some structural aspects that may be worth fixing in the 2019 version. I have code for a flavour of unique_ptr that can have a custom deleter, which I think will enable resolving some of these. Will consider the rest also.
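One common way a custom deleter helps with opaque types is sketched below. This is not the code referred to above; GpuBondedListsDeleter, GpuBondedListsPtr, makeGpuBondedLists and g_freeCount are hypothetical names. The deleter only needs the declaration of bonded_gpu_free(), so the owning pointer can be declared and destroyed where GpuBondedLists is still an incomplete type.

```cpp
#include <cassert>
#include <memory>

// ---- header side ----
struct GpuBondedLists; // incomplete here
void bonded_gpu_free(GpuBondedLists* lists);

// Custom deleter forwarding to the module's free function.
struct GpuBondedListsDeleter
{
    void operator()(GpuBondedLists* lists) const { bonded_gpu_free(lists); }
};
using GpuBondedListsPtr = std::unique_ptr<GpuBondedLists, GpuBondedListsDeleter>;

// ---- module source side ----
static int g_freeCount = 0; // demonstration only: counts frees

struct GpuBondedLists
{
    int numInteractions = 0;
};

void bonded_gpu_free(GpuBondedLists* lists)
{
    ++g_freeCount;
    delete lists;
}

GpuBondedListsPtr makeGpuBondedLists()
{
    return GpuBondedListsPtr(new GpuBondedLists{});
}
```

The caller then never calls bonded_gpu_free() by hand; destruction of the smart pointer does it.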
Assigns responsibility for knowing what work is required for the force
calculation of an MD step to a single object. Moved actual control of
executing any necessary CUDA bonded work to the new schedule
object. Changed low-level routines to assert when invalid calls are
made, because only one place should control whether work is done.
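A sketch of the split described above, assuming a schedule object is the only place that decides whether GPU bonded work runs, while the low-level launch routine merely asserts its precondition. ForceSchedule and launchBondedKernel are hypothetical names, and the trace vector stands in for an actual kernel launch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical low-level routine: it does not decide whether to run,
// it asserts that it is only called when there is work.
void launchBondedKernel(int numBondedInteractions, std::vector<std::string>* trace)
{
    assert(numBondedInteractions > 0 && "caller must check for work first");
    trace->push_back("launch bonded kernel");
}

// Hypothetical schedule: the single owner of "what work does this step need".
class ForceSchedule
{
public:
    ForceSchedule(bool useGpuForBondeds, int numBondedInteractions) :
        useGpuForBondeds_(useGpuForBondeds), numBondedInteractions_(numBondedInteractions)
    {
    }

    bool haveGpuBondedWork() const { return useGpuForBondeds_ && numBondedInteractions_ > 0; }

    void doStep(std::vector<std::string>* trace) const
    {
        if (haveGpuBondedWork())
        {
            launchBondedKernel(numBondedInteractions_, trace);
        }
    }

private:
    bool useGpuForBondeds_;
    int  numBondedInteractions_;
};
```

An invalid call from elsewhere then fails loudly in debug builds instead of silently doing nothing.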
This prepares for making GpuBondedLists an opaque type, when
bonded_gpu_have_interactions will no longer be able to be an inline function.
This pimpl-ed class hides the GPU implementation details from
the high-level calling code.
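A minimal pimpl sketch along these lines, assuming GpuBonded exposes only plain C++ types and keeps everything GPU-specific in GpuBonded::Impl. The method names and the std::vector standing in for device buffers are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// ---- header: no GPU types visible to callers ----
class GpuBonded
{
public:
    GpuBonded();
    ~GpuBonded(); // defined where Impl is complete
    void updateInteractionList(const std::vector<int>& atomIndices);
    bool haveInteractions() const;

private:
    class Impl;
    std::unique_ptr<Impl> impl_;
};

// ---- source file: implementation details (device buffers, streams, ...) ----
class GpuBonded::Impl
{
public:
    std::vector<int> atomIndices; // stands in for device-side buffers
};

GpuBonded::GpuBonded() : impl_(new Impl) {}
GpuBonded::~GpuBonded() = default;

void GpuBonded::updateInteractionList(const std::vector<int>& atomIndices)
{
    impl_->atomIndices = atomIndices;
}

bool GpuBonded::haveInteractions() const
{
    return !impl_->atomIndices.empty();
}
```

Declaring the destructor in the header but defaulting it in the source file is what lets std::unique_ptr<Impl> work with an incomplete Impl.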
Moved all GPU bonded force-calculation management code into a single
source file, separating it from the file containing the kernel
definitions and launches, which may also help improve compilation time.
Bound the kernel launch parameters for device buffers to GpuBonded
directly after neighbour search, for simplicity and efficiency. That
call now comes slightly later in the search-step call sequence.
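A sketch of binding launch parameters once per search step, assuming the kernel arguments are pointers into device buffers that only change when the buffers are reallocated after neighbour search. BondedKernelArgs and BondedLauncher are hypothetical, and a host std::vector stands in for device memory.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bundle of kernel arguments pointing into device buffers.
struct BondedKernelArgs
{
    float* forceBuffer     = nullptr;
    int    numInteractions = 0;
};

class BondedLauncher
{
public:
    // Called once per search step, after buffers have been (re)allocated.
    void updateAfterSearch(std::vector<float>* forces, int numInteractions)
    {
        args_.forceBuffer     = forces->data();
        args_.numInteractions = numInteractions;
        ++numBinds_;
    }

    // Every MD step's launch would reuse these already-bound arguments.
    const BondedKernelArgs& kernelArgs() const { return args_; }

    int numBinds() const { return numBinds_; }

private:
    BondedKernelArgs args_;
    int              numBinds_ = 0;
};
```

Rebinding only at search steps avoids recomputing the same arguments on every MD step.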
Separated the launch of the energy transfer from the function
that waits upon it, in preparation for future reorganization.
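A CPU-only sketch of a separate launch/wait pair, assuming a launch that would enqueue cudaMemcpyAsync on a stream and a wait that would call cudaStreamSynchronize. Neither CUDA call is made here; the "copy" is simulated at wait time, and EnergyTransfer and its methods are hypothetical names.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

class EnergyTransfer
{
public:
    explicit EnergyTransfer(std::vector<float> deviceEnergies) :
        deviceEnergies_(std::move(deviceEnergies))
    {
    }

    // Real code: enqueue an async device-to-host copy and return immediately.
    void launchEnergyTransfer()
    {
        assert(!inFlight_ && "transfer already launched");
        inFlight_ = true;
    }

    // Real code: synchronize the stream. Other CPU work can run before this.
    float waitEnergyTransfer()
    {
        assert(inFlight_ && "wait called without a launch");
        hostEnergies_ = deviceEnergies_; // simulated completion of the copy
        inFlight_     = false;
        return std::accumulate(hostEnergies_.begin(), hostEnergies_.end(), 0.0f);
    }

private:
    std::vector<float> deviceEnergies_;
    std::vector<float> hostEnergies_;
    bool               inFlight_ = false;
};
```

Keeping the two calls separate lets a caller place independent CPU work between them.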
Introduced HostStdVector to reduce the verbosity of GpuBonded::Impl.
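A sketch of what such an alias can look like, assuming it wraps std::vector with a host allocator. PinningAllocator is a hypothetical stand-in that uses ordinary heap memory; a real one would allocate page-locked memory for faster transfers. ImplMembersSketch is likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical minimal allocator standing in for a pinning host allocator.
template<typename T>
struct PinningAllocator
{
    using value_type = T;
    PinningAllocator() = default;
    template<typename U>
    PinningAllocator(const PinningAllocator<U>&)
    {
    }
    T*   allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template<typename T, typename U>
bool operator==(const PinningAllocator<T>&, const PinningAllocator<U>&)
{
    return true;
}
template<typename T, typename U>
bool operator!=(const PinningAllocator<T>&, const PinningAllocator<U>&)
{
    return false;
}

// The alias keeps member declarations short.
template<typename T>
using HostStdVector = std::vector<T, PinningAllocator<T>>;

// Hypothetical Impl-style members using the alias.
struct ImplMembersSketch
{
    // Verbose form: std::vector<int, PinningAllocator<int>> iList;
    HostStdVector<int>   iList;
    HostStdVector<float> energies;
};
```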
Now that there is no reason to have the stream member of
GpuBondedLists as a void *, removed the excess indirection that
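A sketch of the indirection in question, assuming the stream was previously stored as void * and cast back at each use. FakeStream stands in for cudaStream_t so the block compiles without CUDA; all names here are hypothetical.

```cpp
#include <cassert>

// Stand-in for cudaStream_t.
struct FakeStream
{
    int id;
};

// Before: an untyped pointer that every user had to cast back.
struct ListsWithVoidStream
{
    void* stream;
};
int streamIdBefore(const ListsWithVoidStream& lists)
{
    return static_cast<FakeStream*>(lists.stream)->id;
}

// After: the struct is only seen by GPU code, so it can hold the real type.
struct ListsWithTypedStream
{
    FakeStream stream;
};
int streamIdAfter(const ListsWithTypedStream& lists)
{
    return lists.stream.id;
}
```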
Moved symbols into the gmx namespace, per the coding style. Used the
appropriate inclusion guards on helper .cuh files.
Noted several TODOs for follow up work.