GPU reallocateDeviceBuffer improvements
When the "buffered" GPU allocation was reimplemented/modernized, the buffering got dropped: size_alloc is set equal to size at the first allocation, which effectively disables the buffering.
We should extend the current implementation with an overloaded version of reallocateDeviceBuffer() (an overload, so that current legacy code keeps working) that takes a standard vector / ArrayRef and also re-introduces the buffering.
The open question is how to implement the buffering. Options:
- always keep the device array size/allocation size in sync with h_vector.capacity(). Doing this everywhere (in particular for PME grids) could risk running out of device memory, so consider a different capacity heuristic.
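To make the proposal concrete, here is a minimal sketch of a buffered reallocation with a vector-taking overload. All names (reallocateBufferSketch, overAllocSize, allocateBufferSketch, freeBufferSketch) are hypothetical and do not match the actual GROMACS signatures, the 20% + 64 over-allocation factor is an assumption, and host malloc/free stands in for the device allocation calls so that the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-ins for the device allocation calls;
// host malloc/free is used so the sketch runs without a GPU.
template<typename T>
T* allocateBufferSketch(std::size_t count)
{
    return static_cast<T*>(std::malloc(count * sizeof(T)));
}

template<typename T>
void freeBufferSketch(T* buffer)
{
    std::free(buffer);
}

// Assumed over-allocation heuristic: 20% headroom plus a small constant.
inline std::size_t overAllocSize(std::size_t numValues)
{
    return numValues + numValues / 5 + 64;
}

// Legacy-style interface: the caller tracks size and allocation size.
// The buffer is only reallocated when the request exceeds the allocation;
// old contents are not preserved.
template<typename T>
void reallocateBufferSketch(T**          buffer,
                            std::size_t  numValues,
                            std::size_t* currentNumValues,
                            std::size_t* currentMaxNumValues)
{
    assert(buffer && currentNumValues && currentMaxNumValues);
    if (numValues > *currentMaxNumValues)
    {
        if (*buffer != nullptr)
        {
            freeBufferSketch(*buffer);
        }
        *currentMaxNumValues = overAllocSize(numValues);
        *buffer              = allocateBufferSketch<T>(*currentMaxNumValues);
    }
    *currentNumValues = numValues;
}

// Overload taking the host-side vector: sized from size(), not capacity(),
// so the device headroom is decoupled from the host allocation.
template<typename T>
void reallocateBufferSketch(T**                   buffer,
                            const std::vector<T>& hostData,
                            std::size_t*          currentNumValues,
                            std::size_t*          currentMaxNumValues)
{
    reallocateBufferSketch(buffer, hostData.size(), currentNumValues, currentMaxNumValues);
}
```

With this shape, growing the host vector slightly between calls does not trigger a device reallocation as long as the new size fits in the previously over-allocated buffer.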
#2 Updated by Szilárd Páll 6 months ago
Berk Hess wrote:
Buffering is actually done in reallocateDeviceBuffer(), but it is not coupled to the CPU side allocation. We might or might not want that.
As we discussed offline, if we want to keep the standard vector allocation behavior and not attempt an always/mostly "reserve and avoid push_back" approach (for host buffers of data that's copied to the GPU), then for the GPU-side buffers we likely want a custom buffered allocation that does not simply use h_vector.capacity(), as this could easily lead to running out of memory on low-end GPUs with little global memory.
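One possible shape for such a custom heuristic, purely as an illustration: bound the headroom by the amount of free device memory instead of following the host capacity. The function name boundedDeviceCapacity, the 25% headroom, and the "at most half of the free memory" rule are all assumptions; freeBytes is assumed to come from a runtime query such as cudaMemGetInfo().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical policy: request 25% headroom on top of the required element
// count, but never spend more than half of the currently free device memory
// on that headroom. The result is never smaller than `required`; whether
// `required` itself fits is left to the allocation call.
inline std::size_t boundedDeviceCapacity(std::size_t required,
                                         std::size_t elemSize,
                                         std::size_t freeBytes)
{
    assert(elemSize > 0);
    const std::size_t wantedHeadroom = required / 4;
    const std::size_t maxHeadroom    = (freeBytes / 2) / elemSize;
    return required + std::min(wantedHeadroom, maxHeadroom);
}
```

On a GPU with plenty of free memory this behaves like a plain proportional over-allocation; as free memory shrinks, the headroom falls towards zero and the buffer is sized exactly.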