Feature #2967

GPU reallocateDeviceBuffer improvements

Added by Szilárd Páll 3 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: mdrun
Target version:
Difficulty: uncategorized

Description

When the "buffered" GPU allocation was reimplemented/modernized, the buffering was dropped (i.e. size_alloc = size at the first allocation, which effectively disables the buffering).

We should extend the current implementation by providing an overloaded version of reallocateDeviceBuffer() (to accommodate the current legacy code) that takes a standard vector / ArrayRef and also re-introduces the buffering.

The open question is how to implement the buffering. Options (see the sketch after this list):
- always keep the device array size and allocation size in sync with h_vector.size() and h_vector.capacity(), respectively;
- since using h_vector.capacity() everywhere (in particular for the PME grids) could risk running out of device memory, consider a different capacity heuristic.
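
As a concrete starting point, here is a minimal sketch of such an overload. It is only illustrative: DeviceBuffer<T>, allocateDeviceBuffer(), freeDeviceBuffer() and DeviceContext are stand-ins modelled loosely on the existing device-buffer helpers (the real signatures may differ), and the ~20% over-allocation is an arbitrary placeholder for whatever heuristic we settle on.

<pre>
// Stand-in declarations (assumptions): the real helpers live elsewhere and
// their signatures may differ; these exist only to make the sketch compile.
#include <cstddef>
#include <vector>

struct DeviceContext {};
template<typename T> using DeviceBuffer = T*;

template<typename T>
void allocateDeviceBuffer(DeviceBuffer<T>* buffer, size_t numValues, const DeviceContext& /*ctx*/)
{
    *buffer = new T[numValues]; // placeholder for the real device allocation
}

template<typename T>
void freeDeviceBuffer(DeviceBuffer<T>* buffer)
{
    delete[] *buffer; // placeholder for the real device deallocation
    *buffer = nullptr;
}

// Sketch of an overloaded reallocateDeviceBuffer() that takes the host vector
// directly and re-introduces buffered (over-)allocation on the device side.
template<typename ValueType>
void reallocateDeviceBuffer(DeviceBuffer<ValueType>*      deviceBuffer,
                            const std::vector<ValueType>& h_vector,
                            int*                          currentNumValues,
                            int*                          currentMaxNumValues,
                            const DeviceContext&          deviceContext)
{
    const int numValuesNeeded = static_cast<int>(h_vector.size());
    *currentNumValues         = numValuesNeeded;

    // Only reallocate when the existing device allocation is too small.
    if (numValuesNeeded > *currentMaxNumValues)
    {
        if (*currentMaxNumValues > 0)
        {
            freeDeviceBuffer(deviceBuffer);
        }
        // Buffered growth: over-allocate by ~20% instead of mirroring
        // h_vector.capacity() (placeholder heuristic; see the open question above).
        const int newCapacity = numValuesNeeded + numValuesNeeded / 5 + 1;
        allocateDeviceBuffer(deviceBuffer, newCapacity, deviceContext);
        *currentMaxNumValues = newCapacity;
    }
}
</pre>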

History

#1 Updated by Berk Hess 2 months ago

Buffering is actually done in reallocateDeviceBuffer(), but it is not coupled to the CPU side allocation. We might or might not want that.

#2 Updated by Szilárd Páll 2 months ago

Berk Hess wrote:

Buffering is actually done in reallocateDeviceBuffer(), but it is not coupled to the CPU side allocation. We might or might not want that.

As we discussed offline, if we want to keep the standard vector allocation behavior and not attempt an always/mostly-reserve approach that avoids push_back (for host buffers of data that is copied to the GPU), then for the GPU-side buffers we likely want a custom buffered allocation that does not simply use h_vector.capacity(), as this could easily lead to running out of memory on low-end GPUs with little global memory.
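
To make the last point concrete, one possible shape for such a custom heuristic is a bounded slack that does not track the host vector's exponential capacity growth. This is purely illustrative, not a decision of this issue; the name chooseDeviceCapacity and the numbers are made up.

<pre>
#include <algorithm>
#include <cstddef>

// Purely illustrative heuristic: grow the device allocation by ~20% of the
// requested size, but cap the slack so that large buffers (e.g. PME grids)
// do not waste memory on low-end GPUs with little global memory.
inline size_t chooseDeviceCapacity(size_t numValuesNeeded)
{
    constexpr size_t c_maxSlackValues = 1u << 20; // arbitrary cap on the over-allocation
    const size_t     slack            = std::min(numValuesNeeded / 5, c_maxSlackValues);
    return numValuesNeeded + slack;
}
</pre>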
