Task #2454: OpenCL infrastructure improvements
OpenCL code modernization assuming v1.2 is required
A list of tasks that could be beneficial to consider if we require the v1.2 standard.
- modernize API use by replacing clEnqueueMarker/clEnqueueBarrier/clEnqueueWaitForEvents with clEnqueueMarkerWithWaitList/clEnqueueBarrierWithWaitList
- consider using CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY, CL_MEM_HOST_NO_ACCESS
- use clEnqueueFillBuffer for zeroing buffers
- do not add a call to clUnloadCompiler
- use OPENCL_C_VERSION for reporting the version
Modernize OpenCL memory allocation flags
This change corrects the memory allocation flags in the nonbonded module
so that they reflect the R/W use on both the host and the device side; the
former is made possible by requiring OpenCL 1.2.
Also improves some error handling that was lacking.
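To illustrate what "reflecting the R/W use on both sides" could look like, here is a minimal sketch that combines a device-side access flag with one of the OpenCL 1.2 host-access flags (CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY, CL_MEM_HOST_NO_ACCESS). The enums DeviceAccess and HostAccess and the helpers makeBufferFlags() and hostFlagsAreValid() are hypothetical names made up for this example, not code from the nonbonded module. The flag constants are local stand-ins with the same values as in the OpenCL 1.2 cl.h header, so the snippet compiles without an OpenCL SDK; real code would include the header and pass the result to clCreateBuffer.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins so the sketch compiles without an OpenCL SDK; the values mirror
// those in the OpenCL 1.2 cl.h header. Real code would include <CL/cl.h>.
using cl_mem_flags = std::uint64_t;
constexpr cl_mem_flags CL_MEM_READ_WRITE      = 1u << 0;
constexpr cl_mem_flags CL_MEM_WRITE_ONLY      = 1u << 1;
constexpr cl_mem_flags CL_MEM_READ_ONLY       = 1u << 2;
constexpr cl_mem_flags CL_MEM_HOST_WRITE_ONLY = 1u << 7;
constexpr cl_mem_flags CL_MEM_HOST_READ_ONLY  = 1u << 8;
constexpr cl_mem_flags CL_MEM_HOST_NO_ACCESS  = 1u << 9;

// Hypothetical enums describing how each side uses the buffer.
enum class DeviceAccess { ReadWrite, ReadOnly, WriteOnly };
enum class HostAccess { ReadWrite, WriteOnly, ReadOnly, None };

// The three host-access flags are mutually exclusive in OpenCL 1.2,
// so at most one of their bits may be set.
inline bool hostFlagsAreValid(cl_mem_flags flags)
{
    const cl_mem_flags hostBits =
            flags & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS);
    return (hostBits & (hostBits - 1)) == 0; // true for zero or exactly one bit
}

// Builds the allocation flags from the intended device- and host-side use.
inline cl_mem_flags makeBufferFlags(DeviceAccess device, HostAccess host)
{
    cl_mem_flags flags = 0;
    switch (device)
    {
        case DeviceAccess::ReadWrite: flags |= CL_MEM_READ_WRITE; break;
        case DeviceAccess::ReadOnly: flags |= CL_MEM_READ_ONLY; break;
        case DeviceAccess::WriteOnly: flags |= CL_MEM_WRITE_ONLY; break;
    }
    switch (host)
    {
        case HostAccess::ReadWrite: break; // no host flag means unrestricted host access
        case HostAccess::WriteOnly: flags |= CL_MEM_HOST_WRITE_ONLY; break;
        case HostAccess::ReadOnly: flags |= CL_MEM_HOST_READ_ONLY; break;
        case HostAccess::None: flags |= CL_MEM_HOST_NO_ACCESS; break;
    }
    assert(hostFlagsAreValid(flags));
    return flags;
}
```

For example, a buffer that the host only uploads to and the kernels only read would use makeBufferFlags(DeviceAccess::ReadOnly, HostAccess::WriteOnly), i.e. CL_MEM_READ_ONLY | CL_MEM_HOST_WRITE_ONLY.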
#9 Updated by Szilárd Páll over 1 year ago
Aleksei Iupinov wrote:
OK, it's clUnloadCompiler(), not clUnloadPlatformCompiler(), it didn't help me at all with #2449, and it's actually deprecated in 1.2 :-)
You mean in 2.1. So what exactly is the resource that's using/leaking memory in your case, any idea?
#10 Updated by Aleksei Iupinov over 1 year ago
No, the link is for 2.1, but if you read through it, the function was deprecated in 1.2 already.
No idea so far, but I will eventually have to avoid compiling kernels many times in unit tests (wildly assuming it's the compilation that causes this) - it takes an insane amount of time anyway, 2 seconds per case. We'll see if that 'fixes' it then.
#12 Updated by Szilárd Páll 11 months ago
- Status changed from New to In Progress
- Assignee set to Szilárd Páll
Yes, had some WIP for the second item in the list, was hoping the PME code gets merged, but it's been slow to get there, so I should just upload the modernization for the nonbonded module, and do the rest later.
#16 Updated by Szilárd Páll 10 months ago
This weekend I realized that the allocation modernization won't be possible in the new PME code due to an omission/oversimplification of the allocateDeviceBuffer() API, which doesn't support passing flags.
The allocateDeviceBuffer() call should take some flags; not sure if we want to just add a flag argument with a default value? Also, to be able to call reallocateDeviceBuffer() without having to pass the flags again, we could store the flags in the TypedClMemory class.
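A rough sketch of how the two ideas could fit together, under heavy assumptions: TypedBufferSketch, allocateDeviceBufferSketch() and reallocateDeviceBufferSketch() are hypothetical stand-ins invented for this example and do not reproduce the actual allocateDeviceBuffer(), reallocateDeviceBuffer() or TypedClMemory signatures. A std::vector plays the role of the cl_mem handle and MemFlags stands in for cl_mem_flags, so the snippet builds without OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for cl_mem_flags; the values match the corresponding OpenCL 1.2 flags.
using MemFlags = std::uint64_t;
constexpr MemFlags c_memReadWrite     = 1u << 0; // same value as CL_MEM_READ_WRITE
constexpr MemFlags c_memReadOnly      = 1u << 2; // same value as CL_MEM_READ_ONLY
constexpr MemFlags c_memHostWriteOnly = 1u << 7; // same value as CL_MEM_HOST_WRITE_ONLY

// Hypothetical typed buffer wrapper that remembers its allocation flags.
template<typename T>
struct TypedBufferSketch
{
    std::vector<T> storage;                // stands in for the cl_mem handle
    MemFlags       flags = c_memReadWrite; // flags used for the last allocation
};

// Allocation with a defaulted flag argument: existing callers that pass no
// flags keep compiling and get read-write memory.
template<typename T>
void allocateDeviceBufferSketch(TypedBufferSketch<T>* buffer,
                                std::size_t           numValues,
                                MemFlags              flags = c_memReadWrite)
{
    assert(buffer != nullptr);
    // Real code would create the buffer with clCreateBuffer() using `flags` here.
    buffer->storage.assign(numValues, T{});
    buffer->flags = flags;
}

// Reallocation reuses the stored flags, so callers need not pass them again.
template<typename T>
void reallocateDeviceBufferSketch(TypedBufferSketch<T>* buffer, std::size_t numValues)
{
    assert(buffer != nullptr);
    const MemFlags savedFlags = buffer->flags;
    allocateDeviceBufferSketch(buffer, numValues, savedFlags);
}
```

With this shape, a caller allocates once with, say, c_memReadOnly | c_memHostWriteOnly, and any later reallocateDeviceBufferSketch() call keeps those flags. The default argument is the least invasive option; a separate overload would work as well.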