Task #3077
Feature #2816: GPU offload / optimization for update&constraints, buffer ops and multi-gpu communication
Feature #2891: PME/PP GPU communications
PME/PP GPU Comms unique pointer deletion causes seg fault when CUDA calls exist in destructor
Description
When the unique pointers used for the PME-PP GPU communication objects are automatically deleted, the code sometimes seg-faults. I originally thought this was only the case when CUDA calls exist in the destructor, but have now also seen it happen even with default destructors. I have reverted to regular pointers for now. This should be investigated further, with unique pointers reinstated.
Related issues
Associated revisions
History
#2 Updated by Szilárd Páll 3 months ago
Is this still an issue?
#4 Updated by Mark Abraham 3 months ago
I've not seen any issues with such patches
#5 Updated by Alan Gray about 1 month ago
- Status changed from New to Closed
#6 Updated by Szilárd Páll about 1 month ago
- Status changed from Closed to Feedback wanted
We do not have the same issue with gpuHaloExchange, I assume, only because we are not doing cudaStreamCreate?
Also, while looking into this I realized that:
- c2e5f578 added the freeing quite early; I suggest moving it closer to the place where related freeing happens, in runner.cpp, around where gmx_pme_destroy() is called.
- we do not have a cudaStreamDestroy for pmePpCommStream_; I suggest adding the missing call to the destructor.
As noted on #3021, we need docs on these lifetime management concerns.
Side-note: we could side-step such issues if we had the code for #3115 as that would make the lifetime dependencies more clear.
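To illustrate the second suggestion above (pairing the stream creation with a destroy call in the destructor), here is a minimal sketch. It is not the GROMACS code: PmePpCommSketch, fakeStreamCreate and fakeStreamDestroy are hypothetical names, and the two fake functions are stand-ins for cudaStreamCreate and cudaStreamDestroy so that the sketch builds without the CUDA toolkit. Only the member name pmePpCommStream_ is taken from the discussion.

```cpp
// Hypothetical stand-ins for cudaStreamCreate / cudaStreamDestroy.
// They only count how many streams are currently alive.
using FakeStream = int;

static int g_liveStreams  = 0;
static int g_nextStreamId = 1;

static void fakeStreamCreate(FakeStream* stream)
{
    *stream = g_nextStreamId++;
    ++g_liveStreams;
}

static void fakeStreamDestroy(FakeStream stream)
{
    if (stream != 0)
    {
        --g_liveStreams;
    }
}

// Hypothetical class: owns one stream and releases it in the destructor,
// so every create call is matched by exactly one destroy call.
class PmePpCommSketch
{
public:
    PmePpCommSketch() { fakeStreamCreate(&pmePpCommStream_); }
    ~PmePpCommSketch() { fakeStreamDestroy(pmePpCommStream_); }

    // Non-copyable: two owners would destroy the same stream twice.
    PmePpCommSketch(const PmePpCommSketch&) = delete;
    PmePpCommSketch& operator=(const PmePpCommSketch&) = delete;

    FakeStream stream() const { return pmePpCommStream_; }

private:
    FakeStream pmePpCommStream_ = 0;
};

// Number of streams created but not yet destroyed.
static int liveStreamCount()
{
    return g_liveStreams;
}
```

With real CUDA calls the destructor would still have to run while the GPU context exists, which is the lifetime concern raised in this issue.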
#7 Updated by Szilárd Páll about 1 month ago
- Related to Feature #3115: Device stream manager added
Explicitly destroy PME-PP GPU communication object
Add code to destroy the object when it is no longer required. Even
though the object is managed by a unique pointer, this needs to be done
while the GPU context still exists, otherwise a seg fault can occur
when it is automatically destroyed later.
Addresses #3077
Change-Id: I9d6f798d79a73e2ce366c9fb85a0ff9339fc9f88
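The ordering constraint described in the commit message above can be sketched as follows. This is an illustration under stated assumptions, not the actual change: FakeGpuContext and CommObjectSketch are hypothetical types, and the "alive" flag stands in for the existence of a real GPU context. The destructor only records whether the context was still alive when it ran, where a real destructor would make CUDA calls.

```cpp
#include <memory>

// Hypothetical stand-in for a GPU context; "alive" becomes false on teardown.
struct FakeGpuContext
{
    bool alive = true;
};

// Hypothetical communication object whose destructor needs a live context.
class CommObjectSketch
{
public:
    CommObjectSketch(const FakeGpuContext& context, bool* releasedCleanly) :
        context_(context), releasedCleanly_(releasedCleanly)
    {
    }

    // Records whether the context still existed at destruction time.
    ~CommObjectSketch() { *releasedCleanly_ = context_.alive; }

private:
    const FakeGpuContext& context_;
    bool*                 releasedCleanly_;
};

// Explicit reset() while the context is alive: the safe ordering.
static bool runWithExplicitReset()
{
    FakeGpuContext context;
    bool           releasedCleanly = false;
    auto comm = std::make_unique<CommObjectSketch>(context, &releasedCleanly);
    comm.reset();          // destroy the object now, context still alive
    context.alive = false; // context teardown happens afterwards
    return releasedCleanly;
}

// Automatic destruction after context teardown: the problematic ordering.
static bool runWithLateAutomaticDestruction()
{
    FakeGpuContext context;
    bool           releasedCleanly = true;
    {
        auto comm = std::make_unique<CommObjectSketch>(context, &releasedCleanly);
        context.alive = false; // context torn down first
    }                          // unique pointer destroys the object here
    return releasedCleanly;
}
```

In this sketch runWithExplicitReset() reports a clean release and runWithLateAutomaticDestruction() does not; with a real GPU context the second ordering is where a seg fault could occur.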