When the unique pointers used for the PME-PP GPU communication objects are automatically deleted, the code sometimes segfaults. I originally thought this happened only when CUDA calls exist in the destructor, but I have now also seen it with default destructors. I have reverted to regular pointers for now. This should be investigated further, and unique pointers reinstated.
Explicitly destroy PME-PP GPU communication object
Add code to destroy the object when it is no longer required. Even
though the object is managed by a unique pointer, this needs to be done
while the GPU context still exists; otherwise a segfault can occur
when it is automatically destroyed later.
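The ordering problem described in this commit message can be illustrated with a minimal, GPU-free sketch. All names below (FakeGpuContext, FakePmePpComm, and the two run functions) are hypothetical stand-ins, not GROMACS or CUDA types; the sketch only assumes that the object's destructor needs a live context.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a GPU context; not a GROMACS or CUDA type.
struct FakeGpuContext
{
    bool alive = true;
};

// Hypothetical stand-in for the PME-PP GPU communication object.
class FakePmePpComm
{
public:
    FakePmePpComm(const FakeGpuContext& context, bool* destroyedSafely) :
        context_(context), destroyedSafely_(destroyedSafely)
    {
    }
    ~FakePmePpComm()
    {
        // A real destructor would release GPU resources here, which
        // is only valid while the context still exists.
        *destroyedSafely_ = context_.alive;
    }

private:
    const FakeGpuContext& context_;
    bool*                 destroyedSafely_;
};

// Explicit reset() while the context is alive: the safe ordering.
bool runWithExplicitReset()
{
    FakeGpuContext context;
    bool           destroyedSafely = false;
    auto comm = std::make_unique<FakePmePpComm>(context, &destroyedSafely);
    comm.reset(); // destroy the object now, before context teardown
    assert(comm == nullptr);
    context.alive = false; // stand-in for context teardown
    return destroyedSafely;
}

// Automatic destruction after the context is gone: the unsafe ordering.
bool runWithLateDestruction()
{
    FakeGpuContext context;
    bool           destroyedSafely = true;
    {
        auto comm = std::make_unique<FakePmePpComm>(context, &destroyedSafely);
        context.alive = false; // context torn down first
    } // comm is destroyed here, after the context was torn down
    return destroyedSafely;
}
```

In this sketch runWithExplicitReset() reports a safe destruction and runWithLateDestruction() does not; in real code the second case would correspond to GPU API calls made without a valid context.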
#6 Updated by Szilárd Páll 8 months ago
- Status changed from Closed to Feedback wanted
We do not have the same issue with gpuHaloExchange, I assume, only because we are not doing cudaStreamCreate there?
Also, while looking into this I realized that:
- c2e5f578 added the freeing quite early; I suggest moving it closer to the place where related freeing happens, in runner.cpp, around where gmx_pme_destroy() is called.
- we do not have a cudaStreamDestroy for pmePpCommStream_; I suggest adding the missing call to the destructor.
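The second suggestion amounts to pairing stream creation in the constructor with stream destruction in the destructor. A minimal sketch follows, using hypothetical fakeStreamCreate/fakeStreamDestroy functions in place of cudaStreamCreate/cudaStreamDestroy so that it runs without a GPU; the class name and counter are likewise assumptions for illustration, and only the member name pmePpCommStream_ comes from the discussion above.

```cpp
#include <cassert>

// Hypothetical stand-ins for a CUDA stream and its create/destroy calls.
struct FakeStream
{
    bool valid = false;
};

static int g_liveStreams = 0; // counts streams created but not yet destroyed

void fakeStreamCreate(FakeStream* stream)
{
    stream->valid = true;
    ++g_liveStreams;
}

void fakeStreamDestroy(FakeStream* stream)
{
    assert(stream->valid); // destroying an invalid stream would be a bug
    stream->valid = false;
    --g_liveStreams;
}

// Hypothetical communication class that owns its stream.
class FakePmePpCommGpu
{
public:
    FakePmePpCommGpu() { fakeStreamCreate(&pmePpCommStream_); }
    // The matching destroy call; without it the stream would leak.
    ~FakePmePpCommGpu() { fakeStreamDestroy(&pmePpCommStream_); }

    // Non-copyable, so the stream is never destroyed twice.
    FakePmePpCommGpu(const FakePmePpCommGpu&) = delete;
    FakePmePpCommGpu& operator=(const FakePmePpCommGpu&) = delete;

private:
    FakeStream pmePpCommStream_;
};

// Returns the number of live streams after one object has gone out of scope.
int liveStreamsAfterScope()
{
    const int before = g_liveStreams;
    {
        FakePmePpCommGpu comm;
        assert(g_liveStreams == before + 1);
    }
    return g_liveStreams;
}
```

Note that a destructor which calls into the GPU runtime makes the destruction-order concern from the commit message more important, since the call must happen while the context is still valid.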