consider implementing mechanisms to ensure pair lists are not used past their max lifetime
Both the outer and, especially, the inner pair list can end up being used beyond the lifetime for which the cutoff distance buffer was originally calculated. This would cause increased energy drift that would be particularly hard to notice.
The mechanisms needed to ensure that the outer and inner lists are re-generated before they expire are different, in particular with chunked rolling pruning on GPUs.
The discussion originally came up here: https://gerrit.gromacs.org/#/c/9327/9/src/gromacs/nbnxm/pairlistsets.h@97
where Berk expressed the opinion that it is not possible to track the "age" of a list because the list does not have one (only the coordinates do), while Szilard thought that the list, as a proximity-relationship structure, does have an age, and that the coordinates being stored separately is in a way just an implementation detail.
#3 Updated by Szilárd Páll over 1 year ago
When MD steps overlap, the step counter goes out of sync with the "age" / lifespan of the list, which is especially tricky when scheduling is asynchronous, so I don't think relying on external conditionals in the schedule alone will be feasible anyway. One could always pass the step counter, but it is the task, before execution, that would have to check consistency, not the schedule.
It seems reasonable to try to implement an internal counter in the pair list class that is incremented on an external signal but independently of the step counter, e.g. once all force tasks have completed (which we can keep track of). That way, i) the schedule code could check that
currentStep % maxOuterListLifespan == pairList.outerList
and ii) before launching a force kernel we could check that
pairList.innerListlifeSpan <= maxInnerListLifespan
with the latter check done independently of the scheduling code itself.
pairList.innerListlifeSpan would of course have to be reset with a compare-and-swap (CAS) when concurrency is possible (as on a GPU). That way we would have a consistency check that could catch, e.g., a missing GPU task dependency. This could be restricted to debug mode to avoid overhead.
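
To make the proposal above more concrete, here is a minimal host-side sketch of such an internal age counter. It is only an illustration under assumptions: the class name PairListAgeTracker and all of its member functions are hypothetical and are not existing GROMACS code, the "external signal" is modelled as a plain member function call, and the inner-list check uses the same <= comparison as the expression above (whether it should be strict depends on how the lifespan is counted).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch, not GROMACS API: tracks how many completed steps
// the outer and inner pair lists have been used for, independently of the
// MD step counter.
class PairListAgeTracker
{
public:
    PairListAgeTracker(int maxOuterLifespan, int maxInnerLifespan) :
        maxOuterLifespan_(maxOuterLifespan),
        maxInnerLifespan_(maxInnerLifespan)
    {
    }

    // External signal: call once all force tasks that used the lists
    // in a step have completed.
    void markStepCompleted()
    {
        outerAge_.fetch_add(1);
        innerAge_.fetch_add(1);
    }

    // Call after the outer list has been regenerated.
    void resetOuterAge() { outerAge_.store(0); }

    // Call after the inner list has been pruned. The caller passes the age
    // it observed when the pruning was scheduled. The compare-and-swap only
    // succeeds if the age is still that value; a failure means another task
    // touched the counter in between, which hints at a missing dependency.
    bool resetInnerAge(int observedAge)
    {
        return innerAge_.compare_exchange_strong(observedAge, 0);
    }

    // Consistency checks to run before launching work that uses the lists.
    bool outerListUsable() const { return outerAge_.load() <= maxOuterLifespan_; }
    bool innerListUsable() const { return innerAge_.load() <= maxInnerLifespan_; }

    int outerAge() const { return outerAge_.load(); }
    int innerAge() const { return innerAge_.load(); }

private:
    const int        maxOuterLifespan_;
    const int        maxInnerLifespan_;
    std::atomic<int> outerAge_{ 0 };
    std::atomic<int> innerAge_{ 0 };
};

// Hypothetical call site: the task itself checks consistency before it
// runs, rather than relying on conditionals in the schedule.
inline void checkBeforeForceKernelLaunch(const PairListAgeTracker& tracker)
{
    assert(tracker.outerListUsable() && "outer pair list used past its lifetime");
    assert(tracker.innerListUsable() && "inner pair list used past its lifetime");
}
```

Because the checks are plain assert calls, they compile away when NDEBUG is defined, which is one possible way to restrict the overhead to debug builds. A device-side variant for rolling pruning would need the equivalent atomic operations on the GPU; that part is not sketched here.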