Task #3208

improve PP-PME tuning

Added by Szilárd Páll 17 days ago. Updated 14 days ago.

Status: New
Priority: High
Assignee: -
Category: mdrun
Target version: -
Difficulty: uncategorized

Description

PME tuning increasingly often picks a wrong setup, in particular with fast-iterating runs and with the new GPU code paths in the 2020 release.

TODO: use cases / link to other existing issues

History

#1 Updated by Berk Hess 16 days ago

What should we do? Add a fixed time delay before measuring times?

#2 Updated by Szilárd Páll 15 days ago

Berk Hess wrote:

What should we do? Add a fixed time delay before measuring times?

That would likely be enough (ideally combined with re-tuning) for the suboptimal tuning results I previously observed. However, I have not looked into whether that is sufficient for the new code paths. We might also need to increase the measurement interval for a single setup.

Enabling periodic re-tuning is likely also quite important to avoid long-term performance degradation, but I have not had time to look into it.
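
A rough sketch of the delay idea (all names here are placeholders, not the actual mdrun tuning code; a delay in wall time rather than steps would work similarly): after switching to a new PME setup, skip a fixed number of steps before starting to accumulate cycle counts, so that one-off costs (grid reallocation, FFT plan setup, launch jitter) do not bias the measurement.

    #include <cstdint>

    // Sketch only: hold off starting the cycle counters for a fixed number
    // of steps after a setup switch. c_tuningWarmupSteps is an assumed
    // value and would need benchmarking.
    struct TuningTimingState
    {
        int64_t stepOfLastSetupSwitch = 0;
        bool    countersStarted       = false;
    };

    constexpr int64_t c_tuningWarmupSteps = 50;

    void maybeStartTuningCounters(TuningTimingState* state, int64_t step)
    {
        if (!state->countersStarted
            && step - state->stepOfLastSetupSwitch >= c_tuningWarmupSteps)
        {
            // in the real code this would start the load-balancing cycle counter
            state->countersStarted = true;
        }
    }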

#3 Updated by Alan Gray 14 days ago

I looked into this a bit more, and the main problem with the new code paths is that we are no longer synchronizing with the GPU before taking cycle counter readings. In pme_gpu_wait_finish_task() we now have

    // Sync with the GPU only when the forces are reduced on the CPU or when
    // energies/virial are needed this step
    if (!pme->gpu->settings.useGpuForceReduction || haveComputedEnergyAndVirial)
    {
        pme_gpu_synchronize(pme->gpu);
    }

so the sync is not called when the force reduction is on the GPU, and the cycle counters then include only the CPU launch time. If I comment out this condition, PME tuning kicks back in for my Cellulose case (see https://redmine.gromacs.org/issues/2965), and performance increases from 103.3 ns/day to 129.4 ns/day. Something similar may also be happening on the PP side. So we need to either make the sync(s) happen while PME tuning is active, or replace the CPU-side timing with GPU-side timing using events.
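
To sketch the first option, the condition could be extended so the sync also happens while tuning is measuring (isTuningActive is a placeholder; no such field exists in the current code):

    // Placeholder flag: force the sync while PME tuning is measuring, so the
    // CPU cycle counters cover the actual GPU work, not just the launches
    if (!pme->gpu->settings.useGpuForceReduction || haveComputedEnergyAndVirial
        || isTuningActive)
    {
        pme_gpu_synchronize(pme->gpu);
    }

For the event-based alternative, a self-contained sketch with raw CUDA runtime calls (illustrative only; a real change would presumably go through the existing GPU timing/event infrastructure rather than raw calls):

    #include <cuda_runtime.h>

    // Measure the GPU-side duration of work enqueued on the PME stream by
    // bracketing it with events; this records actual GPU execution time
    // instead of just the CPU launch overhead.
    float measurePmeGpuTimeMs(cudaStream_t pmeStream)
    {
        cudaEvent_t start;
        cudaEvent_t stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, pmeStream);
        // ... enqueue the PME spread/FFT/solve/gather work on pmeStream ...
        cudaEventRecord(stop, pmeStream);

        // Waits only for the stop event, not for the whole device
        cudaEventSynchronize(stop);

        float elapsedMs = 0.0F;
        cudaEventElapsedTime(&elapsedMs, start, stop);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return elapsedMs;
    }

The event approach has the advantage that it would not stall the CPU outside of tuning, at the cost of a small event query/sync overhead while tuning is active.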
