Bug #1121

slow MD with sd integrator and GPU / verlet

Added by Floris Buelens almost 5 years ago. Updated almost 5 years ago.

Status: Closed
Priority: High
Assignee:
Category: mdrun
Target version:
Affected version - extra info: 4.6
Affected version:
Difficulty: uncategorized

Description

I'm seeing MD simulations run a lot slower with the sd integrator than with md - ca. 10 vs. 30 ns/day for my 47000-atom system. I found no documented indication that this should be the case.
Log files are attached. Wall time accumulates in Update and Rest, which together account for >60% of the total.
With the group cutoff scheme there is also a non-negligible slowdown with sd, likewise associated with extra wall time in Update and Rest, but it is modest compared to the impact with the verlet scheme.

System: Xeon E5-1620, 1x GTX 680

Timings:

CPU (ns/day)
sd / verlet: 6
sd / group: 10
md / verlet: 9.2
md / group: 11.4

GPU (ns/day)
sd / verlet: 11
md / verlet: 29.8

sd-vv.mdp (4.56 KB), Floris Buelens, 01/16/2013 07:59 AM
md-vv.mdp (4.56 KB), Floris Buelens, 01/16/2013 07:59 AM
md-vv.tpr (1.68 MB), Floris Buelens, 01/16/2013 07:59 AM
sd-vv.log (22.8 KB), Floris Buelens, 01/16/2013 07:59 AM
md-vv.log (22.4 KB), Floris Buelens, 01/16/2013 07:59 AM
sd-group.log (24.5 KB), Floris Buelens, 01/16/2013 07:59 AM
md-group.log (24.6 KB), Floris Buelens, 01/16/2013 07:59 AM

Associated revisions

Revision d120c370 (diff)
Added by Berk Hess almost 5 years ago

fixed SD+BD integration slowing down with OpenMP threads

The SD and BD integrators would integrate on all OpenMP threads,
making the integration much slower instead of faster.
It is not clear if the results could be affected by this bug.
Fixes #1121

Change-Id: Iea4283e0470b72f6f927cb49503ac91d65025647

Revision 98b6873d (diff)
Added by Berk Hess over 4 years ago

fixed SD and BD integrator OpenMP performance

SD and BD integrator always integrated single threaded.
Really fixes #1121

Change-Id: I2217c40e9c188c7cd57801e413750035c6488f56
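The failure mode in these two commits is a work-partitioning one: first every OpenMP thread integrated all atoms (redundant work, so more threads made the update slower), and after the first fix the update still ran on a single thread. A minimal sketch of the intended contiguous partitioning, in illustrative Python rather than GROMACS's actual C code (`thread_range` is a made-up name for the example):

```python
def thread_range(natoms: int, nthreads: int, tid: int) -> range:
    """Contiguous slice of atoms owned by thread `tid`.

    The bug class: if every thread instead loops over range(natoms),
    the update is done nthreads times over and gets slower, not faster.
    """
    start = (natoms * tid) // nthreads
    end = (natoms * (tid + 1)) // nthreads
    return range(start, end)

# The per-thread slices tile the full atom range with no overlap,
# e.g. for the reporter's 47000-atom system on 4 threads:
slices = [thread_range(47_000, 4, t) for t in range(4)]
```

With this scheme each atom belongs to exactly one thread, so the update scales with thread count instead of being repeated nthreads times.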

History

#1 Updated by Berk Hess almost 5 years ago

  • Status changed from New to Closed

There is no bug here.
(Un)fortunately we made the non-bondeds on the GPU so fast that with SD we spend relatively more time in the integration and constraints. The absolute time difference between md and sd should be similar with group and verlet, although SETTLE in verlet is slightly slower because no charge groups are used.
You can switch to the sd1 integrator, if that's accurate enough for you. That's about as fast as md.

#2 Updated by Berk Hess almost 5 years ago

PS note that mdp_opt.html says at SD:
An accurate leap-frog stochastic dynamics integrator.
Four Gaussian random numbers are required
per integration step per degree of freedom. With constraints,
coordinates need to be constrained twice per integration step.
Depending on the computational cost of the force calculation,
this can take a significant part of the simulation time.
...

And gives the sd1 suggestion as well.
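As a back-of-the-envelope check of the quoted cost warning, here is the per-step random-number demand for the reporter's 47000-atom system (illustrative arithmetic only, ignoring constrained degrees of freedom):

```python
# Gaussian random numbers the sd integrator draws each step,
# per the quoted mdp documentation: 4 per degree of freedom.
atoms = 47_000
dof = 3 * atoms            # translational DOF, ignoring constraints
gaussians_per_step = 4 * dof
print(gaussians_per_step)  # 564000 Gaussian draws per MD step
```

At typical step counts this adds up to hundreds of millions of Gaussian draws per nanosecond of trajectory, which is why the update term can dominate once the force calculation is GPU-accelerated.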

#3 Updated by Floris Buelens almost 5 years ago

thanks for the reply - I'm not convinced that explains the numbers though. Ignoring the GPU for now, here are some more timings using '-nb cpu' only:

integrator | ns/day | Update wall time (s) | Rest wall time (s)

Group cutoff scheme:

md: 11.6 | 2.03 | 0.34
sd: 10.9 | 3.95 | 3.42
sd1: 11.7 | 2.25 | 0.33

Verlet cutoff scheme:

md: 9.4 | 2.30 | 1.31
sd: 6.2 | 27.53 | 22.12
sd1: 7.9 | 19.12 | 1.33

... so with the group scheme, Update takes ca. 2x longer when switching from md to sd, while with the verlet scheme it takes ca. 12x longer.

Again without GPU, switching from md to sd costs me 6% with the group scheme, and 34% with verlet.

sd1 is indeed about as fast as md with the group scheme, with verlet the hit is still significant (16%). On GPU, my sd1 timing is 18.3 ns/day, against 11 ns/day for sd and 29.8 ns/day for md, so still 38% slower than md.

#4 Updated by Berk Hess almost 5 years ago

  • Status changed from Closed to In Progress

You're right, I overlooked the rest time.
I see now that the SD (and BD) update is not OpenMP threaded.
A (my) comment in the code says we need to find a way to generate random seeds for different threads.
I'll think of one.
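One standard way to solve the per-thread seeding problem described here is to derive each thread's seed deterministically from the master seed and the thread index, e.g. by hashing the pair. The sketch below is illustrative Python, not the approach the eventual GROMACS patch took; `thread_rng` and the seed value 1993 are made up for the example:

```python
import hashlib
import random


def thread_rng(master_seed: int, thread_id: int) -> random.Random:
    """Independent per-thread stream derived from (master seed, thread id).

    Hashing the pair gives well-separated seeds without the correlation
    risk of naive schemes like master_seed + thread_id.
    """
    digest = hashlib.sha256(f"{master_seed}:{thread_id}".encode()).digest()
    return random.Random(int.from_bytes(digest, "big"))


# One generator per worker thread; each draws Gaussians for its own atoms.
rngs = [thread_rng(1993, tid) for tid in range(4)]
draws = [rng.gauss(0.0, 1.0) for rng in rngs]
```

Because each stream is fully determined by (master seed, thread id), a run is reproducible for a fixed thread count, which mirrors the reproducibility constraint a threaded SD update has to respect.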

#5 Updated by Mark Abraham almost 5 years ago

OK. Erik will be around in the morning, so I expect we can brainstorm something for 4.6. In a private branch, I once fixed seed generation for some parallel context, but I suspect my solution will be either not applicable or over-weight.

#6 Updated by Berk Hess almost 5 years ago

  • Status changed from In Progress to Feedback wanted

You can check out or pull the fix from the download link on:
https://gerrit.gromacs.org/#/c/2073/

I'm not 100% sure that this bug can't affect results, so use the patched version for production.

#7 Updated by Berk Hess almost 5 years ago

  • Priority changed from Normal to 6

#8 Updated by Mark Abraham almost 5 years ago

  • File deleted (sd-vv.tpr)

#9 Updated by Mark Abraham almost 5 years ago

Arrgh, clicked the wrong link while trying to fetch the tprs. Can somebody re-upload the sd-vv.tpr please?

#10 Updated by Berk Hess almost 5 years ago

I didn't download it. Taking any system with Verlet scheme and changing the integrator to sd will show it.

#11 Updated by Mark Abraham almost 5 years ago

  • Target version set to 4.6

#12 Updated by Berk Hess almost 5 years ago

  • Status changed from Feedback wanted to Closed
