Task #3031
evaluate the impact of particle order on PME
Description
The DD sorting does have an impact on PME performance, especially on GPUs. In current code this effect can be measured with single rank vs separate PME rank runs.
This impact should be evaluated across a range of input sizes (possibly densities?).
Related issues
History
#1 Updated by Szilárd Páll over 1 year ago
- Related to Feature #2054: PME on GPU added
#2 Updated by Jonathan Vincent over 1 year ago
What is the change/patch that implements DD sorting?
I should look at this as well.
#3 Updated by Szilárd Páll over 1 year ago
Jonathan Vincent wrote:
What is the change/patch that implements DD sorting?
This is not a new feature, any multi-rank run will do that.
#4 Updated by Jonathan Vincent over 1 year ago
Ran this with the water boxes using either 4 tMPI ranks (3 PP and 1 PME) or 2 tMPI ranks (1 PP and 1 PME). Using a separate PME rank seemed the simplest way to get a result without DD.
| Water box | 2019.2, 4 ranks, RTX 2080 | 2019.2, 2 ranks, RTX 2080 | Master, 4 ranks, RTX 2080 | Master, 2 ranks, RTX 2080 |
| 0.96 | 6.055 | 5.991 | 7.2233 | 6.2285 |
| 1.5 | 7.344 | 7.345 | 8.5901 | 7.4324 |
| 3 | 11.436 | 11.492 | 11.1816 | 11.2968 |
| 6 | 17.398 | 17.531 | 17.369 | 17.487 |
| 12 | 33.652 | 32.667 | 31.557 | 30.904 |
| 24 | 62.915 | 61.885 | 62.42 | 62.122 |
| 48 | 107.11 | 117.638 | 108.455 | 118.11 |
| 96 | 204.226 | 237.982 | 212.202 | 241.019 |
| 192 | 397.89 | 476.64 | 412.4 | 477.51 |
| 384 | 765.08 | 941.64 | 768.34 | 942.31 |
| 768 | 1497.4 | 1849.64 | 1505.26 | 1853.08 |
| 1536 | 2860.3 | 3502.3 | 2924.16 | 3516.2 |
| 3072 | 5762.8 | 6982.1 | 5918.74 | 7023.8 |
The differences are small for the smaller boxes but become significant for the larger ones: at the largest size the 2-rank values are roughly 20% higher than the 4-rank values. The master git hash is 0c26c550ed55e12b77954dd0e8c5d956421ae501. Will look at other hardware as well.
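For reference, the gap between the 2-rank and 4-rank rows can be computed directly from the numbers reported above. The following is a minimal Python sketch, not part of the original measurements: the names `sizes`, `results` and `ratio_2_vs_4` are illustrative, and nothing is assumed about the unit of the reported values, only that the two rows are compared at matching box sizes.

```python
# Values copied from the results reported in comment #4 (all on RTX 2080).
# Keys are (version, number of tMPI ranks); names are illustrative only.
sizes = [0.96, 1.5, 3, 6, 12, 24, 48, 96, 192, 384, 768, 1536, 3072]

results = {
    ("2019.2", 4): [6.055, 7.344, 11.436, 17.398, 33.652, 62.915, 107.11,
                    204.226, 397.89, 765.08, 1497.4, 2860.3, 5762.8],
    ("2019.2", 2): [5.991, 7.345, 11.492, 17.531, 32.667, 61.885, 117.638,
                    237.982, 476.64, 941.64, 1849.64, 3502.3, 6982.1],
    ("Master", 4): [7.2233, 8.5901, 11.1816, 17.369, 31.557, 62.42, 108.455,
                    212.202, 412.4, 768.34, 1505.26, 2924.16, 5918.74],
    ("Master", 2): [6.2285, 7.4324, 11.2968, 17.487, 30.904, 62.122, 118.11,
                    241.019, 477.51, 942.31, 1853.08, 3516.2, 7023.8],
}


def ratio_2_vs_4(version):
    """Return, per box size, the 2-rank value divided by the 4-rank value."""
    two_rank = results[(version, 2)]
    four_rank = results[(version, 4)]
    return [two / four for two, four in zip(two_rank, four_rank)]


if __name__ == "__main__":
    for version in ("2019.2", "Master"):
        for size, ratio in zip(sizes, ratio_2_vs_4(version)):
            print(f"{version:>7} {size:>8g} {ratio:6.3f}")
```

A ratio near 1 means the two setups give nearly the same value at that size; the ratios for the largest boxes are the ones that move furthest from 1.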
#5 Updated by Paul Bauer about 1 year ago
- Target version changed from 2020 to 2021