Task #2134

assess whether Jenkins is testing multi-rank runs appropriately

Added by Mark Abraham over 2 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
testing
Target version:
Difficulty:
uncategorized

Description

Recent bugs (e.g. #2131 and #2105) suggest that testing of multi-rank mdrun, or of mdrun with -npme > 0, is not occurring appropriately.


Related issues

Related to GROMACS - Bug #2131: mdrun hangs upon "-nsteps " or "-maxh" trigger with more than 20 MPI processes (Closed)
Related to GROMACS - Bug #2105: multi-domain rerun broken (Closed)
Related to GROMACS - Bug #2164: SIMD sqrt in double-precision build does not work correctly (Closed)

Associated revisions

Revision 498a4235 (diff)
Added by Mark Abraham over 2 years ago

Limit small essentialdynamics tests to one rank

These systems are tiny, so running a single rank is all that makes sense.
The flooding systems are reasonable for normal DD.

This should help re-implement MPMD releng support.

Refs #2134

Change-Id: Id52f271d91e98f732384640a01b0db728d9a2da1

Revision 877e93df (diff)
Added by Mark Abraham over 2 years ago

Propose a post-submit matrix

A recent bug would have been prevented if some testing of
separate PME ranks had been occurring.

Extended gromacs.py to permit regressiontests to run
with separate PME ranks, and/or designated numbers of
ranks, or GPU ids.

Quieted some newly exposed warnings.

Documented more of the reasoning behind matrix choices, to help us
maintain better. Noted various TODOs for better testing coverage.

Refs #2134

Change-Id: Ib9828ca769d7a446c61fb8bb7a68128a38991aba

Revision a2051ce4 (diff)
Added by Mark Abraham about 2 years ago

Resolve TODO for specifying npme

This meant that the pre- and post-submit matrix configs with npme=1
were not actually testing the intended MPMD path.

Refs #2134

Change-Id: I5adf0691b26d89e248d136b8d93068080014607e

Revision 27246474 (diff)
Added by Mark Abraham about 2 years ago

Fix gmxtest.pl handling of GPU IDs

For example, mdrun -ntmpi 2 -gpu_id 2 would not start 2 PP ranks on
GPU ID 2, which was what we'd intended it to do. Instead, the automatic
repeater would make it run with one rank.

mdrun -ntmpi 2 -gpu_id 01 was OK though.

This change would permit a Jenkins matrix to specify any of

gcc-4.8 gpu gpu_id=1
gcc-4.8 gpu nranks=2 gpu_id=012
gcc-4.8 gpu npme=1 nranks=3 gpu_id=0

and ensures that we get the number and flavour of ranks specified.

This should mean it is no longer desirable to rerun a test case
when the number of ranks and the GPU ID choice do not work.
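As a rough illustration of the matrix lines above, a line can be split into bare flags and key=value options. This is only a sketch, not the actual releng or gmxtest.pl code: the function names are hypothetical, and mapping nranks to -ntmpi assumes a thread-MPI build (with library MPI the rank count would come from the MPI launcher instead).

```python
def parse_matrix_line(line):
    """Split a matrix line into bare flags and key=value options."""
    flags, options = [], {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            options[key] = value
        else:
            flags.append(token)
    return flags, options


def mdrun_rank_args(options):
    """Hypothetical mapping from matrix options to mdrun arguments.

    Assumes thread-MPI, so nranks becomes -ntmpi.
    """
    args = []
    if "nranks" in options:
        args += ["-ntmpi", options["nranks"]]
    if "npme" in options:
        args += ["-npme", options["npme"]]
    if "gpu_id" in options:
        args += ["-gpu_id", options["gpu_id"]]
    return args


flags, options = parse_matrix_line("gcc-4.8 gpu npme=1 nranks=3 gpu_id=0")
print(flags, options, mdrun_rank_args(options))
```

Passing the rank count and GPU ID through unchanged, rather than letting a lower layer adjust them, matches the intent stated above that the run gets the number and flavour of ranks specified.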

Refs #2134

Change-Id: Idba4a7de2069a67adb4cdf8b3e025a35d13c1f14

History

#1 Updated by Mark Abraham over 2 years ago

  • Related to Bug #2131: mdrun hangs upon "-nsteps " or "-maxh" trigger with more than 20 MPI processes added

#2 Updated by Mark Abraham over 2 years ago

  • Related to Bug #2105: multi-domain rerun broken added

#3 Updated by Gerrit Code Review Bot over 2 years ago

Gerrit received a related patchset '1' for Issue #2134.
Uploader: Mark Abraham
Change-Id: gromacs~release-2016~Ib9828ca769d7a446c61fb8bb7a68128a38991aba
Gerrit URL: https://gerrit.gromacs.org/6508

#4 Updated by Mark Abraham over 2 years ago

  • Status changed from New to In Progress
  • Target version changed from 2016.3 to 2016.4

2016.3 is behaving reasonably with separate PME ranks, but the essentialdynamics tests need some love before we can do a good job of this

#5 Updated by Gerrit Code Review Bot over 2 years ago

Gerrit received a related patchset '2' for Issue #2134.
Uploader: Aleksei Iupinov
Change-Id: regressiontests~release-2016~Id52f271d91e98f732384640a01b0db728d9a2da1
Gerrit URL: https://gerrit.gromacs.org/6528

#6 Updated by Mark Abraham over 2 years ago

  • Related to Bug #2164: SIMD sqrt in double-precision build does not work correctly added

#7 Updated by Szilárd Páll about 2 years ago

We now have the following related post-submit tests/TODOs:

# Test MPMD PME with thread-MPI
# TODO Add double to this configuration if/when Carsten stablizes essentialdynamics tests
gcc-5 npme=1 nranks=2 no-openmp fftpack simd=avx_128_fma release

# Test MPMD PME with library MPI
clang-3.4 npme=1 nranks=2 mpi

# Test non-default use of mdrun -gpu_id
# Test SSE2 SIMD
gcc-4.8 gpu npme=1 nranks=2 gpu_id=2 cuda-7.5 simd=sse2 release

# TODO
[...]
# Test 3D DD (2D is partially covered in regressiontests)

Two TODOs above seem to still be related. What's missing for these? I know the latter may be tricky given the current regressiontests. What about the former?

#8 Updated by Mark Abraham about 2 years ago

Szilárd Páll wrote:

We now have the following related post-submit tests/TODOs:

[...]

Two TODOs above seem to still be related. What's missing for these? I know the latter may be tricky given the current regressiontests.

Yeah.

What about the former?

We now suspect that this is an issue with a double-precision SIMD operation, but I'm not sure offhand whether it ought to be fixed by stuff under review or recently merged.

#9 Updated by Szilárd Páll about 2 years ago

Mark Abraham wrote:

Szilárd Páll wrote:

We now have the following related post-submit tests/TODOs:

[...]

Two TODOs above seem to still be related. What's missing for these? I know the latter may be tricky given the current regressiontests.

Yeah.

So are there no tests at all that can run 8 ranks for a 3D decomp? Even just running a few and skipping most would be better than nothing, I guess.

What about the former?

We now suspect that this is an issue with a double-precision SIMD operation, but I'm not sure offhand whether it ought to be fixed by stuff under review or recently merged.

OK. Let's wait for the SIMD changes to get merged in and test again, I suggest.

#10 Updated by Mark Abraham about 2 years ago

Szilárd Páll wrote:

Mark Abraham wrote:

Szilárd Páll wrote:

We now have the following related post-submit tests/TODOs:

[...]

Two TODOs above seem to still be related. What's missing for these? I know the latter may be tricky given the current regressiontests.

Yeah.

So are there no tests at all that can run 8 ranks for a 3D decomp?

I don't know - we didn't design or document for such things. I assume that some of them can - in particular complex/swap*

Even just running a few and skipping most would be better than nothing, I guess.

Part of the issue is that all the test harnesses are set up to produce a pass result if the simulation can be run given the inputs to the harness. To test that 3D decomposition works, you need both a system that is large enough wrt its cutoffs (which could be chosen artificially small), and a harness that knows how to check, only when the inputs are suitable (e.g. number of ranks >= 8, and suitably composite), that the pass result was in fact obtained from a run that did 3D decomposition (e.g. by grepping the log file).
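The log-file check mentioned above could look roughly like this. It is a sketch only: the function names are hypothetical, and the regular expression assumes that the mdrun log reports the grid in the form "Domain decomposition grid X x Y x Z", which should be confirmed against a real md.log before relying on it.

```python
import re

# Assumed log wording; adjust the pattern to what mdrun actually writes.
DD_GRID = re.compile(r"Domain decomposition grid (\d+) x (\d+) x (\d+)")


def dd_grid_from_log(log_text):
    """Return the (x, y, z) DD grid reported in the log, or None."""
    match = DD_GRID.search(log_text)
    if match is None:
        return None
    return tuple(int(n) for n in match.groups())


def ran_with_3d_dd(log_text):
    """True only if the log reports more than one domain in every dimension."""
    grid = dd_grid_from_log(log_text)
    return grid is not None and all(n > 1 for n in grid)


sample = "Domain decomposition grid 2 x 2 x 2, separate PME ranks 0\n"
print(dd_grid_from_log(sample), ran_with_3d_dd(sample))
```

A harness would apply this check only to cases where 3D decomposition was requested, so that an ordinary pass from a 1D or 2D run is not mistaken for 3D coverage.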

What about the former?

We now suspect that this is an issue with a double-precision SIMD operation, but I'm not sure offhand whether it ought to be fixed by stuff under review or recently merged.

OK. Let's wait for the SIMD changes to get merged in and test again, I suggest.

#11 Updated by Szilárd Páll about 2 years ago

Mark Abraham wrote:

Szilárd Páll wrote:

Mark Abraham wrote:

Szilárd Páll wrote:

We now have the following related post-submit tests/TODOs:

[...]

Two TODOs above seem to still be related. What's missing for these? I know the latter may be tricky given the current regressiontests.

Yeah.

So are there no tests at all that can run 8 ranks for a 3D decomp?

I don't know - we didn't design or document for such things. I assume that some of them can - in particular complex/swap*

Even just running a few and skipping most would be better than nothing, I guess.

Part of the issue is that all the test harnesses are set up to produce a pass result if the simulation can be run given the inputs to the harness. To test that 3D decomposition works, you need both a system that is large enough wrt its cutoffs (which could be chosen artificially small), and a harness that knows how to check, only when the inputs are suitable (e.g. number of ranks >= 8, and suitably composite), that the pass result was in fact obtained from a run that did 3D decomposition (e.g. by grepping the log file).

I get that, but we have a max ranks file in some of the directories, so it would really be trivial to add a file to every directory, or to add a small hack in the script so that for >N rank counts only those tests are run that have a max ranks specified that is larger than the requested count.
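A minimal sketch of that selection rule, under stated assumptions: the per-case limits are passed in as a dictionary rather than read from files, the case names are made up, the function name is hypothetical, and a case whose limit equals the requested count is treated as runnable.

```python
def select_cases(max_ranks_by_case, requested, threshold):
    """Return (to_run, skipped), where skipped maps case name to a reason.

    At or below the threshold nothing changes: every case is run.
    Above it, only cases with a recorded and sufficient limit are run.
    """
    to_run, skipped = [], {}
    for case, max_ranks in sorted(max_ranks_by_case.items()):
        if requested <= threshold:
            to_run.append(case)
        elif max_ranks is None:
            skipped[case] = "no max ranks recorded"
        elif max_ranks < requested:
            skipped[case] = (
                f"supports at most {max_ranks} ranks, {requested} requested"
            )
        else:
            to_run.append(case)
    return to_run, skipped


# Hypothetical case names and limits, for illustration only.
cases = {"case_a": 8, "case_b": 2, "case_c": None}
to_run, skipped = select_cases(cases, requested=8, threshold=4)
print(to_run)
print(skipped)
```

Returning the skip reasons alongside the run list lets the caller report why a case was skipped, and lets it fail loudly if the run list turns out to be empty.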

#12 Updated by Mark Abraham about 2 years ago

Szilárd Páll wrote:

Mark Abraham wrote:

Szilárd Páll wrote:

Mark Abraham wrote:

Szilárd Páll wrote:

We now have the following related post-submit tests/TODOs:

[...]

Two TODOs above seem to still be related. What's missing for these? I know the latter may be tricky given the current regressiontests.

Yeah.

So are there no tests at all that can run 8 ranks for a 3D decomp?

I don't know - we didn't design or document for such things. I assume that some of them can - in particular complex/swap*

Even just running a few and skipping most would be better than nothing, I guess.

Part of the issue is that all the test harnesses are set up to produce a pass result if the simulation can be run given the inputs to the harness. To test that 3D decomposition works, you need both a system that is large enough wrt its cutoffs (which could be chosen artificially small), and a harness that knows how to check, only when the inputs are suitable (e.g. number of ranks >= 8, and suitably composite), that the pass result was in fact obtained from a run that did 3D decomposition (e.g. by grepping the log file).

I get that, but we have a max ranks file in some of the directories, so it would really be trivial to add a file to every directory, or to add a small hack in the script so that for >N rank counts only those tests are run that have a max ranks specified that is larger than the requested count.

Go ahead, but there are sub-cases that won't work well with hack implementations, e.g. when a user runs it on a 10-core machine, because there's no 3D DD unless either the user or the harness chooses to use 8 ranks.
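To make the 10-core example concrete: a 3D grid needs a rank count that factors into three integers that are all greater than 1. The sketch below (hypothetical function name) checks only that arithmetic condition. It ignores separate PME ranks and the cell-size limits that mdrun also enforces.

```python
def three_d_grid(nranks):
    """Return (x, y, z) with x*y*z == nranks and x <= y <= z, all > 1, or None."""
    for x in range(2, nranks + 1):
        if nranks % x:
            continue
        rest = nranks // x
        for y in range(x, rest + 1):
            if rest % y:
                continue
            z = rest // y
            if z >= y:
                return (x, y, z)
    return None


print(three_d_grid(8))   # (2, 2, 2)
print(three_d_grid(10))  # None, since 10 = 2 * 5 has only two factors > 1
# Largest rank count on a 10-core machine that admits a 3D grid:
print(max(n for n in range(1, 11) if three_d_grid(n)))  # 8
```

So on 10 cores something has to choose 8 ranks deliberately. Running with all 10 can never produce a 3D decomposition.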

#13 Updated by Szilárd Páll about 2 years ago

Mark Abraham wrote:

Szilárd Páll wrote:

Mark Abraham wrote:

Szilárd Páll wrote:

Mark Abraham wrote:

Szilárd Páll wrote:

We now have the following related post-submit tests/TODOs:

[...]

Two TODOs above seem to still be related. What's missing for these? I know the latter may be tricky given the current regressiontests.

Yeah.

So are there no tests at all that can run 8 ranks for a 3D decomp?

I don't know - we didn't design or document for such things. I assume that some of them can - in particular complex/swap*

Even just running a few and skipping most would be better than nothing, I guess.

Part of the issue is that all the test harnesses are set up to produce a pass result if the simulation can be run given the inputs to the harness. To test that 3D decomposition works, you need both a system that is large enough wrt its cutoffs (which could be chosen artificially small), and a harness that knows how to check, only when the inputs are suitable (e.g. number of ranks >= 8, and suitably composite), that the pass result was in fact obtained from a run that did 3D decomposition (e.g. by grepping the log file).

I get that, but we have a max ranks file in some of the directories, so it would really be trivial to add a file to every directory, or to add a small hack in the script so that for >N rank counts only those tests are run that have a max ranks specified that is larger than the requested count.

Go ahead, but there are sub-cases that won't work well with hack implementations, e.g. when a user runs it on a 10-core machine, because there's no 3D DD unless either the user or the harness chooses to use 8 ranks.

I don't see what would not work when providing a max rank count to each test case (not only some) and simply skipping everything that does not fit the requirement? This can even be a CI-specific use-case if needed (though I don't think it is). Users won't force 3D DD runs like we'd do in jenkins, but will run the regressiontest the same way they do it today, so whatever works with no max ranks specified should work with max too -- if anything, the script can issue a more intelligent note about why a test got skipped (instead of having to observe DD errors).

#14 Updated by Mark Abraham about 2 years ago

Just having a max rank count recorded doesn't help test 3D. One would also need to add whatever the custom mdrun arg is, and to add that only when the test case and number of ranks are suitable.

#15 Updated by Szilárd Páll about 2 years ago

Mark Abraham wrote:

Just having a max rank count recorded doesn't help test 3D. One would also need to add whatever the custom mdrun arg is, and to add that only when the test case and number of ranks are suitable.

As the post-submit matrix claims 2D decomp is "partially covered", I was assuming that we do trigger it in a Jenkins config (and therefore triggering 3D is straightforward). I also have a vague memory of argument passing to gmxtest; what happened to that feature?

Edit: I found a record of gmxtest+"-extra -args" syntax here. That seems to be enough to pass -dd X Y Z together with a max ranks limit.

#16 Updated by Mark Abraham about 2 years ago

Szilárd Páll wrote:

Mark Abraham wrote:

Just having a max rank count recorded doesn't help test 3D. One would also need to add whatever the custom mdrun arg is, and to add that only when the test case and number of ranks are suitable.

As the post-submit matrix claims 2D decomp is "partially covered",

Complex/dd121 maybe was what I had in mind, but actually that just checks that we can decompose in Y even when the other directions don't, which we force by the dimensions of the box rather than args.

I was assuming that we do trigger it in a Jenkins config (and therefore triggering 3D is straightforward).

No we don't.

I also have a vague memory of argument passing to gmxtest; what happened to that feature?

It still has a feature that could be leveraged to suggest an mdrun parameter for a DD, but something would also have to manage the number of ranks to suit.

Edit: I found a record of gmxtest+"-extra -args" syntax here. That seems to be enough to pass -dd X Y Z together with a max ranks limit.

That matrix was dormant, remember. IIRC that syntax was only supported during an early part of the transition. I don't see why you'd pass a max ranks limit with it.

Can you articulate what you'd try to accomplish in a "test of 3D DD"? That might be more productive than worrying about how to implement something that might not do what is wanted.

#17 Updated by Szilárd Páll about 2 years ago

Mark Abraham wrote:

Can you articulate what you'd try to accomplish in a "test of 3D DD"? That might be more productive than worrying about how to implement something that might not do what is wanted.

The very TODO in the post-submit spec file; I'm not sure what more there is to articulate.
mdrun -dd X Y Z, X>1, Y>1, Z>1 is a little more specific, I guess.

#18 Updated by Mark Abraham about 2 years ago

Szilárd Páll wrote:

Mark Abraham wrote:

Can you articulate what you'd try to accomplish in a "test of 3D DD"? That might be more productive than worrying about how to implement something that might not do what is wanted.

The very TODO in the post-submit spec file; I'm not sure what more there is to articulate.
mdrun -dd X Y Z, X>1, Y>1, Z>1 is a little more specific, I guess.

Yes it's easy to say what mdrun call is needed. Which layer of scripts is going to choose the number of ranks to run? Cmake, ctest, releng, matrix line, gmxtest. Why? What should a lower layer do with a test case that is known not to support such parallelism - run it at lower parallelism, or skip it? How will we know that we're not erroneously skipping all of them, somehow?

#19 Updated by Gerrit Code Review Bot about 2 years ago

Gerrit received a related patchset '1' for Issue #2134.
Uploader: Mark Abraham
Change-Id: regressiontests~master~I5adf0691b26d89e248d136b8d93068080014607e
Gerrit URL: https://gerrit.gromacs.org/6733

#20 Updated by Gerrit Code Review Bot about 2 years ago

Gerrit received a related patchset '1' for Issue #2134.
Uploader: Mark Abraham
Change-Id: regressiontests~master~Idba4a7de2069a67adb4cdf8b3e025a35d13c1f14
Gerrit URL: https://gerrit.gromacs.org/6734

#21 Updated by Mark Abraham about 2 years ago

  • Target version changed from 2016.4 to 2018

I think this is good enough now in 2016.4, but obviously there is room to improve

#22 Updated by Mark Abraham over 1 year ago

  • Status changed from In Progress to Closed

There's still no 3D DD test case, but the other TODO Szilard referred to earlier is now resolved. Also, there are -npme 1 tests both with and without GPUs, and multi-sim tests are running in Jenkins with 2 sims of one rank each.

If there's a particular TODO to consider/discuss, I suggest we use a redmine for that.
