Bug #1337

Regression tests with GPU builds don't test CPU Verlet kernels

Added by Szilárd Páll about 6 years ago. Updated about 5 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: testing
Target version: 5.0.1
Affected version - extra info: 5.x
Affected version: 4.6.x
Difficulty: uncategorized

Description

In regression test runs, mdrun selects automatically between CPU and GPU, the same way it would in a standalone run. This means that with a GPU-enabled build, when a GPU is present in the build machine, make check will only use the GPU non-bonded Verlet kernels for testing and leave the CPU kernels untested. In this case the complex/nbnxn_* tests should be run twice, once with GPU kernels and once with CPU kernels.

Associated revisions

Revision 86822241 (diff)
Added by Mark Abraham over 5 years ago

Made test_case function

Refactor test_systems into a loop over test_case. This prepares for
new functionality. Also cleaned up indentation and whitespace. No
overall functionality changes.

Refs #1337

Change-Id: I7068a63c14a1ef27c5bfbbc6251d14f8557284ce

Revision 4b7c99d8 (diff)
Added by Mark Abraham over 5 years ago

Refactor test_case to separate functionality

Now the test machinery can use different input and output directories,
and name for the test case, which will be useful for new functionality
that wants to run the same test case in a different mode and keep the
output separate.

Added some more detail to output strings where it might not always
have been clear which test case was producing the output.

Refs #1337

Change-Id: Ic3de4202e1445af36c263ffeed2663a7e0b3a26e

Revision b19f5ba6 (diff)
Added by Mark Abraham about 5 years ago

Permit test cases to rerun with -nb cpu

This allows tests that did run using (real and emulated) GPU kernels
to run again automatically using only CPU kernels. This means that
users and Jenkins can automatically test all the important non-bonded
code paths. Output is kept separate in a "cpu-only" subdirectory, and
reporting to the command line and Jenkins-suitable XML works. The
reference data is shared, rather than duplicated.

If the user was playing games with ./gmxtest.pl -mdparams "-nb xxx"
then the new mechanism gracefully stays out of the way.

The mechanism for returning the numbers of total and passed test cases
is enhanced to behave correctly with both old and new behaviour, and
also to adapt to the number of test cases selected with ./gmxtest.pl
-only

./gmxtest.pl clean also behaves correctly, whether or not any CPU-only
runs took place.

Fixes #1337

Change-Id: I21af1dd306325aa389d82184ec7c8a8fabdc26ff
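Based on the commit message above, driving the test suite might look like the following sketch. The gmxtest.pl flags (-mdparams, clean) and the automatic rerun behaviour are taken from this issue; the exact command lines are illustrative, not a definitive transcript:

```shell
# Run the complex tests; on a GPU-enabled build, cases that used
# (real or emulated) GPU kernels are rerun automatically with -nb cpu,
# with output kept separate in a "cpu-only" subdirectory.
./gmxtest.pl complex

# Forcing the non-bonded mode by hand disables the automatic rerun
# (the new mechanism "gracefully stays out of the way"):
./gmxtest.pl complex -mdparams "-nb cpu"

# Cleaning works whether or not any CPU-only reruns took place:
./gmxtest.pl clean
```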

History

#1 Updated by Erik Lindahl over 5 years ago

  • Tracker changed from Bug to Feature
  • Priority changed from High to Low
  • Target version set to 5.x

It would make for an extremely complicated test environment if all test scripts had to be aware of the internal implementation deep inside Gromacs modules. For instance, depending on settings, the group interactions might use either SIMD or reference kernels, and then we would need double tests for all of those too.

I much prefer a simple setup where we just test the binaries as a black box. The one thing we could consider is to add a flag to the gmxtest.pl script to allow it to force CPU execution. Is there any scenario where the auto option might pick CPU, although the code would run on a GPU? I.e., is there ever a need for forcing the GPU option?

#2 Updated by Szilárd Páll over 5 years ago

To be honest, I don't think that this should have low priority and a vague target! We test on various hardware configurations, e.g. with AVX or AVX2 support, but in fact those machines all have GPUs, hence all regression tests using the Verlet scheme (if the tested features are supported) will only run the CUDA kernels. The same is true for users: make check won't ensure that their CPU SIMD kernels behave OK if there is a GPU in the machine - unless they rerun make test with e.g. CUDA_VISIBLE_DEVICES="". This way both we and users end up with a false sense of security.
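The manual workaround mentioned above can be sketched as follows. Masking devices via CUDA_VISIBLE_DEVICES is standard CUDA runtime behaviour, but whether a given build would otherwise pick a GPU also depends on its configuration, so treat this as an illustration rather than a guaranteed recipe:

```shell
# Hide all CUDA devices from child processes; with no GPU visible,
# mdrun falls back to the CPU non-bonded Verlet kernels.
export CUDA_VISIBLE_DEVICES=""

# Rerun the tests with GPUs masked out (commented here, since it
# requires a configured GROMACS build tree):
# make check
```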

Erik Lindahl wrote:

It would make for an extremely complicated test environment if all test scripts had to be aware of the internal implementation deep inside Gromacs modules. For instance, depending on settings, the group interactions might use either SIMD or reference kernels, and then we would need double tests for all of those too.

I agree that testing all code paths compiled into a binary would be a complex and somewhat difficult task (but not impossible, and not necessarily a bad idea either). However, my suggestion was quite specific and much simpler: if a test used GPUs, rerun it with GPU acceleration disabled.

I much prefer a simple setup where we just test the binaries as a black box. The one thing we could consider is to add a flag to the gmxtest.pl script to allow it to force CPU execution.

Well, that's exactly what I suggested! More concretely, we could:
  • Detect in advance that GPU acceleration is feasible and rerun only the tests that will use GPUs with -nb cpu. This has maintenance overhead, as feature support needs to be tracked.
  • Check whether GPUs were used in a specific set of tests (e.g. by grepping the log file) and run those tests again with -nb cpu.
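The second option above could be sketched roughly as follows. The directory layout, the md.log file name, and the "GPU" log pattern are assumptions made for illustration, not gmxtest.pl internals; the actual rerun is left commented out:

```shell
# Rerun only the test cases whose mdrun log shows GPU kernel use.
used_gpu() {
    # Treat any mention of "GPU" in the log as GPU kernel use
    # (a crude heuristic; the real log text may differ).
    grep -q "GPU" "$1"
}

for log in complex/nbnxn_*/md.log; do
    [ -f "$log" ] || continue
    dir=$(dirname "$log")
    if used_gpu "$log"; then
        echo "rerunning $dir with -nb cpu"
        # ./gmxtest.pl "$(basename "$dir")" -mdparams "-nb cpu"
    fi
done
```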

Is there any scenario where the auto option might pick CPU, although the code would run on a GPU? I.e., is there ever a need for forcing the GPU option?

No, at the moment whenever the simulation can use a GPU it will use it.

I think there is a clear gain to be had here for a limited amount of effort; I just haven't invested time in it because I'm not familiar with the details of how the tests are carried out. However, I'd be happy to contribute when I get a bit more time.

#3 Updated by Erik Lindahl over 5 years ago

In terms of the build system, I think the obvious solution here is to set up a second build on some of the alternatives, compiled and configured without GPU support. That will solve the problem, and it will avoid the complex situation where some builds run all tests twice, so that a test like "complex/nbnxn_rf" can fail in either the GPU or the CPU pass - but those are really completely different tests.

That will also be MUCH cleaner when there are GPU build issues, since those will only show up in the particular build that is configured to use GPUs.

#4 Updated by Szilárd Páll over 5 years ago

  • Tracker changed from Feature to Bug
  • Priority changed from Low to Normal
  • Affected version - extra info set to 5.x
  • Affected version set to 4.6.x

It's not just about our Jenkins CI/verification setup. We want users to validate their GROMACS build by running make check, but in fact a considerable part of the code will be left untested, and this depends only on whether there is a GPU in the machine or not.

To fix this erroneous behavior, one does not have to rerun all tests, only the ones that used a GPU. After discussing with Mark, this seems to be a rather small addition to gmxtest.pl.

And again, this is not just for our sake: if we don't do this, then we need to clearly tell users to always run the regression tests twice on any machine with a GPU present (the second time with GPUs disabled, for which they need to unload the driver or use a driver-magic environment variable). Without fixing either gmxtest.pl or the documentation, this remains an unintended behavior of the testing suite that gives a false sense of security - hence a bug.

In the long run, it would be better to employ a testing framework capable of listing the actual tests it runs, so that developers and users can see that e.g. "complex/nbnxn_rf" was run twice (the second time one would simply append a message like "CPU only").

#5 Updated by Mark Abraham over 5 years ago

  • Status changed from New to Fix uploaded

#6 Updated by Mark Abraham about 5 years ago

  • Status changed from Fix uploaded to Closed
  • Target version changed from 5.x to 5.0.1

GPU builds now test CPU Verlet kernels.
