Task #2629

Stabilise testing matrices for GROMACS 2019

Added by Mark Abraham 11 months ago. Updated 6 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: testing
Target version:
Difficulty: simple

Description

In September, we should plan to stabilize our testing matrix before the final merge flurry. Arriving soon are:
  • gcc 8 (stable, but we need to fix an issue or two, including https://gerrit.gromacs.org/#/c/7846/)
  • clang 7 (already at rc2; we want its bug fixes for clang-tidy)
  • cuda 10 (probably)
  • cmake (currently 3.12.1, but a newer version will probably come out)
  • AMDGPU-PRO 18.2 (currently, but there is a minor patch update that we can upgrade to)

What about icc? Intel OpenCL stack? ARM HPC stack?


Related issues

Related to GROMACS - Bug #2627: Build fails with clang-tidy (Closed)
Related to GROMACS - Task #2659: test and evaluate CUDA 10 (Closed)
Related to GROMACS - Task #2685: tweak jenkins post-submit test to use ARMPL for FFTs (Closed)

Associated revisions

Revision 4841dea8
Added by Paul Bauer 11 months ago

Changes to documentation build script

Reordered some of the builds in documentation.py so they are actually
in the correct order, and made sphinx-input a combined target of both
the input files for the PDF manual and the webpage image files.

Commented on using individual builds in the script instead of simply
using the webpage target and building everything at once.

Refs #2629

Change-Id: I15f556db6dfff12b48b52e8b461f179486a4ea28

Revision 62f5defa
Added by Mark Abraham 10 months ago

Test with gcc 8 and cuda 10

They are not supported together, but this way we update the
testing matrix only once.

Now using the internal XDR implementation on Darwin, because gcc 8 warns
about casts that are technically invalid given how xdrproc_t is defined
on Darwin, although they are not a problem in practice.

Refs #2629

Change-Id: Ia6e272cb208ad8f79c8655199aafc95007d2a605

Revision 68b57569
Added by Szilárd Páll 10 months ago

Use new GPU slave for AMD OpenCL verify runs

With this we switch to the AMDGPU-PRO stack and much newer hardware
(Radeon RX 560) on the new bs-gpu01 slave.

Refs #2629

Change-Id: Ib2c5e14e5d89665130ae8d6c749c3230ccfe72a9

Revision 0d92a08b
Added by Mark Abraham 10 months ago

More updates to test matrix

Tested icc 19 and cuda 10 with no GPU

Also fixed a recently introduced issue with clang+nvcc.

Refs #2629

Change-Id: I194568c7fd32743f64fde9690f64250d715f7ba9

Revision f72ba08a
Added by Mark Abraham 12 days ago

Fix pre-submit build matrix and description

Pre-submit should not fail just because the ftp server is not reachable,
so moved it to the nightly matrix. Also recorded the intent to test
such a build.

Also fixed the intent to test AVX_128_FMA in mixed precision, which was
broken in Ib2c5e14e5d89665130ae8d6c749c3230ccfe72a9. This SIMD flavour
cannot be tested in pre-submit without wider changes, and breaking it
carries neither large risk nor large consequences, so nightly is OK.

Refs #2809, #2629

Change-Id: I9cd41a359ba662a8a9529540c50c9a7ef546cd29

History

#1 Updated by Mark Abraham 11 months ago

  • Related to Bug #2627: Build fails with clang-tidy added

#2 Updated by Mark Abraham 11 months ago

CUDA 10 is in pre-release mode. We could install one, and use the symlink on a slave to point to that until the real release comes out. Szilard has inquired of NVIDIA about their thoughts on whether the release will be out in time for us.

#3 Updated by Mark Abraham 11 months ago

  • Description updated

#4 Updated by Mark Abraham 11 months ago

  • Description updated

#5 Updated by Roland Schulz 11 months ago

ICC 2019 beta has been out for a few months, and the final release should be out in about a month.

#6 Updated by Roland Schulz 11 months ago

For Intel OpenCL neo we have a release about every week or so (https://github.com/intel/compute-runtime/releases). Shouldn't matter which exact version we use given that there shouldn't be any API level changes. But so far we don't test Neo at all yet. Any chance of getting a BDW or newer (SKL, KBL, CFL) integrated GPU added to Jenkins before, so that we can add a Neo configuration?

#7 Updated by Szilárd Páll 11 months ago

Mark Abraham wrote:

CUDA 10 is in pre-release mode. We could install one, and use the symlink on a slave to point to that until the real release comes out. Szilard has inquired of NVIDIA about their thoughts on whether the release will be out in time for us.

At the moment, CUDA 10 requires a driver that is not stable enough to rely on for build slaves. Let's wait for a stable device driver before we upgrade critical infrastructure. We could do a post-submit build with CUDA 10 on a machine with no GPU to ensure that the code builds, but I'm not sure it's worth the hassle.

#8 Updated by Szilárd Páll 11 months ago

Roland Schulz wrote:

For Intel OpenCL neo we have a release about every week or so (https://github.com/intel/compute-runtime/releases). Shouldn't matter which exact version we use given that there shouldn't be any API level changes.

Sounds good. How stable are the current releases?

But so far we don't test Neo at all yet. Any chance of getting a BDW or newer (SKL, KBL, CFL) integrated GPU added to Jenkins before, so that we can add a Neo configuration?

I won't have the time to upgrade one of the existing workstation box slaves to a new mobo+CPU. We could consider a NUC or similar in post-submit, but with that too there's an upfront deployment effort that is not negligible -- though if it's for NEO only, it might be a nearly fully automated deployment through our MAAS, I guess.

#10 Updated by Gerrit Code Review Bot 11 months ago

Gerrit received a related patchset '2' for Issue #2629.
Uploader: Paul Bauer ()
Change-Id: gromacs~master~I15f556db6dfff12b48b52e8b461f179486a4ea28
Gerrit URL: https://gerrit.gromacs.org/8302

#11 Updated by Szilárd Páll 11 months ago

  • Description updated

#12 Updated by Szilárd Páll 11 months ago

Mark Abraham wrote:

  • AMDGPU-PRO 18.2 (currently, but there is a minor patch update that we can upgrade to)

For the record, I'd like us to use ROCm, but it does not work with clFFT. If the 1.9 release arrives in time and clFFT compilation is fixed, we can reconsider.

ARM HPC stack?

We have the latest, 18.4. It would be good to use the ARM performance libraries for FFTs, which should just be a matter of passing the right FFTWF_LIBRARY/FFTWF_INCLUDE_DIR to cmake.
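A minimal sketch of what that cmake configuration might look like, written as a small Python helper that assembles the command line. GMX_FFT_LIBRARY=fftw3 selects GROMACS's FFTW code path, and FFTWF_LIBRARY/FFTWF_INCLUDE_DIR are the hints named above; the install prefix /opt/arm/armpl, the lib/ and include/ layout under it, the library name libarmpl_lp64.so and the helper name armpl_fft_cmake_args are all assumptions for illustration, not the configuration actually used on the build slaves.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def armpl_fft_cmake_args(armpl_root: str, source_dir: str = "..") -> List[str]:
    """Build a cmake command line pointing GROMACS's FFTW detection at ARMPL.

    Hypothetical helper: the lib/ and include/ layout under armpl_root and the
    library name libarmpl_lp64.so are assumptions about the local install.
    """
    root = PurePosixPath(armpl_root)
    return [
        "cmake",
        source_dir,
        # Use the FFTW3 code path; ARMPL ships an FFTW3-compatible interface.
        "-DGMX_FFT_LIBRARY=fftw3",
        # Point the single-precision FFTW hints at ARMPL instead of libfftw3f.
        f"-DFFTWF_LIBRARY={root / 'lib' / 'libarmpl_lp64.so'}",
        f"-DFFTWF_INCLUDE_DIR={root / 'include'}",
    ]


if __name__ == "__main__":
    # Hypothetical install prefix on a build slave.
    print(shlex.join(armpl_fft_cmake_args("/opt/arm/armpl")))
```

Passing explicit paths rather than relying on auto-detection keeps the build from silently picking up a system libfftw3f instead of ARMPL.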

#13 Updated by Mark Abraham 10 months ago

  • Related to Task #2659: test and evaluate CUDA 10 added

#14 Updated by Gerrit Code Review Bot 10 months ago

Gerrit received a related patchset '1' for Issue #2629.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~Ia6e272cb208ad8f79c8655199aafc95007d2a605
Gerrit URL: https://gerrit.gromacs.org/8479

#15 Updated by Gerrit Code Review Bot 10 months ago

Gerrit received a related patchset '8' for Issue #2629.
Uploader: Szilárd Páll ()
Change-Id: gromacs~master~Ib2c5e14e5d89665130ae8d6c749c3230ccfe72a9
Gerrit URL: https://gerrit.gromacs.org/7879

#16 Updated by Gerrit Code Review Bot 10 months ago

Gerrit received a related patchset '1' for Issue #2629.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~I194568c7fd32743f64fde9690f64250d715f7ba9
Gerrit URL: https://gerrit.gromacs.org/8490

#17 Updated by Szilárd Páll 10 months ago

  • AMDGPU-PRO 18.2 (currently, but there is a minor patch update that we can upgrade to)

Upgraded to 18.3 on bs-gpu01, in use & stable.

ARM HPC stack?

I will get https://gerrit.gromacs.org/#/c/8175/ working, or a version of it. I have WIP code that allows linking against the ARM performance libraries, so hopefully I can add that too before the beta (but it is not critical and could be done later).

#18 Updated by Szilárd Páll 10 months ago

  • Related to Task #2685: tweak jenkins post-submit test to use ARMPL for FFTs added

#19 Updated by Mark Abraham 9 months ago

  • Target version changed from 2019-beta1 to 2019-beta2

#20 Updated by Paul Bauer 9 months ago

  • Target version changed from 2019-beta2 to 2019-beta3

How much is left to do here?

#21 Updated by Mark Abraham 9 months ago

I have some WIP that addresses a bunch of trivial issues (extra assertions that clarify what other pieces of code enforce in practice, which pacifies the analyzer). But one case was large enough that I'm not sure it's a good idea to incorporate in the release branch.

#22 Updated by Szilárd Páll 9 months ago

Paul Bauer wrote:

How much is left to do here?

AFAIK:
- there are gcc 8 warnings remaining;
- ARM HPC toolchain in post-submit needs re-enabling (8652) & there is a new release that I'll try to upgrade to and switch to, but that's not a blocker IMO.
- I'd like to also have a test with the ARM Perf Libs (for FFT at least, but we might as well link BLAS/LAPACK too) -- needs 8621 and minor releng tweaks (WIP)

#23 Updated by Paul Bauer 8 months ago

  • Target version changed from 2019-beta3 to 2019-rc1

ok, bumping then to rc1

#24 Updated by Mark Abraham 8 months ago

Szilárd Páll wrote:

Paul Bauer wrote:

How much is left to do here?

AFAIK:
- there are gcc 8 warnings remaining;

https://gerrit.gromacs.org/#/c/8757/ fixed some of those. IIRC there are still some from TNG.

@Paul Can you and Magnus please coordinate on assessing whether there's a straightforward fix we can make in the TNG repo. Ideally we then make a TNG patch release, import that into release-2019 branch, and update the metadata in our repo accordingly. Check the git log for the last time we did this to see what kinds of things we should do.

- ARM HPC toolchain in post-submit needs re-enabling (8652) & there is a new release that I'll try to upgrade to and switch to, but that's not a blocker IMO.
- I'd like to also have a test with the ARM Perf Libs (for FFT at least, but we might as well link BLAS/LAPACK too) -- needs 8621 and minor releng tweaks (WIP)

Szilard and I have been working on this, but it is not a priority for rc1.

#25 Updated by Mark Abraham 8 months ago

  • Target version changed from 2019-rc1 to 2019-rc2

#26 Updated by Paul Bauer 7 months ago

  • Target version changed from 2019-rc2 to 2019

no second release candidate is planned

#27 Updated by Paul Bauer 7 months ago

  • Status changed from New to In Progress
  • Target version changed from 2019 to 2019.1

There is still some work needed for the OpenCL release matrix entry, so I retargeted this at 2019.1 instead of closing it.

#28 Updated by Mark Abraham 6 months ago

  • Status changed from In Progress to Resolved

If we fix OpenCL, we might add that to the release matrix, but otherwise we are stable and not changing anything here.

#29 Updated by Mark Abraham 6 months ago

  • Status changed from Resolved to Closed
