Task #2161

update the way testing matrices are specified

Added by Mark Abraham about 2 years ago. Updated 8 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: testing
Target version:
Difficulty: uncategorized

Description

Discussion at https://gerrit.gromacs.org/#/c/6588/6 suggests changes along the lines of

  • refactoring build specifications for GPU jobs to be "[opencl-x.y|cuda-x.y] gpu=[amd|nvidia]" (this may require forking off the 2016-era releng scripts so that the former opencl and gpu tags keep working and remain reasonably implemented)
  • specify gcc labels as "gcc=[4.8|4.9|5|6|7]" so that it is clear that a major (or, for older releases, major.minor) version is what is tested, while what's actually installed on the slave can evolve as bugs come to light. icc and msvc need no change. Using "clang=[3.4...3.9|4|5]" seems like the approach we'll need there, too. Should we relax cmake versioning along similar lines?
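To make the proposed syntax concrete, here is a minimal Python sketch of how one matrix line using these specifiers might be parsed. The function `parse_matrix_line` and its output format are hypothetical illustrations, not the actual releng code; it assumes only the two token shapes discussed above ("key=value" and "name-x.y").

```python
import re

# Hypothetical parser for one testing-matrix line. It accepts the two
# specifier shapes discussed above: "key=value" (e.g. gcc=5, gpu=nvidia)
# and "name-x.y" (e.g. cuda-8.0, opencl-1.2).
_DASHED = re.compile(r"^(?P<name>[a-z]+)-(?P<version>\d+(?:\.\d+)*)$")


def parse_matrix_line(line):
    """Return a dict mapping specifier name to value (True for bare flags)."""
    specs = {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            specs[key] = value
        else:
            match = _DASHED.match(token)
            if match:
                specs[match.group("name")] = match.group("version")
            else:
                # Tokens without a numeric version are kept as flags.
                specs[token] = True
    return specs


print(parse_matrix_line("gcc=5 cuda-8.0 gpu=nvidia"))
# {'gcc': '5', 'cuda': '8.0', 'gpu': 'nvidia'}
```

Keeping the version as a string (rather than a float) avoids turning "4.10" into 4.1.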

Related issues

Related to GROMACS - Task #2135: check non-Jenkins compilers work (Closed)
Related to Support Platforms - Feature #2180: releng matrices would work better with a hint for execution (Fix uploaded)

Associated revisions

Revision 79b533ba
Added by Mark Abraham about 2 years ago

Update clang-4

Test it in pre- and post-submit matrices

Refs #2161

Change-Id: I4b67920cb2c5a8caad07426ed98f06eeea8bd57e

Revision 2eb5a630
Added by Mark Abraham about 2 years ago

Updated use of gcc-5 specifiers

Removed some TODOs that have been resolved by previous merge to
master.

Refs #2161

Change-Id: Ic4d47bcd282f9f973eca996168234a8b48948214

Revision b88ebed6
Added by Mark Abraham about 2 years ago

Updated use of compiler specifiers in testing

Per #2161, we want to name newer gcc and clang via symlinks like
gcc-5, so that bug-fix updates can go smoothly.

Updated matrices, and coverage job script. Nothing else seems
to need it.

Refs #2161

Change-Id: I35bc20b160c1b1d90cca341f53dd366033c16c86
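The point of naming compilers via symlinks like gcc-5 is that the link can be repointed at a newer bug-fix release without touching the matrices. A minimal Python sketch of choosing that target, assuming the slave knows its installed full versions; `symlink_target` and the version list are hypothetical.

```python
# Hypothetical helper: given the full versions of a compiler installed on a
# slave, choose which one a label-style name (e.g. gcc-5 or gcc-4.9) should
# point at. Preferring the newest matching bug-fix release lets slaves pick
# up patch updates while the matrices keep saying only "gcc-5".


def _as_tuple(version):
    """Turn "5.4.0" into (5, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def symlink_target(installed, label_version):
    """Return the newest installed version that starts with label_version."""
    prefix = _as_tuple(label_version)
    candidates = [v for v in installed if _as_tuple(v)[: len(prefix)] == prefix]
    if not candidates:
        raise LookupError(f"no installed version matches {label_version}")
    return max(candidates, key=_as_tuple)


installed = ["4.9.4", "5.4.0", "5.5.0", "6.3.0"]
print(symlink_target(installed, "5"))    # 5.5.0
print(symlink_target(installed, "4.9"))  # 4.9.4
```

Matching on a version prefix covers both the modern major-only labels (gcc-5) and the older major.minor ones (gcc-4.9).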

Revision b52743fc
Added by Szilárd Páll 9 months ago

Update test matrices with new GPU specifiers

Separates software stack and hardware requirement specification; the
latter is done using the new gpuhw=vendor syntax.

Note that the specified OpenCL version is ignored for now (CMake support
follows in a later commit).

Refs #2161

Change-Id: Ia1dfb175b2d47579577c5588a71d8b69a1bff07b
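The separation of software stack and hardware requirement described in this commit can be pictured with a small Python sketch. Apart from the `gpuhw=vendor` form named in the commit message, the function name and the returned structure are assumptions for illustration.

```python
# Hypothetical sketch: separate the software stack (needed to build) from the
# hardware requirement (needed to run the tests), where the latter is written
# as gpuhw=vendor. Everything else is treated as a software specifier.


def split_requirements(line):
    """Return (software specifiers, hardware requirements) for a matrix line."""
    software = []
    hardware = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep and key == "gpuhw":
            hardware["gpu_vendor"] = value
        else:
            software.append(token)
    return software, hardware


print(split_requirements("gcc-5 cuda-8.0 gpuhw=nvidia"))
# (['gcc-5', 'cuda-8.0'], {'gpu_vendor': 'nvidia'})
```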

History

#1 Updated by Mark Abraham about 2 years ago

I'm installing clang-4 on bs_nix-amd. No PPAs or MacPorts packages of gcc 7 are out yet.

#2 Updated by Gerrit Code Review Bot about 2 years ago

Gerrit received a related patchset '1' for Issue #2161.
Uploader: Mark Abraham
Change-Id: gromacs~master~I4b67920cb2c5a8caad07426ed98f06eeea8bd57e
Gerrit URL: https://gerrit.gromacs.org/6615

#3 Updated by Teemu Murtola about 2 years ago

I would suggest avoiding branching of the releng scripts; that likely leads to more extra maintenance than it is worth. It is not a big one-time pain to change the verification matrices in all the release branches if backwards compatibility seems too hard to keep longer, if one picks a time when the release branches do not have many active changes in Gerrit.

For the GPU parameters, the rationale for the different parameters should be clearly defined. What are the needs here? Ability to choose the build host based on the type of the GPU it has? Would that have any effect on how the code is built, or would that be selected only with the cuda/opencl flags? If possible, it would be good to decouple these two or three things from each other.

The compiler change is trivial and just needs a decision (last time it was discussed, it was explicitly requested to keep the patch versions); only slaves.py needs to list the new labels, and the slaves need to ensure that the compilers can be invoked with the names of the labels (e.g., gcc-5). Some documentation should also change to not reference the patch versions, e.g., when mentioning compilers we have tested.
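A slave-side sanity check for the requirement that compilers be invocable by their label names could look like the following Python sketch. The label list is hypothetical (the real one would live in slaves.py); `shutil.which` is the standard-library PATH lookup.

```python
import shutil

# Hypothetical compiler labels a slave advertises; the real list would be
# maintained in slaves.py and may differ.
COMPILER_LABELS = ["gcc-5", "gcc-6", "clang-4"]


def missing_compilers(labels, which=shutil.which):
    """Return the labels that cannot be invoked as commands on this host."""
    return [label for label in labels if which(label) is None]


if __name__ == "__main__":
    missing = missing_compilers(COMPILER_LABELS)
    if missing:
        print("Not invocable by label name:", ", ".join(missing))
    else:
        print("All compiler labels resolve to executables.")
```

The `which` parameter exists only so the check can be exercised without the compilers actually installed.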

#4 Updated by Mark Abraham about 2 years ago

Teemu Murtola wrote:

I would suggest avoiding branching of the releng scripts; that likely leads to more extra maintenance than it is worth. It is not a big one-time pain to change the verification matrices in all the release branches if backwards compatibility seems too hard to keep longer, if one picks a time when the release branches do not have many active changes in Gerrit.

Sure, that could easily be best.

For the GPU parameters, the rationale for the different parameters should be clearly defined. What are the needs here? Ability to choose the build host based on the type of the GPU it has? Would that have any effect on how the code is built, or would that be selected only with the cuda/opencl flags? If possible, it would be good to decouple these two or three things from each other.

(Background information at https://gerrit.gromacs.org/#/c/6588/6/admin/builds/pre-submit-matrix.txt@55)

An OpenCL build needs only (as far as we know) the vendor-neutral headers and the libOpenCL.so ICD library, which we can arrange for slaves to have (and later, perhaps different versions too), either in a default path or somewhere releng can know to tell CMake about. There are, e.g., Ubuntu packages that do this, but if we might want multiple versions then we could plan to install them in /opt/opencl-x.y (or similar) up front. Specifically, it needs neither the CUDA nor the AMD SDK. So the opencl-x.y tag is enough to assign a build to a slave that has those installed (which need not be a slave that has a GPU, once we get that far).

A CUDA build needs its SDK, which currently works fine from /opt/cuda-x.y via the cuda-x.y tag.

In both cases, running the tests needs a driver and device that support the version used for the build; this is tagged with gpu=vendor.

Currently we build and test on the same slave, so we will need slaves that support the range of tag combinations that are testable and that we wish to test. But the above breakdown already decouples the needs of the build from the needs of running the tests, so it would already support a future where we might build in a Docker slave somewhere and move the container to a slave with a GPU to test it.
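The decoupling can be sketched in Python as two independent host lookups: one on the build tag, one on the gpu=vendor tag. The slave names and label sets are hypothetical, and the sketch simplifies by assuming that a gpu=vendor label already implies a driver recent enough for the build.

```python
# Hypothetical slave label sets; names are illustrative only. "builder"
# has OpenCL headers/ICD and a CUDA SDK but no GPU.
SLAVES = {
    "builder": {"opencl-1.2", "cuda-8.0"},
    "amd-box": {"opencl-1.2", "gpu=amd"},
    "nvidia-box": {"opencl-1.2", "cuda-8.0", "gpu=nvidia"},
}


def hosts_for(build_tag, gpu_tag, slaves=SLAVES):
    """Return (slaves able to build, slaves able to run the tests)."""
    can_build = sorted(n for n, labels in slaves.items() if build_tag in labels)
    can_test = sorted(n for n, labels in slaves.items() if gpu_tag in labels)
    return can_build, can_test


builders, testers = hosts_for("cuda-8.0", "gpu=nvidia")
print(builders, testers)  # ['builder', 'nvidia-box'] ['nvidia-box']
# While build and test still run on one slave, only the overlap qualifies.
print(sorted(set(builders) & set(testers)))  # ['nvidia-box']
```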

The necessary range of combinations is currently
  • gpu=amd opencl-1.1|1.2|2.0 (which can be arranged to target bs_nix-amd_gpu)
  • gpu=nvidia opencl-1.1 (which can be arranged to target bs_nix1204 or bs_nix1310)
  • gpu=nvidia cuda-5.5|6.0|...|8.0 (likewise)
At some future time, we might e.g. have
  • gpu=intel opencl-1.1
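The "opencl-1.1|1.2|2.0" shorthand in the list above stands for several concrete combinations. A Python sketch of expanding it, assuming (hypothetically) that alternatives after the dash share the prefix before it; the function names are illustrative.

```python
import itertools

# Hypothetical expansion of the shorthand above, where "opencl-1.1|1.2|2.0"
# stands for three separate specifiers sharing the "opencl-" prefix.


def expand_token(token):
    """Expand one token into its alternatives (or return it unchanged)."""
    name, sep, versions = token.partition("-")
    if not sep or "|" not in versions:
        return [token]
    return [f"{name}-{v}" for v in versions.split("|")]


def expand_combination(line):
    """Return every concrete combination described by a shorthand line."""
    choices = [expand_token(tok) for tok in line.split()]
    return [" ".join(combo) for combo in itertools.product(*choices)]


for combo in expand_combination("gpu=amd opencl-1.1|1.2|2.0"):
    print(combo)
# gpu=amd opencl-1.1
# gpu=amd opencl-1.2
# gpu=amd opencl-2.0
```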

We could at some later time want to do a build against a vendor's OpenCL header+ICD, if so we might want opencl-1.1-nvidia (or similar)

The compiler change is trivial and just needs a decision (last time it was discussed, it was explicitly requested to keep the patch versions); only slaves.py needs to list the new labels, and the slaves need to ensure that the compilers can be invoked with the names of the labels (e.g., gcc-5). Some documentation should also change to not reference the patch versions, e.g., when mentioning compilers we have tested.

Yeah, I was going to experiment with the change with the additions for clang-4 and gcc-7, and then unify and update documentation.

#5 Updated by Mark Abraham about 2 years ago

  • Related to Task #2135: check non-Jenkins compilers work added

#6 Updated by Mark Abraham about 2 years ago

Mark Abraham wrote:

  • specify gcc labels as "gcc=[4.8|4.9|5|6|7]" so that it can be understood that it's a major (or former major.minor) version that is tested, and what's actually on the slave might evolve as bugs in stuff come to light. icc and msvc need no change. Using "clang=[3.4...3.9|4|5]" seems like the approach we'll need there, too. Should we relax cmake versioning on similar lines?
https://gerrit.gromacs.org/#/c/6632/2 is starting to implement this. For stability, the plan is
  • keep the old symlinks from ~jenkins/bin,
  • add new ones,
  • update the slave labels to add the new ones,
  • update manual Jenkins configurations to point at them,
  • update release-2016 and master matrices to point at them,
  • remove old slave labels,
  • don't bother to remove old symlinks(?)
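The first steps of this plan (keep the old symlinks, add the new ones) amount to an idempotent "create if absent" operation. A minimal Python sketch, assuming a bin directory such as ~jenkins/bin; the helper name and the link targets are hypothetical.

```python
import os
from pathlib import Path

# Hypothetical helper for "keep the old symlinks, add new ones": create each
# requested link unless something with that name already exists. Existing
# (old) links are never modified or removed.


def add_symlinks(bin_dir, links):
    """links maps new link names to their targets. Returns names created."""
    bin_dir = Path(bin_dir)
    bin_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name, target in links.items():
        link = bin_dir / name
        # is_symlink() also catches dangling links, which exists() misses.
        if link.is_symlink() or link.exists():
            continue
        os.symlink(target, link)
        created.append(name)
    return created


# e.g. add_symlinks(Path.home() / "bin", {"gcc-5": "/usr/bin/gcc-5.4"})
```

Because reruns are harmless, the same call can be repeated on every slave while the labels and matrices are migrated.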

#7 Updated by Gerrit Code Review Bot about 2 years ago

Gerrit received a related patchset '1' for Issue #2161.
Uploader: Mark Abraham
Change-Id: gromacs~master~I35bc20b160c1b1d90cca341f53dd366033c16c86
Gerrit URL: https://gerrit.gromacs.org/6635

#8 Updated by Gerrit Code Review Bot about 2 years ago

Gerrit received a related patchset '1' for Issue #2161.
Uploader: Mark Abraham
Change-Id: gromacs~release-2016~Ic4d47bcd282f9f973eca996168234a8b48948214
Gerrit URL: https://gerrit.gromacs.org/6636

#9 Updated by Mark Abraham about 2 years ago

  • Related to Feature #2180: releng matrices would work better with a hint for execution added

#10 Updated by Mark Abraham about 2 years ago

  • Project changed from GROMACS to Support Platforms
  • Category deleted (releng)
  • Target version deleted (2018)

#11 Updated by Gerrit Code Review Bot almost 2 years ago

Gerrit received a related patchset '1' for Issue #2161.
Uploader: Mark Abraham
Change-Id: gromacs~release-2016~Idd985d7e4e45b58aeb9bd4f711622a91017a18a4
Gerrit URL: https://gerrit.gromacs.org/6738

#12 Updated by Mark Abraham over 1 year ago

Such changes would also be beneficial if we could support a matrix description like

cuda-8.0 no-gpu

so that we can build with CUDA and test that the resulting binary runs correctly on a machine where no CUDA device is available.
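A "cuda-8.0 no-gpu" line needs the test host filter to invert: instead of requiring a gpu=vendor label, it must reject hosts that have one. A Python sketch under the assumption that GPU presence is expressed only through gpu=... labels; `can_run_tests` is hypothetical.

```python
# Hypothetical test-host filter. For "cuda-8.0 no-gpu" the binary is built
# with CUDA, but the tests must run on a host with no GPU, to check that
# the binary runs correctly when no CUDA device is available.
GPU_LABEL_PREFIX = "gpu="


def can_run_tests(slave_labels, spec_tokens):
    """Decide whether a slave with these labels may run the tests."""
    has_gpu = any(label.startswith(GPU_LABEL_PREFIX) for label in slave_labels)
    if "no-gpu" in spec_tokens:
        return not has_gpu
    wanted = [t for t in spec_tokens if t.startswith(GPU_LABEL_PREFIX)]
    return all(t in slave_labels for t in wanted)


print(can_run_tests({"cuda-8.0"}, ["cuda-8.0", "no-gpu"]))  # True
```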

#13 Updated by Mark Abraham over 1 year ago

  • Project changed from Support Platforms to GROMACS
  • Category set to testing
  • Assignee set to Mark Abraham
  • Target version set to 2019
  • Difficulty uncategorized added

Further work for 2019 release

#14 Updated by Szilárd Páll about 1 year ago

The necessary range of combinations is currently
  • gpu=amd opencl-1.1|1.2|2.0 (which can be arranged to target bs_nix-amd_gpu)
  • gpu=nvidia opencl-1.1 (which can be arranged to target bs_nix1204 or bs_nix1310)

As discussed off-Redmine, there's more complexity here that we could, but perhaps don't need to, tackle: the OpenCL headers are the only thing we can detect at compile time, and those will generally be at least 2.0 compatible. The runtime/JIT compiler and the hardware, on the other hand, are what will generally be the limiting factor. However, the range of supported hardware/OpenCL standards is quite narrow, so I'm not sure whether we need to implement version support that matches API and hardware flags; in practice what we'd have is:

gpu=amd|nvidia|intel opencl-1.2; we might possibly use opencl-2.0 on some platforms, but that will be uniquely identified by the platform, so not sure if version matching against API & hardware is necessary.

In my recent WIP change I tried to express the first part with an "amdgpu" flag, but its role is essentially the same.
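If version matching against the runtime were ever implemented, the check would rest on the version the device reports, since the headers are rarely the limit. A Python sketch, assuming the reported string follows the OpenCL CL_DEVICE_VERSION format "OpenCL <major>.<minor> <vendor-specific information>"; the function names and sample strings are illustrative.

```python
import re

# CL_DEVICE_VERSION strings have the form
# "OpenCL <major>.<minor> <vendor-specific information>".
_CL_DEVICE_VERSION = re.compile(r"^OpenCL (\d+)\.(\d+)")


def device_opencl_version(version_string):
    """Extract (major, minor) from a CL_DEVICE_VERSION string."""
    match = _CL_DEVICE_VERSION.match(version_string)
    if match is None:
        raise ValueError(f"unexpected OpenCL version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def runtime_satisfies(requested, version_string):
    """True if the device supports at least the requested OpenCL version."""
    major, minor = (int(part) for part in requested.split("."))
    return device_opencl_version(version_string) >= (major, minor)


print(runtime_satisfies("1.2", "OpenCL 2.0 example-vendor-info"))  # True
print(runtime_satisfies("2.0", "OpenCL 1.2 example-vendor-info"))  # False
```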

  • gpu=nvidia cuda-5.5|6.0|...|8.0 (likewise)

+1

At some future time, we might e.g. have
  • gpu=intel opencl-1.1

The future has arrived :)

We could at some later time want to do a build against a vendor's OpenCL header+ICD, if so we might want opencl-1.1-nvidia (or similar)

Not sure there is a need for that; I don't know what vendors provide, and I've not used custom/vendor-distributed headers or ICD loaders for a long time.

#15 Updated by Gerrit Code Review Bot 10 months ago

Gerrit received a related patchset '1' for Issue #2161.
Uploader: Szilárd Páll
Change-Id: gromacs~master~Ia1dfb175b2d47579577c5588a71d8b69a1bff07b
Gerrit URL: https://gerrit.gromacs.org/8321

#16 Updated by Gerrit Code Review Bot 9 months ago

Gerrit received a related patchset '1' for Issue #2161.
Uploader: Mark Abraham
Change-Id: gromacs~release-2018~I4c04ac20aebdf0ba2798a26efce04f4a1235c23f
Gerrit URL: https://gerrit.gromacs.org/8486

#17 Updated by Gerrit Code Review Bot 9 months ago

Gerrit received a related patchset '1' for Issue #2161.
Uploader: Mark Abraham
Change-Id: gromacs~release-2016~Ibb865a13e67dc7782b0ac0f03ef9106d0e973238
Gerrit URL: https://gerrit.gromacs.org/8487

#18 Updated by Mark Abraham 8 months ago

  • Status changed from New to Resolved

Matrix updates are continuing, but we've more or less fulfilled the aims of this Redmine issue.

#19 Updated by Mark Abraham 8 months ago

  • Status changed from Resolved to Closed
