Task #2161
update the way testing matrices are specified
Description
Discussion at https://gerrit.gromacs.org/#/c/6588/6 suggests changes along the lines of
- refactoring build specifications for GPU jobs to be "[opencl-x.y|cuda-x.y] gpu=[amd|nvidia]" (this may require that we fork off the 2016-era releng so that the former opencl and gpu tags continue to work well and be implemented reasonably well)
- specify gcc labels as "gcc=[4.8|4.9|5|6|7]" so that it can be understood that it's a major (or former major.minor) version that is tested, and what's actually on the slave might evolve as bugs in stuff come to light. icc and msvc need no change. Using "clang=[3.4...3.9|4|5]" seems like the approach we'll need there, too. Should we relax cmake versioning on similar lines?
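To make the proposed label forms concrete, here is a minimal sketch of how one matrix line could be tokenized. It is not releng's actual parser: the function name `parse_matrix_line`, the two regular expressions, and the bare `mpi` flag in the example are assumptions made for illustration only.

```python
import re

# Hypothetical token grammar for one matrix entry; not releng's real parser.
_KEY_VALUE = re.compile(r"^(?P<name>[a-z]+)=(?P<value>[\w.]+)$")
_NAME_VERSION = re.compile(r"^(?P<name>[a-z]+)-(?P<value>\d+(?:\.\d+)*)$")


def parse_matrix_line(line):
    """Split one matrix line into {name: value} options and bare flags."""
    options, flags = {}, []
    for token in line.split():
        # Accept both "gcc=5" / "gpu=nvidia" and "cuda-8.0" / "opencl-1.2".
        match = _KEY_VALUE.match(token) or _NAME_VERSION.match(token)
        if match:
            options[match.group("name")] = match.group("value")
        else:
            flags.append(token)  # anything else is kept as a bare flag
    return options, flags


options, flags = parse_matrix_line("gcc=5 cuda-8.0 gpu=nvidia mpi")
print(options)  # {'gcc': '5', 'cuda': '8.0', 'gpu': 'nvidia'}
print(flags)    # ['mpi']
```

Keeping the version as an opaque string means "5" and "4.9" are both valid labels, which fits the idea that the label names a major (or former major.minor) series rather than an exact patch release.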
Related issues
Associated revisions
Updated use of gcc-5 specifiers
Removed some TODOs that have been resolved by previous merge to
master.
Refs #2161
Change-Id: Ic4d47bcd282f9f973eca996168234a8b48948214
Update test matrices with new GPU specifiers
Separates software stack and hardware requirement specification; the
latter is done using the new gpuhw=vendor syntax.
Note that OpenCL version specified is for now ignored (cmake support
in follow-up commit).
Refs #2161
Change-Id: Ia1dfb175b2d47579577c5588a71d8b69a1bff07b
History
#1 Updated by Mark Abraham over 3 years ago
I'm installing clang-4 on bs_nix-amd. No ppas or macports of gcc 7 are out, yet.
#2 Updated by Gerrit Code Review Bot over 3 years ago
Gerrit received a related patchset '1' for Issue #2161.
Uploader: Mark Abraham (mark.j.abraham@gmail.com)
Change-Id: gromacs~master~I4b67920cb2c5a8caad07426ed98f06eeea8bd57e
Gerrit URL: https://gerrit.gromacs.org/6615
#3 Updated by Teemu Murtola over 3 years ago
I would suggest avoiding branching of the releng scripts; that likely leads to more extra maintenance than it is worth. It is not a big one-time pain to change the verification matrices in all the release branches if backwards compatibility seems too hard to keep longer, if one picks a time when the release branches do not have many active changes in Gerrit.
For the GPU parameters, the rationale for the different parameters should be clearly defined. What are the needs here? Ability to choose the build host based on the type of the GPU it has? Would that have any effect on how the code is built, or would that be selected only with the cuda/opencl flags? If possible, it would be good to decouple these two or three things from each other.
The compiler change is trivial and just needs a decision (last time it was discussed, it was explicitly requested to keep the patch versions); only slaves.py needs to list the new labels, and the slaves need to ensure that the compilers can be invoked with the names of the labels (e.g., gcc-5). Some documentation should also change to not reference the patch versions, e.g., when mentioning compilers we have tested.
#4 Updated by Mark Abraham over 3 years ago
Teemu Murtola wrote:
I would suggest avoiding branching of the releng scripts; that likely leads to more extra maintenance than it is worth. It is not a big one-time pain to change the verification matrices in all the release branches if backwards compatibility seems too hard to keep longer, if one picks a time when the release branches do not have many active changes in Gerrit.
Sure, that could easily be best.
For the GPU parameters, the rationale for the different parameters should be clearly defined. What are the needs here? Ability to choose the build host based on the type of the GPU it has? Would that have any effect on how the code is built, or would that be selected only with the cuda/opencl flags? If possible, it would be good to decouple these two or three things from each other.
(Background information at https://gerrit.gromacs.org/#/c/6588/6/admin/builds/pre-submit-matrix.txt@55)
An OpenCL build needs only (as far as we know) vendor-neutral header and libOpenCL.so ICD library that we can arrange for slaves to have (and later, perhaps different versions too), either in a default path or somewhere releng can know to tell cmake about. There are e.g. ubuntu packages that do this, but if we might want multiple versions then we could plan to install them in /opt/opencl-x.y (or whatever) up front. Specifically, it doesn't need either CUDA or AMD SDKs. So the opencl-x.y tag is enough to assign a build to a slave that has those installed (which need not be a slave that has a GPU, when we get that far).
A CUDA build needs its SDK, which currently works fine from /opt/cuda-x.y via the cuda-x.y tag.
In both cases, running the tests needs a driver and device that supports the version of the build, which is tagged with gpu=vendor.
Currently we build and test on the same slave, so we will need slaves that support the range of tag combinations that are testable / that we wish to test. But the above breakdown already decouples the needs of the build from the needs of running the tests, so it would already support some future where we might build in a docker slave somewhere and move the container to a slave with a GPU to test it.
The necessary range of combinations is currently
- gpu=amd opencl-1.1|1.2|2.0 (which can be arranged to target bs_nix-amd_gpu)
- gpu=nvidia opencl-1.1 (which can be arranged to target bs_nix1204 or bs_nix1310)
- gpu=nvidia cuda-5.5|6.0|...|8.0 (likewise)
- gpu=intel opencl-1.1
We could at some later time want to do a build against a vendor's OpenCL header+ICD, if so we might want opencl-1.1-nvidia (or similar)
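As a rough illustration of the build/test decoupling described above, the sketch below picks candidate hosts separately for the build step (SDK labels) and the test step (gpu=vendor). The slave names bs_nix-amd_gpu and bs_nix1204 are from this comment, but the label sets assigned to them, the GPU-less `bs_docker` host, and the helper names are all assumptions.

```python
# Hypothetical label sets: which labels each slave really carries is an
# assumption, and "bs_docker" is an invented GPU-less build host.
SLAVES = {
    "bs_nix-amd_gpu": {"opencl-1.2", "gpu=amd"},
    "bs_nix1204": {"opencl-1.1", "cuda-8.0", "gpu=nvidia"},
    "bs_docker": {"opencl-1.2", "cuda-8.0"},
}


def hosts_with(required, slaves=SLAVES):
    """Return the sorted names of slaves carrying every required label."""
    return sorted(name for name, labels in slaves.items() if required <= labels)


def plan(config):
    """Split a config into build needs (SDK tags) and test needs (gpu=vendor)."""
    labels = set(config.split())
    test_needs = {label for label in labels if label.startswith("gpu=")}
    build_needs = labels - test_needs
    return hosts_with(build_needs), hosts_with(test_needs)


builders, testers = plan("cuda-8.0 gpu=nvidia")
print(builders)  # ['bs_docker', 'bs_nix1204']
print(testers)   # ['bs_nix1204']

# While we still build and test on the same slave, only the overlap is usable.
print(sorted(set(builders) & set(testers)))  # ['bs_nix1204']
```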
The compiler change is trivial and just needs a decision (last time it was discussed, it was explicitly requested to keep the patch versions); only slaves.py needs to list the new labels, and the slaves need to ensure that the compilers can be invoked with the names of the labels (e.g., gcc-5). Some documentation should also change to not reference the patch versions, e.g., when mentioning compilers we have tested.
Yeah, I was going to experiment with the change with the additions for clang-4 and gcc-7, and then unify and update documentation.
#5 Updated by Mark Abraham over 3 years ago
- Related to Task #2135: check non-Jenkins compilers work added
#6 Updated by Mark Abraham over 3 years ago
Mark Abraham wrote:
https://gerrit.gromacs.org/#/c/6632/2 is starting to implement this. For stability, the plan is
- specify gcc labels as "gcc=[4.8|4.9|5|6|7]" so that it can be understood that it's a major (or former major.minor) version that is tested, and what's actually on the slave might evolve as bugs in stuff come to light. icc and msvc need no change. Using "clang=[3.4...3.9|4|5]" seems like the approach we'll need there, too. Should we relax cmake versioning on similar lines?
- keep the old symlinks from ~jenkins/bin,
- add new ones,
- update the slave labels to add the new ones,
- update manual Jenkins configurations to point at them,
- update release-2016 and master matrices to point at them,
- remove old slave labels,
- don't bother to remove old symlinks(?)
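The "keep the old symlinks, add new ones" steps could be scripted roughly as below. This is only a sketch: the function name `add_label_symlinks`, the old name `gcc-5.2`, and the temporary directory standing in for ~jenkins/bin are assumptions, not the real layout on the slaves.

```python
import os
import tempfile


def add_label_symlinks(bin_dir, new_to_old):
    """Create new-name symlinks next to the old ones without touching them."""
    created = []
    for new_name, old_name in new_to_old.items():
        new_path = os.path.join(bin_dir, new_name)
        if os.path.lexists(new_path):
            continue  # already migrated; leave it alone
        # Point the new name at whatever the old symlink resolves to.
        target = os.path.realpath(os.path.join(bin_dir, old_name))
        os.symlink(target, new_path)
        created.append(new_name)
    return created


# Demonstration in a scratch directory (stand-in for ~jenkins/bin).
with tempfile.TemporaryDirectory() as bin_dir:
    real = os.path.join(bin_dir, "gcc-real")
    open(real, "w").close()
    os.symlink(real, os.path.join(bin_dir, "gcc-5.2"))  # hypothetical old name
    print(add_label_symlinks(bin_dir, {"gcc-5": "gcc-5.2"}))  # ['gcc-5']
```

Because the old links are never modified, jobs that still use the old labels keep working until those labels are removed.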
#7 Updated by Gerrit Code Review Bot over 3 years ago
Gerrit received a related patchset '1' for Issue #2161.
Uploader: Mark Abraham (mark.j.abraham@gmail.com)
Change-Id: gromacs~master~I35bc20b160c1b1d90cca341f53dd366033c16c86
Gerrit URL: https://gerrit.gromacs.org/6635
#8 Updated by Gerrit Code Review Bot over 3 years ago
Gerrit received a related patchset '1' for Issue #2161.
Uploader: Mark Abraham (mark.j.abraham@gmail.com)
Change-Id: gromacs~release-2016~Ic4d47bcd282f9f973eca996168234a8b48948214
Gerrit URL: https://gerrit.gromacs.org/6636
#9 Updated by Mark Abraham over 3 years ago
- Related to Feature #2180: releng matrices would work better with a hint for execution added
#10 Updated by Mark Abraham over 3 years ago
- Project changed from GROMACS to Support Platforms
- Category deleted (releng)
- Target version deleted (2018)
#11 Updated by Gerrit Code Review Bot over 3 years ago
Gerrit received a related patchset '1' for Issue #2161.
Uploader: Mark Abraham (mark.j.abraham@gmail.com)
Change-Id: gromacs~release-2016~Idd985d7e4e45b58aeb9bd4f711622a91017a18a4
Gerrit URL: https://gerrit.gromacs.org/6738
#12 Updated by Mark Abraham about 3 years ago
Such changes would also be beneficial if we could support a matrix description like
cuda-8.0 no-gpu
so that we can build with CUDA and test that the resulting binary runs correctly on a machine where no CUDA device is available.
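One way such a description could be interpreted is sketched below: the `no-gpu` token is treated as a requirement on the test host only, so it can be combined with a CUDA build tag but not with gpu=vendor. The function name and the return convention ("none", a vendor name, or `None` when unspecified) are assumptions.

```python
def runtime_requirement(line):
    """Return 'none' for no-gpu, the vendor for gpu=vendor, else None."""
    tokens = line.split()
    vendors = [t.split("=", 1)[1] for t in tokens if t.startswith("gpu=")]
    no_gpu = "no-gpu" in tokens
    if no_gpu and vendors:
        # Asking for a device and for no device at once cannot be satisfied.
        raise ValueError("'no-gpu' conflicts with gpu=%s" % vendors[0])
    if no_gpu:
        return "none"
    return vendors[0] if vendors else None


print(runtime_requirement("cuda-8.0 no-gpu"))      # none
print(runtime_requirement("cuda-8.0 gpu=nvidia"))  # nvidia
```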
#13 Updated by Mark Abraham about 3 years ago
- Project changed from Support Platforms to GROMACS
- Category set to testing
- Assignee set to Mark Abraham
- Target version set to 2019
- Difficulty uncategorized added
Further work for 2019 release
#14 Updated by Szilárd Páll over 2 years ago
The necessary range of combinations is currently
- gpu=amd opencl-1.1|1.2|2.0 (which can be arranged target bs_nix-amd_gpu)
- gpu=nvidia opencl-1.1 (which can be arranged to target bs_nix1204 or bs_nix1310)
As discussed off-redmine, there's more complexity here that we could but perhaps don't need to tackle: the OpenCL headers are the only thing we can detect at compile-time; those will generally be at least 2.0 compatible. The runtime/JIT compiler and hardware, on the other hand, are what will generally be the limiter. However, the range of supported hardware/OpenCL standards is quite narrow, so I'm not sure whether we need to implement version support that matches API and hardware flags; in practice what we'd have is:
gpu=amd|nvidia|intel opencl-1.2; we might possibly use opencl-2.0 on some platforms, but that will be uniquely identified by the platform, so not sure if version matching against API & hardware is necessary.
In my recent WIP change I tried to express the first part with an "amdgpu" flag, but the role is essentially the same.
- gpu=nvidia cuda-5.5|6.0|...|8.0 (likewise)
+1
At some future time, we might e.g. have
- gpu=intel opencl-1.1
The future has arrived :)
We could at some later time want to do a build against a vendor's OpenCL header+ICD, if so we might want opencl-1.1-nvidia (or similar)
Not sure there is a need for that; I don't know what vendors provide, I've not used custom/vendor distributed headers or ICD loaders for a long time.
#15 Updated by Gerrit Code Review Bot over 2 years ago
Gerrit received a related patchset '1' for Issue #2161.
Uploader: Szilárd Páll (pall.szilard@gmail.com)
Change-Id: gromacs~master~Ia1dfb175b2d47579577c5588a71d8b69a1bff07b
Gerrit URL: https://gerrit.gromacs.org/8321
#16 Updated by Gerrit Code Review Bot over 2 years ago
Gerrit received a related patchset '1' for Issue #2161.
Uploader: Mark Abraham (mark.j.abraham@gmail.com)
Change-Id: gromacs~release-2018~I4c04ac20aebdf0ba2798a26efce04f4a1235c23f
Gerrit URL: https://gerrit.gromacs.org/8486
#17 Updated by Gerrit Code Review Bot over 2 years ago
Gerrit received a related patchset '1' for Issue #2161.
Uploader: Mark Abraham (mark.j.abraham@gmail.com)
Change-Id: gromacs~release-2016~Ibb865a13e67dc7782b0ac0f03ef9106d0e973238
Gerrit URL: https://gerrit.gromacs.org/8487
#18 Updated by Mark Abraham over 2 years ago
- Status changed from New to Resolved
matrix updates are continuing, but we've more or less fulfilled the redmine
#19 Updated by Mark Abraham over 2 years ago
- Status changed from Resolved to Closed
Update clang-4
Test it in pre- and post-submit matrices
Refs #2161
Change-Id: I4b67920cb2c5a8caad07426ed98f06eeea8bd57e