Bug #1428
rdtscp and illegal instructions
Description
gmx-users
where our use of rdtscp
has caused confusion and complaints for users. The current behavior is:
- If the build host has a CPU that provides the instruction, it is silently used in
mdrun
. The only way to see that this is enabled is to runmdrun -version
, but most users probably don't understand the output. - If such a binary is used on a host that doesn't support the instruction, then
mdrun
aborts with an illegal instruction. Many people seem to have now understood that switchingGMX_CPU_ACCELERATION
to a common denominator for the hosts is a way to solve this (and this is even what people usually suggest on the mailing list), but even settingGMX_CPU_ACCELERATION=None
does not solve this problem. - It is possible to disable
rdtscp
by providing-DGMX_DISTRIBUTABLE_BINARY=ON
to CMake, but this is not documented anywhere, nor can this be set in any CMake GUI, since it is not provided as a cache variable.
rdtscp
:
- Minimal solution: document
GMX_DISTRIBUTABLE_BINARY
and make it a proper cache variable, so that users actually have a chance of finding this option. - If the feature is compiled in, make
mdrun
(or thegmx
binary) check in the beginning whether the instruction is actually supported by the CPU, and provide a clear error message in such a case. Unlike for the SIMD instruction set, this instruction shouldn't get automatically inserted by the compiler, so it should be easy to do the check early enough.
So the question now is that is the benefit of using rdtscp
worth this trouble? Does it influence the timings that much?
Related issues
Associated revisions
portability aspects in install guide + minor tweaks
Added information on portability aspects related to CPU instruction
sets, related to #1428.
Additionally, made several minor updates and tweaks related to
compilers, platforms, cmake, etc.
Change-Id: I621262c939c119e5bdd5e7c91dda0ae3ffc60b7b
History
#1 Updated by Teemu Murtola almost 6 years ago
- Related to Bug #857: Sequencing rdtsc in cycle-counting funcs added
#2 Updated by Mark Abraham almost 6 years ago
Teemu Murtola wrote:
There have been several reports ongmx-users
where our use ofrdtscp
has caused confusion and complaints for users. The current behavior is:
- If the build host has a CPU that provides the instruction, it is silently used in
mdrun
. The only way to see that this is enabled is to runmdrun -version
, but most users probably don't understand the output.
This is poor, and I have fixed it.
- If such a binary is used on a host that doesn't support the instruction, then
mdrun
aborts with an illegal instruction. Many people seem to have now understood that switchingGMX_CPU_ACCELERATION
to a common denominator for the hosts is a way to solve this (and this is even what people usually suggest on the mailing list), but even settingGMX_CPU_ACCELERATION=None
does not solve this problem.
I've been suggesting this because I didn't know GMX_DISTRIBUTABLE_BINARY even existed! :-)
Possible solutions of making this more user-friendly while keeping
- It is possible to disable
rdtscp
by providing-DGMX_DISTRIBUTABLE_BINARY=ON
to CMake, but this is not documented anywhere, nor can this be set in any CMake GUI, since it is not provided as a cache variable.rdtscp
:
- Minimal solution: document
GMX_DISTRIBUTABLE_BINARY
and make it a proper cache variable, so that users actually have a chance of finding this option.
Yes
- If the feature is compiled in, make
mdrun
(or thegmx
binary) check in the beginning whether the instruction is actually supported by the CPU, and provide a clear error message in such a case. Unlike for the SIMD instruction set, this instruction shouldn't get automatically inserted by the compiler, so it should be easy to do the check early enough.
Indeed. gmx_detect_hardware provides this information, and is called early in runner.c before any timing code, so the fix is straightforward.
So the question now is that is the benefit of using
rdtscp
worth this trouble? Does it influence the timings that much?
I can't answer that. But my patch at least now informs the user that they might want to arrange use it if they could and are not.
#3 Updated by Mark Abraham almost 6 years ago
- Status changed from New to Fix uploaded
- Target version changed from 5.0 to 4.6.6
#4 Updated by Mark Abraham almost 6 years ago
#5 Updated by Szilárd Páll almost 6 years ago
Change looks good.
One thing that I'd like to mention is that with this change we implicitly assume that GMX_DISTRIBUTABLE_BINARY
will be present in the GROMACS build system for a while and it will do what its name means. Hence, anything we add in the future that can make the binary non-portable, will have to be turned off by this option. Right now it's only RDTSC and if we want to keep it that way, it is perhaps more advantageous to name the option GMX_USE_TSC_COUNTER
.
#6 Updated by Mark Abraham almost 6 years ago
For implementations, it makes sense to have specific names until there is a second case from which to start generalizing. For user interfaces, it is sometimes friendlier to describe the intent or effect, rather than the detail. But with Erik's suggestion on the patch, very few people will need to use it, and making the name specific makes good sense.
Is there an advantage to "GMX_USE_TSC_COUNTER" as a name? HAVE_RDTSCP is the name of the #define that gets set, and the assembly instructions are similar. I'd like to avoid referring to TSC_COUNTER in one place and not in others. (Kinda like avoiding "pinning" vs "setting thread affinity.")
#7 Updated by Gerrit Code Review Bot almost 6 years ago
Gerrit received a related patchset '1' for Issue #1428.
Uploader: Szilárd Páll (pall.szilard@gmail.com)
Change-Id: I8bc884ef9ea8ea4661626b60490182ae2b302648
Gerrit URL: https://gerrit.gromacs.org/3082
#8 Updated by Gerrit Code Review Bot almost 6 years ago
Gerrit received a related patchset '1' for Issue #1428.
Uploader: Szilárd Páll (pall.szilard@gmail.com)
Change-Id: I621262c939c119e5bdd5e7c91dda0ae3ffc60b7b
Gerrit URL: https://gerrit.gromacs.org/3121
#9 Updated by Mark Abraham almost 6 years ago
- Status changed from Fix uploaded to Resolved
- % Done changed from 0 to 100
Applied in changeset a997205edbf8b550904d1c6e9a0fbb2af7727067.
#10 Updated by Rossen Apostolov almost 6 years ago
- Status changed from Resolved to Closed
avoid mdrun crash when rdtscp is not supported
When using rdtscp, mdrun now detects at runtime whether the CPU supports
this instruction and if this is not the case, it issues a fatal error
and instructs the user to recompile mdrun for the compute host. Note
that this will happen rarely, only when cross-compiling from a newer
host for a rather old one.
Additionally, when the user manually picks AVX, we also turn on RDTSCP
as all AVX-capable CPUs support it.
Also made CMake advanced cache option for GMX_USE_RDTSCP. This replaces
the previously hidden GMX_DISTRIBUTABLE_BUILD option.
Fixes #1428
Change-Id: I8bc884ef9ea8ea4661626b60490182ae2b302648