inconsistent pinning behavior due to missing SMT info on AMD Zen
The mdrun native hardware detection does recognize the hardware thread order on AMD Zen uarch processors, but it does not detect SMT, so it cannot correctly assign hardware threads to cores. As a result (besides the reporting being incorrect), at most half of the cores are used when a run is launched with #threads < #hwthreads/2. On Intel with HT, in such cases the default stride is switched to 2 to spread threads across cores when the total thread count is <= #cores.
- Native detection:
Hardware topology: Basic
  Sockets, cores, and logical processors:
    Socket 0: [ 0] [ 16] [ 1] [ 17] [ 2] [ 18] [ 3] [ 19] [ 4] [ 20] [ 5] [ 21] [ 6] [ 22] [ 7] [ 23] [ 8] [ 24] [ 9] [ 25] [ 10] [ 26] [ 11] [ 27] [ 12] [ 28] [ 13] [ 29] [ 14] [ 30] [ 15] [ 31]
- Detection with hwloc:
Hardware topology: Full, with devices
  Sockets, cores, and logical processors:
    Socket 0: [ 0 16] [ 1 17] [ 2 18] [ 3 19] [ 4 20] [ 5 21] [ 6 22] [ 7 23] [ 8 24] [ 9 25] [ 10 26] [ 11 27] [ 12 28] [ 13 29] [ 14 30] [ 15 31]
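To make the effect concrete, here is a minimal sketch of how the pinning stride interacts with the two layouts above. It is not mdrun's actual pinning code: the helper names (`nativeOrder`, `pinnedProcessors`, `distinctCores`) are hypothetical, and the mapping from logical processor to physical core simply assumes the hwloc grouping shown above (siblings `i` and `i + 16`).

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Logical processor IDs in the order reported by the native detection
// above: 0, 16, 1, 17, ..., 15, 31.
std::vector<int> nativeOrder()
{
    std::vector<int> order;
    for (int core = 0; core < 16; ++core)
    {
        order.push_back(core);
        order.push_back(core + 16);
    }
    return order;
}

// Hypothetical helper: thread t is pinned to the entry at slot t * stride
// of the layout; threads that would fall past the end are dropped.
std::vector<int> pinnedProcessors(const std::vector<int>& order, int numThreads, int stride)
{
    std::vector<int> pinned;
    for (int t = 0; t < numThreads; ++t)
    {
        const std::size_t slot = static_cast<std::size_t>(t) * static_cast<std::size_t>(stride);
        if (slot >= order.size())
        {
            break;
        }
        pinned.push_back(order[slot]);
    }
    return pinned;
}

// Count distinct physical cores, assuming the hwloc grouping [i, i + 16].
std::size_t distinctCores(const std::vector<int>& pinned)
{
    std::set<int> cores;
    for (int id : pinned)
    {
        cores.insert(id % 16);
    }
    return cores.size();
}
```

Under these assumptions, 16 threads with stride 1 land on logical processors 0, 16, 1, 17, ..., 7, 23 and so occupy only 8 of the 16 cores, whereas stride 2 selects 0, 1, ..., 15 and occupies all 16.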
#1 Updated by Szilárd Páll about 1 month ago
Having briefly looked at CpuInfo::detect(), I'm not sure whether this is a technical limitation of cpuid on AMD. If it is not possible (or is hard) to correct, I suggest we assume that, when the logical processor indexing suggests SMT is on, we switch to stride 2 as we do on Intel. IIUC, this should be safe: as long as the sibling indices are offset by a #cores stride, the kernel would have to be configured in a very strange manner for the assumption not to be correct.
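One possible reading of this suggestion, as a rough sketch only: treat the layout as 2-way SMT when consecutive entries alternate between `i` and `i + #cores`, and pick stride 2 when the threads fit one per core. The function names `indexingSuggestsSmt2` and `defaultStride` are hypothetical and do not exist in GROMACS; the check is an assumption about what "indexing suggests SMT is on" could mean, not the proposed implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical check: does the layout alternate i, i + N/2, as in the
// native detection output for this machine? That pattern suggests 2-way
// SMT with sibling hardware threads offset by the core count.
bool indexingSuggestsSmt2(const std::vector<int>& order)
{
    const std::size_t n = order.size();
    if (n < 2 || n % 2 != 0)
    {
        return false;
    }
    const int numCores = static_cast<int>(n / 2);
    for (std::size_t i = 0; i + 1 < n; i += 2)
    {
        if (order[i + 1] != order[i] + numCores)
        {
            return false;
        }
    }
    return true;
}

// Hypothetical default: stride 2 when SMT is inferred and the requested
// threads fit one per core; otherwise keep stride 1.
int defaultStride(const std::vector<int>& order, int numThreads)
{
    const int numCores = static_cast<int>(order.size() / 2);
    return (indexingSuggestsSmt2(order) && numThreads <= numCores) ? 2 : 1;
}
```

For the 32-entry layout 0, 16, 1, 17, ..., 15, 31 this sketch would infer SMT and return stride 2 for up to 16 threads; a layout such as 0, 1, 2, 3 does not match the pattern and keeps stride 1.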
#7 Updated by Erik Lindahl 20 days ago
- Tracker changed from Bug to Feature
- Affected version - extra info deleted (
- Affected version deleted (
Well, it's not technically a bug, since the hardware info module properly detects that we can't see it and correctly specifies that only basic topology information is available. It would be nice to have it for this type of hardware too, but that's a new feature, and I'm not sure how easy it is to implement (it depends on whether it can be extracted from cpuid).
Overall, isn't the best solution simply to recommend that people use hwloc? I'm skeptical about assuming we have SMT in cases where it has not been properly detected, because such assumptions have historically come back and bitten us hard.
#8 Updated by Szilárd Páll 15 days ago
- Tracker changed from Feature to Bug
- Subject changed from SMT info AMD Zen lacking with native hardware detection to inconsistent pinning behavior due to missing SMT info on AMD Zen
- Affected version set to 2018
This is not a feature request. The described issue leads to inconsistent behavior -- both between the hwloc and no-hwloc build on the same machine and between different x86 platforms (in that in this single case out of four use-cases a different set of threads will be used by default). There is no practical difference between Intel HT and AMD SMT, so we should not implement entirely different default behavior unless something makes it impossible to do the same detection we do on Intel -- that's why I asked whether the Intel-specific cpuid can be extended.
Suggesting that people use hwloc won't solve the inconsistency either (though removing the Intel-only topology detection and related assumptions would perhaps improve things a bit).