Task #3061

support for Zen2

Added by Szilárd Páll about 1 month ago. Updated 18 days ago.

Status: Blocked, need info
Priority: Normal
Assignee: -
Category: -
Target version:
Difficulty: uncategorized

Description

TODO:
  • cmake detection needs tweaks to default to (likely ideal) AVX2_256 on Zen2 (SIMD flag + Vendor alone won't be enough, use CPU Family?)
  • tweak runtime detection to default to AVX2_256 as the ideal setting on Zen2
  • perf benchmark
  • backport to 2019.4

Associated revisions

Revision 03c59bcb (diff)
Added by Erik Lindahl about 1 month ago

Default to AVX2_256 SIMD for Zen2

From Zen2, we should no longer use the previous
hack with 128-bit AVX2 since the microarchitecture
can now execute two full-width AVX2 instructions
per cycle. Rather than specializing for Zen2, the
logic has been changed so we only apply the 128-bit
optimization for the chips where we know it helps
(Zen and Zen+, based on the model numbers), while
we default to full-width AVX2 for all other AMD
CPUs - which for now is only Zen2.

Fixes #3061.

Change-Id: I66017b200cd627bb9792f53ee39dd80d8e05965a
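
The rule described in this commit message can be sketched roughly as below. This is an illustration only, not the code from the revision: the function name `default_avx2_width` is hypothetical, and the family and model numbers (family 23; models 1 and 17 for Zen, 8 and 24 for Zen+) are assumptions that should be checked against the actual change.

```python
# Hypothetical sketch of the selection rule; not the GROMACS implementation.
# Assumption: Zen and Zen+ report CPU family 23 with the model numbers below.
ZEN_FAMILY = 23
ZEN_AND_ZEN_PLUS_MODELS = frozenset({1, 17, 8, 24})  # assumed: Zen (1, 17), Zen+ (8, 24)


def default_avx2_width(family: int, model: int) -> str:
    """Pick the default SIMD setting for an AVX2-capable AMD CPU.

    The 128-bit setting is used only for chips on the known list;
    every other AMD CPU falls through to full-width AVX2.
    """
    if family == ZEN_FAMILY and model in ZEN_AND_ZEN_PLUS_MODELS:
        return "AVX2_128"
    return "AVX2_256"
```

Listing the chips that benefit from the 128-bit setting, rather than the ones that do not, means future AMD models get full-width AVX2 without another code change.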

History

#1 Updated by Szilárd Páll about 1 month ago

Szilárd Páll wrote:

  • cmake detection needs tweaks to default to (likely ideal) AVX2_256 on Zen2 (SIMD flag + Vendor alone won't be enough, use CPU Family?)

Three new instructions should make it possible to distinguish Zen/Zen+ from Zen2: CLWB, WBNOINVD, and RDPID. However, when running virtualized (e.g. in the cloud), some flags will not be passed through, so I'm not sure this is the most robust solution. It would be better to rely on the CPU model/stepping, but I guess doing so in cmake is a pain.
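
A flag-based check along these lines could look like the sketch below. It assumes Linux-style /proc/cpuinfo text in which the three instructions appear as the lower-case flags `clwb`, `wbnoinvd`, and `rdpid`; the function name `has_new_zen2_flags` is hypothetical. As noted above, a hypervisor may hide some of these flags, so a negative result does not prove the CPU is Zen/Zen+.

```python
# Hypothetical sketch; assumes Linux /proc/cpuinfo formatting and flag names.
NEW_FLAGS = frozenset({"clwb", "wbnoinvd", "rdpid"})


def has_new_zen2_flags(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line lists all three new flags."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return NEW_FLAGS <= set(value.split())
    return False  # no 'flags' line found
```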

#2 Updated by Szilárd Páll about 1 month ago

Here's the /proc/cpuinfo of the new Zen2:
https://www.spec.org/cpu2017/results/res2019q3/cpu2017-20190723-16385.html
While the Zen1 looks like this:
https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171211-01539.html

Seems like model >=49 may be the right check?
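
For illustration, the proposed model check could be applied to /proc/cpuinfo text as sketched below. The helper names are hypothetical, the vendor string `AuthenticAMD` and family 23 are assumptions not stated in this comment, and only the `model >= 49` threshold comes from the suggestion above.

```python
# Hypothetical sketch; assumes Linux /proc/cpuinfo formatting.
def parse_cpu_ids(cpuinfo_text: str):
    """Return (vendor_id, cpu family, model) from the first processor entry."""
    wanted = ("vendor_id", "cpu family", "model")
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Exact key match keeps 'model name' from being mistaken for 'model'.
        if sep and key in wanted and key not in fields:
            fields[key] = value.strip()
    # Raises KeyError if any of the three fields is missing.
    return fields["vendor_id"], int(fields["cpu family"]), int(fields["model"])


def passes_model_check(cpuinfo_text: str) -> bool:
    """Apply the suggested 'model >= 49' test to an AMD family 23 CPU."""
    vendor, family, model = parse_cpu_ids(cpuinfo_text)
    return vendor == "AuthenticAMD" and family == 23 and model >= 49
```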

#3 Updated by Anonymous 30 days ago

  • Status changed from New to Resolved

#4 Updated by Szilárd Páll 25 days ago

  • Status changed from Resolved to Blocked, need info

We still need to have the kernel layout (4xn vs 2xnn) and the Ewald correction treatment re-assessed. The latter certainly needs to be revised; the former will likely require more thought (preliminary data shows that with 1 thread/core 2xnn is faster in most cases, while 4xn is always faster with 2 threads/core).

#5 Updated by Szilárd Páll 18 days ago

Szilárd Páll wrote:

We still need to have the kernel layout (4xn vs 2xnn) and the Ewald correction treatment re-assessed. The latter certainly needs to be revised; the former will likely require more thought (preliminary data shows that with 1 thread/core 2xnn is faster in most cases, while 4xn is always faster with 2 threads/core).

Correction: looks like 4xn should work across the board on Zen2.
