SIMD algorithms for ARM SVE // nonbonded cluster and others
The ARM SVE instruction set presents both challenges and opportunities for expressing the nonbonded algorithm. A simple engineering approach (i.e. coding something in a day or two) is unlikely to be useful for tackling these. Therefore, the aim of this change is to discuss ideas and track progress on what we can / will do to express MD kernels, and in particular the cluster nonbonded algorithm, for the SVE ISA (note that the RISC-V SIMD ISA will in fact be quite similar too).

Challenges [RFC]:
- ARM compilers will not provide intrinsics; everything will have to be vectorized with flexible width. For some algorithms this will be fine (e.g. SETTLE or bondeds), but for the nonbondeds, where the SIMD width is part of the way the algorithm is expressed, it may be challenging to write truly SIMD-width-agnostic code.
- For the above reason we may have to assume (and possibly initially compile for) a certain SIMD width, whether or not that width can be detected; e.g. we might assume that if we are building for a Fujitsu SVE chip, we are targeting 512-bit-wide execution in the nonbondeds.
- Predication might offer significant benefits. E.g. in the nonbondeds we might be able to improve performance (and/or reduce energy/interaction) if we can make use of predication to skip computing interactions (either on a cluster or cut-off check level).
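
To make the width-agnostic and predication points above more concrete, here is a minimal sketch in plain, portable C++. It does not use SVE instructions or any GROMACS code; it only emulates the ideas in scalar form. The names `KernelResult` and `pairSumPredicated`, the toy 1/r^2 pair term, and the `width` parameter (a stand-in for a vector length that is only known at run time) are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch (not a GROMACS type).
struct KernelResult
{
    double      sum;           // accumulated toy pair term
    std::size_t chunksSkipped; // chunks in which no lane passed the cut-off check
};

// Hypothetical helper: sums a toy 1/r^2 term over all squared distances
// below cutoff2, processing the input in chunks of `width` lanes.
// `width` emulates a vector length that is not known at compile time.
KernelResult pairSumPredicated(const std::vector<double>& r2,
                               double                     cutoff2,
                               std::size_t                width)
{
    assert(width > 0);
    KernelResult result{ 0.0, 0 };

    for (std::size_t base = 0; base < r2.size(); base += width)
    {
        // Lanes past the end of the data are treated as inactive, so the
        // last partial chunk needs no separate scalar tail loop.
        const std::size_t active = std::min(width, r2.size() - base);

        // Cut-off check: does any active lane fall within the cut-off?
        bool anyInRange = false;
        for (std::size_t lane = 0; lane < active; ++lane)
        {
            anyInRange = anyInRange || (r2[base + lane] < cutoff2);
        }
        if (!anyInRange)
        {
            // Emulates skipping the whole chunk when the predicate is empty.
            ++result.chunksSkipped;
            continue;
        }

        // Masked accumulation: only lanes that pass the check contribute.
        for (std::size_t lane = 0; lane < active; ++lane)
        {
            const double d2 = r2[base + lane];
            if (d2 < cutoff2)
            {
                result.sum += 1.0 / d2;
            }
        }
    }
    return result;
}
```

Calling the helper on the same data with two different `width` values should give the same sum, while the number of chunks that can be skipped changes with the width. In this toy model, that is the sense in which the SIMD width interacts with how the nonbonded work is laid out, even when the loop itself is written without a fixed width.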