SIMD version of the free-energy kernel
The slow, serial free-energy non-bonded kernel is a serious bottleneck in free-energy simulations. Using SIMD should give a significant speed-up of this compute-intensive kernel. Since the nbnxm non-bonded scheme used for the normal non-bonded interactions is not very beneficial here, the plan is to use simple vectorization over the j-particles. This leads to many SIMD gather loads and a few scatter force writes, but hopefully most of that cost can be hidden by arithmetic. The kernel function should be templated on the real/SIMD type, so that there is only a single code path for both plain-C and SIMD.
#1 Updated by Erik Lindahl about 2 months ago
Although the NxM layout might not give us extra performance, I think we should consider just using it to simplify code and organization.
That way we would avoid introducing even more types of neighbour-list layouts, and we could make the free-energy versions just another (templated) flavour of the default kernels, called through a new common class interface where we also get the free energy. We would also avoid yet another part of the code with extensive use of SIMD in a way that's different from the default kernels.
In any case: I would strongly suggest we postpone this until we have completed the work of creating separate modules for neighbour lists, neighbour searching, and short-range calculation, so we don't add even more complexity.