In https://gerrit.gromacs.org/#/c/7837/, TODOs were noted that we should consider
- lazy pre-compilation of FFT kernels for PME running on OpenCL
- thread-safe RAII-style management of (at least) the underlying clFFT library setup and teardown
Added the bundled clFFT into OpenCL builds
Used an object library, since we have no need to build or install a
real library, whether shared or static. Checked for the
availability of dynamic loading, and made it available portably to
Added a clFFT initialization class and used it in mdrunner to
initialize/tear down clFFT library resources in a thread-safe
manner, and only on ranks that require such setup. Noted TODOs
for future work.
Noted a useful style for explicit listing of source files.
Move initialization of clFFT
Gave ClfftInitializer the responsibility for mutual exclusion, which
means the initialization is now convenient to do alongside other
PME-on-GPU initialization tasks. This simplifies the code.
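
The idea of an initializer object that owns the mutual exclusion can be illustrated with a minimal sketch. It is not the GROMACS implementation: the class name LibraryInitializer, the reference-counting scheme, and the librarySetup()/libraryTeardown() functions are hypothetical stand-ins, assumed here only to show how RAII plus a mutex could make library setup and teardown safe when several threads create such objects.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the library's global setup and teardown calls.
// A real build would call into the FFT library here; these only count calls.
static int  g_setupCalls    = 0;
static int  g_teardownCalls = 0;
static void librarySetup() { ++g_setupCalls; }
static void libraryTeardown() { ++g_teardownCalls; }

// RAII guard (assumed design): the first live instance sets the library up,
// and the last instance to be destroyed tears it down. A mutex protects the
// shared instance count, so callers need no locking of their own.
class LibraryInitializer
{
public:
    LibraryInitializer()
    {
        std::lock_guard<std::mutex> guard(mutex_);
        if (count_++ == 0)
        {
            librarySetup();
        }
    }
    ~LibraryInitializer()
    {
        std::lock_guard<std::mutex> guard(mutex_);
        if (--count_ == 0)
        {
            libraryTeardown();
        }
    }
    // Copying would bypass the instance count, so it is disabled.
    LibraryInitializer(const LibraryInitializer&) = delete;
    LibraryInitializer& operator=(const LibraryInitializer&) = delete;

private:
    static std::mutex mutex_;
    static int        count_;
};

std::mutex LibraryInitializer::mutex_;
int        LibraryInitializer::count_ = 0;
```

Because the locking lives inside the class, a caller can simply construct one of these objects next to its other GPU setup code and let scope exit handle the teardown.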
Removed mention of lazy initialization, which was not implemented at
#1 Updated by Mark Abraham about 2 years ago
The compilation of such kernels does depend on the FFT grid size, which may not be known until after the PME module is set up (because a user may have used
fourierspacing), and which might also be subject to auto-tuning. Not needing to re-compile if auto-tuning returns to a previous grid size would be an advantage, but that is a separate consideration from pre-compilation.
#2 Updated by Aleksei Iupinov about 2 years ago
clFFT clearly has some mentions of caching in its code. I would think that properly storing all grid-related data of PME (such as FFT plan instances) in a map, using grid dimensions as a key, instead of deleting all the old stuff on each reinit, would already achieve this. One would still have to watch out for resource exhaustion, of course, and try to delete old stuff if really needed.
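
A rough sketch of that map-based approach follows. Everything in it is an assumption made for illustration: GridData stands in for whatever per-grid state is expensive to rebuild (such as an FFT plan), buildGridData() is a hypothetical factory, and the least-recently-used eviction with a fixed entry limit (assumed to be at least 1) is only one possible way to guard against resource exhaustion.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <memory>

using GridDims = std::array<int, 3>;

// Hypothetical placeholder for per-grid state that is expensive to rebuild.
struct GridData
{
    GridDims dims;
};

// Hypothetical factory; counts how often the expensive build happens.
static int g_buildCount = 0;
static std::shared_ptr<GridData> buildGridData(const GridDims& dims)
{
    ++g_buildCount;
    return std::make_shared<GridData>(GridData{ dims });
}

// Cache keyed by grid dimensions, with least-recently-used eviction.
class GridDataCache
{
public:
    explicit GridDataCache(std::size_t maxEntries) : maxEntries_(maxEntries) {}

    std::shared_ptr<GridData> get(const GridDims& dims)
    {
        auto it = entries_.find(dims);
        if (it != entries_.end())
        {
            // Cache hit: mark this grid as the most recently used.
            order_.splice(order_.begin(), order_, it->second.position);
            return it->second.data;
        }
        if (entries_.size() >= maxEntries_ && !order_.empty())
        {
            // Cache full: drop the least recently used grid.
            entries_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(dims);
        Entry entry{ buildGridData(dims), order_.begin() };
        entries_.emplace(dims, entry);
        return entry.data;
    }

private:
    struct Entry
    {
        std::shared_ptr<GridData>     data;
        std::list<GridDims>::iterator position;
    };
    std::size_t                maxEntries_;
    std::list<GridDims>        order_;   // most recently used grid first
    std::map<GridDims, Entry>  entries_;
};
```

With such a cache, a tuner that returns to a grid size it has already tried would get the stored data back from get() instead of triggering another build, as long as that entry has not been evicted in the meantime.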