We're collaborating with http://streamcomputing.eu/ to accelerate mdrun using OpenCL. The idea here is to permit GROMACS acceleration on a wider range of hardware, provide a flexible framework for future devices that may appear, and give streamcomputing.eu a high-impact project they can use to demonstrate their abilities without NDA restrictions.
We are continuing to work with Nvidia to provide high-quality CUDA-based support for their hardware, and work on native support for Intel's accelerators will also continue.
The plan is to demonstrate the first stage(s) at SC'14 in November, and do a special OpenCL feature release of GROMACS at that time. Work is ongoing at streamcomputing.eu's github (https://github.com/StreamComputing/gromacs/tree/asc). Anca Hamuraru is the technical lead for the project, with Vincent Hindriksen and Dimitris Karkoulis also contributing.
In the long term, keeping the OpenCL port working correctly can be assured via the GROMACS Jenkins CI testing, but the wider GROMACS community will need to contribute optimizations for devices of interest. The port will be merged into the GROMACS master branch periodically, and be part of subsequent releases under the usual GROMACS license (LGPL v2.1).
Stage 1 is to port to Nvidia's OpenCL 1.1 compiler and devices, so that we can demonstrate correctness of the infrastructure and re-use some of the existing design that already performs well on Nvidia hardware. Not all Nvidia hardware features are supported by their OpenCL compiler, so this will likely be more of a learning and preparatory exercise than a performance achievement.
In subsequent stages, support for Intel CPUs and AMD GPUs is planned. (We might try to interest the AMD-based Top500 machine at KAUST in a demonstration calculation, as KAUST does have GROMACS users already.) Perhaps porting the whole PME stack (spread-3DFFT-gather) will be an option, but the current implementation of all of that is CPU only, and the current hardware makes any such implementation only interesting on a single node.
I suggest we use this issue for high-level discussion and planning. Redmine supports linking issues in a parent-child way, so if there's an issue needing discussion, please decide whether it should go here or be a child (technical) issue.
#2 Updated by Mark Abraham over 5 years ago
We're in the final stages of preparing code for upload to Gerrit for GROMACS review. This stage will support OpenCL 1.1 on AMD (and Nvidia) devices. There are no plans to support a combined CUDA+OpenCL build (complex to code, complex to balance load, probably limited market). Future work post-5.1 is planned, e.g. OpenCL 2.0. My current plan is to separate development into two patches:
- one patch that relaxes the current Gromacs assumption that "CUDA" and "GPU" are synonymous, and sets up a generic #include interface that has three particular mutually-exclusive implementations (CUDA, OpenCL, neither). (In the longer term, that might make more sense as a set of classes, but that wasn't reasonable when the project started.) I might roll in some relaxation of the assumption that only short-ranged kernels will run on the GPU, because that might be changing soon also.
- then another patch that introduces the OpenCL implementation of the interface, plus supporting machinery and docs
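To make the first patch's idea concrete, here is a minimal sketch of a generic interface with three mutually exclusive implementations selected at build time. It is not the actual GROMACS code: the macros SKETCH_USE_CUDA and SKETCH_USE_OPENCL and the functions gpuBackendName() and gpuBackendAvailable() are hypothetical names invented for illustration, and the back-end bodies are placeholders.

```cpp
#include <string>

// Hypothetical build-time switches (not GROMACS macros): the build system
// would define at most one of SKETCH_USE_CUDA or SKETCH_USE_OPENCL.
#if defined(SKETCH_USE_CUDA) && defined(SKETCH_USE_OPENCL)
#error "CUDA and OpenCL back ends are mutually exclusive in this sketch"
#endif

// Generic interface: callers only ever see these declarations and never
// need to know which back end, if any, was compiled in.
std::string gpuBackendName();
bool gpuBackendAvailable();

#if defined(SKETCH_USE_CUDA)
// Implementation 1: CUDA build (placeholder bodies).
std::string gpuBackendName() { return "CUDA"; }
bool gpuBackendAvailable() { return true; }
#elif defined(SKETCH_USE_OPENCL)
// Implementation 2: OpenCL build (placeholder bodies).
std::string gpuBackendName() { return "OpenCL"; }
bool gpuBackendAvailable() { return true; }
#else
// Implementation 3: neither; stubs keep CPU-only builds compiling.
std::string gpuBackendName() { return "none"; }
bool gpuBackendAvailable() { return false; }
#endif
```

Compiled without either macro, the stubs are selected, so calling code can test gpuBackendAvailable() without any back-end-specific #ifdef of its own. Defining both macros stops the build, which mirrors the decision not to support a combined CUDA+OpenCL build.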
#3 Updated by Mark Abraham over 5 years ago
- Target version changed from 5.x to 5.1
Patches are now live in Gerrit, see https://gerrit.gromacs.org/#/q/status:open+project:gromacs+branch:master+topic:opencl