Task #1444

break up the nbnxn_cuda module into multiple compilation units

Added by Szilárd Páll over 6 years ago. Updated over 4 years ago.

core library
Target version:


Now that we have 120 kernels compiled for up to four different target architectures, the nbnxn_cuda module takes a very long time to build (1.5-2 min on a fast Intel CPU) and can become the bottleneck during compilation.

To split the module we may need to introduce intermediate wrapper functions because, AFAIK, the host-side launch call and the kernel need to be in the same compilation unit (need to double-check).
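A rough sketch of what such an intermediate wrapper could look like is given here. All names (`scale_kernel`, `launch_scale_kernel`) are hypothetical and not taken from the nbnxn_cuda module, and the kernel is a trivial stand-in. Whether the wrapper is strictly required is exactly the open point noted above; the sketch only shows the pattern in which the `<<<...>>>` launch stays next to the kernel definition while other source files call an ordinary host function.

```cuda
// Sketch only: scale_kernel and launch_scale_kernel are hypothetical names.
// In a split build, the "kernel unit" part and the "caller" part would live
// in separate files; they are shown together so the listing builds as one
// file with nvcc.
#include <cstdio>
#include <cuda_runtime.h>

// ---- kernel unit: the kernel and its host-side launch wrapper ----

// Trivial stand-in kernel: multiplies every element by a factor.
__global__ void scale_kernel(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        data[i] *= factor;
    }
}

// Ordinary host function. The kernel launch stays in the same compilation
// unit as the kernel; callers only need this function's declaration.
cudaError_t launch_scale_kernel(float* d_data, float factor, int n, cudaStream_t stream)
{
    const int threads = 128;
    const int blocks  = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads, 0, stream>>>(d_data, factor, n);
    return cudaGetLastError();
}

// ---- caller: would only see the declaration of launch_scale_kernel ----
int main()
{
    const int n = 1024;
    float     h_data[n];
    for (int i = 0; i < n; i++)
    {
        h_data[i] = 1.0f;
    }

    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess)
    {
        return 1;
    }
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch through the wrapper on the default stream.
    cudaError_t err = launch_scale_kernel(d_data, 2.0f, n, nullptr);

    // Blocking copy; also waits for the kernel on the default stream.
    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    const bool ok = (err == cudaSuccess) && (h_data[0] == 2.0f) && (h_data[n - 1] == 2.0f);
    printf("%s\n", ok ? "wrapper launch succeeded" : "wrapper launch failed");
    return ok ? 0 : 1;
}
```

With this layout each group of kernels could get its own source file and wrapper, so the groups can be compiled independently and in parallel.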

Associated revisions

Revision 61db73ad (diff)
Added by Szilárd Páll over 4 years ago

split NBNXN CUDA kernels into four compilation units

The CUDA nonbonded kernels are no longer included into,
but are built in four different compilation units (w/wo energy, w/wo
pruning) when this is supported/possible; since we only support CUDA >=v5.0,
the condition is that CC >=3.0 devices have to be targeted.

Note that with CC 2.x devices all current CUDA compilers including 7.0
generate incorrect kernel code (hence the criterion above).

Switching back to using a single compilation unit happens automatically
whenever nvcc-flags are auto-generated (as {sm,compute}_20 is added
by default).
Switching manually can be done using the

Fixes #1444

Change-Id: If4eeaa5b58a35c5cd59babd20ef1179c7f27782e


#1 Updated by Erik Lindahl about 6 years ago

  • Target version changed from 5.0 to 5.x

#2 Updated by Szilárd Páll about 6 years ago

  • Status changed from New to In Progress

Here is a brief update on this. I won't bore you with technical details, but I want to give an update so that if (miraculously) somebody is interested in helping, there is a bit more info here.

I've had the implementation ready for quite some time (it's in Gerrit as draft), but unfortunately, when the GPU module is split into multiple compilation units, kernel launches fail for no good reason. We have been investigating this with Jiri Kraus (NVIDIA), but unfortunately there is still no solution to the problem. I'm afraid this won't make it into 5.0 and we will have to deal with long compile times.

#3 Updated by Gerrit Code Review Bot about 5 years ago

Gerrit received a related patchset '19' for Issue #1444.
Uploader: Szilárd Páll
Change-Id: If4eeaa5b58a35c5cd59babd20ef1179c7f27782e
Gerrit URL:

#4 Updated by Szilárd Páll over 4 years ago

  • Status changed from In Progress to Resolved

#5 Updated by Szilárd Páll over 4 years ago

  • Status changed from Resolved to Closed
  • Target version changed from 5.x to 2016
