Remove the use of nbat->alloc/free pointers
To use page-locked memory with CUDA, the nbat->alloc/free function pointers
are currently used. With the required CUDA version bumped to 4.0, we can now
(unconditionally) switch to page-locking ordinary host allocations with
cudaHostRegister() instead of calling the CUDA API to allocate.
The solution should page-lock all host memory used in transfers to or from a GPU (when GPUs are in use).
The use of the write-combined flag could be considered again.
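A minimal sketch of the register-instead-of-allocate approach described above, not a proposed patch. The helper names (round_up_to_page, pinned_alloc, pinned_free) and the HAVE_CUDA build macro are hypothetical and only serve the illustration; cudaHostRegister(), cudaHostUnregister() and cudaHostRegisterDefault are the actual CUDA runtime names. Page-aligning the buffer and rounding its size up to whole pages is a conservative assumption, since early runtimes that provide cudaHostRegister() appear to require it. The sketch assumes a POSIX host (posix_memalign, sysconf).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign */
#endif
#include <stdlib.h>
#include <string.h>
#include <unistd.h>               /* sysconf (POSIX only) */

#ifdef HAVE_CUDA                  /* hypothetical build macro */
#include <cuda_runtime.h>
#endif

/* Round nbytes up to a whole number of pages of size page. */
static size_t round_up_to_page(size_t nbytes, size_t page)
{
    return (nbytes + page - 1) / page * page;
}

/* Allocate page-aligned host memory with the regular allocator and,
 * when built with CUDA, page-lock it afterwards.
 * Returns 0 on success; on failure returns -1 and sets *ptr to NULL.
 * A request for 0 bytes succeeds and yields *ptr == NULL. */
static int pinned_alloc(void **ptr, size_t nbytes)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = round_up_to_page(nbytes, page);

    *ptr = NULL;
    if (nbytes == 0)
    {
        return 0;
    }
    if (posix_memalign(ptr, page, size) != 0)
    {
        *ptr = NULL;
        return -1;
    }
#ifdef HAVE_CUDA
    /* Page-lock the existing allocation instead of allocating via CUDA. */
    if (cudaHostRegister(*ptr, size, cudaHostRegisterDefault) != cudaSuccess)
    {
        free(*ptr);
        *ptr = NULL;
        return -1;
    }
#endif
    return 0;
}

/* Undo pinned_alloc: unregister first (CUDA builds only), then free. */
static void pinned_free(void *ptr)
{
    if (ptr == NULL)
    {
        return;
    }
#ifdef HAVE_CUDA
    cudaHostUnregister(ptr);
#endif
    free(ptr);
}
```

With helpers of this kind the allocation path is the same with and without GPUs, so callers would no longer need per-structure alloc/free function pointers; only the registration step depends on the build.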