double precision support in GPU acceleration
1. some GPUs have very low double precision arithmetic throughput (e.g. Kepler1, at 1/24 of single precision throughput);
2. there are no native double precision atomic operations (they can be emulated with CAS, see more here and here).
While 1. would only require warning the user that double precision is extremely slow (and perhaps switching to CPU kernels by default), 2. would require generating an extra set of kernels as well as some performance tuning.
Most of this is fairly straightforward, but for now it is not considered a high priority.
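As a rough illustration of the CAS emulation mentioned in item 2, here is a minimal host-side C++ sketch of the retry loop. It is not the project's kernel code: `atomic_add_double`, `double_to_bits` and `bits_to_double` are hypothetical names, and the accumulator is assumed to be stored as a 64-bit integer holding the bit pattern of a double. A GPU kernel would follow the same pattern using the device's 64-bit integer compare-and-swap instead of `std::atomic`.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a double as a 64-bit integer (no value conversion).
static std::uint64_t double_to_bits(double v)
{
    std::uint64_t b;
    std::memcpy(&b, &v, sizeof b);
    return b;
}

// Reinterpret a 64-bit integer as the double with the same bit pattern.
static double bits_to_double(std::uint64_t b)
{
    double v;
    std::memcpy(&v, &b, sizeof v);
    return v;
}

// Hypothetical helper: atomically adds `val` to the double whose bits are
// stored in `*target`, using only a 64-bit integer compare-and-swap.
// Returns the value held before the addition.
double atomic_add_double(std::atomic<std::uint64_t>* target, double val)
{
    std::uint64_t assumed = target->load();
    std::uint64_t desired;
    do {
        // Compute the new sum from the value we believe is currently stored.
        desired = double_to_bits(bits_to_double(assumed) + val);
        // If another thread changed *target in the meantime, the exchange
        // fails, `assumed` is refreshed with the current bits, and we retry.
    } while (!target->compare_exchange_weak(assumed, desired));
    return bits_to_double(assumed);
}
```

The retry loop is where the cost comes from: under heavy contention each update may be attempted several times, which is one reason such kernels would likely need the performance tuning noted above.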