Feature #1953

use more regular polymorphism for GPU code

Added by Mark Abraham over 3 years ago. Updated over 3 years ago.

Status:
New
Priority:
Low
Assignee:
-
Category:
mdrun
Target version:
Difficulty:
uncategorized

Description

Recent changes to GPU profiling interfaces have highlighted that home-grown macro APIs are easy to misuse, and require good compilation coverage to avoid wasting developer time. Having an OpenCL config will help, but another thing that would help is to move towards using real polymorphism, and let every compiler compile all three "GPU" implementations on every configuration.

This particular case seems quite simple, because we only ever support compiling for one flavour of GPU at a time. We make a class gmx::IGpuProfiler with the obvious handful of pure virtual methods, which will have three concrete (perhaps header-only) implementations that defer the function call to real CUDA/OpenCL/no code, respectively. So long as whoever owns the gpuProfiler object calls its methods through the actual type (which we manage via defines in config.h like other configurability), rather than a pointer to either base or derived class, then devirtualization is trivial. The compiler knows the concrete type, and that no other type is possible - see http://hubicka.blogspot.se/2014/01/devirtualization-in-c-part-1.html.

This way, the compiler compiles the abstract base class, and all three derived classes (checking all of them for syntax, style and Doxygen), but only one of them per build configuration is actually called. So those calls compile down to the same direct function call we have now. AFAIK the linker doesn't care when some code that can't be called calls functions that aren't defined anywhere in this build configuration. One still can't catch all problems, but e.g. adding a new parameter to one of the virtual methods for use with CUDA, and not updating the OpenCL code will lead to at least an unused-parameter warning on all build types, not just an OpenCL one. And we avoid developers having to write, learn, use and mis-use unfamiliar one-off constructs like CUDA_FUNC_TERM.

The only catch I have thought of is that e.g. the compiler for a CUDA build has to see the function calls to OpenCL code, which have to be declared in some header, and that header has to compile even though #include <CL/cl.h> etc. generally shouldn't/won't refer to a real file. So we might need some fakery which e.g. we already do so that MPI_Comm is always a valid type, even in a no-MPI build.

I think the same goes for the rest of the GPU code - we just need the caller to know the actual derived type in this build config, which seems straightforward. Obviously the CPU-only code is completely unaffected by any of this, because the per-MD-step conditionals that check whether this is a GPU run are evaluated before we make any call to a GPU API.

History

#1 Updated by Teemu Murtola over 3 years ago

Mark Abraham wrote:

So long as whoever owns the gpuProfiler object calls its methods through the actual type (which we manage via defines in config.h like other configurability), rather than a pointer to either base or derived class, then devirtualization is trivial.

"Trivial" is a strong statement. To actually make this work even theoretically, you either need C++11 final (which is problematic if the code is ever visible to CUDA, until we bump up the required version somewhat more), or you can never pass around any references or pointers, just copies of the actual objects.

This way, the compiler compiles the abstract base class, and all three derived classes (checking all of them for syntax, style and Doxygen), but only one of them per build configuration is actually called. So those calls compile down to the same direct function call we have now. AFAIK the linker doesn't care when some code that can't be called calls functions that aren't defined anywhere in this build configuration.

This is not true. Everything that the linker sees needs to link properly. The only possible exception is with static linking with .a files (and even with .a files, the linker only drops complete object files if none of the symbols defined there are referenced; I don't know whether this happens before or after actually doing something else with the contents).

The only catch I have thought of is that e.g. the compiler for a CUDA build has to see the function calls to OpenCL code, which have to be declared in some header, and that header has to compile even though #include <CL/cl.h> etc. generally shouldn't/won't refer to a real file. So we might need some fakery which e.g. we already do so that MPI_Comm is always a valid type, even in a no-MPI build.

I think a realistic way of doing this is to have the functions with #ifdef'd contents at the point where they need to call into CUDA/OpenCL code. Whether this is in a header or a source file depends a bit how we want to organize things, but it should not be difficult to keep any CUDA/OpenCL specific types outside these common headers.

#2 Updated by Mark Abraham over 3 years ago

Teemu Murtola wrote:

Mark Abraham wrote:

So long as whoever owns the gpuProfiler object calls its methods through the actual type (which we manage via defines in config.h like other configurability), rather than a pointer to either base or derived class, then devirtualization is trivial.

"Trivial" is a strong statement. To actually make this work even theoretically, you either need C++11 final (which is problematic if the code is ever visible to CUDA, until we bump up the required version somewhat more), or you can never pass around any references or pointers, just copies of the actual objects.

Calling a method through a concrete type uses a known vtable pointer, so a known method, as discussed in the article I linked. As noted there, final is indeed useful for declaring that the vtable pointer is known when calling through a pointer, but that isn't the case to which I was referring. For this kind of usage (managing access to hardware resources), it might even be desirable for there to be no way for a non-owner to get a handle to it. Why shouldn't an NbnxnManager keep sole responsibility for being able to call methods of the GpuProfiler with which it was configured? (And even if not, a handful of indirect function calls per MD step are a small delta on the few percent of the run time that we spend running single-threaded highly-branched high-level control code.)

This way, the compiler compiles the abstract base class, and all three derived classes (checking all of them for syntax, style and Doxygen), but only one of them per build configuration is actually called. So those calls compile down to the same direct function call we have now. AFAIK the linker doesn't care when some code that can't be called calls functions that aren't defined anywhere in this build configuration.

This is not true. Everything that the linker sees needs to link properly. The only possible exception is with static linking with .a files (and even with .a files, the linker only drops complete object files if none of the symbols defined there are referenced; I don't know whether this happens before or after actually doing something else with the contents).

OK, thanks. Using ifdef inside the function bodies is also an option.

The only catch I have thought of is that e.g. the compiler for a CUDA build has to see the function calls to OpenCL code, which have to be declared in some header, and that header has to compile even though #include <CL/cl.h> etc. generally shouldn't/won't refer to a real file. So we might need some fakery which e.g. we already do so that MPI_Comm is always a valid type, even in a no-MPI build.

I think a realistic way of doing this is to have the functions with #ifdef'd contents at the point where they need to call into CUDA/OpenCL code. Whether this is in a header or a source file depends a bit how we want to organize things, but it should not be difficult to keep any CUDA/OpenCL specific types outside these common headers.

Yeah. Faking a couple of types isn't too bad if we have to, but we don't want functions that are conditionally-defined stubs (which are problematic now).
