Task #2048

C++11: CUDA dependency on general headers

Added by Roland Schulz about 3 years ago. Updated about 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: core library
Target version: -
Difficulty: uncategorized

Description

Based on my understanding from the 8/31 call:
1) All non-kernel code is encouraged to use modern C++11 so that the code is as easy to understand as possible
2) It is up to the kernel code whether it uses C, C++98, or C++11. Kernel/host code which doesn't want to or can't use C++11 needs to use an interface layer and shouldn't include headers such as logger.h directly

This means that either:
a) CUDA compilation always has to use C++11 (this is only unofficially supported in CUDA 6.5, and requires CUDA 8.0 for ICC/xlC)
b) CUDA code should not include any of the general headers and should access all functions/data through an abstraction layer (if that layer is written in C, it could be shared with OpenCL, which needs it anyhow)
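
To make option b) concrete, here is a minimal sketch of what such an abstraction layer could look like for logging. All names (KernelLogHandle, kernelLogMessage, HostLogger) are hypothetical and are not taken from the GROMACS sources. The idea is that kernel-side translation units only ever see the opaque handle and the C-linkage function, while the C++11 logger stays on the host side.

```cpp
#include <string>
#include <vector>

// --- Part a kernel-side (C or C++98) file would see; hypothetical header ---
extern "C" {
struct KernelLogHandle; /* opaque to kernel code */
void kernelLogMessage(struct KernelLogHandle* handle, const char* message);
}

// --- Host-side C++11 implementation, never included by kernel code ---
// HostLogger is a stand-in for a real logger class, assumed for illustration.
class HostLogger
{
public:
    void write(const std::string& line) { lines_.push_back(line); }
    const std::vector<std::string>& lines() const { return lines_; }

private:
    std::vector<std::string> lines_;
};

// The handle only carries a pointer, so its layout is trivial.
struct KernelLogHandle
{
    HostLogger* logger;
};

// Forwards a C string from kernel-side code to the C++11 logger.
extern "C" void kernelLogMessage(KernelLogHandle* handle, const char* message)
{
    handle->logger->write(message);
}
```

Because the interface consists of a pointer and a C string, the same declarations could in principle be shared with an OpenCL host layer, which is the sharing the description hints at.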


Related issues

Related to GROMACS - Task #2053: refine notation in GPU code (New)
Related to GROMACS - Task #2454: OpenCL infrastructure improvements (Closed)

History

#1 Updated by Mark Abraham about 3 years ago

Agree 1 & 2.

b seems quite straightforward, and roughly speaking is what we already have. Note that any layer doesn't need to be literally "written in C" but for now the data layout needs to match what C would do (e.g. for OpenCL to work). The worst-case scenario would be things like

/* some header file included by kernel, host and high-level code */
struct nbnxn_sci_t  { /* POD things, like currently */ };
struct nbnxn_cj4_t  { /* POD things, like currently */ };
struct nbnxn_excl_t { /* POD things, like currently */ };

struct SimpleShortRangeStruct
{
    int                     nsci;        /* The number of i-super-clusters in the list */
    nbnxn_sci_t            *sci;         /* The i-super-cluster list                 */
    int                     ncj4;        /* The total number of 4*j clusters         */
    nbnxn_cj4_t            *cj4;         /* The 4*j cluster list, size ncj4          */
    int                     nexcl;       /* The count for excl                       */
    nbnxn_excl_t           *excl;        /* Atom interaction bits (non-exclusions)   */
};

/* call from sim_util.cpp */
    nbnxn_gpu_init_pairlist(nbv->gpu_nbv,
                            packSimpleStruct(fancyNeighborList),
                            SimpleShortRangeStruct
                             { list.superClusters.size(), list.superClusters.data(),
                               list.jClusters.size(),     list.jClusters.data(),
                               list.exclusionBits.size(), list.exclusionBits.data() },
                            eintLocal);
    // and of course this function is still implemented like the current code
...

This puts a design constraint on the search code: it must produce something readily convertible to a flat C array. That's what we do now, and I see no reason that would need to change. The corresponding CPU kernel code can have the size() and data() methods called a bit closer to the kernels (but perhaps not in the kernels, since we probably don't want to be compiling #include <vector> in hundreds of kernels). That is, even if it's legal to use C++11 constructs in the kernel, there can be practical reasons to avoid doing so, depending on how people feel about the trade-offs of developer time vs compilation time vs correctness vs run-time speed.
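
A compilable sketch of that conversion, reduced to one list, might look as follows. The types Sci, PairlistView and FancyPairlist and the functions makeView and sumRanges are hypothetical stand-ins, not the real nbnxn types. Note the explicit static_cast: size() returns std::size_t, and brace-initializing an int field from it would be a narrowing conversion in C++11.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical POD element, standing in for something like nbnxn_sci_t.
struct Sci
{
    int start;
    int end;
};

// Flat view with C-compatible layout; this is all the kernel side sees.
struct PairlistView
{
    int        nsci; /* number of entries in sci */
    const Sci* sci;  /* not owned, points into the host container */
};

static_assert(std::is_standard_layout<PairlistView>::value
                      && std::is_trivial<PairlistView>::value,
              "view must keep a C-compatible layout");

// High-level C++11 side: owns the data in a std::vector.
struct FancyPairlist
{
    std::vector<Sci> superClusters;
};

// Called close to, but outside, the kernels, so that kernel files
// do not need to include <vector>.
inline PairlistView makeView(const FancyPairlist& list)
{
    return PairlistView{ static_cast<int>(list.superClusters.size()),
                         list.superClusters.data() };
}

// Stand-in for kernel-side code that only uses the flat view.
inline int sumRanges(PairlistView view)
{
    int total = 0;
    for (int i = 0; i < view.nsci; ++i)
    {
        total += view.sci[i].end - view.sci[i].start;
    }
    return total;
}
```

The view is only valid while the owning vector is alive and not resized, so in this sketch it should be created right before the kernel call rather than stored.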

A common downside of lots of approaches is that in current compiler implementations, fields of structs lose things like alignment or restrict annotations.
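
One workaround that is sometimes used, shown here only as a sketch under assumptions, is to copy the struct fields into local pointers and put the annotation on those locals. The names Buffers, scaleBuffers and SKETCH_RESTRICT are hypothetical, and the restrict keyword spelling is a compiler extension in C++ (__restrict__ for GCC, Clang and nvcc, __restrict for MSVC). Whether a given compiler actually exploits the annotation on locals would need to be checked.

```cpp
// restrict is not standard C++; pick the extension spelling per compiler.
#if defined(_MSC_VER)
#    define SKETCH_RESTRICT __restrict
#else
#    define SKETCH_RESTRICT __restrict__
#endif

// Plain struct passed across the interface; its fields carry no
// restrict annotation.
struct Buffers
{
    float*       out;
    const float* in;
    int          n;
};

// Re-establish the no-aliasing promise on local copies of the pointers.
// The caller must guarantee that out and in do not overlap.
inline void scaleBuffers(const Buffers& b, float factor)
{
    float* SKETCH_RESTRICT       out = b.out;
    const float* SKETCH_RESTRICT in  = b.in;
    for (int i = 0; i < b.n; ++i)
    {
        out[i] = factor * in[i];
    }
}
```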

#2 Updated by Roland Schulz about 3 years ago

  • Description updated (diff)

#3 Updated by Mark Abraham over 1 year ago

  • Related to Task #2053: refine notation in GPU code added

#4 Updated by Mark Abraham over 1 year ago

  • Related to Task #2454: OpenCL infrastructure improvements added
