Task #2053

refine notation in GPU code

Added by Szilárd Páll about 3 years ago. Updated 2 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
mdrun
Target version:
2020
Difficulty:
simple

Description

It is often beneficial to know the memory space a pointer points to without having to inspect the declaration (or when a pointer is passed to a function, the original declaration). This applies both to kernel and host-device interface code. In the former, I propose allowing easy identification of:
- shared, global, and constant memory pointers in CUDA (and equivalents in OpenCL) using prefixes (e.g. s_, g_, c_), no prefix for register space;
- host and device pointers prefixed for clarity when the two are mixed in interface code
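
A minimal CUDA sketch of the proposed scheme (kernel and identifier names are purely illustrative):

```cuda
// Hypothetical kernel illustrating the proposed memory-space prefixes.
__constant__ float c_scale; // c_ : constant memory

__global__ void scaleKernel(const float* g_in, // g_ : global memory
                            float*       g_out)
{
    __shared__ float s_buf[128]; // s_ : shared memory

    int i = blockIdx.x * blockDim.x + threadIdx.x; // no prefix: register space

    s_buf[threadIdx.x] = g_in[i];
    __syncthreads();
    g_out[i] = s_buf[threadIdx.x] * c_scale;
}
```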


Related issues

Related to GROMACS - Feature #2054: PME on GPU (Accepted)
Related to GROMACS - Task #2048: C++11: CUDA dependency on general headers (New)

Associated revisions

Revision 56067ac8 (diff)
Added by Artem Zhmurov 8 months ago

GPU naming conventions

In GPU programming, it is convenient to indicate which memory space
a pointer points to. This is often done by adding prefixes to the
pointer names, which is now described in the developer manual.

Refs #2053.

Change-Id: Id39ad0b9c5876e4362fa4e261d0c011125dc380a

History

#1 Updated by Aleksei Iupinov about 3 years ago

I like the idea; the only confusing thing would be conditional compilation. Currently I have in my code:

#if PME_GPU_PARALLEL_SPLINE
__shared__
#endif
float data[dataSize * order];

What would I do with that in general case?
Granted, this is already slightly confusing code.

#2 Updated by Szilárd Páll about 3 years ago

Good point, I don't think there is a good way to solve that issue without introducing branching at the point where the pointer is used.

#3 Updated by Mark Abraham about 3 years ago

Szilárd Páll wrote:

It is often beneficial to know the memory space a pointer points to without having to inspect the declaration (or when a pointer is passed to a function, the original declaration). This applies both to kernel and host-device interface code. In the former, I propose allowing easy identification of:
- shared, global, and constant memory pointers in CUDA (and equivalents in OpenCL) using prefixes (e.g. s_, g_, c_), no prefix for register space;
- host and device pointers prefixed for clarity when the two are mixed in interface code

Sounds like a good approach. Note that we already have some conventions at http://jenkins.gromacs.org/job/Documentation_Nightly_master/javadoc/dev-manual/naming.html#id1 and these would clash somewhat with how we already use s_ and g_. Even though the words are the same, the meaning differs and I think this has the potential for creating confusion. How about simply sharedData, globalData, constantData for an identifier that is currently called data?

Aleksei Iupinov wrote:

I like the idea; the only confusing thing would be conditional compilation. Currently I have in my code:

#if PME_GPU_PARALLEL_SPLINE
__shared__
#endif
float data[dataSize * order];

Off-topic, but if you need to do this kind of thing a lot, see src/gromacs/utility/basedefinitions.h for examples of how you can define gmx_shared in some PME header, so that the places where you use it have less clutter. But mostly that's useful for things that have to change when configuration does (e.g. this compiler needs this keyword variation that another will reject).

What would I do with that in general case?

I suggest using

#if PME_GPU_PARALLEL_SPLINE
typedef __shared__ float SharedFloat;
#else
typedef float SharedFloat;
#endif

// ...

SharedFloat sharedData[dataSize * order];

If you're reading/writing code that is intended to work in both cases regardless of the type of an object, then its notional type must be the union of the constraints on the two types, so give that a name. You still have the option to specialize upon PME_GPU_PARALLEL_SPLINE if you need to.
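
If the typedef route proves awkward, a macro along the lines of the gmx_shared idea mentioned above would also work (the macro name here is hypothetical):

```cuda
// Hypothetical helper macro so declaration sites stay uncluttered.
#if PME_GPU_PARALLEL_SPLINE
#    define PME_SPLINE_DATA_QUALIFIER __shared__
#else
#    define PME_SPLINE_DATA_QUALIFIER
#endif

// ...

PME_SPLINE_DATA_QUALIFIER float data[dataSize * order];
```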

#4 Updated by Aleksei Iupinov about 3 years ago

Sounds like a good approach. Note that we already have some conventions at http://jenkins.gromacs.org/job/Documentation_Nightly_master/javadoc/dev-manual/naming.html#id1 and these would clash somewhat with how we already use s_ and g_. Even though the words are the same, the meaning differs and I think this has the potential for creating confusion.

Alright, adding "m":
-- gm_, sm_, cm_ in the kernels
-- h_, d_ in the host GPU code

How about simply sharedData, globalData, constantData for an identifier that is currently called data?

This also works:
-- global, shared, constant in the kernels
-- host, device in the host GPU code
Really have no preference as my code is stuck between the two.
I think the NB code uses h_ and d_ but not the in-kernel prefixes.
How about we vote :-)
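
For concreteness, a sketch of how the first variant might look (all names are illustrative):

```cuda
// Hypothetical kernel using the gm_/sm_/cm_ prefixes.
__constant__ float cm_params[8];

__global__ void exampleKernel(const float* gm_x, float* gm_f)
{
    __shared__ float sm_tmp[64];
    // ...
}

// Hypothetical host-side code using the h_/d_ prefixes.
void copyToDevice(const float* h_x, float* d_x, int n)
{
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
}
```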

#5 Updated by Szilárd Páll about 3 years ago

-- gm_, sm_, cm_ in the kernels
-- h_, d_ in the host GPU code

I should have realized the clash with e.g. "c_" already being used for constants. Adding "m" sounds like a reasonable trade-off.

How about simply sharedData, globalData, constantData for an identifier that is currently called data?

I'd prefer not calling it "data", as that is not really correct. The prefix is meant to refer to a memory space: it is not the data pointed to by the pointer that is global or constant, but the memory space where the data resides.

Secondly, prefixes more than a few characters long will, in my opinion, make code readability worse, not better. I find kernels hard to read when many statements end up 100-150 characters long; the algorithms and data flow become much harder to follow. Additionally, in such kernels, especially the inner loops, it's really not the casual reader's perspective that I'd cater to; I think it's more important to have compact, readable, and of course reasonably self-documenting code.

I'm not sure where the balance should be, but I'd rather keep kernel code compact.

One aspect that we have yet to factor in is OpenCL compatibility of the naming scheme. Unfortunately, due to naming differences, we'd have to compromise one way or another. For now, the only clash is shared vs local memory.

#6 Updated by Aleksei Iupinov about 3 years ago

As I'm not familiar with it, what are the OpenCL naming peculiarities?

#7 Updated by Szilárd Páll almost 3 years ago

Aleksei Iupinov wrote:

As I'm not familiar with it, what are the OpenCL naming peculiarities?

The differences are summarized on slide 30 of this talk

The most notable example is that what's called "shared memory" in CUDA is called "local memory" in OpenCL. (Both names make sense, but they represent different points of view.)

Not sure if it's easy to reconcile these, but we need to make a decision ASAP and adapt new code accordingly.

#8 Updated by Aleksei Iupinov almost 3 years ago

Oh well, these do seem incompatible. I would suggest sticking to the already suggested CUDA-style prefixes in all the code, even OpenCL (for easier code comparison/portability). No strong reasoning.

#9 Updated by Aleksei Iupinov over 2 years ago

#10 Updated by Aleksei Iupinov almost 2 years ago

  • Target version changed from 2018 to future

#11 Updated by Mark Abraham over 1 year ago

  • Related to Task #2048: C++11: CUDA dependency on general headers added

#12 Updated by Szilárd Páll 8 months ago

  • Target version changed from future to 2020

This policy has unfortunately been ignored during the bonded GPU change and it is not being followed by current developments either. We need to make sure everyone is aware of it.

#13 Updated by Mark Abraham 8 months ago

Szilárd Páll wrote:

This policy has unfortunately been ignored during the bonded GPU change and it is not being followed by current developments either. We need to make sure everyone is aware of it.

Indeed. The "m" suffixes and CUDA-centric naming seem reasonable (preferably documented). Can you upload a change to the coding style, please?

#14 Updated by Szilárd Páll 8 months ago

Mark Abraham wrote:

Szilárd Páll wrote:

This policy has unfortunately been ignored during the bonded GPU change and it is not being followed by current developments either. We need to make sure everyone is aware of it.

Indeed. The "m" suffixes and CUDA-centric naming seem reasonable (preferably documented). Can you upload a change to the coding style, please?

yes, working on it!

#15 Updated by Gerrit Code Review Bot 8 months ago

Gerrit received a related patchset '1' for Issue #2053.
Uploader: Artem Zhmurov ()
Change-Id: gromacs~master~Id39ad0b9c5876e4362fa4e261d0c011125dc380a
Gerrit URL: https://gerrit.gromacs.org/9334

#16 Updated by Artem Zhmurov 7 months ago

Suggest naming for common indexing:

// Thread Index in Block
int tib = static_cast<int>(threadIdx.x);
// Total number of threads in the block
int blockSize = static_cast<int>(blockDim.x);
// Thread Index on Device
int tid = static_cast<int>(blockIdx.x*blockDim.x + threadIdx.x);

In case of multidimensional grid, one can use tibi or tibx, whatever is appropriate for the case.
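
For a multidimensional grid, that could look like (hypothetical):

```cuda
// Hypothetical per-dimension variants for a 2D grid.
int tibx = static_cast<int>(threadIdx.x); // thread index in block, x
int tiby = static_cast<int>(threadIdx.y); // thread index in block, y
int tidx = static_cast<int>(blockIdx.x * blockDim.x + threadIdx.x);
int tidy = static_cast<int>(blockIdx.y * blockDim.y + threadIdx.y);
```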

#17 Updated by Artem Zhmurov 5 months ago

The reallocateDeviceBuffer routine uses two numbers to keep track of allocated memory:
1. The number of active elements in the array.
2. The size that was allocated.
I suggest the following naming convention for these variables, for a device buffer d_<bufferName>_:
numBufferNameCurrent_
numBufferNameAlloc_

#18 Updated by Artem Zhmurov 5 months ago

Update for the naming convention for these variables for d_<bufferName>_:
numBufferName_
numBufferNameAlloc_

#19 Updated by Mark Abraham 2 months ago

Artem Zhmurov wrote:

Update for the naming convention for these variables for d_<bufferName>_:
numBufferName_
numBufferNameAlloc_

Since we already need devs to be familiar with std::vector's naming of size vs capacity, that would be useful to adopt here also.

#20 Updated by Artem Zhmurov 2 months ago

Mark Abraham wrote:

Artem Zhmurov wrote:

Update for the naming convention for these variables for d_<bufferName>_:
numBufferName_
numBufferNameAlloc_

Since we already need devs to be familiar with std::vector's naming of size vs capacity, that would be useful to adopt here also.

Good point. Suggest:
h_bufferName_
d_bufferName_
bufferNameSize_
bufferNameCapacity_
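
A host-side sketch of how these members might sit together (the class and its growth logic are hypothetical; reallocateDeviceBuffer itself is not reproduced):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class layout following the naming convention above.
class ForcesBuffer
{
public:
    void resize(std::size_t newSize)
    {
        if (newSize > forcesCapacity_)
        {
            // A real implementation would call reallocateDeviceBuffer here.
            forcesCapacity_ = newSize;
        }
        forcesSize_ = newSize;
    }
    std::size_t size() const { return forcesSize_; }
    std::size_t capacity() const { return forcesCapacity_; }

private:
    float*      h_forces_       = nullptr; // host buffer
    float*      d_forces_       = nullptr; // device buffer
    std::size_t forcesSize_     = 0;       // elements in use (cf. std::vector::size)
    std::size_t forcesCapacity_ = 0;       // elements allocated (cf. std::vector::capacity)
};
```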
