Bug #2380

cycle counter issues with separate PME rank + GPUs

Added by Szilárd Páll 3 months ago. Updated 3 months ago.

Status:
New
Priority:
Low
Assignee:
-
Category:
mdrun
Target version:
-
Affected version - extra info:
Affected version:
Difficulty:
uncategorized

Description

Some minor issues discovered with the separate PME rank + GPU case:
- the "Wait PME GPU gather" entry, when originating from separate PME ranks with a different thread count than the PP ranks, has an incorrect value in the "Num Threads" column;
- it's somewhat confusing that the "Wait PME GPU gather" entry from separate PME ranks shows up in the main table;
- the "PME mesh" counter seems to be higher than the sum of the measured work on the PME rank, i.e. GPU launch + wait (not tested with CPU).

History

#1 Updated by Szilárd Páll 3 months ago

  • Subject changed from cycle counter mixup with separate PME ranks with GPUs to cycle counters issues with separate PME ranks + GPUs

#2 Updated by Szilárd Páll 3 months ago

  • Subject changed from cycle counters issues with separate PME ranks + GPUs to cycle counter issues with separate PME rank + GPUs

#3 Updated by Szilárd Páll 3 months ago

Szilárd Páll wrote:

Some minor issues discovered with the separate PME rank + GPU case:
- the "Wait PME GPU gather" entry, when originating from separate PME ranks with a different thread count than the PP ranks, has an incorrect value in the "Num Threads" column;
- it's somewhat confusing that the "Wait PME GPU gather" entry from separate PME ranks shows up in the main table;

+ without an additional marker (e.g. there's a "*" marker for PME-only ranks' counts; should that be added here too?)

Additionally, if we end up separating or marking counters, the separate PME rank GPU launch time should not be aggregated with the NB launch time either.
