Bug #2921

hwloc test makes invalid assumptions

Added by Mark Abraham 5 months ago. Updated 4 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
testing
Target version:
Affected version - extra info:
likely all with hwloc support
Affected version:
Difficulty:
uncategorized
Close

Description

The login node of JUWELS has two 20-core Xeon sockets with two hardware threads per core, for 80 hardware threads in total. However, two hardware threads are unavailable to user processes (confirmed with the sysadmin; some container magic is being used). hwloc correctly reports 78 logical processors in total, and that only one is available on each of the cores that hosts a reserved thread.

However, our test HardwareTopologyTest.ProcessorSelfConsistency fails. It assumes that it is appropriate for HardwareTopology::Machine::logicalProcessors to be a std::vector<LogicalProcessors>. That may be true, but the test cannot also assume that the total number of logical processors equals the product socketsInMachine * coresPerSocket * hwThreadsPerCore.
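To make the failing assumption concrete, here is a minimal sketch of the regularity check using the numbers from this report. TopologyCounts and regularProductHolds are hypothetical names invented for illustration; they are not the GROMACS data structures or the actual test code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the counts discussed above; NOT the GROMACS types.
struct TopologyCounts
{
    int         socketsInMachine;
    int         coresPerSocket;
    int         hwThreadsPerCore;
    std::size_t numLogicalProcessors; // logical processors actually reported
};

// The regularity assumption: every hardware thread of every core of
// every socket shows up as a logical processor.
bool regularProductHolds(const TopologyCounts& t)
{
    assert(t.socketsInMachine >= 0 && t.coresPerSocket >= 0 && t.hwThreadsPerCore >= 0);
    const std::size_t product = static_cast<std::size_t>(t.socketsInMachine)
                                * static_cast<std::size_t>(t.coresPerSocket)
                                * static_cast<std::size_t>(t.hwThreadsPerCore);
    return product == t.numLogicalProcessors;
}
```

With the JUWELS figures (2 sockets, 20 cores per socket, 2 threads per core, 78 reported processors) this returns false, because 2 * 20 * 2 = 80.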


Related issues

Related to GROMACS - Bug #2880: 2019.1 Multiple errors with AVX512 on tests (Feedback wanted)

Associated revisions

Revision 2019d8ae (diff)
Added by Mark Abraham 4 months ago

Fix self-consistency tests of hwloc data structures

These relied on assumptions of regularity that have been shown to be
violated in practice. The new tests check that there is a bijective
mapping of logical processors to valid hardware thread descriptors.

Fixes #2921

Change-Id: I31e998b7a2881c05dfb5c7c8b46550489cbdefd6

History

#1 Updated by Mark Abraham 5 months ago

If we can't agree on a solution, we can work around this by disabling the test. However, the consistency test could instead be reworked to loop over the logical processors and check that they are mutually distinct and contain reasonable values.
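A rough sketch of what such a reworked check might look like follows. The LogicalProcessor struct, its field names, and the upper bounds passed in are all assumptions made for illustration; the real descriptor and the real test in the fix may differ.

```cpp
#include <cassert>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical, simplified hardware thread descriptor (illustrative field names).
struct LogicalProcessor
{
    int socketRankInMachine;
    int coreRankInSocket;
    int hwThreadRankInCore;
};

// Returns true if every descriptor is within the given (assumed) bounds and
// no two logical processors share the same (socket, core, thread) triple.
// Makes no assumption about how many logical processors there are in total.
bool logicalProcessorsAreConsistent(const std::vector<LogicalProcessor>& procs,
                                    int maxSockets,
                                    int maxCoresPerSocket,
                                    int maxThreadsPerCore)
{
    assert(maxSockets > 0 && maxCoresPerSocket > 0 && maxThreadsPerCore > 0);
    std::set<std::tuple<int, int, int>> seen;
    for (const auto& p : procs)
    {
        const bool inRange = p.socketRankInMachine >= 0 && p.socketRankInMachine < maxSockets
                             && p.coreRankInSocket >= 0 && p.coreRankInSocket < maxCoresPerSocket
                             && p.hwThreadRankInCore >= 0 && p.hwThreadRankInCore < maxThreadsPerCore;
        if (!inRange)
        {
            return false; // unreasonable value
        }
        // emplace() reports whether the triple was newly inserted
        if (!seen.emplace(p.socketRankInMachine, p.coreRankInSocket, p.hwThreadRankInCore).second)
        {
            return false; // duplicate descriptor
        }
    }
    return true;
}
```

Because the check only looks at each descriptor individually and at uniqueness, it passes on an irregular machine such as the JUWELS login node, where some cores expose fewer hardware threads than others.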

#2 Updated by Mark Abraham 5 months ago

  • Related to Bug #2880: 2019.1 Multiple errors with AVX512 on tests added

#3 Updated by Mark Abraham 5 months ago

  • Status changed from New to Fix uploaded

#4 Updated by Mark Abraham 4 months ago

  • Status changed from Fix uploaded to Resolved

#5 Updated by Paul Bauer 4 months ago

  • Status changed from Resolved to Closed
