Project

General

Profile

Bug #1725

Failing unit tests on Power7 (big endian)

Added by Erik Lindahl over 4 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
core library
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

The MdrunTests hang (apparently in the TNG routines), and the RandomUnitTests fail when using the IBM compiler, but not GCC-4.4 or 4.9.

This could be a bug in the IBM compiler, but since Power7 is likely the only big-endian platform we've tested on it could also be signs of more subtle bugs in our code (which will eventually hit us elsewhere), so we should try to investigate this prior to the final 5.1 release.

Associated revisions

Revision d47c56bf (diff)
Added by Erik Lindahl over 4 years ago

Fix out-of-range integer literals for threefry

The 64-bit literals need to have LL suffixes, or
xlc will warn about out-of-range literals and generate
code that fails the unit test. There was also a
single 64-bit literal in the implementation. Other
compilers appear to have handled this correctly,
so it is unlikely to have caused any user bug.

Refs #1725.

Change-Id: I575b6c12ace14973d4fae1edc81836686a2ff9d5

Revision c42dfb39 (diff)
Added by Magnus Lundborg over 4 years ago

TNG version 1.7.6

Fixed bug with reading and writing TNG compressed blocks
on big endian platforms.
Fixed compiler warning about potentially uninitialized values.

Fixes #1725.

Change-Id: I92062492116fa3044c3d1d0d0f920cfe42d21cbe

History

#1 Updated by Mark Abraham over 4 years ago

xlc defaults to -O in Release mode with CMake, and my experience of BG xlc 12.1 on less than -O3 is that it's not great. So if you were compiling with -O, the results with -O3 would be interesting.

#2 Updated by Erik Lindahl over 4 years ago

Yeah, it defaults to O, but that should still produce correct code, right? I don't think I've seen any compilers that do a worse job of producing correct results when they don't have to optimize :)

#3 Updated by Mark Abraham over 4 years ago

ahahahaha don't try OpenMP on -O0

#4 Updated by Erik Lindahl over 4 years ago

;-)

Let me modify that: If there are fragile parts in our code that triggered this we should fix it, otherwise it's a platform/compiler combination I don't care much about anymore.

The more I see of these things, the more I'm starting to think that we should not just discontinue support for anything but gcc/clang/icc, but also actively prevent people from using anything else...

#5 Updated by Mark Abraham over 4 years ago

Erik Lindahl wrote:

;-)

Let me modify that: If there are fragile parts in our code that triggered this we should fix it, otherwise it's a platform/compiler combination I don't care much about anymore.

Jaja - my intended point was merely that if -O3 shows a problem then I am confident that we have one.

The more I see of these things, the more I'm starting to think that we should not just discontinue support for anything but gcc/clang/icc, but also actively prevent people from using anything else...

:-)

#6 Updated by Gerrit Code Review Bot over 4 years ago

Gerrit received a related patchset '2' for Issue #1725.
Uploader: Erik Lindahl ()
Change-Id: I575b6c12ace14973d4fae1edc81836686a2ff9d5
Gerrit URL: https://gerrit.gromacs.org/4696

#7 Updated by Erik Lindahl over 4 years ago

The failed RNG unit test was just a matter of the reference data using 64-bit integer literals without LL suffix, and has been fixed in the above patch. The mdrun tests still hang due to something related to TNG.

#8 Updated by Erik Lindahl over 4 years ago

  • Subject changed from Failing unit tests with xlC-12.1 on Power7 (big endian) to Failing unit tests on Power7 (big endian)

The mdrun failed tests is due to TNG, and also happens with GCC. My guess is that TNG might be buggy for big endian platforms.

#9 Updated by Erik Lindahl over 4 years ago

  • Status changed from New to In Progress

The remaining error is caused by TNG always writing the TNG-compressed data blocks in little-endian form to the file (as part of the compression routine), which means the endian-swapping routines will just swap it a second time. In other words, the TNG library can never have worked on a big-endian host. We need to decide whether to change and write the data in native form, or just alter the check so we don't double-swap it when reading on big endian hosts.

#10 Updated by Gerrit Code Review Bot over 4 years ago

Gerrit received a related patchset '2' for Issue #1725.
Uploader: Magnus Lundborg ()
Change-Id: I92062492116fa3044c3d1d0d0f920cfe42d21cbe
Gerrit URL: https://gerrit.gromacs.org/4706

#11 Updated by Erik Lindahl over 4 years ago

  • Status changed from In Progress to Fix uploaded

Note that we should wait for https://gerrit.gromacs.org/4696 to be committed before closing this.

#12 Updated by Erik Lindahl over 4 years ago

  • Status changed from Fix uploaded to Resolved

#13 Updated by Erik Lindahl over 4 years ago

  • Status changed from Resolved to Closed

Also available in: Atom PDF