Bug #1933

some internal unit tests fail with illegal instruction (mrrc) on armv7hfp in wallcycle_have_counter()

Added by Dominik Mierzejewski over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version:
Affected version - extra info: bec9c8757e59cae58fc61ed841c0bb73c84079db
Affected version:
Difficulty: uncategorized

Description

On ARMv7 HFP running Fedora development branch (rawhide), I'm getting illegal instruction when running the internal testsuite (make check).

Here's a gdb session of mdrun-test binary showing it fails on mrrc instruction at line 242 of src/gromacs/timing/cyclecounter.h:

[mockbuild@arm03-packager01 serial]$ LD_LIBRARY_PATH=/builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/serial/lib gdb bin/mdrun-test 
GNU gdb (GDB) Fedora 7.11-63.fc25
[...]
Reading symbols from bin/mdrun-test...done.
(gdb) run
Starting program: /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/serial/bin/mdrun-test 
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.23.90-4.fc25.armv7hl
Cannot parse expression `.L1030 4@r4'.
warning: Probes-based dynamic linker interface failed.
Reverting to original interface.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
[==========] Running 18 tests from 7 test cases.
[----------] Global test environment set-up.
[----------] 1 test from GromppTest
[ RUN      ] GromppTest.EmptyMdpFileWorks

NOTE 1 [file /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/serial/src/programs/mdrun/tests/Testing/Temporary/GromppTest_EmptyMdpFileWorks_input.mdp, line 1]:
  /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/serial/src/programs/mdrun/tests/Testing/Temporary/GromppTest_EmptyMdpFileWorks_input.mdp did not specify a value for the .mdp option "cutoff-scheme". Probably it
  was first intended for use with GROMACS before 4.6. In 4.6, the Verlet
  scheme was introduced, but the group scheme was still the default. The
  default is now the Verlet scheme, so you will observe different behaviour.

NOTE 2 [file /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/serial/src/programs/mdrun/tests/Testing/Temporary/GromppTest_EmptyMdpFileWorks_input.mdp]:
  For a correct single-point energy evaluation with nsteps = 0, use
  continuation = yes to avoid constraining the input coordinates.

Setting the lambda MC random seed to 1286452158
Generated 279 of the 1225 non-bonded parameter combinations
Excluding 2 bonded neighbours molecule type 'Methanol'
Excluding 2 bonded neighbours molecule type 'SOL'
Removing all charge groups because cutoff-scheme=Verlet
Number of degrees of freedom in T-Coupling group rest is 12.00

NOTE 3 [file /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/serial/src/programs/mdrun/tests/Testing/Temporary/GromppTest_EmptyMdpFileWorks_input.mdp]:
  NVE simulation: will use the initial temperature of 1046.791 K for
  determining the Verlet buffer size

Determining Verlet buffer for a tolerance of 0.005 kJ/mol/ps at 1046.79 K
Calculated rlist for 1x1 atom pair-list as 1.061 nm, buffer size 0.061 nm
Set rlist, assuming 4x4 atom pair-list, to 1.056 nm, buffer size 0.056 nm
Note that mdrun will redetermine rlist based on the actual pair-list setup

NOTE 4 [file /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/serial/src/programs/mdrun/tests/Testing/Temporary/GromppTest_EmptyMdpFileWorks_input.mdp]:
  You are using a plain Coulomb cut-off, which might produce artifacts.
  You might want to consider using PME electrostatics.

This run will generate roughly 0 Mb of data

There were 4 notes
[       OK ] GromppTest.EmptyMdpFileWorks (94 ms)
[----------] 1 test from GromppTest (94 ms total)

[----------] 1 test from CompelTest
[ RUN      ] CompelTest.SwapCanRun

NOTE 1 [file /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/programs/mdrun/tests/OctaneSandwich.mdp]:
  The Berendsen thermostat does not generate the correct kinetic energy
  distribution. You might want to consider using the V-rescale thermostat.

Setting the lambda MC random seed to 1145221959
Generated 330891 of the 330891 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 330891 of the 330891 1-4 parameter combinations
Excluding 3 bonded neighbours molecule type 'Protein'
turning all bonds into constraints...
Excluding 3 bonded neighbours molecule type 'OCT'
turning all bonds into constraints...
Excluding 1 bonded neighbours molecule type 'NA'
turning all bonds into constraints...
Excluding 1 bonded neighbours molecule type 'CL'
turning all bonds into constraints...
Excluding 3 bonded neighbours molecule type 'Protein'
Excluding 3 bonded neighbours molecule type 'OCT'
Excluding 2 bonded neighbours molecule type 'SOL'
turning all bonds into constraints...
Removing all charge groups because cutoff-scheme=Verlet
Split0 group 'Ch0' contains 83 atoms.
Split1 group 'Ch1' contains 83 atoms.
Solvent group 'SOL' contains 11931 atoms.
Swap group 'NA+' contains 19 atoms.
Swap group 'CL-' contains 19 atoms.
Number of degrees of freedom in T-Coupling group System is 27869.00
Determining Verlet buffer for a tolerance of 0.005 kJ/mol/ps at 300 K
Calculated rlist for 1x1 atom pair-list as 1.314 nm, buffer size 0.314 nm
Set rlist, assuming 4x4 atom pair-list, to 1.260 nm, buffer size 0.260 nm
Note that mdrun will redetermine rlist based on the actual pair-list setup

NOTE 2 [file /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/programs/mdrun/tests/OctaneSandwich.mdp]:
  You are using a plain Coulomb cut-off, which might produce artifacts.
  You might want to consider using PME electrostatics.

This run will generate roughly 1 Mb of data

There were 2 notes

Running on 1 node with total 4 cores, 4 logical cores
Hardware detected:
  CPU info:
    Vendor: ARM
    Brand:  Unknown CPU brand
    SIMD instructions most likely to fit this hardware: ARM_NEON
    SIMD instructions selected at GROMACS compile time: None

  Hardware topology: Basic

Compiled SIMD instructions: None, GROMACS could use ARM_NEON on this machine, which is better.

Reading file /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/serial/src/programs/mdrun/tests/Testing/Temporary/CompelTest_SwapCanRun.tpr, VERSION 2016-dev (single precision)
Using 1 MPI thread
Using 1 OpenMP thread 

Program received signal SIGILL, Illegal instruction.
wallcycle_have_counter ()
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/gromacs/timing/wallcycle.cpp:131
131        return gmx_cycles_have_counter();
(gdb) where
#0  wallcycle_have_counter ()
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/gromacs/timing/wallcycle.cpp:131
#1  0xb69a5328 in wallcycle_init (fplog=0x2a0d08, resetstep=resetstep@entry=-1, cr=cr@entry=0x176648)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/gromacs/timing/wallcycle.cpp:139
#2  0x0006665c in gmx::mdrunner (hw_opt=0xbeffe60c, hw_opt@entry=0xb6ffec88 <__stack_chk_guard>, 
    fplog=<optimized out>, cr=0x176648, cr@entry=0x22b8, nfile=1167448, nfile@entry=33, fnm=fnm@entry=0xbeffe8b8, oenv=
    0x11c858, bVerbose=bVerbose@entry=0, nstglobalcomm=nstglobalcomm@entry=-1, ddxyz=<optimized out>, 
    ddxyz@entry=0xbeffe63c, dd_rank_order=1, npme=<optimized out>, rdd=0, rconstr=0, 
    dddlb_opt=dddlb_opt@entry=0xc025c "auto", dlb_scale=0.800000012, ddcsx=ddcsx@entry=0x0, ddcsy=ddcsy@entry=0x0, 
    ddcsz=ddcsz@entry=0x0, nbpu_opt=0xc025c "auto", nstlist_cmdline=0, nsteps_cmdline=-2, nstepout=100, resetstep=-1, 
    nmultisim=0, repl_ex_nst=0, repl_ex_nex=0, repl_ex_seed=<optimized out>, pforce=-1, cpt_period=15, max_hours=-1, 
    imdport=8888, Flags=4294967295)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/programs/mdrun/runner.cpp:1119
#3  0x0005c2ac in gmx_mdrun (argc=<optimized out>, argc@entry=23, argv=0xbeffebd0)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/programs/mdrun/mdrun.cpp:533
#4  0x00054b28 in gmx::test::SimulationRunner::callMdrun (this=this@entry=0x113cbc, callerRef=...)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/programs/mdrun/tests/moduletest.cpp:282
#5  0x00050bc0 in gmx::test::CompelTest_SwapCanRun_Test::TestBody (this=0x113cb0)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/programs/mdrun/tests/swapcoords.cpp:97
#6  0x000bbfb4 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (
    location=0xcacec "the test body", method=&virtual testing::Test::TestBody(), object=0x113cb0)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/src/gtest.cc:2078
#7  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x113cb0, 
    method=&virtual testing::Test::TestBody(), location=0xcacec "the test body")
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/src/gtest.cc:2114
#8  0x000b23d4 in testing::Test::Run (this=this@entry=0x113cb0)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/src/gtest.cc:2151
#9  0x000b25d8 in testing::Test::Run (this=0x113cb0)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/src/gtest.cc:2142
#10 testing::TestInfo::Run (this=0x10bcf8)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/src/gtest.cc:2326
#11 0x000b275c in testing::TestInfo::Run (this=<optimized out>)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/src/gtest.cc:2301
#12 testing::TestCase::Run (this=0x10bd90)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/src/gtest.cc:2444
#13 0x000b2ee0 in testing::TestCase::Run (this=<optimized out>)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/src/gtest.cc:2430
#14 testing::internal::UnitTestImpl::RunAllTests (this=0x10b680)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/src/gtest.cc:4315
#15 0x000b34fc in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
    location=0xcae58 "auxiliary test code (environments or event listeners)", method=<optimized out>, 
    object=<optimized out>)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/src/gtest.cc:2078
#16 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
    location=0xcae58 "auxiliary test code (environments or event listeners)", 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0xb286c <testing::internal::UnitTestImpl::RunAllTests()>, object=0x10b680)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/src/gtest.cc:2114
#17 testing::UnitTest::Run (this=<optimized out>)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/src/gtest.cc:3926
#18 0x0003b554 in RUN_ALL_TESTS ()
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/external/gmock-1.7.0/gtest/include/gtest/gtest.h:2290
#19 main (argc=<optimized out>, argv=<optimized out>)
    at /builddir/build/BUILD/gromacs-bec9c8757e59cae58fc61ed841c0bb73c84079db/src/testutils/unittest_main.cpp:65
(gdb) disass $pc-32,$pc+32
Dump of assembler code from 0xb69a52bc to 0xb69a52fc:
   0xb69a52bc <print_cycles(FILE*, double, char const*, int, int, int, double, double)+436>:    cdpcc    14, 1, cr2, cr1, cr11, {0}
   0xb69a52c0 <print_cycles(FILE*, double, char const*, int, int, int, double, double)+440>:    rsbeq    r3, r1, r8, ror r9
   0xb69a52c4 <print_cycles(FILE*, double, char const*, int, int, int, double, double)+444>:    andeq    r3, r0, r8, lsr #12
   0xb69a52c8 <print_cycles(FILE*, double, char const*, int, int, int, double, double)+448>:    ldrsheq    r2, [r2], #-84    ; 0xffffffac
   0xb69a52cc <print_cycles(FILE*, double, char const*, int, int, int, double, double)+452>:    subseq    r9, r7, r4, asr r8
   0xb69a52d0 <print_cycles(FILE*, double, char const*, int, int, int, double, double)+456>:    subseq    r9, r7, r4, lsr #16
   0xb69a52d4 <print_cycles(FILE*, double, char const*, int, int, int, double, double)+460>:    subseq    r2, r2, r8, lsl #11
   0xb69a52d8 <wallcycle_have_counter()+0>:    push    {r4, r5, lr}
=> 0xb69a52dc <wallcycle_have_counter()+4>:    mrrc    15, 1, r12, r2, cr14
   0xb69a52e0 <wallcycle_have_counter()+8>:    mrrc    15, 1, lr, r3, cr14
   0xb69a52e4 <wallcycle_have_counter()+12>:    orr    r4, r12, lr
   0xb69a52e8 <wallcycle_have_counter()+16>:    mov    r0, #0
   0xb69a52ec <wallcycle_have_counter()+20>:    orr    r12, r0, r4
   0xb69a52f0 <wallcycle_have_counter()+24>:    mov    r1, r2
   0xb69a52f4 <wallcycle_have_counter()+28>:    mov    r5, r3
   0xb69a52f8 <wallcycle_have_counter()+32>:    orr    r2, r12, r0
End of assembler dump.
(gdb) 

Hardware info:

$ cat /proc/cpuinfo
processor    : 0
model name    : ARMv7 Processor rev 0 (v7l)
BogoMIPS    : 2795.11
Features    : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32 
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x3
CPU part    : 0xc09
CPU revision    : 0

processor    : 1
model name    : ARMv7 Processor rev 0 (v7l)
BogoMIPS    : 2795.11
Features    : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32 
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x3
CPU part    : 0xc09
CPU revision    : 0

processor    : 2
model name    : ARMv7 Processor rev 0 (v7l)
BogoMIPS    : 2795.11
Features    : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32 
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x3
CPU part    : 0xc09
CPU revision    : 0

processor    : 3
model name    : ARMv7 Processor rev 0 (v7l)
BogoMIPS    : 2795.11
Features    : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32 
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x3
CPU part    : 0xc09
CPU revision    : 0

Hardware    : Highbank
Revision    : 0000
Serial        : 01234567890123456789

Compiler:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/armv7hl-redhat-linux-gnueabi/6.0.0/lto-wrapper
Target: armv7hl-redhat-linux-gnueabi
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --disable-libgcj --with-isl --disable-libmpx --enable-gnu-indirect-function --disable-sjlj-exceptions --with-tune=cortex-a8 --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-abi=aapcs-linux --build=armv7hl-redhat-linux-gnueabi
Thread model: posix
gcc version 6.0.0 20160311 (Red Hat 6.0.0-0.17) (GCC) 

Associated revisions

Revision 0d4ea603 (diff)
Added by Erik Lindahl over 3 years ago

Add detection for ARMv7 cycle counter support

ARMv7 requires special kernel settings to allow cycle
counters to be read. This change adds a cmake setting
to enable/disable counters. On all architectures but ARMv7
it is enabled by default, and on ARMv7 we run a small test
program to see if they can be read successfully. When
cross-compiling to ARMv7 counters will be disabled, but
either choice can be overridden by setting a value for
GMX_CYCLECOUNTERS in cmake.

Fixes #1933.

Change-Id: I1e217d7a09f84a6bcf4eb5bf4a656d430465c915

History

#1 Updated by Dominik Mierzejewski over 3 years ago

Please note that ARM documentation (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0274b/index.html) says:

4.6         Timing and delays

It is common in IA-32 applications to use the RDTSC instruction (or the serializing RDTSCP variant) to read the system Time-Stamp Counter. This is a 64-bit counter which increases with each processor clock cycle. Using this, it is simple to implement timing delays to a very high resolution. Use of this instruction is not restricted to privileged modes.

The nearest equivalent to this in ARMv7-A processors is the cycle counter included as part of the Performance Monitoring Unit. However, access to this requires execution of privileged instructions and, if it is available to application code at all, is usually provided through an Operating System API.

New ARM processors have generic timers which provide a high-precision counter which can be configured to be accessible in user mode. The Operating System should make available an API for accessing these.

Apparently, you cannot execute the mrrc instruction from user mode, only from kernel mode. It seems possible to enable access from userland, but not without executing some code in kernel mode first. For example: https://stackoverflow.com/questions/3247373/how-to-measure-program-execution-time-in-arm-cortex-a8-processor. I think you cannot rely on this being enabled by default.

#2 Updated by Dominik Mierzejewski over 3 years ago

It looks like it might be possible to use the perf_event_open syscall, though only on Linux: http://neocontra.blogspot.co.uk/2013/05/user-mode-performance-counters-for.html .

#3 Updated by Erik Lindahl over 3 years ago

  • Status changed from New to Accepted
  • Assignee set to Erik Lindahl

#4 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related patchset '1' for Issue #1933.
Uploader: Erik Lindahl
Change-Id: I62cd39469c188deed2b4c6620429bb2cb450dfde
Gerrit URL: https://gerrit.gromacs.org/5770

#5 Updated by Erik Lindahl over 3 years ago

  • Status changed from Accepted to Fix uploaded
  • Target version set to 2016

Hi Dominik,

Unfortunately we cannot use the performance event syscalls, since that will add thousands of cycles of overhead (since it is a system call). I'm well aware that we are not allowed to read the counters by default, but the system we first tried this on (a Jetson TK1) simply returned 0 in this case, rather than throwing the SIGILL.

I have modified the detection to also use a POSIX longjmp to catch the SIGILL condition in the linked patch above. However, since I cannot test that it actually works, would you mind testing it and getting back to us?
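A minimal sketch of this kind of detection, not the code from the linked patch: a SIGILL handler jumps back to a saved context so the caller can report whether the probed instruction is usable. The names `probe_runs_without_sigill`, `probe_that_works`, and `probe_that_faults` are hypothetical, and the faulting probe uses `raise(SIGILL)` as a stand-in for the mrrc read so the sketch stays portable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Jump target used by the SIGILL handler to abandon a faulting probe. */
static sigjmp_buf probe_env;

static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1); /* return to the sigsetjmp() call with value 1 */
}

/* Hypothetical helper: returns 1 if fn() completes, 0 if it raises SIGILL. */
static int probe_runs_without_sigill(void (*fn)(void))
{
    struct sigaction sa, old;
    int ok;

    assert(fn != NULL);

    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGILL, &sa, &old) != 0)
    {
        return 0; /* cannot install the handler: treat as unsupported */
    }

    /* Second argument 1: save the signal mask so it is restored on the jump. */
    if (sigsetjmp(probe_env, 1) == 0)
    {
        fn();   /* may trap with SIGILL */
        ok = 1; /* reached only if fn() returned normally */
    }
    else
    {
        ok = 0; /* we arrived here from the SIGILL handler */
    }

    sigaction(SIGILL, &old, NULL); /* restore the previous handler */
    return ok;
}

/* Stand-ins for the real counter read (assumptions for illustration only). */
static void probe_that_works(void) { }
static void probe_that_faults(void) { raise(SIGILL); }
```

On an affected ARMv7 system the probe function would contain the actual counter read; a check like this only answers whether the instruction traps on the machine where it runs.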

#6 Updated by Erik Lindahl over 3 years ago

  • Status changed from Fix uploaded to In Progress

Unfortunately I just realized we might have a worse problem: For the other architectures we just use the checking to decide if the read() call returns anything meaningful, but the actual call cannot be conditional (for performance reasons). This likely means the fix above will just move the SIGILL occurrence until the first time any subroutine calls gmx_cycles_read().

One solution is of course to make it a compile-time option, but that will mean many prebuilt binaries cannot do load balancing CPU-GPU, so that's not ideal either.

#7 Updated by Mark Abraham over 3 years ago

PME load balancing only needs ewcSTEP, so we could consider having a separate implementation of that, so we can use something low res for that on ARMv7, e.g. the final suggestion at http://neocontra.blogspot.co.uk/2013/05/user-mode-performance-counters-for.html
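As an illustration of what a low-resolution, once-per-step timer could look like, here is a sketch based on POSIX `clock_gettime` with `CLOCK_MONOTONIC`. This is an assumption chosen for the example and not necessarily the approach the linked post proposes; `step_timer_ns` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical low-resolution timer: monotonic time in nanoseconds.
 * Intended to be called once per step, not inside inner loops. */
static uint64_t step_timer_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0); /* CLOCK_MONOTONIC is expected to be available */
    (void)rc;        /* silence unused-variable warnings under NDEBUG */

    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

Reading such a timer once per step keeps any call overhead out of the performance-sensitive inner loops, at the cost of reporting time rather than cycles.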

#8 Updated by Alexey Shvetsov over 3 years ago

Usage of mrrc and other instructions of this type is highly dependent on kernel configuration, e.g. it needs CONFIG_ARM_ARCH_TIMER to be selected. Not all distros do that (Debian, for example, doesn't do it for Allwinner devices). If this option is not selected then you'll get SIGILL.

#9 Updated by Szilárd Páll over 3 years ago

Alexey Shvetsov wrote:

Usage of mrrc and other instructions of this type is highly dependent on kernel configuration, e.g. it needs CONFIG_ARM_ARCH_TIMER to be selected. Not all distros do that (Debian, for example, doesn't do it for Allwinner devices). If this option is not selected then you'll get SIGILL.

Does that translate to some way to test for support at run- or compile-time? It would be good to fix as aarch64 is becoming a relevant target.

#10 Updated by Mark Abraham over 3 years ago

  • Status changed from In Progress to Blocked, need info

#11 Updated by Erik Lindahl over 3 years ago

  • Status changed from Blocked, need info to In Progress

Unblocking it, since I think I know everything, just haven't had time.

Short story: Arm64 will be fine.

On Arm32, the problem is that we are not necessarily building on the same host as we are running on. Since the cyclecounters are used inline in the most performance-sensitive code we have, there is no way we can make that conditional at runtime.

For now I intend to disable them by default on Arm32, but give the user an option to enable.
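A compile-time switch of this kind could be sketched as follows. This is not the GROMACS implementation: it assumes the build system defines `GMX_CYCLECOUNTERS` as 0 or 1, defaults it to 0 when undefined, and uses the hypothetical names `cycles_t`, `cycles_have_counter`, and `cycles_read`. Returning 0 when counters are disabled is also an assumption. The ARM branch uses the same mrrc read of coprocessor 15, register c14 that appears in the disassembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the build system passes -DGMX_CYCLECOUNTERS=0 or =1.
 * Default to "disabled" so the sketch never emits the trapping instruction. */
#ifndef GMX_CYCLECOUNTERS
#define GMX_CYCLECOUNTERS 0
#endif

static_assert(GMX_CYCLECOUNTERS == 0 || GMX_CYCLECOUNTERS == 1,
              "GMX_CYCLECOUNTERS must be 0 or 1");

typedef uint64_t cycles_t; /* hypothetical counter type */

/* Reports at run time what was decided at compile time. */
static inline int cycles_have_counter(void)
{
    return GMX_CYCLECOUNTERS;
}

/* The decision is made by the preprocessor, so the hot path has no branch. */
static inline cycles_t cycles_read(void)
{
#if GMX_CYCLECOUNTERS && defined(__arm__)
    cycles_t v;
    /* 64-bit read via mrrc; traps with SIGILL if user access is not enabled. */
    __asm__ __volatile__("mrrc p15, 1, %Q0, %R0, c14" : "=r"(v));
    return v;
#else
    return 0; /* counters disabled (or not a 32-bit ARM target in this sketch) */
#endif
}
```

Because the choice is fixed when the binary is built, callers pay nothing at run time, which is the property the comment above asks for; the trade-off is that a prebuilt binary cannot adapt to the machine it runs on.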

#12 Updated by Erik Lindahl over 3 years ago

Update:

Would it be an acceptable compromise to

1) Provide a configuration option where it can be turned on/off
2) Test it when building natively. If it works, we enable it.
3) If we cross-compile, by default we disable it for Arm, but enable it elsewhere

#13 Updated by Erik Lindahl over 3 years ago

Clarification:

3) If we cross-compile, by default we disable it for armv7, but enable it for Armv8 and elsewhere.
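The proposed defaults can be written down as a small decision function. This is only a model of the policy in C (the real logic would live in the build system); `target_arch`, `default_cyclecounters`, and the `user_choice` encoding are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the build target. */
typedef enum { ARCH_ARMV7, ARCH_ARMV8, ARCH_OTHER } target_arch;

/* user_choice: -1 = not set by the user, 0 = force off, 1 = force on. */
static bool default_cyclecounters(target_arch arch,
                                  bool        cross_compiling,
                                  bool        native_probe_ok,
                                  int         user_choice)
{
    assert(user_choice >= -1 && user_choice <= 1);

    if (user_choice != -1)
    {
        return user_choice == 1; /* 1) an explicit setting always wins */
    }
    if (arch != ARCH_ARMV7)
    {
        return true;             /* ARMv8 and everything else: on by default */
    }
    if (cross_compiling)
    {
        return false;            /* 3) ARMv7 cross-build: cannot test, so off */
    }
    return native_probe_ok;      /* 2) ARMv7 native build: trust the test run */
}
```

For example, `default_cyclecounters(ARCH_ARMV7, true, false, -1)` yields `false`, while passing `1` as the last argument overrides that to `true`.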

#14 Updated by Mark Abraham over 3 years ago

Seems reasonable. Someone cross-compiling for ARMv8 is going to need to manage it explicitly, unless we can observe they are cross-compiling for some particular version.

Tips - you probably know that CMake has a CMAKE_CROSSCOMPILING boolean already. This option sounds like it should be made advanced. Re-running cmake should be quiet.

#15 Updated by Erik Lindahl over 3 years ago

Cross-compiling for 64-bit ARMv8 should be fine (as far as I know those cycle counters are always enabled), but for ARMv7 it would have to be enabled explicitly.

#16 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related patchset '1' for Issue #1933.
Uploader: Erik Lindahl
Change-Id: I1e217d7a09f84a6bcf4eb5bf4a656d430465c915
Gerrit URL: https://gerrit.gromacs.org/6018

#17 Updated by Erik Lindahl over 3 years ago

  • Status changed from In Progress to Fix uploaded

#18 Updated by Erik Lindahl over 3 years ago

  • Status changed from Fix uploaded to Resolved

#19 Updated by Erik Lindahl over 3 years ago

  • Status changed from Resolved to Closed
