Bug #1542

two unit tests fail on 32-bit ARM

Added by Szilárd Páll about 3 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Category:
core library
Target version:
Affected version - extra info:
5.0.1-dev-20140629-ed48a3c
Affected version:
Difficulty:
uncategorized

Description

On a 32-bit ARM (Tegra 3) platform running up-to-date Ubuntu 12.04.4 LTS, the LegacyToolsTests and MdrunTests unit tests fail with a segfault.

Attached are the CMake cache, make check output, and mdrun -version output. The configure command used:

CC=gcc-4.7 CXX=g++-4.7 cmake ../ -DGMX_GPU=OFF -DGMX_BUILD_OWN_FFTW=ON

Note that the version is flagged "dirty" because the Random123 library does not officially support ARM, so the #error at src/external/Random123-1.08/include/Random123/features/gccfeatures.h:38, which prevents compilation, has to be disabled.

gmx-version.out (2.75 KB) Szilárd Páll, 06/30/2014 07:51 PM

CMakeCache.txt (46.2 KB) Szilárd Páll, 06/30/2014 07:51 PM

make-check.log (30.7 KB) Szilárd Páll, 06/30/2014 07:51 PM


Related issues

Related to GROMACS - Task #1545: test Random123 on unsupported platforms Closed 07/01/2014
Related to GROMACS - Bug #1559: Writing TNG files fails on Xeon Phi Closed 07/10/2014
Duplicated by GROMACS - Bug #1546: Legacy tools tests failed on armv7a with SIGBUS Closed 07/02/2014
Duplicated by GROMACS - Bug #1547: MDRUN tools tests failed on armv7a with SIGBUS Closed 07/02/2014

Associated revisions

Revision 898166c3 (diff)
Added by Magnus Lundborg almost 3 years ago

Fixed TNG memory alignment problem and reset pointers.

This is a temporary fix to the memory alignment problems on some
platforms. In the main TNG repository the whole I/O system is
rewritten to address this problem. This fix is to avoid the
problems without making too large changes.

There are also some pointers that were not reset after memory
was freed. This is also fixed in here.

This commit does not correspond to any commit in the TNG
repository.

Fixes #1542, #1546, #1547 and #1559.

Change-Id: I90a6406cccbc43fd57d4423c2b661019cf7763e8
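
The commit message above mentions pointers that were not reset after their memory was freed. The following is a minimal sketch of that general idiom only; it is not the actual TNG change, and `struct block` and `block_release` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical container, used only for this illustration. */
struct block {
    char  *contents;
    size_t size;
};

/* Free the buffer and reset the pointer so that later code cannot
 * dereference or free the stale address a second time. */
static void block_release(struct block *b)
{
    assert(b != NULL);
    free(b->contents);   /* free(NULL) is a no-op, so repeated calls are safe */
    b->contents = NULL;
    b->size = 0;
}
```

Because the pointer is NULL after the first call, calling `block_release` again on the same block does nothing harmful.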

History

#1 Updated by Szilárd Páll about 3 years ago

#2 Updated by Roland Schulz about 3 years ago

Can you add a stacktrace?

#3 Updated by Szilárd Páll about 3 years ago

  • Related to Task #1545: test Random123 on unsupported platforms added

#4 Updated by Szilárd Páll about 3 years ago

Both tests fail with "Bus error".

With the LegacyToolsTests the crash happens at the 6th test:

[----------] 5 tests from NoFatalErrorWhenWritingFrom/TrjconvWithIndexGroupSubset
[ RUN      ] NoFatalErrorWhenWritingFrom/TrjconvWithIndexGroupSubset.WithDifferentInputFormats/0
Will write tng: Trajectory file (tng format)
Select group for output
Group     0 (         System) has     6 elements
Group     1 (FirstWaterMolecule) has     3 elements
Group     2 (SecondWaterMolecule) has     3 elements
Select a group: Selected 2: 'SecondWaterMolecule'
Bus errorrame       1 time    0.000    ->  frame      1 time    0.000      

The MdrunTest fails right after the following output:

starting mdrun 'spc-and-methanol'
6 steps,      0.0 ps.

I suspect that both of these may be related to TNG, but let me get some backtraces to confirm.

#5 Updated by Szilárd Páll about 3 years ago

Below is the backtrace for mdrun-test. It indicates that the issue is in the TNG compression function and, given the SIGBUS, my guess is that this is an alignment issue.

Program received signal SIGBUS, Bus error.
0x005e78f2 in quantize_float (x=0x25b6323, natoms=3, nframes=2, precision=0.000999999931, quant=0x25b6370)
    at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/compression/tng_compress.c:91
91              quant[iframe*natoms*3+i*3+j]=(int)floor((x[iframe*natoms*3+i*3+j]/precision)+0.5);
(gdb) (gdb) (gdb) 
(gdb) 
(gdb) bt
#0  0x005e78f2 in quantize_float (x=0x25b6323, natoms=3, nframes=2, precision=0.000999999931, quant=0x25b6370)
    at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/compression/tng_compress.c:91
#1  0x005e9d34 in tng_compress_pos_float (pos=0x25b6323, natoms=3, nframes=2, desired_precision=0.00100000005, speed=0, algo=0x16129e8, nitems=0xbe7c11ec)
    at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/compression/tng_compress.c:1248
#2  0x005e9e2e in tng_compress_pos_float_find_algo (pos=0x25b6323, natoms=3, nframes=2, desired_precision=0.00100000005, speed=0, algo=0x16129e8, 
    nitems=0xbe7c11ec) at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/compression/tng_compress.c:1279
#3  0x005b691a in tng_compress (tng_data=0x24ee4c8, block=0x26d7450, n_frames=2, n_particles=3, type=2 '\002', start_pos=0x25b6323)
    at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/lib/tng_io.c:4837
#4  0x005b901e in tng_particle_data_block_write (tng_data=0x24ee4c8, block=0x26d7450, block_index=0, mapping=0x0, hash_mode=1 '\001')
    at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/lib/tng_io.c:6311
#5  0x005c7622 in tng_frame_set_write (tng_data=0x24ee4c8, hash_mode=1 '\001')
    at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/lib/tng_io.c:13153
#6  0x005c76f4 in tng_frame_set_premature_write (tng_data=0x24ee4c8, hash_mode=1 '\001')
    at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/lib/tng_io.c:13192
#7  0x00442948 in fflush_tng (tng=0x24ee4c8) at /home/pszilard/data/gromacs-5.0/src/gromacs/fileio/tngio.cpp:866
#8  0x00195f7a in mdoutf_write_to_trajectory_files (fplog=0x2458c58, cr=0x15a2be0, of=0x24ecaa0, mdof_flags=19, top_global=0x17cfb70, step=6, 
    t=0.0060000000000000001, state_local=0x1613170, state_global=0x2458fd8, f_local=0x16117b0, f_global=0x16117b0)
    at /home/pszilard/data/gromacs-5.0/src/gromacs/fileio/mdoutf.c:298
#9  0x0018e40c in do_md_trajectory_writing (fplog=0x2458c58, cr=0x15a2be0, nfile=35, fnm=0xbe7c26a8, step=6, step_rel=6, t=0.0060000000000000001, 
    ir=0x17cf800, state=0x1613170, state_global=0x2458fd8, top_global=0x17cfb70, fr=0x24f1790, outf=0x24ecaa0, mdebin=0x16112a0, ekind=0x2460bf0, 
    f=0x16117b0, f_global=0x16117b0, wcycle=0x0, nchkpt=0xbe7c1928, bCPT=1, bRerunMD=0, bLastStep=1, bDoConfOut=4096, bSumEkinhOld=0)
    at /home/pszilard/data/gromacs-5.0/src/gromacs/fileio/trajectory_writing.c:146
#10 0x00101452 in do_md (fplog=0x2458c58, cr=0x15a2be0, nfile=35, fnm=0xbe7c26a8, oenv=0x189dae0, bVerbose=0, bCompact=1, nstglobalcomm=5, vsite=0x0, 
    constr=0x24ec380, stepout=100, ir=0x17cf800, top_global=0x17cfb70, fcd=0x24592c8, state_global=0x2458fd8, mdatoms=0x24ec928, nrnb=0x24f13c0, 
    wcycle=0x0, ed=0x0, fr=0x24f1790, repl_ex_nst=0, repl_ex_nex=0, repl_ex_seed=-1, membed=0x0, cpt_period=15, max_hours=-1, deviceOptions=0x6be7bc "", 
    imdport=8888, Flags=1055744, walltime_accounting=0x24ec888) at /home/pszilard/data/gromacs-5.0/src/programs/mdrun/md.c:1320
#11 0x000fa61e in mdrunner (hw_opt=0xbe7c29f0, fplog=0x2458c58, cr=0x15a2be0, nfile=35, fnm=0xbe7c26a8, oenv=0x189dae0, bVerbose=0, bCompact=1, 
    nstglobalcomm=-1, ddxyz=0xbe7c2a7c, dd_node_order=1, rdd=0, rconstr=0, dddlb_opt=0x6c4c74 "auto", dlb_scale=0.800000012, ddcsx=0x0, ddcsy=0x0, 
    ddcsz=0x0, nbpu_opt=0x6c4c74 "auto", nstlist_cmdline=0, nsteps_cmdline=-2, nstepout=100, resetstep=-1, nmultisim=0, repl_ex_nst=0, repl_ex_nex=0, 
    repl_ex_seed=-1, pforce=-1, cpt_period=15, max_hours=-1, deviceOptions=0x6be7bc "", imdport=8888, Flags=1055744)
    at /home/pszilard/data/gromacs-5.0/src/programs/mdrun/runner.c:1774
#12 0x00109af0 in gmx_mdrun (argc=1, argv=0x17d4f20) at /home/pszilard/data/gromacs-5.0/src/programs/mdrun/mdrun.cpp:789
#13 0x000f1938 in gmx::test::MdrunTestFixture::callMdrun (this=0x25ad0f0, callerRef=...)
    at /home/pszilard/data/gromacs-5.0/src/programs/mdrun/tests/moduletest.cpp:209
...
[13 more frames left out]

Update:
It is an alignment issue:

$ dmesg | tail -n4
[364077.687276] Switched to NOHz mode on CPU #1
[364078.447618] Alignment trap: not handling instruction ed937a00 at [<005e78ee>]
[364078.460748] Unhandled fault: alignment exception (0x001) at 0x016913f3
[364079.950780] CPU1: shutdown
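
For illustration: the pointer passed to quantize_float in the backtrace above, x=0x25b6323, is not a multiple of 4, so reading it as a float array is a misaligned access. The sketch below shows one common way to check for this and to copy floats out of a byte buffer at an arbitrary offset. It is not the fix that went into TNG; `is_float_aligned` and `load_floats_unaligned` are hypothetical helpers, and the check assumes that a float needs alignment equal to its size (4 bytes here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero when p is a multiple of sizeof(float), i.e. safe to
 * dereference as float* under the alignment assumption stated above. */
static int is_float_aligned(const void *p)
{
    return ((uintptr_t)p % sizeof(float)) == 0;
}

/* Copy n floats out of a byte buffer that may start at any address.
 * memcpy makes no alignment assumption about src, unlike a direct
 * ((const float *)src)[i] load. */
static void load_floats_unaligned(float *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof(float));
}
```

With these helpers, `is_float_aligned` reports the address 0x25b6323 as misaligned and the quant address 0x25b6370 as aligned, which matches the fault occurring on the read of x rather than the write to quant.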

#6 Updated by Teemu Murtola about 3 years ago

  • Duplicated by Bug #1546: Legacy tools tests failed on armv7a with SIGBUS added

#7 Updated by Teemu Murtola about 3 years ago

  • Duplicated by Bug #1547: MDRUN tools tests failed on armv7a with SIGBUS added

#8 Updated by Magnus Lundborg about 3 years ago

  • Assignee set to Magnus Lundborg

I'm working on this. There should be a fix within a few days.

#9 Updated by Magnus Lundborg about 3 years ago

  • Status changed from New to In Progress

#10 Updated by Magnus Lundborg about 3 years ago

  • Related to Bug #1559: Writing TNG files fails on Xeon Phi added

#11 Updated by Gerrit Code Review Bot about 3 years ago

Gerrit received a related patchset '1' for Issue #1542.
Uploader: Magnus Lundborg
Change-Id: I96e0704d3858264ca918603bf1d7e3b27b4db7ea
Gerrit URL: https://gerrit.gromacs.org/3799

#12 Updated by Magnus Lundborg about 3 years ago

  • Status changed from In Progress to Fix uploaded

#13 Updated by Teemu Murtola about 3 years ago

  • Category set to core library
  • Target version set to 5.1

#14 Updated by Roland Schulz almost 3 years ago

  • Status changed from Fix uploaded to Closed
