Bug #1542
two unit tests fail on 32-bit ARM
Description
On a 32-bit ARM (Tegra 3) platform with up-to-date Ubuntu 12.04.4 LTS, the LegacyToolsTests and MdrunTests unit tests fail with a segfault.
Attached are the CMake cache, the make check output, and the mdrun -version output. The configure command used:
CC=gcc-4.7 CXX=g++-4.7 cmake ../ -DGMX_GPU=OFF -DGMX_BUILD_OWN_FFTW=ON
Note that the version is flagged "dirty" because the Random123 library does not officially support ARM, so the #error at src/external/Random123-1.08/include/Random123/features/gccfeatures.h:38 that prevents compilation had to be disabled.
Related issues
Associated revisions
History
#1 Updated by Szilárd Páll over 5 years ago
- File gmx-version.out gmx-version.out added
- File CMakeCache.txt CMakeCache.txt added
- File make-check.log make-check.log added
#2 Updated by Roland Schulz over 5 years ago
Can you add a stacktrace?
#3 Updated by Szilárd Páll over 5 years ago
- Related to Task #1545: test Random123 on unsupported platforms added
#4 Updated by Szilárd Páll over 5 years ago
Both tests fail with "Bus error".
With the LegacyToolsTests the crash happens at the 6th test:
[----------] 5 tests from NoFatalErrorWhenWritingFrom/TrjconvWithIndexGroupSubset
[ RUN ] NoFatalErrorWhenWritingFrom/TrjconvWithIndexGroupSubset.WithDifferentInputFormats/0
Will write tng: Trajectory file (tng format)
Select group for output
Group 0 ( System) has 6 elements
Group 1 (FirstWaterMolecule) has 3 elements
Group 2 (SecondWaterMolecule) has 3 elements
Select a group: Selected 2: 'SecondWaterMolecule'
Bus errorrame 1 time 0.000 -> frame 1 time 0.000
The MdrunTest fails right after the following output:
starting mdrun 'spc-and-methanol' 6 steps, 0.0 ps.
I suspect that both of these may be related to TNG, but let me get some backtraces to confirm.
#5 Updated by Szilárd Páll over 5 years ago
Below is the backtrace for mdrun-test. It indicates that the issue is in the TNG compression function and, based on the SIGBUS, my guess is that this is an alignment issue.
Program received signal SIGBUS, Bus error.
0x005e78f2 in quantize_float (x=0x25b6323, natoms=3, nframes=2, precision=0.000999999931, quant=0x25b6370) at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/compression/tng_compress.c:91
91 quant[iframe*natoms*3+i*3+j]=(int)floor((x[iframe*natoms*3+i*3+j]/precision)+0.5);
(gdb) bt
#0 0x005e78f2 in quantize_float (x=0x25b6323, natoms=3, nframes=2, precision=0.000999999931, quant=0x25b6370) at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/compression/tng_compress.c:91
#1 0x005e9d34 in tng_compress_pos_float (pos=0x25b6323, natoms=3, nframes=2, desired_precision=0.00100000005, speed=0, algo=0x16129e8, nitems=0xbe7c11ec) at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/compression/tng_compress.c:1248
#2 0x005e9e2e in tng_compress_pos_float_find_algo (pos=0x25b6323, natoms=3, nframes=2, desired_precision=0.00100000005, speed=0, algo=0x16129e8, nitems=0xbe7c11ec) at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/compression/tng_compress.c:1279
#3 0x005b691a in tng_compress (tng_data=0x24ee4c8, block=0x26d7450, n_frames=2, n_particles=3, type=2 '\002', start_pos=0x25b6323) at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/lib/tng_io.c:4837
#4 0x005b901e in tng_particle_data_block_write (tng_data=0x24ee4c8, block=0x26d7450, block_index=0, mapping=0x0, hash_mode=1 '\001') at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/lib/tng_io.c:6311
#5 0x005c7622 in tng_frame_set_write (tng_data=0x24ee4c8, hash_mode=1 '\001') at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/lib/tng_io.c:13153
#6 0x005c76f4 in tng_frame_set_premature_write (tng_data=0x24ee4c8, hash_mode=1 '\001') at /home/pszilard/data/gromacs-5.0/src/external/tng_io/src/lib/tng_io.c:13192
#7 0x00442948 in fflush_tng (tng=0x24ee4c8) at /home/pszilard/data/gromacs-5.0/src/gromacs/fileio/tngio.cpp:866
#8 0x00195f7a in mdoutf_write_to_trajectory_files (fplog=0x2458c58, cr=0x15a2be0, of=0x24ecaa0, mdof_flags=19, top_global=0x17cfb70, step=6, t=0.0060000000000000001, state_local=0x1613170, state_global=0x2458fd8, f_local=0x16117b0, f_global=0x16117b0) at /home/pszilard/data/gromacs-5.0/src/gromacs/fileio/mdoutf.c:298
#9 0x0018e40c in do_md_trajectory_writing (fplog=0x2458c58, cr=0x15a2be0, nfile=35, fnm=0xbe7c26a8, step=6, step_rel=6, t=0.0060000000000000001, ir=0x17cf800, state=0x1613170, state_global=0x2458fd8, top_global=0x17cfb70, fr=0x24f1790, outf=0x24ecaa0, mdebin=0x16112a0, ekind=0x2460bf0, f=0x16117b0, f_global=0x16117b0, wcycle=0x0, nchkpt=0xbe7c1928, bCPT=1, bRerunMD=0, bLastStep=1, bDoConfOut=4096, bSumEkinhOld=0) at /home/pszilard/data/gromacs-5.0/src/gromacs/fileio/trajectory_writing.c:146
#10 0x00101452 in do_md (fplog=0x2458c58, cr=0x15a2be0, nfile=35, fnm=0xbe7c26a8, oenv=0x189dae0, bVerbose=0, bCompact=1, nstglobalcomm=5, vsite=0x0, constr=0x24ec380, stepout=100, ir=0x17cf800, top_global=0x17cfb70, fcd=0x24592c8, state_global=0x2458fd8, mdatoms=0x24ec928, nrnb=0x24f13c0, wcycle=0x0, ed=0x0, fr=0x24f1790, repl_ex_nst=0, repl_ex_nex=0, repl_ex_seed=-1, membed=0x0, cpt_period=15, max_hours=-1, deviceOptions=0x6be7bc "", imdport=8888, Flags=1055744, walltime_accounting=0x24ec888) at /home/pszilard/data/gromacs-5.0/src/programs/mdrun/md.c:1320
#11 0x000fa61e in mdrunner (hw_opt=0xbe7c29f0, fplog=0x2458c58, cr=0x15a2be0, nfile=35, fnm=0xbe7c26a8, oenv=0x189dae0, bVerbose=0, bCompact=1, nstglobalcomm=-1, ddxyz=0xbe7c2a7c, dd_node_order=1, rdd=0, rconstr=0, dddlb_opt=0x6c4c74 "auto", dlb_scale=0.800000012, ddcsx=0x0, ddcsy=0x0, ddcsz=0x0, nbpu_opt=0x6c4c74 "auto", nstlist_cmdline=0, nsteps_cmdline=-2, nstepout=100, resetstep=-1, nmultisim=0, repl_ex_nst=0, repl_ex_nex=0, repl_ex_seed=-1, pforce=-1, cpt_period=15, max_hours=-1, deviceOptions=0x6be7bc "", imdport=8888, Flags=1055744) at /home/pszilard/data/gromacs-5.0/src/programs/mdrun/runner.c:1774
#12 0x00109af0 in gmx_mdrun (argc=1, argv=0x17d4f20) at /home/pszilard/data/gromacs-5.0/src/programs/mdrun/mdrun.cpp:789
#13 0x000f1938 in gmx::test::MdrunTestFixture::callMdrun (this=0x25ad0f0, callerRef=...) at /home/pszilard/data/gromacs-5.0/src/programs/mdrun/tests/moduletest.cpp:209
... [13 more frames left out]
Update:
It is an alignment issue:
$ dmesg | tail -n4
[364077.687276] Switched to NOHz mode on CPU #1
[364078.447618] Alignment trap: not handling instruction ed937a00 at [<005e78ee>]
[364078.460748] Unhandled fault: alignment exception (0x001) at 0x016913f3
[364079.950780] CPU1: shutdown
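For context on this class of failure: the x pointer passed to quantize_float in frame #0 (0x25b6323) is not a multiple of 4, so reading through it as a float is an unaligned access. Below is a minimal C sketch of one common workaround, assuming the float data sits at an arbitrary byte offset inside a larger buffer. The names load_float_unaligned and copy_to_aligned_floats are hypothetical and chosen for illustration only; they are not TNG functions, and this is not necessarily how the actual fix was implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of TNG): read one float from an address
 * that may not be 4-byte aligned. memcpy makes no alignment assumption
 * about its arguments, so the access is well defined on every platform. */
static float load_float_unaligned(const unsigned char *p)
{
    float v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper (not part of TNG): copy nvalues floats from a
 * possibly unaligned byte buffer into a properly aligned float array.
 * A routine that indexes its input as float, as the loop at
 * tng_compress.c:91 does, could then run on the aligned copy. */
static void copy_to_aligned_floats(const unsigned char *bytes,
                                   size_t nvalues, float *out)
{
    size_t i;
    for (i = 0; i < nvalues; i++)
    {
        out[i] = load_float_unaligned(bytes + i * sizeof(float));
    }
}
```

The trade-off is an extra copy of the data. In exchange, the code no longer depends on whether the CPU or kernel tolerates unaligned float loads.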
#6 Updated by Teemu Murtola over 5 years ago
- Has duplicate Bug #1546: Legacy tools tests failed on armv7a with SIGBUS added
#7 Updated by Teemu Murtola over 5 years ago
- Has duplicate Bug #1547: MDRUN tools tests failed on armv7a with SIGBUS added
#8 Updated by Magnus Lundborg over 5 years ago
- Assignee set to Magnus Lundborg
I'm working on this. There should be a fix within a few days.
#9 Updated by Magnus Lundborg over 5 years ago
- Status changed from New to In Progress
#10 Updated by Magnus Lundborg over 5 years ago
- Related to Bug #1559: Writing TNG files fails on Xeon Phi added
#11 Updated by Gerrit Code Review Bot over 5 years ago
Gerrit received a related patchset '1' for Issue #1542.
Uploader: Magnus Lundborg (magnus.lundborg@scilifelab.se)
Change-Id: I96e0704d3858264ca918603bf1d7e3b27b4db7ea
Gerrit URL: https://gerrit.gromacs.org/3799
#12 Updated by Magnus Lundborg over 5 years ago
- Status changed from In Progress to Fix uploaded
#13 Updated by Teemu Murtola over 5 years ago
- Category set to core library
- Target version set to 5.1
#14 Updated by Roland Schulz over 5 years ago
- Status changed from Fix uploaded to Closed
Fixed TNG memory alignment problem and reset pointers.
This is a temporary fix to the memory alignment problems on some
platforms. In the main TNG repository the whole I/O system is
rewritten to address this problem. This fix is to avoid the
problems without making too large changes.
There are also some pointers that were not reset after memory
was freed. This is also fixed in here.
This commit does not correspond to any commit in the TNG
repository.
Fixes #1542, #1546, #1547 and #1559.
Change-Id: I90a6406cccbc43fd57d4423c2b661019cf7763e8
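The commit message also mentions pointers that were not reset after their memory was freed. As a generic illustration of that pattern (not code from the commit), here is a small C sketch; FREE_AND_RESET is a hypothetical macro name used only for this example.

```c
#include <stdlib.h>

/* Hypothetical macro (not from TNG): free a heap pointer and set it to
 * NULL so that later checks see it as empty. A second call is harmless
 * because free(NULL) is defined to do nothing. */
#define FREE_AND_RESET(p) \
    do                    \
    {                     \
        free(p);          \
        (p) = NULL;       \
    } while (0)
```

Resetting the pointer makes an accidental second free a no-op. A stale dereference then fails on NULL and no longer touches memory that has already been released.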