gmx analysis tools require big-endian TRR trajectories
Recently on the Amber mailing list a user (Robert Molt) reported that do_x3dna would crash when he tried to run it with a TRR trajectory generated by cpptraj from AmberTools. However, VMD was able to read and display the cpptraj-generated trajectory with no issues. I was able to reproduce this behavior with 'gmx rms' and found that when I forced cpptraj to write the TRR trajectory as big-endian, 'gmx rms' was able to run successfully. Is this intentional behavior? Should I be forcing cpptraj to write TRR trajectories as big endian by default? Thanks.
GROMACS version: VERSION 5.1.2
Memory model: 32 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support: disabled
OpenCL support: disabled
invsqrt routine: gmx_software_invsqrt(x)
SIMD instructions: SSE2
FFT library: fftw-3.3.4-sse2
RDTSCP usage: disabled
C++11 compilation: enabled
TNG support: enabled
Tracing support: disabled
Built on: Thu Mar 17 07:30:12 MDT 2016
Built by: droe@choxnuxe [CMAKE]
Build OS/arch: Linux 3.13.0-83-generic i686
Build CPU vendor: GenuineIntel
Build CPU brand: Intel(R) Atom(TM) CPU N280 @ 1.66GHz
Build CPU family: 6 Model: 28 Stepping: 2
Build CPU features: apic clfsh cmov cx8 htt lahf_lm mmx msr pdcm pse sse2 sse3 ssse3
C compiler: /usr/bin/cc GNU 4.8.4
C compiler flags: -msse2 -Wundef -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
C++ compiler: /usr/bin/c++ GNU 4.8.4
C++ compiler flags: -msse2 -std=c++0x -Wundef -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
Boost version: 1.55.0 (internal)
Fix trr magic number reading
The trr header-reading routine returned an "OK" value if the magic
number was wrong, which might lead to chaotic results everywhere,
because checking of return values tends not to happen (even when
they're right). Other GROMACS magic-number reading routines tend to
give a fatal error if the the number is wrong, e.g. by reading a file
written in wrong endianness. (This should never be a thing for XDR
files, which are defined to be big endian, but such code has existed.)
This change adds/restores that behaviour to trr reading, along
with separating the behaviour of failing to read a magic integer
from reading one that doesn't match.
Fix logic of TRR reading
Commit f7d4d019 introduced a bug where TRR reading reaching the end of
the file was indistinguishable from a reading error or a magic-number
error. This is now fixed, restoring the end-of-file behaviour that
existed before f7d4d019, while retaining the wrong-magic-number
behaviour that it introduced.
#1 Updated by Mark Abraham over 3 years ago
It's not clear whether you are saying that gmx rms reproduced the crash of do_x3dna or the success of VMD.
In any case, the TRR format is based on XDR, which is defined as big-endian, and I think that all XDR writers should conform. Thus I think that any tool that fails to read a little-endian TRR file has a reasonable implementation.
That VMD apparently also copes with little-endian is nice from a certain point of view, but I don't see any value in encouraging proliferation of forms of XDR. I would encourage e.g. cpptraj, VMD and other tools to use a standard XDR library, or at least http://www.gromacs.org/Developer_Zone/Programming_Guide/XTC_Library, rather than rolling their own code, and thus accidentally writing non-conforming XDR, leading to further fragmentation of the software landscape. :-)
#2 Updated by Daniel Roe over 3 years ago
That's good to know, although I was not able to find any clear documentation to this effect. For example, the first couple of results of a google search for "gromacs trr file format description" indicate no clear link between the XDR format and gromacs TRR files (although the fourth hit, a link to MDAnalysis documentation, starts to hint at this indirectly via the way the modules are named, but even then it's not clear). Compare this to the first result of a google search for "amber netcdf file format description", which for me is http://ambermd.org/netcdf/nctraj.pdf.
Now that I know that TRR should be XDR I will make the appropriate updates in cpptraj, although it does seem strange to me to have a big-endian format in what is primarily a little-endian world! Thanks for the info,
#5 Updated by Daniel Roe over 3 years ago
I'm attaching a tar/gz containing files that can be used to reproduce the behavior. It contains a README describing each file and steps to reproduce the bug. Let me know if you need any more info.
#6 Updated by Erik Lindahl over 3 years ago
- Status changed from New to Rejected
IMHO, this is not a bug, but a file that has been generated incorrectly with some other tool.
Once upon a long time ago Gromacs had another trajectory format (trj) which was endian-dependent. However, the whole idea with using XDR data representation is that we leave it to the operating system to translate data to the XDR network standard representation (just as it does e.g. for any data over NFS). While it's a fun quirk of history that all modern systems happen to be little-endian, that doesn't change the fact that the XDR standard is big endian, so all unix systems covert back-and-forth. However, XDR also goes further than this by specifying that the conversions have to work even if the system e.g. isn't using IEEE754 floating-point format internally.
So, while it's of course unfortunate if somebody missed that and generate files that are suddenly endian-dependent, I wouldn't put very high priority at rewriting the GROMACS trr layer to handle that type of files, but we might be able to detect it and refuse to read the file in a nicer fashion.
And, just for the record: The future of file formats in Gromacs is TrajNG, which will gradually become the new standard we use (it's just not written by default - yet)!
#10 Updated by Mark Abraham over 3 years ago
- Category changed from analysis tools to core library
- Status changed from In Progress to Fix uploaded
The code that read the trr header wasn't doing anything reasonable when it saw the wrong magic number, but merely returned with a success error code, and then e.g. gmx rms didn't check the error code (which was anyway wrong), which led to the segfault. Other tools will have had other strange behaviour. My patch fixes this for the 2016 release, by issuing a fatal error like our other file-reading code does when it can't find a magic number, but the whole fileio module needs a complete overhaul.