Bug #3408

Gmxapi* tests segfault in rpmbuild

Added by Christoph Junghans 3 months ago. Updated 2 months ago.

Status: Fix uploaded
Priority: Normal
Assignee:
Category: build system
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

From: https://koji.fedoraproject.org/koji/taskinfo?taskID=42153050

49/52 Test #49: GmxapiExternalInterfaceTests ........***Exception: SegFault  0.90 sec
[==========] Running 9 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 8 tests from GmxApiTest
[ RUN      ] GmxApiTest.ApiRunnerRestrainedMD
Setting the LD random seed to 394101933
Generated 331705 of the 331705 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 331705 of the 331705 1-4 parameter combinations
Excluding 2 bonded neighbours molecule type 'SOL'
Excluding 3 bonded neighbours molecule type 'methane'
NOTE 1 [file spc_and_methane.top, line 33]:
  The bond in molecule-type methane between atoms 1 C and 2 H1 has an
  estimated oscillational period of 1.1e-02 ps, which is less than 10 times
  the time step of 2.0e-03 ps.
  Maybe you forgot to change the constraints mdp option.
Number of degrees of freedom in T-Coupling group System is 18.00
NOTE 2 [file /builddir/build/BUILD/gromacs-2020.1/serial/src/api/cpp/tests/Testing/Temporary/GmxApiTest_ApiRunnerRestrainedMD_input.mdp]:
  You are using a plain Coulomb cut-off, which might produce artifacts.
  You might want to consider using PME electrostatics.
There were 2 notes
      Start 50: GmxapiMpiTests
50/52 Test #50: GmxapiMpiTests ......................***Exception: SegFault  0.87 sec
[==========] Running 9 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 8 tests from GmxApiTest
[ RUN      ] GmxApiTest.ApiRunnerRestrainedMD
Setting the LD random seed to -369162639
Generated 331705 of the 331705 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 331705 of the 331705 1-4 parameter combinations
Excluding 2 bonded neighbours molecule type 'SOL'
Excluding 3 bonded neighbours molecule type 'methane'
NOTE 1 [file spc_and_methane.top, line 33]:
  The bond in molecule-type methane between atoms 1 C and 2 H1 has an
  estimated oscillational period of 1.1e-02 ps, which is less than 10 times
  the time step of 2.0e-03 ps.
  Maybe you forgot to change the constraints mdp option.
Number of degrees of freedom in T-Coupling group System is 18.00
NOTE 2 [file /builddir/build/BUILD/gromacs-2020.1/serial/src/api/cpp/tests/Testing/Temporary/GmxApiTest_ApiRunnerRestrainedMD_input.mdp]:
  You are using a plain Coulomb cut-off, which might produce artifacts.
  You might want to consider using PME electrostatics.
There were 2 notes
      Start 51: GmxapiInternalInterfaceTests
51/52 Test #51: GmxapiInternalInterfaceTests ........***Exception: SegFault  0.87 sec
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from GmxApiTest
[ RUN      ] GmxApiTest.BuildApiWorkflowImpl
Setting the LD random seed to -1197576397
Generated 331705 of the 331705 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 331705 of the 331705 1-4 parameter combinations
Excluding 2 bonded neighbours molecule type 'SOL'
Excluding 3 bonded neighbours molecule type 'methane'
NOTE 1 [file spc_and_methane.top, line 33]:
  The bond in molecule-type methane between atoms 1 C and 2 H1 has an
  estimated oscillational period of 1.1e-02 ps, which is less than 10 times
  the time step of 2.0e-03 ps.
  Maybe you forgot to change the constraints mdp option.
Number of degrees of freedom in T-Coupling group System is 18.00
NOTE 2 [file /builddir/build/BUILD/gromacs-2020.1/serial/src/api/cpp/workflow/tests/Testing/Temporary/GmxApiTest_BuildApiWorkflowImpl_input.mdp]:
  You are using a plain Coulomb cut-off, which might produce artifacts.
  You might want to consider using PME electrostatics.
There were 2 notes
      Start 52: GmxapiInternalsMpiTests
52/52 Test #52: GmxapiInternalsMpiTests .............***Exception: SegFault  0.85 sec
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from GmxApiTest
[ RUN      ] GmxApiTest.BuildApiWorkflowImpl
Setting the LD random seed to -1980902306
Generated 331705 of the 331705 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 331705 of the 331705 1-4 parameter combinations
Excluding 2 bonded neighbours molecule type 'SOL'
Excluding 3 bonded neighbours molecule type 'methane'
NOTE 1 [file spc_and_methane.top, line 33]:
  The bond in molecule-type methane between atoms 1 C and 2 H1 has an
  estimated oscillational period of 1.1e-02 ps, which is less than 10 times
  the time step of 2.0e-03 ps.
  Maybe you forgot to change the constraints mdp option.
Number of degrees of freedom in T-Coupling group System is 18.00
NOTE 2 [file /builddir/build/BUILD/gromacs-2020.1/serial/src/api/cpp/workflow/tests/Testing/Temporary/GmxApiTest_BuildApiWorkflowImpl_input.mdp]:
  You are using a plain Coulomb cut-off, which might produce artifacts.
  You might want to consider using PME electrostatics.
There were 2 notes
92% tests passed, 4 tests failed out of 52
Label Time Summary:
GTest              =  75.32 sec*proc (52 tests)
IntegrationTest    =  16.76 sec*proc (9 tests)
MpiTest            =  50.27 sec*proc (8 tests)
SlowTest           =  46.38 sec*proc (2 tests)
UnitTest           =  12.19 sec*proc (41 tests)
Total Test time (real) =  75.39 sec
The following tests FAILED:
     49 - GmxapiExternalInterfaceTests (SEGFAULT)
     50 - GmxapiMpiTests (SEGFAULT)
     51 - GmxapiInternalInterfaceTests (SEGFAULT)
     52 - GmxapiInternalsMpiTests (SEGFAULT)
Errors while running CTest

Build log attached.

To reproduce this on fedora do:

git clone https://github.com/junghans/gromacs-rpm.git
cd gromacs-rpm
spectool -g gromacs.spec
fedpkg --release f33 srpm
mock -r fedora-rawhide-ppc64le --no-clean gromacs-2020.1-1.fc33.src.rpm
mock -r fedora-rawhide-ppc64le --shell

ppc64le_build.log.txt (6.17 MB), Christoph Junghans, 03/03/2020 07:35 PM
x86_64_build.log.txt (7.41 MB), Christoph Junghans, 03/03/2020 07:54 PM

Related issues

Related to GROMACS - Feature #951: Multiple versions of Gromacs (e.g., single and double) in the same library/binary (New)
Related to GROMACS - Feature #2896: Python packaging (Feedback wanted)
Blocked by GROMACS - Task #2756: gmxapi integration testing (In Progress)

Associated revisions

Revision 0e1d4376 (diff)
Added by Christoph Junghans 3 months ago

cmake: use libsuffix on gmxapi as well

Related to #3408

Change-Id: I2f5648321bdea1f1d564b738783092a709429fa8

History

#1 Updated by Christoph Junghans 3 months ago

Actually that also happens on x86_64.

#2 Updated by Christoph Junghans 3 months ago

  • Subject changed from Gmxapi* tests segfault on ppc64le to Gmxapi* tests segfault on fedora rawhide

#3 Updated by Paul Bauer 3 months ago

I think this is still the same issue as before for the 2020 release. I can try to dig into it again but don't know what is going on.
Eric?

#4 Updated by Eric Irrgang 3 months ago

  • Assignee changed from Paul Bauer to Eric Irrgang

I haven't seen this. I'll look now.

Is the problem new with the 2020.1 patch release?

#5 Updated by Eric Irrgang 3 months ago

  • Status changed from New to Blocked, need info

I didn't figure out how to get the emulation working to do the ppc64 mock thing, but in a docker container (gmxapi/issue3408 available with docker pull gmxapi/issue3408) I was able to see the reported seg faults with rpmbuild --rebuild gromacs-2020.1-1.fc33.src.rpm

However, I tried several manual builds and I was not able to reproduce the problem. The following all worked.

cmake ../../gromacs-2020.1 -DGMX_THREAD_MPI=ON -DBUILD_TESTING=ON && make -j4 check
cmake ../../gromacs-2020.1 -DGMX_MPI=ON -DBUILD_TESTING=ON -DCMAKE_CXX_COMPILER=/usr/lib64/openmpi/bin/mpicxx -DCMAKE_C_COMPILER=/usr/lib64/openmpi/bin/mpicc -DMPIEXEC=/usr/lib64/openmpi/bin/mpiexec && make -j4 tests && OMPI_ALLOW_RUN_AS_ROOT=1 OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 make check
cmake ../../gromacs-2020.1 -DGMX_MPI=OFF -DBUILD_TESTING=ON  && make -j4 check

It's been a while since I've used the rpmbuild system, and I'm not able to effectively debug it. Could there be important compiler flags that are being passed to libgromacs but not to libgmxapi? Could libgmxapi be linking against the wrong libgromacs? If you can clarify CMake configure and build steps that will reproduce the problem, I can look again.

#6 Updated by Christoph Junghans 3 months ago

As the segfault shows up on all archs, you can run "mock -r fedora-rawhide-x86_64 --no-clean gromacs-2020.1-1.fc33.src.rpm" instead, which should be much faster.

#7 Updated by Eric Irrgang 3 months ago

I see the seg fault with "mock ..." but I don't know how to debug it. The problem is either in the way the build tool chains are invoked, or the problem will require a debug build to locate bad code. Either way, I need to be able to interact more granularly with the build system. Can you provide instructions for reproducing the error in terms of cmake and make?

#8 Updated by Christoph Junghans 3 months ago

I haven't tried to reproduce it outside of mock.

You can do "mock -r fedora-rawhide-x86_64 --shell" to jump into the mock environment, you might want to install an editor first, e.g. "mock -r fedora-rawhide-x86_64 --install vim", that worked for me in the past.

#9 Updated by Eric Irrgang 3 months ago

  • Subject changed from Gmxapi* tests segfault on fedora rawhide to Gmxapi* tests segfault in gromacs-2020.1-1.fc33.src.rpm
  • Assignee deleted (Eric Irrgang)

With mock, in the gmxapi/issue3408 docker image, I always end up with

ERROR: Command failed:
     # /bin/mount -n --bind /var/cache/mock/fedora-rawhide-x86_64-bootstrap/yum_cache /var/lib/mock/fedora-rawhide-x86_64-bootstrap/root/var/cache/yum

Does the problem only occur in the Rawhide distribution? Was the problem introduced with the transition from GROMACS 2019 to 2020, from GROMACS 2020 to 2020.1, or from Fedora 32 to 33?

In addition to Docker, I tried to install a virtual machine from the 20200303 builds of Rawhide Everything and Rawhide Server, but I'm getting errors with the mirrors URLs when trying to install/update packages and/or having a hard time getting a bootable installation under Parallels. If you can point me to a VirtualBox, VMWare, or Parallels image or a Dockerfile or Docker image, I can try again. But I don't think I can volunteer to debug the SPEC file.

Does the Spack recipe work?

#10 Updated by Christoph Junghans 3 months ago

Eric Irrgang wrote:

Does the problem only occur in the Rawhide distribution? Was the problem introduced with the transition from GROMACS 2019 to 2020, from GROMACS 2020 to 2020.1, or from Fedora 32 to 33?

2019.6 works on all platforms, so I am guessing 2019 to 2020, but 2020.0 had a bunch of other issues, which prevented it from being built on Fedora at that time, and in the meantime gcc-10 has arrived in F33, so it is hard to say.

In addition to Docker, I tried to install a virtual machine from the 20200303 builds of Rawhide Everything and Rawhide Server, but I'm getting errors with the mirrors URLs when trying to install/update packages and/or having a hard time getting a bootable installation under Parallels. If you can point me to a VirtualBox, VMWare, or Parallels image or a Dockerfile or Docker image, I can try again. But I don't think I can volunteer to debug the SPEC file.

I am running f31 in virtualbox and can reproduce this issue. I remember the above mock issue in docker and usually I worked around that by using --old-chroot, but that doesn't seem to work anymore in mock-2.0, so I have asked the developer about that issue on docker, stay tuned.

Does the Spack recipe work?

Not sure how that is related and one would certainly need to inject exactly the same flags rpmbuild is using. I haven't worked on gromacs in spack for a while (except bumping the versions) hence the spackage is currently a bit out of date and e.g. doesn't even support spack's micro archs (see https://github.com/spack/spack/pull/13636).

#11 Updated by Eric Irrgang 3 months ago

one would certainly need to inject exactly the same flags rpmbuild is using

That's what I'm getting at. It seems likely that the problem is related to the flags that rpmbuild is using, but it is not clear to me what they are. I could not find a complete cmake command line with expanded variables in any of the terminal or log file output.

I tried several build configurations and didn't reproduce the problem, but if you can provide a set of flags, we can zero in on whether this is a bug or a usage problem, and whether a technical solution or documentation solution is appropriate.

One difference between 2019 and 2020 releases is that the GMXAPI CMake option default changed from OFF to ON, so the tests may not have been triggered before. I have a Fedora 31 virtual machine set up, so I will try again with the mock --shell thing, but I appreciate any help you can provide in identifying the differences between the failure case and the build scenarios that are documented and tested.

#12 Updated by Christoph Junghans 3 months ago

I think that should trigger the issue as well:

docker run -it fedora:rawhide /bin/bash
dnf install -y fedpkg make
git clone https://github.com/junghans/gromacs-rpm.git
cd gromacs-rpm/
dnf builddep gromacs.spec
useradd -g users gmx
su - gmx
git clone https://github.com/junghans/gromacs-rpm.git
cd gromacs-rpm/
spectool -g gromacs.spec
fedpkg --release f33 local

#13 Updated by Christoph Junghans 3 months ago

Christoph Junghans wrote:

I think that should trigger the issue as well:
[...]

Confirmed, it triggers the issue.

#14 Updated by Eric Irrgang 3 months ago

Christoph Junghans wrote:

Christoph Junghans wrote:

I think that should trigger the issue as well:
[...]

Confirmed, it triggers the issue.

I tried using fedpkg --release f33 local. There are extra problems if I don't first install blas and lapack packages.

I interrupt fedpkg in order to finish building and running the tests myself, and to save time since the failure was noted for the first of many configurations it builds.

I have tried interrupting fedpkg immediately after it finishes the CMake configure step, and I have tried waiting until it finishes the initial build phase for the serial directory.

In either case, I then go to the serial directory and do cmake --build . --target tests, and then I do LD_LIBRARY_PATH=/gromacs-rpm/gromacs-2020.1/serial/lib ctest -R Gmxapi and the tests pass. Note that LD_LIBRARY_PATH needs to be provided because the RPATH related CMake options have been overridden by the SPEC file.
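Whether a given binary actually carries an RPATH or RUNPATH entry can be checked from its dynamic section. The following is a minimal sketch, assuming binutils' readelf is installed. The helper name has_runpath is made up for illustration, and it reads `readelf -d` output on stdin so that the parsing can be tried without a real binary.

```shell
# Hypothetical helper (not part of GROMACS or the SPEC file).
# Reads `readelf -d <binary>` output on stdin and succeeds if the dynamic
# section contains an RPATH or RUNPATH entry.
has_runpath() {
    grep -Eq '\((RPATH|RUNPATH)\)'
}

# Example use from the serial/ build directory (binary name as it appears
# elsewhere in this thread):
#   readelf -d bin/workflow-details-mpi-test | has_runpath \
#       && echo "rpath present" \
#       || echo "no rpath: LD_LIBRARY_PATH is needed"
```

If the check fails for the test binaries, they can only find libgromacs and libgmxapi through LD_LIBRARY_PATH or the system library directories.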

So it seems that the SPEC file at least is able to properly configure the CMake build tree, but is unable to produce properly working binaries or execution/linking environment. Either there is something weird about the way that rpmbuild is invoking the tool chain, or something about the several configurations that it builds is not properly isolated. (Maybe two linked objects are picking up different typedefs for real or something?)

I haven't tried letting fedpkg run to completion since manually installing lapack and blas packages, so I'm doing that now out of curiosity, but it will take a while.

In other words, I still can't reproduce the problem unless the build/test is managed through the RPM-packaging tools, and even they seem to be able to produce a valid CMake environment, so I can't find anything to try to debug.

#15 Updated by Christoph Junghans 3 months ago

In my case “dnf builddep” installs all the dependencies, including lapack.

#16 Updated by Eric Irrgang 3 months ago

I re-ran the entire fedpkg, which failed, and then I went directly to the build directory to run the tests manually, and they passed.


The following tests FAILED:
     49 - GmxapiExternalInterfaceTests (SEGFAULT)
     50 - GmxapiMpiTests (SEGFAULT)
     51 - GmxapiInternalInterfaceTests (SEGFAULT)
     52 - GmxapiInternalsMpiTests (SEGFAULT)
Errors while running CTest
make[3]: *** [CMakeFiles/run-ctest-nophys.dir/build.make:78: CMakeFiles/run-ctest-nophys] Error 8
make[3]: Leaving directory '/gromacs-rpm/gromacs-2020.1/serial'
make[2]: *** [CMakeFiles/Makefile2:2476: CMakeFiles/run-ctest-nophys.dir/all] Error 2
make[2]: Leaving directory '/gromacs-rpm/gromacs-2020.1/serial'
make[1]: *** [CMakeFiles/Makefile2:2455: CMakeFiles/check.dir/rule] Error 2
make[1]: Leaving directory '/gromacs-rpm/gromacs-2020.1/serial'
make: *** [Makefile:269: check] Error 2
make: Leaving directory '/gromacs-rpm/gromacs-2020.1/serial'
error: Bad exit status from /var/tmp/rpm-tmp.ERpKVs (%check)
    Bad exit status from /var/tmp/rpm-tmp.ERpKVs (%check)

RPM build errors:
Could not execute local: rpmbuild --define '_sourcedir /gromacs-rpm' --define '_specdir /gromacs-rpm' --define '_builddir /gromacs-rpm' --define '_srcrpmdir /gromacs-rpm' --define '_rpmdir /gromacs-rpm' --define 'dist %{?distprefix}.fc33' --define 'fedora 33' --eval '%undefine rhel' --define 'fc33 1' -ba /gromacs-rpm/gromacs.spec | tee .build-2020.1-1.fc33.log
[root@6506a7a644c2 gromacs-rpm]# cd gromacs-2020.1/serial
[root@6506a7a644c2 serial]# LD_LIBRARY_PATH=/gromacs-rpm/gromacs-2020.1/serial/lib ctest -R Gmxapi
Test project /gromacs-rpm/gromacs-2020.1/serial
    Start 49: GmxapiExternalInterfaceTests
1/4 Test #49: GmxapiExternalInterfaceTests .....   Passed    1.03 sec
    Start 50: GmxapiMpiTests
2/4 Test #50: GmxapiMpiTests ...................   Passed    0.82 sec
    Start 51: GmxapiInternalInterfaceTests
3/4 Test #51: GmxapiInternalInterfaceTests .....   Passed    0.91 sec
    Start 52: GmxapiInternalsMpiTests
4/4 Test #52: GmxapiInternalsMpiTests ..........   Passed    0.89 sec

100% tests passed, 0 tests failed out of 4

Label Time Summary:
GTest              =   3.65 sec*proc (4 tests)
IntegrationTest    =   3.65 sec*proc (4 tests)
MpiTest            =   1.71 sec*proc (2 tests)

Total Test time (real) =   3.68 sec
[root@6506a7a644c2 serial]#

However fedpkg is invoking the tests, the test binaries must be getting mismatched versions of libgmxapi and libgromacs. The normal linking details managed by CMake are supposed to prevent this. Do you know why the SPEC file modifies this behavior? I recognize that the RPM packaging tools have some provisions for helping to configure libraries for their final destination, but maybe they are overly aggressive in this case, at least in the build tree.

#17 Updated by Christoph Junghans 3 months ago

There are a couple of differences: the spec runs “make check” (in parallel) instead of just “ctest”, and, as Fedora disallows rpath (CMAKE_SKIP_RPATH=ON is used), LD_LIBRARY_PATH is set for running “make check”. Without the latter the other tests won’t find libgromacs.so, libgmock.so and so on. And it actually surprises me that you got the api tests to run; did you use the same LD_LIBRARY_PATH? I think you only had one entry, while the spec has an additional “%{buildroot}%{_libdir}” pointing to the just installed libraries.

#18 Updated by Eric Irrgang 3 months ago

Ah. I think we're getting somewhere. It does sound like the problem is likely mismatched binaries.

Christoph Junghans wrote:

Fedora disallows rpath

Interesting! How are libraries supposed to find each other? Is everything supposed to go in /usr/lib or are packages supposed to update /etc/ld.so.conf? I didn't notice a linker error when I used RPATH... is this just a policy thing for official packages? In that case, I would think it would be okay to use RPATH in the build tree, at least.

And it actually surprises me that you got the api tests to run

I just tried again with LD_LIBRARY_PATH=/gromacs-rpm/gromacs-2020.1/serial/lib make check. All tests passed in serial and all tests failed in serial_d.

pointing to the just installed libraries.

Why do the tests run against the installation instead of the build tree? It seems like this can allow a linking target to be replaced after the test binaries have been linked. I don't think CMake or the build system can check for this. The infrastructure assumes that the test binary will be executed with the library binary associated with the build system target, which has not been rebuilt. I had more flexible handling for library locations in earlier versions of libgmxapi and its tests, but this was deemed to be undesirably complex and beyond the scope of supported use cases.

There are no provisions to install multiple configurations of libgmxapi, so it seems likely that between libgmxapi, the test binary, and (one or more) libgromacs libraries, there is a mixture of incompatible typedefs for real causing inconsistent calculations of object sizes.

I will take some steps to prevent ambiguity in the size of loaded symbols and/or harden the ABI. I don't know if there is a use case for multiple precisions of libgmxapi in a single install location. If you think there is, we can consider how best to approach it. Otherwise, the SPEC file should probably set -DGMXAPI=OFF for all but one build. Note that libgmxapi is internally aware of the definition of real in its libgromacs dependency, but real is not part of its public interface. libgmxapi should be agnostic to the MPI choices in libgromacs but this has not been rigorously explored because it is assumed that libgmxapi can always identify its corresponding libgromacs and that, if client code links against both libraries, CMake infrastructure ensures compatible library targets.
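One way to read the -DGMXAPI=OFF suggestion: when the packaging loops over several build configurations, gmxapi is enabled for exactly one of them. The sketch below only prints the CMake command lines and does not run them. GMXAPI and the directory names serial and serial_d come from this thread, and GMX_DOUBLE is the usual GROMACS double-precision switch. The helper name, the "." source directory, and the choice to keep gmxapi in the first configuration are assumptions.

```shell
# Hypothetical sketch: print one cmake command per build directory,
# with -DGMXAPI=ON only for the first directory given.
print_cmake_commands() {
    gmxapi=ON
    for dir in "$@"; do
        case "$dir" in
            *_d) double=ON ;;   # assumption: a "_d" suffix marks double precision
            *)   double=OFF ;;
        esac
        echo "cmake -S . -B $dir -DGMX_DOUBLE=$double -DGMXAPI=$gmxapi"
        gmxapi=OFF              # every later configuration builds without gmxapi
    done
}

print_cmake_commands serial serial_d
```

With this arrangement only one libgmxapi is ever installed, so there is nothing for a later configuration to overwrite.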

I definitely think that we should be testing public API client code against common installations, but we have not deployed infrastructure for this or enumerated the test cases. Also reference #951, #2756 and #2896

#19 Updated by Eric Irrgang 3 months ago

  • Subject changed from Gmxapi* tests segfault in gromacs-2020.1-1.fc33.src.rpm to Gmxapi* tests segfault in rpmbuild
  • Category set to build system
  • Status changed from Blocked, need info to Accepted
  • Assignee set to Eric Irrgang

#20 Updated by Eric Irrgang 3 months ago

  • Blocked by Task #2756: gmxapi integration testing added

#21 Updated by Eric Irrgang 3 months ago

  • Related to Feature #951: Multiple versions of Gromacs (e.g., single and double) in the same library/binary added

#22 Updated by Eric Irrgang 3 months ago

#23 Updated by Christoph Junghans 3 months ago

Eric Irrgang wrote:

Ah. I think we're getting somewhere. It does sound like the problem is likely mismatched binaries.

Christoph Junghans wrote:

Fedora disallows rpath

Interesting! How are libraries supposed to find each other? Is everything supposed to go in /usr/lib or are packages supposed to update /etc/ld.so.conf? I didn't notice a linker error when I used RPATH... is this just a policy thing for official packages? In that case, I would think it would be okay to use RPATH in the build tree, at least.

Yeah, everything goes into /usr and ldconfig is usually run by the package manager (see https://docs.fedoraproject.org/en-US/packaging-guidelines/#_beware_of_rpath). I have tried in the past, e.g. by setting CMAKE_INSTALL_RPATH to empty, but I could never find a way to have no rpath in the installation without also disabling it at build time.

And it actually surprises me that you got the api tests to run

I just tried again with LD_LIBRARY_PATH=/gromacs-rpm/gromacs-2020.1/serial/lib make check. All tests passed in serial and all tests failed in serial_d.

pointing to the just installed libraries.

Why do the tests run against the installation instead of the build tree?

Historic reasons, I guess: older versions of GROMACS didn't put all libraries in the same directory, hence using the libraries in the build directory was painful.

#24 Updated by Christoph Junghans 3 months ago

Ok I got a bit closer:
LD_LIBRARY_PATH=/home/gmx/rpmbuild/BUILDROOT/gromacs-2020.1-1.fc33.x86_64/usr/lib64/:$PWD/lib ctest -R Gmxapi -V
segfault, while
LD_LIBRARY_PATH=$PWD/lib ctest -R Gmxapi -V
is successful (both run in serial/ buildfolder)

I am not 100% sure what the difference is between the libs in /home/gmx/rpmbuild/BUILDROOT/gromacs-2020.1-1.fc33.x86_64/usr/lib64/ and those in lib/ (in the build folder).

You can also run LD_LIBRARY_PATH=/home/gmx/rpmbuild/BUILDROOT/gromacs-2020.1-1.fc33.x86_64/usr/lib64/:$PWD/lib /home/gmx/gromacs-rpm/gromacs-2020.1/serial/bin/workflow-details-mpi-test "-ntomp" "2" "-ntmpi" "2" "--gtest_output=xml:/home/gmx/gromacs-rpm/gromacs-2020.1/serial/Testing/Temporary/GmxapiInternalsMpiTests.xml" to trigger the segfault.

I have an idea for a workaround in the spec, but this issue is still strange.
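The two ctest runs above differ only in which directory comes first on LD_LIBRARY_PATH. The dynamic loader searches those directories from left to right, so a library in the first directory shadows a same-named one further along. Below is a rough sketch of that lookup, assuming a plain file-existence check is a good enough stand-in (the real loader also checks things such as ELF compatibility). first_on_path is a made-up helper name, and BUILDROOT in the example stands for the staging path quoted above.

```shell
# Hypothetical helper: print the first directory in a colon-separated search
# path that contains the named library file; fail if none does.
first_on_path() {
    lib=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -e "$dir/$lib" ]; then
            IFS=$old_ifs
            echo "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example, run from the serial/ build folder:
#   first_on_path libgmxapi.so "$BUILDROOT/usr/lib64:$PWD/lib"
```

If this reports the staging directory for libgmxapi.so but the build folder for some other library, the test binary is mixing libraries from two different places.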

#25 Updated by Eric Irrgang 3 months ago

but this issue is still strange.

I think what is happening is that libgmxapi is getting repeatedly replaced in the staging area, so that the test binaries are able to find a different version of libgmxapi than they were compiled with and, worse, a libgmxapi that depends on a different version of libgromacs than the test binary is linked against.

I'm not 100% sure that mismatched binaries is the (only) problem, but it is definitely likely occurring with what we know, and is not a use case that we had attempted to support. In other words, I'm not surprised there is a problem, but that's not very helpful. :-]

Thanks for the link. That page indicates that it is acceptable to install libraries into /usr/lib/<myapp>/....so, and to use rpath to find them from other binaries installed with the same package. Maybe the libraries should be moved to /usr/lib/gromacs, /usr/lib/gromacs_d, etcetera. GROMACS 2020 doesn't officially claim to offer a public API, but it could still be appropriate to also add a config file to /etc/ld.so.conf.d/, as suggested at that site.
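As a sketch of the config-file option: a drop-in under /etc/ld.so.conf.d/ simply lists library directories, one per line, and ldconfig picks them up on its next run. The directory names follow the /usr/lib/gromacs example above. The file name gromacs.conf and the helper are assumptions, and the helper takes a root prefix so it can be tried in a scratch directory instead of on the live system.

```shell
# Hypothetical helper: write a drop-in listing private library directories.
# $1 is a root prefix ("" for the real system, which needs root privileges);
# the remaining arguments are the directories to register.
write_ld_conf() {
    root=$1
    shift
    mkdir -p "$root/etc/ld.so.conf.d"
    printf '%s\n' "$@" > "$root/etc/ld.so.conf.d/gromacs.conf"
}

# Example: write_ld_conf "" /usr/lib/gromacs && ldconfig
```

Registering several directories that hold same-named libraries would bring back the shadowing problem, so this only helps once the libraries in each directory have distinct names.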

The gmxapi test binaries originally did not link against libgromacs at all, which also may have prevented this problem. Efforts are underway to lighten the coupling of test binaries to libgromacs and to merge libgromacs and libgmxapi, but I wouldn't count on a resolution for at least a few months.

#26 Updated by Christoph Junghans 3 months ago

Eric Irrgang wrote:

but this issue is still strange.

I think what is happening is that libgmxapi is getting repeatedly replaced in the staging area, so that the test binaries are able to find a different version of libgmxapi than they were compiled with and, worse, a libgmxapi that depends on a different version of libgromacs than the test binary is linked against.

I'm not 100% sure that mismatched binaries is the (only) problem, but it is definitely likely occurring with what we know, and is not a use case that we had attempted to support. In other words, I'm not surprised there is a problem, but that's not very helpful. :-]

The funny thing is that those are the same libraries: in one case they are installed, in the other case they are not. For the different builds we use different library suffixes, so there should be no overwriting.

#27 Updated by Christoph Junghans 3 months ago

I think I understand. libgmxapi doesn't get installed with the correct library suffix, and hence the single precision libgmxapi.so gets overwritten by the double precision one, which then makes the single precision tests fail in the rpmbuild.
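A collision like this can be spotted before installing by comparing the library file names that each build tree would put into the shared staging directory. Here is a small sketch, assuming each build directory keeps its libraries under lib/ as in this thread. colliding_libs is a made-up helper name.

```shell
# Hypothetical helper: list file names that exist in both library directories
# and would therefore overwrite each other in a common install location.
colliding_libs() {
    for f in "$1"/*; do
        name=${f##*/}
        [ -e "$2/$name" ] && echo "$name"
    done
    return 0
}

# Example: colliding_libs serial/lib serial_d/lib
```

An empty result means every library carries a distinct name (for example through the library suffix); any name that is printed would be overwritten by whichever configuration is installed last.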

#28 Updated by Eric Irrgang 2 months ago

  • Status changed from Accepted to Fix uploaded

Has this been satisfactorily resolved?
