Bug #1021

Building release-4-6 difficult on Mac OS X with recent hardware

Added by Teemu Murtola almost 7 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: build system
Target version:
Affected version - extra info: no released versions
Affected version:
Difficulty: uncategorized

Description

I tried to compile Gromacs on my new laptop with OS X 10.8 and hardware AVX 256 support. It was surprisingly difficult, so it would be good to improve the build system and error messages in this respect. Here is roughly what I tried and what the results were:
  1. First, tried to compile without setting any cmake options (except Debug build and a custom install directory). Was hit by #1018.
    • After setting GMX_GPU=OFF, the next complaints from cmake were several warnings about not being able to find AVX support, and finally a fatal error for not finding immintrin.h.
    • After setting also GMX_ACCELERATION=SSE4.1, was able to run cmake to the end. There was a warning about not being able to find OpenMP support, and the warning indicated that OpenMP support would be disabled.
    • Running make fails in linking libgmx because of missing OpenMP symbols.
    • Note that in this case, cmake picks gcc-4.2 as the C compiler and clang-4.1 as the C++ compiler, which seems to confuse at least the OpenMP detection.
  2. Then, did the same as above, but with CMAKE_CXX_COMPILER=g++ to use g++-4.2.
    • Again, had to set GMX_GPU=OFF and GMX_ACCELERATION=SSE4.1 manually. There was a message about OpenMP not being supported with these compilers.
    • Running make fails in linking libgmxana because of undefined symbol ___builtin_object_size, referenced from _do_hbac_omp_fn.0 in gmx_hbond.c. So it seems that something is still wrong with OpenMP.
  3. Then, set CMAKE_C_COMPILER=clang to use clang for both C and C++.
    • This time, had to set only GMX_GPU=OFF, and there was a message about not being able to find OpenMP support. But finally, compilation was successful.
  4. Finally, set CMAKE_C_COMPILER=gcc-mp-4.7, CMAKE_CXX_COMPILER=g++-mp-4.7, for gcc 4.7 compiled from MacPorts.
    • After setting GMX_GPU=OFF, cmake ran through successfully.
    • Build fails now with "no such instruction" errors from the assembler. It appears that the system assembler (that MacPorts is using) does not support AVX instructions.
    • Setting GMX_ACCELERATION=SSE4.1 makes the compilation work.

So, to sum it up, it seems that it is very difficult to compile Gromacs with all the performance options enabled: I was not able to find any combination where I could enable both OpenMP and AVX with the same compiler. Googling a bit finds some workarounds for replacing the system assembler with the one that clang is using, but this will likely intimidate most people. Also, to be able to compile Gromacs at all, I was forced to set multiple options by hand, and in most cases the error messages were not helpful (not suggesting how to solve the issue). A user who is not aware of the different compilers present on Mac systems, or is otherwise inexperienced, would probably have a hard time getting this far.

If the plan is to support Gromacs on Macs, I think there is a need to 1) make very explicit documentation on how to successfully compile it, 2) improve the error messages such that the user is guided in the correct direction, and/or 3) improve the detection code to allow compilation without so many manual overrides (warnings about loss of performance are ok, as long as they are not fatal errors).

I can provide more details if needed to find out why things are not working; I did not look into why the build is actually failing in the first two cases.

CMakeCache-4-6-gcc42-clang41.txt (36.8 KB): Cache with gcc 4.2 and clang 4.1 (default compilers). Teemu Murtola, 10/06/2012 07:26 PM
CMakeCache-4-6-gcc42-g__42.txt (37.6 KB): Cache with gcc 4.2 and g++ 4.2 (CMAKE_CXX_COMPILER set). Teemu Murtola, 10/06/2012 07:26 PM

Associated revisions

Revision cb93f945
Added by Roland Schulz almost 7 years ago

Make Gromacs compile without C++ compiler

Fixes that OpenMP is enabled if C compiler supports OpenMP
but C++ compiler does not.

Partially fixes #1021

Change-Id: I4bb109801ee57aac4826881022c34240768a841e
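The commit message states the rule only briefly. A minimal Python sketch of one reading of it, assuming OpenMP should be enabled only when the compiler of every language the build actually compiles accepts the OpenMP flags, so that an OpenMP-incapable C++ compiler does not matter for a C-only build such as 4.6. The function and its parameters are hypothetical and are not the GROMACS CMake code:

```python
def openmp_enabled(languages_used, openmp_support):
    """Return True if OpenMP can be enabled for this build.

    languages_used: languages the build actually compiles, e.g. {"C"}.
    openmp_support: language -> whether that language's compiler accepted
        the OpenMP flags; entries for unused languages are ignored.
    """
    if not languages_used:
        return False
    # A language whose support was never probed counts as unsupported.
    return all(openmp_support.get(lang, False) for lang in languages_used)
```

For example, `openmp_enabled({"C"}, {"C": True, "CXX": False})` is true: a C++ compiler without OpenMP support no longer affects a build that only compiles C.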

Revision 2b204038
Added by Roland Schulz over 6 years ago

Fix that no OpenMP flags are used with GMX_OPENMP=no

If OpenMP flags were set because find_package(OpenMP) could
find something but GMX_OPENMP was false (either set by the user
or because find_package(OpenMP) was only partially successful),
then compiler or linker flags were set that were not needed.
This could cause undefined OpenMP linker errors with
GMX_OPENMP=no.

Fixes part of #1021

Change-Id: I9b66a8c89a84374081886cd2eeb46c87664c1e39

Revision 22a9c5b8
Added by Roland Schulz over 6 years ago

Test that compiling and linking of AVX works

Older assemblers don't have support for AVX

Related to #1021

Change-Id: If8da47f9458c592d5408c7322280ffacce81145d
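Checking only for immintrin.h is not enough: as case 4 of the description shows, a compiler can accept AVX intrinsics while the assembler it invokes rejects the generated instructions. A minimal Python sketch of a compile-and-link probe, assuming a GCC/clang-style driver that accepts `-mavx`. The helper name and test source are illustrative and are not the check that this revision actually added to the CMake code:

```python
import os
import subprocess
import tempfile

# Small C program using 256-bit AVX intrinsics. Building it runs the
# assembler and linker, which is where an AVX-unaware system assembler
# fails with "no such instruction". The program is never executed.
AVX_TEST_SOURCE = r"""
#include <immintrin.h>
int main(void)
{
    __m256 x = _mm256_set1_ps(1.0f);
    float out[8];
    x = _mm256_add_ps(x, x);
    _mm256_storeu_ps(out, x);
    return out[0] == 2.0f ? 0 : 1;
}
"""


def avx_compiles_and_links(compiler, flags=("-mavx",)):
    """Return True only if `compiler` can compile AND link the AVX test.

    Hypothetical helper mirroring the idea of a CMake try-compile check.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "avx_test.c")
        exe = os.path.join(tmp, "avx_test")
        with open(src, "w") as f:
            f.write(AVX_TEST_SOURCE)
        try:
            result = subprocess.run(
                [compiler, *flags, src, "-o", exe],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # Compiler not found or not executable.
            return False
        return result.returncode == 0 and os.path.exists(exe)
```

The probe only builds the program, so it can pass on a machine without AVX hardware; whether the CPU supports AVX is a separate question.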

Revision ddb6b523
Added by Teemu Murtola over 6 years ago

Remove non-functional compiler override for MacOSX.

As discussed in #1021, at least with CMake 2.8.9 this does not work, so
it only causes confusion. A better solution can be implemented if
identified in #1021 discussion.

Related to #1021.

Change-Id: Iec704dcae65cd7a213cf31861a262ac0b2763474

History

#1 Updated by Erik Lindahl almost 7 years ago

Hi Teemu,

I don't think we have experienced any problems with not finding AVX support on Macs either on 10.7 or 10.8, but thinking about it all of us might have had icc installed (which might have affected header files even when using clang). Could you provide a bit more information there? Did you force a C++ build? (remember that 4.6 is the last C-only version, so we haven't tested C++ extensively).

I agree we should turn GPU support off by default on platforms where it is very unlikely to be present - Macs have never come with strong graphics cards as far as I know. The message you are referring to should be changed to a warning IMHO, not a fatal error (since it is not fatal by definition - Gromacs will work just fine without GPU support).

For item 4 there is not a whole lot we can do, apart from theoretically testing compilers and disabling things that don't work. A much better option in that case is likely to install binutils that supports AVX, but that we can't do automatically.

#2 Updated by Teemu Murtola almost 7 years ago

  • Description updated

As I said, the first attempt is just an invocation of cmake with only CMAKE_BUILD_TYPE=Debug and CMAKE_INSTALL_PREFIX set, so I'm not forcing anything. It seems that although the C++ compiler should not be used by the build, it still has an effect on the OpenMP detection. I have no idea why using both gcc-4.2 and g++-4.2 (the second attempt) still fails the build.

AVX detection also works for me with clang. The problem is that by default the build uses gcc-4.2, which does not support AVX (nor should it, I think).

I agree that the problem with the assembler is not really related to Gromacs, but I just wanted to bring it up, as using a more recent compiler from MacPorts is one way of following the suggestion in some of the error messages to use a newer compiler. At least we should mention this limitation in some build documentation, and even better if we can provide a concrete solution.

#3 Updated by Szilárd Páll almost 7 years ago

I've known about most of the issues you mention and no matter how nice I am trying to phrase it, I think Mac OS is more and more broken from a programmer's point of view. Or maybe I should be politically correct: it's broken from a Gromacs programmer's point of view.

Teemu Murtola wrote:

> I tried to compile Gromacs on my new laptop with OS X 10.8 and hardware AVX 256 support. It was surprisingly difficult, so it would be good to improve the build system and error messages in this respect. Here is roughly what I tried and what the results were:
>   1. First, tried to compile without setting any cmake options (except Debug build and a custom install directory). Was hit by #1018.
>     • After setting GMX_GPU=OFF, the next complaints from cmake were several warnings about not being able to find AVX support, and finally a fatal error for not finding immintrin.h.
>     • After setting also GMX_ACCELERATION=SSE4.1, was able to run cmake to the end. There was a warning about not being able to find OpenMP support, and the warning indicated that OpenMP support would be disabled.
>     • Running make fails in linking libgmx because of missing OpenMP symbols.
>     • Note that in this case, cmake picks gcc-4.2 as the C compiler and clang-4.1 as the C++ compiler, which seems to confuse at least the OpenMP detection.

I've seen this compiler mixup on Lion as well; I think it is caused by Xcode overwriting a c++ soft link to point to clang instead of leaving it at the system default (which, btw, still uses the super-outdated gcc 4.2 frontend).

>   2. Then, did the same as above, but with CMAKE_CXX_COMPILER=g++ to use g++-4.2.
>     • Again, had to set GMX_GPU=OFF and GMX_ACCELERATION=SSE4.1 manually. There was a message about OpenMP not being supported with these compilers.
>     • Running make fails in linking libgmxana because of undefined symbol ___builtin_object_size, referenced from _do_hbac_omp_fn.0 in gmx_hbond.c. So it seems that something is still wrong with OpenMP.

There, I think, the OpenMP detection is to blame because that should not happen -- unless for some funky reason something silently enabled OpenMP behind the scenes. If you can reproduce this, could you attach a CMakeCache?

>   3. Then, set CMAKE_C_COMPILER=clang to use clang for both C and C++.
>     • This time, had to set only GMX_GPU=OFF, and there was a message about not being able to find OpenMP support. But finally, compilation was successful.

Yay!

>   4. Finally, set CMAKE_C_COMPILER=gcc-mp-4.7, CMAKE_CXX_COMPILER=g++-mp-4.7, for gcc 4.7 compiled from MacPorts.
>     • After setting GMX_GPU=OFF, cmake ran through successfully.
>     • Build fails now with "no such instruction" errors from the assembler. It appears that the system assembler (that MacPorts is using) does not support AVX instructions.
>     • Setting GMX_ACCELERATION=SSE4.1 makes the compilation work.

That's again a known issue, specific to Mac, but this time the Ports are slightly broken.

> So, to sum it up, it seems that it is very difficult to compile Gromacs with all the performance options enabled: I was not able to find any combination where I could enable both OpenMP and AVX with the same compiler. Googling a bit finds some workarounds for replacing the system assembler with the one that clang is using, but this will likely intimidate most people. Also, to be able to compile Gromacs at all, I was forced to set multiple options by hand, and in most cases the error messages were not helpful (not suggesting how to solve the issue). A user who is not aware of the different compilers present on Mac systems, or is otherwise inexperienced, would probably have a hard time getting this far.
>
> If the plan is to support Gromacs on Macs, I think there is a need to 1) make very explicit documentation on how to successfully compile it, 2) improve the error messages such that the user is guided in the correct direction, and/or 3) improve the detection code to allow compilation without so many manual overrides (warnings about loss of performance are ok, as long as they are not fatal errors).
>
> I can provide more details if needed to find out why things are not working; I did not look into why the build is actually failing in the first two cases.

I agree with most of what you are saying. However, to be honest, fixing all these issues or even just detecting all these cases correctly and providing reasonable error messages requires a lot of effort and probably a lot of extra CMake code. All that for what? Just to work around an OS with pretty broken developer tools.

I really hope that there will be someone willing to address these issues, but I would very much prefer turning off OpenMP and AVX on Mac OS and focusing on other deficiencies of the build system that affect a wider audience.

#4 Updated by Erik Lindahl almost 7 years ago

Szilard,

It is completely irrelevant whether OS X is slightly broken or not. It is currently by far the most common personal/laptop hardware in academia, and according to the Nvidia survey more than 1/3 of our users primarily use Gromacs on their personal machines. If we count the number of users (rather than CPUs running Gromacs), I can't imagine any other single OS/setup even coming close in install base (unless one groups all of linux together). To be blunt: If we had to choose between GPU support and OS X support, it is almost guaranteed to be the GPU support that would be thrown out :-)

Thus, we need to have the default installation on a default-configuration Mac working well - I will help with that.

However, non-default installations I think we need to leave to the users. It is not that we don't care, but there must be a million ways a user can install (possibly broken) software that makes their system limited or unstable, and if we want to test for everything we would get an extremely bloated code-base!

As far as I know, OS X 10.8 is now fully clang/LLVM. However, Teemu: You might have to upgrade the command-line tools inside Xcode to avoid having outdated compilers (if you upgraded). I just tried this with the Xcode 4.5.1 update released days ago; I still had cc available on the command line after running software update, and the version was clang 4.0. After going into preferences and selecting "install command line tools", this was changed to clang 4.1.

#5 Updated by Teemu Murtola almost 7 years ago

Erik, this is on a brand new laptop with 10.8 pre-installed, and I directly installed Xcode 4.5.1. I have cmake 2.8.9 compiled from MacPorts and also, e.g., the mentioned gcc 4.7 from the ports, but hopefully those shouldn't interfere with what I'm seeing in the first three cases. And yes, the gcc-4.2 is using llvm.

So I think there are two problems here:
  • I think it would be nicer if, when the AVX detection finds that the compiler does not actually support AVX (as it does in both of the first two cases), it printed a warning and used a lower level of acceleration (or even just suggested this to the user as a potential solution). Currently, it just fails with a fatal error (the error message suggests using a newer compiler).
  • The first two cases fail to link, although cmake succeeds. In both cases, OpenMP seems to be to blame.
    Perhaps the simplest solution would be to suggest that all users use clang for compilation (and even better if this can be done automatically in the build system). But at least to me it seems that there is something fishy in the build system, since just changing the C++ compiler changes the behaviour, although there is no C++ source code involved in the build...

I will attach the cache files for the first two cases when I have a bit more time.

I agree that for the more exotic cases (like my point 4), it's not necessary to put a lot of effort into making the build system work nicely. But I still think it's worth a mention in some installation instructions. It only requires one or two sentences, so that shouldn't be a major effort. When I wrote "and/or" in my message, I really meant that; I agree that it doesn't make sense to try to make the build system recover from every possible misconfiguration.

#6 Updated by Szilárd Páll almost 7 years ago

Erik Lindahl wrote:

> It is completely irrelevant whether OS X is slightly broken or not. It is currently by far the most common personal/laptop hardware in academia, and according to the Nvidia survey more than 1/3 of our users primarily use Gromacs on their personal machines. If we count the number of users (rather than CPUs running Gromacs), I can't imagine any other single OS/setup even coming close in install base (unless one groups all of linux together). To be blunt: If we had to choose between GPU support and OS X support, it is almost guaranteed to be the GPU support that would be thrown out :-)

I am having a hard time believing that the whole academia is using Mac OS (unless the reference is Sweden :), especially considering that Apple computers are quite overpriced compared to PCs with equivalent hardware. I don't have the numbers, but if you have them please share.

Additionally, I am quite certain that the NVIDIA survey is biased by what the motivating factor was for filling out the survey. As the survey was announced on the gmx-users list, I think a large part of those who took it could be occasional users or students, and not the really active users. Was there anything in the survey that supports the contrary?

Regarding the "throwing out GPU support" comment: I never suggested not supporting Mac OS. What I said is that instead of spending the very limited development resources on trying to work around the messy development environment on Mac, we could just default to a configuration that should always work, even if it won't result in the fastest binary possible. After all, that's exactly what Apple themselves do.

> Thus, we need to have the default installation on a default-configuration Mac working well - I will help with that.

Exactly. The solution is simple: change the default to no OpenMP (with clang and gcc-llvm) and no AVX (unless we can detect whether it really works).

> However, non-default installations I think we need to leave to the users. It is not that we don't care, but there must be a million ways a user can install (possibly broken) software that makes their system limited or unstable, and if we want to test for everything we would get an extremely bloated code-base!

Again, that's exactly what I said. Working around all these issues would require hundreds of lines of CMake code.

> As far as I know, OS X 10.8 is now fully clang/LLVM. However, Teemu: You might have to upgrade the command-line tools inside Xcode to avoid having outdated compilers (if you upgraded). I just tried this with the Xcode 4.5.1 update released days ago; I still had cc available on the command line after running software update, and the version was clang 4.0. After going into preferences and selecting "install command line tools", this was changed to clang 4.1.

Btw, did Apple pull these clang versions out of a hat? The official llvm website shows 3.1 as the most recent release.

#7 Updated by Teemu Murtola almost 7 years ago

Here are the cache files. In both cases, it seems that GMX_OPENMP=OFF in the cache, but there are still -fopenmp flags in several places in the cache.

clang --version shows Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn).

#8 Updated by Szilárd Páll almost 7 years ago

Teemu Murtola wrote:

> Here are the cache files. In both cases, it seems that GMX_OPENMP=OFF in the cache, but there are still -fopenmp flags in several places in the cache.

As far as I can tell, in both cases the OpenMP detection can be considered buggy, as it ends up with GMX_OPENMP=OFF but still compiles with OpenMP. That is, unless one of the compilers silently enables OpenMP, which might be what happens in the quite suspicious case 1.

However, I would still consider the mixed compiler case a fishy one caused by Xcode 4.5 simply swapping the default C++ compiler to clang.

> clang --version shows Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn).

Unrelated, but I am truly worried about this as it will be a hassle to distinguish the future official clang 4.1 from the Apple clang 4.1.
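One way to keep the two apart is to key on the "Apple" prefix and on the "based on LLVM" suffix that Apple's banner carries. A minimal Python sketch, assuming the banner formats seen in this thread (Apple's from comment #7; the upstream form "clang version 3.1 (tags/RELEASE_31/final)" is an assumption). The function is hypothetical, not existing GROMACS or CMake code:

```python
import re


def parse_clang_version(text):
    """Split `clang --version` output into (vendor, clang_version, llvm_base).

    Versions are (major, minor) tuples; returns None if no clang banner is found.
    """
    m = re.search(r"(Apple )?clang version (\d+)\.(\d+)", text)
    if m is None:
        return None
    clang_version = (int(m.group(2)), int(m.group(3)))
    if m.group(1):
        # Apple numbers its clang independently; the LLVM base is in the banner.
        base = re.search(r"based on LLVM (\d+)\.(\d+)", text)
        llvm_base = (int(base.group(1)), int(base.group(2))) if base else None
        return ("apple", clang_version, llvm_base)
    # Upstream clang releases share their version number with LLVM.
    return ("llvm", clang_version, clang_version)
```

With the banner from comment #7 this yields `("apple", (4, 1), (3, 1))`, so version checks can compare the LLVM base rather than Apple's own number.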

#9 Updated by Teemu Murtola almost 7 years ago

I took a brief look at the CMake code, and it looks like there are multiple issues there, with nothing really OS X specific; it just happens to work by luck in other common cases. CMakeLists.txt first checks whether GMX_OPENMP is set by the user, and if it is, find_package(OpenMP) is called. The result of this check is used to set linker flags. This check also sets OpenMP_C_FLAGS, which is then used to set compiler flags for several targets. After this is done, other code may actually turn GMX_OPENMP off, but this has very little effect because it doesn't undo the things that were already done...
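The ordering problem can be reduced to a few lines. A minimal Python sketch, not the GROMACS CMake code: `found` and `openmp_c_flags` stand in for the results of find_package(OpenMP), and `compiler_ok` for the later check that may switch GMX_OPENMP off (all names are hypothetical). The buggy order reproduces what the cache files in comment #7 show: GMX_OPENMP off, but -fopenmp still present.

```python
def configure_openmp_buggy(user_wants_openmp, found, openmp_c_flags, compiler_ok):
    """Order described above: flags are applied before the final decision."""
    gmx_openmp = user_wants_openmp
    applied_flags = []
    if gmx_openmp and found:
        applied_flags += openmp_c_flags   # side effect happens too early
    if not compiler_ok:
        gmx_openmp = False                # too late: flags are not undone
    return gmx_openmp, applied_flags


def configure_openmp_fixed(user_wants_openmp, found, openmp_c_flags, compiler_ok):
    """Reordered: decide first, then derive flags only from the final value."""
    gmx_openmp = user_wants_openmp and found and compiler_ok
    applied_flags = list(openmp_c_flags) if gmx_openmp else []
    return gmx_openmp, applied_flags
```

The fix needs no extra checks, only moving every flag-setting step after the point where GMX_OPENMP reaches its final value.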

This stuff really needs to be reordered, and I think that things will then work also for OS X, even for this weird mixed compiler case. This should only require those ~5 OS X specific lines that are there already (disabling OpenMP with llvm-gcc-4.2). Do you really think that this is too much to ask for? It seems that at least OpenMM builds are affected by this same issue.

#10 Updated by Szilárd Páll almost 7 years ago

Teemu Murtola wrote:

> I took a brief look at the CMake code, and it looks like there are multiple issues there, with nothing really OS X specific; it just happens to work by luck in other common cases. CMakeLists.txt first checks whether GMX_OPENMP is set by the user, and if it is, find_package(OpenMP) is called. The result of this check is used to set linker flags. This check also sets OpenMP_C_FLAGS, which is then used to set compiler flags for several targets. After this is done, other code may actually turn GMX_OPENMP off, but this has very little effect because it doesn't undo the things that were already done...

Thanks for having a look. I was planning to do it myself because, as I said before, at least one of the cases raised my suspicion that this is an internal bug.

> This stuff really needs to be reordered, and I think that things will then work also for OS X, even for this weird mixed compiler case. This should only require those ~5 OS X specific lines that are there already (disabling OpenMP with llvm-gcc-4.2). Do you really think that this is too much to ask for? It seems that at least OpenMM builds are affected by this same issue.

Based on your findings, this really needs to be completely reworked. This will still not work around the MacPorts issue, but at least normal CPU runs will not be hindered much performance-wise, except by the lack of functional OpenMP builds (or, more precisely, the lack of OpenMP on hardware with AVX support).

#11 Updated by Teemu Murtola almost 7 years ago

Just to confirm, both of the first two cases work correctly for the OpenMP part if GMX_OPENMP=OFF is set manually. So if the detection worked correctly, I think the behaviour would be acceptable: it is not possible to do an OpenMP build with the default compilers, but OpenMP gets switched off automatically (with a warning). And since the main benefit of OpenMP is with GPUs, and typical builds on Macs would use GMX_GPU=OFF, this should not matter much.

This leaves the other issue that I think should be improved, i.e., that if the hardware has AVX support but the compiler does not, the configuration fails with multiple warnings/error messages that give different instructions on how to proceed. Is this really something OS X specific? I think that it would again be best to start the detection with AVX support, and if the compiler does not support that, fall back to SSE4.1 or even SSE2 if the necessary headers cannot be found. Of course, a clear warning should be given to the user, but I think that a fatal error is overkill. This wouldn't even require that much extra code, as it should be sufficient to reorder the current code and add the fallback path. And if we want to add a summary of performance options at the end of a CMake run (as someone suggested somewhere), the warning could easily be added there to make it clear.
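A minimal Python sketch of the fallback proposed here, assuming some per-level compiler check exists (for instance a compile-and-link test) and is passed in as a callable. The level names other than SSE4.1 and None are assumptions about the GMX_ACCELERATION values, and the function is hypothetical:

```python
import warnings

# Ordered from most to least capable. "AVX_256" and "SSE2" are assumed
# names; "SSE4.1" and "None" are the values used in this thread.
FALLBACK_ORDER = ["AVX_256", "SSE4.1", "SSE2", "None"]


def pick_acceleration(suggested, compiler_supports):
    """Return the first level, starting at `suggested`, that the compiler supports.

    compiler_supports: callable(level) -> bool.
    Falls back with a warning instead of a fatal error; "None" always works.
    """
    for level in FALLBACK_ORDER[FALLBACK_ORDER.index(suggested):]:
        if level == "None" or compiler_supports(level):
            if level != suggested:
                warnings.warn(
                    f"{suggested} acceleration is not supported by the compiler; "
                    f"falling back to {level} (expect lower performance)"
                )
            return level
```

Each level is probed separately, so the sketch does not rely on support for one instruction set implying support for the next one down.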

#12 Updated by Erik Lindahl almost 7 years ago

Unfortunately I think it would require a lot of code and checks to automatically support fallbacks for multiple architectures and instruction sets. What should we do if the C compiler supports the instruction set, but not the C++ one? Then one would need to go back and remove compiler flags, re-execute the CPU detection code (since AVX support does not imply SSE4.1, e.g. on AMD), and also re-execute any other detection code that relied on those flags. The standalone CPU detection routine would have to be reworked to support arguments for what instruction sets not to use, or alternatively test for them all and output an entire array.

Basically: Not for 4.6 :-)

#13 Updated by Teemu Murtola almost 7 years ago

Erik Lindahl wrote:

> Unfortunately I think it would require a lot of code and checks to automatically support fallbacks for multiple architectures and instruction sets. What should we do if the C compiler supports the instruction set, but not the C++ one? Then one would need to go back and remove compiler flags, re-execute the CPU detection code (since AVX support does not imply SSE4.1, e.g. on AMD), and also re-execute any other detection code that relied on those flags. The standalone CPU detection routine would have to be reworked to support arguments for what instruction sets not to use, or alternatively test for them all and output an entire array.

So you are saying that since we can't be perfect and handle all weird situations that would not even occur in the current code, let's not do anything to improve? The minimum would be to put in better error messages (and/or installation instructions), but really, how much work do you think it would be to make it work a bit better than it is now? Just some speculation below.

Currently, there are no such interdependent checks that would need to be re-executed. Writing the tests such that they don't set the global compiler flags before they have actually checked that the acceleration works hardly adds any code. If you really want to be on the safe side with the CPU detection code, then just add an option to check whether a particular instruction set is usable, and re-execute it at the end. And in every situation, just error out if there is something that the code is not sure how to handle.

And if C compiler supports something, but C++ does not, then so what? We are not using the C++ compiler for anything in 4.6. So just either error out, or ignore (the latter needs to be reworked for 5.0, though). Producing an error in such a mismatch is not that hard...

#14 Updated by Erik Lindahl almost 7 years ago

Teemu,

Let's separate bugs (which this report was first about) from features-that-would-be-nice-to-have-in-the-future. If anybody wants to provide such fallback code in the future it would be useful, but it's not a bug.

#15 Updated by Teemu Murtola almost 7 years ago

Erik, could you please specify what you then consider to be in scope of this issue? From your earlier comments, I was under the impression that this issue is about trying to make the default build on Mac OS X work reasonably well. And even though fixing the OpenMP detection will improve things considerably, the default build will still fail (on new hardware) because the default compiler (gcc-4.2) on Mac OS X does not support AVX. But now you are effectively forbidding discussion on some possibilities of solving this, so I'm a bit confused. I was just trying to find some middle ground here, since Szilard does not seem to like the idea of putting in any Mac OS X specific checks.

Could you also provide some suggestion of how you think this should be solved? I don't think it's very effective (or encouraging) for us to guess what you might find acceptable, and then try to develop those further, just to get them dismissed for some reason or another. I've already listed quite a few alternatives in my earlier comments, and Szilard as well; would one of those be enough for this issue?

I'm perfectly fine with Szilard's suggestion of just disabling AVX on Macs by default (even though at least in my case, it does work with clang). In that case, it may be a bit annoying to have mdrun print out performance warnings every time it is invoked, though. In most cases, I would think that users who compile Gromacs on laptops or their personal desktops would not be looking for peak computation capacity, but instead would be mainly using the analysis tools and preparing the simulations.

#16 Updated by Erik Lindahl almost 7 years ago

Hi Teemu,

Judging from the top of the message, the (very valid) bug report is that Gromacs-4.6 is difficult to compile on the latest version of OS X. We thought we had this working, but obviously it wasn't, so we will definitely fix that. Previously it worked fine, although users got a warning when using clang that AVX is not supported, telling them to update their compiler or, in the worst case, pick SSE4.1 instead and accept a performance regression.

Both Apple clang 4.1 and 4.0 (that are the default versions since OS X 10.8) should support AVX (I've been running regression tests of all our AVX code using them, so it's first-hand knowledge). If this does not work on a (correct) vanilla OS X installation it sounds like a minor detection issue.

However, what I objected to in comment #11 was solving a hopefully relatively minor bug with a large rewrite of most of the acceleration detection code and flags, affecting all architectures, at this stage. Not because it is bad per se, but because, as we have seen, this code has proven to be super-sensitive to different compiler versions and header installations, and we have had to test it on far more OS/compiler combinations than are available in the automated regression testing modules. In addition, we will still need to

If somebody has time to write and test such code they are welcome to commit it, but it's likely not something I or Szilard have time for right now - we will try to just fix the detection with the least code possible :-)

#17 Updated by Erik Lindahl almost 7 years ago

PS: The reason for the lack-of-time is that I'm trying to use every available minute for the updated group kernels that have priority 1 - but the first patch is coming there...

#18 Updated by Teemu Murtola almost 7 years ago

That's more like the explanation I was looking for. ;) I'm fine with keeping things simple and/or just doing small changes at this point, but I still feel that the different directions are worth discussing a bit to not skip some solution just because it initially looks like it could be a bit more work. As said, I'm satisfied with the current discussion.

And just to clarify: If I manually set the compiler to clang (my case 3), then the build works just fine. The problem is just that clang is not the default C compiler, and there are all these issues with gcc-4.2. The OpenMP stuff is probably worth fixing even if it doesn't actually affect clang, since it may also bite on other systems and can be a source of a lot of confusion. If it is easy to either disable AVX with gcc-4.2 or to tell the user in a clear message to use clang, then that should be fine for the rest of the stuff.

#19 Updated by Szilárd Páll almost 7 years ago

Note that my suggestion was very simplistic and I am quite fine with having a few lines of checks added for Mac OS if it helps in making the default compilation work for the most common cases. However, the situation is pretty complex and would require testing several custom Apple clang and gcc-llvm versions with quite many combinations of build configurations to make sure that whatever we suggest really does work.

Until now the discussion has been quite focused around the latest versions of the OS and the developer tools, but I would very much prefer to see the build system be able to:
  1. automatically fall back to a working configuration or
  2. simply default to a configuration that represents a "smallest" common set of settings that most probably works
    on every reasonably new OS version + development tool combination (as in the case of other supported OS/platforms).

I think it is quite hard to accomplish 1; that's why I suggested 2. Alternatively, we could combine the two by turning options off by default and re-enabling some for cases when they are known to work (e.g. enabling AVX for Apple clang >=3.1).
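A minimal Python sketch of the combined approach: start from conservative defaults and re-enable a feature only for a combination known to work, using the example given above (AVX for Apple clang >= 3.1). OpenMP stays off, following the earlier suggestion of no OpenMP with clang and gcc-llvm. The SSE4.1 baseline, the "AVX_256" value name, and the compiler labels are assumptions, and the function is hypothetical:

```python
# Conservative defaults, re-enabled only where known to work.
# Compiler labels, the "AVX_256" name and the baseline are assumptions.
SAFE_DEFAULTS = {"GMX_OPENMP": False, "GMX_ACCELERATION": "SSE4.1"}


def mac_defaults(compiler, version):
    """Return default build options for a Mac compiler.

    compiler: hypothetical label such as "apple-clang" or "llvm-gcc".
    version: (major, minor) tuple as reported by the compiler.
    """
    options = dict(SAFE_DEFAULTS)  # copy, so the shared defaults stay intact
    if compiler == "apple-clang" and version >= (3, 1):
        options["GMX_ACCELERATION"] = "AVX_256"
    return options
```

Anything not explicitly whitelisted keeps the baseline, so an unknown compiler gets a slower but working build instead of a configuration error.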

Finally, I have to say I agree with Teemu that an overly eager, less-than-rock-solid CPU detection that picks a setting before making sure it will most probably result in a working build configuration will cause a lot of headaches (the same is true for OpenMP!). Avoiding this would only require initially setting the acceleration to "None" (or perhaps a separate "N/A" option) and changing it to the suggested value only if all checks succeed. Automated fallback would be nice, but I would be OK with putting the burden of selecting a fallback on the user.

#20 Updated by Berk Hess almost 7 years ago

Just a minor note: AMD Bulldozer does support SSE4.1, so I would think that currently, and probably also in the future, AVX support does imply SSE4.1 support.

#21 Updated by Teemu Murtola almost 7 years ago

Note that Szilard's suggestion at the end of comment 19 is exactly equivalent to a simple fallback of setting GMX_ACCELERATION=None if the suggested acceleration is not found.

#22 Updated by Szilárd Páll almost 7 years ago

Teemu Murtola wrote:

Note that Szilard's suggestion at the end of comment 19 is exactly equivalent to a simple fallback of setting GMX_ACCELERATION=None if the suggested acceleration is not found.

I would actually rephrase my suggestion: no action related to enabling a CPU instruction set or accelerated kernels (including setting GMX_ACCELERATION, adding compiler flags, etc.) should be taken before all tests pass, unless the action is easy to reverse in code (and the reversal is actually done correctly and consistently). For example, changing a cache variable's value is possible, but no other action that depends on the value of that variable should happen before its final value is set.
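The ordering described above can be illustrated with a small C model of the selection policy. This is a hypothetical sketch, not the GROMACS CMake code: select_acceleration and ACCEL_NONE are made-up names, and the pass/fail result of each level's compile test is assumed to be known before anything is committed. With allow_lower = 0 it is exactly Teemu's simple fallback to None; with allow_lower = 1 it adds the automated fallback to lower levels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel returned when no acceleration level passed its check. */
static const char ACCEL_NONE[] = "None";

/*
 * levels:      acceleration names ordered from highest to lowest,
 *              e.g. {"AVX_256", "SSE4.1", "SSE2"}.
 * passed:      passed[i] != 0 if the compile/assemble test for levels[i]
 *              succeeded; all tests are assumed to have run already, so
 *              nothing is committed before every result is known.
 * suggested:   index of the level suggested by CPU detection.
 * allow_lower: 0 = fall back straight to "None" (simple fallback),
 *              1 = try progressively lower levels first.
 */
const char *select_acceleration(const char *const levels[], const int passed[],
                                size_t n, size_t suggested, int allow_lower)
{
    size_t i;

    assert(levels != NULL && passed != NULL && suggested < n);

    for (i = suggested; i < n; i++) {
        if (passed[i]) {
            return levels[i];   /* first level whose test passed */
        }
        if (!allow_lower) {
            break;              /* only the suggested level is considered */
        }
    }
    return ACCEL_NONE;          /* final value chosen only after the checks */
}
```

The returned string would then be written to the cache variable in a single step, so no flags or kernel selections ever depend on a provisional value.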

#23 Updated by Szilárd Páll almost 7 years ago

Teemu Murtola wrote:

Just to confirm, both of the first two cases work correctly for the OpenMP part if GMX_OPENMP=OFF is set manually. So if the detection worked correctly, I think the behaviour would be acceptable: it is not possible to do an OpenMP build with the default compilers, but OpenMP gets switched off automatically (with a warning). And since the main benefit of OpenMP is with GPUs, and typical builds on Macs would be with GMX_GPU=OFF, this should not matter much.

Btw, even without GPUs, using only OpenMP multithreading will always be faster than thread-MPI on all desktop machines. The difference is not negligible; I've seen up to a 20% advantage with OpenMP.

#24 Updated by Roland Schulz almost 7 years ago

I uploaded a fix for the OpenMP linker issues: https://gerrit.gromacs.org/#/c/1505

The problem that we currently disable OpenMP if either the C or the C++ compiler doesn't support OpenMP is caused by the behavior of find_package(OpenMP): it only reports OPENMP_FOUND=yes if all enabled languages support it. I think the proper fix is to not enable C++ unless it is needed (i.e. if OpenMM is enabled). This solves the OpenMP problem and, more importantly, makes it possible to compile 4.6 without a C++ compiler, which is currently not possible. I'll upload a fix for that in less than an hour.
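To illustrate why the result is per language: whether OpenMP is enabled is decided by each compiler separately, and a compiler signals it by defining the standard _OPENMP macro (for gcc, when -fopenmp is passed). FindOpenMP essentially tries to compile a test source with candidate flags for every enabled language, so a gcc C compiler and a clang C++ compiler can disagree. A C probe in that spirit might look like the sketch below; openmp_version and openmp_max_threads are hypothetical names.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical probe: returns the _OPENMP date (yyyymm) this translation
   unit was compiled with, or 0 if OpenMP was not enabled. */
long openmp_version(void)
{
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}

/* Number of threads an OpenMP parallel region would use; 1 without OpenMP. */
int openmp_max_threads(void)
{
    int n = 1;
#ifdef _OPENMP
    n = omp_get_max_threads();
#endif
    assert(n >= 1);
    return n;
}
```

Compiling the same file once with the C compiler and once as C++ with the C++ compiler would show directly whether the two compilers agree.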

#25 Updated by Szilárd Páll almost 7 years ago

Roland Schulz wrote:

I uploaded a fix for the OpenMP linker issues: https://gerrit.gromacs.org/#/c/1505

The problem that we currently disable OpenMP if either the C or the C++ compiler doesn't support OpenMP is caused by the behavior of find_package(OpenMP): it only reports OPENMP_FOUND=yes if all enabled languages support it. I think the proper fix is to not enable C++ unless it is needed (i.e. if OpenMM is enabled). This solves the OpenMP problem and, more importantly, makes it possible to compile 4.6 without a C++ compiler, which is currently not possible. I'll upload a fix for that in less than an hour.

Actually, CUDA uses the C++ compiler as a back-end, so we can only compile 4.6 without a C++ compiler if both OpenMM and native GPU support are disabled!

#26 Updated by Roland Schulz almost 7 years ago

Szilárd Páll wrote:

Actually, CUDA uses the C++ compiler as a back-end, so we can only compile 4.6 without a C++ compiler if both OpenMM and native GPU support are disabled!

Thanks for reminding me! I'll update my patch. BTW: That would be another reason to not enable GPU by default. It would enable C++ by default which is against the idea that 4.6 is our last C-only release.

#27 Updated by Szilárd Páll almost 7 years ago

Roland Schulz wrote:

Szilárd Páll wrote:

Actually, CUDA uses the C++ compiler as a back-end, so we can only compile 4.6 with no OpenMM and no native GPU support!

Thanks for reminding me! I'll update my patch. BTW: That would be another reason to not enable GPU by default. It would enable C++ by default which is against the idea that 4.6 is our last C-only release.

Oh, come on... The initial plan was to even compile the whole 4.6 with C++. IMHO whether it's C-only or not is a mostly cosmetic thing -- except the few C++ concerns we've talked about that need to be addressed anyway. Which reminds me, I've had the patch to remove the "--add-needed" and add the dummy C++ file, but I was not sure about converting mdrun.c to cpp. Are you certain that it is required?

#28 Updated by Szilárd Páll almost 7 years ago

Btw, if C++-tainted source compilation is really such a big deal, I could try switching to making CUDA generate C host code, but it might take quite some time. Should I?

#29 Updated by Roland Schulz almost 7 years ago

Well, I was in favor of having 4.6 already be C++, but that's beside the point. The decision was made and shouldn't be changed right before the release. If I remember correctly, the decision was that it should work without a C++ compiler. As far as I know, you can't generate C code with CUDA > 3 (the comment in FindCUDA says so). I'm not saying that GMX_GPU can't be on by default just because of the C++ issue; it just adds to the arguments. And the only thing relevant to the issue reported by Teemu is that not requiring C++ fixes the problem that both the C and the C++ compiler need to have OpenMP support.

If I understand the C++ FAQ correctly, it is not guaranteed to be OK to have C++ files while the main function is not compiled as C++ (you don't need to rename the file to make CMake use the C++ compiler for it). Whether this can really cause a problem with any compiler, I don't know. We haven't noticed any, so I assume it isn't an issue with the compilers we tested. Nonetheless, I would think it is better to be safe (e.g. what if you use different C and C++ compilers, as Teemu did? Does it still work?), unless you see some possible disadvantage. Of course, we should only compile the main function(s) as C++ if either OpenMM or GMX_GPU is on.

#30 Updated by Roland Schulz over 6 years ago

Are all OpenMP issues resolved? If so, and given that the GPU problem has its own issue (#1018), am I right that the only remaining subtask for this issue is the AVX linker problem ("no such instruction")?

What do we want to do about that? Should we test-link an AVX file, or should we detect the linker version? After detecting a problem, should we just print an error, or should we try to automatically use clang as the linker?

#31 Updated by Teemu Murtola over 6 years ago

OpenMP is no longer causing issues (and that was the most critical of the reported problems). There are still two problems:
  • For some reason, by default, cmake picks up GCC as the C compiler, and then fails because AVX is not supported. So the user needs to manually either select SSE4.1 acceleration or set the compiler to clang. In the beginning of the main CMakeLists.txt, there are actually some lines that attempt to solve this, but they don't seem to work.
  • The MacPorts gcc is broken with AVX. From the discussion, I gather that there is reluctance to do anything for this, but at least we should add a line in the installation instructions.

#32 Updated by Roland Schulz over 6 years ago

Teemu Murtola wrote:

OpenMP is no longer causing issues (and that was the most critical of the reported problems). There are still two problems:
  • For some reason, by default, cmake picks up GCC as the C compiler, and then fails because AVX is not supported. So the user needs to manually either select SSE4.1 acceleration or set the compiler to clang. In the beginning of the main CMakeLists.txt, there are actually some lines that attempt to solve this, but they don't seem to work.

Should clang be the default? It has the disadvantage that it doesn't support OpenMP. Also, CMAKE_C_COMPILER_INIT doesn't seem to be meant for setting the compiler, and CMake doesn't seem to provide a way to set the default compiler. Whether AVX fails depends on which versions of gcc and the linker you have.

  • The MacPorts gcc is broken with AVX. From the discussion, I gather that there is reluctance to do anything for this, but at least we should add a line in the installation instructions.

No, gcc is fine. The assembler (from binutils) is too old (I incorrectly wrote linker in my last comment). The solution is to use gcc as the compiler and clang as the assembler: http://old.nabble.com/Re%3a-gcc,-as,-AVX,-binutils-and-MacOS-X-10.7-p32584737.html . But I'm not sure how much we want to do about it.

#33 Updated by Teemu Murtola over 6 years ago

Roland Schulz wrote:

Should clang be the default? It has the disadvantage that it doesn't support OpenMP. [clip] Whether AVX fails depends on which versions of gcc and the linker you have.

I think the discussion here should focus on the default compilers on OS X (at least that is what Erik and Szilard have been pushing). This means that the choice is between the Apple-provided gcc 4.2 (which supports neither AVX nor OpenMP, the latter being explicitly blacklisted) and clang (where at least the versions for Lion and Mountain Lion support AVX out of the box, but not OpenMP). Since neither supports OpenMP, that makes no difference, and AVX support speaks for clang. But if there is no way to do this in CMake, this is another thing that should be mentioned in the installation instructions.

  • The MacPorts gcc is broken with AVX. From the discussion, I gather that there is reluctance to do anything for this, but at least we should add a line in the installation instructions.

No, gcc is fine. The assembler (from binutils) is too old (I incorrectly wrote linker in my last comment).

I guess it is a matter of semantics of what is broken here. The system provides more than one assembler, and the gcc from MacPorts chooses to use one that breaks AVX support. And MacPorts explicitly targets this environment, so I would argue that they should fix it such that AVX support works for their gcc without the user hacking in their system files, but I guess there are other views. There is not much we can do in CMake unless we want to introduce special-purpose detection code just for this purpose, so I think a mention in installation instructions is sufficient.

#34 Updated by Erik Lindahl over 6 years ago

My main reason for advocating clang is that it is very obvious Apple is trying to move away from gcc as quickly as they can due to GPLv3. I think it's quite possible OS X 10.9 won't even have gcc any more. Realistically, most Apple installations are going to be low-core-count laptops and desktops, where OpenMP support does not make a huge difference, if any.

Is there any way we can detect the broken MacPorts gcc already during compile without executing stuff? (i.e., functionality-wise, not by checking a version).

#35 Updated by Szilárd Páll over 6 years ago

Erik Lindahl wrote:

My main reason for advocating clang is that it is very obvious Apple is trying to move away from gcc as quickly as they can due to GPLv3. I think it's quite possible OS X 10.9 won't even have gcc any more. Realistically, most Apple installations are going to be low-core-count laptops and desktops, where OpenMP support does not make a huge difference, if any.

Actually, on Sandy Bridge with 4 cores, OpenMP can be 10-15% faster than thread-MPI. Additionally, there might be some non-negligible performance difference between clang and newer gcc versions as well.

Is there any way we can detect the broken MacPorts gcc already during compile without executing stuff? (i.e., functionality-wise, not by checking a version).

If nothing else one could try to compile a small piece of code and see if the assembling fails, I guess, but there might be more elegant solutions.
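A minimal sketch of such a probe in C, assuming an x86-64 target and a compiler that ships immintrin.h (avx_probe_sum4 is a hypothetical name). What matters is that the compiler has to emit, and the assembler has to accept, 256-bit AVX instructions, so a broken assembler makes the build of this file fail without anything being executed. In an actual configure-time check the file would presumably be compiled with the AVX flag (e.g. -mavx) inside a small program with a main(); the GCC/clang target("avx") attribute is used here only so that the snippet also builds without that flag.

```c
#include <assert.h>
#include <immintrin.h>

/* Hypothetical AVX probe: sums four doubles using 256-bit AVX instructions.
   If the assembler used by the compiler does not understand AVX, building
   this file fails, which is exactly what the probe is meant to detect. */
__attribute__((target("avx")))
double avx_probe_sum4(const double *v)
{
    __m256d x;
    __m128d lo, hi, s;

    assert(v != NULL);
    x  = _mm256_loadu_pd(v);                   /* 256-bit unaligned load */
    lo = _mm256_castpd256_pd128(x);            /* {v0, v1} */
    hi = _mm256_extractf128_pd(x, 1);          /* {v2, v3} */
    s  = _mm_add_pd(lo, hi);                   /* {v0+v2, v1+v3} */
    s  = _mm_add_sd(s, _mm_unpackhi_pd(s, s)); /* low lane: full sum */
    return _mm_cvtsd_f64(s);
}
```

Only successful compilation and assembly matter for the check; actually calling the function requires a CPU and OS with AVX support.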

#36 Updated by Roland Schulz over 6 years ago

By default, CMake searches for the C compiler in the order "gcc cc cl bcc xlc". By setting CMAKE_GENERATOR_CC (before project(...)) to e.g. "clang cc gcc" or "icc gcc-4.7 clang cc gcc", we could modify the compiler preference, and the user could still choose a different one as before. The disadvantage of this approach is that CMAKE_GENERATOR_CC is only supposed to be set by the generator, so relying on it is somewhat undefined behavior, but it seems to work fine. I can't find any other way to influence the compiler preference, so this would probably be the only automatic way. The alternative, as Teemu already said, is to simply document that the user should choose a different compiler than the gcc-4.2 chosen by default. The compiler can't be changed later on in CMakeLists.txt, so we couldn't try gcc-4.7 first and then fall back to clang if that gcc uses an assembler without AVX support.

#37 Updated by Roland Schulz over 6 years ago

I uploaded https://gerrit.gromacs.org/#/c/1678/1 for testing AVX. For CMAKE_GENERATOR_CC I'll wait for feedback.

#38 Updated by Mark Abraham over 6 years ago

  • Assignee changed from Rossen Apostolov to Mark Abraham

I rediscovered this issue this evening, sigh. macports gcc 4.7 emits assembly that does not assemble with GMX_CPU_ACCELERATION=AVX_256.

GNU assembler does not support Mac, and the system assembler does not support AVX (http://mac-os-forge.2317878.n4.nabble.com/gcc-as-AVX-binutils-and-MacOS-X-10-7-td144472.html). Apple's bug tracker seems to be down, and I really don't care any more whether Apple might ever fix/update the tool chain; probably they won't.

The hack solution in that thread (also mentioned previously in this thread) of replacing the system assembler (which gcc is hard-coded at configure time to use in preference to any as in the path!) works. GROMACS CMake can compile the AVX test program and proceed to compile at least the SSE kernels (though it is rather noisy with warnings, of course). However, that solution uses clang in assembler mode. gmx_wallcycle.c can then fail with errors like "error: invalid instruction mnemonic 'vcvttsd2siq'", which is apparently caused by an LLVM bug (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-July/052056.html) that was fixed (http://llvm.org/viewvc/llvm-project?view=rev&revision=160775) in July this year. The GROMACS code is fine, but at least MacPorts clang-2.9 can't assemble the code emitted by gcc from gmx_wallcycle.c (and, I think, mdebin_bar.c and sim_util.c).
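A hedged guess at the offending instruction: truncating a double to a 64-bit integer, which gcc on x86-64 with AVX enabled typically emits as the VEX-encoded vcvttsd2si with a 64-bit destination (printed as vcvttsd2siq in AT&T syntax). Assuming that is what gmx_wallcycle.c triggers (an unverified assumption), a probe like the sketch below would exercise the same assembler path. truncate_to_int64 is a hypothetical name, and the GCC/clang target("avx") attribute stands in for -mavx.

```c
#include <assert.h>

/* Hypothetical probe: with AVX enabled on x86-64, gcc typically compiles
   this cast to vcvttsd2si with a 64-bit destination register, the
   instruction that the older LLVM assembler rejected. */
__attribute__((target("avx")))
long long truncate_to_int64(double x)
{
    /* Precondition: the value must fit in a long long; otherwise the cast
       is undefined behaviour in C. */
    assert(x > -9.2e18 && x < 9.2e18);
    return (long long)x;   /* C casts truncate toward zero */
}
```

Inspecting the assembly (e.g. compiling with -S) would confirm whether a given compiler actually emits that mnemonic.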

macports clang-3.3 was able to assemble if I called it from the hack script, but all of this is far too much of a contortion for us to contemplate supporting/encouraging. CC=/opt/local/bin/clang-mp-3.3 cmake .. also failed the AVX test, and I no longer care why.

One reasonable thing we can do is augment the error message CMake emits when GROMACS tries to compile the AVX test program. There is apparently sort-of no functional AVX assembler available for Mac, and the user should re-configure with a lower acceleration setting. Unless we have a real report that Intel compiler or Apple clang works with AVX?

However, that means every user building on a recent Mac will hit that error, and that makes us look bad. We could also implement a work-around that caps the suggested acceleration on Mac at SSE4.1. If users know what they are doing, they can set -DGMX_CPU_ACCELERATION and organize AVX support from their compiler and/or assembler (and our install guide can hint that this is possible), but we need the default build to Just Work.
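The work-around could look roughly like this simplified C model of the suggestion logic. The enum, suggest_acceleration and its parameters are hypothetical, not the actual GROMACS detection code; the feature flags are assumed to come from the existing CPU detection, is_apple from the platform (e.g. CMake's APPLE variable or the __APPLE__ macro), and AVX_128_FMA is left out for brevity. Only the suggested default is capped; an explicit user setting is not touched. The assertion encodes Berk's observation that AVX implies SSE4.1, which is what makes capping at SSE4.1 safe.

```c
#include <assert.h>

/* Hypothetical acceleration levels, ordered from lowest to highest. */
enum accel_level {
    ACCEL_LEVEL_NONE,
    ACCEL_LEVEL_SSE2,
    ACCEL_LEVEL_SSE4_1,
    ACCEL_LEVEL_AVX_256
};

/* Suggest a default level from detected CPU features, capping it at SSE4.1
   on Apple platforms where the default toolchain cannot assemble AVX. */
enum accel_level suggest_acceleration(int has_sse2, int has_sse4_1,
                                      int has_avx, int is_apple)
{
    enum accel_level level = ACCEL_LEVEL_NONE;

    /* Capping AVX down to SSE4.1 relies on AVX implying SSE4.1. */
    assert(!has_avx || has_sse4_1);

    if (has_avx) {
        level = ACCEL_LEVEL_AVX_256;
    } else if (has_sse4_1) {
        level = ACCEL_LEVEL_SSE4_1;
    } else if (has_sse2) {
        level = ACCEL_LEVEL_SSE2;
    }

    if (is_apple && level > ACCEL_LEVEL_SSE4_1) {
        level = ACCEL_LEVEL_SSE4_1;   /* default only; users may override */
    }
    return level;
}
```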

#39 Updated by Roland Schulz over 6 years ago

Or we could recommend to use pre-compiled binaries for Mac.

#40 Updated by Erik Lindahl over 6 years ago

We are not talking huge differences between the various assembly alternatives, so I think we should consider moving the warning about non-matching levels to the log file. Using SSE4.1 is quite fine.
Alternatively, I'm not sure how much we lose by skipping OpenMP on Macs.

When using icc on Macs to try and configure GPUs, I get a CMake error message about "the current compiler not being compatible with nvcc". If I ignore that and simply set the nvcc host compiler to /usr/bin/gcc (clang) I get:
cc1plus: error: unrecognized command line option "-ip"
cc1plus: error: unrecognized command line option "-mavx"
CMake Error at gpu_utils_generated_memtestG80_core.cu.o.cmake:206 (message):

Error generating
/Users/lindahl/Code/gmx/git/release-4-6/bgpu/src/gmxlib/gpu_utils/CMakeFiles/gpu_utils.dir//./gpu_utils_generated_memtestG80_core.cu.o

That looks like an obvious error. Instead, I installed gcc-4.7 and tried to use it for everything, but then I get pretty much the same error:

nvcc fatal : redefinition of argument 'compiler-bindir'
CMake Error at gpu_utils_generated_memtestG80_core.cu.o.cmake:206 (message):
Error generating
/Users/lindahl/Code/gmx/git/release-4-6/bgpu/src/gmxlib/gpu_utils/CMakeFiles/gpu_utils.dir//./gpu_utils_generated_memtestG80_core.cu.o

This is when using ccmake (since I'm lazy and cannot remember all options). It might be something that is auto-configured too early under-the-hood, but that would work fine if set on the command line prior to invoking cmake.

Apart from the GPU side of things, icc works great and supports both OpenMP and AVX.

#41 Updated by Szilárd Páll over 6 years ago

Erik Lindahl wrote:

We are not talking huge differences between the various assembly alternatives, so I think we should consider moving the warning about non-matching levels to the log file. Using SSE4.1 is quite fine.
Alternatively, I'm not sure how much we lose by skipping OpenMP on Macs.

In the range of 5-25%. So, as the AVX kernels didn't deliver as much performance as expected, the lack of OpenMP can easily cause a larger performance loss than using SSE4.1 instead of AVX.

When using icc on Macs to try and configure GPUs, I get a CMake error message about "the current compiler not being compatible with nvcc". If I ignore that and simply set the nvcc host compiler to /usr/bin/gcc (clang) I get:
cc1plus: error: unrecognized command line option "-ip"
cc1plus: error: unrecognized command line option "-mavx"
CMake Error at gpu_utils_generated_memtestG80_core.cu.o.cmake:206 (message):

Error generating
/Users/lindahl/Code/gmx/git/release-4-6/bgpu/src/gmxlib/gpu_utils/CMakeFiles/gpu_utils.dir//./gpu_utils_generated_memtestG80_core.cu.o

That looks like an obvious error. Instead, I installed gcc-4.7 and tried to use it for everything, but then I get pretty much the same error:

nvcc fatal : redefinition of argument 'compiler-bindir'
CMake Error at gpu_utils_generated_memtestG80_core.cu.o.cmake:206 (message):
Error generating
/Users/lindahl/Code/gmx/git/release-4-6/bgpu/src/gmxlib/gpu_utils/CMakeFiles/gpu_utils.dir//./gpu_utils_generated_memtestG80_core.cu.o

This is when using ccmake (since I'm lazy and cannot remember all options). It might be something that is auto-configured too early under-the-hood, but that would work fine if set on the command line prior to invoking cmake.

Apparently CUDA only supports the gcc toolchain that comes with Xcode:
http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-mac-os-x/index.html

That doesn't mean it could not work with icc, but I'd have to take a closer look to see what's wrong. I hinted some possible reasons here: https://gerrit.gromacs.org/#/c/1676/8.

Apart from the GPU side of things, icc works great and supports both OpenMP and AVX.

It would still be good to have a default that works and that, even if it's sub-optimal, suggests a way to get better performance, or at least a way to compile with OpenMP.

#42 Updated by Erik Lindahl over 6 years ago

  • Status changed from New to Closed

This issue has been fixed to the extent that is realistic in the 4.6 branch. Gromacs builds both with clang and gcc now, but we cannot both have the cake (AVX) and eat it (OpenMP) unless we use icc.
