Bug #958

MPI on Windows

Added by Roland Schulz about 7 years ago. Updated about 7 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Change 417d0affff5485d10f905aa3581f14602aca873a, made to solve issues 851 and 636, broke the support for MPI on Windows. MPICH2 for Windows doesn't come with any MPI wrappers, so relying on them cannot work on Windows. Without the change, I am able to compile with MPICH2 on Windows. With OpenMPI, I can't get it to compile either with the wrapper and with the change, or without the wrapper and without the change.

Associated revisions

Revision 92789c84 (diff)
Added by Roland Schulz about 7 years ago

Fix MPI build for MPI library without mpi wrapper

Some MPI libraries (e.g. MPICH and OpenMPI on Windows) don't
contain any MPI wrappers, and thus 417d0afff broke the support for
those MPI libraries.

This change uses the FindMPI module if the compiler is not an
MPI wrapper and CMake >= 2.8.5. Requiring MPI wrappers for
CMake < 2.8.5 avoids reintroducing the problems of #851.

To avoid any confusion caused by different behavior with older and newer
versions of CMake, all documentation should recommend the MPI wrapper
approach, which works for all versions. The FindMPI approach should
only be discussed for advanced users (to support e.g. Windows).

As a side effect, this change makes it more convenient to use with
CMake >= 2.8.5: there is no need to specify the MPI wrapper, and there
are fewer problems with nvcc.

Fixes #958

Change-Id: Ic53d8125c5a58edc6789fe16f2b710e7e2568d4f
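The decision described in this commit message can be sketched as a small POSIX shell function. This is a hypothetical illustration, not the actual GROMACS code (which does this inside CMake): the names `version_ge` and `choose_mpi_detection` and the labels they print are made up here, and whether the compiler is a wrapper is passed in as a yes/no flag rather than detected.

```shell
#!/bin/sh
# Hypothetical sketch of the MPI detection order in the commit message above.
# Not GROMACS code: function names and printed labels are illustrative only.

# version_ge A B: succeed if dotted version A >= B (up to four numeric fields).
version_ge() {
    lowest=$(printf '%s\n%s\n' "$2" "$1" |
        sort -t. -k1,1n -k2,2n -k3,3n -k4,4n | head -n 1)
    [ "$lowest" = "$2" ]
}

# choose_mpi_detection IS_WRAPPER CMAKE_VERSION
#   IS_WRAPPER is "yes" if a test MPI program compiled with the plain C compiler.
choose_mpi_detection() {
    if [ "$1" = yes ]; then
        echo wrapper            # CC already is an MPI wrapper: nothing else needed
    elif version_ge "$2" 2.8.5; then
        echo findmpi            # recent CMake: let FindMPI locate the MPI library
    else
        echo require-wrapper    # old CMake: ask the user to set CC to an MPI wrapper
    fi
}
```

For example, `choose_mpi_detection no 2.8.4` prints `require-wrapper`. Gerrit change 1166 (see #15 below) later let CMake < 2.8.5 try FindMPI as well, so that branch reflects only this first revision.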

History

#1 Updated by Roland Schulz about 7 years ago

I suggest we add FindMPI back in so that we don't break Windows and other potentially odd MPI implementations (the MPI standard doesn't require mpicc to exist). I think we can avoid all the problems described in #851 by keeping the current approach as the recommended/documented approach.
I propose that if the user does not set CC=mpicc (by "mpicc" I mean any MPI wrapper; it doesn't need to be called "mpicc"), we automatically fall back to FindMPI. This is in fact exactly how it worked before: we already checked whether CC=mpicc and only used FindMPI if that wasn't the case. The only difference from the behavior before FindMPI was removed would be that CC=mpicc becomes the documented/recommended approach.

The result:
- The user is expected to use CC=mpicc. This works with all cmake versions.
- If the user does not set CC=mpicc we use FindMPI (might not work with older cmake versions).
- For an MPI implementation without mpicc (e.g. Windows) it works as long as the cmake version is current enough.
- For older cmake versions the user is not worse off than now. Whether the user forgot CC=mpicc on *nix or is on Windows, in both cases it doesn't work, but it doesn't work right now either.
- If CC!=mpicc and FindMPI fails then we print a message recommending to set CC to an mpi wrapper.
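For reference, the two paths in the list above correspond roughly to the following configure commands. This is a sketch, not taken from the GROMACS documentation: it assumes Open MPI-style wrapper names (`mpicc`, `mpicxx`), a shell running at the top of the source tree, and the `GMX_MPI` option used elsewhere on this page. Each path gets its own fresh build directory because CMake only reads `CC`/`CXX` from the environment on the first configure of a build directory.

```shell
#!/bin/sh
# Recommended path: use the MPI compiler wrappers as the C/C++ compilers.
# This works with every CMake version.
( mkdir -p build-wrapper && cd build-wrapper &&
    CC=mpicc CXX=mpicxx cmake .. -DGMX_MPI=ON )

# Fallback path: keep the default compilers and let the build fall back to
# FindMPI, which may not work with older CMake versions.
( mkdir -p build-findmpi && cd build-findmpi &&
    cmake .. -DGMX_MPI=ON )
```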

#2 Updated by Szilárd Páll about 7 years ago

I am very much for this change!

I would add that not having to use mpicc as the compiler would solve another important issue: at the moment, with MPI, users have to set the nvcc host compiler (e.g. gcc) manually, as mpicc often doesn't work with nvcc.

Additionally, FindMPI almost always worked for me and it was slightly more hassle-free than having to always set CC=mpicc CXX=mpicxx.

#3 Updated by Mark Abraham about 7 years ago

I'm happy with Roland's suggested solution. We need to make sure the attempt to use FindMPI.cmake only takes place on sufficiently recent versions of CMake, and that not having an MPI compiler set up properly doesn't lead to error messages about CMake version that obscure the real problem.

I wouldn't be happy with reverting to FindMPI.cmake as the front-line weapon. In its current form it's not suited for our usage pattern. See links in #851 discussion.

On Szilard's side issues, I script all my initial calls to CMake, so that setting flags like CC is a non-event for me. When we beta-release 4.6 we should ensure the installation guide makes it easy to do a copy-and-paste CMake invocation that would suit OpenMPI-on-*nix users. I'd script my CMake calls to make nvcc/gcc/mpicc behave, if I had that problem. ;-)

#4 Updated by Szilárd Páll about 7 years ago

Mark Abraham wrote:

> On Szilard's side issues, I script all my initial calls to CMake, so that setting flags like CC is a non-event for me. When we beta-release 4.6 we should ensure the installation guide makes it easy to do a copy-and-paste CMake invocation that would suit OpenMPI-on-*nix users. I'd script my CMake calls to make nvcc/gcc/mpicc behave, if I had that problem. ;-)

Scripting does not solve the pain the users have to go through when occasionally compiling GROMACS with GPU + MPI support (always having to separately set the host-compiler). Additionally, scripts don't solve the pain I've been having (since FindMPI got removed) while testing/benchmarking on a dozen machines.

#5 Updated by Szilárd Páll about 7 years ago

Actually, I would prefer that when the compiler is not set explicitly and GMX_MPI=ON, we should simply try using FindMPI (and issue a note if CMake is >=v2.8.5 or a warning otherwise).

#6 Updated by Roland Schulz about 7 years ago

Szilárd Páll wrote:

> Actually, I would prefer that when the compiler is not set explicitly and GMX_MPI=ON, we should simply try using FindMPI

Yes. If it cannot compile the MPI test program without any flags (meaning the compiler is not an MPI wrapper), it automatically uses FindMPI if CMake is >=2.8.5.

> (and issue a note if CMake is >=v2.8.5 or a warning otherwise).

I just uploaded a new version for https://gerrit.gromacs.org/#/c/1159/ with updated error message if MPI isn't available (by any attempted approach). It now says for <2.8.5:

  MPI support requested, but no MPI compiler found.  The C compiler
  (CMAKE_C_COMPILER) has to be set to the MPI compiler (often called mpicc).
  Or use a newer cmake version (>=2.8.5) for semi-automatic MPI detection.

And for >=2.8.5:

  MPI support requested, but no MPI compiler found.  Either set the
  C-compiler (CMAKE_C_COMPILER) to the MPI compiler (often called mpicc), or
  set the variables reported missing for MPI_C above.
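The check Roland describes above (compile a small MPI test program with no extra flags; if that fails, the compiler is not an MPI wrapper) can be approximated outside CMake as shown below. This is a hypothetical sketch: GROMACS performs the check inside CMake, and the function name `is_mpi_wrapper`, the test program, and the availability of `mktemp -d` are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: succeed if compiler $1 can build an MPI program with
# no extra flags, i.e. it behaves like an MPI compiler wrapper such as mpicc.
is_mpi_wrapper() {
    compiler=${1:-cc}
    workdir=$(mktemp -d) || return 2
    cat > "$workdir/mpitest.c" <<'EOF'
#include <mpi.h>
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
EOF
    # Plain compile and link: no -I or -l flags pointing at an MPI installation.
    "$compiler" "$workdir/mpitest.c" -o "$workdir/mpitest" >/dev/null 2>&1
    status=$?
    rm -rf "$workdir"
    return "$status"
}

if is_mpi_wrapper "${CC:-cc}"; then
    echo "${CC:-cc} looks like an MPI wrapper"
else
    echo "${CC:-cc} is not an MPI wrapper; FindMPI would be tried next"
fi
```

A missing compiler also counts as "not a wrapper", since the shell then returns a non-zero status for the compile step.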

#7 Updated by Mark Abraham about 7 years ago

Szilárd Páll wrote:

> Mark Abraham wrote:
>
>> On Szilard's side issues, I script all my initial calls to CMake, so that setting flags like CC is a non-event for me. When we beta-release 4.6 we should ensure the installation guide makes it easy to do a copy-and-paste CMake invocation that would suit OpenMPI-on-*nix users. I'd script my CMake calls to make nvcc/gcc/mpicc behave, if I had that problem. ;-)
>
> Scripting does not solve the pain the users have to go through when occasionally compiling GROMACS with GPU + MPI support (always having to separately set the host-compiler). Additionally, scripts don't solve the pain I've been having (since FindMPI got removed) while testing/benchmarking on a dozen machines.

AFAIK, the "pain" should only be at the initial call to CMake, where CMake captures the contents of the environment - and for most users that is a "once in a long time" event. Compiling shouldn't ever be a pain, I think. Since setting up MKL is a pain, when I want to test it, I use

#!/bin/sh
module load intel-cc
module load intel-mkl
# MKL_LIBRARIES is a semicolon-separated CMake list: quote it, or the shell ends the command at the first ";"
ccmake .. -DGMX_FFT_LIBRARY=mkl -DMKL_LIBRARIES="${MKL}/lib/em64t/libmkl_intel_thread.so;${MKL}/lib/em64t/libmkl_lapack.so;${MKL}/lib/em64t/libmkl_core.so;${MKL}/lib/em64t/libmkl_em64t.a;${MKL}/lib/em64t/libguide.so;/usr/lib64/libpthread.so" -DMKL_INCLUDE_DIR=${MKL}/include \
  -DGMX_MPI=ON -DCMAKE_C_COMPILER="$(command -v mpicc)" -DGMX_THREADS=OFF -DCMAKE_INSTALL_PREFIX=$HOME/progs \
  -DCMAKE_C_FLAGS_DEBUG="-g -O0 -Wstrict-prototypes -diag-disable 2215"

What configuration can you do that can't be done in a shell script before the call to CMake?

#8 Updated by Roland Schulz about 7 years ago

This discussion is getting off-topic. This bug is about fixing the broken MPI support on Windows. The patch I submitted to Jenkins fixes that without breaking anything else. Please let me know if you have alternative solutions or improvements to my solution.

#9 Updated by Szilárd Páll about 7 years ago

True that.

My single objection is that FindMPI used to work just fine with pre-2.8.0 CMake versions as well. Is there any strong reason why we can't enable it? I'm not very familiar with the module, but it seems that MPI_C/MPI_CXX are something new, so I couldn't just test it by changing the version conditional.

The reason why I would like to have the FindMPI mechanism back is that right now users will have to bother with setting CUDA_NVCC_HOST_COMPILER because i) the MPI compiler wrappers often don't work as host-compiler with nvcc, and ii) it's hard to automatically figure out which compiler the wrapper is using.

However, if the user didn't have to use MPI compiler wrappers as CC/CXX, better automation would be possible (see cmake/gmxManageNvccConfig.cmake).

#10 Updated by Mark Abraham about 7 years ago

Szilárd Páll wrote:

> True that.
>
> My single objection is that FindMPI used to work just fine with pre-2.8.0 CMake versions as well. Is there any strong reason why we can't enable it? I'm not very familiar with the module, but it seems that MPI_C/MPI_CXX are something new, so I couldn't just test it by changing the version conditional.

FindMPI.cmake was introduced at some point of the 2.8.x CMake development cycle, and IIRC at least one version we once used does not work correctly with all of them. Presumably we can trawl git log or Redmine for details. I'm happy if we explore relaxing the use of FindMPI.cmake to pre-2.8.5, but if my memory is right, we know we need to validate for each CMake version.

> The reason why I would like to have the FindMPI mechanism back is that right now users will have to bother with setting CUDA_NVCC_HOST_COMPILER because i) the MPI compiler wrappers often don't work as host-compiler with nvcc, and ii) it's hard to automatically figure out which compiler the wrapper is using.

mpicc -showme:command works for OpenMPI to solve ii). Dunno about i).

> However, if the user didn't have to use MPI compiler wrappers as CC/CXX, better automation would be possible (see cmake/gmxManageNvccConfig.cmake).
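Mark's `mpicc -showme:command` suggestion could be wrapped in a small helper that also covers MPICH-style wrappers, which print their full command line with `-show`. This is a hypothetical sketch, not code from gmxManageNvccConfig: the function name `underlying_compiler` is made up, and it assumes the first word of the wrapper's output is the underlying compiler. The commented example passes the result to the `CUDA_NVCC_HOST_COMPILER` variable mentioned in #9.

```shell
#!/bin/sh
# Hypothetical helper (not part of GROMACS): print the compiler that an MPI
# wrapper runs underneath, e.g. to reuse it as the nvcc host compiler.
underlying_compiler() {
    wrapper=${1:-mpicc}
    # Open MPI wrappers understand --showme:command and print the base compiler.
    if cmd=$("$wrapper" --showme:command 2>/dev/null) && [ -n "$cmd" ]; then
        :
    # MPICH-style wrappers understand -show and print the full command line.
    elif cmd=$("$wrapper" -show 2>/dev/null) && [ -n "$cmd" ]; then
        :
    else
        return 1    # unknown wrapper: leave the choice to the user
    fi
    echo "${cmd%% *}"    # keep only the first word, i.e. the compiler itself
}

# Example use (assumes the wrapper is on PATH; GPU options omitted):
#   cmake .. -DGMX_MPI=ON -DCUDA_NVCC_HOST_COMPILER="$(underlying_compiler mpicc)"
```

As Szilárd notes in #11, this would not cover all MPI implementations; wrappers that support neither option make the helper fail instead of guessing.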

#11 Updated by Szilárd Páll about 7 years ago

Mark Abraham wrote:

> Szilárd Páll wrote:
>
>> True that.
>>
>> My single objection is that FindMPI used to work just fine with pre-2.8.0 CMake versions as well. Is there any strong reason why we can't enable it? I'm not very familiar with the module, but it seems that MPI_C/MPI_CXX are something new, so I couldn't just test it by changing the version conditional.
>
> FindMPI.cmake was introduced at some point of the 2.8.x CMake development cycle, and IIRC at least one version we once used does not work correctly with all of them. Presumably we can trawl git log or Redmine for details. I'm happy if we explore relaxing the use of FindMPI.cmake to pre-2.8.5, but if my memory is right, we know we need to validate for each CMake version.

Right, that's exactly why I was suggesting that FindMPI could always attempt to make things work (and where we know it might not work, i.e. with pre-2.8.5 CMake, we can issue a big warning suggesting that, in case of failure, the user reconfigures with the MPI compiler wrappers).

I know that the implementation will be neither as robust nor as elegant as it is now (it would require a special case for the non-MPI_C case, right?), but it would save the user from having to deal with the quite scary notion of the nvcc host compiler (which in most cases will be the compiler that would be used anyway), or from ignoring the message and running the risk of funky errors due to mixing compilers.

>> The reason why I would like to have the FindMPI mechanism back is that right now users will have to bother with setting CUDA_NVCC_HOST_COMPILER because i) the MPI compiler wrappers often don't work as host-compiler with nvcc, and ii) it's hard to automatically figure out which compiler the wrapper is using.

> mpicc -showme:command works for OpenMPI to solve ii). Dunno about i).

Good to know, thanks for the info! I could implement a special case in gmxManageNvccConfig to detect the underlying compiler, but this will, of course, not cover all cases.

#12 Updated by Szilárd Páll about 7 years ago

  • Category set to mdrun
  • Target version set to 4.6

#13 Updated by Anonymous about 7 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

#14 Updated by Szilárd Páll about 7 years ago

  • Status changed from Closed to Feedback wanted

Unfortunately, this was merged prematurely, before we had decided whether to pursue the goal discussed above.

I'll change it to "feedback" until we finish the discussion.

BTW, "Anonymous" is Redmine auto-closing the issue (FYI, Rossen changed the settings recently). For details and further discussion, see #694.

#15 Updated by Mark Abraham about 7 years ago

https://gerrit.gromacs.org/#/c/1166/ extended the fix to allow CMake < 2.8.5 to try its FindMPI.cmake. Results will be mixed (see discussion there).

#16 Updated by Roland Schulz about 7 years ago

  • Status changed from Feedback wanted to Closed

1166 was merged.

#17 Updated by Szilárd Páll about 7 years ago

It would be good to have the documentation explain the changes in the MPI detection.

#18 Updated by Roland Schulz about 7 years ago

Where do you want to add it? Do we have any installation instructions?

#19 Updated by Szilárd Páll about 7 years ago

Roland Schulz wrote:

> Where do you want to add it? Do we have any installation instructions?

There is a CMake instruction page, quite misplaced, under the developer documentation.

#20 Updated by Roland Schulz about 7 years ago

done
