Task #996

C++ MPI Framework

Added by Roland Schulz about 7 years ago. Updated over 5 years ago.

Status:
New
Priority:
Normal
Assignee:
Category:
core library
Target version:
Difficulty:
uncategorized

Description

Select a C++ MPI framework so that C++ classes can be sent via MPI.


Related issues

Related to GROMACS - Task #765: Improving serialization of data structures prior to communication (New, 06/21/2011)
Blocks GROMACS - Feature #868: Implement parallelization support to analysis framework (In Progress)

History

#1 Updated by Roland Schulz about 7 years ago

  • Assignee set to Roland Schulz

#2 Updated by Roland Schulz about 7 years ago

Introduction to MPP mpi type traits as proposed for Gromacs

Parallel analysis in Gromacs requires a way to transfer large data structures between processes. Data structures need to be serialized, transferred, and then deserialized after delivery. Serializing complex data structures (e.g. in the 4.6 code for global statistics or topology broadcast) can require a great deal of extra code per data structure, which is difficult to maintain. Serialization code is therefore often avoided, and several communication calls are used instead to send one structure. This causes bad performance and bad scaling.

With MPP mpi type traits, however, most of this difficulty can be avoided with very little additional support code. MPP infers the MPI types from the C++ types of the send/recv buffer, and (usually) has no runtime overhead over hand-coded MPI code. For simple data, built-in MPI datatypes are used, and for structs/classes it is very simple to declare derived datatypes. The line

mpi::send(data, rank, tag);

will automatically send data correctly no matter whether data is of type "int", "vector<int>", "vector<some_struct>", or any other supported type. With mpi type traits, a "type trait" is defined for each custom type that needs to be serialized. These traits specify how to serialize the type and are defined externally to the type itself, so no additional methods need to be added to already defined classes. Most importantly, type traits can be defined recursively. MPP already defines how to serialize basic types (int, float, etc.), pointer types (scoped, shared, etc.), and STL vectors of defined types. (Other STL container types can be added as needed.) This in turn makes it simple to define serializations for complex Gromacs types.

A type trait consists of one function: get_layout. This function is defined so that its result can be used immediately in normal MPI calls. For example, the std::vector get_layout returns the layout of the stored type, resized by the length of the vector. Similarly, type traits for pointer types return the layout of the type pointed to, with an adjusted address.
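The recursion can be illustrated with a small toy sketch (the names wire_traits and size here are hypothetical, not mpp's actual API): a primary template handles built-in types, and a std::vector specialization defers to the element type's trait, so nested containers work automatically.

```cpp
#include <cstddef>
#include <vector>

// Toy illustration of recursive type traits (hypothetical names, not mpp's API):
// each trait reports how many bytes a value occupies, deferring to the element
// trait for containers. mpp applies the same recursion, but builds MPI layouts
// instead of byte counts.
template <typename T>
struct wire_traits {
    // Primary template: covers built-in and other trivially copyable types.
    static std::size_t size(const T&) { return sizeof(T); }
};

template <typename T>
struct wire_traits<std::vector<T>> {
    // Vector specialization: recurse into the element type's trait, so e.g.
    // vector<vector<int>> works without any extra code.
    static std::size_t size(const std::vector<T>& v) {
        std::size_t total = 0;
        for (const T& elem : v) {
            total += wire_traits<T>::size(elem);
        }
        return total;
    }
};

struct some_struct { int index; float x, dx; };
```

With this pattern, wire_traits<std::vector<some_struct>>::size(v) "just works" once the element trait exists, mirroring how mpp handles vector<some_struct> in mpi::send.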

Defining a new type trait is normally straightforward using the "struct_layout_builder" class. For example, the following is the type trait definition for AnalysisDataFrameHeader (a class from the new analysis framework):

template <>
inline data_layout mpi_type_traits<gmx::AnalysisDataFrameHeader>::get_layout(
        const gmx::AnalysisDataFrameHeader& adfh)
{
    return struct_layout_builder(adfh,3).
            add(adfh.index_).add(adfh.x_).add(adfh.dx_).build();
}
SET_MPI_STATIC(gmx::AnalysisDataFrameHeader);

The "add" method works for any type that is already defined. If a type is not defined, a similar builder can be used for that type. Users of the library (i.e. Gromacs developers) only need to know about the builder to use mpp for MPI communication. The SET_MPI_STATIC macro is not required, but is recommended for contiguous types that do not contain pointers, as it allows for various optimizations when transferring data.
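As a rough illustration of the builder idea (all names below are hypothetical, not mpp's actual struct_layout_builder API), a minimal builder can record the byte offset of each added field relative to the enclosing object; a real implementation would turn these displacements into an MPI derived datatype.

```cpp
#include <cstddef>
#include <vector>

// Toy sketch of the builder pattern (hypothetical, not mpp's actual code):
// each add() records the field's byte offset within the enclosing object,
// producing the displacement list that a real implementation would hand to
// MPI_Type_create_struct.
class layout_builder {
public:
    explicit layout_builder(const void* base)
        : base_(static_cast<const char*>(base)) {}

    template <typename Field>
    layout_builder& add(const Field& field) {
        offsets_.push_back(reinterpret_cast<const char*>(&field) - base_);
        return *this;  // return *this so calls can be chained like mpp's add()
    }

    std::vector<std::ptrdiff_t> build() const { return offsets_; }

private:
    const char* base_;
    std::vector<std::ptrdiff_t> offsets_;
};

// A struct shaped like the AnalysisDataFrameHeader example above.
struct FrameHeader { long index; float x; float dx; };
```

Usage mirrors the trait shown above: layout_builder(&h).add(h.index).add(h.x).add(h.dx).build() yields the three field displacements.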

Alternatives for MPI Communication in Gromacs

Several C++ projects to simplify MPI serialization exist: Boost::MPI, mpp, TPO++, MPI Serializer, and a few others (referenced in the respective papers). We have looked at Boost and mpp in detail and created a reference implementation for each. We also looked at the others in some detail, at least as far as reading the papers.

The different solutions can be categorized by:
  • Usage of MPI datatypes
  • External tool or library

Boost::MPI uses the Boost::Serialization framework to serialize objects, which can then be sent by MPI. It pulls in a large part of Boost because Boost::Serialization depends on both Boost::Preprocessor and Boost::MPL. The performance, especially for small messages, is not very good.

MPI Serializer is a GUI that does one-way code generation. This makes it difficult to work with the resulting code outside of the GUI, and the project also seems to no longer be maintained.

TPO++ uses MPI datatypes and is similar to mpp (see next paragraph) but is significantly larger and hasn't been updated since 2007.

mpp, the currently implemented method, is a very lightweight C++ wrapper (~1000 lines) around MPI datatypes. MPI datatypes were added to MPI for the purpose of sending complex datatypes and are in some cases hardware accelerated. The "mpp paper": http://doi.ieeecomputersociety.org/10.1109/PDP.2012.42 has performance comparisons to Boost. Type traits are used to define the MPI datatype for each class. The type traits allow one to define datatypes not only for custom classes but also for STL classes, which would not be possible if the mechanism were implemented as member functions. mpp doesn't do any allocation or error checking on the receiving side: it assumes that the sender and receiver objects are allocated, including all dynamically allocated members, and that the receiver and sender sizes (e.g. for vectors) match. While this is less convenient, it allows sending data in one step without first having to send sizes.

Currently mpp doesn't work with tmpi. As long as mpp is only used for the analysis tools this isn't a problem, because thread parallelization is planned anyhow, so MPI is not needed for runs on a single node. If/when mpp is also used for code which needs to work with tmpi (e.g. replica exchange), then tmpi needs to be extended with support for MPI datatypes. The main function used is MPI_Type_create_struct.
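For reference, the hand-written code that mpp generates automatically prepares, for each struct, parallel arrays of block lengths, byte displacements, and member datatypes, and passes them to MPI_Type_create_struct. A sketch of that argument preparation (names like Payload and payload_args are hypothetical, and the actual MPI call is shown only in a comment so the sketch stands alone without an MPI installation):

```cpp
#include <cstddef>

// For a struct { int a; double b[2]; }, MPI expects three parallel arrays:
// block lengths, byte displacements, and member datatypes, passed as
//   MPI_Type_create_struct(2, blocklengths, displacements, types, &newtype);
// This sketch builds the first two arrays the way hand-coded MPI would.
struct Payload { int a; double b[2]; };

struct StructArgs {
    int blocklengths[2];
    std::ptrdiff_t displacements[2];  // MPI_Aint in real MPI code
};

inline StructArgs payload_args() {
    StructArgs args;
    args.blocklengths[0] = 1;                      // one int
    args.blocklengths[1] = 2;                      // two doubles
    args.displacements[0] = offsetof(Payload, a);  // byte offset of each member
    args.displacements[1] = offsetof(Payload, b);
    // The matching types array would be { MPI_INT, MPI_DOUBLE }.
    return args;
}
```

Writing this by hand for every Gromacs struct is exactly the per-type boilerplate that the traits/builder approach eliminates.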

Disadvantages of MPP

The paper on TPO++ (http://tpo.sourceforge.net/doc/P07-icpp2000_final.pdf) provides a nice comparison of different approaches. Mpi++ is discussed in the TPO++ paper. It is very similar to the approach of MPP. Here we address some possible disadvantages of MPP (as pointed out for Mpi++):

  • Type traits for subclasses
    The paper states that this approach "prohibits inheritance of type information". This is not true. With MPP, a subclass can cast to its superclass to access its type, and then append the new fields. For example:

    class MyStoredFrame : public StoredFrame
    {
        int x;
    };

    template <>
    inline data_layout mpi_type_traits<gmx::AnalysisDataStorage::Impl::MyStoredFrame>::get_layout(
            const gmx::AnalysisDataStorage::Impl::MyStoredFrame& msf)
    {
        mpi_type_builder builder(msf);
        builder.add(static_cast<const StoredFrame&>(msf));
        builder.add(msf.x);
        return builder.build();
    }
    

    This approach has the disadvantage that the additional fields end up in a different MPI_Datatype than the elements from the superclass. This could cause unnecessary overhead for deep inheritance hierarchies. With additional library methods this could be fixed (by allowing access to builders of other types). In the current code, the above approach has not yet been necessary. Alternatively, one could automatically simplify the created MPI_Datatypes as described in point 3, to optimize types of both inheritance and composition.
  • Code Simplification
    The paper states that "The implementation of user-defined types does not substantially simplify the MPI datatypes interface." The builder interface that has been added on top of MPP (see introduction), though, does make for a much simpler interface.
  • Inefficient type construction
    Currently, nested structs are treated as separate MPI_Datatypes, rather than being combined into a single struct. It is not known whether common MPI libraries do this optimization. In the future it would be worthwhile to benchmark whether this approach significantly delays type construction, either when the type is actually built or when packets are built prior to communication. MPI_Datatypes are constructed "on-the-fly" as needed, and are not currently cached, which exacerbates this problem. This could be mitigated somewhat by caching types, but caching would only work for static types or for dynamic types where only values are changed. Update: Caching has been added.
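The caching idea can be sketched as follows (a toy illustration with hypothetical names, not mpp's actual code): layouts for static types are built once, stored keyed by the C++ type, and looked up on subsequent sends, so type construction no longer happens before every communication call.

```cpp
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Toy sketch of a layout cache (hypothetical, not mpp's actual code). A real
// cache would store MPI_Datatype handles; a string stands in for the layout
// here so the sketch runs without MPI.
class layout_cache {
public:
    template <typename T>
    const std::string& get() {
        auto it = cache_.find(std::type_index(typeid(T)));
        if (it == cache_.end()) {
            ++builds_;  // count how often a layout is actually constructed
            it = cache_.emplace(std::type_index(typeid(T)),
                                std::string(typeid(T).name())).first;
        }
        return it->second;  // cache hit: reuse the previously built layout
    }
    int builds() const { return builds_; }

private:
    std::map<std::type_index, std::string> cache_;
    int builds_ = 0;
};
```

As noted above, such a cache is only safe for static types, or for dynamic types whose layout does not change between sends.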

Update: See Teemu's comment below on how to optimize deep inheritance and deep composition.

#3 Updated by Roland Schulz about 7 years ago

A WIP implementation is at https://gerrit.gromacs.org/#/c/1316/

#4 Updated by Teemu Murtola about 7 years ago

A few brief comments on the "disadvantages of MPP" section:
  • I think both the first and third point would be possible to solve by changing the signature of data_layout mpi_type_traits::get_layout(const T &) to void mpi_type_traits::get_layout(const T &, mpi_type_builder &) (using names from the latest version in gerrit). A bit more complex declaration, but on the other hand, possibly easier code otherwise, in particular since it is only called from the global get_layout(). Don't really know which one would be better, though, as using nested types potentially allows better sharing.
  • In the example code, dynamic_cast<StoredFrame> should probably be static_cast<StoredFrame>. Or it could also be simply builder.add<StoredFrame>(msf).
  • All the partial template specialization etc. in MPP may confuse people not very familiar with C++. This is in an external library, though, and not in Gromacs itself. But people probably want to understand how it works if they need to write serialization for their own classes.

#5 Updated by Roland Schulz about 7 years ago

Thanks for the feedback.
Teemu Murtola wrote:

A few brief comments on the "disadvantages of MPP" section:
  • I think both the first and third point would be possible to solve by changing the signature of data_layout mpi_type_traits::get_layout(const T &) to void mpi_type_traits::get_layout(const T &, mpi_type_builder &) (using names from the latest version in gerrit). A bit more complex declaration, but on the other hand, possibly easier code otherwise, in particular since it is only called from the global get_layout(). Don't really know which one would be better, though, as using nested types potentially allows better sharing.

I like that idea. Non-static types can't be shared anyhow, so at least for them it would be advantageous. For static types one could use a heuristic, or make it configurable, whether they are shared or incorporated into the larger type. E.g. one could use the version in the cache if it already exists, and if not, incorporate it into the larger type.

  • In the example code, dynamic_cast<StoredFrame> should probably be static_cast<StoredFrame>. Or it could also be simply builder.add<StoredFrame>(msf).

Agree

  • All the partial template specialization etc. in MPP may confuse people not very familiar with C++. This is in an external library, though, and not in Gromacs itself. But people probably want to understand how it works if they need to write serialization for their own classes.

I agree that several places in mpp don't qualify as easy C++. But I doubt it is possible to write a serialization library using only easy C++ without significant disadvantages. E.g. I tried to replace the template specialization with function overloading. The serious disadvantage of this is that if one forgets to create an overloaded get_layout function for a type which has an implicit conversion to some other type that does have a get_layout overload, then one doesn't get a compile error but undefined behavior. So it is probably more a disadvantage of the idea of using any serialization/communication library than of mpp specifically. Of course there are some complicated usages of C++ which could be removed without significant disadvantages. E.g. if you think we should remove the enable template parameter of mpi_type_traits, we could do so. It is not used by Gromacs anyhow, because it is used only for the enum specialization, which cannot be used by code that needs to be C++03 compatible.

#6 Updated by Roland Schulz almost 7 years ago

  • Target version set to 5.0

#7 Updated by Mark Abraham over 5 years ago

  • Target version changed from 5.0 to future

#8 Updated by Teemu Murtola over 5 years ago

  • Project changed from Next-generation analysis tools to GROMACS
  • Category set to core library
