Particles decomposition

  • The finest element of parallelism is the update group (one heavy atom plus its hydrogens). Update groups can be integrated independently of each other.
  • Grouping has to preserve data locality for the modules that consume the groups (this means sorting).
  • Unusual modules can be implemented fairly efficiently as consumers of sets of update groups, even if this costs slightly more data movement.
  • There will be a spatial domain decomposition.
  • The search code should prepare what important modules need, e.g. halo exchange and (restrained) update for a domain.
  • Particles in a domain should be grouped as local (L), non-local to be sent (NL send), and non-local to be received (NL receive). Within each of these three groups, particles are grouped into DD cells that can be routed to e.g. different GPUs on this (or another) node.
  • Within DD cells, particles should be sorted by the kind of constraint kernel needed (in some cases, by the kind of bonded type or special forces type).
  • Within DD cells, particles may be grouped into search cells for SR efficiency.
  • Within search cells:
    • Neighbor lists are constructed for SR kernels.
    • Update group ranges are constructed for update and constraint kernels.
    • Atom ranges from LINCS are used for bonded kernels.
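
The grouping and sorting steps above can be illustrated with a minimal sketch. It is not the actual implementation: the `Atom` record, the constraint labels ("settle", "lincs", "none"), and both helper functions are hypothetical names chosen for illustration. The sketch builds update groups (one heavy atom plus its hydrogens), sorts them by the kind of constraint kernel they need, and returns one contiguous atom range per kernel kind.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Atom:
    index: int       # global atom index
    heavy: int       # index of the heavy atom that owns this atom (itself if heavy)
    constraint: str  # hypothetical label: "settle", "lincs", or "none"


def build_update_groups(atoms):
    """Collect each heavy atom together with its hydrogens."""
    groups = {}
    for atom in atoms:
        groups.setdefault(atom.heavy, []).append(atom)
    # Put the heavy atom first inside each group, then hydrogens by index.
    return [
        sorted(members, key=lambda a: (a.index != a.heavy, a.index))
        for members in groups.values()
    ]


def sort_by_constraint_kind(groups):
    """Return the flattened atom order and one contiguous range per kind."""
    # sorted() is stable, so groups of the same kind keep their input order.
    ordered = sorted(groups, key=lambda g: g[0].constraint)
    order, ranges = [], {}
    for kind, same_kind in groupby(ordered, key=lambda g: g[0].constraint):
        start = len(order)
        for group in same_kind:
            order.extend(a.index for a in group)
        ranges[kind] = (start, len(order))  # half-open range [start, end)
    return order, ranges


atoms = [
    Atom(0, 0, "lincs"), Atom(1, 0, "lincs"),                          # C-H
    Atom(2, 2, "settle"), Atom(3, 2, "settle"), Atom(4, 2, "settle"),  # water
    Atom(5, 5, "none"),                                                # ion
    Atom(6, 6, "lincs"), Atom(7, 6, "lincs"), Atom(8, 6, "lincs"),     # CH2
]
order, ranges = sort_by_constraint_kind(build_update_groups(atoms))
print(order)
print(ranges)
```

Because update groups are never split, each kernel can be handed a single half-open range into the sorted particle array rather than a scattered index list.
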
The reasoning for decomposing the DD cells by the type of constraints needed:
  • A different constraint kernel will be called for each group.
  • Molecules handled by SETTLE don't have bonds, hence the SETTLE kernel can run simultaneously with the bonded kernel.
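
The independence argument can be sketched as follows, assuming the particles of a DD cell are already sorted into contiguous ranges per constraint kind. The two kernel functions are placeholders that only report which atoms they touched, and the range values are made up for illustration; a real implementation would launch GPU kernels or tasks instead of Python threads.

```python
from concurrent.futures import ThreadPoolExecutor


def settle_kernel(atom_range):
    """Placeholder SETTLE kernel: returns the set of atoms it touched."""
    return set(range(*atom_range))


def bonded_kernel(atom_range):
    """Placeholder bonded kernel: returns the set of atoms it touched."""
    return set(range(*atom_range))


# Hypothetical half-open ranges in a DD cell sorted by constraint kind.
ranges = {"lincs": (0, 5), "settle": (6, 9)}

# The two kernels work on different ranges, so they can be submitted together.
with ThreadPoolExecutor(max_workers=2) as pool:
    settle_future = pool.submit(settle_kernel, ranges["settle"])
    bonded_future = pool.submit(bonded_kernel, ranges["lincs"])
    settle_atoms = settle_future.result()
    bonded_atoms = bonded_future.result()

# No atom is touched by both kernels, so no ordering between them is needed.
independent = settle_atoms.isdisjoint(bonded_atoms)
print(independent)
```
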
