- The finest element of parallelism is the update group (one heavy atom plus the hydrogens bonded to it). Update groups can be integrated independently.
- Grouping must preserve locality for modules, which means sorting.
- Unusual modules can be implemented fairly efficiently as consumers of sets of update groups, even if this costs slightly more data movement.
- There will be a spatial domain decomposition.
- The search code should prepare what the important modules need, e.g. halo exchange and (restrained) update for a domain.
- Particles in a domain should be grouped as local (L), non-local to be sent (NL send), and non-local to be received (NL receive). Within each of these three groups, particles are grouped into DD cells that can be routed to, e.g., different GPUs on this (or another) node.
- Within DD cells, particles should be sorted by the kind of constraint kernel they need (in some cases, by the kind of bonded interaction or special force).
- Within DD cells, particles may be grouped into search cells for short-range (SR) efficiency.
- Within search cells:
  - Neighbor lists are constructed for SR kernels.
  - Update group ranges are constructed for the update and constraint kernels.
  - Atom ranges from LINCS are used for bonded kernels.
- Different constraint kernels will be called for different groups.
- Molecules constrained by SETTLE (e.g. rigid water) have no bonded interactions, so the SETTLE kernel can run concurrently with the bonded kernel.
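Forming update groups (one heavy atom plus its bonded hydrogens) and sorting them for locality can be sketched as below. This is a minimal sketch, assuming a toy topology given as per-atom element symbols plus a bond list, and a uniform rectangular cell grid; `build_update_groups` and `sort_groups_by_cell` are hypothetical names, not GROMACS code.

```python
def build_update_groups(elements, bonds):
    """Group each heavy atom with the hydrogens bonded to it.

    elements: per-atom element symbols, e.g. ["O", "H", "H"].
    bonds: iterable of (i, j) atom-index pairs.
    Returns groups ordered by root index; each group lists its root
    (a heavy atom, or a hydrogen not bonded to any heavy atom) first.
    """
    heavy_of = {}  # hydrogen index -> heavy atom it is bonded to
    for a, b in bonds:
        if elements[a] == "H" and elements[b] != "H":
            heavy_of[a] = b
        elif elements[b] == "H" and elements[a] != "H":
            heavy_of[b] = a
    # Every atom that is not a bonded hydrogen roots its own group.
    groups = {i: [i] for i in range(len(elements)) if i not in heavy_of}
    for h, heavy in sorted(heavy_of.items()):
        groups[heavy].append(h)
    return [groups[root] for root in sorted(groups)]


def sort_groups_by_cell(groups, positions, box, n_cells):
    """Order update groups by the grid cell of their root atom (locality)."""
    def cell_index(pos):
        # Wrap into the rectangular box, then bin into a uniform grid.
        ix = [min(int((p % length) / length * n), n - 1)
              for p, length, n in zip(pos, box, n_cells)]
        return (ix[0] * n_cells[1] + ix[1]) * n_cells[2] + ix[2]
    # sorted() is stable, so groups in the same cell keep their order.
    return sorted(groups, key=lambda g: cell_index(positions[g[0]]))
```

For a water (O, H, H) followed by a methane (C, H, H, H, H), `build_update_groups` yields `[[0, 1, 2], [3, 4, 5, 6, 7]]`. Sorting by cell keeps spatially close groups contiguous in memory; a fuller version would presumably sort by DD cell first and search cell second.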
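The layout described in the bullets (L / NL send / NL receive, then DD cells, then constraint-kernel kind, with contiguous ranges per kernel) can be sketched as a single sort followed by a range scan. This is a minimal sketch under stated assumptions: "L" is taken to mean home groups that no other rank needs, "NL send" home groups that some other rank needs as halo, and "NL receive" groups homed elsewhere that this rank needs. The `UpdateGroup` record, the kernel labels `"settle"`, `"lincs"`, `"none"`, and the function names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical ordering of constraint-kernel kinds within a DD cell.
KERNEL_ORDER = {"settle": 0, "lincs": 1, "none": 2}
CATEGORY_ORDER = {"L": 0, "NL send": 1, "NL receive": 2}


@dataclass(frozen=True)
class UpdateGroup:
    gid: int                 # global update-group index
    home: int                # rank that integrates this group
    dd_cell: int             # DD cell the group belongs to
    kernel: str              # constraint kernel kind (key of KERNEL_ORDER)
    needed_by: frozenset = frozenset()  # other ranks that need it as halo


def classify(g, me):
    """Assign a group to L, NL send or NL receive from rank `me`'s view."""
    if g.home == me:
        return "NL send" if g.needed_by else "L"
    if me in g.needed_by:
        return "NL receive"
    return None  # not present on this rank


def layout(groups, me):
    """Order groups by (category, DD cell, kernel kind) and return
    half-open index ranges for each (category, DD cell, kernel) triple."""
    present = [(classify(g, me), g) for g in groups]
    present = [(c, g) for c, g in present if c is not None]
    present.sort(key=lambda cg: (CATEGORY_ORDER[cg[0]], cg[1].dd_cell,
                                 KERNEL_ORDER[cg[1].kernel], cg[1].gid))
    order = [g.gid for _, g in present]
    ranges = {}
    for i, (c, g) in enumerate(present):
        key = (c, g.dd_cell, g.kernel)
        # After sorting, each key is contiguous, so keep its first index.
        start = ranges[key][0] if key in ranges else i
        ranges[key] = (start, i + 1)
    return order, ranges
```

Because each (category, DD cell, kernel) range is contiguous, it can be handed to the matching kernel (e.g. a SETTLE range to a SETTLE kernel) without per-particle branching.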
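The claim that the SETTLE kernel can run alongside the bonded kernel is a statement about task dependencies: two tasks may overlap when neither is a transitive prerequisite of the other. A minimal sketch, assuming a hypothetical per-step task graph whose task names are illustrative rather than actual GROMACS tasks; the key assumption, taken from the text, is that SETTLE-constrained groups have no bonded terms and so wait only for nonbonded forces.

```python
# Hypothetical per-step task graph: task -> set of tasks it must wait for.
DEPS = {
    "search": set(),
    "nonbonded": {"search"},
    "bonded": {"search"},
    # SETTLE groups have no bonded terms: only nonbonded forces are needed.
    "update_settle": {"nonbonded"},
    # Groups constrained by LINCS do have bonded terms.
    "update_lincs": {"nonbonded", "bonded"},
}


def ancestors(task, deps):
    """All tasks that must finish before `task` can start."""
    seen = set()
    stack = list(deps[task])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(deps[t])
    return seen


def can_overlap(a, b, deps=DEPS):
    """True if neither task transitively depends on the other."""
    return a not in ancestors(b, deps) and b not in ancestors(a, deps)
```

Under this graph, `can_overlap("update_settle", "bonded")` is `True`, while `can_overlap("update_lincs", "bonded")` is `False`, which is why sorting groups by constraint kernel lets the SETTLE work be scheduled independently.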
Attached image can be edited at: https://docs.google.com/drawings/d/1GNmP5-258x6dfViV-zVFol8tBzvWc__lIMzKUlxtE-Y/edit?usp=sharing