Project

General

Profile

Task #1793

cleanup of integration loop

Added by Mark Abraham over 4 years ago. Updated about 1 year ago.

Status:
New
Priority:
Normal
Assignee:
Category:
mdrun
Target version:
Difficulty:
uncategorized
Close

Description

The integrator loop has been in need of work for a long time (and it's not just the fault of the velocity-Verlet implementation), which we have been slowly cleaning up (see https://gerrit.gromacs.org/#/q/topic:md-loop-cleanup for just some of those patches).

This should help
  • find and fix bugs
  • make it reasonable to implement and document old and new integration schemes well (e.g. #1137),
  • modularize code (e.g. constraint, integrator, coupling algorithm, non-bonded scheme, and global-summation code should know as little about each other as possible)
  • pave the road for future task parallelism at this level (currently every function call gets passed pretty much everything, which means we have no idea what the data dependencies are, and bugs could be anywhere)
  • provide optimization opportunities (e.g. with leap-frog, at nsttcouple-1 steps we should be able to avoid using a blocking MPI_Allreduce to get the KE for use at the next step, but currently the necessary conditions are unclear)
  • avoid pessimizations like #692
  • make clear where developers of new features should add code

Subtasks

Task #2423: modernize constraints codeNewMark Abraham
Task #2492: implement force calculation via ForceProviders containing collections of IForceProviderNewMark Abraham
Task #2644: Replace compute_globalsNewMark Abraham

Related issues

Related to GROMACS - Feature #1137: Proposal for integrator framework (do_md) in future GROMACSNew
Related to GROMACS - Bug #1858: compute globals should not have logic about which integrator is in useClosed
Related to GROMACS - Task #1868: implement mdrun -rerun better, simplifying do_mdClosed
Related to GROMACS - Bug #2105: multi-domain rerun brokenClosed
Related to GROMACS - Task #2616: Model for MD stateNew

Associated revisions

Revision 7bc8d6d2 (diff)
Added by Mark Abraham about 4 years ago

Remove CGLO_RERUNMD

Only one use of cglo_flags combines more than a single flag, so made
cglo_flags specific to that case.

No functionality changes here

Refs #1793

Change-Id: I10dbae9efdd3f8da340b7efec08305d5de563f9c

Revision 6ead809b (diff)
Added by Mark Abraham about 4 years ago

Remove "support" for twin-range with VV integrators

Group-scheme twin-range non-bonded interactions never worked with
VV+constraints, and removing it was a to-do item. There are no plans
to make it work with VV, and there are plans to remove the twin-range
scheme entirely, as well as rework leap-frog more closerly into the
Trotter scheme.

Refs #1137, #1793

Change-Id: Ibd70b5397568bfcd328cd6dd1c5c99384d7aaca8

Revision 600266df (diff)
Added by Mark Abraham about 4 years ago

Removed state_global from mdrun energy-summation

state_global was used only when MD_READ_EKIN was set, ie. when
restoring KE data when restarting from checkpoints. So there is
no use passing it to every call to compute_globals.

Accordingly, moved use of restore_ekinstate_from_state().

Also added some docs and const correctness.

Refs #1793

Change-Id: I937f397602c761c92cd8e48121b56b1805a05470

Revision 8b6c99a1 (diff)
Added by Mark Abraham almost 4 years ago

Simplified energy-summation code

We need to check that the DD has the correct number of bonded
interactions being handled across all domains. The old check would
happen at every call for computing energies, which was unnecessary
because the number of domain-local bonds can change only at DD
steps. It also required log file, global state and global topology to
be passed into every call to global_stat to handle the rare error case.

This patch moves the check for correct DD of bonded interactions up to
do_md, where it can be scheduled to happen only once per DD lifetime.
global stat now does not need to be passed parameters that only needed
for reporting in the case of a failed check.

The dispersion correction code now gets the global number of atoms
from a field in forcerec. This should really be set up when a
dispersion-correction object was created, but this approach will do
for now (particularly because dispersion correction code is probably
being called too often, so there's bigger problems to fix).

Together, mdrun energy-summation code no longer needs to be passed the
global topology, reducing code complexity.

Fixed documentation of global_stat.

Refs #1793

Change-Id: I595b4eff8f4cbb0bbaf295c386041a0966e4a094

Revision 8023fcdc (diff)
Added by Mark Abraham almost 4 years ago

Fix logic for DD missing-interactions check

The logic for the active part of this check got broken in 90aa9d65e0c3
before v4.5. Historically, there was no significant effect, because
all that patch changed is that we sum a double whose result won't
always be read. For all integrators, there's eventually a call where
CGLO_ENERGY is set and the sum is read, so the check is active and
works correctly. Thus, the worst-case result was that interaction(s)
were missing and not flagged until the next global-energies
communication step.

However, cleanup for #1793 moved the responsibility for the check out
to do_md, exposing the fact that for VV integrators (only),
CGLO_ENERGY isn't set for the "global signalling + leapfrog" call to
compute_globals. Thus the sum wasn't read, so the check failed when
totalNumberOfBondedInteractions still had the value -1 with which it
was initialized. Apparently, the regressiontests don't have enough
coverage of VV integrators to find this.

This also made me realise that this DD check has also been inactive
for the calls to compute_globals after initial DD, DD after replica
exchange, and DD during reruns, because none of those cases set
CGLO_ENERGY.

Fixes #1882. Refs #1793.

Change-Id: I0b5cc448175873ec0e5cee3c3d5023654b4f1b27

Revision f3c239f6 (diff)
Added by Mark Abraham almost 3 years ago

Fix handling of previous virials

These quantities get written to checkpoint files only for the Trotter
pressure-coupling integrators that need them, but they were being
copied in do_md for all Trotter integrators. This meant that an
appending restart of md-vv plus nose-hoover plus no pressure coupling
truncated off a correct edr frame and wrote one with zero virial and
wrong pressure. And in the same case, a no-append restart writes a
duplicate frame that does not agree with the one written before
termination.

Fixed by copying them only when they are useful to copy, so that in
the problem case, the correctly computed post-restart virial is not
wiped with zeroes from state fields that were never read from the
checkpoint. Cases that use the previous-virial values are not
affected.

Refs #1793

Change-Id: I84908122aefbe8658f423eaf4e5bd4ae25a93d24

Revision 280ba5a3 (diff)
Added by Mark Abraham over 1 year ago

Free more memory in grompp and mdrun

This will be increasingly useful as we start adding more GoogleTest
cases that call grompp and mdrun repeatedly.

Refs #1793, #1868

Change-Id: Ic0faaa4c3dec2891d0cdc6166561f8ccd4391f44

Revision ad79f778 (diff)
Added by Mark Abraham over 1 year ago

Create gromacs/mdrun module and move code there

Improving how we dispatch the work for do_md and friends requires that
they all compile as part of libgromacs (or all outside of it). The
code currently in src/programs/mdrun is there simply because it hasn't
been moved from the old src/kernel layout, rather than through any
clear design, so we should move it now that we have a reason.

Other code from src/programs/mdrun now needs to move into libgromacs
also, but we currently don't have a module for miscellanous mdrun
features, so we're adding to the abuse of mdlib for now.

Refs #1793
Refs #2423

Change-Id: I4f9ad5d0bf6585675472b49b2d5654f588b7214e

Revision 6f1f895f (diff)
Added by Mark Abraham over 1 year ago

Simplified handling of simulation runners

Argments to runners are now passed via struct members. That struct is
expected to change over the medium term, as most of its contents
should either be in a container of modules, or data with such a
module. As such, it does not make sense to do a lot of coding to
manage its contents in a sound way. The practical effects of using
this struct are almost the same as the old approach of passing a
forest of arguments - some variables are declared in a scope,
initialized with values, then used once.

The form is chosen so that
  • the do_md() etc. functions do not need a large number of useless
    textual changes now
  • future changes to names and types of the former parameters do not
    need to be edited in matching way in multiple places of code and
    their doxygen
  • we don't need to mark things gmx_unused anywhere

Useless unchecked return values have been removed.

Some excessive includes in integrator.h were removed, which
generates a few minor fixes elsewhere.

Refs #1793

Change-Id: I678598175192c9c68113fdd79fcee17f8e5c504e

Revision 1ab524ff (diff)
Added by Mark Abraham over 1 year ago

Reorganize energy evaluation for EM

Used aggregate initialized structs to simplify future refactoring.

Refs #1793, #2423

Change-Id: Iedb8bf3b4cb2d7f238c4b674ce459564a3f76a20

Revision acc3940a (diff)
Added by Mark Abraham over 1 year ago

Break apart update_constraints

There are four distinct kinds of work being done, and never was any
call to update_constraints doing all of them, so it's better to have a
group of functions, each of which do one thing, and the relevant ones
called. This also makes it simpler to express by returning fast that
when we don't have constraints, we do nothing.

Made the logic for whether this is a log or energy step match that of
the main MD loop. The old implementation may not have prepared for the
last step correctly when it was triggered by something other than the
nsteps inputrec value.

Removed a commment mentioning iteration, which is a feature
that was removed a while ago.

Removed some ancient debug dump output.

Refs #2423, #1793

Change-Id: I21c10826721ddc9a79a33b1dc75971a20d0855d9

Revision e9a53455 (diff)
Added by Mark Abraham over 1 year ago

Shift per-step control logic to do_md

There are multiple reasons why do_md can decide this is the last step,
so we should be consistent about it.

Refs #1793

Change-Id: I46f377630fa12be482ccb532c05d96cfe0537523

Revision 5ae5bf42 (diff)
Added by Mark Abraham over 1 year ago

Refactor SD update

The former use of multiple boolean control variables made the logic
hard to follow. Without constraints, the two parts of the integrator
are fused, which is now expressed explicitly.

This means it is now clear that the second half of the sd update does
not compute anything from the foreces.

Removed some of the vestiges of the way we once had two SD
integrators.

Refs #2423, #1793

Change-Id: I39d4cd0b8568859220b3a592b138f3f4405f8991

Revision 431258df (diff)
Added by Mark Abraham over 1 year ago

Move IMD initialization and fuse DD setup

init_IMD does not use the local state, and only the global state
positions on master rank, so we can move it before the local state
initialization.

This permits us to fuse some DD setup stages

Removed inaccurate comment about DD partioning.

Refs #1793

Change-Id: I00f98c22f5a17a1b9ae1114b513cfa3ea3798334

Revision 6c8a0c0e (diff)
Added by Mark Abraham over 1 year ago

Fix assertions for SD update

Some of the assertion logic in the refactoring of 5ae5bf42159cf was
wrong, but we apparently have no tests of the SD integrator before
now.

Refs #1793

Change-Id: I8acb6c5a9b6128aab7967fa8a1d32247f6365586

Revision 0e16db6d (diff)
Added by Pascal Merz about 1 year ago

Introduce reset handler

This change introduces a reset handler which encapsulates the
setting and handling of resetting signals, and the actual counter resetting.
This cleans up the do_md loop, exchanging several lines of code by
a single command. It also makes a step towards a task-based design by
only calling the internal functions when necessary (if counter resetting
was requested, signal setting only on master).

Refs #1793

Change-Id: I5dec051cd3a0c0b3b52dfe69a72dac7f7e9f65ed

Revision 71ec6f9d (diff)
Added by Pascal Merz about 1 year ago

Introduce checkpoint handler

This change introduces a checkpoint handler which encapsulates the
setting and handling of checkpoint signals, and owns the bCPT boolean.
This cleans up the do_md loop, exchanging several lines of code by
a single command. It also makes a step towards a task-based design by
only calling the internal functions when necessary (if checkpointing
is enabled, signal setting only on master), and by owning as much
data as possible.

Refs #1793

Change-Id: I05ca7abc9e81c9a8deacb6b81086e1077524d005

Revision 620230f3 (diff)
Added by Pascal Merz about 1 year ago

Introduce stop handler

This change introduces a stop handler which encapsulates the
setting and handling of stop signals, and owns the bLastStep boolean
(which is, however, currently mirrored as a local variable in do_md for
convenience). This cleans up the do_md loop, exchanging several lines
of code by a single command. To set the signal, the StopHandler object
loops over a vector of stop condition functions registered previously
with a helper object of type StopHandlerHelper. This allows to formulate
different conditions, and only build them if the current setup requires
them (e.g. only have a stop criterion based on time if a maximal run
time was set, only build stop conditions on master rank, ...).

The stop handler itself is created in do_md by a helper object owned by
the calling code. The helper object offers an interface to register
stop conditions via std::function objects. This allows higher-level
code to inject stop conditions.

Refs #1793

Change-Id: Ia9077841d96a9bc6d6f000eef4093e9fbd9f5363

Revision 1b1abdba (diff)
Added by Pascal Merz 12 months ago

Failproof signal conversion

When introducing the signal handlers, we decided to use scoped enums to
define the different simulation signals (changes Ia90778, I05ca7a,
I5dec05). In the SimulationSignal objects which actually get reduced,
these are stored as signed char. When the signals get handled, these
signed char are converted back to the scoped enums. Currently, this is
done using a static_cast.

Although in the current code, the signals get handled immediately after
being reduced, and signals are only set by master, there is no formal
guarantee that this is always true. If the signals get reduced multiple
times, or by different ranks, the signed char stored in the
SimulationSignal object could get larger than +1 / -1. In the old code,
this would not lead to problems, as it was just checked that the unsigned
char in the SimulationSignal was != 0 (or <0 / >0, for the stop signal).
In the new code, the signal handling would fail in this case, without
proper error message - the static_cast will not fail, but the following
comparison to the enum values will always fail. This commit introduces
a conversion function for each of the enums and explicity enum values
for the StopSignal to return to the failproof behavior of the old code.

Refs #1793

Change-Id: Iac8ea3946effba1797e16d8c918c7d49ce4dc828

Revision efa13a69 (diff)
Added by Mark Abraham 3 months ago

Add integration tests for exact restarts

These tests demonstrates the extent to which mdrun checkpoint restarts
reproduce the same run that would have taken place without the
restart.

I've been working on these, and the bugs they exposed, for a few
years, but the code has been fixed for a few years now.

The tests don't run with OpenCL because they have caused driver out of
memory issues.

Refs #1137, #1793, #1882, #1883

Change-Id: I8bc441d945f13158bbe10f097e772ea87cc6a559

History

#1 Updated by Mark Abraham over 4 years ago

  • Related to Feature #1137: Proposal for integrator framework (do_md) in future GROMACS added

#2 Updated by Mark Abraham over 4 years ago

  • Description updated (diff)

#3 Updated by Gerrit Code Review Bot about 4 years ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: I10dbae9efdd3f8da340b7efec08305d5de563f9c
Gerrit URL: https://gerrit.gromacs.org/5185

#4 Updated by Gerrit Code Review Bot about 4 years ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: Ibd70b5397568bfcd328cd6dd1c5c99384d7aaca8
Gerrit URL: https://gerrit.gromacs.org/5187

#5 Updated by Gerrit Code Review Bot about 4 years ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: I937f397602c761c92cd8e48121b56b1805a05470
Gerrit URL: https://gerrit.gromacs.org/5188

#6 Updated by Gerrit Code Review Bot about 4 years ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: I595b4eff8f4cbb0bbaf295c386041a0966e4a094
Gerrit URL: https://gerrit.gromacs.org/5189

#7 Updated by Gerrit Code Review Bot about 4 years ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: I1f9583991608ffdd655439f0c3dff5bec861ec64
Gerrit URL: https://gerrit.gromacs.org/5379

#8 Updated by Gerrit Code Review Bot about 4 years ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: I150db7a679f397754931602ca643aaf85d47ba1d
Gerrit URL: https://gerrit.gromacs.org/5380

#9 Updated by Mark Abraham about 4 years ago

  • Related to Bug #1858: compute globals should not have logic about which integrator is in use added

#10 Updated by Gerrit Code Review Bot almost 4 years ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: I0b5cc448175873ec0e5cee3c3d5023654b4f1b27
Gerrit URL: https://gerrit.gromacs.org/5525

#11 Updated by Mark Abraham almost 3 years ago

  • Related to Task #1868: implement mdrun -rerun better, simplifying do_md added

#12 Updated by Mark Abraham almost 3 years ago

  • Related to Bug #2105: multi-domain rerun broken added

#13 Updated by Gerrit Code Review Bot almost 3 years ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: gromacs~release-5-1~I84908122aefbe8658f423eaf4e5bd4ae25a93d24
Gerrit URL: https://gerrit.gromacs.org/6467

#14 Updated by Gerrit Code Review Bot almost 2 years ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~I4f9ad5d0bf6585675472b49b2d5654f588b7214e
Gerrit URL: https://gerrit.gromacs.org/7648

#15 Updated by Gerrit Code Review Bot almost 2 years ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~I678598175192c9c68113fdd79fcee17f8e5c504e
Gerrit URL: https://gerrit.gromacs.org/7649

#16 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '2' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~Ic0faaa4c3dec2891d0cdc6166561f8ccd4391f44
Gerrit URL: https://gerrit.gromacs.org/7671

#17 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~Iedb8bf3b4cb2d7f238c4b674ce459564a3f76a20
Gerrit URL: https://gerrit.gromacs.org/7745

#18 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~I46f377630fa12be482ccb532c05d96cfe0537523
Gerrit URL: https://gerrit.gromacs.org/7760

#19 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~I39d4cd0b8568859220b3a592b138f3f4405f8991
Gerrit URL: https://gerrit.gromacs.org/7764

#20 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~I3c75d75b13c93d3159dea9a364bfa3258484bd68
Gerrit URL: https://gerrit.gromacs.org/7971

#21 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '3' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~I00f98c22f5a17a1b9ae1114b513cfa3ea3798334
Gerrit URL: https://gerrit.gromacs.org/7948

#22 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~I8acb6c5a9b6128aab7967fa8a1d32247f6365586
Gerrit URL: https://gerrit.gromacs.org/8148

#23 Updated by Mark Abraham about 1 year ago

#24 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related patchset '13' for Issue #1793.
Uploader: Pascal Merz ()
Change-Id: gromacs~master~I05ca7abc9e81c9a8deacb6b81086e1077524d005
Gerrit URL: https://gerrit.gromacs.org/8271

#25 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related patchset '11' for Issue #1793.
Uploader: Pascal Merz ()
Change-Id: gromacs~master~Ia9077841d96a9bc6d6f000eef4093e9fbd9f5363
Gerrit URL: https://gerrit.gromacs.org/8278

#26 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related patchset '6' for Issue #1793.
Uploader: Pascal Merz ()
Change-Id: gromacs~master~I5dec051cd3a0c0b3b52dfe69a72dac7f7e9f65ed
Gerrit URL: https://gerrit.gromacs.org/8272

#27 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related DRAFT patchset '1' for Issue #1793.
Uploader: Pascal Merz ()
Change-Id: gromacs~master~Iac8ea3946effba1797e16d8c918c7d49ce4dc828
Gerrit URL: https://gerrit.gromacs.org/8578

#28 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related patchset '1' for Issue #1793.
Uploader: Pascal Merz ()
Change-Id: gromacs~release-2019~Iac8ea3946effba1797e16d8c918c7d49ce4dc828
Gerrit URL: https://gerrit.gromacs.org/8636

Also available in: Atom PDF