Resolve whether and how to resolve "state" variables stored in .tpr
Following on from some discussion on https://gerrit.gromacs.org/#/c/4575/...
In the dark ages, the .tpr was the only way to tell mdrun where to start for free-energy algorithms, but these days the state variables can also be read from a checkpoint file. The inputrec has init-lambda and init-lambda state, but grompp fills fields like state->fep_state (at least, and probably others), only to perhaps have them over-written by the checkpoint later. Michael's free-energy changes contain a lot of comments that we need to handle such initialization more sanely. It is bug-prone to have code that handles setting and I/O of state variables to both the .tpr and .cpt.
I can see three reasonable ways forward
- Instead of writing state variables to the .tpr, bundle the state data (I/O done in the checkpoint-file format by the checkpoint-file code) after the parameter and inputrec data.
- As above, but have grompp write a separate checkpoint file
- Continue as before, but move free-energy state-initialization code from grompp to mdrun and run it instead of reading the checkpoint, when there is no checkpoint to read
2 has the defect that can't start a run from a single file any more, and I think that is useful enough that we want to keep it. 1 is a bit of work, but is conceptually clean - under the hood there's always a "checkpoint file" to read - so the implementation is easier to write and maintain. 3 is the easiest to do, but it's not a useful step on the path to 1.
Fix uninitialized fields in grompp t_state
Allocating t_state on the stack and using an incomplete
pseudo-constructor means it is possible to write an uninitialized
value in fep_state field to the .tpr file, which we've been
doing. Found with Memory Sanitizer. Perhaps this behaviour lies behind
some of the strange behaviour that is periodically seen on Jenkins.
Fixed by allocating t_state on the heap, which might resolve other
issues, since snew() zeroes the memory as a side effect. Also
initialized fep_state field in init_state().
#3 Updated by Berk Hess almost 5 years ago
The doesn't seem to be an issue for the fix of initializing fep_state in state, so I'm writing here.
Apart from all the valid points made above, it seems to me that state->fep_state should only be accessed when free-energy=yes. The only exceptions might be printing and comparing the state. Since the current code neither prints nore compares fep_state without free-energy, this sounds like there's are bug somewhere where code read fep_state when free-energy=no. Mark, did you find use of uninitialized fep_state?
I added a print of fep_state to gmx dump and saw that all reference tprs in our regressiontests have fep_state=0.