Resolve whether and how to resolve "state" variables stored in .tpr

Following on from some discussion on

In the dark ages, the .tpr was the only way to tell mdrun where to start for free-energy algorithms, but these days the state variables can also be read from a checkpoint file. The inputrec has init-lambda and init-lambda-state, but grompp fills fields like state->fep_state (at least, and probably others), only to perhaps have them overwritten by the checkpoint later. Michael's free-energy changes contain a lot of comments saying that we need to handle such initialization more sanely. Having code that handles setting and I/O of state variables for both the .tpr and the .cpt is bug-prone.

I can see three reasonable ways forward:

  1. Instead of writing state variables to the .tpr, bundle the state data (I/O done in the checkpoint-file format by the checkpoint-file code) after the parameter and inputrec data.
  2. As above, but have grompp write a separate checkpoint file.
  3. Continue as before, but move the free-energy state-initialization code from grompp to mdrun, and run it in place of the checkpoint read when there is no checkpoint to read.

2 has the defect that we can't start a run from a single file any more, and I think that ability is useful enough that we want to keep it. 1 is a bit of work, but is conceptually clean - under the hood there's always a "checkpoint file" to read - so the implementation is easier to write and maintain. 3 is the easiest to do, but it's not a useful step on the path to 1.
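A minimal sketch of the control flow that option 3 implies, assuming heavily simplified stand-in types. InputRecordSketch, StateSketch, makeInitialState and resolveStartingState are hypothetical names for illustration, not the real GROMACS structures or functions. The point is only that mdrun would have one place that decides between the checkpoint and freshly initialized state:

```cpp
#include <optional>

// Hypothetical, simplified stand-ins; not the real GROMACS types.
struct InputRecordSketch
{
    bool freeEnergy      = false; // corresponds to free-energy = yes/no
    int  initLambdaState = -1;    // corresponds to init-lambda-state; -1 means unset
};

struct StateSketch
{
    int fepState = 0; // always has a defined value, even when free energy is off
};

// Build the starting state from the input record alone (no checkpoint available).
StateSketch makeInitialState(const InputRecordSketch& ir)
{
    StateSketch state;
    if (ir.freeEnergy && ir.initLambdaState >= 0)
    {
        state.fepState = ir.initLambdaState;
    }
    return state;
}

// Single decision point: a checkpoint, when present, is the only source of
// state; otherwise the initialization code runs in its place.
StateSketch resolveStartingState(const InputRecordSketch&          ir,
                                 const std::optional<StateSketch>& checkpoint)
{
    return checkpoint.has_value() ? *checkpoint : makeInitialState(ir);
}
```

With this shape, nothing is written and then silently overwritten: either the checkpoint supplies fepState or the initialization code does, never both.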

There doesn't seem to be an issue for the fix of initializing fep_state in state, so I'm writing here.
Apart from all the valid points made above, it seems to me that state->fep_state should only be accessed when free-energy=yes. The only exceptions might be printing and comparing the state. Since the current code neither prints nor compares fep_state without free-energy, this sounds like there's a bug somewhere, with code reading fep_state when free-energy=no. Mark, did you find a use of uninitialized fep_state?

I added a print of fep_state to gmx dump and saw that all reference tprs in our regressiontests have fep_state=0.

write_tpx from grompp on master copied the uninitialized state->fep_state value from new_status into (IIRC) a tpx-header structure and wrote that. MSan complained when that memory got passed to the XDR write routine.
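A minimal sketch of one way to rule out this class of problem, assuming simplified stand-in types. FepStateHolder, HeaderSketch and makeHeader are hypothetical names, not the actual GROMACS state or tpx-header code. Giving the field a default member initializer means that any copy made before writing has a defined value, whether or not free energy is in use:

```cpp
// Hypothetical stand-in for the part of the state that holds fep_state.
struct FepStateHolder
{
    int fepState = 0; // default member initializer: never indeterminate
};

// Hypothetical stand-in for a header structure that gets serialized.
struct HeaderSketch
{
    int fepState = 0;
};

// Copy the value into the header before it is handed to the write routine.
// Because FepStateHolder always initializes fepState, this read is well defined.
HeaderSketch makeHeader(const FepStateHolder& state)
{
    HeaderSketch header;
    header.fepState = state.fepState;
    return header;
}
```

This does not settle whether the value should live in the .tpr at all; it only makes the write path safe in the meantime.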

