Project

General

Profile

Feature #2068

Access to low level classes

Added by David van der Spoel about 3 years ago. Updated 9 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
core library
Target version:
Difficulty:
hard
Close

Description

With on-going modularization and encapsulation it is becoming more difficult to access low-level variables since many are being implemented using "bare outside class interface with Impl_ for the details". Obviously this has many advantages like decluttered interfaces and modularization and the code becomes less prone to errors due to inadverted changes of parameters.

However, in some cases one might want to access the low-level variables in a reliable manner (that is without causing side-effects). For instance by changing the electric field parameters from a program (not mdrun) and re-evaluate the energy and dipole of a molecule, one can extract the polarizability. The functions to do this are now not accessible anymore, except by faking reading a new MDP input (and even that is not possible anymore outside readir.cpp since the last few patches).

There may be similar cases, where one would like to change variables, like steered MD, certain free energy schemes or analysis tools. Suggestions for making low-level access possible without losing the advantages due to encapsulation and modularization are welcome.

History

#1 Updated by Erik Lindahl about 3 years ago

Note that there's a significant difference between "accessing" in the sense of reading a value vs. changing it.

I don't see any way where we can let users adjust variables in arbitrary ways, since that's the equivalent of making all data public (even if it would be done through a method). While it might seem benign when only considering a specific case (e.g. electric field), the complexity will explode when the principle is applied to all settings, and then we're back in the situation "anything can change anytime, so we need lots of long checks", and things become buggy.

For a few special things that we know will change during normal simulations (e.g. free energy), we will need to provide an interface where these things are updated every step.

However, for the more general case of analysis tools, conceptually we really are rerunning/recalculating things with a different topology or settings (even though the change might be small for an electric field), and then I think the safe approach is to reinitialize things. Still, I don't think we should do this from MDP files long-term, but use the KVT concept and alter settings in the tree?

#2 Updated by Teemu Murtola about 3 years ago

As Erik says, you cannot arbitrarily modify internal state of some code without any control, and claim that it would not have side effects. Also, it is not clear what you actually would want to achieve. The current electric field code does not compute any energies, nor dipoles, so it doesn't match your goal. And even if it did, that computation would be perhaps 10 lines of out those 500+ that it currently has. And for using even that in analysis tools, you would need to convert from the data structures available there to mdrun-specific data structures.

If you want to just make this computation usable in multiple context, you can put it into a separate class that you can call from other context as well, including the current electricfield.cpp. The existing interfaces are meant to make it possible to extend mdrun and isolate mdrun from the implementation details of every module. As such, it probably is not suitable for other uses. And even the current implementation of the electric field code is not really reusable; for example, it is unlikely that other cases than mdrun would be interested in printing out the values into a file specified from a command line, in the same operation that computes the forces...

If you want to have another mdrun module change the parameters, e.g., in the electricfield code, you can either provide a separate interface from the electricfield code that allows this and implement that in the other module, or you can make it possible from the other module to call to the electricfield module. But mdrun should not be aware of such coupling, and it should not be possible to do this adhoc by, e.g., adjusting parameters of another module directly, without that module explicitly providing that interface.

#3 Updated by David van der Spoel about 3 years ago

I agree access has to be specific, but it is necessary. Another case is QMMM code that modifies atomic charges during a simulation. This should not need to be done by rewriting a top file and then processing it etc.

Teemu's proposal of making an additional interface for such specific cases (including electric fields) sounds useful. I will think about it and implement an example.

#4 Updated by David van der Spoel about 3 years ago

To be more specific on the electric field issue. There I would like to change the field on a polarizable model, run a force calculation that minimizes the shell positions and then computes the total dipole. From this the molecular polarizability can be computed. In theory one could run a single simulation for each case with different mdp files but that is not very efficient.

#5 Updated by Peter Kasson about 3 years ago

Would it be helpful to differentiate between modifying settings "in flight" i.e. during a simulation and modifying settings while a simulation isn't running? One example of the latter might be if we want to do some sort of parameter space walk (e.g. multiple temperatures) and don't want to write a TPR file for each of these, particularly if it's a multidimensional parameter space. The solution might be to load the TPR, modify the desired parameter(s), and then start the run. This would of course be recorded in the simulation output files.
(Or there might be some better way to do this e.g. designate some settings as mutable and some as constant.)

#6 Updated by Teemu Murtola about 3 years ago

@David: And why would you need the time-dependent (or pulsed) electric field that the current code implements? Would you want to run that as part of mdrun, or otherwise? Computing the forces from a static electric field on a single configuration is maybe three lines of code; why would you want to use the full mdrun module with mdp parsing etc. for that?

@Peter: Both of those should be perfectly doable with the current approach, without changing anything. If the code would run all input validation checks also in mdrun (which it should, since the modules cannot currently distinguish at all whether the input comes from grompp or mdrun), then it would be straightforward to add an option to mdrun to override some of the parameters. And nothing in the modules implementing the interfaces would need to change. But it seems that David wants something else.

#7 Updated by David van der Spoel about 3 years ago

For this I need to call the relax_shells routine that iteratively calls do_force until convergence of the shell forces. Not trivial code. And to be clear, we don't use mdp parsing etc, all input is generated on the fly, so using MDP input is a detour.

#8 Updated by Teemu Murtola about 3 years ago

David van der Spoel wrote:

For this I need to call the relax_shells routine that iteratively calls do_force until convergence of the shell forces. Not trivial code. And to be clear, we don't use mdp parsing etc, all input is generated on the fly, so using MDP input is a detour.

You didn't answer any of my questions. Why do you then want to use the mdrun-specific electric field code for this? If you have full control over the calling code, you can trivially create a separate implementation of IForceProvider that computes your constant electric field (this is literally just a few lines of code) and assign that into t_forcerec::efield. And this is perfectly possible without changing the current mdrun code at all. Just because you are applying an electric field does not mean that you should try to reuse some unrelated code that also deals with electric fields, if it actually does something completely different from what you want.

#9 Updated by David van der Spoel about 3 years ago

@Teemu: you are still a few steps ahead in code comprehension. Sorry for being slow to catch up. I had not contemplated any solution like what you are describing. I guess rewriting a few lines like you are describing is simple indeed.

Despite that, the issue with e.g. QMMM remains, where charges need to be changed (even though these still are in mdatoms).

#10 Updated by Eric Irrgang about 3 years ago

@David, would it be sufficient to add accessor methods and signals/slots as use cases arise?

#11 Updated by Erik Lindahl about 3 years ago

The problem isn't writing a set() method - that takes 5 minutes. The issues is the other 3 million lines of code and keeping things consistent.

Simple example: The second you allow a set() method to change any parameter related to nonbonded interactions, the entire nonbonded module as well as all GPU code might have to be reinitialized in case we are using tables. IMHO, allowing arbitrary set() methods only achieves saving a bit of work for the person who needs it by pushing huge amounts of work on other developers who now need to keep track of when things should be reinitialized.

#12 Updated by Eric Irrgang about 3 years ago

I agree it is not a decision to be taken lightly. The ability to trigger reinitialization of such code might be a desirable added versatility in the long run, and might be a whole project in itself. So if the question is whether to preemptively slap setters on every quantity rather than consider each case-by-case, I think that would be hard to justify.

Ideally, if someone adds a set method, they would add signals/slots/connections throughout the code that uses that data. That puts a barrier to entry on the implementer of a new set method and puts minimal burden on other maintainers. It can work the other way, too, in that developers can codify the assumptions of immutability that they make by introducing a signal that other developers must trigger in future code.

#13 Updated by Erik Lindahl about 3 years ago

That sounds like a horrible kludge where we would end up with lots of extra methods for all levels (not to mention tons of code) so that every class can signal a re-initialization. And, at the end of the day... all it achieves is that we are performing the complete re-initialization anyway. I think it's far less work - and gives us much less complex code to maintain - to simply put the burden of reinitialization where it belongs: the routine that wants different parameters.

However, in any case this does not sound like something we'll even consider before all structures have been turned into proper classes. The last thing I want to see is that the people doing the tedious work of restructuring into classes also have to do the work of implementing this signaling. After that, I guess it depends on whether somebody wants to write a patch suggesting a clean implementation.

#14 Updated by Teemu Murtola about 3 years ago

QMMM is quite different in the sense that it wants to modify atom parameters. I don't think that we can realistically abstract the notion of a "topology" into a separate module that is encapsulated from the rest of mdrun. Rather, this is a core concept in mdrun, and if a module needs to modify parameters there, it needs to be part of the contract between the generic mdrun code and the modules. The contract needs to clearly specify when a module can change parameters, which parameters it is allowed to change, and what other modules need to do (and not to do) to support that functionality.

I think we should also try to keep the code clear in the sense that stuff that comes from input should not be reused for other stuff. If it needs to change, it would be better to have separate, transient state that is changing, but the input remains the same. This also makes things conceptually much simpler with checkpointing etc., since that transient state always needs to be considered in checkpointing, but checkpointing the input does not make much sense in general.

#15 Updated by Eric Irrgang 9 months ago

It seems like parts of this have been either resolved or rejected, and related development for QMMM, topology data structures, IForceProvider inputs/outputs, or MD-level API are not tracking this issue. Should this issue be refreshed or closed?

Also available in: Atom PDF