Bug #2607

Grompp becomes extremely slow when many pull groups are present

Added by Semen Yesylevskyy over 1 year ago. Updated over 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
preprocessing (pdb2gmx,grompp)
Target version:
-
Affected version - extra info:
>=5.1.2
Affected version:
Difficulty:
uncategorized

Description

This issue does not affect functionality, so strictly speaking it is not a bug, but in certain situations it slows the workflow considerably and should be fixed if possible.
grompp is extremely slow for a large number of pull groups. With ~8k pull groups it runs for 10 minutes, while without them it completes in 1-2 seconds. This behavior does not depend on the pulling parameters; only the number of groups in the mdp file matters. It is observed in all versions starting from 5.1.2.
This becomes a major inconvenience when one needs to experiment with pull parameters and the number of groups is large. Moreover, an inexperienced user may easily conclude that grompp has hung, since it apparently does nothing for many minutes.
I don't know the internals of pull-parameter processing, but something there seems suboptimal, because parsing 8k values should not take this long.
If it is impossible to make it faster, it would be useful to issue a warning about the long processing time in such cases.

History

#1 Updated by Berk Hess over 1 year ago

My guess is that this is due to O(#groups^2) operations in input-record parsing: for every mdp option read, we need to check that the option was not set before. This could be accelerated with a hash table, but we need to evaluate whether that complication is worth the effort and maintenance cost. I did not spot any double loops over pull groups or coordinates.
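To illustrate the cost being discussed, here is a minimal sketch (not the actual GROMACS code) of the two duplicate-check strategies: a linear scan over the options already read costs O(n) per option, hence O(n^2) for n mdp options overall, while a hash set makes each check O(1) expected, O(n) overall.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Linear scan over previously read options: O(n) per check,
// O(n^2) total for n options (e.g. ~8k pull-group entries).
bool seenBeforeLinear(const std::vector<std::string>& readOptions,
                      const std::string&              name)
{
    for (const std::string& r : readOptions)
    {
        if (r == name)
        {
            return true; // duplicate option found by scanning
        }
    }
    return false;
}

// Hash-based alternative: O(1) expected per check, O(n) total.
bool seenBeforeHashed(const std::unordered_set<std::string>& readOptions,
                      const std::string&                     name)
{
    return readOptions.count(name) != 0; // single hash lookup
}
```

With 8k options the linear variant performs on the order of 32 million string comparisons, which is consistent with grompp appearing to hang for minutes.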

Does mdrun still run reasonably fast with that many pull groups?

Can't you use distance or other restraints instead?

#2 Updated by Semen Yesylevskyy over 1 year ago

mdrun seems to be reasonably fast for this setup. I can't use restraints in this case, because we have to pull many individual dummy particles simultaneously to simulate a moving repulsive wall of potentially non-trivial geometry. While our systems were relatively small, with ~10^2 pull coordinates, this was tolerable, but with ~10^3 coordinates I'm drinking too much coffee while compiling the system :)

Unless I misunderstand something, the point is just to keep the options in a std::map? It seems to me that this would make the code simpler, with no maintenance cost at all.

#3 Updated by Berk Hess over 1 year ago

In principle you are right, but this concerns a lot of legacy code.
Also, we are moving to key-value trees, so this code will change anyhow.

Still, this code is not too complex. If you want to have a go at replacing the pointer and count with a std::map, or probably even better a std::unordered_map, go ahead.
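A hypothetical sketch of what that replacement could look like: instead of a raw pointer plus a count searched linearly, keep a side index from option name to entry position, so the "was this option set before?" check becomes an O(1) expected lookup while the mdp file order is preserved. The names here (InpEntry, InpIndex, findOrAdd) are illustrative, not GROMACS API.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// One parsed mdp option.
struct InpEntry
{
    std::string name;
    std::string value;
};

// Entries in file order, plus a hash index for duplicate checks.
class InpIndex
{
public:
    // Returns the index of the entry for 'name', creating it if new.
    int findOrAdd(const std::string& name)
    {
        auto it = index_.find(name);
        if (it != index_.end())
        {
            return it->second; // option seen before: one hash lookup
        }
        const int pos = static_cast<int>(entries_.size());
        entries_.push_back({ name, std::string() });
        index_.emplace(name, pos);
        return pos;
    }

    size_t size() const { return entries_.size(); }

private:
    std::vector<InpEntry>                entries_; // preserves mdp order
    std::unordered_map<std::string, int> index_;   // name -> position
};
```

Keeping the vector alongside the map retains the ordered iteration that the existing pointer-and-count code provides, which std::unordered_map alone would not.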

#4 Updated by Semen Yesylevskyy over 1 year ago

I've checked the code in readinp.cpp, and I don't feel I can modify it appropriately. search_einp() would need to be changed to take a std::unordered_map instead of an ArrayRef, but that leads to changes in all the other functions working with the input record. This is too much for me.
It is probably better to wait until all this legacy code is rewritten consistently.
For now I can suggest adding a warning about the long processing time when the number of groups is >= 100.
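As a rough sketch of that suggestion, the check itself would be trivial; the threshold, function names, and message below are assumptions, not existing grompp behavior.

```cpp
#include <cstdio>

// Hypothetical threshold check for the suggested warning.
bool shouldWarnAboutManyPullGroups(int numPullGroups)
{
    return numPullGroups >= 100;
}

// Hypothetical user-facing note, so grompp does not appear to hang.
void maybeWarnAboutManyPullGroups(int numPullGroups)
{
    if (shouldWarnAboutManyPullGroups(numPullGroups))
    {
        std::fprintf(stderr,
                     "NOTE: %d pull groups requested; grompp preprocessing "
                     "may take several minutes.\n",
                     numPullGroups);
    }
}
```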
