Project

General

Profile

Bug #307

Water ordering umbrella sampling (pull) bug

Added by Christine Sophie over 11 years ago. Updated over 11 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Erik Lindahl
Category:
mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

Water ordering umbrella sampling (pull) bug

I found a (possible) bug in the umbrella sampling code, using gromacs 4.0.4_pre (discussed with Erik Lindahl, my supervisor), see attached screenshot, to get a picture.

The task is to pull a helix downwards according the z-axis.
When pulling over a longer time span (e.g. 128 ns) really slowly this bug has been identified. The ordering happend already between 150-200 ps.

For locating the bug I did some test runs:
Each simulation is 200 ps long, having the same setup as my common pulling.
pull_k1 force = 20.000 kJ/mol*nm^2
pull_rate = 2.5e-05 nm/ps (would correspond to a pulling downwards of 32 Å in 128 ns... so just the first 200 ps of the 128 ns simulation).

I did four simulations:
  • double precision serial
  • double precision parallel
  • single precision serial
  • single precision parallel

The double precision parallel run was for instance generated the following ...
grompp_d -f alpha_parallelZ_20000k1_0.1ns_without_disres_double_precision_parallel.mdp -c amber_2r9rB_VSD_single_system_neutralized_0ns_MD_em_eq_em_eq2.gro -p alpha_topol.top -n amber_2r9rB_VSD_single_system_S4_alpha_to_310_index.ndx -o alpha_parallelZ_20000k1_0.1ns_without_disres_double_precision_parallel

... and run with 24 cores (cluster). Serial jobs on my local machine. Both have the same gromacs version installed!

The serial runs do not show the bug after 200 ps simulation, however the parallel.

Note: we do not use pressure coupling, as this could influence our pulling.
The box expands within the first frames, in both cases serial and parallel for (roughly) the same amount. Thus the ordering shouldn't be caused by the pressure coupling per se.

For files see attachment.

water_ordering_bug_test.tgz (9.93 MB) water_ordering_bug_test.tgz test cases Christine Sophie, 03/27/2009 04:55 PM
water_ordering_bug_KcsA_test_cases.tgz (4.76 MB) water_ordering_bug_KcsA_test_cases.tgz loops jumps within first frame into opposite direction, then the pull_vec states. Christine Sophie, 03/27/2009 06:54 PM

History

#1 Updated by Berk Hess over 11 years ago

This problem is not resolved yet?

It is not clear from you comments if you use pressure coupling or not.

Do you have the same problems when running on 2 cores?
That would make debugging easier.

Also there is no attachment.

Berk

#2 Updated by Christine Sophie over 11 years ago

Created an attachment (id=358)
test cases

#3 Updated by Christine Sophie over 11 years ago

Hi Berk,

(In reply to comment #1)

This problem is not resolved yet?

No the problem is still existing. Erik looked at it, and created a simpler system with just a helix in a water box, and could reproduce the bug. He has maybe already some more ideas what it could be?

It is not clear from you comments if you use pressure coupling or not.

I tried it with and without pressure coupling. In both the bug appeared. We believe that it is due to the parallel part of the code, as it occurs just parallel. Serial it looks good, but I could not test it for a very long time, it just takes too long on my system.

Do you have the same problems when running on 2 cores?

I just tested 24, 48 and 64 cores.

That would make debugging easier.

100% agree. I will talk to Erik and run his easy test system on 2 cores, if he has not done it yet :).
I will add those simpler case for you when the runs are finished!

Also there is no attachment.

Oh sorry, my first attempt to work with bugzilla. Now I attached it, hope it worked!

Thank you very much for looking at the problem!

best regards,
Sophie

#4 Updated by Berk Hess over 11 years ago

I could understand that there are bugs when pressure coupling is turned on.
Are you really sure that this also happens without pressure coupling?
The attached files all have pressure coupling turned on.

But if you say that serial it works fine and in parallel it goes
wrong that seems to indicate that would indeed be a problem with
the parallelization.

Could you attach the pullx and pullf files for two identical runs,
one in serial and one in parallel, preferably without pressure coupling?

Thanks,

Berk

#5 Updated by Christine Sophie over 11 years ago

Created an attachment (id=359)
loops jumps within first frame into opposite direction, then the pull_vec states.

#6 Updated by Christine Sophie over 11 years ago

Hej Berk,

(In reply to comment #4)

I could understand that there are bugs when pressure coupling is turned on.
Are you really sure that this also happens without pressure coupling?

Well I actually did not run it for my test cases now, but I run some pressure coupling cases off in another project and there I remembered to saw it happening as well. Now I checked my files clearly again and unfort. did not find it anymore. Just one case were water is restrained in z axis to not enter the membrane. Here the bug is of course not appearing! (Restraining water is no good idea I know, we just did it in test runs).

But if you say that serial it works fine and in parallel it goes
wrong that seems to indicate that would indeed be a problem with
the parallelization.

Could you attach the pullx and pullf files for two identical runs,
one in serial and one in parallel, preferably without pressure coupling?

I don't have those files yet, but I will run some 200ps cases over the weekend without/with pressure coupling on for serial and parallel (2 cores)! Will upload them as soon as they are finished!

An interesting thing I found now as well is shown in the attachment (see comment #4):
Here we wanted to pull the loop helix up into the solvent (into +z direction). However the loop "jumps" strangely within the first frame downwards (-z direction, opposite direction then specified in the pull_vec = 0 0 1!). Might that be due to pressure coupling?

To summarize: I am not sure if pressure coupling could be influence that bug or not. I run mostly with pressure coupling. Lets check first again :), before I say something wrong about it!

Another personal question to pulling: Do you think that pressure coupling is influencing the result a lot? I think when pulling into the solvent, that the helix has to push away some waters, and it is "harder" to pull it downwards, than without pressure coupling? Thus planing to switch it off for productive runs!

Thanks a lot,
/Sophie

#7 Updated by Berk Hess over 11 years ago

I think I found the problem.

Your ordered water state is actually a lot lower in energy
than you non-ordered state.
This seems to be due to the switch function for the electrostatics.
You should change to reaction-field-zero.
I will add a note about this in the manual and I will also add
a warning in grompp.
I guess the difference between serial and parallel is just coincidence,
you probably need a sudden jump in the water configuration to go
to the lower energy state.

Also for LJ I would use shift iso switch, but that probably has far less
effect.

You can not use pressure coupling with absolute reference pulling
for your system, as this will introduce artifacts.
We have discussed the issues with absolute referencing when I was
in Stockholm. If possible you should always avoid pulling with an
absolute reference.

Berk

#8 Updated by Berk Hess over 11 years ago

Is there any news on this issue?

Can we close this report?

Berk

#9 Updated by Christine Sophie over 11 years ago

(In reply to comment #8)

Is there any news on this issue?

Yes, just good news: in my test runs I started so far, the bug is not occurring anymore :).
Thank you very much for your time and patience with it!

Can we close this report?

Yes!

/Sophie

#10 Updated by Berk Hess over 11 years ago

The problem indeed seemed to be caused the coulombtype=switch.

Berk

#11 Updated by Christine Sophie over 11 years ago

(In reply to comment #10)

The problem indeed seemed to be caused the coulombtype=switch.

Yes we use now PME instead, which seems to work without any problems!

/Sophie

Also available in: Atom PDF