Bug #249

Jump in pull force after ~ns of simulation

Added by Martin Hoefling almost 11 years ago. Updated almost 11 years ago.

Status: Closed
Priority: Normal
Assignee: Erik Lindahl
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

mdrun is a 4.0 CVS version with patches for this release.

Setup: Amino acids/peptides with caps (ACE & NAC) are pulled away from a surface in the z-direction. Different pull groups are used (COM-ACE, COM-NAC, COM-Protein). After ~ns the simulations crash with a core dump, a 1-4 interaction table warning, or a water SETTLE problem.

The trajectory, forces, and COM position up to this point are OK.

I attached a single run with a checkpoint ~20000 steps before the crash to reproduce the problem. If I continue from the checkpoint, it fails at similar timesteps (depending on which cores and how many are used). The sample output below is from a continuation from the checkpoint on 4 cores:

==============================================================================
imb F 11% step 965200, will finish Sun Nov 9 06:44:51 2008
imb F 11% step 965300, will finish Sun Nov 9 06:46:36 2008
imb F 11% step 965400, will finish Sun Nov 9 06:48:19 2008
Warning: 1-4 interaction between 3783 and 3787 at distance 40.697 which is larger than the 1-4 table size 2.100 nm
These are ignored for the rest of the simulation
This usually means your system is exploding,
if not, you should increase table-extension in your mdp file
or with user tables increase the table size

-------------------------------------------------------
Program mdrun, VERSION 4.0
Source code file: /home/martin/workspace-ganymede/gromacs-4.0-imd/src/mdlib/pme.c, line: 518

Fatal error:
2 particles communicated to PME node 3 are more than a cell length out of the domain decomposition cell of their charge group
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 3, will try to stop all the nodes
Halting parallel program mdrun on CPU 3 out of 4

Thanx for Using GROMACS - Have a Nice Day

-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 14427 failed on node n0 (127.0.0.1) with exit status 1.
==============================================================================

runinputs.tar.bz contains:
index.ndx - index with pull group defined
input.gro - input structure
pullf.xvg - pull-force every step to checkpoint
pullx.xvg - pull position every 10 steps to checkpoint
state.cpt - checkpoint
topol.tpr - run input file
traj5ps.xtc - 5ps sample trajectory of first ~ns of pull run (here everything is ok)

fromcheckpoint.tar.bz2 contains:
ener.part0003.edr - energy file of the sample continuation from the checkpoint
error.pdb - trajectory with every step from checkpoint
forcepos.agr - diagram which shows that the force jump occurs first
md.part0003.log - logfile
pullf.part0003.xvg - forces
pullx.part0003.xvg - positions
traj.part0003.trr - trajectory with every 500 steps

Hope this is enough to reproduce this problem.
Best
Martin

(90 Bytes) sample run data from the checkpoint until crash Martin Hoefling, 11/07/2008 02:49 PM
(84 Bytes) run input checkpoint and first ~ns which is ok Martin Hoefling, 11/07/2008 02:50 PM

History

#1 Updated by Martin Hoefling almost 11 years ago

Created an attachment (id=322)
sample run data from the checkpoint until crash

#2 Updated by Martin Hoefling almost 11 years ago

Created an attachment (id=323)
run input checkpoint and first ~ns which is ok

#3 Updated by Berk Hess almost 11 years ago

Maybe a stupid question,
but are you not pulling the groups so far that the distance
switches to a different periodic image?

Berk

#4 Updated by Martin Hoefling almost 11 years ago

(In reply to comment #3)

Maybe a stupid question,
but are you not pulling the groups so far that the distance
switches to a different periodic image?

What exactly do you mean? I do observe that this happens near the middle of my box in the z-direction, so it is entirely possible that it's connected to periodic images. Maybe I set it up wrongly; here's what I want to do:

Absolute pulling, so that the "spring attachment point" moves with constant speed in the z-direction but is not restrained in the x-y plane. Is there something wrong with my setup? Does this mean that I can only pull over half of my box z size?

Here's the relevant part of my mdp file:

pull = umbrella
pull_geometry = direction
pull_dim = N N Y
pull_start = yes
pull_ngroups = 1
;Absolute pulling
pull_group0 =
pull_group1 = Pull
pull_vec1 = 0 0 1.0
pull_rate1 = 0.0002
pull_k1 = 10000

Best
Martin

#5 Updated by Martin Hoefling almost 11 years ago

Here's the relevant part of my mdp file:

pull = umbrella
pull_geometry = direction

Maybe position is more what I want? Due to the indentation, I read the position option as if it belonged to the cylinder option in the GROMACS 4.0 manual. :-)

pull_dim = N N Y
pull_start = yes
pull_ngroups = 1
;Absolute pulling
pull_group0 =
pull_group1 = Pull
pull_vec1 = 0 0 1.0
pull_rate1 = 0.0002
pull_k1 = 10000

Best
Martin

#6 Updated by Berk Hess almost 11 years ago

I think you are simply pulling to more than half the box height.
All of the current options use the closest periodic distance
and will therefore lead to this problem.

Do you want to pull a molecule over more than half the box length?
Currently that is not possible.

Berk
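
To illustrate the point above, here is a minimal Python sketch of how taking the closest periodic distance produces a sudden force jump once the pulled group passes half the box height. It is a toy model, not the GROMACS pull code: the box height BOX_Z is a made-up value, the helper names are hypothetical, and only the force constant is taken from pull_k1 in the mdp snippet of comment #4.

```python
import math

def minimum_image(dz, box_z):
    """Wrap a z-displacement into the interval [-box_z/2, box_z/2)."""
    return dz - box_z * math.floor(dz / box_z + 0.5)

def spring_force(k, dz, target):
    """Harmonic force that drives the displacement dz toward target."""
    return -k * (dz - target)

BOX_Z = 6.0   # hypothetical box height in nm (assumption, not from the report)
K = 10000.0   # force constant, as pull_k1 in the mdp snippet

def force_at(z_group, z_target):
    """Force on a group at z_group, measured from an absolute reference at z = 0."""
    dz = minimum_image(z_group - 0.0, BOX_Z)
    return spring_force(K, dz, z_target)

# Just below half the box: the group sits on the target, force is ~0.
print(force_at(2.9, 2.9))
# Just above half the box: the wrapped displacement flips sign,
# so the spring suddenly sees a deviation of about one box height.
print(force_at(3.1, 3.1))
```

In this toy setup the second call returns a force of roughly K * BOX_Z, which is the kind of jump that would blow up a simulation within a few steps.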

#7 Updated by Martin Hoefling almost 11 years ago

(In reply to comment #6)

Do you want to pull a molecule over more than half the box length?

Yes

Currently that is not possible.

OK Berk, I think I got it, thanks a lot for your comments! I wasn't aware of these limitations of the pull code. I think my solution will be just to provide a reference in the z-direction between the two surfaces. If it's exactly in the middle, the pulled group should always be closest to the same image.

You saved my day ;-)

Martin
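
A small Python sketch of why a mid-box reference should help, under the same toy assumptions (hypothetical box height, hypothetical helper names, not GROMACS code): with the reference at box_z/2, every position inside the box is at most half a box height away, so the minimum-image wrap never changes the displacement. With the reference at z = 0, positions beyond half the box get wrapped.

```python
import math

def minimum_image(dz, box_z):
    """Wrap a z-displacement into the interval [-box_z/2, box_z/2)."""
    return dz - box_z * math.floor(dz / box_z + 0.5)

def wrap_changes(z_group, z_ref, box_z):
    """Return True if the minimum-image wrap alters the raw displacement."""
    raw = z_group - z_ref
    return abs(minimum_image(raw, box_z) - raw) > 1e-12

BOX_Z = 6.0                               # hypothetical box height in nm
positions = [0.5 * i for i in range(12)]  # z = 0.0 ... 5.5 nm, all inside the box

# Reference in the middle of the box: no position is ever wrapped.
mid_wrapped = [z for z in positions if wrap_changes(z, BOX_Z / 2, BOX_Z)]
# Reference at the origin: everything from half the box upward is wrapped.
origin_wrapped = [z for z in positions if wrap_changes(z, 0.0, BOX_Z)]

print(mid_wrapped)
print(origin_wrapped)
```

The same check could be run with the real box height and pull range before starting a long run.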

#8 Updated by Berk Hess almost 11 years ago

I hope you can do what you want.

The absolute reference (with empty pull group 0) is always (0,0,0).
You would have to define a group halfway along the box.

Maybe it would make more sense to determine the distance
from the reference point when no group 0 is given.

But I guess in most cases it is unnatural to have an absolute
reference, since there is usually no fixed frame of reference
for a simulation.

You can anyhow accomplish what you want by changing a few
lines in get_pullgrp_distance in src/mdlib/pull.c.

Berk

#9 Updated by Martin Hoefling almost 11 years ago

(In reply to comment #8)

I hope you can do what you want.

Well, at least half of the box now should be no problem.

The absolute reference (with empty pull group 0) is always (0,0,0).
You would have to define a group halfway along the box.

Yes, this is probably not so trivial, since in my setup the surface thickness is smaller than half of the box z-thickness. This means that even if I pick two surface atoms, the COM will be in the slab and not in the middle of the gap in between.

Maybe it would make more sense to determine the distance
from the reference point when no group 0 is given.

You mean from the pull_init point in group 1? Just an idea here, not sure if this can work: to pull over the full box dimension, one could introduce a check (with warning output) for whether the maximum value is exceeded during the simulation. If it is not, one can determine the endpoint of pulling and set the reference in the middle instead of at (0 0 0). Do you think that's possible?

But I guess in most cases it is unnatural to have an absolute
reference, since there is usually no fixed frame of reference
for a simulation.

Well except for my fancy surface frozen simulations :-)

You can anyhow accomplish what you want by changing a few
lines in get_pullgrp_distance in src/mdlib/pull.c.

Yes, I could implement the input of an absolute position or just "quick hack" it. Do you think the above idea makes sense?

Best
Martin

#10 Updated by Berk Hess almost 11 years ago

The simplest solution would be to completely turn off the pbc
when using an absolute reference.
I would have to think a bit about whether this might be undesirable
in certain situations.

Berk
