Bug #3042

core dump error in grompp command

Added by 세영 박 3 months ago. Updated 5 days ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Dear all,
I am trying to run a large coarse-grained biomolecular system containing about 800 million beads, of which about 500 million are water beads. The .gro file of the system is about 35.62 GB. Whenever I run grompp to generate the input .tpr file, I get this "not enough memory" error:

===================================================
.
.
Excluding 1 bonded neighbours molecule type 'W'
Excluding 1 bonded neighbours molecule type 'WF'
Excluding 1 bonded neighbours molecule type 'W'
Excluding 1 bonded neighbours molecule type 'WF'
Removing all charge groups because cutoff-scheme=Verlet

------------------------------------------------------------------------------
Program gmx_mpi, VERSION 5.0.6
Source code file: /scratch/x1671a04/gromacs/gromacs-5.0.6/src/gromacs/utility/smalloc.c, line: 224

Fatal error:
Not enough memory. Failed to realloc -6970315816 bytes for b->a, b->a=ceb96010
(called from file /scratch/x1671a04/gromacs/gromacs-5.0.6/src/gromacs/gmxlib/index.c, line 153)
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------------------------------
: Cannot allocate memory
Halting program gmx_mpi

=====================================================


or this segmentation fault error:
======================================================
.
.
Excluding 1 bonded neighbours molecule type 'WF'
Excluding 1 bonded neighbours molecule type 'W'
Excluding 1 bonded neighbours molecule type 'NA'
Excluding 1 bonded neighbours molecule type 'WF'

NOTE 2 [file 11_billion.top, line 372]:
  System has non-zero total charge: 22320.000000
  Total charge should normally be an integer. See
  http://www.gromacs.org/Documentation/Floating_Point_Arithmetic
  for discussion on how close it should be to an integer.

Removing all charge groups because cutoff-scheme=Verlet
/var/spool/slurm/d/job06849/slurm_script: line 10: 36729 Segmentation fault      (core dumped) gmx_mpi grompp -f minimization.mdp -c 11_billion.gro -p 11_billion.top -o 11_billion.tpr

=========================================================

I am using GROMACS 5.0.6. I also tried version 2018.3, and I tried grompp in both single and double precision, but none of these worked.

This is my command line: gmx_mpi grompp -f minimization.mdp -c waterbox_for100billion.gro -p 800_billions_only_water_box.top -o test.tpr

I am running grompp on a CPU node with 768 GB of memory. I looked for a way to generate the .tpr file in parallel but could not find one, so I had to run grompp on a single node with very large memory.
However, when I tracked memory usage during grompp, the peak was only about 20% of the total available memory, so I suspect the problem is not a memory shortage.

The .mdp file I used with grompp is for minimization; I have attached it.

When I posted this issue to gromacs-users, I was told that grompp is likely trying to use all the available memory and failing shortly after an unsuccessful attempt to allocate an array whose size depends on the number of particles.

I hope this can be fixed.
Thank you for your time.

minimization.mdp (4.81 KB) 세영 박, 07/19/2019 03:04 AM

History

#1 Updated by Paul Bauer 3 months ago

This looks like an overflow of the integer that keeps track of the number of molecules, given how many molecules there are.
Can you upload the error message you get when running this with any version from the 2019 branch?
The 5.x branch is no longer supported, and the 2018 branch only receives fixes for scientific correctness issues.
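
As a rough check of the overflow hypothesis, the negative byte count in the reported error (-6970315816) can be examined with a short Python sketch. It assumes that the array being reallocated holds 4-byte elements and that its length is kept in a 32-bit signed integer; neither assumption is confirmed by the log, and the names below are illustrative only.

```python
import ctypes

# Byte count copied from the "Failed to realloc" message in the report.
REPORTED_BYTES = -6970315816

# ASSUMPTION: each element of the array is 4 bytes wide.
ELEMENT_SIZE = 4


def to_int32(value):
    """Wrap an arbitrary Python int the way a 32-bit signed int would."""
    return ctypes.c_int32(value).value


# Element count implied by the reported byte count.
element_count, remainder = divmod(REPORTED_BYTES, ELEMENT_SIZE)

# True if the implied count is representable as a 32-bit signed int.
fits_in_int32 = -2**31 <= element_count < 2**31

# Non-negative count that would wrap to the implied negative value.
unwrapped_count = element_count + 2**32

print("implied element count:", element_count)
print("fits in int32:", fits_in_int32)
print("count before wrap-around (if overflow):", unwrapped_count)
print("wraps back to implied count:", to_int32(unwrapped_count) == element_count)
```

Under these assumptions, the reported size is an exact multiple of 4 and the implied element count lies in the negative half of the 32-bit signed range, which is consistent with a counter that exceeded 2^31 - 1. This shows consistency only; it does not prove that an overflow is the cause.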

Cheers!

#2 Updated by Berk Hess 3 months ago

What makes you think this is an integer overflow issue?
800 million particles means 19 GB for the coordinates and velocities alone. I don't know whether there are more buffers with O(#atoms) entries, but this can run out of memory unless your machine has sufficient memory and swap space.
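
The 19 GB figure can be reproduced with a short Python sketch. It assumes three components per particle and 4-byte (single-precision) reals, with 8-byte reals shown for comparison; the function name is illustrative and is not part of GROMACS.

```python
def coord_velocity_bytes(n_atoms, bytes_per_real=4, dims=3):
    """Bytes needed for one coordinate array plus one velocity array.

    ASSUMPTION: each array stores `dims` reals per atom and nothing else.
    """
    arrays = 2  # coordinates and velocities
    return n_atoms * dims * bytes_per_real * arrays


N_ATOMS = 800_000_000  # approximate bead count from the report

single = coord_velocity_bytes(N_ATOMS, bytes_per_real=4)
double = coord_velocity_bytes(N_ATOMS, bytes_per_real=8)

print(f"single precision: {single / 1e9:.1f} GB")
print(f"double precision: {double / 1e9:.1f} GB")
```

This estimate covers only those two arrays; any other per-atom buffers that grompp allocates would add to it.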

#3 Updated by Mark Abraham 5 days ago

  • Description updated (diff)
