Project

General

Profile

Bug #2086

Crashes/memory problems with selections

Added by Chris Neale almost 3 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Low
Assignee:
Category:
selections
Target version:
Affected version - extra info:
all earlier versions with selection code
Affected version:
Difficulty:
uncategorized
Close

Description

I am not sure if this is an error in gmx distance or simply an indication that the program doesn't fail gracefully when I do something idiotic.

I have attached a .pdb file and a script for running gmx distance commands that works on some selections but gives a memory allocation error on other selections. I believe that all selections should be valid.

More information is available here:
https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2016-November/109623.html

gmxDistanceErrorFiles.tgz (564 KB) gmxDistanceErrorFiles.tgz Chris Neale, 12/02/2016 09:25 PM

Associated revisions

Revision c848f4a4 (diff)
Added by Teemu Murtola almost 3 years ago

Fix use of position variables with plus/merge

If a selection contained a position variable (e.g., 'com of ...') that
was used more than once, and at least one of those uses was with
plus/merge, there were out-of-bounds memory writes. This was caused by
the internal position structure not getting fully initialized.
Incomplete initialization happens in all contexts with such variables,
but only plus/merge (and possibly permute) actually use the values that
remained uninitialized, which caused them to incorrectly compute the
amount of memory required to store the result.

Fixes part of #2086.

Change-Id: I016e796db268a11d557309935c02cbd1bc79a83c

Revision b7817e2d (diff)
Added by Teemu Murtola almost 3 years ago

Fix possible memory error with long selections

If a selection was more than 1000 characters long and there was a
whitespace exactly at the 1000 point, a buffer overflow could occur.
Replaced the buffer with std::string, simplifying the code
significantly. Update the generated code to use a newer flex, which
also removes the need for some suppressions.

Should fix #2086.

Change-Id: I56513bcf5ee99f05ce144461740d0f868be10186

History

#1 Updated by Gerrit Code Review Bot almost 3 years ago

Gerrit received a related patchset '1' for Issue #2086.
Uploader: Teemu Murtola ()
Change-Id: I56513bcf5ee99f05ce144461740d0f868be10186
Gerrit URL: https://gerrit.gromacs.org/6354

#2 Updated by Teemu Murtola almost 3 years ago

  • Category changed from analysis tools to selections
  • Status changed from New to Fix uploaded
  • Assignee set to Teemu Murtola
  • Target version set to 2016.2
  • Affected version - extra info set to all earlier versions with selection code

This looks like a rare memory error in the selection code, where a buffer can be overrun by one (a zero gets written past the end of a buffer) if your selections is longer than about 1000 characters, and you happen to have a word break at exactly that 1000 character mark (or at higher multiples of 1000).

#3 Updated by Chris Neale almost 3 years ago

just wondering if you checked your fix against my second group of selection attempts (included in the script originally uploaded) that utilized the -sf option in place of the -select option. In the case of -sf, I saw segfaults, though again this could be a case of my not using the tool correctly.

Thank you,
Chris.

#4 Updated by Teemu Murtola almost 3 years ago

I did not have time to test it against anything, except a simple case that produced a memory error around the call stack mentioned in the original email. Are you saying it doesn't work, or just wondering? This issue does not mention anything about a segfault.

#5 Updated by Chris Neale almost 3 years ago

My upload contained a .pdb file and a bash script to run the program. That bash script showed first a success, then a failure (both with -select). THen I showed segfaults with the -sf usage. That could be due to my error, though it seemed like a reasonable usage to me based on the docs.

I did not re-test any code after your revisions.

Thank you.

#6 Updated by Teemu Murtola almost 3 years ago

Your report (in the email) says that you get "the same kind of crash" with both -select and -sf. The symptoms would be easily explained if the failure is related to the length of the selection, not to anything in gmx distance. My fix is in code that it's executed identically on both paths. I can't easily check the attachment when 95% of the time I'm looking at these issues from a mobile client. There can be other issues than just the one I fixed (your email actually seems to have different selections for -select and -sf), but that one certainly fixes one memory access problem that is easily triggered by your long selection for the -select case.

#7 Updated by Gerrit Code Review Bot almost 3 years ago

Gerrit received a related patchset '1' for Issue #2086.
Uploader: Teemu Murtola ()
Change-Id: gromacs~release-2016~I016e796db268a11d557309935c02cbd1bc79a83c
Gerrit URL: https://gerrit.gromacs.org/6359

#8 Updated by Teemu Murtola almost 3 years ago

There was another issue that was triggered if you used a variable with a position value (like com of ...) with plus or merge more than once. That is fixed by the second linked change.

In the future, it would be good not to hide essential information about the nature of the bug report into comments in a script within a compressed attachment.

#9 Updated by Chris Neale almost 3 years ago

OK, sorry. Thanks for fixing it.

#10 Updated by Mark Abraham almost 3 years ago

Teemu's second fix certainly fixes the segfault arising from the implementation of "plus".

Chris, I didn't observe any problem with your "residues 22 to 40" command in analyse.sh, using either release-2016 branch or 5.1.2. I got plausible -oxyz output in both cases. Are you still able to reproduce the issue? With release-2016 branch? Does Teemu's first fix resolve the issue with really long selection strings? Thanks

#11 Updated by Teemu Murtola almost 3 years ago

  • Subject changed from gmx distance has possible memory problems to Crashes/memory problems with selections
  • Status changed from Fix uploaded to Resolved

#12 Updated by Chris Neale almost 3 years ago

Probably I did not download the correct source or something, but I still have problems.

I obtained the new code today like this:
git clone git://git.gromacs.org/gromacs.git

Here is the run command and the output:

$ gmx distance -s my.pdb -f my.pdb -select "$inpstr" -oxyz output_22-40.xvg
           :-) GROMACS - gmx distance, 2017-dev-20161210-dd11612 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov  Herman J.C. Berendsen    Par Bjelkmar   
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra    Gerrit Groenhof  
 Christoph Junghans   Anca Hamuraru    Vincent Hindriksen Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson    
  Justin A. Lemkul   Magnus Lundborg   Pieter Meulenhoff    Erik Marklund   
   Teemu Murtola       Szilard Pall       Sander Pronk      Roland Schulz   
  Alexey Shvetsov     Michael Shirts     Alfons Sijbers     Peter Tieleman  
  Teemu Virolainen  Christian Wennberg    Maarten Wolf   
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx distance, version 2017-dev-20161210-dd11612
Executable:   /scratch/cneale/exe/GROMACS/exec/gromacs_git/2016_Dec12/serial/bin//gmx
Data prefix:  /scratch/cneale/exe/GROMACS/exec/gromacs_git/2016_Dec12/serial
Working dir:  /scratch/cneale/work/2016/December2016/2/gmxDistanceErrorFiles
Command line:
  gmx distance -s my.pdb -f my.pdb -select 'com of (residue 22 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 23 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 24 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 25 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 26 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 27 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 28 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 29 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 30 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 31 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 32 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 33 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 34 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 35 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 36 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 37 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 38 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 39 and name CA) plus com of (resname POPC or resname POPS) plus com of (residue 40 and name CA) plus com of (resname POPC or resname POPS) ' -oxyz output_22-40.xvg

*** glibc detected *** gmx: realloc(): invalid next size: 0x00000000022c8450 ***


Thank you,
Chris.

#13 Updated by Teemu Murtola almost 3 years ago

The fix has not been merged to the master (2017) branch yet.

#14 Updated by Chris Neale almost 3 years ago

Are there instructions online about how I can get the fixed code branch?

I went here: https://gerrit.gromacs.org/#/c/635

Under the Download pulldown arrow, I downloaded the contents of "Archive -> tgz"

same error.

Thank you,
Chris.

#15 Updated by Teemu Murtola almost 3 years ago

That should work, although if you have a git repository already, using the checkout link under Download could be more useful. But your link is broken, so we don't know which fix you downloaded and whether that version contains both fixes or not. Easiest is just to checkout the release-2016 version from git, since both fixes are already merged there. Builds from git also have an unambiguous version string the identifies the exact commit used.

#16 Updated by Mark Abraham almost 3 years ago

Is there any update on whether Chris's issue is resolved yet?

#17 Updated by Chris Neale almost 3 years ago

Either it's not fixed or the download button for that code patch didn't work. Teemu said above that what I did should have given the new code, so if that is true then the proposed fix did not work. Not sure that he meant by "your link is broken"... copy and paste https://gerrit.gromacs.org/#/c/635 worked for me just now. Teemu's instructions about git were insufficient for me to get the code that way (I know nothing about git).

#18 Updated by Aleksei Iupinov almost 3 years ago

Chris Neale wrote:

Either it's not fixed or the download button for that code patch didn't work. Teemu said above that what I did should have given the new code, so if that is true then the proposed fix did not work. Not sure that he meant by "your link is broken"... copy and paste https://gerrit.gromacs.org/#/c/635 worked for me just now. Teemu's instructions about git were insufficient for me to get the code that way (I know nothing about git).

Ah, he meant that your link links one character in the end, so it links to some ancient abandoned change instead. The actual link is https://gerrit.gromacs.org/#/c/6354, try it.

#19 Updated by Aleksei Iupinov almost 3 years ago

your link links one character

*your link misses one character, sorry for dyslexia.

#20 Updated by Mark Abraham almost 3 years ago

Aleksei Iupinov wrote:

Chris Neale wrote:

Either it's not fixed or the download button for that code patch didn't work. Teemu said above that what I did should have given the new code, so if that is true then the proposed fix did not work. Not sure that he meant by "your link is broken"... copy and paste https://gerrit.gromacs.org/#/c/635 worked for me just now. Teemu's instructions about git were insufficient for me to get the code that way (I know nothing about git).

Ah, he meant that your link links one character in the end, so it links to some ancient abandoned change instead. The actual link is https://gerrit.gromacs.org/#/c/6354, try it.

Looks like Teemu already reported he tried it on Chris's inputs and it was good. So we'll regard it as fixed unless Chris finds further problems some time.

#21 Updated by Mark Abraham almost 3 years ago

  • Status changed from Resolved to Closed

#22 Updated by Chris Neale almost 3 years ago

Thank you Aleksei. I checked and that was a copy/paste error on my part. What I actually had downloaded was 6359 (as suggested in above post #7). I didn’t realize that there was another subsequent separate fix (6354). To summarize, either one of these code changes fixes the second error I reported, but neither fixes the first error. Perhaps they need to be combined or something. I give more details below but I am fine with Mark’s suggestion to let it drop.

The following command:

inpstr=$(for((i=22;i<=40;i++)); do echo -n "com of (residue $i and name CA) plus com of (resname POPC or resname POPS) plus "; done|sed "s/plus $//")
gmx distance -s my.pdb -f my.pdb -select "$inpstr" -oxyz output_22-40.xvg

still gives the error:

*** glibc detected *** /scratch/cneale/exe/GROMACS/exec/gromacs_git2/serial/bin/gmx: realloc(): invalid next size: 0x00000000022beba0 ***

with either gerrit 6354 (b7817e2.tar.gz) or gerrit 6359 (c848f4a.tar.gz)

However, both gerrit 6354 or gerrit 6359 do fix the second error that I reported, which is a segfault on:

{
echo "bilayer=com of (resname POPC or resname POPS);" 
for((i=2;i<=20;i++)); do
  echo "residue $i and name CA plus bilayer;" 
done
} > input.selection
gmx distance -s my.pdb -f my.pdb -sf input.selection -oxyz outputB_2-20.xvg

#23 Updated by Chris Neale almost 3 years ago

My mistake. My above post no. 22 is wrong.

both gerrit 6354 and 6359 fix the second error. Only gerrit 6354 fixes the first error.

I agree that the fix is good.

Thank you Teemu for the fix and Mark and Aleksi for helping me apply it.

#24 Updated by Mark Abraham almost 3 years ago

OK great. Thanks Chris!

Also available in: Atom PDF