I've found a memory leak that seems hard to reproduce. When run on my Mac, the simulation is fine;
when run on a Linux cluster, memory eventually grew to 4 GB (from about 300 MB at the start) after about 12 hours. It was a slow increase, roughly 4-8 KB every step (or every ns?),
so I don't know exactly which interval it was. The code was synced on 5/21/13 with the release 4.6 branch and compiled with gcc 4.4.6 20120305 (Red Hat 4.4.6-4) (I haven't been able to get the cluster's compiler upgraded yet).
It was run on a single node with 16 cores (32 with hyperthreading):
mdrun -ntmpi 32 -dd 5 3 2 -npme 2 -deffnm solv.20 -dhdl solv.20.dhdl.xvg
I tried running with the same tpr and the same command (32 threads) on an OS X laptop to reproduce it, but failed: memory did not increase once the run reached a steady state.
However, the local version was compiled with debugging enabled, so that might also make a difference? It's hard to tell with these memory things.
I'm happy to try other things if people have suggestions, though I'm not 100% sure what the best approach is; hence I'm asking before I try a lot of things that won't diagnose it.
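One low-effort way to quantify the growth on the Linux cluster, before reaching for a full memory debugger, is to sample the resident set size (VmRSS) of the running mdrun process from /proc and compute the average growth per sample. This is a minimal Python sketch assuming a Linux /proc filesystem; the script and function names are hypothetical, and the PID, interval and sample count are placeholders.

```python
import re
import sys
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_vmrss_kb(pid):
    """Read the current resident set size of process `pid` (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def mean_growth_kb(samples):
    """Average change in kB between consecutive samples."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)


def monitor(pid, interval_s=60.0, count=10):
    """Sample VmRSS `count` times, `interval_s` seconds apart, and print the trend."""
    samples = []
    for i in range(count):
        if i:
            time.sleep(interval_s)  # wait between samples, not before the first
        samples.append(read_vmrss_kb(pid))
        print(f"VmRSS = {samples[-1]} kB", flush=True)
    print(f"mean growth per sample: {mean_growth_kb(samples):.1f} kB")


if __name__ == "__main__":
    # Hypothetical usage: python rss_monitor.py <mdrun-pid> [interval_s] [count]
    pid = int(sys.argv[1])
    interval = float(sys.argv[2]) if len(sys.argv) > 2 else 60.0
    n = int(sys.argv[3]) if len(sys.argv) > 3 else 10
    monitor(pid, interval, n)
```

Comparing the per-sample growth with the step count that mdrun writes to the log over the same wall-clock interval would help pin down whether the 4-8 KB increase happens per step or at some other interval.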
Logfile build info:
Gromacs version: VERSION 4.6.2-dev-20130521-f78f0dc
GIT SHA1 hash: f78f0dc83f55d588f0fbc049af667519d9cf868e
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled
GPU support: disabled
invsqrt routine: gmx_software_invsqrt(x)
CPU acceleration: SSE2
FFT library: fftw-3.3.3-sse2
Large file support: enabled
RDTSCP usage: disabled
Built on: Tue Apr 23 11:37:55 EDT 2013
Built by: firstname.lastname@example.org [CMAKE]
Build OS/arch: Linux 2.6.32-279.19.1.el6.x86_64 x86_64
Build CPU vendor: GenuineIntel
Build CPU brand: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
Build CPU family: 6 Model: 26 Stepping: 5
Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ss
C compiler: /usr/bin/gcc GNU gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
C compiler flags: -msse2 -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wall -Wno-unused -Wunused-value -fomit-frame-pointer -funroll-all-loops -O3 -DNDEBUG
#1 Updated by Michael Shirts almost 4 years ago
I've been able to reproduce the leak on my laptop with just 4 cores (it was leaking before; I just couldn't find it), so it's not as exotic as I thought. This will let me make more progress, but if people have tips for finding memory leaks in GROMACS, let me know.
#7 Updated by Mark Abraham almost 4 years ago
Found it with DDT, though I had to resort to diffing memory snapshots taken at steps 200 and 400 :( It seems to affect only expanded ensemble. I'll leave it to Michael to fix.
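The snapshot-diff idea can be illustrated in a tool-agnostic way: reduce each heap snapshot to total live bytes per allocation site, then report the sites whose usage grew between the two. This is a minimal Python sketch; the snapshot representation (a dict mapping a call-site label to bytes) is a hypothetical simplification, not DDT's export format, and the site labels and byte counts in the example are made up.

```python
def diff_snapshots(before, after, min_growth=1):
    """Return (site, growth_bytes) pairs for allocation sites that grew,
    sorted from largest to smallest growth.

    `before` and `after` map an allocation-site label to live bytes.
    Sites absent from `before` are treated as having 0 bytes there.
    """
    growth = []
    for site, nbytes in after.items():
        delta = nbytes - before.get(site, 0)
        if delta >= min_growth:
            growth.append((site, delta))
    growth.sort(key=lambda item: item[1], reverse=True)
    return growth


if __name__ == "__main__":
    # Hypothetical snapshots at step 200 and step 400 (labels and sizes are made up).
    step200 = {"site_a": 1_000_000, "site_b": 40_000, "site_c": 8_000}
    step400 = {"site_a": 1_000_000, "site_b": 859_200, "site_c": 8_000}
    steps = 400 - 200
    for site, delta in diff_snapshots(step200, step400):
        print(f"{site}: +{delta} bytes ({delta / steps:.1f} bytes/step)")
```

Dividing each site's growth by the number of steps between snapshots gives a per-step leak rate that can be checked against the 4-8 KB per step seen on the cluster.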