Bug #1012

Nose-Hoover chain thermostat gives inconsistent results with varying core counts

Added by Richard Broadbent almost 5 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: 4.5.7
Affected version - extra info: -
Affected version: 4.5.5
Difficulty: uncategorized

Description

I have found widely varying results when using different core counts with a Nose-Hoover chain thermostat and the md-vv integrator.

I have observed markedly different behaviour in large systems run on varying core counts. This includes the formation of voids within a box of solvent, with pressures of around 10^4 - 10^5 bar (note this was an NVT run, which on 8 cores had a pressure of around 1 bar and formed no voids). There were also significant changes in the RDF.

I therefore built the attached smaller test system (512 DMF molecules, 6144 atoms), and even on short (200 ps) runs it is possible to observe significant differences in pressure (~7,000 bar on 8 and 12 cores, ~40,000 bar on 24 cores, and ~60,000 bar on 36 and 280 cores).

The formation of voids seems to occur only in larger systems run for longer times. (I can provide examples, but these are very large systems and use many more options in the .mdp file, so they might make bug finding more complex.)

The simulations were run with the commands listed below, along with the hardware setups.

The force field used is standard OPLS-AA. I use a personal copy, but no parameters for DMF are altered relative to the version in gromacs/4.5.4.

The input file was generated on my desktop using:

$ grompp -f new.mdp -p dmf_big.top -c confout.gro -o nvt.tpr
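
(new.mdp itself is not among the attachments; based on the settings described in this report, its integrator and thermostat section would have been roughly of the following form — a hypothetical sketch, not the actual file:

integrator       = md-vv        ; velocity Verlet
tcoupl           = nose-hoover  ; Nose-Hoover chain thermostat
nh-chain-length  = 10           ; chain length used here, cf. comment #5 below
tc-grps          = System
tau_t            = 0.5          ; hypothetical value
ref_t            = 300          ; hypothetical value
pcoupl           = no           ; NVT, no pressure coupling
)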

Could you please take a look at this? It could be an issue with the pressure coupling system, which was mentioned in connection with Bug #1003; however, that is just a guess.

Thanks,

Richard

Command lines and systems used:

My desktop
Quad-core, 8-thread Xeon; GROMACS version 4.5.5 built using:

$ ./configure --prefix=/common/ubuntu/12.04/gromacs/gromacs-4.5.5/ --with-fft=mkl CC=icc CXX=icpc F77=ifort LIBS="-Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -lpthread"
$ make && make install

The simulation was run with:
mdrun -s nvt.tpr -v -deffnm 8_cores -dd 2 2 2

12, 24, and 36 cores of a local cluster of 12-core (2x6-core Xeon) InfiniBand nodes (GROMACS version 4.5.5).
The test system was run using PBS scripts containing the following lines, where FILE=nvt:

12_cores : mpiexec mdrun_d -s $FILE.tpr -maxh 4 -deffnm 12_cores -npme 4
24_cores : mpiexec mdrun_d -s $FILE.tpr -maxh 4 -deffnm 24_cores
36_cores : mpiexec mdrun_d -s $FILE.tpr -maxh 4 -deffnm 36_cores -npme 9

(No significant change was observed when using a version compiled with:

$ ./configure --disable-cpu-optimization --with-fft=mkl --enable-mpi CFLAGS="-O0 -I$MKL_HOME/include" LDFLAGS="-L$MKL_HOME/lib/64"

and running with -reprod)

I also tried 36 cores and 280 cores of a Blue Gene/P system (this uses GROMACS 4.5.4). These were run with:

mpirun -mode VN -np 280 -cwd $PWD -exe mdrun_bgp_d -args "-s nvt.tpr -maxh 13 -npme 64 -deffnm 280_cores"
mpirun -mode VN -np 36 -cwd $PWD -exe mdrun_bgp_d -args "-s nvt.tpr -maxh 13 -npme 9 -deffnm 36_cores"

nvt.tpr (162 KB) Richard Broadbent, 09/28/2012 04:43 PM

confout.gro (414 KB) Richard Broadbent, 09/28/2012 04:43 PM

dmf_big.top (666 Bytes) Richard Broadbent, 09/28/2012 04:43 PM

dmf.itp (1.79 KB) Richard Broadbent, 09/28/2012 04:43 PM

v-rescale.mdp (549 Bytes) Richard Broadbent, 11/12/2012 06:05 PM

12_cores.gro (414 KB) Richard Broadbent, 11/12/2012 06:05 PM

12_cores_v_rescale.log - tau_t = 0.5 (16.4 KB) Richard Broadbent, 11/12/2012 06:05 PM

12_cores_v_rescale.log - tau_t = 3.0 (28.7 KB) Richard Broadbent, 11/12/2012 06:05 PM

energy.eps - Total Energy of the system over the entire run (40.4 KB) Richard Broadbent, 11/14/2012 03:53 PM

presure.eps - Pressure of the system over the entire run (42.2 KB) Richard Broadbent, 11/14/2012 03:53 PM

Energy.eps - Total Energy of the system once the system has settled (68.3 KB) Richard Broadbent, 11/14/2012 03:53 PM

Pressure.eps - Pressure of the system once the system has settled (41.8 KB) Richard Broadbent, 11/14/2012 03:53 PM

Associated revisions

Revision b77fa706 (diff)
Added by Michael Shirts over 4 years ago

Some changes for md-vv extracted from 4.6

a. Fixes for the pressure in MTTK with constraints + dispersion + rerun
  • Dispersion is correctly added in rerun
  • COM motion is removed only on the second half of the timestep.
  • Now can do md-vv + rerun with multiple threads.
  • Now gives exact kinetic energy reruns for everything except MTTK, where the iterative algorithm
    makes exact kinetic energy impossible when nstpcouple == 1.

b. md-vv works with v-rescale and berendsen

c. Fixes a bug with pressure control in md-vv when nstcalcenergy is not a
multiple of nstpcouple or nsttcouple. This bug results in boxes slowly
expanding to unphysical sizes because the virial is
neglected in the second half of the md-vv calculation.

Also discovered that as part of the bug, global energies were being communicated
where they did not need to be when nstpcouple and nsttcouple are > 1 in the case
of md-vv, so redid some of the iteration counting and global communication to fix
this all together. In the process, this simplified some of the iteration counting.

Should fix bugs #1116, #1012, #1000, #1129 in redmine.

Change-Id: I1b628d03ab588c29fef2b8789e61254da49c2b6f
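
To illustrate the trigger condition in point (c) with hypothetical values, an .mdp combination such as

nstcalcenergy = 100   ; hypothetical value
nstpcouple    = 30    ; 100 is not a multiple of 30

would hit the bug, with the virial neglected in the second half of the md-vv calculation as described above.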

History

#1 Updated by Richard Broadbent almost 5 years ago

As an update: the systems were consistent (and stable) running in NVE on high core counts, with results similar to Nose-Hoover on low core counts.

#2 Updated by Michael Shirts almost 5 years ago

Richard Broadbent wrote:

As an update: the systems were consistent (and stable) running in NVE on high core counts, with results similar to Nose-Hoover on low core counts.

Richard, can you run two other quick tests?

1. NVT with Parrinello-Rahman and mv (leapfrog)?
2. NVT with md-vv and v-rescale?
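
In .mdp terms, the two requested tests correspond roughly to the following fragments (hypothetical sketches, with 'mv' read as 'md' per the correction in comment #3 below; all other settings as in the original run):

; test 1: leap-frog integrator with Parrinello-Rahman pressure coupling
integrator = md
tcoupl     = nose-hoover
pcoupl     = parrinello-rahman
tau_p      = 1.5              ; hypothetical; comment #5 below reports this value

; test 2: velocity Verlet with the v-rescale thermostat, no pressure coupling
integrator = md-vv
tcoupl     = v-rescale
pcoupl     = no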

#3 Updated by Michael Shirts almost 5 years ago

1. NVT with Parrinello-Rahman and mv (leapfrog)?

Sorry, 'mv' there should be 'md'

#4 Updated by Roland Schulz almost 5 years ago

  • Status changed from New to Feedback wanted

#5 Updated by Richard Broadbent almost 5 years ago

Sorry for the slow response; I've been travelling.

Parrinello-Rahman with md (Nose-Hoover chain length = 1, not 10 as in the previous simulations):
no significant differences were observed with different core counts.
(tau_p = 1.5; the simulation length was 100 ps. Significant pressure fluctuations were present, as no real equilibration was done, but the behaviour was very similar in all simulations.)

The v-rescale run is still in the queue, but I should have the results soon.

Thanks,
Richard

#6 Updated by Richard Broadbent almost 5 years ago

Running with v-rescale and md-vv (NVT ensemble), I found that the simulation tended to crash. I varied tau_t from 0.5 to 3.0 on 12, 24, and 36 cores.

In all cases the temperature rapidly increased from the starting temperature. I have attached the v-rescale .mdp file and the log file from the 12-core simulation, which got furthest. The input coordinates included velocities and were taken from the end of a 12-core NVT (Nose-Hoover) simulation (file attached).

I've never used the v-rescale thermostat before, so I could be using radically wrong parameters, but I thought that, given that an NVE run using this configuration is stable, a relatively large value of tau_t such as 3.0 would work.
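
For reference, the thermostat block being varied was presumably of this form (a hypothetical sketch; the attached v-rescale.mdp is the authoritative file):

integrator = md-vv
tcoupl     = v-rescale
tc-grps    = System
tau_t      = 3.0     ; varied between 0.5 and 3.0 across the runs
ref_t      = 300     ; hypothetical value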

The higher core-count simulations tended to crash faster. The cause of the crash was the LINCS constraint algorithm moving atoms to the wrong coordinates, which in turn was caused by the high temperatures the system was reaching.

The systems were built using:

$ grompp_d -f v-rescale.mdp -c 12_cores.gro -p dmf_big.top -o v_rescale.tpr

and run on our cluster using the same scripts as the previous runs.

Thanks,

Richard

#7 Updated by Michael Shirts almost 5 years ago

OK, this is very useful.

Is it possible to check with the current git version of 4.6? It has a patch that should fix the md-vv v-rescale problem. If it doesn't, then I need to get on that pronto...

#8 Updated by Richard Broadbent almost 5 years ago

This could take a while, as I'm going to have to get our sysadmin to build it, unless you can spot an error in this workflow.

I've checked out the code using:

$ git clone git://git.gromacs.org/gromacs.git
$ cd gromacs
$ git checkout --track -b release-4-6 origin/release-4-6

then attempted to build GROMACS with:

$ module load intel-suite/2013 mpi/intel-3.1 cmake/2.8.9
$ mkdir ../build
$ cd ../build

For automake I normally use:

$ ./configure --prefix=$HOME/GMX --with-fft=mkl --enable-mpi CFLAGS="-I$MKL_HOME/include" LDFLAGS="-L$MKL_HOME/lib/64"

I've never used CMake before, but I've tried:

$ CC=mpiicc CXX=mpiicpc cmake -DGMX_MPI=ON -DGMX_DOUBLE=ON -DGMX_GPU=OFF -DGMX_PREFER_STATIC_LIBS=ON -DGMX_FFT_LIBRARY=mkl -DMKL_INCLUDE_DIR=$MKL_HOME/include -DMKL_LIBRARIES=$MKL_HOME/lib/intel64/libmkl_intel_ilp64.a -DGMX_OPENMP=OFF ../gromacs

$ make -j 4

This gets to:

Linking C static library libgmxana_mpi_d.a
[ 87%] Built target gmxana
make: *** [all] Error 2

I decided to check the error with:
$ make

[ 0%] Generating version information
[ 0%] Built target gmx_version
[ 47%] Built target gmx
[ 60%] Built target md
[ 66%] Built target gmxpreprocess
[ 66%] Built target g_luck
[ 66%] Built target g_protonate
[ 67%] Built target g_x2top
[ 67%] Built target gmxcheck
[ 67%] Built target gmxdump
Linking C executable grompp_mpi_d
/apps/intel/2013/mkl/lib/intel64/libmkl_intel_ilp64.a(dfticreatedescriptor_d_md.o): In function `DftiCreateDescriptor_d_md':
../../../../dft/iface/dfti_c/dfticreatedescriptor_d_md.c:(.text+0x1ff): undefined reference to `mkl_dft_dfti_create_dcmd'
../../../../dft/iface/dfti_c/dfticreatedescriptor_d_md.c:(.text+0x216): undefined reference to `mkl_dft_dfti_create_drmd'
../../../../dft/iface/dfti_c/dfticreatedescriptor_d_md.c:(.text+0x226): undefined reference to `mkl_dft_bless_node_omp'
/apps/intel/2013/mkl/lib/intel64/libmkl_intel_ilp64.a(dfticreatedescriptor_d_1d.o): In function `DftiCreateDescriptor_d_1d':
../../../../dft/iface/dfti_c/dfticreatedescriptor_d_1d.c:(.text+0xf3): undefined reference to `mkl_dft_dfti_create_dc1d'
../../../../dft/iface/dfti_c/dfticreatedescriptor_d_1d.c:(.text+0x105): undefined reference to `mkl_dft_dfti_create_dr1d'
../../../../dft/iface/dfti_c/dfticreatedescriptor_d_1d.c:(.text+0x116): undefined reference to `mkl_dft_bless_node_omp'
make[2]: *** [src/kernel/grompp_mpi_d] Error 1
make[1]: *** [src/kernel/CMakeFiles/grompp.dir/all] Error 2
make: *** [all] Error 2

At this point I have no idea what to try. As I say, none of the software I've needed before has used CMake, so I could be setting it up wrongly. I'm using MKL as that's what I've always used with GROMACS. I did try fftw3, but despite manually specifying:
-DGMX_FFT_LIBRARY=fftw3 -DFFTW_INCLUDE_DIR=$FFTW_HOME/include -DFFTW_LIBRARY=$FFTW_HOME/lib

it said it couldn't find fftw_plan_r2r_1d. This is the first time I've ever had an issue compiling GROMACS, so I was a little surprised.

If you have any advice, let me know; otherwise it'll probably be a few days before I can get it installed.

Thanks,

Richard

#9 Updated by Roland Schulz almost 5 years ago

try:
-DMKL_LIBRARIES=$MKL_HOME/lib/intel64/libmkl_core.so;$MKL_HOME/lib/intel64/libmkl_intel_lp64.so;$MKL_HOME/lib/intel64/libmkl_sequential.so

Could you also please post the error message you got with FFTW (either here or in a separate bug)? MKL is somewhat tricky, but FFTW should just work, so if you have any problem with it, we consider that a bug.

#10 Updated by Richard Broadbent almost 5 years ago

That command line didn't work. Running:

$ CC=mpiicc CXX=mpiicpc cmake -DGMX_MPI=ON -DGMX_DOUBLE=ON -DGMX_GPU=OFF -DGMX_PREFER_STATIC_LIBS=ON -DGMX_FFT_LIBRARY=mkl -DMKL_INCLUDE_DIR=$MKL_HOME/include -DMKL_LIBRARIES=$MKL_HOME/lib/intel64/libmkl_core.so;$MKL_HOME/lib/intel64/libmkl_intel_lp64.so;$MKL_HOME/lib/intel64/libmkl_sequential.so -DGMX_OPENMP=OFF ../gromacs

which gave:

CMake Error: The source directory "/home/rb1109/build" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.
Segmentation fault (core dumped)
Segmentation fault (core dumped)

I think bash is interpreting the ; as an end-of-command character, so I modified the line by adding quotes ("") as below:
$ CC=mpiicc CXX=mpiicpc cmake -DGMX_MPI=ON -DGMX_DOUBLE=ON -DGMX_GPU=OFF -DGMX_PREFER_STATIC_LIBS=ON -DGMX_FFT_LIBRARY=mkl -DMKL_INCLUDE_DIR=$MKL_HOME/include -DMKL_LIBRARIES="$MKL_HOME/lib/intel64/libmkl_core.so;$MKL_HOME/lib/intel64/libmkl_intel_lp64.so;$MKL_HOME/lib/intel64/libmkl_sequential.so" -DGMX_OPENMP=OFF ../gromacs

and it was able to build successfully.

I never really had any trouble getting MKL to work before, and I've never had to specify the libraries like that. The last time I compiled GROMACS it was for a debug install of 4.5.5, and I used the line:
$ ./configure --prefix=/apps/gromacs/4.5.5-double-noopt/ --disable-cpu-optimization --with-fft=mkl --enable-mpi CFLAGS="-O0 -I$MKL_HOME/include" LDFLAGS="-L$MKL_HOME/lib/64"

and it worked fine.

For building with fftw/3.3.2 under CMake I used:

$ module load intel-suite/2013 mpi/intel-3.1 cmake/2.8.9 fftw/3.3.2-double

$ CC=mpiicc CXX=mpiicpc cmake -DGMX_MPI=ON -DGMX_DOUBLE=ON -DGMX_GPU=OFF -DGMX_FFT_LIBRARY=fftw3 -DGMX_OPENMP=OFF -DFFTW_INCLUDE_DIR=$FFTW_HOME/include -DFFTW_LIBRARY=$FFTW_HOME/lib/ ../gromacs

and it gives:
-- The C compiler identification is Intel 13.0.0.20120731
-- Check for working C compiler: /apps/intel/ict/mpi/3.1.038/bin/mpiicc
-- Check for working C compiler: /apps/intel/ict/mpi/3.1.038/bin/mpiicc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done

...

-- Looking for emmintrin.h
-- Looking for emmintrin.h - found
-- Enabling SSE2 Gromacs acceleration, and it will help compiler optimization.
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.23")
-- checking for module 'fftw3'
-- package 'fftw3' not found
-- Looking for fftw_plan_r2r_1d in /apps/fftw/3.3.2-double/lib
WARNING: Target "cmTryCompileExec2316155789" requests linking to directory "/apps/fftw/3.3.2-double/lib". Targets may link only to libraries. CMake is dropping the item.
-- Looking for fftw_plan_r2r_1d in /apps/fftw/3.3.2-double/lib - not found
CMake Error at cmake/FindFFTW.cmake:67 (message):
Could not find fftw_plan_r2r_1d in /apps/fftw/3.3.2-double/lib, take a look
at the error message in /home/rb1109/build3/CMakeFiles/CMakeError.log to
find out what went wrong. If you are using a static lib (.a) make sure you
have specified all dependencies of fftw3 in FFTW_LIBRARY by hand (e.g.
-DFFTW_LIBRARY='/path/to/libfftw3.so;/path/to/libm.so') !
Call Stack (most recent call first):
CMakeLists.txt:935 (find_package)

-- Configuring incomplete, errors occurred!

I also tried:

$ CC=mpiicc CXX=mpiicpc cmake -DGMX_MPI=ON -DGMX_DOUBLE=ON -DGMX_GPU=OFF -DGMX_FFT_LIBRARY=fftw3 -DGMX_OPENMP=OFF ../gromacs
with the result:

-- The C compiler identification is Intel 13.0.0.20120731
-- Check for working C compiler: /apps/intel/ict/mpi/3.1.038/bin/mpiicc
-- Check for working C compiler: /apps/intel/ict/mpi/3.1.038/bin/mpiicc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
...
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.23")
-- checking for module 'fftw3'
-- package 'fftw3' not found
Could not find fftw3 library named libfftw3, please specify its location in FFTW_LIBRARY by hand (e.g. -DFFTW_LIBRARY='/path/to/libfftw3.so')
CMake Error at CMakeLists.txt:941 (MESSAGE):
Cannot find FFTW3 (with correct precision - libfftw3f for single precision
GROMACS or libfftw3 for double precision GROMACS). Fix it, choose another
FFT library, or use the Gromacs built-in fftpack (slower)!

-- Configuring incomplete, errors occurred!

However:

$ CC=mpiicc CXX=mpiicpc cmake -DGMX_MPI=ON -DGMX_DOUBLE=ON -DGMX_GPU=OFF -DGMX_FFT_LIBRARY=fftw3 -DGMX_OPENMP=OFF -DFFTW_INCLUDE_DIR=$FFTW_HOME/include -DFFTW_LIBRARY=$FFTW_HOME/lib/libfftw3.so ../gromacs

worked. Normally I just add -lfftw3 to the link line, as our module system adds the libraries to the include and library paths:
$ module show fftw/3.3.2-double
------------------------------------------------------------------

/apps/modules/modulefiles/fftw/3.3.2-double:

module-whatis FFTW 3.3.2 (Double)
append-path LD_LIBRARY_PATH /apps/fftw/3.3.2-double/lib
append-path PATH /apps/fftw/3.3.2-double/bin
append-path MANPATH /apps/fftw/3.3.2-double/man
append-path INCLUDE /apps/fftw/3.3.2-double/include
setenv FFTW_HOME /apps/fftw/3.3.2-double
setenv FFTW_VERSION 3.3.2-double
-------------------------------------------------------------------

Is there an easy way to make cmake recognise this?

I've queued the requested simulations and they should run later today.

Thanks,

Richard

#11 Updated by Richard Broadbent almost 5 years ago

The v-rescale thermostat in the git repository version f5c1d04b273e4c53e9351f8efc89d160ec37f4d0 converges to and maintains temperature (dropping it from 500 K to 300 K, then holding) for my system on 12, 24, and 36 cores. The resulting internal energies are similar in shape on all core counts; however, there is an offset between them (see attached). The pressure is surprising, though: it varies as shown below and in the attached graph. Whilst I would not expect the pressure to be conserved, as this is an NVT simulation, I would expect the pressure to be similar between the runs.

Core count   Pressure (bar)   RMSD      Drift
12           -2766.31         636.53      84.4708
24           13272.9          1497.98   -1262.1
36           21858.7          1652.29     749.764

Since these pressures are large and do not even agree in sign, I am a little concerned.

Richard

#12 Updated by Roland Schulz almost 5 years ago

Our FFTW detection can't use those module settings because none of those environment variables are standardized (and thus other fftw modules on different clusters don't use the same names). You could tell your admin that the fftw module should either set PKG_CONFIG_PATH to /apps/fftw/3.3.2-double/lib/pkgconfig or CMAKE_PREFIX_PATH to /apps/fftw/3.3.2-double. That way FFTW would be picked up automatically by GROMACS when you load the module. Of course, you can also set either environment variable yourself, but then it wouldn't go through the module system.
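
For example (a sketch using the paths from this thread), either of the following would let CMake pick up the module's FFTW without any -DFFTW_* flags:

# set the variable yourself before configuring:
$ export CMAKE_PREFIX_PATH=/apps/fftw/3.3.2-double
$ CC=mpiicc CXX=mpiicpc cmake -DGMX_MPI=ON -DGMX_DOUBLE=ON -DGMX_FFT_LIBRARY=fftw3 ../gromacs

# or have the admin add a line like this to the modulefile:
prepend-path CMAKE_PREFIX_PATH /apps/fftw/3.3.2-double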

#13 Updated by Roland Schulz almost 5 years ago

  • Status changed from Feedback wanted to In Progress

#14 Updated by Mark Abraham over 4 years ago

GROMACS 4.6 is due for imminent release - please let us know if you continue to observe problems there!

#15 Updated by Mark Abraham over 4 years ago

  • Target version set to 4.5.7

Draft https://gerrit.gromacs.org/#/c/2101/ should solve this

#16 Updated by Mark Abraham over 4 years ago

  • Status changed from In Progress to Resolved
  • Affected version set to 4.5.5

#17 Updated by Rossen Apostolov over 3 years ago

  • Status changed from Resolved to Closed
