Project

General

Profile

Bug #585

results of double precision GBSA em drastically change with number of threads

Added by Ehud Schreiber about 9 years ago. Updated about 9 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

Created an attachment (id=545)
peptide in pdb format

I have encountered some perplexing behavior of double-precision GBSA.
It may be similar to the previously reported bug #548, patched by Per Larsson, but it involves simulations in analyses other than nm.
As a workaround to the previous bug, Per suggested disabling multithreading via the -nt flag.
Now I have seen that the behavior also of em drastically changes when adding the option -nt 1.
Am I correct to assume that the single thread results are the reliable ones?
Is this problem already corrected in Per's patches?

In order to reproduce this behavior, I include a file of a one-chain peptide in pdb format, C34_v2.pdb , as well as an mdp file for steepest descent energy minimization, em1sa.mdp .

1) Choosing terminus types: 3 (None):
pdb2gmx_d -ter -ff oplsaa -water none -f C34_v2.pdb

2) Choosing Group: 0 (System) or 1 (Protein):
editconf_d -f conf.gro -princ -bt triclinic -d 0.81 -o box.gro

3) This gives Potential Energy = -8.86594605837006e+03 (for some run with 4 threads)
grompp_d -f em1sa.mdp -p topol.top -c box.gro -o em1sa.tpr
mdrun_d -v -nice 0 -deffnm em1sa

4) While this gives Potential Energy = -8.21563789577230e+03
cp em1sa.tpr em1nt1sa.tpr
mdrun_d -v -nt 1 -nice 0 -deffnm em1nt1sa

Thanks,
Ehud Schreiber.

C34_v2.pdb (29.2 KB) C34_v2.pdb peptide in pdb format Ehud Schreiber, 10/04/2010 03:34 PM
em1sa.mdp (395 Bytes) em1sa.mdp mdp file for energy minimization Ehud Schreiber, 10/04/2010 03:36 PM
em1sa_noFortran.tpr (243 KB) em1sa_noFortran.tpr tpr file for version compiled without fortran Ehud Schreiber, 10/07/2010 06:49 PM
em1nt2sa_valgrind.stderr (24.9 KB) em1nt2sa_valgrind.stderr stdout+stderr of mdprun_d with valgrind Ehud Schreiber, 10/10/2010 02:06 PM

Associated revisions

Revision 9ccc786e (diff)
Added by Per Larsson about 9 years ago

Double precision GB energies are now summed corrently (bug #585)

The issue was that __m128d _mm_add_sd(a,b) should give back r0 = a0 + b0
and r1 = a0, but it gives back r1 = 0. But since this is not true for
all architectures, it is also important to carefully zeroing out of some
variables in the code (explained in the kernel4XX-comments).

History

#1 Updated by Ehud Schreiber about 9 years ago

Created an attachment (id=546)
mdp file for energy minimization

#2 Updated by Per Larsson about 9 years ago

Hi!

On what machine is this, and also what compiler are you using?

/Per

(In reply to comment #1)

Created an attachment (id=546) [details]
mdp file for energy minimization

#3 Updated by Ehud Schreiber about 9 years ago

The machine is intel x86_64.
The compilers are from redhat linux 5 - i.e. gcc 4.1 / gfortran.

#4 Updated by Berk Hess about 9 years ago

I can't reproduce this, I get differences of the order of 0.01 kJ/mol.
But energy minimization can always get stuck in some local minimum
and the results can depend on minor differences in any quantity,
such as the number of threads used.

BTW you mdp file gives an error, because you use Ace-approximation
instead of Ace-approx.
I will change this for the 4.5.2 release, since we don't like
unnecessary abbreviations.

Berk

#5 Updated by Berk Hess about 9 years ago

I forgot to say:
Please compare the energy of the first step in the log file.
In my case those were completely identical.
If they are also identical in your case,
we have to assume one of your two minimizations got stuck
in a higher local minimum.

Berk

#6 Updated by Ehud Schreiber about 9 years ago

(In reply to comment #4)

I can't reproduce this, I get differences of the order of 0.01 kJ/mol.
But energy minimization can always get stuck in some local minimum
and the results can depend on minor differences in any quantity,
such as the number of threads used.

BTW you mdp file gives an error, because you use Ace-approximation
instead of Ace-approx.
I will change this for the 4.5.2 release, since we don't like
unnecessary abbreviations.

Berk

Hi,

I'm confused.

I tried changing the parameter to
sa_algorithm = Ace-approx
as you suggested. However, then a fatal error occurs - the program explicitly asks for Ace-approximation:

grompp_d -f $dir/mdp/em1sa_approx.mdp -p topol.top -c box.gro -o em1sa_approx.tpr

:-)  G  R  O  M  A  C  S  (-:
Grunge ROck MAChoS
:-)  VERSION 4.5.1  (-:
.
.
ERROR 1 [file /dir/leads_test/schreib/StructuralBiology/Fuzeon/1aik/ImplicitSolvent/mdp/em1sa_approx.mdp, line 14]:
Invalid enum 'Ace-approx' for variable sa_algorithm, using 'No'
Next time use one of: 'No' 'Ace-approximation' 'Still'
.
.
Program grompp_d, VERSION 4.5.1
Source code file: readinp.c, line: 244

Fatal error:
There was 1 error in input file(s)

On the other hand, when I use
sa_algorithm = Ace-approximation
the grompp_d command succeeds, and mdout.mdp shows this line, but running
mdrun_d -v -nt 1 -nice 0 -deffnm em1sa
gives, in the em1sa.log file, the line
sa_algorithm = No

So, what is the correct enum parameter for sa_algorithm?
Are the "Ace-approximation" and "no" options equivalent?
If not, this might well be the source of my problems.

As for your comment #5, when I compute energies for a run with -nt 1 , I get:

  1. This file was created Tue Oct 5 15:51:05 2010
  2. by the following command:
  3. g_energy_d -s em1sa.tpr -f em1sa.edr -o em1sa.potential_energy.xvg #
  4. g_energy_d is part of G R O M A C S: #
  5. GROningen MAchine for Chemical Simulation #
    title "Gromacs Energies"
    xaxis label "Time (ps)"
    yaxis label "(kJ/mol), (bar nm)"
    @TYPE xy
    view 0.15, 0.15, 0.75, 0.85
    legend on
    legend box on
    legend loctype view
    legend 0.78, 0.8
    legend length 2
    s0 legend "Bond"
    s1 legend "Angle"
    s2 legend "Proper Dih."
    s3 legend "Ryckaert-Bell."
    s4 legend "GB Polarization"
    s5 legend "Nonpolar sol."
    s6 legend "LJ-14"
    s7 legend "Coulomb-14"
    s8 legend "LJ (SR)"
    s9 legend "Coulomb (SR)"
    s10 legend "Potential"
    @ s11 legend "#Surf*SurfTen"
    0.000000 2673.594266 565.495718 23.237730 543.550788 -3676.757587 119.003468 1217.407573 3634.403116 -515.525066 -9561.614965 -4977.204958 0.000000
    .
    .

The corresponding command for a run without the -nt flag gives, for the initial time:

0.000000  2673.594266  565.495718   23.237730  543.550788  178.975614  119.003468  1217.407573  3634.403116  -520.683563  -13566.313202  -5131.328493    0.000000

so as you can see, the "GB Polarization", "LJ (SR)" and "Coulomb (SR)" fields are different, and therefore also the "Potential" which is the sum of all preceding fields.

Please do not hesitate to ask me anything needed to sort out this perplexing situation.

#7 Updated by Berk Hess about 9 years ago

Have you just done a git pull?
Otherwise I can't understand why it asks for Ace-approximation.
I only committed this change a few hours ago.

Could you mail me your tpr file, so I can check with that directly?

Berk

#8 Updated by Berk Hess about 9 years ago

Did you compile with fortran?
In that case, could you try compiling and running without fortran?

Berk

#9 Updated by Ehud Schreiber about 9 years ago

(In reply to comment #7)

Have you just done a git pull?
Otherwise I can't understand why it asks for Ace-approximation.
I only committed this change a few hours ago.

Could you mail me your tpr file, so I can check with that directly?

Berk

Hi,

I never deal with any gits - asking our system administrator to compile the official releases is more than enough work for him.
Still, I'll ask him to try and compile without fortran (currently it's compiled with it).
In fact, currently I work with a kind of hybrid system - single precision is still 4.0.7 while double precision is 4.5.1. I hope this causes no trouble as I used only the _d version commands in the example above.
I'll also email you the tpr file.

Thanks,
Ehud.

#10 Updated by Berk Hess about 9 years ago

NEVER use fortran on x86, it doesn't make things faster (only slower).

Also double precision is almost never required, it makes things 50% slower.
But you can mix input files and binaries of single and double precision.

Berk

#11 Updated by Ehud Schreiber about 9 years ago

(In reply to comment #10)

NEVER use fortran on x86, it doesn't make things faster (only slower).

Also double precision is almost never required, it makes things 50% slower.
But you can mix input files and binaries of single and double precision.

Berk

Thanks,
we'll take the fortran advice into account.

I use double precision because I may later need to use also nm analysis, and because this allowed me not to replace the trusted 4.0.7 version.

Ehud.

#12 Updated by Berk Hess about 9 years ago

Have you tried without fortran?

Berk

#13 Updated by Ehud Schreiber about 9 years ago

(In reply to comment #12)

Have you tried without fortran?

Berk

Hi,

I just tried that, and I seem to get different numbers, but the same problem.

without -nt (running on 6 threads), I get for the initial configuration
0.000000 2673.594266 565.495718 23.237730 543.550788 178.975614 119.003468 1217.407573 3634.403116 -520.683563 -13566.313202 -5131.328493 0.000000
while with -nt 1 I get
0.000000 2673.594266 565.495718 23.237730 543.550788 -3676.757587 119.003468 1217.407573 3634.403116 -515.525066 -9561.614965 -4977.204958 0.000000
Again the GB Polarization contribution to the potential, as well as the two Short-Range ones, differ.
Note that the #Surf*SurfTen contribution is always zero. I don't really see what it means and what is its relation to the Nonpolar sol. one, but either it's irrelevant (and shouldn't be printed at all) or it might be wrong, which can cause trouble.
At least the sum of contributions equals the potential.

Also, I again use sa_algorithm=Ace-approximation in the mdp file, and see sa_algorithm=No in the mdout.mdp file. Using sa_algorithm=Ace-approx gives a fatal error in grompp_d ; my guess is that this may be the source of the trouble.

I'll also attach the new tpr file, produced by using pdb2gmx_d , grompp_d and editconf_d compiled without fortran.

Thanks Again,
Ehud.

#14 Updated by Ehud Schreiber about 9 years ago

Created an attachment (id=549)
tpr file for version compiled without fortran

#15 Updated by Ehud Schreiber about 9 years ago

(In continuation to comment #13)

I forgot to mention that the results with -nt 1 are identical for the two compilations (with and without fortran); this suggests, encouragingly, that they are correct.

#16 Updated by Berk Hess about 9 years ago

I you have valgrind installed it would help if you could run with that:
valgrind mdrun -nt 2 ...
and report the output.

I would assume that also with -nt 2 the results are different.

The Ace-approximation is not the issue. Or you get an error from grompp
or it works (and even if the setting would be wrong, you should get
the same answer independent of the number of nodes.

The surften term is the surface tension of the system (nothing to do with GB),
which apparently is zero.

Berk

#17 Updated by Per Larsson about 9 years ago

Also, could you please try without the sse-loops, to see if that helps.
Set GMX_NOOPTIMIZEDKERNELS=1, and report back.

/Per

#18 Updated by Ehud Schreiber about 9 years ago

(In reply to comment #16)

The -nt 2 results are indeed different from the -nt 1 ones. I don't know if it's interesting, but they seem very close to some previous run with 4 threads.

I ran with valgrind:

valgrind -v --leak-check=full mdrun_d -nice 0 -nt 2 -v -deffnm em1nt2sa_valgrind |& tee em1nt2sa_valgrind.stderr

and I attach the resulting em1nt2sa_valgrind.stderr file.
It seems to claim no memory errors were found, only leaks.
I hope it helps to clarify things.

Thanks,
Ehud.

#19 Updated by Ehud Schreiber about 9 years ago

Created an attachment (id=551)
stdout+stderr of mdprun_d with valgrind

#20 Updated by Ehud Schreiber about 9 years ago

(In reply to comment #17)

Also, could you please try without the sse-loops, to see if that helps.
Set GMX_NOOPTIMIZEDKERNELS=1, and report back.

/Per

Hi,
I wasn't sure how this should be done, so I tried both the shell command

set GMX_NOOPTIMIZEDKERNELS=1

before running mdrun_d, and also adding the following to the mdp file:

define = -DGMX_NOOPTIMIZEDKERNELS=1

in both cases, though, as far as I could see, the change didn't influence the output of the -nt 2 run.

#21 Updated by Per Larsson about 9 years ago

Energies were summed incorrectly in double precision SSE-loops.
The reason was that _mm128d _mm_add_sd(a,b) which should give back r0=a0+b0, r1=a1 instead gave back r1=0 when compiled with gcc (tested with 4.1.3) and -O3.

This can be circumvented by zeroing the appropriate variables before adding the potentials, and then use _mm_add_pd instead.

Also available in: Atom PDF