Project

General

Profile

Bug #1001

REMD statistics not collected properly any more

Added by Mark Abraham over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Category:
mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

Michael's large squashed commit c7a82654 "Merging in free energy, exp. ensemble, & andersen t-control to 4.6" makes some changes to the organization of REMD which I didn't anticipate from the reference in the commit message to "expanded ensemble". Had it been mentioned, I'd have known I wanted to review them. The corresponding updates to the manual look good, though :-) In future, I think we need squashed multi-feature commit messages to be more fully descriptive of what changes have occurred, please. Are there any more changes introduced in this commit that we need to be aware of?

One side effect of this commit is that some of the book-keeping about the REMD exchange rate is no longer correct, since many exchanges did occur in a test REMD simulation I ran with release-4-6:

Replica exchange statistics
Repl 599 attempts, 300 odd, 299 even
Repl 0 1 2 3
Repl .16 .14 .18
Repl number of exchanges:
Repl 0 1 2 3
Repl 0 0 0
Repl average number of exchanges:
Repl 0 1 2 3
Repl .00 .00 .00

Empirical Transition Matrix
1 2 3 4
0.9265 0.0735 0.0000 0.0000 0
0.0735 0.8531 0.0735 0.0000 1
0.0000 0.0735 0.8297 0.0968 2
0.0000 0.0000 0.0968 0.9032 3

This should be easy for someone to fix, but I'm worried about what might be lots of other hidden effects, which can't be tested for if we don't really know what's in the feature.

bug_1001.tgz (248 KB) bug_1001.tgz Mark Abraham, 09/10/2012 03:49 PM
bug_1001_input.tgz (286 KB) bug_1001_input.tgz Mark Abraham, 09/11/2012 05:56 AM

Associated revisions

Revision 6a9499a9 (diff)
Added by Michael Shirts over 6 years ago

Fixing issues with replica exchange

Avoids unnecessary communication, clarifies variable names, restores
exchange statistics removed in c7a82654, fixes memory leaks, and some
other minor house-keeping noticed by Mark Abraham.

Fixes #1001

Change-Id: I8a9032e5946117d87f672e16cba525c5ed9720f9

History

#1 Updated by Michael Shirts over 6 years ago

OK, let me take a look at it. Can you append files so I'm running the same thing you are?

#2 Updated by Mark Abraham over 6 years ago

Further, in replica_exchange()

get_replica_exchange(fplog,ms,re,enerd,det(state->box),step,time,exchanges,KE_scale_factors);
bExchanged = (exchanges[re->repl] != re->nrepl); /* only mark as exchanged if it has a shuffled index
*/

is wrong for any kind of REMD, including the "expanded ensemble" feature Michael was introducing, because get_replica_exchange() sets exchanges[i] to re->ind[i] which is evidently != re->nrepl so every replica thinks it exchanges every step.

It looks to me like Michael hasn't included the full history of this development in the squashed commit. Michael?

#3 Updated by Mark Abraham over 6 years ago

.tpr files, FWIW

#4 Updated by Michael Shirts over 6 years ago

Agreed -- history was lost in the squash. Hopefully now that things are being integrated better, and I know the versioning system better, this will not happen again.

For the statistics -- clearly, something is missing. I'll look into that.

For the error Mark caught -- it actually is a bug, but doesn't do what Mark thinks it does here -- it marks for a possible exchange, but the actual exchange is done later. So replica exchange is actually done correctly -- but more communication is done that doesn't need to be, so this should be fixed.

I've got the systems up and working, so should be able to post a fix in the next day or two.

Note that this code does not do expanded ensemble -- it does Temperature + Hamiltonian exchange, as well as allowing for multiple swaps each time that the system pauses to swap. Expanded ensemble is done elsewhere.

#5 Updated by Mark Abraham over 6 years ago

  • Assignee changed from Berk Hess to Michael Shirts

Michael Shirts wrote:

Agreed -- history was lost in the squash. Hopefully now that things are being integrated better, and I know the versioning system better, this will not happen again.

Great :-)

For the statistics -- clearly, something is missing. I'll look into that.

For the error Mark caught -- it actually is a bug, but doesn't do what Mark thinks it does here -- it marks for a possible exchange, but the actual exchange is done later. So replica exchange is actually done correctly -- but more communication is done that doesn't need to be, so this should be fixed.

Agreed, there is further logic later in replica_exchange that does excess communication before finally choosing to do an exchange. I assumed bExchanged still meant what its name suggests, and I suppose that it will when the code gets fixed.

I've got the systems up and working, so should be able to post a fix in the next day or two.

Note that this code does not do expanded ensemble -- it does Temperature + Hamiltonian exchange, as well as allowing for multiple swaps each time that the system pauses to swap. Expanded ensemble is done elsewhere.

OK. Life is hard when one doesn't know the algorithm and the commit message is terse :-)

#6 Updated by Michael Shirts over 6 years ago

OK, I've got a fix. I can't patch right now, because for some reason 'git pull --rebase origin' keeps pausing at random places before completing. I'll try again in the morning.

#7 Updated by Michael Shirts over 6 years ago

Also, can you post grompp input files as well? Easier than regenerating them from the gmxdump so I can make some modifications.

#8 Updated by Mark Abraham over 6 years ago

They're 4.6-generation .cpt files, so should be OK for you to use.

#9 Updated by Erik Lindahl over 6 years ago

  • Status changed from New to Closed

Also available in: Atom PDF