Better parallel random number generator
The random number generators currently used are correlated between parallel processes (each process is seeded with the input seed plus its rank) and are not reproducible, which makes testing and debugging unnecessarily hard. The code doesn't say which algorithm is used, so I'm not sure whether the algorithm itself is free of correlations. http://www.deshawresearch.com/resources_random123.html is a small BSD-licensed API with a couple of PRNGs which are fast, don't fail any statistical tests, and are reproducible. We should consider replacing the current PRNG with one of those. The paper http://dl.acm.org/citation.cfm?doid=2063405 has the details, including performance numbers, and claims it is the fastest PRNG which doesn't fail any of the statistical tests in TestU01 (as far as I know the gold standard for RNGs) or the additional tests for parallel RNGs.
Replace all mdrun rngs with cycle based rng
The stateful random number generator (rng) used previously doesn't
produce reproducible results in parallel for sd/bd and doesn't
produce reproducible results on continuation for replica exchange.
The rng state has been removed from the checkpoint file.
#2 Updated by Roland Schulz over 6 years ago
The question is which version it should go into. Is it a bugfix or a feature? The reason we might want to categorize it as a bugfix is that the current approach of seeding with seed+mpi_rank doesn't give statistically independent random numbers. Whether that is a problem for the algorithms we use, I don't know.
#3 Updated by Alexey Shvetsov over 6 years ago
Another idea may be to use the following approach:
1. use a first rng to generate seeds for the mpi/openmp processes
2. init each mpi/openmp process with its seed
I use that for MC in nsfactor.c http://redmine.gromacs.org/projects/gromacs/repository/revisions/master/entry/src/tools/nsfactor.c lines 200-235
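The two steps above can be sketched roughly as follows. This is only an illustration and not the code in nsfactor.c: the helper names `deriveSeeds` and `makeWorkerRng` are hypothetical, and `std::mt19937_64` is an assumed stand-in for whatever generator is actually used. Every rank would run the same seeding step with the same master seed, so all ranks agree on the seed list without communication.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Step 1 (hypothetical helper): use a first RNG, seeded with the user's
// master seed, to draw one seed per MPI rank / OpenMP thread.
std::vector<std::uint64_t> deriveSeeds(std::uint64_t masterSeed, int numWorkers)
{
    assert(numWorkers > 0);
    std::mt19937_64            seeder(masterSeed);
    std::vector<std::uint64_t> seeds(static_cast<std::size_t>(numWorkers));
    for (auto& s : seeds)
    {
        s = seeder(); // consecutive outputs of the first RNG become the seeds
    }
    return seeds;
}

// Step 2 (hypothetical helper): each worker initializes its own generator
// from the seed that belongs to its rank.
std::mt19937_64 makeWorkerRng(const std::vector<std::uint64_t>& seeds, int rank)
{
    return std::mt19937_64(seeds.at(static_cast<std::size_t>(rank)));
}
```

This avoids the closely related seeds produced by seed+rank, but it does not by itself prove that the per-worker streams are statistically independent, and results would still depend on the number of workers.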
#7 Updated by Roland Schulz about 5 years ago
- ARS-7: 2.2x faster. Requires hardware support (should be available on all modern CPUs) and has no support for GPUs (not sure this is important). If we choose this we need a fall-back, and the regression tests wouldn't match. Not sure how to deal with that. Otherwise I would recommend ARS-7.
- Threefry-2×64-13: 1.24x faster. Smaller number of rounds. Still Crush-resistant, so it should be fine, but the paper recommends having a few extra rounds as a safety margin. Given that the performance difference is small, that's probably not a bad idea.
- Threefry-4×64-20: 1.25x higher throughput. Returns four 8-byte random numbers. Given that we usually have no use for all four, the effective speed is lower.
Further, I suggest using the atom number and the step number as the 2 counter inputs, and the seed from the mdp file and a hardcoded constant in the code as the 2 keys. Any better suggestions?
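To make the counter/key idea concrete, here is a minimal sketch of a stateless generator that takes (atom, step) as the counter and (mdp seed, hardcoded constant) as the key. Note that this is not Threefry and does not use the Random123 API: the mixing function is the splitmix64 finalizer, swapped in only to keep the example short, and the names `counterRandom`, `uniform01` and `c_hardcodedKey` are made up for illustration. The point is only that the output depends on nothing but its inputs, so it is the same on any rank and after any restart.

```cpp
#include <cstdint>

// Stand-in mixing function (splitmix64 finalizer), NOT Threefry.
// Each operation is invertible, so distinct inputs give distinct outputs.
inline std::uint64_t mix64(std::uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Hypothetical hardcoded second key word; the value is arbitrary.
constexpr std::uint64_t c_hardcodedKey = 0x9e3779b97f4a7c15ULL;

// Counter = (atom, step), key = (mdpSeed, c_hardcodedKey).
// No state is kept between calls.
inline std::uint64_t counterRandom(std::uint64_t atom, std::uint64_t step, std::uint64_t mdpSeed)
{
    std::uint64_t h = mix64(mdpSeed + c_hardcodedKey); // fold in the key
    h               = mix64(h ^ atom);                 // fold in counter word 1
    h               = mix64(h ^ step);                 // fold in counter word 2
    return h;
}

// Map the top 53 bits to a double in [0, 1).
inline double uniform01(std::uint64_t bits)
{
    return static_cast<double>(bits >> 11) * (1.0 / 9007199254740992.0);
}
```

With this shape, the random number for a given atom at a given step can be recomputed by whichever rank owns that atom, and nothing has to be stored in the checkpoint file. If more than one random number is needed per atom and step, an extra counter word or part of the key would have to distinguish them.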