Task #2304
Document and propose tracking mdrun heuristics
Description
mdrun employs a number of heuristics to pick defaults at runtime, both to optimize out-of-the-box performance and to advise users about their choice of options and parameters. These heuristics need regular revision and maintenance, so we should make them easy to find and track; this lets us plan how to revise and update them as the code and external conditions change.
The aim is two-fold:
- gather the heuristics (they will be added below as they are found, with pointers to the source as of roughly v2018-b1);
- come up with a way to track them (e.g. mark them with GMX_HEURISTIC).
Heuristics gathered so far:
- Recommended number of OpenMP threads per rank
source:src/gromacs/taskassignment/resourcedivision.cpp#L115
- Minimum number of atoms per thread-MPI rank, with and without GPUs
source:src/gromacs/taskassignment/resourcedivision.cpp#L89
- Using OpenMP-only vs. MPI+OpenMP parallelization with GPUs
source:src/gromacs/taskassignment/resourcedivision.cpp#L125 and nthreads_omp_faster()
- Automatic selection of the thread-MPI rank count and of the rank/OpenMP thread division:
get_tmpi_omp_thread_division() in source:src/gromacs/taskassignment/resourcedivision.cpp#L201,
get_nthreads_mpi() in source:src/gromacs/taskassignment/resourcedivision.cpp#L335,
and the efficiency check in check_resource_division_efficiency()
- Threshold on the number of atoms per core for switching SMT on/off
source:src/gromacs/taskassignment/resourcedivision.cpp#L868
- thread pinning stride (with/without topology information)
NOTE: the comment on the case where no topology information is available is outdated (there are now two existing architectures for which it does not hold)
source:src/gromacs/mdrunutility/threadaffinity.cpp#L221
- switch to MPMD / separate PME rank mode
- Target NB list size per GPU multiprocessor (gpu_min_ci_balanced_factor) in
source:src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda_data_mgmt.cu#L78 and
source:src/gromacs/mdlib/nbnxn_ocl/nbnxn_ocl_data_mgmt.cpp#L85
- GPU API wait overhead margin (gpuWaitApiOverheadMargin)
source:src/gromacs/mdlib/sim_util.cpp#L1436
- Tabulated vs. analytical nonbonded Ewald kernels per architecture (both CPU and GPU)
- add SIMD heuristics (kernel flavor choice / CPU arch)?
- DD heuristics?
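A minimal sketch of what the GMX_HEURISTIC marker proposed above could look like, assuming C++ and a registry filled by static initializers. GMX_HEURISTIC, HeuristicInfo, HeuristicRegistrar and heuristicRegistry() are hypothetical names made up for this illustration; none of them exist in GROMACS. The two example uses only tag names taken from the list above and do not touch their values.

```cpp
#include <string>
#include <vector>

// Hypothetical: GMX_HEURISTIC is only the marker proposed in this task.
// Each use records the heuristic's name, a short note and its location.
struct HeuristicInfo
{
    std::string name;        // identifier passed to the macro
    std::string description; // free-text note, e.g. what it was tuned for
    std::string file;        // source file containing the marker
    int         line;        // line of the marker
};

// Function-local static avoids the static initialization order problem.
inline std::vector<HeuristicInfo>& heuristicRegistry()
{
    static std::vector<HeuristicInfo> registry;
    return registry;
}

// Constructing one of these appends an entry to the registry.
struct HeuristicRegistrar
{
    HeuristicRegistrar(const char* name, const char* description, const char* file, int line)
    {
        heuristicRegistry().push_back({ name, description, file, line });
    }
};

// Intended for namespace scope, next to the constant or function it marks.
#define GMX_HEURISTIC(name, description) \
    static const HeuristicRegistrar gmxHeuristic_##name(#name, description, __FILE__, __LINE__)

// Illustrative uses, tagging names from the list of heuristics.
GMX_HEURISTIC(gpu_min_ci_balanced_factor, "target NB list size per GPU multiprocessor");
GMX_HEURISTIC(gpuWaitApiOverheadMargin, "GPU API wait overhead margin");
```

Even without the registry, a fixed macro name makes every marked spot findable with a plain grep; the registry additionally lets a unit test or a debug option list all heuristics with file and line. A marker placed inside a function body would only register once that function runs, which is why namespace scope is the safer placement in this sketch.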
History
#1 Updated by Erik Lindahl over 3 years ago
- Tracker changed from Bug to Task
- Affected version deleted (git master)
#2 Updated by Szilárd Páll about 3 years ago
- Description updated (diff)
#3 Updated by Szilárd Páll about 3 years ago
It seems like custom commands may be useful; an example (though one that targets a different use case) is shown here.
#4 Updated by Szilárd Páll about 3 years ago
+ GMX_BONDED_NTHREAD_UNIFORM / init_bonded_threading()
#5 Updated by Szilárd Páll almost 3 years ago
+ c_pullMaxNumLocalAtomsSingleThreaded in pull_internal.h (currently tuned with/for Haswell / gcc 5).
#6 Updated by Mark Abraham over 2 years ago
- Target version changed from 2019 to future