Bug #2980
taskassignment fails in unit tests when GPUs and a custom number of ranks are used
Status:
New
Priority:
Normal
Assignee:
-
Category:
mdrun
Target version:
-
Affected version - extra info:
master
Affected version:
Difficulty:
uncategorized
Description
$ bin/mdrun-test -ntmpi 3
[...]
Opened /home/pszilard/projects/gromacs/gromacs-master/build_gcc73_cuda92/src/programs/mdrun/tests/Testing/Temporary/PmeTest_ReproducesEnergies_spc-and-methanol_PmeOnCpuTune.edr as single precision energy file
Last energy frame read 20 time 0.020
Reading file /home/pszilard/projects/gromacs/gromacs-master/build_gcc73_cuda92/src/programs/mdrun/tests/Testing/Temporary/PmeTest_ReproducesEnergies.tpr, VERSION 2020-dev-20190606-ec71536 (single precision)
Can not increase nstlist because an NVE ensemble is used
Using 3 MPI threads
Using 1 OpenMP thread per tMPI thread
-------------------------------------------------------
Program: mdrun-test, version 2020-dev-20190606-ec71536
Source file: src/gromacs/taskassignment/taskassignment.cpp (line 256)
Function: std::vector<std::vector<gmx::GpuTaskMapping> >::value_type gmx::runTaskAssignment(const std::vector<int>&, const std::vector<int>&, const gmx_hw_info_t&, const gmx::MDLogger&, const t_commrec*, const gmx_multisim_t*, const gmx::PhysicalNodeCommunicator&, const std::vector<gmx::GpuTask>&, bool, PmeRunMode)
MPI rank: 0 (out of 3)

Inconsistency in user input:
There were 3 GPU tasks found on node racoon, but 2 GPUs were available. If the GPUs are equivalent, then it is usually best to have a number of tasks that is a multiple of the number of GPUs. You should reconsider your GPU task assignment, number of ranks, or your use of the -nb, -pme, and -npme options, perhaps after measuring the performance you can get.

For more information and tips for troubleshooting, please check the GROMACS website at http://www.gromacs.org/Documentation/Errors
The above cannot run without passing -gputasks. Luckily, the GMX_GPUTASKS environment variable can also be used with the unit tests (which do not have the command-line option), but that in turn requires passing -nb/-pme, which cannot be done with the test binary:
$ GMX_GPUTASKS="001" bin/mdrun-test -ntmpi 3
[...]
This run will generate roughly 0 Mb of data
There were 3 notes
Reading file /home/pszilard/projects/gromacs/gromacs-master/build_gcc73_cuda92/src/programs/mdrun/tests/Testing/Temporary/PmeTest_ReproducesEnergies.tpr, VERSION 2020-dev-20190606-ec71536 (single precision)
-------------------------------------------------------
Program: mdrun-test, version 2020-dev-20190606-ec71536
Source file: src/gromacs/taskassignment/decidegpuusage.cpp (line 132)
Function: bool gmx::decideWhetherToUseGpusForNonbondedWithThreadMpi(gmx::TaskTarget, const std::vector<int>&, const std::vector<int>&, gmx::EmulateGpuNonbonded, bool, bool, bool, int)

Inconsistency in user input:
When you use mdrun -gputasks, -nb and -ntmpi must be set to non-default values, so that the device IDs can be interpreted correctly. If you simply want to restrict which GPUs are used, then it is better to use mdrun -gpu_id. Otherwise, setting the CUDA_VISIBLE_DEVICES environment variable in your bash profile or job script may be more convenient.

For more information and tips for troubleshooting, please check the GROMACS website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
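
For reference, the intent of GMX_GPUTASKS="001" above is one device ID per GPU task on the node: the first two tasks on GPU 0 and the third on GPU 1, which would fit 3 ranks onto the 2 available GPUs. A minimal sketch of that reading follows. It is not GROMACS code, the helper names are hypothetical, and it only handles the single-digit form used here:

```python
from collections import Counter


def parse_gpu_tasks(task_string):
    """Hypothetical helper: read one device ID per GPU task, one digit each."""
    if not (task_string.isascii() and task_string.isdigit()):
        raise ValueError(f"expected a string of digits, got {task_string!r}")
    return [int(ch) for ch in task_string]


def tasks_per_device(task_string):
    """Count how many GPU tasks land on each device ID."""
    return dict(Counter(parse_gpu_tasks(task_string)))


if __name__ == "__main__":
    # The value from the command above: three GPU tasks on two devices.
    print(parse_gpu_tasks("001"))
    print(tasks_per_device("001"))
```

With this mapping each of the 3 thread-MPI ranks would get a device, so the failure reported above comes from the missing -nb setting, not from the mapping itself.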