Bug #2504

Updated by Mark Abraham over 1 year ago

I tried to run a standard MD simulation on a KNL (Intel Xeon Phi Knights Landing) cluster, but got the following error:

<pre>
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 25280 RUNNING AT r065c04s03
= EXIT CODE: 132
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 25280 RUNNING AT r065c04s03
= EXIT CODE: 4
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================

</pre>
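Many launchers and shells report a process killed by signal N as exit code 128 + N. Assuming the Intel MPI launcher follows that convention here, exit code 132 means signal 4, which is SIGILL (illegal instruction) on Linux. That would point at the binary executing an instruction the CPU does not support, rather than at an ordinary crash.

The helper below is a minimal, hypothetical sketch for decoding such codes. `describe_exit_code` is an illustrative name, not part of GROMACS or Intel MPI.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a process exit code, assuming the 128 + signal-number convention."""
    # Codes above 128 are conventionally "killed by signal (code - 128)".
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"killed by signal {signum} ({name})"
    # Otherwise treat it as a plain exit status.
    return f"exited with status {code}"


print(describe_exit_code(132))  # killed by signal 4 (SIGILL)
```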
The code is run as:

<pre>
mpiexec -np 32 mdrun_knl -s topol0 -nb cpu -v -maxh 23.9 -nsteps -1 >& log
</pre>

It was compiled with Intel 2017. The same thing happens with Intel 2018, and also when using FFTW instead of MKL.

This is the log:

md.2018.1.log:

<pre>
GROMACS version: 2018.1
Precision: single
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: disabled
SIMD instructions: AVX_512_KNL
FFT library: Intel MKL
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.11.0
Tracing support: disabled
Built on: 2018-05-17 11:33:17
Built by: ccamillo@r000u06l01 [CMAKE]
Build OS/arch: Linux 3.10.0-327.36.3.el7.x86_64 x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Build CPU family: 6 Model: 79 Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /cineca/prod/opt/compilers/intel/pe-xe-2017/binary/bin/icc Intel 17.0.4.20170411
C compiler flags: -xMIC-AVX512 -mkl=sequential -std=gnu99 -O3 -DNDEBUG -ip -funroll-all-loops -alias-const -ansi-alias -no-prec-div -fimf-domain-exclusion=14 -qoverride-limits
C++ compiler: /cineca/prod/opt/compilers/intel/pe-xe-2017/binary/bin/icpc Intel 17.0.4.20170411
C++ compiler flags: -xMIC-AVX512 -mkl=sequential -std=c++11 -O3 -DNDEBUG -ip -funroll-all-loops -alias-const -ansi-alias -no-prec-div -fimf-domain-exclusion=14 -qoverride-limits

</pre>
(The log stops here, before the hardware-detection section that the 2016.5 log below prints at this point.)

The same TPR file, with the same setup on the same cluster, runs fine with GROMACS 2016.5 compiled in the same way.
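The two builds are described as compiled in the same way, but the flag lines printed in the 2018.1 log above and the 2016.5 log that follows are not identical. Compared with 2016.5, the 2018.1 C and C++ lines add `-no-prec-div -fimf-domain-exclusion=14 -qoverride-limits`. The C++ line also uses `-std=c++11` where 2016.5 used `-std=c++0x`. Whether any of these differences relates to the crash is not established by this report.

The small sketch below diffs two flag strings copied from the logs. `flag_diff` is a hypothetical helper.

```python
from typing import List, Tuple

# "C compiler flags" lines copied verbatim from the two logs.
flags_2018 = ("-xMIC-AVX512 -mkl=sequential -std=gnu99 -O3 -DNDEBUG -ip "
              "-funroll-all-loops -alias-const -ansi-alias -no-prec-div "
              "-fimf-domain-exclusion=14 -qoverride-limits")
flags_2016 = ("-xMIC-AVX512 -mkl=sequential -std=gnu99 -O3 -DNDEBUG -ip "
              "-funroll-all-loops -alias-const -ansi-alias")


def flag_diff(new: str, old: str) -> Tuple[List[str], List[str]]:
    """Return (flags only in new, flags only in old), keeping their original order."""
    new_flags, old_flags = new.split(), old.split()
    added = [f for f in new_flags if f not in old_flags]
    removed = [f for f in old_flags if f not in new_flags]
    return added, removed


added, removed = flag_diff(flags_2018, flags_2016)
print("only in 2018.1:", added)    # ['-no-prec-div', '-fimf-domain-exclusion=14', '-qoverride-limits']
print("only in 2016.5:", removed)  # []
```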

md.2016.5.log:
<pre>
GROMACS version: 2016.5
Precision: single
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support: disabled
SIMD instructions: AVX_512_KNL
FFT library: Intel MKL
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.11.0
Tracing support: disabled
Built on: Thu May 17 12:17:27 CEST 2018
Built by: ccamillo@r000u06l01 [CMAKE]
Build OS/arch: Linux 3.10.0-327.36.3.el7.x86_64 x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Build CPU family: 6 Model: 79 Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /cineca/prod/opt/compilers/intel/pe-xe-2017/binary/bin/icc Intel 17.0.4.20170411
C compiler flags: -xMIC-AVX512 -mkl=sequential -std=gnu99 -O3 -DNDEBUG -ip -funroll-all-loops -alias-const -ansi-alias
C++ compiler: /cineca/prod/opt/compilers/intel/pe-xe-2017/binary/bin/icpc Intel 17.0.4.20170411
C++ compiler flags: -xMIC-AVX512 -mkl=sequential -std=c++0x -O3 -DNDEBUG -ip -funroll-all-loops -alias-const -ansi-alias

Running on 1 node with total 68 cores, 272 logical cores
Hardware detected on host r065c06s01 (the node of MPI rank 0):
CPU info:
Vendor: Intel
Brand: Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz
Family: 6 Model: 87 Stepping: 1
Features: aes apic avx avx2 avx512f avx512pf avx512er avx512cd clfsh cmov cx8 cx16 f16c fma htt lahf mmx msr nonstop_tsc pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
SIMD instructions most likely to fit this hardware: AVX_512_KNL
SIMD instructions selected at GROMACS compile time: AVX_512_KNL

Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0 68 136 204] [ 1 69 137 205] [ 2 70 138 206] [ 3 71 139 207] [ 4 72 140 208] [ 5 73 141 209] [ 6 74 142 210] [ 7 75 143 211] [ 8 76 144 212] [ 9 77 145 213] [ 10 78 146 214] [ 11 79 147 215] [ 12 80 148 216] [ 13 81 149 217] [ 14 82 150 218] [ 15 83 151 219] [ 16 84 152 220] [ 17 85 153 221] [ 18 86 154 222] [ 19 87 155 223] [ 20 88 156 224] [ 21 89 157 225] [ 22 90 158 226] [ 23 91 159 227] [ 24 92 160 228] [ 25 93 161 229] [ 26 94 162 230] [ 27 95 163 231] [ 28 96 164 232] [ 29 97 165 233] [ 30 98 166 234] [ 31 99 167 235] [ 32 100 168 236] [ 33 101 169 237] [ 34 102 170 238] [ 35 103 171 239] [ 36 104 172 240] [ 37 105 173 241] [ 38 106 174 242] [ 39 107 175 243] [ 40 108 176 244] [ 41 109 177 245] [ 42 110 178 246] [ 43 111 179 247] [ 44 112 180 248] [ 45 113 181 249] [ 46 114 182 250] [ 47 115 183 251] [ 48 116 184 252] [ 49 117 185 253] [ 50 118 186 254] [ 51 119 187 255] [ 52 120 188 256] [ 53 121 189 257] [ 54 122 190 258] [ 55 123 191 259] [ 56 124 192 260] [ 57 125 193 261] [ 58 126 194 262] [ 59 127 195 263] [ 60 128 196 264] [ 61 129 197 265] [ 62 130 198 266] [ 63 131 199 267] [ 64 132 200 268] [ 65 133 201 269] [ 66 134 202 270] [ 67 135 203 271]
</pre>
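Both binaries are built with `-xMIC-AVX512`. Intel documents this option as enabling the AVX-512 Foundation, Conflict Detection, Exponential and Reciprocal, and Prefetch subsets. Exit code 132 is consistent with an illegal instruction, assuming the usual 128 + signal-number convention (signal 4 is SIGILL on Linux). So one hypothesis worth ruling out is that the failing rank ran on a host lacking one of those subsets. This is an assumption, not something the report establishes.

Note also that the failing 2018.1 rank ran on `r065c04s03`, while the 2016.5 log describes `r065c06s01`. The sketch below checks a features line for the four subsets. `missing_knl_features` is a hypothetical helper.

```python
# AVX-512 subsets that -xMIC-AVX512 targets (F, CD, ER, PF),
# spelled as GROMACS prints them in its "Features:" line.
REQUIRED = {"avx512f", "avx512cd", "avx512er", "avx512pf"}


def missing_knl_features(features_line: str) -> set:
    """Return the required AVX-512 subsets that are absent from a features line."""
    # Drop an optional "Label:" prefix, then split the rest on whitespace.
    tokens = features_line.split(":", 1)[-1].split()
    return REQUIRED - set(tokens)


# Features line of the KNL node from the 2016.5 log.
knl_line = ("Features: aes apic avx avx2 avx512f avx512pf avx512er avx512cd clfsh cmov "
            "cx8 cx16 f16c fma htt lahf mmx msr nonstop_tsc pclmuldq pdcm pdpe1gb "
            "popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic")
# Excerpt of the Broadwell build host's features line from the 2018.1 log.
bdw_excerpt = "Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma"

print(sorted(missing_knl_features(knl_line)))     # []
print(sorted(missing_knl_features(bdw_excerpt)))  # ['avx512cd', 'avx512er', 'avx512f', 'avx512pf']
```

On a compute node, the same check could be applied to the `flags` line of `/proc/cpuinfo`, where the Linux kernel uses the same names for these four subsets.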
