Reconsider default 1 OpenMP thread per thread-MPI rank
By default we always use 1 OpenMP thread per thread-MPI rank. This is simple and performs best at small to medium rank counts. But on nodes with 128 cores this might no longer be the best choice. It might even limit the system size once we communicate the whole tpr as one chunk of memory, which we will do soon.
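
To make the alternatives concrete, here is a minimal sketch that lists the ways a 128-core node could be split into thread-MPI ranks and OpenMP threads. The helper `rank_thread_splits` and the cap of 8 threads per rank are assumptions chosen for illustration, not a proposed default; the printed commands only use the existing `-ntmpi` and `-ntomp` options of `gmx mdrun`.

```python
def rank_thread_splits(n_cores, max_threads_per_rank=None):
    """Return every (ranks, omp_threads) pair with ranks * omp_threads == n_cores.

    Hypothetical helper for illustration only; it does not reflect how
    mdrun actually picks its defaults.
    """
    splits = []
    for omp_threads in range(1, n_cores + 1):
        # Only keep splits that use all cores exactly.
        if n_cores % omp_threads != 0:
            continue
        # Optional cap on OpenMP threads per rank (assumed, not from GROMACS).
        if max_threads_per_rank is not None and omp_threads > max_threads_per_rank:
            break
        splits.append((n_cores // omp_threads, omp_threads))
    return splits


if __name__ == "__main__":
    # The first line printed corresponds to the current default layout
    # (1 OpenMP thread per rank) on a 128-core node.
    for ranks, omp_threads in rank_thread_splits(128, max_threads_per_rank=8):
        print(f"gmx mdrun -ntmpi {ranks} -ntomp {omp_threads}")
```

Which of these layouts is fastest would have to be benchmarked; the sketch only enumerates the candidates.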