PP-PME load balancing discards the fastest tested setting
Because DD load balancing and PP-PME load balancing influence each other, mdrun can end up discarding the fastest of the settings it has tried: DD load balancing shrinks the domains so much that the cut-off of that fast setting becomes too long for the cell size.
This already happens at a moderate level of parallelization; e.g. the attached log files were obtained on 1-2 nodes of a Cray XK7 (16-core Bulldozer + K20X) running a 134k-atom system.
A proper solution would be to address the DD load imbalance itself, but this requires substantial effort and will therefore have to wait until after 4.6. In the meantime, the eager nature of DD load balancing causes two problems. First, it can prevent the use of a setting that has already been tested and found to be faster, which a user could notice (although that is quite unlikely). Second, if the domains shrink fast enough, such a cut-off might never get tested by the PP-PME load balancing at all.