identify severely GPU-bound runs and note or auto-tune task offload accordingly
There is a range of hardware/input combinations where the default eager offload of both NB and PME tasks to the GPU is not beneficial. Our limited-scope testing, which was also generally biased by our mostly high-end hardware, did not surface enough such cases to justify the effort of identifying severely GPU-limited runs and at least noting this in the log, or possibly auto-tuning the offload.
I suggest that we:
- create some heuristics and at least note in the log that the user should consider mixed-mode or CPU PME if the run is very GPU-bound
- look into auto-tuning PME task placement
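A minimal sketch of both suggestions, assuming per-step timings of CPU work and of CPU idle time spent waiting on GPU results are available. All names here (`StepTimings`, `isSeverelyGpuBound`, `PmePlacement`, `pickFastestPlacement`) and the 0.3 wait-fraction threshold are hypothetical and untuned, not existing GROMACS code. The first function flags a run as very GPU-bound when the CPU idles on the GPU for a large share of each step, which is when a log note suggesting mixed-mode PME (mdrun `-pmefft cpu`) or CPU PME (`-pme cpu`) would apply. The second picks the fastest PME placement from short timed trials, as a simple form of auto-tuning.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-step timing summary (not a GROMACS data structure).
struct StepTimings
{
    double cpuWorkMs; // CPU time spent on non-offloaded work per step
    double gpuWaitMs; // CPU time spent idling on NB+PME GPU results per step
};

// Heuristic: the run is "very GPU-bound" if the CPU spends more than
// waitFractionThreshold of each step waiting for the GPU.
// The default threshold is an illustrative guess, not a tuned value.
bool isSeverelyGpuBound(const StepTimings& t, double waitFractionThreshold = 0.3)
{
    const double stepMs = t.cpuWorkMs + t.gpuWaitMs;
    if (stepMs <= 0.0)
    {
        return false; // no usable timing data
    }
    return t.gpuWaitMs / stepMs > waitFractionThreshold;
}

// Candidate PME task placements: all on GPU, mixed (FFT on CPU), all on CPU.
enum class PmePlacement
{
    Gpu,
    Mixed,
    Cpu
};

// Result of a short timed trial run with one placement.
struct PlacementTrial
{
    PmePlacement placement;
    double       msPerStep;
};

// Auto-tuning sketch: return the placement with the lowest measured time per step.
PmePlacement pickFastestPlacement(const std::vector<PlacementTrial>& trials)
{
    assert(!trials.empty());
    const auto best = std::min_element(
            trials.begin(), trials.end(), [](const PlacementTrial& a, const PlacementTrial& b) {
                return a.msPerStep < b.msPerStep;
            });
    return best->placement;
}
```

For example, with 2 ms of CPU work and 3 ms of GPU wait per step the wait fraction is 0.6, so the run would be flagged and a note printed. A real implementation would also need to account for timing noise and for the cost of switching placements mid-run.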