Task #3445

Task #2792: Improvement of PME gather and spread CUDA kernels

create heuristic for c_skipNeutralAtoms

Added by Szilárd Páll 2 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Category: -
Target version: -
Difficulty: uncategorized

Description

Skipping zero-charge atoms can have a significant performance benefit (measured at ~10-15% of spread+gather time a while ago, before the new spline recompute). The initial implementation hardcoded c_skipNeutralAtoms=false.
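To make the idea concrete, here is a minimal host-side sketch of what such a compile-time switch does. It is not the actual kernel code: the function name `countProcessedAtoms` and the template parameter are hypothetical, and the loop body is only a placeholder for the real per-atom spline and spread work.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration: count how many atoms a spread/gather pass
// would actually process, with and without skipping neutral atoms.
// The template parameter plays the role of a compile-time switch such as
// c_skipNeutralAtoms, so the check costs nothing when it is disabled.
template <bool skipNeutralAtoms>
std::size_t countProcessedAtoms(const std::vector<float>& charges)
{
    std::size_t processed = 0;
    for (float q : charges)
    {
        if (skipNeutralAtoms && q == 0.0f)
        {
            // A zero charge contributes nothing to the charge grid,
            // so its per-atom work can be skipped entirely.
            continue;
        }
        ++processed; // placeholder for the per-atom spline + spread work
    }
    return processed;
}
```

With charges {0.5, 0, -0.5, 0}, the skipping variant processes 2 atoms and the non-skipping variant processes all 4. On a GPU the real saving also depends on how the skipped atoms are distributed across threads, which this sketch does not model.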

History

#1 Updated by Szilárd Páll 2 months ago

  • Description updated

#2 Updated by Jonathan Vincent 2 months ago

  • Assignee set to Jonathan Vincent

So what is a good set of inputs to test this on?

We have previously been using water boxes, which will not work here as they contain no zero-charge atoms.

Would ADH and similar work?

What we have right now are

  • villin -> 5,032 atoms
  • rnase dodec -> 16,816 atoms
  • JAC (DHFR) -> 23,558 atoms
  • ADH dodec -> 95,561 atoms
  • Prace A -> 141,677 atoms
  • Cellulose -> 408,609 atoms
  • STMV -> 1,066,628 atoms

Do you have a feeling for which properties of the input will affect the performance difference between skipping the neutral atoms and not skipping them?

#3 Updated by Szilárd Páll 2 months ago

I don't think we need to test all of those inputs. We should think about which conditions have an impact on performance and pick accordingly.

For this parameter I think the aspects that matter are:
- input size / size relative to hardware "size" and uarch
- fraction of zero charge atoms
- atom order (DD sort?)

Have I missed something?
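
As a starting point for discussion, the first two aspects above could be folded into a heuristic along the following lines. This is only a sketch under stated assumptions: `neutralFraction`, `SkipHeuristicParams`, and `shouldSkipNeutralAtoms` are hypothetical names, the thresholds are deliberately left as parameters because no values have been measured yet, and atom order and hardware are not modelled at all.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Fraction of atoms whose charge is exactly zero (0 for an empty input).
double neutralFraction(const std::vector<float>& charges)
{
    if (charges.empty())
    {
        return 0.0;
    }
    const auto numNeutral = std::count(charges.begin(), charges.end(), 0.0f);
    return static_cast<double>(numNeutral) / static_cast<double>(charges.size());
}

// Hypothetical tuning knobs; real values would have to come from benchmarks
// on the target hardware, none are proposed here.
struct SkipHeuristicParams
{
    double      minNeutralFraction; // skip only if at least this fraction is neutral
    std::size_t minAtomCount;       // skip only for inputs at least this large
};

// Returns true if skipping neutral atoms looks worthwhile for this input.
bool shouldSkipNeutralAtoms(const std::vector<float>& charges,
                            const SkipHeuristicParams& params)
{
    return charges.size() >= params.minAtomCount
           && neutralFraction(charges) >= params.minNeutralFraction;
}
```

Such a check would run once on the host when the charges are set up. Since the current switch is a compile-time constant, using a runtime decision would presumably also mean building both kernel variants and selecting between them at launch.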
