Feature #1688

g_cluster "middle" is not exactly the same as the definition in the cited paper

Added by Chris Neale almost 6 years ago. Updated almost 6 years ago.


Daura et al. (Angewandte Chemie, 1999, 38:236-240) say in their methods section: "The structure with the highest number of neighbors was taken as the center of a cluster".

However, lines 1110-1137 of gmx_cluster.c (version 4.6.7) redefine the cluster "middle" as the conformation with the lowest average RMSD to all other members of the cluster. As a result, one can end up with a cluster whose "middle" has an RMSD to some members of the cluster that is greater than the clustering cutoff (confirmed with g_rms).
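To make the "middle" definition concrete, here is a minimal sketch in C. It is not the code from gmx_cluster.c: the function names, the row-major RMSD matrix layout, and the member index list are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not GROMACS code.
 * rmsd is an n x n row-major matrix of pairwise RMSD values and
 * members lists the structure indices that belong to one cluster.
 * Returns the member with the lowest summed RMSD to the other members.
 * Every member is compared against the same number of others, so the
 * lowest sum is also the lowest average. Ties keep the earlier member. */
int middle_by_mean_rmsd(const double *rmsd, int n,
                        const int *members, int nmem)
{
    int    best     = -1;
    double best_sum = 0.0;

    assert(rmsd != NULL && members != NULL && nmem > 0);
    for (int a = 0; a < nmem; a++) {
        double sum = 0.0;
        for (int b = 0; b < nmem; b++) {
            if (b != a) {
                sum += rmsd[members[a] * n + members[b]];
            }
        }
        if (best < 0 || sum < best_sum) {
            best     = members[a];
            best_sum = sum;
        }
    }
    return best;
}

/* Hypothetical helper: largest RMSD from structure s to any cluster member.
 * Comparing the result with the clustering cutoff shows whether s lies
 * within the cutoff of every member. */
double max_rmsd_to_members(const double *rmsd, int n, int s,
                           const int *members, int nmem)
{
    double worst = 0.0;

    assert(rmsd != NULL && members != NULL);
    for (int b = 0; b < nmem; b++) {
        double d = rmsd[s * n + members[b]];
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```

As a made-up toy case, take five structures whose pairwise "RMSD" is the distance between the points 0, 0.7, 0.8, 0.9 and -0.9 on a line, with a cutoff of 1.0. The structure at 0.7 has the lowest average distance to the others, but it is 1.6 away from the structure at -0.9, which is above the cutoff.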

When I add some extra output to gmx_cluster.c around line 659 (version 4.6.7), printing the value of the variable k for each conformation, I can identify the conformation that has the most neighbors, i.e. other members of the total input with an RMSD less than the cutoff. If I do this on a single cluster identified by an initial run of g_cluster, then I can identify what Daura et al. seem to have used as the cluster center (and I confirmed it using g_rms).
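The neighbor-count criterion described above can be sketched as follows. This is an illustrative sketch only, not the GROMACS implementation: the function names, the row-major RMSD matrix, and the member index list are hypothetical, and the counter here merely plays the role that the variable k appears to play in the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not GROMACS code.
 * Counts the members (other than s itself) whose RMSD to structure s
 * is strictly less than the cutoff. rmsd is an n x n row-major matrix. */
int count_neighbors(const double *rmsd, int n, int s,
                    const int *members, int nmem, double cutoff)
{
    int count = 0;

    assert(rmsd != NULL && members != NULL);
    for (int b = 0; b < nmem; b++) {
        if (members[b] != s && rmsd[s * n + members[b]] < cutoff) {
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: returns the member with the highest number of
 * neighbors, following the sentence quoted from Daura et al.
 * Ties keep the earlier member; that tie-break is an assumption. */
int center_by_neighbor_count(const double *rmsd, int n,
                             const int *members, int nmem, double cutoff)
{
    int best       = -1;
    int best_count = -1;

    assert(nmem > 0);
    for (int a = 0; a < nmem; a++) {
        int c = count_neighbors(rmsd, n, members[a], members, nmem, cutoff);
        if (c > best_count) {
            best       = members[a];
            best_count = c;
        }
    }
    return best;
}
```

Passing the indices of a single cluster as members restricts the count to that cluster, while passing all input structures gives the count over the total input. The strict less-than comparison follows the wording "RMSD less than the cutoff" used above.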

In light of the above, I suggest adding this identifier to the output for each cluster in the .log file so that one can use the current "middle" or Daura's definition of the "center". An alternative would be to make it clear that the definition of the cluster "middle" in g_cluster is not the same as the definition of the cluster "center" in Daura et al.

Please note that I can see that there are cases in which the current output is superior to the Daura definition. Nevertheless, the program is not currently as advertised and this is what I think could use some improvement.

Thank you,


#1 Updated by Chris Neale almost 6 years ago

I should clarify that this relates to the "gromos" method in g_cluster and I didn't read up on the published definitions of the other clustering methods.

#2 Updated by João M. Damas almost 6 years ago

I agree with Chris. My suggestion would be to output Daura's "center of cluster" to stderr as a warning when the gromos method is used. This solution does not require much rewriting of the code.
