Additional utilities

This module defines miscellaneous utility functions that are public to users.

calcTree(names, distance_matrix, method='upgma', linkage=False)

Given a distance matrix, create and return a tree structure.

Parameters:
  • names (list, ndarray) – a list of names
  • distance_matrix (ndarray) – a square matrix with one row and column per ensemble member. An error is raised if its size does not match the number of names.
  • method (str) – method used for constructing the tree. Acceptable options are "upgma", "nj", or methods supported by linkage() such as "single", "average", "ward", etc. Default is "upgma".
  • linkage (bool) – whether the linkage matrix is returned. Note that NJ trees do not support linkage.
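
As a rough illustration of what the "upgma" option computes, the sketch below calls SciPy's linkage() directly instead of calcTree(); UPGMA corresponds to SciPy's "average" method. The names and distance values are made up for the example, and the size check only mirrors the error described above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Illustrative inputs (not taken from the library documentation).
names = ["A", "B", "C", "D"]
distance_matrix = np.array([
    [0.0, 0.2, 0.7, 0.8],
    [0.2, 0.0, 0.6, 0.9],
    [0.7, 0.6, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
])

# The matrix must be square with one row per name.
n = len(names)
if distance_matrix.shape != (n, n):
    raise ValueError("distance_matrix must be %d-by-%d" % (n, n))

# SciPy expects the condensed (upper-triangle) form of the distances.
condensed = squareform(distance_matrix, checks=True)

# method="average" is UPGMA; each row of Z records one merge:
# (cluster i, cluster j, merge distance, size of the new cluster).
Z = linkage(condensed, method="average")
```

Here Z has one row per merge, so a set of N names always yields N - 1 rows.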
clusterMatrix(distance_matrix=None, similarity_matrix=None, labels=None, return_linkage=None, **kwargs)

Cluster a distance matrix using scipy.cluster.hierarchy and return the sorted matrix, indices used for sorting, sorted labels (if labels are passed), and linkage matrix (if return_linkage is True).

Parameters:
  • distance_matrix (ndarray) – an N-by-N matrix containing some measure of distance, such as 1. - seqid_matrix (Hamming distance), RMSDs, or distances in PCA space
  • similarity_matrix (ndarray) – an N-by-N matrix containing some measure of similarity, such as sequence identity, mode-mode overlap, or spectral overlap. Each element will be subtracted from 1. to get a distance, so make sure this conversion is meaningful for the chosen measure.
  • labels (list) – labels for each matrix row that can be returned sorted
  • no_plot (bool) – if True, do not plot the dendrogram. Default is True.
  • reversed (bool) – if True, the sorting indices will be reversed.

Other arguments for linkage() and dendrogram() can also be provided and will be taken as kwargs.
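
The sorting described above can be sketched with SciPy alone. This is a minimal illustration under assumed inputs (the similarity values, labels, and the choice of "average" linkage are invented for the example), not the body of clusterMatrix().

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# Illustrative similarity matrix and labels.
similarity_matrix = np.array([
    [1.0, 0.3, 0.9, 0.2],
    [0.3, 1.0, 0.4, 0.8],
    [0.9, 0.4, 1.0, 0.1],
    [0.2, 0.8, 0.1, 1.0],
])
labels = ["s0", "s1", "s2", "s3"]

# Similarity is converted to distance by subtracting from 1.
distance_matrix = 1.0 - similarity_matrix

# Build the linkage matrix, then read the leaf order without plotting.
Z = linkage(squareform(distance_matrix), method="average")
leaves = dendrogram(Z, no_plot=True)["leaves"]

# Apply the same ordering to rows, columns, and labels.
sorted_matrix = distance_matrix[np.ix_(leaves, leaves)]
sorted_labels = [labels[i] for i in leaves]
```

The leaf order places strongly similar members (here s0 and s2) next to each other, which is what makes blocks visible when the sorted matrix is displayed.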

showLines(*args, **kwargs)

Show 1-D data using plot().

Parameters:
  • x (ndarray) – (optional) x coordinates. x can be a 1-D array or a 2-D matrix of column vectors.
  • y (ndarray) – data array. y can be a 1-D array or a 2-D matrix of column vectors.
  • dy (ndarray) – an array of variances of y which will be plotted as a band along y. It should have the same shape as y.
  • lower (ndarray) – an array of lower bounds which will be plotted as a band along y. It should have the same shape as y and should be paired with upper.
  • upper (ndarray) – an array of upper bounds which will be plotted as a band along y. It should have the same shape as y and should be paired with lower.
  • alpha (float) – the transparency of the band(s) for plotting dy.
  • beta (float) – the transparency of the band(s) for plotting lower and upper.
  • ticklabels (list) – user-defined tick labels for x-axis.
showMatrix(matrix, x_array=None, y_array=None, **kwargs)

Show a matrix using imshow(), or scatter() if markersize is provided.

Curves can be added along the x- and y-axes.

Parameters:
  • matrix (ndarray) – matrix to be displayed
  • x_array (ndarray) – data to be plotted above the matrix
  • y_array (ndarray) – data to be plotted on the left side of the matrix
  • percentile (float) – a percentile threshold to remove outliers, i.e. only data between the p-th and (100-p)-th percentiles are shown
  • vmin (float) – a minimum value threshold to remove outliers, i.e. only data greater than vmin are shown. This overrides percentile.
  • vmax (float) – a maximum value threshold to remove outliers, i.e. only data less than vmax are shown. This overrides percentile.
  • interactive (bool) – turn on or off the interactive options
  • xtickrotation (float) – how much to rotate the xticklabels, in degrees. Default is 0.
  • markersize (float) – size of square markers for using scatter() to help show matrices with small data regions compared to zeros. Note that only non-zeros are plotted, so the colorbar range may change if norm is not used. Default is None, which results in using imshow().
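
The interplay of percentile, vmin, and vmax can be sketched as follows. The helper name clip_limits is hypothetical and the logic is only an assumption about how such limits are typically derived; it shows explicit vmin/vmax taking precedence over the percentile-based limits, as stated above.

```python
import numpy as np

def clip_limits(matrix, percentile=None, vmin=None, vmax=None):
    """Hypothetical helper: return (lo, hi) display limits for a matrix."""
    lo, hi = np.min(matrix), np.max(matrix)
    if percentile is not None:
        # Use the smaller of p and 100 - p so that lo <= hi.
        p = min(percentile, 100.0 - percentile)
        lo = np.percentile(matrix, p)
        hi = np.percentile(matrix, 100.0 - p)
    # Explicit thresholds override the percentile-based ones.
    if vmin is not None:
        lo = vmin
    if vmax is not None:
        hi = vmax
    return float(lo), float(hi)

# A 10-by-10 matrix with a single large outlier.
matrix = np.arange(100, dtype=float).reshape(10, 10)
matrix[-1, -1] = 1000.0

lo, hi = clip_limits(matrix, percentile=5)
lo2, hi2 = clip_limits(matrix, percentile=5, vmax=50.0)
```

With the percentile limits in place, the outlier no longer stretches the color range, and passing vmax replaces only the upper limit.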
reorderMatrix(names, matrix, tree, axis=None)

Reorder a matrix based on a tree and return the reordered matrix and indices for reordering other things.

Parameters:
  • names (list) – a list of names associated with the rows of the matrix. These names must match the ones used to generate the tree.
  • matrix (ndarray) – any square matrix
  • tree (Tree) – any tree from calcTree()
  • axis (int) – the axis along which the matrix should be reordered. Default is None, which reorders along all axes.
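
The reordering step itself is index bookkeeping. In the sketch below, the helper reorder_by_leaf_order and its leaf_names argument are hypothetical: leaf_names stands in for the order of leaves read off a tree, so no tree object is needed to run it.

```python
import numpy as np

def reorder_by_leaf_order(names, matrix, leaf_names, axis=None):
    """Hypothetical helper: reorder `matrix` so it follows `leaf_names`.

    Raises KeyError if a leaf name is not present in `names`.
    """
    position = {name: i for i, name in enumerate(names)}
    indices = [position[name] for name in leaf_names]
    if axis is None:
        # Reorder rows and columns together.
        reordered = matrix[np.ix_(indices, indices)]
    else:
        # Reorder along a single axis only.
        reordered = np.take(matrix, indices, axis=axis)
    return reordered, indices

names = ["a", "b", "c"]
matrix = np.arange(9).reshape(3, 3)
leaf_names = ["c", "a", "b"]   # assumed leaf order of some tree

both, indices = reorder_by_leaf_order(names, matrix, leaf_names)
rows_only, _ = reorder_by_leaf_order(names, matrix, leaf_names, axis=0)
```

The returned indices can be reused to reorder any other per-member data, such as labels or a second matrix.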
findSubgroups(tree, c, method='naive', **kwargs)

Divide tree into subgroups using a criterion method and a cutoff c. Returns a list of lists with labels divided into subgroups.

getCoords(data)

Get coordinates from data if possible and handle errors well.

Parameters:
  • data (numpy.ndarray, Atomic, Ensemble, Trajectory) – a coordinate set or an object with getCoords method
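
The accepted inputs suggest a simple dispatch: arrays pass through, and other objects are asked for their coordinates. The sketch below is an assumption about that pattern, with a hypothetical helper coords_from and a stand-in class FakeAtoms; the real function may validate inputs differently.

```python
import numpy as np

def coords_from(data):
    """Hypothetical helper: return coordinates from an array or a suitable object."""
    if isinstance(data, np.ndarray):
        return data
    getter = getattr(data, "getCoords", None)
    if callable(getter):
        return getter()
    raise TypeError("data must be a numpy.ndarray or an object "
                    "with a getCoords method")

class FakeAtoms:
    """Stand-in for an object exposing getCoords (illustration only)."""
    def getCoords(self):
        return np.zeros((2, 3))

arr = np.ones((4, 3))
from_array = coords_from(arr)
from_object = coords_from(FakeAtoms())
```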
getLinkage(names, tree)

Obtain the linkage() matrix encoding tree.

Parameters:
  • names (list, ndarray) – a list of names; their order determines the values in the linkage matrix
  • tree (Tree) – tree to be converted
getTreeFromLinkage(names, linkage)

Obtain the tree encoded by linkage.

Parameters:
  • names (list, ndarray) – a list of names; their order should correspond to the values in linkage
  • linkage (ndarray) – linkage matrix
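
To see why the order of names matters, note that a SciPy-style linkage matrix refers to leaves by integer index, so leaf i is names[i]. The sketch below uses SciPy's to_tree() on a hand-written linkage matrix; it is not getTreeFromLinkage(), and the names and values are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import to_tree

names = ["A", "B", "C"]

# Hand-written linkage matrix for three leaves:
# leaves 0 and 1 merge first (forming cluster id 3),
# then leaf 2 merges with cluster 3.
Z = np.array([
    [0.0, 1.0, 0.2, 2.0],
    [2.0, 3.0, 0.5, 3.0],
])

root = to_tree(Z)
leaf_ids = root.pre_order()                 # leaf indices in tree order
leaf_names = [names[i] for i in leaf_ids]   # map indices back to names
```

Shuffling names without changing Z would attach the same tree topology to different members, which is why the two must stay in sync.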
clusterSubfamilies(similarities, n_clusters=0, linkage='all', method='tsne', cutoff=0.0, **kwargs)

Perform clustering based on members of the ensemble projected into a lower-dimensional space.

Parameters:
  • similarities (ndarray) – a matrix of similarities for each structure in the ensemble, such as an RMSD matrix, dynamics-based spectral overlap, or sequence similarity
  • n_clusters (int) – the number of clusters to generate. If 0, a range of cluster numbers is scanned and the one with the highest silhouette score is returned. Default is 0.
  • linkage (str, list, tuple, ndarray) – if "all", all linkage types (ward, average, complete, single) are tested. Otherwise only the one(s) given as input are used. Default is "all".
  • method (str) – if set to "spectral", a Kirchhoff matrix is generated based on the cutoff value given and used as input for clustering instead of the values themselves. Default is "tsne".
  • cutoff (float) – only used if method is set to "spectral". This value is used for generating the Kirchhoff matrix on which spectral clustering is performed. Default is 0.0.
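
For readers unfamiliar with the term, a Kirchhoff matrix is a graph-Laplacian-style connectivity matrix. The sketch below builds one from a cutoff in the generic way; the helper name, the direction of the threshold (connected when similarity exceeds the cutoff), and the diagonal convention are all assumptions made for illustration, and the library's own construction may differ.

```python
import numpy as np

def kirchhoff_from_similarities(similarities, cutoff):
    """Hypothetical helper: Kirchhoff (connectivity) matrix from a cutoff.

    Assumption: two members are connected when their similarity
    exceeds `cutoff`.
    """
    sim = np.asarray(similarities, dtype=float)
    connected = sim > cutoff
    np.fill_diagonal(connected, False)           # no self-contacts
    kirchhoff = np.where(connected, -1.0, 0.0)   # -1 for each contact
    # Each diagonal element holds the number of contacts of that member.
    np.fill_diagonal(kirchhoff, connected.sum(axis=1))
    return kirchhoff

similarities = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.6],
    [0.1, 0.6, 1.0],
])
K = kirchhoff_from_similarities(similarities, cutoff=0.5)
```

Every row of such a matrix sums to zero, which is a quick sanity check on the construction.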
calcRMSDclusters(rmsd_matrix, c, labels=None)

Divide rmsd_matrix into clusters using the gromos method with a cutoff c, as implemented in GROMACS (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html).

Returns a list of lists with labels divided into clusters.
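
The gromos method can be summarized as: count each structure's neighbors within the cutoff, take the structure with the most neighbors together with those neighbors as one cluster, remove them from the pool, and repeat. The sketch below follows that description; the function name gromos_clusters is hypothetical, treating rmsd <= c as "neighbor" is an assumption, and it is not the library's implementation.

```python
import numpy as np

def gromos_clusters(rmsd_matrix, c, labels=None):
    """Hypothetical sketch of gromos clustering with cutoff `c`."""
    rmsd = np.asarray(rmsd_matrix, dtype=float)
    n = rmsd.shape[0]
    if labels is None:
        labels = list(range(n))
    neighbors = rmsd <= c                 # each structure neighbors itself
    remaining = np.ones(n, dtype=bool)    # structures not yet assigned
    clusters = []
    while remaining.any():
        # Count unassigned neighbors; ignore already assigned structures.
        counts = (neighbors & remaining).sum(axis=1)
        counts[~remaining] = -1
        center = int(np.argmax(counts))
        # The center and its unassigned neighbors form the next cluster.
        members = np.flatnonzero(neighbors[center] & remaining)
        clusters.append([labels[i] for i in members])
        remaining[members] = False
    return clusters

rmsd_matrix = np.array([
    [0.0, 1.0, 1.0, 5.0, 5.0],
    [1.0, 0.0, 1.0, 5.0, 5.0],
    [1.0, 1.0, 0.0, 5.0, 5.0],
    [5.0, 5.0, 5.0, 0.0, 1.0],
    [5.0, 5.0, 5.0, 1.0, 0.0],
])
clusters = gromos_clusters(rmsd_matrix, c=2.0, labels=list("abcde"))
```

Because the largest neighborhood is removed first, clusters come out in order of decreasing size, and every structure ends up in exactly one cluster.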

calcGromosClusters(rmsd_matrix, c, labels=None)

Divide rmsd_matrix into clusters using the gromos method with a cutoff c, as implemented in GROMACS (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html).

Returns a list of lists with labels divided into clusters.

calcGromacsClusters(rmsd_matrix, c, labels=None)

Divide rmsd_matrix into clusters using the gromos method with a cutoff c, as implemented in GROMACS (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html).

Returns a list of lists with labels divided into clusters.