Additional utilities
This module defines miscellaneous utility functions that are public to users.
-
calcTree(names, distance_matrix, method='upgma', linkage=False)
Given a distance matrix, creates and returns a tree structure.
Parameters:
- names (list, ndarray) – a list of names
- distance_matrix (ndarray) – a square matrix with length of ensemble. If its dimensions do not match the number of names, an error is raised
- method (str) – method used for constructing the tree. Acceptable options are "upgma", "nj", or methods supported by linkage() such as "single", "average", "ward", etc. Default is "upgma"
- linkage (bool) – whether the linkage matrix is returned. Note that NJ trees do not support linkage
-
clusterMatrix(distance_matrix=None, similarity_matrix=None, labels=None, return_linkage=None, **kwargs)
Cluster a distance matrix using scipy.cluster.hierarchy and return the sorted matrix, the indices used for sorting, the sorted labels (if labels are passed), and the linkage matrix (if return_linkage is True).
Parameters:
- distance_matrix (ndarray) – an N-by-N matrix containing some measure of distance, such as 1 - seqid_matrix (Hamming distance), RMSDs, or distances in PCA space
- similarity_matrix (ndarray) – an N-by-N matrix containing some measure of similarity, such as sequence identity, mode-mode overlap, or spectral overlap. Each element is subtracted from 1 to obtain a distance, so make sure this is reasonable for the chosen measure
- labels (list) – labels for each matrix row that can be returned sorted
- no_plot (bool) – if True, do not plot the dendrogram. Default is True
- reversed (bool) – if True, the sorting indices are reversed
Other arguments for linkage() and dendrogram() can also be provided and are taken as kwargs.
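The workflow described for clusterMatrix can be sketched directly with SciPy. This is a minimal illustration under stated assumptions, not ProDy's implementation: the helper name cluster_and_sort, its reverse argument, and the example similarity values are hypothetical, and only the SciPy and NumPy calls are real APIs.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform


def cluster_and_sort(similarity, labels=None, method="average", reverse=False):
    """Hypothetical helper that mimics the documented behaviour."""
    similarity = np.asarray(similarity, dtype=float)
    # Each similarity is subtracted from 1 to obtain a distance.
    distance = 1.0 - similarity
    np.fill_diagonal(distance, 0.0)  # squareform expects a zero diagonal
    # linkage() takes the condensed (upper-triangle) form of the matrix.
    Z = linkage(squareform(distance), method=method)
    # With no_plot=True, dendrogram() only computes the leaf ordering.
    order = dendrogram(Z, no_plot=True)["leaves"]
    if reverse:
        order = order[::-1]
    sorted_matrix = distance[np.ix_(order, order)]
    sorted_labels = None if labels is None else [labels[i] for i in order]
    return sorted_matrix, order, sorted_labels, Z


# Made-up similarities: items 0 and 2 are alike, as are items 1 and 3.
similarity = np.array([
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.3, 0.8],
    [0.9, 0.3, 1.0, 0.2],
    [0.1, 0.8, 0.2, 1.0],
])
sorted_matrix, order, sorted_labels, Z = cluster_and_sort(
    similarity, labels=["a", "b", "c", "d"]
)
```

After sorting, members of the same cluster sit next to each other, so block structure becomes visible when the sorted matrix is displayed.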
-
showLines(*args, **kwargs)
Show 1-D data using plot().
Parameters:
- x (ndarray) – (optional) x coordinates. x can be a 1-D array or a 2-D matrix of column vectors
- y (ndarray) – data array. y can be a 1-D array or a 2-D matrix of column vectors
- dy (ndarray) – an array of variances of y, plotted as a band along y. It should have the same shape as y
- lower (ndarray) – an array of lower bounds, plotted as a band along y. It should have the same shape as y and should be paired with upper
- upper (ndarray) – an array of upper bounds, plotted as a band along y. It should have the same shape as y and should be paired with lower
- alpha (float) – the transparency of the band(s) for plotting dy
- beta (float) – the transparency of the band(s) for plotting lower and upper
- ticklabels (list) – user-defined tick labels for the x-axis
-
showMatrix(matrix, x_array=None, y_array=None, **kwargs)
Show a matrix using imshow(), or scatter() if markersize is provided. Curves on the x- and y-axes can be added.
Parameters:
- matrix (ndarray) – matrix to be displayed
- x_array (ndarray) – data to be plotted above the matrix
- y_array (ndarray) – data to be plotted on the left side of the matrix
- percentile (float) – a percentile threshold to remove outliers, i.e. only data within the p-th to (100-p)-th percentile are shown
- vmin (float) – a minimum value threshold to remove outliers, i.e. only data greater than vmin are shown. This overrides percentile
- vmax (float) – a maximum value threshold to remove outliers, i.e. only data less than vmax are shown. This overrides percentile
- interactive (bool) – turn the interactive options on or off
- xtickrotation (float) – how much to rotate the xticklabels, in degrees. Default is 0
- markersize (float) – size of square markers when using scatter(), which helps show matrices with small data regions compared to zeros. Note that only non-zeros are plotted, so the colorbar range may change if norm is not used. Default is None, which results in using imshow()
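The interplay between percentile, vmin and vmax can be illustrated with a short NumPy sketch. It shows one plausible reading of the parameter descriptions above, not the actual showMatrix code: the helper name color_limits is hypothetical, and the assumption is that the percentile sets symmetric display limits that explicit vmin and vmax values then override.

```python
import numpy as np


def color_limits(matrix, percentile=None, vmin=None, vmax=None):
    """Hypothetical helper: derive display limits for a matrix."""
    matrix = np.asarray(matrix, dtype=float)
    low = high = None
    if percentile is not None:
        # Keep data between the p-th and (100-p)-th percentile.
        p = min(percentile, 100 - percentile)
        low = np.percentile(matrix, p)
        high = np.percentile(matrix, 100 - p)
    # Explicit thresholds take precedence over the percentile (assumption
    # based on "This overrides percentile" in the parameter list).
    if vmin is not None:
        low = vmin
    if vmax is not None:
        high = vmax
    return low, high


# Made-up data: the values 0..100 arranged as a single-row matrix.
data = np.arange(101).reshape(1, 101)
low, high = color_limits(data, percentile=5)
low_override, high_override = color_limits(data, percentile=5, vmin=10)
```

The resulting pair could then be passed as vmin and vmax to a plotting call such as matplotlib's imshow().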
-
reorderMatrix(names, matrix, tree, axis=None)
Reorder a matrix based on a tree and return the reordered matrix together with the indices, which can be used to reorder other data.
Parameters:
- names (list) – a list of names associated with the rows of the matrix. These names must match the ones used to generate the tree
- matrix (ndarray) – any square matrix
- tree (Tree) – any tree from calcTree()
- axis (int) – the axis along which the matrix should be reordered. Default is None, which reorders along all axes
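The reordering itself comes down to index selection along one or both axes. The sketch below assumes the leaf order of the tree is already available as a list of names; the helper reorder_by_leaf_order and the example data are hypothetical and stand in for whatever reorderMatrix does internally.

```python
import numpy as np


def reorder_by_leaf_order(names, matrix, leaf_order, axis=None):
    """Hypothetical helper: reorder a matrix to follow a tree's leaf order."""
    matrix = np.asarray(matrix)
    # Map each name to its row/column position in the original matrix.
    position = {name: i for i, name in enumerate(names)}
    indices = [position[name] for name in leaf_order]
    if axis is None:
        # Reorder rows and columns together.
        reordered = matrix[np.ix_(indices, indices)]
    else:
        # Reorder along a single axis only.
        reordered = np.take(matrix, indices, axis=axis)
    return reordered, indices


names = ["a", "b", "c"]
matrix = np.arange(9).reshape(3, 3)
leaf_order = ["c", "a", "b"]  # assumed to come from a tree traversal

reordered, indices = reorder_by_leaf_order(names, matrix, leaf_order)
rows_only, _ = reorder_by_leaf_order(names, matrix, leaf_order, axis=0)
```

The returned indices can be reused to reorder any other per-structure data, such as labels or a second matrix, so that everything stays aligned.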
-
findSubgroups(tree, c, method='naive', **kwargs)
Divide a tree into subgroups using a criterion method and a cutoff c. Returns a list of lists with labels divided into subgroups.
-
getCoords(data)
Get coordinates from data if possible and handle errors well.
Parameters:
- data (numpy.ndarray, Atomic, Ensemble, Trajectory) – a coordinate set or an object with a getCoords method
-
getLinkage(names, tree)
Obtain the linkage() matrix encoding tree.
Parameters:
- names (list, ndarray) – a list of names; the order determines the values in the linkage matrix
- tree (Tree) – tree to be converted
-
getTreeFromLinkage(names, linkage)
Obtain the tree encoded by linkage.
Parameters:
- names (list, ndarray) – a list of names; the order should correspond to the values in linkage
- linkage (ndarray) – linkage matrix
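To make the relationship between names and a linkage matrix concrete, the sketch below decodes a SciPy-style linkage matrix into nested tuples. In that format, row k holds the indices of the two merged clusters, their distance, and the size of the new cluster, and the new cluster receives index n + k for n original items. The helper tree_from_linkage is hypothetical and returns plain tuples rather than a Tree object.

```python
def tree_from_linkage(names, linkage):
    """Hypothetical helper: decode a linkage matrix into nested tuples."""
    n = len(names)
    if len(linkage) != n - 1:
        raise ValueError("a linkage matrix for n names must have n - 1 rows")
    # Indices 0..n-1 refer to the original items, in the order of names.
    nodes = {i: name for i, name in enumerate(names)}
    for k, row in enumerate(linkage):
        left, right = int(row[0]), int(row[1])
        # The cluster created by row k is stored under index n + k.
        nodes[n + k] = (nodes.pop(left), nodes.pop(right))
    # The final merge (or the single item, if n == 1) is the root.
    return nodes[2 * n - 2]


names = ["a", "b", "c", "d"]
# Made-up linkage rows: [cluster_i, cluster_j, distance, new_cluster_size]
linkage_matrix = [
    [0, 2, 0.10, 2],
    [1, 3, 0.20, 2],
    [4, 5, 0.75, 4],
]
tree = tree_from_linkage(names, linkage_matrix)
```

Because the first two columns are positions in names, passing the names in a different order would label the leaves differently, which is why the order has to correspond to the values in the linkage matrix.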
-
clusterSubfamilies(similarities, n_clusters=0, linkage='all', method='tsne', cutoff=0.0, **kwargs)
Perform clustering based on members of the ensemble projected into a reduced dimension.
Parameters:
- similarities (ndarray) – a matrix of similarities for each structure in the ensemble, such as an RMSD matrix, dynamics-based spectral overlap, or sequence similarity
- n_clusters (int) – the number of clusters to generate. If 0, a range of cluster numbers is scanned and the one with the highest silhouette score is returned. Default is 0
- linkage (str, list, tuple, ndarray) – if "all", all linkage types (ward, average, complete, single) are tested. Otherwise only the one(s) given as input are used. Default is "all"
- method (str) – if set to "spectral", a Kirchhoff matrix is generated based on the given cutoff value and used as input for clustering instead of the values themselves. Default is "tsne"
- cutoff (float) – only used if method is set to "spectral". This value is used for generating the Kirchhoff matrix for spectral clustering. Default is 0.0
-
calcRMSDclusters(rmsd_matrix, c, labels=None)
Divide rmsd_matrix into clusters using the gromos method with a cutoff c, as implemented in GROMACS (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html).
Returns a list of lists with labels divided into clusters.
-
calcGromosClusters(rmsd_matrix, c, labels=None)
Divide rmsd_matrix into clusters using the gromos method with a cutoff c, as implemented in GROMACS (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html).
Returns a list of lists with labels divided into clusters.
-
calcGromacsClusters(rmsd_matrix, c, labels=None)
Divide rmsd_matrix into clusters using the gromos method with a cutoff c, as implemented in GROMACS (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html).
Returns a list of lists with labels divided into clusters.
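The gromos method referenced above counts, for every structure, its neighbours within the cutoff, takes the structure with the most neighbours together with those neighbours as one cluster, removes them from the pool, and repeats on what remains. The pure-Python sketch below follows that description. The function name gromos_clusters is hypothetical, and two details are assumptions: a distance equal to the cutoff counts as a neighbour, and ties are broken by the lowest index.

```python
def gromos_clusters(rmsd_matrix, cutoff, labels=None):
    """Hypothetical sketch of gromos-style clustering on an RMSD matrix."""
    n = len(rmsd_matrix)
    if labels is None:
        labels = list(range(n))
    pool = set(range(n))
    clusters = []
    while pool:
        # Neighbours of i: remaining structures within the cutoff,
        # including i itself because its self-distance is zero.
        neighbours = {
            i: [j for j in pool if rmsd_matrix[i][j] <= cutoff] for i in pool
        }
        # The centre has the most neighbours; max() keeps the first maximum,
        # so iterating in sorted order breaks ties by lowest index.
        centre = max(sorted(pool), key=lambda i: len(neighbours[i]))
        members = sorted(neighbours[centre])
        clusters.append([labels[j] for j in members])
        # Remove the whole cluster from the pool and repeat.
        pool -= set(members)
    return clusters


# Made-up RMSD values: structures a-c are close, as are d and e.
rmsd = [
    [0.00, 0.10, 0.15, 0.90, 0.80],
    [0.10, 0.00, 0.30, 0.90, 0.80],
    [0.15, 0.30, 0.00, 0.90, 0.80],
    [0.90, 0.90, 0.90, 0.00, 0.10],
    [0.80, 0.80, 0.80, 0.10, 0.00],
]
clusters = gromos_clusters(rmsd, 0.2, labels=["a", "b", "c", "d", "e"])
```

Note that b and c end up in the same cluster even though their mutual distance exceeds the cutoff, because membership is decided by distance to the cluster centre only.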