Additional utilities
This module defines miscellaneous utility functions that are public to users.
-
calcTree(names, distance_matrix, method='upgma', linkage=False)
Given a distance matrix, create and return a tree structure.
Parameters:
- names (list, ndarray) – a list of names
- distance_matrix (ndarray) – a square matrix whose size equals the length of the ensemble. An error is raised if its dimensions do not match the number of names
- method (str) – method used for constructing the tree. Acceptable options are "upgma", "nj", or methods supported by linkage() such as "single", "average", "ward", etc. Default is "upgma"
- linkage (bool) – whether the linkage matrix is returned. Note that NJ trees do not support linkage
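To illustrate the kind of input that linkage()-based methods work on, here is a minimal sketch that builds a SciPy linkage matrix from a small distance matrix. It does not call calcTree and is not its implementation. The helper name `linkage_from_distances` is hypothetical, and SciPy's "average" method is used as an assumed stand-in, since the SciPy documentation describes it as the UPGMA algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def linkage_from_distances(names, distance_matrix, method="average"):
    """Hypothetical helper (not part of the documented module)."""
    distance_matrix = np.asarray(distance_matrix, dtype=float)
    n = len(names)
    # Mirror the documented requirement: one row and one column per name.
    if distance_matrix.shape != (n, n):
        raise ValueError("distance_matrix must be %d-by-%d to match names" % (n, n))
    # linkage() expects a condensed (upper-triangle) distance vector.
    condensed = squareform(distance_matrix, checks=False)
    return linkage(condensed, method=method)


names = ["A", "B", "C"]
dist = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 3.0],
                 [4.0, 3.0, 0.0]])
Z = linkage_from_distances(names, dist)
```

Each row of `Z` records one merge: the two node indices joined, the distance at which they were joined, and the number of leaves in the new cluster.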
-
clusterMatrix(distance_matrix=None, similarity_matrix=None, labels=None, return_linkage=None, **kwargs)
Cluster a distance matrix using scipy.cluster.hierarchy and return the sorted matrix, the indices used for sorting, the sorted labels (if labels are passed), and the linkage matrix (if return_linkage is True).
Parameters:
- distance_matrix (ndarray) – an N-by-N matrix containing some measure of distance, such as 1 - seqid_matrix (Hamming distance), RMSDs, or distances in PCA space
- similarity_matrix (ndarray) – an N-by-N matrix containing some measure of similarity, such as sequence identity, mode-mode overlap, or spectral overlap. Each element is subtracted from 1 to obtain a distance, so make sure this is reasonable for your data
- labels (list) – labels for each matrix row that can be returned sorted
- no_plot (bool) – if True, don't plot the dendrogram. Default is True
- reversed (bool) – if set to True, the sorting indices will be reversed
Other arguments for linkage() and dendrogram() can also be provided and will be taken as kwargs.
-
showLines(*args, **kwargs)
Show 1-D data using plot().
Parameters:
- x (ndarray) – (optional) x coordinates. x can be a 1-D array or a 2-D matrix of column vectors
- y (ndarray) – data array. y can be a 1-D array or a 2-D matrix of column vectors
- dy (ndarray) – an array of variances of y which will be plotted as a band along y. It should have the same shape as y
- lower (ndarray) – an array of lower bounds which will be plotted as a band along y. It should have the same shape as y and should be paired with upper
- upper (ndarray) – an array of upper bounds which will be plotted as a band along y. It should have the same shape as y and should be paired with lower
- alpha (float) – the transparency of the band(s) for plotting dy
- beta (float) – the transparency of the band(s) for plotting miny and maxy
- ticklabels (list) – user-defined tick labels for the x-axis
-
showMatrix(matrix, x_array=None, y_array=None, **kwargs)
Show a matrix using imshow(), or scatter() if markersize is provided. Curves can be added along the x- and y-axes.
Parameters:
- matrix (ndarray) – matrix to be displayed
- x_array (ndarray) – data to be plotted above the matrix
- y_array (ndarray) – data to be plotted on the left side of the matrix
- percentile (float) – a percentile threshold to remove outliers, i.e. only data within the p-th to (100-p)-th percentile are shown
- vmin (float) – a minimum value threshold to remove outliers, i.e. only data greater than vmin are shown. This overrides percentile
- vmax (float) – a maximum value threshold to remove outliers, i.e. only data less than vmax are shown. This overrides percentile
- interactive (bool) – turn the interactive options on or off
- xtickrotation (float) – how much to rotate the xticklabels, in degrees. Default is 0
- markersize (float) – size of the square markers used with scatter() to help show matrices with small data regions compared to zeros. Note that only non-zero values are plotted, so the colorbar range may change if norm is not used. Default is None, which results in using imshow()
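The percentile option can be read as choosing color limits from the data itself. The sketch below shows one way to compute such limits with NumPy. The helper `percentile_limits` is hypothetical and is not claimed to match how showMatrix derives its limits internally.

```python
import numpy as np


def percentile_limits(matrix, percentile):
    """Hypothetical helper: limits spanning the p-th to (100-p)-th percentile."""
    vmin, vmax = np.percentile(matrix, [percentile, 100.0 - percentile])
    return float(vmin), float(vmax)


# Values 0..100, so the p-th percentile is easy to check by eye.
matrix = np.arange(101, dtype=float).reshape(1, 101)
vmin, vmax = percentile_limits(matrix, 5)
```

Limits obtained this way could then be passed as vmin and vmax, which the parameter list says take precedence over percentile.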
-
reorderMatrix(names, matrix, tree, axis=None)
Reorder a matrix based on a tree and return the reordered matrix and the indices, which can be used to reorder other data.
Parameters:
- names (list) – a list of names associated with the rows of the matrix. These names must match the ones used to generate the tree
- matrix (ndarray) – any square matrix
- tree (Tree) – any tree from calcTree()
- axis (int) – the axis along which the matrix should be reordered. Default is None, which reorders along all axes
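Once a leaf order is known, reordering itself is an indexing operation. The following sketch shows the difference between reordering along all axes and along a single axis. The helper `reorder` and the hard-coded `indices` are assumptions for illustration; in practice the indices would come from the tree.

```python
import numpy as np


def reorder(matrix, indices, axis=None):
    """Hypothetical helper: apply an ordering to a square matrix."""
    matrix = np.asarray(matrix)
    if axis is None:
        # Reorder rows and columns together.
        return matrix[np.ix_(indices, indices)]
    # Reorder along a single axis only.
    return np.take(matrix, indices, axis=axis)


matrix = np.array([[0, 1, 2],
                   [1, 0, 3],
                   [2, 3, 0]])
indices = [2, 0, 1]  # illustrative ordering, e.g. read off a tree
both = reorder(matrix, indices)
rows_only = reorder(matrix, indices, axis=0)
```

Reordering both axes keeps a symmetric matrix symmetric, whereas reordering one axis alone does not.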
-
findSubgroups(tree, c, method='naive', **kwargs)
Divide tree into subgroups using a criterion method and a cutoff c. Returns a list of lists with labels divided into subgroups.
-
getCoords(data)
Get coordinates from data if possible and handle errors gracefully.
Parameters: data (numpy.ndarray, Atomic, Ensemble, Trajectory) – a coordinate set or an object with a getCoords method
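The "array or object with a getCoords method" pattern can be sketched with simple duck typing. Everything below is hypothetical: `coords_from` and `FakeAtoms` are illustrative names, and the error type raised is an assumption rather than the documented behavior.

```python
import numpy as np


def coords_from(data):
    """Hypothetical helper: coordinates from an array or a getCoords-style object."""
    if isinstance(data, np.ndarray):
        return data
    getter = getattr(data, "getCoords", None)
    if callable(getter):
        return getter()
    # Assumption: fail loudly with a clear message for unsupported input.
    raise TypeError("data must be an ndarray or provide a getCoords() method")


class FakeAtoms:
    """Stand-in for an object exposing getCoords(); for illustration only."""

    def getCoords(self):
        return np.zeros((2, 3))


array_input = np.ones((4, 3))
from_array = coords_from(array_input)
from_object = coords_from(FakeAtoms())
```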
-
getLinkage(names, tree)
Obtain the linkage() matrix encoding tree.
Parameters:
- names (list, ndarray) – a list of names; the order determines the values in the linkage matrix
- tree (Tree) – tree to be converted
-
getTreeFromLinkage(names, linkage)
Obtain the tree encoded by linkage.
Parameters:
- names (list, ndarray) – a list of names; the order should correspond to the values in linkage
- linkage (ndarray) – linkage matrix
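To show why the order of names matters, the sketch below decodes a linkage matrix into a nested tuple. It assumes the SciPy linkage convention, in which indices below n refer to the original items and row k creates node n + k. The helper `nested_from_linkage` is hypothetical and returns plain tuples, not a Tree object.

```python
def nested_from_linkage(names, linkage_rows):
    """Hypothetical helper: rebuild a nested-tuple tree from linkage rows."""
    # Nodes 0..n-1 are the leaves, in the order given by names.
    nodes = list(names)
    for left, right, _distance, _count in linkage_rows:
        # Row k creates node n + k by merging two existing nodes.
        nodes.append((nodes[int(left)], nodes[int(right)]))
    # The last node created is the root.
    return nodes[-1]


names = ["A", "B", "C"]
linkage_rows = [
    [0, 1, 1.0, 2],  # merge A and B into node 3
    [2, 3, 3.5, 3],  # merge C with node 3
]
tree = nested_from_linkage(names, linkage_rows)
```

Passing the names in a different order would attach different labels to the same leaf indices, which is why the two must correspond.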
-
clusterSubfamilies(similarities, n_clusters=0, linkage='all', method='tsne', cutoff=0.0, **kwargs)
Perform clustering based on members of the ensemble projected into a reduced-dimensional space.
Parameters:
- similarities (ndarray) – a matrix of similarities for each structure in the ensemble, such as an RMSD matrix, dynamics-based spectral overlap, or sequence similarity
- n_clusters (int) – the number of clusters to generate. If 0, a range of cluster numbers is scanned and the one with the highest silhouette score is returned. Default is 0
- linkage (str, list, tuple, ndarray) – if all, all linkage types (ward, average, complete, single) are tested. Otherwise only the one(s) given as input are used. Default is all
- method (str) – if set to spectral, a Kirchhoff matrix is generated based on the given cutoff value and used as input for clustering instead of the values themselves. Default is tsne
- cutoff (float) – only used if method is set to spectral. This value is used for generating the Kirchhoff matrix from which clusters are generated in spectral clustering. Default is 0.0
-
calcRMSDclusters(rmsd_matrix, c, labels=None)
Divide rmsd_matrix into clusters using the GROMOS method with a cutoff c, as implemented in GROMACS (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html).
Returns a list of lists with labels divided into clusters.
-
calcGromosClusters(rmsd_matrix, c, labels=None)
Divide rmsd_matrix into clusters using the GROMOS method with a cutoff c, as implemented in GROMACS (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html).
Returns a list of lists with labels divided into clusters.
-
calcGromacsClusters(rmsd_matrix, c, labels=None)
Divide rmsd_matrix into clusters using the GROMOS method with a cutoff c, as implemented in GROMACS (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html).
Returns a list of lists with labels divided into clusters.
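As described in the GROMACS documentation, the GROMOS method counts each structure's neighbors within the cutoff, takes the structure with the most neighbors together with those neighbors as one cluster, removes them from the pool, and repeats. Below is a minimal sketch of that procedure. It is not the library's implementation: the helper name `gromos_clusters`, the inclusive cutoff, and the tie-breaking rule (first structure wins) are assumptions.

```python
import numpy as np


def gromos_clusters(rmsd_matrix, c, labels=None):
    """Sketch of GROMOS-style clustering; not the library's implementation."""
    rmsd_matrix = np.asarray(rmsd_matrix, dtype=float)
    n = rmsd_matrix.shape[0]
    if labels is None:
        labels = list(range(n))
    neighbors = rmsd_matrix <= c          # assumption: cutoff is inclusive
    np.fill_diagonal(neighbors, True)     # each structure neighbors itself
    remaining = np.ones(n, dtype=bool)
    clusters = []
    while remaining.any():
        # Count neighbors among the structures still in the pool.
        counts = (neighbors & remaining).sum(axis=1)
        counts[~remaining] = -1
        center = int(np.argmax(counts))   # ties go to the first structure
        members = np.flatnonzero(neighbors[center] & remaining)
        clusters.append([labels[i] for i in members])
        remaining[members] = False
    return clusters


rmsd = [[0.0, 1.0, 1.0, 5.0],
        [1.0, 0.0, 3.0, 5.0],
        [1.0, 3.0, 0.0, 5.0],
        [5.0, 5.0, 5.0, 0.0]]
clusters = gromos_clusters(rmsd, c=2.0, labels=["a", "b", "c", "d"])
```

In this example the first structure has the most neighbors within the cutoff, so it seeds the first cluster, and the distant fourth structure ends up alone.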