Release Notes

The v2.5.0 series comes with new and improved sequence, structure, and dynamics analysis features. See the release notes for details.

How to Cite

Bakan A, Meireles LM, Bahar I. ProDy: Protein Dynamics Inferred from Theory and Experiments. Bioinformatics 2011 27(11):1575-1577.

Bakan A, Dutta A, Mao W, Liu Y, Chennubhotla C, Lezon TR, Bahar I. Evol and ProDy for Bridging Protein Sequence Evolution and Structural Dynamics. Bioinformatics 2014 30(18):2681-2683.

Zhang S, Krieger JM, Zhang Y, Kaya C, Kaynak B, Mikulska-Ruminska K, Doruker P, Li H, Bahar I. ProDy 2.0: Increased scale and scope after 10 years of protein dynamics modelling with Python. Bioinformatics 2021 37(20):3657-3659.

Structure Comparison

This module defines functions for comparing and mapping polypeptide chains.

matchChains(atoms1, atoms2, **kwargs)

Returns pairs of chains matched based on sequence similarity. Makes an all-to-all comparison of chains in atoms1 and atoms2. Chains are obtained from hierarchical views (HierView) of atom groups. This function returns a list of matches, each given as a tuple of 4 items:

matching chain from atoms1 as an AtomMap instance,

matching chain from atoms2 as an AtomMap instance,

percent sequence identity of the match,

percent sequence overlap of the match.

The list of matches is sorted in decreasing order of percent sequence identity. AtomMap instances can be used to calculate RMSD values and superpose atom groups.

Parameters:

atoms1 (Chain, AtomGroup, Selection) – atoms that contain a chain
atoms2 (Chain, AtomGroup, Selection) – atoms that contain a chain
subset (str) – one of the following well-defined subsets of atoms: "calpha" (or "ca"), "backbone" (or "bb"), "heavy" (or "noh"), or "all", default is "calpha"
seqid (float) – percent sequence identity, default is 90
overlap (float) – percent overlap, default is 90
pwalign (bool) – perform pairwise sequence alignment

If subset is set to calpha or backbone, only alpha carbon atoms or backbone atoms, respectively, will be paired. If set to all, all atoms common to matched residues will be returned.
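To make the subset rules concrete, here is a minimal pure-Python sketch of choosing which atoms of two matched residues get paired. It is not ProDy code: residues are modeled as plain dicts of atom name to element symbol, the helper name paired_atom_names is hypothetical, and the atom-name sets assumed for "calpha" (CA) and "backbone" (N, CA, C, O), as well as treating "heavy" as "element is not H", are illustrative assumptions.

```python
# Hypothetical sketch; the atom-name sets below are assumptions, not ProDy's definitions.
CALPHA = {"CA"}
BACKBONE = {"N", "CA", "C", "O"}
SUBSETS = {"calpha": CALPHA, "ca": CALPHA, "backbone": BACKBONE, "bb": BACKBONE}


def paired_atom_names(res1, res2, subset="calpha"):
    """Return atom names present in both residues that belong to *subset*.

    Residues are dicts mapping atom name -> element symbol.
    """
    # Only atoms common to both matched residues can be paired.
    common = [name for name in res1 if name in res2]
    key = subset.lower()
    if key in SUBSETS:
        return [name for name in common if name in SUBSETS[key]]
    if key in ("heavy", "noh"):
        # Assumed rule: heavy atoms are all non-hydrogen atoms.
        return [name for name in common if res1[name] != "H"]
    if key == "all":
        return common
    raise ValueError(f"unknown subset: {subset!r}")
```

For an alanine and a glycine that both carry an amide H, "calpha" pairs only CA, while "all" also pairs the shared backbone atoms and H, but not alanine's CB.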

This function tries to match chains based on residue numbers and names. All chains in atoms1 are compared to all chains in atoms2. This works well for different structures of the same protein. When it fails, Biopython is used for pairwise sequence alignment, and matching is performed based on the sequence alignment. Users can control whether sequence alignment is performed with the pwalign keyword. If pwalign=True is passed, pairwise alignment is enforced.
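The residue-number matching described above can be sketched as follows. This is an illustration under stated assumptions, not ProDy's implementation: chains are lists of (resnum, resname) tuples, the helper name trivial_match is hypothetical, and the percent identity and overlap formulas (identity over the shorter chain, overlap over the longer chain) are assumed definitions chosen for the example.

```python
def trivial_match(chain1, chain2):
    """Pair residues that share a residue number, then score the pairing.

    Chains are lists of (resnum, resname) tuples (a hypothetical model).
    Returns (pairs, seqid, overlap) with both scores in percent.
    """
    lookup = {resnum: resname for resnum, resname in chain2}
    # Keep residues of chain1 whose number also occurs in chain2.
    pairs = [(num, name, lookup[num]) for num, name in chain1 if num in lookup]
    identical = sum(1 for _, name1, name2 in pairs if name1 == name2)
    # Assumed definitions: identity over the shorter chain, overlap over the longer.
    seqid = 100.0 * identical / min(len(chain1), len(chain2))
    overlap = 100.0 * len(pairs) / max(len(chain1), len(chain2))
    return pairs, seqid, overlap


a = [(1, "MET"), (2, "ALA"), (3, "GLY"), (4, "LYS")]
b = [(2, "ALA"), (3, "SER"), (4, "LYS"), (5, "GLU")]
pairs, seqid, overlap = trivial_match(a, b)
# Under the default thresholds (seqid=90, overlap=90) this pair would be rejected.
accepted = seqid >= 90.0 and overlap >= 90.0
```

Here residues 2 to 4 pair up, two of them with identical names, so the match falls below the default thresholds; that is the situation in which a sequence alignment fallback becomes useful.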

matchAlign(mobile, target, **kwargs)

Superpose mobile onto target based on the best-matching pair of chains. This function uses matchChains() for matching chains and returns a tuple that contains the following items:

mobile after it is superposed,

matching chain from mobile as an AtomMap instance,

matching chain from target as an AtomMap instance,

percent sequence identity of the match,

percent sequence overlap of the match.

Parameters:

mobile (Chain, AtomGroup, Selection) – atoms that contain a protein chain
target (Chain, AtomGroup, Selection) – atoms that contain a protein chain
tarsel (str) – target atoms that will be used for alignment, default is 'calpha'
allcsets (bool) – align all coordinate sets of mobile, default is True
seqid (float) – percent sequence identity, default is 90
overlap (float) – percent overlap, default is 90
pwalign (bool) – perform pairwise sequence alignment
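Once matched atoms are paired, superposition is a least-squares rigid fit. The sketch below uses the Kabsch algorithm (centroid removal plus an SVD of the covariance matrix), the standard technique for this step; it assumes both inputs are N x 3 arrays in the same atom order, and whether matchAlign uses exactly this algorithm internally is an assumption. The function name kabsch_superpose is hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Rigidly superpose mobile onto target (both N x 3, same atom order).

    Returns the transformed mobile coordinates and the RMSD after fitting.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    if mobile.ndim != 2 or mobile.shape != target.shape or mobile.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of equal shape")
    # Remove translation by centering both sets on their centroids.
    mob_center = mobile.mean(axis=0)
    tar_center = target.mean(axis=0)
    m = mobile - mob_center
    t = target - tar_center
    # SVD of the 3 x 3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(m.T @ t)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centered mobile atoms and move them onto the target centroid.
    moved = m @ rotation.T + tar_center
    rmsd = float(np.sqrt(((moved - target) ** 2).sum(axis=1).mean()))
    return moved, rmsd
```

When mobile is an exact rotated and translated copy of target, the returned RMSD is zero up to floating-point error.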

mapChainOntoChain(mobile, target, **kwargs)

Map mobile chain onto target chain. This function returns a mapping that contains 4 items:

Mapped chain as an AtomMap instance,

Target chain as an AtomMap instance,

Percent sequence identity,

Percent sequence overlap

Mappings are returned in decreasing order of percent sequence identity. An AtomMap that keeps mapped atom indices contains dummy atoms in place of unmapped atoms.
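Because dummy atoms fill the positions that could not be mapped, any comparison against the target has to skip them. The sketch below models this with a plain list of coordinates in which None stands for a dummy atom; it is an illustration of the idea only, not how AtomMap stores or flags dummies, and the function name masked_rmsd is hypothetical.

```python
import math


def masked_rmsd(mapped, reference):
    """RMSD over positions where *mapped* has coordinates.

    *mapped* mirrors *reference* position by position; None marks a
    dummy (unmapped) atom, which is left out of the calculation.
    """
    if len(mapped) != len(reference):
        raise ValueError("mapped and reference must have the same length")
    squared = [
        sum((a - b) ** 2 for a, b in zip(m, r))
        for m, r in zip(mapped, reference)
        if m is not None
    ]
    if not squared:
        raise ValueError("no mapped atoms")
    return math.sqrt(sum(squared) / len(squared))
```

Keeping the dummy slots, rather than dropping them, preserves a one-to-one correspondence with the target atoms, so index i always refers to the same target atom.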

Parameters:

mobile (Chain) – mobile that will be mapped to the target chain
target (Chain) – chain to which atoms will be mapped
seqid (float) – percent sequence identity, default is 90. Note that this parameter only takes effect when sequence alignment is used.
overlap (float) – percent overlap with target, default is 70
mapping (list, str, bool) – what method will be used if the trivial mapping based on residue numbers fails. If "ce" or "cealign", then the CE structural alignment [IS98] will be performed. It can also be a list of prealigned sequences, a MSA instance, or a dict of indices such as that derived from a DaliRecord. If set to True, then the sequence alignment from Biopython will be used. If set to False, only the trivial mapping will be performed. Default is "auto", which means sequence alignment is tried first, then CE.
pwalign (bool) – if True, then pairwise sequence alignment will be performed. If False, then a simple mapping will be performed based on residue numbers (as well as insertion codes). This will be overridden by the mapping keyword's value.
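The fallback order described for the mapping keyword can be expressed as a small dispatcher. This sketch models only the decision logic: each strategy is a hypothetical zero-argument callable that returns a result or None on failure, prealigned inputs (lists, MSA, dicts) are deliberately not modeled, and the name map_with_fallback is an assumption, not part of ProDy.

```python
def map_with_fallback(trivial, seqalign, cealign, mapping="auto"):
    """Try the trivial mapping first, then fall back according to *mapping*.

    Each strategy is a zero-argument callable returning a result, or None
    on failure. Returns (strategy_name, result), or (None, None).
    """
    if isinstance(mapping, str):
        key = mapping.lower()
        if key == "auto":
            fallbacks = [("seqalign", seqalign), ("ce", cealign)]
        elif key in ("ce", "cealign"):
            fallbacks = [("ce", cealign)]
        else:
            raise ValueError(f"unsupported mapping: {mapping!r}")
    elif mapping is True:
        fallbacks = [("seqalign", seqalign)]
    elif mapping is False:
        fallbacks = []  # trivial mapping only
    else:
        raise TypeError("prealigned inputs are not modeled in this sketch")
    # The trivial, residue-number based mapping is always attempted first.
    for name, strategy in [("trivial", trivial)] + fallbacks:
        result = strategy()
        if result is not None:
            return name, result
    return None, None
```

Ordering the strategies from cheapest to most expensive means CE, the costliest option, only runs when both residue numbering and sequence alignment have failed.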

[IS98]

Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering 1998 11(9):739-47.

mapOntoChain(atoms, chain, **kwargs)

Map atoms onto chain. This function is a wrapper around mapChainOntoChain() that maps each chain in atoms onto the target chain. The function returns a list of mappings. Each mapping is a tuple that contains 4 items:

Mapped chain as an AtomMap instance,

Target chain as an AtomMap instance,

Percent sequence identity,

Percent sequence overlap

Mappings are returned in decreasing order of percent sequence identity. An AtomMap that keeps mapped atom indices contains dummy atoms in place of unmapped atoms.

Parameters:

atoms (Chain, AtomGroup, Selection) – atoms that will be mapped to the target chain
chain (Chain) – chain to which atoms will be mapped
subset (str) – one of the following well-defined subsets of atoms: "calpha" (or "ca"), "backbone" (or "bb"), "heavy" (or "noh"), or "all", default is "calpha"

See mapChainOntoChain() for other keyword arguments. This function tries to map atoms to chain based on residue numbers and types. Each individual chain in atoms is compared to the target chain.
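The wrapper pattern above, mapping every chain and ranking the results, can be sketched independently of ProDy. Here map_one is a hypothetical stand-in for the per-chain mapping step: it is assumed to return a (mapped, target_map, seqid, overlap) tuple, or None when the chain fails its identity or overlap thresholds. The function name map_onto_chain is also hypothetical.

```python
def map_onto_chain(chains, target, map_one):
    """Map each chain in *chains* onto *target*; return results ranked by identity.

    map_one(chain, target) is a hypothetical callable returning a
    (mapped, target_map, seqid, overlap) tuple, or None when the chain
    does not pass its identity/overlap thresholds.
    """
    mappings = []
    for chain in chains:
        result = map_one(chain, target)
        if result is not None:
            mappings.append(result)
    # Highest percent sequence identity first (third tuple item).
    mappings.sort(key=lambda item: item[2], reverse=True)
    return mappings
```

Chains that fail to map are simply dropped, so the result can be shorter than the number of input chains, or empty.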

alignChains(atoms, target, match_func=<function bestMatch>, **kwargs): Aligns chains of atoms to those of target using mapOntoChains() and combineAtomMaps(). See those two functions for details about the parameters.

mapOntoChains(atoms, ref, match_func=<function bestMatch>, **kwargs)

This function is a generalization of, and wrapper around, mapOntoChain() that maps chains onto chains (instead of onto a single chain).

Parameters:

atoms (Atomic) – atoms to map onto the reference
ref (Atomic) – reference structure for mapping
match_func (func) – function that determines which chains from ref and atoms are matched. Default is to use the best match.
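One way to picture match_func is as a predicate that decides which reference/mobile chain pairs are even considered before scoring. The sketch below illustrates that pattern with two hypothetical predicates, one that allows every pairing and one that requires equal chain IDs; chains are modeled as dicts with a "chid" key. These are not ProDy's bestMatch or sameChid implementations, only an assumed reading of what such predicates do.

```python
from itertools import product


def any_pair(chain1, chain2):
    """Hypothetical predicate: allow every pairing (scoring decides later)."""
    return True


def same_chid(chain1, chain2):
    """Hypothetical predicate: pair chains only if their chain IDs agree."""
    return chain1["chid"] == chain2["chid"]


def candidate_pairs(ref_chains, mob_chains, match_func=any_pair):
    """Return (ref_chid, mob_chid) pairs accepted by *match_func*."""
    return [
        (ref["chid"], mob["chid"])
        for ref, mob in product(ref_chains, mob_chains)
        if match_func(ref, mob)
    ]
```

A permissive predicate leaves the choice to the similarity scores, while a strict one such as a chain-ID check encodes prior knowledge and shrinks the search space.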

bestMatch(chain1, chain2)

sameChid(chain1, chain2)

userDefined(chain1, chain2, correspondence)

sameChainPos(chain1, chain2)

mapOntoChainByAlignment(atoms, chain, **kwargs)

This function is similar to mapOntoChain(), but the correspondence of chains is determined by the provided alignments.

Parameters:

alignments (list, dict, MSA) – a list of predefined alignments. It can also be a dictionary or MSA instance whose keys or labels are the titles of atoms or chains.
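A provided alignment becomes a residue correspondence by walking the two gapped sequences column by column. The sketch below turns a pair of aligned strings into a dict of residue indices, similar in spirit to the dict of indices mentioned for the mapping keyword; it assumes "-" as the gap character and 0-based indices into the ungapped sequences, and the name alignment_to_index_map is hypothetical.

```python
def alignment_to_index_map(aligned1, aligned2, gap="-"):
    """Turn two gapped, equal-length sequences into {index_in_seq1: index_in_seq2}.

    Indices count residues (non-gap characters) from 0 in each ungapped sequence.
    """
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    i = j = 0  # residue counters for sequence 1 and sequence 2
    for a, b in zip(aligned1, aligned2):
        if a != gap and b != gap:
            # Both sequences have a residue in this column: they correspond.
            mapping[i] = j
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return mapping
```

Residues facing a gap get no entry, which is exactly where an AtomMap would need dummy atoms.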

getMatchScore(): Returns match score used to align sequences.

setMatchScore(match_score): Set match score used to align sequences.

getMismatchScore(): Returns mismatch score used to align sequences.

setMismatchScore(mismatch_score): Set mismatch score used to align sequences.

getGapPenalty(): Returns gap opening penalty used for pairwise alignment.

setGapPenalty(gap_penalty): Set gap opening penalty used for pairwise alignment.

getGapExtPenalty(): Returns gap extension penalty used for pairwise alignment.

setGapExtPenalty(gap_ext_penalty): Set gap extension penalty used for pairwise alignment.
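To see how the match score, mismatch score, gap opening penalty, and gap extension penalty interact, the sketch below scores an already aligned pair of sequences under an affine gap model. It assumes, as is common (for example in Biopython's pairwise2 globalms), that penalties are negative values added to the score, and that the first position of a gap costs the opening penalty while each further position costs the extension penalty. The default values and the name score_alignment are illustrative assumptions, not ProDy's defaults.

```python
def score_alignment(seq1, seq2, match=1.0, mismatch=-1.0,
                    gap_open=-10.0, gap_ext=-0.5, gap="-"):
    """Score two gapped, equal-length sequences with an affine gap model."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    score = 0.0
    gap_in_1 = gap_in_2 = False  # is a gap currently open in seq1 / seq2?
    for a, b in zip(seq1, seq2):
        if a == gap and b == gap:
            raise ValueError("column with gaps in both sequences")
        if a == gap:
            # First gap position costs gap_open; each further one costs gap_ext.
            score += gap_ext if gap_in_1 else gap_open
            gap_in_1, gap_in_2 = True, False
        elif b == gap:
            score += gap_ext if gap_in_2 else gap_open
            gap_in_1, gap_in_2 = False, True
        else:
            score += match if a == b else mismatch
            gap_in_1 = gap_in_2 = False
    return score
```

With these values, one two-residue gap (-10.5) is cheaper than two separate one-residue gaps (-20.0), which is why an affine model favors fewer, longer gaps.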

getGoodSeqId(): Returns good sequence identity.

setGoodSeqId(seqid): Set good sequence identity.

getGoodCoverage(): Returns good sequence coverage.

combineAtomMaps(mappings, target=None, **kwargs)

Builds a grand AtomMap instance based on mappings obtained from mapOntoChains(). The function also accepts the output of mapOntoChain(), but will then trivially return all the AtomMap instances in mappings. mappings should be a list or an array of matching chains, each given as a tuple of 4 items:

matching chain from atoms1 as an AtomMap instance,

matching chain from atoms2 as an AtomMap instance,

percent sequence identity of the match,

percent sequence overlap of the match.

Parameters:

mappings (tuple, list, ndarray) – a list or an array of matching chains in a tuple, or just the tuple
target (Atomic) – reference structure for superposition and checking RMSD
drmsd (float) – allowed deviation of the RMSD from that of the top-ranking atommap. This allows multiple matches when mobile has more chains than target. Default is 3.0
rmsd_reject (float) – upper RMSD cutoff that rejects an atommap. Default is 15.0
least (int) – the least number of atommaps requested. If None, it will be automatically determined by the number of chains present in target and mobile. Default is None
debug (dict) – a container (dict) that saves the following information for debugging purposes:
* coverage: original coverage matrix; rows and columns correspond to the reference and the mobile, respectively,
* solutions: matched index groups obtained by modeling the coverage matrix as a linear assignment problem,
* rmsd: a list of ranked RMSDs of the identified atommaps.
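The two ideas behind combineAtomMaps, solving a linear assignment on the coverage matrix and then filtering candidate atommaps by RMSD, can be sketched in pure Python. This is an illustration under assumptions: the assignment is solved by brute force over permutations (fine only for a handful of chains; a real solver such as SciPy's linear_sum_assignment would be the practical choice), it assumes no more reference chains than mobile chains, the least parameter is not modeled, and the names best_assignment and filter_by_rmsd are hypothetical.

```python
from itertools import permutations


def best_assignment(coverage):
    """Brute-force linear assignment that maximizes total coverage.

    coverage[i][j] is the coverage of reference chain i by mobile chain j.
    Returns a tuple giving, for each reference row, the chosen mobile column.
    """
    n_ref, n_mob = len(coverage), len(coverage[0])
    if n_ref > n_mob:
        raise ValueError("this sketch assumes n_ref <= n_mob")
    return max(
        permutations(range(n_mob), n_ref),
        key=lambda cols: sum(coverage[i][c] for i, c in enumerate(cols)),
    )


def filter_by_rmsd(rmsds, drmsd=3.0, rmsd_reject=15.0):
    """Indices (best first) within drmsd of the best RMSD and below rmsd_reject."""
    best = min(rmsds)
    ranked = sorted(range(len(rmsds)), key=rmsds.__getitem__)
    return [k for k in ranked if rmsds[k] <= best + drmsd and rmsds[k] < rmsd_reject]
```

For a 2 x 3 coverage matrix [[90, 10, 0], [20, 80, 70]], the best assignment pairs reference chain 0 with mobile chain 0 and reference chain 1 with mobile chain 1 (total 170). With the default thresholds, RMSDs of [1.0, 3.5, 5.0, 20.0] keep only the first two candidates.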

setGoodCoverage(coverage): Set good sequence coverage.

getAlignmentMethod(): Returns pairwise alignment method.

setAlignmentMethod(method): Set pairwise alignment method (global or local).