Atom Selections¶
This part gives more information on properties of AtomGroup
objects.
We start by making the necessary imports. Note that every documentation page
contains them so that the code within the page can be executed independently.
You can skip them if you have already run them in your Python session.
In [1]: from prody import *
In [2]: from pylab import *
In [3]: ion()
Atom Selections¶
AtomGroup
instances have a plain view of atoms for efficiency,
but they are coupled with a powerful atom selection engine. You can get well
defined atom subsets by passing simple keywords or make rather sophisticated
selections using composite statements. Selection keywords and grammar are very
much similar to those found in VMD. Some examples are shown here:
Keyword selections¶
Now, we parse a structure. This could be any structure, one that you know well from your research, for example.
In [4]: structure = parsePDB('5uoj')
In [5]: protein = structure.select('protein')
In [6]: protein
Out[6]: <Selection: 'protein' from 5uoj (2763 atoms)>
Using the "protein" keyword, we selected 2763 of the 3138 atoms in the structure.
The Atomic.select() method returned a Selection instance.
Note that all get and set methods defined for AtomGroup objects are also
defined for Selection objects. For example:
In [7]: protein.getResnames()
Out[7]: array(['ARG', 'ARG', 'ARG', ..., 'LEU', 'LEU', 'LEU'], dtype='|S6')
Select by name/type¶
We can select backbone atoms by passing atom names after the "name" keyword:
In [8]: backbone = structure.select('protein and name N CA C O')
In [9]: backbone
Out[9]: <Selection: 'protein and name N CA C O' from 5uoj (1372 atoms)>
Alternatively, we can use the "backbone" keyword to make the same selection:
In [10]: backbone = structure.select('backbone')
We select acidic and basic residues by passing residue names after the
"resname" keyword:
In [11]: charged = structure.select('resname ARG LYS HIS ASP GLU')
In [12]: charged
Out[12]: <Selection: 'resname ARG LYS HIS ASP GLU' from 5uoj (843 atoms)>
Alternatively, we can use the predefined keywords “acidic” and “basic”:
In [13]: charged = structure.select('acidic or basic')
In [14]: charged
Out[14]: <Selection: 'acidic or basic' from 5uoj (843 atoms)>
In [15]: set(charged.getResnames())
Out[15]: {'ARG', 'ASP', 'GLU', 'HIS', 'LYS'}
Composite selections¶
Let’s try a more sophisticated selection. We first calculate the geometric
center of the protein atoms using the calcCenter() function. Then, we
select the Cα and Cβ atoms of residues that have at least one atom within
10 Å of the geometric center.
In [16]: center = calcCenter(protein).round(3)
In [17]: center
Out[17]: array([ 0.918, 17.423, 40.248])
In [18]: sel = structure.select('protein and name CA CB and same residue as '
....: '((x-1)**2 + (y-17.5)**2 + (z-40.0)**2)**0.5 < 10')
....:
In [19]: sel
Out[19]: <Selection: 'protein and nam...)**2)**0.5 < 10' from 5uoj (70 atoms)>
Alternatively, this selection could be done as follows:
In [20]: sel = structure.select('protein and name CA CB and same residue as '
....: 'within 10 of center', center=center)
....:
In [21]: sel
Out[21]: <Selection: 'index 567 570 5... 1631 1645 1648' from 5uoj (70 atoms)>
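To make the meaning of this composite selection concrete, here is a minimal sketch in plain Python on made-up toy data. It does not use ProDy or reflect its implementation; the atom list, center, and variable names are assumptions for illustration. It shows the two steps involved: finding residues with at least one atom within the cutoff, then keeping the CA and CB atoms of those residues.

```python
import math

# Toy data (made up for illustration): (residue number, atom name, (x, y, z)).
atoms = [
    (1, "N",  (0.0, 0.0, 0.0)),
    (1, "CA", (1.5, 0.0, 0.0)),
    (1, "CB", (2.0, 1.4, 0.0)),
    (2, "N",  (9.0, 0.0, 0.0)),
    (2, "CA", (11.0, 0.0, 0.0)),
    (2, "CB", (12.0, 1.0, 0.0)),
    (3, "CA", (30.0, 0.0, 0.0)),
    (3, "CB", (31.0, 1.0, 0.0)),
]

center = (0.0, 0.0, 0.0)  # reference point, standing in for the geometric center
cutoff = 10.0             # distance cutoff in angstroms

# Step 1: residues that have at least one atom within the cutoff of the center.
near_residues = {resnum for resnum, _, xyz in atoms
                 if math.dist(xyz, center) < cutoff}

# Step 2 ("same residue as"): keep the CA and CB atoms of those residues,
# even when the CA/CB atoms themselves lie beyond the cutoff.
selected = [(resnum, name) for resnum, name, _ in atoms
            if resnum in near_residues and name in ("CA", "CB")]

print(selected)  # [(1, 'CA'), (1, 'CB'), (2, 'CA'), (2, 'CB')]
```

Residue 2 is included because its N atom lies within the cutoff, although its CA and CB atoms do not. This is the effect of "same residue as" in the selection string.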
Selections simplified¶
In interactive sessions, an alternative to typing .select('protein')
or .select('backbone') is to use the dot operator:
In [22]: protein = structure.protein
In [23]: protein
Out[23]: <Selection: 'protein' from 5uoj (2763 atoms)>
You can use the dot operator multiple times:
In [24]: bb = structure.protein.backbone
In [25]: bb
Out[25]: <Selection: '(backbone) and (protein)' from 5uoj (1372 atoms)>
This may go on and on:
In [26]: ala_ca = structure.protein.backbone.resname_ALA.calpha
In [27]: ala_ca
Out[27]: <Selection: '(calpha) and ((...and (protein)))' from 5uoj (26 atoms)>
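The selection strings shown in the outputs above suggest how chained attribute access can be turned into a composite selection. The following is a hypothetical toy class, not ProDy's actual implementation: ToySelection and its selstr attribute are invented for illustration, and the class only builds the string without selecting any atoms.

```python
class ToySelection:
    """Hypothetical stand-in that only tracks a selection string."""

    def __init__(self, selstr=None):
        self.selstr = selstr

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        # Underscores stand for spaces: 'resname_ALA' -> 'resname ALA'.
        keyword = name.replace("_", " ")
        if self.selstr is None:
            return ToySelection(keyword)
        # Each further step narrows the previous selection with 'and'.
        return ToySelection(f"({keyword}) and ({self.selstr})")


structure = ToySelection()

bb = structure.protein.backbone
print(bb.selstr)  # (backbone) and (protein)

ala_ca = structure.protein.backbone.resname_ALA.calpha
print(ala_ca.selstr)
# (calpha) and ((resname ALA) and ((backbone) and (protein)))
```

Each step wraps the previous string in parentheses, which is why long chains produce the nested strings abbreviated in the outputs above.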
More examples¶
This flexible and fast atom selection engine can do much more, without the need for writing nested loops with comparisons or changing the source code. See the following pages:
- Atom Selections for description of all selection keywords
- Intermolecular Contacts for selecting interacting atoms
Operations on Selections¶
Selection objects can be used with bitwise operators:
Union¶
Let’s select β-carbon atoms for non-GLY amino acid residues, and α-carbons for GLYs in two steps:
In [28]: betas = structure.select('name CB and protein')
In [29]: len(betas)
Out[29]: 328
In [30]: gly_alphas = structure.select('name CA and resname GLY')
In [31]: len(gly_alphas)
Out[31]: 15
The above shows that the p38 structure contains 15 GLY residues.
These two selections can be combined as follows:
In [32]: betas_gly_alphas = betas | gly_alphas
In [33]: betas_gly_alphas
Out[33]: <Selection: '(name CB and pr...nd resname GLY)' from 5uoj (343 atoms)>
In [34]: len(betas_gly_alphas)
Out[34]: 343
The selection string for the union of selections becomes:
In [35]: betas_gly_alphas.getSelstr()
Out[35]: '(name CB and protein) or (name CA and resname GLY)'
Note that the same selection can also be made directly with the selection
string (name CB and protein) or (name CA and resname GLY).
Intersection¶
It is just as easy to get the intersection of two selections. Let’s find charged and medium-size residues in a protein:
In [36]: charged = structure.select('charged')
In [37]: charged
Out[37]: <Selection: 'charged' from 5uoj (843 atoms)>
In [38]: medium = structure.select('medium')
In [39]: medium
Out[39]: <Selection: 'medium' from 5uoj (720 atoms)>
In [40]: medium_charged = medium & charged
In [41]: medium_charged
Out[41]: <Selection: '(medium) and (charged)' from 5uoj (192 atoms)>
In [42]: medium_charged.getSelstr()
Out[42]: '(medium) and (charged)'
Let’s see which amino acids are considered charged and medium:
In [43]: set(medium_charged.getResnames())
Out[43]: {'ASP'}
What about amino acids that are medium or charged?
In [44]: set((medium | charged).getResnames())
Out[44]: {'ARG', 'ASN', 'ASP', 'CYS', 'GLU', 'HIS', 'LYS', 'PRO', 'THR', 'VAL'}
Inversion¶
It is also possible to invert a selection:
In [45]: only_protein = structure.select('protein')
In [46]: only_protein
Out[46]: <Selection: 'protein' from 5uoj (2763 atoms)>
In [47]: only_non_protein = ~only_protein
In [48]: only_non_protein
Out[48]: <Selection: 'not (protein)' from 5uoj (375 atoms)>
In [49]: water = structure.select('water')
In [50]: water
Out[50]: <Selection: 'water' from 5uoj (375 atoms)>
The above shows that 5uoj does not contain any non-water hetero atoms.
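The union, intersection, and inversion operators can be pictured as set operations on atom indices. The sketch below uses plain Python sets with made-up indices. It is an analogy rather than ProDy code, and all names and values are assumptions for illustration.

```python
# Toy atom indices (made up): a "structure" of 10 atoms, indexed 0..9.
all_atoms = set(range(10))
protein = {0, 1, 2, 3, 4, 5}
charged = {1, 2, 3}
medium = {3, 4, 5}

union = medium | charged           # atoms in either selection
intersection = medium & charged    # atoms in both selections
non_protein = all_atoms - protein  # inversion, relative to the whole structure

print(sorted(union))         # [1, 2, 3, 4, 5]
print(sorted(intersection))  # [3]
print(sorted(non_protein))   # [6, 7, 8, 9]
```

Inversion only makes sense relative to the full set of atoms. This is why the inverted protein selection above has the same number of atoms as the water selection when the structure contains nothing but protein and water.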
Addition¶
Another operation defined on Selection objects (and on other
AtomPointer-derived classes) is addition.
This may be useful if you want to yield atoms in an AtomGroup in a
specific order.
Let’s think of a simple case, where we want to output atoms in 5uoj in a
specific order:
In [51]: protein = structure.select('protein')
In [52]: water = structure.select('water')
In [53]: water_protein = water + protein
In [54]: writePDB('5uoj_water_protein.pdb', water_protein)
Out[54]: '5uoj_water_protein.pdb'
In the resulting file, the water atoms will precede the protein atoms.
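The difference between addition and union can be pictured with plain Python lists of atom indices. This sketch is an analogy on made-up data, not ProDy code, and it assumes that a union reports unique atoms in their original structure order while addition keeps the operands in the order given.

```python
# Toy atom indices (made up): protein atoms come first in the structure.
protein = [0, 1, 2, 3]
water = [4, 5]

# Union-like behaviour (assumption): unique atoms in structure order.
union_order = sorted(set(water) | set(protein))

# Addition-like behaviour: operands are concatenated in the order given.
addition_order = water + protein

print(union_order)     # [0, 1, 2, 3, 4, 5]
print(addition_order)  # [4, 5, 0, 1, 2, 3]
```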
Membership¶
Selections also allow membership test operations:
In [55]: backbone = structure.select('backbone')
In [56]: calpha = structure.select('calpha')
Is calpha a subset of backbone?
In [57]: calpha in backbone
Out[57]: True
Or, is water in the protein selection?
In [58]: water in protein
Out[58]: False
Other tests include:
In [59]: protein in structure
Out[59]: True
In [60]: backbone in structure
Out[60]: True
In [61]: structure in structure
Out[61]: True
In [62]: calpha in calpha
Out[62]: True
Equality¶
You can also check the equality of selections. Comparison will return True
if both selections refer to the same atoms.
In [63]: calpha = structure.select('protein and name CA')
In [64]: calpha2 = structure.select('calpha')
In [65]: calpha == calpha2
Out[65]: True
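The membership and equality tests can also be pictured with sets of atom indices: membership corresponds to a subset test, and equality to comparing the atoms themselves rather than the selection strings. The sketch below is an analogy in plain Python with made-up indices, not ProDy code.

```python
# Toy atom indices (made up for illustration).
backbone = {0, 1, 2, 3, 4, 5, 6, 7}
calpha = {1, 5}    # e.g. obtained with 'protein and name CA'
calpha2 = {5, 1}   # e.g. obtained with 'calpha'
water = {8, 9}

print(calpha <= backbone)  # True: every calpha atom is a backbone atom
print(water <= backbone)   # False: water atoms are not backbone atoms
print(calpha == calpha2)   # True: same atoms, however they were selected
```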