Atom Flags
This module defines atom flags that are used in Atom Selections.
You can read this page in interactive sessions using help(flags).
Flag labels can be used in atom selections:
In [1]: from prody import *
In [2]: p = parsePDB('1ubi')
In [3]: p.select('protein')
Out[3]: <Selection: 'protein' from 1ubi (602 atoms)>
Flag labels can be combined with the dot operator to make selections:
In [4]: p.protein
Out[4]: <Selection: 'protein' from 1ubi (602 atoms)>
In [5]: p.protein.acidic # selects acidic residues
Out[5]: <Selection: '(acidic) and (protein)' from 1ubi (94 atoms)>
Flag labels can be prefixed with 'is' to check whether all atoms in an Atomic instance are flagged the same way:
In [6]: p.protein.ishetero
Out[6]: False
In [7]: p.water.ishetero
Out[7]: True
Flag labels can also be used to make quick atom counts:
In [8]: p.numAtoms()
Out[8]: 683
In [9]: p.numAtoms('protein')
Out[9]: 602
In [10]: p.numAtoms('water')
Out[10]: 81
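The counts and 'is' checks above can be thought of as operations on one boolean per atom. The following is a minimal pure-Python sketch of that idea, assuming a toy list of residue names instead of a parsed structure; the names WATER_NAMES and flag_array and the sample data are hypothetical and are not part of ProDy.

```python
# Toy stand-in for per-atom residue names; this is not real 1ubi data.
resnames = ['MET', 'MET', 'GLN', 'HOH', 'HOH']

# Water residue names as listed in the water flag definition on this page.
WATER_NAMES = {'HOH', 'DOD', 'WAT', 'TIP3', 'H2O', 'OH2', 'TIP', 'TIP2', 'TIP4'}


def flag_array(names, members):
    """Return one boolean per atom: True when its residue name is in members."""
    return [name in members for name in names]


water = flag_array(resnames, WATER_NAMES)

# Quick count, similar in spirit to numAtoms('water').
num_water = sum(water)

# 'is'-style check: True only when every atom in the group carries the flag.
all_water = all(water)
last_two_all_water = all(water[3:])
```

Here num_water counts flagged atoms, while the two all() calls mirror how an 'is' check differs between a whole structure and a subset that contains only water.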
Protein
- protein
aminoacid - indicates the twenty standard amino acids (stdaa) and some non-standard amino acids (nonstdaa) described below. A residue must also have an atom named 'CA' in addition to having a qualifying residue name.
- stdaa
- indicates the standard amino acid residues: ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, and VAL
- nonstdaa
indicates one of the following residues:
ASX (B)  asparagine or aspartic acid
GLX (Z)  glutamine or glutamic acid
CSO (C)  S-hydroxycysteine
HIP (H)  ND1-phosphohistidine
HSD (H)  prototropic tautomer of histidine, H on ND1 (CHARMM)
HSE (H)  prototropic tautomer of histidine, H on NE2 (CHARMM)
HSP (H)  protonated histidine
MSE      selenomethionine
SEC (U)  selenocysteine
SEP (S)  phosphoserine
TPO (T)  phosphothreonine
PTR (Y)  O-phosphotyrosine
XLE (J)  leucine or isoleucine
XAA (X)  unspecified or unknown
You can modify the list of non-standard amino acids using addNonstdAminoacid() and delNonstdAminoacid(), and inspect the properties of a residue using listNonstdAAProps().
- calpha
ca - Cα atoms of protein residues, same as selection
'name CA and protein'
- backbone
bb - non-hydrogen backbone atoms of protein residues, same as
selection
'name CA C O N and protein'
- backbonefull
bbfull - backbone atoms of protein residues, same as selection
'name CA C O N H H1 H2 H3 OXT and protein'
- sidechain
sc - side-chain atoms of protein residues, same as selection
'protein and not backbonefull'
- acidic
- residues ASP, GLU, HSP, PHD, PTR, SEP, TPO
- acyclic
- residues ALA, ARG, ASN, ASP, ASX, CME, CSO, CYS, CYX, GLN, GLU, GLX, GLY, ILE, LEU, LYS, MET, MSE, PHD, SEC, SEP, SER, THR, TPO, VAL, XLE
- aliphatic
- residues ALA, GLY, ILE, LEU, PRO, VAL, XLE
- aromatic
- residues HIS, PHE, PTR, TRP, TYR
- basic
- residues ARG, HID, HIE, HIP, HIS, HSD, HSE, LYS
- buried
- residues ALA, CME, CYS, CYX, ILE, LEU, MET, MSE, PHE, SEC, TRP, VAL, XLE
- charged
- residues ARG, ASP, GLU, HIS, LYS
- cyclic
- residues HID, HIE, HIP, HIS, HSD, HSE, HSP, PHE, PRO, PTR, TRP, TYR
- hydrophobic
- residues ALA, ILE, LEU, MET, PHE, PRO, TRP, VAL, XLE
- large
- residues ARG, CME, GLN, GLU, GLX, HID, HIE, HIP, HIS, HSD, HSE, HSP, ILE, LEU, LYS, MET, MSE, PHD, PHE, PTR, SEP, TPO, TRP, TYR, XLE
- medium
- residues ASN, ASP, ASX, CSO, CYS, CYX, PRO, SEC, THR, VAL
- neutral
- residues ALA, ASN, CME, CSO, CYS, CYX, GLN, GLY, ILE, LEU, MET, MSE, PHE, PRO, SEC, SER, THR, TRP, TYR, VAL
- polar
- residues ARG, ASN, ASP, ASX, CSO, CYS, CYX, GLN, GLU, GLX, GLY, HID, HIE, HIP, HIS, HSD, HSE, HSP, LYS, PHD, PTR, SEC, SEP, SER, THR, TPO, TYR
- small
- residues ALA, GLY, SER
- surface
- residues ARG, ASN, ASP, ASX, CSO, GLN, GLU, GLX, GLY, HID, HIE, HIP, HIS, HSD, HSE, HSP, LYS, PHD, PRO, PTR, SEP, SER, THR, TPO, TYR
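Because each property flag above is a list of residue names, the properties of a given residue can be recovered by checking membership in every list. The sketch below does this in plain Python with the lists transcribed from this section; PROPERTY_RESIDUES and residue_properties are hypothetical names used for illustration and are not part of ProDy.

```python
# Residue names per property flag, transcribed from the definitions above.
PROPERTY_RESIDUES = {
    'acidic': 'ASP GLU HSP PHD PTR SEP TPO',
    'acyclic': ('ALA ARG ASN ASP ASX CME CSO CYS CYX GLN GLU GLX GLY ILE LEU '
                'LYS MET MSE PHD SEC SEP SER THR TPO VAL XLE'),
    'aliphatic': 'ALA GLY ILE LEU PRO VAL XLE',
    'aromatic': 'HIS PHE PTR TRP TYR',
    'basic': 'ARG HID HIE HIP HIS HSD HSE LYS',
    'buried': 'ALA CME CYS CYX ILE LEU MET MSE PHE SEC TRP VAL XLE',
    'charged': 'ARG ASP GLU HIS LYS',
    'cyclic': 'HID HIE HIP HIS HSD HSE HSP PHE PRO PTR TRP TYR',
    'hydrophobic': 'ALA ILE LEU MET PHE PRO TRP VAL XLE',
    'large': ('ARG CME GLN GLU GLX HID HIE HIP HIS HSD HSE HSP ILE LEU LYS '
              'MET MSE PHD PHE PTR SEP TPO TRP TYR XLE'),
    'medium': 'ASN ASP ASX CSO CYS CYX PRO SEC THR VAL',
    'neutral': ('ALA ASN CME CSO CYS CYX GLN GLY ILE LEU MET MSE PHE PRO SEC '
                'SER THR TRP TYR VAL'),
    'polar': ('ARG ASN ASP ASX CSO CYS CYX GLN GLU GLX GLY HID HIE HIP HIS '
              'HSD HSE HSP LYS PHD PTR SEC SEP SER THR TPO TYR'),
    'small': 'ALA GLY SER',
    'surface': ('ARG ASN ASP ASX CSO GLN GLU GLX GLY HID HIE HIP HIS HSD HSE '
                'HSP LYS PHD PRO PTR SEP SER THR TPO TYR'),
}

# Convert each space-separated string into a set for fast membership tests.
PROPERTY_SETS = {flag: set(names.split())
                 for flag, names in PROPERTY_RESIDUES.items()}


def residue_properties(resname):
    """Return the sorted property flags whose residue list contains resname."""
    return sorted(flag for flag, members in PROPERTY_SETS.items()
                  if resname in members)


ptr_properties = residue_properties('PTR')
ala_properties = residue_properties('ALA')
```

For PTR this lookup gives the same six properties that listNonstdAAProps('PTR') reports in the Functions section of this page.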
Nucleic
- nucleic
- indicates nucleobase, nucleotide, and some nucleoside derivatives that are described below, so it is the same as 'nucleobase or nucleotide or nucleoside'.
- nucleobase
- indicates ADE (adenine), GUN (guanine), CYT (cytosine), THY (thymine), and URA (uracil).
- nucleotide
indicates residues with the following names:
DA  2’-deoxyadenosine-5’-monophosphate
DC  2’-deoxycytidine-5’-monophosphate
DG  2’-deoxyguanosine-5’-monophosphate
DT  2’-deoxythymidine-5’-monophosphate
DU  2’-deoxyuridine-5’-monophosphate
A   adenosine-5’-monophosphate
C   cytidine-5’-monophosphate
G   guanosine-5’-monophosphate
T   2’-deoxythymidine-5’-monophosphate
U   uridine-5’-monophosphate
- nucleoside
indicates the following nucleosides and their derivatives that are recognized by PDB:
ADN  adenosine
AMP  adenosine-5’-monophosphate
ADP  adenosine-5’-diphosphate
ATP  adenosine-5’-triphosphate
AGS  adenosine-5’-triphosphate-gamma-S
CMP  cyclic adenosine-3’,5’-monophosphate
A2P  adenosine-2’,5’-diphosphate
A3P  adenosine-3’,5’-diphosphate
CTN  cytidine
C2P  cytidine-2’-monophosphate
C3P  cytidine-3’-monophosphate
C5P  cytidine-5’-monophosphate
CDP  cytidine-5’-diphosphate
CTP  cytidine-5’-triphosphate
GMP  guanosine
5GP  guanosine-5’-monophosphate
GDP  guanosine-5’-diphosphate
GTP  guanosine-5’-triphosphate
THM  thymidine
TMP  thymidine-5’-monophosphate
TPP  thymidine-5’-diphosphate
TTP  thymidine-5’-triphosphate
URI  uridine (uracil plus ribose)
UMP  2’-deoxyuridine 5’-monophosphate
UDP  uridine 5’-diphosphate
UTP  uridine 5’-triphosphate
- at
- same as selection
'resname ADE A THY T'
- cg
- same as selection
'resname CYT C GUN G'
- purine
- same as selection
'resname ADE A GUN G'
- pyrimidine
- same as selection
'resname CYT C THY T URA U'
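The four base-type flags above are defined purely by residue name, so they can be modeled as sets. The sketch below transcribes the selection strings from this section into Python sets; BASE_FLAGS and base_flags are hypothetical names used for illustration and are not part of ProDy.

```python
# Residue names taken from the 'resname ...' selection strings listed above.
BASE_FLAGS = {
    'at': {'ADE', 'A', 'THY', 'T'},
    'cg': {'CYT', 'C', 'GUN', 'G'},
    'purine': {'ADE', 'A', 'GUN', 'G'},
    'pyrimidine': {'CYT', 'C', 'THY', 'T', 'URA', 'U'},
}


def base_flags(resname):
    """Return the sorted base-type flags that apply to a residue name."""
    return sorted(flag for flag, names in BASE_FLAGS.items()
                  if resname in names)


adenine_flags = base_flags('A')
uracil_flags = base_flags('U')

# With these definitions, no residue name is both a purine and a pyrimidine.
overlap = BASE_FLAGS['purine'] & BASE_FLAGS['pyrimidine']
```

With these sets, uracil falls under pyrimidine only, since neither the at nor the cg selection string names URA or U.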
Heteros
- hetero
- indicates anything other than a protein or a nucleic residue, i.e. 'not (protein or nucleic)'.
- hetatm
- is available when atomic data is parsed from a PDB or similar format file and indicates atoms that are marked 'HETATM' in the file.
- water
indicates HOH and DOD recognized by PDB and also WAT, TIP3, H2O, OH2, TIP, TIP2, and TIP4 recognized by molecular dynamics (MD) force fields.
Previously used water types HH0, OHH, and SOL conflict with other compounds in the PDB, so are removed from the definition of this flag.
- ion
indicates the following ions, most of which are recognized by the PDB and others by MD force fields.
ID   Name              PDB  Source  Conflict
AL   aluminum          Yes
BA   barium            Yes
CA   calcium           Yes
CD   cadmium           Yes
CL   chloride          Yes
CO   cobalt (ii)       Yes
CS   cesium            Yes
CU   copper (ii)       Yes
CU1  copper (i)        Yes
CUA  dinuclear copper  Yes
HG   mercury (ii)      Yes
IN   indium (iii)      Yes
IOD  iodide            Yes
K    potassium         Yes
MG   magnesium         Yes
MN3  manganese (iii)   Yes
MN   manganese (ii)    Yes
NA   sodium            Yes
PB   lead (ii)         Yes
PT   platinum (ii)     Yes
RB   rubidium          Yes
TB   terbium (iii)     Yes
TL   thallium (i)      Yes
WO4  tungstate (vi)    Yes
YB   ytterbium (iii)   Yes
ZN   zinc              Yes
CAL  calcium           No   CHARMM  Yes
CES  cesium            No   CHARMM  Yes
CLA  chloride          No   CHARMM  Yes
POT  potassium         No   CHARMM  Yes
SOD  sodium            No   CHARMM  Yes
ZN2  zinc              No   CHARMM  No
Ion identifiers that are obsoleted by PDB (MO3, MO4, MO5, MO6, NAW, OC7, and ZN1) are removed from this definition.
- lipid
- indicates GPE, LPP, OLA, SDS, and STE from PDB, and also POPC, LPPC, POPE, DLPE, PCGL, STEA, PALM, OLEO, DMPC from CHARMM force field.
- sugar
- indicates BGC, GLC, and GLO from PDB, and also AGLC from CHARMM.
- heme
- indicates 1FH, 2FH, DDH, DHE, HAS, HDD, HDE, HDM, HEA, HEB, HEC, HEM, HEO, HES, HEV, NTE, SRM, and VER from PDB, and also HEMO and HEMR from CHARMM.
- pdbter
- is available when atomic data is parsed from a PDB format file and indicates atoms that were followed by a 'TER' record.
- selpdbter
- is available when atomic data is parsed from a PDB format file and then a selection is made, and indicates selected atoms that should be followed by a 'TER' record.
Elements
The following elements found in proteins are recognized by applying regular expressions to atom names:
- carbon
- carbon atoms, same as
'name "C.*" and not ion'
- nitrogen
- nitrogen atoms, same as
'name "N.*" and not ion'
- oxygen
- oxygen atoms, same as
'name "O.*" and not ion'
- sulfur
- sulfur atoms, same as
'name "S.*" and not ion'
- hydrogen
- hydrogen atoms, same as
'name "[0-9]?H.*" and not ion'
- noh
heavy - non-hydrogen atoms, same as
'not hydrogen'
'not ion' is appended to the above definitions to avoid conflicts with ion atoms.
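To see why 'not ion' matters, it helps to apply the name patterns directly. The sketch below uses Python's re module with whole-name matching, which is an assumption about how the patterns are applied rather than ProDy's actual implementation. ION_RESNAMES is a small hypothetical subset of the ion table, and the hydrogen pattern is the one returned by flagDefinition('hydrogen') on this page.

```python
import re

# Atom-name patterns from the element flag definitions.
ELEMENT_PATTERNS = {
    'carbon': 'C.*',
    'nitrogen': 'N.*',
    'oxygen': 'O.*',
    'sulfur': 'S.*',
    'hydrogen': '[0-9]?H.*',
}

# A few ion residue names from the ion table (subset, for illustration only).
ION_RESNAMES = {'CA', 'NA', 'CL', 'HG', 'ZN'}


def element_flags(atom_name, resname):
    """Return element flags for an atom, skipping atoms in ion residues."""
    if resname in ION_RESNAMES:
        # Mirrors the 'and not ion' part of each definition.
        return []
    return [flag for flag, pattern in ELEMENT_PATTERNS.items()
            if re.fullmatch(pattern, atom_name)]


alpha_carbon = element_flags('CA', 'ALA')   # Cα atom of an alanine residue
calcium_ion = element_flags('CA', 'CA')     # calcium ion, same atom name
mercury_ion = element_flags('HG', 'HG')     # would otherwise look like hydrogen
numbered_hydrogen = element_flags('1HB', 'ALA')
```

Without the ion exclusion, a calcium ion named CA would be flagged as carbon and a mercury ion named HG as hydrogen, because the patterns only look at the atom name.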
Structure
The following secondary structure flags are defined, but secondary structure assignments must be made before they can be used.
- extended
- extended conformation, same as
'secondary E'
- helix
- α-helix conformation, same as
'secondary H'
- helix310
- 3_10-helix conformation, same as
'secondary G'
- helixpi
- π-helix conformation, same as
'secondary I'
- turn
- hydrogen bonded turn conformation, same as
'secondary T'
- bridge
- isolated beta-bridge conformation, same as
'secondary B'
- bend
- bend conformation, same as
'secondary S'
- coil
- not in one of above conformations, same as
'secondary C'
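The flag names above map one-to-one onto single-letter secondary structure codes. As a small sketch of that mapping, assuming per-residue assignments are already available as a string of codes (the example string is made up, and SECSTR_FLAGS and flags_from_codes are hypothetical names, not ProDy functions):

```python
# Flag name -> secondary structure code, as listed in the definitions above.
SECSTR_FLAGS = {
    'extended': 'E',
    'helix': 'H',
    'helix310': 'G',
    'helixpi': 'I',
    'turn': 'T',
    'bridge': 'B',
    'bend': 'S',
    'coil': 'C',
}

# Reverse lookup: code -> flag name.
CODE_TO_FLAG = {code: flag for flag, code in SECSTR_FLAGS.items()}


def flags_from_codes(codes):
    """Translate a string of one-letter codes into a list of flag names."""
    return [CODE_TO_FLAG[code] for code in codes]


# Made-up assignment string for five residues.
example = flags_from_codes('EEHHC')
```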
Others
- all
- indicates all atoms, returns a new view of the instance
- none
- indicates no atoms, returns None
- dummy
- indicates dummy atoms in an
AtomMap
- mapped
- indicates mapped atoms in an
AtomMap
Functions
The following functions can be used to customize flag definitions:
- flagDefinition(*arg, **kwarg)
Learn, change, or reset Atom Flags definitions.
Learn a definition
Calling this function with no arguments returns a list of flag names whose definitions you can learn:
In [1]: flagDefinition()
Out[1]:
['acidic', 'acyclic', 'aliphatic', 'aminoacid', 'aromatic', 'at', 'backbone', 'backbonefull', 'basic', 'bb', 'bbfull', 'buried', 'carbon', 'cg', 'charged', 'cyclic', 'heme', 'hydrogen', 'hydrophobic', 'ion', 'large', 'lipid', 'medium', 'neutral', 'nitrogen', 'nonstdaa', 'nucleic', 'nucleobase', 'nucleoside', 'nucleotide', 'oxygen', 'polar', 'protein', 'purine', 'pyrimidine', 'small', 'stdaa', 'sugar', 'sulfur', 'surface', 'water']
Passing a flag name will return its definition:
In [2]: flagDefinition('backbone')
Out[2]: ['C', 'CA', 'N', 'O']
In [3]: flagDefinition('hydrogen')
Out[3]: '[0-9]?H.*'
Change a definition
Calling the function with the editable=True argument returns the names of flags whose definitions can be edited:
In [4]: flagDefinition(editable=True)
Out[4]:
['at', 'backbone', 'backbonefull', 'bb', 'bbfull', 'carbon', 'cg', 'heme', 'hydrogen', 'ion', 'lipid', 'nitrogen', 'nucleobase', 'nucleoside', 'nucleotide', 'oxygen', 'purine', 'pyrimidine', 'sugar', 'sulfur', 'water']
Pass an editable flag name with its new definition:
In [5]: flagDefinition(nitrogen='N.*')
In [6]: flagDefinition(backbone=['CA', 'C', 'O', 'N'])
In [7]: flagDefinition(nucleobase=['ADE', 'CYT', 'GUN', 'THY', 'URA'])
Note that the type of the new definition must be the same as the type of the old definition. Flags with editable definitions are: at, backbone, backbonefull, bb, bbfull, carbon, cg, heme, hydrogen, ion, lipid, nitrogen, nucleobase, nucleoside, nucleotide, oxygen, purine, pyrimidine, sugar, sulfur, and water.
Reset definitions
Pass the reset keyword as follows to restore all default definitions of editable flags and of non-standard amino acids:
In [8]: flagDefinition(reset='all')
Or, pass a specific editable flag label to restore its definition:
In [9]: flagDefinition(reset='nitrogen')
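The learn, change, and reset behavior described above, including the rule that a new definition must have the same type as the old one, can be illustrated with a small stand-alone registry. This is only a sketch under stated assumptions: FlagRegistry is a hypothetical class, not ProDy's implementation, and the strict type comparison is one possible reading of the "same type" rule. The two default values are taken from the examples on this page.

```python
import copy

# Default definitions for two editable flags, as shown in the examples above.
DEFAULTS = {
    'nitrogen': 'N.*',                   # regular expression (str)
    'backbone': ['C', 'CA', 'N', 'O'],   # atom names (list)
}


class FlagRegistry:
    """Hypothetical store of editable flag definitions with reset support."""

    def __init__(self, defaults):
        self._defaults = copy.deepcopy(defaults)
        self._current = copy.deepcopy(defaults)

    def get(self, name):
        """Learn a definition."""
        return copy.deepcopy(self._current[name])

    def set(self, name, definition):
        """Change a definition; the new value must match the old type."""
        old = self._current[name]  # raises KeyError for unknown flags
        if type(definition) is not type(old):
            raise TypeError('definition of %r must be %s, not %s' % (
                name, type(old).__name__, type(definition).__name__))
        self._current[name] = copy.deepcopy(definition)

    def reset(self, name='all'):
        """Restore one flag, or every flag when name is 'all'."""
        if name == 'all':
            self._current = copy.deepcopy(self._defaults)
        else:
            self._current[name] = copy.deepcopy(self._defaults[name])


registry = FlagRegistry(DEFAULTS)
registry.set('backbone', ['CA', 'C', 'O', 'N', 'OXT'])
edited_backbone = registry.get('backbone')

try:
    registry.set('nitrogen', ['N'])  # a list where a string is expected
    type_error_raised = False
except TypeError:
    type_error_raised = True

registry.reset('backbone')
```

Keeping a deep copy of the defaults is what makes a per-flag or global reset possible after in-place edits to list-valued definitions.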
- listNonstdAAProps(resname)
Returns properties of non-standard amino acid resname.
In [1]: listNonstdAAProps('PTR')
Out[1]: ['acidic', 'aromatic', 'cyclic', 'large', 'polar', 'surface']
- getNonstdProperties(resname)
Deprecated for removal in v1.4, use listNonstdAAProps() instead.
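- addNonstdAminoacid(resname, *properties)
Add non-standard amino acid resname with properties selected from the residue property flags listed under Protein: acidic, acyclic, aliphatic, aromatic, basic, buried, charged, cyclic, hydrophobic, large, medium, neutral, polar, small, and surface.
In [1]: addNonstdAminoacid('PTR', 'acidic', 'aromatic', 'cyclic', 'large',
   ...:                    'polar', 'surface')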
The default set of non-standard amino acids can be restored as follows:
In [2]: flagDefinition(reset='nonstdaa')
- delNonstdAminoacid(resname)
Delete non-standard amino acid resname.
In [1]: delNonstdAminoacid('PTR')
In [2]: flagDefinition('nonstdaa')
Out[2]:
['ASX', 'CME', 'CSB', 'CSO', 'CYX', 'GLX', 'HID', 'HIE', 'HIP', 'HSD', 'HSE', 'HSP', 'MEN', 'MSE', 'PHD', 'SEC', 'SEP', 'TPO', 'XAA', 'XLE']
The default set of non-standard amino acids can be restored as follows:
In [3]: flagDefinition(reset='nonstdaa')