Atom Flags

This module defines atom flags that are used in Atom Selections. You can read this page in interactive sessions using help(flags).

Flag labels can be used in atom selections:

In [1]: from prody import *

In [2]: p = parsePDB('1ubi')

In [3]: p.select('protein')
Out[3]: <Selection: 'protein' from 1ubi (602 atoms)>

Flag labels can be combined with the dot operator to make selections:

In [4]: p.protein
Out[4]: <Selection: 'protein' from 1ubi (602 atoms)>

In [5]: p.protein.acidic  # selects acidic residues
Out[5]: <Selection: '(acidic) and (protein)' from 1ubi (94 atoms)>

Flag labels can be prefixed with 'is' to check whether all atoms in an Atomic instance are flagged the same way:

In [6]: p.protein.ishetero
Out[6]: False

In [7]: p.water.ishetero
Out[7]: True

Flag labels can also be used to make quick atom counts:

In [8]: p.numAtoms()
Out[8]: 683

In [9]: p.numAtoms('protein')
Out[9]: 602

In [10]: p.numAtoms('water')
Out[10]: 81

Protein

protein
aminoacid
indicates the twenty standard amino acids (stdaa) and some non-standard amino acids (nonstdaa) described below. A residue must also have an atom named 'CA' in addition to having a qualifying residue name.
stdaa
indicates the standard amino acid residues: ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, and VAL
nonstdaa

indicates one of the following residues:

ASX (B) asparagine or aspartic acid
GLX (Z) glutamine or glutamic acid
CSO (C) S-hydroxycysteine
HIP (H) ND1-phosphohistidine
HSD (H) prototropic tautomer of histidine, H on ND1 (CHARMM)
HSE (H) prototropic tautomer of histidine, H on NE2 (CHARMM)
HSP (H) protonated histidine
MSE selenomethionine
SEC (U) selenocysteine
SEP (S) phosphoserine
TPO (T) phosphothreonine
PTR (Y) O-phosphotyrosine
XLE (J) leucine or isoleucine
XAA (X) unspecified or unknown

You can modify the list of non-standard amino acids using addNonstdAminoacid(), delNonstdAminoacid(), and listNonstdAAProps().

calpha
ca
Cα atoms of protein residues, same as selection 'name CA and protein'
backbone
bb
non-hydrogen backbone atoms of protein residues, same as selection 'name CA C O N and protein'
backbonefull
bbfull
backbone atoms of protein residues, same as selection 'name CA C O N H H1 H2 H3 OXT and protein'
sidechain
sc
side-chain atoms of protein residues, same as selection 'protein and not backbonefull'
acidic
residues ASP, GLU, HSP, PHD, PTR, SEP, TPO
acyclic
residues ALA, ARG, ASN, ASP, ASX, CME, CSO, CYS, CYX, GLN, GLU, GLX, GLY, ILE, LEU, LYS, MET, MSE, PHD, SEC, SEP, SER, THR, TPO, VAL, XLE
aliphatic
residues ALA, GLY, ILE, LEU, PRO, VAL, XLE
aromatic
residues HIS, PHE, PTR, TRP, TYR
basic
residues ARG, HID, HIE, HIP, HIS, HSD, HSE, LYS
buried
residues ALA, CME, CYS, CYX, ILE, LEU, MET, MSE, PHE, SEC, TRP, VAL, XLE
charged
residues ARG, ASP, GLU, HIS, LYS
cyclic
residues HID, HIE, HIP, HIS, HSD, HSE, HSP, PHE, PRO, PTR, TRP, TYR
hydrophobic
residues ALA, ILE, LEU, MET, PHE, PRO, TRP, VAL, XLE
large
residues ARG, CME, GLN, GLU, GLX, HID, HIE, HIP, HIS, HSD, HSE, HSP, ILE, LEU, LYS, MET, MSE, PHD, PHE, PTR, SEP, TPO, TRP, TYR, XLE
medium
residues ASN, ASP, ASX, CSO, CYS, CYX, PRO, SEC, THR, VAL
neutral
residues ALA, ASN, CME, CSO, CYS, CYX, GLN, GLY, ILE, LEU, MET, MSE, PHE, PRO, SEC, SER, THR, TRP, TYR, VAL
polar
residues ARG, ASN, ASP, ASX, CSO, CYS, CYX, GLN, GLU, GLX, GLY, HID, HIE, HIP, HIS, HSD, HSE, HSP, LYS, PHD, PTR, SEC, SEP, SER, THR, TPO, TYR
small
residues ALA, GLY, SER
surface
residues ARG, ASN, ASP, ASX, CSO, GLN, GLU, GLX, GLY, HID, HIE, HIP, HIS, HSD, HSE, HSP, LYS, PHD, PRO, PTR, SEP, SER, THR, TPO, TYR
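
As a rough illustration of how the backbone, backbonefull, and sidechain definitions above partition the atoms of a protein residue, the following sketch uses plain Python sets. It is not ProDy code: the classify helper and the serine atom names are assumptions made for the example.

```python
# Atom-name sets copied from the backbone and backbonefull definitions above.
BACKBONE = {'CA', 'C', 'O', 'N'}
BACKBONE_FULL = BACKBONE | {'H', 'H1', 'H2', 'H3', 'OXT'}


def classify(name):
    """Label a protein atom name (hypothetical helper, not part of ProDy)."""
    if name in BACKBONE:
        return 'backbone'
    if name in BACKBONE_FULL:
        return 'backbonefull only'
    return 'sidechain'


# Assumed atom names for a C-terminal serine with hydrogens.
serine = ['N', 'H', 'CA', 'HA', 'CB', 'HB2', 'HB3', 'OG', 'HG', 'C', 'O', 'OXT']
labels = {name: classify(name) for name in serine}
sidechain = [name for name in serine if labels[name] == 'sidechain']
print(sidechain)
```

Read literally, these definitions place HA among the side-chain atoms, because HA is not listed in backbonefull.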

Nucleic

nucleic
indicates nucleobases, nucleotides, and some nucleoside derivatives that are described below, so it is the same as 'nucleobase or nucleotide or nucleoside'.
nucleobase
indicates ADE (adenine), GUN (guanine), CYT (cytosine), THY (thymine), and URA (uracil).
nucleotide

indicates residues with the following names:

DA 2’-deoxyadenosine-5’-monophosphate
DC 2’-deoxycytidine-5’-monophosphate
DG 2’-deoxyguanosine-5’-monophosphate
DT 2’-deoxythymidine-5’-monophosphate
DU 2’-deoxyuridine-5’-monophosphate
A adenosine-5’-monophosphate
C cytidine-5’-monophosphate
G guanosine-5’-monophosphate
T 2’-deoxythymidine-5’-monophosphate
U uridine-5’-monophosphate
nucleoside

indicates the following nucleosides and their derivatives that are recognized by PDB:

ADN adenosine
AMP adenosine-5’-monophosphate
ADP adenosine-5’-diphosphate
ATP adenosine-5’-triphosphate
AGS adenosine-5’-triphosphate-gamma-S
CMP cyclic adenosine-3’,5’-monophosphate
A2P adenosine-2’,5’-diphosphate
A3P adenosine-3’,5’-diphosphate
CTN cytidine
C2P cytidine-2’-monophosphate
C3P cytidine-3’-monophosphate
C5P cytidine-5’-monophosphate
CDP cytidine-5’-diphosphate
CTP cytidine-5’-triphosphate
GMP guanosine
5GP guanosine-5’-monophosphate
GDP guanosine-5’-diphosphate
GTP guanosine-5’-triphosphate
THM thymidine
TMP thymidine-5’-monophosphate
TPP thymidine-5’-diphosphate
TTP thymidine-5’-triphosphate
URI uridine (uracil plus ribose)
UMP 2’-deoxyuridine 5’-monophosphate
UDP uridine 5’-diphosphate
UTP uridine 5’-triphosphate
at
same as selection 'resname ADE A THY T'
cg
same as selection 'resname CYT C GUN G'
purine
same as selection 'resname ADE A GUN G'
pyrimidine
same as selection 'resname CYT C THY T URA U'
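
The four base-pairing and ring-type flags above are plain residue-name sets, so their overlap can be checked with a few lines of Python. This is a sketch that only mirrors the definitions as written on this page; base_flags is a hypothetical helper, not a ProDy function.

```python
# Residue-name sets copied from the at, cg, purine, and pyrimidine definitions.
BASE_FLAGS = {
    'at': {'ADE', 'A', 'THY', 'T'},
    'cg': {'CYT', 'C', 'GUN', 'G'},
    'purine': {'ADE', 'A', 'GUN', 'G'},
    'pyrimidine': {'CYT', 'C', 'THY', 'T', 'URA', 'U'},
}


def base_flags(resname):
    """Return the flags whose residue-name set contains resname."""
    return sorted(flag for flag, names in BASE_FLAGS.items() if resname in names)


print(base_flags('A'))   # adenine-containing residue
print(base_flags('U'))   # uracil appears only in the pyrimidine set
print(base_flags('DA'))  # deoxy names are not listed in these four definitions
```

Taken at face value, the definitions do not cover the deoxy residue names (DA, DC, DG, DT, DU), so those residues would match nucleotide but none of these four flags.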

Heteros

hetero
indicates anything other than a protein or a nucleic residue, i.e. 'not (protein or nucleic)'.
hetatm
is available when atomic data is parsed from a PDB or similar format file and indicates atoms that are marked 'HETATM' in the file.
water

indicates HOH and DOD recognized by PDB and also WAT, TIP3, H2O, OH2, TIP, TIP2, and TIP4 recognized by molecular dynamics (MD) force fields.

Previously used water types HH0, OHH, and SOL conflict with other compounds in the PDB, so are removed from the definition of this flag.

ion

indicates the following ions, most of which are recognized by the PDB and others by MD force fields.

ID   Ion               PDB  Source  Conflict
AL   aluminum          Yes
BA   barium            Yes
CA   calcium           Yes
CD   cadmium           Yes
CL   chloride          Yes
CO   cobalt (ii)       Yes
CS   cesium            Yes
CU   copper (ii)       Yes
CU1  copper (i)        Yes
CUA  dinuclear copper  Yes
HG   mercury (ii)      Yes
IN   indium (iii)      Yes
IOD  iodide            Yes
K    potassium         Yes
MG   magnesium         Yes
MN3  manganese (iii)   Yes
MN   manganese (ii)    Yes
NA   sodium            Yes
PB   lead (ii)         Yes
PT   platinum (ii)     Yes
RB   rubidium          Yes
TB   terbium (iii)     Yes
TL   thallium (i)      Yes
WO4  tungstate (vi)    Yes
YB   ytterbium (iii)   Yes
ZN   zinc              Yes
CAL  calcium           No   CHARMM  Yes
CES  cesium            No   CHARMM  Yes
CLA  chloride          No   CHARMM  Yes
POT  potassium         No   CHARMM  Yes
SOD  sodium            No   CHARMM  Yes
ZN2  zinc              No   CHARMM  No

Ion identifiers that are obsoleted by PDB (MO3, MO4, MO5, MO6, NAW, OC7, and ZN1) are removed from this definition.

lipid
indicates GPE, LPP, OLA, SDS, and STE from PDB, and also POPC, LPPC, POPE, DLPE, PCGL, STEA, PALM, OLEO, and DMPC from the CHARMM force field.
sugar
indicates BGC, GLC, and GLO from PDB, and also AGLC from CHARMM.
heme
indicates 1FH, 2FH, DDH, DHE, HAS, HDD, HDE, HDM, HEA, HEB, HEC, HEM, HEO, HES, HEV, NTE, SRM, and VER from PDB, and also HEMO and HEMR from CHARMM.
pdbter
is available when atomic data is parsed from a PDB format file and indicates atoms that were followed by a 'TER' record.
selpdbter
is available when atomic data is parsed from a PDB format file and a selection is then made; it indicates selected atoms that should be followed by a 'TER' record.
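
Several of the hetero flags above (water, sugar, lipid) reduce to a residue-name lookup. The sketch below shows that idea with the names listed on this page; hetero_kind is a hypothetical helper written for illustration and says nothing about how ProDy stores or evaluates these flags.

```python
# Residue names copied from the water, sugar, and lipid definitions above.
WATER = {'HOH', 'DOD', 'WAT', 'TIP3', 'H2O', 'OH2', 'TIP', 'TIP2', 'TIP4'}
SUGAR = {'BGC', 'GLC', 'GLO', 'AGLC'}
LIPID = {'GPE', 'LPP', 'OLA', 'SDS', 'STE', 'POPC', 'LPPC', 'POPE', 'DLPE',
         'PCGL', 'STEA', 'PALM', 'OLEO', 'DMPC'}


def hetero_kind(resname):
    """Return 'water', 'sugar', or 'lipid' for a listed name, else None."""
    for kind, names in (('water', WATER), ('sugar', SUGAR), ('lipid', LIPID)):
        if resname in names:
            return kind
    return None


for resname in ('HOH', 'SOL', 'POPC', 'AGLC'):
    print(resname, hetero_kind(resname))
```

SOL comes back as None here, matching the note that it was removed from the water definition.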

Elements

The following elements found in proteins are recognized by applying regular expressions to atom names:

carbon
carbon atoms, same as 'name "C.*" and not ion'
nitrogen
nitrogen atoms, same as 'name "N.*" and not ion'
oxygen
oxygen atoms, same as 'name "O.*" and not ion'
sulfur
sulfur atoms, same as 'name "S.*" and not ion'
hydrogen
hydrogen atoms, same as 'name "[0-9]?H.*" and not ion'
noh
heavy
non-hydrogen atoms, same as 'not hydrogen'

'not ion' is appended to the above definitions to avoid conflicts with ion atoms.
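
To see why the 'not ion' clause matters, here is a minimal sketch of name-based element matching using Python's re module. The carbon, nitrogen, oxygen, and sulfur patterns are the ones listed above, and the hydrogen pattern is the one reported by flagDefinition('hydrogen') later on this page. The element_flag helper, the small ion set, and the choice to match against the whole atom name are assumptions of the sketch, not ProDy's implementation.

```python
import re

# Patterns taken from this page; whole-name matching is an assumption.
ELEMENT_PATTERNS = {
    'carbon': 'C.*',
    'nitrogen': 'N.*',
    'oxygen': 'O.*',
    'sulfur': 'S.*',
    'hydrogen': '[0-9]?H.*',
}

# A few ion residue names from the ion table, enough for the demonstration.
ION_RESNAMES = {'CA', 'HG', 'NA', 'ZN'}


def element_flag(resname, name):
    """Return the element flag for one atom, or None (hypothetical helper)."""
    if resname in ION_RESNAMES:  # the 'and not ion' part of each definition
        return None
    for flag, pattern in ELEMENT_PATTERNS.items():
        if re.fullmatch(pattern, name):
            return flag
    return None


# (residue name, atom name) pairs chosen for illustration.
atoms = [('GLY', 'CA'), ('CA', 'CA'), ('CYS', 'SG'),
         ('ALA', '1HB'), ('HG', 'HG'), ('SER', 'OG')]
flags = [element_flag(resname, name) for resname, name in atoms]
print(flags)
```

Without the ion check, the calcium ion named CA would be counted as carbon and the mercury ion named HG as hydrogen.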

Structure

The following secondary structure flags are defined, but secondary structure assignments must be made before they can be used.

extended
extended conformation, same as 'secondary E'
helix
α-helix conformation, same as 'secondary H'
helix310
3_10-helix conformation, same as 'secondary G'
helixpi
π-helix conformation, same as 'secondary I'
turn
hydrogen bonded turn conformation, same as 'secondary T'
bridge
isolated beta-bridge conformation, same as 'secondary B'
bend
bend conformation, same as 'secondary S'
coil
not in one of the above conformations, same as 'secondary C'
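
The correspondence between one-letter secondary structure codes and flag labels can be written as a lookup table. The sketch below tallies flags for a made-up per-residue assignment string; the string and the variable names are assumptions for illustration only.

```python
from collections import Counter

# One-letter codes and flag labels copied from the definitions above.
SECONDARY_FLAGS = {
    'E': 'extended', 'H': 'helix', 'G': 'helix310', 'I': 'helixpi',
    'T': 'turn', 'B': 'bridge', 'S': 'bend', 'C': 'coil',
}

# Made-up assignment string, one character per residue.
assignment = 'CEEEETTEEEECHHHHHHSC'

# Count how many residues would carry each flag.
counts = Counter(SECONDARY_FLAGS[code] for code in assignment)
for flag, n in sorted(counts.items()):
    print(flag, n)
```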

Others

all
indicates all atoms, returns a new view of the instance
none
indicates no atoms, returns None
dummy
indicates dummy atoms in an AtomMap
mapped
indicates mapped atoms in an AtomMap

Functions

The following functions can be used to customize flag definitions:

flagDefinition(*arg, **kwarg)

Learn, change, or reset Atom Flags definitions.

Learn a definition

Calling this function with no arguments returns the list of flag names whose definitions you can look up:

In [1]: flagDefinition()
Out[1]: 
['acidic',
 'acyclic',
 'aliphatic',
 'aminoacid',
 'aromatic',
 'at',
 'backbone',
 'backbonefull',
 'basic',
 'bb',
 'bbfull',
 'buried',
 'carbon',
 'cg',
 'charged',
 'cyclic',
 'heme',
 'hydrogen',
 'hydrophobic',
 'ion',
 'large',
 'lipid',
 'medium',
 'neutral',
 'nitrogen',
 'nonstdaa',
 'nucleic',
 'nucleobase',
 'nucleoside',
 'nucleotide',
 'oxygen',
 'polar',
 'protein',
 'purine',
 'pyrimidine',
 'small',
 'stdaa',
 'sugar',
 'sulfur',
 'surface',
 'water']

Passing a flag name will return its definition:

In [2]: flagDefinition('backbone')
Out[2]: ['C', 'CA', 'N', 'O']

In [3]: flagDefinition('hydrogen')
Out[3]: '[0-9]?H.*'

Change a definition

Calling the function with the editable=True argument returns the names of flags whose definitions can be edited:

In [4]: flagDefinition(editable=True)
Out[4]: 
['at',
 'backbone',
 'backbonefull',
 'bb',
 'bbfull',
 'carbon',
 'cg',
 'heme',
 'hydrogen',
 'ion',
 'lipid',
 'nitrogen',
 'nucleobase',
 'nucleoside',
 'nucleotide',
 'oxygen',
 'purine',
 'pyrimidine',
 'sugar',
 'sulfur',
 'water']

Pass an editable flag name with its new definition:

In [5]: flagDefinition(nitrogen='N.*')

In [6]: flagDefinition(backbone=['CA', 'C', 'O', 'N'])

In [7]: flagDefinition(nucleobase=['ADE', 'CYT', 'GUN', 'THY', 'URA'])

Note that the type of the new definition must be the same as the type of the old definition. Flags with editable definitions are: at, backbone, backbonefull, bb, bbfull, carbon, cg, heme, hydrogen, ion, lipid, nitrogen, nucleobase, nucleoside, nucleotide, oxygen, purine, pyrimidine, sugar, sulfur, and water.
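
The type rule in the note above can be pictured as a simple check: a list-valued definition such as backbone must be replaced by a list, and a string-valued (regular expression) definition such as hydrogen by a string. The sketch below only illustrates that rule; check_definition is a hypothetical function, and the exception ProDy actually raises is not shown on this page.

```python
def check_definition(old, new):
    """Return new if it has the same type as old, else raise TypeError."""
    if type(new) is not type(old):
        raise TypeError('expected %s, got %s'
                        % (type(old).__name__, type(new).__name__))
    return new


# Definitions as printed by flagDefinition() earlier on this page.
current = {'backbone': ['C', 'CA', 'N', 'O'], 'hydrogen': '[0-9]?H.*'}

# A string replaces a string: accepted.
accepted = check_definition(current['hydrogen'], 'H.*')

# A string offered in place of a list: rejected.
try:
    check_definition(current['backbone'], 'CA C O N')
    rejected = False
except TypeError as err:
    rejected = True
    print('rejected:', err)
```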

Reset definitions

Pass the reset keyword as follows to restore the default definitions of all editable flags and of non-standard amino acids:

In [8]: flagDefinition(reset='all')

Or, pass a specific editable flag label to restore its definition:

In [9]: flagDefinition(reset='nitrogen')

listNonstdAAProps(resname)

Returns properties of non-standard amino acid resname.

In [1]: listNonstdAAProps('PTR')
Out[1]: ['acidic', 'aromatic', 'cyclic', 'large', 'polar', 'surface']

getNonstdProperties(resname)

Deprecated for removal in v1.4, use listNonstdAAProps() instead.

addNonstdAminoacid(resname, *properties)

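Add non-standard amino acid resname with properties selected from the residue category flags described above, such as acidic, aromatic, cyclic, large, polar, and surface: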

In [1]: addNonstdAminoacid('PTR', 'acidic', 'aromatic', 'cyclic', 'large',
   ...: 'polar', 'surface')
   ...: 

The default set of non-standard amino acids can be restored as follows:

In [2]: flagDefinition(reset='nonstdaa')

delNonstdAminoacid(resname)

Delete non-standard amino acid resname.

In [1]: delNonstdAminoacid('PTR')

In [2]: flagDefinition('nonstdaa')
Out[2]: 
['ASX',
 'CME',
 'CSB',
 'CSO',
 'CYX',
 'GLX',
 'HID',
 'HIE',
 'HIP',
 'HSD',
 'HSE',
 'HSP',
 'MEN',
 'MSE',
 'PHD',
 'SEC',
 'SEP',
 'TPO',
 'XAA',
 'XLE']

The default set of non-standard amino acids can be restored as follows:

In [3]: flagDefinition(reset='nonstdaa')
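
To make the relationship between the add, list, and delete functions concrete, here is a toy model in which each non-standard residue name maps to a set of property flags. It is a sketch under that assumption only: the dictionary and the four helper functions are hypothetical and do not reflect how ProDy stores these definitions. The PTR properties are the ones shown in the listNonstdAAProps example above.

```python
# Toy registry: residue name -> set of property flags.
registry = {}


def add_nonstd(resname, *properties):
    """Register resname with the given property flags."""
    registry[resname] = set(properties)


def del_nonstd(resname):
    """Remove resname from the registry."""
    del registry[resname]


def list_props(resname):
    """Return the sorted property flags of resname."""
    return sorted(registry[resname])


def members(flag):
    """Return the sorted residue names that carry a property flag."""
    return sorted(name for name, props in registry.items() if flag in props)


add_nonstd('PTR', 'acidic', 'aromatic', 'cyclic', 'large', 'polar', 'surface')
props = list_props('PTR')
before = members('aromatic')   # PTR contributes to the aromatic category
del_nonstd('PTR')
after = members('aromatic')    # and no longer does once it is deleted
print(props, before, after)
```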