PDB File Header¶
This module defines functions for parsing header data from PDB files.
-
class
Chemical
(resname)[source]¶ A data structure for storing information on chemical components (or heterogens) in PDB structures.
A Chemical instance has the following attributes, listed as attribute (type): description (record type):
- resname (str): residue name (or chemical component identifier) (HET)
- name (str): chemical name (HETNAM)
- chain (str): chain identifier (HET)
- resnum (int): residue (or sequence) number (HET)
- icode (str): insertion code (HET)
- natoms (int): number of atoms present in the structure (HET)
- description (str): description of the chemical component (HET)
- synonyms (list): synonyms (HETSYN)
- formula (str): chemical formula (FORMUL)
- pdbentry (str): PDB entry that chemical data is extracted from
Chemical class instances can be obtained as follows:
In [1]: from prody import *
In [2]: chemical = parsePDBHeader('1zz2', 'chemicals')[0]
In [3]: chemical
Out[3]: <Chemical: B11 (1ZZ2_A_362)>
In [4]: chemical.name
Out[4]: 'N-[3-(4-FLUOROPHENOXY)PHENYL]-4-[(2-HYDROXYBENZYL)AMINO]PIPERIDINE-1-SULFONAMIDE'
In [5]: chemical.natoms
Out[5]: 33
In [6]: len(chemical)
Out[6]: 33
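The session above shows that len() on a Chemical matches its natoms attribute and that the repr combines the residue name with the PDB entry, chain, and residue number. A minimal stand-in class, not ProDy's implementation, can illustrate that behaviour. The class name ChemicalSketch and the subset of attributes chosen are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(repr=False)
class ChemicalSketch:
    """Hypothetical stand-in mirroring a few documented Chemical attributes."""
    resname: str
    pdbentry: str = ""
    chain: str = ""
    resnum: int = 0
    natoms: int = 0
    synonyms: list = field(default_factory=list)

    def __len__(self):
        # len() reports the number of atoms, matching the documented session
        return self.natoms

    def __repr__(self):
        # Same layout as the documented repr: resname (ENTRY_CHAIN_RESNUM)
        return f"<Chemical: {self.resname} ({self.pdbentry}_{self.chain}_{self.resnum})>"


b11 = ChemicalSketch("B11", pdbentry="1ZZ2", chain="A", resnum=362, natoms=33)
print(repr(b11))
print(len(b11))
```

The values passed to the constructor are the ones shown in the documented session.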
-
chain
¶ chain identifier
-
description
¶ description of the chemical component
-
formula
¶ chemical formula
-
icode
¶ insertion code
-
name
¶ chemical name
-
natoms
¶ number of atoms present in the structure
-
pdbentry
¶ PDB entry that chemical data is extracted from
-
resname
¶ residue name (or chemical component identifier)
-
resnum
¶ residue (or sequence) number
-
synonyms
¶ list of synonyms
-
-
class
Polymer
(chid)[source]¶ A data structure for storing information on polymer components (protein or nucleic) of PDB structures.
A Polymer instance has the following attributes, listed as attribute (type): description (record type):
- chid (str): chain identifier
- name (str): name of the polymer (macromolecule) (COMPND)
- fragment (str): specifies a domain or region of the molecule (COMPND)
- synonyms (list): synonyms for the polymer (COMPND)
- ec (list): associated Enzyme Commission numbers (COMPND)
- engineered (bool): indicates that the polymer was produced using recombinant technology or by purely chemical synthesis (COMPND)
- mutation (bool): indicates presence of a mutation (COMPND)
- comments (str): additional comments
- sequence (str): polymer chain sequence (SEQRES)
- dbrefs (list): sequence database records (DBREF[1|2] and SEQADV), see DBRef
- modified (list): modified residues (MODRES); when modified residues are present, each will be represented as (resname, chid, resnum, icode, stdname, comment)
- pdbentry (str): PDB entry that polymer data is extracted from
Polymer class instances can be obtained as follows:
In [1]: polymer = parsePDBHeader('2k39', 'polymers')[0]
In [2]: polymer
Out[2]: <Polymer: UBIQUITIN (2K39_A)>
In [3]: polymer.pdbentry
Out[3]: '2K39'
In [4]: polymer.chid
Out[4]: 'A'
In [5]: polymer.name
Out[5]: 'UBIQUITIN'
In [6]: polymer.sequence
Out[6]: 'MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG'
In [7]: len(polymer.sequence)
Out[7]: 76
In [8]: len(polymer)
Out[8]: 76
In [9]: dbref = polymer.dbrefs[0]
In [10]: dbref.database
Out[10]: 'UniProt'
In [11]: dbref.accession
Out[11]: 'P62972'
In [12]: dbref.idcode
Out[12]: 'UBIQ_XENLA'
-
chid
¶ chain identifier
-
comments
¶ additional comments
-
dbrefs
¶ sequence database reference records
-
ec
¶ list of associated Enzyme Commission numbers
-
engineered
¶ indicates that the molecule was produced using recombinant technology or by purely chemical synthesis
-
fragment
¶ specifies a domain or region of the molecule
-
modified
¶ modified residues
-
mutation
¶ indicates presence of a mutation
-
name
¶ name of the polymer (macromolecule)
-
pdbentry
¶ PDB entry that polymer data is extracted from
-
sequence
¶ polymer chain sequence
-
synonyms
¶ list of synonyms for the molecule
-
-
class
DBRef
[source]¶ A data structure for storing references to sequence databases for polymer components in PDB structures. Information is parsed from DBREF[1|2] and SEQADV records in the PDB header.
-
accession
¶ database accession code
-
database
¶ sequence database, one of UniProt, GenBank, Norine, UNIMES, or PDB
-
dbabbr
¶ database abbreviation, one of UNP, GB, NORINE, UNIMES, or PDB
-
diff
¶ list of differences between PDB and database sequences,
(resname, resnum, icode, dbResname, dbResnum, comment)
-
first
¶ initial residue numbers,
(resnum, icode, dbnum)
-
idcode
¶ database identification code, i.e. entry name in UniProt
-
last
¶ ending residue numbers,
(resnum, icode, dbnum)
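To make the DBRef attributes concrete, here is a rough sketch of how a single DBREF record could be sliced into the fields listed above. It is not ProDy's parser. The helper name parse_dbref is hypothetical, the column positions follow the fixed-column DBREF layout of the PDB file format, and the abbreviation-to-name table assumes the two documented lists pair up in order. The accession and entry name come from the 2K39 session shown earlier, while the residue ranges are illustrative placeholders.

```python
# Abbreviation -> database name, assuming the documented lists pair up in order.
DB_NAMES = {"UNP": "UniProt", "GB": "GenBank", "NORINE": "Norine",
            "UNIMES": "UNIMES", "PDB": "PDB"}


def parse_dbref(line):
    """Hypothetical helper: slice a DBREF record into DBRef-like fields."""
    dbabbr = line[26:32].strip()              # columns 27-32
    return {
        "dbabbr": dbabbr,
        "database": DB_NAMES.get(dbabbr, dbabbr),
        "accession": line[33:41].strip(),     # columns 34-41
        "idcode": line[42:54].strip(),        # columns 43-54
        # (resnum, icode, dbnum) for the first and last residues
        "first": (int(line[14:18]), line[18].strip(), int(line[55:60])),
        "last": (int(line[20:24]), line[24].strip(), int(line[62:67])),
    }


# Build the record by field width so every value lands in its column range.
# The residue ranges (1 to 76) are placeholders, not values read from 2K39.
line = "DBREF  {:4} {:1} {:4d}{:1} {:4d}{:1} {:6} {:8} {:12} {:5d}{:1} {:5d}{:1}".format(
    "2K39", "A", 1, " ", 76, " ", "UNP", "P62972", "UBIQ_XENLA", 1, " ", 76, " ")
record = parse_dbref(line)
print(record["database"], record["accession"], record["idcode"])
print(record["first"], record["last"])
```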
-
-
parsePDBHeader
(pdb, *keys)[source]¶ Returns header data dictionary for pdb. This function is equivalent to parsePDB(pdb, header=True, model=0, meta=False); likewise, pdb may be an identifier or a filename.
List of header records that are parsed:
Each entry below reads record type, dictionary key: description.
- HEADER, classification: molecule classification
- HEADER, deposition_date: deposition date
- HEADER, identifier: PDB identifier
- TITLE, title: title for the experiment or analysis
- SPLIT, split: list of PDB entries that make up the whole structure when combined with this one
- COMPND, polymers: see Polymer
- EXPDTA, experiment: information about the experiment
- NUMMDL, n_models: number of models
- MDLTYP, model_type: additional structural annotation
- AUTHOR, authors: list of contributors
- JRNL, reference: reference information dictionary:
  - authors: list of authors
  - title: title of the article
  - editors: list of editors
  - issn:
  - reference: journal, vol, issue, etc.
  - publisher: publisher information
  - pmid: pubmed identifier
  - doi: digital object identifier
- DBREF[1|2], polymers: see Polymer and DBRef
- SEQADV, polymers: see Polymer
- SEQRES, polymers: see Polymer
- MODRES, polymers: see Polymer
- HELIX, polymers: see Polymer
- SHEET, polymers: see Polymer
- HET, chemicals: see Chemical
- HETNAM, chemicals: see Chemical
- HETSYN, chemicals: see Chemical
- FORMUL, chemicals: see Chemical
- REMARK 2, resolution: resolution of structures, when applicable
- REMARK 4, version: PDB file version
- REMARK 350, biomoltrans: biomolecular transformation lines (unprocessed)
- REMARK 900, related_entries: related entries in the PDB or EMDB
Header records that are not parsed are: OBSLTE, CAVEAT, SOURCE, KEYWDS, REVDAT, SPRSDE, SSBOND, LINK, CISPEP, CRYST1, ORIGX1, ORIGX2, ORIGX3, MTRIX1, MTRIX2, MTRIX3, and REMARK X not mentioned above.
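As a small illustration of how a record type maps to dictionary keys, the sketch below slices a HEADER record into the three documented keys. It is not ProDy's code. The helper name parse_header_record is hypothetical, the column ranges follow the fixed-column HEADER layout of the PDB file format, and the record contents are made up.

```python
def parse_header_record(line):
    """Hypothetical helper: slice a HEADER record by its fixed columns."""
    return {
        "classification": line[10:50].strip(),   # columns 11-50
        "deposition_date": line[50:59].strip(),  # columns 51-59
        "identifier": line[62:66].strip(),       # columns 63-66
    }


# Made-up record, padded so each field sits in its column range.
line = "HEADER".ljust(10) + "EXAMPLE PROTEIN".ljust(40) + "01-JAN-00" + "   " + "1ABC"
header = parse_header_record(line)
print(header)
```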
-
assignSecstr
(header, atoms, coil=True)[source]¶ Assign secondary structure from header dictionary to atoms. header must be a dictionary parsed using parsePDB(). atoms may be an instance of AtomGroup, Selection, Chain or Residue. ProDy can be configured to automatically parse and assign secondary structure information using the confProDy(auto_secondary=True) command. See also the confProDy() function.
The single-letter code assignments of the Dictionary of Protein Secondary Structure (DSSP) are used:
- G = 3-turn helix (3₁₀ helix). Min length 3 residues.
- H = 4-turn helix (alpha helix). Min length 4 residues.
- I = 5-turn helix (pi helix). Min length 5 residues.
- T = hydrogen bonded turn (3, 4 or 5 turn)
- E = extended strand in parallel and/or anti-parallel beta-sheet conformation. Min length 2 residues.
- B = residue in isolated beta-bridge (single pair beta-sheet hydrogen bond formation)
- S = bend (the only non-hydrogen-bond based assignment).
- C = residues not in one of above conformations.
See http://en.wikipedia.org/wiki/Protein_secondary_structure#The_DSSP_code for more details.
The following PDB helix classes are omitted (class number in parentheses):
- Right-handed omega (2)
- Right-handed gamma (4)
- Left-handed alpha (6)
- Left-handed omega (7)
- Left-handed gamma (8)
- 2 - 7 ribbon/helix (9)
- Polyproline (10)
Secondary structures are assigned to all atoms in a residue. Amino acid residues without any secondary structure assignments in the header section will be assigned coil (C) conformation. This can be prevented by passing the coil=False argument.
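The assignment rules above can be sketched in plain Python. This is an illustration under stated assumptions, not ProDy's implementation: the helper name assign_secstr_sketch and its input tuples are hypothetical, the mapping of the retained PDB helix classes (1, 3, 5) to H, I, and G is inferred from their PDB class names, and returning an empty string when coil=False is an arbitrary choice.

```python
# Retained PDB helix classes mapped to DSSP-style codes (assumed mapping):
# 1 = right-handed alpha, 3 = right-handed pi, 5 = right-handed 3-10.
HELIX_CLASS_TO_CODE = {1: "H", 3: "I", 5: "G"}


def assign_secstr_sketch(resnums, helices, sheets, coil=True):
    """Hypothetical helper: return {resnum: code} for residues of one chain.

    helices: iterable of (helix_class, first_resnum, last_resnum)
    sheets:  iterable of (first_resnum, last_resnum)
    """
    default = "C" if coil else ""  # empty string when coil=False is an assumption
    codes = {resnum: default for resnum in resnums}
    for helix_class, first, last in helices:
        code = HELIX_CLASS_TO_CODE.get(helix_class)
        if code is None:
            continue  # omitted helix class, e.g. polyproline (10)
        for resnum in range(first, last + 1):
            if resnum in codes:
                codes[resnum] = code
    for first, last in sheets:
        for resnum in range(first, last + 1):
            if resnum in codes:
                codes[resnum] = "E"
    return codes


codes = assign_secstr_sketch(range(1, 11),
                             helices=[(1, 2, 4), (10, 8, 9)],
                             sheets=[(6, 7)])
print("".join(codes[i] for i in range(1, 11)))  # prints CHHHCEECCC
```

Residues 8 and 9 stay as coil because the made-up helix covering them has class 10, which is in the omitted list.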
-
buildBiomolecules
(header, atoms, biomol=None)[source]¶ Returns atoms after applying biomolecular transformations from header dictionary. Biomolecular transformations are applied to all coordinate sets in the molecule.
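A biomolecular transformation is a rotation matrix plus a translation vector applied to every atom position. The sketch below shows that arithmetic on two coordinate sets. It is not ProDy's implementation: the helper name apply_transformation and all numeric values are made up for illustration.

```python
def apply_transformation(coords, rotation, translation):
    """Hypothetical helper: return rotation * xyz + translation for each point."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return moved


# 180-degree rotation about the z axis, then a shift along x (made-up values).
rotation = [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, 1.0)]
translation = (10.0, 0.0, 0.0)

# Two coordinate sets with one atom each; the same transformation is applied to both.
coordsets = [[(1.0, 2.0, 3.0)], [(1.5, 2.5, 3.5)]]
copies = [apply_transformation(cs, rotation, translation) for cs in coordsets]
print(copies)
```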
Some PDB files contain transformations for more than one biomolecule. A specific set of transformations can be chosen using the biomol argument. Transformation sets are identified by numbers, e.g. "1", "2", ...
If multiple biomolecular transformations are provided in the header dictionary, biomolecules will be returned as AtomGroup instances in a list().
If the resulting biomolecule has more than 26 chains, the molecular assembly will be split into multiple AtomGroup instances, each containing at most 26 chains. These AtomGroup instances will be returned in a tuple.
Note that atoms in biomolecules are ordered according to chain identifiers. When multiple chains in a biomolecule have the same chain identifier, they are given different segment names to distinguish them.
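The 26-chain limit amounts to chunking the chains of an assembly into groups. A minimal sketch of that grouping follows. It is not ProDy's implementation: the helper name split_chains is hypothetical, and plain strings stand in for chain objects.

```python
def split_chains(chains, max_chains=26):
    """Hypothetical helper: group chains into tuples of at most max_chains."""
    return tuple(
        tuple(chains[i:i + max_chains])
        for i in range(0, len(chains), max_chains)
    )


# Sixty placeholder chains, as an assembly too large for one group.
chains = [f"chain{i}" for i in range(60)]
groups = split_chains(chains)
print([len(group) for group in groups])  # prints [26, 26, 8]
```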