PDB File Header

This module defines functions for parsing header data from PDB files.

class Chemical(resname)

A data structure for storing information on chemical components (or heterogens) in PDB structures.

A Chemical instance has the following attributes:

Attribute Type Description (RECORD TYPE)
resname str residue name (or chemical component identifier) (HET)
name str chemical name (HETNAM)
chain str chain identifier (HET)
resnum int residue (or sequence) number (HET)
icode str insertion code (HET)
natoms int number of atoms present in the structure (HET)
description str description of the chemical component (HET)
synonyms list synonyms (HETSYN)
formula str chemical formula (FORMUL)
pdbentry str PDB entry that chemical data is extracted from

Chemical class instances can be obtained as follows:

In [1]: from prody import *

In [2]: chemical = parsePDBHeader('1zz2', 'chemicals')[0]

In [3]: chemical
Out[3]: <Chemical: B11 (1ZZ2_A_362)>

In [4]: chemical.name
Out[4]: 'N-[3-(4-FLUOROPHENOXY)PHENYL]-4-[(2-HYDROXYBENZYL)AMINO]PIPERIDINE-1-SULFONAMIDE'

In [5]: chemical.natoms
Out[5]: 33

In [6]: len(chemical)
Out[6]: 33
chain

chain identifier

description

description of the chemical component

formula

chemical formula

icode

insertion code

name

chemical name

natoms

number of atoms present in the structure

pdbentry

PDB entry that chemical data is extracted from

resname

residue name (or chemical component identifier)

resnum

residue (or sequence) number

synonyms

list of synonyms
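Most of the attributes above come from a single fixed-column HET record. The following is a minimal sketch of that slicing, not ProDy's own parser: HetRecord and parse_het are hypothetical names, the column positions follow the PDB format description of HET, and the example line is assembled from the values shown in the session above rather than quoted from the 1ZZ2 file.

```python
from typing import NamedTuple


class HetRecord(NamedTuple):
    """Hypothetical container; field names mirror the Chemical attributes."""
    resname: str
    chain: str
    resnum: int
    icode: str
    natoms: int
    description: str


def parse_het(line: str) -> HetRecord:
    """Slice one HET record by its fixed columns."""
    line = line.ljust(70)  # pad so that short lines slice safely
    return HetRecord(
        resname=line[7:10].strip(),       # columns 8-10
        chain=line[12].strip(),           # column 13
        resnum=int(line[13:17]),          # columns 14-17
        icode=line[17].strip(),           # column 18
        natoms=int(line[20:25]),          # columns 21-25
        description=line[30:70].strip(),  # columns 31-70
    )


# Assumed example line: B11, chain A, residue 362, 33 atoms.
example = "HET    B11  A 362      33"
record = parse_het(example)
print(record)
```

The name, synonyms, and formula attributes come from separate records (HETNAM, HETSYN, FORMUL), which this sketch does not cover.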

class Polymer(chid)

A data structure for storing information on polymer components (protein or nucleic) of PDB structures.

A Polymer instance has the following attributes:

Attribute Type Description (RECORD TYPE)
chid str chain identifier
name str name of the polymer (macromolecule) (COMPND)
fragment str specifies a domain or region of the molecule (COMPND)
synonyms list synonyms for the polymer (COMPND)
ec list associated Enzyme Commission numbers (COMPND)
engineered bool indicates that the polymer was produced using recombinant technology or by purely chemical synthesis (COMPND)
mutation bool indicates presence of a mutation (COMPND)
comments str additional comments
sequence str polymer chain sequence (SEQRES)
dbrefs list sequence database records (DBREF[1|2] and SEQADV), see DBRef
modified list modified residues (MODRES); when present, each is represented as (resname, chid, resnum, icode, stdname, comment)
pdbentry str PDB entry that polymer data is extracted from

Polymer class instances can be obtained as follows:

In [1]: polymer = parsePDBHeader('2k39', 'polymers')[0]

In [2]: polymer
Out[2]: <Polymer: UBIQUITIN (2K39_A)>

In [3]: polymer.pdbentry
Out[3]: '2K39'

In [4]: polymer.chid
Out[4]: 'A'

In [5]: polymer.name
Out[5]: 'UBIQUITIN'

In [6]: polymer.sequence
Out[6]: 'MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG'

In [7]: len(polymer.sequence)
Out[7]: 76

In [8]: len(polymer)
Out[8]: 76

In [9]: dbref = polymer.dbrefs[0]

In [10]: dbref.database
Out[10]: 'UniProt'

In [11]: dbref.accession
Out[11]: 'P62972'

In [12]: dbref.idcode
Out[12]: 'UBIQ_XENLA'
chid

chain identifier

comments

additional comments

dbrefs

sequence database reference records

ec

list of associated Enzyme Commission numbers

engineered

indicates that the molecule was produced using recombinant technology or by purely chemical synthesis

fragment

specifies a domain or region of the molecule

modified

modified residues

mutation

indicates presence of a mutation

name

name of the polymer (macromolecule)

pdbentry

PDB entry that polymer data is extracted from

sequence

polymer chain sequence

synonyms

list of synonyms for the molecule
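The sequence attribute is a one-letter string, whereas SEQRES records list three-letter residue names. The conversion can be sketched as below, assuming the standard SEQRES column layout. sequences_from_seqres is a hypothetical helper, mapping unknown residue names to "X" is an assumption of this sketch, and the two example lines are reconstructed from the start of the ubiquitin sequence shown above.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequences_from_seqres(lines):
    """Return {chain identifier: one-letter sequence} from SEQRES lines."""
    sequences = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        chid = line[11]                 # column 12
        residues = line[19:70].split()  # columns 20-70, up to 13 names
        # Assumption of this sketch: unknown residue names become "X".
        letters = "".join(THREE_TO_ONE.get(res, "X") for res in residues)
        sequences[chid] = sequences.get(chid, "") + letters
    return sequences


# Reconstructed example: the first 26 residues of chain A only.
lines = [
    "SEQRES   1 A   76  MET GLN ILE PHE VAL LYS THR LEU THR GLY LYS THR ILE",
    "SEQRES   2 A   76  THR LEU GLU VAL GLU PRO SER ASP THR ILE GLU ASN VAL",
]
print(sequences_from_seqres(lines))
```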

class DBRef

A data structure for storing references to sequence databases for polymer components in PDB structures. Information is parsed from DBREF[1|2] and SEQADV records in the PDB header.

accession

database accession code

database

sequence database, one of UniProt, GenBank, Norine, UNIMES, or PDB

dbabbr

database abbreviation, one of UNP, GB, NORINE, UNIMES, or PDB

diff

list of differences between PDB and database sequences, each given as (resname, resnum, icode, dbResname, dbResnum, comment)

first

initial residue numbers, (resnum, icode, dbnum)

idcode

database identification code, e.g. the entry name in UniProt

last

ending residue numbers, (resnum, icode, dbnum)
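The first and last tuples pair a residue number and insertion code in the PDB entry with the matching sequence number in the database. A rough sketch of reading them from a DBREF record follows, assuming the standard DBREF column layout. parse_dbref is a hypothetical function and not ProDy's parser. The example line is assembled from the values in the Polymer session above, with the 1 to 76 residue range assumed from the sequence length. The abbreviation-to-name pairing is read off the database and dbabbr descriptions above.

```python
# Pairing assumed from the order of the database and dbabbr lists above.
DATABASE_NAMES = {
    "UNP": "UniProt", "GB": "GenBank", "NORINE": "Norine",
    "UNIMES": "UNIMES", "PDB": "PDB",
}


def parse_dbref(line):
    """Slice one DBREF record into a plain dictionary."""
    line = line.ljust(68)  # pad so that short lines slice safely
    dbabbr = line[26:32].strip()       # columns 27-32
    return {
        "dbabbr": dbabbr,
        "database": DATABASE_NAMES.get(dbabbr, dbabbr),
        "accession": line[33:41].strip(),  # columns 34-41
        "idcode": line[42:54].strip(),     # columns 43-54
        # (resnum, icode, dbnum) for the first and last aligned residues
        "first": (int(line[14:18]), line[18].strip(), int(line[55:60])),
        "last": (int(line[20:24]), line[24].strip(), int(line[62:67])),
    }


# Assumed example line for chain A of 2K39.
example = "DBREF  2K39 A    1    76  UNP    P62972   UBIQ_XENLA       1     76"
ref = parse_dbref(example)
print(ref)
```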

parsePDBHeader(pdb, *keys)

Returns the header data dictionary for pdb. This function is equivalent to parsePDB(pdb, header=True, model=0, meta=False). As with parsePDB(), pdb may be an identifier or a filename.

The following header records are parsed:

Record type Dictionary key(s) Description
HEADER classification molecule classification
HEADER deposition_date deposition date
HEADER identifier PDB identifier
TITLE title title for the experiment or analysis
SPLIT split list of PDB entries that make up the whole structure when combined with this one
COMPND polymers see Polymer
EXPDTA experiment information about the experiment
NUMMDL n_models number of models
MDLTYP model_type additional structural annotation
AUTHOR authors list of contributors
JRNL reference
reference information dictionary:
  • authors: list of authors
  • title: title of the article
  • editors: list of editors
  • issn:
  • reference: journal, vol, issue, etc.
  • publisher: publisher information
  • pmid: pubmed identifier
  • doi: digital object identifier
DBREF[1|2] polymers see Polymer and DBRef
SEQADV polymers see Polymer
SEQRES polymers see Polymer
MODRES polymers see Polymer
HELIX polymers see Polymer
SHEET polymers see Polymer
HET chemicals see Chemical
HETNAM chemicals see Chemical
HETSYN chemicals see Chemical
FORMUL chemicals see Chemical
REMARK 2 resolution resolution of structures, when applicable
REMARK 4 version PDB file version
REMARK 350 biomoltrans biomolecular transformation lines (unprocessed)
REMARK 900 related_entries related entries in the PDB or EMDB

Header records that are not parsed are: OBSLTE, CAVEAT, SOURCE, KEYWDS, REVDAT, SPRSDE, SSBOND, LINK, CISPEP, CRYST1, ORIGX1, ORIGX2, ORIGX3, MTRIX1, MTRIX2, MTRIX3, and any REMARK record not listed above.
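In the table above, the HEADER record is the only one that fills three dictionary keys from a single line. A minimal sketch of that mapping is given below, assuming the standard HEADER column layout. parse_header_record is a hypothetical function, and the example record is made up for illustration (it does not describe a real entry).

```python
def parse_header_record(line):
    """Return the three HEADER fields using the dictionary keys listed above."""
    line = line.ljust(66)  # pad so that short lines slice safely
    return {
        "classification": line[10:50].strip(),   # columns 11-50
        "deposition_date": line[50:59].strip(),  # columns 51-59
        "identifier": line[62:66].strip(),       # columns 63-66
    }


# Made-up record, built field by field so the column positions are explicit.
example = "HEADER    " + "HYDROLASE".ljust(40) + "01-JAN-00" + "   " + "1ABC"
parsed = parse_header_record(example)
print(parsed)
```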

assignSecstr(header, atoms, coil=True)

Assigns secondary structure from the header dictionary to atoms. header must be a dictionary parsed using parsePDB(). atoms may be an instance of AtomGroup, Selection, Chain or Residue. ProDy can be configured to parse and assign secondary structure information automatically with confProDy(auto_secondary=True). See also the confProDy() function.

Single-letter codes follow the Dictionary of Protein Secondary Structure (DSSP) convention:

  • G = 3-turn helix (3-10 helix). Min length 3 residues.
  • H = 4-turn helix (alpha helix). Min length 4 residues.
  • I = 5-turn helix (pi helix). Min length 5 residues.
  • T = hydrogen bonded turn (3, 4 or 5 turn)
  • E = extended strand in parallel and/or anti-parallel beta-sheet conformation. Min length 2 residues.
  • B = residue in isolated beta-bridge (single pair beta-sheet hydrogen bond formation)
  • S = bend (the only non-hydrogen-bond based assignment).
  • C = residues not in one of above conformations.

See http://en.wikipedia.org/wiki/Protein_secondary_structure#The_DSSP_code for more details.

The following PDB helix classes are omitted (class number in parentheses):

  • Right-handed omega (2)
  • Right-handed gamma (4)
  • Left-handed alpha (6)
  • Left-handed omega (7)
  • Left-handed gamma (8)
  • 2-7 ribbon/helix (9)
  • Polyproline (10)

Secondary structures are assigned to all atoms in a residue. Amino acid residues without any secondary structure assignment in the header section are assigned the coil (C) conformation. Pass coil=False to prevent this.
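The assignment rules above can be sketched in plain Python as follows. This is an illustration, not ProDy's implementation: assign_secstr and its tuple-based inputs are hypothetical. The class-to-code mapping is inferred from the list of omitted classes, together with the PDB numbering of right-handed alpha (1), pi (3) and 3-10 (5) helices. Assigning E to every SHEET strand is likewise an assumption.

```python
# PDB helix class number -> single-letter code; classes not listed here
# (2, 4, 6, 7, 8, 9, 10) are skipped. The mapping is an assumption.
HELIX_CLASS_TO_CODE = {1: "H", 3: "I", 5: "G"}


def assign_secstr(resnums, helices, strands, coil=True):
    """Return {resnum: code} for the residues of one chain.

    helices: iterable of (helix_class, first_resnum, last_resnum)
    strands: iterable of (first_resnum, last_resnum)
    """
    # Residues start as coil, or as unassigned when coil is False.
    codes = {resnum: ("C" if coil else "") for resnum in resnums}
    for helix_class, first, last in helices:
        code = HELIX_CLASS_TO_CODE.get(helix_class)
        if code is None:
            continue  # omitted helix class
        for resnum in range(first, last + 1):
            if resnum in codes:
                codes[resnum] = code
    for first, last in strands:
        for resnum in range(first, last + 1):
            if resnum in codes:
                codes[resnum] = "E"
    return codes


# Made-up chain of 12 residues: an alpha helix (class 1) at 3-6,
# a polyproline helix (class 10, omitted) at 8-9, a strand at 11-12.
resnums = range(1, 13)
helices = [(1, 3, 6), (10, 8, 9)]
strands = [(11, 12)]
with_coil = assign_secstr(resnums, helices, strands)
without_coil = assign_secstr(resnums, helices, strands, coil=False)
print("".join(with_coil[r] for r in resnums))
```

With coil=False, residues outside the helix and the strand keep an empty code in this sketch.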

buildBiomolecules(header, atoms, biomol=None)

Returns atoms after applying biomolecular transformations from header dictionary. Biomolecular transformations are applied to all coordinate sets in the molecule.
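Each biomolecular transformation is a rotation matrix plus a translation vector. In REMARK 350 these appear as three BIOMT lines, each carrying one row of the matrix and one translation component. Applying one such transformation to a coordinate set can be sketched as below. apply_transformation is a hypothetical helper written without NumPy for self-containment, and the matrix and coordinates are made up.

```python
def apply_transformation(coords, rotation, translation):
    """Return x' = R x + t for every (x, y, z) in coords."""
    transformed = []
    for x, y, z in coords:
        transformed.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return transformed


# Made-up transformation: 180 degree rotation about the z axis,
# followed by a shift of 10 along x.
rotation = [
    [-1.0, 0.0, 0.0],  # row from the BIOMT1 line
    [0.0, -1.0, 0.0],  # row from the BIOMT2 line
    [0.0, 0.0, 1.0],   # row from the BIOMT3 line
]
translation = [10.0, 0.0, 0.0]

coords = [(1.0, 2.0, 3.0)]
moved = apply_transformation(coords, rotation, translation)
print(moved)
```

Applying the same transformation to every coordinate set of the molecule matches the description above.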

Some PDB files contain transformations for more than one biomolecule. A specific set of transformations can be chosen using the biomol argument. Transformation sets are identified by numbers, e.g. "1", "2", ...

If multiple biomolecular transformations are provided in the header dictionary, biomolecules will be returned as AtomGroup instances in a list.

If the resulting biomolecule has more than 26 chains, the molecular assembly will be split into multiple AtomGroup instances each containing at most 26 chains. These AtomGroup instances will be returned in a tuple.
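The 26-chain limit corresponds to one chain per letter A to Z. The splitting step can be pictured as plain chunking, as in the hypothetical split_into_groups helper below. It operates on a list of placeholder chain labels rather than on AtomGroup instances, and the 60-chain assembly is made up.

```python
def split_into_groups(chains, max_chains=26):
    """Split chains into consecutive groups of at most max_chains items."""
    return tuple(
        chains[start:start + max_chains]
        for start in range(0, len(chains), max_chains)
    )


# Made-up assembly with 60 chain copies.
chains = ["copy%d" % i for i in range(60)]
groups = split_into_groups(chains)
print([len(group) for group in groups])
```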

Note that atoms in biomolecules are ordered according to chain identifiers. When multiple chains in a biomolecule have the same chain identifier, they are given different segment names to distinguish them.