PDB structure without hydrogens
PDB structures downloaded directly from the PDB database often lack
hydrogen atoms, which are required, for example, for predicting
hydrogen bonds. In such a case, we can use the addMissingAtoms()
function.
It lets us use one of two available methods (openbabel or pdbfixer)
to predict the positions of hydrogen atoms in the protein structure.
To use either method, we need to install the corresponding Python package. For Anaconda users, the installation commands are the following:
Installation of Openbabel:
conda install -c conda-forge openbabel
Installation of PDBfixer:
conda install -c conda-forge pdbfixer
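Before adding hydrogens, it can help to check which of the two packages is visible to the current Python environment. The snippet below is a minimal sketch of such a check. It assumes that the importable top-level module names are openbabel and pdbfixer, matching the conda package names; available_backends is a hypothetical helper and not part of ProDy.

```python
import importlib.util

# Assumed top-level module names for the two conda-forge packages.
BACKENDS = ("openbabel", "pdbfixer")


def available_backends(names=BACKENDS):
    """Return {module_name: bool} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in available_backends().items():
        print(f"{name}: {'installed' if found else 'not found'}")
```

If a package is reported as not found, the corresponding method will most likely be unavailable until the package is installed.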
Add missing hydrogen atoms to the structure
We start by fetching the PDB file for the 5KQM entry (5kqm.pdb).
Openbabel requires the PDB file to be in the local directory. Therefore, the
file needs to be downloaded and saved before the missing hydrogens can be
added. A new file will be saved under the same name with
the additional prefix addH_.
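The naming convention described above can be sketched as a small helper that predicts the output file name, which is handy when the result has to be located in a script. The function expected_output_path is hypothetical, and applying the prefix to the file name rather than to the whole path is an assumption.

```python
import os


def expected_output_path(pdb_path, prefix="addH_"):
    """Build the output path by prefixing the file name (not the directory)."""
    directory, filename = os.path.split(pdb_path)
    return os.path.join(directory, prefix + filename)


if __name__ == "__main__":
    print(expected_output_path("5kqm.pdb"))
```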
In [1]: from prody import *
In [2]: from pylab import *
In [3]: import matplotlib
In [4]: ion() # turn interactive mode on
Both Openbabel and PDBfixer require the PDB file to be saved in the local directory, so we download it first:
In [5]: fetchPDB('5kqm', compressed=False)
@> Connecting wwPDB FTP server RCSB PDB (USA).
@> Downloading PDB files via FTP failed, trying HTTP.
@> 5kqm downloaded (5kqm.pdb)
@> PDB download via HTTP completed (1 downloaded, 0 failed).
Once the PDB file is in the local directory, we can choose between Openbabel and PDBfixer to add the missing hydrogen atoms to the protein structure:
Openbabel:
In [6]: PDBname = '5kqm.pdb'
In [7]: addMissingAtoms(PDBname, method='openbabel')
@> Hydrogens were added to the structure. Structure addH_5kqm.pdb is saved in the local directry.
PDBfixer:
In [8]: addMissingAtoms(PDBname, method='pdbfixer')
@> Hydrogens were added to the structure. New structure is saved as addH_5kqm.pdb.
Next, we can parse the saved structure with hydrogen atoms into ProDy and analyze it in the same way as in the previous paragraph.
In [9]: atoms = parsePDB('addH_'+str(PDBname)).select('protein')
@> 2800 atoms and 1 coordinate set(s) were parsed in 0.03s.
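To confirm that hydrogens were actually written to the new file, one option is to count them directly in the PDB text. The sketch below uses only the standard library and reads the element field in columns 77-78 of ATOM and HETATM records. It assumes the output file fills in that field; a file that leaves it blank would report zero. count_hydrogens is a hypothetical helper, not a ProDy function.

```python
import os


def count_hydrogens(pdb_lines):
    """Count ATOM/HETATM records whose element field (columns 77-78) is H."""
    count = 0
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Columns 77-78 (1-based) hold the right-justified element symbol.
            element = line[76:78].strip().upper()
            if element == "H":
                count += 1
    return count


if __name__ == "__main__":
    path = "addH_5kqm.pdb"  # file name taken from the example above
    if os.path.exists(path):
        with open(path) as handle:
            print(count_hydrogens(handle))
```

A count of zero for the addH_ file would suggest that the hydrogen-adding step did not run as expected, or that the element column is missing.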