PDB files
This example demonstrates how to use the flexible PDB fetcher, fetchPDB(). Valid inputs are a PDB identifier, e.g. 2k39, or a list of PDB identifiers, e.g. ["2k39", "1mkp", "1etc"].
Compressed PDB files (pdb.gz) will be saved to the current working directory or a target folder.
Fetch PDB files
Single file
We start by importing everything from the ProDy package:
In [1]: from prody import *
The function will return a filename if the download is successful.
In [2]: filename = fetchPDB('5uoj')
In [3]: filename
Out[3]: '5uoj.pdb.gz'
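The returned file is gzip-compressed. To peek at its contents outside ProDy, Python's standard gzip module is enough. The sketch below is illustrative only: `read_pdb_lines` is a hypothetical helper, not part of ProDy, and the demo writes a tiny stand-in file instead of using a real download.

```python
import gzip
import os
import tempfile


def read_pdb_lines(path):
    """Return the text lines of a PDB file, handling .gz files transparently."""
    opener = gzip.open if path.endswith(".gz") else open
    # "rt" opens gzip files in text mode; the built-in open() accepts it too
    with opener(path, "rt") as handle:
        return handle.read().splitlines()


# Demo: write a two-line stand-in for a downloaded file, then read it back
folder = tempfile.mkdtemp()
demo_path = os.path.join(folder, "demo.pdb.gz")
with gzip.open(demo_path, "wt") as handle:
    handle.write("HEADER    DEMO\nEND\n")

lines = read_pdb_lines(demo_path)
print(lines)
```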
Multiple files
This function also accepts a list of PDB identifiers:
In [4]: filenames = fetchPDB(['5uoj', '1r39', '@!~#'])
In [5]: filenames
Out[5]: ['5uoj.pdb.gz', '1r39.pdb.gz', None]
For failed downloads, None is returned (for a list input, the corresponding item in the list is None).
Note that no folder was specified in this example, so files are saved in the current working directory. A target folder can be given with the folder argument; it is created if it does not exist.
ProDy prints a report of the download results and returns a list of filenames. In this case, the report is:
@> 5uoj (./5uoj.pdb.gz) is found in the target directory.
@> @!~# is not a valid identifier.
@> 1r39 downloaded (./1r39.pdb.gz)
@> PDB download completed (1 found, 1 downloaded, 1 failed).
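The three outcomes in the report (found locally, invalid identifier, downloaded) can be mimicked with a small classification step. This is a hedged sketch under two assumptions: an identifier counts as valid when it has four alphanumeric characters (a simplification of the PDB's identifier rules), and a local copy is recognized by the name `<id>.pdb.gz`. The helpers `is_valid_pdb_id` and `classify_ids` are hypothetical and perform no network access; they only decide which identifiers would still need downloading.

```python
import os
import tempfile


def is_valid_pdb_id(pdb_id):
    # Assumed rule: classic PDB identifiers are 4 alphanumeric characters
    return len(pdb_id) == 4 and pdb_id.isalnum()


def classify_ids(pdb_ids, folder="."):
    """Split identifiers into found paths, invalid IDs and IDs to download."""
    found, invalid, to_download = [], [], []
    for pdb_id in pdb_ids:
        if not is_valid_pdb_id(pdb_id):
            invalid.append(pdb_id)
            continue
        path = os.path.join(folder, pdb_id.lower() + ".pdb.gz")
        if os.path.isfile(path):
            found.append(path)
        else:
            to_download.append(pdb_id)
    return found, invalid, to_download


# Demo folder that already contains a local copy of 5uoj only
folder = tempfile.mkdtemp()
open(os.path.join(folder, "5uoj.pdb.gz"), "wb").close()

found, invalid, to_download = classify_ids(["5uoj", "1r39", "@!~#"], folder)
print(len(found), invalid, to_download)  # 1 ['@!~#'] ['1r39']
```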
Parse PDB files
ProDy offers a fast and flexible PDB parser, parsePDB().
The parser can read well-defined subsets of atoms, specific chains, or specific models (in NMR structures), which improves performance. This example shows how to use these flexible parsing options.
Three types of input are accepted:
- PDB file path, e.g. "../1MKP.pdb"
- compressed (gzipped) PDB file path, e.g. "5uoj.pdb.gz"
- PDB identifier, e.g. 2k39
Output is an AtomGroup instance that stores atomic data and can be used as input to functions and classes for dynamics analysis.
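One way to tell the three input types apart is sketched below. This is an assumption about the general approach, not ProDy's actual lookup code: an existing path is used as-is (compressed if it ends in `.gz`), and a four-character identifier is matched against `<id>.pdb` or `<id>.pdb.gz` in a search folder. The function `resolve_pdb_input` is hypothetical.

```python
import os
import tempfile


def resolve_pdb_input(source, folder="."):
    """Return (path, compressed) for a file path or a 4-character PDB ID.

    Returns (None, False) when nothing local matches; a real tool would
    fall back to downloading the file at that point.
    """
    if os.path.isfile(source):
        return source, source.endswith(".gz")
    if len(source) == 4 and source.isalnum():
        for suffix in (".pdb", ".pdb.gz"):
            candidate = os.path.join(folder, source.lower() + suffix)
            if os.path.isfile(candidate):
                return candidate, suffix.endswith(".gz")
    return None, False


folder = tempfile.mkdtemp()
open(os.path.join(folder, "5uoj.pdb.gz"), "wb").close()

print(resolve_pdb_input("5uoj", folder))  # (<folder>/5uoj.pdb.gz, True)
print(resolve_pdb_input("2k39", folder))  # (None, False)
```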
Parse a file
You can parse a PDB file by passing its filename; gzipped files are handled transparently. Here we first download a PDB file (see Fetch PDB files for more information) and then parse it:
In [6]: fetchPDB('5uoj')
Out[6]: '5uoj.pdb.gz'
In [7]: atoms = parsePDB('5uoj.pdb.gz')
In [8]: atoms
Out[8]: <AtomGroup: 5uoj (3138 atoms)>
The parser returns an AtomGroup instance.
Note that the time it took to parse the file is printed on the screen. This includes the time it takes to evaluate coordinate lines and build the AtomGroup instance, and excludes the time spent reading the file from disk.
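Evaluating coordinate lines essentially means reading fixed-width fields from ATOM and HETATM records. The sketch below uses the column layout of the PDB format (atom name in columns 13-16, chain in column 22, residue number in columns 23-26, x/y/z in columns 31-54). It is a simplified illustration, not ProDy's parser; `parse_atom_line`, `parse_coordinates` and the sample record (with illustrative coordinate values) are hypothetical.

```python
def parse_atom_line(line):
    """Split one ATOM/HETATM record into fields using fixed PDB columns."""
    # Python slices are 0-based and end-exclusive: columns 13-16 -> [12:16]
    return {
        "record": line[0:6].strip(),     # columns 1-6
        "serial": int(line[6:11]),       # columns 7-11
        "name": line[12:16].strip(),     # columns 13-16
        "altloc": line[16:17].strip(),   # column 17
        "resname": line[17:20].strip(),  # columns 18-20
        "chain": line[21:22].strip(),    # column 22
        "resnum": int(line[22:26]),      # columns 23-26
        "xyz": (float(line[30:38]),      # columns 31-38
                float(line[38:46]),      # columns 39-46
                float(line[46:54])),     # columns 47-54
    }


def parse_coordinates(lines):
    """Parse every ATOM/HETATM record and skip all other record types."""
    return [parse_atom_line(line) for line in lines
            if line.startswith(("ATOM  ", "HETATM"))]


# Hypothetical input: one header record and one coordinate record
sample = [
    "HEADER    DEMO",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
]
records = parse_coordinates(sample)
print(records[0]["name"], records[0]["chain"], records[0]["xyz"])
```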
Use an identifier
PDB files can be parsed by simply passing an identifier. The parser looks for a PDB file matching the given identifier in the current working directory. If no matching file is found, ProDy downloads it from the PDB FTP server automatically and saves it in the current working directory.
In [9]: atoms = parsePDB('1mkp')
In [10]: atoms
Out[10]: <AtomGroup: 1mkp (1183 atoms)>
Subsets of atoms
The parser can read only the backbone or Cα atoms:
In [11]: backbone = parsePDB('1mkp', subset='bb')
In [12]: backbone
Out[12]: <AtomGroup: 1mkp_bb (576 atoms)>
In [13]: calpha = parsePDB('1mkp', subset='ca')
In [14]: calpha
Out[14]: <AtomGroup: 1mkp_ca (144 atoms)>
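The subset option keeps only atoms with certain names. As a rough sketch, assuming the backbone means the protein atoms N, CA, C and O and Cα means protein atoms named CA (the counts above fit that reading, since 576 = 4 × 144), the filtering step could look like this. The record tuples and the `select_subset` helper are hypothetical, and using the ATOM record type as a stand-in for "protein residue" is a simplification.

```python
BACKBONE_NAMES = {"N", "CA", "C", "O"}  # assumed protein backbone atom names


def select_subset(records, subset):
    """Keep (record_type, atom_name, residue_name) tuples matching a subset.

    Only ATOM records are kept, so a calcium ion named CA in a HETATM
    record is not mistaken for an alpha carbon.
    """
    if subset == "ca":
        wanted = {"CA"}
    elif subset == "bb":
        wanted = BACKBONE_NAMES
    else:
        raise ValueError("unknown subset: %r" % subset)
    return [r for r in records if r[0] == "ATOM" and r[1] in wanted]


# Hypothetical glycine residue plus a calcium ion
records = [
    ("ATOM", "N", "GLY"), ("ATOM", "CA", "GLY"), ("ATOM", "C", "GLY"),
    ("ATOM", "O", "GLY"), ("HETATM", "CA", "CA"),
]
print(len(select_subset(records, "bb")), len(select_subset(records, "ca")))  # 4 1
```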
Specific chains
The parser can read a specific chain from a PDB file:
In [15]: chA = parsePDB('3mkb', chain='A')
In [16]: chA
Out[16]: <AtomGroup: 3mkbA (1198 atoms)>
In [17]: chC = parsePDB('3mkb', chain='C')
In [18]: chC
Out[18]: <AtomGroup: 3mkbC (1189 atoms)>
Multiple chains can also be parsed in the same way:
In [19]: chAC = parsePDB('3mkb', chain='AC')
In [20]: chAC
Out[20]: <AtomGroup: 3mkbAC (2387 atoms)>
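Passing chain='AC' appears to select every atom whose chain identifier is one of the characters A or C; this is consistent with the counts above, since 1198 + 1189 = 2387. A minimal sketch of such a filter, using hypothetical (chain_id, atom_name) pairs rather than ProDy objects:

```python
def select_chains(records, chains):
    """Keep (chain_id, atom_name) pairs whose chain ID is a character of `chains`."""
    wanted = set(chains)  # 'AC' -> {'A', 'C'}
    return [r for r in records if r[0] in wanted]


records = [("A", "N"), ("A", "CA"), ("B", "N"), ("C", "N"), ("C", "CA"), ("C", "C")]
chain_a = select_chains(records, "A")
chain_c = select_chains(records, "C")
chain_ac = select_chains(records, "AC")
# Chains do not share atoms, so the combined selection is the sum of the parts
print(len(chain_a), len(chain_c), len(chain_ac))  # 2 3 5
```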
Specific models
The parser can read a specific model from a file:
In [21]: model10 = parsePDB('2k39', model=10)
In [22]: model10
Out[22]: <AtomGroup: 2k39 (1231 atoms)>
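In multi-model files (typically NMR structures), each model is wrapped in MODEL and ENDMDL records, with the model serial number following the MODEL keyword. The sketch below collects the lines of one model; the `model_lines` helper and the placeholder records are hypothetical, and this is not ProDy's implementation.

```python
def model_lines(lines, model):
    """Return the records between 'MODEL <model>' and the next ENDMDL."""
    inside = False
    selected = []
    for line in lines:
        if line.startswith("MODEL"):
            # The model serial number is the second whitespace-separated token
            inside = int(line.split()[1]) == model
        elif line.startswith("ENDMDL"):
            if inside:
                break  # the requested model is complete
        elif inside:
            selected.append(line)
    return selected


# Placeholder records standing in for real ATOM lines
lines = [
    "MODEL        1", "ATOM  model-1 atom", "ENDMDL",
    "MODEL        2", "ATOM  model-2 atom a", "ATOM  model-2 atom b", "ENDMDL",
]
print(model_lines(lines, 2))  # ['ATOM  model-2 atom a', 'ATOM  model-2 atom b']
print(model_lines(lines, 5))  # []
```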
Alternate locations
When a PDB file contains alternate locations for some of the atoms, by default the alternate locations with indicator A are parsed.
In [23]: altlocA = parsePDB('1ejg')
In [24]: altlocA
Out[24]: <AtomGroup: 1ejg (637 atoms)>
Specific alternate locations can be parsed as follows:
In [25]: altlocB = parsePDB('1ejg', altloc='B')
In [26]: altlocB
Out[26]: <AtomGroup: 1ejg (634 atoms)>
Note that the number of atoms differs between the two atom groups. This is because, for some atoms with alternate locations, the residue type differs between locations A and B, so the two locations contain different sets of atoms.
Also, all alternate locations can be parsed as follows:
In [27]: all_altlocs = parsePDB('1ejg', altloc=True)
In [28]: all_altlocs
Out[28]: <AtomGroup: 1ejg (637 atoms; active #0 of 3 coordsets)>
Note that this time the parser returned three coordinate sets, one for each alternate location indicator found in this file (A, B, C). When parsing multiple alternate locations, the parser expects the same residue type for each atom with an alternate location. If residue names differ, a warning message is printed.
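The default behaviour described above can be sketched as a filter on the alternate location field (column 17): atoms without an indicator are always kept, and among the alternates only the requested one survives. The sketch also shows how different residue types at locations A and B lead to different atom counts. It uses hypothetical (residue_name, atom_name, altloc) tuples, the `select_altloc` helper is an assumption, and it does not build separate coordinate sets the way altloc=True does.

```python
def select_altloc(records, altloc="A"):
    """Keep atoms with no alternate location plus those marked `altloc`."""
    return [r for r in records if r[2] in ("", altloc)]


# Hypothetical residue that is SER at location A but ALA at location B,
# followed by a residue without alternate locations
records = [
    ("SER", "N", "A"), ("SER", "CA", "A"), ("SER", "CB", "A"), ("SER", "OG", "A"),
    ("ALA", "N", "B"), ("ALA", "CA", "B"), ("ALA", "CB", "B"),
    ("GLY", "N", ""), ("GLY", "CA", ""),
]
print(len(select_altloc(records)), len(select_altloc(records, "B")))  # 6 5
```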
Composite arguments
The parser can read coordinates from a specific model for a subset of atoms in a specific chain:
In [29]: composite = parsePDB('2k39', model=10, chain='A', subset='ca')
In [30]: composite
Out[30]: <AtomGroup: 2k39A_ca (76 atoms)>
Header data
The PDB parser can extract header data from PDB files into a dict as follows:
In [31]: ubi, header = parsePDB('1ubi', header=True)
In [32]: list(header)
Out[32]:
['A',
'related_entries',
'sheet',
'classification',
'reference',
'title',
'sheet_range',
'polymers',
'resolution',
'space_group',
'helix_range',
'chemicals',
'experiment',
'helix',
'version',
'authors',
'identifier',
'deposition_date',
'biomoltrans']
In [33]: header['experiment']
Out[33]: 'X-RAY DIFFRACTION'
In [34]: header['resolution']
Out[34]: 1.8
It is also possible to parse only header data by passing model=0 as an argument:
In [35]: header = parsePDB('1ubi', header=True, model=0)
or by using the parsePDBHeader() function:
In [36]: header = parsePDBHeader('1ubi')
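Header values such as the experiment type and resolution come from records that precede the coordinates. A rough sketch, assuming the EXPDTA record holds the method from column 11 onward and that the resolution appears in a "REMARK   2 RESOLUTION." line; the helper `parse_header_fields` and the sample lines are hypothetical. Reading stops at the first coordinate record, which is similar in spirit to parsing only the header.

```python
def parse_header_fields(lines):
    """Extract experiment type and resolution, stopping at coordinate data."""
    header = {}
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM", "MODEL ")):
            break  # header records precede the coordinate section
        if line.startswith("EXPDTA"):
            header["experiment"] = line[10:].strip()
        elif line.startswith("REMARK   2 RESOLUTION."):
            tokens = line.split()
            try:
                header["resolution"] = float(tokens[3])
            except (IndexError, ValueError):
                pass  # e.g. "NOT APPLICABLE" for NMR entries
    return header


lines = [
    "EXPDTA    X-RAY DIFFRACTION",
    "REMARK   2 RESOLUTION.    1.80 ANGSTROMS.",
    "ATOM      1  N   MET A   1",
    "EXPDTA    IGNORED AFTER COORDINATES",
]
print(parse_header_fields(lines))  # {'experiment': 'X-RAY DIFFRACTION', 'resolution': 1.8}
```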
Write PDB file
PDB files can be written using the writePDB() function. This example shows how to write PDB files for AtomGroup instances and for subsets of atoms.
Write all atoms
All atoms in an AtomGroup can be written in PDB format as follows:
In [37]: writePDB('MKP3.pdb', atoms)
Out[37]: 'MKP3.pdb'
If the file is written successfully, its filename is returned.
Write a subset
It is also possible to write subsets of atoms in PDB format:
In [38]: alpha_carbons = atoms.select('calpha')
In [39]: writePDB('1mkp_ca.pdb', alpha_carbons)
Out[39]: '1mkp_ca.pdb'
In [40]: backbone = atoms.select('backbone')
In [41]: writePDB('1mkp_bb.pdb', backbone)
Out[41]: '1mkp_bb.pdb'
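Writing is the reverse of parsing: each atom becomes one fixed-width ATOM line, and writing a subset just means formatting fewer atoms. The sketch below follows the PDB column layout and, like writePDB(), returns the filename it wrote. The helpers `format_atom_line` and `write_records` and the atom tuples are hypothetical, and the atom-name alignment rule (names shorter than four characters start in column 14) is a common simplification that does not handle two-letter elements such as FE.

```python
import os
import tempfile


def format_atom_line(serial, name, resname, chain, resnum, xyz,
                     occupancy=1.0, bfactor=0.0, element=""):
    """Format one ATOM record in fixed PDB columns (simplified)."""
    # Names shorter than 4 characters are padded to start in column 14
    padded = name if len(name) == 4 else " " + name.ljust(3)
    x, y, z = xyz
    return ("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s"
            % (serial, padded, resname, chain, resnum,
               x, y, z, occupancy, bfactor, element))


def write_records(path, records):
    """Write records as ATOM lines followed by END; return the filename."""
    with open(path, "w") as handle:
        for record in records:
            handle.write(format_atom_line(*record) + "\n")
        handle.write("END\n")
    return path


# Hypothetical atoms: (serial, name, resname, chain, resnum, xyz)
demo_atoms = [
    (1, "N", "MET", "A", 1, (27.340, 24.430, 2.614)),
    (2, "CA", "MET", "A", 1, (26.266, 25.413, 2.842)),
]
folder = tempfile.mkdtemp()
all_path = write_records(os.path.join(folder, "demo.pdb"), demo_atoms)
ca_path = write_records(os.path.join(folder, "demo_ca.pdb"),
                        [a for a in demo_atoms if a[1] == "CA"])  # Cα subset
print(all_path, ca_path)
```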