PDB File

This module defines functions for parsing and writing PDB files.

parsePDBStream(stream, **kwargs)

Returns an AtomGroup and/or dictionary containing header data parsed from a stream of PDB lines.

Parameters:


stream – anything that implements the method readlines (e.g. file, buffer, stdin)
title (str) – title of the AtomGroup instance; default is the PDB filename or PDB identifier
ag (AtomGroup) – AtomGroup instance for storing data parsed from the PDB file. The number of atoms in ag must equal the number of atoms parsed from the PDB file, and the atoms must be in the same order. Non-coordinate data stored in ag will be overwritten with data parsed from the file.
chain (str) – chain identifiers for parsing specific chains, e.g. chain='A', chain='B', chain='DE'; by default all chains are parsed
subset (str) – a predefined keyword for parsing a subset of atoms; valid keywords are 'calpha' ('ca'), 'backbone' ('bb'), or None (read all atoms), e.g. subset='bb'
model (int, list) – model index or None (read all models), e.g. model=10
header (bool) – if True, PDB header content will be parsed and returned
altloc (str) – if a location indicator such as 'A' or 'B' is passed, only the indicated alternate locations will be parsed as the single coordinate set of the AtomGroup. If altloc is set to 'all', all alternate locations will be parsed and each will be appended as a distinct coordinate set. Default is "A"
biomol (bool) – if True, biomolecules obtained by transforming the coordinates using information from the header section will be returned. This option uses buildBiomolecules() and, as noted there, atoms in biomolecules are ordered according to the original chain IDs. Chains may have the same chain ID, in which case they are given different segment names. Default is False
secondary (bool) – if True, secondary structure information from the header section will be assigned to atoms. Default is False

If model=0 and header=True, return header dictionary only.
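The chain, subset, and altloc filters listed above can be pictured with a small stand-alone sketch. It is not ProDy's parser: the helper names atom_line and filter_atom_lines are hypothetical, the column positions follow the fixed-width wwPDB ATOM/HETATM layout, and two points are assumptions made for illustration (the backbone set is taken to be N, CA, C, O, and atoms with a blank alternate location are always kept).

```python
# Stand-alone illustration; NOT ProDy's implementation.
BACKBONE = {"N", "CA", "C", "O"}  # assumed backbone atom names


def atom_line(serial, name, alt, resname, chid, resnum, x, y, z):
    """Build a fixed-width ATOM record (hypothetical helper for test data)."""
    return ("ATOM  %5d  %-3s%1s%3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (serial, name, alt, resname, chid, resnum, x, y, z, 1.0, 0.0))


def filter_atom_lines(lines, chain=None, subset=None, altloc="A"):
    """Return the ATOM/HETATM lines that pass the chain, subset, altloc filters."""
    kept = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()   # columns 13-16: atom name
        alt = line[16]               # column 17: alternate location
        chid = line[21]              # column 22: chain identifier
        if chain is not None and chid not in chain:
            continue                 # chain='DE' keeps chains D and E
        if subset in ("ca", "calpha") and name != "CA":
            continue                 # matched by atom name only
        if subset in ("bb", "backbone") and name not in BACKBONE:
            continue
        if altloc != "all" and alt not in (" ", altloc):
            continue                 # assumption: blank altloc always kept
        kept.append(line)
    return kept


lines = [
    atom_line(1, "N", " ", "ALA", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "CA", "A", "ALA", "A", 1, 1.0, 0.0, 0.0),
    atom_line(3, "CA", "B", "ALA", "A", 1, 1.1, 0.0, 0.0),
    atom_line(4, "CB", " ", "ALA", "A", 1, 2.0, 0.0, 0.0),
    atom_line(5, "CA", " ", "GLY", "B", 1, 3.0, 0.0, 0.0),
]

calpha_default = filter_atom_lines(lines, subset="ca")
chain_a_all = filter_atom_lines(lines, chain="A", altloc="all")
```

With the default altloc, the alpha-carbon selection keeps the 'A' conformer and the atom without an alternate location, and drops the 'B' conformer.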

parsePDB(*pdb, **kwargs)

Returns an AtomGroup and/or dictionary containing header data parsed from a PDB file.

This function extends parsePDBStream().

See Parse PDB files for a detailed usage example.

Parameters:	pdb – one PDB identifier or filename, or a list of them. If needed, PDB files are downloaded using the fetchPDB() function.

You can also provide arguments that you would like passed on to fetchPDB().

Parameters:	extend_biomol (bool) – whether to extend the list of results with a list rather than appending, which can create a mixed list, especially when biomol=True. Default value is False to reproduce previous behaviour. This value is ignored when result is not a list (header=True or model=0).
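The difference between appending and extending can be shown with plain Python lists. This is only an illustration of the wording above, not ProDy's code: the collect function is hypothetical, and strings stand in for AtomGroup objects.

```python
# Hypothetical stand-ins: one input gave a single result, the other gave a
# list of results (as can happen when several biomolecules are built).
results_per_input = ["ag_first", ["ag_second_biomol1", "ag_second_biomol2"]]


def collect(results, extend_biomol=False):
    """Gather per-input results, flattening list results only when asked."""
    collected = []
    for item in results:
        if isinstance(item, list) and extend_biomol:
            collected.extend(item)   # flat list of results
        else:
            collected.append(item)   # may leave a nested list behind
    return collected


mixed = collect(results_per_input)
flat = collect(results_per_input, extend_biomol=True)
# mixed -> ['ag_first', ['ag_second_biomol1', 'ag_second_biomol2']]
# flat  -> ['ag_first', 'ag_second_biomol1', 'ag_second_biomol2']
```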

Please note that residue names (resnames) are read as only 3 characters, while chain identifiers (chids) can be 2 characters. Hence, TIP3S is split into resname TIP and chid 3S.
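The split described in this note can be reproduced with fixed-width slicing. The sketch below assumes that the five-character name occupies columns 18 to 22 of the record, so that the first three characters fall in the residue-name field and the last two in the chain field; the function name split_resname_chid is hypothetical and this is not ProDy's parser.

```python
def split_resname_chid(line):
    """Slice a PDB atom record into a 3-character resname and 2-character chid."""
    resname = line[17:20].strip()  # columns 18-20
    chid = line[20:22].strip()     # columns 21-22
    return resname, chid


# A record whose residue field holds the five-character name TIP3S.
water = "ATOM  %5d %-4s%1s%-5s%4d    %8.3f%8.3f%8.3f" % (
    1, " OH2", " ", "TIP3S", 1, 0.0, 0.0, 0.0)
# A conventional record: 3-character residue name, blank, 1-character chain.
protein = "ATOM  %5d %-4s%1s%-5s%4d    %8.3f%8.3f%8.3f" % (
    2, " CA ", " ", "ALA A", 1, 0.0, 0.0, 0.0)

print(split_resname_chid(water))    # ('TIP', '3S')
print(split_resname_chid(protein))  # ('ALA', 'A')
```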

Parameters:


title (str) – title of the AtomGroup instance; default is the PDB filename or PDB identifier
ag (AtomGroup) – AtomGroup instance for storing data parsed from the PDB file. The number of atoms in ag must equal the number of atoms parsed from the PDB file, and the atoms must be in the same order. Non-coordinate data stored in ag will be overwritten with data parsed from the file.
chain (str) – chain identifiers for parsing specific chains, e.g. chain='A', chain='B', chain='DE'; by default all chains are parsed
subset (str) – a predefined keyword for parsing a subset of atoms; valid keywords are 'calpha' ('ca'), 'backbone' ('bb'), or None (read all atoms), e.g. subset='bb'
model (int, list) – model index or None (read all models), e.g. model=10
header (bool) – if True, PDB header content will be parsed and returned
altloc (str) – if a location indicator such as 'A' or 'B' is passed, only the indicated alternate locations will be parsed as the single coordinate set of the AtomGroup. If altloc is set to 'all', all alternate locations will be parsed and each will be appended as a distinct coordinate set. Default is "A"
biomol (bool) – if True, biomolecules obtained by transforming the coordinates using information from the header section will be returned. This option uses buildBiomolecules() and, as noted there, atoms in biomolecules are ordered according to the original chain IDs. Chains may have the same chain ID, in which case they are given different segment names. Default is False
secondary (bool) – if True, secondary structure information from the header section will be assigned to atoms. Default is False

If model=0 and header=True, return header dictionary only.

parseChainsList(filename)

Parse a set of PDBs and extract chains based on a list in a text file.

Parameters:	filename (str) – the name of the file to be read

Returns: lists containing an AtomGroup for each PDB, the headers for those PDBs, and the requested Chain objects

parsePQR(filename, **kwargs)

Returns an AtomGroup containing data parsed from a PQR file.

Parameters:


filename (str) – a PQR filename
title (str) – title of the AtomGroup instance; default is the PDB filename or PDB identifier
ag (AtomGroup) – AtomGroup instance for storing data parsed from the PDB file. The number of atoms in ag must equal the number of atoms parsed from the PDB file, and the atoms must be in the same order. Non-coordinate data stored in ag will be overwritten with data parsed from the file.
chain (str) – chain identifiers for parsing specific chains, e.g. chain='A', chain='B', chain='DE'; by default all chains are parsed
subset (str) – a predefined keyword for parsing a subset of atoms; valid keywords are 'calpha' ('ca'), 'backbone' ('bb'), or None (read all atoms), e.g. subset='bb'

writePDBStream(stream, atoms, csets=None, **kwargs)

Write atoms in PDB format to a stream.

Parameters:


stream – anything that implements a write() method (e.g. file, buffer, stdout)
renumber (bool) – whether to renumber atoms with serial indices. Default is True
hybrid36 (bool) – whether to use hybrid36 format for atom and residue numbers, e.g. for atoms with serial greater than 99999. Default is False, which means hexadecimal is used instead. NB: ChimeraX seems to prefer hybrid36 and may have problems with hexadecimal.
full_ter (bool) – whether to write full TER lines with atom information. Default is True
write_remarks (bool) – whether to write REMARK lines. Default is True
atoms – an object with atom and coordinate data
csets – coordinate set indices; default is all coordinate sets
beta – a list or array of numbers to be written to the beta column
occupancy – a list or array of numbers to be written to the occupancy column
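To make the two overflow encodings concrete, the sketch below encodes a serial number that no longer fits in five decimal digits, first with a stand-alone hybrid-36 encoder and then with Python's built-in hexadecimal formatting for comparison. The encoder follows the published hybrid-36 scheme (decimal first, then upper-case base 36, then lower-case base 36); the function names are hypothetical and this is not the code ProDy uses internally.

```python
DIGITS_UPPER = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS_LOWER = DIGITS_UPPER.lower()


def encode_base36(value, digits):
    """Encode a non-negative integer in base 36 with the given digit set."""
    if value == 0:
        return digits[0]
    chars = []
    while value:
        value, remainder = divmod(value, 36)
        chars.append(digits[remainder])
    return "".join(reversed(chars))


def hybrid36_encode(width, value):
    """Encode value into a field of the given width using hybrid-36."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    if value < 10 ** width:
        return str(value).rjust(width)      # plain decimal still fits
    value -= 10 ** width
    block = 26 * 36 ** (width - 1)          # size of each base-36 range
    offset = 10 * 36 ** (width - 1)         # first code starts with 'A'/'a'
    if value < block:
        return encode_base36(value + offset, DIGITS_UPPER)
    value -= block
    if value < block:
        return encode_base36(value + offset, DIGITS_LOWER)
    raise ValueError("value out of range for this field width")


print(hybrid36_encode(5, 99999))   # 99999
print(hybrid36_encode(5, 100000))  # A0000
print(format(100000, "x"))         # 186a0 (hexadecimal, for comparison)
```

Both encodings keep the field at five characters; they differ in which characters a reader of the file has to be prepared to decode.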

writePDB(filename, atoms, csets=None, autoext=True, **kwargs)

Write atoms in PDB format to a file with name filename and return filename. If filename ends with .gz, a compressed file will be written.

Parameters:

renumber (bool) – whether to renumber atoms with serial indices. Default is True
hybrid36 (bool) – whether to use hybrid36 format for atom residue numbers. Default is False, which means using hexadecimal instead. NB: ChimeraX seems to prefer hybrid36 and may have problems with hexadecimal.

Please note that resnames longer than 3 characters will be trimmed.

Parameters:


atoms – an object with atom and coordinate data
csets – coordinate set indices; default is all coordinate sets
beta – a list or array of numbers to be written to the beta column
occupancy – a list or array of numbers to be written to the occupancy column
hybrid36 (bool) – whether to use hybrid36 format for atoms with serial greater than 99999. Hexadecimal is used otherwise. Default is False
autoext – if True, append the extension .pdb to filename when it is not already present
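The filename handling described for this function (an optional .pdb extension and gzip compression for names ending in .gz) can be sketched with the standard library. The helpers resolve_name and write_lines are hypothetical, and the exact rule ProDy applies for deciding when an extension is already present is an assumption here.

```python
import gzip
import os
import tempfile


def resolve_name(filename, autoext=True):
    """Append .pdb when autoext is set and no .pdb or .pdb.gz ending is found."""
    lower = filename.lower()
    if autoext and not lower.endswith((".pdb", ".pdb.gz")):
        filename += ".pdb"
    return filename


def write_lines(filename, lines, autoext=True):
    """Write text lines, compressing when the resolved name ends with .gz."""
    filename = resolve_name(filename, autoext)
    opener = gzip.open if filename.lower().endswith(".gz") else open
    with opener(filename, "wt") as handle:
        handle.writelines(line + "\n" for line in lines)
    return filename


with tempfile.TemporaryDirectory() as folder:
    plain = write_lines(os.path.join(folder, "model"), ["END"])
    packed = write_lines(os.path.join(folder, "model.pdb.gz"), ["END"])
    plain_name = os.path.basename(plain)
    packed_name = os.path.basename(packed)
    with gzip.open(packed, "rt") as handle:
        packed_text = handle.read()
```

Returning the resolved filename mirrors the documented behaviour of returning filename, so a caller can find the file even when an extension was appended.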

writeChainsList(chains, filename)

Write a text file containing a list of chains that can be parsed.

Parameters:

chains (list) – a list of Chain objects
filename (str) – the name of the file to be written

writePQR(filename, atoms, **kwargs)

Write atoms in PQR format to a file with name filename. Only the current coordinate set is written. Returns filename upon success. If filename ends with .gz, a compressed file will be written.

writePQRStream(stream, atoms, **kwargs)