Water bridges detection in an Ensemble PDB
- Initial analysis
- Detecting water centers

Release Notes

v2.5.0 series come with new and improved sequence, structure, and dynamics analysis features. See release notes for details.

How to Cite

Bakan A, Meireles LM, Bahar I ProDy: Protein Dynamics Inferred from Theory and Experiments
Bioinformatics 2011 27(11):1575-1577.

Bakan A, Dutta A, Mao W, Liu Y, Chennubhotla C, Lezon TR, Bahar I Evol and ProDy for Bridging Protein Sequence Evolution and Structural Dynamics
Bioinformatics 2014 30(18):2681-2683.

Zhang S, Krieger JM, Zhang Y, Kaya C, Kaynak B, Mikulska-Ruminska K, Doruker P, Li H, Bahar I ProDy 2.0: Increased scale and scope after 10 years of protein dynamics modelling with Python
Bioinformatics 2021 37(20):3657-3659.

Water bridges detection in an Ensemble PDB¶

This time we will use an ensemble stored in a multi-model PDB, which contains 50 frames from MD simulations from PE-binding protein 1 (PDB: 1BEH). Simulations were performed using NAMD_ and saved as a multi-model PDB using VMD.

We need to remember to align the protein structure before performing the analysis. Otherwise, when all structures are uploaded to the visualization program, they will be spread out in space. We could do this inside ProDy by converting the Atomic object to an Ensemble object and using its Ensemble.iterpose() method to do this, but here we demonstrate the process for parsing a multi-model PDB file directly.

Initial analysis¶

In [1]: ens = 'pebp1_50frames.pdb'

In [2]: coords_ens = parsePDB(ens)

In [3]: bridgeFrames_ens = calcWaterBridgesTrajectory(coords_ens, coords_ens)

@> 20195 atoms and 51 coordinate set(s) were parsed in 1.88s.
@> Frame: 0
@> 161 water bridges detected.
@> Frame: 1
@> 127 water bridges detected.
@> Frame: 2
@> 168 water bridges detected.
@> Frame: 3
@> 132 water bridges detected.
@> Frame: 4
@> 142 water bridges detected.
@> Frame: 5
@> 166 water bridges detected.
@> Frame: 6
@> 150 water bridges detected.
@> Frame: 7
@> 159 water bridges detected.
@> Frame: 8
@> 147 water bridges detected.
@> Frame: 9
@> 136 water bridges detected.
@> Frame: 10
@> 127 water bridges detected.
@> Frame: 11
@> 135 water bridges detected.
...
...

Analysis of the results is similar to that presented in trajectory analysis. Below are examples showing which residues are most frequently involved in water bridge formation (calcBridgingResiduesHistogram()), details of those interactions (calcWaterBridgesStatistics()), and results saved as a PDB structure for further visualization (savePDBWaterBridgesTrajectory()). Other functions can be explored in the trajectory analysis.

In [4]: calcBridgingResiduesHistogram(bridgeFrames_ens)

[('VAL34P', 1),
 ('VAL177P', 1),
 ('PRO43P', 1),
 ('LEU41P', 2),
 ('MET92P', 2),
 ('VAL164P', 3),
 ('LEU14P', 3),
 ('TYR169P', 3),
 ('PHE154P', 4),
 .
 .
 ('ARG49P', 50),
 ('ASN48P', 50),
 ('ARG141P', 50),
 ('LYS150P', 50),
 ('ARG119P', 51),
 ('LYS80P', 51),
 ('ARG76P', 51),
 ('ARG161P', 51),
 ('ARG129P', 51),
 ('ARG82P', 51),
 ('ARG146P', 51)]

In [5]: analysisAtomic_ens = calcWaterBridgesStatistics(bridgeFrames_ens,
   ...:                                                      coords_ens)
   ...: 

In [6]: for item in analysisAtomic_ens.items():
   ...:    print(item)
   ...: 

@> RES1           RES2           PERC      DIST_AVG  DIST_STD
@> VAL3P          HSE26P         19.608    5.581     0.696
@> ASP4P          SER6P          13.725    3.817     0.560
@> SER6P          LYS7P          43.137    4.394     1.114
@> LYS7P          GLU36P         1.961     6.088     0.000
@> LYS7P          LEU37P         7.843     6.353     0.433
@> GLY10P         SER13P         43.137    4.759     0.612
@> GLY10P         ARG76P         11.765    5.309     0.586
@> LEU12P         SER13P         45.098    2.767     0.080
@> SER13P         GLU16P         25.490    4.449     1.133
@> GLN15P         ASP18P         7.843     3.732     0.174
@> GLU16P         ARG82P         45.098    4.550     1.086
@> GLU16P         VAL17P         17.647    3.438     0.952
@> GLU16P         LYS150P        21.569    5.056     0.929
@> GLU16P         GLU83P         9.804     5.476     1.138
@> GLU16P         ALA152P        7.843     7.307     0.450
@> VAL17P         GLU83P         1.961     7.262     0.000
@> VAL17P         LYS150P        13.725    6.303     0.572
@> GLN22P         GLU126P        33.333    6.458     1.216
@> GLN22P         HSE23P         37.255    4.738     0.669
@> HSE23P         GLU126P        7.843     7.911     0.239
@> PRO24P         ASP56P         17.647    5.592     0.910
@> THR28P         SER52P         43.137    3.970     0.677
@> THR28P         ILE53P         5.882     5.849     0.027
@> TYR29P         THR51P         7.843     3.583     0.286
@> ALA30P         ARG49P         17.647    5.206     0.304
...
...

In [7]: savePDBWaterBridgesTrajectory(bridgeFrames_ens, coords_ens, ens[:-4]+'_ens.pdb')

@> All 51 coordinate sets are copied to pebp1_50frames Selection 'protein' + pebp1_50frames Selection 'same residue as...6074 4190 14360'.
@> All 51 coordinate sets are copied to pebp1_50frames Selection 'protein' + pebp1_50frames Selection 'same residue as...9718 17936 7184'.
@> All 51 coordinate sets are copied to pebp1_50frames Selection 'protein' + pebp1_50frames Selection 'same residue as...947 10043 11756'.
..
..

Detecting water centers¶

The previous function generated multiple PDB files in which we can find protein and water molecules for each frame that form water bridges with the protein structure. Now we can use another function findClusterCenters() which will extract water centers (they refer to the oxygens from water molecules that are forming clusters). We need to provide a file pattern as show below. Now all the PDB files with prefix 'pebp1_50frames_ens_' will be analyzed.

In [8]: findClusterCenters('pebp1_50frames_ens_*.pdb')

@> 3269 atoms and 1 coordinate set(s) were parsed in 0.11s.
@> 3161 atoms and 1 coordinate set(s) were parsed in 0.05s.
@> 3173 atoms and 1 coordinate set(s) were parsed in 0.04s.
@> 3173 atoms and 1 coordinate set(s) were parsed in 0.04s.
@> 3218 atoms and 1 coordinate set(s) were parsed in 0.04s.
@> 3251 atoms and 1 coordinate set(s) were parsed in 0.04s.
@> 3215 atoms and 1 coordinate set(s) were parsed in 0.04s.
@> 3230 atoms and 1 coordinate set(s) were parsed in 0.03s.
@> 3230 atoms and 1 coordinate set(s) were parsed in 0.04s.
@> 3224 atoms and 1 coordinate set(s) were parsed in 0.03s.
@> 3158 atoms and 1 coordinate set(s) were parsed in 0.03s.
..
..
@> Results are saved in clusters_pebp1_50frames_ens_.pdb.

This function generated one PDB file with water centers. We used default values, such as distC (distance to other molecule) and numC (min number of molecules in a cluster), but those values could be changed if the molecules are more widely distributed or we would like to have more numerous clusters. Moreover, this function can be applied to different types of molecules by using the selection parameter. We can provide the whole molecule, and by default, the center of mass will be used as a reference.

Saved PDB files using savePDBWaterBridgesTrajectory() in the previous step can be upload to VMD or other program for visualization:

After uploading a new PDB file with water centers we can see the results as follows: