Water bridges detection in a trajectory

Release Notes

v2.5.0 series come with new and improved sequence, structure, and dynamics analysis features. See release notes for details.

How to Cite

Bakan A, Meireles LM, Bahar I ProDy: Protein Dynamics Inferred from Theory and Experiments
Bioinformatics 2011 27(11):1575-1577.

Bakan A, Dutta A, Mao W, Liu Y, Chennubhotla C, Lezon TR, Bahar I Evol and ProDy for Bridging Protein Sequence Evolution and Structural Dynamics
Bioinformatics 2014 30(18):2681-2683.

Zhang S, Krieger JM, Zhang Y, Kaya C, Kaynak B, Mikulska-Ruminska K, Doruker P, Li H, Bahar I ProDy 2.0: Increased scale and scope after 10 years of protein dynamics modelling with Python
Bioinformatics 2021 37(20):3657-3659.

Water bridges detection in a trajectory¶

Now, we will perform calculations for a trajectory file that was obtained using the NAMD_ package. We will use calcWaterBridgesTrajectory(), for which we need to provide PDB and DCD files.

The system (protein in a water box) can be found in 5kqm_all_sci.pdb. The trajectory, NAMD_D2_sample.dcd, has dcd format. If we want to analyze trajectories with different formats, we need to convert them to dcd file format or save the trajectory as a multi-model PDB (using VMD or another tool).

Parse structure with trajectory¶

In [1]: PDBtraj_file = "5kqm_all_sci.pdb"

In [2]: coords_traj = parsePDB(PDBtraj_file)

In [3]: trajectory = parseDCD("NAMD_D2_sample.dcd")

@> 19321 atoms and 1 coordinate set(s) were parsed in 0.18s.
@> DCD file contains 17 coordinate sets for 19321 atoms.
@> DCD file was parsed in 0.01 seconds.
@> 3.76 MB parsed at input rate 748.06 MB/s.
@> 17 coordinate sets parsed at input rate 3382 frame/s.

The analysis od water bridges can be performed on selected frames by using start_frame or stop_frame.

In [4]: wb_traj = calcWaterBridgesTrajectory(coords_traj, trajectory, start_frame=5,
   ...:                                      stop_frame=15, output='info')
   ...: 

@> Frame: 5
@> 101 water bridges detected.
@> Frame: 6
@> 107 water bridges detected.
@> Frame: 7
@> 90 water bridges detected.
@> Frame: 8
@> 97 water bridges detected.
@> Frame: 9
@> 122 water bridges detected.
@> Frame: 10
@> 101 water bridges detected.
@> Frame: 11
@> 130 water bridges detected.
@> Frame: 12
@> 132 water bridges detected.
@> Frame: 13
@> 126 water bridges detected.
@> Frame: 14
@> 88 water bridges detected.
@> Frame: 15
@> 105 water bridges detected.

Because of the amount of data, detailed results will not be displayed. We instead access the raw data by using output='info'.

In [5]: wb_traj

[[['THR5',
   'OG1_8',
   'P',
   'TRP39',
   'NE1_547',
   'P',
   3.261452627054999,
   1,
   ['3W_13313']],
  ['THR5',
   'O_15',
   'P',
   'ASP86',
   'OD1_1269',
   'P',
   5.986350034086454,
   2,
   ['3W_12974', '3W_18431']],
  ['THR5',
   'O_15',
   'P',
   'LYS110',
   'NZ_1667',
   'P',
   7.375256709599827,
   2,
   ['3W_12974', '3W_18431']],
  ['THR5',
   'O_15',
   'P',
   'LYS6',
   'NZ_32',
   'P',
   6.414308925017051,
   2,
   ['3W_12974', '3W_12152']],
  ['LYS6',
   'NZ_32',
   'P',
   'TYR87',
   'OH_1286',
   'P',
   4.891713264838611,
   1,
   ['3W_9209']]
   ...
   ...
   ]]

Save the results¶

The results can be saved using saveWaterBridges() in two formats. The txt file will contain all the results for analysis and can be visualized in a text editor, and the wb file will restore data for further analysis. It can be loaded using parseWaterBridges() as shown below.

First, we have to return the calculation without output='info'.

We can suppress the logged output using confProDy() to set the verbosity to 'none'.

In [6]: confProDy(verbosity='none')

In [7]: wb_traj = calcWaterBridgesTrajectory(coords_traj, trajectory,
   ...:                                      stop_frame=15)
   ...: 

In [8]: saveWaterBridges(wb_traj,'wb_saved.txt')

In [9]: saveWaterBridges(wb_traj,'wb_saved.wb')

To load the wb file, use parseWaterBridges() and protein coordinates as follows:

In [10]: waterBridges = parseWaterBridges('wb_saved.wb', coords_traj)

Loaded results from a .wb file are Atomic type and therefore can be used for analysis later.

Analysis of the results:¶

Information about residues contributing to water bridges¶

The data can be analyzed using calcWaterBridgesStatistics(). The following analysis provides details about the pairs of residues engaged in water bridges, their frequency of occurrence, and the average distance between them. The standard deviation offers insights into the variation in distance throughout the simulation. Moreover, the analysis can be saved using the filename option.

We can recover logged output using confProDy() again with a different verbosity.

In [11]: confProDy(verbosity='debug')

In [12]: analysisAtomic = calcWaterBridgesStatistics(waterBridges, trajectory,
   ....:                                             filename='data.txt')
   ....: 

@> RES1           RES2           PERC      DIST_AVG  DIST_STD
@> ARG40P         SER7P          12.500    4.901     0.000
@> ASP92P         ARG18P         68.750    4.285     1.159
@> ASN95P         ARG18P         68.750    5.099     1.192
@> GLU23P         PRO20P         12.500    4.571     0.000
@> HSE72P         GLU23P         12.500    3.669     0.458
@> VAL41P         ARG27P         56.250    5.565     0.781
@> SER71P         ARG27P         75.000    6.116     0.445
@> ASN34P         ASP32P         25.000    4.218     0.652
@> GLU37P         SER36P         75.000    3.700     1.154
@> THR84P         ARG40P         50.000    4.235     0.671
@> ARG75P         ASP42P         68.750    3.159     0.652
@> ASN95P         THR46P         62.500    4.067     0.842
@> TYR49P         SER47P         50.000    4.320     0.757
..
..

The output is a dictionary, so we can use dict.items() to inspect it.

In [13]: for item in list(analysisAtomic.items())[:5]:
   ....:    print(item)
   ....: 

((40, 7), {'percentage': 12.5, 'distAvg': 4.9006157, 'distStd': 0.0})
((7, 40), {'percentage': 12.5, 'distAvg': 4.9006157, 'distStd': 0.0})
((92, 18), {'percentage': 68.75, 'distAvg': 4.2853837, 'distStd': 1.159262})
((18, 92), {'percentage': 68.75, 'distAvg': 4.2853837, 'distStd': 1.159262})
((95, 18), {'percentage': 68.75, 'distAvg': 5.0986476, 'distStd': 1.1916962})

To have easier access to the data, we can use getWaterBridgeStatInfo().

In [14]: wb_stat_info = getWaterBridgeStatInfo(analysisAtomic, coords_traj)

In [15]: wb_stat_info

{('SER7P', 'ARG40P'): {'percentage': 12.5,
  'distAvg': 4.9006157,
  'distStd': 0.0},
 ('ARG18P', 'ASP92P'): {'percentage': 68.75,
  'distAvg': 4.2853837,
  'distStd': 1.159262},
 ('ARG18P', 'ASN95P'): {'percentage': 68.75,
  'distAvg': 5.0986476,
  'distStd': 1.1916962},
 ('PRO20P', 'GLU23P'): {'percentage': 12.5,
  'distAvg': 4.571081,
  'distStd': 0.0},
   ...
   ...

To obtain maps of interactions for the protein structure, we can use showWaterBridgeMatrix(), which is equipped with three paramaters: 'percentage' (how often two residues were forming water bridges), 'distAvg' (how close there were on average), and 'distStd' (how stable that water bridge was).

In [16]: showWaterBridgeMatrix(analysisAtomic, 'percentage')

In [17]: showWaterBridgeMatrix(analysisAtomic, 'distAvg')

In [18]: showWaterBridgeMatrix(analysisAtomic, 'distStd')

Raw data of the matrices can be obtained with calcWaterBridgeMatrix(). The type of the data in the matrix can be selected using the following strings for the second argument: 'percentage', 'distAvg', 'distStd'.

In [19]: M1 = calcWaterBridgeMatrix(analysisAtomic, 'percentage')

In [20]: M2 = calcWaterBridgeMatrix(analysisAtomic, 'distAvg')

In [21]: M3 = calcWaterBridgeMatrix(analysisAtomic, 'distStd')

In [22]: M1

array([[ 0.  ,  0.  ,  0.  , ...,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  , ...,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  , ...,  0.  ,  0.  ,  0.  ],
       ...,
       [ 0.  ,  0.  ,  0.  , ...,  0.  , 12.5 , 31.25],
       [ 0.  ,  0.  ,  0.  , ..., 12.5 ,  0.  , 12.5 ],
       [ 0.  ,  0.  ,  0.  , ..., 31.25, 12.5 ,  0.  ]])

In [23]: M2

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 4.58851337,
        5.82083416],
       [0.        , 0.        , 0.        , ..., 4.58851337, 0.        ,
        3.52366138],
       [0.        , 0.        , 0.        , ..., 5.82083416, 3.52366138,
        0.        ]])

In [24]: M3

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 1.71697354,
        1.38650537],
       [0.        , 0.        , 0.        , ..., 1.71697354, 0.        ,
        1.27207112],
       [0.        , 0.        , 0.        , ..., 1.38650537, 1.27207112,
        0.        ]])

Statistical analysis for water bridges¶

To visualize the results in a more accessible way, we can use the calcBridgingResiduesHistogram() function, which will show how often each residue was contributing to the water bridges in the trajectory.

In [25]: wb_res_hist = calcBridgingResiduesHistogram(waterBridges)

In [26]: wb_res_hist

[('LEU96P', 1),
 ('MET63P', 1),
 ('PHE152P', 1),
 ('LEU29P', 1),
 ('PRO130P', 1),
 ('PHE85P', 1),
 ('PRO54P', 1),
 ('ILE16P', 1),
 ('CYS148P', 1),
 ('VAL25P', 1),
 ('ILE77P', 1),
 .
 .
 ('ARG75P', 15),
 ('ARG18P', 15),
 ('ARG65P', 15),
 ('ARG40P', 16),
 ('ARG147P', 16),
 ('ARG58P', 16),
 ('ARG27P', 16),
 ('ASP92P', 16),
 ('TYR49P', 16),
 ('LYS102P', 16),
 ('ARG150P', 16),
 ('SER36P', 16)]

The clip option can be used to include different number of results on the histogram.

In [27]: calcBridgingResiduesHistogram(waterBridges, clip=25)

If we are interested in one particular residue, we can also use calcWaterBridgesDistribution() to find their partners in water bridges. Below we can see results for arginine 147 or aspartic acid 92 from chain P using the nomenclature for them corresponding to the keys of the dictionary.

In [28]: calcWaterBridgesDistribution(waterBridges, 'ARG147P')

[('GLN122P', 8),
 ('ARG150P', 7),
 ('GLN143P', 6),
 ('LYS123P', 6),
 ('GLN124P', 5),
 ('ASP120P', 5),
 ('GLN144P', 3),
 ('THR140P', 2)]

In [29]: calcWaterBridgesDistribution(waterBridges, 'ASP92P')

[('ARG18P', 11),
 ('ASN95P', 10),
 ('SER94P', 5),
 ('MET91P', 5),
 ('ASP129P', 4),
 ('LEU13P', 3),
 ('CYS90P', 1)]

Once we select a pair of residues which are supported by interactions with water molecules, we can use calcWaterBridgesDistribution() to obtain histograms with results such as distances between them (metric='distance'), the number of water molecules which were involved (metric='waters'), and information about residue part which was involved in water bridges, i.e. backbone or side chain (metric='location').

In [30]: calcWaterBridgesDistribution(waterBridges,  'ASP92P', 'ARG18P',
   ....:                              trajectory=trajectory, metric='distance')
   ....: 

[5.3736005,
3736005,
167575,
681302,
371548,
6318514,
0394073,
0884595,
4406505,
4112484,
805657,
4176636,
5104342,
991175,
470093,
4345005,
6427624]

In [31]: calcWaterBridgesDistribution(waterBridges, 'ARG147P', 'GLN122P',
   ....:                                                      metric='waters')
   ....: 

[2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2]

In [32]: calcWaterBridgesDistribution(waterBridges, 'ARG147P', 'GLN122P',
   ....:                              trajectory=trajectory, metric='location')
   ....: 

{'ARG147P': {'backbone': 7, 'side': 86},
'GLN122P': {'backbone': 21, 'side': 25}}

Save results as PDB file¶

The results can be stored as a PDB file using savePDBWaterBridges() (single PDB file, single frame) or using savePDBWaterBridgesTrajectory() to save all the results (large number of frames saved each independently).

5kqm_all_sci_multi_0.pdb 5kqm_all_sci_multi_1.pdb .. 5kqm_all_sci_multi_15.pdb

Those results can be displayed in any program for visualization. The results for the protein structure will be storage in the B-factor (*beta*) column (average values of contributions of each residue in water bridging) and Occupancy column (results for particular frame). Water molecules will be included in each frame.

In [33]: savePDBWaterBridges(waterBridges[0], coords_traj, PDBtraj_file[:-4]+'_frame0.pdb')

In [34]: savePDBWaterBridgesTrajectory(waterBridges, coords_traj,
   ....:                               filename=PDBtraj_file[:-4]+'_multi.pdb',
   ....:                               trajectory=trajectory)
   ....: 

Results saved in PDB file can be displayed as follows: