Sequence Conservation Score Calculation and Mapping
The TRAPP webserver can analyze the sequence conservation per residue together with the TRAPP pocket results. The user can choose
between an average conservation score and the difference in the conservation score with and without an off-target sequence. In both cases, the user has to upload
a multiple sequence alignment (MSA) in FASTA format.
To calculate the conservation score for the MSA, the 'Protein Residue Conservation Prediction' tool by Capra and Singh (Capra JA and Singh M. Predicting functionally important residues from sequence conservation. Bioinformatics. 23(15):1875-82, 2007) is applied using the Jensen-Shannon divergence, with default parameters except for the following:
- -g 0.99 (gap cutoff. Do not score columns that contain more than gap cutoff fraction gaps)
- -m "/matrix/blosum62.bla" (similarity matrix file)
- -s js_divergence (conservation estimation method)
- -w 0 (window size. Number of residues on either side included in the window. Default=3)
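As a rough illustration, the options listed above could be assembled into a command line as follows. This is a minimal sketch: the script name `score_conservation.py`, the `python` launcher, and the helper `build_conservation_command` are assumptions made for the example, not details confirmed by TRAPP. Only the four options and their values come from the list above.

```python
def build_conservation_command(alignment_path, script="score_conservation.py"):
    """Return the argument list for one conservation run.

    Only the four options below are taken from the text; the script name
    and the position of the alignment argument are assumptions.
    """
    return [
        "python", script,               # assumed launcher and script name
        "-g", "0.99",                   # gap cutoff
        "-m", "/matrix/blosum62.bla",   # similarity matrix file
        "-s", "js_divergence",          # conservation estimation method
        "-w", "0",                      # window size (tool default is 3)
        alignment_path,                 # uploaded MSA in FASTA format
    ]
```

A list built this way can be passed to `subprocess.run` without going through a shell, which avoids quoting problems in file names.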
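The resulting scores are then re-scaled to the range 30-70, where x is a conservation score and min and max are the minimum and maximum score:
f(x) = ((70 - 30) * (x - min)) / (max - min) + 30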
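The re-scaling formula can be sketched in Python as follows. The function name `rescale` is hypothetical, and the handling of the case where all scores are identical (max equals min) is an assumption, since the formula is undefined there.

```python
def rescale(scores, low=30.0, high=70.0):
    """Min-max re-scale a list of scores into the range [low, high]."""
    smallest, largest = min(scores), max(scores)
    if largest == smallest:
        # Assumption: the formula divides by zero here, so this sketch
        # simply places every score at the middle of the target range.
        return [(low + high) / 2.0 for _ in scores]
    return [((high - low) * (x - smallest)) / (largest - smallest) + low
            for x in scores]


print(rescale([0.0, 0.5, 1.0]))  # [30.0, 50.0, 70.0]
```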
To visualize the conservation scores, the scores are written to the B-factor position in the reference structure, which makes it easy to show the level of conservation with a color scheme. The applied color scheme can be seen here.
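Placing scores at the B-factor position could look like the sketch below. It relies on the fixed-column PDB layout (chain identifier in column 22, residue number in columns 23-26, B-factor in columns 61-66). The function name `write_bfactors`, the `(chain, residue number)` key, and the choice to leave residues without a score unchanged are assumptions made for illustration.

```python
def write_bfactors(pdb_lines, score_by_residue):
    """Return PDB lines with the B-factor field replaced by a score.

    pdb_lines: lines without trailing newlines (e.g. from str.splitlines()).
    score_by_residue: dict mapping (chain_id, residue_number) to a score.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 54:
            key = (line[21], int(line[22:26]))
            if key in score_by_residue:
                # Pad short records so that columns 61-66 exist.
                line = line.ljust(66)
                line = line[:60] + "%6.2f" % score_by_residue[key] + line[66:]
        out.append(line)
    return out
```

Every atom of a residue receives the same value, so colouring by B-factor then shows one colour per residue.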
Average Conservation
For the average conservation analysis in TRAPP, the conservation calculation tool described above is run on the uploaded MSA in FASTA format, which results in a conservation score (between 0 and 1) for each aligned position in the MSA.
To map the conservation score on the 3D reference structure, the sequence from the reference PDB file is extracted and aligned to the MSA with the MUSCLE tool. This new MSA is used to map the conservation scores to the correct residue in the reference structure.
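The mapping step can be sketched as follows, assuming the aligned reference row and the per-column scores refer to the same alignment columns. The function name `map_scores_to_residues` and its arguments are hypothetical. The idea is to walk along the aligned reference sequence, skip gap columns, and assign each remaining column score to the next residue of the reference structure.

```python
def map_scores_to_residues(aligned_reference, column_scores, residue_numbers, gap="-"):
    """Map per-column scores onto reference residues.

    aligned_reference: the reference sequence as a row of the MSA (with gaps).
    column_scores: one score per alignment column.
    residue_numbers: residue numbers of the reference structure, in sequence order.
    """
    if len(aligned_reference) != len(column_scores):
        raise ValueError("need exactly one score per alignment column")
    mapping = {}
    residue_index = 0
    for symbol, score in zip(aligned_reference, column_scores):
        if symbol == gap:
            continue  # column has no counterpart in the reference structure
        mapping[residue_numbers[residue_index]] = score
        residue_index += 1
    return mapping


print(map_scores_to_residues("A-CD", [0.1, 0.2, 0.3, 0.4], [10, 11, 12]))
# {10: 0.1, 11: 0.3, 12: 0.4}
```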
On/Off Target
This option compares the conservation score of one class of protein sequences with that of another single sequence. This is often useful when comparing On and Off target proteins.
In this case, the user can upload an MSA in FASTA format that includes several On target sequences and one Off target sequence. The user needs to identify the Off target sequence by entering its FASTA header name. The conservation tool is applied twice: once with all sequences (On and Off target sequences) and once without the Off target sequence. Afterwards, the absolute difference of the conservation scores per position is calculated and re-scaled to the range 30-70 as explained before.
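A minimal sketch of the On/Off target procedure is shown below. All function names are hypothetical, and the conservation runs themselves are not reproduced: the sketch only shows how the Off target sequence could be removed by its FASTA header and how the per-position absolute difference is formed. It assumes the alignment columns are left unchanged when the Off target row is removed, so that positions in both score lists correspond.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def without_off_target(records, off_header):
    """Return the records with the Off target sequence removed."""
    kept = [(h, s) for h, s in records if h != off_header]
    if len(kept) == len(records):
        raise ValueError("Off target header not found: %r" % off_header)
    return kept


def absolute_difference(scores_all, scores_on_only):
    """Per-position |score with Off target - score without Off target|."""
    if len(scores_all) != len(scores_on_only):
        raise ValueError("score lists must cover the same positions")
    return [abs(a - b) for a, b in zip(scores_all, scores_on_only)]
```

The difference values would then be re-scaled to 30-70 in the same way as the average conservation scores.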
Visualization
The results of the conservation calculations are displayed on the reference structure together with the TRAPP pocket results in JSMol. For this purpose, the scores are added to the PDB file at the B-factor position and visualized via the temperature colouring option in the JSMol application. For the average conservation, the colour gradient starts at blue (low conservation) and goes via white to red (high conservation). For the On/Off target option blue means no difference between the conservation scores with/without the Off target sequence. Red highlights residues with a high difference, meaning these residues are not conserved between the On and Off target sequences.
As the scores are re-scaled, the gradient starts with ice-blue and only goes up to pink. This is necessary for a better contrast between the grid visualization of opening and closing pockets and the conservation colouring. The conservation scores (or the difference scores) can be switched on by clicking the checkbox in the Transient Pocket Analysis result page in the 'Change JSmol View' tab.
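To illustrate why re-scaled scores only span ice-blue to pink, the sketch below evaluates a simple blue-white-red gradient. It assumes a linear gradient over the range 0-100. This is an assumption for illustration only and not necessarily the exact scheme or range JSMol applies. The function name `blue_white_red` is hypothetical.

```python
def blue_white_red(value, vmin=0.0, vmax=100.0):
    """Return an (r, g, b) tuple on a linear blue-white-red gradient."""
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp to the gradient range
    if t < 0.5:
        c = round(255 * 2 * t)        # blue towards white
        return (c, c, 255)
    c = round(255 * (2 - 2 * t))      # white towards red
    return (255, c, c)


print(blue_white_red(30))  # (153, 153, 255), a pale blue
print(blue_white_red(70))  # (255, 153, 153), a pale red
```

Under this assumption, values between 30 and 70 never reach the saturated blue or red at the ends of the gradient.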
Additionally, the MSA is visualized in the 'Show MSA' tab on the result page using the MSAViewer. Two additional lines are added to the MSA: the corresponding amino acid sequence from the uploaded reference PDB file and the residue numbers extracted from the reference PDB file. The user can double-click on the residue numbers to highlight them in the JSMol viewer (yellow halos).