ELM: The Eukaryotic Linear Motif resource for Functional Sites in Proteins
o More in-depth details on the Structural Filter (BETA version)
 

o Accessibility and secondary structure assignment

The solvent accessibility and secondary structure values are taken from DSSP files. The solvent exposure of a residue is expressed as a relative (normalized) value: the ratio of the residue's DSSP accessibility value to the accessible surface area of the same residue type as defined by Miller and co-workers (Miller et al., 1987). The latter is calculated for the residue in a Gly-Xaa-Gly tripeptide in extended conformation. The relative accessibility varies between 0 and 1. The DSSP secondary structure assignments are: H = alpha helix, B = residue in isolated beta-bridge, E = extended strand (participates in beta ladder), G = 3-turn helix (3/10 helix), I = 5-turn helix (pi helix), T = hydrogen-bonded turn and S = bend.
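The normalization step can be sketched in Python as below. This is a minimal illustration, not ELM's own code: the function name is hypothetical, and the Miller et al. reference areas are passed in by the caller as a lookup table rather than reproduced here.

```python
from typing import Mapping


def relative_accessibility(aa: str, dssp_acc: float, max_asa: Mapping[str, float]) -> float:
    """Relative solvent accessibility of one residue (hypothetical helper).

    aa:       one-letter residue code as it appears in the DSSP file
    dssp_acc: absolute accessibility from the DSSP ACC column
    max_asa:  caller-supplied table mapping residue codes to the accessible
              surface area of that residue in an extended Gly-Xaa-Gly
              tripeptide (Miller et al., 1987); no values are hard-coded here
    """
    reference = max_asa[aa]  # raises KeyError for residue codes missing from the table
    if reference <= 0:
        raise ValueError("reference surface area must be positive")
    return dssp_acc / reference
```

For example, with a purely hypothetical reference area of 200.0 for lysine, `relative_accessibility("K", 150.0, {"K": 200.0})` returns 0.75.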

 

o Score of an individual position

Based on a structural analysis of true motif instances, each position p of a match is assigned an accessibility score SA (p) and a secondary structure score SSSE (p) as follows:

Accessibility:

             relative accessibility >= 0.7: SA (p) = 1
             relative accessibility < 0.7: SA (p) = relative accessibility
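The accessibility rule above amounts to a simple threshold; a minimal Python sketch (the function name is hypothetical):

```python
def accessibility_score(rel_acc: float, threshold: float = 0.7) -> float:
    """SA(p): 1 if the relative accessibility reaches the threshold, else the value itself."""
    # Fully exposed positions (relative accessibility >= 0.7) all score 1;
    # less exposed positions keep their relative accessibility as the score.
    return 1.0 if rel_acc >= threshold else rel_acc
```

So a position with relative accessibility 0.85 scores 1, while one at 0.42 scores 0.42.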

Secondary structure:

             helix: SSSE (p) = 0.3
             strand: SSSE (p) = 0.5
             G-helix: SSSE (p) = 0.7
             loop: SSSE (p) = 1.0
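The secondary structure scores can be sketched as a lookup from DSSP codes. The four class scores are those listed above, but this page does not say how every DSSP code is grouped into the helix, strand, G-helix and loop classes. The grouping in `DSSP_TO_CLASS` below (H and I as helix, E and B as strand, G as G-helix, any other code including T, S and blank as loop) is an assumption made only for illustration.

```python
# Per-class scores as listed on this page.
SSSE_BY_CLASS = {"helix": 0.3, "strand": 0.5, "g_helix": 0.7, "loop": 1.0}

# ASSUMPTION: hypothetical grouping of DSSP codes into the four classes.
# Any code not listed here (T, S, blank, ...) is treated as loop.
DSSP_TO_CLASS = {"H": "helix", "I": "helix", "E": "strand", "B": "strand", "G": "g_helix"}


def secondary_structure_score(dssp_code: str) -> float:
    """SSSE(p) for a single DSSP secondary structure code."""
    return SSSE_BY_CLASS[DSSP_TO_CLASS.get(dssp_code, "loop")]
```

Keeping the grouping in a separate table makes it easy to change if the actual filter classifies B, I, T or S differently.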

 

o Score of a match

Given a motif match of length N (len(match) = N) that can be modeled onto a structural domain, its global SA and SSSE scores are the averages of the per-position scores:

             SA (match) = ∑p (SA (p))/N
             SSSE (match) = ∑p (SSSE (p))/N
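Both match scores are plain arithmetic means over the N positions. A minimal Python sketch, assuming the per-position scores have already been computed (the function name is hypothetical):

```python
from typing import Sequence, Tuple


def match_scores(sa: Sequence[float], ssse: Sequence[float]) -> Tuple[float, float]:
    """Return (SA(match), SSSE(match)) as the mean of the per-position scores."""
    # sa[i] and ssse[i] hold SA(p) and SSSE(p) for the i-th position of the match.
    if len(sa) != len(ssse) or len(sa) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(sa)
    return sum(sa) / n, sum(ssse) / n
```

For a four-residue match with SA (p) values 1.0, 1.0, 0.5 and 0.3, SA (match) = 2.8 / 4 = 0.7.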

 

o Benchmark

The benchmark consists of all true-positive ELM motif instances (TPs) that can be mapped onto a domain structure at ≥ 70% sequence similarity, together with a set of reliable false positives (FPs). In total, the benchmark comprises 218 TPs and 28790 FPs.

o Score calibration

The percentages of TPs and FPs were calculated in SA and SSSE score bins of width 0.1 and plotted (Figures 1 and 2).
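The binning can be sketched in Python as below; it would be applied separately to the TP scores and the FP scores. The bin convention (half-open bins [0.0, 0.1), [0.1, 0.2), ..., with a score of exactly 1.0 placed in the last bin) is an assumption, since the page does not state how bin edges are handled.

```python
import math
from typing import List, Sequence


def bin_percentages(scores: Sequence[float], n_bins: int = 10) -> List[float]:
    """Percentage of scores in each of n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for s in scores:
        # round() absorbs tiny floating-point errors so that a score lying on a
        # bin edge lands in the bin that starts at that edge; 1.0 goes to the last bin.
        idx = min(int(math.floor(round(s * n_bins, 9))), n_bins - 1)
        counts[idx] += 1
    total = len(scores)
    if total == 0:
        return [0.0] * n_bins
    return [100.0 * c / total for c in counts]
```

For instance, `bin_percentages([0.05, 0.15, 0.15, 1.0])` puts 25% in the first bin, 50% in the second and 25% in the last.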

Figure 1 - TP and FP frequencies versus SA score bins


Figure 2 - TP and FP frequencies versus SSSE score bins


Based on these findings, we divided the score range into three categories:

             Range                Colour                 Accessibility (SA) and secondary structure (SSSE) conditions
             poor context         grey                   (0.0 ≤ SA < 0.3) or (0.3 ≤ SA ≤ 0.4 and 0.0 ≤ SSSE < 0.5)
             quite good context   half blue, half grey   (0.3 ≤ SA ≤ 0.4 and 0.5 ≤ SSSE ≤ 1.0) or (0.4 < SA ≤ 0.7 and 0.0 ≤ SSSE ≤ 0.5)
             best context         blue                   (0.4 < SA ≤ 0.7 and 0.5 < SSSE ≤ 1.0) or (SA ≥ 0.7)
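The three categories can be expressed as a classification function; a minimal Python sketch with hypothetical names. As listed, a match with SA exactly 0.7 and SSSE ≤ 0.5 satisfies both a "quite good" condition and the "best" condition; this sketch tests "best" first, which is an assumption about how that boundary is resolved.

```python
def context_category(sa: float, ssse: float) -> str:
    """Map match scores SA(match) and SSSE(match) to a context category."""
    # "best" is checked first, so SA == 0.7 always counts as best (assumption).
    if sa >= 0.7 or (0.4 < sa <= 0.7 and 0.5 < ssse <= 1.0):
        return "best"
    if (0.3 <= sa <= 0.4 and 0.5 <= ssse <= 1.0) or (0.4 < sa <= 0.7 and 0.0 <= ssse <= 0.5):
        return "quite good"
    if (0.0 <= sa < 0.3) or (0.3 <= sa <= 0.4 and 0.0 <= ssse < 0.5):
        return "poor"
    raise ValueError("scores outside the range covered by the categories")
```

For example, a match with SA = 0.35 falls in the poor category when SSSE = 0.4 but in the quite good category when SSSE = 0.8.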


The percentages of TPs and FPs falling into each of these categories are shown in Figure 3.

Figure 3 - TP and FP frequencies versus context categories.



Last modified 12-FEB-2008 - webmaster