Critical Points and Local Features

Generating the Critical Points

The critical points are generated by finding the minima, maxima, and saddle points of the distance function to the backbone alpha carbon atoms. This is effectively the same as computing the Delaunay tessellation of those atoms. (The code for finding the Delaunay tessellation of a given set of 3D coordinates is in the software package directory LFMPro/CriticalPoints/Delaunay/delaunay.cpp.)

Each critical point is represented as the center of a tetrahedron in the Voronoi diagram, and the neighborhood of the critical point is defined as the sphere that passes through the corners of that tetrahedron. A filtering step removes the critical points with small persistence values or small neighborhoods. (The code for calculating persistence values is given in the software directory LFMPro/CriticalPoints/) Critical points that arise from artifacts in the PDB structure (e.g., breaks/gaps in the structure) are also removed.
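As an illustration of the neighborhood definition, the following minimal sketch computes the sphere that passes through the four corners of a tetrahedron. It is not the LFMPro implementation; the function names circumsphere and det3 are hypothetical, and the approach (solving the equidistance conditions as a 3x3 linear system with Cramer's rule) is just one straightforward way to do it.

```python
from math import dist


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def circumsphere(p0, p1, p2, p3):
    """Return (center, radius) of the sphere through four 3D points.

    Hypothetical helper for illustration; not part of LFMPro.
    """
    corners = (p1, p2, p3)
    # |x - p0|^2 = |x - pk|^2 simplifies to the linear equations
    # 2 (pk - p0) . x = |pk|^2 - |p0|^2 for k = 1, 2, 3.
    rows = [[2.0 * (pk[j] - p0[j]) for j in range(3)] for pk in corners]
    sq0 = sum(v * v for v in p0)
    rhs = [sum(v * v for v in pk) - sq0 for pk in corners]

    d = det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("points are (nearly) coplanar; no unique sphere")

    center = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(det3(m) / d)

    return tuple(center), dist(center, p0)


center, radius = circumsphere((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
print(center, radius)
```

For the unit tetrahedron in the usage line, the center lies at (0.5, 0.5, 0.5), equidistant from all four corners.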

The number of critical points generated from a protein is roughly 10 times the number of amino acids. The actual number depends on the conformation of the amino acids in 3D space. The filtering step reduces this number to the same range as the number of amino acids.
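The filtering step can be pictured with a small sketch like the one below. The CriticalPoint record, the function name filter_critical_points, and the threshold values are all assumptions made for illustration; the actual thresholds and data structures used by LFMPro are not given here.

```python
from typing import NamedTuple, Tuple


class CriticalPoint(NamedTuple):
    """Hypothetical record for one critical point (not LFMPro's own type)."""
    center: Tuple[float, float, float]
    radius: float        # radius of the neighborhood sphere
    persistence: float   # persistence value of the critical point


def filter_critical_points(points, min_persistence, min_radius):
    """Keep only points whose persistence and radius reach the thresholds."""
    return [
        p for p in points
        if p.persistence >= min_persistence and p.radius >= min_radius
    ]


points = [
    CriticalPoint((0.0, 0.0, 0.0), radius=5.0, persistence=2.0),
    CriticalPoint((1.0, 1.0, 1.0), radius=0.5, persistence=3.0),  # small neighborhood
    CriticalPoint((2.0, 2.0, 2.0), radius=6.0, persistence=0.1),  # low persistence
]
# Threshold values are arbitrary examples, not LFMPro defaults.
kept = filter_critical_points(points, min_persistence=1.0, min_radius=1.0)
print(len(points), "->", len(kept))
```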

Generating Local Features

For each critical point, we compute geometric and chemical features. The following table lists the local features used to summarize a given neighborhood:

index   feature
1       radius of the neighborhood
2       type of critical point (min, max, or saddle)
3       number of backbone pieces that contribute to the neighborhood
4       persistence of the critical point
5       distance between the pair of critical points used in the calculation of persistence
6-7     writhing values (exact and volume-based)
8-11    side-chain atom frequencies within the neighborhood (Sulphur, Nitrogen, Oxygen, and Carbon)
12-15   center of mass for each atom type
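To make features 8-11 concrete, here is a rough sketch of how side-chain atom frequencies inside a spherical neighborhood could be computed. The function name atom_frequencies, the (element, position) input format, and the choice to normalize counts into fractions are assumptions; LFMPro may define the frequencies differently (for example, as raw counts).

```python
from math import dist

# Order follows the table: Sulphur, Nitrogen, Oxygen, Carbon.
ELEMENTS = ("S", "N", "O", "C")


def atom_frequencies(center, radius, atoms):
    """Fraction of S, N, O, and C atoms among those inside the sphere.

    `atoms` is an iterable of (element, (x, y, z)) pairs.
    Hypothetical helper for illustration; not part of LFMPro.
    """
    counts = {element: 0 for element in ELEMENTS}
    for element, position in atoms:
        if element in counts and dist(center, position) <= radius:
            counts[element] += 1

    total = sum(counts.values())
    if total == 0:
        # Empty neighborhood: report zero for every atom type.
        return [0.0] * len(ELEMENTS)
    return [counts[element] / total for element in ELEMENTS]


atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.0, 0.0, 0.0)),
    ("N", (0.0, 1.0, 0.0)),
    ("O", (5.0, 5.0, 5.0)),  # outside the neighborhood below
]
print(atom_frequencies((0.0, 0.0, 0.0), 2.0, atoms))
```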

Extending LFMPro

If you wish to incorporate new features into LFMPro, edit the ptn_get_features function and append your features as new columns of the feature vector. You may also wish to optimize the weights used in the Euclidean distance with the function weights_optimize, which adjusts the weights to maximize the family discrimination score. You may then eliminate features that receive a very low weight (remove them from the vector) to improve performance.
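The sketch below illustrates the idea of a weighted Euclidean distance and of dropping low-weight features. It does not reproduce weights_optimize or ptn_get_features; the names weighted_distance and prune_features, the convention that each weight multiplies the squared difference, and the cutoff value are assumptions for illustration only.

```python
from math import sqrt


def weighted_distance(u, v, weights):
    """Euclidean distance with one weight per feature.

    Assumes each weight scales the squared difference of its feature.
    """
    return sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weights)))


def prune_features(vectors, weights, min_weight):
    """Drop feature columns whose weight is below `min_weight`."""
    keep = [i for i, w in enumerate(weights) if w >= min_weight]
    pruned_vectors = [[vec[i] for i in keep] for vec in vectors]
    pruned_weights = [weights[i] for i in keep]
    return pruned_vectors, pruned_weights


vectors = [[1.0, 10.0, 0.2], [2.0, 30.0, 0.4]]
weights = [1.0, 0.001, 0.5]  # example weights, not optimized values
print(weighted_distance(vectors[0], vectors[1], weights))

# Remove the second feature, whose weight is below the example cutoff.
pruned_vectors, pruned_weights = prune_features(vectors, weights, min_weight=0.01)
print(pruned_vectors, pruned_weights)
```

Pruning shortens every feature vector, so each distance computation touches fewer columns.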

If you want to redefine the way local centers and their neighborhoods are defined, ptn_get_criticalPoints is a good starting point for your modifications.

app/lfmpro/localfeatures.txt · Last modified: 2006/11/15 06:05 by Ahmet Sacan