
app:lfmpro:index [2010/04/02 14:28]
Ahmet Sacan

====== LFM-Pro: A Tool for Detecting Significant Local Structural Sites in Proteins ======

<style float-left>
{{1irda.globins.1sites.jpg?160|globins:1a6m - top scoring site }}
</style>

===== Motivation =====

The rapidly growing protein structure repositories have opened up new opportunities for discovering and analyzing functional and evolutionary relationships among proteins. Detecting conserved structural sites that are unique to a protein family is of great value in identifying functionally important atoms and residues. Currently available methods are computationally expensive and fail to detect biologically significant local features.

===== Results =====

We propose LFM-Pro (Local Feature Mining in Proteins), a framework for automatically discovering family-specific local sites and the features associated with these sites. Our method uses the distance field to the backbone atoms to detect geometrically significant structural centers of the protein. A feature vector is generated from the geometric and biochemical environment around each of these centers. These features are then scored, using a statistical measure, by their ability to distinguish a family of proteins from a background set of unrelated proteins. The successful features are combined into a representative set for the protein family. The utility of LFM-Pro is demonstrated on the Trypsin-like Serine Proteases family. The results show that the method succeeds both in identifying the distinctive sites of a given protein family and in classifying proteins using the extracted features.
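
The scoring step can be pictured with a small Python sketch. It assumes that each protein's local features have already been reduced to discrete labels that are either present or absent. It uses a hypergeometric upper-tail p-value as the statistical measure. That measure is an illustrative choice and not necessarily the one LFM-Pro uses. The function names and the ''alpha'' threshold are hypothetical.

<code python>
from math import comb

# Illustrative sketch only; not LFM-Pro's actual scoring code.


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: N proteins in total, K of them
    carry the feature, and the n family members are the 'draw'."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def score_features(family, background):
    """family, background: lists of sets of feature labels, one set per protein.
    Returns {feature: p-value}; a smaller value means more family-specific."""
    proteins = family + background
    N, n = len(proteins), len(family)
    scores = {}
    for f in set().union(*proteins):
        K = sum(f in p for p in proteins)  # proteins carrying f overall
        k = sum(f in p for p in family)    # family members carrying f
        scores[f] = hypergeom_sf(k, N, K, n)
    return scores


def representative_set(scores, alpha=0.05):
    """Keep features with p-value <= alpha, most significant first."""
    kept = [f for f, p in scores.items() if p <= alpha]
    return sorted(kept, key=lambda f: scores[f])


# Hypothetical toy data: 'f1' occurs only in the family, 'f2' everywhere.
family = [{"f1", "f2"}, {"f1", "f2"}, {"f1", "f2"}]
background = [{"f2"}, {"f2"}, {"f2"}]
scores = score_features(family, background)
print(scores["f1"], scores["f2"])             # 0.05 1.0
print(representative_set(scores, alpha=0.1))  # ['f1']
</code>

Features with the smallest p-values are the most family-specific. Collecting those below the threshold gives a toy version of the representative set.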

==== Availability ====
The software is freely available for academic research use.

==== Contact ====
ahmet[at]ceng.metu.edu.tr, {ozturk,hakan,yusu}@cse.ohio-state.edu

==== Publication ====
Ahmet Sacan, Ozgur Ozturk, Hakan Ferhatosmanoglu, and Yusu Wang. {{http://bioinformatics.oxfordjournals.org/cgi/reprint/23/6/709?hits=10&FIRSTINDEX=0&FULLTEXT=%28ahmet+AND+sacan+AND+lfmpro%29&SEARCHID=1&gca=bioinfo%3B23%2F6%2F709&|LFM-Pro: A Tool for Detecting Significant Local Structural Sites in Proteins}}. Bioinformatics, 23(6):709-716, 2007.

===== Datasets =====
  * The multi-class classification {{multiclass.trainset.txt|training set}} is drawn from the 40% ASTRAL subset of SCOP 1.67, restricted to families with at least 5 member proteins. The {{multiclass.testset.txt|test set}} consists of the proteins newly added to the 40% ASTRAL subset of SCOP 1.69 for the corresponding families. <!-- You can also obtain the {{dalialignments.rar|Dali alignment data files}} for the multi-class classification experiment. -->

  * For the feature mining and binary classification tasks, we used the {{families.txt|Trypsin-like Serine Proteases}} and Nuclear Receptor Ligand Binding Domain families.
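
The train/test construction described above can be sketched in Python. The sketch assumes the two 40% ASTRAL lists have already been parsed into dictionaries mapping a family identifier to the set of domain identifiers in that family. The format of the provided .txt files is not described here. The function name ''make_split'' and the toy identifiers are hypothetical.

<code python>
# Hypothetical helper; the input dictionaries are an assumed, pre-parsed format.


def make_split(astral_167, astral_169, min_members=5):
    """astral_167 / astral_169: {family_id: set of domain ids} for the
    40% ASTRAL subsets of SCOP 1.67 and SCOP 1.69."""
    # Training families: at least min_members members in SCOP 1.67.
    train = {fam: set(ids) for fam, ids in astral_167.items()
             if len(ids) >= min_members}
    # Test set: members of those same families that are new in SCOP 1.69.
    test = {}
    for fam, ids in train.items():
        new = set(astral_169.get(fam, ())) - ids
        if new:
            test[fam] = new
    return train, test


# Hypothetical toy input.
old = {"famA": {"d1", "d2", "d3", "d4", "d5"}, "famB": {"d6"}}
new = {"famA": {"d1", "d2", "d3", "d4", "d5", "d7"}, "famB": {"d6", "d8"}}
train, test = make_split(old, new)
print(sorted(train), test)  # ['famA'] {'famA': {'d7'}}
</code>

Families with fewer than five members in SCOP 1.67 (''famB'' here) are excluded from both sets.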

===== Installation =====

{{download>app:lfmpro:lfmpro.rar|Download}} and extract the LFMPro software package into a folder (referred to as "LFMPro" from here on). The package contains the following folders:
  * **LFMPro/src/**: contains the Matlab source scripts. Edit globals_default.m to change the default paths to the data folders and external programs.
  * **LFMPro/myLibrary/**: contains the general utility functions used by the Matlab scripts. Add this folder to your Matlab path.
  * **LFMPro/CriticalPoints/**: contains the Python scripts that generate critical points for a given PDB structure. A Win32 binary, delaunay.exe, is provided in this folder. On other platforms you will need to compile the Delaunay program yourself (a Makefile is provided).
  * **LFMPro/Data/**: contains the PDB files and the program cache.
  * **LFMPro/Parameters/**: contains the parameters that guide the site mining process, as well as the ASTRAL, SCOP, and CullPDB lists. Some time-consuming batch tasks check myPause.txt (which should contain 0 or 1) to decide whether to pause execution.
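
The scripts in LFMPro/CriticalPoints/ generate the critical points themselves. The Python sketch below is a general computational-geometry illustration rather than the shipped code. It checks whether a single Delaunay tetrahedron of backbone atoms gives a local maximum of the distance field. For points in general position, the circumcenter of a Delaunay tetrahedron (a Voronoi vertex) is a local maximum of the distance to the atoms exactly when it lies inside that tetrahedron. The function names are hypothetical. The tetrahedra are assumed to come from an existing Delaunay triangulation, for example the output of the Delaunay program.

<code python>
# General geometry sketch; not the code in LFMPro/CriticalPoints/.


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumcenter(a, b, c, d):
    """Point equidistant from a, b, c, d, found with Cramer's rule on
    2(p - a) . x = |p|^2 - |a|^2 for p in (b, c, d)."""
    rows = [[2 * (p[i] - a[i]) for i in range(3)] for p in (b, c, d)]
    rhs = [sum(v * v for v in p) - sum(v * v for v in a) for p in (b, c, d)]
    den = det3(rows)
    if abs(den) < 1e-12:
        raise ValueError("degenerate (flat) tetrahedron")
    center = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(det3(m) / den)
    return tuple(center)


def orient(p, q, r, s):
    """Signed volume (times 6) of tetrahedron pqrs."""
    return det3([[x[i] - p[i] for i in range(3)] for x in (q, r, s)])


def is_distance_maximum(a, b, c, d):
    """True if the circumcenter lies strictly inside tetrahedron abcd."""
    x = circumcenter(a, b, c, d)
    verts = (a, b, c, d)
    for i in range(4):
        face = [verts[j] for j in range(4) if j != i]
        # x must lie on the same side of each face as the opposite vertex.
        if orient(*face, verts[i]) * orient(*face, x) <= 0:
            return False
    return True


print(is_distance_maximum((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)))  # True
print(is_distance_maximum((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)))      # False
</code>

In the second call, the circumcenter (0.5, 0.5, 0.5) lies outside the corner tetrahedron, so it is not a maximum.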

=== Prerequisites ===
  * You need [[http://www.mathworks.com|Matlab]] to run LFMPro. Add the LFMPro/myLibrary/ folder to your Matlab path.
  * You need the [[http://www.cgal.org|CGAL]] computational geometry library and [[http://www.python.org|Python]] to run the code that generates the critical points.
  * Download the [[ftp://ftp.rcsb.org/pub/pdb/data/structures/divided/pdb|PDB repository]] into LFMPro/Data/PdbZips_divided.
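
In the RCSB ''divided'' layout, each entry is stored as ''pdbXXXX.ent.gz'' in a subdirectory named after the middle two characters of its four-character ID. The sketch below assumes the local mirror in LFMPro/Data/PdbZips_divided keeps that layout; the folder name suggests so, but this is an assumption. Under that assumption, a hypothetical Python helper for locating and reading an entry could look like this:

<code python>
import gzip
from pathlib import Path

# Hypothetical helpers; assumes the mirror keeps the RCSB 'divided' layout.


def divided_pdb_path(root, pdb_id):
    """Path of an entry in a mirror of the PDB 'divided' tree,
    e.g. 1a6m -> <root>/a6/pdb1a6m.ent.gz."""
    pid = pdb_id.lower()
    if len(pid) != 4:
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    return Path(root) / pid[1:3] / f"pdb{pid}.ent.gz"


def read_pdb_lines(root, pdb_id):
    """Return the decompressed lines of the entry as text."""
    with gzip.open(divided_pdb_path(root, pdb_id), "rt") as fh:
        return fh.read().splitlines()


print(divided_pdb_path("LFMPro/Data/PdbZips_divided", "1A6M"))
</code>

On a POSIX system, the call above prints ''LFMPro/Data/PdbZips_divided/a6/pdb1a6m.ent.gz''.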

===== Quick-Run Guide =====

In Matlab:
  * Change to the LFMPro/src directory.
  * Set up the environment variables. The globals_set function checks that the source and data directories are in place.
<code>
>> globals_set;
</code>

  * Use mine_family_represent to extract the features and their scores:
<code>
>> rep = mine_family_represent(familyName, 'ptns',ptnNames, 'rand',backgroundPtns);
</code>
''familyName'' is a unique identifier for the family representation. The same name is used later to refer to it (for example, in signatures_show). ''ptnNames'' and ''backgroundPtns'' are cell arrays of PDB names for the family members and for the outgroup (background) proteins.

  * To view the mapping of the mined features onto a protein, use:
<code>
>> signatures_show(pdb, 'familyName',familyName)
</code>

  * For the functions implemented in LFMPro, you can get help on a function's arguments by passing 'help' as its argument, for example:
<code>
>> data_prepare_batch('help');
</code>
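
Functions such as mine_family_represent take optional name/value pairs after their positional arguments (the ptns and rand options above). Passing 'help' lists what a function accepts. The Python sketch below mimics that calling convention to show how such pairs are typically parsed. It is not the Matlab code in LFMPro, and ''parse_options'' and its defaults are hypothetical.

<code python>
# Hypothetical Python analogue of a Matlab-style varargin parser.


def parse_options(defaults, *args):
    """Parse name/value arguments against a dict of defaults.
    A single 'help' argument prints the accepted names and returns None."""
    if len(args) == 1 and args[0] == "help":
        for name, value in defaults.items():
            print(f"  {name} (default: {value!r})")
        return None
    if len(args) % 2 != 0:
        raise ValueError("options must be given as name/value pairs")
    opts = dict(defaults)
    for name, value in zip(args[0::2], args[1::2]):
        if name not in opts:
            raise KeyError(f"unknown option: {name}")
        opts[name] = value
    return opts


# Hypothetical defaults loosely modelled on the ptns/rand options.
DEFAULTS = {"ptns": [], "rand": []}
parse_options(DEFAULTS, "help")
# 'bkg1' is a placeholder name, not a real PDB entry.
opts = parse_options(DEFAULTS, "ptns", ["1a6m"], "rand", ["bkg1"])
print(opts["ptns"], opts["rand"])  # ['1a6m'] ['bkg1']
</code>

Rejecting unknown names and odd-length argument lists catches typos early, instead of silently falling back to a default.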

===== Sample Tasks =====

The following sample runs demonstrate some of the available functionality:
  * [[sample-Globins|Mining significant sites in Globins]]. Given a set of Globins members, extract the sites significant for the Globins family.
  * [[sample-SerThrKinases|Mining significant sites in Ser/Thr Kinases]]. Given a set of Ser/Thr Kinases members, extract the sites significant for the family.
  * [[sample-classification|Classification Example]]. Classify proteins based on the representations generated from Globins and Ser/Thr Kinases.
  * [[sample-ptn_get|Accessing and displaying protein information]]. Describes functions for accessing a protein's information and its features.

===== Function Reference =====

[[function-reference|See the function reference page]] for the list of available functions and their usage.

===== Extending LFMPro =====

To extend LFMPro with new local features, see the [[localfeatures|local features]] page.