====== Function Reference ======

===== How to get help about function options =====

Functions implemented in LFMPro accept their required and optional arguments in the following style:

  functionname('requiredvalue1', 'requiredvalue2',..., 'option1','value1', 'option2','value2',...);

If you pass 'help' as one of the options, the function prints the list of its available options and, if one is available, a synopsis. For example, here are the available options and their default values for the draw_backbone function:

  >> draw_backbone('1irda','help')
  default options are:
  simple: 1
  LineWidth: 1
  useSphere: 0
  sphereRadius: 0.4000
  sphereDetails: 6
  cylinderDetail: 8
  color: 0.2000 0.2000 0.2000
  withLabels: 0
  labelShift: 0.3000
  R:
  T:

LFMPro uses a caching mechanism to avoid recalculating some of the steps. Among the options common to functions is the ''recache'' parameter, which, when set to 1, triggers recalculation of the function value.

===== src/* Matlab scripts =====

==== Protein Classification ====

[ cs,feated ] = classify_proteins ( ptns, feats,scores,borderDs, feats_,scores_,borderDs_, prior,varargin )
Binary-classifies proteins using representative feature sets of families.

==== Data handling functions ====

[ pdbs,families,cacheOpts ] = data_astral_family ( varargin )
Gets the list of pdbs in a given SCOP family, using a representative ASTRAL file.

[ pdbs,families,cacheOpts ] = data_culled_family ( varargin )
Gets the list of pdbs in a given SCOP family, selected from the similarity-reduced culled-pdb set.

[ pdbs ] = data_culled_parse ( varargin )
Reads in the list of pdb/chain identifiers from a culledPDB list.

[ pdbs ] = data_pdbs_cleanup ( varargin )
Filters pdbs using the desired criteria. Filtering can be based on the length of the chain or on whether broken backbones exist.

data_prepare_batch ( varargin )
Automates preprocessing of a list of pdbs. Resuming/continuing using an external file is possible.
[ ptns,families ] = data_prepare_experiment ( varargin )
Prepares an experiment dataset. Reads in an external file of family pdbs and loads families and proteins.

[ s ] = data_rand_family ( varargin )
Prints a random permutation of family members, which can be pasted into the families.txt file for use in experiments.

[ pdbs,families,cacheOpts ] = data_scop_family ( varargin )
Gets the list of pdbs in a given SCOP family.

[ families ] = data_scop_findFamily ( pdb )
Finds the SCOP family of a given pdb.

==== 3D display functions ====

signatures_show ( p,feats,scores,varargin )
Shows the mapping of features onto a protein.

draw_backbone ( coords, varargin )
Draws the protein backbone in 3D. ''coords'' can be a pdb file name, a protein struct, or a list of coordinates.

[ handles ] = draw_centers ( centers,varargin )
Draws critical points. Use this together with draw_backbone.

[ X, Y, Z ] = draw_cylinder ( R, N,r1,r2 )
Draws an N-sided cylinder based on the generator curve in the vector R.

==== Experimentation batch scripts ====

[ stats ] = experiment_G ( ptns, family_inds, varargin )
Runs experiments for varying family/background size.

[ thestats ] = experiment_Gs ( ptns, families, fams, nonfams, varargin )
Runs experiments for varying family/background size.

[ feateds ] = experiment_classification ( ptns, family_inds, varargin )
Runs experiments for classification and collects the results.

==== Family-specific functions ====

[ triads ] = family_triad_find ( p,varargin )
Finds the (approximate) location of the catalytic triad.

[ minmax ] = features_getMinMax ( ptns, varargin )
Finds the min and max of the features, which can later be used for normalization.

==== Environment setup ====

globals_aksu ( )
Sets up the global paths and variables.

globals_garip ( )
Sets up the global paths and variables.

globals_set ( )
Sets up the global paths and variables. Determines which machine we are on and calls the appropriate globals_* configuration file.
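
The machine-dependent dispatch described for globals_set could look roughly like the sketch below. This is not the LFMPro source: the host-to-configuration table is hypothetical, and it assumes that ''aksu'' and ''garip'' are machine names and that the matching globals_* files are on the Matlab path.

```matlab
% Hypothetical sketch of hostname-based dispatch; not the actual globals_set code.
[status, host] = system('hostname');      % ask the operating system for the machine name
host  = lower(strtrim(host));             % drop the trailing newline, normalize case
short = strtok(host, '.');                % keep only the part before the first dot

% Hypothetical table mapping machine names to configuration functions.
known = struct('aksu', 'globals_aksu', 'garip', 'globals_garip');

if status == 0 && isfield(known, short)
    configName = known.(short);
else
    configName = '';                      % unknown machine: no configuration selected
end

if ~isempty(configName) && exist(configName, 'file') == 2
    feval(configName);                    % run the machine-specific configuration file
else
    fprintf('No globals_* configuration found for host "%s".\n', host);
end
```

Keeping the table in one place means that supporting a new machine only requires one more field and one more globals_* file.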
==== Geometric utility functions ====

[ bins,binSize ] = lib_3D_hashing ( coords, varargin )
Uniform hashing of the coordinates, which can be used to speed up nearest-neighbor or range calculations.

[ ret ] = lib_points_within_spheres ( centers, atomCoords, varargin )
Finds the points that lie within the spheres around the given centers.

[ center, r ] = lib_tetra_center ( tetra )
Finds the center of the sphere that passes through the vertices of a tetrahedron.

[ r,rind,nextind,firstDist ] = matches_rank_triad ( feats,p,varargin )
Finds the rank of the triad among the features.

[ sum_writhe ] = writhe_edges ( edgeList, coor, exact )
Calculates the writhe of the given list of edges.
synopsis:
  moleculeCoor=load('../data/1aus.mol')
  writhe=sum_writhe_volume([1:438;2:439]',moleculeCoor(:,2:4))
  exact=[1|0] [exact|volume]

==== Mining for significant sites ====

[ feats,scores,border_ds,varargout ] = mine_Sites ( ptns, family_inds, varargin )
Mines the representative feature set, ordered by discriminative scores.

==== pdb reading and features processing ====

[ PDB_struct ] = my_pdbread ( pdbfile )
Reads in a pdb file.

[ obje ] = pdbname_parse ( pdb )
Parses a pdb name into pdb, chain, and range parts.

[ pdb ] = pdbname_unparse ( obje )
Combines pdb, chain, and range parts into a pdb name.

[ ptn,gotwhat ] = ptn_get ( ptn,varargin )
An interface to read/extract/calculate various properties (e.g., atoms, CA-atoms) of a protein. Keeps track of caching, so properties are not recalculated each time.

[ centerOfMass ] = ptn_get_atomCenterOfMass ( atoms,atomsWithinSpheres,atomCount,centers,varargin )
Subfunction of ptn_get. Not intended for direct call. Calculates the individual centers of mass within the spheres, for each atom type.
Note on code synchronization: this function is similar to ptn_get_atomCount.

[ atomCounts ] = ptn_get_atomCount ( atomsWithinSpheres,atoms,varargin )
Subfunction of ptn_get. Not intended for direct call. Calculates the individual atom counts within the spheres, for each atom type.
Note on code synchronization: this function is similar to ptn_get_atomCenterOfMass.

[ cps ] = ptn_get_criticalPoints ( ptn,varargin )
Subfunction of ptn_get. Not intended for direct call. Wrapper for the criticalPoint executable. Finds the critical points in pairs.

[ f ] = ptn_get_features ( p,type )
Subfunction of ptn_get. Not intended for direct call. Combines various features into a vector.

[ features ] = ptn_get_features_normalized ( features,which )
Subfunction of ptn_get. Not intended for direct call. Normalizes features to the range [0, 1].

[ numPieces ] = ptn_get_numPieces ( cas )
Subfunction of ptn_get. Not intended for direct call. Calculates the number of backbone pieces that make up the center.

[ ptn ] = ptn_get_protein ( ptn,varargin )
Subfunction of ptn_get. Not intended for direct call. Reads a protein from pdb, takes care of chain selection, and examines broken chains (missing residues).

[ ptn ] = ptn_get_tetra ( ptn,varargin )
Subfunction of ptn_get. Not intended for direct call. Wrapper for the AD-tetrahedra executable. Finds the AD-tetrahedra for a given set of points and thresholds.

[ writhe ] = ptn_get_writhe ( cas, caCoords )
Subfunction of ptn_get. Not intended for direct call. Calculates the writhing number for centers.

[ ret ] = ptn_get_x ( ptn,which )
Helper function to unify access to a protein's properties.

==== Weight setup and optimization ====

[ bestWeights,bestWeightsHistory ] = weights_optimize ( ptns,families,family_inds,varargin )
Optimizes the weights of the Euclidean distance using a simulated annealing approach.

[ weights,varargout ] = weights_get ( varargin )
Gets the weights to be used in the Euclidean distance.

[ W ] = weights_set ( o_weights,FEAT_SIZE )
Sets the weights used in the Euclidean distance.

[ feats ] = weights_use ( W, feats )
Multiplies feats with the weights for use in the distance function.

==== Utility functions ====

[ varargout ] = myFuncs ( func, varargin )
Provides a place to store various temporary functions.
These are small scripts that don't quite make it into their own function file.

===== myLibrary/* Matlab scripts =====

==== Caching mechanism ====

[ opts ] = myCache_getOptions ( varargin )
Not intended for direct use. Is called from myCache_load/save.

[ recache, cachevar, varargout ] = myCache_load ( varargin )
synopsis:
  myCache_load('identifier', struct('extension','branch'))

myCache_save ( variable, varargin )

[ ret ] = myCache_test ( varargin )

==== Optimized geometric operations ====

[ ret ] = myEuclidean ( X, Y )

[ mins, inds ] = myEuclidean_closest ( X, Y )

[ hist ] = myHistogram ( V,varargin )

==== Setting up Function arguments and options ====

[ varargin ] = myFix_varargin ( varargin )
A varargin passed down through function calls gets wrapped in another layer of {} cells with each call. This function fixes the problem by extracting the innermost cell content.

[ defaults ] = myOptions_set ( defaults, varargin )
synopsis:
  opts = myOptions_set(struct('name','ahmet', 'surname','sacan'),varargin);
  vargin=overrides

myOptions_test ( varargin )

==== Other utility functions ====

[ fid ] = myFile_open ( filename,mode )

[ s ] = myImplode ( d, a )

[ str ] = myTime_pretty ( secs )
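
The 'option','value' convention described under "How to get help about function options", which myOptions_set appears to serve, can be illustrated with a minimal sketch. This is an assumption about the general technique, not the LFMPro implementation: the variable ''args'' stands in for a caller's varargin, and the option names are borrowed from the draw_backbone listing purely for illustration.

```matlab
% Hypothetical sketch of name/value option handling; not the LFMPro source.
% Defaults, with names borrowed from the draw_backbone 'help' listing.
defaults = struct('LineWidth', 1, 'useSphere', 0, 'sphereRadius', 0.4);

% Stand-in for the varargin a caller would pass after the required arguments.
args = {'useSphere', 1, 'sphereRadius', 0.8};

opts = defaults;
if any(cellfun(@(a) ischar(a) && strcmp(a, 'help'), args))
    % 'help' was passed: list the defaults instead of applying overrides.
    disp('default options are:');
    disp(defaults);
else
    % Walk the list two items at a time: option name, then its value.
    for k = 1:2:numel(args) - 1
        name = args{k};
        if ~isfield(opts, name)
            error('Unknown option: %s', name);
        end
        opts.(name) = args{k + 1};   % the caller's value overrides the default
    end
end
```

Options that the caller does not mention keep their default values, so here ''opts.LineWidth'' stays at 1 while ''opts.useSphere'' and ''opts.sphereRadius'' take the caller's values.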