Title: Exploring techniques for optimal feature and classifier selection for protein modeling, function, and fold recognition
Author: Saini, Harsh
Institution: University of the South Pacific.
Subject: Proteins -- Mathematical models, Proteins -- Structure
Call No.: Pac QP 551 .S25 2015
Copyright: Over 80% of this thesis may be copied without the author's written permission
Abstract: Identification of the tertiary (three-dimensional) structure of a protein is a fundamental problem in biology, since the structure helps in identifying the protein’s functions. Predicting a protein’s structural class and fold type is considered an intermediate step toward identifying its tertiary structure. Computational methods have been applied to this problem by assembling information from a protein’s structural, physicochemical, and/or evolutionary properties. In this study, various schemes are discussed for improving protein structural class and fold recognition. A feature extraction technique is explored that extracts, from the Position Specific Scoring Matrix, probabilistic expressions of amino acid dimers that have varying degrees of spatial separation in the primary sequences of proteins. The explored techniques have been evaluated using benchmark datasets. In addition to identifying the tertiary structure of proteins, protein subcellular localization is an important topic in proteomics, since it is related to a protein’s overall function, helps in understanding metabolic pathways, and aids drug design and discovery. This study also explores the applicability of a basic approximation technique called linear interpolation smoothing for predicting protein subcellular localizations. The proposed approach extracts features from syntactical information in protein sequences to build probabilistic profiles using dependency models, which are used in linear interpolation to determine the likelihood that a sequence belongs to a particular subcellular location.
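
Illustrative note: the abstract does not specify the dependency models or interpolation weights used in the thesis, so the following is only a minimal sketch of the general idea of linear interpolation smoothing, under stated assumptions. It assumes a first-order (adjacent-residue) dependency model mixed with a residue-frequency model through a single fixed weight `lam`. The function names, the weight value, the location labels, and the toy training strings are all hypothetical and are not taken from the thesis or from real protein data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def train_profile(sequences):
    """Build a probabilistic profile (counts) for one subcellular location."""
    unigrams, contexts, bigrams = Counter(), Counter(), Counter()
    for seq in sequences:
        unigrams.update(seq)               # single-residue counts
        contexts.update(seq[:-1])          # residues that have a successor
        bigrams.update(zip(seq, seq[1:]))  # adjacent residue pairs
    return unigrams, contexts, bigrams


def interpolated_log_likelihood(seq, profile, lam=0.7):
    """Score seq with lam * P(cur | prev) + (1 - lam) * P(cur).

    lam is an assumed fixed weight; in practice it would be tuned.
    """
    unigrams, contexts, bigrams = profile
    total = sum(unigrams.values())
    log_p = 0.0
    for prev, cur in zip(seq, seq[1:]):
        # Add-one smoothing keeps the unigram term strictly positive.
        p_uni = (unigrams[cur] + 1) / (total + len(AMINO_ACIDS))
        p_bi = bigrams[(prev, cur)] / contexts[prev] if contexts[prev] else 0.0
        log_p += math.log(lam * p_bi + (1 - lam) * p_uni)
    return log_p


def predict_location(seq, profiles, lam=0.7):
    """Return the location whose profile gives seq the highest likelihood."""
    return max(
        profiles,
        key=lambda loc: interpolated_log_likelihood(seq, profiles[loc], lam),
    )


# Toy, made-up training strings (not real proteins), for illustration only.
profiles = {
    "nucleus": train_profile(["KRKRKKRRK", "RKKRKRKKR"]),
    "membrane": train_profile(["LLVVALLIVL", "VLLAVLILLV"]),
}
print(predict_location("KKRKR", profiles))
```

The interpolation means that a residue pair never seen in training for a location still receives a non-zero probability from the lower-order model, so one unseen pair does not rule out that location.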