Dissertations / Theses on the topic 'Subcellular localization prediction'

Author: Grafiati

Published: 4 June 2021

Last updated: 15 February 2022

Consult the top 20 dissertations / theses for your research on the topic 'Subcellular localization prediction.'

1

Ozarar, Mert. "Prediction Of Protein Subcellular Localization Based On Primary Sequence Data." Master's thesis, METU, 2003. http://etd.lib.metu.edu.tr/upload/1082320/index.pdf.

Abstract:

Subcellular localization is crucial for determining the functions of proteins. A system called prediction of protein subcellular localization (P2SL) is designed that predicts the subcellular localization of proteins in eukaryotic organisms based on the amino acid content of primary sequences, using amino acid order. The approach is to find the most frequent motifs for each protein in a given class by clustering with self-organizing maps, and then to use these most frequent motifs as features for classification with multilayer perceptrons. This approach allows classification independent of the length of the sequence. In addition, a new encoding scheme for the amino acids is described that conserves biological function, based on the point accepted mutation (PAM) substitution matrix. The statistical test results of the system are presented on a four-class problem. P2SL achieves slightly higher prediction accuracy than similar studies.

2

Bozkurt, Burcin. "Prediction Of Protein Subcellular Localization Using Global Protein Sequence Feature." Master's thesis, METU, 2003. http://etd.lib.metu.edu.tr/upload/3/1135292/index.pdf.

Abstract:

The problem of identifying genes in eukaryotic genomic sequences by computational methods has attracted considerable research attention in recent years. Many early approaches to the problem focused on prediction of individual functional elements and compositional properties of coding and non-coding deoxyribonucleic acid (DNA) in entire eukaryotic gene structures. More recently, a number of approaches have been developed which integrate multiple types of information, including the structure, function and genetic properties of proteins. Knowledge of the structure of a protein is essential for describing and understanding its function. In addition, the subcellular localization of a protein can be used to provide some characterization of the protein. In this study, a method for the prediction of protein subcellular localization based on primary sequence data is described. Primary sequence data for a protein is based on its amino acid sequence. The frequency value for each amino acid is computed at each given position. The assigned frequencies are used in a new encoding scheme that conserves biological information, based on the point accepted mutation (PAM) substitution matrix. This method can be used to predict nuclear sequences, cytosolic sequences, mitochondrial targeting peptides (mTP) and signal peptides (SP). For clustering purposes, other than well-known traditional techniques, principal component analysis (PCA) and self-organizing maps (SOM) are used. For classification purposes, support vector machines (SVM), a method of statistical learning theory recently introduced to bioinformatics, are used. The aim of combining feature extraction, clustering and classification methods is to design an accurate system that predicts the subcellular localization of the proteins presented to it. Our scheme for combining several methods is, in terms of architecture, a cascading or serial combination. In the cascading architecture, the output of one method serves as the input of the next.

3

Scott, Michelle. "Protein subcellular localization : analysis and prediction using the endoplasmic reticulum as a model organelle." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=102170.

Abstract:

Eukaryotic cells are divided into subcellular organelles that generate appropriate molecular environments for the functions they harbour. As such, subcellular localization is a key characteristic that provides valuable clues regarding protein function and, when studied globally, a better understanding of cellular processes. The organelles of the secretory pathway are responsible for the processing of all proteins destined for secretion, the plasma membrane as well as their own resident proteins. This group of organelles is difficult to study experimentally because they are difficult to purify to homogeneity.
To facilitate the investigation of the endoplasmic reticulum (ER) and more generally, the secretory pathway, we have created Hera, a publicly accessible protein localization database. Originally designed to house characteristics of ER proteins, it currently contains tens of thousands of proteins from different organisms and subcellular compartments. Hera was originally used to investigate features of ER proteins, providing insight into the extent of usage of various localization mechanisms, including both well-studied but also non-classical and novel mechanisms.
Hera was subsequently used to create Bayesian network type localization predictors. By considering the combinatorial presence of motifs, domains, targeting signals and using in some cases, protein interaction information, our predictors achieve high accuracy and coverage. When our predictions are compared with localization annotations from high-throughput studies in both human and yeast, we find that disagreements mainly involve proteins in the secretory pathway. Our predictors can be used to independently validate these large-scale studies. We further refined the localization prediction of the whole yeast proteome by distinguishing proteins localized to the lumen or membrane of various organelles from cytosolic proteins peripherally associated with these organelles.
Hera was also used to investigate efficient and informative approaches to interrogate interaction networks in order to gain insight into the relationship between proteins/genes of interest. By combining interaction and refined localization information, we constructed localizome-interactome networks of whole organelles. Such models provide insight into global organellar characteristics and inter-organellar mechanisms of communication.
The research presented in this thesis demonstrates that integrating widely available information such as localization and interaction data in an appropriate framework, such as Bayesian networks, makes it possible to gain deep insights into cellular processes.

4

Zhu, Lu [author]. "Context-specific subcellular localization prediction: Leveraging protein interaction networks and scientific texts / Lu Zhu." Bielefeld : Universitätsbibliothek Bielefeld, 2018. http://d-nb.info/1169314589/34.

5

Fagerberg, Linn. "Mapping the human proteome using bioinformatic methods." Doctoral thesis, KTH, Proteomik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-31477.

Abstract:

The fundamental goal of proteomics is to gain an understanding of the expression and function of the proteome on the level of individual proteins, on the level of defined cell types and on the level of the entire organism. In this thesis, the human proteome is explored using membrane protein topology prediction methods to define the human membrane proteome and by global protein expression profiling, which relies on a complex study of the location and expression levels of proteins in tissues and cells. A whole-proteome analysis was performed based on the predicted protein-coding genes of humans using a selection of membrane protein topology prediction methods. The study used a majority decision-based method, which estimated that approximately 26% of the human genes encode for a membrane protein. The prediction results are displayed in a visualization tool to facilitate the selection of antigens to be used for antibody generation. Global protein expression profiles in a large number of cells and tissues in the human body were analyzed for more than 4000 protein targets, based on data from the antibody-based immunohistochemistry and immunofluorescence methods within the framework of the Human Protein Atlas project. The results revealed few cell-type specific proteins and a high fraction of human proteins expressed in most cells, suggesting that cell and tissue specificity is attained by a fine-tuned regulation of protein levels. The expression profiles were also used to analyze the relationship between 45 cell lines by hierarchical clustering and principal component analysis. The global protein expression patterns overall reflected the tumor origin of the cells, and also allowed for identification of proteins of importance for distinguishing different categories of cell lines, as defined by phenotype of progenitor cell. In addition, the protein distribution in 16 subcellular compartments in three of the human cell lines was mapped. 
A large fraction of proteins were localized in two or more compartments and, in line with previous results, a majority of proteins were detected in all three cell lines. Finally, mass spectrometry-based protein expression levels were compared to RNA-seq-based transcript expression levels in three cell lines. Highly ubiquitous mRNA expression was found and the changes of expression levels between the cell lines showed high correlations between proteins and transcripts. Large general differences in abundance of proteins from various functional classes were observed. A comparison between categories based on expression levels revealed that, in general, genes with varying expression levels between the cell lines or only expressed in one cell line were highly enriched for cell-surface proteins. These studies show a path for a systematic analysis to characterize the proteome in human cells, tissues and organs.
The Human Protein Atlas project

6

Yu, Chin-Sheng, and 游景盛. "Prediction of Protein Subcellular Localization." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/28216444510128135886.

Abstract:

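Doctoral dissertation
National Chiao Tung University
Department of Biological Science and Technology
Academic year 95 (ROC calendar)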
Since a protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences is useful to biologists for inferring protein function. In recent years we have seen a surging interest in the development of novel computational tools to predict subcellular localization. With the rapid increase of sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization becomes increasingly important. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. In this thesis, I used a support vector machine (SVM) method based on n–peptide composition to predict the subcellular locations of proteins. For an unbiased assessment of the results, we first apply our approach to several independent data sets. On those data sets, our approach gives superior performance compared with other approaches. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Rost and Nair (Protein Sci, 11:2836-47 (2002)) have carried out extensive analysis of the relation between sequence similarity and identity in subcellular localization and found a close relationship between them above a certain similarity threshold. However, many existing benchmark data sets used for the prediction accuracy assessment contain highly homologous sequences – some data sets comprising sequences up to 80-90% sequence identity. Using these benchmark test data will surely lead to overestimation of the performance of the methods considered.
Here, we developed an approach based on a two-level SVM system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vector derived from sequences; the second-level SVM classifier functions as the jury machine to generate the probability distribution of decisions for possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches on two often-used benchmark data sets – one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all-against-all sequence alignment for several data sets to check the relationship between sequence homology and localization. Our results, which are consistent with previous studies, indicate that the homology search approach performs surprisingly well for sequences sharing homology as low as 30%, but its performance deteriorates considerably for sequences sharing lower sequence identity. A data set with high homology levels will obviously lead to biased assessment of the performance of predictive approaches, especially those relying on homology search or sequence annotations. Since our two-level classification system based on SVM does not rely on homology search, its performance remains relatively unaffected by sequence homology. In comparison, our approach outperformed other existing approaches, even though some of them use homology search as part of their algorithms. Furthermore, for practical purposes, we also develop a hybrid method that pipelines the two-level SVM classifier and the homology search method in sequential order as a general tool for the sequence annotation of subcellular localization. Our approaches should be valuable in the high-throughput analysis of genomics and proteomics.
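
As a rough illustration of the n-peptide composition features this abstract refers to, the sketch below computes the relative frequency of every n-residue window in a sequence. The function name, the 20-letter alphabet constant, the treatment of non-standard residue codes and the toy sequence are all assumptions made for illustration; the thesis's actual feature definitions and SVM configuration are not described here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acid one-letter codes (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_peptide_composition(sequence, n=2):
    """Return the relative frequency of every possible n-peptide in `sequence`.

    The result has 20**n entries, so it has the same length for any protein
    and can serve as a fixed-size feature vector for a classifier.
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    # Assumption: windows containing non-standard codes (X, B, Z, ...) are skipped.
    windows = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(windows)
    total = len(windows)
    keys = ("".join(p) for p in product(AMINO_ACIDS, repeat=n))
    return {k: (counts[k] / total if total else 0.0) for k in keys}


# Toy sequence, not taken from any data set in the thesis.
features = n_peptide_composition("MKKLLA", n=2)
```

With n = 2 this yields 400 dipeptide frequencies per protein; larger n grows the vector as 20^n, which is one reason composition-based methods usually keep n small.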

7

Syu, Shiao-shan, and 徐筱姍. "Human Protein Subcellular Localization Prediction." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/96176482574886082780.

Abstract:

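Master's thesis
Feng Chia University
Master's Program in Biomedical Informatics and Biomedical Engineering
Academic year 99 (ROC calendar)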
The biological function of a protein in a cell is often closely correlated with its subcellular localization. Hence, information about where a protein is localized often offers important clues to the function of an uncharacterized sequence. Protein subcellular localization can be used as an important feature in screening for drug candidates, vaccine design, and gene product annotation. Here, we applied the support vector machine algorithm to a benchmark dataset of human protein sequences based on n-peptide composition. In the first step of this method, we represent the protein sequences with different features and use SVMs to predict subcellular localization. In the second step, we use the results of the first step to predict again with a support vector machine classifier. We use the PSLT training data set from Hera; this data set includes 2233 human protein sequences and 9 subcellular localizations. Our method achieves an overall classification accuracy of 80%, as estimated by a 10-fold cross-validation test, with coverage of 74%. For the remaining 26%, our method achieves an overall classification accuracy of 45%. This research should provide an important tool for human genomics and proteomics studies.

8

Chen, Shu-Pin, and 陳書品. "Prediction of eukaryotic protein subcellular localization." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/10932435428409959975.

Abstract:

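Master's thesis
National Central University
Institute of Computer Science and Information Engineering
Academic year 96 (ROC calendar)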
Prediction of the subcellular localization of proteins is an important and well-studied problem. Each compartment in a cell has specific tasks, and the proteins in each compartment are synthesized to fulfill these tasks. Proteins localized in the same compartment are thought to have the same or similar functions. Knowledge of the subcellular localization of a protein can significantly improve target identification during the drug discovery process. Currently available methods extract information from the amino acid sequence or signal peptide and lack further biological features such as post-translational modification. We develop an integrated system that lets biologists determine where eukaryotic proteins are localized. The system is based on protein sequence composition, signal peptides, protein domains from Pfam and homology search.

9

Chen, Shih-Hao, and 陳世豪. "Subcellular Localization Prediction of Eukaryotic Protein." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/80756466635576715069.

Abstract:

Master's thesis
Taichung Healthcare and Management University
Institute of Bioinformatics
Academic year 92 (ROC calendar)
Biologically, the function of a protein is highly related to its subcellular localization. Accordingly, it is necessary to develop an automatic yet reliable method for protein subcellular localization prediction, especially when large-scale genome sequences are to be analyzed. Various methods have been proposed to perform the task. The results, however, are not satisfactory in terms of effectiveness and efficiency. In this paper, the proposed Bayesian inference method and information gain are used to identify important information. Moreover, nearest neighbor classification is considerably effective for subcellular localization prediction in a supervised fashion.

10

Chen, Yu-Tzu, and 陳佑慈. "Protein-protein interaction prediction enhancement using subcellular localization." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/81806002826018277394.

Abstract:

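Master's thesis
National Central University
Institute of Computer Science and Information Engineering
Academic year 98 (ROC calendar)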
Protein–protein interactions are important for almost every process in a living cell. Abnormal interactions may have implications in a number of neurological syndromes. Therefore, it is crucial to recognize the association and dissociation of protein molecules. Currently available computational methods for predicting protein–protein interactions extract information from the amino acid sequence or signal peptide. Few methods consider subcellular localization information. The method presented in this paper is based on the assumption that two proteins must share the same subcellular localization in order to interact. We develop an integrated system, based on the support vector machine learning algorithm, to predict protein–protein interactions. We construct trained models for different subcellular localizations. Each test protein pair is assigned to one trained model for prediction according to its localization. The method takes protein sequence composition, protein domains and subcellular localization information as features. The prediction ability of our method is better than that of other sequence-based protein–protein interaction prediction methods. In addition, more complete data on protein–protein interactions and subcellular localizations can enhance the prediction ability of the method.

11

Su, Chia-Yu, and 蘇家玉. "Prediction of Subcellular Localization and RNA-binding Sites in Proteins." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/43820761805319359116.

Abstract:

Doctoral dissertation
National Chiao Tung University
Institute of Bioinformatics
Academic year 97 (ROC calendar)
Automated function annotation is a major goal of the post-genomic era, given the tremendous number of protein sequences in the databases. Prediction of subcellular localization or binding sites in proteins is crucial for function analysis, genome annotation, and drug discovery. Determination of localization or structure using experimental approaches is time-consuming; thus, computational approaches become highly desirable. We propose two protein subcellular localization prediction methods, PSL101 and PSLDoc. PSL101 combines a structural homology approach and a support vector machine model, in which compartment-specific biological features derived from bacterial translocation pathways are incorporated. PSLDoc uses probabilistic latent semantic analysis on gapped dipeptides of various distances, where evolutionary information from the position-specific scoring matrix (PSSM) is utilized. Our methods achieve 93% overall accuracy for Gram-negative bacteria, and compare favorably to the state-of-the-art results, by 7.4%, on a benchmark dataset having low homology to the training set. Experimental results demonstrate that both biological features derived from translocation pathways and feature reduction by document classification techniques can lead to a significant improvement in prediction performance. Moreover, the proposed biological features and gapped-dipeptide signatures are interpretable and can be applied in advanced studies and experiment designs. For RNA-binding site prediction, we propose another method, RNAProB, which incorporates a new smoothed PSSM encoding scheme in a support vector machine model. The proposed smoothed PSSM encoding considers correlation and dependency from neighboring residues for each amino acid in a protein sequence. Experimental results show that smoothed PSSM encoding significantly enhances prediction performance, especially sensitivity.
Our method performs better than the state-of-the-art systems by 4.90%~6.83%, 7.05%~26.90%, 0.88%~5.33%, and 0.10~0.23 in terms of overall accuracy, sensitivity, specificity, and Matthews correlation coefficient, respectively. This also supports our assumption that smoothed PSSM encoding can better resolve the ambiguity in discriminating between interacting and non-interacting residues by modeling the dependency from surrounding residues. Because of the generality of the proposed methods, they can be extended to other research topics in the future. Moreover, the information from the predicted localization and structure of proteins can be used collectively to assist biologists in both inferring protein function and finding suitable drug targets. Therefore, we believe that our work can contribute to scientific discoveries on a high-throughput basis.
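
For readers unfamiliar with the four measures compared in this abstract, the following minimal sketch computes them from the counts of a binary confusion matrix (for example, binding versus non-binding residues). The function name and the example counts are hypothetical and are not taken from the thesis; the formulas themselves are the standard definitions.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient.

    tp/fp/tn/fn are the true-positive, false-positive, true-negative and
    false-negative counts of a two-class prediction task.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

MCC ranges from -1 to 1 and stays informative when the two classes are imbalanced, which is why it is often reported alongside accuracy for residue-level predictions.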

12

Gaston, Daniel. "PHYLOGENOMIC APPROACHES TO THE ANALYSIS OF FUNCTIONAL DIVERGENCE AND SUBCELLULAR LOCALIZATION." 2012. http://hdl.handle.net/10222/14439.

Abstract:

With rapid advances in sequencing technologies and precipitous decreases in cost, public sequence databases have increased in size apace. However, experimental characterization of novel genes and their products remains prohibitively expensive and time consuming. For these reasons, bioinformatics approaches have become increasingly necessary to generate hypotheses of biological function. Phylogenomic approaches use phylogenetic methods to place genes, chromosomes, or whole genomes within the context of their evolutionary history and can be used to predict the function of encoded proteins. In this thesis, two new phylogenomic methods and software implementations are presented that address the problems of subcellular localization prediction and functional divergence prediction within protein families respectively. Most of the widely used programs for subcellular localization prediction have been trained on model organisms and ignore phylogenetic information. As a result, their predictions are not always reliable when applied to phylogenetically divergent eukaryotes, such as unicellular protists. To address this problem, PhyloPred-HMM, a novel phylogenomic method was developed to predict sequences that are targeted to mitochondria or mitochondrion-related organelles (hydrogenosomes and mitosomes). This method was compared to existing prediction methods using an existing test dataset of mitochondrion-targeted sequences from well-studied groups, sequences from a variety of protists, and the whole proteomes of two protists: Tetrahymena thermophila and Trichomonas vaginalis. PhyloPred-HMM performed comparably to existing classifiers on mitochondrial sequences from well-studied groups such as animals, plants, and Fungi and better than existing classifiers on diverse protistan lineages. FunDi, a novel approach to the prediction of functional divergence was developed and tested on 11 biological datasets and two large simulated datasets. 
On the 11 biological datasets, FunDi appeared to perform comparably to existing programs, although performance measures were compromised by a lack of experimental information. On the simulated datasets, FunDi was clearly superior to existing methods. FunDi, and two other prediction programs, was then used to characterize the functional divergence in two groups of plastid-targeted glyceraldehyde-3-phosphate dehydrogenases (GAPDH) adapted to roles in the Calvin cycle. FunDi successfully identified functionally divergent residues supported by experimental data, and identified cases of potential convergent evolution between the two groups of GAPDH sequences.

13

Yen, Shou-Cheng, and 嚴守正. "Towards Improving Accuracy of Subcellular Localization Prediction for Lysosomal, Peroxisomal and ER Proteins." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/22770069206338505923.

Abstract:

Master's thesis
National Yang-Ming University
Institute of Biomedical Informatics
Academic year 99 (ROC calendar)
Within a cell, biological functions are often localized in specific subcellular compartments. Hence, the ability to predict the subcellular localizations of uncharacterized proteins is critical for protein functional annotation. This study describes a novel method for identifying sequence motifs to predict protein subcellular localizations. Existing methods mostly rely on prior knowledge about protein targeting signals and on sophisticated residue compositions that provide obscure insights about cellular functions. Here we propose a systematic approach to identify signature motifs without using prior knowledge. Attention was placed on the localizations that are traditionally more difficult to predict, i.e., the lysosomal, peroxisomal and ER proteins. For proteins within those localizations, we investigated all sequence motifs (length < 8) represented by a reduced amino acid alphabet set. Each motif was then subjected to a statistical test to determine whether it has a distinct occurrence frequency for proteins in the specific localization. The identified sequence motifs were further extended on both ends to increase their length. Three of the motifs have never been reported in the field of localization prediction: (1) the [WFY][AVLI][AVLI]KNS[WFY] motif, a lysosome-specific motif found at the cathepsin protease active site; (2) the RERIPERVVHA motif, exclusive to peroxisomal proteins; and (3) the enriched CGHC motif, present exclusively in ER proteins. The results facilitate our in-house implementation of a more accurate prediction tool for lysosomal, peroxisomal and ER proteins, the three most challenging localizations. We propose a prediction system that uses existing approaches and corrects the misidentified proteins using our novel motifs and existing motifs. The results show that our system predicts ER, lysosomal and peroxisomal proteins more accurately, with MCCs of 0.61, 0.63, and 0.53, respectively.
With extension to proteins located in other subcellular compartments, using a wider range of physicochemical properties, our discovery-oriented approach fills the gaps left by current studies in this field.
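
The bracket notation used for the three motifs in this abstract can be read directly as regular-expression character classes, so scanning a sequence for them takes only a few lines. The sketch below is an illustration under that assumption: the dictionary labels, the function name and the toy sequences are made up, and the thesis's statistical test and reduced-alphabet search are not reproduced. A motif hit on its own should not be read as a localization call.

```python
import re

# Motif strings copied from the abstract; the labels are illustrative.
MOTIFS = {
    "lysosomal": re.compile(r"[WFY][AVLI][AVLI]KNS[WFY]"),
    "peroxisomal": re.compile(r"RERIPERVVHA"),
    "ER": re.compile(r"CGHC"),
}


def find_motifs(sequence):
    """Return {label: [0-based start positions]} for every motif found.

    Labels with no hit are omitted. `finditer` reports non-overlapping
    matches, scanning from left to right.
    """
    sequence = sequence.upper()
    hits = {}
    for label, pattern in MOTIFS.items():
        positions = [m.start() for m in pattern.finditer(sequence)]
        if positions:
            hits[label] = positions
    return hits


# Toy sequences, not real proteins.
er_like = find_motifs("MAWCGHCKA")
lysosome_like = find_motifs("GGWALKNSWGG")
```

In a pipeline like the one the abstract outlines, such hits would be used to revise the output of an existing predictor, not to replace it.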

14

Huang, Wen-Lin, and 黃文玲. "Using Gene Ontology Annotation and Physicochemical Properties for Prediction of Protein Subcellular and Subnuclear Localization." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/09303799954362845008.

Abstract:

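Doctoral dissertation
Feng Chia University
Graduate Institute of Information Engineering
Academic year 96 (ROC calendar)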
Eukaryotic cells consist of several major parts, such as the nucleus, cytoplasm, mitochondrion, extracellular space, and chloroplast. One of the fundamental goals in molecular cell biology and proteomics is to identify the subcellular locations or environments of proteins, because the function of a protein and its role in a cell are closely correlated with the compartment or organelle in which it resides. The knowledge thus obtained can help us make timely use of newly found protein sequences for both basic research and drug discovery. Among the subcellular compartments, the nucleus is a highly complex organelle that forms a package for cells and their corresponding regulatory factors. Therefore, prediction of subcellular and subnuclear localization is a critical problem in biology. Computational prediction methods based on primary protein sequences are fairly economical for identifying many proteins with unknown functions. Accurate prediction methods rely not only on informative features and classifier design but also on feature selection. This dissertation proposes two novel genetic-algorithm-based algorithms, GOmining and ESVM, for subcellular and subnuclear localization prediction. Combined with a support vector machine (SVM), the two algorithms can simultaneously determine the best number m of the n features and identify the m selected features. Using GOmining and ESVM, this dissertation proposes two prediction systems, ProLoc-GO and ProLoc, which mine informative Gene Ontology (GO) terms and physicochemical composition (PCC) features for protein subcellular and subnuclear localization, respectively. To evaluate ProLoc, this study uses two datasets, SNL6 and SNL9, which have 504 proteins localized in six subnuclear compartments and 367 proteins localized in nine subnuclear compartments. ProLoc, utilizing the selected mPCC=33 and 28 PCC features, has accuracies of 56.37% for SNL6 and 72.82% for SNL9, respectively.
As for the ProLoc-GO system, it utilizes GOmining to identify a small number m out of the n GO terms as input features to the SVM, where m << n. The m informative GO terms contain the essential GO terms annotating subcellular compartments, such as GO:0005634 (Nucleus), GO:0005737 (Cytoplasm) and GO:0005856 (Cytoskeleton). Two existing data sets, SCL12 (human proteins with 12 locations) and SCL16 (eukaryotic proteins with 16 locations), with <25% sequence identity are used to evaluate ProLoc-GO, which has been implemented using a single SVM classifier with the m=44 and m=60 informative GO terms, respectively. ProLoc-GO using input sequences yields test accuracies of 88.1% and 83.3% for SCL12 and SCL16, respectively. Since GOmining combined with GO is efficient, an improved prediction system, NuProLoc, which uses GOmining, is proposed for subnuclear localization prediction. NuProLoc yields accuracies of 75.6% and 82.4% for SNL6 and SNL9, respectively, which are significantly better than the 56.37% and 72.82% of ProLoc. The growth of Gene Ontology and physicochemical properties in size and popularity has increased the effectiveness of GO-based and PCC-based features. GOmining and ESVM can serve as tools for selecting informative GO terms and PCC features in solving sequence-based prediction problems.

15

Liao, Jun-Qin, and 廖俊欽. "Protein Subcellular Localization Prediction by Support Vector Machine and Genetic Algorithm based on n-Peptide Compositions." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/4u3ruf.

Full text


16

Li, Wei-Jyun, and 李瑋峻. "Predicting Protein Subcellular Localization Using Integrative System." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/46151991856883970059.

Full text

Abstract:

Master's
National Taiwan Ocean University
Department of Computer Science and Engineering
97
The prediction of protein subcellular localization (PSL) has become a popular field in recent years because it can help protein function prediction and genome annotation, and thus aid drug design. However, experimental methods for analyzing PSL are often expensive and time-consuming. Therefore, the computational prediction of PSL, using information in databases, has become a vibrant field of study. Nevertheless, extracting suitable features from proteins for accurate PSL prediction remains difficult because of the complex structures of proteins. Consequently, to improve prediction performance, several modern PSL prediction systems apply multi-feature protein descriptors and adopt complex hybrid prediction systems to classify and predict PSL. Even though these systems achieve outstanding prediction performance, few of them provide protein characteristics and the bases of classification for further analysis. Therefore, this thesis proposes a PSL prediction system, PSL-PR-CPR (Protein Subcellular Localization PredictoR and Characteristic ProvideR), which aims to provide more protein characteristics for analysis. In the PSL-PR-CPR system, proteins are encoded into feature vectors by a protein descriptor, AAwindow, which uses the Amino Acid Index (AAI) to describe proteins in a simple and easily understood way. To derive a prediction model with high prediction performance, PSL-PR-CPR employs MG-PSO-DS, an evolutionary computation algorithm, to select feature sets that are suitable for the C4.5 classifier. MG-PSO-DS is also applied to optimize C4.5 prediction performance by tuning C4.5 parameters. After constructing the prediction model, PSL-PR-CPR displays the C4.5 decision rules and provides protein features that assist protein analysis.
In addition, PSL-PR-CPR shows the characteristics of important features within the amino acid sequence, exploiting the easily understood nature of AAwindow, to provide more information for analysis. To validate prediction performance, two datasets are used to compare PSL-PR-CPR with the Mycobacterial PSL predictor, Gpos-PLoc, CELLO and LocateP. The two datasets are 852 mycobacterial proteins from the study of the Mycobacterial PSL predictor and 452 Gram-positive bacterial proteins from the study of Gpos-PLoc. Five-fold cross-validation and ten-fold cross-validation are used to validate PSL-PR-CPR performance on the 852 mycobacterial proteins and the 452 Gram-positive bacterial proteins, respectively. PSL-PR-CPR also provides samples of C4.5 decision rules, important features and characteristics within the amino acid sequence.
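
The 5-fold and 10-fold validation described above can be pictured with a short sketch of how sample indices might be partitioned into folds. The abstract does not say how the folds were built, so the contiguous, unshuffled split below and the helper names `k_fold_indices` and `train_test_splits` are assumptions made purely for illustration; a real evaluation would usually shuffle or stratify by location class first.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more sample
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n_samples, k):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    folds = k_fold_indices(n_samples, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Fold sizes for the two dataset sizes quoted in the abstract.
print([len(f) for f in k_fold_indices(852, 5)])
print([len(f) for f in k_fold_indices(452, 10)])
```

Each protein lands in exactly one test fold, so every sample is predicted once by a model that never saw it during training.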


17

Nathan, Michel. "A multiple site predictor for subcellular localization of fungal proteins." Thesis, 2006. http://spectrum.library.concordia.ca/9050/1/MR20780.pdf.

Full text

Abstract:

In this work, we build a system that uses a decision tree to predict fungal protein localization based on physicochemical properties of proteins calculable from their primary sequences. The training examples that serve as the basis for learning are obtained from experimentally validated localizations. Although there is clear evidence that the same protein can be present in more than one sub-cellular compartment, almost all existing automated systems restrict their predictions to single-site localization. Here, we attempt to address this issue: for proteins that are reported to target more than one sub-cellular location, our system predicts as many localization sites as possible. When localizing among 17 sub-cellular compartments, our system successfully predicts at least one of the experimentally reported localizations in 64% of the cases. In addition, our results indicate that all the reported localizations are correctly predicted in 49% of the cases. We also report 76 fungal protein features implicated in localization and indicate those with the highest relative discriminatory power. Finally, we report on necessary conditions for localization to specific sub-cellular sites.
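
The two success rates quoted in this abstract (at least one reported site predicted, and all reported sites predicted) can be computed as in the sketch below. It assumes that "all reported localizations are correctly predicted" means the reported set is contained in the predicted set; the function name `multi_site_rates` and the toy data are hypothetical and are not taken from the thesis.

```python
def multi_site_rates(true_sites, predicted_sites):
    """Return (at_least_one_rate, all_reported_rate) over a list of proteins.

    true_sites and predicted_sites are parallel lists of sets of compartment names.
    """
    n = len(true_sites)
    pairs = list(zip(true_sites, predicted_sites))
    # A protein counts if any reported site appears among the predictions.
    at_least_one = sum(1 for t, p in pairs if set(t) & set(p))
    # A protein counts if every reported site appears among the predictions (assumed reading).
    all_reported = sum(1 for t, p in pairs if set(t) <= set(p))
    return at_least_one / n, all_reported / n


# Toy data, invented for illustration only.
true_sites = [{"nucleus", "cytoplasm"}, {"mitochondrion"}, {"vacuole"}, {"nucleus"}]
predicted_sites = [{"nucleus"}, {"mitochondrion"}, {"cytoplasm"}, {"nucleus", "cytoplasm"}]
print(multi_site_rates(true_sites, predicted_sites))
```

Under a stricter reading, where the predicted set must equal the reported set exactly, the second comparison would use `==` instead of `<=`.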


18

LIN, TSAI-YU, and 林采妤. "Improvement of Predicting Human Protein Subcellular Localization Through Integrated Machine Learning Methods." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/5949gw.

Full text

Abstract:

Master's
Feng Chia University
Department of Information Engineering and Computer Science
106
The prediction of protein subcellular locations has been an important topic in computational biology research over the past decade. Knowing a protein's subcellular localization helps in understanding its function as well as protein-protein interactions. However, relying on experimental methods to identify the subcellular locations of proteins is often laborious and expensive, so for large-scale protein datasets with unknown locations it is highly desirable to use more efficient computational prediction tools. So far, many methods have been proposed to predict the locations of large-scale protein datasets, and statistical machine learning methods have been widely used in model construction. The key step in these predictions is to encode the amino acid sequence as a feature vector. In this paper, we use protein sequences to calculate various n-peptide amino acid compositions, and then model these composition features using a machine learning approach, the Support Vector Machine (SVM) [1], combined with a genetic algorithm (GA), which is used to select the features. Finally, the prediction results are evaluated by recall, precision and F1 and compared with previous methods. The results show that our method achieves an overall F1 value of 64%. Although our method is simpler, its results are comparable to or better than those of other, more complex methods.
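
As an illustration of encoding a sequence as an n-peptide composition vector, the sketch below counts overlapping windows of length n over the 20 standard amino acids and normalizes the counts to frequencies. The overlapping-window convention, the skipping of windows with non-standard residues, and the function name `n_peptide_composition` are assumptions; the thesis may define its features differently.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def n_peptide_composition(sequence, n):
    """Return a vector of 20**n relative frequencies of overlapping n-peptides.

    Windows containing characters outside AMINO_ACIDS are skipped (assumed convention).
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        if window in counts:
            counts[window] += 1
            total += 1
    # Avoid division by zero for sequences shorter than n.
    return [counts[k] / total if total else 0.0 for k in keys]


# A toy sequence, not a real protein.
dipeptides = n_peptide_composition("ACAC", 2)
print(len(dipeptides))
```

The vector length grows as 20 to the power n (20, 400, 8000 for n = 1, 2, 3), which is one reason a feature selection step such as a genetic algorithm becomes attractive.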


19

Sun, Han-Hao, and 孫翰豪. "REALoc: Reliable and effective methods to assist predicting human protein subcellular localization." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/76013331557236563304.

Full text

Abstract:

Master's
National Chung Hsing University
Institute of Genomics and Bioinformatics
101
Protein subcellular localization is an important part of biological research that can support drug development and help explore the function of proteins. Many subcellular localization prediction tools have been developed, but most of them use eukaryotic or prokaryotic data for model training, and predictors dedicated to human proteins are rare. We established a system called REALoc to predict the subcellular localization of human proteins in both singleplex and multiplex cases. It is based on a two-layer architecture that integrates two different machine learning methods, one-to-one and many-to-many. The system also includes many sequence-based and function-based features, such as amino acid composition and surface accessibility. In addition, we developed a series of computed features, such as weighted sign AAindex and sequence similarity profile, and a regular-mRMR feature selection for Gene Ontology. Five-fold cross-validation was performed with iLoc-Hum on a training dataset covering six location sites (cell membrane, cytoplasm, endoplasmic reticulum/Golgi apparatus, mitochondrion, nucleus, secreted). The overall absolute true success rate of REALoc is 75.34% on the training dataset and 57.14% on the testing dataset, about 10% higher than four other prediction systems. Finally, this study discusses the performance of the two decision mechanisms, vote and GANN, for predicting single and multiple locations. Furthermore, the relationship between protein-protein interaction and subcellular localization was investigated by using motifs.


20

Shen, Yaoqing. "In silico analysis of mitochondrial proteins." Thèse, 2009. http://hdl.handle.net/1866/3766.

Full text

Abstract:

The important role of mitochondria in the eukaryotic cell has long been appreciated, but their exact composition and the biological processes taking place in mitochondria are not yet fully understood. The two main factors that slow down the progress in this field are inefficient recognition and imprecise annotation of mitochondrial proteins. Therefore, we developed a new computational tool, YimLoc, which effectively predicts mitochondrial proteins from genomic sequences. This tool integrates the strengths of existing predictors and yields higher performance than any individual predictor. We applied YimLoc to ~60 fungal genomes in order to address the controversy about the localization of beta oxidation in these organisms. Our results show that in contrast to previous studies, most fungal groups do possess mitochondrial beta oxidation. This work also revealed the diversity of beta oxidation in fungi, which correlates with their utilization of fatty acids as energy and carbon sources. Further, we conducted an investigation of the key component of the mitochondrial beta oxidation pathway, the acyl-CoA dehydrogenase (ACAD). We combined subcellular localization prediction with subfamily classification and phylogenetic inference of ACAD enzymes from 250 species covering all three domains of life. Our study suggests that ACAD genes are an ancient family with innovative evolutionary strategies to generate a large enzyme toolset for utilizing most diverse fatty acids and amino acids. Finally, to enable the prediction of mitochondrial proteins from data beyond genome sequences, we designed the tool TESTLoc that uses expressed sequence tags (ESTs) as input. TESTLoc performs significantly better than known tools. 
In addition to providing two new tools for subcellular localization designed for different data, our studies demonstrate the power of combining subcellular localization prediction with other in silico analyses to gain insights into the function of mitochondrial proteins. Most importantly, this work proposes clear hypotheses that are easily testable, with great potential for advancing our knowledge of mitochondrial metabolism.

