Dissertations / Theses on the topic 'Bioinformatics application'

1

Malm, Patrik. "Development of a hierarchical k-selecting clustering algorithm – application to allergy." Thesis, Linköping University, The Department of Physics, Chemistry and Biology, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10273.

Abstract:
The objective of this Master’s thesis was to develop, implement and evaluate an iterative procedure for hierarchical clustering with good overall performance which also merges features of certain already described algorithms into a single integrated package. The resulting tool was then applied to an allergen IgE-reactivity data set. The implemented algorithm uses a hierarchical approach which illustrates the emergence of patterns in the data. At each level of the hierarchical tree a partitional clustering method is used to divide the data into k groups, where the number k is decided through application of cluster validation techniques. The cross-reactivity analysis, by means of the new algorithm, largely arrives at the anticipated cluster formations in the allergen data, which strengthens results obtained in previous studies on the subject. Notably, though, certain unexpected findings presented in the former analysis were aggregated differently, and more in line with phylogenetic and protein family relationships, by the novel clustering package.
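The abstract above describes a tree in which each node is split into k groups by a partitional method, with k chosen by cluster validation. A minimal Python sketch of that idea follows. The specifics are assumptions made for illustration and are not taken from the thesis: 1-D data, plain k-means with quantile seeding as the partitional method, the mean silhouette width as the validation index, and the hypothetical names `kmeans_1d`, `silhouette`, `choose_k` and `build_tree`.

```python
from statistics import mean


def kmeans_1d(points, k, iters=50):
    """Lloyd's k-means on 1-D data, seeded deterministically at quantiles."""
    pts = sorted(points)
    n = len(pts)
    centers = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in pts]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            # An empty cluster keeps its previous center.
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return pts, labels


def silhouette(pts, labels):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    clusters = {}
    for p, lab in zip(pts, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for p, lab in zip(pts, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        b = min(mean(abs(p - q) for q in other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def choose_k(points, k_max=5):
    """Return (score, k, sorted_points, labels) for the best-validated k."""
    best = None
    for k in range(2, min(k_max, len(points) - 1) + 1):
        pts, labels = kmeans_1d(points, k)
        score = silhouette(pts, labels)
        if best is None or score > best[0]:
            best = (score, k, pts, labels)
    return best


def build_tree(points, k_max=5, min_size=4, min_score=0.5):
    """Recursively split a node while the validation index supports a split."""
    node = {"points": sorted(points), "children": []}
    if len(points) < min_size:
        return node
    best = choose_k(points, k_max)
    if best is None or best[0] < min_score:
        return node
    _, k, pts, labels = best
    for c in range(k):
        members = [p for p, lab in zip(pts, labels) if lab == c]
        if members:
            node["children"].append(build_tree(members, k_max, min_size, min_score))
    return node
```

Calling `build_tree` on three well-separated groups of values should yield a root with three leaf children; the thresholds `min_size` and `min_score` are arbitrary stopping rules chosen for the sketch.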
2

Li, Yong-Jun. "The application of statistical physics in bioinformatics /." View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?PHYS%202003%20LI.

Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003. Includes bibliographical references (leaves 55-58). Also available in electronic version. Access restricted to campus users.
3

Momin, Amin Altaf. "Application of bioinformatics in studies of sphingolipid biosynthesis." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34842.

Abstract:
The studies in this dissertation demonstrate that gene expression pathway maps are useful tools for detecting alterations in different branches of the sphingolipid biosynthesis pathway based on microarray and other transcriptomic analyses. To facilitate the integrative analysis of gene expression and sphingolipid amounts, updated pathway maps were prepared using an open access visualization tool, Pathvisio v1.1. The datasets were formatted using Perl scripts and visualized with the aid of color-coded pathway diagrams. Comparative analysis of transcriptomics and sphingolipid alterations from experimental studies and published literature revealed 72.8% correlation between mRNA and sphingolipid differences (p-value < 0.0001 by Fisher's exact test). The high correlation between gene expression differences and sphingolipid alterations highlights the application of this tool to evaluate molecular changes associated with sphingolipid alterations as well as to predict differences in specific metabolites that can be experimentally verified using sensitive approaches such as mass spectrometry. In addition, bioinformatics sequence analysis was used to identify transcripts for the sphingolipid biosynthesis enzyme 3-ketosphinganine reductase, and homology modeling studies helped in the evaluation of a cell line defective in sphingolipid metabolism due to a mutation in the enzyme serine palmitoyltransferase, the first enzyme of the de novo biosynthesis pathway. Hence, the combination of different bioinformatics approaches, including protein and DNA sequence analysis, structure modeling and pathway diagrams, can provide valuable inputs for biochemical and molecular studies of sphingolipid metabolism.
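The abstract reports a p-value from Fisher's exact test without giving the underlying counts. As a rough illustration of how such a test works on a 2x2 table (for example, genes whose mRNA change does or does not agree with the lipid change), here is a sketch in Python. The function name `fisher_exact_two_sided` and the example counts are hypothetical and are not data from the dissertation.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical counts, for illustration only.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

For real analyses an established statistics library is the safer choice; this version is only meant to make the calculation visible.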
4

Tarcha, Eric J. "Application of Immunoproteomics and Bioinformatics to coccidioidomycosis Vaccinology." University of Toledo Health Science Campus / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=mco1154441973.

5

Kiritchenko, Svetlana. "Hierarchical text categorization and its application to bioinformatics." Thesis, University of Ottawa (Canada), 2006. http://hdl.handle.net/10393/29298.

Abstract:
In a hierarchical categorization problem, categories are partially ordered to form a hierarchy. In this dissertation, we explore two main aspects of hierarchical categorization: learning algorithms and performance evaluation. We introduce the notion of consistent hierarchical classification that makes classification results more comprehensible and easily interpretable for end-users. Among the previously introduced hierarchical learning algorithms, only a local top-down approach produces consistent classification. The present work extends this algorithm to the general case of DAG class hierarchies and possible internal class assignments. In addition, a new global hierarchical approach aimed at performing consistent classification is proposed. This is a general framework of converting a conventional "flat" learning algorithm into a hierarchical one. An extensive set of experiments on real and synthetic data indicate that the proposed approach significantly outperforms the corresponding "flat" as well as the local top-down method. For evaluation purposes, we use a novel hierarchical evaluation measure that is superior to the existing hierarchical and non-hierarchical evaluation techniques according to a number of formal criteria. Also, this dissertation presents the first endeavor of applying the hierarchical text categorization techniques to the tasks of bioinformatics. Three bioinformatics problems are addressed. The objective of the first task, indexing biomedical articles with Medical Subject Headings (MeSH), is to associate documents with biomedical concepts from the specialized vocabulary of MeSH. In the second application, we tackle a challenging problem of gene functional annotation from biomedical literature. Our experiments demonstrate a considerable advantage of hierarchical text categorization techniques over the "flat" method on these two tasks. 
In the third application, our goal is to enrich the analysis of plain experimental data with biological knowledge. In particular, we incorporate the functional information on genes directly into the clustering process of microarray data with the outcome of an improved biological relevance and value of clustering results.
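The abstract mentions consistent classification over DAG class hierarchies and a hierarchical evaluation measure, but does not define either. One common way to make both ideas concrete, assumed here for illustration, is to close every label set under the ancestor relation and then compute precision and recall on the closed sets. The names `ancestors`, `make_consistent` and `hierarchical_pr`, and the toy hierarchy, are hypothetical.

```python
def ancestors(label, parents):
    """All ancestors of a label in a DAG given as {child: [parents]}."""
    seen = set()
    stack = [label]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def make_consistent(labels, parents):
    """Extend a label set so that it contains every ancestor of its members."""
    closed = set(labels)
    for label in labels:
        closed |= ancestors(label, parents)
    return closed


def hierarchical_pr(predicted, true, parents):
    """Precision and recall computed on ancestor-closed label sets."""
    pred_closed = make_consistent(predicted, parents)
    true_closed = make_consistent(true, parents)
    overlap = len(pred_closed & true_closed)
    precision = overlap / len(pred_closed) if pred_closed else 0.0
    recall = overlap / len(true_closed) if true_closed else 0.0
    return precision, recall


# Toy DAG: "kinase" has two parents, so this is not a tree.
PARENTS = {
    "kinase": ["enzyme", "signaling"],
    "enzyme": ["protein"],
    "signaling": ["protein"],
    "transporter": ["protein"],
}
```

With this hierarchy, predicting `enzyme` for an item whose true label is `kinase` counts as partially correct (full precision, half recall), whereas a flat measure would score it as a complete miss.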
6

Abbas, Naeem. "Acceleration of a bioinformatics application using high-level synthesis." Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2012. http://tel.archives-ouvertes.fr/tel-00847076.

Abstract:
The revolutionary advancements in the field of bioinformatics have opened new horizons in biological and pharmaceutical research. However, the existing bioinformatics tools are unable to meet the computational demands created by the recent exponential growth in biological data. There is therefore a pressing need to build future bioinformatics platforms that incorporate modern parallel computation techniques. In this work, we investigate FPGA-based acceleration of these applications using High-Level Synthesis. High-Level Synthesis tools enable automatic translation of abstract specifications into a hardware design, considerably reducing the design effort. However, generating efficient hardware with these tools is often a challenge for designers. Our research effort encompasses an exploration of the techniques and practices that can lead to the generation of an efficient design from these high-level synthesis tools. We illustrate our methodology by accelerating HMMER, an application widely used in the bioinformatics community. HMMER is well known for its compute-intensive kernels and for data dependencies that lead to sequential execution. We propose an original parallelization scheme based on rewriting its mathematical formulation, followed by an in-depth exploration of hardware mapping techniques for these kernels, and finally show on-board acceleration results. Our research work demonstrates the design of flexible hardware accelerators for bioinformatics applications using design methodologies that are more efficient than the traditional ones, and whose resulting designs are scalable enough to meet future requirements.
7

Lou, Xuemei. "Methods Evaluation and Application in Complex Human Genetic Disease." NCSU, 2008. http://www.lib.ncsu.edu/theses/available/etd-06122008-114358/.

Abstract:
One of the most important tasks in human genetics is to search for disease susceptibility genes. Linkage and association analyses are two major approaches for disease-gene mapping. Chapter 1 reviews the development of disease-gene mapping methods in the past decades. Gene mapping of complex human diseases often results in the identification of multiple potential risk variants within a gene and/or in the identification of multiple genes within a linkage peak. Thus a question of interest is to test whether the linkage result can be explained in part or in full by the candidate SNP if it shows evidence of association, and then to provide some guidance for the next time-consuming step of positional cloning of susceptibility genes. Two methods, GIST and LAMP, which assess whether the SNP can partially or fully account for the linkage signal in the region identified by a linkage scan, are evaluated on Genetic Analysis Workshop 15 (GAW15) simulated rheumatoid arthritis (RA) data and discussed in Chapter 2. The simulation results showed that GIST is simple and works slightly better than the LAMP-LE test when there is little linkage evidence, that the LAMP linkage test has limited power when there is not much linkage evidence, and that the LAMP association test is the best not only when the linkage evidence is extremely high, but also when there is some LD between the candidate SNP and the trait locus. The fact that complex traits are often determined by multiple genetic and environmental factors with small-to-moderate effects makes it important to investigate the behavior of current association methods under multiple risk variant models. In Chapter 3, we compared APL, FBAT, LAMP, APL-Haplotype, FBAT-LC and the APL-OSA conditional test in five multiple risk variant models. 
The simulation results showed that the power of single marker association tests is closely correlated with the amount of LD between marker and disease loci, and these tests maintain good power to detect multiple risk variants in a small region with a moderate degree of LD for fully genotyped families. Global tests, such as FBAT-LC, are sensitive to the presence of at least one susceptibility variant, but are not helpful for selecting the most promising SNPs for further study. We reported that if multiple haplotypes are associated with different disease loci, the haplotype test results can be misleading, while the APL-OSA conditional test has the greatest power to properly dissect the clustered associated markers for all models, with an acceptable type I error rate ranging from 0.033 to 0.056. We applied the APL-OSA conditional test to GENECARD samples and obtained reasonable results. One linkage region of particular interest on chromosome 3 was identified by two independent genome linkage scans with Coronary Artery Disease (CAD). Multiple disease susceptibility genes have been reported from this region, and there is also linkage evidence that this region may harbor a gene or genes determining HDL-C levels. Within this region, a search for HDL-C QTL and analyses of the relationship among genetic variants, HDL-C level and CAD risk are discussed in Chapter 4. We performed CAD association and HDL-C QTL analysis on two independent datasets. We identified SNP rs2979307 in the OSBPL11 gene, which survives a Bonferroni correction. We observed different HDL-C trends with HDL-C associated SNPs. Even with the evident heterogeneity present in our CAD population, we detected several association signals with SNPs in the KALRN, MYLK, CDGAP and PAK2 genes in both CAD datasets for HDL-C, where all these genes belong to a Rho pathway.
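The abstract notes that one SNP survives a Bonferroni correction. For readers unfamiliar with the procedure, the sketch below shows the standard rule: with m tests, a result is kept only if its p-value is at most alpha / m. The function name `bonferroni_significant`, the SNP identifiers and the p-values are invented for illustration and do not come from the dissertation.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the tests whose p-value passes the Bonferroni threshold.

    p_values maps a test name (here, a SNP identifier) to its raw p-value.
    The family-wise threshold is alpha divided by the number of tests.
    """
    if not p_values:
        return {}
    threshold = alpha / len(p_values)
    return {name: p for name, p in p_values.items() if p <= threshold}


# Hypothetical raw p-values for three SNPs.
raw = {"snp_a": 0.0004, "snp_b": 0.02, "snp_c": 0.2}
survivors = bonferroni_significant(raw)
```

Here `snp_b` would pass an unadjusted 0.05 cutoff but not the corrected threshold of 0.05 / 3, which is the kind of filtering the abstract refers to.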
8

Zhou, Yu. "Application of RNA Bioinformatics in decoding RNA structure and regulation." Paris 11, 2008. http://www.theses.fr/2008PA112234.

Abstract:
My thesis focuses on the application of RNA bioinformatics analysis to solve problems originating from biological questions, ranging from structure prediction, common structure identification, microRNA target discovery and splicing regulation prediction to RNA design (inverse folding). The first chapter concerns the establishment of an iterative method for the secondary structure prediction of group I introns, including pseudo-knots, and the development of a comprehensive group I intron sequence and structure database. 
In the second chapter, I describe my work on the bioinformatics analysis of the Pyrrolysine (Pyl, 22nd amino acid) insertion structure in Pyl-associated genes in archaea. The third and fourth chapters are devoted to developing two methods of experimental data analysis: one for identifying micro-RNA target sites, and one for determining the binding sites of an RNA-binding protein implicated in pre-mRNA splicing. Finally, the fifth chapter presents an algorithm for RNA design under motif constraints, involving manipulation of automata and context-free grammars.
9

Zaslavskiy, Mikhail. "Graph matching and its application in computer vision and bioinformatics." Paris, ENMP, 2010. http://www.theses.fr/2010ENMP1659.

Abstract:
The graph matching problem is among the most important challenges of graph processing, and plays a central role in various fields of pattern recognition. 
We propose an approximate method for labeled weighted graph matching, based on a convex-concave programming approach which can be applied to the matching of large graphs. This method makes it easy to integrate information on graph label similarities into the optimization problem, and therefore to perform labeled weighted graph matching. One of the interesting applications of the graph matching problem is the alignment of protein-protein interaction (PPI) networks. This problem is important when investigating evolutionarily conserved pathways or protein complexes across species, and helps in the identification of functional orthologs through the detection of conserved interactions. We reformulate PPI alignment as a graph matching problem, and study how state-of-the-art graph matching algorithms can be used for this purpose. In the classical formulation of graph matching, only one-to-one correspondences are considered, which is not always appropriate. In many applications, it is more interesting to consider many-to-many correspondences between graph vertices. We propose a reformulation of the many-to-many graph matching problem as a discrete optimization problem, and an approximate algorithm based on a continuous relaxation. In this thesis, we also present two interesting results in statistical machine translation and bioinformatics. We show how the phrase-based statistical machine translation decoding problem can be reformulated as a Traveling Salesman Problem. We also propose a new protein binding pocket similarity measure based on a comparison of 3D atom clouds.
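To make the one-to-one graph matching objective concrete, the sketch below scores a vertex correspondence by how many adjacency entries disagree and finds the best correspondence by exhaustive search. This is a brute-force illustration that only works for very small graphs; it is not the convex-concave relaxation proposed in the thesis, and the names `matching_cost` and `best_matching` are hypothetical.

```python
from itertools import permutations


def matching_cost(a, b, perm):
    """Sum of |A[i][j] - B[perm[i]][perm[j]]| over all vertex pairs.

    a and b are square adjacency (or weight) matrices of equal size;
    perm[i] is the vertex of b matched to vertex i of a.
    """
    n = len(a)
    return sum(abs(a[i][j] - b[perm[i]][perm[j]])
               for i in range(n) for j in range(n))


def best_matching(a, b):
    """Exhaustively search all one-to-one correspondences (O(n!) time)."""
    n = len(a)
    return min(((matching_cost(a, b, perm), perm)
                for perm in permutations(range(n))),
               key=lambda pair: pair[0])


# Two 3-vertex paths whose middle vertex carries a different index.
GRAPH_A = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
GRAPH_B = [[0, 1, 1],
           [1, 0, 0],
           [1, 0, 0]]

cost, perm = best_matching(GRAPH_A, GRAPH_B)
```

A cost of zero means the two graphs are isomorphic under the returned correspondence; the factorial search space is the reason approximate methods such as continuous relaxations are used for graphs of realistic size.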
10

Rajabi, Zeyad. "BIAS : bioinformatics integrated application software and discovering relationships between transcription factors." Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=81427.

Abstract:
In the first part of this thesis, we present a new development platform especially tailored to Bioinformatics research and software development called Bias (Bioinformatics Integrated Application Software) designed to provide the tools necessary for carrying out integrative Bioinformatics research. Bias follows an object-relational strategy for providing persistent objects, allows third-party tools to be easily incorporated within the system, and it supports standards and data-exchange protocols common to Bioinformatics. The second part of this thesis is on the design and implementation of modules and libraries within Bias related to transcription factors. We present a module in Bias that focuses on discovering competitive relationships between mouse and yeast transcription factors. By competitive relationships we mean the competitive binding of two transcription factors for a given binding site. We also present a method that divides a transcription factor's set of binding sites into two or more different sets when constructing PSSMs.
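The abstract refers to constructing PSSMs (position-specific scoring matrices) from a transcription factor's binding sites. As background, a basic log-odds PSSM can be built from aligned sites as sketched below. The pseudocount, the uniform background frequency, the function names `build_pssm` and `score`, and the example sites are assumptions for illustration; the thesis's method of splitting sites into several sets is not described in the abstract and is not reproduced here.

```python
from math import log2

ALPHABET = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM from equal-length aligned binding sites.

    Each column is a dict mapping a base to log2(frequency / background),
    with a pseudocount added so unseen bases do not score minus infinity.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pssm.append({
            base: log2((column.count(base) + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pssm


def score(pssm, sequence):
    """Sum the per-position log-odds scores of a candidate site."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the PSSM length")
    return sum(column[base] for column, base in zip(pssm, sequence))


# Made-up aligned sites resembling a TATA-like motif.
SITES = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
PSSM = build_pssm(SITES)
```

A sequence close to the consensus, such as `TATAAT`, scores well above an unrelated sequence such as `GGGCCC`, which is the basis for scanning promoters for candidate binding sites.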
11

Lutimba, Stuart. "Determination of specificity and affinity of the Lactose permease (LacY) protein of Escherichia coli through application of molecular dynamics simulation." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-15933.

Abstract:
Proteins are essential in all living organisms. They are involved in various critical activities and are also structural components of cells and tissues. Lactose permease, a membrane protein, has become a prototype for the major facilitator superfamily and utilises an existing electrochemical proton gradient to shuttle galactoside sugars into the cell. It therefore exists in two principal states, exposing the internal binding site to either side of the membrane. Previous studies have suggested that protonation precedes substrate binding, but it is still unclear why this has to occur for substrate binding. This study therefore aimed to bridge this gap and to determine the chemical characteristics of the transport pathway. Molecular dynamics simulation methods and specialised simulation hardware were employed to elucidate the dependency of substrate binding on the protonation state of Lactose permease. Protein models that differed in their conformation as well as their protonation states were defined from their respective X-ray structures. Targeted molecular dynamics was implemented to drive the substrate to the binding site, and umbrella sampling was used to define the free energy of the transport pathway. The results suggest that the requirement of protonation for sugar binding is due to the switch-like mechanism of Glu325 in the residue-residue interaction (His322 and Glu269), which leads to sugar binding only in the protonated state of LacY. Furthermore, the free energy profile of the sugar transport pathway was lower only in the protonated state, which indicates stability of sugar binding in the protonated state.
12

Marko, Adam Christian. "Structure prediction and virtual screening: Application to G protein-coupled receptors." Diss., Search in ProQuest Dissertations & Theses. UC Only, 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1469757.

13

Kandasamy, Meenakshi. "Approaches to Creating Fuzzy Concept Lattices and an Application to Bioinformatics Annotations." Miami University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=miami1293821656.

14

Bush, Stephen J. Baker Erich J. "Automated sequence homology using empirical correlations to create graph-based networks for the elucidation of protein relationships /." Waco, Tex. : Baylor University, 2008. http://hdl.handle.net/2104/5221.

15

DING, ZEJIN. "Diversified Ensemble Classifiers for Highly Imbalanced Data Learning and their Application in Bioinformatics." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/cs_diss/60.

Abstract:
In this dissertation, the problem of learning from highly imbalanced data is studied. Imbalanced data learning is both important and challenging in many real applications. Dealing with a minority class normally needs new concepts, observations and solutions in order to fully understand the underlying complicated models. We try to systematically review and solve this special learning task in this dissertation. We propose a new ensemble learning framework, Diversified Ensemble Classifiers for Imbalanced Data Learning (DECIDL), based on the advantages of existing ensemble imbalanced learning strategies. Our framework combines three learning techniques: a) ensemble learning, b) artificial example generation, and c) diversity construction by reverse data re-labeling. As a meta-learner, DECIDL utilizes general supervised learning algorithms as base learners to build an ensemble committee. We create a standard benchmark data pool, which contains 30 highly skewed sets with diverse characteristics from different domains, in order to facilitate future research on imbalanced data learning. We use this benchmark pool to evaluate and compare our DECIDL framework with several ensemble learning methods, namely under-bagging, over-bagging, SMOTE-bagging, and AdaBoost. Extensive experiments suggest that our DECIDL framework is comparable with other methods. The data sets, experiments and results provide a valuable knowledge base for future research on imbalanced learning. We develop a simple but effective artificial example generation method for data balancing. Two new methods, DBEG-ensemble and DECIDL-DBEG, are then designed to improve the power of imbalanced learning. Experiments show that these two methods are comparable to the state-of-the-art methods, e.g., GSVM-RU and SMOTE-bagging. Furthermore, we investigate learning on imbalanced data from a new angle: active learning. 
By combining active learning with the DECIDL framework, we show that the newly designed Active-DECIDL method is very effective for imbalanced learning, suggesting that the DECIDL framework is very robust and flexible. Lastly, we apply the proposed learning methods to a real-world bioinformatics problem: protein methylation prediction. Extensive computational results show that the DECIDL method performs very well for the imbalanced data mining task. Importantly, the experimental results have confirmed our new contributions on this particular data learning problem.
16

Ahlert, Darla. "Application of Graph Theoretic Clustering on Some Biomedical Data Sets." Thesis, Southern Illinois University at Edwardsville, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=1588658.

Abstract:
Clustering algorithms have become a popular way to analyze biomedical data sets and, in particular, gene expression data. Since these data sets are often large, it is difficult to gather useful information from them as a whole. Clustering is a proven method to extract knowledge about the data that can eventually lead to many discoveries in the biological world. Hierarchical clustering is used frequently to interpret gene expression data, but recently, graph-theoretic clustering algorithms have started to gain some traction for analysis of this type of data. We consider five graph-theoretic clustering algorithms run over a post-mortem gene expression dataset, as well as a few different biomedical data sets, in which the ground truth, or class label, is known for each data point. We then externally evaluate the algorithms based on the accuracy of the resulting clusters against the ground truth clusters. Comparing the results of each of the algorithms run over all of the datasets, we found that our algorithms are efficient on the real biomedical datasets but find gene expression data especially difficult to handle.
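The abstract describes external evaluation, in which cluster assignments are compared with known class labels, without naming the accuracy measure used. One simple measure of this kind is purity, sketched below as an assumed stand-in: each cluster is credited with its most frequent true class, and the credited points are divided by the total. The function name `purity` and the example labels are hypothetical.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """Fraction of points that belong to the majority class of their cluster."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must not be empty")
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster][label] += 1
    # Each cluster contributes the size of its largest class.
    matched = sum(max(counts.values()) for counts in members.values())
    return matched / len(class_labels)


# Six samples, three found clusters, two true classes (made-up labels).
found = [0, 0, 1, 1, 1, 2]
truth = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
result = purity(found, truth)
```

Purity rewards many small clusters, so in practice it is usually reported alongside measures such as the Rand index or normalized mutual information.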
17

Siek, Katie A. "The design and evaluation of an assistive application for dialysis patients." [Bloomington, Ind.] : Indiana University, 2006. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3223070.

Abstract:
Thesis (Ph.D.)--Indiana University, Dept. of Computer Science, 2006. "Title from dissertation home page (viewed June 28, 2007)." Source: Dissertation Abstracts International, Volume: 67-06, Section: B, page: 3242. Adviser: Kay H. Connelly.
18

Crabtree, Nathaniel Mark. "Multi-Class Computational Evolution| Development, Benchmark Comparison, and Application to RNA-Seq Biomarker Discovery." Thesis, University of Arkansas at Little Rock, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10620232.

Abstract:
A computational evolution system (CES) is a knowledge-discovery engine that constructs and evolves classifiers with a small number of features to identify subtle, synergistic relationships among features and to discriminate groups in high-dimensional data analysis. CESs have previously been designed to only analyze binary datasets. In this work, the CES method has been expanded to accommodate multi-class data. The multi-class CES was compared to three common classification and feature selection methods: random forest, random k-nearest neighbor, and support vector machines. The four classifiers were evaluated on three real RNA sequencing datasets. Performance was evaluated via cross validation to assess classification accuracy, number of features selected, stability of the selected feature sets, and run-time. The three common classification and feature selection methods were originally designed for microarray data, which is fundamentally different from RNA-Seq data. In order to preprocess RNA-Seq count data for classification, the data was normalized and transformed via a variance stabilizing transformation to remove the variance-mean relationship that is commonly observed in RNA-Seq count data. Compared to the three competing methods, the multi-class CES selected far fewer features. The identified features are potential biomarkers that may be more relevant than the longer lists of features identified by the competing methods. The CES performed best on the dataset with the smallest sample size, indicating that it has a unique advantage in these situations since most classification algorithms suffer in terms of accuracy when the sample size is small. The CES identified numerous potentially-important biomarkers in each of the three real datasets that are validated by previous research and worthy of additional investigation. CES was especially helpful at identifying important features in the rat blood RNA-Seq data set. 
Subsequent ontological analysis of these selected features revealed protein folding as an important process in that dataset. The other contribution of this research to science was to extend the applicability of CES to biomarker discovery in multi-class settings. New software algorithms based on CES have already been developed, and the multi-class modifications presented here are directly applicable and would also benefit the newer software.
19

Qin, Jing, and 覃静. "Application of bioinformatics on gene regulation studies and regulatory network construction with omics data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/205684.

Abstract:
Gene expression is a multi-step process that involves various regulators. From whole genome sequences to the complex gene regulatory system, high-throughput technologies have generated a large amount of omics data, but information on such a large scale is hard to interpret manually. Bioinformatics can help to process this huge body of biological information and infer biological insights using the merits of mathematics, statistics and computational techniques. In this study, we applied various bioinformatic techniques to several aspects of gene regulation. Multiple primary transcripts of a gene can be initiated at different promoters, termed alternative promoters (APs). Most human genes have multiple APs. However, whether the usage of APs is independent or not is still controversial. In this study, we analyze the roles of APs in gene regulation using various bioinformatics approaches. Chromosomal interactions between APs are found to be more frequent than interactions between different genes. By comparing the APs at the two ends of the genes, we find that they are significantly different in terms of sequence content, conservation and motif frequency. The position and distance of two APs are important for their combined effects, which shows that their regulation is not independent and that one AP can affect the transcription of the other. With the aim of understanding the multi-level gene regulatory system in various biological processes, a mass of high-throughput omics data has been generated. However, each omics technology, measuring molecular abundance or behavior at a single level, has a limited ability to depict the multi-level system. Integrating omics data can effectively capture the multi-level gene regulatory system and reduce false positives. In this study, two web servers, ChIP-Array and ProteoMirExpress, have been built to construct transcriptional and post-transcriptional regulatory networks by integrating omics data. 
ChIP-Array is a web server for biologists to construct a TF-centered network for their own data. A network library is further constructed by ChIP-Array from publicly available data. Given a series of mRNA expression profiles in a biological process, master regulators can be identified by matching the profiles with the networks in the library. To explore gene regulatory networks controlled by multiple TFs, least absolute shrinkage and selection operator (LASSO)-type regularization models are applied to multiple integrative data. Gold-standard-based evaluations demonstrate that the L0 and L1/2 regularization models are efficient and applicable to gene regulatory network inference in large genomes with a small number of samples. ProteoMirExpress integrates transcriptomic and proteomic data to infer miRNA-centered networks. It successfully infers the perturbed miRNA and those that co-express with it. The resulting network reports miRNA targets with uncorrelated mRNA and protein levels, which are usually ignored by tools considering only the mRNA abundance, even though some of them may be important downstream regulators. In summary, in this study we analyze gene regulation at multiple levels and develop several tools for gene network construction and regulator analysis with multiple omics data. These tools help researchers to process high-throughput raw data efficiently and to draw biological hypotheses and interpretations.
Biochemistry; Doctoral; Doctor of Philosophy
20

Yu, Liyang. "An Indexation and Discovery Architecture for Semantic Web Services and its Application in Bioinformatics." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_theses/20.

Abstract:
Recently much research effort has been devoted to the discovery of relevant Web services. It is widely recognized that adding semantics to service descriptions is the solution to this challenge. Web services with explicit semantic annotation are called Semantic Web Services (SWS). This research proposes an indexation and discovery architecture for SWS, together with a prototype application in the area of bioinformatics. In this approach, an SWS repository is created and maintained by crawling both ontology-oriented UDDI registries and Web sites that host SWS. For a given service request, the proposed system invokes the matching algorithm and returns a candidate set in which different degrees of matching are considered. This approach can add more flexibility to the current industry standards by offering more choices to both service requesters and publishers. The prototype developed in this research also shows the value that can be added by using SWS in application areas such as bioinformatics.
21

Vajdi, Hoojghan Amir. "Application of Graphical Models in Protein-Protein Interactions and Dynamics." Thesis, University of Massachusetts Boston, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10982841.

Abstract:
Every organism contains a few hundred to thousands of proteins. A protein is made of a sequence of molecular building blocks named amino acids, referred to here as residues. Every protein performs one or more functions in the cell. In order for a protein to do its job, it needs to bind properly to other partner proteins. Many genetic diseases such as cancer are caused by mutations (changes) of specific residues which disturb the functions of those proteins. Predicting protein binding sites is a crucial topic in computational biology. A protein is usually made up of 50 to a few thousand residues. A contact site can occur within a protein or with other proteins. With a robust and accurate model for identifying residues that are involved in the binding site, scientists can investigate the impact of critical mutations and residues that can cause genetic diseases.
The main focus of this thesis is to propose a machine learning model for predicting the binding site between two proteins. By extracting structural information from a protein, we can gain additional knowledge of binding sites. This structural information can be converted into a penalty matrix for a graphical model to be learned from the protein sequence. The second part of this thesis is mostly focused on motion planning algorithms for proteins and simulation of protein pathway changes using a Monte Carlo based method. Later, by applying a novel geometry-based scoring function, we cluster the intermediate conformations into corresponding subsets that may indicate interesting intermediate states.
22

Gonzaludo, Nina. "Exploring the Use of Electronic Health Record-Linked Biorepositories for Pharmacogenomic Application and Discovery." Thesis, University of California, San Francisco, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3725479.

Abstract:
Drug response is well documented to vary considerably among patient groups and populations, as well as within individual patients. Since drug prescribing is often based on population averages of drug response, many patients will not respond, and up to one-third may experience harmful toxicity. Genetics plays a large role in explaining the variability observed in response to different drugs and is an important factor driving precision medicine initiatives. Pharmacogenetic information can be useful in optimizing patient therapy, potentially reducing the cost of hospitalizations and treatment of adverse drug events.
As part of the Kaiser Permanente Research Program on Genes, Environment, and Health (RPGEH), we analyzed 102,979 members of the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort with genetic information available, along with almost two decades of electronic health record (EHR) data, prescription records, and lifestyle survey results. In one of the largest, most ethnically diverse pharmacogene characterization studies to date, we assessed cohort metabolizer status phenotypes for 7 drug-gene interactions (DGIs) for which there is moderate to strong evidence suggesting the use of pharmacogenetic information to guide therapy. 89% of the cohort had at least one actionable allele for the 7 DGIs in this study, and we observed large variations among ethnicities. Additionally, 17,747 individuals had been prescribed a drug for which they had an actionable or high-risk metabolizer status phenotype. For these individuals, the availability of pharmacogenetic information at point-of-care may have led to a more personalized drug or dosing regimen.
Following this study, we assessed the utility of this resource for deriving two drug response phenotypes: weight gain induced by atypical antipsychotic use and major adverse cardiovascular events in clopidogrel non-responders. Despite challenges in deriving phenotypes from the EHR, we were able to extract phenotypes that reflected observed estimates from previously published studies. Using these phenotypes, we performed candidate gene and genome-wide association studies to identify genetic variants associated with response. Altogether, this dissertation demonstrates the potential utility and clinical impact of integrating genetic data with EHRs for pharmacogenetic application and discovery, and provides the foundation for future studies in precision medicine.
23

Samocha, Kaitlin E. "Modeling Rare Protein-Coding Variation to Identify Mutation-Intolerant Genes With Application to Disease." Thesis, Harvard University, 2016. http://nrs.harvard.edu/urn-3:HUL.InstRepos:33493508.

Abstract:
Sequencing exomes, the 1% of the genome that codes for proteins, has increased the rate at which the genetic basis of a patient’s disease is determined. Unfortunately, when a patient does not carry a well-established pathogenic variant, it is extremely challenging to establish which of the tens of thousands of variants identified in that individual is contributing to their disease. In these situations, variants must be prioritized to make further investigation more manageable. In this thesis, we have focused on creating statistical frameworks and models to aid in the interpretation of rare variants and on establishing gene-level metrics for variant prioritization. We developed a sensitive and specific workflow to detect newly arising (de novo) variants from exome sequencing data of parent-child trios, and created a sequence-context-based mutational model. This mutational model was the basis of a rigorous statistical framework to evaluate the significance of de novo variant burden not only globally, but also per gene. When we applied this framework to de novo variants identified in patients with an autism spectrum disorder, we found a global excess of de novo loss-of-function variants as well as two genes that harbored significantly more de novo loss-of-function variants than expected. We also used the mutational model to predict the expected number of rare (minor allele frequency < 0.1%) variants in exome sequencing datasets of reference individuals. We found a significant depletion of missense and loss-of-function variants in a subset of genes, indicating that these genes are under strong evolutionary constraint. Specifically, we identified 3,230 genes that are intolerant of loss-of-function variation, and that set of genes is enriched for established dominant and haploinsufficient disease genes. Similarly, we searched for regions within genes that were intolerant of missense variation.
The most missense-depleted 15% of the exome contains 83% of reported pathogenic variants found in haploinsufficient disease genes that cause severe disease. Additionally, both gene-level and region-level constraint metrics highlight a set of de novo variants from patients with a neurodevelopmental disorder that are more likely to be pathogenic, supporting the utility of these metrics when interpreting rare variants within the context of disease.
Medical Sciences
24

Faust, Karoline. "Development, assessment and application of bioinformatics tools for the extraction of pathways from metabolic networks." Doctoral thesis, Universite Libre de Bruxelles, 2010. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210054.

Abstract:
Genes can be associated in numerous ways, e.g. by co-expression in micro-arrays, co-regulation in operons and regulons or co-localization on the genome. Association of genes often indicates that they contribute to a common biological function, such as a pathway. The aim of this thesis is to predict metabolic pathways from associated enzyme-coding genes. The prediction approach developed in this work consists of two steps: First, the reactions carried out by the enzymes coded by the genes are obtained. Second, the gaps between these seed reactions are filled with intermediate compounds and reactions. In order to select these intermediates, metabolic data is needed. This work made use of metabolic data collected from the two major metabolic databases, KEGG and MetaCyc. The metabolic data is represented as a network (or graph) consisting of reaction nodes and compound nodes. Intermediate compounds and reactions are then predicted by connecting the seed reactions obtained from the query genes in this metabolic network using a graph algorithm.
In large metabolic networks, there are numerous ways to connect the seed reactions. The main problem of the graph-based prediction approach is to differentiate biochemically valid connections from others. Metabolic networks contain hub compounds, which are involved in a large number of reactions, such as ATP, NADPH, H2O or CO2. When a graph algorithm traverses the metabolic network via these hub compounds, the resulting metabolic pathway is often biochemically invalid.
In the first step of the thesis, an already existing approach to predict pathways from two seeds was improved. In the previous approach, the metabolic network was weighted to penalize hub compounds and an extensive evaluation was performed, which showed that the weighted network yielded higher prediction accuracies than either a raw or filtered network (where hub compounds are removed). In the improved approach, hub compounds are avoided using reaction-specific side/main compound annotations from KEGG RPAIR. As an evaluation showed, this approach in combination with weights increases prediction accuracy with respect to the weighted, filtered and raw network.
In the second step of the thesis, path finding between two seeds was extended to pathway prediction given multiple seeds. Several multiple-seed pathway prediction approaches were evaluated, namely three Steiner tree solving heuristics and a random-walk based algorithm called kWalks. The evaluation showed that a combination of kWalks with a Steiner tree heuristic applied to a weighted graph yielded the highest prediction accuracy.
Finally, the best performing algorithm was applied to a microarray data set, which measured gene expression in S. cerevisiae cells growing on 21 different compounds as sole nitrogen source. For 20 nitrogen sources, gene groups were obtained that were significantly over-expressed or suppressed with respect to urea as reference nitrogen source. For each of these 40 gene groups, a metabolic pathway was predicted that represents the part of metabolism up- or down-regulated in the presence of the investigated nitrogen source.
The graph-based prediction of pathways is not restricted to metabolic networks. It may be applied to any biological network and to any data set yielding groups of associated genes, enzymes or compounds. Thus, multiple-end pathway prediction can serve to interpret various high-throughput data sets.
Doctorate in Sciences
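The hub-penalizing path search described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy network, the node names, and the rule that entering a compound costs its degree while entering a reaction costs 1 are all assumptions made for illustration. The sketch only shows why an unweighted search takes the shortcut through ATP while a weighted search prefers the chain of main compounds.

```python
import heapq
from collections import defaultdict

def build_graph(reactions):
    """Build an undirected bipartite graph from {reaction: [compounds]}."""
    graph = defaultdict(set)
    for reaction, members in reactions.items():
        for compound in members:
            graph[reaction].add(compound)
            graph[compound].add(reaction)
    return graph

def lightest_path(graph, compounds, source, target, weighted=True):
    """Dijkstra over node costs. Entering a compound costs its degree when
    weighted=True (assumed hub penalty); every other step costs 1."""
    def cost(node):
        if weighted and node in compounds:
            return len(graph[node])
        return 1

    best = {source: 0}
    queue = [(0, source, [source])]
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt in sorted(graph[node]):
            new_dist = dist + cost(nxt)
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                heapq.heappush(queue, (new_dist, nxt, path + [nxt]))
    return None

# Hypothetical toy network: ATP takes part in six reactions and acts as a hub.
reactions = {
    "R1": ["A", "B", "ATP"],
    "R2": ["B", "C"],
    "R3": ["C", "D", "ATP"],
    "R4": ["E", "ATP"],
    "R5": ["F", "ATP"],
    "R6": ["G", "ATP"],
    "R7": ["H", "ATP"],
}
graph = build_graph(reactions)
compounds = {c for members in reactions.values() for c in members}

raw = lightest_path(graph, compounds, "R1", "R3", weighted=False)
penalized = lightest_path(graph, compounds, "R1", "R3", weighted=True)
print(raw)        # shortcut through the hub compound
print(penalized)  # route through the main compounds B and C
```

With seeds R1 and R3, the raw search connects them directly through ATP, whereas the penalized search goes through B, R2 and C. The filtered-network alternative mentioned in the abstract would correspond to deleting the ATP node before searching.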
25

Cox, David Alan. "Application and integration of bioinformatic strategies towards central and peripheral proteomic profiling for diagnosis and drug discovery in schizophrenia." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/278018.

Abstract:
Proteomic profiling studies of schizophrenia have the potential to shed further light on this debilitating and poorly understood condition which affects up to 1% of the world’s population. However, recent studies suggest that the field of proteomics in general has been hindered by poor application of bioinformatic strategies, contributing to the failure of many findings to validate. In the context of schizophrenia research, there is therefore a need for a more robust application and integration of existing statistical approaches to proteomic datasets, as well as the development of new methodologies to offer solutions to current challenges. The aims of this thesis were multi-fold. Many studies have stipulated the need for new diagnostic and prognostic strategies to aid psychiatrists, particularly in predicting disease conversion from the prodromal phase. Proteomic data from serum samples was used to investigate the potential for statistical models based on biomarker panels to offer a new and clinically relevant approach. Models were trained based on either differential protein (chapter 3) or peptide (chapter 4) expression levels between schizophrenia patients and controls, as measured through multiplex immunoassay or targeted mass spectrometry technologies. In chapter 4, an SVM model based on 21 peptides was identified that is both highly sensitive and specific as a diagnostic and prognostic tool in symptomatic individuals. Furthermore, in recent years, few preclinical innovations have been made in schizophrenia research in either in vitro or in vivo studies, resulting in a standstill in the development of treatments. In chapters 5 and 6 of this thesis, proteomic information from a novel cellular model of schizophrenia was analyzed. In chapter 5, cell signalling alterations in vitro were identified which may underpin dysfunctional microglial activation in at least a subgroup of patients, thus representing new drug targets in the CNS. 
Subsequent analysis identified compounds which have the potential to ameliorate the observed changes. Lastly, in chapters 7 and 8, a novel systems biology methodology was developed for the functional comparison of proteomic changes in brain tissue from existing preclinical rodent models of psychiatric disorders to those in human post-mortem samples, providing a new means of overcoming some of the translational hurdles of preclinical psychiatric research. The application of different bioinformatic strategies to a range of proteomic datasets in this thesis has yielded a number of findings which have enhanced the understanding of schizophrenia pathophysiology and provide a platform for future studies towards the goal of improving outcomes for patients affected by this disorder.
26

Johnson, Sean Robert. "Development and application of spectral databases and mathematical models in the study of plant natural products biosynthesis." Thesis, Washington State University, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10164019.

Abstract:
Plant natural products are useful for many different applications, including medicines, flavors and fragrances, and industrial uses. Two important aspects of plant natural products research are the identification of compounds in their source plants, and the characterization of the processes involved in their biosynthesis. To aid in the identification of plant natural products, we developed the Spektraris family of databases. These databases include high-performance liquid chromatography mass spectrometry data, and ¹³C and ¹H nuclear magnetic resonance data, which are searchable through an online interface. The utility of Spektraris was validated by using it to identify compounds in plant extracts and as part of a workflow to elucidate the structure of a previously undescribed compound.
Mints have a long history of use as model systems for studying the processes of terpene natural products biosynthesis in specialized plant tissues. The mint family (Lamiaceae) synthesizes and stores volatile terpenes in glandular trichomes. Using a comparative transcriptomic approach, we identified differences in gene expression of monoterpene biosynthetic genes among mint species with different oil profiles. We also assembled the genome of a mint species, Mentha longifolia. The genome assembly will be valuable for future mint research.
To further investigate biosynthetic processes in mint, I developed a detailed mathematical model of the metabolism of peppermint glandular trichomes. The model incorporates multiple sources of data, including transcriptome data, metabolite data, enzymatic data from the peppermint literature, and previously developed models of plant metabolism. The creation of a new metabolic modeling software package, called YASMEnv, facilitated construction of the model. Model-based simulated reaction knockouts using flux balance analysis revealed that fermentation may be important for ATP regeneration in secretory phase glandular trichomes. Follow-up experiments confirmed high levels of alcohol dehydrogenase activity in secretory phase isolated trichomes. Simulations also supported an essential role for ferredoxin and ferredoxin-NADP reductase. Transcriptome analysis revealed the presence of an isoform of ferredoxin in trichomes distinct from the one expressed in root. The presence of a distinct ferredoxin isoform in trichomes supports the hypothesis that selection pressure for efficient natural products biosynthesis may also act on the enzymes of primary metabolism.
27

Luecken, Malte. "Application of multi-resolution partitioning of interaction networks to the study of complex disease." Thesis, University of Oxford, 2016. https://ora.ox.ac.uk/objects/uuid:b49187be-8203-4aa0-abbd-bff1a507ff6f.

Abstract:
Large-scale gene expression studies are widely used to identify genes that are differentially expressed between phenotypes relevant to disease. Often thousands of differentially expressed genes (DEGs) are found using this type of analysis, which complicates the interpretation of the data. In this project we treat DEGs as windows into the biological processes that underlie disease. In order to find these processes, we put DEGs into the context in which they perform their functions - through the interactions of their protein products. Protein-protein interactions can provide biological context to DEGs in the form of functional modules. These modules are groups of proteins that together perform cellular functions. In this thesis we have refined a functional module detection process that consists of two steps. Firstly, community detection methods are applied to protein interaction networks (PINs) to detect groups of interacting proteins, and secondly, the biological coherence of the proteins grouped together is evaluated to select communities that represent potential functional modules. Two features that are central to this work are the detection of modules at different scales of network organization, and CommWalker, a module evaluation method that we developed which is able to detect signals of poorly-studied functions. By integrating these methods into our functional module detection process, we were able to obtain a good coverage of potential functional modules. Testing for enrichment of DEGs on these functional modules can uncover biological processes that are involved in the contrasted phenotypes and merit further investigation. We have applied our pipeline to find differentially regulated functions between hypoxic and normoxic breast cancer cell lines, and between M1 and M2 macrophages. 
Our results generate biological hypotheses of cellular functions that are differentially regulated in the investigated phenotypes, and proteins that are involved in these functions. We were able to validate several proteins in enriched modules which did not correspond to DEGs that were input into the pipeline, which suggests our methodology can reveal new biological insight.
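Testing a module for enrichment of DEGs, as mentioned in this abstract, can be sketched with a one-sided hypergeometric test. The abstract does not say which test the thesis uses, so the choice of test, the function name `enrichment_p_value`, and the numbers below are assumptions for illustration only.

```python
from math import comb

def enrichment_p_value(population, successes, draws, observed):
    """P(X >= observed) when `draws` proteins (one module) are taken without
    replacement from `population` proteins, `successes` of which are DEGs."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total

# Hypothetical numbers: a network of 20 proteins, 5 of them DEGs, and a
# module of 4 proteins that contains 3 DEGs.
p = enrichment_p_value(population=20, successes=5, draws=4, observed=3)
print(round(p, 4))
```

A small p-value suggests that the module holds more DEGs than chance placement would give. When many modules are tested, a multiple-testing correction would normally follow; that step is omitted here.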
28

Jungjit, Suwimol. "New multi-label correlation-based feature selection methods for multi-label classification and application in bioinformatics." Thesis, University of Kent, 2016. https://kar.kent.ac.uk/58873/.

Abstract:
The very large dimensionality of real world datasets is a challenging problem for classification algorithms, since often many features are redundant or irrelevant for classification. In addition, a very large number of features leads to a high computational time for classification algorithms. Feature selection methods are used to deal with the large dimensionality of data by selecting a relevant feature subset according to an evaluation criterion. The vast majority of research on feature selection involves conventional single-label classification problems, where each instance is assigned a single class label; but there has been growing research on more complex multi-label classification problems, where each instance can be assigned multiple class labels. This thesis proposes three types of new Multi-Label Correlation-based Feature Selection (ML-CFS) methods, namely: (a) methods based on hill-climbing search, (b) methods that exploit biological knowledge (still using hill-climbing search), and (c) methods based on genetic algorithms as the search method. Firstly, we proposed three versions of ML-CFS methods based on hill climbing search. In essence, these ML-CFS versions extend the original CFS method by extending the merit function (which evaluates candidate feature subsets) to the multi-label classification scenario, as well as modifying the merit function in other ways. A conventional search strategy, hill-climbing, was used to explore the space of candidate solutions (candidate feature subsets) for those three versions of ML-CFS. These ML-CFS versions are described in detail in Chapter 4. Secondly, in order to try to improve the performance of ML-CFS in cancer-related microarray gene expression datasets, we proposed three versions of the ML-CFS method that exploit biological knowledge. 
These ML-CFS versions are also based on hill-climbing search, but the merit function was modified in a way that favours the selection of genes (features) involved in pre-defined cancer-related pathways, as discussed in detail in Chapter 5. Lastly, we proposed two more sophisticated versions of ML-CFS based on Genetic Algorithms (rather than hill-climbing) as the search method. The first version of GA-based ML-CFS is based on a conventional single-objective GA, where there is only one objective to be optimized; while the second version of GA-based ML-CFS performs lexicographic multi-objective optimization, where there are two objectives to be optimized, as discussed in detail in Chapter 6. In this thesis, all proposed ML-CFS methods for multi-label classification problems were evaluated by measuring the predictive accuracies obtained by two well-known multi-label classification algorithms when using the selected features, namely: the Multi-Label K-Nearest neighbours (ML-kNN) algorithm and the Multi-Label Back Propagation Multi-Label Learning Neural Network (BPMLL) algorithm. In general, the results obtained by the best version of the proposed ML-CFS methods, namely a GA-based ML-CFS method, were competitive with the results of other multi-label feature selection methods and baseline approaches. More precisely, one of our GA-based methods achieved the second best predictive accuracy out of all methods being compared (both with ML-kNN and BPMLL used as classifiers), but there was no statistically significant difference between that GA-based ML-CFS and the best method in terms of predictive accuracy.
In addition, in the experiments with ML-kNN, the most accurate method selects about twice as many features as our GA-based ML-CFS; whilst in the experiments with BPMLL the most accurate method was a baseline method that does not perform any feature selection, and runs the classifier once (with all original features) for each of the many class labels, which is a very computationally expensive baseline approach. In summary, one of the proposed GA-based ML-CFS methods managed to achieve substantial data reduction (selecting a smaller subset of relevant features) without a significant decrease in predictive accuracy with respect to the most accurate method.
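The idea of a correlation-based merit function explored by hill climbing can be sketched as follows. This is an illustration under stated assumptions, not the thesis's method: it uses the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), takes absolute Pearson correlation as the correlation measure, and extends it to several labels by averaging the feature-label correlation over labels. The data and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def merit(subset, features, labels):
    """CFS merit with the feature-label term averaged over all labels
    (an assumed multi-label extension)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(
        sum(abs(pearson(features[f], y)) for y in labels) / len(labels)
        for f in subset
    ) / k
    pairs = list(combinations(subset, 2))
    r_ff = (
        sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
        if pairs else 0.0
    )
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

def hill_climb(features, labels):
    """Greedy forward search: add the best feature while merit improves."""
    selected, best = [], 0.0
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        scored = [(merit(selected + [f], features, labels), f) for f in candidates]
        score, choice = max(scored, key=lambda item: item[0])  # first best wins ties
        if score <= best:
            break
        selected.append(choice)
        best = score
    return selected, best

# Hypothetical data: f2 duplicates f1, so it adds redundancy but no information.
features = {"f1": [0, 0, 1, 1], "f2": [0, 0, 1, 1], "f3": [0, 1, 0, 1]}
labels = [[0, 0, 1, 1], [0, 1, 0, 1]]
selected, score = hill_climb(features, labels)
print(selected, round(score, 4))
```

On this toy data the search keeps f1 and f3 and rejects the redundant f2, because adding it raises the feature-feature correlation term and lowers the merit. A genetic algorithm, as in the later chapters described above, would replace `hill_climb` while reusing the same `merit` function.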
29

Lunev, Alexey. "Evaluation and visualization of complexity in parameter setting in automotive industry." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-362711.

Abstract:
Parameter setting is a process primarily used to specify in what kind of vehicle an electronic control unit of each type is used. This thesis investigates whether the current strategy for measuring complexity gives the user satisfactory results. The strategy consists of structure-based algorithms that are an essential part of the Complexity Analyzer, a prototype application used to evaluate the complexity.
The results described in this work suggest that the currently implemented algorithms have to be properly defined and adapted to be used for parameter setting. Moreover, the measurements that the algorithms output have been analyzed in more detail, making the results easier to interpret.
It has been shown that a typical parameter setting file can be regarded as a tree structure. To measure variation in this structure, a new concept called Path entropy has been formulated, tested and implemented.
The main disadvantage of the original version of the Complexity Analyzer application is its lack of user-friendliness. Therefore, a web version of the application based on the Model-View-Controller technique has been developed. Unlike the original version, it includes a user interface, and it takes just a couple of seconds to see the visualization of data, whereas the original version took several minutes to run.
30

Chen, Peikai, and 陈培凯. "Identification of cancer subtypes and subtypes-specific drivers using high-throughput data with application to medulloblastoma." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2012. http://hub.hku.hk/bib/B49617746.

Abstract:
Cancer is a fearful, deadly disease. Currently there is almost no cure. The reason is that the disease mechanisms are poorly understood. This in turn is because of the complex molecular activities that underlie cancer processes. Some variables of these processes, such as gene expression, copy number profiles and point mutations, have recently become measurable in high throughput. However, these data are massive and unreadable even to experts. Much effort is being devoted to developing engineering tools for the analysis and interpretation of these data, for various purposes. In this thesis, we focus on addressing the problem of individuality in cancer. More specifically, we are interested in knowing the subgroups of processes in a cancer, called subtypes. This problem has both theoretical and practical implications. Theoretically, classification of cancer patients represents an understanding of the disease, and may help speed up drug development. Practically, subgroups of patients can be treated with different protocols for optimal outcomes. Towards this end, we propose an approach with two specific aims: identifying subtypes for a given set of high-throughput data, and identifying candidate genes (called drivers) that drive the subtype-specific processes. First, we assume that a subtype has a distinctive process, compared not just with normal controls, but also with other cases of the same cancer. The process is characterized by a set of differentially expressed genes uniquely found in the corresponding subtype. Based on this assumption, we develop a signature-based subtyping algorithm, which on the one hand divides a set of cases into as many subtypes as possible, while on the other hand merges subtypes that have too small a signature set. We applied this algorithm to datasets of the pediatric brain tumor medulloblastoma, and found that no more than three subtypes can meet the above criteria. Second, we explore subtype patterns of the copy number profiles.
By regarding all events on a chromosome arm as a single event, we quantize the copy number profiles into event profiles. An unsupervised decision tree training algorithm is specifically designed for detecting subtypes on these profiles. The trained decision tree is intuitive, predictive, easy to implement and deterministic. Its application to datasets of medulloblastoma reveals interesting subtype patterns characterized by co-occurrence of CNA events.
Electrical and Electronic Engineering; Doctoral; Doctor of Philosophy
31

McSweeny, Andrew. "Genome Evolution Model (GEM): Design and Application." University of Toledo Health Science Campus / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=mco1290550446.

32

Janowski, Sebastian Jan [Verfasser]. "VANESA - A bioinformatics software application for the modeling, visualization, analysis, and simulation of biological networks in systems biology applications / Sebastian Jan Janowski." Bielefeld : Universitaetsbibliothek Bielefeld, 2013. http://d-nb.info/1036112020/34.

33

Meader, Stephen. "Application of the Neutral Indel Model to genome sequences for diverse metazoans." Thesis, University of Oxford, 2010. http://ora.ox.ac.uk/objects/uuid:18f8c5fc-28f2-4d5e-aa87-c1086582213c.

Abstract:
The Neutral Indel Model is able to predict accurately the distribution of indel events in alignments of neutrally evolving genomic sequence. Here, I apply this model to a diverse range of metazoan species pairs, to a number of ends. First, I apply the Neutral Indel Model to alignments of genome sequences for species within the mammalian clade in order to estimate the quantities of functional DNA shared between species pairs. I demonstrate that as the evolutionary divergence between species pairs increases, estimates of functional sequence drop off dramatically. This pattern is not replicated in extensive simulations of genome sequence alignments, suggesting that functional (and mostly non-coding) sequence is turning over at a rapid rate. I also estimate that between 200 and 300 Mb (6.5-10%) of the human genome is under evolutionary constraint, a considerably higher quantity of sequence than has been estimated by previous whole genome analyses. Second, extending my analyses to consider more diverse metazoan species, I provide estimates for functional bases within organisms’ genomes that appear to mirror our conceptions of organismal complexity. Thirdly, I develop the Neutral Indel Model as a method for assessing genome sequence quality, by quantifying indel errors within alignments of closely related (ds < 0.1) species pairs. Applying this method to six primate genome sequence assemblies, I demonstrate that the frequency of indel error events per base varies up to six-fold. Further to this, I show that second generation sequencing technologies can be used to create high quality genome sequence assemblies and to ameliorate errors in pre-existing assemblies. Finally, I analyse patterns of indel mutations in primate transposable elements and show that indels are not randomly distributed within these sequences due to regularly spaced homo-nucleotide motifs.
34

Marshall, Byron Bennett. "Concept Matching in Informal Node-Link Knowledge Representations." Diss., Tucson, Arizona : University of Arizona, 2005. http://etd.library.arizona.edu/etd/GetFileServlet?file=file:///data1/pdf/etd/azu%5Fetd%5F1145%5F1%5Fm.pdf&type=application/pdf.

35

Srivastava, Vikesh. "Bias : bioinformatics integrated application software, design and implementation which was written as part of my masters degree requirements." Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=81442.

Abstract:
This thesis introduces a development platform especially tailored to Bioinformatics research and software development. Bias (Bioinformatics Integrated Application Software) provides the tools necessary for carrying out integrative Bioinformatics research. It follows an object-relational strategy for providing persistent objects, allows third-party tools to be easily incorporated within the system, and supports standards and data-exchange protocols common to Bioinformatics. Presenting the architecture of Bias is not enough without a working example that exploits all of its components and thus demonstrates its utility. We therefore present the architecture of Bias and provide a full example based on the work of Segal et al. for finding transcriptional regulatory relationships. Bias is an open-source project and is freely available to all interested users.
36

Nannapaneni, Kishore. "Design of a bioinformatics system for insertional mutagenesis analysis and its application to the Sleeping Beauty transposon system." Diss., University of Iowa, 2011. https://ir.uiowa.edu/etd/1039.

Abstract:
Cancer is one of the leading causes of death in the world. Approximately one fifth of deaths in the western industrial nations are caused by cancer. Every year, several hundred thousand new patients are diagnosed with cancer and several thousand die of it. Scientists have been conducting research from different angles toward effective prevention, diagnosis and cure of cancer. Ever since the genetic basis of cancer was demonstrated, the scientific community has been racing globally to identify potential oncogenes and tumor suppressor genes. The genetics of tumors is complex: combinations of loss-of-function mutations in tumor suppressor genes and gain-of-function mutations in oncogenes cause cancers. Identifying these genes is extremely important for devising effective cancer therapies. Insertional mutagenesis systems such as Sleeping Beauty provide an elegant way to identify genes involved in cancer. More and more researchers are adopting the Sleeping Beauty system for insertional mutagenesis experiments to identify potential cancer-causing genes. Next-generation sequencing technologies generate vast amounts of data, which require novel bioinformatics techniques to process, analyze and meaningfully interpret. The goal of this project is to develop a publicly available system with which researchers worldwide can analyze the sequence data resulting from insertional mutagenesis experiments. The system identifies and annotates all insertion sites resulting from the sequencing of an experiment. It also identifies Common Insertion Sites (CIS) and genes with Common Insertion Sites (gCIS), where Common Insertion Sites are regions of the genome that are targeted more often than expected by chance. The whole system is accessible as a web application for researchers worldwide performing insertional mutagenesis experiments.
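The idea of a region being "targeted more often than expected by chance" can be sketched as a window-based test. This is only an illustration under simplifying assumptions (uniform background insertion rate, fixed non-overlapping windows, a Poisson null and a Bonferroni correction); the abstract does not say which statistical model the thesis uses, and the function names and parameters below are hypothetical.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def common_insertion_windows(positions, chrom_length, window=10_000, alpha=0.001):
    """Return (start, end, count, p_value) for windows with an excess of insertions.

    Assumes insertions are uniformly distributed under the null hypothesis,
    which real transposon data may not satisfy.
    """
    n_windows = math.ceil(chrom_length / window)
    lam = len(positions) / n_windows  # expected insertions per window
    counts = {}
    for pos in positions:
        idx = pos // window
        counts[idx] = counts.get(idx, 0) + 1

    hits = []
    for idx in sorted(counts):
        p_value = poisson_tail(counts[idx], lam)
        # Bonferroni correction over all windows on the chromosome.
        if p_value * n_windows < alpha:
            end = min((idx + 1) * window, chrom_length)
            hits.append((idx * window, end, counts[idx], p_value))
    return hits


# Made-up insertion coordinates: eight clustered hits plus scattered singletons.
clustered = [420_100, 421_000, 422_500, 423_300, 424_800, 426_000, 427_700, 429_900]
scattered = [5_000, 95_000, 150_500, 230_000, 310_750, 505_000,
             610_000, 700_250, 780_000, 850_000, 910_000, 990_000]
hits = common_insertion_windows(clustered + scattered, chrom_length=1_000_000)
for start, end, count, p_value in hits:
    print(start, end, count)
```

Only the window containing the clustered insertions passes the corrected threshold; isolated insertions are consistent with the background rate.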
37

Gupta, Kapil. "Combinatorial optimization and application to DNA sequence analysis." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26676.

Abstract:
Thesis (Ph.D)--Industrial and Systems Engineering, Georgia Institute of Technology, 2009.<br>Committee Chair: Lee, Eva K.; Committee Member: Barnes, Earl; Committee Member: Fan, Yuhong; Committee Member: Johnson, Ellis; Committee Member: Yuan, Ming. Part of the SMARTech Electronic Thesis and Dissertation Collection.
38

Anlind, Alice. "Improvments and evaluation of data processing in LC-MS metabolomics : for application in in vitro systems pharmacology." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-329971.

Abstract:
Resistance to established medicines is rapidly increasing, while the rate of discovery of new drugs and treatments has not increased during the last decades (Spiro et al. 2008). Systems pharmacology can be used to find new combinations or concentrations of established drugs and thereby find new treatments faster (Borisy et al. 2003). A recent study aimed to use high-resolution liquid chromatography–mass spectrometry (LC-MS) for in vitro systems pharmacology but encountered problems with unwanted variability and batch effects (Herman et al. 2017). This thesis builds on that work by improving the pipeline, comparing alternative methods and evaluating the methods used. The evaluation indicated that data quality was often not improved substantially by complex methods and pipelines. Instead, simpler methods such as binning for feature extraction performed best. In fact, many of the commonly used preprocessing methods proved to have negative or negligible effects on the resulting data quality. Finally, the recently introduced Optimal Orthonormal System for Discriminant Analysis (OOS-DA) for batch removal was found to be a good alternative to the more complex ComBat method.
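Binning for feature extraction, which the abstract reports as performing best, can be sketched as follows. This is a simplified illustration and not the thesis's pipeline: it bins on m/z only (ignoring retention time), and the bin width and peak values are made up.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.5):
    """Sum peak intensities into fixed-width m/z bins.

    peaks: iterable of (mz, intensity) pairs.
    Returns {bin_index: summed_intensity}; bin i covers [i*w, (i+1)*w).
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[math.floor(mz / bin_width)] += intensity
    return dict(binned)


def feature_matrix(samples, bin_width=0.5):
    """Align several binned spectra on the union of occupied bins.

    Returns (sorted_bin_indices, rows) with one row of intensities per sample.
    """
    binned = [bin_spectrum(peaks, bin_width) for peaks in samples]
    bins = sorted(set().union(*binned))
    rows = [[b.get(i, 0.0) for i in bins] for b in binned]
    return bins, rows


# Two made-up samples.
sample_a = [(180.06, 100.0), (180.31, 50.0), (180.62, 25.0), (255.23, 10.0)]
sample_b = [(180.10, 80.0), (255.40, 5.0)]
bins, rows = feature_matrix([sample_a, sample_b])
print(bins)
print(rows)
```

Every sample ends up with the same feature columns without any peak picking or alignment step, which is what makes the approach simple.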
39

Harley, Linda Rosemary. "The application of a knowledge based system to micro-electrode guided neurosurgery." Thesis, Available online, Georgia Institute of Technology, 2004, 2004. http://etd.gatech.edu/theses/available/etd-02042004-131540/unrestricted/harley%5Flinda%5Fr%5F200405%5Fms.pdf.

Abstract:
Thesis (M. S.)--Civil and Environmental Engineering, Georgia Institute of Technology, 2004.<br>Dr. Michael Hunter, Committee Member ; Dr. Alexander M. Puzrin, Committee Member ; Dr. Nelson Baker, Committee Chair. Includes bibliographical references.
40

Westrin, Karl Johan. "Methods for transcriptome reconstruction, with an application in Picea abies (L.) H. Karst." Licentiate thesis, KTH, Genteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290032.

Abstract:
Transcriptome reconstruction is an important component of the bioinformatic part of transcriptome studies. It is particularly interesting when a reference genome is missing, highly fragmented or incomplete, since in such situations a simple alignment (or mapping) would not necessarily tell the full story. One species with such a highly fragmented reference genome is the Norway spruce (Picea abies (L.) H. Karst.), a conifer that is very important for the Swedish economy. Given its long juvenile phase and irregular cone setting, the demand for cultivated seeds is larger than the supply. This motivates a desire to understand the transcriptomic biology behind cone setting in P. abies. This thesis presents an introduction to this situation and to the general biological and bioinformatic background, followed by two papers in which this is applied: Paper I introduces a novel de novo transcriptome assembler with a focus on recovering isoforms, and Paper II uses this assembler to detect connections between scaffolds (parts of the genome that could not previously be linked to each other) in the P. abies genome. Paper II also studies P. abies var. acrocona, a mutant with a shorter juvenile phase than the wild type, in order to detect how cone setting is initiated. From differential expression studies of both mRNA and miRNA, a number of genes potentially involved in cone setting in P. abies were found, along with a set of miRNAs that could be involved in their regulation.
41

De, Beer Tjaart Andries Petrus. "Development of a generic, structural bioinformatics information management system and its application to variation in foot-and-mouth disease virus proteins." Pretoria : [s.n.], 2009. http://upetd.up.ac.za/thesis/available/etd-05302009-130137.

42

Servant, Nicolas. "Analysis of chromosome conformation data and application to cancer." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066535/document.

Abstract:
The nuclear organization of chromatin is not random. Its structure is tightly controlled and follows a hierarchical model with several levels of organization and compaction. At large scale, each chromosome occupies its own territory within the nucleus. At finer resolution, a chromosome is subdivided into active or repressive compartments, characterized by a more or less compact chromatin state. At the megabase scale, this hierarchical organization can be further divided into topological domains (TADs), down to DNA loops that facilitate interactions between promoters and regulatory regions. In brief, and although the exact mechanisms remain to be determined, it has recently been shown that the spatial organization of chromatin in a normal cell plays a key role in gene regulation and expression. The organization into topological domains involves insulator protein complexes such as CTCF/cohesin. These factors act as barriers, restricting and favoring interactions between regulatory elements and genes within a domain while limiting interactions between domains. As a result, two regions belonging to the same topological domain can interact frequently, whereas two regions belonging to distinct domains have a very low probability of interaction. In cancer cells, the involvement of the epigenome and of the spatial organization of chromatin in tumor progression remains largely unexplored to date. Some recent studies have nevertheless shown that an alteration of DNA conformation can be associated with the activation of certain oncogenes. Even though the exact mechanisms are not yet known, this shows that chromatin organization is an important factor in tumorigenesis and, in some cases, helps explain the molecular mechanisms behind the deregulation of certain genes. Among the reported cases, an alteration of the insulating regions (or boundaries) between topological domains would allow regions that are normally spatially distant to come into contact, thereby promoting the activation of certain genes. A systematic characterization of the spatial conformation of cancer genomes could therefore improve our knowledge of cancer biology. High-throughput techniques for analyzing chromatin conformation are now widely used to characterize physical interactions between regions of the genome. Briefly, these techniques consist of fixing, digesting, and then ligating together two regions of the genome that are spatially close. The resulting chimeric DNA fragments can then be sequenced from their ends in order to count the number of times these regions were found in contact. Among the variants of these techniques, Hi-C combined with deep sequencing allows systematic exploration of these interactions at the genome scale, providing a detailed view of the three-dimensional organization of the chromatin of a cell population.<br>Chromatin is not randomly arranged within the nucleus. Instead, nuclear organization is tightly controlled at several levels of organization. Recent studies have explored how the genome is organized to ensure proper gene regulation within a constrained nuclear space. However, the impact of the epigenome, and in particular of the three-dimensional topology of chromatin, on cancer progression remains largely unexplored. As an example, recent studies have started to demonstrate that defects in the folding of the genome can be associated with oncogene activation. Although the exact mechanisms are not yet fully understood, this demonstrates that chromatin organization is an important factor in tumorigenesis, and that a systematic exploration of three-dimensional cancer genomes could improve our knowledge of cancer biology in the near future. High-throughput chromosome conformation capture methods are now widely used to map chromatin interactions within regions of interest or across the genome. The Hi-C technique, empowered by next-generation sequencing, was designed to explore intra- and inter-chromosomal contacts at the whole-genome scale and therefore offers detailed insights into the spatial arrangement of complete genomes. The aim of this project was to develop computational methods and tools that can extract relevant information from Hi-C data, in particular in a cancer-specific context. The presented work is divided into three parts. First, like many sequencing applications, the Hi-C technique generates a huge amount of data. Managing these data requires optimized bioinformatics workflows able to process them in reasonable time and space. To answer this need, we developed HiC-Pro, an optimized and flexible pipeline to process Hi-C data from raw sequencing reads to normalized contact maps. HiC-Pro maps reads, detects valid ligation products, and generates and normalizes intra- and inter-chromosomal contact maps. In addition, HiC-Pro is compatible with all current Hi-C-based protocols
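The step of turning valid ligation products into a contact map can be sketched for a single chromosome as below. This is a toy illustration, not HiC-Pro code: the function name, the pair layout and the bin size are assumptions, and it covers only intra-chromosomal counting, with no read mapping, filtering or normalization.

```python
def contact_map(pairs, chrom_length, bin_size):
    """Build a symmetric intra-chromosomal contact matrix.

    pairs: iterable of (pos_a, pos_b) coordinates of the two ends of each
    valid read pair on the same chromosome (assumed input layout).
    Returns a list of lists with raw (unnormalized) contact counts.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


# Made-up read pairs on a 4 Mb chromosome, binned at 1 Mb.
pairs = [
    (100_000, 150_000),
    (200_000, 1_200_000),
    (300_000, 1_900_000),
    (2_500_000, 3_999_999),
]
for row in contact_map(pairs, chrom_length=4_000_000, bin_size=1_000_000):
    print(row)
```

A real pipeline would follow this with a normalization step to correct for biases such as uneven coverage between bins.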
43

Raj, Kumar Praveen Kumar. "APPLICATION OF TRANSCRIPTOMICS TO ADDRESS QUESTIONS IN MOLECULAR BIOLOGY AND EVOLUTION." Miami University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=miami1410427259.

44

Veldsman, Werner Pieter. "SNP based literature and data retrieval." Thesis, University of the Western Cape, 2016. http://hdl.handle.net/11394/5345.

Abstract:
Magister Scientiae - MSc<br>Reference single nucleotide polymorphism (refSNP) identifiers are used to earmark SNPs in the human genome. These identifiers are often found in variant call format (VCF) files. RefSNPs can be useful as search terms when sourcing biomedical literature. In this thesis, the development of a bioinformatics software package is motivated, planned and implemented as a web application (http://sniphunter.sanbi.ac.za) with an application programming interface (API). The purpose is to allow scientists searching for relevant literature to query a database using refSNP identifiers and keywords assigned to scientific literature by the authors. Multiple queries can be launched simultaneously using either the web interface or the API. In addition, a VCF file parser was developed and packaged with the application to allow users to upload VCF files, extract information from them and write it to a file format that can be interpreted by the novel search engine created during this project. The parsing feature is seamlessly integrated with the web application's user interface, so the user is not expected to learn a scripting language. This multi-faceted software system, called SNiPhunter, aims to save researchers time during life sciences literature procurement by suggesting articles based on the number of times a reference SNP identifier is mentioned in an article. This allows the user to make a quantitative estimate of the relevance of an article. A second novel feature is the inclusion of the corresponding author's email address in the results returned to the user, which promotes communication between scientists. Moreover, links to external functional information are provided to allow researchers to examine annotations associated with their reference SNP identifier of interest. Standard information that is typically provided by other search engines, such as digital object identifiers and publication dates, is also included in the results returned to the user.<br>National Research Foundation (NRF) / The South African Research Chairs Initiative (SARChI)
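The two ideas in this abstract, extracting refSNP identifiers from a VCF file and ranking articles by how often an identifier is mentioned, can be sketched as follows. This is not SNiPhunter's code or API: the function names are hypothetical, the rs numbers and article texts are made up, and the only format fact relied on is that the third tab-separated VCF column holds semicolon-separated identifiers, with `.` for a missing value.

```python
import re

RSID = re.compile(r"^rs\d+$")


def refsnp_ids(vcf_lines):
    """Collect refSNP identifiers from the ID column of VCF data lines."""
    ids = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        for identifier in fields[2].split(";"):
            if RSID.match(identifier):
                ids.append(identifier)
    return ids


def rank_articles(articles, rsid):
    """Rank articles by the number of whole-word mentions of rsid.

    articles: {title: full_text}. Returns [(title, count)], highest first,
    omitting articles that never mention the identifier.
    """
    pattern = re.compile(r"\b" + re.escape(rsid) + r"\b")
    counts = [(title, len(pattern.findall(text))) for title, text in articles.items()]
    return sorted((tc for tc in counts if tc[1] > 0), key=lambda tc: -tc[1])


# Made-up VCF content and article texts.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\trs123\tA\tG\t50\tPASS\t.",
    "1\t22345\t.\tC\tT\t50\tPASS\t.",
    "2\t500\trs42;rs77\tG\tA\t60\tPASS\t.",
]
articles = {
    "Paper A": "We studied rs42. Carriers of rs42 differ; rs42 is common.",
    "Paper B": "One mention of rs42, while rs421 is a different variant.",
    "Paper C": "No variants are named here.",
}
print(refsnp_ids(vcf))
print(rank_articles(articles, "rs42"))
```

The word-boundary match keeps `rs42` from being counted inside longer identifiers such as `rs421`.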
45

Amela, Abellan Isaac. "Bioinformatics Approaches to Protein Interaction and Complexes: Application to Pathogen-Host Epitope Mimicry and to Fe-S Cluster Biogenesis Model." Doctoral thesis, Universitat Autònoma de Barcelona, 2013. http://hdl.handle.net/10803/125908.

Abstract:
Antigen/antibody interactions are among the most interesting kinds of protein interactions. The best way to prevent diseases caused by pathogens is through the use of vaccines. The advent of genomics enables genome-wide searches for new vaccine candidates, an approach called reverse vaccinology. The most common strategy for applying reverse vaccinology is the design of recombinant subunit vaccines, which usually generate a humoral immune response owing to B-cell epitopes in the pathogen's proteins. A major problem for this strategy is the identification of protective immunogenic proteins from the surfome of the pathogen. Epitope mimicry may lead to autoimmune phenomena related to several human diseases. Chapter I of this thesis describes a sequence-based computational analysis in which the BLASTP algorithm was used to compare databases of known linear B-cell epitopes and of surface-protein sequences of the main human respiratory bacterial pathogens against the human proteome. We found that none of the 7353 linear B-cell epitopes analyzed shares any region of sequence identity with human proteins capable of generating antibodies, and that only 1% of the 2175 exposed proteins analyzed contain a stretch of sequence shared with the human proteome. These findings suggest the existence of a mechanism to avoid autoimmunity. We also propose a strategy for corroborating, or warning about, the viability of a protein containing a given linear B-cell epitope as a vaccine candidate in reverse vaccinology studies. In short, epitopes without any sequence identity with human proteins should be good vaccine candidates, and vice versa. Protein docking is a computational method for predicting the best way in which proteins interact, but is it possible to identify the best solution of a docking program? The usual answer is the solution with the highest score, but interactions between proteins are dynamic processes, and the interaction regions are often wide enough to permit protein-protein interactions with different orientations and/or interaction energies. In some cases, as in a multimeric protein complex, several interaction regions are possible among the monomers. These dynamic processes involve interactions with surface displacements between the proteins that eventually lead to the functional configuration of the protein complex. Consequently, in many cases there is no single, static solution for the interaction between proteins; instead there are several configurations that may be important and should also be analyzed. To extract the most representative solutions from docking output, Chapter II of this thesis details the development of an unsupervised and automatic clustering application named DockAnalyse. This application is based on the existing DBSCAN clustering method, which searches for continuities among the clusters generated by the representation of the docking output data. The DBSCAN clustering method is very robust and, moreover, solves some of the inconsistency problems of classical clustering methods, such as the treatment of outliers and the need to define the number of clusters in advance. Through graphical and visual representations, DockAnalyse makes docking solutions easier to interpret and guides the user to the most representative ones. We have applied this new application to analyze several protein interactions and to model the dynamic interaction behavior of the proteins in a complex. DockAnalyse can also be used to describe interaction regions between proteins and thus to guide future flexible docking runs. The application (implemented in R) is open and accessible. The assembly of iron-sulfur clusters (ISCs) in eukaryotes involves interactions between different proteins, among them the protein Frataxin. Deficits in this protein have been associated with excess iron inside the mitochondria and with impaired ISC biogenesis, as Frataxin is postulated to act as the iron donor for ISC assembly in this organelle. A pronounced lack of Frataxin causes Friedreich's ataxia, a hereditary human neurodegenerative disease that mainly affects balance, coordination, the muscles and the heart. It is the most common autosomal recessive ataxia. Many similarities have been found between the human and yeast molecular mechanisms that involve Frataxin, which makes yeast a good model for studying this process. In yeast, the protein complex that forms the central assembly platform for the initial steps of ISC biogenesis is composed of the yeast Frataxin homolog, the Nfs1-Isd11 dimer and the protein Isu. It is generally accepted that protein function involves interactions with other protein partners, but in this case not enough is known about the structure of the protein complex and, therefore, about exactly how it functions. Chapter III of this thesis proposes a model of the ISC biogenesis protein complex in order to gain insight into structural details that could explain its biological function. To achieve this goal, several bioinformatics tools, modeling techniques and protein docking programs were used. As a result, the structure of the protein complex was modeled, and the dynamic behavior of its components, along with that of the iron and sulfur atoms required for ISC assembly, was suggested. These hypotheses might help to better understand the function and molecular properties of Frataxin as well as those of its partner proteins in the complex.
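The two DBSCAN properties this abstract highlights, explicit handling of outliers and no need to fix the number of clusters in advance, can be seen in a small pure-Python sketch. It is a generic textbook-style DBSCAN, not DockAnalyse (which the abstract says is implemented in R); representing each docking solution as a 3D coordinate, and the `eps` and `min_pts` values, are assumptions for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbours(i):
        # A point counts as its own neighbour, as in the usual formulation.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Made-up positions of docked partners: two dense groups and one outlier.
solutions = [
    (0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.5, 0.5, 0.0),
    (10.0, 10.0, 10.0), (10.5, 10.0, 10.0), (10.0, 10.5, 10.0),
    (30.0, 30.0, 30.0),
]
print(dbscan(solutions, eps=1.0, min_pts=3))
```

The number of clusters falls out of the density parameters, and the isolated solution is reported as noise instead of being forced into a group.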
46

Gómez Álvarez, Josep. "Dolphin and whale: development, evaluation and application of novel bioinformatics tools for metabolite profiling in high throughput 1H-NMR analysis." Doctoral thesis, Universitat Rovira i Virgili, 2016. http://hdl.handle.net/10803/399578.

Abstract:
Metabolite profiling is the most challenging task in NMR spectral analysis. It aims to understand the biological processes occurring at a given moment by identifying and quantifying the metabolites present in complex NMR mixtures. An NMR spectrum is composed of resonances from a huge number of metabolites; these resonances often overlap with one another, shift position depending on the sample pH, and can be masked by signals from macromolecules. All these problems hinder metabolite identification and quantification, so obtaining a curated metabolite profile of a sample can be a major challenge even for expert users. In this context, this thesis set out to provide automation and user-friendly interactive functions for NMR metabolite profiling, improving the quality of the results and reducing analysis time. To that end, a set of algorithms was implemented and packaged into two programs, Dolphin and Whale.
47

Ajawatanawong, Pravech. "Mine the Gaps : Evolution of Eukaryotic Protein Indels and their Application for Testing Deep Phylogeny." Doctoral thesis, Uppsala universitet, Systematisk biologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-220727.

Abstract:
Insertions/deletions (indels) are potentially powerful evolutionary markers, but little is known about their evolution and few tools exist to study them effectively. To address this, I developed SeqFIRE, a tool for automated identification and extraction of indels from protein multiple sequence alignments. The program also extracts conserved alignment blocks, thus covering all major steps in preparing multiple sequence alignments for phylogenetic analysis. I then used SeqFIRE to build an indel database, using 299 single copy proteins from a broad taxonomic sampling of mainly multicellular eukaryotes. A total of 4,707 indels were extracted, of which 901 are simple (one genetic event) and 3,806 are complex (multiple events). The most abundant indels are single amino acid simple indels. Indel frequency decreases exponentially with length and shows a linear relationship with host protein size. Singleton indels reveal a strong bias towards insertions (2.31 times as frequent as deletions on average). These analyses also identify 43 indels marking major clades in Plantae and Fungi (clade defining indels or CDIs), but none for Metazoa. To study the 3,806 complex indels, I first classified them by number of states. Analysis of the 2-state complex and simple indels combined ("bi-state indels") confirms that insertions are over 2.5 times as frequent as deletions. Three-quarters of the complex indels had three to nine states ("slightly complex indels"). A tree-assisted search method was developed, allowing me to identify 1,010 potential CDIs supporting all examined major branches of Plantae and Fungi. Forty-two proteins were also found to host complex indel CDIs for the deepest branches of Metazoa. After expanding the taxon set for these proteins, I identified a total of 49 non-bilaterian specific CDIs. Parsimony analysis of these indels places Ctenophora as sister taxon to all other Metazoa including Porifera. Six CDIs were also found placing Placozoa as sister to Bilateria. I conclude that slightly complex indels are a rich source of CDIs, and my tree-assisted search strategy could be automated and implemented in the program SeqFIRE to facilitate their discovery. This will have important implications for mining the phylogenomic content of the vast resource of protist genome data soon to become available.
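The indel identification and state counting described in this abstract can be illustrated with a minimal Python sketch. It is not SeqFIRE's implementation: the function name `find_indel_regions`, the `-` gap character, and the definition of a "state" as a distinct presence/absence pattern across a gapped region are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_indel_regions(alignment: List[str], gap: str = "-") -> List[Tuple[int, int, int]]:
    """Return (start, end, n_states) for each run of gap-containing columns.

    start is inclusive and end is exclusive (0-based column indices).
    n_states counts the distinct presence/absence patterns seen in the
    region, which is an illustrative stand-in for "number of states".
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # A column is "gapped" if at least one sequence has a gap there.
    gapped = [any(row[col] == gap for row in alignment) for col in range(length)]

    regions = []
    col = 0
    while col < length:
        if not gapped[col]:
            col += 1
            continue
        start = col
        # Extend the region over consecutive gapped columns.
        while col < length and gapped[col]:
            col += 1
        end = col
        # Reduce each sequence to a residue-present / residue-absent pattern.
        states = {tuple(ch != gap for ch in row[start:end]) for row in alignment}
        regions.append((start, end, len(states)))
    return regions
```

For the toy alignment `["MKT-AYL", "MKTGAYL", "MK--AYL"]`, the function reports a single region covering columns 2 and 3 with three states. Under the abstract's terminology, a region with two states would correspond to a "bi-state" indel and one with three to nine states to a "slightly complex" indel.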
48

Venkatesan, Aravind. "Application of Semantic Web Technology to Establish Knowledge Management and Discovery in the Life Sciences." Doctoral thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for biologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-24074.

Abstract:
The last three decades have seen the successful development of many high-throughput technologies that have revolutionised and transformed biological research. The application of these technologies has generated large quantities of data, enabling new approaches to analyzing and integrating these data; these approaches now constitute the field of Systems Biology. Systems Biology aims to enable a holistic understanding of a biological system by mapping interactions between all the biochemical components within the system. This requires integration of interdisciplinary data and knowledge to comprehensively explore the various biological processes of a system. Ontologies in biology (bio-ontologies) and the Semantic Web are playing an increasingly important role in the integration of data and knowledge by offering an explicit, unambiguous and rich representation mechanism. This increased influence led to the proposal of the Semantic Systems Biology paradigm to complement the techniques currently used in Systems Biology. Semantic Systems Biology provides a semantic description of the knowledge about biological systems as a whole, facilitating data integration, knowledge management, reasoning and querying. However, this approach is still a typical product of technology push, offering potential users access to the new technology. This doctoral thesis presents the work performed to bring Semantic Systems Biology closer to biological domain experts. The work covers a variety of aspects of Semantic Systems Biology. The Gene eXpression Knowledge Base is a resource that captures knowledge on gene expression. The knowledge base exploits the power of seamless data integration offered by Semantic Web technologies to build large networks of varied datasets, capable of answering complex biological questions. The knowledge base is the result of active collaboration with the Gastrin Systems Biology group here at the Norwegian University of Science and Technology. This resource was customised by integrating additional data sets at users' request. Additionally, the utility of the knowledge base is demonstrated by the conversion of biological questions into computable queries. The joint analysis of the query results has helped in filling knowledge gaps in the biological system under study. Biologists often use different bioinformatics tools to conduct complex biological analyses. However, these tools often pose a steep learning curve for life science researchers. Therefore, the thesis describes ONTO-ToolKit, a plug-in that allows biologists to exploit bio-ontology based analysis as part of biological workflows in Galaxy. ONTO-ToolKit allows users to perform ontology-based analysis to improve the depth of their overall analysis. Visualisation plays a key role in helping users understand and grasp the knowledge represented in bio-ontologies. To this end, OLSVis, a web application, was developed to make ontology browsing intuitive and flexible. Finally, the steps needed to further advance the Semantic Systems Biology approach are discussed.
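The idea of converting a biological question into a computable query can be sketched in plain Python. This is only an illustration, not the thesis's knowledge base or query interface: Semantic Web resources are commonly queried with SPARQL, and the sketch below merely imitates its basic triple-pattern matching. The `query` function, the `TRIPLES` data, and every identifier in them (`geneA`, `proteinA`, `processX`, and so on) are hypothetical placeholders.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]

# Hypothetical placeholder facts as (subject, predicate, object) triples.
TRIPLES: List[Triple] = [
    ("geneA", "encodes", "proteinA"),
    ("geneB", "encodes", "proteinB"),
    ("proteinA", "participates_in", "processX"),
    ("proteinB", "participates_in", "processY"),
    ("proteinA", "interacts_with", "proteinB"),
]


def query(triples: Sequence[Triple], patterns: Sequence[Triple]) -> List[Dict[str, str]]:
    """Return all variable bindings that satisfy every pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    Patterns are joined one after another, like a conjunctive graph query.
    """
    solutions: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                matched = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check it against an earlier binding.
                        if candidate.setdefault(term, value) != value:
                            matched = False
                            break
                    elif term != value:
                        matched = False
                        break
                if matched:
                    extended.append(candidate)
        solutions = extended
    return solutions


# "Which genes encode a protein that participates in processX?"
QUESTION = [
    ("?gene", "encodes", "?protein"),
    ("?protein", "participates_in", "processX"),
]
```

Calling `query(TRIPLES, QUESTION)` on the placeholder data yields a single binding, with `?gene` bound to `geneA` and `?protein` bound to `proteinA`. A production knowledge base would delegate this matching to a triple store rather than scanning a list.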
49

Verma, Rajni. "Development and Application of Novel Bioinformatics and Computational Modeling Tools for Protein Engineering: Advanced Computational Tools for Protein Engineering." Bremen: IRC-Library, Information Resource Center der Jacobs University Bremen, 2013. http://d-nb.info/103526966X/34.

50

Dhawan, Manik. "Application of Committee k-NN Classifiers for Gene Expression Profile Classification." University of Akron / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=akron1227547457.
