Dissertations / Theses: 'Gene ontology'

1

Speer, Nora. "Funktionelles Clustering von Genen mit der Gene Ontology /." Berlin : Logos-Verl, 2006. http://deposit.d-nb.de/cgi-bin/dokserv?id=2875270&prov=M&dok_var=1&dok_ext=htm.

2

Aleksakhin, Vladyslav. "Visualization of gene ontology and cluster analysis results." Thesis, Linnéuniversitetet, Institutionen för datavetenskap, fysik och matematik, DFM, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-21248.

Abstract:

The purpose of the thesis is to develop a new visualization method for Gene Ontologies and hierarchical clustering. These are both important tools in biology and medicine to study high-throughput data such as transcriptomics and metabolomics data. Enrichment of ontology terms in the data is used to identify statistically overrepresented ontology terms that give insight into relevant biological processes or functional modules. Hierarchical clustering is a standard method to analyze and visualize data to find relatively homogeneous clusters of experimental data points. Both methods support the analysis of the same data set, but are usually considered independently. However, a combined view is often desirable, such as visualizing a large data set in the context of an ontology while taking a clustering of the data into account. The result of the current work is a user-friendly program that combines two different views for analysing Gene Ontology and cluster results simultaneously. To make exploration of such large data possible, we developed a new visualization approach.
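
As a hedged illustration of the enrichment step mentioned in this abstract: over-representation of an ontology term is commonly scored with a one-sided hypergeometric test. The sketch below assumes that test (the abstract does not say which statistic the thesis uses), and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value for term over-representation.

    population: number of genes in the background set
    annotated:  background genes annotated with the ontology term
    sample:     number of genes in the study set (e.g. one cluster)
    hits:       study-set genes annotated with the term

    Returns P(X >= hits) when `sample` genes are drawn without replacement.
    """
    upper = min(annotated, sample)  # largest possible number of hits
    total = comb(population, sample)  # all possible study sets of this size
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

With a background of 10 genes, 4 of them annotated, and a study set of 3 genes containing 2 annotated ones, `enrichment_p_value(10, 4, 3, 2)` evaluates to 1/3. In practice the p-values for many terms would also need a multiple-testing correction, which this sketch leaves out.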

3

Macholan, Robert Daniel. "Analysis of Gene Expression Data for Gene Ontology Based Protein Function Prediction." University of Akron / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=akron1301529255.

4

Kim, Jong Woo. "A Novel Approach to Ontology Management." Digital Archive @ GSU, 2010. http://digitalarchive.gsu.edu/cis_diss/39.

Abstract:

The term ontology is defined as the explicit specification of a conceptualization. While much of the prior research has focused on technical aspects of ontology management, little attention has been paid to the investigation of issues that limit the widespread use of ontologies and the evaluation of the effectiveness of ontologies in improving task performance. This dissertation addresses this void through the development of approaches to ontology creation, refinement, and evaluation. This study follows a multi-paper model focusing on ontology creation, refinement, and its evaluation. The first study develops and evaluates a method for ontology creation using knowledge available on the Web. The second study develops a methodology for ontology refinement through pruning and empirically evaluates the effectiveness of this method. The third study investigates the impact of an ontology in use case modeling, which is a complex, knowledge intensive organizational task in the context of IS development. The three studies follow the design science research approach, and each builds and evaluates IT artifacts. These studies contribute to knowledge by developing solutions to three important issues in the effective development and use of ontologies.

5

King, James Lowell. "Gene Ontology-Guided Force-Directed Visualization of Protein Interaction Networks." Diss., NSUWorks, 2019. https://nsuworks.nova.edu/gscis_etd/1066.

Abstract:

Protein interaction data is being generated at unprecedented rates thanks to advancements made in high throughput techniques such as mass spectrometry and DNA microarrays. Biomedical researchers, operating under budgetary constraints, have found it difficult to scale their efforts to keep up with the ever-increasing amount of available data. They often lack the resources and manpower required to analyze the data using existing methodologies. These research deficiencies impede our ability to understand diseases, delay the advancement of clinical therapeutics, and ultimately cost lives. One of the most commonly used techniques to analyze protein interaction data is the construction and visualization of protein interaction networks. This research investigated the effectiveness and efficiency of novel domain-specific algorithms for visualizing protein interaction networks. The existing domain-agnostic algorithms were compared to the novel algorithms using several performance, aesthetic, and biological relevance metrics. The graph drawing algorithms proposed here introduced novel domain-specific forces to the existing force-directed graph drawing algorithms. The innovations include an attractive force and graph coarsening policy based on semantic similarity, and a novel graph refinement algorithm. These experiments have demonstrated that the novel graph drawing algorithms consistently produce more biologically meaningful layouts than the existing methods. Aggregated over the 480 tests performed, and quantified using the Biological Evaluation Percentage metric defined in the Methodology chapter, the novel graph drawing algorithms created layouts that are 237 percent more biologically meaningful than the next best algorithm. This improvement came at the cost of additional edge crossings and smaller minimum angles between adjacent edges, both of which are undesirable aesthetics. The aesthetic and performance tradeoffs are experimentally quantified in this study, and dozens of algorithmically generated graph drawings are presented to visually illustrate the benefits of the novel algorithms. The graph drawing algorithms proposed in this study will help biomedical researchers to more efficiently produce high quality interactive protein interaction network drawings for improved discovery and communication.

6

Rego, Fernanda Orpinelli Ramos do. "Modelagem computacional de famílias de proteínas microbianas relevantes para produção de bioenergia." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/95/95131/tde-28082015-222248/.

Abstract:

Hidden Markov models (HMMs) are essential tools for the automated annotation of protein sequences. For many years, protein family resources based on HMMs have been made available to the scientific community (e.g. TIGRfams). Much effort has also been devoted to the automated generation of protein family HMMs (e.g. PANTHER). However, manually curated protein family HMMs remain the gold standard for genome annotation. In this context, the main objective of this work was to generate approximately 80 HMM-based families of microbial proteins relevant to bioenergy production. To create the HMMs, we followed a manual curation protocol developed in this work. We start from a protein with an experimentally proven function that is associated with a publication and has been manually annotated with Gene Ontology terms created by the MENGO (Microbial ENergy Gene Ontology) project. The next steps consist of (1) defining selection criteria for including members in the family; (2) searching for members via BLAST; (3) generating the multiple alignment (MUSCLE 3.7) and the HMM (HMMER 3.0); (4) analyzing the results and iterating the process, with the preliminary HMM used in additional searches; (5) defining a cutoff score for the final HMM; (6) validating each model individually using tests against NCBI's NR database. The main deliverables of this work are 74 manually curated HMMs, available on the project website (http://mengofams.lbi.iq.usp.br/), where the models can be searched and downloaded; a detailed protocol for the manual curation of protein family HMMs; and a list of proteins that are candidates for reannotation.

7

Helgadóttir, Hanna Sigrún. "Using semantic similarity measures across Gene Ontology to predict protein-protein interactions." Thesis, University of Skövde, School of Humanities and Informatics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-971.

Abstract:

Living cells are controlled by proteins and genes that interact through complex molecular pathways to achieve a specific function. Therefore, determination of protein-protein interaction is fundamental for the understanding of the cell’s lifecycle and functions. The function of a protein is also largely determined by its interactions with other proteins. The amount of protein-protein interaction data available has multiplied by the emergence of large-scale technologies for detecting them, but the drawback of such measures is the relatively high amount of noise present in the data. It is time consuming to experimentally determine protein-protein interactions and therefore the aim of this project is to create a computational method that predicts interactions with high sensitivity and specificity. Semantic similarity measures were applied across the Gene Ontology terms assigned to proteins in S. cerevisiae to predict protein-protein interactions. Three semantic similarity measures were tested to see which one performs best in predicting such interactions. Based on the results, a method that predicts function of proteins in connection with connectivity was devised. The results show that semantic similarity is a useful measure for predicting protein-protein interactions.
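
The abstract does not name the three semantic similarity measures that were tested. As a minimal sketch of the general idea, the code below assumes Resnik's measure, which scores two terms by the information content of their most informative common ancestor. The toy ontology, the term probabilities, and all function names are hypothetical and are not taken from the thesis.

```python
import math

# Hypothetical toy ontology: each term maps to its set of parent terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical probability that a gene product is annotated with a term
# or with any of its descendants.
PROB = {
    "root": 1.0,
    "binding": 0.5,
    "catalysis": 0.5,
    "dna_binding": 0.25,
    "rna_binding": 0.125,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Rarer terms carry more information: IC(t) = -log2 p(t)."""
    return -math.log2(PROB[term])


def resnik(term_a, term_b):
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


def protein_similarity(terms_a, terms_b):
    """Best-pair similarity between two proteins' annotation sets."""
    return max(resnik(a, b) for a in terms_a for b in terms_b)
```

Here `resnik("dna_binding", "rna_binding")` is 1.0 because the two terms share the ancestor `binding`, whereas `resnik("dna_binding", "catalysis")` is 0 because they share only the root. A pair of proteins could then be predicted to interact when `protein_similarity` exceeds a chosen threshold. Taking the maximum over term pairs is one of several possible ways to combine scores.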

8

Macmullen, W. John Marchionini Gary. "Contextual analysis of variation and quality in human-curated gene ontology annotations." Chapel Hill, N.C. : University of North Carolina at Chapel Hill, 2007. http://dc.lib.unc.edu/u?/etd,774.

Abstract:

Thesis (Ph. D.)--University of North Carolina at Chapel Hill, 2007. Title from electronic title page (viewed Dec. 18, 2007). "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the School of Information and Library Science." Discipline: Information and Library Science; Department/School: Information and Library Science, School of.

9

Welter, Danielle. "Investigating “Gene Ontology”-based semantic similarity in the context of functional genomics." Thesis, Cardiff University, 2011. http://orca.cf.ac.uk/14292/.

Abstract:

Gene functional annotations are an essential part of knowledge discovery in the analysis of large datasets, with the Gene Ontology [Ashburner et al., 2000] as the de facto standard for such annotations. A considerable number of approaches for quantifying functional similarity between gene products based on the semantic similarity between their annotations have been developed, but little guidance exists as to which of these measures are the most appropriate for different purposes. This was addressed here by comparing the performances of a number of similarity measures and associated parameters. This comparison provided some interesting new insights as well as confirming emerging trends from the literature. There is also a pressing need for novel ways of applying these measures to facilitate the functional analysis of lists of gene products. We developed a novel algorithm, FuSiGroups, to group GO terms based on their semantic similarity and genes based on their functional similarity. This two-fold grouping results in groups of not only functionally similar genes but also an associated set of related GO terms that characterise a single functional aspect relating the genes in the group, which facilitates analysis by creating more coherent groups. Each gene can belong to multiple groups, so the groups more accurately reflect the complexity of biological reality than clusters generated using traditional approaches. FuSiGroups was tested on a number of scenarios and in each case, successfully generated biologically relevant groups, identifying the key functional aspects of the dataset. The algorithm also managed to eliminate genes that were functionally unrelated to the bulk of the dataset and distinguish between different biological pathways. Although dataset size is currently a limiting factor, with smaller datasets performing the best, FuSiGroups has been demonstrated as a promising approach for the functional analysis of gene products.

10

Kharsikar, Saket. "A GENE ONTOLOGY BASED COMPUTATIONAL APPROACH FOR THE PREDICTION OF PROTEIN FUNCTIONS." University of Akron / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=akron1187026388.

11

Sasazaki, Mariana Yuri. "Infraestrutura computacional para avaliação da similaridade funcional composta entre microRNAs baseada em ontologias." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/95/95131/tde-02112014-133658/.

Abstract:

MicroRNAs (miRNAs) are small non-coding RNAs that act mainly as post-transcriptional silencers, negatively regulating gene expression by inhibiting the translation of target messenger RNAs. Increasing evidence shows that such molecules play critical roles in many important biological processes. Since there are no miRNA term annotations in the Gene Ontology (GO), nor a reference database with functional annotations of miRNAs, there is no established standard for directly calculating the functional similarity between miRNAs. However, the existence of miRNA target gene databases, such as TarBase, and of miRNA-disease association databases, such as HMDD, allows us to infer the functional similarity of miRNAs indirectly, through the analysis of their target genes in GO or of their related diseases in the MeSH ontology. Moreover, according to the structure of the miRNA ontology OMIT, a miRNA can also be annotated with other information, such as whether it acts as an oncogene or a tumor suppressor, the organism to which it belongs, the type of experiment in which it was found, and its associations with diseases, target genes, proteins and pathological events. Thus, miRNA similarity can be inferred from the combination of a broad set of information contained in the annotations, making use of the available information to define a composite functional similarity. In this study, we propose the creation and application of a method called CFSim, applied to OMIT, that uses the disease ontology MeSH and the gene ontology GO to compute the similarity between two miRNAs, together with information contained in their annotations. We validated our method by comparison with the functional similarity inferred from miRNA families, and the results showed that our method is effective in the sense that the similarity between miRNAs in the same family is greater than the similarity between miRNAs from distinct families. Furthermore, in comparison with existing functional similarity methods in the literature, CFSim showed better results. Finally, to make the use of the proposed method feasible, an environment was designed and implemented containing the necessary infrastructure so that researchers can add data from new discoveries, look up information about a particular miRNA, and calculate the similarity between two miRNAs based on the proposed method.

12

Xue, Lin. "A VISUALIZATION TOOL FOR CROSS-EXPERIMENT GENE EXPRESSION ANALYSIS OF C. ELEGANS." UKnowledge, 2007. http://uknowledge.uky.edu/gradschool_theses/472.

Abstract:

Forty-six genomic gene expression studies of the free-living soil nematode C. elegans have been published. To facilitate exploratory analysis of those studies, we constructed a database containing all the published C. elegans expression datasets. A Perl CGI program, called Microarray Analysis Display (MAdisplay), allows gene expression clustergrams of any combination of entered genes and datasets to be viewed (http://elegans.uky.edu/gl/madisplay). Perl programs were used to preprocess the raw data from different sources into a common format and to transform the data to display the expression changes relative to each experiment's controls. Three hundred lists of genes from figures and tables were extracted from the publications and made available in the GeneLists database, which also contains Gene Ontology and KEGG gene lists. We used these tools to examine in a systematic fashion the mean expression of gene lists in the set of microarray and SAGE experiments. Seventy-nine percent of publication-derived gene lists show a strong expression change (p-value < 0.001) in more than one experiment, with the median being fourteen out of the 127 experiments that are derived from the forty-six publications. This indicates that groups of genes identified in one publication typically show an expression effect in many other experiments.

13

NOTARO, MARCO. "HIERARCHICAL ENSEMBLE METHODS FOR ONTOLOGY-BASED PREDICTIONS IN COMPUTATIONAL BIOLOGY." Doctoral thesis, Università degli Studi di Milano, 2019. http://hdl.handle.net/2434/606185.

Abstract:

L'annotazione standardizzata di entità biologiche, quali geni e proteine, ha fortemente promosso l'organizzazione dei concetti biologici in vocabolari controllati, cioè ontologie che consentono di indicizzare in modo coerente le relazioni tra le diverse classi funzionali organizzate secondo una gerarchia predefinita. Esempi di ontologie biologiche in cui i termini funzionali sono strutturati secondo un grafo diretto aciclico (DAG) sono la Gene Ontology (GO) e la Human Phenotype Ontology (HPO). Tali tassonomie gerarchiche vengono utilizzate dalla comunità scientifica rispettivamente per sistematizzare le funzioni proteiche di tutti gli organismi viventi dagli Archea ai Metazoa e per categorizzare le anomalie fenotipiche associate a malattie umane. Tali bio-ontologie, offrendo uno spazio di classificazione ben definito, hanno favorito lo sviluppo di metodi di apprendimento per la predizione automatizzata della funzione delle proteine e delle associazioni gene-fenotipo patologico nell'uomo. L'obiettivo di tali metodologie consiste nell'“indirizzare” la ricerca “in-vitro” per favorire una riduzione delle spese ed un uso più efficace dei fondi destinati alla ricerca. Dal punto di vista dell'apprendimento automatico il problema della predizione della funzione delle proteine o delle associazioni gene-fenotipo patologico nell'uomo può essere modellato come un problema di classificazione multi-etichetta strutturato, in cui le predizioni associate ad ogni esempio (i.e., gene o proteina) sono sotto-grafi organizzati secondo una determinata struttura (albero o DAG). A causa della complessità del problema di classificazione, ad oggi l'approccio di predizione più comunemente utilizzato è quello “flat”, che consiste nell'addestrare un classificatore separatamente per ogni termine dell'ontologia senza considerare le relazioni gerarchiche esistenti tra le classi funzionali. 
L'utilizzo di questo approccio è giustificato non soltanto dal fatto di ridurre la complessità computazionale del problema di apprendimento, ma anche dalla natura “instabile” dei termini che compongono l'ontologia stessa. Infatti tali termini vengono aggiornati mensilmente mediante un processo curato da esperti che si basa sia sulla letteratura scientifica biomedica che su dati sperimentali ottenuti da esperimenti eseguiti “in-vitro” o “in-silico”. In questo contesto, in letteratura sono stati proposti due classi generali di classificatori. Da una parte, si collocano i metodi di apprendimento automatico che predicono le classi funzionali in modo “flat”, ossia senza esplorare la struttura intrinseca dello spazio delle annotazioni. Dall'altra parte, gli approcci gerarchici che, considerando esplicitamente le relazioni gerarchiche fra i termini funzionali dell'ontologia, garantiscono che le annotazioni predette rispettino la “true-path-rule”, la regola biologica che governa le ontologie. Nell'ambito dei metodi gerarchici, in letteratura sono stati proposti due diverse categorie di approcci. La prima si basa su metodi kernelizzati per predizioni con output strutturato, mentre la seconda su metodi di ensemble gerarchici. Entrambi questi metodi presentano alcuni svantaggi. I primi sono computazionalmente pesanti e non scalano bene se applicati ad ontologie biologiche. I secondi sono stati per la maggior parte concepiti per tassonomie strutturate ad albero, e quei pochi approcci specificatamente progettati per ontologie strutturate secondo un DAG, sono nella maggioranza dei casi incapaci di migliorare le performance di predizione dei metodi “flat”. Per superare queste limitazioni, nel presente lavoro di tesi si sono proposti dei nuovi metodi di ensemble gerarchici capaci di fornire predizioni consistenti con la struttura gerarchica dell'ontologia. 
Tali approcci, da un lato estendono precedenti metodi originariamente sviluppati per ontologie strutturate ad albero ad ontologie organizzate secondo un DAG e dall'altro migliorano significativamente le predizioni rispetto all'approccio “flat” indipendentemente dalla scelta del tipo di classificatore utilizzato. Nella loro forma più generale, gli approcci di ensemble gerarchici sono altamente modulari, nel senso che adottano una strategia di apprendimento a due passi. Nel primo passo, le classi funzionali dell'ontologia vengono apprese in modo indipendente l'una dall'altra, mentre nel secondo passo le predizioni “flat” vengono combinate opportunamente tenendo conto delle gerarchia fra le classi ontologiche. I principali contributi introdotti nella presente tesi sono sia metodologici che sperimentali. Da un punto di vista metodologico, sono stati proposti i seguenti nuovi metodi di ensemble gerarchici: a) HTD-DAG (Hierarchical Top-Down per tassonomie DAG strutturate); b) TPR-DAG (True-Path-Rule per DAG) con diverse varianti algoritmiche; c) ISO-TPR (True-Path-Rule con Regressione Isotonica), un nuovo algoritmo gerarchico che combina la True-Path-Rule con metodi di regressione isotonica. Per tutti i metodi di ensemble gerarchici è stato dimostrato in modo formale la coerenza delle predizioni, cioè è stato provato come gli approcci proposti sono in grado di fornire predizioni che rispettano le relazioni gerarchiche fra le classi. 
Da un punto di vista sperimentale, risultati a livello dell'intero genoma di organismi modello e dell'uomo ed a livello della totalità delle classi incluse nelle ontologie biologiche mostrano che gli approcci metodologici proposti: a) sono competitivi con gli algoritmi di predizione output strutturata allo stato dell'arte; b) sono in grado di migliorare i classificatori “flat”, a patto che le predizioni fornite dal classificatore non siano casuali; c) sono in grado di predire nuove associazioni tra geni umani e fenotipi patologici, un passo cruciale per la scoperta di nuovi geni associati a malattie genetiche umane e al cancro; d) scalano bene su dataset costituiti da decina di migliaia di esempi (i.e., proteine o geni) e su tassonomie costituite da migliaia di classi funzionali. Infine, i metodi proposti in questa tesi sono stati implementati in una libreria software scritta in linguaggio R, HEMDAG (Hierarchical Ensemble Methods per DAG), che è pubblica, liberamente scaricabile e disponibile per i sistemi operativi Linux, Windows e Macintosh. The standardized annotation of biomedical related objects, often organized in dedicated catalogues, strongly promoted the organization of biological concepts into controlled vocabularies, i.e. ontologies by which related terms of the underlying biological domain are structured according to a predefined hierarchy. Indeed large ontologies have been developed by the scientific community to structure and organize the gene and protein taxonomy of all the living organisms from Archea to Metazoa, i.e. the Gene Ontology, or human specific ontologies, such as the Human Phenotype Ontology, that provides a structured taxonomy of the abnormal human phenotypes associated with diseases. 
These ontologies, offering a coded and well-defined classification space for biological entities such as genes and proteins, favor the development of machine learning methods able to predict features of biological objects like the association between a human gene and a disease, with the aim to drive wet lab research allowing a reduction of the costs and a more effective usage of the available research funds. Despite the soundness of the aforementioned objectives, the resulting multi-label classification problems raise so complex machine learning issues that until recently the far common approach was the “flat” prediction, i.e. simply training a classifier for each term in the controlled vocabulary and ignoring the relationships between terms. This approach was not only justified by the need to reduce the computational complexity of the learning task, but also by the somewhat “unstable” nature of the terms composing the controlled vocabularies, because they were (and are) updated on a monthly basis in a process performed by expert curators and based on biomedical literature, and wet and in-silico experiments. In this context, two main general classes of classifiers have been proposed in literature. On the one hand, “hierarchy-unaware” learning methods predict labels in a “flat” way without exploiting the inherent structure of the annotation space. On the other hand, “hierarchy-aware” learning methods can improve the accuracy and the precision of the predictions by considering the hierarchical relationships between ontology terms. Moreover these methods can guarantee the consistency of the predicted labels according to the “true path rule”, that is the biological and logical rule that governs the internal coherence of biological ontologies. 
To properly handle the hierarchical relationships linking the ontology terms, two main classes of structured output methods have been proposed in literature: the first one is based on kernelized methods for structured output spaces, the second on hierarchical ensemble methods for ontology-based predictions. However both these approaches suffer of significant drawbacks. The kernel-based methods for structured output space are computationally intensive and do not scale well when applied to complex multi-label bio-ontologies. Most hierarchical ensemble methods have been conceived for tree-structured taxonomies and the few ones specifically developed for the prediction in DAG-structured output spaces are, in most cases, unable to improve prediction performances over flat methods. To overcome these limitations, in this thesis novel “ontology-aware” ensemble methods have been developed, able to handle DAG-structured ontologies, leveraging previous results obtained with “true-path-rule”-based hierarchical learning algorithms. These methods are highly modular in the sense that they adopt a “two-step” learning strategy: in the first step they learn separately each term of the ontology using flat methods, and in the second they properly combine the flat predictions according to the hierarchy of the classes. The main contributions of this thesis are both methodological and experimental. From a methodological standpoint, novel hierarchical ensemble methods are proposed, including: a) HTD (Hierarchical Top-Down algorithm for DAG structured ontologies); b) TPR-DAG (True Path Rule ensemble for DAG) with several variants; c) ISO-TPR, a novel ensemble method that combines the True Path Rule approach with Isotonic Regression. For all these methods a formal proof of their consistency, i.e. the guarantee of providing predictions that “respect” the hierarchical relationships between classes, is provided. 
From an experimental standpoint, extensive genome- and ontology-wide results show that the proposed methods: a) are competitive with state-of-the-art prediction algorithms; b) are able to improve flat machine learning classifiers, if the base learners can provide non-random predictions; c) are able to predict new associations between genes and human abnormal phenotypes, a crucial step to discover novel genes associated with human diseases ranging from genetic disorders to cancer; d) scale nicely with large datasets and bio-ontologies. Finally, HEMDAG, a novel R library implementing the proposed hierarchical ensemble methods, has been developed and publicly released.
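The two-step strategy described in this abstract (flat per-term predictions first, then a hierarchy-aware combination) can be illustrated with a small top-down correction on a DAG. This is only a sketch under assumptions: the rule used here, capping each term's score at the smallest corrected score of its parents, is one simple way to obtain hierarchy-consistent predictions; it is not taken from the thesis text and is not the HEMDAG API. The function name `top_down_correct` and the toy ontology are hypothetical.

```python
from graphlib import TopologicalSorter


def top_down_correct(flat_scores, parents):
    """Make per-term scores consistent with a DAG.

    flat_scores: dict term -> score in [0, 1] from independent ("flat") classifiers.
    parents: dict term -> set of parent terms (empty set for root terms).
    After correction, no term scores higher than any of its parents.
    """
    # TopologicalSorter takes node -> predecessors, so parents are visited first.
    order = TopologicalSorter(parents).static_order()
    corrected = {}
    for term in order:
        score = flat_scores[term]
        if parents.get(term):
            # Assumed rule: cap the score at the smallest corrected parent score.
            score = min(score, min(corrected[p] for p in parents[term]))
        corrected[term] = score
    return corrected


# Hypothetical toy ontology: "c" has two parents, "a" and "b".
parents = {"root": set(), "a": {"root"}, "b": {"root"}, "c": {"a", "b"}}
flat = {"root": 0.9, "a": 0.4, "b": 0.7, "c": 0.8}
corrected = top_down_correct(flat, parents)
```

In this toy case the flat score of "c" (0.8) exceeds that of its parent "a" (0.4), so the correction lowers it to 0.4, while the other scores are already consistent and stay unchanged.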


14

Yasar, Sevgi. "Multi-resolution Visualization Of Large Scale Protein Networks Enriched With Gene Ontology Annotations." Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/12611132/index.pdf.


Abstract:

Genome-scale protein-protein interactions (PPIs) are interpreted, from the perspective of computer science, as networks or graphs with thousands of nodes. PPI networks represent various types of possible interactions among the proteins or genes of a genome. PPI data is vital in protein function prediction, since the functions of cells are performed by groups of proteins interacting with each other and the main complexes of the cell are made of interacting proteins. The recent increase in protein interaction prediction techniques has made a great amount of protein-protein interaction data available for genomes. As a consequence, a systematic visualization and analysis technique has become crucial. To the best of our knowledge, no PPI visualization tool considers multi-resolution viewing of PPI networks. In this thesis, we implemented a new approach to PPI network visualization which supports multi-resolution viewing of compound graphs. We construct compound nodes and label them by using gene set enrichment methods based on Gene Ontology annotations. This thesis further suggests new methods for PPI network visualization.
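Labelling a compound node by enrichment amounts to asking whether a Gene Ontology term is annotated to more of the node's genes than chance would suggest. The abstract does not say which enrichment statistic was used; the sketch below assumes the commonly used one-sided hypergeometric test, and the function name and the example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of genes in the whole network
    annotated:  genes in the network annotated with the GO term
    sample:     genes inside the compound node
    hits:       genes inside the compound node annotated with the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 genes overall, 5 carry the term,
# and a compound node of 4 genes contains 2 of them.
p = enrichment_p_value(population=20, annotated=5, sample=4, hits=2)
```

Under this assumption, a compound node would be labelled with the term that has the smallest p-value, usually after a multiple-testing correction across all candidate terms.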


15

Hinderer, Eugene Waverly III. "COMPUTATIONAL TOOLS FOR THE DYNAMIC CATEGORIZATION AND AUGMENTED UTILIZATION OF THE GENE ONTOLOGY." UKnowledge, 2019. https://uknowledge.uky.edu/biochem_etds/43.


Abstract:

Ontologies provide an organization of language, in the form of a network or graph, which is amenable to computational analysis while remaining human-readable. Although they are used in a variety of disciplines, ontologies in the biomedical field, such as Gene Ontology, are of interest for their role in organizing terminology used to describe—among other concepts—the functions, locations, and processes of genes and gene-products. Due to the consistency and level of automation that ontologies provide for such annotations, methods for finding enriched biological terminology from a set of differentially identified genes in a tissue or cell sample have been developed to aid in the elucidation of disease pathology and unknown biochemical pathways. However, despite their immense utility, biomedical ontologies have significant limitations and caveats. One major issue is that gene annotation enrichment analyses often result in many redundant, individually enriched ontological terms that are highly specific and weakly justified by statistical significance. These large sets of weakly enriched terms are difficult to interpret without manually sorting into appropriate functional or descriptive categories. Also, relationships that organize the terminology within these ontologies do not contain descriptions of semantic scoping or scaling among terms. Therefore, there exists some ambiguity, which complicates the automation of categorizing terms to improve interpretability. We emphasize that existing methods enable the danger of producing incorrect mappings to categories as a result of these ambiguities, unless simplified and incomplete versions of these ontologies are used which omit problematic relations. 
Such ambiguities could have a significant impact on term categorization, as we have calculated upper boundary estimates of potential false categorizations as high as 121,579 for the misinterpretation of a single scoping relation, has_part, which accounts for approximately 18% of the total possible mappings between terms in the Gene Ontology. However, the omission of problematic relationships results in a significant loss of retrievable information. In the Gene Ontology, this accounts for a 6% reduction for the omission of a single relation. However, this percentage should increase drastically when considering all relations in an ontology. To address these issues, we have developed methods which categorize individual ontology terms into broad, biologically-related concepts to improve the interpretability and statistical significance of gene-annotation enrichment studies, meanwhile addressing the lack of semantic scoping and scaling descriptions among ontological relationships so that annotation enrichment analyses can be performed across a more complete representation of the ontological graph. We show that, when compared to similar term categorization methods, our method produces categorizations that match hand-curated ones with similar or better accuracy, while not requiring the user to compile lists of individual ontology term IDs. Furthermore, our handling of problematic relations produces a more complete representation of ontological information from a scoping perspective, and we demonstrate instances where medically-relevant terms--and by extension putative gene targets--are identified in our annotation enrichment results that would be otherwise missed when using traditional methods. Additionally, we observed a marginal, yet consistent improvement of statistical power in enrichment results when our methods were used, compared to traditional enrichment analyses that utilize ontological ancestors. 
Finally, using scalable and reproducible data workflow pipelines, we have applied our methods to several genomic, transcriptomic, and proteomic collaborative projects.


16

Yu, Xinran. "Mathematical and Experimental Investigation of Ontological Similarity Measures and Their Use in Biomedical Domains." Miami University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=miami1282098178.


17

Mayor, Charlie. "The classification of gene products in the molecular biology domain : realism, objectivity, and the limitations of the Gene Ontology." Thesis, City University London, 2012. http://openaccess.city.ac.uk/3006/.


Abstract:

Background: Controlled vocabularies in the molecular biology domain exist to facilitate data integration across database resources. One such tool is the Gene Ontology (GO), a classification designed to act as a universal index for gene products from any species. The Gene Ontology is used extensively in annotating gene products and analysing gene expression data, yet very little research exists from a library and information science perspective exploring the design principles, philosophy and social role of ontologies in biology. Aim: To explore how molecular biologists, in creating the Gene Ontology, devised guidelines and rules for determining which scientific concepts are included in the ontology, and the criteria for how these concepts are represented. Methods: A domain analysis approach was used to devise a mixed methodology to study the design of the Gene Ontology. Concept analysis of a GO term and a critical discourse analysis of GO developer mailing list texts were used to test whether ontological realism is a tenable basis for constructing objective ontologies. A comparison of the current GO vocabulary construction guidelines and a study of the reasons why GO terms are removed from the ontology further explored the justifications for the design of the Gene Ontology. Finally, a content analysis of published GO papers examined how authors use and cite GO data and terminology. Results: Gene Ontology terms can be presented according to different epistemologies for concepts, indicating that ontological realism is not the only way objective ontologies can be designed. Social roles and the exercise of power were found to play an important role in determining ontology content, and poor synonym control, a lack of clear warrant for deciding terminology and arbitrary decisions to delete and invent new terms undermine the objectivity and universal applicability of the Gene Ontology. 
Authors exhibited poor compliance with GO data citation policies, and in re-wording and misquoting GO terminology, risk exacerbating the semantic problems this controlled vocabulary was designed to solve. Conclusions: The failure of the Gene Ontology to define what is meant by a molecular function, the exercise of power by GO developers in clearing contentious concepts from the ontology, and the strict adherence to ontological realism, which marginalises social and subjective ways of classifying scientific concepts, limit the utility of the ontology as a tool to unify the molecular biology domain. These limitations of the Gene Ontology design could be overcome with the development of lighter, pluralistic, user-controlled ‘open ontologies’ for gene products that can work alongside more traditional, ‘top-down’ developed vocabularies.


18

Yi, Gang Man. "An algorithm for identifying clusters of functionally related genes in genomes." College Station, Tex. : Texas A&M University, 2006. http://hdl.handle.net/1969.1/ETD-TAMU-1079.


19

Espinosa, Octavio. "Characterisation of a mouse gene-phenotype network." Thesis, University of Oxford, 2011. http://ora.ox.ac.uk/objects/uuid:6231b62c-3047-46fc-a986-9f0565d4386b.


Abstract:

Following advancements in the "omics" fields of molecular biology and genetics, much attention has been focused on categorising and annotating the large volume of data that has been produced since the sequencing of human and model genomes. With high-throughput data generated from these "omics" experiments and the increasing deposition of information from genetics experiments in biological databases, our understanding of the mechanisms that bridge the gap from genotype to phenotype can be explored in a holistic context. This is one of the aims of the relatively new field of systems biology, which aims to understand the complexity of biological systems in a holistic manner by studying the system as an ensemble of interacting parts. With increased volume and comprehensiveness of biological data, prediction of gene function and automatic identification of potential models for human diseases have become important aspects of systems-level analysis for wet-lab geneticists and clinicians. Here, I describe an integrated analysis of mouse phenotype data with high-throughput experiments to give genome-wide information about gene relationships and their function in a systems biology context. I show a functional dissection of mouse gene and phenotype networks and investigate the potential that ontology-compliant phenotype annotations can offer for functional classification of genes. The mouse genome and phenome show modularity at higher levels of cellular, physiological and organismal function. Using high-throughput protein-protein interaction data, the mouse proteome was dissected and computationally extracted communities were used to predict phenotypes of mouse gene ablation. Precision and recall curves show comparable performance for higher levels of the MP ontology to those undertaken by comprehensive mouse gene function prediction such as the Mouse Function Project which predicted Gene Ontology terms. 
I also developed and tested an automatic procedure that relates mouse phenotypes to human diseases and demonstrate its application to the use cases of identifying mouse models given a query consisting of a set of mouse phenotypes and breaking down human diseases into mouse phenotypes. Taken together, my results may be useful as a map for candidate gene discovery, finding how mouse networks relate to human networks and investigating the evolutionary origins of their components at higher levels of gene function.


20

Ovezmyradov, Guvanchmyrat [author], Martin [academic advisor] Göpfert, and Burkhard [academic advisor] Morgenstern. "Gene Ontology-based framework to annotate genes of hearing / Guvanchmyrat Ovezmyradov. Reviewers: Martin Göpfert ; Burkhard Morgenstern. Advisor: Martin Göpfert." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2013. http://d-nb.info/1044770546/34.


21

Chen, Eric Chun-Hung. "Fractionation Resistance of Duplicate Genes Following Whole Genome Duplication in Plants as a Function of Gene Ontology Category and Expression Level." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/32789.


Abstract:

With the proliferation of plant genomes being sequenced, assembled, and annotated, duplicate gene loss from whole genome duplication events, also known in plants as fractionation, has been shown to have a different pattern from the classic gene duplication models described by Ohno in 1970. Models proposed more recently, the Gene Balance and Gene Dosage hypotheses, try to model this pattern. These models, however, disagree with each other on the relative importance of gene function and gene expression. In this thesis we explore the effects of gene function and gene expression on duplicate gene loss and retention. We use gene sequence similarity and gene order conservation to construct our gene families. We applied multiple whole genome comparison methods across various plants in rosids, asterids, and Poaceae, looking for a general pattern. We found that there is great consistency across different plant lineages. Genes categorized as metabolic genes with a low level of expression have relatively low fractionation resistance, losing duplicate genes readily, while genes categorized as regulation and response genes with a high level of expression have relatively high fractionation resistance, retaining more duplicate gene pairs or triples. Though both gene function and gene expression have important effects on the retention pattern, we found that gene function has a bigger effect than gene expression. Our results suggest that both the Gene Balance and Gene Dosage models account to some extent for fractionation resistance.


22

Fatai, Azeez Ayomide. "Computational analysis of multilevel omics data for the elucidation of molecular mechanisms of cancer." University of the Western Cape, 2015. http://hdl.handle.net/11394/4782.


Abstract:

Philosophiae Doctor - PhD

Cancer is a group of diseases that arises from irreversible genomic and epigenomic alterations that result in unrestrained proliferation of abnormal cells. Detailed understanding of the molecular mechanisms underlying a cancer would aid the identification of most, if not all, genes responsible for its progression and the development of molecularly targeted chemotherapy. The challenge of recurrence after treatment shows that our understanding of cancer mechanisms is still poor. As a contribution to overcoming this challenge, we provide an integrative multi-omic analysis on glioblastoma multiforme (GBM) for which large data sets on different classes of genomic and epigenomic alterations have been made available in the Cancer Genome Atlas data portal. The first part of this study involves protein network analysis for the elucidation of GBM tumourigenic molecular mechanisms, identification of driver genes, prioritization of genes in chromosomal regions with copy number alteration, and co-expression and transcriptional analysis. Functional modules were obtained by edge-betweenness clustering of a protein network constructed from genes with predicted functional impact mutations and differentially expressed genes. Pathway enrichment analysis was performed on each module to identify statistical overrepresentation of signaling pathways. Known and novel candidate cancer driver genes were identified in the modules, and functionally relevant genes in chromosomal regions altered by homologous deletion or high-level amplification were prioritized with the protein network. Co-expressed modules enriched in cancer biological processes and transcription factor targets were identified using network genes that demonstrated high expression variance. Our findings show that GBM's molecular mechanisms are much more complex than those reported in previous studies.
We next identified differentially expressed miRNAs for which target genes associated with the protein network were also differentially expressed. MiRNAs and target genes were prioritized based on the number of targeted genes and targeting miRNAs, respectively. MiRNAs that correlated with time to progression were selected by an elastic net-penalized Cox regression model for survival analysis. These miRNAs were combined into a signature that independently predicted adjuvant therapy-linked progression-free survival in GBM and its subtypes and overall survival in GBM. The results show that miRNAs play significant roles in GBM progression and patients' survival. Finally, a prognostic mRNA signature that independently predicted progression-free and overall survival was identified. Pathway enrichment analysis was carried out on genes with high expression variance across a cohort to identify those in chemoradioresistance-associated pathways. A support vector machine-based method was then used to identify a set of genes that discriminated between rapidly and slowly progressing GBM patients, with a minimal cross-validation error rate of 5%. The prognostic value of the gene set was demonstrated by its ability to predict adjuvant therapy-linked progression-free and overall survival in GBM and its subtypes and was validated in an independent data set. We have identified a set of genes involved in tumourigenic mechanisms that could potentially be exploited as targets in drug development for the treatment of primary and recurrent GBM. Furthermore, given their demonstrated accuracy in this study, the identified miRNA and mRNA signatures have strong potential to be combined and developed into a robust clinical test for predicting prognosis and treatment response.


23

Wimberley, James. "De novo Sequencing and Analysis of Salvia hispanica Transcriptome and Identification of Genes Involved in the Biosynthesis of Secondary Metabolites." Chapman University Digital Commons, 2019. https://digitalcommons.chapman.edu/cads_theses/5.


Abstract:

Salvia hispanica L. (commonly known as chia) is gaining popularity worldwide, and especially in the US, as a healthy oil and food supplement for human and animal consumption due to its favorable oil composition and its high protein, fiber, and antioxidant contents. Despite these benefits and its growing public demand, very limited gene sequence information is currently available in public databases. In this project, we generated 90 million high-quality 150 bp paired-end sequences from chia leaf and root tissues. The sequences were de novo assembled into 103,367 contigs with an average length of 1,445 bp. The resulting assembly represented 92.2% transcriptome completeness. Around 69% of the assembled contigs were annotated against the UniProt database and represented a diverse array of functional and biological categories. A total of 14,267 contigs showed significant differences in expression between the leaf and root tissues, with 6,151 and 8,116 contigs upregulated in the leaf and root, respectively. The sequence data generated in this project will provide valuable resources for future functional genomic research in chia. With the availability of transcriptome sequences, it would be possible to identify genes involved in the important metabolic pathways that give chia its unique nutritional and medicinal properties. Finally, the generated data will contribute to the genetic improvement efforts of chia to better serve the public demand.


24

Feltrin, Erika. "Contribution to OBO ontologies and application of structured vocabularies for data integration and biological reasoning." Doctoral thesis, Università degli studi di Padova, 2008. http://hdl.handle.net/11577/3426373.


Abstract:

As the amount of accessible biological data is growing exponentially, it is becoming harder and harder to extract the biological knowledge contained in thousands of databases. Biomedical scientists collect facts, often recording them in natural language, and then use their knowledge to make inferences about as yet uncharacterised observations. Therefore, to make the best use of biological databases and the knowledge they contain, different kinds of information from different sources must be integrated in ways that make sense to the scientific community. The Gene Ontology (GO) and other biomedical ontologies (OBO) are fundamental components in data integration and annotation. This PhD project focuses on the improvement of some already existing resources, and the development of new methods that facilitate data integration and extraction, for genes, drugs and diseases, and their inter-relationships. The work consists of contributions to biological ontologies and definitions of cross-links between different semantic fields represented in several distinct databases. Significant changes in GO content and structure have been provided, resulting in the addition of hundreds of terms useful in the representation of muscle and nervous system biology. In addition, a resource has been developed to find preliminary correlations between genes, drugs and diseases. This resource integrates information from several very up-to-date sources, most of which are manually curated, and from a human disease ontology, the "Disease Ontology". The revised ontologies will facilitate the interpretation of high-throughput experiments in the area of muscle biology and neurobiology, and more importantly, in the fields of neuromuscular and nervous system diseases. Furthermore, the developed ontology-based system will provide interoperability support for physicians and medical researchers in the interpretation of data from studies on human diseases.


25

Groß, Anika. "Evolution von ontologiebasierten Mappings in den Lebenswissenschaften." Doctoral thesis, Universitätsbibliothek Leipzig, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-136766.


Abstract:

In the life sciences, there is an increasing number of heterogeneous data sources that need to be integrated and combined in comprehensive analysis tasks. Often ontologies and other structured vocabularies are used to provide a formal representation of knowledge and to facilitate data exchange between different applications. Ontologies are used in different domains like molecular biology or chemistry. One of their most important applications is the annotation of real-world objects like genes or publications.
Since different ontologies can contain overlapping knowledge, it is necessary to determine mappings between them (ontology mappings). Manual mapping creation can be very time-consuming or even infeasible, so (semi-) automatic ontology matching methods are typically applied. Ontologies are not static but undergo continuous modifications due to new research insights and changing user requirements. The evolution of ontologies can have an impact on dependent data like annotation or ontology mappings. This thesis presents novel methods and algorithms to deal with the evolution of ontology-based mappings. To this end, the generic infrastructure GOMMA is used and extended to manage and analyze the evolution of ontologies and mappings. First, a comparative evolution analysis for ontologies and mappings from three life science domains shows heavy changes in ontologies and mappings as well as an impact of ontology changes on the mappings. Hence, existing ontology mappings can become invalid and need to be migrated to current ontology versions. In doing so, an expensive redetermination of the mappings should be avoided. This thesis introduces two generic algorithms to (semi-) automatically adapt ontology mappings: (1) a composition-based adaptation relies on the principle of mapping composition, and (2) a diff-based adaptation algorithm allows for individually handling change operations to update mappings. Both approaches reuse unaffected mapping parts and adapt only the affected parts of the mappings. An evaluation for very large biomedical ontologies and mappings shows that both approaches produce ontology mappings of high quality. Similarly, ontology changes may also affect ontology-based annotation mappings. The thesis introduces a generic evaluation approach to assess the quality of annotation mappings based on their evolution. Different quality measures allow for the identification of reliable annotations, e.g., based on their stability or provenance information.
A comprehensive analysis of large annotation data sources shows numerous instabilities, e.g., due to the temporary absence of annotations. Such modifications may influence the results of dependent applications such as functional enrichment analyses that describe experimental data in terms of ontological groupings. The question arises to what degree ontology and annotation changes may affect such analyses. Based on different stability measures, the evaluation assesses change intensities of application results and gives insight into whether users need to expect significant changes in their analysis results. Moreover, GOMMA is extended by large-scale ontology matching techniques. Such techniques are useful, among other things, to match new concepts during ontology mapping adaptation. Many existing match systems do not scale for aligning very large ontologies, e.g., from the life science domain. One efficient composition-based approach indirectly computes ontology mappings by reusing and combining existing mappings to intermediate ontologies. Intermediate ontologies can contain useful background knowledge such that the mapping quality can be improved compared to a direct match approach. Moreover, the thesis introduces general strategies for matching ontologies in parallel using several computing nodes. A size-based partitioning of the input ontologies enables good load balancing and scalability since smaller match tasks can be processed in parallel. The evaluation of the Ontology Alignment Evaluation Initiative (OAEI) compares GOMMA and other systems in terms of matching ontologies from different domains. Using the parallel and composition-based matching, GOMMA can achieve very good results with respect to efficiency and effectiveness, especially for ontologies from the life science domain.
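The composition-based matching idea in this abstract, deriving a mapping between two ontologies by chaining existing mappings through an intermediate ontology, can be sketched in a few lines. This is an illustration only, not GOMMA's implementation: the dictionary representation, the choice of taking the weaker of the two similarities for a chained correspondence, and all concept identifiers are assumptions made for the example.

```python
def compose_mappings(source_to_mid, mid_to_target):
    """Compose two ontology mappings via a shared intermediate ontology.

    Each mapping is a dict: concept -> {matched concept: similarity in [0, 1]}.
    Returns a mapping from source concepts to target concepts.
    """
    composed = {}
    for source, mids in source_to_mid.items():
        for mid, sim1 in mids.items():
            for target, sim2 in mid_to_target.get(mid, {}).items():
                # Assumption: a chained correspondence is only as strong
                # as its weaker link.
                sim = min(sim1, sim2)
                best = composed.setdefault(source, {})
                # Keep the best similarity if several paths reach the target.
                best[target] = max(best.get(target, 0.0), sim)
    return composed


# Hypothetical mappings from ontology A and to ontology B via intermediate M.
a_to_m = {"A:1": {"M:1": 0.9, "M:2": 0.6}, "A:2": {"M:3": 0.8}}
m_to_b = {"M:1": {"B:7": 0.8}, "M:2": {"B:7": 0.95, "B:8": 0.7}}
composed = compose_mappings(a_to_m, m_to_b)
```

Here "A:1" reaches "B:7" along two paths and keeps the stronger one, while "A:2" gets no correspondence because its intermediate concept has no counterpart in B; such unmatched concepts are the ones that would still need a direct match.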

26

Gabbur, Prasad. "Machine Learning Methods for Microarray Data Analysis." Diss., The University of Arizona, 2010. http://hdl.handle.net/10150/195829.

Abstract:

Microarrays emerged in the 1990s as a consequence of the efforts to speed up the process of drug discovery. They revolutionized molecular biological research by enabling monitoring of thousands of genes together. Typical microarray experiments measure the expression levels of a large number of genes on very few tissue samples. The resulting sparsity of data presents major challenges to statistical methods used to perform any kind of analysis on this data. This research posits that phenotypic classification and prediction serve as good objective functions for both optimization and evaluation of microarray data analysis methods. This is because classification measures what is needed for diagnostics and provides quantitative performance measures such as leave-one-out (LOO) or held-out prediction accuracy and confidence. Under the classification framework, various microarray data normalization procedures are evaluated using a class label hypothesis testing framework and also employing Support Vector Machines (SVM) and linear discriminant based classifiers. A novel normalization technique based on minimizing the squared correlation coefficients between expression levels of gene pairs is proposed and evaluated along with the other methods. Our results suggest that most normalization methods helped classification on the datasets considered, except the rank method, most likely due to its quantization effects. Another contribution of this research is in developing machine learning methods for incorporating an independent source of information, in the form of gene annotations, to analyze microarray data. Recently, genes of many organisms have been annotated with terms from a limited vocabulary called Gene Ontologies (GO), describing the genes' roles in various biological processes, molecular functions and their locations within the cell. Novel probabilistic generative models are proposed for clustering genes using both their expression levels and GO tags. 
These models are similar in essence to the ones used for multimodal data, such as images and words, with learning and inference done in a Bayesian framework. The multimodal generative models are used for phenotypic class prediction. More specifically, the problems of phenotype prediction for static gene expression data and state prediction for time-course data are emphasized. Using GO tags for organisms whose genes have been studied more comprehensively leads to an improvement in prediction. Our methods also have the potential to provide a way to assess the quality of available GO tags for the genes of various model organisms.
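
The leave-one-out (LOO) accuracy mentioned in this abstract can be sketched as follows. The abstract reports using SVMs and linear discriminant classifiers; the sketch below swaps in a simple nearest-centroid classifier so that it stays self-contained, and the expression values and class labels are made up for illustration.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to `sample`."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2
                   for s, v in zip(sums[label], sample))

    return min(sorted(sums), key=squared_distance)


def loo_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and count the hits."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)


# Toy data: six samples, two genes each (hypothetical values).
xs = [[1.0, 0.2], [0.9, 0.1], [1.1, 0.3], [0.1, 1.0], [0.2, 0.9], [0.0, 1.1]]
ys = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
print(loo_accuracy(xs, ys))
```

With very few tissue samples, as the abstract notes is typical, LOO uses all but one sample for training in every round, which is why it is a common choice in this setting.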

27

He, Xin. "A semi-automated framework for the analytical use of gene-centric data with biological ontologies." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/25505.

Abstract:

Motivation: Translational bioinformatics (TBI) has been defined as ‘the development and application of informatics methods that connect molecular entities to clinical entities’ [1]. It has emerged as a systems theory approach to bridge the huge wealth of biomedical data into clinical actions using a combination of innovations and resources across the entire spectrum of biomedical informatics approaches [2]. The challenge for TBI is the availability of both comprehensive knowledge based on genes and the corresponding tools that allow their analysis and exploitation. Traditionally, biological researchers studied one or only a few genes at a time, but in recent years high-throughput technologies such as gene expression microarrays, protein mass spectrometry and next-generation DNA and RNA sequencing have emerged that allow the simultaneous measurement of changes on a genome-wide scale. These technologies usually result in large lists of interesting genes, but meaningful biological interpretation remains a major challenge. Over the last decade, enrichment analysis has become standard practice in the analysis of such gene lists, enabling systematic assessment of the likelihood of differential representation of defined groups of genes compared to suitably annotated background knowledge. The success of such analyses is highly dependent on the availability and quality of the gene annotation data. For many years, genes were annotated by different experts using inconsistent, non-standard terminologies. Large amounts of variation and duplication in these unstructured annotation sets made them unsuitable for principled quantitative analysis. More recently, a lot of effort has been put into the development and use of structured, domain-specific vocabularies to annotate genes. The Gene Ontology is one of the most successful examples of this, where genes are annotated with terms from three main clades: biological process, molecular function and cellular component. 
However, many other established and emerging ontologies that could aid biological data interpretation are rarely used. For the same reason, many bioinformatics tools only support analysis using the Gene Ontology. The lack of annotation coverage for these ontologies, and of support for them in existing analytical tools, has become a major limitation to their utility and uptake. Thus, automatic approaches are needed to facilitate the transformation of unstructured data to unlock the potential of all ontologies, with corresponding bioinformatics tools to support their interpretation. Approaches: In this thesis, firstly, similar to the approach in [3,4], I propose a series of computational approaches implemented in a new tool, OntoSuite-Miner, to address the ontology-based gene association data integration challenge. This approach uses NLP-based text mining methods for ontology-based biomedical text mining. What differentiates my approach from other approaches is that I integrate two of the most widely used NLP modules into the framework, not only increasing the confidence of the text mining results, but also providing an annotation score for each mapping, based on the number of pieces of evidence in the literature and the number of NLP modules that agreed with the mapping. Since heterogeneous data is important in understanding human disease, the approach was designed to be generic, so that the ontology-based annotation generation can be applied to different sources and can be repeated with different ontologies. Secondly, with respect to the second challenge posed by TBI, to increase the statistical power of the annotation enrichment analysis, I propose OntoSuite-Analytics, which integrates a collection of enrichment analysis methods into a unified open-source software package named topOnto, in the statistical programming language R. 
The package supports enrichment analysis across multiple ontologies with a set of implemented statistical/topological algorithms, allowing the comparison of enrichment results across multiple ontologies and between different algorithms. Results: The methodologies described above were implemented, and a Human Disease Ontology (HDO) based gene annotation database was generated by mining three publicly available databases: OMIM, GeneRIF and Ensembl variation. With the availability of the HDO annotation and the corresponding ontology enrichment analysis tools in topOnto, I profiled 277 gene classes with human diseases and generated ‘disease environments’ for 1310 human diseases. The exploration of the disease profiles and disease environments provides an overview of known disease knowledge and provides new insights into disease mechanisms. The integration of multiple ontologies into a disease context demonstrates how ‘orthogonal’ ontologies can lead to biological insight that would have been missed by more traditional single-ontology analysis.
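
The enrichment analysis discussed in this abstract is commonly based on a one-sided hypergeometric test. The sketch below shows that test only; it is not topOnto's code (topOnto is an R package and also offers topology-aware algorithms), and the function name and the numbers in the example are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """One-sided enrichment p-value P(X >= overlap).

    population: number of genes in the background set
    annotated:  background genes annotated with the ontology term
    sample:     size of the gene list of interest
    overlap:    genes in the list that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 annotated with the term,
# a list of 3 genes of which 2 carry the term.
p = hypergeom_enrichment_p(population=10, annotated=4, sample=3, overlap=2)
print(p)
```

In practice one such test is run per ontology term, so the resulting p-values need a multiple-testing correction before terms are reported as enriched.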

28

Korkmaz, Gulberal Kircicegi Yoksul. "Mining Microarray Data For Biologically Important Gene Sets." PhD thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614266/index.pdf.

Abstract:

Microarray technology enables researchers to measure the expression levels of thousands of genes simultaneously in order to understand relationships between genes, extract pathways, and in general understand a diverse range of biological processes such as diseases and cell cycles. While microarrays provide a great opportunity to reveal information about biological processes, mining the huge amount of information contained in microarray datasets is a challenging task. Generally, since an accurate model for the data is missing, a clustering algorithm is applied first and the resulting clusters are then examined manually to find genes that are related to the biological process under inspection. We need automated methods for this analysis that can eliminate unrelated genes from the data and mine for biologically important genes. Here, we introduce a general methodology that makes use of traditional clustering algorithms and integrates the two main sources of biological information, Gene Ontology and interaction networks, with microarray data in order to eliminate unrelated information and find a clustering result containing only genes related to a given biological process. We applied our methodology successfully to a number of different cases and different organisms. We assessed the results with the Gene Set Enrichment Analysis method and showed that our final clusters are highly enriched. We also analyzed the results manually and found that most of the genes in the final clusters are actually related to the biological process under inspection.

29

Chao, Yang, and Peng Zhang. "One General Approach For Analysing Compositional Structure Of Terms In Biomedical Field." Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH. Forskningsmiljö Informationsteknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-20913.

Abstract:

The root is the primary lexical unit of an ontological term: it carries the most significant aspects of semantic content and cannot be reduced to smaller constituents. It is the key to the structure of ontological terms. Once the root has been identified, the meaning of the term is easier to obtain, and that meaning in turn helps to identify the other parts of the term, such as the relation and the definition. In this master's thesis we have built a general classification model to identify the roots of terms. Four features are defined in our classification model: the Token, the POS, the Length and the Position. The model was implemented in Java using the Naïve Bayes algorithm. We implemented and evaluated the classification model using the Gene Ontology (GO). The evaluation results showed that our framework and model were effective.
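
A classifier over the four features named in this abstract (Token, POS, Length, Position) can be sketched with a small categorical Naïve Bayes model. This is an illustrative sketch in Python, not the thesis's Java implementation. The class `CategoricalNB`, the length buckets, the position labels, and the tiny training set are all assumptions, and the labels were assigned by hand for the example.

```python
from collections import Counter, defaultdict
from math import log


class CategoricalNB:
    """Naive Bayes over categorical features with add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # (label, index) -> value counts
        self.values = defaultdict(set)              # index -> values seen in training
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[(label, i)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())

        def score(label):
            n = self.class_counts[label]
            s = log(n / total)  # log prior
            for i, value in enumerate(row):
                count = self.feature_counts[(label, i)][value]
                s += log((count + 1) / (n + len(self.values[i])))
            return s

        return max(sorted(self.class_counts), key=score)


# Each row is (token, POS tag, length bucket, position in term).
rows = [
    ("membrane", "NN", "long", "last"),
    ("mitochondrial", "JJ", "long", "first"),
    ("binding", "NN", "long", "last"),
    ("protein", "NN", "long", "first"),
    ("activity", "NN", "long", "last"),
    ("kinase", "NN", "short", "first"),
    ("of", "IN", "short", "middle"),
    ("transport", "NN", "long", "last"),
]
labels = ["root", "other", "root", "other", "root", "other", "other", "root"]

model = CategoricalNB().fit(rows, labels)
print(model.predict(("complex", "NN", "long", "last")))
print(model.predict(("nuclear", "JJ", "long", "first")))
```

Both query tokens are unseen in training, so the prediction rests on the POS, Length and Position features; smoothing keeps the unseen token from zeroing out either class.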

30

Bedhiafi, Walid. "Sciences de l'information pour l'étude des systèmes biologiques (exemple du vieillissement du système immunitaire)." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066139/document.

Abstract:

High-throughput experimental approaches to the study of gene expression involve several processing steps for the quantification, annotation and interpretation of the results. The i3 lab and the LGIPH apply these approaches in various experimental setups to study the immune system and its dysfunctions. However, limitations have been observed when using conventional approaches for annotating gene expression signatures. The main objective of this thesis was to develop an alternative annotation approach to overcome this problem. The approach we have developed is based on the contextualization of genes and their products, followed by the modeling of biological pathways, to produce knowledge bases for the study of gene expression. We define a gene expression context as follows: cell population + anatomical compartment + pathological condition. To identify these contexts, we opted for large-scale literature mining and developed a Python package that automatically annotates texts according to three ontologies chosen to match our definition of the context. We show that this package performs better at text annotation than a reference tool. We used it to screen a text corpus on the aging of the immune system; the results are presented here. To model biological pathways, we developed, in collaboration with the LIPAH lab, a modeling method based on a genetic algorithm that combines semantic proximity measures computed from gene annotations (using the Biological Process ontology) with interaction data from db-string. We were able to recover reference networks with an error rate of 0.47.

31

Rahm, Jonas. "Biologically plausible visual representation of modular decomposition." Thesis, University of Skövde, School of Humanities and Informatics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-953.

Abstract:

Modular decompositions of protein interaction networks can be used to identify modules of cooperating proteins. The biological plausibility of these modules might be questioned, though. This report describes how a modular decomposition can be complemented with semantic information in the visual representation. Possible methods for creating modules of functionally related proteins are also proposed in this work. The results show that such modules can advantageously be combined with modules from a graph decomposition to find proteins that are likely to cooperate to perform certain functions in organisms.

32

Amaral, Laurence Rodrigues do. "Aplicando princípios de aprendizado de máquina na construção de um biocurador automático para o Gene Ontology (GO)." Universidade Federal de São Carlos, 2013. https://repositorio.ufscar.br/handle/ufscar/290.

Abstract:

Nowadays, the amount of biological data made available by universities, hospitals and research centers has increased exponentially due to the use of bioinformatics, through the development of advanced computational methods and tools, and of high-throughput techniques. Due to this significant increase in the amount of available data, new strategies for the capture, storage and, above all, analysis of data are necessary. In this scenario, a new research area called biocuration is developing. Biocuration is becoming a fundamental part of biological and biomedical research, and its main function is to structure and organize biological information, making it readable and accessible to humans and computers. Seeking to support a fast and reliable understanding of new domains, different initiatives are being proposed, and the Gene Ontology (GO) is one of the main examples. The GO is one of the main initiatives in bioinformatics; its main goal is to standardize the representation of genes and their products, providing interconnections between species and databases. Thus, the main objective of this research is to propose a computational architecture that uses principles of never-ending machine learning to help GO biocurators classify new terms, a task that is currently entirely manual. The proposed architecture uses semi-supervised learning, combining different classifiers to label new GO instances. In addition, this research also aims to build high-level knowledge in the form of simple IF-THEN rules and decision trees. The generated knowledge can be used by GO biocurators in the search for important patterns present in the biological data, revealing concise and relevant information about the application domain.

33

Taniguti, Lucas Mitsuo. "Propagação semi-automática de termos Gene Ontology a proteínas com potencial biotecnológico para a produção de bioenergia." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/11/11137/tde-05012015-175313/.

Abstract:

The increase in biological data, produced mainly by second-generation sequencing technologies, is a challenge for biological databases, which need to address issues such as storage, data availability and, in the case of secondary databases, the propagation of biological information to sequences with no experimental characterization. This propagation is crucial because new sequences are submitted to databases at a much higher rate than proteins have their function described by experiments. Similarly to the EC number (Enzyme Commission number), organizing proteins into families aims to structure and facilitate automatic operations in databases. In this context, this work aimed to generate computational models for protein families involved in microbial processes with biotechnological potential for the production of bioenergy. Several reference proteins, analyzed beforehand in collaboration with the MENGO project, were used as seeds for the statistical models. Starting from each seed protein, UniProtKB was searched for representative proteins of each family and for function descriptions supported by the scientific literature. Multiple sequence alignments were built for each collection of primary sequences with the MUSCLE 3.7 program, and the computational models (profile hidden Markov models, pHMMs) were then generated with the HMMER package. The models were reviewed repeatedly until the curator considered them reliable for the propagation of Gene Ontology terms. A set of 1,233 proteins from UniProtKB was classified into our families, suggesting that they could be annotated with GO terms using the MENGOfams families. Of those proteins, 79% were not annotated with the MENGO-specific GO terms in UniProtKB. To compare the pHMM results with those that would be obtained using only BLAST similarity measures, we generated similarity networks using an E-value cutoff of 10^-14. The results showed that the pHMM classification is valuable for propagating biological annotation because it precisely identifies the members of each family. A second analysis was applied to each family, using the respective pHMMs to query a collection of sequences generated by a null model. The null model assumed that no sequences were homologous and that they could be represented by the amino acid frequencies observed in the Swiss-Prot database alone. No non-homologous proteins were classified as members by the MENGOfams models, suggesting that the models identify only true member sequences.

34

Jain, Vishal. "Integrative approaches to modelling and knowledge discovery of molecular interactions in bioinformatics." 2008. http://hdl.handle.net/10292/439.

Abstract:

The core focus of this research lies in developing and using intelligent methods to solve biological problems and in integrating the resulting knowledge to understand the complex phenomenon of gene regulation. We have developed an integrative framework and used it to model molecular interactions from separate case studies on time-series gene expression microarray datasets, molecular sequences and structure data, including the functional role of microRNAs; to extract knowledge; and to build reusable models for the central dogma theme. Knowledge was integrated with the use of ontology, and it can be reused to facilitate new discoveries, as demonstrated on one of our systems, the Brain Gene Ontology (BGO). The central dogma states that proteins are produced from DNA (genes) via an intermediate transcript called RNA. Some of these proteins later act as enzymes that serve as checkpoints controlling gene expression. Also, according to a recently emerged paradigm, some genes do not code for proteins but instead give rise to small microRNA molecules, which in turn control gene regulation. The idea is that such a complicated molecular biology process (the central dogma) produces a wide variety of data that computer scientists can use for modelling and to enable discoveries. We suggest that this whole range of data should be taken into account when analysing gene regulation, instead of taking just one source of data and applying standard methods to reveal facts in systems biology. The problem is very complex and, currently, computational algorithms have not been really successful, because either existing methods have certain problems or the proven results were obtained for only one domain of the central dogma of molecular biology, so there has always been a lack of knowledge integration. 
Proper maintenance of diverse sources of data and structures and, in particular, their adaptation to new knowledge is one of the most challenging problems, and one of the crucial tasks towards the knowledge integration vision is the efficient encoding of human knowledge in ontologies. More specifically, this work has contributed to the development of novel computational and information science methods, and we have promoted the vision of knowledge integration by developing the Brain Gene Ontology (BGO) system. Through the integrative use of several bioinformatics methods, this research has resulted in the modelling of knowledge that had not been revealed in systems biology so far. Many discoveries were made during my study, and some of the findings are briefly mentioned as follows: (1) In relation to leukaemia, we have discovered a new gene, “TCF-1”, that interacts with the “telomerase” gene. (2) With respect to yeast cell cycle analysis, we hypothesize that the exoglucanase gene “exg1” is tied to “MCB cluster regulation” and a “mannosidase” to “histone linked mannoses”. A new quantitative prediction is that the time delay of the interaction between two genes seems to be approximately 30 minutes, or 0.17 cell cycles. Next, the Cdc22, Suc22 and Mrc1 genes were found to interact with each other as potential candidates for controlling ribonucleotide reductase (RNR) activity. (3) Upon studying the phenomenon of Long Term Potentiation (LTP), it was found that the transcription factors responsible for the regulation of gene expression begin to be elevated as soon as 30 min after induction of LTP and remain elevated for up to 2 hours. (4) Investigation of human microRNA data resulted in the successful identification of two miRNA families, let-7 and mir-30. 
(5) When we analysed the CNS cancer data, a set of 10 genes (HMG-I(Y), NBL1, UBPY, Dynein, APC, TARBP2, hPGT, LTC4S, NTRK3, and Gps2) was found to give 85% correct prediction of drug response. (6) Upon studying the AMPA, GABRA and NMDA receptors, we hypothesize that phenylalanine (F at position 269) and leucine (L at position 353) in these receptors act as a binding centre for their interaction with several other genes/proteins such as c-jun, mGluR3, Jerky, BDNF, FGF-2, IGF-1, GALR1, NOS and S100beta. All the methods we developed and used to obtain the above-mentioned findings are generic and can be applied to other datasets, subject to some constraints. We believe that this research has established that the integrative use of various computational intelligence methods is critical to revealing new aspects of the problem, and that knowledge integration is likewise essential. During this work, I have published this research in reputed international journals, presented results at several conferences and also produced book chapters.

35

Bahurek, Tomáš. "Dotazovací jazyk pro databáze biologických dat." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234998.

Abstract:

With the rising amount of biological data, biological databases are becoming more important each day. Knowledge discovery (identification of connections that were unknown at the time of data entry) is an essential aspect of these databases. To gain knowledge from these databases, one has to construct complicated SQL queries, which requires advanced knowledge of the SQL language and of the database schema in use. Biologists usually do not have this knowledge, which creates a need for a tool that offers a more intuitive interface for querying biological databases. This work proposes ChQL, an intuitive query language for the biological database Chado. ChQL allows biologists to assemble a query using terms they are familiar with, without knowledge of the SQL language or the Chado database schema. This work implements an application for querying a Chado database using ChQL. A web interface guides the user through the process of assembling a sentence in ChQL. The application translates this sentence into an SQL query, sends it to the Chado database and displays the returned data in a table. The results are evaluated by testing queries on real data.

36

Doms, Andreas. "GoPubMed: Ontology-based literature search for the life sciences." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2009. http://nbn-resolving.de/urn:nbn:de:bsz:14-ds-1232454035091-47450.

Abstract:

Background: Most of our biomedical knowledge is only accessible through texts. The biomedical literature grows exponentially and PubMed comprises over 18,000,000 literature abstracts. Recently much effort has been put into the creation of biomedical ontologies which capture biomedical facts. The exploitation of ontologies to explore the scientific literature is a new area of research. Motivation: When people search, they have questions in mind. Answering questions in a domain requires knowledge of the terminology of that domain. Classical search engines do not provide background knowledge for the presentation of search results. Ontology-annotated structured databases allow for data mining. The hypothesis is that ontology-annotated literature databases allow for text mining. The central problem is to associate scientific publications with ontological concepts. This is a prerequisite for ontology-based literature search. The question then is how to answer biomedical questions using ontologies and a literature corpus. Finally, the task is to automate bibliometric analyses on a corpus of scientific publications. Approach: Recent joint efforts on automatically extracting information from free text showed that the applied methods are complementary. The idea is to employ the rich terminological and relational information stored in biomedical ontologies to mark up biomedical text documents. Based on established semantic links between documents and ontology concepts, the goal is to answer biomedical questions on a corpus of documents. The entirely annotated literature corpus makes it possible, for the first time, to automatically generate bibliometric analyses for ontological concepts, authors and institutions. Results: This work includes a novel framework for annotating free texts with ontological concepts. The framework makes it possible to generate recognition pattern rules from the terminological and relational information in an ontology. 
Maximum entropy models can be trained to distinguish the meaning of ambiguous concept labels. The framework was used to develop a annotation pipeline for PubMed abstracts with 27,863 Gene Ontology concepts. The evaluation of the recognition performance yielded a precision of 79.9% and a recall of 72.7% improving the previously used algorithm by 25,7% f-measure. The evaluation was done on a manually created (by the original authors) curation corpus of 689 PubMed abstracts with 18,356 curations of concepts. Methods to reason over large amounts of documents with ontologies were developed. The ability to answer questions with the online system was shown on a set of biomedical question of the TREC Genomics Track 2006 benchmark. This work includes the first ontology-based, large scale, online available, up-to-date bibliometric analysis for topics in molecular biology represented by GO concepts. The automatic bibliometric analysis is in line with existing, but often out-dated, manual analyses. Outlook: A number of promising continuations starting from this work have been spun off. A freely available online search engine has a growing user community. A spin-off company was funded by the High-Tech Gründerfonds which commercializes the new ontology-based search paradigm. Several off-springs of GoPubMed including GoWeb (general web search), Go3R (search in replacement, reduction, refinement methods for animal experiments), GoGene (search in gene/protein databases) are developed.
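
As a worked illustration of the figures reported in this abstract, the f-measure can be recomputed from the stated precision and recall. This is a minimal sketch that assumes the abstract means the standard balanced F-measure (the harmonic mean of precision and recall); the function name `f_measure` is hypothetical and does not come from the thesis.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was recognized
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (79.9% and 72.7%).
f1 = f_measure(0.799, 0.727)
print(f"F1 = {f1:.3f}")  # prints F1 = 0.761
```

Under that assumption, the reported precision and recall correspond to an F-measure of roughly 76%.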

37

Doms, Andreas. "GoPubMed: Ontology-based literature search for the life sciences." Doctoral thesis, Technische Universität Dresden, 2008. https://tud.qucosa.de/id/qucosa%3A23835.

38

Griffith, Obi Lee. "Identification of gene expression changes in human cancer using bioinformatic approaches." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/689.

Abstract:

The human genome contains tens of thousands of gene loci which code for an even greater number of protein and RNA products. The highly complex temporal and spatial expression of these genes makes possible all the biological processes of life. Altered gene expression by mutation or deregulation is fundamental for the development of many human diseases. The ultimate aim of this thesis was to identify gene expression changes relevant to cancer. The advent of genome-wide expression profiling techniques, such as microarrays, has provided powerful new tools to identify such changes and researchers are now faced with an explosion of gene expression data. Processing, comparing and integrating these data present major challenges. I approached these challenges by developing and assessing novel methods for cross-platform analysis of expression data, scalable subspace clustering, and curation of experimental gene regulation data from the published literature. I found that combining results from different expression platforms increases reliability of coexpression predictions. However, I also observed that global correlation between platforms was generally low, and few gene pairs reached reasonable thresholds for high-confidence coexpression. Therefore, I developed a novel subspace clustering algorithm, able to identify coexpressed genes in experimental subsets of very large gene expression datasets. Biological assessment against several metrics indicates that this algorithm performs well. I also developed a novel meta-analysis method to identify consistently reported genes from differential expression studies when raw data are unavailable. This method was applied to thyroid cancer, producing a ranked list of significantly over-represented genes. Tissue microarray analysis of some of these candidates and others identified a number of promising biomarkers for diagnostic and prognostic classification of thyroid cancer. 
Finally, I present ORegAnno (www.oreganno.org), a resource for the community-driven curation of experimentally verified regulatory sequences. This resource has proven a great success with ~30,000 sequences entered from over 900 publications by ~50 contributing users. These data, methods and resources contribute to our overall understanding of gene regulation, gene expression, and the changes that occur in cancer. Such an understanding should help identify new cancer mechanisms, potential treatment targets, and have significant diagnostic and prognostic implications.

39

Dockter, Rhyan B. "Genome Snapshot and Molecular Marker Development in Penstemon (Plantaginaceae)." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/2512.

Abstract:

Penstemon Mitchell (Plantaginaceae) is one of the largest, most diverse plant genera in North America. Their unique diversity, paired with their drought tolerance and overall hardiness, gives Penstemon a vast amount of potential in the landscaping industry, especially in the more arid western United States where they naturally thrive. In order to develop Penstemon lines for more widespread commercial and private landscaping use, we must improve our understanding of the vast genetic diversity of the genus on a molecular level. In this study we utilize genome reduction and barcoding to optimize 454-pyrosequencing in four target species of Penstemon (P. cyananthus, P. davidsonii, P. dissectus and P. fruticosus). Sequencing and assembly produced contigs representing an average of 0.5% of the Penstemon species. From the sequence, SNP information and microsatellite markers were extracted. One hundred and thirty-three interspecific microsatellite markers were discovered, of which 50 met desired primer parameters and were of high quality with readable bands on 3% Metaphor gels. Of the microsatellite markers, 82% were polymorphic, with an average heterozygosity value of 0.51. An average of one SNP in 2,890 bp per species was found within the individual species assemblies, and one SNP in 97 bp was found between any two supposed homologous sequences of the four species. An average of 21.5% of the assembled contigs were associated with putative genes involved in cellular components, biological processes, and molecular functions. On average, 19.7% of the assembled contigs were identified as repetitive elements, among which LTRs, DNA transposons, and other unclassified repeats were discovered. Our study demonstrates the effectiveness of using the GR-RSC technique to selectively reduce the genome size to putative homologous sequence in different species of Penstemon. It has also enabled us to gain greater insights into microsatellite, SNP, putative gene, and repetitive element content in the Penstemon genome, which provide essential tools for further genetic work, including plant breeding and phylogenetics.

40

Blank, Carrine E., Hong Cui, Lisa R. Moore, and Ramona L. Walls. "MicrO: an ontology of phenotypic and metabolic characters, assays, and culture media found in prokaryotic taxonomic descriptions." BIOMED CENTRAL LTD, 2016. http://hdl.handle.net/10150/614758.

Abstract:

Background: MicrO is an ontology of microbiological terms, including prokaryotic qualities and processes, material entities (such as cell components), chemical entities (such as microbiological culture media and medium ingredients), and assays. The ontology was built to support the ongoing development of a natural language processing algorithm, MicroPIE (or, Microbial Phenomics Information Extractor). During the MicroPIE design process, we realized there was a need for a prokaryotic ontology which would capture the evolutionary diversity of phenotypes and metabolic processes across the tree of life, capture the diversity of synonyms and information contained in the taxonomic literature, and relate microbiological entities and processes to terms in a large number of other ontologies, most particularly the Gene Ontology (GO), the Phenotypic Quality Ontology (PATO), and the Chemical Entities of Biological Interest (ChEBI). We thus constructed MicrO to be rich in logical axioms and synonyms gathered from the taxonomic literature. Results: MicrO currently has ~14,550 classes (~2,550 of which are new, the remainder being microbiologically relevant classes imported from other ontologies), connected by ~24,130 logical axioms (5,446 of which are new), and is available at http://purl.obolibrary.org/obo/MicrO.owl and on the project website at https://github.com/carrineblank/MicrO. MicrO has been integrated into the OBO Foundry Library (http://www.obofoundry.org/ontology/micro.html), so that other ontologies can borrow and re-use classes. Term requests and user feedback can be made using MicrO's Issue Tracker in GitHub. We designed MicrO such that it can support the ongoing and future development of algorithms that can leverage the controlled vocabulary and logical inference power provided by the ontology.
Conclusions: By connecting microbial classes with large numbers of chemical entities, material entities, biological processes, molecular functions, and qualities using a dense array of logical axioms, we intend MicrO to be a powerful new tool to increase the computing power of bioinformatics tools such as the automated text mining of prokaryotic taxonomic descriptions using natural language processing. We also intend MicrO to support the development of new bioinformatics tools that aim to develop new connections between microbial phenotypes and genotypes (i.e., the gene content in genomes). Future ontology development will include incorporation of pathogenic phenotypes and prokaryotic habitats.

41

Bettembourg, Charles. "Méthodes sémantiques pour la comparaison inter-espèces de voies métaboliques : application au métabolisme des lipides chez l'humain, la souris et la poule." Phd thesis, Université Rennes 1, 2013. http://tel.archives-ouvertes.fr/tel-00926498.

42

Saunders, Garret. "Family-Wise Error Rate Control in Quantitative Trait Loci (QTL) Mapping and Gene Ontology Graphs with Remarks on Family Selection." DigitalCommons@USU, 2014. https://digitalcommons.usu.edu/etd/2164.

Abstract:

The main aim of this dissertation is to meet real needs of practitioners in multiple hypothesis testing. The issue of multiplicity has become a significant concern in most fields of research as computational abilities have increased, allowing for the simultaneous testing of many (thousands or millions of) statistical hypotheses. While many error rates have been defined to address this issue of multiplicity, this work considers only the most natural generalization of the Type I Error rate to multiple tests, the family-wise error rate (FWER). Much work has already been done to establish powerful yet general methods which control the FWER under arbitrary dependencies among tests. This work both introduces these methods and expands upon them, as is detailed through its four main chapters. Chapter 1 contains general introductions and preliminaries important to the remainder of the work, particularly a previously published graphical weighted Bonferroni multiplicity adjustment. Chapter 2 then applies the principles introduced in Chapter 1 to achieve a substantial computational improvement to an existing FWER-controlling multiplicity approach (the Focus Level method) for gene set testing in high-throughput microarray and next-generation sequencing studies using Gene Ontology graphs. This improvement to the Focus Level procedure, which we call the Short Focus Level procedure, is achieved by extending the reach of graphical weighted Bonferroni testing to closed testing situations where restricted hypotheses are present. This is accomplished through Theorem 1 of Chapter 2. As a result of the improvement, the full top-down approach to the Focus Level procedure can now be performed, overcoming a significant disadvantage of the otherwise powerful approach to multiple testing. Chapter 3 presents a solution to a multiple testing difficulty within quantitative trait loci (QTL) mapping in natural populations for QTL LD (linkage disequilibrium) mapping models. Such models apply a two-hypothesis framework to the testing of thousands of genetic markers across the genome in search of QTL underlying a quantitative trait of interest. Inherent to the model is an unidentifiability issue where a parameter of interest is identifiable only under the alternative hypothesis. Through a second application of graphical weighted Bonferroni methods, we show how the multiplicity can be accounted for while simultaneously accounting for the required logical structuring of the testing such that identifiability is preserved. Finally, Chapter 4 details some of the difficulties associated with the distributional assumptions for the test statistics of the two hypotheses of the LD-based QTL mapping framework. A novel bivariate testing strategy is proposed for these test statistics in order to overcome these distributional difficulties while preserving power in the multiplicity correction by reducing the number of tests performed. Chapter 5 concludes the work with a summary of the main contributions and future research goals aimed at continual improvement to the multiple testing issues inherent to both the fields of genetics and genomics.
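
The weighted Bonferroni idea that the graphical procedures in this abstract build on can be sketched in a few lines. This is only an illustration of the basic single-step weighted Bonferroni test (reject hypothesis i when p_i <= w_i * alpha, with non-negative weights summing to at most 1), not the Focus Level or Short Focus Level procedure from the dissertation; the function name and the example p-values and weights are hypothetical.

```python
from typing import List


def weighted_bonferroni(
    p_values: List[float], weights: List[float], alpha: float = 0.05
) -> List[bool]:
    """Single-step weighted Bonferroni test.

    Hypothesis i is rejected when p_i <= w_i * alpha. Because the weights
    sum to at most 1, the union bound keeps the family-wise error rate at
    or below alpha under arbitrary dependence among the tests.
    """
    if len(p_values) != len(weights):
        raise ValueError("p_values and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) > 1 + 1e-12:
        raise ValueError("weights must be non-negative and sum to at most 1")
    return [p <= w * alpha for p, w in zip(p_values, weights)]


# Hypothetical p-values for four hypotheses.
p = [0.001, 0.02, 0.04, 0.30]

# Equal weights reproduce the classical Bonferroni threshold alpha / 4.
print(weighted_bonferroni(p, [0.25, 0.25, 0.25, 0.25]))
# prints [True, False, False, False]

# Shifting weight to the second hypothesis lets it be rejected as well.
print(weighted_bonferroni(p, [0.1, 0.7, 0.1, 0.1]))
# prints [True, True, False, False]
```

Graphical approaches refine this scheme by redistributing the weight of a rejected hypothesis to the remaining ones; that step is deliberately left out of the sketch.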

43

Saunders, Garrett. "Family-Wise Error Rate Control in Quantitative Trait Loci (QTL) Mapping and Gene Ontology Graphs with Remarks on Family Selection." DigitalCommons@USU, 2014. https://digitalcommons.usu.edu/etd/7021.

Abstract:

One of the great aims of statistics, the science of collecting, analyzing, and interpreting data, is to protect against the probability of falsely rejecting an accepted claim, or hypothesis, given observed data stemming from some experiment. This is generally known as protecting against a Type I Error, or controlling the Type I Error rate. The extension of this protection against Type I Errors to the situation where thousands upon thousands of hypotheses are examined simultaneously is known as multiple hypothesis testing. This dissertation presents an improvement to an existing multiple hypothesis testing approach, the Focus Level method, specific to gene set testing (a branch of genomics) on Gene Ontology graphs. This improvement resolves a long-standing computational difficulty of the Focus Level method, providing more than a 15,000-fold increase in computational efficiency. This dissertation also presents a solution to a multiple testing problem in genetics where a specific approach to mapping genes underlying quantitative traits of interest requires a multiplicity adjustment that both corrects for the number of tests and ensures logical consistency. The power advantage of the solution is demonstrated over the current standard approach to the problem. A side issue of this model framework led to the development of a new bivariate approach to quantitative trait marker detection, which is presented herein. The overall contribution of this dissertation to the statistics literature is that it provides novel solutions that meet real needs of practitioners in genetics and genomics, with the aim of ensuring both that truth is discovered and that discoveries are actually true.

44

Bedhiafi, Walid. "Sciences de l'information pour l'étude des systèmes biologiques (exemple du vieillissement du système immunitaire)." Electronic Thesis or Diss., Paris 6, 2017. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2017PA066139.pdf.

Abstract:

High-throughput experimental approaches to studying gene expression involve several processing steps for the quantification, annotation, and interpretation of results. The i3 lab and the LGIPH apply these approaches in various experimental setups to study the immune system and its dysfunctions. However, limitations have been observed when using conventional approaches to annotate gene expression signatures. The main objective of this thesis was to develop an alternative annotation approach to overcome this problem. The approach we developed is based on the contextualization of genes and their products, followed by the modeling of biological pathways, to produce knowledge bases for the study of gene expression. We define a gene expression context as follows: cell population + anatomical compartment + pathological condition. To obtain these contexts, we opted for large-scale literature mining and developed a Python package that automatically annotates texts according to three ontologies chosen to match our definition of context. We show that the package outperforms a reference tool for text annotation. We used it to screen a text corpus on aging of the immune system; the results are presented here. To model biological pathways, we developed, in collaboration with the LIPAH lab, a modeling method based on a genetic algorithm that combines semantic proximity measures computed from gene annotations (using the Biological Process ontology) with interaction data from db-string. We were able to recover reference networks with an error rate of 0.47.

45

Hassan, Aamir Ul. "Integration of Genome Scale Data for Identifying New Biomarkers in Colon Cancer: Integrated Analysis of Transcriptomics and Epigenomics Data from High Throughput Technologies in Order to Identifying New Biomarkers Genes for Personalised Targeted Therapies for Patients Suffering from Colon Cancer." Thesis, University of Bradford, 2017. http://hdl.handle.net/10454/17419.

Abstract:

Colorectal cancer is the third most common cancer and the leading cause of cancer deaths in Western industrialised countries. Despite recent advances in the screening, diagnosis, and treatment of colorectal cancer, an estimated 608,000 people die every year due to colon cancer. Our current knowledge of colorectal carcinogenesis indicates a multifactorial and multi-step process that involves various genetic alterations and several biological pathways. The identification of molecular markers for early diagnosis and precise prediction of clinical outcome in colon cancer is a challenging task because of tumour heterogeneity. This Ph.D. thesis presents the molecular and cellular mechanisms leading to colorectal cancer. A systematic review of the literature is conducted on microarray gene expression profiling, gene ontology enrichment analysis, microRNA, systems biology, and various bioinformatics tools. The aims of this study were to stratify colon tumours into molecularly distinct subtypes, to identify novel diagnostic targets, and to predict reliable prognostic signatures for clinical practice using microarray expression datasets. We performed an integrated analysis of gene expression data based on genetic, epigenetic, and extensive clinical information using unsupervised learning, correlation, and functional network analysis. As a result, we identified 267-gene and 124-gene signatures that can distinguish normal, primary, and metastatic tissues and that are also involved in important regulatory functions such as immune response, lipid metabolism, and peroxisome proliferator-activated receptor (PPAR) signalling pathways. For the first time, we also identified miRNAs that can differentiate primary from metastatic colon tumours, as well as a prognostic signature of grade and stage levels, which can be a major contributor to complex transcriptional phenotypes in a colon tumour.

46

Stiehler, Maik, Juliane Rauh, Cody Bünger, et al. "Large-scale gene expression profiling data of bone marrow stromal cells from osteoarthritic donors." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-217567.

Abstract:

This data article contains data related to the research article entitled, "in vitro characterization of bone marrow stromal cells from osteoarthritic donors" [1]. Osteoarthritis (OA) represents the main indication for total joint arthroplasty and is one of the most frequent degenerative joint disorders. However, the exact etiology of OA remains unknown. Bone marrow stromal cells (BMSCs) can be easily isolated from bone marrow aspirates and provide an excellent source of progenitor cells. The data shows the identification of pivotal genes and pathways involved in osteoarthritis by comparing gene expression patterns of BMSCs from osteoarthritic versus healthy donors using an array-based approach.

47

Kim, Wooyoung. "Innovative Algorithms and Evaluation Methods for Biological Motif Finding." Digital Archive @ GSU, 2012. http://digitalarchive.gsu.edu/cs_diss/63.

Abstract:

Biological motifs are defined as overly recurring sub-patterns in biological systems. Sequence motifs and network motifs are examples of biological motifs. Owing to their wide range of applications, many algorithms and computational tools have been developed for the efficient search for biological motifs. As a result, there are more computationally derived motifs than experimentally validated motifs, and how to validate the biological significance of the ‘candidate motifs’ becomes an important question. Some sequence motifs are verified by their structural similarities or their functional roles in DNA or protein sequences and are stored in databases. However, the biological role of network motifs has not yet been validated, and currently no databases exist for this purpose. In this thesis, we focus not only on computational efficiency but also on the biological meaning of the motifs. We provide an efficient way to incorporate biological information into clustering analysis methods: for example, a sparse nonnegative matrix factorization (SNMF) method is used with Chou-Fasman parameters for protein motif finding. Biological network motifs are searched for by various clustering algorithms with Gene Ontology (GO) information. Experimental results show that the algorithms perform better than existing algorithms by producing a larger number of high-quality biological motifs. In addition, we apply biological network motifs to the discovery of essential proteins. Essential proteins are defined as a minimum set of proteins which are vital for development to a fertile adult and in a cellular life in an organism. We design a new centrality algorithm with biological network motifs, named MCGO, and score proteins in a protein-protein interaction (PPI) network to find essential proteins. MCGO is also combined with other centrality measures to predict essential proteins using machine learning techniques. This thesis makes three contributions to the study of biological motifs: 1) clustering analysis is used efficiently, and biological information is easily integrated with the analysis; 2) we focus more on the biological meaning of motifs by adding biological knowledge to the algorithms and by suggesting biologically related evaluation methods; 3) biological network motifs are successfully applied to the practical problem of predicting essential proteins.

48

Wang, Yufei. "Ontology engineering the brain gene ontology case study : submitted by Yufei Wang ... in partial fulfillment of the requirements for the degree of Master of Computer and Information Sciences, Auckland University of Technology, March 2007." AUT University, 2007. http://aut.researchgateway.ac.nz/handle/10292/104.

Abstract:

Thesis (MCIS - Computer and Information Sciences) --AUT University, 2007. Includes bibliographical references. Also held in print (ix, 74 leaves : ill. ; 30 cm.) in City Campus Theses Collection (T 006.33 WAN)

49

Rehman, Hafeez Ur. "Integration and Analysis of Heterogeneous Biological Data." Doctoral thesis, Politecnico di Torino, 2014. http://hdl.handle.net/11583/2537092.

Abstract:

We live in the era of networks. The power of networks is the most fundamental driving force behind the machinery of life. Living bodies stay alive through complex inter-regulations of biochemical networks, and information flows through these networks with such intensity and complexity that it exceeds anything human ingenuity has been able to produce so far. Because of this overwhelming complexity, we have begun to see a rapid rise in studies aimed at explaining the fundamental concepts and hidden properties of such complex systems. This thesis provides a strong foundation for using networks to understand complex biological phenomena such as protein functions, as well as a more accurate method of modeling gene regulatory networks (GRNs). In the first part, we presented a methodology that uses existing biological data with gene ontology functional dependencies to infer the functions of uncharacterized proteins. We combined different sources of structural and functional information with gene ontology based term-specific relationships to predict precise functions of unannotated proteins. Such term-specific relationships are defined to clearly identify the functional context of each activity among the interacting proteins, which enables a dramatic improvement in annotation accuracy over previous approaches. The presented methodology may easily be extended to integrate more sources of biological information to further improve function prediction confidence. In the second part of this thesis, we discussed an extended BN model that accounts for post-transcriptional regulation in GRN simulation. Using this extended model, we discussed the sets of attractors of two biologically confirmed networks, focusing on the regulatory role of miR-7. The attractors were compared with those of networks from which the miRNA was removed. The central role of the miRNA in increasing network stability was highlighted in both networks, confirming the cooperative stabilizing role of miR-7. The enhanced BN model presented in this thesis is only a first step towards a more realistic analysis of the high-level functional and topological characteristics of GRNs. Using the tool's facilities, the dynamics of real networks can be analyzed. Thanks to the extended model, which includes post-transcriptional regulation, network simulation can be more reliable and can also offer new insights into the role of miRNAs from a functional perspective. This improves on the current state of the art, which mostly focuses on high-level gene/gene or gene/protein interactions and neglects post-transcriptional regulation. Due to its discrete nature, the BN model may still neglect some fine regulatory adjustments. However, most of the computed attractors, now including miRNAs, still represent meaningful states of the network. The simple glimpse into the complexity of the network dynamics that the toolkit is able to provide could be used not only to validate in vitro experiments, but also as a real systems biology tool able to raise new questions and drive new experiments.
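
The attractor comparison described in this abstract can be illustrated with a toy example. The sketch below assumes that "BN" denotes a synchronous Boolean network; the three-node network, its update rules, and the function names are hypothetical and are not the miR-7 networks or the toolkit from the thesis.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]


def find_attractors(update: Callable[[State], State], n_nodes: int) -> List[List[State]]:
    """Enumerate all attractors of a synchronous Boolean network by brute force."""
    attractors: List[List[State]] = []
    seen = set()
    for start in product((0, 1), repeat=n_nodes):
        # Follow the trajectory from this start state until a state repeats.
        order: Dict[State, int] = {}
        state: State = start
        while state not in order:
            order[state] = len(order)
            state = update(state)
        # Everything visited from the first occurrence of the repeated
        # state onward forms the cycle (a fixed point is a cycle of length 1).
        first = order[state]
        cycle = [s for s, i in order.items() if i >= first]
        key = frozenset(cycle)
        if key not in seen:
            seen.add(key)
            attractors.append(cycle)
    return attractors


def with_mirna(state: State) -> State:
    """Toy rules: gene A needs B and is repressed by the miRNA; A drives B and the miRNA."""
    gene_a, gene_b, mirna = state
    return (int(gene_b and not mirna), gene_a, gene_a)


def without_mirna(state: State) -> State:
    """The same toy network with the miRNA node removed."""
    gene_a, gene_b = state
    return (gene_b, gene_a)


print(find_attractors(with_mirna, 3))
# prints [[(0, 0, 0)]]
print(find_attractors(without_mirna, 2))
# prints [[(0, 0)], [(0, 1), (1, 0)], [(1, 1)]]
```

In this toy case, removing the miRNA node changes the attractor set from a single fixed point to two fixed points and one oscillation, which is the kind of with/without comparison the abstract describes. Exhaustive enumeration is only practical for small networks, since the state space grows as 2^n.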

50

Stiehler, Maik, Juliane Rauh, Cody Bünger, et al. "Large-scale gene expression profiling data of bone marrow stromal cells from osteoarthritic donors." Elsevier, 2016. https://tud.qucosa.de/id/qucosa%3A30119.
