Dissertations / Theses on the topic 'Omic data'
Guan, Xiaowei. "Bioinformatics Approaches to Heterogeneous Omic Data Integration." Case Western Reserve University School of Graduate Studies / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=case1340302883.
Xiao, Hui. "Network-based approaches for multi-omic data integration." Thesis, University of Cambridge, 2019. https://www.repository.cam.ac.uk/handle/1810/289716.
Zuo, Yiming. "Differential Network Analysis based on Omic Data for Cancer Biomarker Discovery." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/78217.
Ph. D.
Tsai, Tsung-Heng. "Bayesian Alignment Model for Analysis of LC-MS-based Omic Data." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/64151.
Ph. D.
Ruffalo, Matthew M. "Algorithms for Constructing Features for Integrated Analysis of Disparate Omic Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=case1449238712.
Elhezzani, Najla Saad R. "New statistical methodologies for improved analysis of genomic and omic data." Thesis, King's College London (University of London), 2018. https://kclpure.kcl.ac.uk/portal/en/theses/new-statistical-methodologies-for-improved-analysis-of-genomic-and-omic-data(eb8d95f4-e926-4c54-984f-94d86306525a).html.
Elsheikh, Samar Salah Mohamedahmed. "Integration of multi-omic data and neuroimaging characteristics in studying brain related diseases." Doctoral thesis, Faculty of Health Sciences, 2020. http://hdl.handle.net/11427/32609.
Ehrenberger, Tobias. "Cancer systems biology : functional insights and therapeutic strategies for medulloblastoma from omic data integration." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123062.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 151-167).
Medulloblastoma (MB) is a chiefly pediatric cancer of the cerebellum that has been studied extensively using genomic, epigenomic, and transcriptomic data. It comprises at least four molecularly distinct subgroups: WNT, SHH, Group 3, and Group 4. Despite the detailed characterization of MB, many disease-driving events remain to be elucidated and therapeutic targets to be nominated. In this thesis, we describe three studies that contribute to a better understanding of this devastating disease: First, we describe a study that aims to fully describe the genomic landscape in the largest medulloblastoma cohort to date, using 491 sequenced MB tumors and 1,256 epigenetically analyzed cases. This work describes subgroup-specific driver alterations including previously unappreciated actionable targets; and, based on epigenetic data, identifies further heterogeneity within Group 3 and Group 4 tumors. Second, we focus on the proteomes and phospho-proteomes of 45 medulloblastoma samples.
We identified distinct pathways associated with two subsets of SHH tumors that showed robustly distinct proteomes but similar transcriptomes, and found post-translational modifications of MYC that are associated with poor outcomes in Group 3 tumors. We also found kinases associated with subtypes and showed that inhibiting PRKDC sensitizes MYC-driven cells to radiation. This study shows that proteomics enables a more comprehensive, functional readout, providing a foundation for future therapeutic strategies. Third, we characterize the metabolomic space of MB on largely the same 45 tumors as used in the proteome-focused study. Here, we present preliminary insights derived from integrative network and other analyses. We find that MB consensus subgroups are preserved in metabolic space, and that certain classes of metabolites are elevated in MYC-activated MB.
We also show that, similar to other cancers, a previously described gain-of-function mutation in IDH1 may cause elevated 2-hydroxyglutarate levels in MB. The work described in this thesis significantly enhances previous knowledge of medulloblastoma and its subgroups, and provides insights that may aid in the development of medulloblastoma therapies in the near future.
by Tobias Ehrenberger.
Ph.D. Massachusetts Institute of Technology, Department of Biological Engineering
Curti, Nico. "Implementazione e benchmarking dell'algoritmo QDANet PRO per l'analisi di big data genomici." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/12018/.
Arsenteva, Polina. "Statistical modeling and analysis of radio-induced adverse effects based on in vitro and in vivo data." Electronic Thesis or Diss., Bourgogne Franche-Comté, 2023. http://www.theses.fr/2023UBFCK074.
In this work we address the problem of adverse effects induced by radiotherapy on healthy tissues. The goal is to propose a mathematical framework for comparing the effects of different irradiation modalities, so as to ultimately choose the treatments that produce the fewest adverse effects for potential use in the clinical setting. The adverse effects are studied in the context of two types of data: the in vitro omic response of human endothelial cells, and the adverse effects observed in mice in in vivo experiments. In the in vitro setting, we encounter the problem of extracting key information from complex temporal data that cannot be treated with the methods available in the literature. We model the radio-induced fold change, the object that encodes the difference in the effect of two experimental conditions, in a way that takes into account the uncertainties of measurements as well as the correlations between the observed entities. We construct a distance, with a further generalization to a dissimilarity measure, that allows the fold changes to be compared in terms of all the important statistical properties. Finally, we propose a computationally efficient algorithm that performs clustering jointly with temporal alignment of the fold changes. The key features extracted through the latter are visualized using two types of network representations to facilitate biological interpretation. In the in vivo setting, the statistical challenge is to establish a predictive link between variables that, due to the specificities of the experimental design, can never be observed on the same animals. Without access to joint distributions, we leverage the additional information on the observed groups to infer the linear regression model.
We propose two estimators of the regression parameters, one based on the method of moments and the other based on optimal transport, as well as estimators for the confidence intervals based on the stratified bootstrap procedure.
Lovino, Marta. "Algorithms for complex systems in the life sciences." Doctoral thesis, Politecnico di Torino, 2021. http://hdl.handle.net/11583/2910082.
Serra, Angela. "Multi-view learning and data integration for omics data." Doctoral thesis, Universita degli studi di Salerno, 2017. http://hdl.handle.net/10556/2580.
In recent years, the advancement of high-throughput technologies, combined with the constant decrease of data-storage costs, has led to the production of large amounts of data from different experiments that characterise the same entities of interest. This information may relate to specific aspects of a phenotypic entity (e.g., gene expression), or can include the comprehensive and parallel measurement of multiple molecular events (e.g., DNA modifications, RNA transcription and protein translation) in the same samples. Exploiting such complex and rich data is needed in systems biology for building global models able to explain complex phenotypes. For example, the use of genome-wide data in cancer research, for the identification of groups of patients with similar molecular characteristics, has become a standard approach for applications in therapy response, prognosis prediction, and drug development. Moreover, the integration of gene expression data regarding cell treatment by drugs with information regarding the chemical structure of the drugs has allowed scientists to perform more accurate drug repositioning tasks. Unfortunately, there is a big gap between the amount of information and the knowledge into which it is translated, and there is a huge need for computational methods able to integrate and analyse data to fill this gap. Current research in this area follows two different integrative methods: one uses the complementary information of different measurements for the study of complex phenotypes on the same samples (multi-view learning); the other tends to infer knowledge about the phenotype of interest by integrating and comparing the experiments relating to it with those of different, already known phenotypes through comparative methods (meta-analysis).
Meta-analysis can be thought of as an integrative study of previous results, usually performed by aggregating the summary statistics from different studies. Due to its nature, meta-analysis usually involves homogeneous data. Multi-view learning, on the other hand, is a more flexible approach that considers the fusion of different data sources to get more stable and reliable estimates. Based on the type of data and the stage of integration, new methodologies have been developed spanning a landscape of techniques comprising graph theory, machine learning and statistics. Depending on the nature of the data and on the statistical problem to address, the integration of heterogeneous data can be performed at different levels: early, intermediate and late. Early integration consists in concatenating data from different views in a single feature space. Intermediate integration consists in transforming all the data sources into a common feature space before combining them. In late integration methodologies, each view is analysed separately and the results are then combined. The purpose of this thesis is twofold: the first objective is the definition of a data integration methodology for patient sub-typing (MVDA), and the second is the development of a tool for phenotypic characterisation of nanomaterials (INSIdEnano). In this PhD thesis, I present the methodologies and the results of my research. MVDA is a multi-view methodology that aims to discover new statistically relevant patient sub-classes. Identifying patient subtypes of a specific disease is a challenging task, especially in early diagnosis. This is a crucial point for treatment, because not all the patients affected by the same disease will have the same prognosis or need the same drug treatment. This problem is usually solved by using transcriptomic data to identify groups of patients that share the same gene patterns.
The main idea underlying this research work is to combine more omics data for the same patients to obtain a better characterisation of their disease profile. The proposed methodology is a late integration approach based on clustering. It works by evaluating the patient clusters in each single view and then combining the clustering results of all the views by factorising the membership matrices in a late integration manner. The effectiveness and the performance of our method were evaluated on six multi-view cancer datasets related to breast cancer, glioblastoma, prostate and ovarian cancer. The omics data used for the experiments are gene and miRNA expression, RNASeq and miRNASeq, protein expression and Copy Number Variation. In all the cases, patient sub-classes with statistical significance were found, identifying novel sub-groups not previously emphasised in the literature. The experiments were also conducted by using prior information, as a new view in the integration process, to obtain higher accuracy in patients' classification. The method outperformed single-view clustering on all the datasets; moreover, it performs better when compared with other multi-view clustering algorithms and, unlike other existing methods, it can quantify the contribution of single views to the results. The method has also been shown to be stable when perturbation is applied to the datasets by removing one patient at a time and evaluating the normalized mutual information between all the resulting clusterings. These observations suggest that integration of prior information with genomic features in sub-typing analysis is an effective strategy for identifying disease subgroups. INSIdE nano (Integrated Network of Systems bIology Effects of nanomaterials) is a novel tool for the systematic contextualisation of the effects of engineered nanomaterials (ENMs) in the biomedical context.
In recent years, omics technologies have been increasingly used to thoroughly characterise the molecular mode of action of ENMs. It is possible to contextualise the molecular effects of different types of perturbations by comparing their patterns of alterations. While this approach has been successfully used for drug repositioning, a comprehensive contextualisation of the ENM mode of action is still missing to date. The idea behind the tool is to use analytical strategies to contextualise or position the ENM with respect to relevant phenotypes that have been studied in the literature (such as diseases, drug treatments, and other chemical exposures) by comparing their patterns of molecular alteration. This could greatly increase the knowledge on the ENM molecular effects and in turn contribute to the definition of relevant pathways of toxicity as well as help in predicting the potential involvement of ENMs in pathogenetic events or in novel therapeutic strategies. The main hypothesis is that suggestive patterns of similarity between sets of phenotypes could be an indication of a biological association to be further tested in toxicological or therapeutic frames. Based on the expression signature associated with each phenotype, the strength of similarity between each pair of perturbations has been evaluated and used to build a large network of phenotypes. To ensure the usability of INSIdE nano, a robust and scalable computational infrastructure has been developed to scan this large phenotypic network, and an effective web-based graphical user interface has been built. In particular, INSIdE nano was scanned to search for clique sub-networks, quadruplet structures of heterogeneous nodes (a disease, a drug, a chemical and a nanomaterial) completely interconnected by strong patterns of similarity (or anti-similarity).
The predictions have been evaluated for a set of known associations between diseases and drugs, based on drug indications in clinical practice, and between diseases and chemicals, based on causal exposure evidence from the literature, and focused on the possible involvement of nanomaterials in the most robust cliques. The evaluation of INSIdE nano confirmed that it highlights known disease-drug and disease-chemical connections. Moreover, disease similarities agree with the information based on their clinical features, as do those of drugs and chemicals, which mirror their resemblance based on chemical structure. Altogether, the results suggest that INSIdE nano can also be successfully used to contextualise the molecular effects of ENMs and infer their connections to other better studied phenotypes, speeding up their safety assessment as well as opening new perspectives concerning their usefulness in biomedicine. [edited by author]
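As a rough illustration of the late integration idea described in the abstract above (cluster each view separately, then combine the membership matrices), the following minimal Python sketch builds one membership matrix per view and stacks them. It is not the MVDA implementation: the cluster labels, view names and helper functions are hypothetical, and the matrix factorisation step is replaced here by a much simpler co-association count that only records in how many views two patients fall into the same cluster.

```python
def membership_matrix(labels):
    """One-hot patients x clusters matrix for one view's cluster labels."""
    clusters = sorted(set(labels))
    return [[1 if lab == c else 0 for c in clusters] for lab in labels]


def stack_views(matrices):
    """Concatenate per-view membership matrices column-wise (rows stay patients)."""
    n_patients = len(matrices[0])
    return [sum((m[i] for m in matrices), []) for i in range(n_patients)]


def co_association(stacked):
    """Entry (i, j) counts the views in which patients i and j share a cluster."""
    n = len(stacked)
    return [
        [sum(a * b for a, b in zip(stacked[i], stacked[j])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical single-view clustering results for four patients (assumption).
view_labels = {
    "gene_expression": [0, 0, 1, 1],
    "mirna_expression": [0, 0, 0, 1],
}

matrices = [membership_matrix(labels) for labels in view_labels.values()]
stacked = stack_views(matrices)      # patients x (clusters of all views)
agreement = co_association(stacked)  # simple stand-in for the factorisation step
```

In this toy case the first two patients are grouped together in both views, so their agreement count is 2, while the first and last patients never share a cluster. A factorisation of the stacked matrix, as the abstract describes, would extract the final multi-view clusters from the same input.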
The technological advancement of high-throughput technologies, combined with the constant decrease in storage costs, has led to the production of large quantities of data from different experiments that characterise the same entities of interest. This information may relate to specific phenotypic aspects (for example, gene expression), or may include global and parallel measurements of different molecular aspects (for example, DNA modifications, RNA transcription and protein translation) in the same samples. Analysing such complex data is useful in the field of systems biology for building models capable of explaining complex phenotypes. For example, the use of genome-wide data in cancer research, for the identification of groups of patients with similar molecular characteristics, has become a standard approach for more accurate early prognosis and for the identification of specific therapies. Moreover, the integration of gene expression data regarding the treatment of cells with drugs has allowed scientists to achieve high accuracy in drug repositioning. Unfortunately, there is a large gap between the data produced by the numerous experiments and the information into which they are translated. The scientific community therefore has a strong need for computational methods that can integrate and analyse such data to fill this gap. Research in the field of multi-view analysis follows two different integrative analysis methods: one uses the complementary information of different measurements to study complex phenotypes on different samples (multi-view learning); the other tends to infer knowledge about the phenotype of interest of an entity by comparing the experiments relating to it with those of other phenotypic entities already known in the literature (meta-analysis).
Meta-analysis can be thought of as a comparative study of the results identified in a particular experiment with respect to those of previous studies. Due to its nature, meta-analysis usually involves homogeneous data. Multi-view learning, on the other hand, is a more flexible approach that considers the fusion of different data sources to obtain more stable and reliable estimates. Depending on the type of data and the level of integration, new methodologies have been developed from techniques based on graph theory, machine learning and statistics. Depending on the nature of the data and the statistical problem to be solved, the integration of heterogeneous data can be performed at different levels: early, intermediate and late integration. Early integration techniques consist in concatenating the data of the different views into a single feature space. Intermediate integration techniques consist in transforming all the data sources into a single common space before combining them. In late integration techniques, each view is analysed separately and the results are then combined. The purpose of this thesis is twofold: the first objective is the definition of a data integration methodology for patient sub-typing (MVDA), and the second is the development of a tool for the phenotypic characterisation of nanomaterials (INSIdEnano). In this doctoral thesis I present the methodologies and results of my research. MVDA is a multi-view technique that aims to discover new, statistically relevant patient subtypes. Identifying patient subtypes for a specific disease is an objective of high importance in clinical practice, especially for the early diagnosis of diseases. This problem is generally solved by using transcriptomic data to identify the groups of patients that share the same patterns of gene alteration.
The main idea underlying this research work is to combine several types of omic data for the same patients to obtain a better characterisation of their profile. The proposed methodology is a late integration approach based on clustering. For each view, a clustering of the patients is performed and represented in the form of membership matrices. The results of all the views are then combined through a matrix factorisation technique to obtain the final multi-view metaclusters. The feasibility and performance of our method were evaluated on six multi-view datasets relating to breast cancer, glioblastoma, prostate cancer and ovarian cancer. The omic data used for the experiments relate to gene expression, miRNA expression, RNASeq, miRNASeq, protein expression and Copy Number Variation. In all the datasets, patient subtypes with statistical relevance were identified, revealing new subgroups not previously known in the literature. Further experiments were conducted using prior knowledge of the patients' macro classes. This information was treated as an additional view in the integration process to obtain higher accuracy in patient classification. The proposed method performs better than classical clustering algorithms on all the datasets. MVDA obtained better results than other early and intermediate integration algorithms. Moreover, the method is able to compute the contribution of each single view to the final result. The results also show that the method is stable under perturbations of the dataset carried out by removing one patient at a time (leave-one-out).
These observations suggest that integrating prior information and genomic features, to be used jointly during the analysis, is a winning strategy for identifying disease subtypes. INSIdE nano (Integrated Network of Systems bIology Effects of nanomaterials) is an innovative tool for the systematic contextualisation of the effects of nanoparticles (ENMs) in biomedical contexts. In recent years, omic technologies have been widely applied to characterise nanomaterials at the molecular level. It is possible to contextualise the molecular-level effect of different types of perturbations by comparing their patterns of gene alteration. While this approach has been applied successfully in the field of drug repositioning, an extensive contextualisation of the effect of nanomaterials on cells is currently missing. The idea behind the tool is to use comparative analysis strategies to contextualise or position nanomaterials with respect to relevant phenotypes that have been studied in the literature (such as human diseases, drug treatments or exposures to chemical substances) by comparing their patterns of molecular alteration. This could increase knowledge of the molecular effect of nanomaterials and contribute to the definition of new toxicological pathways, or identify possible involvement of nanomaterials in pathological events or in new therapeutic strategies. The underlying hypothesis is that identifying patterns of similarity between sets of phenotypes could indicate a biological association that must subsequently be tested in a toxicological or therapeutic setting. Based on the gene expression signature associated with each phenotype, the similarity between each pair of perturbations was evaluated and used to build a large network of interactions between phenotypes.
To ensure the usability of INSIdE nano, a robust and scalable computational infrastructure was developed for analysing this network. In addition, a website was built that allows users to query and visualise the network simply and efficiently. In particular, INSIdE nano was analysed by searching for all possible cliques of four heterogeneous elements (a nanomaterial, a drug, a disease and a chemical substance). A clique is a completely connected sub-network, in which every element is linked to all the others. Of all the cliques, only those for which the associations between drug and disease and between drug and chemical substances are known were considered significant. The known connections between drugs and diseases are based on the fact that the drug is prescribed to treat that disease. The known connections between diseases and chemical substances are based on evidence in the literature that such substances cause the disease. The focus was placed on the possible involvement of nanomaterials with the diseases present in these cliques. The evaluation of INSIdE nano confirmed that it highlights known connections between diseases and drugs and between diseases and chemical substances. Moreover, the similarity between diseases computed from genes agrees with their clinical information. Likewise, the similarities between drugs and chemical substances mirror their similarities based on chemical structure. Overall, the results suggest that INSIdE nano can be used to contextualise the molecular effect of nanomaterials and to infer their connections to phenotypes previously studied in the literature. This method makes it possible to speed up the assessment of their toxicity and opens new perspectives for their use in biomedicine. [edited by the author]
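The clique search described in the abstract above (four heterogeneous nodes, one of each type, all connected to one another) can be sketched in a few lines of Python. This is only an illustrative brute-force version under stated assumptions: the node names and edge list are invented, the edges are assumed to have already been filtered to strong similarity links, and the actual INSIdE nano infrastructure is not shown in the abstract and is presumably far more scalable.

```python
from itertools import product


def heterogeneous_cliques(nodes_by_type, edges):
    """Return every tuple with one node per type in which all pairs are linked."""
    connected = {frozenset(edge) for edge in edges}  # undirected edges
    cliques = []
    for combo in product(*nodes_by_type.values()):
        pairs = [(a, b) for i, a in enumerate(combo) for b in combo[i + 1:]]
        if all(frozenset(pair) in connected for pair in pairs):
            cliques.append(combo)
    return cliques


# Hypothetical toy network (all names are made up for illustration).
nodes_by_type = {
    "nanomaterial": ["nano_A"],
    "drug": ["drug_A", "drug_B"],
    "disease": ["disease_A"],
    "chemical": ["chem_A"],
}
edges = [
    ("nano_A", "drug_A"), ("nano_A", "disease_A"), ("nano_A", "chem_A"),
    ("drug_A", "disease_A"), ("drug_A", "chem_A"), ("disease_A", "chem_A"),
    ("nano_A", "drug_B"), ("drug_B", "disease_A"),  # drug_B lacks a chem_A link
]

cliques = heterogeneous_cliques(nodes_by_type, edges)
```

Here only the quadruplet containing `drug_A` is returned, because `drug_B` is not linked to `chem_A` and so its quadruplet is not fully connected. Enumerating the Cartesian product is exponential in the number of node types, which is acceptable for four types on a toy graph but would need pruning on a large network.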
XV n.s.
Nonell, Mazelon Lara 1972. "New approaches in omics data modelling." Doctoral thesis, Universitat Pompeu Fabra, 2019. http://hdl.handle.net/10803/668053.
Advances in technology have allowed us to obtain large quantities of so-called omic data. The analysis and integration of this kind of data using advanced statistical and bioinformatic methods should improve the management of diseases. The diversity and complexity of omic data has encouraged the development of hundreds of new statistical methods to meet this goal. It is therefore essential to have methods that accommodate the appropriate distributions and model complex data structures. In view of this, this thesis presents advances in three directions. First, the study of different methods for analysing non-linear associations, which is highly relevant in association studies between environmental exposures (i.e., the exposome) and complex diseases. This analysis is accompanied by the development of the R package nlOmicAssoc. Second, it proposes using the simplex distribution to analyse methylomic data, since this distribution fits the beta values generated in this kind of study. The extension to generalised linear models with a simplex response is also formulated. And finally, the R package HOmics, which incorporates biological knowledge into association studies by means of hierarchical Bayesian models. It also implements methods for modelling the dependence between omic data, enabling data integration.
Wang, Zhi. "Module-Based Analysis for "Omics" Data." Thesis, North Carolina State University, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3690212.
This thesis focuses on methodologies and applications of module-based analysis (MBA) in omics studies to investigate the relationships of phenotypes and biomarkers, e.g., SNPs, genes, and metabolites. As an alternative to traditional single-biomarker approaches, MBA may increase the detectability and reproducibility of results because biomarkers tend to have moderate individual effects but significant aggregate effect; it may improve the interpretability of findings and facilitate the construction of follow-up biological hypotheses because MBA assesses biomarker effects in a functional context, e.g., pathways and biological processes. Finally, for exploratory “omics” studies, which usually begin with a full scan of a long list of candidate biomarkers, MBA provides a natural way to reduce the total number of tests, and hence relax the multiple-testing burdens and improve power.
The first MBA project focuses on genetic association analysis that assesses the main and interaction effects for sets of genetic (G) and environmental (E) factors rather than for individual factors. We develop a kernel machine regression approach to evaluate the complete effect profile (i.e., the G, E, and G-by-E interaction effects separately or in combination) and construct a kernel function for the Gene-Environmental (GE) interaction directly from the genetic kernel and the environmental kernel. We use simulation studies and real data applications to show improved performance of the Kernel Machine (KM) regression method over the commonly adopted PC regression methods across a wide range of scenarios. The largest gain in power occurs when the underlying effect structure involves complex GE interactions, suggesting that the proposed method could be a useful and powerful tool for performing exploratory or confirmatory analyses in GxE-GWAS.
In the second MBA project, we extend the kernel machine framework developed in the first project to model biomarkers with network structure. A network summarizes the functional interplay among biological units; incorporating network information can more precisely model the biological effects, enhance the ability to detect true signals, and facilitate our understanding of the underlying biological mechanisms. In this work, we develop two kernel functions to capture different network structure information. Through simulations and a metabolomics study, we show that the proposed network-based methods can have markedly improved power over approaches that ignore network information.
Metabolites are the end products of cellular processes and reflect the ultimate responses of a biological system to genetic variations or environmental exposures. Because of the unique properties of metabolites, pharmacometabolomics aims to understand the underlying signatures that contribute to individual variations in drug responses and to identify biomarkers that can be helpful for response prediction. To facilitate mining pharmacometabolomic data, we establish an MBA pipeline that has great practical value in the detection and interpretation of signatures, which may potentially indicate a functional basis for the drug response. We illustrate the utility of the pipeline by investigating two scientific questions in an aspirin study: (1) which metabolite changes can be attributed to aspirin intake, and (2) which metabolic signatures can be helpful in predicting aspirin resistance. Results show that the MBA pipeline enables us to identify metabolic signatures that are not found in preliminary single-metabolite analysis.
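The first project in the abstract above builds a kernel for the gene-environment interaction directly from a genetic kernel and an environmental kernel, without stating the construction. One common way to do this, shown below purely as an assumption and not as the author's method, is the element-wise (Hadamard) product of the two kernel matrices, which stays positive semi-definite whenever both inputs are. The linear kernels, genotype codes and exposure values in this Python sketch are all hypothetical.

```python
def linear_kernel(rows):
    """Gram matrix of inner products between subjects' feature vectors."""
    n = len(rows)
    return [
        [sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
        for i in range(n)
    ]


def hadamard(k1, k2):
    """Element-wise product of two kernel matrices of equal shape."""
    return [[x * y for x, y in zip(r1, r2)] for r1, r2 in zip(k1, k2)]


# Hypothetical data for three subjects (assumption, for illustration only).
genotypes = [[0, 1], [1, 1], [2, 0]]  # minor-allele counts at two SNPs
exposures = [[1.0], [0.0], [2.0]]     # one environmental factor

k_g = linear_kernel(genotypes)   # genetic similarity
k_e = linear_kernel(exposures)   # environmental similarity
k_ge = hadamard(k_g, k_e)        # assumed form of the interaction kernel
```

Under this construction two subjects are similar in the interaction space only when they are similar in both the genetic and the environmental space; the unexposed second subject, for example, has zero interaction similarity with everyone, including itself.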
Müller, Nikola. "Finding correlations and independences in omics data." Diss., lmu, 2012. http://nbn-resolving.de/urn:nbn:de:bvb:19-144027.
Cicek, A. Ercument. "METABOLIC NETWORK-BASED ANALYSES OF OMICS DATA." Case Western Reserve University School of Graduate Studies / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=case1372866879.
Sathyanarayanan, Anita. "Integration of multi-omics data in cancer." Thesis, Queensland University of Technology, 2021. https://eprints.qut.edu.au/225924/1/Anita_Sathyanarayanan_Thesis.pdf.
Bersanelli, Matteo <1987>. "Mathematical Physics Techniques for Omics Data Integration." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amsdottorato.unibo.it/7812/1/Bersanelli_Matteo_tesi.pdf.
Wack, Maxime. "Dimension longitudinale du suivi omique dans les entrepôts de données cliniques : application aux cancers suivis par biopsie liquide." Electronic Thesis or Diss., Université Paris Cité, 2024. http://www.theses.fr/2024UNIP5258.
A novel technique in viral genomics enables capturing and sequencing HPV (Human Papilloma Virus) DNA present in patients with HPV-associated lesions. Integrating genomic data with the information held in Clinical Data Warehouses (CDWs) opens new avenues for translational research on HPV-induced cancers. However, the necessary genomic data are usually not available in CDWs but in dedicated tools, which strongly constrains such studies. We propose viroCapt, a bioinformatics pipeline that automates the analysis of HPV capture data and enables the characterization of patients with HPV-induced cancers. Using viroCapt in translational research highlighted the need to integrate longitudinal genomic data in CDWs, particularly for ctDNA monitoring in cancer follow-up. This led us to consider the limits of CDWs in handling large files and longitudinal relationships. For this reason, we designed gitOmmix, a method combining file versioning systems and formal provenance knowledge representation to address longitudinal data integration. We show that viroCapt supports HPV-induced cancer follow-up and generalizes to other virus-induced cancers. We designed and implemented a model enabling the longitudinal collection of omic data in CDWs, supported by robust tools and standards. gitOmmix generalizes to other large biomedical data, is agnostic to any CDW system, and supports adherence to FAIR principles by adding provenance and versioned data access. Our contribution helped characterize virus-induced cancers and exposed new challenges in translational research. This motivated the design of a general method to handle provenance and longitudinal management of high-throughput data in CDWs.
MASPERO, DAVIDE. "Computational strategies to dissect the heterogeneity of multicellular systems via multiscale modelling and omics data analysis." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2022. http://hdl.handle.net/10281/368331.
Heterogeneity pervades biological systems and manifests itself in the structural and functional differences observed both among different individuals of the same group (e.g., organisms or disease systems) and among the constituent elements of a single individual (e.g., cells). The study of the heterogeneity of biological systems and, in particular, of multicellular systems is fundamental for the mechanistic understanding of complex physiological and pathological phenomena (e.g., cancer), as well as for the definition of effective prognostic, diagnostic, and therapeutic strategies. This work focuses on developing and applying computational methods and mathematical models for characterising the heterogeneity of multicellular systems and, especially, cancer cell subpopulations underlying the evolution of neoplastic pathology. Similar methodologies have been developed to characterise viral evolution and heterogeneity effectively. The research is divided into two complementary portions, the first aimed at defining methods for the analysis and integration of omics data generated by sequencing experiments, the second at modelling and multiscale simulation of multicellular systems. Regarding the first strand, next-generation sequencing technologies allow us to generate vast amounts of omics data, for example, related to the genome or transcriptome of a given individual, through bulk or single-cell sequencing experiments. One of the main challenges in computer science is to define computational methods to extract useful information from such data, taking into account the high levels of data-specific errors, mainly due to technological limitations. In particular, in the context of this work, we focused on developing methods for the analysis of gene expression and genomic mutation data. In detail, an exhaustive comparison of machine-learning methods for denoising and imputation of single-cell RNA-sequencing data has been performed.
Moreover, methods for mapping expression profiles onto metabolic networks have been developed through an innovative framework that has allowed one to stratify cancer patients according to their metabolism. A subsequent extension of the method allowed us to analyse the distribution of metabolic fluxes within a population of cells via a flux balance analysis approach. Regarding the analysis of mutational profiles, the first method for reconstructing phylogenomic models from longitudinal data at single-cell resolution has been designed and implemented, exploiting a framework that combines a Markov Chain Monte Carlo with a novel weighted likelihood function. Similarly, a framework that exploits low-frequency mutation profiles to reconstruct robust phylogenies and likely chains of infection has been developed by analysing sequencing data from viral samples. The same mutational profiles also allow us to deconvolve the signal in the signatures associated with specific molecular mechanisms that generate such mutations through an approach based on non-negative matrix factorisation. The research conducted with regard to the computational simulation has led to the development of a multiscale model, in which the simulation of cell population dynamics, represented through a Cellular Potts Model, is coupled to the optimisation of a metabolic model associated with each synthetic cell. Using this model, it is possible to represent assumptions in mathematical terms and observe properties emerging from these assumptions. Finally, we present a first attempt to combine the two methodological approaches which led to the integration of single-cell RNA-seq data within the multiscale model, allowing data-driven hypotheses to be formulated on the emerging properties of the system.
Zheng, Ning. "Mediation modeling and analysis for high-throughput omics data." Thesis, Uppsala universitet, Statistiska institutionen, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-256318.
Ayati, Marzieh. "Algorithms to Integrate Omics Data for Personalized Medicine." Case Western Reserve University School of Graduate Studies / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=case1527679638507616.
Campanella, Gianluca. "Statistical analysis of '-omics' data : developments and applications." Thesis, Imperial College London, 2015. http://hdl.handle.net/10044/1/32109.
Budimir, Iva <1992>. "Stochastic Modeling and Correlation Analysis of Omics Data." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amsdottorato.unibo.it/9792/1/Budimir_Iva_tesi.pdf.
Zandonà, Alessandro. "Predictive networks for multi meta-omics data integration." Doctoral thesis, Università degli studi di Trento, 2017. https://hdl.handle.net/11572/367893.
Zandonà, Alessandro. "Predictive networks for multi meta-omics data integration." Doctoral thesis, University of Trento, 2017. http://eprints-phd.biblio.unitn.it/2547/1/zandona2017_phdthesis.pdf.
Bussoli, Ilaria. "Heterogeneous Graphical Models with Applications to Omics Data." Doctoral thesis, Università degli studi di Padova, 2019. http://hdl.handle.net/11577/3423293.
Kim, Jieun. "Computational tools for the integrative analysis of muti-omics data to decipher trans-omics networks." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/28524.
Erten, Mehmet Sinan. "Algorithms for discovering disease genes by integrating 'omics data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=case1343769483.
Ding, Hao. "Visualization and Integrative analysis of cancer multi-omics data." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1467843712.
Nikolayeva, Iryna. "Network and machine learning approaches to dengue omics data." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCB032/document.
The last 20 years have seen the emergence of powerful measurement technologies, enabling omics analysis of diverse diseases. They often provide non-invasive means to study the etiology of newly emerging complex diseases, such as the mosquito-borne infectious dengue disease. My dissertation concentrates on adapting and applying network and machine learning approaches to genomic and transcriptomic data. The first part goes beyond a previously published genome-wide analysis of 4,026 individuals by applying network analysis to find groups of interacting genes in a gene functional interaction network that, taken together, are associated with severe dengue. In this part, I first recalculated association p-values of sequence polymorphisms, then worked on mapping polymorphisms to functionally related genes, and finally explored different pathway and gene interaction databases to find groups of genes that are jointly associated with severe dengue. The second part of my dissertation presents a theoretical approach to study a size bias of active network search algorithms. My theoretical analysis suggests that the best score of subnetworks of a given size should be size-normalized, based on the hypothesis that it is a sample of an extreme value distribution, and not a sample of the normal distribution, as usually assumed in the literature. I then suggest a theoretical solution to this bias. The third part introduces a new subnetwork search tool that I co-designed. Its underlying model and the corresponding efficient algorithm avoid the size bias found in existing methods and generate easily comprehensible results. I present an application to transcriptomic dengue data. In the fourth and last part, I describe the identification of a biomarker that detects dengue severity outcome upon arrival at the hospital using a novel machine learning approach. This approach combines two-dimensional monotonic regression with feature selection.
The underlying model goes beyond the commonly used linear approaches, while allowing control over the number of transcripts in the biomarker. The small number of transcripts, along with its visual representation, maximizes the understanding and interpretability of the biomarker by biomedical professionals. I present an 18-gene biomarker that distinguishes severe dengue patients from non-severe ones upon arrival at the hospital, a unique biomarker with high and robust predictive performance. The predictive performance of the biomarker has been confirmed on two datasets that used different transcriptomic technologies and different blood cell subtypes.
Jagtap, Surabhi. "Multilayer Graph Embeddings for Omics Data Integration in Bioinformatics." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPAST014.
Biological systems are composed of interacting bio-molecules at different molecular levels. With the advent of high-throughput technologies, omics data at their respective molecular level can be easily obtained. These huge, complex multi-omics data can provide insights into the flow of information at multiple levels, unraveling the mechanisms underlying the biological condition of interest. Integration of different omics data types is often expected to elucidate potential causative changes that lead to specific phenotypes, or targeted treatments. With the recent advances in network science, we choose to handle this integration issue by representing omics data through networks. In this thesis, we have developed three models, namely BraneExp, BraneNet, and BraneMF, for learning node embeddings from multilayer biological networks generated with omics data. We aim to tackle various challenging problems arising in multi-omics data integration, developing expressive and scalable methods capable of leveraging the rich structural semantics of real-world networks.
PATRIZI, SARA. "Multi-omics approaches to complex diseases in children." Doctoral thesis, Università degli Studi di Trieste, 2022. http://hdl.handle.net/11368/3015193.
“-Omic” technologies can detect the entirety of the molecules in the biological sample of interest, in a non-targeted and non-biased fashion. The integration of multiple types of omics data, known as “multi-omics” or “vertical omics”, can provide a better understanding of how the cause of disease leads to its functional consequences, which is particularly valuable in the study of complex diseases, which are caused by the interaction of multiple genetic and regulatory factors with contributions from the environment. In the present work, appropriate multi-omics approaches are applied to two complex conditions that usually first manifest in childhood, have rising incidence, and have gaps in the knowledge of their molecular pathology: Congenital Lung Malformations and Coeliac Disease (CD). The aims are, respectively, to verify whether cancer-associated genomic variants or DNA methylation features exist in the malformed lung tissue, and to find common alterations in the methylome and the transcriptome of small intestine epithelial cells of children with CD. The methods used in the Congenital Lung Malformations project are Whole Genome Methylation microarrays and Whole Genome Sequencing; those used in the Coeliac Disease project are Whole Genome Methylation microarrays and mRNA sequencing. Differentially methylated regions in possibly cancer-related genes were found in each of the 20 lung malformation samples included. Moreover, 5 malformed samples had at least one somatic missense single nucleotide variant in genes known as lung cancer drivers, and 5 malformed samples had a total of 2 deletions of lung cancer driver tumour suppressors and 10 amplifications of lung cancer driver oncogenes. The data showed that congenital lung malformations can have premalignant genetic and epigenetic features that are impossible to predict with clinical information only.
In the second project, Principal Component Analysis of the whole genome methylation data showed that CD patients divide into two clusters, one of which overlaps with controls. 174 genes were differentially methylated compared to the controls in both clusters. Principal Component Analysis of gene expression data (mRNA-Seq) showed a distribution that is similar to the methylation data, and 442 genes were differentially expressed in both clusters. Six genes, mainly related to interferon response and antigen processing and presentation, were differentially expressed and methylated in both clusters. These results show that the intestinal epithelial cells of individuals with CD are highly variable from a molecular point of view, but they share some fundamental differences that make them able to respond to interferons, process, and present antigens more efficiently than controls. Despite the limitations of the present studies, they have shown that targeted multi-omics approaches can be set up to answer the relevant disease-specific questions by investigating many cellular functions at once, often generating new hypotheses and making unexpected discoveries in the process.
Tellaroli, Paola. "Three topics in omics research." Doctoral thesis, Università degli studi di Padova, 2015. http://hdl.handle.net/11577/3423912.
The rather generic title of this thesis reflects the fact that several aspects of biological phenomena were investigated. Most of this work was devoted to exploring the limits of one of the essential tools for the analysis of gene expression data: cluster analysis. With several hundred clustering methods in existence, there is clearly no shortage of clustering algorithms, yet some fundamental questions have still not received satisfactory answers. In particular, we present a new clustering algorithm for static data and a new strategy for clustering short time-series data. Finally, we analysed data from a relatively new technology, called Cap Analysis Gene Expression, which is useful for genome-wide promoter analysis and is still largely unexplored.
Lu, Yingzhou. "Multi-omics Data Integration for Identifying Disease Specific Biological Pathways." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/83467.
Master of Science
Zampieri, Guido. "Prioritisation of candidate disease genes via multi-omics data integration." Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3421826.
The discovery of genes linked to human diseases is a pressing challenge in molecular biology, in view of the full achievement of precision medicine. Next-generation technologies provide an unprecedented amount of biological information, but at the same time they reveal enormous numbers of candidate disease genes and pose new challenges at multiple levels of analysis. Multi-omics data integration is currently the main strategy for prioritising candidate disease genes. In particular, kernel-based methods are a powerful resource for integrating biological knowledge; however, their use is often precluded by their limited scalability. In this thesis, we propose a new scalable kernel method for gene prioritisation, which applies a novel multiple kernel learning approach based on a semi-supervised perspective and on the optimisation of the margin distribution in binary problems. Our method is optimised to cope with strongly unbalanced settings in which few known disease genes are available and large-scale predictions are required. Importantly, it can handle both a large number of candidates and an arbitrary number of information sources. By simulating real case studies, we show that our method outperforms a wide range of state-of-the-art methods and scales better than existing kernel methods for genomic data. We apply the proposed method to study the potential role in disease gene prediction of the metabolic rearrangements caused by genetic perturbations. To this end, we use constraint-based models of metabolism to generate genome-scale information on genes, which is then analysed via machine learning.
Moreover, we compare constraint-based models and our kernel-based method as alternative integration strategies for omics data such as transcriptional profiles. Experimental evaluations on various cancers show that metabolic rearrangements reconstructed in silico can be useful for prioritising the associated genes, although accuracy depends strongly on the cancer type. Despite these fluctuations, predictions based on metabolic models are largely complementary to those based on gene expression or pathway annotations, highlighting the potential of this approach for identifying new genes implicated in cancer.
Mönchgesang, Susann [Verfasser]. "Metabolomics and biochemical omics data - integrative approaches : [kumulative Dissertation] / Susann Mönchgesang." Halle, 2017. http://d-nb.info/1131075994/34.
Konrad, Attila. "Investigation of Pathway Analysis Tools for mapping omics data to pathways." Thesis, Malmö högskola, Fakulteten för teknik och samhälle (TS), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20843.
This thesis examines PATs from a multidisciplinary view. Many PATs exist today, each analyzing a specific type of omics data, so we investigate them and what they can do. By defining specific requirements, such as how many omics data types a PAT can handle, the accuracy of each PAT can be assessed in order to identify the most suitable PAT for mapping omics data to pathways. Results show that no PAT found today fulfills the specific set of requirements or the main goal through software testing. The Ingenuity PAT comes closest to fulfilling the requirements. At the request of the end user, two PATs were tested in combination to see whether together they could fulfill the end user's requirements. Uniprot batch converter was tested with FEvER, and the results were not successful, since the combination of the two PATs is no better than the Ingenuity PAT. Focus then turned to an alternative combination, a homepage called NCBI that has search engines connected to several freely available PATs, thus fulfilling the requirements. Through the search engine, “omics” data can be combined and more than one input can be taken at a time. Since technology is moving forward rapidly, the need for new tools for data interpretation also grows. This means that in the near future we may be able to find a PAT that fulfills the requirements of the end users.
Castleberry, Alissa. "Integrated Analysis of Multi-Omics Data Using Sparse Canonical Correlation Analysis." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu15544898045976.
Full textStrbenac, Dario. "Novel Preprocessing Approaches for Omics Data Types and Their Performance Evaluation." Thesis, The University of Sydney, 2016. http://hdl.handle.net/2123/16007.
Full textPestarino, Luca <1992>. "Challenges and Opportunities of Machine Learning for Clinical and Omics Data." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amsdottorato.unibo.it/10091/1/PhD_Thesis_Pestarino_Luca.pdf.
Full textSalviato, Elisa. "Computational methods for the discovery of molecular signatures from Omics Data." Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3421961.
Molecular biomarkers, obtained through high-throughput sequencing platforms, form the foundation of next-generation personalised medicine. Despite a decade of effort and investment, the number of clinically valid biomarkers remains modest. The "big-data" nature of omics data has in fact introduced new challenges that call for improvements both in analysis tools and in tools for exploring results. This thesis addresses two central themes, both aimed at improving statistical and computational methodologies for the identification of molecular signatures. The first work concerns the identification of serum miRNAs that can be used diagnostically in patients with ovarian cancer. In particular, it proposes guidelines for the analysis process and an ad hoc normalisation for microarray data, to be used in the context of circulating molecules. The second work presents a new approach based on Gaussian graphical models for the identification of functional molecular signatures. The proposed method can explore the information contained in biological pathways and highlight the potential origin of the differential behaviour between two experimental conditions.
Boyd, Joseph. "BioBridge: Bringing Data Exploration to Biologists." Digital WPI, 2014. https://digitalcommons.wpi.edu/etd-theses/1186.
Gadaleta, Emanuela. "A multidisciplinary computational approach to model cancer-omics data : organising, integrating and mining multiple sources of data." Thesis, Queen Mary, University of London, 2015. http://qmro.qmul.ac.uk/xmlui/handle/123456789/8141.
Eichner, Johannes [Verfasser]. "Machine learning and statistical methods for preclinical omics data analysis / Johannes Eichner." München : Verlag Dr. Hut, 2015. http://d-nb.info/1079768874/34.
Wrzodek, Clemens [Verfasser]. "Inference and integration of biochemical networks with multilayered omics data / Clemens Wrzodek." München : Verlag Dr. Hut, 2013. http://d-nb.info/1042307652/34.
Müller, Nikola [Verfasser], and Christian [Akademischer Betreuer] Böhm. "Finding correlations and independences in omics data / Nikola Müller. Betreuer: Christian Böhm." München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2012. http://d-nb.info/1023435594/34.
Barcelona, Cabeza Rosa. "Genomics tools in the cloud: the new frontier in omics data analysis." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/672757.
Technological advances in next-generation sequencing (NGS) have revolutionised the field of genomics. The increase in speed and throughput of NGS technologies in recent years, together with the reduction in their cost, has made it possible to interrogate the human genome base by base in an efficient and affordable way. These advances have increased the use of NGS technologies in clinical practice for identifying genomic variations and their relationship with particular diseases. However, the accessibility, processing, and interpretation of the data still need to be improved because of the enormous amount of data generated and the large number of tools available to process them. Besides the large number of algorithms available for variant discovery, each type of variation and of data requires a specific algorithm. A solid background in bioinformatics is therefore required both to select the most suitable algorithm and to run it correctly. On that basis, the aim of this project is to make the processing of sequencing data for variant identification and interpretation easier for non-bioinformaticians, by creating high-performance workflows with a solid scientific basis that remain accessible and easy to use, together with a simple and highly intuitive platform for data interpretation. An exhaustive literature review was carried out, in which the best algorithms were selected to build automatic workflows for the discovery of germline short variants (SNPs and indels) and germline structural variants (SVs), including both CNVs and chromosomal rearrangements, in modern human DNA.
In addition to creating workflows for variant discovery, a workflow was implemented for the in silico optimisation of CNV detection from WES and TS data (isoCNV). This optimisation was shown to increase detection sensitivity using NGS data alone, which is especially important for clinical diagnosis. Furthermore, a workflow was developed for variant discovery by integrating WES and RNA-seq data (varRED), which was shown to increase the number of variants detected over those identified when only WES data are used. Variant identification matters not only for modern populations: the study of variation in ancient genomes is essential for understanding human evolution. A workflow was therefore implemented for identifying short variants from ancient WGS samples. This workflow was applied to a human mandible dated to 16980-16510 BC. The ancestral variants discovered there were reported without further interpretation because of the low coverage of the sample. Finally, GINO was implemented to facilitate the interpretation of the variants identified by the workflows developed in this thesis. GINO is an easy-to-use platform for the visualisation and interpretation of germline variants that requires a licence to use. This thesis has delivered the tools needed for high-throughput identification of all types of germline variants, as well as a powerful platform for visualising those variants simply and quickly. Using this platform allows non-bioinformaticians to focus on interpreting the results without having to worry about data processing, with the assurance that the results are scientifically robust.
In addition, it has laid the groundwork for implementing, in the near future, a platform for the complete analysis and visualisation of genomic data.
Bioinformatics
Wolf, Beat [Verfasser], and Thomas [Gutachter] Dandekar. "Reducing the complexity of OMICS data analysis / Beat Wolf ; Gutachter: Thomas Dandekar." Würzburg : Universität Würzburg, 2017. http://d-nb.info/1142114295/34.
Sala, Claudia <1987>. "Stochastic Modeling and Statistical Properties of Biological Systems Inferred from Omics Data." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amsdottorato.unibo.it/7810/1/sala_claudia_tesi.pdf.