Dissertations / Theses on the topic 'Bioinformatics server'

Consult the top 22 dissertations / theses for your research on the topic 'Bioinformatics server.'

1

Zhang, Jiaojiao. "The study of malignant melanoma treatment on various platforms." Doctoral thesis, Università Politecnica delle Marche, 2021. http://hdl.handle.net/11566/291029.

Full text
Abstract:
With advances in medicine, life expectancy has risen globally. Cancer, the leading noncommunicable disease (NCD), is a major barrier to extending longevity and also causes a huge expense of medical resources. Although innovative treatments such as gene therapy and immunotherapy exist alongside classical treatments (i.e., chemotherapy, radiotherapy and surgical removal), there is no treatment that reliably cures cancer, partly because patients usually develop chemoresistance or radioresistance, and some resist gene therapy and immunotherapy as well. The skin is the largest organ and first barrier of the human body. Basal cell carcinoma, squamous cell carcinoma and melanoma are the three most common skin cancers. Melanoma is a cancer that develops from melanocytes (pigment-containing cells); its incidence has increased dramatically during the last 30 years, and it carries a low five-year survival rate and a poor prognosis. Current therapeutic approaches to melanoma can bring some treatment benefit but also have serious side effects such as vitiligo. In this setting, the present thesis investigated melanoma suppression on different platforms. First, resveratrol, a common bioactive compound, was used to target potential biomarkers of melanoma. Second, certain biomarkers were further targeted through previous studies and then in silico research, and in vitro melanoma cell-culture models were investigated; we demonstrated that melanoma cells were inhibited through protein-protein interaction. Third, after LC-MS/MS analysis, a protein database was used to analyze and annotate the functions of the potential biomarkers. Fourth, a rare case of amelanotic malignant melanoma (AMM) was reported, which enlarges the understanding of, and supplements, the phenotypes of melanoma. Resveratrol (RSV) is a phytoalexin widely distributed in the Mediterranean diet that, as a bioactive natural product, could act as a tumor suppressor. We evaluated the effects of RSV on melanoma cells (A375) and found that RSV could markedly inhibit the proliferation of melanoma cells by modulating the cell cycle and triggering apoptosis; expression of Cyclin D1 and PCDH9 was strongly affected by the duration of RSV treatment, while RAC1 expression was not influenced. We further explored the mechanism of these melanoma target genes. In silico and literature studies indicated that PCDH9 could be a novel biomarker of melanoma. Therefore, alteration of PCDH9 expression (overexpression and interference) was performed to explore the effects of PCDH9 on melanoma. Matrix metalloproteinases (MMPs) are responsible for extracellular matrix degradation, and MMP2, among this set of enzymes, has been demonstrated to play an important role in cell migration. qRT-PCR results showed that PCDH9 could suppress melanoma cells by affecting MMP2, CCND1 (Cyclin D1) and RAC1. Melanoma and healthy tissues were analyzed in parallel to demonstrate the inhibition of melanoma cells by PCDH9. Co-IP and LC-MS/MS were used as well to investigate in depth the correlation between PCDH9 and its melanoma-suppressing effects. We found that PCDH9 and RAC1 can predict the prognosis of malignant melanoma, and we hypothesized that PCDH9 can modulate melanoma progression through MMP2 and RAC1 by reducing RAC1-dependent ROS generation and enhancing the NADPH oxidase activity complex. We also report a rare case of AMM first diagnosed as cutaneous squamous cell carcinoma (cSCC). Based on the immunohistochemical examination (Ki67 (+++), Melan-A (+++), human melanoma black (HMB)45 (+), CD20 (-), cytokeratin (CK)7 (-) and CK5/6 (-)), the AMM was confirmed and the patient underwent surgical resection. This case shows the varied phenotypes of melanoma.
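The qRT-PCR readouts mentioned above are usually reduced to fold changes with the standard 2^-ΔΔCt method. The sketch below illustrates that calculation only; the gene names and Ct values are hypothetical, not data from the thesis.

```python
# Minimal sketch of relative quantification from qRT-PCR Ct values using the
# standard 2^-ΔΔCt method. All gene names and Ct values are illustrative.

def fold_change(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Return the 2^-ΔΔCt fold change of a target gene relative to a reference gene."""
    d_ct_treated = ct_target_treated - ct_ref_treated   # ΔCt in treated sample
    d_ct_control = ct_target_control - ct_ref_control   # ΔCt in control sample
    dd_ct = d_ct_treated - d_ct_control                 # ΔΔCt
    return 2 ** (-dd_ct)

# Hypothetical Ct values: MMP2 vs. GAPDH, PCDH9-overexpressing vs. control cells
print(fold_change(ct_target_treated=26.1, ct_ref_treated=18.0,
                  ct_target_control=24.3, ct_ref_control=18.1))  # < 1 => down-regulated
```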
2

Transell, Mark Marriott. "The Use of bioinformatics techniques to perform time-series trend matching and prediction." Diss., University of Pretoria, 2012. http://hdl.handle.net/2263/37061.

Full text
Abstract:
Process operators often face process faults and alarms due to recurring failures of process equipment. It is also the case that some processes do not have enough input information or process models to use conventional modelling or machine learning techniques for early fault detection. A proof of concept for online streaming prediction software based on matching process behaviour to historical motifs has been developed, making use of the Basic Local Alignment Search Tool (BLAST) from the bioinformatics field. Execution times as low as 1 second have been recorded, demonstrating that online matching is feasible. Three techniques have been tested and compared in terms of their computational efficiency, robustness and selectivity, with results shown in Table 1:

• Symbolic Aggregate Approximation (SAX) combined with PSI-BLAST
• Naive Triangular Representation (TER) with PSI-BLAST
• Dynamic Time Warping (DTW)

Table 1: Properties of different motif-matching methods

Property                             SAX-PSIBLAST   TER-PSIBLAST   DTW
Noise tolerance (selectivity)        Acceptable     Inconclusive   Good
Vertical shift tolerance             None           Perfect        Poor
Matching speed                       Acceptable     Acceptable     Fast
Match speed scaling                  < O(mn)        < O(mn)        O(mn)
Dimensionality reduction tolerance   Good           Inconclusive   Acceptable

It is recommended that a method using a weighted confidence measure for each technique be investigated for the purpose of online process event handling and operator alerts. Keywords: SAX, BLAST, motif-matching, Dynamic Time Warping. Dissertation (MEng), University of Pretoria, 2012. Chemical Engineering.
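As a concrete illustration of one of the compared techniques, the following is a textbook O(mn) dynamic time warping distance between a streaming window and a historical motif; it is a generic sketch, not the dissertation's implementation.

```python
# Minimal dynamic time warping (DTW) distance via dynamic programming.
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two 1-D sequences a and b."""
    m, n = len(a), len(b)
    cost = np.full((m + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed warping moves
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[m, n]

query = np.sin(np.linspace(0, 3, 50))            # streaming window
motif = np.sin(np.linspace(0.2, 3.2, 60)) + 0.1  # historical motif, shifted
print(dtw_distance(query, motif))
```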
3

Zhivkoplias, Erik. "Comparing the performance of different methods to estimate selection coefficient across parameter space using time-series genomic data." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-420278.

Full text
Abstract:
Estimating selection is of key importance in evolutionary biology research. The recent drop in sequencing prices and advances in NGS data analysis have opened up new avenues for novel methods that estimate selection quantitatively from time-series allele frequency data. However, it is not yet well understood which method performs best for specific model systems and experimental designs. Here, using popular quantitative metrics, we compared the performance of four prominent methods on a series of simulated datasets and on data from real biological experiments. For three out of four methods we identified the experimental conditions best suited for estimating selection. We also explored the limitations of these methods when estimating selection from complex patterns of allele frequency change in some relevant evolutionary scenarios. Our findings highlight the need to modify the population genomics models still used in inference of model parameters, with the goal of developing new, more accurate methods for the quantitative estimation of selection from time-series genomic data.
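To make the estimation task concrete: under deterministic genic selection the logit of the allele frequency grows roughly linearly with slope s per generation, so a crude estimate of s can be read off a least-squares fit. The sketch below illustrates only this baseline idea with made-up frequencies; the methods compared in the thesis additionally model drift and sampling noise.

```python
# Crude selection-coefficient estimate from time-series allele frequencies:
# logit(p_t) ≈ logit(p_0) + s*t under deterministic genic selection.
import numpy as np

gens = np.array([0, 10, 20, 30, 40])                  # generations sampled
freqs = np.array([0.10, 0.18, 0.31, 0.47, 0.64])      # illustrative allele frequencies

logit = np.log(freqs / (1.0 - freqs))
s_hat, intercept = np.polyfit(gens, logit, 1)         # slope = estimated s per generation
print(f"estimated s = {s_hat:.3f}")
```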
4

Assaad, Firas Souhail. "Biometric Multi-modal User Authentication System based on Ensemble Classifier." University of Toledo / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1418074931.

Full text
5

Ghalwash, Mohamed. "Interpretable Early Classification of Multivariate Time Series." Diss., Temple University Libraries, 2013. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/239730.

Full text
Abstract:
Recent advances in technology have led to an explosion in data collected over time rather than in a single snapshot. For example, microarray technology allows us to measure gene expression levels in different conditions over time. Such temporal data give data miners the opportunity to develop algorithms that address domain-related problems; e.g., a time series of several different classes can be created by observing various patient attributes over time, and the task is to classify an unseen patient based on his or her temporal observations. In time-sensitive applications such as medicine, certain aspects have to be considered beyond providing accurate classification. The first aspect is providing early classification. Accurate and timely diagnosis is essential for allowing physicians to design appropriate therapeutic strategies at early stages of diseases, when therapies are usually the most effective and the least costly. We propose a probabilistic hybrid method that allows for early, accurate, and patient-specific classification of multivariate time series and that, by training on full time series, offers classification at a very early time point during the diagnosis phase, while staying competitive in terms of accuracy with models that use the full time series in both training and testing. The method has attained very promising results and outperformed the baseline models on a dataset of response to drug therapy in multiple sclerosis patients and on a sepsis therapy dataset. Although attaining accurate classification is the primary goal of a data mining task, in medical applications it is important to reach decisions that are not only accurate and obtained early, but also easily interpreted, which is the second aspect. Physicians tend to prefer interpretable methods over black-box methods. For that purpose, we propose interpretable methods for early classification that extract interpretable patterns from the raw time series to help physicians provide early diagnoses and to give insight into, and confidence in, the classification results. The proposed methods have been shown to be more accurate, and to provide classifications earlier, than three alternative state-of-the-art methods when evaluated on human viral infection datasets and a larger myocardial infarction dataset. The third aspect to be considered in medical applications is the need for predictions to be accompanied by a measure that allows physicians to judge the uncertainty or belief in the prediction. Knowing the uncertainty associated with a given prediction is especially important in clinical diagnosis, where data mining methods assist clinical experts in making decisions and optimizing therapy. We propose an effective method to provide uncertainty estimates for the proposed interpretable early classification methods. The method was evaluated on four challenging medical applications by characterizing the decrease in prediction uncertainty. We show that our proposed method meets the requirements of uncertainty estimates (the proposed uncertainty measure takes values in the range [0,1] and propagates over time). To the best of our knowledge, this PhD thesis will strengthen the link between the data mining community and medical domain experts, and should give physicians sufficient confidence to put the proposed methods into real practice. Ph.D., Computer and Information Science, Temple University.
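A minimal sketch of the generic early-classification recipe the abstract describes (classify growing prefixes, commit once a confidence threshold is crossed) might look as follows; the data, threshold, and per-prefix logistic models are illustrative stand-ins for the thesis's probabilistic hybrid method.

```python
# Early classification by thresholding classifier confidence on growing prefixes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
T = 50
X = rng.normal(size=(200, T))
y = (X[:, :10].mean(axis=1) > 0).astype(int)
X[y == 1, 10:] += 0.5                      # class signal strengthens over time

# one classifier per prefix length (a simple padding-free variant)
models = {t: LogisticRegression(max_iter=1000).fit(X[:, :t], y)
          for t in range(5, T + 1, 5)}

def classify_early(series, threshold=0.9):
    for t in sorted(models):
        proba = models[t].predict_proba(series[:t].reshape(1, -1))[0]
        if proba.max() >= threshold:
            return int(proba.argmax()), t  # label and the time it was decided
    return int(proba.argmax()), T

print(classify_early(X[0]))
```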
6

Fulcher, Benjamin D. "Highly comparative time-series analysis." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:642b65cf-4686-4709-9f9d-135e73cfe12e.

Full text
Abstract:
In this thesis, a highly comparative framework for time-series analysis is developed. The approach draws on large, interdisciplinary collections of over 9000 time-series analysis methods, or operations, and over 30 000 time series, which we have assembled. Statistical learning methods were used to analyze structure in the set of operations applied to the time series, allowing us to relate different types of scientific methods to one another, and to investigate redundancy across them. An analogous process applied to the data allowed different types of time series to be linked based on their properties, and in particular to connect time series generated by theoretical models with those measured from relevant real-world systems. In the remainder of the thesis, methods for addressing specific problems in time-series analysis are presented that use our diverse collection of operations to represent time series in terms of their measured properties. The broad utility of this highly comparative approach is demonstrated using various case studies, including the discrimination of pathological heart beat series, classification of Parkinsonian phonemes, estimation of the scaling exponent of self-affine time series, prediction of cord pH from fetal heart rates recorded during labor, and the assignment of emotional content to speech recordings. Our methods are also applied to labeled datasets of short time-series patterns studied in temporal data mining, where our feature-based approach exhibits benefits over conventional time-domain classifiers. Lastly, a feature-based dimensionality reduction framework is developed that links dependencies measured between operations to the number of free parameters in a time-series model that could be used to generate a time-series dataset.
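The core of the feature-based representation can be sketched in a few lines: summarize each series by a vector of global statistics and classify in that feature space. The handful of features below are illustrative stand-ins for the thesis's collection of over 9000 operations.

```python
# Feature-based time-series classification: series -> feature vector -> classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def features(ts):
    diffs = np.diff(ts)
    return np.array([
        ts.mean(), ts.std(),                 # distributional properties
        np.corrcoef(ts[:-1], ts[1:])[0, 1],  # lag-1 autocorrelation
        diffs.std(),                         # roughness
        (diffs > 0).mean(),                  # fraction of upward moves
    ])

rng = np.random.default_rng(1)
noise = rng.normal(size=(100, 200))
walks = noise.cumsum(axis=1)                 # random walks vs. white noise
X = np.array([features(t) for t in np.vstack([noise, walks])])
y = np.array([0] * 100 + [1] * 100)
clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.score(X, y))
```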
7

Abualhamayl, Abdullah Jameel. "APPLY DATA CLUSTERING TO GENE EXPRESSION DATA." CSUSB ScholarWorks, 2015. https://scholarworks.lib.csusb.edu/etd/259.

Full text
Abstract:
Data clustering plays an important role in the effective analysis of gene expression. Although DNA microarray technology facilitates expression monitoring, several challenges arise when dealing with gene expression datasets, among them the enormous number of genes, the dimensionality of the data, and the change of the data over time. Genetic groups that are biologically interlinked can be identified through clustering. This project aims to clarify the steps of applying clustering analysis to genes in a published dataset. The methodology includes the selection of the dataset representation, the selection of gene datasets, the selection of a similarity matrix, the choice of clustering algorithm, and the analysis tool. R, with a focus on the kmeans function and the fpc, hclust, and heatmap3 packages, is used as the analysis tool. Different clustering algorithms are applied to the Spellman dataset to illustrate how genes are grouped together into clusters, which helps in understanding genetic behavior.
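A rough Python equivalent of the project's R workflow (normalize gene profiles, run k-means, inspect cluster sizes) might look like this; the synthetic sinusoidal profiles stand in for the Spellman cell-cycle data.

```python
# k-means clustering of gene expression profiles on synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
timepoints = np.linspace(0, 4 * np.pi, 18)              # 18 array time points
phase = rng.choice([0.0, np.pi / 2, np.pi], size=300)   # 3 latent expression programs
expr = np.sin(timepoints[None, :] + phase[:, None]) + rng.normal(0, 0.3, (300, 18))

expr = StandardScaler().fit_transform(expr.T).T         # normalize each gene profile
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(expr)
for k in range(3):
    print(f"cluster {k}: {np.sum(labels == k)} genes")
```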
8

Morcillo, Suárez Carlos. "Analysis of genetic polymorphisms for statistical genomics: tools and applications." Doctoral thesis, Universitat Pompeu Fabra, 2011. http://hdl.handle.net/10803/78126.

Full text
Abstract:
New approaches are needed to manage and analyze the enormous quantity of biological data generated by modern technologies. Existing solutions are often fragmented and uncoordinated and thus require considerable bioinformatics skills from users. Three applications have been developed, illustrating different strategies to help users without extensive IT knowledge get the most out of their data. SNPator is an easy-to-use suite that integrates all the usual tools for genetic association studies: from initial quality control procedures to final statistical analysis. CHAVA is an interactive visual application for CNV calling from aCGH data. It presents the data in a visual way that helps assess the quality of the calling and assists in the process of optimization. Haplotype Association Pattern Analysis visually presents data from exhaustive genomic haplotype associations, so that users can recognize patterns of possible associations that cannot be detected by single-SNP tests.
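For flavor, CNV calling from aCGH data in its most stripped-down form amounts to smoothing probe log2 ratios along the chromosome and thresholding; real callers, and CHAVA's visual assessment of them, are considerably more sophisticated. Everything below is synthetic.

```python
# Toy CNV calling: smooth aCGH log2 ratios, threshold, report non-normal segments.
import numpy as np

rng = np.random.default_rng(3)
log2_ratio = rng.normal(0, 0.2, 1000)
log2_ratio[400:450] += 0.8        # simulated single-copy gain
log2_ratio[700:740] -= 1.0        # simulated deletion

window = 15
smooth = np.convolve(log2_ratio, np.ones(window) / window, mode="same")
calls = np.where(smooth > 0.3, "gain", np.where(smooth < -0.3, "loss", "normal"))

# report contiguous non-normal segments (signal assumed normal at both ends)
changes = np.flatnonzero(np.diff((calls != "normal").astype(int))) + 1
print([(s, e, calls[s]) for s, e in zip(changes[::2], changes[1::2])])
```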
9

Vandenbussche, Pierre-Yves. "Définition d'un cadre formel de représentation des Systèmes d'Organisation de la Connaissance." PhD thesis, Université Pierre et Marie Curie - Paris VI, 2011. http://tel.archives-ouvertes.fr/tel-00642545.

Full text
Abstract:
This thesis work, carried out within the company MONDECA and the INSERM research laboratory, arose from the need for a server capable of supporting the editorial process for Knowledge Organization Systems (KOS), and it raises the following question: how can the representation of KOS and of their correspondences be harmonized so as to offer unified services that support the efficient editing, publication and use of the knowledge in these reference systems? To answer this question, we defend the thesis that building a common representation model for KOS is a suitable solution for (i) overcoming the heterogeneity of these reference systems, (ii) promoting semantic interoperability within an information system, and (iii) offering unified services regardless of the KOS. To do so, we use methods from knowledge engineering coupled with those of model engineering. The contributions presented focus on three axes. In the first axis, we seek a KOS modelling solution that is as generic as possible and that can be extended to take into account the specific features of each reference system. We therefore propose a common extensible representation model, named UniMoKR, built from existing standards, recommendations and projects. Our model has been proposed and partly integrated into the future ISO 25964 standard on the representation of terminologies. We have also submitted two ontology modelling patterns to the Ontology Design Pattern portal. The second axis is devoted to proposing unified services that rest on this modelling. Among these services we distinguish the export of all or part of a KOS in a standard exchange format, as well as Web services for terminology management. To make these services available, we advocate a model-transformation method that uses the SPARQL language to express the transformation rules. In the third axis, we present the application of our solution, tested and commercialized for various projects in different application domains. We show the feasibility of our approach, as well as the improvement that the formal representation of our model brings to information quality. These implementations have made it possible to carry out a validation under real conditions of use.
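The model-transformation approach (SPARQL as the rule language) can be sketched with rdflib: a CONSTRUCT query maps SKOS concepts into a unified vocabulary. The uni: namespace below is invented for illustration and does not reproduce UniMoKR's actual schema.

```python
# Model transformation with a SPARQL CONSTRUCT rule, run via rdflib.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/thesaurus/> .
ex:melanoma a skos:Concept ; skos:prefLabel "melanoma"@en .
""", format="turtle")

rule = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX uni:  <http://example.org/unimokr#>
CONSTRUCT { ?c a uni:Term ; uni:label ?l . }
WHERE     { ?c a skos:Concept ; skos:prefLabel ?l . }
"""
for triple in g.query(rule):   # iterating a CONSTRUCT result yields triples
    print(triple)
```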
10

Chen, Jiuqiang. "Designing scientific workflows following a structure and provenance-aware strategy." PhD thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00931122.

Full text
Abstract:
Scientific workflow systems include provenance modules that collect information about executions (data consumed and produced), making it possible to ensure the reproducibility of an experiment. For several reasons, the structural complexity of workflows and of their executions is increasing, making workflow reuse more difficult. The overall goal of this thesis is to improve workflow reuse by providing strategies to reduce the complexity of workflow structures while preserving provenance. Two strategies are introduced. First, we introduce SPFlow, a provenance-preserving rewriting algorithm for scientific workflows that transforms any directed acyclic graph (DAG) into a simpler, series-parallel (SP) structure. SP structures allow the design of polynomial-time algorithms for complex operations on workflows (for example, comparing them), whereas the same operations are NP-hard for general DAG structures. Second, we propose a technique able to reduce the redundancy present in workflows by detecting and removing the motifs responsible for it, called "anti-patterns". We designed the DistillFlow algorithm, which transforms a workflow into a semantically equivalent "distilled" workflow with a more concise structure, from which anti-patterns are removed as far as possible. Our solutions (SPFlow and DistillFlow) have been tested systematically on large collections of real workflows, in particular with the Taverna system. Our tools are available at: https://www.lri.fr/~chenj/.
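The series-parallel notion at the heart of SPFlow can be made concrete with a small recognition check: a two-terminal graph is SP exactly when it collapses to a single source-sink edge under repeated parallel-edge merging and series-vertex contraction. The sketch below is that check only, not SPFlow's provenance-preserving rewriting.

```python
# Series-parallel recognition by repeated parallel and series reductions.
from collections import Counter

def is_series_parallel(edges, source, sink):
    edges = Counter(edges)                       # multiset of directed (u, v) edges
    changed = True
    while changed:
        changed = False
        for (u, v), k in list(edges.items()):    # (a) merge parallel edges
            if k > 1:
                edges[(u, v)] = 1
                changed = True
        nodes = {n for e in edges for n in e} - {source, sink}
        for w in nodes:                          # (b) contract a series vertex
            ins = [e for e in edges if e[1] == w]
            outs = [e for e in edges if e[0] == w]
            if len(ins) == 1 and len(outs) == 1:
                u, _ = ins[0]; _, v = outs[0]
                del edges[ins[0]], edges[outs[0]]
                edges[(u, v)] += 1
                changed = True
                break
    return set(edges) == {(source, sink)} and edges[(source, sink)] == 1

print(is_series_parallel([("s", "a"), ("a", "t"), ("s", "t")], "s", "t"))  # True
print(is_series_parallel([("s", "a"), ("s", "b"), ("a", "b"),
                          ("a", "t"), ("b", "t")], "s", "t"))              # False (N-graph)
```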
11

East, Jackie R. "NATURAL PHENOMENA AS POTENTIAL INFLUENCE ON SOCIAL AND POLITICAL BEHAVIOR: THE EARTH’S MAGNETIC FIELD." UKnowledge, 2014. http://uknowledge.uky.edu/polysci_etds/11.

Full text
Abstract:
Researchers use natural phenomena in a number of disciplines to help explain human behavioral outcomes. Research regarding the potential effects of magnetic fields on animal and human behavior indicates that fields could influence outcomes of interest to social scientists. Tests so far have been limited in scope. This work is a preliminary evaluation of whether the earth's magnetic field influences human behavior: it examines the baseline relationship exhibited between geomagnetic readings and a host of social and political outcomes. The emphasis on breadth of topical coverage in these statistical trials, rather than on depth of development for any one model, means that the evidence is only suggestive, but geomagnetic readings frequently covary with social and political variables in a fashion that seems inexplicable in the absence of a causal relationship. The pattern often holds up in more elaborate statistical models. Analysis provides compelling evidence that geomagnetic variables furnish valuable information to models. Many researchers are already aware of potential causal mechanisms that link human behavior to geomagnetic levels, and this evidence provides a compelling case for continuing to develop this line of research with in-depth, focused analysis.
12

Hsiao, Han-Jung. "Development of Integrated Web Server for Drug Repositioning using Bioinformatics Approaches." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/10967911957132606821.

Full text
Abstract:
For the past several decades, the primary strategy of drug development has been high-throughput screening of molecules to identify lead compounds showing activity against a single therapeutic target and pathway. However, the success rate in identifying new drugs has declined dramatically over the years, and scientists have begun to look for different approaches to drug development. Drug repositioning, the application of known drugs and compounds to new indications, is one alternative to new drug discovery. Reducing the early development work keeps down the cost of identifying new therapeutic applications for existing drugs; overall, drug repositioning can save about 5-7 years compared with most new drug development. This study applied bioinformatics methods to investigate the drug-disease-gene connectivity network as follows: (1) constructing a rank matrix of large-scale gene-expression profiles from the Connectivity Map database; (2) extracting drug-gene relations from the DrugBank database and drug-treatment relations from the Unified Medical Language System - National Drug File Reference Terminology (NDFRT); and (3) taking protein-protein interaction data from the IntAct database. We then calculated correlation coefficients between gene-expression profiles and integrated the DrugBank, NDFRT and IntAct data to construct the drug-disease-gene connectivity network. The drug-drug connectivity could be used to predict the therapeutic uses of each drug. Finally, this study used precision and recall to validate the results. The results have been used to construct a web server (http://syslab4.nchu.edu.tw/DrugCorrelation/). Master's thesis, National Chung Hsing University, Institute of Genomics and Bioinformatics, 2016.
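The signature-correlation step can be illustrated compactly: represent each drug by its gene-expression change profile (rank-based, as in Connectivity Map data) and score drug pairs by Spearman correlation, treating highly correlated pairs as repositioning candidates. The signatures below are synthetic.

```python
# Drug-drug similarity from expression signatures via Spearman correlation.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
n_genes = 500
sig_a = rng.normal(size=n_genes)                    # expression signature of drug A
sig_b = 0.8 * sig_a + rng.normal(0, 0.5, n_genes)   # drug B, similar mechanism
sig_c = rng.normal(size=n_genes)                    # unrelated drug C

for name, sig in [("A-B", sig_b), ("A-C", sig_c)]:
    rho, p = spearmanr(sig_a, sig)                  # rank-based, robust to scaling
    print(f"{name}: rho={rho:.2f}, p={p:.1e}")
```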
13

Lu, Tzu-Pin. "Integrative Bioinformatics Approaches for Dynamic Time Series and Steady State Transcriptome Microarray Data." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/49152827579496024539.

Full text
Abstract:
Microarray technology has been widely utilized in biological and medical research over the past two decades. Its high-throughput nature facilitates the exploration of dysregulated cellular functions driven by experimental manipulations and the identification of potential candidate genes for further validation. However, dealing with such massive data poses a challenge: how to perform an efficient and accurate analysis. To address this issue, various statistical algorithms and mathematical models have been developed. In this dissertation, four bioinformatics approaches are presented and applied to two microarray datasets: three human lymphoblastoid cell lines exposed to radiation treatments, and non-smoking female lung cancer patients in Taiwan. The first approach was a dynamic time-series analysis, which explored radiation-induced effects at higher and lower doses in cells with different p53 status. Template-based clustering and tight clustering were performed to identify differentially expressed genes, and the results exhibited distinct signaling pathways in the three cell lines after 10 Gy and iso-survival radiation exposures. After 10 Gy radiation treatments, the p53 signaling pathway was triggered in TK6, whereas the NFkB signaling pathway was activated in WTK1, which lacks functional p53 protein. Alternatively, irradiation with iso-survival doses induced down-regulation of many E2F4-related genes in all cell lines regardless of p53 status, indicating that the E2F4 signaling pathway might serve as an important regulator in response to lower-dose radiation. The second approach investigated the gene expression profiles of non-smoking female lung cancer patients in Taiwan. This dataset comprised 60 pairs of tumor and adjacent normal tissue specimens. A paired t-test identified 687 genes differentially expressed in tumor tissue, significantly enriched in the axon guidance signaling pathway. The expression patterns were highly similar to two public lung cancer datasets containing both tumor and normal tissues from the same individuals, which strengthened the evidence that these dysregulated genes are involved in lung tumorigenesis. Among them, down-regulation of SEMA5A in tumor tissue, at both the transcriptional and translational levels, was associated with poor survival outcomes. The results suggest that SEMA5A might be used as a novel biomarker for non-smoking female lung cancer patients. In the third approach, concurrent analyses of gene expression and copy number variations (CNVs) were performed in 42 pairs of samples from non-smoking women with lung adenocarcinoma. The results revealed the genomic landscape of recurrently copy-number-altered regions and 475 differentially expressed genes associated with CNVs. Among these CNV-driven genes, two important functions, survival regulation via AKT signaling and cytoskeleton reorganization, were significantly enriched. Survival analyses based on these enriched pathways demonstrated effective predictions in three independent microarray datasets, suggesting that the identified genes/pathways with concordant changes in both gene expression and CNV might serve as prognostic biomarkers of lung tumorigenesis. In the fourth approach, a comprehensive analysis was conducted in 32 pairs of samples from non-smoking female lung adenocarcinoma patients to investigate SNPs, CNVs, methylation alterations and gene expression simultaneously. Associated co-varying patterns were observed between genetic modifications and transcriptional dysregulation. Three statistical approaches identified 617 SNP alleles related to CNVs or methylation alterations, and among them the Kruskal-Wallis test indicated 13 SNPs with downstream gene-expression changes. These SNPs with concordant changes at both the DNA and RNA levels deserve further research effort to elucidate their roles in lung cancer. In conclusion, these four bioinformatics approaches were effective in addressing biomedical questions, and the results can be confirmed in external datasets or biological experiments. Ph.D. dissertation, National Taiwan University, Graduate Institute of Biomedical Electronics and Bioinformatics, 2011.
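The paired differential-expression analysis of the second approach reduces to a per-gene paired t-test followed by multiple-testing correction; a minimal sketch on synthetic data (60 pairs, as in the study) follows.

```python
# Paired t-test per gene with Benjamini-Hochberg FDR correction.
import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(5)
n_genes, n_pairs = 2000, 60
normal = rng.normal(8, 1, (n_genes, n_pairs))
tumor = normal + rng.normal(0, 1, (n_genes, n_pairs))
tumor[:100] += 1.5                      # first 100 genes truly dysregulated

pvals = ttest_rel(tumor, normal, axis=1).pvalue
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"{reject.sum()} genes significant at 5% FDR")
```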
14

Scanfeld, Daniel. "Exploring the Plasmodium falciparum Transcriptome Using Hypergeometric Analysis of Time Series (HATS)." Thesis, 2013. https://doi.org/10.7916/D8CN7B17.

Full text
Abstract:
Malaria poses a significant public health and economic threat in many regions of the world, disproportionately affecting children in sub-Saharan Africa under the age of five. Though success has been celebrated in lowering infection rates, it remains a serious challenge, causing at least 200 million infections and 655,000 deaths per year, with deleterious effects on economic growth and development. Investigation of the malaria parasite Plasmodium falciparum has entered the post-genomics age, with several strains sequenced and many microarray gene expression studies performed. Gene expression studies allow a full sampling of the genomic repertoire of a parasite, and their detailed analysis will prove invaluable in deciphering novel parasite biology as well as the modes of action of antimalarial drug resistance. We have developed a computational pipeline that converts a series of fluorescence readings from a DNA microarray into a meaningful set of biological hypotheses based on the comparison of two lines, generally one drug-sensitive and one drug-resistant. Each step of the computational pipeline is described in detail in this thesis, beginning with data normalization and alignment, followed by visualization through dimensionality reduction, and finally a direct analysis of the differences and similarities between the two lines. Comparisons and analyses were performed at both the individual-gene and gene-set level. An important component of the analytical methods we have developed is a suite of visualization tools that help to easily identify outliers and experimental flaws, measure the significance of predictions, show how lines relate and how well they can be aligned, and demonstrate the results of an analysis. These visualization tools should be used as a starting point for further biological study to test the resulting hypotheses. We also developed a software tool, Gene Attribute and Set Enrichment Ranking (GASER), which combines a wealth of genomic data from the TDR Targets web site with expression data from a variety of sources, and allows researchers to create sophisticated weighted queries to uncover potential drug targets. Queries in our system can be updated in real time, along with their accompanying gene and gene-set lists. We analyzed all possible pair-wise combinations of 11 parasite lines to create baseline distributions for gene and gene-set enrichment. Using the baseline as a comparison, we identified and discarded spurious results and recognized stochastic genes and gene sets. We analyzed three major sets of parasite lines: those involving manipulation of the multidrug resistance-1 (PfMDR1) transporter, a key resistance determinant; those involving manipulation of the P. falciparum chloroquine resistance transporter (PfCRT), another important resistance determinant; and finally a set of parasites with varying sensitivity to artemisinins. This analysis resulted in a rich library of high-scoring genes that may merit further exploration as potential modes of action of resistance. More specifically, we found that manipulation of pfcrt expression resulted in an up-regulation of tRNA synthetases, which might serve to increase protein production in response to reduced amino acid availability from degraded hemoglobin. We observed that a copy-number increase in pfmdr1 resulted in increases in glycerophospholipid metabolism and up-regulation of a number of ABC transporters. 
Finally, when comparing artemisinin-sensitive to artemisinin-tolerant lines, we found an increased abundance of redox metabolites and of the transcripts involved in redox regulation, and a significant reduction in transcription and altered expression of transcripts encoding core histone proteins. These alterations could help confer an increased tolerance to drug-induced redox perturbation by lowering endogenous redox stress. We also offer a robust computational tool, Hypergeometric Analysis of Time Series (HATS), to handle challenging biological questions related to the comparison of time-series experiments. Our pipeline provides a rigorous method for aligning expression experiments and then determining which genes and gene sets differ most between them. The changes in gene expression level between drug-sensitive and drug-resistant lines offer important clues in our quest to understand mechanisms of resistance and identify new drug targets. Our pipeline allows for comparison of future lines with our base set and holds potential for other organisms, especially those similar to Plasmodium with a strong time-dependent component. The full Excel files of all the analyses performed in this thesis can be found at: http://www.fidock.org/dan.
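The hypergeometric tail probability that gives HATS its name is the standard gene-set enrichment calculation; a minimal sketch with illustrative counts:

```python
# Hypergeometric gene-set enrichment: P(overlap >= k) given list sizes.
from scipy.stats import hypergeom

N = 5000   # genes on the array
K = 120    # genes in the pathway of interest
n = 200    # differentially expressed genes between two parasite lines
k = 18     # observed overlap between the two lists

p_value = hypergeom.sf(k - 1, N, K, n)   # survival function gives P(X >= k)
print(f"enrichment p = {p_value:.2e}")   # expected overlap is only n*K/N = 4.8
```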
15

Mitra, Pralay. "Algorithmic Approaches for Protein-Protein Docking and Quaternary Structure Inference." Thesis, 2010. http://hdl.handle.net/2005/1066.

Full text
Abstract:
Molecular interaction among proteins drives cellular processes through the formation of complexes that perform the requisite biochemical function. While some of these complexes are obligate (i.e., they fold together during complexation), others are non-obligate and are formed through macromolecular recognition. Macromolecular recognition in proteins is highly specific, yet it can be both permanent and non-permanent in nature. Hallmarks of permanent recognition complexes include a large surface of interaction and stabilization by hydrophobic interaction and other noncovalent forces. Several amino acids that contribute critically to the free energy of binding at these interfaces are called "hot spot" residues. Non-permanent recognition complexes, on the other hand, usually show a small interface of interaction, with limited stabilization from noncovalent forces. For both permanent and non-permanent complexes, the specificity of molecular interaction is governed by the geometric compatibility of the interaction surface and the noncovalent forces that anchor it. A great deal of work has already been done toward understanding the basis of protein macromolecular recognition [1, 2]. Based on these studies, efforts have been made to develop protein-protein docking algorithms that can predict the geometric orientation of the interacting molecules from their individual unbound states. Despite advances in docking methodologies, several significant difficulties remain [1]. Therefore, in this thesis, we start with a literature review to understand the individual merits and demerits of the existing approaches (Chapter 1) [3], and then attempt to address some of the problems by developing methods to infer protein quaternary structure from the crystalline state and to improve structural and chemical understanding of protein-protein interactions through biological complex prediction. Understanding the interaction geometry is the first step in a protein-protein interaction study. Yet no consistent method exists to assess the geometric compatibility of the interacting interface because of its highly rugged nature, which suggested that new sensitive measures and methods were needed to tackle the problem. We therefore developed two new and conceptually different measures, using Delaunay tessellation and interface slice selection, to compute the surface complementarity and atom packing at the protein-protein interface (Chapter 2) [4]. We called these Normalized Surface Complementarity (NSc) and Normalized Interface Packing (NIP). We rigorously benchmarked the measures on the non-redundant protein complexes available in the Protein Data Bank (PDB) and found that they efficiently segregate biological protein-protein contacts from non-biological ones, especially those derived from X-ray crystallography. Sensitive surface packing/complementarity recognition algorithms are usually computationally expensive and thus limited in application to high-throughput screening; special emphasis was therefore placed on making our measures compute-efficient as well. Our final evaluation showed that NSc and NIP correlate very strongly with each other, and with the interface-area-normalized values available from the Surface Complementarity program (CCP4 Suite: http://smb.slac.stanford.edu/facilities/software/ccp4/html/sc.html), but at a fraction of the computing cost.
After building the geometry-based surface complementarity and packing assessment methods for the rugged protein surface, we advanced to determining the stabilities of the geometrically compatible interfaces formed. To do so, we needed to survey the quaternary structures of proteins of various affinities. The emphasis on affinity arose from its strong relationship with the permanent or non-permanent lifetime of the complex. We therefore set up data-mining studies on two databases, PQS (Protein Quaternary Structure database: http://pqs.ebi.ac.uk) and PISA (Protein Interfaces, Surfaces and Assemblies: www.ebi.ac.uk/pdbe/prot_int/pistart.html), which offer downloads of quaternary structure data on protein complexes derived by X-ray crystallography. To our surprise, we found that these databases provide valid quaternary structures mostly for moderate- to strong-affinity complexes. The limitation could be ascertained by browsing annotations from another curated database of protein quaternary structure (PiQSi [5]: supfam.mrc-lmb.cam.ac.uk/elevy/piqsi/piqsi_home.cgi) and by literature surveys. This made it necessary first to develop a more robust method to infer quaternary structures of all affinities available from the PDB. We therefore developed a new scheme focused on covering complexes of all affinity categories, especially weak/very weak ones, and heteromeric quaternary structures (Chapter 3) [6]. Our scheme combines a naïve Bayes classifier and point-group symmetry under a Boolean framework to detect all categories of protein quaternary structure in the crystal lattice. We tested it on a standard benchmark consisting of 112 recognition heteromeric complexes and obtained correct recall in 95% of cases, significantly better than the 53% achieved by PISA [7], a state-of-the-art quaternary structure detection method hosted at the European Bioinformatics Institute, Hinxton, UK. The few cases that failed correct detection through our scheme offered interesting insights into the intriguing nature of protein contacts in the lattice. The findings have implications for accurate inference of the quaternary states of proteins, especially weak-affinity complexes, where biological protein contacts tend to be sacrificed for energetically optimal ones that favor the formation/stabilization of the crystal lattice. We expect our method to be widely used by researchers interested in protein quaternary structure and interaction.
Having developed a method that allows us to sample all categories of quaternary structure in the PDB, we set our goal on the next problem: accurately determining the stabilities of geometrically compatible protein surfaces involved in interaction. Reformulating the question in terms of protein-protein docking, we asked how we could reliably infer the stability of any arbitrary interface that is formed when two protein molecules are brought sterically close. In a real protein docking exercise this question is asked innumerable times during energy-based screening of the thousands of decoys geometrically sampled (through rotation and translation) from the unbound subunits. Current docking methods face problems on two counts: (i) the number of decoy interfaces whose energies must be evaluated is rather large (64,320 for 9° rotations and translations of a dimeric complex), and (ii) energy-based screening is not efficient enough, so decoys with native-like quaternary structure are rarely selected at high ranks. We addressed both problems, with interesting results. Intricate decoy-filtering approaches have been developed, applied during the search stage, the sampling stage, or both. For filtering, statistical information such as 3D conservation of the interfacial residues is typically used; more expensive approaches screen for orientation, shape complementarity and electrostatics. We developed an interface-area-based decoy filter for the sampling stage, exploiting the assumption that native-like decoys must have the largest, or close to the largest, interface (Chapter 4) [8]. Implementation of this assumption and standard benchmarking showed that in 91% of cases we could recover native-like decoys of bound and unbound binary docking targets of both strong and weak affinity. This allowed us to propose that "native-like decoys must have the largest, or close to the largest, interface" can be used as a rule to exclude non-native decoys efficiently during docking sampling. This rule can dramatically clip the needle-in-a-haystack problem faced in a docking study by eliminating more than 95% of the decoy set produced by the sampling search. We incorporated the rule as a central part of our protein docking strategy.
While addressing the question of energy-based screening to place native-like decoys at high rank during docking, we came across a large volume of published work. The mainstay of most energy-based screenings that avoid statistical potentials involves some form of Coulomb potential, Lennard-Jones potential and solvation energy. Different flavors of these energy functions are used, with diverse preferences and weights for individual terms. Interestingly, in all cases the energy functions were of unnormalized form: individual energy terms were simply added to arrive at a final score used for ranking. Proteins, being large molecules, offer limited scope for applying semi-empirical or quantum mechanical methods to large-scale evaluation of energy. We therefore developed a de novo empirical scoring function in normalized form. As already stated, we found NSc and NIP to be highly discriminatory in segregating biological from non-biological interfaces, so we incorporated them as parameters of our scoring function. Our data-mining study revealed a reasonable correlation of -0.73 between normalized solvation energy and normalized nonbonding energy (Coulomb + van der Waals) at the interface. Using this information, we extended the scoring function by combining the geometric measures with the normalized interaction energies. Tests on 30 unbound binary protein-protein complexes showed that in 16 cases we could identify at least one decoy in the top three ranks with ≤10 Å backbone root-mean-square deviation (RMSD) from the true binding geometry. The scoring results were compared with other state-of-the-art methods, which returned inferior results. The salient feature of our scoring function is the exclusion of any experiment-guided restraints, evolutionary information, statistical propensities or modified interaction-energy equations commonly used by others. Tests on 118 less difficult bound binary protein-protein complexes with ≤35% sequence redundancy at the interface gave first rank in 77% of cases, where the native-like decoy was chosen from among 10,000 and had ≤5 Å backbone RMSD from the true geometry. The scoring function, results and comparison with other methods are discussed extensively in Chapter 5 [9]. The method has been implemented and made available for public use as a web server, PROBE (http://pallab.serc.iisc.ernet.in/probe); its development and use are elaborated in Chapter 7 [10]. In the course of this work we generated huge amounts of data that could be useful to others, especially protein dockers. We therefore developed dockYard (http://pallab.serc.iisc.ernet.in/dockYard), a repository for protein-protein docking decoys (Chapter 6) [11]. dockYard offers four categories of docking decoys derived from: Bound (native dimer co-crystallized), Unbound (individual subunits as well as the target are crystallized), Variants (match the previous two categories in at least one subunit with 100% sequence identity), and Interlogs (match the previous categories in at least one subunit with ≥90% or ≥50% sequence identity). There is a facility for full or selective download based on search parameters, and the portal also serves as a repository for modelers who may want to share their decoy sets with the community. In conclusion, although we have made several contributions to the development of algorithms for improved protein-protein docking and quaternary structure inference, many challenges remain (Chapter 8). The principal challenge arises when considering proteins as flexible bodies whose conformational states may change on quaternary structure formation. In addition, solvent plays a major role in the free energy of binding, but its exact contribution is not straightforward to estimate. Undoubtedly, the cost of computation is one of the limiting factors, apart from good energy functions to evaluate docking decoys. The next generation of algorithms must therefore focus on improved docking studies that realistically incorporate flexibility and the solvent environment in all their evaluations.
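The "largest interface" sampling-stage filter can be illustrated with a cheap interface-size proxy; the sketch below counts inter-chain CA-CA contacts on random stand-in coordinates and keeps only decoys near the maximum, conveying the spirit of the rule rather than the thesis's exact implementation.

```python
# Largest-interface decoy filter with a contact-count proxy for interface area.
import numpy as np

def interface_size(coords_a, coords_b, cutoff=8.0):
    """Count inter-chain residue pairs with CA-CA distance below cutoff."""
    d = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    return int((d < cutoff).sum())

rng = np.random.default_rng(6)
receptor = rng.normal(0, 10, (150, 3))                    # fixed subunit CA trace
decoys = [receptor + rng.normal([t, 0, 0], 5, (150, 3))   # fake ligand poses
          for t in np.linspace(10, 60, 200)]

sizes = np.array([interface_size(receptor, lig) for lig in decoys])
keep = np.flatnonzero(sizes >= 0.9 * sizes.max())         # retain near-largest interfaces
print(f"kept {keep.size} of {len(decoys)} decoys")
```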
16

Yahi, Alexandre. "Simulating drug responses in laboratory test time series with deep generative modeling." Thesis, 2019. https://doi.org/10.7916/d8-arta-jt32.

Full text
Abstract:
Drug effects can be unpredictable and vary widely among patients with environmental, genetic, and clinical factors. Randomized controlled trials (RCTs) are not sufficient to identify adverse drug reactions (ADRs), and the electronic health record (EHR), along with medical claims, has become an important resource for pharmacovigilance. Among all the data collected in hospitals, laboratory tests represent the most documented and reliable data type in the EHR. Laboratory tests are at the core of the clinical decision process and are used for diagnosis, monitoring, screening, and research by physicians. They can be linked to drug effects either directly, with therapeutic drug monitoring (TDM), or indirectly using drug laboratory effects (DLEs) that affect surrogate tests. Unfortunately, very few automated methods use laboratory tests to inform clinical decision making and predict drug effects, partly due to the complexity of these time series, which are irregularly sampled, highly dependent on other clinical covariates, and non-stationary. Deep learning, the branch of machine learning that relies on high-capacity artificial neural networks, has enjoyed renewed popularity this past decade and has transformed fields such as computer vision and natural language processing. Deep learning holds the promise of better performance compared to established machine learning models, although with the necessity for larger training datasets due to the higher degrees of freedom. These models are more flexible with multi-modal inputs and can make sense of large numbers of features without extensive engineering. Both qualities make deep learning models ideal candidates for complex, multi-modal, noisy healthcare datasets. With the development of novel deep learning methods such as generative adversarial networks (GANs), there is an unprecedented opportunity to learn how to augment existing clinical datasets with realistic synthetic data and increase predictive performance. Moreover, GANs have the potential to simulate the effects of individual covariates such as drug exposures by leveraging the properties of implicit generative models. In this dissertation, I present a body of work that aims at paving the way for next-generation laboratory-test-based clinical decision support systems powered by deep learning. To this end, I organized my experiments around three building blocks: (1) the evaluation of various deep learning architectures on laboratory test time series and their covariates with a forecasting task; (2) the development of implicit generative models of laboratory test time series using the Wasserstein GAN framework; (3) the inference properties of these models for the simulation of drug effects in laboratory test time series, and their application for data augmentation. Each component has its own evaluation: the forecasting task enabled me to explore the properties and performances of different learning architectures; the Wasserstein GAN models are evaluated with both intrinsic metrics and extrinsic tasks, and I always set baselines to avoid providing results in a "neural-network only" referential. Applied machine learning, and more so with deep learning, is an empirical science. While the datasets used in this dissertation are not publicly available due to patient privacy regulation, I described pre-processing steps, hyper-parameter selection and training processes with reproducibility and transparency in mind.
In the specific context of these studies involving laboratory test time series and their clinical covariates, I found that for supervised tasks, machine learning holds up well against deep learning methods. Complex recurrent architectures like long short-term memory (LSTM) do not perform well on these short time series, while convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs) provide the best performances, at the cost of extensive hyper-parameter tuning. Generative adversarial networks, enabled by deep learning models, were able to generate high-fidelity laboratory test time series, and the quality of the generated samples was increased with conditional models using drug exposures as auxiliary information. Interestingly, forecasting models trained on synthetic data exclusively still retain good performances, confirming the potential of GANs in privacy-oriented applications. Finally, conditional GANs demonstrated an ability to interpolate samples from drug exposure combinations not seen during training, opening the way for laboratory test simulation with larger auxiliary information spaces. In specific cases, augmenting real training sets with synthetic data improved performances in the forecasting tasks, and could be extended to other applications where rare cases present a high prediction error.
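A minimal conditional Wasserstein GAN in the weight-clipping style of the original WGAN paper, conditioned on a binary drug exposure, can convey the second and third building blocks in miniature; the data, network sizes and hyper-parameters below are illustrative, not the dissertation's.

```python
# Conditional WGAN sketch (weight clipping) for short lab-test-like series.
import torch
import torch.nn as nn

T = 24                                    # series length
G = nn.Sequential(nn.Linear(16 + 1, 64), nn.ReLU(), nn.Linear(64, T))
D = nn.Sequential(nn.Linear(T + 1, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_d = torch.optim.RMSprop(D.parameters(), lr=5e-5)

def real_batch(n):
    drug = torch.randint(0, 2, (n, 1)).float()
    t = torch.linspace(0, 1, T)
    x = torch.sin(6 * t) + drug * 0.8 * t + 0.1 * torch.randn(n, T)  # drug shifts trend
    return x, drug

for step in range(300):
    for _ in range(5):                    # critic updates per generator update
        x, c = real_batch(64)
        z = torch.randn(64, 16)
        fake = G(torch.cat([z, c], dim=1)).detach()
        loss_d = D(torch.cat([fake, c], 1)).mean() - D(torch.cat([x, c], 1)).mean()
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        for p in D.parameters():          # Lipschitz constraint via weight clipping
            p.data.clamp_(-0.01, 0.01)
    z = torch.randn(64, 16); _, c = real_batch(64)
    loss_g = -D(torch.cat([G(torch.cat([z, c], 1)), c], 1)).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# simulate a drug response: sample with the exposure flag switched on
z = torch.randn(1, 16)
print(G(torch.cat([z, torch.ones(1, 1)], 1)).detach().numpy())
```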
APA, Harvard, Vancouver, ISO, and other styles
17

de, Oliveira Sales Ana Paula. "Computational Methods for Investigating Dendritic Cell Biology." Diss., 2011. http://hdl.handle.net/10161/5677.

Full text
Abstract:
The immune system is constantly faced with the daunting task of protecting the host from a large number of ever-evolving pathogens. In vertebrates, the immune response results from the interplay of two cellular systems: innate immunity and adaptive immunity. In the past decades, dendritic cells have emerged as major players in the modulation of the immune response, being one of the primary links between these two branches of the immune system. Dendritic cells are pathogen-sensing cells that alert the rest of the immune system to the presence of infection. The signals sent by dendritic cells result in the recruitment of the appropriate cell types and molecules required for effectively clearing the infection. A question of utmost importance in our understanding of the immune response and our ability to manipulate it in the development of vaccines and therapies is: "How do dendritic cells translate the various cues they perceive from the environment into different signals that specifically activate the appropriate parts of the immune system, resulting in an immune response streamlined to clear the given pathogen?" Here we have developed computational and statistical methods aimed at addressing specific aspects of this question. In particular, understanding how dendritic cells ultimately modulate the immune response requires an understanding of the subtleties of their maturation process in response to different environmental signals. Hence, the first part of this dissertation focuses on elucidating the changes in the transcriptional program of dendritic cells in response to the detection of two common pathogen-associated molecules, LPS and CpG. We have developed a method based on Langevin and Dirichlet processes to model and cluster gene expression temporal data, and have used it to identify, on a large scale, genes that present unique and common transcriptional behaviors in response to these two stimuli. Additionally, we have also investigated a different, but related, aspect of dendritic cell modulation of the adaptive immune response. In the second part of this dissertation, we present a method to predict peptides that will bind to MHC molecules, a requirement for the activation of pathogen-specific T cells. Together, these studies contribute to the elucidation of important aspects of dendritic cell biology.
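The Langevin/Dirichlet-process model itself is not reproduced in this listing. As a loose stand-in, the sketch below clusters simulated gene-expression time courses with scikit-learn's variational Dirichlet-process Gaussian mixture, which shares the nonparametric flavor of the abstract's approach but is not the authors' method; all data and settings are illustrative.

```python
# Illustrative stand-in: cluster gene-expression time courses with a
# Dirichlet-process Gaussian mixture (variational inference), not the
# Langevin-based model from the dissertation. Data here are simulated.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
n_genes, n_timepoints = 300, 10

# Simulate three response archetypes: flat, transient spike, sustained rise.
t = np.linspace(0, 1, n_timepoints)
archetypes = np.stack([np.zeros_like(t), np.exp(-((t - 0.3) ** 2) / 0.01), t])
labels = rng.integers(0, 3, n_genes)
X = archetypes[labels] + 0.15 * rng.standard_normal((n_genes, n_timepoints))

# A truncated DP mixture lets the data decide how many clusters are used.
dpgmm = BayesianGaussianMixture(
    n_components=10,                 # truncation level, not the final count
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    random_state=0,
).fit(X)

assigned = dpgmm.predict(X)
print("clusters actually used:", np.unique(assigned).size)
```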
APA, Harvard, Vancouver, ISO, and other styles
18

"Machine Learning Models for High-dimensional Biomedical Data." Doctoral diss., 2018. http://hdl.handle.net/2286/R.I.50520.

Full text
Abstract:
The recent technological advances enable the collection of various complex, heterogeneous, and high-dimensional data in biomedical domains. The increasing availability of high-dimensional biomedical data creates the need for new machine learning models for effective data analysis and knowledge discovery. This dissertation introduces several unsupervised and supervised methods to help understand the data, discover patterns, and improve decision making. All of the proposed methods can generalize to other industrial fields. The first topic of this dissertation focuses on data clustering. Data clustering is often the first step in analyzing a dataset without label information. Clustering high-dimensional data with mixed categorical and numeric attributes remains a challenging, yet important, task. A clustering algorithm based on tree ensembles, CRAFTER, is proposed to tackle this task in a scalable manner. The second part of this dissertation aims to develop data representation methods for genome sequencing data, a special type of high-dimensional data in the biomedical domain. The proposed data representation method, Bag-of-Segments, can summarize the key characteristics of a genome sequence in a small number of features with good interpretability. The third part of this dissertation introduces an end-to-end deep neural network model, GCRNN, for time series classification, with emphasis on both accuracy and interpretability. GCRNN contains a convolutional network component to extract high-level features and a recurrent network component to enhance the modeling of temporal characteristics. A feed-forward fully connected network with sparse group lasso regularization generates the final classification and provides good interpretability. The last topic centers on dimensionality reduction methods for time series data. A good dimensionality reduction method is important for storage, decision making, and pattern visualization for time series data. The CRNN autoencoder is proposed to not only achieve low reconstruction error but also generate discriminative features. A variational version of this autoencoder has great potential for applications such as anomaly detection and process control.
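The abstract names GCRNN's ingredients without giving sizes or code. The PyTorch sketch below assembles that pattern (a convolutional feature extractor, a recurrent component, and a final linear layer trained with a sparse-group-lasso-style penalty) with hypothetical dimensions; it is one plausible reading, not the dissertation's actual architecture.

```python
# Sketch of a GCRNN-style classifier: Conv1d features -> GRU -> linear layer,
# with a sparse-group-lasso-style penalty on the final weights. Layer sizes
# are assumptions; the dissertation's exact architecture is not reproduced.
import torch
import torch.nn as nn

class GCRNNSketch(nn.Module):
    def __init__(self, n_channels=1, n_classes=4, hidden=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=5, padding=2), nn.ReLU())
        self.rnn = nn.GRU(input_size=16, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                     # x: (batch, channels, time)
        h = self.conv(x)                      # high-level local features
        out, _ = self.rnn(h.transpose(1, 2))  # recurrent pass over time
        return self.fc(out[:, -1])            # classify from last hidden state

def sparse_group_lasso(weight, lam=1e-3, alpha=0.5):
    # L1 term encourages sparsity; group (per-feature column) L2 term can
    # zero out whole input features, aiding interpretability.
    l1 = weight.abs().sum()
    group = weight.norm(2, dim=0).sum()
    return lam * (alpha * l1 + (1 - alpha) * group)

model = GCRNNSketch()
x = torch.randn(8, 1, 100)                    # simulated batch of time series
y = torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(x), y) \
       + sparse_group_lasso(model.fc.weight)
loss.backward()
```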
APA, Harvard, Vancouver, ISO, and other styles
19

Saker, Halima. "Segmentation of Heterogeneous Multivariate Genome Annotation Data." 2021. https://ul.qucosa.de/id/qucosa%3A75914.

Full text
Abstract:
Due to the potential impact of next-generation sequencing (NGS), we have seen a rapid increase in genomic information and in annotation information that can be naturally mapped to genomic locations. In cancer research, for example, there are significant efforts to chart DNA methylation at single-nucleotide resolution. The NIH Roadmap Epigenomics Project, on the other hand, has set out to chart a large number of different histone modifications. Over the last few years, a very diverse set of aspects has become the aim of large-scale experiments with a genome-wide readout. Therefore, the identification of functional units of the genomic DNA is a significant and essential challenge, and we have been motivated to implement multi-dimensional segmentation approaches that accommodate gene variety and genome heterogeneity. The segmentation of multivariate genomic, epigenomic, and transcriptomic data from multiple time points, tissues, and cell types, in order to compare changes in genomic organization and identify common elements, forms the headline of our research. Next-generation sequencing offers rich material used in bioinformatics research to explore molecular functions, disease causes, and more. Rapid advances in technology have also led to a proliferation of experiment types. Although these experiments share next-generation sequencing as the readout, they produce signals with entirely different inherent resolutions, ranging from precise transcript structures at single-nucleotide resolution, to pull-down and enrichment-based protocols with resolutions on the order of 100 nt, to chromosome conformation data that are accurate only at kilobase resolution. Therefore, the main goal of the dissertation project is to design, implement, and test novel segmentation algorithms that work in one and multiple dimensions and can accommodate data of different types and resolutions. The target data in this project are multivariate genetic, epigenetic, transcriptomic, and proteomic data, because these datasets can change under the effect of several conditions, such as chemical, genetic, and epigenetic modifications. A promising approach towards this end is to identify intervals of the genomic DNA that behave coherently across multiple conditions and tissues, defined as intervals on which all measured quantities are constant within each experiment. A naive approach would take each dataset in isolation and estimate intervals in which the signal at hand is constant. Another approach takes all datasets at once as input without resorting to one-dimensional segmentation. Once implemented, the algorithm should be applied to heterogeneous genomic, transcriptomic, proteomic, and epigenomic data; the aim here is to draw and improve the map of functionally coherent segments of a genome. Current approaches either focus on individual datasets, as in the case of tiling array transcriptomics data, or on the analysis of comparable experiments, such as ChIP-seq data for various histone modifications. The simplest sub-problem in segmentation is to decide whether two adjacent intervals should form two distinct segments or be combined into a single one. We have to find out how this decision should be made in multi-dimensional segmentation; in one dimension, it is relatively well understood. This leads to a segmentation of the genome with respect to the particular dataset, and the intersection of segmentations for different datasets could then identify the DNA elements.
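To make that merge-or-split decision concrete, the toy sketch below segments a simulated two-track signal by greedily fusing adjacent intervals whenever a single constant block costs less (in squared error plus a per-segment penalty) than two separate blocks. It is a naive illustration of the sub-problem, not the dissertation's algorithm.

```python
# Toy illustration of the merge decision at the heart of segmentation: two
# adjacent segments are fused whenever modeling them as one constant block
# costs less than keeping them separate plus a per-segment penalty.
# Greedy bottom-up scheme; a sketch, not the dissertation's method.
import numpy as np

def sse(x):
    # Squared error of a multivariate block around its per-track mean.
    return ((x - x.mean(axis=0)) ** 2).sum()

def bottom_up_segment(signal, penalty=2.0):
    # signal: (n_positions, n_tracks); start with every position as a segment.
    bounds = list(range(len(signal) + 1))
    while True:
        best_gain, best_i = 0.0, None
        for i in range(1, len(bounds) - 1):
            a, b, c = bounds[i - 1], bounds[i], bounds[i + 1]
            merged = sse(signal[a:c])
            split = sse(signal[a:b]) + sse(signal[b:c]) + penalty
            if split - merged > best_gain:      # merging saves total cost
                best_gain, best_i = split - merged, i
        if best_i is None:
            return bounds                       # no beneficial merge remains
        del bounds[best_i]                      # fuse the two segments

rng = np.random.default_rng(1)
# Two tracks sharing a breakpoint at 50, plus one track-specific shift at 80.
x = rng.standard_normal((120, 2)) * 0.3
x[50:, 0] += 2.0
x[50:, 1] += 1.5
x[80:, 1] += 2.0
print(bottom_up_segment(x, penalty=3.0))        # expected: [0, 50, 80, 120]
```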
APA, Harvard, Vancouver, ISO, and other styles
20

"Integrative Analyses of Diverse Biological Data Sources." Doctoral diss., 2011. http://hdl.handle.net/2286/R.I.9224.

Full text
Abstract:
The technology expansion seen in the last decade of genomics research has permitted the generation of large-scale data sources pertaining to molecular biological assays, genomics, proteomics, transcriptomics, and other modern omics catalogs. New methods to analyze, integrate, and visualize these data types are essential to unveil relevant disease mechanisms. Towards these objectives, this research focuses on data integration within two scenarios: (1) transcriptomic, proteomic, and functional information and (2) real-time sensor-based measurements motivated by single-cell technology. To assess relationships between protein abundance and transcriptomic and functional data, a nonlinear model was explored at static and temporal levels. The successful integration of these heterogeneous data sources through the stochastic gradient boosted tree approach and its improved predictability are highlights of this work. Through the development of an innovative validation subroutine based on a permutation approach and the use of external information (i.e., operons), the lack of a priori knowledge for undetected proteins was overcome. The integrative methodologies allowed for the identification of undetected proteins in Desulfovibrio vulgaris and Shewanella oneidensis for further exploration in laboratories towards finding functional relationships. In an effort to better understand diseases such as cancer at different developmental stages, the Microscale Life Science Center headquartered at Arizona State University is pursuing single-cell studies by developing novel technologies. This research developed and applied a statistical framework that tackled the following challenges: random noise, heterogeneous dynamic systems with multiple states, and understanding cell behavior within and across different Barrett's esophageal epithelial cell lines using oxygen consumption curves. These curves were characterized with good empirical fit using nonlinear models with simple structures, which allowed the extraction of a large number of features. Application of a supervised classification model to these features and the integration of experimental factors allowed for the identification of subtle patterns among different cell types, visualized through multidimensional scaling. Motivated by the challenges of analyzing real-time measurements, we further explored a unique two-dimensional representation of multiple time series using a wavelet approach, which showed promising results towards less complex approximations. The benefits of external information were also explored to improve the image representation.
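As a rough sketch of the first integration scenario, the snippet below fits a stochastic gradient boosted tree (scikit-learn's GradientBoostingRegressor with subsampling) to predict protein abundance from simulated transcript-level and operon features. The data and feature names are invented for illustration, and the dissertation's permutation-based validation subroutine is not reproduced.

```python
# Sketch of the boosted-tree integration step: predict protein abundance from
# transcriptomic and functional features. All data are simulated.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_genes = 500
mrna = rng.lognormal(mean=2.0, sigma=0.5, size=n_genes)     # transcript level
half_life = rng.gamma(shape=2.0, scale=1.0, size=n_genes)   # functional proxy
operon_size = rng.integers(1, 6, size=n_genes)              # external info

# Nonlinear ground truth: protein abundance saturates in mRNA level.
protein = np.log(mrna) * half_life + 0.5 * operon_size \
          + 0.3 * rng.standard_normal(n_genes)

X = np.column_stack([mrna, half_life, operon_size])
X_tr, X_te, y_tr, y_te = train_test_split(X, protein, random_state=0)

# subsample < 1.0 is what makes the boosting "stochastic".
gbt = GradientBoostingRegressor(n_estimators=300, subsample=0.7,
                                max_depth=3, random_state=0).fit(X_tr, y_tr)
print("held-out R^2:", round(gbt.score(X_te, y_te), 3))
```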
APA, Harvard, Vancouver, ISO, and other styles
21

Leap, Katie. "Multiple Testing Correction with Repeated Correlated Outcomes: Applications to Epigenetics." 2017. https://scholarworks.umass.edu/masters_theses_2/559.

Full text
Abstract:
Epigenetic changes (specifically DNA methylation) have been associated with adverse health outcomes; however, unlike genetic markers, which are fixed over the lifetime of an individual, methylation can change. Given that there are a large number of methylation sites, measuring them repeatedly introduces multiple testing problems beyond those that exist in a static genetic context. Using simulations of epigenetic data, we considered different methods of controlling the false discovery rate (FDR). We considered several underlying associations between an exposure and methylation over time. We found that testing each site with a linear mixed effects model and then controlling the FDR had the highest positive predictive value (PPV) and a low number of false positives, and was able to differentiate between differential methylation present at only one time point and a persistent relationship. In contrast, methods that controlled the FDR at a single time point, as well as ad hoc methods, tended to have lower PPV, more false positives, and/or were unable to differentiate these conditions. Validation in data obtained from Project Viva found a difference between fitting longitudinal models only to sites significant at one time point and fitting all sites longitudinally.
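The winning strategy described in the abstract can be sketched in a few lines: fit one linear mixed-effects model per methylation site over the repeated measures, collect the exposure p-values, and apply Benjamini-Hochberg FDR control across sites. The simulation below, with invented column names and effect sizes, is only meant to show the mechanics.

```python
# Sketch of the strategy the thesis found best: one linear mixed-effects model
# per methylation site over repeated measures, then Benjamini-Hochberg FDR
# across all sites. Data, column names, and effect sizes are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_subjects, n_times, n_sites = 40, 3, 20

subjects = np.repeat(np.arange(n_subjects), n_times)
time = np.tile(np.arange(n_times), n_subjects)
exposure = np.repeat(rng.integers(0, 2, n_subjects), n_times)

pvals = []
for site in range(n_sites):
    effect = 0.8 if site < 5 else 0.0        # first 5 sites truly associated
    meth = (effect * exposure
            + 0.5 * rng.standard_normal(n_subjects)[subjects]  # subject effect
            + 0.3 * rng.standard_normal(len(subjects)))        # residual noise
    df = pd.DataFrame({"meth": meth, "exposure": exposure,
                       "time": time, "subject": subjects})
    fit = smf.mixedlm("meth ~ exposure + time", df,
                      groups=df["subject"]).fit(reml=True)
    pvals.append(fit.pvalues["exposure"])

# Control FDR across sites rather than at each time point separately.
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print("sites flagged:", np.where(reject)[0])
```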
APA, Harvard, Vancouver, ISO, and other styles
22

Kusiak, Caroline. "Real-Time Dengue Forecasting In Thailand: A Comparison Of Penalized Regression Approaches Using Internet Search Data." 2018. https://scholarworks.umass.edu/masters_theses_2/708.

Full text
Abstract:
Dengue fever affects over 390 million people annually worldwide and is of particular concern in Southeast Asia, where it is one of the leading causes of hospitalization. Modeling trends in dengue occurrence can provide valuable information to public health officials; however, many challenges arise depending on the data available. In Thailand, reporting of dengue cases is often delayed by more than 6 weeks, and a small fraction of cases may not be reported until over 11 months after they occurred. This study shows that incorporating data on Google search trends can improve disease predictions in settings with severely underreported data. We compare penalized regression approaches to seasonal baseline models and illustrate that incorporation of search data can reduce prediction error. This builds on previous research showing that search data and recent surveillance data together can be used to create accurate forecasts for diseases such as influenza and dengue fever. This work shows that even in settings where timely surveillance data are not available, using search data in real time can produce more accurate short-term forecasts than a seasonal baseline prediction. However, forecast accuracy degrades the further into the future the forecasts go. The relative accuracy of these forecasts compared to a seasonal average forecast varies depending on location. Overall, these data and models can improve short-term public health situational awareness and should be incorporated into larger real-time forecasting efforts.
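As a schematic of the comparison in the thesis, the sketch below pits a lasso regression using lagged surveillance counts, a search-trend proxy, and seasonal terms against a seasonal-average baseline on simulated weekly counts; the data, forecast horizon, and feature names are all assumptions.

```python
# Sketch of the comparison in the thesis: a penalized (lasso) regression with
# lagged surveillance and search-trend features versus a seasonal-average
# baseline. All data are simulated; the search signal is a hypothetical proxy.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_weeks = 52 * 6
week = np.arange(n_weeks)
season = 100 + 60 * np.sin(2 * np.pi * week / 52)
cases = np.maximum(season + 30 * rng.standard_normal(n_weeks), 0)
search = cases / cases.max() + 0.05 * rng.standard_normal(n_weeks)

lag = 4                                  # forecast horizon: 4 weeks ahead
X = np.column_stack([cases[:-lag],       # delayed surveillance counts
                     search[:-lag],      # timely search-trend feature
                     np.sin(2 * np.pi * week[:-lag] / 52),
                     np.cos(2 * np.pi * week[:-lag] / 52)])
y = cases[lag:]

split = 52 * 5                           # train on 5 years, test on the last
model = LassoCV(cv=5).fit(X[:split], y[:split])
pred = model.predict(X[split:])

# Seasonal baseline: average of the same calendar week over training years.
baseline = np.array([cases[w % 52::52][:5].mean() for w in week[lag:][split:]])
mae = lambda a, b: np.abs(a - b).mean()
print("lasso MAE:", round(mae(pred, y[split:]), 1),
      "| seasonal MAE:", round(mae(baseline, y[split:]), 1))
```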
APA, Harvard, Vancouver, ISO, and other styles