Dissertations / Theses on the topic 'Bioinformatics - Methodology'

Consult the top 20 dissertations / theses for your research on the topic 'Bioinformatics - Methodology.'

1

Wu, Chao. "Intelligent Data Mining on Large-scale Heterogeneous Datasets and its Application in Computational Biology." University of Cincinnati / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1406880774.

2

Herai, Roberto Hirochi. "Metodologias de bioinformatica para detecção e estudo de sequencias repetitivas em loci genicos de transcritos quimericos." [s.n.], 2010. http://repositorio.unicamp.br/jspui/handle/REPOSIP/317152.

Abstract:
Advisor: Michel Eduardo Beleza Yamagishi. Doctoral thesis (Bioinformatics; Doctor in Genetics and Molecular Biology), Universidade Estadual de Campinas, Instituto de Biologia, 2010.
The large amount of biological data generated recently has shown that genomes are full of repetitive sequences, such as microsatellites and mobile genetic elements, which would be statistically highly improbable if genomes were generated by a random distribution of nucleotides. This observation motivated the classification of repetitive sequences and the construction of several bioinformatics tools, as well as storage mechanisms based on database management systems (DBMS) that allow such sequences to be located and stored for later study. However, it was the biological confirmation of the importance of repetitive sequences, as in the RNA interference mechanism (reverse-complement, or inverted, repeats), that drew greater interest from the scientific community. There is now strong evidence associating repetitive sequences with interesting biological phenomena, such as RNA processing by cis-splicing and the formation of chimeric transcripts, which are frequent in lower organisms but very rare in higher organisms. Such transcripts can be generated by trans-splicing or, as conjectured in this work, by the transposition of mobile genetic elements (such as transposons or retrotransposons). This work therefore proposes bioinformatics methodologies, made available on the web, to detect chimeric transcripts in the genomes of different organisms, in both draft and high-quality assemblies, and to study the repetitive sequences that occur in the gene loci of the transcripts involved in forming a chimeric sequence.
Using full-length cDNA libraries from both human and bovine sources, the proposed tools identified new chimeric transcript candidates that come from cells of normal tissues and do not follow canonical splice sites in the fusion region of the transcripts involved. Moreover, the detected sequences show a high concentration of reverse-complement repetitive sequence pairs in the gene loci of the two transcripts that form the chimeric sequence. The tools can be applied to other organisms and can guide experimental work aimed at confirming new chimeric transcripts, in both lower and higher organisms.
3

Zhuang, Jiali. "Structural Variation Discovery and Genotyping from Whole Genome Sequencing: Methodology and Applications: A Dissertation." eScholarship@UMMS, 2009. http://escholarship.umassmed.edu/gsbs_diss/875.

Abstract:
A comprehensive understanding of how genetic variants and mutations contribute to phenotypic variations and alterations requires experimental technologies and analytical methodologies that can detect genetic variants and mutations from various biological samples in a timely and accurate manner. High-throughput sequencing technology represents the latest achievement in a series of efforts to facilitate genetic variant discovery and genotyping, and it promises to transform the way we tackle healthcare and biomedical problems. The tremendous amount of data generated by this new technology, however, needs to be processed and analyzed accurately and efficiently in order to fully harness its potential. Structural variation (SV) encompasses a wide range of genetic variations of different sizes, generated by diverse mechanisms. Due to the technical difficulties of reliably detecting SVs, their characterization lags behind that of SNPs and indels. In this dissertation I present two novel computational methods: one for detecting transposable element (TE) transpositions and the other for detecting SVs in general using a local assembly approach. Both methods are able to pinpoint breakpoint junctions at single-nucleotide resolution and estimate variant allele frequencies in the sample. I also apply those methods to study the impact of TE transpositions on genomic stability, the inheritance patterns of TE insertions in the population, and the molecular mechanisms and potential functional consequences of somatic SVs in cancer genomes.
4

Zhuang, Jiali. "Structural Variation Discovery and Genotyping from Whole Genome Sequencing: Methodology and Applications: A Dissertation." eScholarship@UMMS, 2015. https://escholarship.umassmed.edu/gsbs_diss/875.

5

Yngman, Gunnar. "Individualization of fixed-dose combination regimens : Methodology and application to pediatric tuberculosis." Thesis, Uppsala universitet, Institutionen för farmaceutisk biovetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-242059.

Abstract:
Introduction: No Fixed-Dose Combination (FDC) formulations currently exist for pediatric tuberculosis (TB) treatment. Earlier work implemented, in the software NONMEM, a rational method for optimizing the design and individualization of pediatric anti-TB FDC formulations based on patient body weight, but issues with parameter estimation, dosage strata heterogeneity and representative pharmacokinetics remained. Aim: To further develop the rational model-based methodology aiding the selection of appropriate FDC formulation designs and dosage regimens in pediatric TB treatment. Materials and Methods: The method was optimized with respect to the estimation of body weight breakpoints, and an improvement in the heterogeneity of dosage groups with respect to treatment efficiency was sought. Recently published pediatric pharmacokinetic parameters were implemented and the model was translated to MATLAB, where performance was also evaluated by stochastic estimation and graphical visualization. Results: A logistic function was found to be better suited as an approximation of breakpoints. None of the estimation methods implemented in NONMEM was more suitable than the originally used FO method. Homogenization of dosage group treatment efficiency could not be solved. The MATLAB translation was successful but required stochastic estimation and revealed high densities of local minima. Representative pharmacokinetics were successfully implemented. Conclusions: NONMEM was found suboptimal for the task due to problems with discontinuities and heterogeneity, but a stepwise method with representative pharmacokinetics was successfully implemented. MATLAB showed more promise in the search for a method that also addresses the heterogeneity issue.
6

Graversen, Therese. "Statistical and computational methodology for the analysis of forensic DNA mixtures with artefacts." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:4c3bfc88-25e7-4c5b-968f-10a35f5b82b0.

Abstract:
This thesis proposes and discusses a statistical model for interpreting forensic DNA mixtures. We develop methods for estimation of model parameters and assessing the uncertainty of the estimated quantities. Further, we discuss how to interpret the mixture in terms of predicting the set of contributors. We emphasise the importance of challenging any interpretation of a particular mixture, and for this purpose we develop a set of diagnostic tools that can be used in assessing the adequacy of the model to the data at hand as well as in a systematic validation of the model on experimental data. An important feature of this work is that all methodology is developed entirely within the framework of the adopted model, ensuring a transparent and consistent analysis. To overcome the challenge that lies in handling the large state space for DNA profiles, we propose a representation of a genotype that exhibits a Markov structure. Further, we develop methods for efficient and exact computation in a Bayesian network. An implementation of the model and methodology is available through the R package DNAmixtures.
7

Forest, Marie. "Simultaneous estimation of population size changes and splits times using importance sampling." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:8c067a3d-44d5-468a-beb5-34c5830998c4.

Abstract:
The genome is a treasure trove of information about the history of an individual, his population, and his species. For as long as genomic data have been available, methods have been developed to retrieve this information and learn about population history. Over the last decade, large international genomic projects (e.g. the HapMap Project and the 1000 Genomes Project) have offered access to high-quality data collected from thousands of individuals from a vast number of populations. Freely available to all, these databases offer the possibility to develop new methods to uncover the history of the peopling of the world by modern humans. Due to the complexity of the problem and the large amount of available data, all developed methods either simplify the model with strong assumptions or use an approximation; they also dramatically down-sample their data by either using fewer individuals or only portions of the genome. In this thesis, we present a novel method to jointly estimate the time of divergence of a pair of populations and their variable sizes, a previously unsolved problem. The method uses multiple regions of the genome with low recombination rate. For each region, we use an importance sampler to build a large number of possible genealogies, and from those we estimate the likelihood function of parameters of interest. By modelling the population sizes as piecewise constant within fixed time intervals, we aim to capture population size variation through time. We show via simulation studies that the method performs well in many situations, even when the model assumptions are not totally met. We apply the method to five populations from the 1000 Genomes Project, obtaining estimates of split times between European groups and among Europe, Africa and Asia. We also infer shared and non-shared bottlenecks in out-of-Africa groups, expansions following population separations, and the sizes of ancestral populations further back in time.
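The piecewise-constant size model mentioned in this abstract can be illustrated with a small Python sketch. This is not the thesis's estimator: it only shows, under a standard coalescent assumption (diploid population, time in generations, pairwise coalescence rate 1/(2N)), how a size history that is constant within fixed intervals gives the probability that two lineages have not yet coalesced by time t. The function names, breakpoints and sizes are hypothetical.

```python
import math

def coalescent_intensity(t, breakpoints, sizes):
    """Integral of 1/(2*N(s)) for s in [0, t].

    sizes[k] is the (assumed diploid) population size on the interval
    [breakpoints[k], breakpoints[k+1]); the last interval is open-ended.
    """
    total = 0.0
    for k, n in enumerate(sizes):
        start = breakpoints[k]
        end = breakpoints[k + 1] if k + 1 < len(breakpoints) else math.inf
        if t <= start:
            break
        # Only the part of this interval that lies before t contributes.
        total += (min(t, end) - start) / (2.0 * n)
    return total

def prob_no_coalescence(t, breakpoints, sizes):
    """Probability that two lineages are still distinct at time t."""
    return math.exp(-coalescent_intensity(t, breakpoints, sizes))

# Hypothetical history: a bottleneck between 1000 and 5000 generations ago.
breakpoints = [0, 1000, 5000]
sizes = [10000, 2000, 20000]
print(prob_no_coalescence(3000, breakpoints, sizes))
```

The smaller size in the middle interval raises the coalescence rate there, which is the kind of signal a likelihood over genealogies can pick up.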
8

Liu, Wanting. "An Integrated Bioinformatics Approach for the Identification of Melanoma-Associated Biomarker Genes. A Ranking and Stratification Approach as a New Meta-Analysis Methodology for the Detection of Robust Gene Biomarker Signatures of Cancers." Thesis, University of Bradford, 2014. http://hdl.handle.net/10454/7346.

Abstract:
Genome-wide microarray technology has facilitated the systematic discovery of diagnostic biomarkers of cancers and other pathologies. However, meta-analyses of published arrays, using melanoma as a test cancer, have uncovered significant inconsistencies that hinder advances in clinical practice. In this study a computational model for the integrated analysis of microarray datasets is proposed in order to provide a robust ranking of genes in terms of their relative significance: both genome-wide relative significance (GWRS) and genome-wide global significance (GWGS). When the model was applied to five melanoma microarray datasets published between 2000 and 2011, a new 12-gene diagnostic biomarker signature for melanoma was defined (i.e., EGFR, FGFR2, FGFR3, IL8, PTPRF, TNC, CXCL13, COL11A1, CHP2, SHC4, PPP2R2C, and WNT4). Of these, CXCL13, COL11A1, PTPRF and SHC4 are components of the MAPK pathway and were validated by immunocyto- and immunohisto-chemistry. These proteins were found to be overexpressed in metastatic and primary melanoma cells in vitro and in melanoma tissue in situ compared to melanocytes cultured from healthy skin epidermis and normal healthy human skin. One challenge for the integrated analysis of microarray data is that the data are produced using different platforms and bio-samples, e.g. both cell line-based and biopsy-based microarray datasets. In order to address these challenges, the computational model was further enhanced by the stratification of datasets into either biopsy-derived or cell line-derived datasets, and by the weighting of microarray data based on data quality criteria. The enhanced method was applied to 14 microarray datasets of three cancers (breast, prostate, and melanoma) and evaluated on classification accuracy and on the capability to identify predictive biomarkers.
Four novel measures for evaluating the capability to identify predictive biomarkers are proposed: (1) classifying independent testing data using wrapper feature selection with machine learning, (2) assessing the number of common genes with the genes retrieved in independent testing data, (3) assessing the number of common genes with the genes retrieved across multiple training datasets, (4) assessing the number of common genes with the genes validated in the literature. This enhancement of the computational approach (i) achieved reliable classification performance across multiple datasets, (ii) placed more significant genes among the top-ranked genes compared with the genes detected in the independent test data, and (iii) detected more meaningful genes than were validated in previous melanoma studies in the literature.
9

Czerwińska, Urszula. "Unsupervised deconvolution of bulk omics profiles : methodology and application to characterize the immune landscape in tumors Determining the optimal number of independent components for reproducible transcriptomic data analysis Application of independent component analysis to tumor transcriptomes reveals specific and reproducible immune-related signals A multiscale signalling network map of innate immune response in cancer reveals signatures of cell heterogeneity and functional polarization." Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCB075.

Abstract:
Tumors are surrounded by a complex microenvironment (TME) including tumor cells, fibroblasts, and a diversity of immune cells. A new generation of cancer therapies based on modulation of the immune system response is currently in active clinical development, with promising first results. Understanding the composition of the TME in each tumor case is therefore critically important for making a prognosis on tumor progression and response to treatment. However, we lack reliable and validated quantitative approaches to characterize the TME and thereby facilitate the choice of the best existing therapy. One part of this challenge is to quantify the cellular composition of a tumor sample (called the deconvolution problem in this context) using its bulk omics profile (the global quantitative profile of certain types of molecules, such as mRNA or epigenetic markers). In recent years, there has been a remarkable explosion in the number of methods approaching this problem in several different ways. Most of them use pre-defined molecular signatures of specific cell types and extrapolate this information to previously unseen contexts. This can bias the TME quantification in situations where the context under study is significantly different from the reference. In theory, under certain assumptions, it is possible to separate complex signal mixtures using classical and advanced methods of source separation and dimension reduction, without pre-existing source definitions. If such an approach (unsupervised deconvolution) can be applied to bulk omics profiles of tumor samples, it would make it possible to avoid the contextual biases mentioned above and provide insights into the context-specific signatures of cell types.
In this work, I developed a new method called DeconICA (Deconvolution of bulk omics datasets through Immune Component Analysis), based on the blind source separation methodology. DeconICA aims to decipher and quantify the biological signals shaping the omics profiles of tumor samples or normal tissues. A particular focus of my study was on immune system-related signals and on discovering new signatures of immune cell types. In order to make my work more accessible, I implemented the DeconICA method as an R package named "DeconICA". By applying this software to standard benchmark datasets, I demonstrated that DeconICA is able to quantify immune cells with accuracy comparable to published state-of-the-art methods, but without defining cell type-specific signature genes a priori. The implementation can work with existing deconvolution methods based on matrix factorization techniques such as Independent Component Analysis (ICA) or Non-Negative Matrix Factorization (NMF). Finally, I applied DeconICA to a large corpus of data containing more than 100 transcriptomic datasets composed of, in total, over 28,000 samples of 40 tumor types, generated by different technologies and processed independently. This analysis demonstrated that ICA-based immune signals are reproducible between datasets and that three major immune cell types (T cells, B cells and myeloid cells) can be reliably identified and quantified. Additionally, I used the ICA-derived metagenes as context-specific signatures in order to study the characteristics of immune cells in different tumor types. The analysis revealed a large diversity and plasticity of immune cells, both dependent on and independent of tumor type. Some conclusions of the study may be helpful in identifying new drug targets or biomarkers for cancer immunotherapy.
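To make the matrix factorization idea concrete, here is a minimal Python sketch of non-negative matrix factorization (NMF), one of the two techniques named in the abstract. It is not DeconICA and not the author's code: it uses the classic multiplicative update rules for the squared-error objective on a tiny, made-up genes-by-samples matrix, and every name and number in it is hypothetical. The idea is to approximate a non-negative matrix V by W·H, where the columns of W act as metagenes and the rows of H as their per-sample weights.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - W*H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h

# Hypothetical mixture: 4 genes, 3 samples, built from 2 hidden sources.
true_w = [[5.0, 0.0], [4.0, 1.0], [0.0, 6.0], [1.0, 3.0]]
true_h = [[0.8, 0.5, 0.1], [0.2, 0.5, 0.9]]
V = matmul(true_w, true_h)
W, H = nmf(V, k=2)
print(frob_error(V, W, H))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is what makes the recovered components interpretable as additive contributions.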
10

Prabhu, Snehit. "Computational Contributions Towards Scalable and Efficient Genome-wide Association Methodology." Thesis, 2013. https://doi.org/10.7916/D8R78NF0.

Abstract:
Genome-wide association studies are experiments designed to find the genetic bases of physical traits: for example, markers correlated with disease status, found by comparing the DNA of healthy individuals to the DNA of affected individuals. Over the past two decades, an exponential increase in the resolution of DNA-testing technology, coupled with a substantial drop in its cost, has allowed us to amass huge and potentially invaluable datasets to conduct such comparative studies. For many common diseases, datasets as large as a hundred thousand individuals exist, each tested at millions of markers (called SNPs) across the genome. Despite this treasure trove, so far only a small fraction of the genetic markers underlying most common diseases have been identified. Simply stated, our ability to predict phenotype (disease status) from a person's genetic constitution is still very limited today, even for traits that we know to be heritable from one's parents (e.g. height, diabetes, cardiac health). As a result, genetics today often lags far behind conventional indicators like family history of disease in terms of its predictive power. To borrow a popular metaphor from astronomy, this veritable "dark matter" of perceivable but un-locatable genetic signal has come to be known as missing heritability. This thesis will present my research contributions to two hotly pursued scientific hypotheses that aim to close this gap: (1) gene-gene interactions, and (2) ultra-rare genetic variants, neither of which is yet widely tested. First, I will discuss the challenges that have made interaction testing difficult, and present a novel approximate statistic to measure interaction. This statistic can be exploited in a Monte-Carlo-like randomization scheme, making an exhaustive search through trillions of potential interactions tractable using ordinary desktop computers.
A software implementation of our algorithm found a reproducible interaction between SNPs in two calcium channel genes in bipolar disorder. Next, I will discuss the functional enrichment pipeline we subsequently developed to identify sets of interacting genes underlying this disease. Lastly, I will talk about the application of coding theory to cost-efficient measurement of ultra-rare genetic variation (sometimes as rare as just one individual carrying the mutation in the entire population).
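The scale of the pairwise search mentioned in this abstract can be checked with a few lines of Python. This is only arithmetic on assumed marker counts (the abstract says "millions of markers" without giving an exact figure): the number of unordered SNP pairs among n markers is n(n-1)/2.

```python
import math

def snp_pairs(n_markers):
    """Number of unordered marker pairs to test for pairwise interaction."""
    return math.comb(n_markers, 2)

# Hypothetical panel sizes chosen for illustration.
for n in (1_000_000, 2_500_000):
    print(f"{n:>9,} markers -> {snp_pairs(n):,} pairs")
```

With one million markers the count is about half a trillion pairs, and it passes three trillion at 2.5 million markers, which is why an exhaustive test of every pair is costly without an approximation.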
11

"Comparative Genomics and Novel Bioinformatics Methodology Applied to the Green Anole Reveal Unique Sex Chromosome Evolution." Master's thesis, 2016. http://hdl.handle.net/2286/R.I.40699.

Abstract:
In species with highly heteromorphic sex chromosomes, the degradation of one of the sex chromosomes can result in unequal gene expression between the sexes (e.g., between XX females and XY males) and between the sex chromosomes and the autosomes. Dosage compensation is a process whereby genes on the sex chromosomes achieve equal gene expression, which prevents deleterious side effects from having too much or too little expression of genes on the sex chromosomes. The green anole is part of a group of species that recently underwent an adaptive radiation. The green anole has XX/XY sex determination, but the content of the X chromosome and its evolution have not been described. Given its status as a model species, a better understanding of the green anole genome could reveal insights into other species. Genomic analyses are crucial for a comprehensive picture of sex chromosome differentiation and dosage compensation, in addition to understanding speciation. To address this, multiple comparative genomics and bioinformatics analyses were conducted to elucidate patterns of evolution in the green anole and across multiple anole species. Comparative genomics analyses were used to infer additional X-linked loci in the green anole, RNAseq data from male and female samples were analyzed to quantify patterns of sex-biased gene expression across the genome, and the extent of dosage compensation on the anole X chromosome was characterized, providing evidence that the sex chromosomes in the green anole are dosage compensated. In addition, X-linked genes have a lower ratio of nonsynonymous to synonymous substitution rates than the autosomes when compared to other Anolis species, and pairwise rates of evolution in genes across the anole genome were analyzed. To conduct this analysis, a new pipeline was created for filtering alignments and performing batch calculations for whole-genome coding sequences.
This pipeline has been made publicly available. Master's thesis, Biology, 2016.
12

Lee, Pei-Chen, and 李佩真. "Application of Stochastic Optimization Methodology to Bioinformatics -- A Case Study on Applying Ant Colony Optimization to the Shortest Superstring problem." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/8cc5d6.

Abstract:
Master's thesis, National Taipei University of Technology, Graduate Institute of Industrial Engineering and Management, academic year 95
Bioinformatics has received wide attention in recent years. It is interesting to see how stochastic optimization methodologies such as genetic algorithms, simulated annealing, and ant colony optimization can be applied to solve problems in bioinformatics. Among the many research problems in bioinformatics, the shortest superstring problem has wide applications in research areas such as DNA sequencing and data compression. However, the problem is NP-hard and difficult to solve efficiently. In the literature, the ant colony optimization algorithm has been reported to be successfully applied to many combinatorial problems, such as the traveling salesperson problem and the assignment problem. In this paper, we describe the use of the ant colony optimization algorithm to solve the shortest superstring problem, which highlights a way of applying stochastic optimization methodologies to solve problems in bioinformatics.
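As a rough illustration of the idea in this abstract, the sketch below applies a basic ant colony optimization loop to a toy shortest superstring instance. It is not the thesis's implementation: the function names (`overlap`, `merge`, `aco_superstring`), the parameter values, and the pheromone update rule are assumptions chosen for brevity. Ants build an ordering of fragments, favoring transitions with high pheromone and large suffix-prefix overlap, and shorter superstrings deposit more pheromone.

```python
import random


def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(order, frags, ov):
    """Concatenate fragments in the given order, collapsing pairwise overlaps."""
    s = frags[order[0]]
    for i, j in zip(order, order[1:]):
        s += frags[j][ov[i][j]:]
    return s


def aco_superstring(frags, n_ants=20, n_iter=50, alpha=1.0, beta=2.0,
                    rho=0.1, seed=0):
    """Heuristic search for a short superstring (illustrative parameters)."""
    rng = random.Random(seed)
    n = len(frags)
    ov = [[0 if i == j else overlap(frags[i], frags[j]) for j in range(n)]
          for i in range(n)]
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each ordered pair
    best = None
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            current = rng.randrange(n)
            order = [current]
            unvisited = set(range(n)) - {current}
            while unvisited:
                cand = sorted(unvisited)
                # Desirability: pheromone^alpha * (1 + overlap)^beta
                weights = [tau[current][j] ** alpha
                           * (1 + ov[current][j]) ** beta for j in cand]
                current = rng.choices(cand, weights=weights)[0]
                order.append(current)
                unvisited.remove(current)
            s = merge(order, frags, ov)
            tours.append((order, s))
            if best is None or len(s) < len(best):
                best = s
        # Evaporate, then let each ant deposit pheromone inversely
        # proportional to the length of the superstring it built.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1 - rho
        for order, s in tours:
            for i, j in zip(order, order[1:]):
                tau[i][j] += 1.0 / len(s)
    return best
```

Calling `aco_superstring(["ACGTAC", "TACGGA", "GGATTC", "TTCACG"])` returns a string that contains every fragment; being a heuristic, it is not guaranteed to be the shortest possible one.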
13

Levitin, Hanna M. "Biological Inference from Single Cell RNA-Sequencing." Thesis, 2020. https://doi.org/10.7916/d8-arqe-4159.

Abstract:
Tissues are heterogeneous communities of cells that work together to achieve a higher-order function. Large-scale single cell RNA-sequencing (scRNA-seq) offers an unprecedented opportunity to systematically map the transcriptional programs underlying this diversity. However, extracting biological signal from noisy, high-dimensional scRNA-seq data requires carefully designed, statistically robust methodology that makes appropriate assumptions both for the data and for the biological question of interest. This thesis explores computational approaches to finding biological signal in scRNA-seq datasets. Chapter 2 focuses on preprocessing and cell-centric approaches to downstream analysis that have become a mainstay of analytical pipelines for scRNA-seq, and includes dissections of lineage diversity in high grade glioma and in the largest neural stem cell niche in the adult mouse brain. Notably, the former study suggests that heterogeneity in high grade glioma arises from at least two distinct biological processes: aberrant neural development and mesenchymal transformation. Chapter 3 presents a flexible approach for de novo discovery of gene expression programs without an a priori structure across cells, revealing subtle properties of a spatially sampled high grade glioma that would not have been apparent with previous approaches. Chapter 4 leverages our prior work and a unique tissue resource to build a unified reference map of human T cell functional states across tissues and ages. We discover and validate a novel pan-T cell activation marker and a previously undescribed kinetic intermediate in CD4+ T cell activation. Finally, ongoing work defines key programs of gene expression in tissue-associated T cells in infants and adults and predicts their candidate regulators.
14

Suphavilai, Chayaporn. "Computational development of regulatory gene set networks for systems biology applications." Thesis, 2014. http://hdl.handle.net/1805/6163.

Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
In systems biology studies, biological networks are used to gain insights into biological systems. While the traditional approach to studying biological networks is based on the identification of interactions among genes or the identification of a gene set ranking according to differentially expressed gene lists, little is known about interactions between higher-order biological systems, that is, a network of gene sets. Several types of gene set network have been proposed, including co-membership, linkage, and co-enrichment human gene set networks. However, to our knowledge, none of them contains directionality information. Therefore, in this study we proposed a method to construct a regulatory gene set network, a directed network that reveals novel relationships among gene sets. A regulatory gene set network was constructed by using publicly available gene regulation data. A directed edge in a regulatory gene set network represents a regulatory relationship from one gene set to another. A regulatory gene set network was compared with another type of gene set network to show that the regulatory network provides additional information. In order to show that a regulatory gene set network is useful for understanding the underlying mechanism of a disease, an Alzheimer's disease (AD) regulatory gene set network was constructed. In addition, we developed the Pathway and Annotated Gene-set Electronic Repository (PAGER), an online systems biology tool for constructing and visualizing gene and gene set networks from multiple gene set collections. PAGER is available at http://discern.uits.iu.edu:8340/PAGER/. Global regulatory and global co-membership gene set networks were pre-computed. PAGER contains 166,489 gene sets, 92,108,741 co-membership edges, 697,221,810 regulatory edges, 44,188 genes, 651,586 unique gene regulations, and 650,160 unique gene interactions. PAGER provides several unique features, including constructing regulatory gene set networks, generating expanded gene set networks, and constructing gene networks within a gene set. However, tissue-specific or disease-specific information was not considered in the disease-specific network construction process, so the network might not accurately represent the high-level relationships among gene sets in the disease context. Therefore, our framework can be improved by collecting higher-resolution data, such as tissue-specific and disease-specific gene regulations and gene sets. In addition, experimental gene expression data can be applied to add more information to the gene set network. In the current version of PAGER, the size of gene and gene set networks is limited to 100 nodes due to browser memory constraints. Our future plan is to integrate internal gene or protein interactions inside pathways in order to support future systems biology studies.
15

Chen, Ko-Fan, and 陳克帆. "Whole Genome Search of Candidate Hypoxia Response Genes by Bioinformatic Methodology." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/70822334854372391625.

Abstract:
Master's thesis, National Cheng Kung University, Institute of Physiology, academic year 93
Hypoxia is the reduction of environmental oxygen. In the absence of oxygen, hypoxia-inducible factor-1α (HIF-1α) dimerizes with HIF-1β and binds to the hypoxia response element (HRE) on the target DNA sequence. HIF-1α-regulated genes have been found to be involved in cell proliferation, angiogenesis, glycolysis, apoptosis, and tumor formation. The HRE, with a short core sequence “RCGTG”, is necessary but not sufficient to be bound by HIF-1α; the flanking region also determines the binding activity. Accordingly, 20 well-known HREs were retrieved and aligned, and a hidden Markov model (HMM)-based HRE profile was built. The HMM-based HRE profile was used to search for candidate HREs in the promoter regions of human and mouse genes. Using a cutoff score of -1.8, 8170 human genes and 6477 mouse genes were identified. About one-third of these putative HIF-1α-regulated genes are conserved between the human and mouse genomes. The expression profiles of fifty randomly picked genes were investigated at various time points after DFO-mimicked hypoxia treatment. The regulation rate of the genes with a positive score is 91%. This indicates that about 2500 human genes and 1600 mouse genes could be regulated by HIF-1α. In the analysis of regulation patterns, the candidate genes were either regulated consistently among different cells or specifically expressed and/or regulated in one cell. For the time course analysis, the genes regulated by hypoxia can be further classified into early, delayed, or biphasic categories. The regulation patterns are similar in hypoxia and DFO treatment, suggesting that DFO is a proper hypoxia mimetic. By detecting intranuclear HIF-1α protein and in vivo binding of HIF-1α to the candidate HRE, it was demonstrated that the altered RNA expression of candidate genes under chemical or true hypoxia is correlated with the nuclear HIF-1α protein level and the binding activity. Taken together, this study demonstrated a high-throughput screening and verification approach for understanding the whole picture of gene regulation mediated by hypoxia.
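To make the core-motif step concrete, the following sketch scans a DNA sequence on both strands for the “RCGTG” core, where R is the IUPAC code for A or G. This is a deliberate simplification and not the thesis's method: the abstract describes an HMM-based profile that also scores the flanking region, whereas this example only locates exact core matches. The names `find_hre_cores` and `revcomp` are hypothetical.

```python
import re

# Lookahead so that overlapping occurrences are all reported.
CORE = re.compile(r"(?=([AG]CGTG))")
CORE_LEN = 5


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_hre_cores(seq):
    """Return sorted (start, strand, match) tuples for RCGTG core matches.

    `start` is the 0-based position on the forward strand of the leftmost
    base covered by the match, whichever strand the motif lies on.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+", m.group(1)) for m in CORE.finditer(seq)]
    for m in CORE.finditer(revcomp(seq)):
        # Convert the reverse-strand coordinate back to the forward strand.
        hits.append((n - (m.start() + CORE_LEN), "-", m.group(1)))
    return sorted(hits)
```

Because the core is necessary but not sufficient for binding, hits from a scan like this would only be candidates to pass on to a profile-based scoring step.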
16

Pandit, Yogesh. "Context specific text mining for annotating protein interactions with experimental evidence." Thesis, 2014. http://hdl.handle.net/1805/3809.

Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Proteins are the building blocks of a biological system. They interact with other proteins to produce unique biological phenomena. Protein-protein interactions play a valuable role in understanding the molecular mechanisms occurring in any biological system. Protein interaction databases are a rich source of protein interaction-related information. They gather large amounts of information from published literature to enrich their data, and expert curators do most of this work manually. The amount of accessible and publicly available literature is growing very rapidly, and manual annotation is a time-consuming process. At the rate at which available information is growing, it cannot be handled by manual curation alone. Tools are needed to process these huge amounts of data and extract the valuable gist that can help curators proceed faster. When extracting protein-protein interaction evidence from literature, the mere mention of a certain protein found by look-up approaches cannot validate the interaction. Supporting protein interaction information with experimental evidence can help this cause. In this study, we apply machine learning-based classification techniques to classify any given protein interaction-related document into an interaction detection method. We use biological attributes and experimental factors, different combinations of which define any particular interaction detection method. Then, using the predicted detection methods, the proteins identified by named entity recognition techniques, and a decomposition of the parts-of-speech composition, we search for sentences with experimental evidence for a protein-protein interaction. We report an accuracy of 75.1% with an F-score of 47.6% on a dataset containing 2035 training documents and 300 test documents.
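The gap between the reported accuracy and F-score is typical of classifiers evaluated on imbalanced classes. The sketch below shows one way the two metrics can diverge on toy labels. The abstract does not state how the F-score was averaged, so macro-averaged F1 is an assumption here, and the label names and counts are invented for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: a classifier that always predicts the majority method.
y_true = ["two hybrid"] * 8 + ["pull down"] * 2
y_pred = ["two hybrid"] * 10
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

On this toy data the classifier is right for 8 of 10 documents, yet it never identifies the minority method, which pulls the macro-averaged F1 well below the accuracy.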
17

Desai, Akshay A. "Data analysis and creation of epigenetics database." Thesis, 2014. http://hdl.handle.net/1805/4452.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
This thesis is aimed at creating a pipeline for analyzing DNA methylation epigenetics data and creating a data model structured well enough to store the analysis results of the pipeline. In addition to storing the results, the model is also designed to hold information that will help researchers derive meaningful epigenetic insight from the results. Current major epigenetics resources such as PubMeth, MethyCancer, MethDB and NCBI’s Epigenomics database fail to provide a holistic view of epigenetics. They provide datasets produced by different analysis techniques, which raises the important issue of data integration. The resources also fail to include numerous factors defining the epigenetic nature of a gene. Some of the resources are also struggling to keep the data stored in their databases up to date. This has diminished their validity and coverage of epigenetics data. In this thesis we have tackled a major branch of epigenetics: DNA methylation. As a case study to demonstrate the effectiveness of our pipeline, we used stage-wise DNA methylation and expression raw data for lung adenocarcinoma (LUAD) from the TCGA data repository. The pipeline helped us to identify progressive methylation patterns across different stages of LUAD. It also identified some key targets with potential as drug targets. We combined the results from the methylation data analysis pipeline with data from various online data reserves such as the KEGG, GO, UCSC and BioGRID databases, which helped us to overcome the shortcomings of existing data collections and present a resource as a complete solution for studying DNA methylation epigenetics data.
18

Kusiak, Caroline. "Real-Time Dengue Forecasting In Thailand: A Comparison Of Penalized Regression Approaches Using Internet Search Data." 2018. https://scholarworks.umass.edu/masters_theses_2/708.

Abstract:
Dengue fever affects over 390 million people annually worldwide and is of particular concern in Southeast Asia, where it is one of the leading causes of hospitalization. Modeling trends in dengue occurrence can provide valuable information to public health officials; however, many challenges arise depending on the data available. In Thailand, reporting of dengue cases is often delayed by more than 6 weeks, and a small fraction of cases may not be reported until over 11 months after they occurred. This study shows that incorporating data on Google Search trends can improve disease predictions in settings with severely underreported data. We compare penalized regression approaches to seasonal baseline models and illustrate that incorporation of search data can improve prediction error. This builds on previous research showing that search data and recent surveillance data together can be used to create accurate forecasts for diseases such as influenza and dengue fever. This work shows that even in settings where timely surveillance data is not available, using search data in real time can produce more accurate short-term forecasts than a seasonal baseline prediction. However, forecast accuracy degrades the further into the future the forecasts go. The relative accuracy of these forecasts compared to a seasonal average forecast varies depending on location. Overall, these data and models can improve short-term public health situational awareness and should be incorporated into larger real-time forecasting efforts.
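For readers unfamiliar with penalized regression, here is a minimal sketch of one such approach: lasso regression fitted by coordinate descent. The abstract does not say which penalties were compared, so the choice of lasso is an assumption, as are the function names (`soft_threshold`, `lasso_cd`) and the toy data, where each column stands in for a hypothetical search-term signal. The L1 penalty shrinks weak predictors to exactly zero, which is useful when many candidate search terms are available.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, returning 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent.

    X is a list of rows, y a list of targets; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k]
                                      for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z if z else 0.0
    return w


# Toy data: the target follows the first column strongly, the second weakly.
X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [2, 0.1, -2, -0.1]
w = lasso_cd(X, y, lam=0.1)
```

With this penalty strength the weak second predictor is dropped entirely while the strong first predictor is kept with a slightly shrunken coefficient. In a forecasting setting, the fitted model's prediction error would then be compared against a seasonal baseline.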
19

Li, Pin. "Effects of carbon nanotubes on airway epithelial cells and model lipid bilayers : proteomic and biophysical studies." Thesis, 2014. http://hdl.handle.net/1805/5968.

Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Carbon nanomaterials are widely produced and used in industry, medicine and scientific research. To examine the impact of exposure to nanoparticles on human health, the human airway epithelial cell line Calu-3 was used to evaluate changes in the cellular proteome that could account for alterations in cellular function of airway epithelia after 24 h exposure to 10 μg/mL and 100 ng/mL of two common carbon nanoparticles, single- and multi-wall carbon nanotubes (SWCNT, MWCNT). After exposure to the nanoparticles, label-free quantitative mass spectrometry (LFQMS) was used to study differential protein expression. Ingenuity Pathway Analysis (IPA) was used to conduct a bioinformatics analysis of proteins identified by LFQMS. Interestingly, after exposure to a high concentration (10 μg/mL; 0.4 μg/cm²) of MWCNT or SWCNT, only 8 and 13 proteins, respectively, exhibited changes in abundance. In contrast, the abundance of hundreds of proteins was altered in response to a low concentration (100 ng/mL; 4 ng/cm²) of either CNT. Of the 281 and 282 proteins that were significantly altered in response to MWCNT or SWCNT, respectively, 231 proteins were the same. Bioinformatic analyses found that the proteins common to both kinds of nanotubes are associated with the cellular functions of cell death and survival, cell-to-cell signaling and interaction, cellular assembly and organization, cellular growth and proliferation, infectious disease, molecular transport and protein synthesis. The decrease in expression of the majority of proteins suggests a general stress response to protect cells. The STRING database was used to analyze the various functional protein networks. Interestingly, some proteins, such as cadherin 1 (CDH1), signal transducer and activator of transcription 1 (STAT1), junction plakoglobin (JUP), and apoptosis-associated speck-like protein containing a CARD (PYCARD), appear in several functional categories and tend to be in the center of the networks. This central positioning suggests they may play important roles in multiple cellular functions and activities that are altered in response to carbon nanotube exposure. To examine the effect of nanotubes on the plasma membrane, we investigated the interaction of short purified MWCNT with model lipid membranes using a planar bilayer workstation. Bilayer lipid membranes were synthesized using neutral 1,2-diphytanoyl-sn-glycero-3-phosphocholine (DPhPC) in 1 M KCl. The ion channel model protein gramicidin A (gA) was incorporated into the bilayers and used to measure the effect of MWCNT on ion transport. The opening and closing of ion channels, the amplitude of current, and the open probability and lifetime of ion channels were measured and analyzed with Clampfit. The presence of an intermediate concentration of MWCNT (2 μg/mL) could be related to a statistically significant decrease in the open probability and lifetime of gA channels. The proteomic studies revealed changes in response to CNT exposure. An analysis of the changes using multiple databases revealed alterations in pathways that were consistent with the physiological changes observed in cultured cells exposed to very low concentrations of CNT. The physiological changes included the breakdown of the barrier function and the inhibition of mucociliary clearance, both of which could increase the risk of CNT toxicity to human health. The biophysical studies indicate that MWCNTs have an effect on the single-channel kinetics of the gramicidin A model cation channel. These changes are consistent with the inhibitory effect of nanoparticles on hormone-stimulated transepithelial ion flux, but additional experiments will be necessary to substantiate this correlation.
20

Andere, Anne A. "De novo genome assembly of the blow fly Phormia regina (Diptera: Calliphoridae)." Thesis, 2014. http://hdl.handle.net/1805/5630.

Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Phormia regina (Meigen), commonly known as the black blow fly, is a dipteran that belongs to the family Calliphoridae. Calliphorids play an important role in various research fields including ecology, medical studies, and veterinary and forensic sciences. P. regina, a non-model organism, is one of the most common forensically relevant insects in North America and is typically used to assist in estimating postmortem intervals (PMI). To better understand the roles P. regina plays in these research fields, we reconstructed its genome using next-generation sequencing technologies. The focus was on generating a reference genome through de novo assembly of high-throughput short-read sequences. Following assembly, genetic markers were identified in the form of microsatellites and single nucleotide polymorphisms (SNPs) to aid in future population genetic surveys of P. regina. A total of 530 million 100 bp paired-end reads were obtained from five pooled male and female P. regina flies using the Illumina HiSeq2000 sequencing platform. A 524 Mbp draft genome was assembled using both sexes, with 11,037 predicted genes. The draft reference genome assembled in this study provides an important resource for investigating the genetic diversity that exists between and among blow fly species, and supports the understanding of their genetic basis in terms of adaptations, population structure and evolution. The genomic tools will facilitate genome-wide studies using modern genomic techniques to refine the understanding of the evolutionary processes underlying genomic evolution between blow flies and other insect species.