
Dissertations / Theses on the topic 'Bioinformatics analysis'

Author: Grafiati

Published: 4 June 2021

Last updated: 18 June 2025


Consult the top 50 dissertations / theses for your research on the topic 'Bioinformatics analysis.'


1

Snøve, Jr Ola. "Hardware-accelerated analysis of non-protein-coding RNAs." Doctoral thesis, Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, 2005. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-713.


Abstract:

A tremendous amount of genomic sequence data of relatively high quality has become publicly available due to the human genome sequencing projects that were completed a few years ago. Despite considerable efforts, we do not yet know everything there is to know about the various parts of the genome, what all the regions code for, and how their gene products contribute to the myriad of biological processes that are performed within the cells. New high-performance methods are needed to extract knowledge from this vast amount of information.

Furthermore, the traditional view that DNA codes for RNA that codes for protein, which is known as the central dogma of molecular biology, seems to be only part of the story. The discovery of many non-protein-coding gene families with housekeeping and regulatory functions brings an entirely new perspective to molecular biology. Also, sequence analysis of the new gene families requires new methods, as there are significant differences between protein-coding and non-protein-coding genes.

This work describes a new search processor that can search for complex patterns in sequence data for which no efficient lookup-index is known. When several chips are mounted on search cards that are fitted into PCs in a small cluster configuration, the system’s performance is orders of magnitude higher than that of comparable solutions for selected applications. The applications treated in this work fall into two main categories, namely pattern screening and data mining, and both take advantage of the search capacity of the cluster to achieve adequate performance. Specifically, the thesis describes an interactive system for exploration of all types of genomic sequence data. Moreover, a genetic programming-based data mining system finds classifiers that consist of potentially complex patterns that are characteristic for groups of sequences. The screening and mining capacity has been used to develop an algorithm for identification of new non-protein-coding genes in bacteria; a system for rational design of effective and specific short interfering RNA for sequence-specific silencing of protein-coding genes; and an improved algorithmic step for identification of new regulatory targets for the microRNA family of non-protein-coding genes.

Papers V, VI, and VII are reprinted with kind permission of Elsevier, sciencedirect.com.


2

Petty, Emma Marie. "Shape analysis in bioinformatics." Thesis, University of Leeds, 2009. http://etheses.whiterose.ac.uk/822/.


Abstract:

In this thesis we explore two main themes, both of which involve proteins. The first area of research focuses on the analysis of proteins displayed as spots on 2-dimensional planes. The second area of research focuses on a specific protein and how interactions with this protein can naturally prevent or, in the presence of a pesticide, cause toxicity.

The first area of research builds on previously developed EM methodology to infer the matching and transformation necessary to superimpose two partially labelled point configurations, focusing on the application to 2D protein images. We modify the methodology to account for the possibility of missing and misallocated markers, where markers make up the labelled proteins manually located across images. We provide a way to account for the likelihood of an increased edge variance within protein images. We find that slight marker misallocations do not greatly influence the final output superimposition when considering data simulated to mimic the given dataset. The methodology is also successfully used to automatically locate and remove a grossly misallocated marker within the given dataset before further analysis is carried out. We develop a method to create a union of replicate images, which can then be used alone in further analyses to reduce computational expense. We describe how the data can be modelled to enable inference on the quality of a dataset, a property often overlooked in protein image analysis. To complete this line of research we provide a method to rank points that are likely to be present in one group of images but absent in a second group. The produced score is used to highlight the proteins that are not present in both image sets representing control or diseased tissue, therefore providing biological indicators which are vitally important to improve the accuracy of diagnosis.

In the second area of research, we test the hypothesis that pesticide toxicity is related to the shape similarity between the pesticide molecule itself and the natural ligand of the protein to which a pesticide will bind (and ultimately cause toxicity). A ligand of a protein is simply a small molecule that will bind to that protein. It seems intuitive that the similarities between a naturally formed ligand and a synthetically developed ligand (the pesticide) may be an indicator of how well a pesticide and the protein bind, as well as provide an indicator of pesticide toxicity. A graphical matching algorithm is used to infer the atomic matches across ligands, with Procrustes methodology providing the final superimposition before a measure of shape similarity is defined considering the aligned molecules. We find evidence that the measure of shape similarity does provide a significant indicator of the associated pesticide toxicity, as well as providing a more significant indicator than previously found biological indicators. Previous research has found that the properties of a molecule in its bioactive form are more suitable indicators of an associated activity. Here, these findings suggest that the docked conformation of a pesticide within the protein will provide more accurate indicators of the associated toxicity. So next we use a docking program to predict the docked conformation of a pesticide. We provide a technique to calculate the similarity between the docks of both the pesticide and the natural ligand. A similar technique is used to provide a measure for the closeness of fit between a pesticide and the protein. Both measures are then considered as independent variables for the prediction of toxicity. In this case the results show potential for the calculated variables to be useful toxicity predictors, though further analysis is necessary to properly explore their significance.
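The superimposition step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the atom-to-atom matches are already known (the thesis infers them with a graph matching algorithm, which is not reproduced here), uses a rigid rotation-plus-translation Procrustes fit computed by singular value decomposition, and reports root-mean-square deviation as a stand-in dissimilarity score. The function names `procrustes_align` and `rmsd` are hypothetical and the score is not the similarity measure defined in the thesis.

```python
import numpy as np


def procrustes_align(reference, target):
    """Rigidly superimpose `target` onto `reference`.

    Both inputs are (n_points, n_dims) arrays whose rows are already matched.
    Only rotation and translation are fitted (no scaling).
    """
    ref = np.asarray(reference, dtype=float)
    tgt = np.asarray(target, dtype=float)
    ref_mean = ref.mean(axis=0)
    ref_centered = ref - ref_mean
    tgt_centered = tgt - tgt.mean(axis=0)

    # SVD of the cross-covariance matrix gives the least-squares rotation.
    u, _, vt = np.linalg.svd(tgt_centered.T @ ref_centered)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    sign = np.sign(np.linalg.det(u @ vt))
    correction = np.diag([1.0] * (ref.shape[1] - 1) + [sign])
    rotation = u @ correction @ vt

    return tgt_centered @ rotation + ref_mean


def rmsd(a, b):
    """Root-mean-square deviation between two matched point sets."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

A lower `rmsd` after alignment indicates more similar shapes; two copies of the same molecule in different poses align to an RMSD of approximately zero.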


3

Wakadkar, Sachin. "Analysis of transmembrane and globular protein depending on their solvent energy." Thesis, University of Skövde, School of Life Sciences, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-2971.


Abstract:

The number of experimentally determined protein structures in the Protein Data Bank (PDB) is continuously increasing. Common features such as cellular location, function, topology, primary structure, secondary structure, tertiary structure, domains or fold are used to classify them, so various methods are available for the classification of proteins. In this work we attempt an additional basis for classification: solvent energy. Solvation is one of the most important properties of macromolecules and biological membranes, by which they remain stabilized in different environments. The energy required for solvation can be measured in terms of solvent energy. Proteins from similar environments are investigated for similar solvent energy; that is, solvent energy can be used as a measure to analyze and classify proteins. In this project the solvent energy of proteins present in the PDB was calculated using Jones’ algorithm. The proteins were classified into two classes: transmembrane and globular. Statistical analysis showed that the solvent energy values obtained for the two main classes (globular and transmembrane) came from different populations. Thus, adopting a classification based on solvent energy will help in predicting cellular placement.


4

Huque, Enamul. "Shape Analysis and Measurement for the HeLa cell classification of cultured cells in high throughput screening." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-27.


Abstract:

Feature extraction by digital image analysis and cell classification is an important task for cell culture automation. In High Throughput Screening (HTS), where thousands of data points are generated and processed at once, features are extracted and cells are classified to decide whether the cell culture is proceeding smoothly. The culture is restarted if a problem is detected. In this thesis project HeLa cells, which are human epithelial cancer cells, are selected for the experiment. The purpose is to classify two types of HeLa cells in culture: cells in cleavage, which are round floating cells (stressed or dead cells are also round and floating), and normally growing cells, which are attached to the substrate. As the number of cells in cleavage will always be smaller than the number of cells which are growing normally and attached to the substrate, the cell count of attached cells should be higher than that of the round cells. Five different HeLa cell images are used. For each image, every single cell is obtained by image segmentation and isolation. Different mathematical features are computed for each cell. The feature set for this experiment is chosen so that the features are robust, discriminative and generalise well for classification. Almost all the features presented in this thesis are rotation, translation and scale invariant, so they are expected to perform well in discriminating objects or cells with any classification algorithm. Some new features are added which are believed to improve the classification result. The feature set is considerably broader than the restricted sets used in previous work. The features are implemented behind a common interface so that the library can be extended and integrated into other applications. They are fed into a machine learning algorithm called Linear Discriminant Analysis (LDA) for classification. Cells are then classified as ‘Cells attached to the substrate’ (Cell Class A) or ‘Cells in cleavage’ (Cell Class B). Shape features are left out of or added to the LDA input to increase performance. On average, an accuracy of more than ninety-five percent is obtained in the classification result, which is validated by visual classification.
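As an illustration of what a rotation, translation and scale invariant shape feature looks like, the sketch below computes circularity, 4πA/P², for a cell outline given as polygon vertices. Circularity is a standard shape descriptor chosen here as an example; the abstract does not list the features used in the thesis, and the function names are hypothetical.

```python
import math


def polygon_area(points):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of the closed polygon."""
    return sum(
        math.dist(p, q) for p, q in zip(points, points[1:] + points[:1])
    )


def circularity(points):
    """4*pi*A / P**2: equals 1 for a circle and is smaller for other shapes."""
    perimeter = polygon_perimeter(points)
    return 4.0 * math.pi * polygon_area(points) / perimeter ** 2
```

Because area scales with the square of the size and perimeter scales linearly, the ratio is unchanged when the outline is resized, moved or rotated, which is the property the abstract asks of its features. Round floating cells would be expected to score closer to 1 than spread, attached cells.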


5

Chawade, Aakash. "Inferring Gene Regulatory Networks in Cold-Acclimated Plants by Combinatorial Analysis of mRNA Expression Levels and Promoter Regions." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-20.


Abstract:

Understanding the cold acclimation process in plants may help us develop genetically engineered plants that are resistant to cold. The key to understanding this process is to study the genes, and thus the gene regulatory network, involved in cold acclimation. Most of the existing approaches [1-8] for deriving regulatory networks rely only on gene expression data. Since expression data is usually noisy and sparse, the networks generated by these approaches are usually incoherent and incomplete. Hence a new approach is proposed here that analyzes the promoter regions along with the expression data when inferring the regulatory networks. In this approach genes are grouped into sets if they contain similar over-represented motifs or motif pairs in their promoter regions and if their expression pattern follows the expression pattern of the regulating gene. The network thus derived is evaluated using known literature evidence, functional annotations and statistical tests.


6

Lythgow, Kieren. "Bioinformatics analysis of mitochondrial disease." Thesis, University of Newcastle Upon Tyne, 2011. http://hdl.handle.net/10443/1174.


Abstract:

Several bioinformatic methods have been developed to aid the identification of novel nuclear-mitochondrial genes involved in disease. Previous research has aimed to increase the sensitivity and specificity of these predictions through a combination of available techniques. This investigation shows that the optimum sensitivity and specificity can be achieved by carefully selecting seven specific classifiers in combination. The results also show that increasing the number of classifiers even further can paradoxically decrease the sensitivity and specificity of a prediction. Additionally, text mining applications are playing a huge role in disease candidate gene identification, providing resources for interpreting the vast quantities of biomedical literature currently available. A workflow resource was developed that identified a number of genes potentially associated with Leber's Hereditary Optic Neuropathy (LHON). This included specific orthologues in mouse displaying a potential association to LHON not annotated as such in humans. Mitochondrial DNA (mtDNA) fragments have been transferred to the human nuclear genome over evolutionary time, forming nuclear mitochondrial insertions (NUMTs). These insertions were compared to an existing database of 263 mtDNA deletions to highlight any associated mechanisms governing DNA loss from mitochondria. The flanking regions surrounding these insertions in the nuclear genome were also screened for transposable elements, GC content and mitochondrial genes. No obvious association was found relating NUMTs to mtDNA deletions. NUMTs do not appear to be distributed throughout the genome via transposition and integrate predominantly in areas of low %GC with low gene content. These areas also lacked evidence of an elevated number of surrounding nuclear-mitochondrial genes, but a further genome-wide study is required.


7

Akman, Kemal. "Bioinformatics of DNA Methylation analysis." Diss., Ludwig-Maximilians-Universität München, 2014. http://nbn-resolving.de/urn:nbn:de:bvb:19-182873.


8

Benaim, Jalfon Carlos 1966. "Analysis of the bioinformatics industry." Thesis, Massachusetts Institute of Technology, 2001. http://hdl.handle.net/1721.1/8882.


Abstract:

Thesis (S.M.M.O.T.)--Massachusetts Institute of Technology, Sloan School of Management, Management of Technology Program, 2001.

Includes bibliographical references (leaf 76).

The rise of the commercial genomic industry and the broadening application of genomic techniques in biology and medicine, together with the growing availability of DNA sequence information, have created a new industry: the bioinformatics industry. This thesis analyzes the technologies, applications, market and competitors in this industry and explores potential changes to the business models that are being used today. The technology and market information indicates that this is an industry at a very early stage. On the other hand, the business models being used are very similar to the ones used traditionally in the hardware and software industry: licensing, ASP (Application Service Provider), joint developments and hardware/software solutions. The current market size is relatively small, estimated at no more than $300M. Only by implementing strategies of horizontal or vertical integration might a company in this industry be able to boost revenues in the long term.


9

Yu, Jennifer. "Bioinformatics Analysis of Vasorin in Gliomas." Case Western Reserve University School of Graduate Studies / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=case1484927314447688.


10

Xu, Jia. "Bioinformatics analysis of small silencing RNAs." Thesis, Boston University, 2011. https://hdl.handle.net/2144/38118.


Abstract:

Thesis (Ph.D.)--Boston University

PLEASE NOTE: Boston University Libraries did not receive an Authorization To Manage form for this thesis or dissertation. It is therefore not openly accessible, though it may be available by request. If you are the author or principal advisor of this work and would like to request open access for it, please contact us at open-help@bu.edu. Thank you.

More than 180 genomes have been deciphered; however, much remains to be learned about how genes are regulated. Transcription factors harboring promoters and distal elements are known to activate or repress downstream gene expression, and DNA methylation and histone modifications add the complexity of epigenetic regulation. Furthermore, three classes of small RNA regulators have recently been discovered to repress target gene and transposon expression. In flies, microRNAs (miRNAs) inhibit translation and expedite degradation of the target mRNAs. Small interfering RNAs (siRNAs) participate in a self-defense mechanism called RNA interference (RNAi) to silence infected virus mRNAs or endogenous transposon elements. Piwi-interacting RNAs (piRNAs) efficiently silence the transposon elements in the gonad. The advent of next generation sequencing technologies has allowed us to sequence with sufficient coverage and accuracy and to perform genome-wide bioinformatics analyses on small regulatory RNAs to enrich our knowledge of regulation.

In this dissertation, I developed a suite of computational algorithms and programs to study small RNAs from next generation sequencing data. First, I developed a de novo miRNA discovery pipeline to discover miRNAs in sea urchin and demonstrated that one of the sources of endo-siRNAs in flies was overlapping complementary mRNAs. I further investigated the question of how miRNAs and siRNAs were sorted into their own pathways. First nucleotide composition and duplex structure were shown to significantly affect how the sorting protein (R2D2) decides a small RNA's destiny. Next, I described collaborative work on the piRNA pathway proteins Ago3 and Rhino. Ago3 was found to catalyze the ping-pong amplification cycle in the piRNA pathway, and Rhino, an HP1 homolog, was essential for dual-strand piRNA clusters. Lastly, I demonstrated a sequencing-depth independent computational approach to quantify ping-pong efficiency and, after implementing it, illustrated the function of each piRNA pathway protein. In addition, I developed a dynamic programming algorithm for detecting piRNA clusters to better annotate the piRNA-enriched segments in the genome and revealed the expression pattern for each cluster.

2031-01-01


11

Odelgard, Anna. "Coverage Analysis in Clinical Next-Generation Sequencing." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-379228.


Abstract:

With the arrival of NGS, new tools had to be developed to work with new data formats, to handle data sizes larger than those of previous techniques, and to check the accuracy of the data. Coverage analysis is an important quality control for NGS data: the coverage indicates how many times each base pair has been sequenced and thus how trustworthy each base call is. For clinical purposes every base of interest must be quality controlled, as one wrong base call could affect the patient negatively. Software for coverage analysis with enough accuracy and detail for clinical applications is scarce. Several programs, such as Samtools, are able to calculate coverage values but do not process this information further to produce a QC report for each base pair of interest. The aim of my master thesis has therefore been to create a new coverage analysis report tool, named CAR tool, that extracts the coverage values from Samtools and uses this data to produce a report consisting of tables, lists and figures. CAR tool was created to replace the currently used tool, ExCID, at the Clinical Genomics facility at SciLifeLab in Uppsala and was developed to meet the needs of the bioinformaticians and clinicians. CAR tool is written in Python and launched from a terminal window. The main function of the tool is to display coverage breadth values for each region of interest and to extract all subregions below a chosen coverage depth threshold. The low coverage regions are then reported together with region name, start and stop positions, length and mean coverage value. To make the tool useful to as many users as possible, several settings can be chosen by passing different flags when calling the tool. Such settings include generating pie charts of each region’s coverage values, filtering reads and bases by quality, or writing your own entry to be used for the coverage calculation by Samtools.

The tool has been shown to find these low coverage regions very well. Most low regions found are also found by ExCID, the currently used tool; some differences did however occur, and every such region was verified with IGV. The coverage values shown in IGV coincided with those found by CAR tool. CAR tool is written to find all low coverage regions even if they are only one base pair long, while ExCID instead seems to generate larger low regions, not taking very short low regions into account. To read more about the functions and how to use CAR tool, I refer to the User instructions in the appendix and on GitHub at the repository anod6351.
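The two quantities described in this abstract, coverage breadth and subregions below a depth threshold, can be sketched as follows. This is not CAR tool's code: it assumes the per-base depths of one region of interest have already been parsed into a list (for example from the depth column of `samtools depth` output), and the function names and the dictionary layout are hypothetical.

```python
def _summarise(depths, region_start, lo, hi):
    """Describe the run depths[lo:hi] in genomic coordinates (inclusive stop)."""
    run = depths[lo:hi]
    return {
        "start": region_start + lo,
        "stop": region_start + hi - 1,
        "length": hi - lo,
        "mean_coverage": sum(run) / len(run),
    }


def coverage_breadth(depths, threshold):
    """Fraction of positions in the region with depth >= threshold."""
    if not depths:
        return 0.0
    return sum(1 for depth in depths if depth >= threshold) / len(depths)


def low_coverage_regions(depths, region_start, threshold):
    """Find every run of consecutive positions with depth < threshold.

    Runs of a single base pair are reported too.
    """
    regions = []
    run_start = None
    for offset, depth in enumerate(depths):
        if depth < threshold:
            if run_start is None:
                run_start = offset  # a new low-coverage run begins here
        elif run_start is not None:
            regions.append(_summarise(depths, region_start, run_start, offset))
            run_start = None
    if run_start is not None:  # the region ended while still below threshold
        regions.append(_summarise(depths, region_start, run_start, len(depths)))
    return regions
```

For a region starting at position 100 with depths `[30, 30, 5, 8, 30, 2]` and a threshold of 10, the sketch reports two low regions (positions 102 to 103 and position 105) and a coverage breadth of 0.5.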


12

Stenerlöw, Oskar. "Artefact detection in microstructures using image analysis." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-417342.


Abstract:

Gyros Protein Technologies AB produces instruments designed to perform automated immunoassays on plastic CDs with microstructures. While the process is generally very robust, the company had noticed that some runs on the instruments encountered problems. They hypothesised that this had to do with the chamber on the CD to which the sample is added. It was believed that the chamber was not being filled properly, leaving it completely empty or containing a small amount of air rather than liquid. This project aimed to investigate this hypothesis and to develop an image analysis solution that could reliably detect these occurrences. An image analysis script was developed which mainly utilised template matching and Canny edge detection to assess the presence of air. The analysis had great success in detecting empty chambers and large bubbles of air, while it had some trouble discerning small bubbles from dirt on top of the CD. Evaluated on a test set of 1305 images annotated by two people, the analysis scored an accuracy of 96.8 % and 99.5 % respectively.


13

Bresell, Anders. "Characterization of protein families, sequence patterns, and functional annotations in large data sets." Doctoral thesis, Linköping : Department of Physics, Chemistry and Biology, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10565.


14

Bertoldi, Loris. "Bioinformatics for personal genomics: development and application of bioinformatic procedures for the analysis of genomic data." Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3421950.


Abstract:

In the last decade, the sharp decrease in sequencing costs due to the development of high-throughput technologies completely changed the way genetic problems are approached. In particular, whole exome and whole genome sequencing are contributing to extraordinary progress in the study of human variants, opening up new perspectives in personalized medicine. As this is a relatively new and fast-developing field, appropriate tools and specialized knowledge are required for efficient data production and analysis. In line with the times, in 2014 the University of Padua funded the BioInfoGen Strategic Project with the goal of developing technology and expertise in bioinformatics and molecular biology applied to personal genomics. The aim of my PhD was to contribute to this challenge by implementing a series of innovative tools and by applying them to investigate and possibly solve the case studies included in the project.

I first developed an automated pipeline for dealing with Illumina data, able to sequentially perform each step necessary for passing from raw reads to somatic or germline variant detection. The system performance has been tested by means of internal controls and by its application to a cohort of patients affected by gastric cancer, obtaining interesting results. Once variants are called, they have to be annotated in order to define their properties, such as the position at transcript and protein level, the impact on protein sequence, the pathogenicity and more. As most of the publicly available annotators were affected by systematic errors causing low consistency in the final annotation, I implemented VarPred, a new tool for variant annotation, which guarantees the best accuracy (>99%) compared to the state-of-the-art programs while also showing good processing times. To make VarPred easy to use, I equipped it with an intuitive web interface that allows not only graphical evaluation of the results but also a simple filtration strategy.

Furthermore, for a valuable user-driven prioritization of human genetic variations, I developed QueryOR, a web platform suitable for searching among known candidate genes as well as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. The prioritization is achieved by a global positive selection process that promotes the emergence of the most reliable variants, rather than filtering out those not satisfying the applied criteria. QueryOR has been used to analyze the two case studies framed within the BioInfoGen project. In particular, it allowed the detection of causative variants in patients affected by lysosomal storage diseases, also highlighting the efficacy of the designed sequencing panel. On the other hand, QueryOR simplified the recognition of the LRP2 gene as a possible candidate to explain subjects with a Dent disease-like phenotype but with no mutation in the previously identified disease-associated genes, CLCN5 and OCRL. As a final corollary, an extensive analysis of recurrent exome variants was performed, showing that their origin can be mainly explained by inaccuracies in the reference genome, including misassembled regions and uncorrected bases, rather than by platform-specific errors.

15

Thelander, Tilia. "Optimisation of ForenSeq STR data analysis with FDSTools and comparative analysis with UAS." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-20053.

Abstract:

DNA profiling with short tandem repeat data generated with massively parallel sequencing is associated with several challenges. FDSTools is open-source software that applies correction models based on a reference database to correct DNA profiles. The correction models aim to provide an accurate representation of the true DNA profile and associated artefacts. Low analytical thresholds in FDSTools are suggested to improve detection of minor profiles in complex mixtures. The objective was to optimise FDSTools analysis for ForenSeq data and to establish a Swedish reference database. The FDSTools analysis was subsequently compared to default analysis with the commercial Universal Analysis Software (UAS), and the likelihood ratio was evaluated. The FDSTools Library file was adapted for ForenSeq data. FASTQ files from single- and mixed-source samples were analysed with the software. The concordance between the two software packages was assessed, and analytical thresholds in FDSTools were optimised. Likelihood ratios were calculated for sequencing and capillary electrophoresis data to investigate the benefit of sequence-level information. A reference database and correction models could not be generated, meaning that uncorrected data was used. The two software packages showed 98.5% concordance. Discordance was caused by allele drop-out in heterozygous loci, which implied that certain markers may require individual interpretation. Lowering the analytical thresholds in FDSTools appeared to improve mixture deconvolution, but the lack of correction models obscured interpretation. Hence, without correction models, optimal analytical thresholds could not be defined. The likelihood ratio based on sequencing data was not consistently higher than that based on capillary electrophoresis, suggesting that sequence information is not always advantageous.
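
The concordance figure reported in this abstract can be illustrated with a minimal sketch. It assumes that concordance is the fraction of shared loci at which both tools report the same set of alleles; the thesis may define it differently. The function name, locus names and allele calls below are hypothetical illustration data, not results from the thesis.

```python
def concordance(calls_a, calls_b):
    """Return (concordance rate, discordant loci) for two genotype call sets.

    Each argument maps a locus name to a tuple of called alleles.
    Assumption: a locus is concordant when both tools call the same allele set.
    """
    shared = sorted(set(calls_a) & set(calls_b))
    if not shared:
        raise ValueError("no loci in common")
    discordant = [locus for locus in shared
                  if set(calls_a[locus]) != set(calls_b[locus])]
    return 1 - len(discordant) / len(shared), discordant


# Hypothetical calls: the second tool sees both vWA alleles, while the
# first reports only one, mimicking allele drop-out at a heterozygous locus.
tool_one = {"D3S1358": ("15", "16"), "TH01": ("6", "9.3"),
            "vWA": ("17",), "FGA": ("21", "24")}
tool_two = {"D3S1358": ("16", "15"), "TH01": ("6", "9.3"),
            "vWA": ("17", "18"), "FGA": ("21", "24")}

rate, discordant = concordance(tool_one, tool_two)
print(f"concordance: {rate:.1%}, discordant loci: {discordant}")
```

Comparing allele sets rather than tuples keeps the check independent of the order in which each tool lists the alleles.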

16

Freyhult, Eva. "A Study in RNA Bioinformatics : Identification, Prediction and Analysis." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8305.

17

Jadhav, Trishul. "Knowledge Based Gene Set analysis (KB-GSA) : A novel method for gene expression analysis." Thesis, University of Skövde, School of Life Sciences, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-4352.

Abstract:

Microarray technology allows measurement of the expression levels of thousands of genes simultaneously. Several gene set analysis (GSA) methods are widely used for extracting useful information from microarrays, for example identifying differentially expressed pathways associated with a particular biological process or disease phenotype. Though GSA methods like Gene Set Enrichment Analysis (GSEA) are widely used for pathway analysis, these methods are based solely on statistics. Such methods can be awkward to use if knowledge of specific pathways involved in particular biological processes is the aim of the study. Here we present a novel method (Knowledge Based Gene Set Analysis: KB-GSA) which integrates knowledge about user-selected pathways that are known to be involved in specific biological processes. The method generates an easy-to-understand graphical visualization of the changes in expression of the genes, complemented with some common statistics about the pathway of particular interest.

18

Castleberry, Alissa. "Integrated Analysis of Multi-Omics Data Using Sparse Canonical Correlation Analysis." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu15544898045976.

19

Dampier, William Tozeren Aydin. "Analysis of host-pathogen interactions : a bioinformatics approach /." Philadelphia, Pa. : Drexel University, 2010. http://hdl.handle.net/1860/3249.

20

Ahmed, Ikhlak. "A bioinformatics analysis of the arabidopsis thaliana epigenome." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00684391.

Abstract:

Eukaryotic genomes are packed into the confines of the nucleus through a nucleoproteic structure called chromatin. Chromatin is a dynamic structure that can respond to developmental or environmental cues to regulate and orchestrate the functions of the genome. The fundamental unit of chromatin, the nucleosome, consists of a protein octamer, which contains two molecules of each of the core histone proteins (H2A, H2B, H3, H4), around which 147 bp of DNA is wrapped. The post-translational modifications (PTMs) of histones and methylation of the cytosine residues in DNA (DNA methylation) constitute primary epigenomic markers that dynamically alter the interaction of DNA with nucleosomes and help regulate and control access to the underlying DNA. The main objective of my thesis was to understand the spatial and temporal dynamics of chromatin states in Arabidopsis by investigating, on a genome-wide scale, patterns of DNA methylation and a set of well-characterized histone post-translational modifications. DNA methylation, a hallmark of epigenetic inactivation and heterochromatin in both plants and mammals, is largely confined to transposable elements (TEs) and other repeat sequences. I show in this thesis that in Arabidopsis, methylated TE sequences having no or few matching siRNAs, and therefore unlikely to be targeted by the RNA-directed DNA methylation (RdDM) machinery, acquire DNA methylation through spreading from adjacent siRNA-targeted regions. Further, I propose that this spreading of DNA methylation through promoter regions can explain, at least in part, the negative impact of siRNA-targeted TE sequences on neighbouring gene expression. In a second part, I contributed to an integrative analysis of DNA methylation and eleven histone PTMs. 
I have shown through combinatorial and cluster analysis that the Arabidopsis epigenome shows simple principles of organisation and can be divided into four primary types of chromatin that preferentially index active genes, repressed genes, TEs, and intergenic regions. Finally, in a third part, I integrated epigenomic data with transcriptome data at three different time points in a developmental window to investigate the temporal dynamics of chromatin states in response to an external stimulus. This part used the light-induced transcriptional response as a paradigm to assess the impact of histone H2B monoubiquitination (H2Bub), and showed that this PTM is associated with active transcription and implicated in the selective fine-tuning of gene expression. Taken together, the work presented here contributes significantly to our understanding of the spatial organisation of chromatin states in plants, their dynamic nature and how they can help plants respond to a signal from the environment.

21

Alves, Alexessander Da Silva Couto. "Bioinformatics methods for the analysis of metabolic profiles." Thesis, Imperial College London, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.534966.

22

Holzerlandt, Ria. "Analysis of host and herpesvirus interactions using bioinformatics." Thesis, University College London (University of London), 2004. http://discovery.ucl.ac.uk/1446593/.

Abstract:

Bioinformatics methods have become central to analysing and organising the sequence data continually produced by new and existing sequencing projects. The field of bioinformatics covers both the static aspects of organising and presenting these raw data, by compiling existing knowledge into accessible databases, ontologies, and libraries, and the more dynamic aspects of knowledge discovery informatics for interpreting and mining existing data. The aim of this thesis is to use such methods to analyse the herpesvirus-host relationship. In Chapter 2, comparative host and herpesvirus genome analysis is used to compare the sequences of all currently sequenced herpesvirus open reading frames to the conceptually translated human genome, with the aim of identifying herpesvirus-human (host) sequence homologues. Collating in one search all currently known host homologues provides the first complete assessment of herpesvirus-host homologues. This search identified 55 previously reported herpesvirus-host homologues and 4 previously unknown herpesvirus-host homologues. The work performed in Chapter 2 highlighted the need for consistent annotation of genomes and gene products to allow greater comparative genomics. It is not feasible to manually curate large numbers of genes whose relationships to each other are not immediately clear. Therefore, Chapters 3 and 4 focus on the use of the Gene Ontology, a publicly available resource for annotating gene products with a unified vocabulary derived from a structured directed acyclic graph. The Gene Ontology was extended to allow host-pathogen interaction annotation by a) adding 187 new terms relating specifically to virus function and structure (Chapter 3), and b) using both the new and existing terms to annotate the entire Human Herpesvirus 1 genome using references from the available literature (Chapter 4). 
Finally, Chapter 5 examines the utility of the Gene Ontology when analysing large-scale host and herpesvirus gene expression datasets such as those produced experimentally by DNA microarray studies. Using such automated annotation methods, a cluster of 12 proteins was identified that increase mitochondrial function in HUVEC cells 24 hours post HCMV infection. A cluster of nine proteins that function in the MAPK pathway was also identified using the Gene Ontology, providing evidence for HCMV inhibition of the MAPK pathway.
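
Because the Gene Ontology is a directed acyclic graph, an annotation to a specific term also implies annotation to all of that term's ancestors, which is what makes grouping genes by shared function possible. The sketch below illustrates that propagation step on a toy graph. The term names, the parent links and the gene names are hypothetical and do not reproduce the real ontology or the thesis's annotations.

```python
from collections import Counter

# Toy child -> parents map (hypothetical structure, for illustration only).
PARENTS = {
    "viral genome replication": ["viral life cycle"],
    "viral entry into host cell": ["viral life cycle", "interaction with host"],
    "viral life cycle": ["viral process"],
    "interaction with host": ["viral process"],
    "viral process": [],
}


def ancestors(term, parents=PARENTS):
    """Collect every ancestor of a term by walking all parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents=PARENTS):
    """Count genes per term after adding the ancestors implied by each annotation."""
    counts = Counter()
    for terms in annotations.values():
        implied = set()
        for term in terms:
            implied.add(term)
            implied |= ancestors(term, parents)
        counts.update(implied)
    return counts


# Hypothetical gene annotations.
gene_terms = {
    "geneA": ["viral genome replication"],
    "geneB": ["viral entry into host cell"],
}
term_counts = propagate(gene_terms)
print(term_counts.most_common())
```

The visited set matters because a term can have more than one parent, so the same ancestor may be reachable along several paths.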

23

Ramraj, Varun. "Exploiting whole-PDB analysis in novel bioinformatics applications." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:6c59c813-2a4c-440c-940b-d334c02dd075.

Abstract:

The Protein Data Bank (PDB) is the definitive electronic repository for experimentally-derived protein structures, composed mainly of those determined by X-ray crystallography. Approximately 200 new structures are added weekly to the PDB, and at the time of writing, it contains approximately 97,000 structures. This represents an expanding wealth of high-quality information but there seem to be few bioinformatics tools that consider and analyse these data as an ensemble. This thesis explores the development of three efficient, fast algorithms and software implementations to study protein structure using the entire PDB. The first project is a crystal-form matching tool that takes a unit cell and quickly (< 1 second) retrieves the most related matches from the PDB. The unit cell matches are combined with sequence alignments using a novel Family Clustering Algorithm to display the results in a user-friendly way. The software tool, Nearest-cell, has been incorporated into the X-ray data collection pipeline at the Diamond Light Source, and is also available as a public web service. The bulk of the thesis is devoted to the study and prediction of protein disorder. Initially, trying to update and extend an existing predictor, RONN, the limitations of the method were exposed and a novel predictor (called MoreRONN) was developed that incorporates a novel sequence-based clustering approach to disorder data inferred from the PDB and DisProt. MoreRONN is now clearly the best-in-class disorder predictor and will soon be offered as a public web service. The third project explores the development of a clustering algorithm for protein structural fragments that can work on the scale of the whole PDB. While protein structures have long been clustered into loose families, there has to date been no comprehensive analytical clustering of short (~6 residue) fragments. 
A novel fragment clustering tool was built that is now leading to a public database of fragment families and representative structural fragments that should prove extremely helpful for both basic understanding and experimentation. Together, these three projects exemplify how cutting-edge computational approaches applied to extensive protein structure libraries can provide user-friendly tools that address critical everyday issues for structural biologists.

24

Raj, Kumar Praveen Kumar. "BIOINFORMATICS ANALYSIS OF ALTERNATIVE SPLICING IN CHLAMYDOMONAS REINHARDTII." Miami University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=miami1281124967.

25

Santos, Tiago Filipe Melo. "Bioinformatics analysis of the Pedobacter sp. NL19 genome." Master's thesis, Universidade de Aveiro, 2015. http://hdl.handle.net/10773/16041.

Abstract:

Master's in Molecular and Cellular Biology

The last decades of the 20th century marked the advent of genetic engineering, culminating in the development of techniques such as PCR and Sanger sequencing. This permitted the emergence of new techniques for sequencing whole genomes, known as next-generation sequencing. One of the many applications of these techniques is the in silico search for new secondary metabolites, synthesized by microorganisms, that exhibit antimicrobial properties. Peptide antibiotics can be divided into two classes according to their biosynthesis: ribosomal and nonribosomal peptides. Lanthipeptides are the most studied ribosomal peptides and are characterized by the presence of lanthionine and methyllanthionine, which result from post-translational modifications. Lanthipeptides are divided into four classes, depending on their biosynthetic machinery. In class I, a LanB enzyme dehydrates serine and threonine residues in the C-terminus of the precursor peptide. These residues then undergo a cyclization step performed by a LanC enzyme, forming the lanthionine rings. The cleavage and the transport of the peptide are carried out by the LanP and LanT enzymes, respectively. In class II, however, a single enzyme, LanM, is responsible for the dehydration and cyclization steps, and a single enzyme, LanT, performs the cleavage and transport. Pedobacter sp. NL19 is a Gram-negative bacterium isolated from sludge of an abandoned uranium mine in Viseu (Portugal). Antibacterial activity in vitro was detected against several Gram-positive and Gram-negative bacteria. Sequencing and in silico analysis of the NL19 genome revealed the presence of 21 biosynthetic clusters for secondary metabolites, including nonribosomal and ribosomal peptide biosynthetic clusters. Four lanthipeptide clusters were predicted, comprising the precursor peptides, the modifying enzymes (LanB and LanC), and also a bifunctional LanT. This result revealed the hybrid nature of the clusters, which combine characteristics of two distinct classes and are poorly described in the literature. The phylogenetic analysis of their enzymes showed that they clustered within the Bacteroidetes clade. Furthermore, hybrid gene clusters were also found in other species of this phylum, revealing that this is a common characteristic in this group. Finally, the analysis of NL19 colonies by MALDI-TOF MS allowed the identification of a 3180 Da mass that corresponds to the predicted mass of a lanthipeptide encoded in one of the clusters. However, this result is not fully conclusive and further experiments are needed to understand the full potential of the compounds encoded in this type of cluster. In conclusion, it was determined that the NL19 strain has the potential to produce diverse secondary metabolites, including lanthipeptides that have not been functionally characterized so far.

The end of the 20th century marked the advent of genetic engineering, which culminated in the development of several techniques, such as PCR and Sanger sequencing. This allowed the emergence of new genome sequencing techniques, known as next-generation sequencing. One of their applications is the in silico search for new secondary metabolites synthesized by microorganisms and having antimicrobial activity. Antimicrobial peptides can be classified as ribosomal or nonribosomal peptides, according to their biosynthesis. Lanthipeptides are the most studied ribosomal peptides and are characterized by the presence of lanthionines and methyllanthionines in their structure, which result from post-translational modifications. They can be divided into four classes according to their biosynthetic machinery. In class I, serine and threonine residues are dehydrated at the C-terminus of the precursor peptide by a LanB enzyme. These residues then undergo cyclization by a LanC enzyme, forming lanthionine bonds. Cleavage and transport are subsequently carried out by two enzymes, LanP and LanT, respectively. In class II, a bifunctional LanM enzyme is responsible for dehydration and cyclization, and a LanT enzyme for cleavage and transport. Pedobacter sp. NL19 is a Gram-negative bacterium isolated from sludge of an abandoned uranium mine in Viseu (Portugal). It shows in vitro antimicrobial activity against several Gram-positive and Gram-negative bacteria. Sequencing and analysis of this bacterium's genome identified 21 biosynthetic clusters for secondary metabolites, including clusters encoding ribosomal and nonribosomal peptides. Four lanthipeptide clusters were identified, containing precursor peptides, class I modifying enzymes (LanB and LanC), and the class II bifunctional enzyme LanT. This result reveals the existence of hybrid gene clusters, poorly described in the literature, with characteristics of two distinct classes. The phylogenetic analysis showed that the enzymes of these clusters group within the Bacteroidetes clade. Other species of this phylum were also found to have hybrid lanthipeptide gene clusters, showing that this is not a rare characteristic in this group of organisms. Finally, analysis of NL19 colonies by MALDI-TOF MS detected a mass of 3180 Da, corresponding to the predicted mass of a lanthipeptide encoded by one of the hybrid clusters. However, this result is not fully conclusive and further experiments will be needed to fully characterize the potential of these peptides. The analysis thus revealed that the NL19 bacterium has the potential to produce diverse secondary metabolites, including lanthipeptides that have not yet been functionally characterized.

26

Mahram, Atabak. "FPGA acceleration of sequence analysis tools in bioinformatics." Thesis, Boston University, 2013. https://hdl.handle.net/2144/11126.

Abstract:

Thesis (Ph.D.)--Boston University

With advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high-impact production biosequence analysis tools. Compared with other alternatives, FPGAs offer huge compute power, lower power consumption, and reasonable flexibility. BLAST has become the de facto standard in bioinformatic approximate string matching, and so its acceleration is of fundamental importance. It is a complex, highly optimized system, consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on an FPGA. Using our FPGA engine, we quickly reduce the size of the database to a small fraction, and then use the original code to process the query. Using a standard FPGA-based system, we achieved a 12x speedup over a highly optimized multithreaded reference code. Multiple Sequence Alignment (MSA), the extension of pairwise sequence alignment to multiple sequences, is critical to solving many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of 80x to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics. The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately, many software heuristics do not fall into this category, and so other methods must be applied. One is restructuring: an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs. 
Using our prefiltering approach and novel FPGA programming models, we have achieved significant speedup over reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study introduces the pros and cons of these acceleration models for biosequence analysis tools.
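
The prefiltering idea in this abstract, shrinking the database cheaply before running the expensive alignment code, can be sketched in software. The example below is a plain Python illustration of seed-based filtering with shared k-mers; it is not the thesis's FPGA design. The function names, the word length `k`, the `min_shared` threshold and the sequences are all assumptions chosen for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k words occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def prefilter(query, database, k=3, min_shared=2):
    """Keep only database entries sharing at least min_shared k-mers with the query.

    The survivors would then be passed to a full (slower) alignment step.
    """
    query_words = kmers(query, k)
    survivors = []
    for name, seq in database.items():
        if len(query_words & kmers(seq, k)) >= min_shared:
            survivors.append(name)
    return survivors


# Hypothetical toy database.
database = {
    "seq1": "TTACGTAA",   # shares several 3-mers with the query
    "seq2": "GGGGGGGG",   # shares none
    "seq3": "CCACGCC",    # shares only one ("ACG")
}
hits = prefilter("ACGTACGT", database)
print(hits)
```

Raising `min_shared` or `k` makes the filter stricter, which trades sensitivity for a smaller candidate set; that is the kind of accuracy/performance tradeoff the abstract mentions.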

27

Murat, Katarzyna. "Bioinformatics analysis of epigenetic variants associated with melanoma." Thesis, University of Bradford, 2018. http://hdl.handle.net/10454/17220.

Abstract:

The field of cancer genomics is currently being enhanced by the power of epigenome-wide association studies (EWAS). Over the last couple of years, comprehensive sequence data sets have been generated, making analysis of genome-wide activity in cohorts of different individuals increasingly available. Finding associations between epigenetic variation and phenotype is one of the biggest challenges in biomedical research. Laboratories lacking dedicated resources and programming experience require bioinformatics expertise, which can be prohibitively costly and time-consuming. To address this, we have developed a collection of freely available Galaxy tools (Poterlowicz, 2018a), combining analytical methods into a range of convenient analysis pipelines with a user-friendly graphical interface. The tool suite includes methods for data preprocessing, quality assessment, and discovery of differentially methylated regions and positions. The aim of this project was to make EWAS analysis flexible, accessible to everyone, and compatible with routine clinical and biological use. This is exemplified by my work integrating DNA methylation profiles of melanoma patients (at baseline and under mitogen-activated protein kinase inhibitor, MAPKi, treatment) to identify novel epigenetic switches responsible for tumour resistance to therapy (Hugo et al., 2015). Configuration files are published on our GitHub repository (Poterlowicz, 2018b), with scripts and dependency settings also available to download and install via the Galaxy test toolshed (Poterlowicz, 2018a). Results and experiences using this framework demonstrate the potential for Galaxy to be a bioinformatics solution for multi-omics cancer biomarker discovery.

28

Fortino, Vittorio. "Sequence analysis in bioinformatics: methodological and practical aspects." Doctoral thesis, Universita degli studi di Salerno, 2013. http://hdl.handle.net/10556/985.

Abstract:

2011 - 2012

My PhD research activities have focused on the development of new computational methods for biological sequence analysis. To overcome an intrinsic problem in protein sequence analysis, namely inferring homologies in large protein databases with short queries, I developed a BLAST-based statistical framework to detect distant homologies conserved in transmembrane domains of different bacterial membrane proteins. Using this framework, the transmembrane protein domains of all Salmonella spp. were screened and more than five thousand significant homologies were identified. My results show that the proposed framework detects distant homologies that, because of their conservation in distinct bacterial membrane proteins, could represent ancient signatures of the existence of primeval genetic elements (or mini-genes) coding for short polypeptides that formed, through a primitive assembly process, more complex genes. Further, my statistical framework lays the foundation for new bioinformatics tools for domain-oriented homology detection, that is, the ability to find statistically significant homologies in specific target domains. The second problem that I addressed deals with the analysis of transcripts obtained from RNA-Seq data. I developed a novel computational method that combines transcript borders, obtained from mapped RNA-Seq reads, with operon predictions based on sequence features to accurately infer operons in prokaryotic genomes. Since the transcriptome of an organism is dynamic and condition dependent, the mapped RNA-Seq reads are used to determine a set of confirmed or predicted operons, and from it specific transcriptomic features are extracted and combined with standard genomic features to train and validate three operon classification models (Random Forests - RFs, Neural Networks - NNs, and Support Vector Machines - SVMs). 
These classifiers have been used to refine the operon map annotated by DOOR, one of the most widely used databases of prokaryotic operons. This method showed that the integration of genomic and transcriptomic features improves the accuracy of operon predictions, and that it is possible to predict the existence of potential new operons. An inherent limitation of using RNA-Seq to improve operon structure predictions is that it cannot be applied to genes not expressed under the condition studied. I evaluated my approach on different RNA-Seq based transcriptome profiles of Histophilus somni and Porphyromonas gingivalis. These transcriptome profiles were obtained using the standard RNA-Seq or the strand-specific RNA-Seq method. My experimental results demonstrate that the three classifiers achieved accurate operon maps, including reliable predictions of new operons. [edited by author]

XI n.s.

29

Urgese, Gianvito. "Computational Methods for Bioinformatics Analysis and Neuromorphic Computing." Doctoral thesis, Politecnico di Torino, 2016. http://hdl.handle.net/11583/2646486.

Abstract:

The latest biological discoveries and the exponential growth of more and more sophisticated biotechnologies led in the current century to a revolution that totally reshaped the concept of genetic study. This revolution, which began in the last decades, is still continuing thanks to the introduction of new technologies capable of producing a huge amount of biological data in a relatively short time and at a very low price with respect to some decades ago. These new technologies are known as Next Generation Sequencing (NGS). These platforms perform massively parallel sequencing of both RNA and DNA molecules, thus allowing to retrieve the nucleic acid sequence of millions of fragments of DNA or RNA in a single machine run. The introduction of such technologies rapidly changed the landscape of genetic research, providing the ability to answer questions with heretofore unimaginable accuracy and speed. Moreover, the advent of NGS with the consequent need for ad-hoc strategies for data storage, sharing, and analysis is transforming genetics in a big data research field. Indeed, the large amount of data coming from sequencing technologies and the complexity of biological processes call for novel computational tools (Bioinformatics tools) and informatics resources to exploit this kind of information and gain novel insights into human beings, living organisms, and pathologies mechanisms. At the same time, a new scientific discipline called Neuromorphic Computing has been established to develop SW/HW systems having brain-specific features, such as high degree of parallelism and low power consumption. These platforms are usually employed to support the simulation of the nervous system, thus allowing the study of the mechanisms at the basis of the brain functioning. In this scenario, my research program focused on the development of optimized HW/SW algorithms and tools to process the biological information from Bioinformatics and Neuromorphic studies. 
The main objective of the methodologies proposed in this thesis consisted in achieving a high level of sensitivity and specificity in data analysis while minimizing the computational time. To reach these milestones, then, some bottlenecks identified in the state-of-the-art tools have been solved through a careful design of three new optimised algorithms. The work that led to this thesis is part of three collaborative projects. Two concerning the design of Bioinformatics sequence alignment algorithms and one aimed at optimizing the resources usage of a Neuromorphic platform. In the next paragraphs, the projects are briefly introduced. Dynamic Gap Selector Project This project concerned the design and implementation of a new gap model implemented in the dynamic programming sequence alignment algorithms. Smith-Waterman (S-W) and Needleman-Wunsch (N-W) are widespread methods to perform Local and Global alignments of biological sequences such as proteins, DNA and RNA molecules that are represented such as sequences of letters. Both the algorithms make use of scoring procedures to evaluate matches and errors that can be encountered during the sequence alignment process. These scoring strategies are designed to consider insertions and deletions through the identification of gaps in the aligned sequences. The Affine gap model is considered the most accurate model for the alignment of biomolecules. However, its application to S-W and N-W algorithms is quite expensive both in terms of computational time as well as in terms of memory requirements when compared to other less demanding models as the Linear gap one. In order to overcome these drawbacks, an optimised version of the Affine gap model called Dynamic Gap Selector (DGS) has been developed. The alignment scores computed using DGS are very similar to those computed using the gold standard Affine gap model. 
However, the implementation of this novel gap model during the S-W and N-W alignment procedures leads to the reduction of the memory requirements by a factor of 3. Moreover, the DGS model application accounts for a reduction by a factor of 2 in the number of operations required with respect to the standard Affine gap model. isomiR-SEA Project One of the most attractive research fields that is currently investigated by several interdisciplinary research teams is the study of small and medium RNA sequences with regulatory functions on the production of proteins. These RNA molecules are respectively called microRNAs (miRNAs) and long non-coding RNAs (lncRNAs). In the second project, an alignment algorithm specific for miRNAs detection and characterization have been designed and implemented. miRNAs are a class of short RNAs (18-25 bases) that play essential roles in a variety of cellular processes such as development, metabolism, regulation of immunological response and tumor genesis. Several tools have been developed in the last years to align and analyse the huge amount of data coming from the sequencing of short RNA molecules. However, these tools still lack accuracy and completeness because they use general alignment procedures that do not take into account the structural characteristics of miRNA molecules. Moreover, they are not able to detect specific miRNA variants, called isomiRs, that have recently been found to be relevant for miRNA targets regulation. To overcome these limitations, a miRNA-based alignment algorithm has been designed and developed. The isomiR-SEA algorithm is specifically tailored to detect different miRNAs variants (isomiRs) in the RNA-Seq data and to provide users with a detailed picture of the isomiRs spectrum characterizing the sample under investigation. 
The accuracy of the implemented alignment policy is reflected in the precise quantification of miRNAs and isomiRs, and in the detailed profiling of miRNA-target mRNA interaction sites. This information, hidden in raw miRNA sequencing data, can be very useful to properly characterize miRNAs and to adopt them as reliable biomarkers able to describe multifactorial pathologies such as cancer. SNN Partitioning and Placement Project: In the Neuromorphic Computing field, SpiNNaker is one of the state-of-the-art massively parallel neuromorphic platforms. It is designed to simulate Spiking Neural Networks (SNNs), but it is characterized by several bottlenecks in the neuron partitioning and placement phases executed during the simulation configuration. In this activity, related to the European Flagship project Human Brain Project, a top-down methodology has been developed to improve the scalability and reliability of SNN simulations on massively many-core and densely interconnected platforms. In this context, SNNs mimic brain activity by emulating spikes sent among neuron populations. Many-core platforms are emerging computing resources for achieving real-time SNN simulations. Neurons are mapped to parallel cores and spikes are sent in the form of packets over the on-chip and off-chip network. However, due to the heterogeneity and complexity of neuron population activity, exploiting platform resources efficiently is a challenge, often impacting simulation reliability and limiting the biological network size. To address this challenge, the proposed methodology makes use of customized SNN configurations capable of extracting detailed profiling information about network usage of on-chip and off-chip resources, thus making it possible to recognize the bottlenecks in the spike propagation system. 
These bottlenecks have then been considered during the SNN Partitioning and Placement of a graph describing the SNN interconnection on the chips and cores available on the SpiNNaker board. The advantages of the proposed SNN Partitioning and Placement applied to SpiNNaker have been evaluated in terms of traffic reduction and consequent simulation reliability. The results demonstrate that it is possible to consistently reduce packet traffic and improve simulation reliability by means of an effective neuron placement.
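
The cost difference between the Linear and Affine gap models mentioned in the Dynamic Gap Selector project above can be illustrated with a minimal Python sketch of Needleman-Wunsch global alignment scoring. This is not the DGS model from the thesis, whose details the abstract does not give; the affine variant is the textbook three-matrix (Gotoh) recurrence, and the function names and scoring values are illustrative assumptions.

```python
NEG_INF = float("-inf")


def nw_linear(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score with a linear gap penalty (one DP matrix)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    return score[-1][-1]


def nw_affine(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Global alignment score with an affine gap penalty (three DP matrices).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    rows, cols = len(a) + 1, len(b) + 1
    m = [[NEG_INF] * cols for _ in range(rows)]  # ends in a match/mismatch
    x = [[NEG_INF] * cols for _ in range(rows)]  # ends in a gap in b
    y = [[NEG_INF] * cols for _ in range(rows)]  # ends in a gap in a
    m[0][0] = 0
    for i in range(1, rows):
        x[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, cols):
        y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            m[i][j] = max(m[i - 1][j - 1], x[i - 1][j - 1], y[i - 1][j - 1]) + sub
            x[i][j] = max(m[i - 1][j] + gap_open,
                          x[i - 1][j] + gap_extend,
                          y[i - 1][j] + gap_open)
            y[i][j] = max(m[i][j - 1] + gap_open,
                          y[i][j - 1] + gap_extend,
                          x[i][j - 1] + gap_open)
    return max(m[-1][-1], x[-1][-1], y[-1][-1])
```

With these illustrative penalties, aligning four residues against an empty sequence scores -8 under the linear model and -6 under the affine one, since only the first gap position pays the opening cost. The affine version stores three matrices where the linear one stores a single matrix, which is the kind of overhead the abstract refers to.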

30

Roberts, Rick Lee. "Structural and bioinformatic analysis of ethylmalonyl-CoA decarboxylase." Thesis, State University of New York at Buffalo, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=1600817.

Abstract:

<p> Many enzymes of the major metabolic pathways are categorized into superfamilies which share common folds. Current models postulate that these superfamilies are the result of gene duplications coupled with mutations that result in the acquisition of new functions. Some of these new functions are considered advantageous and selected for, while others may simply be tolerated. The latter can result in metabolites being produced at low rates that are of no known use to the cell, and can become toxic when accumulated. Concurrent with the evolution of this tolerable or potentially detrimental metabolism, organisms are selected to evolve a means of correcting or “proofreading” these non-canonical metabolites to counterbalance their detrimental effects. Metabolite proofreading is a process of intermediary metabolism analogous to DNA proofreading that acts on these abnormal metabolites to prevent their accumulation and toxic effects. </p><p> Here we structurally characterize ethylmalonyl-CoA decarboxylase (EMCD), a member of the family of enoyl-CoA hydratases within the crotonase superfamily of proteins, which is encoded by the ECHDC1 (enoyl-CoA hydratase domain containing 1) gene. EMCD has been shown to have a metabolic proofreading property, acting on the metabolic byproduct ethylmalonyl-CoA to prevent its accumulation, which could result in oxidative damage. We use the complementary methods of in situ crystallography, small angle X-ray scattering, and single crystal X-ray crystallography to structurally characterize EMCD, followed by homology analysis in order to propose a mechanism of action. This represents the first structure of a crotonase superfamily member thought to function as a metabolite proofreading enzyme.</p>

31

Brown, David K. "Bioinformatics tool development with a focus on structural bioinformatics and the analysis of genetic variation in humans." Thesis, Rhodes University, 2018. http://hdl.handle.net/10962/60708.

Abstract:

This thesis is divided into three parts, united under the general theme of bioinformatics tool development and variation analysis. Part 1 describes the design and development of the Job Management System (JMS), a workflow management system for high performance computing (HPC). HPC has become an integral part of bioinformatics. Computational methods for molecular dynamics and next generation sequencing (NGS) analysis, which require complex calculations on large datasets, are not yet feasible on desktop computers. As such, powerful computer clusters have been employed to perform these calculations. However, making use of these HPC clusters requires familiarity with command line interfaces. This excludes a large number of researchers from taking advantage of these resources. JMS was developed as a tool to make it easier for researchers without a computer science background to make use of HPC. Additionally, JMS can be used to host computational tools and pipelines and generates both web-based interfaces and RESTful APIs for those tools. The web-based interfaces can be used to quickly and easily submit jobs to the underlying cluster. The RESTful web API, on the other hand, allows JMS to provide backend functionality for external tools and web servers that want to run jobs on the cluster. Numerous tools and workflows have already been added to JMS, several of which have been incorporated into external web servers. One such web server is the Human Mutation Analysis (HUMA) web server and database. HUMA, the topic of part 2 of this thesis, is a platform for the analysis of genetic variation in humans. HUMA aggregates data from various existing databases into a single, connected and related database. The advantages of this are realized in the powerful querying abilities that it provides. HUMA includes protein, gene, disease, and variation data and can be searched from the angle of any one of these categories. 
For example, searching for a protein will return the protein data (e.g. protein sequences, structures, domains and families, and other meta-data). However, the related nature of the database means that genes, diseases, variation, and literature related to the protein will also be returned, giving users a powerful and holistic view of all data associated with the protein. HUMA also provides links to the original sources of the data, allowing users to follow the links to find additional details. HUMA aims to be a platform for the analysis of genetic variation. As such, it also provides tools to visualize and analyse the data (several of which run on the underlying cluster, via JMS). These tools include alignment and 3D structure visualization, homology modeling, variant analysis, and the ability to upload custom variation datasets and map them to proteins, genes and diseases. HUMA also provides collaboration features, allowing users to share and discuss datasets and job results. Finally, part 3 of this thesis focused on the development of a suite of tools, MD-TASK, to analyse genetic variation at the protein structure level via network analysis of molecular dynamics simulations. The use of MD-TASK in combination with the tools developed in the previous parts of this thesis is showcased via the analysis of variation in the renin-angiotensinogen complex, a vital part of the renin-angiotensin system.

32

Alborgeba, Zainab. "Development and evaluation of a cost-effectiveness analysis model for sepsis diagnosis." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-19155.

Abstract:

Sepsis is a life-threatening organ dysfunction that is caused by a dysregulated host response to infection. Sepsis is a substantial health care and economic burden worldwide and is one of the most common reasons for admission to the hospital and intensive care unit. Early diagnosis and targeted treatment of sepsis are the basis for reducing mortality and morbidity. Conventional blood culturing is the gold standard method for sepsis diagnostics. However, blood culturing is a time-consuming method, requiring at least 48 to 72 hours to get the first results, with very low sensitivity and specificity. The aim of this study was to determine and assess the direct sepsis-related costs for PCR-based diagnostic strategies (SeptiFast and POC/LAB). A mathematical model was constructed to compare PCR-based diagnostic strategies with conventional blood culturing. Three case scenarios were investigated based on data from the United Kingdom, Spain and the Czech Republic. It was found that POC/LAB was the most cost-effective strategy in all countries if it could reduce the hospitalization length of stay by at least 3 days in the normal hospital ward and 1 day in the intensive care unit. Reducing the hospitalization length of stay had the greatest impact on the economic outcomes, while reducing the costs of the diagnostic strategies did not show a remarkable effect on the economic results. In conclusion, the findings suggest that PCR-based rapid diagnostic methods could be cost-effective for the diagnosis of patients with sepsis if they could reduce the hospitalization length of stay.

33

Guo, Xiangxue. "Biochemical and Bioinformatics Analysis of CVAB C-Terminal Domain." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/biology_diss/3.

Abstract:

Cytoplasmic membrane proteins CvaB and CvaA and the outer membrane protein TolC form the bacteriocin colicin V (ColV) secretion system in Escherichia coli. CvaB functions as an ATP-binding cassette transporter with nucleotide-binding motifs in the C-terminal domain (CTD). To study the role of CvaB-CTD in ColV secretion, a truncated construct of this domain was made and over-expressed. Different forms of CvaB-CTD were obtained during purification, and were identified as monomer, dimer, and oligomer on gel filtration. Nucleotide binding was shown to be critical for CvaB-CTD dimerization: oligomers could be converted into dimers by nucleotide binding; the removal of nucleotide from dimers resulted in transient monomers followed by CTD oligomerization and aggregation; and no dimer form could be cross-linked from the nucleotide-binding deficient mutant D654H. The spatial proximity of the Walker A site and the ABC signature motif in the CTD dimer was identified through disulfide cross-linking of mixed CvaB-CTD with mutants A530C and L630C, while the mutants did not dimerize individually. These results indicated that CvaB-CTD formed a nucleotide-dependent head-to-tail dimer. The molecular basis of differential nucleotide binding was also studied through bioinformatics prediction and biochemical verification. Through sequence alignment and homology modeling with bound ATP or GTP, it was found that Ser503 and Gln504 on the aromatic stacking region (Y501DSQ-loop) of CvaB-CTD provided two additional hydrogen bonds to GTP, but not to ATP. The site-directed mutations S503A and/or Q504L were designed based on the model. While site-directed mutagenesis of the Walker A&B sites or the ABC signature motif had little effect on the GTP-binding preference, the double mutation (S503A/Q504L) on the Y501DSQ-loop increased both ATP-binding and ATPase activity at low temperatures. The double mutant showed a slight decrease in GTP-binding and an approximately 10-fold increase in the ATP/GTP-binding ratio. 
Similar temperature sensitivity in nucleotide-binding and activity assays was also identified in the double mutant. Mutations on the Y501DSQ-loop did not affect the ColV secretion level in vivo. Together, these results indicate that the Y501DSQ-loop is structurally involved in the differential binding of GTP over ATP.

34

Borsani, M. "BIOINFORMATICS APPROACHES TO MALDI-TOF MASS SPECTROMETRY DATA ANALYSIS." Doctoral thesis, Università degli Studi di Milano, 2013. http://hdl.handle.net/2434/221050.

Abstract:

Despite the increasing performance of Mass spectrometry (MS) and other analytical tools, only a few biomarkers have been validated and proved to be robust and clinically relevant; indeed, a large number of proteomic biomarkers have been described, but they are not yet clinically implemented [1]. MALDI-TOF MS seems to be one of the more powerful tools for biomarker discovery [2, 3], and shows interesting clinical properties, for instance the possibility to directly search in peripheral fluids for proteins related to an altered physiological state: samples (urine, plasma, serum, etc.) can be collected easily and cheaply by non-invasive, or minimally invasive, methods [4]. The combination of several biomarkers is actually considered more informative than a single biomarker [5, 6], and improvements in the bioinformatics analysis of MS data could probably help this investigation, decreasing the cost and time necessary for each discovery [7]. It is possible to approach the problems related to the analysis of (MALDI-TOF) MS data in two ways, either by trying to increase the number of available samples or by reducing the complexity of the problem [8]. In the first case, we developed an approach to compare small datasets from different sources (i.e. hospitals), based on mutual information and mass spectra alignment, that showed a significant performance increase compared to the competing ones tested. In the latter case, we developed novel methods and approaches to compare MALDI-TOF MS profiles of normal and Renal Cell Carcinoma (RCC) patients, with the goal of isolating the more interesting subset of small proteins and peptides from the whole analysed peptidome. MS-based profiling is in fact able to detect differently expressed proteins or peptides during physiological and pathological processes. Every MALDI-TOF MS spectrum, which reports the relative abundance of sample analytes, can be considered a snapshot of the sample's peptidome in a definite mass range. 
The relationship between mass/charge ratio, or m/z, and concentration of detected peptides can be represented by networks. Tumor case and control subjects show different peptidome profiles, due to differences in biomolecular and/or biochemical features of cancer cells: they will show some changes in the networks that describe them. We use graphs to create network representations of the data and to evaluate network properties. We explore the network properties by comparing case versus control datasets, and by subdividing cases into the different histological subtypes of RCC, clear cell RCC (ccRCC) and not-ccRCC, using different methods both for network creation and analysis, and for the evaluation of results. We identify, for each dataset (controls, ccRCC and not-ccRCC), some interesting mass ranges within which we believe biomarker signals should be searched. In conclusion, we have developed a set of methods which we believe improve the current computational approaches for the analysis of mass spectrometry data. These results have been published or presented at workshops and conferences.

35

Mishra, Satyakam. "Frequent Subgraph Mining Analysis of GPCR Activation." Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1613575702373053.

36

Martin, Paul. "Post-GWAS bioinformatics and functional analysis of disease susceptibility loci." Thesis, University of Manchester, 2017. https://www.research.manchester.ac.uk/portal/en/theses/postgwas-bioinformatics-and-functional-analysis-of-disease-susceptibility-loci(cc0e6cee-5c32-4b75-b3d3-f7c18b6f126d).html.

Abstract:

Genome-wide association studies (GWAS) have been tremendously successful in identifying genetic variants associated with complex diseases, such as rheumatoid arthritis (RA). However, the majority of these associations lie outside traditional protein coding regions and do not necessarily represent the causal effect. Therefore, the challenges post-GWAS are to identify causal variants, link them to target genes and explore the functional mechanisms involved in disease. The aim of the work presented here is to use high level bioinformatics to help address these challenges. There is now an increasing amount of experimental data generated by several large consortia with the aim of characterising the non-coding regions of the human genome, which has the ability to refine and prioritise genetic associations. However, whilst being publicly available, manually mining and utilising it to full effect can be prohibitive. I developed an automated tool, ASSIMILATOR, which quickly and effectively facilitated the mining and rapid interpretation of this data, inferring the likely functional consequence of variants and informing further investigation. This was used in a large extended GWAS in RA which assessed the functional impact of associated variants at the 22q12 locus, showing evidence that they could affect gene regulation. Environmental factors, such as vitamin D, can also affect gene regulation, increasing the risk of disease but are generally not incorporated into most GWAS. Vitamin D deficiency is common in RA and can regulate genes through vitamin D response elements (VDREs). I interrogated a large, publicly available VDRE ChIP-Seq dataset using a permutation testing approach to test for VDRE enrichment in RA loci. 
This study was the first comprehensive analysis of VDREs and RA associated variants and showed that they are enriched for VDREs, suggesting an involvement of vitamin D in RA. Indeed, evidence suggests that disease associated variants affect gene regulation through enhancer elements. These can act over large distances through physical interactions. A newly developed technique, Capture Hi-C, was used to identify regions of the genome which physically interact with associated variants for four autoimmune diseases. This study showed the complex physical interactions between genetic elements, which could be mediated by regions associated with disease. This work is pivotal in fully characterising genetic associations and determining their effect on disease. Further work has re-defined the 6q23 locus, a region associated with multiple diseases, resulting in a major re-evaluation of the likely causal gene in RA from TNFAIP3 to IL20RA, a druggable target, illustrating the huge potential of this research. Furthermore, it has been used to study the genetic associations unique to multiple sclerosis in the same region, showing chromatin interactions which support previously implicated genes and identify novel candidates. This could help improve our understanding and treatment of the disease. Bioinformatics is fundamental to fully exploiting new and existing datasets and has made many positive impacts on our understanding of complex disease. This empowers researchers to fully explore disease aetiology and to further the discovery of new therapies.
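
The permutation testing approach mentioned in the abstract above can be sketched generically as follows. The abstract does not describe the actual permutation scheme, so this is only a minimal illustration under simplifying assumptions: the genome is divided into hypothetical equal-sized bins, disease loci and binding sites are each given as bin indices, and the null distribution comes from drawing random bins uniformly. The function name and parameters are assumptions, not taken from the thesis.

```python
import random


def permutation_enrichment(test_bins, feature_bins, n_bins, n_perm=1000, seed=0):
    """Empirical p-value for overlap between test bins and feature bins.

    test_bins    -- bin indices of the loci of interest (hypothetical input)
    feature_bins -- bin indices containing the feature, e.g. binding sites
    n_bins       -- total number of bins the genome is divided into
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    feature_bins = set(feature_bins)
    observed = sum(1 for b in test_bins if b in feature_bins)
    k = len(test_bins)

    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Draw k random bins as a null set of "loci".
        sample = rng.sample(range(n_bins), k)
        overlap = sum(1 for b in sample if b in feature_bins)
        if overlap >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value
```

A real analysis would normally match the random loci to the observed ones on properties such as size and gene density rather than sampling uniformly; that refinement is omitted here.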

37

Reddy, Joseph. "Identification and Analysis of Important Proteins in Protein Interaction Networks Using Functional and Topological Information." Thesis, University of Skövde, School of Life Sciences, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-2395.

Abstract:

<p>Studying protein interaction networks using functional and topological information is important for understanding cellular organization and functionality. This study deals with identifying important proteins in protein interaction networks using SWEMODE (Lubovac, et al., 2006) and analyzing the topological and functional properties of these proteins with the help of information derived from modular organization in protein interaction networks as well as information available in public resources, in this case annotation sources describing the functionality of proteins. Multi-modular proteins are short-listed from the modules generated by SWEMODE. Properties of these short-listed proteins are then analyzed using functional information from SGD Gene Ontology (GO) (Dwight, et al., 2002) and MIPS functional categories (Ruepp, et al., 2004). Topological features such as lethality and centrality of these proteins are also investigated, using graph theoretic properties and information on lethal genes from Yeast Hub (Kei-Hoi, et al., 2005). The findings of the study based on GO terms reveal that these important proteins are mostly involved in the biological process of “organelle organization and biogenesis” and that a majority of these proteins belong to the MIPS “cellular organization” and “transcription” functional categories. A study of lethality reveals that multi-modular proteins are more likely to be lethal than proteins present only in a single module. An examination of centrality (degree of connectivity of proteins) in the network reveals that the ratio of the number of important proteins to the number of hubs at different hub sizes increases with the hub size (degree).</p>
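
The centrality comparison described at the end of the abstract above can be sketched as a small Python example. It assumes an undirected interaction network given as an edge list without duplicates or self-loops, and it assumes a hub is any protein whose degree is at least a chosen threshold; the thesis may define hubs differently. Function names and the toy data are hypothetical.

```python
from collections import defaultdict


def degrees(edges):
    """Degree (number of interaction partners) of each protein."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def important_to_hub_ratio(edges, important, thresholds):
    """Fraction of hubs that are 'important' proteins, per degree threshold.

    A hub is taken here to be any protein with degree >= threshold.
    Returns None for thresholds at which the network has no hubs.
    """
    deg = degrees(edges)
    important = set(important)
    ratios = {}
    for t in thresholds:
        hubs = {p for p, d in deg.items() if d >= t}
        ratios[t] = len(hubs & important) / len(hubs) if hubs else None
    return ratios


# Toy network: protein A has four partners, B and C two each, D and E one each.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "A")]
toy_ratios = important_to_hub_ratio(toy_edges, {"A", "D"}, [1, 2, 4])
```

In the toy network only protein A survives the highest threshold, so the ratio there is 1.0; real data would show whether the increasing trend reported in the abstract holds.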

38

Tangruksa, Benyapa. "Extracellular vesicles (EVs): roles in cell proliferation and transcriptomic analysis of HUVEC receiving cancer-derived EVs." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-20593.

Abstract:

Extracellular vesicles (EVs) are released by almost all types of cells. EVs play an important role in cell-to-cell communication by sending biomolecules such as mRNAs to other cells via endocytosis. This project aims to understand the roles of EVs and their potential application as mRNA delivery vehicles by completing two objectives. One objective was to investigate the roles of EVs in cell proliferation by routinely removing EVs from cell-conditioned media, transferring stress-induced EVs to recipient epithelial cells, and examining their cell number. The other objective was to analyze the transcriptomic expression of HUVEC cells treated with breast-cancer-derived EVs. EVs were routinely removed from the cell culture using 100 kDa filters, 15 kDa filters, or by changing to new media. The results showed that the epithelial cancer cell line grows at a lower rate when EVs are removed using 100 kDa filters and 15 kDa filters compared to the control. The stress-induced EV transfer experiment showed no significant difference in cell number among different stress conditions and stress-EV treatments. The transcriptomic analysis revealed 765 differentially expressed genes, which were mapped onto pathways using the IPA software. The majority of the top 10 significant pathways were associated with cancer progression. IPA’s Biomarker Filter function revealed 35 cancer biomarkers, as well as 33 putative angiogenesis biomarkers. VEGFA-targeted genes were identified, and the ones that are upregulated and located in the extracellular matrix or plasma membrane were identified as putative biomarkers for VEGFA-induced angiogenesis. The top 10 pathways suggest that the recipient HUVEC may receive cancer-related messages from the EVs. It is important to evaluate the content of EVs derived from different origins thoroughly. Cancer-derived EVs may induce angiogenesis in HUVEC, as shown previously, but cancer EVs might also communicate cancer signals that can cause pathological effects. 
Extensive studies and validation are needed to fully understand the role of EVs and their application as an mRNA delivering system.

39

Palma, Daniela. "Analysis of metatranscriptomes from an acidophilic electricity generating community treating acidic mining wastewaters." Thesis, Linnéuniversitetet, Institutionen för biologi och miljö (BOM), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-75933.

Abstract:

Humanity's constant need for metals requires unsustainable mining and refining of metal ore. As a result, highly contaminated wastewaters are discharged into the environment, compromising the nearby habitat together with all its life forms. Microbial fuel cells (MFCs) are bioelectrochemical systems (BESs) that use microorganisms to convert organic and inorganic matter, producing electricity as the final product. This technology has shown great potential for application in the bioremediation of wastewaters. This thesis describes the gene expression and the taxonomical abundance of an acidophilic, electricity generating community that was used to treat mining wastewaters in a microbial fuel cell. A complete metatranscriptomics analysis has been performed on duplicate MFC anode acidophilic microbial communities generating electricity from inorganic sulfur compound (ISC) oxidation at extremely low pH. The analysis shows that the most expressed genus is Ferroplasma-like, followed by Acidithiobacillus-like along with Sulfobacillus-like and Thermoplasma-like genera. Some of the genera expressed show behaviours never described before, suggesting that potentially new species have been selected. The reactions of the sulfur pathway are regulated mostly by two genera: Acidithiobacillus-like during the disproportionation of tetrathionate, and Ferroplasma-like by expressing the hdr gene that catalyses the reaction from elemental sulfur to sulfite; the sulfite is then converted to sulfate. The hdr gene has not previously been found in F. acidarmanus-like, suggesting that the species might have been selected for this trait. The Acidithiobacillus-like genus has a bigger role in energy conservation and electron transport in the sample; however, the data are not sufficient to point out which gene has the major role in the process. The CO2 fixation in the chamber was considerably low as a result of a significant production of carboxysomes, bacterial compartments involved in carbon dioxide fixation. 
The transcript abundance of the metal resistance genes showed low expression, suggesting that the cells were not under stress. This result is indicated by the synthesis of a transcriptional repressor protein that had prevented a significant production of metal resistance enzymes. Likewise, the pH homeostasis plot does not show vast transcript abundances, indicating that the cells were thriving under conditions not far from the optimum.

40

Gunnarsson, Ida. "Deriving Protein Networks by Combining Gene Expression and Protein Chip Analysis." Thesis, University of Skövde, Department of Computer Science, 2002. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-706.

Abstract:

<p>In order to derive reliable protein networks, it has recently been suggested that the combination of information from both the gene and protein level is required. In this thesis a combination of gene expression and protein chip analysis was performed when constructing protein networks. Proteins with high affinity to the same substrates and encoded by genes with high correlation are here thought to constitute reliable protein networks. The protein networks derived are unfortunately not as reliable as was hoped. According to the tests performed, the method derived in this thesis performs only slightly better than chance. However, the poor results may be due to the data used, since mismatches and a shortage of data have been evident.</p>

41

Oguchi, Chizoba. "A Comparison of Sensitive Splice Aware Aligners in RNA Sequence Data Analysis in Leaping towards Benchmarking." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-18513.

Abstract:

Bioinformatics, as a field, develops rapidly, and such development requires the design of algorithms and software. RNA-seq provides robust information on RNAs, both already known and new, hence the increased study of RNA. Alignment is an important step in downstream analyses, and the ability to map reads across splice junctions is a requirement for an aligner to be suitable for mapping RNA-seq reads. There is therefore a need for a standard splice-aware aligner. STAR, Rsubread and HISAT2 have not been singly studied for the purpose of benchmarking one of them as a standard aligner for spliced RNA-seq reads. This study compared these aligners, found to be sensitive to splice sites, with regard to their sensitivity to splice sites, performance with default parameter settings and resource usage during the alignment process. The aligners were matched with featureCounts. The results show that STAR and Rsubread outperform HISAT2 in the aspects of sensitivity and default parameter settings. Rsubread was more sensitive to splice junctions than STAR but underperformed with featureCounts. STAR had a consistent performance, with more demand on memory and time resources, but showed it could be more sensitive with real data.

42

Conners, Shannon Burns. "Carbohydrate utilization pathway analysis in the hyperthermophile Thermotoga maritima." NCSU, 2005. http://www.lib.ncsu.edu/theses/available/etd-11302005-092748/.

Abstract:

Carbohydrate utilization and production pathways identified in Thermotoga species likely contribute to their ubiquity in hydrothermal environments. Many carbohydrate-active enzymes from Thermotoga maritima have been characterized biochemically; however, sugar uptake systems and regulatory mechanisms that control them have not been well defined. Transcriptional data from cDNA microarrays were examined using mixed effects statistical models to predict candidate sugar substrates for ABC (ATP-binding cassette) transporters in T. maritima. Genes encoding proteins previously annotated as oligopeptide/dipeptide ABC transporters responded transcriptionally to various carbohydrates. This finding was consistent with protein sequence comparisons that revealed closer relationships to archaeal sugar transporters than to bacterial peptide transporters. In many cases, glycosyl hydrolases, co-localized with these transporters, also responded to the same sugars. Putative transcriptional repressors of the LacI, XylR, and DeoR families were likely involved in regulating genomic units for beta-1,4-glucan, beta-1,3-glucan, beta-1,4-mannan, ribose, and rhamnose metabolism and transport. Carbohydrate utilization pathways in T. maritima may be related to ecological interactions within cell communities. Exopolysaccharide-based biofilms composed primarily of beta-linked glucose, with small amounts of mannose and ribose, formed under certain conditions in both pure T. maritima cultures and mixed cultures of T. maritima and M. jannaschii. Further examination of transcriptional differences between biofilm-bound sessile cells and planktonic cells revealed differential expression of beta-glucan-specific degradation enzymes, even though maltose, an alpha-1,4 linked glucose disaccharide, was used as a growth substrate. 
Higher transcripts of genes encoding iron and sulfur compound transport, iron-sulfur cluster chaperones, and iron-sulfur cluster proteins suggest altered redox environments in biofilm cells. Further direct comparisons between cellobiose and maltose-grown cells suggested that transcription of cellobiose utilization genes is highly sensitive to the presence of cellobiose, or a cellobiose-maltose mixture. Increased transcripts of genes related to polysulfide reductases in cellobiose-grown cells and biofilm cells suggested that T. maritima cells in pure culture biofilms escaped hydrogen inhibition by preferentially reducing sulfur compounds, while cells in mixed culture biofilms form close associations with hydrogen-utilizing methanogens. In addition to probing issues related to the microbial physiology and ecology of T. maritima, this work illustrates the strategic use of DNA microarray-based transcriptional analysis for functional genomics studies.

43

Le Faucheur, Xavier Jean Maurice. "Statistical methods for feature extraction in shape analysis and bioinformatics." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33911.

Abstract:

The presented research explores two different problems of statistical data analysis. In the first part of this thesis, a method for 3D shape representation, compression and smoothing is presented. First, a technique for encoding non-spherical surfaces using second generation wavelet decomposition is described. Second, a novel model is proposed for wavelet-based surface enhancement. This part of the work aims to develop an efficient algorithm for removing irrelevant and noise-like variations from 3D shapes. Surfaces are encoded using second generation wavelets, and the proposed methodology consists of separating noise-like wavelet coefficients from those contributing to the relevant part of the signal. The empirical-based Bayesian models developed in this thesis threshold wavelet coefficients in an adaptive and robust manner. Once thresholding is performed, irrelevant coefficients are removed and the inverse wavelet transform is applied to the clean set of wavelet coefficients. Experimental results show the efficiency of the proposed technique for surface smoothing and compression. The second part of this thesis proposes using a non-parametric clustering method for studying RNA (RiboNucleic Acid) conformations. The local conformation of RNA molecules is an important factor in determining their catalytic and binding properties. RNA conformations can be characterized by a finite set of parameters that define the local arrangement of the molecule in space. Their analysis is particularly difficult due to the large number of degrees of freedom, such as torsion angles and inter-atomic distances among interacting residues. In order to understand and analyze the structural variability of RNA molecules, this work proposes a methodology for detecting repetitive conformational sub-structures along RNA strands. Clusters of similar structures in the conformational space are obtained using a nearest-neighbor search method based on the statistical mechanical Potts model. 
The proposed technique is a mostly automatic clustering algorithm and, in contrast to many other clustering techniques, may be applied to problems where there is no prior knowledge of the structure of the data space. First, results are reported for both single-residue conformations, where the parameter set of the data space includes four to seven torsional angles, and base-pair geometries. For both types of data sets, a very good match is observed between the results of the proposed clustering method and other known classifications, with only a few exceptions. Second, new results are reported for base-stacking geometries. In this case, the proposed classification is validated with respect to specific geometrical constraints, while the content and geometry of the new clusters are fully analyzed.
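
The thresholding step described in the first part of this abstract (transform, remove noise-like coefficients, invert) can be illustrated with a minimal sketch. It rests on loud simplifying assumptions that are not taken from the thesis: a one-level Haar transform on a 1D signal of even length stands in for second-generation wavelets on surfaces, and a fixed hard threshold `t` stands in for the adaptive Bayesian thresholding. All function names are hypothetical.

```python
import math

SCALE = 1 / math.sqrt(2)  # orthonormal Haar scaling factor


def haar_forward(signal):
    """One-level Haar transform of an even-length list.

    Returns (approximation, detail) coefficient lists.
    """
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, rebuilding the signal from both coefficient lists."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SCALE)
        out.append((a - d) * SCALE)
    return out


def hard_threshold(coeffs, t):
    """Zero every coefficient whose magnitude does not exceed t (assumed fixed)."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def denoise(signal, t):
    """Threshold only the detail coefficients, then apply the inverse transform."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, hard_threshold(detail, t))
```

Calling `denoise([1.0, 1.1, 5.0, 4.9], 0.5)` removes the small pairwise fluctuations and keeps the large step between the two halves of the signal. Choosing the threshold per coefficient from a statistical model, as the abstract describes, would replace the fixed `t` used here.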

44

Akman, Kemal [author], and Klaus [academic supervisor] Förstemann. "Bioinformatics of DNA Methylation analysis / Kemal Akman. Supervisor: Klaus Förstemann." München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2014. http://d-nb.info/1071803514/34.

45

Liu, Xuan. "Sequential and parallel algorithms for sequence analysis problems in bioinformatics." Thesis, Loughborough University, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.506205.

46

Lees, Karen. "Data projections for the analysis and visualisation of bioinformatics data." Thesis, University of Oxford, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.496994.

47

Wang, Yan, and 王嫣. "Bioinformatics analysis of genetic and epigenetic factors regulating gene expression." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2015. http://hdl.handle.net/10722/208546.

48

Sigurgeirsson, Benjamín. "Analysis of RNA and DNA sequencing data : Improved bioinformatics applications." Doctoral thesis, KTH, Genteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-184158.

Abstract:

Massively parallel sequencing has rapidly revolutionized DNA and RNA research. Sample preparations are steadily advancing, sequencing costs have plummeted and throughput is ever growing. This progress has resulted in exponential growth in data generation with a corresponding demand for bioinformatic solutions. This thesis addresses methodological aspects of this sequencing revolution and applies them to selected biological topics. Papers I and II are technical in nature and concern sample preparation and data analysis of RNA sequencing data. Paper I is focused on RNA degradation and paper II on generating strand-specific RNA-seq libraries. Papers III and IV deal with current biological issues. In paper III, whole exomes of cancer patients undergoing chemotherapy are sequenced and their genetic variants associated with their toxicity-induced adverse drug reactions. In paper IV, a comprehensive view of the gene expression of the endometrium is assessed from two time points of the menstrual cycle. Together these papers show relevant aspects of contemporary sequencing technologies and how they can be applied to diverse biological topics.

49

Moulin, Serge. "Use of data analysis techniques to solve specific bioinformatics problems." Thesis, Bourgogne Franche-Comté, 2018. http://www.theses.fr/2018UBFCD049/document.

Abstract:

Nowadays, the quantity of sequenced genetic data is increasing exponentially, driven by increasingly powerful sequencing tools, high-throughput sequencing tools in particular. In addition, these data are increasingly accessible through online databases. This greater availability of data opens up new areas of study that require statisticians and bioinformaticians to develop appropriate tools. Moreover, constant progress in statistics, in areas such as clustering, dimensionality reduction, and regression, among others, needs to be regularly adapted to the context of bioinformatics. The objective of this thesis is the application of advanced statistical techniques to bioinformatics problems. In this manuscript we present the results of our work on the clustering of genetic sequences via Laplacian eigenmaps and a Gaussian mixture model, the study of the propagation of transposable elements in the genome via a branching process, the analysis of metagenomic data in ecology via ROC curves, and ordinal polytomous regression penalized by the l1-norm.

50

Lember, Geivi. "Sepsis-associated Escherichia coli whole-genome sequencing analysis using in-house developed pipeline and 1928 diagnostics tool." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-19841.

Abstract:

Sepsis is a life-threatening condition that is caused by a dysregulated host response to infection. Timely detection of sepsis and antibiotic treatment are important for the patient's recovery. Usually, when sepsis is detected, treatment is started immediately with broad-spectrum antibiotics, as it takes time to determine the correct antibiotic susceptibility. To overcome this problem, next-generation sequencing is seen as one possible future development in clinical diagnostics. Automated bioinformatics pipelines could be used initially for surveillance purposes but eventually for rapid clinical diagnosis. Therefore, the results of 1928 Diagnostics, an automated pipeline for whole-genome sequencing (WGS) data analysis, were compared with the results of an in-house developed pipeline for manual data processing by analyzing sepsis-associated Escherichia coli (SEPEC) WGS data. The pipelines were compared by assessing their predicted antimicrobial resistance (AMR) genes, virulence genes and epidemiological relatedness. In addition, the predicted resistance genes were compared to phenotypic antimicrobial susceptibility testing (AST) data from the clinical microbiology laboratory. The results obtained from 1928 Diagnostics and the in-house pipeline were similar overall but differed in the number of virulence and predicted AMR genes, AMR gene variants, detection of species, and epidemiologically related E. coli samples. Moreover, the predicted AMR genes from both pipelines did not show a good overall relation to the phenotypic AST result. More studies are needed to make predictions of genes from the WGS analysis more reliable so that WGS analysis can be used as a diagnostics tool in clinical laboratories in the future.
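
The two comparisons described in this abstract (gene calls from one pipeline against the other, and genotype-based predictions against phenotypic AST results) can be sketched as set operations. This is only an illustration under stated assumptions: the function names are hypothetical, the inputs are made-up example data rather than results from the thesis, and neither pipeline's actual output format is represented.

```python
def compare_gene_calls(pipeline_a, pipeline_b):
    """Split the gene calls of two pipelines into shared and pipeline-specific sets."""
    a, b = set(pipeline_a), set(pipeline_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


def categorical_agreement(predicted_resistant, phenotype):
    """Fraction of tested antibiotics where the genotype prediction matches AST.

    predicted_resistant: set of antibiotics predicted resistant from AMR genes.
    phenotype: dict mapping antibiotic name to "R" (resistant) or "S" (susceptible).
    """
    matches = sum(
        (antibiotic in predicted_resistant) == (call == "R")
        for antibiotic, call in phenotype.items()
    )
    return matches / len(phenotype)


# Made-up example data, for illustration only.
calls = compare_gene_calls(["blaTEM-1B", "tet(A)"], ["tet(A)", "sul1"])
agreement = categorical_agreement(
    {"ampicillin", "ciprofloxacin"},
    {"ampicillin": "R", "ciprofloxacin": "S", "tetracycline": "R"},
)
```

Here `calls["shared"]` holds the genes both pipelines report, and `agreement` is the share of antibiotics where prediction and phenotype coincide. A real evaluation would also need a curated mapping from genes to antibiotics, which this sketch takes as given.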
