Dissertations / Theses on the topic 'Bioinformatic'

Consult the top 50 dissertations / theses for your research on the topic 'Bioinformatic.'

1

Hedlund, Joel. "Bioinformatic protein family characterisation." Doctoral thesis, Linköpings universitet, Bioinformatik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-61754.

Abstract:
Biological research is necessary; not only to further our understanding of the processes of life, but also to combat disease, hunger and environmental damage. Bioinformatics is the science of handling biological information. It entails integrating, structuring and analysing the ever-increasing amounts of available biological data. In practice it means using computers to analyse huge amounts of very complicated data taken from a field that is only partially understood, to see the hidden trends and connections, and to draw useful conclusions. My thesis work has mainly concerned the study of protein families, which are groups of evolutionarily related proteins. I have analysed known protein families and created predictive models for them, and developed algorithms for defining new protein families. My principal techniques have been sequence alignments and hidden Markov models (HMM). To aid my work, I have written a lot of software, including MSAView, a visualiser for multiple sequence alignments (MSA). In this thesis, the protein family of inorganic pyrophosphatases (H+-PPases) is studied, as well as the two protein superfamilies BRICHOS and MDR (medium-chain dehydrogenases/reductases). The H+-PPases are tightly membrane-bound, proton-pumping, dimeric enzymes with ~700-residue subunits that are found in bacteria, plants and eukaryotic parasites and that use pyrophosphate as an alternative to ATP. The BRICHOS superfamily is only present in higher eukaryotes, but encompasses at least 8 protein families with a wide range of functions and disease associations, such as respiratory distress syndrome, dementia and cancer. The sequences are typically ~200 residues with even shorter functional forms.
Finally, MDR is a large and complex protein superfamily; it currently has over 16000 members, it is present in all kingdoms of life, the pairwise sequence identity is typically around 25 %, the chain lengths vary as does the oligomericity, and the members take part in a multitude of biological processes. The member families include the classical liver alcohol dehydrogenase (ADH), quinone reductase, leukotriene B4 dehydrogenase, and many more forms. There are at least 25 human MDR genes excluding close homologues. There are HMMs available for detecting MDR superfamily membership, but none for the individual families. For the H+-PPase family, we characterised member sequences found using an HMM of a conserved 57-residue region thought to form part of the active site. This region was found to contain two highly conserved nonapeptides, mainly consisting of the four “very early” residues Gly, Ala, Val and Asp, compatible with an ancient origin of the family. The two patterns have charged amino acid residues at positions 1, 5 and 9, are apparent binding sites for the substrate and parts of the active site, and were shown to be so specific for these enzymes that they can be used for automated annotation of new sequences. For the BRICHOS superfamily, we were able to find three previously unknown member families; group A, which may be ancestral to the ITM2 families (integral membrane protein 2); group B, which is a close relative to the gastrokine families, and group C, which appears to be a truly novel, disjoint BRICHOS family. The C-terminal region of group C has nearly identical sequences in all species ranging from fish to man and is seemingly unique to this family, indicating critical functional or structural properties. For the MDR superfamily, we characterised and built stable HMMs for 17 member families using an empirical approach.
From our experiences we were able to develop an algorithm for automated HMM refinement that uses relationships in data to produce stable and reliable classifiers, and we used it to produce HMMs for 86 distinct MDR families. We have made the program freely available and it can be readily applied to other protein families. We also developed a web site (http://mdr–enzymes.org) that makes our findings directly useful also for non-bioinformaticians. In our analyses of the 86 families, we found that MDR forms with 2 Zn2+ ions in general are dehydrogenases, while MDR forms with no Zn2+ in general are reductases. Furthermore, in Bacteria, MDRs without Zn2+ are more frequent than those with Zn2+, while the opposite is true for eukaryotic MDRs, indicating that Zn2+ has been recruited into the MDR superfamily after the initial life kingdom separations. Multiple sequence alignments (MSA) play a central part in most work on protein families, and are integral to many bioinformatic methods. With the ongoing explosive increase of available sequence data, the scales of bioinformatic projects are growing, and efficient and human-friendly data visualisation becomes increasingly challenging, but is still essential for making new interpretations and discovering unexpected properties of the data. Ideally, visualisation should be comprehensive and detailed, and never distract with irrelevant information. It needs to offer natural and responsive ways of exploring the data, as well as provide consistent views in order to facilitate comparisons between datasets. I therefore developed MSAView, which is a fast, modular, configurable and extensible package for analysing and visualising MSAs and sequence features. It has a graphical user interface and a powerful command line client, and can be imported as a package into any Python program. It has a plugin architecture and a user extendable preset library. 
It can integrate and display data from online sources and launch external viewers for showing additional details. It also includes two new conservation measures; alignment divergences, which indicate atypical residues or deletions, and sequence conformances, which highlight sequences that differ from their siblings at crucial positions. In conclusion, this thesis details my work in analysing two protein superfamilies and one protein family using bioinformatic methods; developing an algorithm for automated generation of stable and reliable HMMs, as well as a new conservation measure, and a software platform for working with aligned sequences.
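The nonapeptide pattern mentioned in this abstract (charged residues at positions 1, 5 and 9) can be illustrated with a simple sequence scan. This is a minimal sketch and not the thesis's HMM-based method: the function name, the regular expression, and the choice of Asp, Glu, Lys and Arg as the "charged" residues are assumptions made for illustration only.

```python
import re

# Assumption: "charged" means Asp, Glu, Lys or Arg (one-letter codes D, E, K, R).
CHARGED = "DEKR"

# A lookahead is used so that overlapping nine-residue windows are all reported.
NONAPEPTIDE = re.compile(
    rf"(?=([{CHARGED}].{{3}}[{CHARGED}].{{3}}[{CHARGED}]))"
)


def find_charged_nonapeptides(sequence):
    """Return (1-based start, peptide) for every nine-residue window
    that has a charged residue at positions 1, 5 and 9."""
    return [
        (match.start() + 1, match.group(1))
        for match in NONAPEPTIDE.finditer(sequence.upper())
    ]


# Toy sequence (hypothetical, not taken from any real H+-PPase).
print(find_charged_nonapeptides("AADGAVDGAVDAA"))
```

A scan like this only checks the charge positions; the conservation of the remaining residues, which the abstract attributes to the two patterns, would need a profile or HMM.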
2

Kallberg, Yvonne. "Bioinformatic methods in protein characterization /." Stockholm, 2002. http://diss.kib.ki.se/2002/91-7349-370-8/.

3

Li, Yvonne Yiyuan. "Bioinformatic approaches to drug repositioning." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/39934.

Abstract:
Repositioning existing drugs for new therapeutic uses is an efficient approach to drug discovery. However, most successful repositioning cases to date have been serendipitous; the goal of my thesis was to use computational methods to rationally discover drug repositioning candidates. I first virtually screened (VS) 4621 drugs against 252 drug targets with molecular docking. This method emphasized removing potential false positives using stringent criteria from known interaction docking, consensus scores, and rank information. Published literature indicated experimental evidence for 31 top predicted interactions, supporting the approach. The chemotherapeutic nilotinib was validated as a potent MAPK14 inhibitor in vitro (IC50 40nM), suggesting a potential use in inflammatory diseases. I then applied this method to the cancer target EGFR, predicting the anti-HIV drug tenofovir disoproxil fumarate (TDF) as a novel inhibitor. In vitro, TDF inhibited the proliferation and EGFR-signaling of an EGFR-overexpressing cell line, but did not inhibit EGFR in direct kinase binding assays. This study highlighted limitations of computational and experimental methodologies that should be considered when interpreting or designing other studies. We then screened 1,120 off-patent drugs against the triple-negative breast cancer (TNBC) target p90RSK using both VS and high-throughput (HTS) methods. VS predicted a set of compounds 26-times enriched for known RSK inhibitors and 11 times enriched for HTS hits, underscoring its efficiency. In secondary screens, the chemotherapeutic ellipticine and the bioflavonoids luteolin and apigenin inhibited RSK activity (IC50 0.50-4.77μM), blocked RSK signaling, and inhibited TNBC cell proliferation. These drugs thus have potential to be repositioned to TNBC. Finally, we rationally repositioned renal cell carcinoma drugs for a patient with a rare tongue adenocarcinoma. 
Whole genome and transcriptome sequencing of the patient’s tumor and normal cells detected sequence, copy number, and expression aberrations, and analysis suggested that the tumor was driven by the RET oncogene. Treatment with RET-inhibiting drugs stabilized the disease for eight months, after which the disease progressed. We also sequenced the post-treatment tumor and found changes consistent with acquired therapeutic resistance. Overall, this thesis details two novel high-throughput approaches for drug repositioning: virtual screening of drugs and targets and personalized medicine via sequencing.
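The enrichment figures quoted in this abstract ("26-times enriched", "11 times enriched") are typically computed as an enrichment factor: the hit rate in the selected subset divided by the hit rate in the whole library. The sketch below assumes that standard definition; the abstract does not state the exact formula used, and the function name and toy numbers are hypothetical.

```python
def enrichment_factor(selected, library, known_actives):
    """Hit rate among the selected compounds divided by the hit rate
    in the full screening library (assumed, standard definition)."""
    selected = set(selected)
    library = set(library)
    actives = set(known_actives)
    if not selected or not library:
        raise ValueError("selected and library must be non-empty")
    base_rate = len(library & actives) / len(library)
    if base_rate == 0:
        raise ValueError("library contains no known actives")
    selected_rate = len(selected & actives) / len(selected)
    return selected_rate / base_rate


# Toy numbers (hypothetical): 100 compounds, 4 actives, 10 selected, 2 of them active.
library = range(100)
actives = [0, 1, 2, 3]
selected = [0, 1, 10, 11, 12, 13, 14, 15, 16, 17]
print(enrichment_factor(selected, library, actives))
```

An enrichment factor of 1 means the selection does no better than picking compounds at random.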
4

Weinstein, Earl G. 1974. "MicroRNA cloning and bioinformatic analysis." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/8390.

Abstract:
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Biology, 2002.
Includes bibliographical references.
Part I. Two gene-regulatory noncoding RNAs (ncRNAs), let-7 RNA and lin-4 RNA, were previously discovered in the C. elegans genome. The let-7 gene is conserved across a wide range of genomes, suggesting that these ncRNAs represent a wider class of gene-regulatory RNAs. Both lin-4 and let-7 RNAs are generated from stem-loop precursor RNAs, and share a common biochemical signature, namely 5'-terminal phosphate and 3'-terminal hydroxyl groups. We refer to ncRNAs that share the characteristic size, biochemical signature, and precursor structures of let-7 and lin-4 as microRNAs (miRNAs). The size of this class of genes, and its prevalence in other genomes, are unknown. Therefore, we developed an experimental and bioinformatics strategy to identify novel miRNA genes. We discovered a total of 75 miRNA genes in the C. elegans genome, and orthologues for a majority of these were computationally identified in the C. briggsae, D. melanogaster or H. sapiens genomes. Northern analysis was used to confirm and analyze the expression of these miRNAs. The data set has implications for understanding miRNA gene regulation, miRNA processing, and regulation of miRNA genes. Part II. Directed molecular evolution has previously been applied to generate RNAs with novel structures and functions. This method works because nucleic acids can be selected, randomized, amplified and characterized using polymerase chain reaction (PCR)-based methods. Here we present a novel method for extending directed molecular evolution to the realm of peptide selections by linking a peptide to its encoding mRNA.
A proof of principle selection for two different peptides indicates that this tRNA should prove useful in discovering more complex protein molecules using directed molecular evolution.
5

Leonardi, Emanuela. "Bioinformatic Analysis of Protein Mutations." Doctoral thesis, Università degli studi di Padova, 2012. http://hdl.handle.net/11577/3426280.

Abstract:
Many gene defects have been associated with genetic disorders, but the details of the molecular mechanisms by which they contribute to disease are often unclear. The study of mutation effects at the protein level can help elucidate the biological processes involved in the disease and the role of the protein in it. Bioinformatics can help to address this problem, being the connection between different disciplines including clinical medicine, genetics, structural biology, and biochemistry. Using a computational approach, I analysed several examples of biomedically interesting proteins, integrating various sources of data and guiding experimental and clinical investigations. Experimentally defined structures and molecular modelling were used as a basis to determine the protein structure-function relationship, which is essential to gain insights into disease genotype-phenotype correlation. Proteins have been further analyzed in their context, considering the interactions they take part in within specific cellular compartments. The results have been used to formulate functional hypotheses, which in some cases have been tested and confirmed by further investigations performed by collaborating groups. Mutations found in genes encoding these proteins have been evaluated for their impact on the protein structure and function by using several available prediction methods. These studies provided the idea for developing novel approaches, using residue interaction networks and an ensemble of methods. A novel strategy has also been designed to evaluate genomic data obtained by next generation sequencing technology. This consists of using available resources and software to prioritize rare functional variants and estimate their contribution to the disease. The novel approaches developed in this thesis have been applied and assessed at the Critical Assessment of Genome Interpretation (CAGI) experiment in 2011, providing in some cases very successful results.
Genetic alterations have been identified for many genetic diseases, but in many cases the molecular mechanisms that contribute to the onset of the disease are still unclear. Studying the effects of mutations at the protein level helps clarify the biological processes involved in the disease and the role of the protein in it. Bioinformatics can help address this problem by acting as the point of connection between different disciplines such as clinical medicine, genetics, structural biology and biochemistry. In this thesis I used a computational approach to analyse several examples of proteins of biomedical interest, integrating different data resources and guiding experimental and clinical research. Protein structures determined experimentally or by molecular modelling were used as a basis for determining the structure-function relationship, which is essential for obtaining information on genotype-phenotype correlation. The proteins under study were also analysed in their context, considering the interactions that take place with other proteins or ligands in the different cellular compartments. The results of the bioinformatic analysis were then used to formulate functional hypotheses, which in some cases were tested and confirmed experimentally by other research groups. The mutations identified in the genes encoding the proteins under study were evaluated for their impact on protein structure and function using numerous prediction methods available online. The various applications described in this thesis provided the idea for developing new computational approaches for the structural and functional characterisation of proteins and their mutants. Prediction was found to improve when an ensemble of the different available prediction methods is used.
In addition, a new computational approach was devised for predicting the effects of mutations, which uses residue interaction networks to represent protein structure. These methods were also used in the analysis of genomic data generated by new sequencing technologies. This field requires new investigative strategies for identifying a few causative variants among an enormous number of identified variants of uncertain significance. To this end, an analysis strategy is proposed that uses information derived from protein interaction networks. The new approaches formulated in this thesis were applied and evaluated in a new international experiment called the Critical Assessment of Genome Interpretation (CAGI), in some cases yielding excellent results.
6

Bertoldi, Loris. "Bioinformatics for personal genomics: development and application of bioinformatic procedures for the analysis of genomic data." Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3421950.

Abstract:
In the last decade, the sharp decrease in sequencing costs due to the development of high-throughput technologies has completely changed the way genetic problems are approached. In particular, whole exome and whole genome sequencing are contributing to extraordinary progress in the study of human variants, opening up new perspectives in personalized medicine. Because this is a relatively new and fast-developing field, appropriate tools and specialized knowledge are required for efficient data production and analysis. In line with the times, in 2014, the University of Padua funded the BioInfoGen Strategic Project with the goal of developing technology and expertise in bioinformatics and molecular biology applied to personal genomics. The aim of my PhD was to contribute to this challenge by implementing a series of innovative tools and by applying them to investigate and possibly solve the case studies included in the project. I first developed an automated pipeline for dealing with Illumina data, able to sequentially perform each step necessary for passing from raw reads to somatic or germline variant detection. The system performance has been tested by means of internal controls and by its application to a cohort of patients affected by gastric cancer, obtaining interesting results. Once variants are called, they have to be annotated in order to define their properties, such as the position at transcript and protein level, the impact on protein sequence, the pathogenicity and more. As most of the publicly available annotators were affected by systematic errors causing low consistency in the final annotation, I implemented VarPred, a new tool for variant annotation, which guarantees the best accuracy (>99%) compared to the state-of-the-art programs, while also showing good processing times. To make VarPred easy to use, I equipped it with an intuitive web interface that allows not only graphical evaluation of the results, but also a simple filtration strategy.
Furthermore, for a valuable user-driven prioritization of human genetic variations, I developed QueryOR, a web platform suitable for searching among known candidate genes as well as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. The prioritization is achieved by a global positive selection process that promotes the emergence of the most reliable variants, rather than filtering out those not satisfying the applied criteria. QueryOR has been used to analyze the two case studies framed within the BioInfoGen project. In particular, it allowed the detection of causative variants in patients affected by lysosomal storage diseases, also highlighting the efficacy of the designed sequencing panel. In addition, QueryOR simplified the recognition of the LRP2 gene as a possible candidate to explain subjects with a Dent disease-like phenotype but with no mutation in the previously identified disease-associated genes, CLCN5 and OCRL. As a final corollary, an extensive analysis of recurrent exome variants was performed, showing that their origin can be mainly explained by inaccuracies in the reference genome, including misassembled regions and uncorrected bases, rather than by platform-specific errors.
Over the last decade, the enormous fall in sequencing costs brought about by the development of high-throughput technologies has completely revolutionised the way genetic problems are approached. In particular, whole-exome and whole-genome sequencing are contributing to extraordinary progress in the study of human genetic variants, opening up new perspectives in personalised medicine. As this is a relatively new and rapidly developing field, appropriate tools and specialised knowledge are required for efficient data production and analysis. To keep pace with the times, in 2014 the University of Padua funded the BioInfoGen strategic project with the aim of developing technologies and expertise in bioinformatics and molecular biology applied to personal genomics. The aim of my PhD was to contribute to this challenge by implementing a series of innovative tools and applying them to investigate, and possibly solve, the case studies included in the project. I first developed a pipeline for analysing Illumina data, capable of running in sequence all the processes needed to go from raw data to the discovery of both germline and somatic variants. The performance of the system was tested using internal controls and by applying it to a group of patients affected by gastric cancer, with interesting results. Once called, variants must be annotated in order to define some of their properties, such as position at the transcript and protein level, impact on the protein sequence, pathogenicity, and so on.
Because most of the available annotators had systematic errors that caused low consistency in the final annotation, I implemented VarPred, a new tool for variant annotation that guarantees the best accuracy (>99%) compared with the state of the art, while also showing good execution times. To make VarPred easier to use, I developed a very intuitive web interface that allows not only graphical display of the results but also a simple filtering strategy. In addition, for effective user-driven prioritisation of human variants, I developed QueryOR, a web platform suited to searching among causative genes but also useful for finding new gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. Prioritisation is achieved through a positive selection process that brings out the most significant variants, rather than filtering out those that do not satisfy the imposed criteria. QueryOR was used to analyse the two case studies included in the BioInfoGen project. In particular, it made it possible to discover the causative variants in patients affected by lysosomal storage diseases, also highlighting the effectiveness of the sequencing panel developed. QueryOR also simplified the identification of the LRP2 gene as a possible candidate to explain subjects with a phenotype resembling Dent disease but with no mutation in the two genes previously described as causative, CLCN5 and OCRL. As a final corollary, an extensive analysis of recurrent exome variants was carried out, showing that their origin can mainly be explained by inaccuracies in the reference genome, including misassembled regions and uncorrected bases, rather than by platform-specific errors.
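The contrast this abstract draws between positive selection and hard filtering can be sketched in a few lines: every variant is kept and ranked by how many criteria it satisfies, instead of being discarded at the first failed criterion. This is only an illustration under stated assumptions and not QueryOR's actual scoring: the function, the field names (`allele_freq`, `impact`, `gene`) and the thresholds are hypothetical, and only the gene names come from the abstract.

```python
def prioritize(variants, criteria):
    """Rank variants by the number of criteria they satisfy.
    No variant is removed; low scorers simply sink to the bottom."""
    scored = [
        (sum(1 for satisfied in criteria if satisfied(variant)), variant)
        for variant in variants
    ]
    # sort() is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical criteria; a real analysis would choose its own.
candidate_genes = {"LRP2", "CLCN5", "OCRL"}
criteria = [
    lambda v: v["allele_freq"] < 0.01,      # rare in the population
    lambda v: v["impact"] == "missense",    # changes the protein sequence
    lambda v: v["gene"] in candidate_genes, # falls in a gene of interest
]

# Hypothetical variants, invented for the example.
variants = [
    {"id": "a", "allele_freq": 0.001, "impact": "missense", "gene": "LRP2"},
    {"id": "b", "allele_freq": 0.2, "impact": "synonymous", "gene": "GENE_X"},
    {"id": "c", "allele_freq": 0.005, "impact": "synonymous", "gene": "CLCN5"},
]

for score, variant in prioritize(variants, criteria):
    print(score, variant["id"])
```

Under a strict filter, variant "c" would be lost because it is not missense; with ranking it stays visible in second place.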
7

Markstedt, Olof. "Kubernetes as an approach for solving bioinformatic problems." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-330217.

Abstract:
The cluster orchestration tool Kubernetes enables easy deployment and reproducibility of life science research by utilizing the advantages of container technology. Container technology allows tools to be created and shared easily, and a container runs on any Linux system once it has been built. The applicability of Kubernetes as an approach to running bioinformatic workflows was evaluated, resulting in some examples of how Kubernetes and containers could be used within the field of life science and how they should not be used. The resulting examples serve as proofs of concept and convey the general idea of how implementation is done. Kubernetes allows for easy resource management and includes automatic scheduling of workloads. It scales rapidly and has some interesting components that are beneficial when conducting life science research.
8

Hull, Duncan. "Semantic matching of bioinformatic web services." Thesis, University of Manchester, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.497578.

Abstract:
Understanding bioinformatic data on the Web often requires the interoperation of heterogeneous and autonomous services. Unfortunately, getting many different services to interoperate is problematic, and frequently requires cumbersome shim components which can be difficult to describe and discover using existing techniques. The use of description logic reasoning has been proposed as a method for improving discovery of services, by classifying advertisements and matchmaking them with requests on the semantic Web. However, theoretical approaches to reasoning with semantic Web services have not been adequately tested on realistic scenarios while practical approaches have not fully investigated or applied useful aspects of current theory.
9

Cova, Marta Alexandra Mendonça Nóbrega. "Bioinformatic analysis of the neuronal phosphoproteome." Master's thesis, Universidade de Aveiro, 2013. http://hdl.handle.net/10773/11623.

Abstract:
Master's degree in Molecular Biomedicine
Abnormal protein phosphorylation is one of the key features of Alzheimer’s disease (AD) and may be involved both in the pathogenesis and in the progression of the disease. Reversible protein phosphorylation is an important regulatory mechanism involving the activity of phosphoprotein phosphatases (PPP) and protein kinases (PK). An intracellular imbalance between PPP and PK activity can alter the activity, subcellular localisation and interactions of proteins, contributing to the dysregulation of neuronal function and signalling and, consequently, to neurodegeneration. The study of the AD neuronal phosphoproteome is therefore relevant from both a physiological and a pathological point of view. Primary cortical cultures were exposed to okadaic acid (OA, a PPP inhibitor) or to amyloid-β peptide (Aβ) to mimic AD conditions. Cell lysates were applied to a phosphoprotein affinity column. The phosphoprotein-enriched fractions were analysed by mass spectrometry, and a Python script (http://sourceforge.net/projects/protdb/) was developed to analyse the identified proteins. Results from the Control vs OA conditions indicate that treatment with this PPP inhibitor leads to an increase in the number of phosphoproteins (174 vs 242 total proteins and 32 vs 100 exclusive proteins). Results from the Aβ treatment indicate a qualitative change in the neuronal phosphoproteome (174 vs 166 total proteins) with a considerable number of exclusive proteins (42 vs 34 exclusive proteins). Subsequently, to obtain detailed information and characterise the proteins identified in each condition, an exploratory analysis of the phosphoproteins was carried out, organising them by protein class, biological process, subcellular localisation and molecular function.
Treatment with OA and Aβ leads to changes in proteins involved in cellular processes that are compromised in AD, such as PK and PPP activity, protein degradation, oxidative stress, protein folding, cytoskeleton dynamics, protein synthesis and apoptosis. Characterising the AD neuronal phosphoproteome may reveal or elucidate the molecular mechanisms underlying the abnormal signal transduction associated with the pathogenesis of the disease. Analysis of the exclusive phosphoproteins may also contribute to the identification of potential new biomarkers or therapeutic targets for AD.
Abnormal protein phosphorylation is a characteristic hallmark of Alzheimer’s disease (AD) and may be implicated both in pathogenesis and in disease progression. Reversible protein phosphorylation represents a key regulatory mechanism involving the activity of protein phosphatases (PPP) and protein kinases (PK). Imbalanced PPP and PK activity can alter protein action, subcellular localization and protein interactions, thus contributing to abnormal neuronal function and signaling and consequently to neurodegeneration. Hence, the study of the AD neuronal phosphoproteome is of physiological and pathological relevance. Primary cortical cultures were exposed to okadaic acid (OA, a PPP inhibitor) or amyloid-β peptide (Aβ), in order to mimic AD conditions. Cell lysates were applied to a phosphoprotein affinity column and phosphoprotein-enriched fractions were analyzed by mass spectrometry. A protein database management framework (http://sourceforge.net/projects/protdb/) was set up, allowing for the development of a script to analyze the identified proteins. Data from Control vs OA conditions indicate that OA treatment leads to an increase in phosphoproteins (174 vs 242 proteins and 32 vs 100 exclusive proteins). Data indicate that Aβ treatment leads to a shift in the neuronal phosphoproteome pool (174 vs 166 proteins) with noteworthy alterations in the exclusive neurophosphoproteome (42 vs 34 exclusive proteins). Subsequently, analysis of the protein classes, biological processes, subcellular localization and molecular functions provided detailed information on the proteins obtained in the different groups. Upon treatment, an alteration was observed in the proteins involved in critical processes impaired in AD, such as PK and PPP activities, protein degradation, oxidative stress, protein folding, cytoskeleton network dynamics, protein synthesis and apoptosis.
The characterization of the AD neuronal phosphoproteome may reveal or elucidate the molecular mechanisms underlying abnormal signal transduction associated with AD pathogenesis. Further, by analyzing the pool of exclusive proteins, this work may also contribute to identifying potential novel biomarker candidates or AD targets for therapeutic intervention.
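The "total" and "exclusive" protein counts reported in this abstract follow from plain set arithmetic on the protein identifiers found in each condition. The sketch below is an assumption about how such a comparison could be scripted, not the protdb script itself; the function name and the synthetic identifiers are hypothetical. The reported figures are internally consistent: 174 - 32 and 242 - 100 both give 142 shared proteins for Control vs OA.

```python
def compare_conditions(control_ids, treated_ids):
    """Summarise two sets of protein identifiers: totals, proteins
    exclusive to each condition, and proteins shared by both."""
    control = set(control_ids)
    treated = set(treated_ids)
    return {
        "control_total": len(control),
        "treated_total": len(treated),
        "control_exclusive": control - treated,
        "treated_exclusive": treated - control,
        "shared": control & treated,
    }


# Synthetic identifiers chosen only to reproduce the Control vs OA counts.
control_ids = [f"P{i}" for i in range(174)]      # 174 proteins
treated_ids = [f"P{i}" for i in range(32, 274)]  # 242 proteins

summary = compare_conditions(control_ids, treated_ids)
print(summary["control_total"], summary["treated_total"])
print(len(summary["control_exclusive"]), len(summary["treated_exclusive"]))
print(len(summary["shared"]))
```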
10

Atkinson, Samantha Nicole. "Bioinformatic assessment of disrupted microbial communities." Diss., University of Iowa, 2019. https://ir.uiowa.edu/etd/6696.

Abstract:
Bioinformatics is a unique field in that it incorporates many different disciplines, including biology, computer science, and statistics, to study biological data. There is a vast array of techniques that utilize bioinformatics, including pangenomics, RNASeq, whole genome metagenomics, and 16S sequencing. To study bacterial interactions, we used a model system of species interactions, Myxococcus xanthus. M. xanthus is a soil bacterium that is a known predator of other bacteria. It has one of the largest repertoires of two-component systems (TCS) to respond to external stresses. A TCS is a pair of proteins, one that senses environmental stress (histidine kinase, HK) and another that usually acts as a transcriptional regulator (response regulator, RR). We studied a class of RRs, NtrC-like, reliant on an alternative sigma factor, sigma54. The oligomerization of NtrC-like RRs is regulated to modulate activation of the protein, which would change the bacterium’s ability to respond to its environment. We studied HsfA, an NtrC-like RR that regulates specialized metabolites. Specialized metabolites are used in bacterial interactions. In predation interactions they are used to kill prey. Our goal was to find genes that might be involved in specialized metabolite production that would aid in predation. We used prediction tools to find putative binding sites of HsfA to find potentially new metabolites. We used two motifs to attempt to predict whether the oligomerization of these response regulators is positively or negatively regulated. We found the presence of a motif in the receiver domain to be associated with negative regulation of oligomerization, but further studies are needed to experimentally confirm this finding. One environment in which bacterial interactions occur is the gut. The gut microbiome is the consortium of organisms and their genomic content in the gastrointestinal tract.
The gut microbiome is sensitive to aspects of a person’s lifestyle, such as diet and medication. Here we studied the effect of two different diets and two drugs on the gut microbiome. Risperidone, an antipsychotic used to treat schizophrenia and bipolar disorder, has been shown to cause obesity and diabetes. We studied the effect of diet and risperidone usage on weight gain and the microbiome using a C57BL/6J female mouse model. Our results show that diet has a strong impact on the microbial composition of the gut in response to risperidone. As many mental health patients stop and restart their medication, we examined the effect of stopping and restarting risperidone on the microbiome. When risperidone is stopped, the microbiome reverts to a state similar to the control group but diverges into a different microbial composition upon restarting treatment. Interestingly, mice did not gain significantly more weight than their control group upon the second risperidone treatment. Further studies are needed to examine the functional changes occurring with the stop and restart of risperidone to determine the mechanism by which mice resist weight gain during the second round of treatment. Captopril is used to treat hypertension, a very common disease in the United States. Here we studied the effect of captopril on weight gain, metabolic phenotypes, and the gut microbiome. Our results showed that captopril caused an increase in resting metabolic rate (RMR) in mice. This occurred through an increase in energy expenditure. This increase in RMR made captopril-treated mice resistant to weight gain. Our group has previously shown that the gut microbiome can directly affect RMR. Therefore, we studied the gut microbiome of captopril-treated mice. We observed a shift in their gut microbiome toward Akkermansia muciniphila and Lactobacillus, organisms associated with lean body mass. 
Captopril therefore has the potential to be a better medication to treat patients with both hypertension and obesity. Further studies are needed to determine the effect of captopril on the microbiome in a hypertension mouse model.
11

Fronza, Raffaele <1971>. "Bioinformatic methods in applied genomic research." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2011. http://amsdottorato.unibo.it/3567/1/fronza_raffaele_tesi.pdf.

Full text
Abstract:
Here I will focus on three main topics that best address and include the projects I worked on during my three-year PhD, which I spent in different research laboratories addressing computationally and practically important problems, all related to modern molecular genomics. The first topic is the use of livestock species (pigs) as a model of obesity, a complex human dysfunction. My efforts here concern the detection and annotation of Single Nucleotide Polymorphisms. I developed a pipeline for mining human and porcine sequences. Starting from a set of human genes related to obesity, the platform returns a list of annotated porcine SNPs extracted from a new set of potential obesity genes. 565 of these SNPs were analyzed on an Illumina chip to test their involvement in obesity in a population composed of more than 500 pigs. Results will be discussed. All the computational analysis and experiments were done in collaboration with the Biocomputing group and Dr. Luca Fontanesi, respectively, under the direction of Prof. Rita Casadio at the Bologna University, Italy. The second topic concerns developing a methodology, based on Factor Analysis, to simultaneously mine information from different levels of biological organization. With specific test cases we develop models of the complexity of the mRNA-miRNA molecular interaction in brain tumors measured indirectly by microarray and quantitative PCR. This work was done under the supervision of Prof. Christine Nardini, at the “CAS-MPG Partner Institute for Computational Biology” of Shanghai, China (co-founded jointly by the Max Planck Society and the Chinese Academy of Sciences). The third topic concerns the development of a new method to overcome the variety of PCR technologies routinely adopted to characterize unknown flanking DNA regions of a viral integration locus of the human genome after clinical gene therapy. 
This new method is entirely based on next generation sequencing and it reduces the time required to detect insertion sites, decreasing the complexity of the procedure. This work was done in collaboration with the group of Dr. Manfred Schmidt at the Nationales Centrum für Tumorerkrankungen (Heidelberg, Germany), supervised by Dr. Annette Deichmann and Dr. Ali Nowrouzi. Furthermore, I add as an Appendix the description of an R package for gene network reconstruction that I helped to develop for scientific usage (http://www.bioconductor.org/help/bioc-views/release/bioc/html/BUS.html).
12

Fronza, Raffaele <1971>. "Bioinformatic methods in applied genomic research." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2011. http://amsdottorato.unibo.it/3567/.

Full text
13

Chiara, M. "BIOINFORMATIC TOOLS FOR NEXT GENERATION GENOMICS." Doctoral thesis, Università degli Studi di Milano, 2012. http://hdl.handle.net/2434/173424.

Full text
Abstract:
New sequencing strategies have redefined the concept of “high-throughput sequencing” and many companies, researchers, and recent reviews use the term “Next-Generation Sequencing” (NGS) instead of high-throughput sequencing. These advances have introduced a new era in genomics and bioinformatics. During my years as a PhD student I have developed various software, algorithms and procedures for the analysis of Next-Generation Sequencing data required for distinct biological research projects and collaborations in which our research group was involved. The tools and algorithms are thus presented in their appropriate biological contexts. Initially I dedicated myself to the development of scripts and pipelines which were used to assemble and annotate the mitochondrial genome of the model plant Vitis vinifera. The sequence was subsequently used as a reference to study the RNA editing of mitochondrial transcripts, using data produced by the Illumina and SOLiD platforms. I subsequently developed a new approach and a new software package for the detection of relatively small indels between a donor and a reference genome, using NGS paired-end (PE) data and machine learning algorithms. I was able to show that suitable paired-end data, contrary to previous assertions, can be used to detect, with high confidence, very small indels in low complexity genomic contexts. Finally I participated in a project aimed at the reconstruction of the genomic sequences of 2 distinct strains of the biotechnologically relevant fungus Fusarium. In this context I performed the sequence assembly to obtain the initial contigs and devised and implemented a new scaffolding algorithm which has proved to be particularly efficient.
14

Prazzoli, G. M. "BIOINFORMATIC TOOLS FOR NEXT GENERATION TRANSCRIPTOMICS." Doctoral thesis, Università degli Studi di Milano, 2015. http://hdl.handle.net/2434/275276.

Full text
Abstract:
In the last few years the introduction of novel technologies known as “next-generation sequencing” (NGS) has brought a major step forward in sequencing. These techniques have practically supplanted the conventional Sanger strategies that have been the principal method of sequencing DNA since the late 1970s. Different NGS platforms have been introduced, with the newest using ion-sensitive sensors to detect the incorporation of bases as opposed to the more commonly used fluorescently labelled nucleotides. Since the first techniques were introduced, both the sequencing runtime and the cost per sequenced base have dramatically decreased, and, at the current state of the art, a complete human genome can be fully sequenced in under 24 hours. On the other hand, the ever-increasing amount of short sequences (or reads) yielded per single run makes the processing of the data more difficult and challenging from a computational point of view. One of the most prominent and promising fields of application is RNA-Seq, an assay that provides a fast and reliable way to study transcriptomic variability on a whole-genome scale. Generally, in an RNA-Seq experiment, an RNA sample is converted into a cDNA library, which then undergoes several cycles of sequencing with an NGS method of choice. Usually, the resulting sequences are either mapped on the reference genome or assembled de novo without the aid of genomic sequence to produce a genome-scale transcription map, or transcriptome. The data analyzed in this thesis come from a three-year research project focused on the characterization of tissue- and individual-specific alternative splicing, and its regulation. Data consist of several RNA-Seq experiments performed on different human tissues, coming from three healthy individuals. A total of 18 sets of data (6 tissues from three individuals with 3 replicates for each) were studied. 
The work initially focused on the quantification of mitochondrial DNA and RNA in the six individuals, and its variability. Then, we developed a computational method for the identification of tissue- and individual-specific transcripts, able to perform a multi-sample comparison. The algorithm we implemented employs a statistical test based on a variant of Shannon’s information entropy, in order to identify transcripts with an expression pattern presenting a significant bias towards one or more of the samples studied. The results obtained show the method to be robust and efficient, removing the need to perform pairwise comparisons as with the algorithms currently available, and providing a thorough and complete map of the extent of tissue-specificity of gene expression at the single individual level.
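To illustrate the general idea behind an entropy-based specificity measure, here is a minimal sketch in Python. It uses plain Shannon entropy normalised to the range 0 to 1; the abstract only says that a "variant" of Shannon's entropy was used, so the exact formula, the statistical test, and the function name `specificity_score` are assumptions made for illustration and are not taken from the thesis.

```python
import math

def specificity_score(expression):
    """Score how strongly a transcript is biased towards few samples.

    `expression` is a list of non-negative expression values, one per
    sample (hypothetical input format). Returns 1 - H / H_max, where H is
    the Shannon entropy of the normalised expression profile:
    0.0 means uniform expression, 1.0 means expression in a single sample.
    """
    total = sum(expression)
    if total <= 0 or len(expression) < 2:
        return 0.0  # no signal, or nothing to compare against
    # Convert expression values to a probability distribution over samples.
    probs = [x / total for x in expression if x > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(expression))
    return 1.0 - entropy / max_entropy

print(specificity_score([10, 10, 10, 10]))  # evenly expressed
print(specificity_score([100, 0, 0, 0]))    # expressed in one sample only
```

Because the score is computed over all samples at once, it avoids pairwise comparisons, which is the property the abstract highlights. A significance test on top of such a score would need a null model that this sketch does not attempt.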
15

Fagerberg, Linn. "Mapping the human proteome using bioinformatic methods." Doctoral thesis, KTH, Proteomik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-31477.

Full text
Abstract:
The fundamental goal of proteomics is to gain an understanding of the expression and function of the proteome on the level of individual proteins, on the level of defined cell types and on the level of the entire organism. In this thesis, the human proteome is explored using membrane protein topology prediction methods to define the human membrane proteome and by global protein expression profiling, which relies on a complex study of the location and expression levels of proteins in tissues and cells. A whole-proteome analysis was performed based on the predicted protein-coding genes of humans using a selection of membrane protein topology prediction methods. The study used a majority decision-based method, which estimated that approximately 26% of the human genes encode a membrane protein. The prediction results are displayed in a visualization tool to facilitate the selection of antigens to be used for antibody generation. Global protein expression profiles in a large number of cells and tissues in the human body were analyzed for more than 4000 protein targets, based on data from the antibody-based immunohistochemistry and immunofluorescence methods within the framework of the Human Protein Atlas project. The results revealed few cell-type specific proteins and a high fraction of human proteins expressed in most cells, suggesting that cell and tissue specificity is attained by a fine-tuned regulation of protein levels. The expression profiles were also used to analyze the relationship between 45 cell lines by hierarchical clustering and principal component analysis. The global protein expression patterns overall reflected the tumor origin of the cells, and also allowed for identification of proteins of importance for distinguishing different categories of cell lines, as defined by phenotype of progenitor cell. In addition, the protein distribution in 16 subcellular compartments in three of the human cell lines was mapped. 
A large fraction of proteins were localized in two or more compartments and, in line with previous results, a majority of proteins were detected in all three cell lines. Finally, mass spectrometry-based protein expression levels were compared to RNA-seq-based transcript expression levels in three cell lines. Highly ubiquitous mRNA expression was found and the changes of expression levels between the cell lines showed high correlations between proteins and transcripts. Large general differences in abundance of proteins from various functional classes were observed. A comparison between categories based on expression levels revealed that, in general, genes with varying expression levels between the cell lines or only expressed in one cell line were highly enriched for cell-surface proteins. These studies show a path for a systematic analysis to characterize the proteome in human cells, tissues and organs.
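The majority decision step mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: each prediction method is reduced to a yes/no call per gene, ties count as "not a membrane protein", and the names `majority_call` and `membrane_fraction` as well as the toy gene labels are hypothetical. The thesis does not specify these details in the abstract.

```python
def majority_call(predictions):
    """Return True if more than half of the methods predict 'membrane'.

    `predictions` is a list of booleans, one per prediction method.
    A tie is treated as False (an assumption, not from the thesis).
    """
    return sum(predictions) > len(predictions) / 2

def membrane_fraction(calls_by_gene):
    """Fraction of genes whose consensus call is 'membrane protein'."""
    if not calls_by_gene:
        return 0.0
    consensus = [majority_call(calls) for calls in calls_by_gene.values()]
    return sum(consensus) / len(consensus)

# Toy input: three hypothetical methods voting on four hypothetical genes.
toy_calls = {
    "geneA": [True, True, False],
    "geneB": [False, False, True],
    "geneC": [False, False, False],
    "geneD": [True, True, True],
}
print(membrane_fraction(toy_calls))
```

Using an odd number of methods avoids ties altogether, which is one reason consensus schemes of this kind often combine three, five, or seven predictors.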
The Human Protein Atlas project
16

Casteleijn, M. G. (Marinus G.). "Towards new enzymes: protein engineering versus bioinformatic studies." Doctoral thesis, University of Oulu, 2010. http://urn.fi/urn:isbn:9789514260995.

Full text
Abstract:
The aim of this PhD study was to address some of the overlapping bottlenecks in protein engineering and metagenomics by developing or applying new tools which are useful for both disciplines. Two enzymes were studied as an example: Triosephosphate Isomerase (TIM) and Uridine Phosphorylase (UP). TIM is an important enzyme of the glycolysis pathway and has been investigated by means of protein engineering, while UP is a key enzyme in the pyrimidine-salvage pathway. In this thesis TIM was used to address protein engineering aspects, while UP was used with regard to some metagenomic and bioinformatic aspects. The aspects of a structure-driven rational design approach and its implications for further engineering of monomeric TIM variants are discussed. Process development based on a new technology, EnBase®, addresses the relative instability of new variants, compared to their ancestors, for further studies. EnBase® is then applied for the production of a 15N isotope-labeled monomeric TIM variant, A-TIM. Systematic functional and engineering studies on dimeric TIM and monomeric TIM with regard to the hinges of the catalytic loop-6 were conducted to investigate enzyme activity and stability. Both the A178L and P168A mutations were proposed to induce loop-6 closure, a desired feature for A-TIM variants. The P168A mutants are hardly active, but gave great insight into the catalytic machinery, while the A178L mutants did induce partial loop-6 closure; in addition, however, monomeric A178L was destabilized. Homology-driven genome mining and subsequent isolation and high throughput (HTP) overexpression of a thermostable UP from the archaeon Aeropyrum pernix was carried out as an example for the production of recombinant proteins. In addition, an alternative method to study the kinetics of UP by means of NMR directly from cell lysate is discussed. The combination of expression libraries and EnBase® in an HTP manner may relieve the gene-to-product bottleneck. 
The structural aspects of A. pernix UP are explored by means of simple bioinformatic tools in the last section of this thesis. A thermostable, truncated version of UP was created and its use for protein engineering in the future is explored. The long N-terminal and C-terminal ends of A. pernix UP seem to be involved in stabilizing the dimeric and hexameric structures of UP. However, deletion of the N-terminal end of A. pernix UP yielded a thermostable protein. Overall, the findings regarding process optimization and HTP expression, and the underlying methods used in the TIM and UP studies, are interchangeable.
17

Morrissy, Anca Sorana. "Bioinformatic analysis of cis-encoded antisense transcription." Thesis, University of British Columbia, 2010. http://hdl.handle.net/2429/30509.

Full text
Abstract:
A key first step in understanding cellular processes is a quantitative and comprehensive measurement of gene expression profiles. The scale and complexity of the mammalian transcriptome is a significant challenge to efforts aiming to identify the complete set of expressed transcripts. Specifically, detection of low-abundance sequences, such as antisense transcripts, has historically been difficult to achieve using EST libraries, microarrays, or tag sequencing methods. Antisense transcripts are expressed from the opposite strand of a partner gene, and in some cases can regulate the processing of the sense transcript, highlighting their biological relevance. Recently, efficient profiling of low-frequency transcripts was made possible with the advent of next generation sequencing platforms. Thus, a major goal of my thesis was to assess the prevalence of antisense transcripts using Tag-seq, a tag sequencing method modified to take advantage of the Illumina sequencing platform. The increase in sampling depth provided by Tag-seq resulted in significantly improved detection of low abundance antisense transcripts, and allowed accurate measurements of their differential expression across normal and cancerous states. While antisense transcription is known to regulate sense transcript processing at a small number of loci, no genome wide assessments of this regulatory interaction exist. I addressed this knowledge gap using Affymetrix exon arrays, and found a significant correlation between antisense transcription and alternative splicing in normal human cells. Further exploring the biological relevance of antisense-correlated splicing events in human disease, I found that these events could be used to identify clinically distinct subtypes of cancer. 
Together, the findings in this thesis provide a new foundation for the investigation of antisense transcripts in the regulation of alternative transcript processing, and open new avenues of research into understanding the molecular heterogeneity of human cancers.
18

Jones, Katy June. "Bioinformatic analysis of biotechnologically important microbial communities." Thesis, University of Exeter, 2018. http://hdl.handle.net/10871/34543.

Full text
Abstract:
Difficulties associated with the study of microbial communities, such as low proportions of cultivable species, have been addressed in recent years with the advent of a range of sequencing technologies and bioinformatic tools. This is enabling previously unexplored communities to be characterised and utilised in a range of biotechnology applications. In this thesis bioinformatic methods were applied to two datasets of biotechnological interest: microbial communities found living with the oil-producing alga Botryococcus braunii and microbial communities in acid mine drainage (AMD). B. braunii is of high interest to the biofuel industry due to its ability to produce high amounts of oils, in the form of hydrocarbons. However, a number of factors, including low growth rates, have prevented its cultivation on an industrial scale. Studies show B. braunii lives in a consortium with numerous bacteria which may influence its growth. This thesis reports both whole genome analysis and 16S rRNA gene sequence analysis to gain a greater understanding of the B. braunii bacterial consortium. Bacteria have been identified, some of which had not previously been documented as living with B. braunii, and evidence is presented for ways in which they may influence growth of the alga, including B-vitamin synthesis and secretion systems. AMD is a worldwide problem, polluting the environment and negatively impacting on human health. This by-product of the mining industry is a problem in the South West of England, where disused metalliferous mines are now a source of AMD. Bioremediation of AMD is an active area of research; sulphur-reducing bacteria and other bacteria which can remove toxic metals from AMD can be utilised for this purpose. Identifying bacteria and archaea that are able to thrive in AMD and which also have these bioremediation properties is therefore of great importance. 
Metagenomic sequencing has been carried out on the microbial community living in AMD sediment at the Wheal Maid tailings lagoon near Penryn in Cornwall. From these data, a diverse range of bacteria and archaea has been identified at both the sediment surface level and at depth, including microorganisms closely related to taxa reported from metalliferous mines on other continents. Evidence has been found of sulphur-reducing bacteria and of pathways for various other bioremediation-linked processes.
19

Moreno, Cortez Pablo Andres. "Bioinformatic methods for species-specific metabolome inference." Thesis, University of Cambridge, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.607925.

Full text
20

Wagstaff, John Francis. "Generating bioinformatic resources for L1-dependent retrotransposons." Thesis, University of Leicester, 2014. http://hdl.handle.net/2381/29050.

Full text
Abstract:
Human retrotransposons are genetic elements that copy themselves into new locations in the genome by way of an RNA intermediate. They are extremely numerous, making up at least 45% of human DNA. Retrotransposon insertions are a major source of inter-human genetic variation, and have been known to cause disease. They are also intrinsically difficult to analyse in genomes due to their highly repetitive nature. In humans there are three currently active retrotransposable elements: LINE-1, Alu and SVA. LINE-1 is an independent element, and Alu and SVA parasitise the LINE-1 retrotransposition machinery. There are experimental ways of discovering and analysing such elements, but they require significant investment, while human sequence datasets containing potentially usable data are multiplying at an ever increasing rate. In particular there are now many assembled human genome sequences as well as new sources of whole genome high throughput sequencing data, such as the 1000 Genomes Project. For this reason this study is devoted to using bioinformatic approaches to extract new knowledge about human retrotransposons from the existing datasets. Previous efforts, by past members of this research group, have been devoted to analysing the genomic variation of the LINE-1 element itself. However, this study focuses on the extraction of presence/absence variation in the LINE-1-dependent elements, Alu and SVA. In addition to building software to extract this information from a wide variety of data sources, this project has also involved making the information available to non-specialist researchers in the form of a website. The tools developed and described here utilise generic design principles, enabling rapid, largely automated updating, necessary with the constant expansion of the underlying data.
21

Cui, Chenming. "Integrating bioinformatic approaches to promote crop resilience." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/94424.

Full text
Abstract:
Even under the best management strategies contemporary crops face yield losses from diverse threats such as pathogens, pests, and environmental stress. Adding to this management challenge is that under current global climate projections these impacts are predicted to become even greater. Natural genetic variation, long used by traditional plant breeders, holds great promise for adapting high performing agronomic lines to these stressors. Yet, efforts to bolster crop plant resilience using wild relatives have been hindered by time consuming efforts to develop genomic tools and/or identify the genetic basis for agronomic traits. Thus, increasing crop plant resilience requires developing and deploying approaches that leverage current high-throughput sequencing technologies to more rapidly and robustly develop genomic tools in these systems. Here we report the integration of bioinformatic and statistical tools to leverage high-throughput sequencing to 1) develop a machine learning approach to determine factors impacting transcriptome assembly and quantitatively evaluate transcriptome completeness, 2) dissect complex physiological pathway interactions in Solanum pimpinellifolium under combined stresses, using comparative transcriptomics, and 3) develop a genome assembly pipeline that can be deployed to rapidly assemble a more contiguous genome, unraveling previously hidden complexity, using Phytophthora capsici as a model. As a result, we have generated strategic guidelines for transcriptome assembly and developed an orthologue- and reference-free, machine learning based tool "WWMT" to quantitatively score transcriptome completeness from short read data. Secondly, we identified "hub genes" and describe genes involved with "cross-talk" between drought and herbivore stress response pathways. 
Finally, we demonstrate a protocol for combining long-read sequencing from the Oxford Nanopore Technologies MinION with short-read data to rapidly and cost-effectively assemble a contiguous and relatively complete genome. Here we uncovered hidden variation in a well-known plant pathogen, finding that the genome was 92% bigger than previous estimates with more than 39% of duplicated regions, supporting a hypothesized recent whole genome duplication in this clade. This community resource will support new functional and evolutionary studies in this economically important pathogen.
Doctor of Philosophy
Meeting the food production demands of a burgeoning population in a changing environment means adapting crop plants to become more resilient to environmental stress. One of the greatest barriers to understanding and predicting crop responses to future environmental change is our poor understanding of the functional and genomic basis of stress resistance traits for contemporary crops. This impediment presents a barrier for rapid crop improvement technologies, such as gene editing or genomic selection, that is only partially overcome by generating large amounts of sequencing data. Here we need tools that allow us to process and evaluate huge amounts of data generated from next generation sequencing studies to help identify genomic regions associated with agronomic traits. We also need technical approaches that allow us to disentangle the complex genetic interactions that drive plant stress responses. Here we present work that used statistical analysis and recent advances in artificial intelligence to develop a bioinformatic approach to evaluate genomic sequencing data prior to downstream analyses. Secondly, we used a reductionist approach to filter thousands of genes to key genes associated with combined stress responses (herbivory and drought) in the most widely used vegetable in the world, tomato. Finally, we developed a method for generating whole genome sequences that is low-cost and time sensitive and tested it using a well-known plant pathogen genome, wherein we unraveled significant hidden complexity. Overall this work provides community-wide genomic tools and information to promote crop resilience.
22

Wooton, Jesse Meredith. "A bioinformatic analysis of the Alp8 family." Diss., [La Jolla] : University of California, San Diego, 2009. http://wwwlib.umi.com/cr/ucsd/fullcit?p1469264.

Full text
Abstract:
Thesis (M.S.)--University of California, San Diego, 2009.
Title from first page of PDF file (viewed Oct. 7, 2009). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 54-56).
23

Al, Haj Baddar Nour W. "BIOINFORMATIC AND EXPERIMENTAL ANALYSES OF AXOLOTL REGENERATION." UKnowledge, 2019. https://uknowledge.uky.edu/biology_etds/61.

Full text
Abstract:
Salamanders have an extraordinary ability to regenerate appendages after loss or amputation, irrespective of age. My dissertation research explored the possibility that regenerative ability is associated with the evolution of novel, salamander-specific genes. I utilized transcriptional and genomic databases for the axolotl to discover previously unidentified genes, to the exclusion of other vertebrate taxa. Among the genes identified were multiple mmps (Matrix metalloproteases) and a jnk1/mapk8 (c-jun-N-terminal kinase) paralog. MMPs function in extracellular matrix (ECM) remodeling and tissue histolysis, processes that are essential for successful regeneration. Jnk1/mapk8 plays a pivotal role in regulating transcription in response to cellular stress stimuli, including ROS (reactive oxygen species). Discovery of these novel genes motivated further bioinformatic studies of mmps and wet-lab experiments to characterize JNK and ROS signaling. The paralogy of the newly discovered mmps and orthology of 15 additional mmps was established by analyses of predicted protein secondary structures and gene phylogeny. A microarray analysis identified target genes downstream of JNK signaling that are predicted to function in cell proliferation, cellular stress response, and ROS production. These inferences were validated by additional experiments that showed a requirement for NOX (NADPH oxidase) activity, and thus presumably ROS production, for successful tail regeneration. In summary, my dissertation identified novel, salamander-specific genes. The functions of these genes suggest that regenerative ability is associated with a diverse extracellular matrix remodeling and/or tissue histolysis response, and also stress-associated signaling pathways. The bioinformatic findings and functional assays that were developed to quantify ROS, cell proliferation, and mitosis will greatly empower the axolotl embryo model for tail regeneration research.
24

Perner, Juliane [Verfasser]. "Bioinformatic approaches for understanding chromatin regulation / Juliane Perner." Berlin : Freie Universität Berlin, 2015. http://d-nb.info/1077007221/34.

25

Coll, I. Cerezo F. "Bioinformatic analysis of Mycobacterium tuberculosis whole genome data." Thesis, London School of Hygiene and Tropical Medicine (University of London), 2015. http://researchonline.lshtm.ac.uk/2124343/.

Abstract:
Tuberculosis (TB) caused by bacteria of the Mycobacterium tuberculosis complex (MTBC) is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole genome information of clinical isolates of MTBC. The objectives of this work include developing bioinformatic tools for processing and making accessible MTBC genomic data, as well as the identification of informative genetic markers, both strain-specific and associated with drug resistance (DR), to barcode MTBC isolates in research and clinical settings. SpolPred software was developed to accurately predict the spoligotype from raw sequence reads, and used to bridge the gap between classical genotyping and high-throughput sequencing. A genome variation discovery pipeline was implemented to derive genomic polymorphisms from MTBC raw sequence data. This pipeline was applied to >1,500 publicly available isolates and the characterised genomic variation hosted in PolyTB, a web-based tool where genetic variants can be investigated using a genome browser, a world map showing their global allele distribution, and an additional phylogenetic view. An extensive repertoire of strain-specific mutations was identified, of which a subset was proposed to accurately discriminate known MTBC circulating strains. A curated list of DR associated mutations was compiled from the literature and their diagnostic accuracy for predicting phenotypic resistance assessed. In addition, potentially novel genes involved in DR were discovered by applying genome-wide association approaches to a global population of more than 2,500 MTBC strains. Whole genome sequencing (WGS) promises to be transformative for the practice of clinical microbiology, and the rapidly falling cost and turnaround time mean that this will become a viable technology in clinical settings.
In this new paradigm, the presented work will facilitate the transition to and applications of WGS in clinical settings as an important tool for TB control.
26

Hossain, Muhammad Maqsud. "Bioinformatic analysis of Streptococcus uberis genes and genomes." Thesis, University of Nottingham, 2016. http://eprints.nottingham.ac.uk/37355/.

Abstract:
Streptococcus uberis is a Gram-positive, catalase-negative member of the family Streptococcaceae and is an important environmental pathogen primarily responsible for a significant amount of bovine intramammary infections. This thesis describes the sequencing and comparison of multiple strains from clinical and sub-clinical infections. Following de novo assembly, these are compared to the single reference strain (0140J). The assemblies of strains sequenced with two technologies (Illumina and SOLiD) were compared. From these assemblies, annotation allowed the comparison of gene content, the pan and core genomes and gene gain/loss of coding sequences associated with clustered regularly interspaced short palindromic repeats (CRISPRs), prophage and bacteriocin production. Identification of sequence variants allowed identification of highly conserved and highly variant genes. Inferred intraspecies and interspecies (host-S. uberis) protein-protein interaction networks revealed pathways of bovine proteins enriched with potentially interacting pathogen proteins. These identified known and predicted pathways and also novel interaction partners. This was the first “whole-genome” comparison of multiple S. uberis strains isolated from clinical vs non-clinical intramammary infections including the type virulent vs non-virulent strains. These data allowed the first insight into potential evolutionary forces behind virulence differences.
27

Aksamit, Matthew Stephen. "Bioinformatic analysis of pea aphid salivary gland transcripts." Thesis, Kansas State University, 2014. http://hdl.handle.net/2097/32836.

Abstract:
Master of Science
Biochemistry and Molecular Biophysics Interdepartmental Program
Gerald Reeck
Pea aphids (Acyrthosiphon pisum) are sap-sucking insects that feed on the phloem sap of some plants of the family Fabaceae (legumes). Aphids feed on host plants by inserting their stylets between plant cells to feed from phloem sap in sieve elements. Their feeding is of major agronomical importance, as aphids cause hundreds of millions of dollars in crop damage worldwide annually. Salivary gland transcripts from plant-fed and diet-fed pea aphids were studied by RNA-Seq to analyze their expression. Most transcripts had higher expression in plant-fed pea aphids, likely due to the need for saliva protein in the aphid/plant interaction. Numerous salivary gland transcripts and saliva proteins have been identified in aphids, including a glutathione peroxidase. Glutathione peroxidases are a group of enzymes that protect organisms from oxidative damage. Here, I present a bioinformatic analysis of pea aphid expressed sequence tag libraries that identified four unique glutathione peroxidases in pea aphids. One glutathione peroxidase, ApGPx1, has high expression in the pea aphid salivary gland. Two glutathione peroxidase genes are present in the current annotation of the pea aphid genome. My work indicates that the two genes need to be revised.
28

Furió, Tarí Pedro. "Development of bioinformatic tools for massive sequencing analysis." Doctoral thesis, Universitat Politècnica de València, 2020. http://hdl.handle.net/10251/152485.

Abstract:
[EN] Transcriptomics is one of the most important and relevant areas of bioinformatics. It allows detecting the genes that are expressed at a particular moment in time to explore the relation between genotype and phenotype. Transcriptomic analysis was historically performed using microarrays until 2008, when high-throughput RNA sequencing (RNA-Seq) was launched on the market and began to replace the older technique. However, despite the clear advantages over microarrays, it was necessary to understand factors such as the quality of the data, reproducibility and replicability of the analyses and potential biases. The first section of the thesis covers these studies. First, an R package called NOISeq was developed and published in the public repository "Bioconductor". It includes a set of tools to better understand the quality of RNA-Seq data and to minimise the impact of noise in any subsequent analyses, and it implements two new methodologies (NOISeq and NOISeqBio) to overcome the difficulties of comparing two different groups of samples (differential expression). Second, I show our contribution to the Sequencing Quality Control (SEQC) project, a continuation of the Microarray Quality Control (MAQC) project led by the US Food and Drug Administration (FDA, United States) that aims to assess the reproducibility and replicability of any RNA-Seq analysis. One of the most effective approaches to understand the different factors that influence the regulation of gene expression, such as the synergic effect of transcription factors, methylation events and chromatin accessibility, is the integration of transcriptomic with other omics data. To this aim, a file that contains the chromosomal position where the events take place is required. For this reason, in the second chapter, we present a new and easy to customise tool (RGmatch) to associate chromosomal positions to the exons, transcripts or genes that could regulate the events.
Another aspect of great interest is the study of non-coding genes, especially long non-coding RNAs (lncRNAs). Not long ago, these regions were thought not to play a relevant role and were only considered as transcriptional noise. However, they represent a high percentage of the human genes and it was recently shown that they actually play an important role in gene regulation. For these reasons, in the last chapter we focus, first, on finding a methodology to infer the general functions of every lncRNA using publicly available data and, second, on developing a new tool (spongeScan) to predict the lncRNAs that could be involved in the sequestration of micro-RNAs (miRNAs), thereby altering their regulatory role.
Furió Tarí, P. (2020). Development of bioinformatic tools for massive sequencing analysis [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/152485
29

Bălan, Mirela. "Integrative bioinformatic analysis of SARS-CoV-2 data." Thesis, Uppsala universitet, Institutionen för cell- och molekylärbiologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-450821.

30

Wei, Ran. "Peptidomic and bioinformatic studies on neuroendocrine tumour cells." Thesis, Queen's University Belfast, 2015. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.696333.

Abstract:
Neuroendocrine tumours are rich sources of bioactive peptides and potentially-active fragments of proteins. As the cells that constitute such tumours have active regulated secretory pathways, they constantly deliver cocktails of these peptides into the circulation (tumours) or into the culture medium (cells). Many of these peptides may contribute to distinctive clinical tumour syndromes or may be involved in the growth and metastasis of the tumours themselves. This project, the first of its kind, was directed at the molecular inventory of the secretory peptidome of these tumours by application of high-throughput LC/MS/MS techniques. Individual peptides have been structurally-characterized and their parent proteins identified using bioinformatic techniques. They were then sorted into protein types, such as catalytic, structural, regulatory, etc., and comparisons made between individual tumour types to assess numbers of both common and unique components. Selected peptides were chemically-synthesized and subjected to appropriate bioassays. This project provides a body of basic information that may be of use to basic scientists and clinicians.
31

Oliver, Jeffrey C. "Bioinformatic training needs at a health sciences campus." Public Library of Science, 2017. http://hdl.handle.net/10150/624680.

Abstract:
Background: Health sciences research is increasingly focusing on big data applications, such as genomic technologies and precision medicine, to address key issues in human health. These approaches rely on biological data repositories and bioinformatic analyses, both of which are growing rapidly in size and scope. Libraries play a key role in supporting researchers in navigating these and other information resources. Methods: With the goal of supporting bioinformatics research in the health sciences, the University of Arizona Health Sciences Library established a Bioinformation program. To shape the support provided by the library, I developed and administered a needs assessment survey to the University of Arizona Health Sciences campus in Tucson, Arizona. The survey was designed to identify the training topics of interest to health sciences researchers and the preferred modes of training. Results: Survey respondents expressed an interest in a broad array of potential training topics, including "traditional" information seeking as well as interest in analytical training. Of particular interest were training in transcriptomic tools and the use of databases linking genotypes and phenotypes. Staff were most interested in bioinformatics training topics, while faculty were the least interested. Hands-on workshops were significantly preferred over any other mode of training. The University of Arizona Health Sciences Library is meeting those needs through internal programming and external partnerships. Conclusion: The results of the survey demonstrate a keen interest in a variety of bioinformatic resources; the challenge to the library is how to address those training needs. The mode of support depends largely on library staff expertise in the numerous subject-specific databases and tools. Librarian-led bioinformatic training sessions provide opportunities for engagement with researchers at multiple points of the research life cycle.
When training needs exceed library capacity, partnering with intramural and extramural units will be crucial in library support of health sciences bioinformatic research.
32

Acoca, Stephane. "Bioinformatic approaches to the discovery of apoptotic proteins." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=81582.

Abstract:
The core of this research project revolves around the development of knowledge and expertise in the field of sequence homology identification techniques, as applied towards the discovery of Bcl2 family members. The use of such methods resulted in the uncovering of a Bcl2 Homology 3 (BH3) domain in a ubiquitin ligase enzyme known as upstream regulatory binding protein 1 (UreB1). A complete biochemical analysis unequivocally demonstrated binding of the UreB1 BH3 domain to MCL-1, an antiapoptotic member of the Bcl2 protein family. Furthermore, the discovery of a BH3 domain in UreB1 may provide a link in the established involvement of the ubiquitin pathway in the degradation of MCL-1 following certain apoptotic stimuli. In addition, using an independent domain modelling strategy, we describe the development of BISA, a web-accessible software package for sequence homology detection.
33

Schüler, Markus [Verfasser]. "Bioinformatic analysis of cardiac transcription networks / Markus Schüler." Berlin : Freie Universität Berlin, 2011. http://d-nb.info/102593928X/34.

34

Roberts, Rick Lee. "Structural and bioinformatic analysis of ethylmalonyl-CoA decarboxylase." Thesis, State University of New York at Buffalo, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=1600817.

Abstract:

Many enzymes of the major metabolic pathways are categorized into superfamilies which share common folds. Current models postulate these superfamilies are the result of gene duplications coupled with mutations that result in the acquisition of new functions. Some of these new functions are considered advantageous and selected for, while others may simply be tolerated. The latter can result in metabolites being produced at low rates that are of no known use by the cell, and can become toxic when accumulated. Concurrent with the evolution of this tolerable or potentially detrimental metabolism, organisms are selected to evolve a means of correcting or “proofreading” these non-canonical metabolites to counterbalance their detrimental effects. Metabolite proofreading is a process of intermediary metabolism analogous to DNA proofreading that acts on these abnormal metabolites to prevent their accumulation and toxic effects.

Here we structurally characterize ethylmalonyl-CoA decarboxylase (EMCD), a member of the family of enoyl-CoA hydratases within the crotonase superfamily of proteins, which is coded by the ECHDC1 (enoyl-CoA hydratase domain containing 1) gene. EMCD has been shown to have a metabolic proofreading property, acting on the metabolic byproduct ethylmalonyl-CoA to prevent its accumulation, which could result in oxidative damage. We use the complementary methods of in situ crystallography, small angle X-ray scattering, and single crystal X-ray crystallography to structurally characterize EMCD, followed by homology analysis in order to propose a mechanism of action. This represents the first structure of a crotonase superfamily member thought to function as a metabolite proofreading enzyme.

35

MANDREOLI, PIETRO. "DEVELOPMENT AND IMPLEMENTATION OF CLOUD-ORIENTED BIOINFORMATIC SERVICES." Doctoral thesis, Università degli Studi di Milano, 2022. https://hdl.handle.net/2434/947848.

Abstract:
The increase in our capacity to produce biological data has led to a parallel growth in the number and complexity of the bioinformatics tools and pipelines needed to analyse them, exacerbating the long-standing issues associated with analytical reproducibility in bioinformatics. Indeed, the proliferation of bioinformatics software tools and pipelines, together with their versioning and multiple dependencies, risks not only hindering the daily work of researchers but also complicating the reproducibility of data analysis procedures and, in turn, of the scientific results they support. Workflow Management Systems such as Galaxy, hardware virtualization, and software containerization technologies are increasingly used to address those issues, making the execution of elaborate workflows possible while retaining an overall high degree of analytic reproducibility over time. However, the operation and maintenance of those instruments can still be out of reach for many researchers due to their infrastructural requirements, not just in terms of computing resources but also regarding the IT expertise necessary for their administration. To tackle this issue, our group developed Laniakea, a software framework that, leveraging virtualization, containerization, and cloud computing technologies, allows the on-demand deployment of Galaxy instances and other bioinformatics applications over cloud infrastructures. Laniakea users, performing only a handful of clicks, gain administrator-level access to a ready-to-use, customizable Galaxy instance and to the underlying virtual hardware. The Laniakea software platform has been employed as a foundation for the ELIXIR-IT Laniakea@ReCaS bioinformatics service that provides researchers with cloud resources for creating production-grade Galaxy instances, enabling small research groups or organizations to analyse their data independently using a state-of-the-art scientific computing infrastructure.
In the past few years, the Laniakea framework has proven to be a solid platform, for example by facilitating the development of novel Galaxy services implementing specific bioinformatics workflows (i.e., genomic variant prioritization, functional annotation of SARS-CoV-2 genomic variants, scRNA-Seq). Furthermore, the experience acquired with Laniakea allowed us to participate in the growth of the Galaxy platform and the associated distributed computing infrastructure.
36

Muñoz, Torres Pau Marc. "Bioinformatic Study of Antigen Presentation by HLA class II." Doctoral thesis, Universitat Autònoma de Barcelona, 2014. http://hdl.handle.net/10803/129336.

Abstract:
Understanding how peptides are selectively bound and presented by major histocompatibility complex class II molecules (MHC class II, or HLA class II in humans) is of utmost importance for its broad implications in human health, from infection to autoimmunity or cancer. The aim of this thesis was to develop a computational strategy to identify HLA class II binding patterns for a variety of alleles and use this knowledge to predict their capacity to bind specific peptide sequences. To make effective use of the prediction algorithm, a web-based platform for the analysis of large peptide or protein sets, including various functionalities, was also devised. In order to accomplish these objectives, the work was divided into three different stages. The first stage consisted of the construction of a PostgreSQL relational database to store all the information required for and generated by the algorithms developed. The required, uploaded information (subject to updates) consisted of known HLA class II epitopes and the translated genomes of a list of pathogenic bacterial species and human. In addition, the database was designed to include a private section for the upload of user-owned epitope information, which the owner may use in combination with the public data. In a second stage, two predictors were developed, one using position-specific scoring matrices (PSSMs) and the other one using a support vector machine (SVM). PSSM development was performed using an iterative optimisation protocol, starting from the alignment of known epitopes to identify HLA class II binding cores (9-residue segments) and incorporating additional information such as allele conservation and non-binders at different phases of the refinement. For SVM construction, the epitope core was defined using the corresponding PSSM and the parameters for the SVM with a radial-basis-function (RBF) kernel were set up individually for each molecule to get the best performance.
In the third stage, two web pages were constructed, one for each predictor. The servers share a common part in which the user can introduce peptide or protein sequences in FASTA format to perform an analysis that delivers both putative epitopes and their localization in a selected proteome. In addition, the PSSM-based server allows users to upload their own sequences to elucidate new HLA class II binding patterns and perform predictions with them.
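To illustrate the PSSM idea described in this abstract, the following is a minimal sketch in Python and not the thesis's actual method. It assumes a uniform background over the 20 standard amino acids, a fixed pseudocount, and already-aligned 9-residue cores. The function names `build_pssm` and `best_core` and the toy sequences are hypothetical, and the iterative optimisation, allele conservation and non-binder information mentioned above are not modelled.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
CORE_LEN = 9  # binding-core length named in the abstract


def build_pssm(cores, pseudocount=1.0):
    """Build a log-odds matrix from aligned 9-residue cores.

    Assumption: uniform background frequency (1/20) for every residue.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(cores) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(CORE_LEN):
        counts = Counter(core[pos] for core in cores)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def best_core(peptide, pssm):
    """Slide a 9-residue window over the peptide and return
    (score, start, core) for the highest-scoring window,
    or None if the peptide is shorter than 9 residues."""
    best = None
    for start in range(len(peptide) - CORE_LEN + 1):
        window = peptide[start:start + CORE_LEN]
        score = sum(pssm[i][aa] for i, aa in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start, window)
    return best


# Toy usage with made-up cores (illustrative only, not real epitopes).
toy_pssm = build_pssm(["FKAAAAAAL", "FRAAAAAAL", "YKAAAAAAL"])
print(best_core("GGFKAAAAAALGG", toy_pssm))
```

The sketch only accepts the 20 standard one-letter residue codes; any other character in a peptide raises a `KeyError`.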
37

Harris, Justin Clay. "NEW BIOINFORMATIC TECHNIQUES FOR THE ANALYSIS OF LARGE DATASETS." UKnowledge, 2007. http://uknowledge.uky.edu/gradschool_diss/544.

Abstract:
A new era of chemical analysis is upon us. In the past, a small number of samples were selected from a population for use as a statistical representation of the entire population. More recently, advancements in data collection rate, computer memory, and processing speed have allowed entire populations to be sampled and analyzed. The result is massive amounts of data that convey relatively little information, even though they may contain a lot of information. These large quantities of data have already begun to cause bottlenecks in areas such as genetics, drug development, and chemical imaging. The problem is straightforward: condense a large quantity of data into only the useful portions without ignoring or discarding anything important. Performing the condensation in the hardware of the instrument, before the data ever reach a computer, is even better. The research proposed tests the hypothesis that clusters of data may be rapidly identified by linear fitting of quantile-quantile plots produced from each principal component of principal component analysis. Integrated Sensing and Processing (ISP) is tested as a means of generating clusters of principal component scores from samples in a hyperspectral near-field scanning optical microscope. Distances from these multidimensional cluster centers to all other points in hyperspace can be calculated. The result is a novel digital staining technique for identifying anomalies in hyperspectral microscopic and nanoscopic imaging of human atherosclerotic tissue. This general method can be applied to other analytical problems as well.
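The abstract does not spell out the fitting procedure, so the following Python sketch is only one possible reading of the quantile-quantile idea, not the author's algorithm. It assumes the scores along a single principal component are already available and measures how well their normal quantile-quantile plot follows a straight line, using the Pearson correlation between sorted scores and standard-normal quantiles. The function name `qq_linearity` and the example data are hypothetical.

```python
from statistics import NormalDist, mean


def qq_linearity(scores):
    """Pearson correlation between the sorted scores and the
    standard-normal quantiles at plotting positions (i + 0.5) / n.

    A value near 1 means the quantile-quantile plot is close to a
    straight line. Requires at least two distinct score values.
    """
    n = len(scores)
    observed = sorted(scores)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, my = mean(theoretical), mean(observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(theoretical, observed))
    sxx = sum((x - mx) ** 2 for x in theoretical)
    syy = sum((y - my) ** 2 for y in observed)
    return sxy / (sxx * syy) ** 0.5


# Made-up scores: one normal-shaped group versus two separated groups.
one_group = [NormalDist(5, 2).inv_cdf((i + 0.5) / 50) for i in range(50)]
two_groups = [0.01 * k for k in range(25)] + [10 + 0.01 * k for k in range(25)]
print(qq_linearity(one_group), qq_linearity(two_groups))
```

Under these assumptions, a component whose scores form a single roughly normal group gives a correlation close to 1, while well-separated groups bend the plot and lower the value, which is the kind of departure from linearity that could flag a component as carrying cluster structure.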
APA, Harvard, Vancouver, ISO, and other styles
38

Mefford, Megan. "Molecular and Bioinformatic Analysis of Neurotropic HIV Envelope Glycoproteins." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10173.

Abstract:
Human immunodeficiency virus (HIV) infection of macrophages in brain and other tissues plays an important role in development of HIV-associated neurological disorders and other aspects of disease pathogenesis. Macrophages express low levels of CD4, and macrophage-tropic HIV strains express envelope glycoproteins (Envs) adapted to overcome this restriction to virus entry by mechanisms that are not well characterized. One mechanism that influences this phenotype is increased exposure of the CD4 or CCR5 binding site, which may increase dissociation of soluble gp120 (sgp120) from Env trimers based on structural models. Little is known about spontaneous sgp120 shedding from primary HIV Envs or its biological significance. In this dissertation, we identify genetic determinants in brain-derived Envs that overcome the restriction imposed by low CD4, examine spontaneous sgp120 shedding by these Envs, and explore the biological significance of these findings. Sequence analysis of the gp120 beta-3 strand of the CCR5-binding site bridging sheet identified D197, which eliminates an N-linked glycosylation site, as a viral determinant associated with brain infection and HIV-associated dementia (HAD), and position 200 as a positively-selected codon in HAD patients. Mutagenesis studies showed that D197 and T/V200 enhance fusion and infection of macrophages and other cells expressing low CD4 by enhancing gp120 binding to CCR5. Sgp120 shedding from primary brain and lymphoid Envs was highly variable within and between patients, representing a spectrum rather than a categorical phenotype. Brain Envs with high sgp120 shedding mediated enhanced fusion and infection with cells expressing low CD4. Furthermore, viruses expressing brain Envs with high sgp120 shedding had an increased capacity to induce lymphocyte activation during PBMC infection, despite similar levels of viral replication. 
Genetic analysis demonstrated greater entropy and positive selection in Envs with high versus low levels of sgp120 shedding, suggesting that diversifying evolution influences gp120-gp41 association. Finally, we examined V3 loop sequences from dual-tropic brain and lymphoid Envs and found that the frequency of R5X4 HIV-1 is underestimated by most predictive bioinformatic algorithms. Together, these studies provide a better understanding of how neurotropic HIV Envs adapt to target cells expressing low CD4, and possible roles of these viral adaptations in disease pathogenesis.
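The abstract notes that D197 eliminates an N-linked glycosylation site. As a minimal sketch of how such sites are usually located in a protein sequence, the code below scans for the standard sequon N-X-S/T, where X is any residue except proline. The function name `sequon_positions` and both peptides are made up for illustration; they are not Env sequences from the thesis.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def sequon_positions(seq):
    """Return 1-based positions of the Asn in each N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]

with_site = "ACNTTGK"      # hypothetical peptide with a sequon at position 3
without_site = "ACDTTGK"   # same peptide with N replaced by D

print(sequon_positions(with_site))     # [3]
print(sequon_positions(without_site))  # []
```

Replacing the asparagine with aspartate removes the match, which mirrors the kind of change the abstract describes.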
39

Treepong, Panisa. "Bioinformatic analysis of the genomes of epidemic pseudomonas aeruginosa." Thesis, Bourgogne Franche-Comté, 2017. http://www.theses.fr/2017UBFCD065/document.

Abstract:
Pseudomonas aeruginosa is a major nosocomial pathogen. The ST235 clone is the most prevalent of the so-called high-risk international clones. This clone is very frequently multidrug-resistant, which complicates the management of the infections it causes. Despite its clinical importance, the molecular basis of the success of the ST235 clone is not understood. In this work, we sought to understand the spatiotemporal origin of this clone and the molecular basis of its success. Using existing bioinformatic tools, we found that the ST235 clone emerged in Europe in 1984 and that all ST235 isolates produce the exotoxin ExoU. We also identified 22 contiguous genes specific to this clone that are involved in transmembrane efflux, DNA processing, and bacterial transformation. This unique combination of genes may have contributed to the severity of infections caused by this clone and to its ability to acquire antibiotic resistance genes. The worldwide spread of this clone was thus probably favoured by the extensive use of fluoroquinolones; it then became locally resistant to aminoglycosides, β-lactams, and carbapenems through mutation and acquisition of resistance elements. We mostly used existing tools, but found that programs for detecting insertion sequences (IS, which play an important role in the evolution of bacterial genomes) were not suited to the data available to us. We therefore developed a tool, called panISa, that detects IS precisely and sensitively from raw bacterial genome sequencing data.
Pseudomonas aeruginosa is a major nosocomial pathogen with ST235 being the most prevalent of the so-called ‘international’ or ‘high-risk’ clones. This clone is associated with poor clinical outcomes in part due to multi- and high-level antibiotic resistance. Despite its clinical importance, the molecular basis for the success of the ST235 clone is poorly understood. This thesis therefore aimed to understand the origin of ST235 and the molecular basis for its success, including the design of bioinformatics tools for finding insertion sequences (IS) in bacterial genomes. To fulfill these objectives, the thesis was divided into two parts. First, the genomes of 79 P. aeruginosa ST235 isolates collected worldwide over a 27-year period were examined. A phylogenetic network was built using a Hamming distance-based method, NeighborNet. We then estimated the Time to the Most Recent Common Ancestor (TMRCA) using a Bayesian approach. Additionally, we identified antibiotic resistance determinants, CRISPR-Cas systems, and ST235-specific gene profiles. The results suggested that the ST235 sublineage emerged in Europe around 1984, coinciding with the introduction of fluoroquinolones as an antipseudomonal treatment. The ST235 sublineage seemingly spread from Europe via two independent clones. ST235 isolates then appeared to acquire resistance determinants to aminoglycosides, β-lactams, and carbapenems locally. Additionally, all the ST235 genomes contained the exoU-encoded exotoxin, and we identified 22 ST235-specific genes clustering in blocks and implicated in transmembrane efflux, DNA processing and bacterial transformation. These unique genes may have contributed to the poor outcome associated with P. aeruginosa ST235 infections and increased the ability of this international clone to acquire mobile resistance elements. The second part was to design a new Insertion Sequence (IS) search tool for next-generation sequencing data, named panISa. 
This tool identifies the IS position, direct target repeats (DR) and inverted repeats (IR) from short read data (.bam/.sam) using only the reference genome (without any IS database). To validate our proposal, we used simulated reads from 5 species: Escherichia coli, Mycobacterium tuberculosis, Pseudomonas aeruginosa, Staphylococcus aureus, and Vibrio cholerae with 30 random ISs. The experimental set consists of reads of various lengths (100, 150, and 300 nucleotides) simulated at coverages of 20x, 40x, 60x, 80x, and 100x. We performed sensitivity and precision analyses to evaluate panISa and found that the sensitivity of IS position detection does not differ significantly when the read length is changed, whereas it differs significantly depending on species and read coverage. When focusing on read coverage, we found a significant difference only at 20x. For the other coverages (40x-100x) we obtained a very good mean sensitivity of 98% (95%CI: 97.9%-98.2%). Similarly, the mean sensitivity of DR identification is high: 99.98% (95%CI: 99.957%-99.998%), but the mean IR sensitivity is 73.99% (95%CI: 71.162%-76.826%), which should be improved. Turning from sensitivity to precision, the precision of IS position detection differs significantly with species, read coverage, or read length. However, the mean of each precision value is larger than 95%, which is very good. In conclusion, P. aeruginosa ST235 (i) has become prevalent across the globe potentially due to the selective pressure of fluoroquinolones and (ii) readily became resistant to aminoglycosides, β-lactams, and carbapenems through mutation and acquisition of resistance elements among local populations. Concerning the second point, panISa is a sensitive and highly precise tool for identifying insertion sequences from short reads of bacterial data, which will be useful for studying epidemiology or bacterial evolution.
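The sensitivity and precision figures quoted above come from comparing predicted IS positions against simulated ground truth. A minimal sketch of that kind of comparison follows. It is not panISa's evaluation code: the function name `evaluate_calls`, the matching tolerance, and all positions are hypothetical.

```python
def evaluate_calls(true_positions, called_positions, tolerance=0):
    """Return (sensitivity, precision) for predicted insertion positions.

    A call counts as a true positive if it lies within `tolerance` bases of
    a true position that has not already been matched by another call.
    """
    unmatched = sorted(true_positions)
    tp = 0
    for call in sorted(called_positions):
        hit = next((t for t in unmatched if abs(t - call) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)   # each true site can be matched only once
            tp += 1
    fn = len(unmatched)                   # true sites that were never called
    fp = len(called_positions) - tp       # calls with no matching true site
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision

# Made-up example: 4 simulated insertions, 5 calls, 3 of them correct.
truth = [1200, 5400, 9100, 15000]
calls = [1201, 5400, 9100, 20000, 30000]
sens, prec = evaluate_calls(truth, calls, tolerance=5)
print(sens, prec)  # 0.75 0.6
```

Sensitivity is the fraction of simulated insertions that were recovered, and precision is the fraction of calls that correspond to a simulated insertion.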
40

Stahl, Morgan A. "The Perilipin Family of Proteins: Structural and Bioinformatic Analysis." Otterbein University Honors Theses / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=otbnhonors1620460421392971.

41

Poluri, Raghavendra Tejo Karthik. "Using bioinformatic analyses to understand prostate cancer cell biology." Master's thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/66803.

Abstract:
Prostate cancer (PCa) affects 1 in 7 men during their lifetime. It is the most commonly diagnosed cancer in men and the fourth most common cancer in Canada. PCa is a hormone-dependent disease diagnosed in men. Androgens play a vital role in disease progression. The first line of treatment, following surgery or radiotherapy, is androgen deprivation therapy. Despite an initial positive response to androgen inhibition, progression to castration-resistant prostate cancer (CRPC) is almost inevitable. The androgen receptor plays a major role at the various stages of PCa. This thesis therefore describes the methods developed and used to better understand PCa biology and the role androgens play in this disease. The work presented in this thesis consists mainly of bioinformatic analyses performed on publicly available data sets and of a pipeline built to analyse RNA-Seq data. An RNA-Seq pipeline was developed to understand the impact of androgens and the genes regulated upon androgen treatment in PCa cell models. This bioinformatic pipeline consists of various tools, which are described in chapter 1. The main objective of this project was to develop a pipeline for analysing RNA-Seq data that helps to understand and define the metabolic pathways and genes that are regulated by androgens and play an important role in PCa progression. The experimental workflow used two androgen receptor-positive cell lines, LNCaP and LAPC4. 
All the data used in this project have been made public so that the research community can carry out further studies and comparative analyses to understand androgen functions in much greater depth and develop new therapies for PCa. In another project, described in chapter 2, bioinformatic analyses were performed on publicly available data to understand the frequency of loss and genomic alteration of the PTEN gene, located at 10q23. These analyses showed that the frequency of genomic alteration of PTEN is much higher in CRPC than in localised PCa. They also helped to identify other genes altered in PCa. These genes have not been studied much in the literature, but some of them appear to have tumour suppressor characteristics. These results could be a good starting point for more in-depth analyses of gene loss. Understanding the functions of AR and the deletion of PTEN will help to develop new strategies and approaches for diagnosing and treating PCa. Integrating bioinformatic analyses with clinical research opens up a new perspective in PCa research.
Prostate cancer (PCa) affects 1 in 7 men in their lifetime and is the most commonly diagnosed cancer in men. It is the 4th most common cancer in Canada. PCa is a hormone-dependent disease diagnosed in men. Androgens play a vital role in disease progression. The standard of care for PCa, following surgery or radiation therapy, is androgen deprivation therapy (ADT). Despite an initial positive response to androgen inhibition, progression of the disease to castration-resistant prostate cancer (CRPC) is almost inevitable. Across the various stages of PCa, the androgen receptor (AR) plays a major role. This thesis describes the methods developed and used to understand PCa biology. The work presented in this thesis consists mainly of bioinformatic analyses performed on publicly available data sets and a pipeline built to analyse RNA-Seq data. An RNA-Seq pipeline has been developed to understand the impact of androgens and the genes regulated upon androgen treatment in PCa cell models. This bioinformatic pipeline consists of various tools, which are described in chapter 1. The major goal of this project was to develop a pipeline for analysing RNA-Seq data that helps to understand and define the metabolic pathways and genes that are regulated by androgens and play an important role in PCa progression. The experimental workflow used two androgen receptor-positive cell lines, LNCaP and LAPC4. All the data used in this project have been made publicly available so that the research community can perform further comparative studies and analyses, understand the functions of androgens in greater depth, and develop novel therapies for PCa. In another project, described in chapter 2, bioinformatic analyses were performed on publicly available data to understand the loss and genomic alteration frequency of the gene PTEN, located at 10q23. 
These analyses highlighted that the genomic alteration frequency of PTEN is much higher in CRPC than in localised PCa, and also helped to identify other genes that are lost along with PTEN. The lost genes have not been studied much in the literature, but a few studies have suggested that they might possess tumor suppressor characteristics. These results might be a good starting point for deeper analyses of gene loss. Understanding the functions of AR and the deletion of PTEN will help in developing novel strategies and approaches to diagnose and treat PCa. Integrating bioinformatic analyses with clinical research opens up a new perspective in the PCa research domain.
42

Mthombeni, Jabulani S. "A comparative bioinformatic analysis of zinc binuclear cluster proteins." Thesis, Rhodes University, 2005. http://hdl.handle.net/10962/d1004064.

Abstract:
Members of the zinc binuclear cluster family are important fungal transcriptional regulators sharing a common DNA binding domain. Dal81p is a pleiotropic zinc binuclear cluster protein involved in the induction of the UGA genes required for the γ-aminobutyrate nitrogen catabolic pathway in Saccharomyces cerevisiae. The zinc binuclear cluster domain is indispensable for function in Dal81p, and little is known about other domains in this protein. The aim of the study was to explore the zinc binuclear cluster protein family using comparative bioinformatics as a complement to biochemical and structural approaches. A database of all zinc binuclear cluster proteins was compiled. A total of 118 zinc binuclear cluster proteins are reported in this work. Thirty-nine previously unidentified zinc binuclear cluster proteins were found. Four homologues of Dal81p were identified by homology searching. Important sequence motifs were identified in the aligned sequences of Dal81p and its homologues. The coiled-coil motif found in the Gal4p zinc binuclear cluster protein could not be identified in Dal81p and its homologues. This suggested that Dal81p does not dimerise through this structural motif as other zinc binuclear cluster proteins do. Solvent-accessible sites that could be phosphorylated by protein kinase C or casein kinase II, and the role of such sites in the possible regulation of Dal81p function, are discussed.
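The abstract says the proteins were collected by homology searching and does not describe a pattern-based scan. As a complementary illustration only, the sketch below searches a sequence for the commonly cited cysteine spacing of the zinc binuclear cluster domain, C-X2-C-X6-C-X5-12-C-X2-C-X6-8-C. The function name `find_zn_cluster` is hypothetical, and the test sequence is taken to be the N-terminal fragment of Gal4p, which is an assumption that should be checked against a sequence database before reuse.

```python
import re

# Commonly cited Zn(II)2Cys6 cysteine spacing (an assumption of this sketch,
# not a pattern taken from the thesis).
ZN2CYS6 = re.compile(r"C.{2}C.{6}C.{5,12}C.{2}C.{6,8}C")

def find_zn_cluster(seq):
    """Return (start, end, motif) with 1-based inclusive coordinates, or None."""
    m = ZN2CYS6.search(seq)
    if m is None:
        return None
    return m.start() + 1, m.end(), m.group()

# Assumed N-terminal fragment of S. cerevisiae Gal4p (illustrative input).
gal4_nterm = "MKLLSSIEQACDICRLKKLKCSKEKPKCAKCLKNNWECRYSPKTKRS"

hit = find_zn_cluster(gal4_nterm)
print(hit)  # (11, 38, 'CDICRLKKLKCSKEKPKCAKCLKNNWEC')
```

A pattern scan like this finds only candidates with canonical spacing, so it would usually be combined with the homology searches the abstract describes.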
43

Wong, Io Nam. "Bioinformatic and biochemical characterization of helicases from bacteriophage T5." Thesis, University of Sheffield, 2012. http://etheses.whiterose.ac.uk/2333/.

Abstract:
Bacteriophage T5 is a bacterial virus known to have a remarkably high replication rate. It is a double-stranded DNA virus and encodes many of the proteins needed for its own replication. During replication, the viral double-stranded genomic DNA has to be separated by enzymes called helicases, which are motor proteins that use chemical energy from ATP to move along and unwind nucleic acid duplexes. Until now, no helicase has been characterized in bacteriophage T5. A bioinformatic analysis of the T5 replication gene cluster showed that several early gene products (D2, D6 and D10), which possess key helicase signature sequences (motifs), may be T5 helicases. This is the first report to investigate helicases of bacteriophage T5, and the study focused on bioinformatic and biochemical characterization of these three potential helicases. Here, D2 and D10 were identified as two novel T5 helicases, showing helicase activity in vitro as well as some unique properties not previously characterised in other helicases. However, D6 did not show ATPase activity under the conditions employed, and further investigation of its characteristics is required. Except for a Walker A motif, no other common conserved motifs related to helicase activity were identified in the D2 protein sequence. However, D2 was found to have a rare bipolar helicase activity, giving it the ability to unwind partial duplex DNA with either a 5' or a 3' ssDNA tail (ss-dsDNA). This indicates that D2 may possess some unconventional motifs relevant to its helicase activity. The extent of 5'→3' or 3'→5' unwinding activity of D2 was revealed to be dependent on 5' or 3' tail length. Interestingly, D2 displayed a biased polarity preference, with its 3'→5' unwinding activity being several fold greater than its 5'→3' unwinding activity when the substrates have identical tail length. Differential inhibition of the bipolar helicase activities by high NaCl concentration was also observed. 
The 5'→3' unwinding activity was more sensitive to inhibition by high NaCl concentration than the 3'→5' unwinding activity. The D10 protein can unwind branched DNA substrates, including forks, Y-junctions and Holliday junctions, which resemble DNA replication, recombination and repair intermediates. Furthermore, D10 was shown to catalyze branch migration of the Holliday junction substrate. Intriguingly, the ability of D10 to unwind the Y-junction substrate was found to be structure-dependent and sequence-dependent. Also, the unwinding activity can be affected by the strand discontinuity of the substrate. All the findings in this study contribute to a new insight into functional properties of helicases.
44

Thorpe, Peter. "Bioinformatic and functional characterisation of Globodera pallida effector genes." Thesis, University of Leeds, 2012. http://etheses.whiterose.ac.uk/4568/.

Abstract:
Pathogens secrete molecules, termed effectors, to manipulate their host to the benefit of the pathogen. Effectors of plant parasitic nematodes are predicted to have a range of functions such as facilitating invasion, initiation and maintenance of the feeding site, and suppression of host defences. The genome sequence of the potato cyst nematode Globodera pallida was analysed to identify putative effectors. They include 129 effectors similar to those previously identified from cyst nematodes, 53 cell wall modifying enzymes and 117 novel putative effectors. Only four effectors were common to G. pallida and the root-knot nematode Meloidogyne incognita. These could have a conserved role in plant parasitism. A large SPRY domain-containing gene family was identified in G. pallida. It has 299 members, of which 30 are predicted to be secreted and are therefore categorised as effectors. Phylogenetic analysis showed that the family is hugely expanded and specific to Globodera species. Fifty-four putative effectors were cloned from G. pallida cDNA. Transgenic lines of Arabidopsis thaliana and Solanum tuberosum L. ‘Désirée’ were produced to express a range of these effectors and act as tools for functional characterisation. Potato lines that expressed selected effectors were subjected to phenotypic analysis and pathogen susceptibility assays. The largest range of aberrant phenotypes was observed for plants expressing GpIA7 and GpIVG9. Potato lines expressing GpIA7 showed altered growth phenotypes and an increased susceptibility to Phytophthora infestans CS-12. GpIVG9-expressing potato lines showed accelerated growth, distorted leaves and increased susceptibility to nematode invasion. A more in-depth functional characterisation was conducted on a ubiquitin extension protein effector. The G. pallida ubiquitin extension protein suppressed PAMP-triggered immunity, and the C-terminal extension was required for this activity. 
The outcomes from this work and the tools generated for future experimentation will contribute to elucidating the complex interactions between pathogens and their hosts.
45

Milani, Adelaide. "Genomic and bioinformatic approach to avian influenza virus evolution." Doctoral thesis, Università degli studi di Padova, 2016. http://hdl.handle.net/11577/3424357.

Abstract:
Viral zoonotic agents have a significant impact on both human and veterinary public health. Ecosystem changes, increasing urbanization and easier connections have influenced the balance between pathogens and their host species. In recent years, most of the threatening viruses that cause emerging diseases have originated from animal hosts; most of them are RNA viruses, whose large population sizes, high mutation rates and short generation times allow rapid evolution, genetic variability and the selection of new variants. A constant and adequate surveillance programme and the sharing of different professional expertise are necessary to follow viral evolution and to formulate efficient public health policy (Howard and Fletcher, 2012). Influenza A virus is considered one of the most challenging RNA viruses because of its zoonotic potential at the animal-human interface and its impact on global health and the economy; almost every year influenza epidemics cause morbidity and mortality in humans, and the virus is also associated with pandemics. Both wild and domestic birds are considered the primary natural reservoir of influenza A virus, and wild birds in particular are thought to be the source of influenza A viruses in all other animals (http://www.cdc.gov/flu/about/viruses/transmission.htm). Different techniques are available to genetically characterize and study viruses in order to understand their behavior, evolutionary dynamics, host-virus interactions and origin; the aim is to provide valid support, with appropriate treatments, during the surveillance and diagnosis of possible epidemics. During my PhD, an integrated approach, both genomic and structural, was used to study the evolution of avian influenza A virus, focusing in particular on the hemagglutinin, the major surface glycoprotein, of the H5, H7 and H9 subtypes (the major "avian" subtypes responsible for human infection). 
Next-generation sequencing (NGS) was used to investigate and characterize the complexity of the viral population, to detect low-frequency mutations and to follow the evolution of the genetically related variants present in a viral population. To compare and inspect genetic data, the phylogenetic approach has proven to be a useful tool in the analysis of viral evolution. It has been used to explain molecular epidemiology, transmission and viral evolution. To obtain a more complete view of 'functional evolution', phylogenetic analyses based on sequence comparison and resulting in trees were integrated with information from structural comparison. The three-dimensional structural approach has proven to be a useful tool for displaying similarities and inspecting motifs that cannot be discovered by analyzing primary sequences alone. Indeed, in primary sequences a mutation is considered without regard to its effect on protein folding or surface properties, whereas in three-dimensional structures the extent to which each mutation influences structural characteristics and interactions is directly detectable. This approach also made a further contribution to the phylogenetic analysis. In particular, the study focused on the evolutionary dynamics and adaptive strategies of the avian influenza H7N1 and H7N3 subtypes that circulated in Northern Italy for similar periods of time under similar epidemiological conditions. The within- and between-host population dynamics of avian HPAI H7N7 viruses that affected Italy during 2013 were investigated using next-generation sequencing. NGS analysis was used to characterize viral population complexity in two groups of animals challenged with the same HPAI H5N1 virus but vaccinated with vaccines conferring different levels of protection. 
An extensive comparison of structural domains and sub-regions was performed on the hemagglutinin of different influenza A virus subtypes, with particular interest in the different clades of HPAI H5N1 circulating in Egypt (where bird flu is endemic in poultry), to investigate any domain-specific changes. Influenza A viruses of the H9 subtype were inspected from a phylogenetic and a structural point of view to infer type-specific characteristics and to confirm whether surface properties could be associated with the 'functional evolution' of viral surface determinants, as seen in the H5N1 subtype. This work suggests that integrating genomic, phylogenetic, and structural comparison can help in understanding the 'functional evolution' of avian influenza A virus.
I virus zoonotici, cioè in grado di infettare l’uomo e alcune specie animali, hanno un impatto significativo e costituiscono una costante, potenziale minaccia sia per la salute pubblica umana che per quella animale. Ecosistemi dagli equilibri modificati, una crescente urbanizzazione e connessioni facilitate hanno influenzato sempre piu' il rapporto tra patogeni e specie ospiti affini. Negli ultimi anni la fonte della maggior parte dei virus potenzialmente pericolosi e in grado di causare malattie emergenti sembra derivi da ospiti di origine animale; si tratta prevalentemente di virus a RNA che, grazie alla possibilità di moltiplicarsi in breve tempo all'interno di una popolazione ampia ed all'alto tasso di mutazione, permettono una rapida evoluzione, un'elevata variabilità genetica e la selezione di nuove varianti. Un adeguato e costante programma di sorveglianza, la condivisione di conoscenze e una collaborazione tra diverse competenze professionali sono fondamentali e necessarie per seguire l'evoluzione virale e per formulare politiche di sanità pubblica efficienti (Howard e Fletcher, 2012). L' Influenza virus di tipo A è considerato uno dei virus a RNA più importanti, tanto per il suo potenziale ruolo zoonotico nell'interfaccia animale-umano, quanto per la salute globale e l'impatto economico. Quasi ogni anno epidemie di influenza provocano morbilità e mortalità nell'uomo e talvolta gli stessi virus possono essere associati a pandemie. Il serbatoio naturale dei virus influenzali di tipo A è rappresentato dagli uccelli, sia selvatici che domestici (influenza aviaria) (http://www.cdc.gov/flu/about/viruses/transmission.htm); in particolare gli uccelli selvatici sembrano costituire la fonte dell'influenza A virus tutte le altre specie animali. 
Diverse tecniche sono disponibili per studiare i virus e caratterizzarli geneticamente al fine di capirne il loro comportamento, le dinamiche evolutive, il loro rapporto con l'ospite e la loro origine e per sviluppare profilassi e terapie adeguate creando un valido supporto durante la fasi di sorveglianza e diagnosi di un'eventuale epidemia . Nell'ambito del mio dottorato è stato utilizzato un approccio integrato, sia genomico che strutturale, per studiare l'evoluzione dell'influenza aviaria; particolare interesse è stato rivolto allo studio dell'emoagglutinina virale, la principale glicoproteina di superficie, appartenente ai sottotipi H5, H7 e H9 (i principali sottotipi “aviari” responsabili di infezione nell’uomo). Le analisi mediante Next Generation Sequencing (NGS) hanno favorito lo studio e la caratterizzazione della complessità nella popolazione virale, consentendo di monitorare finemente l'evoluzione delle varianti geneticamente correlate presenti all'interno della popolazione virale tramite l'identificazione delle mutazioni a bassa frequenza. Per confrontare ed analizzare i dati genetici, l'approccio filogenetico si è rivelato un utile strumento per l'analisi dell'evoluzione virale; è stato usato per spiegare l'epidemiologia molecolare, la trasmissione e l'evoluzione virale. Al fine di ottenere una visione più completa in termini di 'evoluzione funzionale', l'analisi filogenetica è stata integrata con le informazioni provenienti dal confronto strutturale. L'approccio strutturale, considerando lo spazio tridimensionale dell’emoagglutinina, ha dimostrato di poter essere uno strumento utile per evidenziare eventuali somiglianze e per ispezionare e valutare quei motivi il cui ruolo non può essere correttamente interpretato utilizzando le sole sequenze primarie. 
Infatti, nelle sequenze primarie il peso delle mutazioni non tiene conto dell'effetto sul fold o sulle proprietà di superficie, mentre nelle strutture tridimensionali, quanto ciascuna mutazione sia in grado di influenzare le caratteristiche strutturali e le interazioni, è direttamente rilevabile. Questo approccio ha inoltre portato un ulteriore contributo all'analisi filogenetica. In particolare lo studio si è concentrato sull'analisi delle dinamiche evolutive e delle strategie adattative dei sottotipi H7N1 ed H7N3 dell'influenza aviaria circolanti nel Nord Italia per periodi di tempo analoghi e in condizioni epidemiologiche simili. Inoltre è stato utilizzato il deep sequencing per studiare le dinamiche evolutive e di trasmissione intra- e inter-ospiti del virus aviario sottotipo H7N7 che colpì alcuni allevamenti italiani nel 2013. L'analisi NGS è stata utilizzata per caratterizzare la complessità della popolazione virale in due gruppi di animali sperimentalmente infetti con lo stesso virus ad alta patogenicità (HPAI) H5N1 ed immunizzati con distinti vaccini. E' stato inoltre eseguito un ampio confronto strutturale su domini e sub-regioni dell'emoagglutinina di diversi sottotipi del virus dell'influenza, con particolare interesse per i diversi clades di HPAI H5N1 circolanti in Egitto (ove l’influenza aviaria è endemica nei volatili), per indagare eventuali variazioni dominio-specifiche. I virus influenzali del sottotipo H9 sono stati analizzati da un punto di vista sia filogenetico che strutturale, per rilevare caratteristiche tipo specifiche e verificare se la variazione delle proprietà di superficie possa essere un marcatore di 'evoluzione funzionale' dei determinanti di superficie virali, come dimostrato nel sottotipo H5N1. Questo lavoro suggerisce che il confronto e l'integrazione tra analisi genomica, filogenetica e strutturale può aiutare a capire l' 'evoluzione funzionale' del virus dell'influenza aviaria di tipo A.
46

Rossini, Roberto. "Development and validation of bioinformatic methods for GRC assembly and annotation." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-414739.

Abstract:
This thesis presents the work done during my master's degree project under the supervision of Alexander Suh and Francisco J. Ruiz-Ruano. My work focused on the development of in-silico methods to improve the assembly of the Germline Restricted Chromosome (GRC) of songbirds, more specifically that of the zebra finch. GRCs are a good example of the popular saying "The exception that proves the rule". For a very long time, it was assumed that every cell in a healthy multicellular organism carries the same genetic information. Cytogenetic evidence dating back as far as the early 20th century suggests that this is not always the case: certain organisms carry supernumerary B chromosomes, which are dispensable chromosomes that are not part of the normal karyotype of a species. GRCs are often regarded as a special case of B chromosomes, in which every individual of a species carries an additional chromosome whose presence is restricted to germline cells. The presence of GRCs has been documented in insects, hagfishes and songbirds. A peculiar case is that of the zebra finch, whose GRC has an estimated size of over 150 Mb, accounting for over 10% of the total zebra finch genome size. Although the first cytogenetic evidence of the zebra finch GRC dates back to 1998, it was only last year that the first comprehensive genomic study of this relatively large chromosome was published. This study shed some light on the gene content of the zebra finch GRC, revealing that it mostly consists of paralogs of A-chromosomal genes. The GRC assembly and annotation published as part of this study included 115 GRC-linked genes that were identified through germline/soma read mapping, as well as 36 manually curated scaffolds with a median length of 3.6 kb. Considering the conspicuous size of the zebra finch GRC, it is clear that this is a very fragmented and likely incomplete GRC assembly.
Many factors can have a negative impact on assembly completeness and contiguity. In the GRC case, these factors collectively affect coverage in ways that are not properly handled by available genome assemblers. In the course of my master's degree project I developed kFish, a bioinformatic tool that performs alignment-free enrichment of GRC-linked barcodes from a 10x Genomics Chromium linked-read DNA library. kFish uses an iterative approach in which the k-mer content of a set of GRC-linked sequences is compared with that of the reads corresponding to each individual 10x Genomics barcode. This comparison allows kFish to identify likely GRC-linked barcodes and then to use only the reads corresponding to these barcodes when trying to assemble the GRC. First benchmarking results, generated using five GRC-linked genes from the zebra finch as reference sequences, show that kFish is capable of assembling not only already known GRC-linked sequences but also new ones with high confidence. kFish can do all of this in a matter of hours, using only a few gigabytes of system memory, whereas previous efforts took over two days to assemble the zebra finch genome and identify GRC-linked scaffolds using an approach based on read mapping. High-quality genome assemblies and annotations are the foundations of modern genomics research, and their absence greatly limits the breadth of the questions that can be answered. There is still a lot that we do not understand about GRCs, and part of this is due to the lack of high-quality GRC assemblies and annotations. Producing such an assembly will likely require an integrated approach, in which multiple sequencing technologies and bleeding-edge bioinformatic tools such as kFish are combined to produce a high-quality assembly, which will be crucial to unravel the mystery of GRC function and evolutionary history.
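The barcode-selection idea described in this abstract can be illustrated with a minimal Python sketch. It is not kFish's actual implementation: the function names, the k-mer size, the shared-fraction threshold, and the toy reads are all assumptions made for illustration, and the iterative step is reduced to repeated rounds in which the reads of newly accepted barcodes are added to the reference k-mer set.

```python
def kmers(seq, k):
    """Return the set of all k-mers contained in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def enrich_barcodes(reference_seqs, reads_by_barcode, k=5,
                    min_shared=0.5, max_rounds=3):
    """Hypothetical iterative, alignment-free barcode enrichment.

    A barcode is accepted when at least min_shared of its distinct
    k-mers occur in the current reference k-mer set.
    """
    ref = set()
    for seq in reference_seqs:
        ref |= kmers(seq, k)

    selected = set()
    for _ in range(max_rounds):
        newly_selected = set()
        for barcode, reads in reads_by_barcode.items():
            if barcode in selected:
                continue
            barcode_kmers = set()
            for read in reads:
                barcode_kmers |= kmers(read, k)
            if not barcode_kmers:
                continue
            shared = len(barcode_kmers & ref) / len(barcode_kmers)
            if shared >= min_shared:
                newly_selected.add(barcode)
        if not newly_selected:
            break  # converged: no further barcodes recruited
        # Grow the reference with reads from the newly accepted barcodes.
        for barcode in newly_selected:
            for read in reads_by_barcode[barcode]:
                ref |= kmers(read, k)
        selected |= newly_selected
    return selected

# Toy data (invented): BC3 only overlaps the reference after BC1 extends it.
reference_seqs = ["ACGTACGTGGA"]
reads_by_barcode = {
    "BC1": ["ACGTACGTGGATTC"],
    "BC2": ["TTTTTTTTTT"],
    "BC3": ["CGTGGATTCA"],
}
selected = enrich_barcodes(reference_seqs, reads_by_barcode)
print(sorted(selected))  # ['BC1', 'BC3']
```

In this toy run the second round recruits BC3, which is the point of iterating: sequence adjacent to the known genes becomes reachable once the first barcodes have been added.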
47

Mayol, Escuer Eduardo. "Development of bioinformatic tools for the study of membrane proteins." Doctoral thesis, Universitat Autònoma de Barcelona, 2019. http://hdl.handle.net/10803/667335.

Abstract:
Membrane proteins are fundamental elements of every known cell; they account for a quarter of the genes in the human genome and play essential roles in cell biology. About 50% of currently marketed drugs have a membrane protein as their target, and around a third of all marketed drugs target G-protein-coupled receptors (GPCRs). The difficulties and limitations of the experimental work required for microscopic studies of the membrane and of membrane proteins have encouraged the use of computational methods. The scope of this thesis is to develop new bioinformatic tools for the study of membrane proteins, and of GPCRs in particular, that help to characterize their structural features and to understand their function. With regard to membrane proteins, a cornerstone of this thesis has been the creation of two databases for the main classes of membrane proteins: one for α-helical proteins (TMalphaDB) and another for β-barrel proteins (TMbetaDB). These databases are used by a newly developed tool to find structural distortions induced by specific amino acid sequence motifs (http://lmc.uab.cat/tmalphadb and http://lmc.uab.cat/tmbetadb). They were also used to characterize the inter-residue interactions that occur in the transmembrane region of these proteins, with the aim of understanding the complexity and differential features of membrane proteins. Interactions involving Phe and Leu residues were found to be mainly responsible for the stabilization of the transmembrane region. Moreover, the energetic contribution of interactions between sulfur-containing amino acids (Met and Cys) and aliphatic or aromatic residues was analyzed. These interactions are often overlooked, even though they can be stronger than aromatic-aromatic or aromatic-aliphatic interactions.
Additionally, the GPCR family, the most important family of membrane proteins, has been the focus of two web applications: one dedicated to the analysis of the conservation of amino acids or sequence motifs and of pair correlation (GPCR-SAS, http://lmc.uab.cat/gpcrsas), and one that places internal water molecules in receptor structures (HomolWat, http://lmc.uab.cat/HW). These web applications are pilot studies that can be extended to other membrane protein families in future projects. All these tools and analyses may help in the development of better structural models and contribute to the understanding of membrane proteins.
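The kind of amino acid conservation analysis mentioned in this abstract can be illustrated with a minimal Python sketch. It does not reproduce GPCR-SAS: the function name `column_conservation`, the gap handling, and the toy alignment are assumptions made for illustration only.

```python
from collections import Counter

def column_conservation(alignment):
    """For each alignment column, return (most common residue, frequency).

    Hypothetical helper: sequences must be pre-aligned to equal length,
    and gap characters ('-') are excluded from the frequency.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    result = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            result.append(("-", 0.0))  # column contains only gaps
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(residues)))
    return result

# Toy alignment (invented sequences, not taken from any database).
alignment = ["DRYLA", "DRYIA", "ERY-A", "DRYLV"]
conservation = column_conservation(alignment)
for position, (residue, freq) in enumerate(conservation, start=1):
    print(position, residue, f"{freq:.2f}")
```

Columns with a frequency of 1.0 are fully conserved in the toy data; a pair-correlation analysis would additionally compare residue co-occurrence between two columns, which this sketch does not attempt.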
48

González, Ramírez Mar 1991. "Bioinformatic analysis of epigenetic regulatory mechanisms in development and disease." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/671370.

Abstract:
Appropriate regulation of gene expression is necessary for the correct development and homeostasis of organisms. Epigenetic mechanisms represent an additional layer of information, besides the genetic sequence, that is crucial for the correct functioning of each cell. Histone modifications, which modulate and are associated with transcriptional activation or repression, are a major epigenetic feature. Using predictive modelling, we have studied which histone modifications relate better to enhancer or promoter function in mouse embryonic stem cells, during differentiation and in animal development. We have found that different histone modifications relate better to enhancers or promoters, respectively. We have studied the role of poised enhancers during differentiation and development. We have seen that poised enhancer activation is not exclusive to the neural lineage, but is a general mechanism implicated in the differentiation of every cell type. We have characterized the epigenetic landscape of Cushing's syndrome. We have found persistent epigenetic and transcriptional alterations after long-term remission of the disease, related to a deep alteration of the circadian rhythm. These findings promise to be relevant for future therapeutic advances.
49

Thorburn, Henrik. "Applying Bioinformatic Techniques to Identify Cold-associated Genes in Oat." Thesis, University of Skövde, Department of Computer Science, 2002. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-728.

Abstract:

As the interest in biological sequence analysis increases, more efficient techniques to sequence, map and analyse genome data are needed. One frequently used technique is EST sequencing, which has proven to be a fast and cheap method to extract genome data. EST sequencing generates large numbers of low-quality sequences, which have to be managed and analysed further.

Performing complete searches and finding guaranteed results is very time consuming. This dissertation project presents a method that can be used to perform rapid gene prediction of function-specific genes in EST data, as well as the results and an estimation of the accuracy of the method.

This dissertation project applies various methods and techniques to actual data, attempting to identify genes involved in cold-associated processes in plants. The presented method consists of three steps. First, a database of genes known to have cold-associated properties is assembled. These genes are extracted from other, already sequenced and analysed organisms. Second, this database is used to identify homologues in an unanalysed EST dataset, generating a candidate list of cold-associated genes. Last, each of the identified candidate cold-associated genes is verified, both to estimate the accuracy of the rapid gene prediction and to support the removal of candidates that are not cold-associated.

The method was applied to a previously unanalysed Avena sativa EST dataset, and was able to identify 135 candidate genes from approximately 9500 ESTs. Out of these, 103 were verified as cold-associated genes.
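As a worked illustration of the reported figures, the following Python sketch computes the share of candidates that passed verification and the share of ESTs flagged as candidates. The helper name `fraction` is hypothetical, and the total of 9500 ESTs is the approximate value quoted in the abstract, so the second percentage is only indicative.

```python
def fraction(part, whole):
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole

total_ests = 9500   # approximate size of the EST dataset (from the abstract)
candidates = 135    # candidate cold-associated genes found by homology search
verified = 103      # candidates confirmed in the verification step

verified_share = fraction(verified, candidates)
candidate_share = fraction(candidates, total_ests)

print(f"{verified_share:.1%} of candidates verified")   # 76.3%
print(f"{candidate_share:.1%} of ESTs flagged")         # 1.4%
```

The first figure measures how many predictions survived verification; it says nothing about cold-associated genes that the homology search missed.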

50

Barrenäs, Fredrik. "Bioinformatic identification of disease associated pathways by network based analysis." Doctoral thesis, Linköpings universitet, Institutionen för klinisk och experimentell medicin, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-81898.

Abstract:
Many common diseases are complex, meaning that they are caused by many interacting genes. This makes them difficult to study; to determine disease mechanisms, disease-associated genes must be analyzed in combination. Disease-associated genes can be detected using high-throughput methods, such as mRNA expression microarrays, DNA methylation microarrays and genome-wide association studies (GWAS), but determining how they interact to cause disease is an intricate challenge. One approach is to organize disease-associated genes into networks using protein-protein interactions (PPIs) and dissect them to identify disease-causing pathways. Studies of complex disease can also be greatly facilitated by using an appropriate model system. In this dissertation, seasonal allergic rhinitis (SAR) served as a model disease. SAR is a common disease that is relatively easy to study. Also, the key disease cell types, like the CD4+ T cell, are known and can be cultured and activated in vitro by the disease-causing pollen. The aim of this dissertation was to determine network properties of disease-associated genes, and to develop methods to identify and validate networks of disease-associated genes. First, we showed that disease-associated genes have distinguishing network properties, one being that they co-localize in the human PPI network. This supported the existence of disease modules within the PPI network. We then identified network modules of genes whose mRNA expression was perturbed in human disease, and showed that the most central genes in those network modules were enriched for disease-associated polymorphisms identified by GWAS. As a case study, we identified disease modules using mRNA expression data from allergen-challenged CD4+ cells from patients with SAR. The case study identified a novel disease-associated gene, FGF2, and validated it using GWAS data and RNAi-mediated knockdown. Lastly, we examined how DNA methylation caused disease-associated mRNA expression changes in SAR.
DNA methylation, but not mRNA expression profiles, could accurately distinguish allergic patients from healthy controls. Also, we found that disease-associated mRNA expression changes were associated with a low DNA methylation content and absence of CpG islands. Specifically within this group, we found a correlation between disease-associated mRNA expression changes and DNA methylation changes. Using ChIP-chip analysis, we found that targets of a known disease-relevant transcription factor, IRF4, were also enriched among non-CpG-island genes with low methylation levels. Taken together, in this dissertation the network properties of disease-associated genes were examined, and then used to validate disease networks defined by mRNA expression data. We then examined regulatory mechanisms underlying disease-associated mRNA expression changes in a model disease. These studies support network-based analyses as a method to understand disease mechanisms and identify important disease-causing genes, such as treatment targets or markers for personalized medication.
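The notion of disease-associated genes co-localizing into modules within a PPI network can be illustrated with a minimal Python sketch. It is not the method developed in the dissertation: the function name `disease_modules`, the use of connected components as modules, and the placeholder gene identifiers are assumptions made for illustration, and no real interaction data is implied.

```python
from collections import deque

def disease_modules(ppi_edges, disease_genes):
    """Group disease genes into connected components of the PPI subnetwork.

    Hypothetical helper: only interactions between two disease genes are
    kept, and each connected component is treated as one candidate module.
    """
    genes = set(disease_genes)
    neighbours = {gene: set() for gene in genes}
    for a, b in ppi_edges:
        if a in genes and b in genes:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen = set()
    modules = []
    for start in sorted(genes):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            component.add(gene)
            for other in neighbours[gene]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        modules.append(component)
    # Largest modules first.
    return sorted(modules, key=len, reverse=True)

# Toy network (placeholder identifiers; x1 and x2 are non-disease genes).
ppi_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "x1"),
             ("g4", "x2"), ("g5", "g6")]
disease_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
modules = disease_modules(ppi_edges, disease_genes)
print([sorted(m) for m in modules])
```

A real analysis would compare the observed module sizes against those obtained for randomly chosen gene sets of the same size, since some clustering is expected by chance in a dense network.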