Dissertations / Theses on the topic 'Bioinformatics tool'

Consult the top 50 dissertations / theses for your research on the topic 'Bioinformatics tool.'

1

Dodda, Srinivasa Rao. "Improvements and extensions of a web-tool for finding candidate genes associated with rheumatoid arthritis." Thesis, University of Skövde, School of Humanities and Informatics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-26.

Full text
Abstract:
Quantitative Trait Locus (QTL) analysis is a statistical method used to delimit genomic regions contributing to specific phenotypes. To further localize genes in such regions, a web tool called "Candidate Gene Capture" (CGC) was developed by Andersson et al. (2005). The CGC tool was based on the textual description of genes defined in the human phenotype database OMIM. Even though the CGC tool works well, it was limited by a number of inconsistencies in the underlying database structure, static web pages, and gene descriptions without a properly defined function in the OMIM database. Hence, in this work the CGC tool was improved by redesigning its database structure, adding dynamic web pages, and improving the prediction of unknown gene function using exon analysis. The changes in database structure reduced the number of tables considerably, eliminated redundancies, and made data retrieval more efficient. A new method for prediction of gene function was proposed, based on the assumption that similarity between exon sequences is associated with biochemical function. Using BLAST with 20,380 exon protein sequences and a threshold E-value of 0.01, 639 exon groups were obtained, with an average of 11 exons per group. When estimating functional similarity, it was found that on average 72% of the exons in a group had at least one Gene Ontology (GO) term in common.
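The 72% figure above invites a small illustration. The sketch below is a hypothetical reading of the measure, not code from the thesis: for each BLAST-derived exon group, count the exons that share at least one GO term with another member of the group, then average the per-group fractions. The function name and all identifiers are invented for illustration.

```python
def go_agreement(groups, go_terms):
    """Average, over exon groups, of the fraction of exons that share
    at least one GO term with another exon in the same group.

    groups:   list of lists of exon identifiers
    go_terms: dict mapping exon identifier -> set of GO term strings
    """
    fractions = []
    for group in groups:
        shared = 0
        for exon in group:
            # Collect the GO terms carried by the other exons in this group.
            others = set()
            for other in group:
                if other != exon:
                    others |= go_terms[other]
            if go_terms[exon] & others:
                shared += 1
        fractions.append(shared / len(group))
    return sum(fractions) / len(fractions)
```

For instance, a single group of three exons in which two share a GO term yields an agreement of 2/3 under this reading.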
2

Naswa, Sudhir. "Representation of Biochemical Pathway Models : Issues relating conversion of model representation from SBML to a commercial tool." Thesis, University of Skövde, School of Humanities and Informatics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-28.

Full text
Abstract:
Background: Computational simulation of complex biological networks lies at the heart of systems biology, since it can confirm the conclusions drawn by experimental studies of biological networks and guide researchers to produce fresh hypotheses for further experimental validation. Because this iterative process helps in the development of more realistic system models, a variety of computational tools have been developed. In the absence of a common format for the representation of models, these tools were developed in different formats. As a result, the tools could not exchange models with one another, which led to the development of SBML, a standard exchange format for computational models of biochemical networks. Here the formats of SBML and one of the commercial tools of systems biology are compared to study the issues that may arise during conversion between their respective formats. A tool, StoP, has been developed to convert the SBML format to the format of the selected tool.

Results: The basic SBML representation, which takes the form of listings of the various elements of a biochemical reaction system, differs from the representation of the selected tool, which is location oriented. In spite of this difference, the various components of biochemical pathways, including multiple compartments, global parameters, reactants, products, modifiers, reactions, kinetic formulas and reaction parameters, could be converted from the SBML representation to the representation of the selected tool. The MathML representation of the kinetic formula in an SBML model can be converted to the string format of the selected tool. Some features of SBML are not present in the selected tool. Similarly, the ability of the selected tool to declare parameters for locations, which are global to those locations and their children, is not present in SBML.

Conclusions: Differences between representations of pathway models may include differences in terminology, basic architecture, software capabilities, and the adoption of different standards for similar things. However, the overall similarity of the domain of pathway models makes it possible to interconvert these representations. The selected tool should develop support for unit definitions, events and rules. It is also recommended that SBML add a facility for parameter declaration at the compartment level, and that the selected tool add a facility for function declaration.
3

Rönnbrant, Anders. "Implementing a visualization tool for myocardial strain tensors." Thesis, Linköping University, Department of Biomedical Engineering, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-5173.

Full text
Abstract:
The heart is a complex three-dimensional structure with mechanical properties that are inhomogeneous, non-linear, time-variant and anisotropic. These properties affect major physiological factors within the heart, such as the pumping performance of the ventricles, the oxygen demand in the tissue and the distribution of coronary blood flow.

During the cardiac cycle the heart muscle tissue is deformed as a consequence of the active contraction of the muscle fibers and their subsequent relaxation. A mapping of this deformation would give increased understanding of the mechanical properties of the heart. The deformation induces strain and stress in the tissue, both mechanical properties that can be described with a mathematical tensor object.

The aim of this master's thesis is to develop a visualization tool for the strain tensor objects that can aid a user in seeing and understanding differences between different hearts, as well as spatial and temporal differences within the same heart. Preferably, the tool should be general enough for use with different types of data.
4

Persson, Emma. "Developing a web based tool for identification of disease modules." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-16479.

Full text
Abstract:
Complex diseases such as cancer or obesity are thought to be caused by abnormalities in multiple genes and cannot be traced to one specific location in the genome. It has been shown that disease-associated genes can be identified by examining interaction patterns in a protein-protein interaction network, where the disease-associated genes are represented in clusters, or disease modules. Several algorithms have been developed to infer these disease modules, but studies have shown that the reliability of the results increases if multiple algorithms are used and a consensus module is derived from them. MODifieR is an R package developed to combine the results of multiple disease-module-inferring algorithms and has proven to provide a stable result. To increase the usability of the R package and make it available not only to users with programming skills, MODifieR Web was developed as a web-based tool with a graphical user interface. The tool was built using Angular and .NET Core, invoking the MODifieR R package in the backend. The interface requires input in the form of an expression matrix and a probe map from the user, easily uploadable in a drag-and-drop interface. It lets the user analyze data using seven different algorithms, provides results as gene lists, and visualizes the consensus module in a network image. MODifieR Web is a first version of an application that is a novel contribution to the existing tools for identification of disease modules, although it needs further improvements to serve a greater pool of users more effectively. The tool is available to try out at http://transbioinfo.liu.se/modifier#/home and the source code is released as an open-source project on GitHub (https://github.com/emmape/MODifieRProject).
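The consensus step described above can be pictured as a simple vote across the gene sets returned by the individual algorithms. This is only a minimal sketch of the idea, not MODifieR's actual consensus rule (MODifieR is an R package and its method is not detailed in the abstract); the min_support parameter is an assumption.

```python
from collections import Counter

def consensus_module(modules, min_support=2):
    """Genes predicted by at least min_support of the inference algorithms.

    modules: list of gene sets, one per disease-module algorithm
    """
    votes = Counter(gene for module in modules for gene in set(module))
    return {gene for gene, count in votes.items() if count >= min_support}
```

For example, three algorithms returning {A, B}, {B, C} and {A, B, D} yield the consensus {A, B} at min_support=2.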
5

Hatherley, Rowan. "Structural bioinformatics studies and tool development related to drug discovery." Thesis, Rhodes University, 2016. http://hdl.handle.net/10962/d1020021.

Full text
Abstract:
This thesis is divided into two distinct sections which can be combined under the broad umbrella of structural bioinformatics studies related to drug discovery. The first section involves the establishment of an online South African natural products database. Natural products (NPs) are chemical entities synthesised in nature and are unrivalled in their structural complexity, chemical diversity, and biological specificity, which has long made them crucial to the drug discovery process. South Africa is rich in both plant and marine biodiversity, and a great deal of research has gone into isolating compounds from organisms found in this country. However, there is no official database containing this information, making it difficult to access for research purposes. This information was extracted manually from the literature to create a database of South African natural products. In order to make the information accessible to the general research community, a website, named "SANCDB", was built to enable compounds to be quickly and easily searched for and downloaded in a number of different chemical formats. The content of the database was assessed and compared to other established natural product databases. Currently, SANCDB is the only database of natural products in Africa with an online interface. The second section of the thesis was aimed at performing structural characterisation of proteins with the potential to be targeted for antimalarial drug therapy. This looked specifically at (1) the interactions between an exported heat shock protein (Hsp) from Plasmodium falciparum (P. falciparum), PfHsp70-x, and various host and exported parasite J proteins, as well as (2) the interface between PfHsp90 and the heat shock organising protein (PfHop). The PfHsp70-x:J protein study provided additional insight into how these two proteins potentially interact. Analysis of the PfHsp90:PfHop complex also provided a structural insight into the interaction interface between these two proteins and identified residues that could be targeted due to their contribution to the stability of the Hsp90:Hop binding complex and to differences between the parasite and human proteins. These studies inspired the development of a homology modelling tool, which can be used to assist researchers with homology modelling while providing them with step-by-step control over the entire process. This thesis presents the establishment of a South African NP database and the development of a homology modelling tool, inspired by protein structural studies. When combined, these two applications have the potential to contribute greatly towards in silico drug discovery research.
6

Brown, David K. "Bioinformatics tool development with a focus on structural bioinformatics and the analysis of genetic variation in humans." Thesis, Rhodes University, 2018. http://hdl.handle.net/10962/60708.

Full text
Abstract:
This thesis is divided into three parts, united under the general theme of bioinformatics tool development and variation analysis. Part 1 describes the design and development of the Job Management System (JMS), a workflow management system for high performance computing (HPC). HPC has become an integral part of bioinformatics. Computational methods for molecular dynamics and next generation sequencing (NGS) analysis, which require complex calculations on large datasets, are not yet feasible on desktop computers. As such, powerful computer clusters have been employed to perform these calculations. However, making use of these HPC clusters requires familiarity with command line interfaces. This excludes a large number of researchers from taking advantage of these resources. JMS was developed as a tool to make it easier for researchers without a computer science background to make use of HPC. Additionally, JMS can be used to host computational tools and pipelines and generates both web-based interfaces and RESTful APIs for those tools. The web-based interfaces can be used to quickly and easily submit jobs to the underlying cluster. The RESTful web API, on the other hand, allows JMS to provide backend functionality for external tools and web servers that want to run jobs on the cluster. Numerous tools and workflows have already been added to JMS, several of which have been incorporated into external web servers. One such web server is the Human Mutation Analysis (HUMA) web server and database. HUMA, the topic of part 2 of this thesis, is a platform for the analysis of genetic variation in humans. HUMA aggregates data from various existing databases into a single, connected and related database. The advantages of this are realized in the powerful querying abilities that it provides. HUMA includes protein, gene, disease, and variation data and can be searched from the angle of any one of these categories.
For example, searching for a protein will return the protein data (e.g. protein sequences, structures, domains and families, and other meta-data). However, the related nature of the database means that genes, diseases, variation, and literature related to the protein will also be returned, giving users a powerful and holistic view of all data associated with the protein. HUMA also provides links to the original sources of the data, allowing users to follow the links to find additional details. HUMA aims to be a platform for the analysis of genetic variation. As such, it also provides tools to visualize and analyse the data (several of which run on the underlying cluster, via JMS). These tools include alignment and 3D structure visualization, homology modeling, variant analysis, and the ability to upload custom variation datasets and map them to proteins, genes and diseases. HUMA also provides collaboration features, allowing users to share and discuss datasets and job results. Finally, part 3 of this thesis focused on the development of a suite of tools, MD-TASK, to analyse genetic variation at the protein structure level via network analysis of molecular dynamics simulations. The use of MD-TASK in combination with the tools developed in the previous parts of this thesis is showcased via the analysis of variation in the renin-angiotensinogen complex, a vital part of the renin-angiotensin system.
7

Brockman, Michael James. "Eyetracking: A Novel Tool for Evaluating Learning." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1523987188501883.

Full text
8

Lember, Geivi. "Sepsis-associated Escherichia coli whole-genome sequencing analysis using in-house developed pipeline and 1928 diagnostics tool." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-19841.

Full text
Abstract:
Sepsis is a life-threatening condition caused by a dysregulated host response to infection. Timely detection of sepsis and timely antibiotic treatment are important for the patient's recovery. Usually, when sepsis is detected, immediate treatment is started with broad-spectrum antibiotics, as it takes time to determine the correct antibiotic susceptibility. To overcome this problem, next-generation sequencing is seen as one possible development in future clinical diagnostics. Automated bioinformatics pipelines could be used initially for surveillance purposes and eventually for rapid clinical diagnosis. Therefore, the results of 1928 Diagnostics, an automated pipeline for whole-genome sequencing (WGS) data analysis, were compared with the results of an in-house developed pipeline for manual data processing by analyzing sepsis-associated Escherichia coli (SEPEC) WGS data. The pipelines were compared by assessing their predicted antimicrobial resistance (AMR) genes, virulence genes and epidemiological relatedness. In addition, the predicted resistance genes were compared with phenotypic antimicrobial susceptibility testing (AST) data from the clinical microbiology laboratory. The results from 1928 Diagnostics and the in-house pipeline were broadly similar but differed in the number of virulence and predicted AMR genes, AMR gene variants, species detection and epidemiologically related E. coli samples. Moreover, the predicted AMR genes from both pipelines did not correspond well with the phenotypic AST results overall. More studies are needed to make gene predictions from WGS analysis more reliable, so that WGS can be used as a diagnostic tool in clinical laboratories in the future.
9

Staton, Margaret E. "Bioinformatics tool development and sequence analysis of Rosaceae family expressed sequence tags." Connect to this title online, 2007. http://etd.lib.clemson.edu/documents/1193078921/.

Full text
10

Kanchinadam, Krishna M. "DataMapX: a tool for cross-mapping entities and attributes between bioinformatics databases." Fairfax, VA : George Mason University, 2008. http://hdl.handle.net/1920/3135.

Full text
Abstract:
Thesis (M.S.)--George Mason University, 2008. Vita: p. 29. Thesis director: Jennifer Weller. Submitted in partial fulfillment of the requirements for the degree of Master of Science in Bioinformatics. Title from PDF t.p. (viewed July 7, 2008). Includes bibliographical references (p. 28). Also issued in print.
11

Wang, Chen. "Novel software tool for microsatellite instability classification and landscape of microsatellite instability in osteosarcoma." Miami University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=miami1554829925088174.

Full text
12

Testa, Oliver David. "The CC+ tool set : a web-based resource for studying coiled-coil bioinformatics." Thesis, University of Bristol, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.550297.

Full text
Abstract:
Coiled-coil motifs are elements of protein structure which, whilst based on relatively simple sequences, form a variety of structures and perform various functions in biology. The subtleties in coiled-coil sequences that lead to this diversity are not yet comprehensively understood. This thesis aims to explore these relationships by putting in place a relational database of coiled-coil structures and sequences. To this end, coiled-coil motifs were identified from the PDB using the SOCKET algorithm. The identified assignments were collated and organized into the CC+ tool set, and this was used to study various contemporary issues in coiled-coil research. CC+ comprises an organized file system of data generated during coiled-coil assignment, and a subset of these data was parsed into a relational database. The relational database facilitates searching coiled-coil data sets for motifs of specific topologies, configurations and composition. Using the tool set, compiling data sets of specific coiled-coil data is no longer a laborious manual undertaking, but a simple and speedy process conducted in real time. This process was further enhanced with the implementation of an on-line interface; the web site facilitates interrogation of the tool set, visualization of the coiled-coil assignments returned from the database, and export of data for further study elsewhere. The tool set was used to study several aspects of coiled-coil sequence-to-structure relationships. Searches performed on amino-acid interactions occurring within the hydrophobic interface of antiparallel, 2-helix coiled coils yielded hypotheses consistent with the findings of an independent experimental system. Networks of polar interactions incorporating asparagine and an ion ligand within the hydrophobic interface of 3-helix coiled coils have also been observed and studied.

Throughout these studies, it was apparent that many coiled-coil assignments within the tool set are either exact duplicates, or redundant due to the highly repetitive nature of the identified coiled-coil sequences. Therefore, algorithms were developed to determine the relative redundancies of different aspects of coiled-coil structure, and these are incorporated dynamically into web-based searching to improve the quality of the data sets returned. Numerous algorithms exist to predict the formation of coiled-coil structure from amino-acid sequences, and these are based on specific amino-acid compositions recorded from manually compiled data sets. It is hoped that the CC+ tool set will facilitate the refinement of existing coiled-coil prediction algorithms by contributing updated data sets, and even contribute to the creation of more accurate prediction methods in future. Furthermore, specific amino-acid interactions within coiled-coil systems have been investigated, and conclusions drawn regarding their effect on coiled-coil stability, illustrating the tool set's potential in facilitating de novo design of novel coiled-coil interactions.
13

Stamm, Karl D. "Gene set enrichment and projection: A computational tool for knowledge discovery in transcriptomes." Thesis, Marquette University, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10146411.

Full text
Abstract:
Explaining the mechanism behind a genetic disease involves two phases: collecting and analyzing data associated with the disease, then interpreting those data in the context of biological systems. The objective of this dissertation was to develop a method of integrating complementary datasets surrounding any single biological process, with the goal of presenting the response to a signal in terms of a set of downstream biological effects. This dissertation specifically tests the hypothesis that computational projection methods overlaid with domain expertise can direct research towards relevant systems-level signals underlying complex genetic disease. To this end, I developed a software algorithm named Geneset Enrichment and Projection Displays (GSEPD) that can visualize multidimensional genetic expression to identify the biologically relevant gene sets that are altered in response to a biological process.

This dissertation highlights a problem of data interpretation facing the medical research community and shows how the computational sciences can help. By bringing annotation and expression datasets together, a new analytical and software method was produced that helps unravel complicated experimental and biological data.

The dissertation shows four coauthored studies where the experts in their field have desired to annotate functional significance to a gene-centric experiment. Using GSEPD to show inherently high-dimensional data as a simple colored graph, a subspace vector projection directly calculates how closely each sample behaves like the test conditions. The end-user medical researcher understands their data as a series of somewhat-independent subsystems, and GSEPD provides a dimensionality reduction for high-throughput experiments of limited sample size. Gene Ontology analyses are accessible on a sample-to-sample level, and this work highlights not just the expected biological systems, but many annotated results available in vast online databases.
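The subspace projection mentioned above can be pictured geometrically: each sample's expression vector is projected onto the axis running from one test condition to another. The sketch below is a generic illustration of that geometry under invented names, not GSEPD's actual computation.

```python
def project_on_axis(sample, cond_a, cond_b):
    """Scalar position of `sample` along the axis from cond_a to cond_b.

    Returns roughly 0.0 when the sample behaves like condition A and
    roughly 1.0 when it behaves like condition B. All three arguments
    are expression vectors of equal length.
    """
    axis = [b - a for a, b in zip(cond_a, cond_b)]       # direction A -> B
    rel = [s - a for s, a in zip(sample, cond_a)]        # sample relative to A
    dot = sum(x * y for x, y in zip(rel, axis))
    norm_sq = sum(x * x for x in axis)
    return dot / norm_sq
```

A sample sitting halfway between the two condition centroids projects to 0.5 on this axis.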
14

Sanchez, Rhea I. "Annotation consistency tool : the assessment of JCVI microbial genome annotations /." Online version of thesis, 2009. http://hdl.handle.net/1850/10653.

Full text
15

Selvaraja, Sudarshan. "Microarray Data Analysis Tool (MAT)." University of Akron / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=akron1227467806.

Full text
16

Lee, Tsung-Lu. "BAXQL_BLAST: an enhanced BLAST bioinformatics homology search tool with batch and structured query support." [Gainesville, Fla.] : University of Florida, 2002. http://purl.fcla.edu/fcla/etd/UFE1001161.

Full text
17

Patsekin, Aleksandr. "Feature Learning as a Tool to Identify Existence of Multiple Biological Patterns." Thesis, Purdue University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10807747.

Full text
Abstract:
This paper introduces a novel approach for assessing multiple patterns in biological imaging datasets. The developed tool should be able to provide the most probable structure of a dataset of images consisting of biological patterns not encountered during the model training process. The tool includes two major parts: (1) a feature learning and extraction pipeline and (2) subsequent clustering with estimation of the number of classes. The feature-learning part includes two deep-learning techniques and a feature quantitation pipeline as a benchmark method. Clustering includes three non-parametric methods. K-means clustering is employed for validation and hypothesis testing by comparing results with the provided ground truth. The most appropriate methods and hyper-parameters were suggested to achieve maximum clustering quality. A convolutional autoencoder demonstrated the most stable and robust results: an entropy-based V-measure of 0.9759 on a dataset of classes employed for training and 0.9553 on a dataset of completely novel classes.
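The V-measure quoted in this abstract is a standard entropy-based clustering metric: the harmonic mean of homogeneity and completeness. Below is a self-contained sketch of the textbook definition using natural-log entropies; it is not code from the thesis.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(labels, given):
    """H(labels | given): entropy of labels within each group of `given`."""
    n = len(labels)
    total = 0.0
    for group, size in Counter(given).items():
        sub = [l for l, g in zip(labels, given) if g == group]
        total += (size / n) * entropy(sub)
    return total

def v_measure(truth, pred):
    """Harmonic mean of homogeneity and completeness (1.0 = perfect)."""
    h_c, h_k = entropy(truth), entropy(pred)
    hom = 1.0 if h_c == 0 else 1 - conditional_entropy(truth, pred) / h_c
    com = 1.0 if h_k == 0 else 1 - conditional_entropy(pred, truth) / h_k
    return 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)
```

A clustering that exactly recovers the true classes scores 1.0 regardless of how the cluster labels are permuted.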
18

Abdulahad, Bassam, and Georgios Lounis. "A user interface for the ontology merging tool SAMBO." Thesis, Linköping University, Department of Computer and Information Science, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2659.

Full text
Abstract:
Ontologies have become an important tool for representing data in a structured manner. Merging ontologies allows for the creation of ontologies that can later be composed into larger ontologies, as well as for recognizing patterns and similarities between ontologies. Ontologies are nowadays being used in many areas, including bioinformatics. In this thesis, we present a desktop version of SAMBO, a system for merging ontologies that are represented in the languages OWL and DAML+OIL. The system has been developed in the Java programming language with JDK (Java Development Kit) 1.4.2. The user can open a file locally or from the network and can merge ontologies using suggestions generated by the SAMBO algorithm. SAMBO provides a user-friendly graphical interface, which guides the user through the merging process.
19

Lampa, Samuel. "SWI-Prolog as a Semantic Web Tool for semantic querying in Bioclipse: Integration and performance benchmarking." Thesis, Uppsala universitet, Institutionen för farmaceutisk biovetenskap, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-146738.

Full text
Abstract:
The huge amounts of data produced by high-throughput techniques in the life sciences, and the need to integrate heterogeneous data from disparate sources in new fields such as systems biology and translational drug development, require better approaches to data integration. The semantic web is anticipated to provide solutions through new formats for knowledge representation and management. Software libraries for semantic web formats are becoming mature, but multiple tools exist that are based on foundationally different technologies. SWI-Prolog, a tool with semantic web support, was integrated into the Bioclipse bio- and cheminformatics workbench and evaluated, in terms of performance, against the non-Prolog-based semantic web tools in Bioclipse, Jena and Pellet, for querying a data set consisting mostly of numerical NMR shift values in the semantic web format RDF. The integration has given access to the convenience of the Prolog language for working with semantic data and for defining data management workflows in Bioclipse. The performance comparison shows that SWI-Prolog outperforms Jena and Pellet for this specific dataset and suggests that Prolog-based tools are of interest for further evaluation.
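The kind of triple-pattern query benchmarked here can be illustrated without any semantic web stack: an RDF store is a set of (subject, predicate, object) triples, and a query is a pattern with wildcards, much like a Prolog rdf/3 goal. The store contents and predicate names below are invented for illustration; real code would use SWI-Prolog's semweb library or an RDF API such as Jena.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching a (subject, predicate, object) pattern.

    None acts as a wildcard, playing the role of an unbound variable
    in a Prolog rdf/3 goal.
    """
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Toy RDF-like data: NMR shift observations under a hypothetical vocabulary.
triples = [
    ("shift1", "ex:ofAtom", "C13"),
    ("shift1", "ex:value", 77.2),
    ("shift2", "ex:ofAtom", "H1"),
    ("shift2", "ex:value", 7.26),
]
```

For example, `match(triples, p="ex:value")` retrieves every numerical shift value, which is the shape of query the benchmark repeats over a much larger dataset.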
20

Sharman, Joanna Louise. "Visualising Plasmodium falciparum functional genomic data in MaGnET : malaria genome exploration tool." Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/5936.

Full text
Abstract:
Malaria affects the lives of 500 million people around the world each year. The disease is caused by protozoan parasites of the genus Plasmodium, whose ability to evade the immune system and quickly evolve resistance to drugs poses a major challenge for disease control. The results of several Plasmodium genome sequencing projects have revealed how little is known about the function of their genes (over half of the approximately 5400 genes in Plasmodium falciparum, the most deadly human parasite, are annotated as 'hypothetical'). Recently, several large-scale studies have attempted to shed light on the processes in which genes are involved; for example, the use of DNA microarrays to profile the parasite's gene expression. With the emergence of varied types of functional genomic data comes a need for effective tools that allow biologists (and bioinformaticians) to explore these data. The goal of exploration/browsing-style analyses will typically be to derive clues towards the function of thus far uncharacterised gene products, and to formulate experimentally testable hypotheses. Graphic interfaces to individual data sets are obviously beneficial in this endeavour. However, effective visual data exploration also requires that interfaces to different functional genomic data are integrated and that the user can carry forward a selected group of genes (not merely one at a time) across a variety of data sets. Non-expert users especially benefit from workbench-like tools offering access to the data in this way. Still, only very few of the contemporary publicly available software tools have implemented such functionality. This work introduces a novel software tool for the integrated visualisation of functional genomic data relating to P. falciparum: the Malaria Genome Exploration Tool (MaGnET). MaGnET consists of a light-weight Java program for effective visualisation linked to a MySQL database for data storage.
In order to maximise accessibility, the program is publicly available over the World Wide Web (http://www.malariagenomeexplorer.org/). MaGnET incorporates a Genome Viewer for visualising the location of genomic features, a Protein-Protein Interaction Viewer for visualising networks of experimentally determined interactions and an Expression Data Viewer for displaying mRNA and protein expression data. Complex database queries can easily be constructed in the Data Analysis Viewer. An advantage over most other tools is that all sections are fully integrated, allowing users to carry selected groups of genes across different datasets. Furthermore, MaGnET provides useful advanced visualisation features, including mapping of expression data onto genomic location or protein-protein interaction network. The inclusion of available third-party Java software has expanded the visualisation capability of MaGnET; for example, the Jmol viewer has been incorporated for viewing 3-D protein structures. An effort has been made to only include data in MaGnET that is at least of reasonable quality. The MaGnET database collates experimental data from various public Plasmodium resources (e.g. PlasmoDB) and from published functional genomic studies, such as DNA microarrays. In addition, through careful filtering and labelling we have been able to include some predicted annotation that has not been experimentally confirmed, such as Gene Ontology and InterPro functional assignments and modelled protein structures. The application of MaGnET to malaria biology is demonstrated through a series of small studies. Initial examples show how MaGnET can be used to effectively demonstrate results from previously published analyses. This is followed up by using MaGnET to make a set of predictions about the possible functions of selected uncharacterised genes and suggesting follow-up experiments.
APA, Harvard, Vancouver, ISO, and other styles
21

Gutiérrez-Sacristán, Alba 1990. "A Bioinformatics approach to the study of comorbidity : Insight into mental disorders." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/664356.

Full text
Abstract:
Clinical and epidemiological studies show that comorbidity, the coexistence of several disorders in one patient, has a great impact on the evolution of the patient's health status. Comorbidity analysis is therefore key to identifying new preventive and therapeutic strategies on the way towards more personalized medicine.
In order to harness the power of the increasing volume of health information available in the era of big data, this thesis presents the development of new tools and resources for the identification of comorbidity patterns based on clinical and molecular information. The comoRbidity and psygenet2r packages presented in this thesis provide a complete and comprehensive analysis of comorbidities and, in particular, offer users the possibility to design their own comorbidity study according to their needs and specifications. Moreover, given the significant role that molecular information plays in interpreting the cause of disease comorbidities, and the lack of resources collecting that information in the specific area of mental disorders, a new manually curated database of gene-disease associations, PsyGeNET, has also been developed. In summary, all the tools developed in this thesis, available to the scientific community and already applied in several biomedical studies, are of immense practical value for comorbidity analysis and can help transform clinical information into knowledge that can be analyzed and interpreted by researchers, leading overall to more personalized medicine.
APA, Harvard, Vancouver, ISO, and other styles
22

Parmidge, Amelia J. "NEPIC, a Semi-Automated Tool with a Robust and Extensible Framework that Identifies and Tracks Fluorescent Image Features." Thesis, Mills College, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=1556025.

Full text
Abstract:
As fluorescent imaging techniques for biological systems have advanced in recent years, scientists have used fluorescent imaging more and more to capture the state of biological systems at different moments in time. For many researchers, analysis of the fluorescent image data has become the limiting factor of this new technique. Although identification of fluorescing neurons in an image is (seemingly) easily done by the human visual system, manual delineation of the exact pixels comprising these fluorescing regions of interest (or fROIs) in digital images does not scale up well, being time-consuming, reiterative, and error-prone. This thesis introduces NEPIC, the Neuron-to-Environment Pixel Intensity Calculator, which seeks to help resolve this issue. NEPIC is a semi-automated tool for finding and tracking the cell body of a single neuron over an entire movie of grayscale calcium image data. NEPIC also provides a highly extensible, open source framework that could easily support finding and tracking other kinds of fROIs. When tested on calcium image movies of the AWC neuron in C. elegans under highly variant conditions, NEPIC correctly identified the neuronal cell body in 95.48% of the movie frames, and successfully tracked this cell body feature across 98.60% of the frame transitions in the movies. Although support for finding and tracking multiple fROIs has yet to be implemented, NEPIC displays promise as a tool for assisting researchers in the bulk analysis of fluorescent imaging data.
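The core idea of delineating a fluorescing region can be sketched in a few lines. A toy illustration (Python, purely conceptual; NEPIC's actual algorithm is necessarily more robust than this): take the centroid of all pixels above an intensity threshold, which a tracker could then compare across successive frames.

```python
def brightest_centroid(frame, threshold):
    """Centroid (row, col) of all pixels above threshold -- the crudest
    possible stand-in for identifying a fluorescing region of interest."""
    hits = [(r, c) for r, row in enumerate(frame)
                   for c, v in enumerate(row) if v > threshold]
    if not hits:
        return None  # no fluorescing region found in this frame
    n = len(hits)
    return (sum(r for r, _ in hits) / n, sum(c for _, c in hits) / n)

frame = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 9, 8, 0],
    [0, 0, 0, 0],
]
print(brightest_centroid(frame, 5))   # → (1.5, 1.5)
```

Tracking across a movie would amount to matching each frame's centroid to the nearest centroid in the previous frame.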
APA, Harvard, Vancouver, ISO, and other styles
23

Kokkonen, Alexander. "Evaluation of next-generation sequencing as a tool for determining the presence of pathogens in clinical samples." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-17374.

Full text
Abstract:
Metagenomic sequencing is an increasingly popular way of determining microbial diversity from environmental and clinical samples. By specifically targeting the 16S rRNA gene found in all bacteria, classifications of pathogens can be determined based on the variable and conserved regions found in the gene. Metagenomic sequencing can therefore highlight the vast difference in microbiological diversity between culture-dependent and culture-independent methods. Today, this has expanded into various next-generation sequencing platforms which can provide massively parallel sequencing of the target fragment. One of these platforms is Ion Torrent, which can be utilized for targeting the 16S rRNA gene and, with the help of bioinformatics pipelines, for classifying pathogens using the bacteria's own variable and conserved regions. The overall aim of the present work is to evaluate the clinical use of Ion Torrent 16S ribosomal RNA sequencing for determining pathogenic species from clinical samples, and also to set up a pipeline for clinical practice. Optimal DNA extraction and quantification methods were determined for each evaluated sample type, and DNA eluates were sent for 16S rRNA Sanger and next-generation sequencing. The results indicated that next-generation sequencing shows concordance with the culture-based method, but also the importance of experimental design and effective quality trimming of the NGS data. The conclusion of the project is that the Ion Torrent pipeline provided by the Public Health Agency of Sweden shows great promise in determining pathogens from clinical samples. However, a lot of validation and standardisation is still needed for successful implementation in a clinical setting.
APA, Harvard, Vancouver, ISO, and other styles
24

Narayanan, Kanchana. "MAVEN: a tool for Visualization and Functional Analysis of Genome-Wide Association Studies." Cleveland, Ohio : Case Western Reserve University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=case1269455528.

Full text
Abstract:
Thesis (Master of Sciences)--Case Western Reserve University, 2010. Department of EECS - Computer and Information Sciences. Title from PDF (viewed on 2010-05-25). Includes abstract. Includes bibliographical references and appendices. Available online via the OhioLINK ETD Center.
APA, Harvard, Vancouver, ISO, and other styles
25

Grigsby, Claude Curtis. "A Comprehensive Tool and Analytical Pathway for Differential Molecular Profiling and Biomarker Discovery." Wright State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=wright1387540709.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Lamprecht, Anna-Lena, Tiziana Margaria, and Bernhard Steffen. "Bio-jETI : a framework for semantics-based service composition." Universität Potsdam, 2009. http://opus.kobv.de/ubp/volltexte/2010/4506/.

Full text
Abstract:
Background: The development of bioinformatics databases, algorithms, and tools throughout the last years has led to a highly distributed world of bioinformatics services. Without adequate management and development support, in silico researchers are hardly able to exploit the potential of building complex, specialized analysis processes from these services. The Semantic Web aims at thoroughly equipping individual data and services with machine-processable meta-information, while workflow systems support the construction of service compositions. However, even in this combination, in silico researchers currently would have to deal manually with the service interfaces, the adequacy of the semantic annotations, type incompatibilities, and the consistency of service compositions. Results: In this paper, we demonstrate by means of two examples how Semantic Web technology together with an adequate domain modelling frees in silico researchers from dealing with interfaces, types, and inconsistencies. In Bio-jETI, bioinformatics services can be graphically combined into complex services without worrying about details of their interfaces or about type mismatches of the composition. These issues are taken care of at the semantic level by Bio-jETI's model checking and synthesis features. Whenever possible, they automatically resolve type mismatches in the considered service setting. Otherwise, they graphically indicate impossible/incorrect service combinations. In the latter case, the workflow developer may either modify his service composition using semantically similar services, or ask for help in developing the missing mediator that correctly bridges the detected type gap. Newly developed mediators should then be adequately annotated semantically, and added to the service library for later reuse in similar situations. Conclusion: We show the power of semantic annotations in an adequately modelled and semantically enabled domain setting.
Using model checking and synthesis methods, users may orchestrate complex processes from a wealth of heterogeneous services without worrying about interfaces and (type) consistency. The success of this method strongly depends on a careful semantic annotation of the provided services and on its consequent exploitation for analysis, validation, and synthesis. We are convinced that these annotations will become standard, as they will become preconditions for the success and widespread use of (preferred) services in the Semantic Web.
APA, Harvard, Vancouver, ISO, and other styles
27

Sentausa, Erwin. "Time course simulation replicability of SBML-supporting biochemical network simulation tools." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-33.

Full text
Abstract:
Background: Modelling and simulation are important tools for understanding biological systems. Numerous modelling and simulation software tools have been developed for integrating knowledge regarding the behaviour of a dynamic biological system described in mathematical form. The Systems Biology Markup Language (SBML) was created as a standard format for exchanging biochemical network models among tools. However, it is not yet certain whether actual usage and exchange of SBML models among tools of different purpose and interfaces is assessable. In particular, it is not clear whether dynamic simulations of SBML models using different modelling and simulation packages are replicable. Results: Time series simulations of published biological models in SBML format are performed using four modelling and simulation tools which support SBML, to evaluate whether the tools correctly replicate the simulation results. Some of the tools do not successfully integrate some models. In the time series output of the successful simulations, there are differences between the tools. Conclusions: Although SBML is widely supported among biochemical modelling and simulation tools, not all simulators can replicate time-course simulations of SBML models exactly. This incapability of replicating simulation results may harm the peer-review process of biological modelling and simulation activities and should be addressed accordingly, for example by specifying in the SBML model the exact algorithm or simulator used for replicating the simulation result.
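Replicability checks of the kind described above ultimately reduce to comparing simulators' numeric time-course outputs. A minimal, tool-agnostic sketch (Python; not tied to any particular SBML simulator, and the function name is our own) of scoring the divergence between two trajectories on a shared time grid:

```python
def max_relative_divergence(course_a, course_b, eps=1e-12):
    """Return the largest point-wise relative difference between two
    time courses, each given as an equal-length list of (time, value)
    rows produced by two different simulators."""
    worst = 0.0
    for (t_a, v_a), (t_b, v_b) in zip(course_a, course_b):
        assert abs(t_a - t_b) < eps, "time grids must match"
        denom = max(abs(v_a), abs(v_b), eps)  # guard against division by zero
        worst = max(worst, abs(v_a - v_b) / denom)
    return worst

# Two hypothetical simulators' outputs for one species on the same grid.
sim1 = [(0.0, 1.00), (1.0, 0.61), (2.0, 0.37)]
sim2 = [(0.0, 1.00), (1.0, 0.60), (2.0, 0.37)]
print(max_relative_divergence(sim1, sim2))
```

In practice such a score, computed per species over the whole trajectory, makes the "differences between the tools" quantifiable rather than a matter of visual inspection.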
APA, Harvard, Vancouver, ISO, and other styles
28

Sweeney, Deacon John. "A Computational Tool for Biomolecular Structure Analysis Based On Chemical and Enzymatic Modification of Native Proteins." Wright State University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=wright1316440232.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Mesa, Annia. "Auto-antigenic Properties of the Spliceosome as a Molecular Tool for Diagnosing Systemic Lupus Erythematosus and Mixed Connective Tissue Disease Patients." FIU Digital Commons, 2014. http://digitalcommons.fiu.edu/etd/1126.

Full text
Abstract:
Systemic Lupus Erythematosus (SLE) and Mixed Connective Tissue Disease (MCTD) are chronic, autoimmune disorders that target overlapping autoantigens and exhibit similar clinical manifestations. Despite 40 years of research, a reliable biomarker capable of diagnosing these syndromes has yet to be identified. Previous studies have confirmed that components of the U1 small nuclear ribonucleoprotein complex (U1 snRNP) such as U1A are 1000-fold more autoantigenic than any other nuclear component in SLE patients. Based on these findings, I hypothesize that models derived from the U1 snRNP autoantigenic properties could distinguish SLE from MCTD patients. To test this hypothesis, 30 peptides corresponding to protein regions of the U1 snRNP were tested in triplicate by indirect ELISA in sera from SLE or MCTD subjects. In addition, laboratory test and clinical manifestation data from these patients were included and analyzed in this investigation. Statistical classification methods as well as a bioinformatics pattern recognition strategy were employed to determine which combination, if any, of all the variables included in this study provides the best segregation power for SLE and MCTD. The results confirmed that the IgM reactivity for U1 snRNP and U1A has the power to significantly distinguish SLE from MCTD patients as well as identify kidney and lung malfunctions for these subjects (p ≤ 0.05). Furthermore, the data analysis revealed eight novel classification rules for the segregation of SLE and MCTD which are a better classification tool than any of the currently available methods (p ≤ 0.05). Consequently, the results derived from this study support that SLE and MCTD are indeed separate disorders and pioneer the description of eight novel classification criteria capable of significantly discerning between SLE and MCTD patients (p ≤ 0.05).
APA, Harvard, Vancouver, ISO, and other styles
30

Desmet, François-Olivier. "Bioinformatique et épissage dans les pathologies humaines." Thesis, Montpellier 1, 2010. http://www.theses.fr/2010MON1T017.

Full text
Abstract:
Discovered in 1977, splicing is a post-transcriptional maturation process that consists in linking exons together and removing introns from a pre-messenger RNA. For splicing to be correctly undertaken by the spliceosome and its auxiliary proteins, several signals are located along the pre-messenger RNA sequence. Nearly half of pathogenic mutations in humans are now recognized to impact splicing, leading to gene dysfunction. It is therefore essential for biologists to be able to detect these signals in any genomic sequence.
Thus, the goals of this thesis were to conceive new algorithms: i) to identify splicing signals; ii) to predict the impact of mutations on these signals; and iii) to give researchers access to this information through the power of bioinformatics. The proposed solution, Human Splicing Finder (HSF), is a web application able to predict all types of splicing signals hidden in any sequence extracted from the human genome. We demonstrated HSF's prediction efficiency for all situations associated with pathogenic mutations for which an impact on splicing has been experimentally demonstrated. Along with these direct benefits for the knowledge of the biological processes of splicing and for diagnosis, new genotype-specific therapeutic approaches can also benefit from these new algorithms. Thus, HSF allows antisense oligonucleotides used to induce exon skipping in Duchenne muscular dystrophy and dysferlinopathies to be better targeted. The recent recognition of the major interest of splicing in domains as varied as fundamental research, therapeutics and diagnosis called for a central point of access to splicing signals. HSF aims to fulfil this role, being regularly updated to integrate new knowledge, and is already recognized as an international reference tool.
APA, Harvard, Vancouver, ISO, and other styles
31

Sutharzan, Sreeskandarajan. "A GENOME-WIDE ANALYSIS OF PERFECT INVERTED REPEATS IN ARABIDOPSIS THALIANA." Miami University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=miami1386848607.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Foose, Daniel Patrick. "Vespucci: A free, cross-platform software tool for spectroscopic data analysis and imaging." Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1472823712.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Cameron, Michael, and mcam@mc-mc net. "Efficient Homology Search for Genomic Sequence Databases." RMIT University. Computer Science and Information Technology, 2006. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20070509.162443.

Full text
Abstract:
Genomic search tools can provide valuable insights into the chemical structure, evolutionary origin and biochemical function of genetic material. A homology search algorithm compares a protein or nucleotide query sequence to each entry in a large sequence database and reports alignments with highly similar sequences. The exponential growth of public data banks such as GenBank has necessitated the development of fast, heuristic approaches to homology search. The versatile and popular blast algorithm, developed by researchers at the US National Center for Biotechnology Information (NCBI), uses a four-stage heuristic approach to efficiently search large collections for analogous sequences while retaining a high degree of accuracy. Despite an abundance of alternative approaches to homology search, blast remains the only method to offer fast, sensitive search of large genomic collections on modern desktop hardware. As a result, the tool has found widespread use with millions of queries posed each day. A significant investment of computing resources is required to process this large volume of genomic searches and a cluster of over 200 workstations is employed by the NCBI to handle queries posed through the organisation's website. As the growth of sequence databases continues to outpace improvements in modern hardware, blast searches are becoming slower each year and novel, faster methods for sequence comparison are required. In this thesis we propose new techniques for fast yet accurate homology search that result in significantly faster blast searches. First, we describe improvements to the final, gapped alignment stages where the query and sequences from the collection are aligned to provide a fine-grain measure of similarity. We describe three new methods for aligning sequences that roughly halve the time required to perform this computationally expensive stage. 
Next, we investigate improvements to the first stage of search, where short regions of similarity between a pair of sequences are identified. We propose a novel deterministic finite automaton data structure that is significantly smaller than the codeword lookup table employed by ncbi-blast, resulting in improved cache performance and faster search times. We also discuss fast methods for nucleotide sequence comparison. We describe novel approaches for processing sequences that are compressed using the byte packed format already utilised by blast, where four nucleotide bases from a strand of DNA are stored in a single byte. Rather than decompress sequences to perform pairwise comparisons, our innovations permit sequences to be processed in their compressed form, four bases at a time. Our techniques roughly halve average query evaluation times for nucleotide searches with no effect on the sensitivity of blast. Finally, we present a new scheme for managing the high degree of redundancy that is prevalent in genomic collections. Near-duplicate entries in sequence data banks are highly detrimental to retrieval performance; however, existing methods for managing redundancy are both slow, requiring almost ten hours to process the GenBank database, and crude, because they simply purge highly-similar sequences to reduce the level of internal redundancy. We describe a new approach for identifying near-duplicate entries that is roughly six times faster than the most successful existing approaches, and a novel approach to managing redundancy that reduces collection size and search times but still provides accurate and comprehensive search results. Our improvements to blast have been integrated into our own version of the tool. We find that our innovations more than halve average search times for nucleotide and protein searches, and have no significant effect on search accuracy.
Given the enormous popularity of blast, this represents a very significant advance in computational methods to aid life science research.
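The byte-packed nucleotide format discussed in the abstract is easy to illustrate in ordinary software terms. A conceptual sketch (Python, for exposition only; not the actual blast code) of packing four 2-bit bases into each byte and comparing sequences without decompressing them:

```python
CODE = {'A': 0, 'C': 1, 'G': 2, 'T': 3}  # 2 bits per nucleotide base

def pack(seq):
    """Pack a DNA string (length a multiple of 4) into bytes,
    four 2-bit bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        b = 0
        for base in seq[i:i + 4]:
            b = (b << 2) | CODE[base]
        out.append(b)
    return bytes(out)

def identical_packed(a, b):
    """Compare two sequences four bases at a time, in compressed form."""
    return pack(a) == pack(b)

print(pack("ACGT").hex())   # one byte, 0b00011011 → '1b'
print(identical_packed("ACGTACGT", "ACGTACGT"))   # → True
```

The point of the thesis's innovation is that per-byte operations like this equality test touch four bases at once, so no decompression step is needed before pairwise comparison.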
APA, Harvard, Vancouver, ISO, and other styles
34

Stenberg, Johan. "Software Tools for Design of Reagents for Multiplex Genetic Analyses." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-6832.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Meng, Da. "Bioinformatics tools for evaluating microbial relationships." Pullman, Wash. : Washington State University, 2009. http://www.dissertations.wsu.edu/Dissertations/Spring2009/d_meng_042209.pdf.

Full text
Abstract:
Thesis (Ph. D.)--Washington State University, May 2009. Title from PDF title page (viewed on June 8, 2009). School of Electrical Engineering and Computer Science. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
36

Berry, Eric Zachary 1980. "Bioinformatics and database tools for glycans." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/27085.

Full text
Abstract:
Thesis (M. Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (leaves 75-76). Recent advances in biology have afforded scientists the knowledge that polysaccharides play an active role in modulating cellular activities. Glycosaminoglycans (GAGs) are one such family of polysaccharides that play a very important role in regulating the functions of numerous important signaling molecules and enzymes in the cell. Developing bioinformatics tools has been integral to advancing genomics and proteomics. While these tools are well developed for storing and processing sequence and structure information for proteins and DNA, they are very poorly developed for polysaccharides. Glycan structures pose special problems because of their tremendous information density per fundamental unit, their often-branched structures, and the complicated nature of their building blocks. The GlycoBank, an online database of known GAG structures and functions, has been developed to overcome many of these difficulties by developing a common notation for researchers to describe GAG sequences, a common repository to view known structure-function relationships, and the complex tools and searches needed to facilitate their work. This thesis focuses on the development of GlycoBank. In addition, a large, NIGMS-funded consortium, the Consortium for Functional Glycomics, maintains a larger database that aims to store structure-function information for a broader collection of polysaccharides. The ideas and concepts implemented in developing GlycoBank were instrumental in developing databases and bioinformatics tools for the Consortium for Functional Glycomics. by Eric Zachary Berry. M.Eng. and S.B.
APA, Harvard, Vancouver, ISO, and other styles
37

Lopes, Pinto Fernando. "Development of Molecular Biology and Bioinformatics Tools : From Hydrogen Evolution to Cell Division in Cyanobacteria." Doctoral thesis, Uppsala universitet, Institutionen för fotokemi och molekylärvetenskap, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-110842.

Full text
Abstract:
The use of fossil fuels presents a particularly interesting challenge - our society strongly depends on coal and oil, but we are aware that their use is damaging the environment. Currently, this awareness is gaining momentum, and pressure to evolve towards an energetically cleaner planet is very strong. Molecular hydrogen (H2) is an environmentally suitable energy carrier that could initially supplement or even substitute fossil fuels. Ideally, the primary energy source to produce hydrogen gas should be renewable, and the process of conversion back to energy without polluting emissions, making this cycle environmentally clean. Photoconversion of water to hydrogen can be achieved using the following strategies: 1) the use of photochemical fuel cells, 2) by applying photovoltaics, or 3) by promoting production of hydrogen by photosynthetic microorganisms, either phototrophic anoxygenic bacteria and cyanobacteria or eukaryotic green algae. For photobiological H2 production cyanobacteria are among the ideal candidates since they: a) are capable of H2 evolution, and b) have simple nutritional requirements - they can grow in air (N2 and CO2), water and mineral salts, with light as the only energy source. As this project started, a vision and a set of overall goals were established. These postulated that improved H2 production over a long period demanded: 1) selection of strains taking into consideration their specific hydrogen metabolism, 2) genetic modification in order to improve H2 evolution, and 3) examination and improvement of cultivation conditions in bioreactors. Within these goals, three main research objectives were set: 1) update and document the use of cyanobacteria for hydrogen production, 2) create tools to improve molecular biology work at the transcription analysis level, and 3) study cell division in cyanobacteria.
This work resulted in: 1) the publication of a review on hydrogen evolution by cyanobacteria, 2) the development of tools to assist understanding of transcription, and 3) the start of a new fundamental research approach to ultimately improve the yield of H2 evolution by cyanobacteria.
APA, Harvard, Vancouver, ISO, and other styles
38

Strafford, J. "Docking and bioinformatics tools to guide enzyme engineering." Thesis, University College London (University of London), 2012. http://discovery.ucl.ac.uk/1339145/.

Full text
Abstract:
The carbon-carbon bond forming ability of transketolase (TK), along with its broad substrate specificity, makes it very attractive as a biocatalyst in industrial organic synthesis. Through the production of saturation mutagenesis libraries focused on individual active site residues, several variants of TK have been discovered with enhanced activities on non-natural substrates. We have used computational and bioinformatics tools to increase our understanding of TK and to guide engineering of the enzyme for further improvements in activity. Computational automated docking is a powerful technique with the potential to identify transient structures along an enzyme reaction pathway that are difficult to obtain by experimental structure determination. We have used the AutoDock algorithm to dock a series of known ketol donor and aldehyde acceptor substrates into the active site of E. coli TK, both in the presence and the absence of reactive intermediates. Comparison of docked conformations with available crystal structure complexes allows us to propose a more complete mechanism at a level of detail not currently possible by experimental structure determination alone. Statistical coupling analysis (SCA) utilises evolutionary sequence data present within multiple sequence alignments to identify energetically coupled networks of residues within protein structures. Using this technique we have identified several coupled networks within the TK enzyme which we have targeted for mutagenesis in multiple mutant variant libraries. Screening of these libraries for increased activity on the non-natural substrate propionaldehyde (PA) has identified combinations of mutations that act synergistically on enzyme activity. Notably, a double variant has been discovered with a 20-fold improvement in kcat relative to wild type on the PA reaction; this is higher than any other TK variant discovered to date.
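The co-variation idea underlying SCA can be illustrated with a toy score. The sketch below (Python; real SCA uses perturbation-based statistical energies over large alignments, so this is only a conceptual stand-in) flags column pairs whose residues co-occur more often than their individual frequencies predict:

```python
def coupling(alignment, i, j):
    """Toy co-variation score for columns i and j of an alignment:
    joint frequency of the most common residue pair minus the product
    of the marginal frequencies. Positive values suggest the columns
    vary together rather than independently."""
    n = len(alignment)
    pairs = [(s[i], s[j]) for s in alignment]
    best = max(set(pairs), key=pairs.count)   # most frequent residue pair
    p_ij = pairs.count(best) / n
    p_i = sum(1 for s in alignment if s[i] == best[0]) / n
    p_j = sum(1 for s in alignment if s[j] == best[1]) / n
    return p_ij - p_i * p_j

# Columns 0 and 1 co-vary perfectly; column 2 is independent.
aln = ["ADW", "ADY", "GKW", "GKY"]
print(coupling(aln, 0, 1), coupling(aln, 0, 2))   # → 0.25 0.0
```

Residue pairs with high scores across many sequences are the kind of "energetically coupled networks" that SCA targets for combined mutagenesis.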
APA, Harvard, Vancouver, ISO, and other styles
39

Mahram, Atabak. "FPGA acceleration of sequence analysis tools in bioinformatics." Thesis, Boston University, 2013. https://hdl.handle.net/2144/11126.

Full text
Abstract:
Thesis (Ph.D.)--Boston University. With advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high impact production biosequence analysis tools. Compared with other alternatives, FPGAs offer huge compute power, lower power consumption, and reasonable flexibility. BLAST has become the de facto standard in bioinformatic approximate string matching and so its acceleration is of fundamental importance. It is a complex highly-optimized system, consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on FPGA. Utilizing our FPGA engine, we quickly reduce the size of the database to a small fraction, and then use the original code to process the query. Using a standard FPGA-based system, we achieved 12x speedup over a highly optimized multithreaded reference code. Multiple Sequence Alignment (MSA)--the extension of pairwise sequence alignment to multiple sequences--is critical to solve many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of 80x to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics. The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately many software heuristics do not fall into this category and so other methods must be applied. One is restructuring: an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs.
Using our prefiltering approach and novel FPGA programming models we have achieved significant speedup over reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study introduces the pros and cons of these acceleration models for biosequence analysis tools.
APA, Harvard, Vancouver, ISO, and other styles
40

Petri, Eric D. C. "Bioinformatics Tools for Finding the Vocabularies of Genomes." Ohio University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1213730223.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Malatras, Apostolos. "Bioinformatics tools for the systems biology of dysferlin deficiency." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066627/document.

Full text
Abstract:
Le but de mon projet est de créer et d’appliquer des outils pour l’analyse de la biologie des systèmes musculaires en utilisant différentes données OMICS. Ce projet s’intéresse plus particulièrement à la dysferlinopathie due à la déficience d’une protéine appelée dysferline qui est exprimée principalement dans les muscles squelettiques et cardiaque. La perte de la dysferline due à la mutation (autosomique-récessive) du gène DYSF entraîne une dystrophie musculaire progressive (LGMD2B, MM, DMAT). Nous avons déjà développé des outils bio-informatiques qui peuvent être utilisés pour l’analyse fonctionnelle de données OMICS, relative à la dysferlinopathie. Ces derniers incluent le test dit «gene set enrichment analysis», test comparant les profils OMICS d’intérêts aux données OMICS musculaires préalablement publiées ; et l’analyse des réseaux impliquant les différent(e)s protéines et transcrits entre eux/elles. Ainsi, nous avons analysé des centaines de données omiques publiées provenant d’archives publiques. Les outils informatiques que nous avons développés sont CellWhere et MyoMiner. CellWhere est un outil facile à utiliser, permettant de visualiser sur un graphe interactif à la fois les interactions protéine-protéine et la localisation subcellulaire des protéines. MyoMiner est une base de données spécialisée dans le tissu et les cellules musculaires, et qui fournit une analyse de co-expression, aussi bien dans les tissus sains que pathologiques. Ces outils seront utilisés dans l'analyse et l'interprétation de données transcriptomiques pour les dysferlinopathies mais également les autres pathologies neuromusculaires<br>The aim of this project was to build and apply tools for the analysis of muscle omics data, with a focus on Dysferlin deficiency.
This protein is expressed mainly in skeletal and cardiac muscles, and its loss due to mutation (autosomal-recessive) of the DYSF gene results in a progressive muscular dystrophy (Limb Girdle Muscular Dystrophy type 2B (LGMD2B), Miyoshi myopathy and distal myopathy with tibialis anterior onset (DMAT)). We have developed various tools and pipelines that can be applied towards a bioinformatics functional analysis of omics data in muscular dystrophies and neuromuscular disorders. These include: tests for enrichment of gene sets derived from previously published muscle microarray data and networking analysis of functional associations between altered transcripts/proteins. To accomplish this, we analyzed hundreds of published omics data from public repositories. The tools we developed are called CellWhere and MyoMiner. CellWhere is a user-friendly tool that combines protein-protein interactions and protein subcellular localizations on an interactive graphical display (https://cellwhere-myo.rhcloud.com). MyoMiner is a muscle cell- and tissue-specific database that provides co-expression analyses in both normal and pathological tissues. Many gene co-expression databases already exist and are used broadly by researchers, but MyoMiner is the first muscle-specific tool of its kind (https://myominer-myo.rhcloud.com). These tools will be used in the analysis and interpretation of transcriptomics data from dysferlinopathic muscle and other neuromuscular conditions and will be important to understand the molecular mechanisms underlying these pathologies.
APA, Harvard, Vancouver, ISO, and other styles
42

Parida, Mrutyunjaya. "Exploring and analyzing omics using bioinformatics tools and techniques." Diss., University of Iowa, 2018. https://ir.uiowa.edu/etd/6244.

Full text
Abstract:
During the Human Genome Project the first hundred billion bases were sequenced in four years; however, the second hundred billion bases were sequenced in four months (NHGRI, 2013). As efforts were made to improve every aspect of sequencing in this project, cost became inversely proportional to speed (NHGRI, 2013). The Human Genome Project ended in April 2003, but research into faster and cheaper ways to sequence DNA remains active to date (NHGRI, 2013). On the one hand, these advancements have allowed the convenient and unbiased generation and interrogation of a variety of omics datasets; on the other hand, they have substantially contributed towards the ever-increasing size of biological data. Therefore, informatics techniques are indispensable tools in the fields of biology and medicine due to their ability to efficiently store and probe large datasets. Bioinformatics is a specialized domain under informatics that focuses on biological data storage, organization and analysis (NHGRI, 2013). Here, I have applied informatics approaches such as database design and web development in the context of biological datasets, or bioinformatics, to create a novel web-based resource that allows users to explore the comprehensive transcriptome of the common aquatic tunicate Oikopleura dioica (O. dioica) and conveniently access its associated annotations across key developmental time points. This unique resource will substantially contribute towards studies on the development, evolution and genetics of chordates using O. dioica as a model. Mendelian or single-gene disorders such as cystic fibrosis, sickle-cell anemia, Huntington’s disease, and Rett’s syndrome run across generations in families (Chial, 2008). Allelic variations associated with Mendelian disorders primarily reside in the protein-coding regions of the genome, collectively called an exome (Stenson et al., 2009).
Therefore, sequencing the exome rather than the whole genome is an efficient and practical approach to discover etiologic variants in our genome (Bamshad et al., 2011). Renal agenesis (RA) is a severe form of congenital anomalies of the kidney and urinary tract (CAKUT) where children are born with one (unilateral renal agenesis) or no kidneys (bilateral renal agenesis) (Brophy et al., 2017; Yalavarthy & Parikh, 2003). In this study, we applied an exome-sequencing technique to selected human patients in a renal agenesis (RA) pedigree that followed a Mendelian mode of disease transmission. Exome sequencing and molecular techniques combined with my bioinformatics analysis have led to the discovery of a novel RA gene called GREB1L (Brophy et al., 2017). In this study, we have successfully demonstrated the utility of exome sequencing and bioinformatics techniques to narrow down disease-associated mutations in the human genome. Additionally, the results from this study have substantially contributed towards understanding the molecular basis of CAKUT. Discovery of novel etiologic variants will enhance our understanding of human diseases and development. A high-throughput sequencing technique called RNA-Seq has revolutionized the field of transcriptome analysis (Z. Wang, Gerstein, & Snyder, 2009). Concisely, a library of cDNA is prepared from an RNA sample using an enzyme called reverse transcriptase (Nottingham et al., 2016). Next, the cDNA is fragmented, sequenced using a sequencing platform of choice and mapped to a reference genome, mapped to an assembled transcriptome, or assembled de novo to generate a transcriptome (Grabherr et al., 2011; Nottingham et al., 2016). Mapping allows detection of high-resolution transcript boundaries, quantification of transcript expression and identification of novel transcripts in the genome. We have applied RNA-Seq to analyze the gene expression patterns in the water flea, otherwise known as D. pulex, to work out the genetic details underlying heavy-metal-induced stress (unpublished) and predator-induced phenotypic plasticity (PIPP) (Rozenberg et al., 2015), independently. My bioinformatics analysis of the RNA-Seq data has facilitated the discovery of key biological processes participating in the metal-induced stress response and predator-induced defense mechanisms in D. pulex. These studies are valuable additions to the fields of ecotoxicogenomics and phenotypic plasticity and have aided us in gaining mechanistic insight into the impact of toxicant and predator exposure on D. pulex at a biomolecular level.
APA, Harvard, Vancouver, ISO, and other styles
43

Furió, Tarí Pedro. "Development of bioinformatic tools for massive sequencing analysis." Doctoral thesis, Universitat Politècnica de València, 2020. http://hdl.handle.net/10251/152485.

Full text
Abstract:
[EN] Transcriptomics is one of the most important and relevant areas of bioinformatics. It allows detecting the genes that are expressed at a particular moment in time to explore the relation between genotype and phenotype. Transcriptomic analysis was historically performed using microarrays until 2008, when high-throughput RNA sequencing (RNA-Seq) was launched on the market, replacing the old technique. However, despite the clear advantages over microarrays, it was necessary to understand factors such as the quality of the data, the reproducibility and replicability of the analyses, and potential biases. The first section of the thesis covers these studies. First, an R package called NOISeq was developed and published in the public repository "Bioconductor", which includes a set of tools to better understand the quality of RNA-Seq data and minimise the impact of noise in any posterior analyses, and implements two new methodologies (NOISeq and NOISeqBio) to overcome the difficulties of comparing two different groups of samples (differential expression). Second, I show our contribution to the Sequencing Quality Control (SEQC) project, a continuation of the Microarray Quality Control (MAQC) project led by the US Food and Drug Administration (FDA, United States) that aims to assess the reproducibility and replicability of any RNA-Seq analysis. One of the most effective approaches to understand the different factors that influence the regulation of gene expression, such as the synergistic effect of transcription factors, methylation events and chromatin accessibility, is the integration of transcriptomics with other omics data. To this aim, a file that contains the chromosomal positions where the events take place is required. For this reason, in the second chapter, we present a new and easy-to-customise tool (RGmatch) to associate chromosomal positions with the exons, transcripts or genes that the events could regulate.
Another aspect of great interest is the study of non-coding genes, especially long non-coding RNAs (lncRNAs). Not long ago, these regions were thought not to play a relevant role and were considered mere transcriptional noise. However, they represent a high percentage of human genes, and it was recently shown that they actually play an important role in gene regulation. For these reasons, in the last chapter we focus, first, on finding a methodology to infer the generic functions of each lncRNA using publicly available data and, second, on developing a new tool (spongeScan) to predict the lncRNAs that could be involved in the sequestration of micro-RNAs (miRNAs), thereby altering their regulatory activity.<br>[ES] La transcriptómica es una de las áreas más importantes y destacadas en bioinformática, ya que permite ver qué genes están expresados en un momento dado para poder explorar la relación existente entre genotipo y fenotipo. El análisis transcriptómico se ha realizado históricamente mediante el uso de microarrays hasta que, en el año 2008, la secuenciación masiva de ARN (RNA-Seq) fue lanzada al mercado y comenzó a desplazar poco a poco su uso. Sin embargo, a pesar de las ventajas evidentes frente a los microarrays, resultaba necesario entender factores como la calidad de los datos, reproducibilidad y replicabilidad de los análisis así como los potenciales sesgos. La primera parte de la tesis aborda precisamente estos estudios. En primer lugar, se desarrolla un paquete de R llamado NOISeq, publicado en el repositorio público "Bioconductor", el cual incluye un conjunto de herramientas para entender la calidad de datos de RNA-Seq, herramientas de procesado para minimizar el impacto del ruido en posteriores análisis y dos nuevas metodologías (NOISeq y NOISeqBio) para abordar la problemática de la comparación entre dos grupos (expresión diferencial).
Por otro lado, presento nuestra contribución al proyecto Sequencing Quality Control (SEQC), una continuación del proyecto Microarray Quality Control (MAQC) liderado por la US Food and Drug Administration (FDA) que pretende evaluar precisamente la reproducibilidad y replicabilidad de los análisis realizados sobre datos de RNA-Seq. Una de las estrategias más efectivas para entender los diferentes factores que influyen en la regulación de la expresión génica, como puede ser el efecto sinérgico de los factores de transcripción, eventos de metilación y accesibilidad de la cromatina, es la integración de la transcriptómica con otros datos ómicos. Para ello se necesita generar un fichero que indique las posiciones cromosómicas donde se producen estos eventos. Por este motivo, en el segundo capítulo de la tesis presentamos una nueva herramienta (RGmatch) altamente customizable que permite asociar estas posiciones cromosómicas a los posibles genes, transcritos o exones a los que podría estar regulando cada uno de estos eventos. Otro de los aspectos de gran interés en este campo es el estudio de los genes no codificantes, especialmente los ARN largos no codificantes (lncRNAs). Hasta no hace mucho, se pensaba que estos genes no jugaban ningún papel fundamental y se consideraban como simple ruido transcripcional. Sin embargo, suponen un alto porcentaje de los genes del ser humano y se ha demostrado que juegan un papel crucial en la regulación de otros genes. 
Por este motivo, en el último capítulo nos centramos, en un primer lugar, en intentar obtener una metodología que permita averiguar las funciones generales de cada lncRNA haciendo uso de datos ya publicados y, en segundo lugar, generamos una nueva herramienta (spongeScan) que permite predecir qué lncRNAs podrían estar secuestrando determinados micro-RNAs (miRNAs), alterando así la regulación llevada a cabo por estos últimos.<br>[CA] La transcriptòmica és una de les àrees més importants i destacades en bioinformàtica, ja que permet veure quins gens s'expressen en un moment donat per a poder explorar la relació existent entre genotip i fenotip. L'anàlisi transcriptòmic s'ha fet històricament per mitjà de l'ús de microarrays fins l'any 2008 quan la tècnica de seqüenciació massiva d'ARN (RNA-Seq) es va fer pública i va començar a desplaçar a poc a poc el seu ús. No obstant això, a pesar dels avantatges evidents enfront dels microarrays, resultava necessari entendre factors com la qualitat de les dades, reproducibilitat i replicabilitat dels anàlisis, així com els possibles caires introduïts. La primera part de la tesi aborda precisament estos estudis. En primer lloc, es va programar un paquet de R anomenat NOISeq publicat al repositori públic "Bioconductor", el qual inclou un conjunt d'eines per a entendre la qualitat de les dades de RNA-Seq, eines de processat per a minimitzar l'impact del soroll en anàlisis posteriors i dos noves metodologies (NOISeq i NOISeqBio) per a abordar la problemàtica de la comparació entre dos grups (expressió diferencial). D'altra banda, presente la nostra contribució al projecte Sequencing Quality Control (SEQC), una continuació del projecte Microarray Quality Control (MAQC) liderat per la US Food and Drug Administration (FDA) que pretén avaluar precisament la reproducibilitat i replicabilitat dels anàlisis realitzats sobre dades de RNA-Seq. 
Una de les estratègies més efectives per a entendre els diferents factors que influïxen a la regulació de l'expressió gènica, com pot ser l'efecte sinèrgic dels factors de transcripció, esdeveniments de metilació i accessibilitat de la cromatina, és la integració de la transcriptómica amb altres dades ómiques. Per això es necessita generar un fitxer que indique les posicions cromosòmiques on es produïxen aquests esdeveniments. Per aquest motiu, en el segon capítol de la tesi presentem una nova eina (RGmatch) altament customizable que permet associar aquestes posicions cromosòmiques als possibles gens, transcrits o exons als que podria estar regulant cada un d'aquests esdeveniments regulatoris. Altre dels aspectes de gran interés en aquest camp és l'estudi dels genes no codificants, especialment dels ARN llargs no codificants (lncRNAs). Fins no fa molt, encara es pensava que aquests gens no jugaven cap paper fonamental i es consideraven com a simple soroll transcripcional. No obstant això, suposen un alt percentatge dels gens de l'ésser humà i s'ha demostrat que juguen un paper crucial en la regulació d'altres gens. Per aquest motiu, en l'últim capítol ens centrem, en un primer lloc, en intentar obtenir una metodologia que permeta esbrinar les funcions generals de cada lncRNA fent ús de dades ja publicades i, en segon lloc, presentem una nova eina (spongeScan) que permet predeir quins lncRNAs podríen estar segrestant determinats micro-RNAs (miRNAs), alterant així la regulació duta a terme per aquests últims.<br>Furió Tarí, P. (2020). Development of bioinformatic tools for massive sequencing analysis [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/152485<br>TESIS
APA, Harvard, Vancouver, ISO, and other styles
44

Murat, Katarzyna. "Bioinformatics analysis of epigenetic variants associated with melanoma." Thesis, University of Bradford, 2018. http://hdl.handle.net/10454/17220.

Full text
Abstract:
The field of cancer genomics is currently being enhanced by the power of Epigenome-wide association studies (EWAS). Over the last couple of years, comprehensive sequence data sets have been generated, making analysis of genome-wide activity in cohorts of different individuals increasingly available. Finding associations between epigenetic variation and phenotype is one of the biggest challenges in biomedical research. Laboratories lacking dedicated resources and programming experience require bioinformatics expertise, which can be prohibitively costly and time-consuming. To address this, we have developed a collection of freely available Galaxy tools (Poterlowicz, 2018a), combining analytical methods into a range of convenient analysis pipelines with a user-friendly graphical interface. The tool suite includes methods for data preprocessing, quality assessment and differentially methylated region and position discovery. The aim of this project was to make EWAS analysis flexible and accessible to everyone and compatible with routine clinical and biological use. This is exemplified by my work integrating DNA methylation profiles of melanoma patients (at baseline and under mitogen-activated protein kinase inhibitor (MAPKi) treatment) to identify novel epigenetic switches responsible for tumour resistance to therapy (Hugo et al., 2015). Configuration files are published on our public GitHub repository (Poterlowicz, 2018b), with scripts and dependency settings also available to download and install via the Galaxy test toolshed (Poterlowicz, 2018a). Results and experiences using this framework demonstrate the potential of Galaxy as a bioinformatics solution for multi-omics cancer biomarker discovery.
APA, Harvard, Vancouver, ISO, and other styles
45

Malatras, Apostolos [Verfasser]. "Bioinformatics tools for the systems biology of dysferlin deficiency / Apostolos Malatras." Berlin : Freie Universität Berlin, 2018. http://d-nb.info/1171431333/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Pierleoni, Andrea <1979&gt. "Design and implementation of bioinformatics tools for large scale genome annotation." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2008. http://amsdottorato.unibo.it/695/.

Full text
Abstract:
The continuous increase in genome sequencing projects has produced a huge amount of data in the last 10 years: currently, more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, the sequencing process alone determines only raw nucleotide sequences. This is just the first step of the genome annotation process, which deals with assigning biological information to each sequence. Annotation is carried out at every level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished by in vitro analysis procedures alone, which are extremely expensive and time-consuming when applied at this large scale. Thus, in silico methods need to be used to accomplish the task. The aim of this work was the implementation of predictive computational methods to allow fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on the implementation of a new machine-learning-based method for the prediction of the subcellular localization of soluble eukaryotic proteins. The method, called BaCelLo, was developed in 2006. Its main peculiarity is its independence from biases present in the training dataset, which cause the over-prediction of the most represented examples in all the other predictors developed so far. This important result was achieved by a modification, made by myself, to the standard Support Vector Machine (SVM) algorithm, creating the so-called Balanced SVM. BaCelLo is able to predict the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the currently available state-of-the-art methods for this prediction task.
BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each amino acid sequence extracted from the genome, the predicted subcellular localization merged with experimental and similarity-based annotations. In the second part of the work, a new machine-learning-based method was implemented for the prediction of GPI-anchored proteins. The method is able to efficiently predict, from the raw amino acid sequence, both the presence of the GPI anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model (HMM)). The method, called GPIPE, was reported to greatly enhance prediction performance for GPI-anchored proteins over all previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI-anchored proteins while maintaining a false positive rate as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15,000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted to be GPI-anchored. A statistical analysis was performed on the composition of the regions surrounding the ω-site, which allowed the definition of specific amino acid abundances in the different regions considered. Furthermore, the hypothesis, proposed in the literature, that compositional biases are present among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello eSLDB http://gpcr.biocomp.unibo.it/esldb GPIPE http://gpcr.biocomp.unibo.it/gpipe
APA, Harvard, Vancouver, ISO, and other styles
47

Cabrera, Cárdenas Claudia Paola. "Bioinformatics tools for the genetic dissection of complex traits in chickens." Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/3864.

Full text
Abstract:
This thesis explores the genetic characterization of the mechanisms underlying complex traits in chicken through the use and development of bioinformatics tools. The characterization of quantitative trait loci controlling complex traits has proven to be very challenging. This thesis comprises the study of experimental designs, annotation procedures and functional analyses. These represent some of the main ‘bottlenecks’ involved in the integration of QTLs with the biological interpretation of high-throughput technologies. The thesis begins with an investigation of the bioinformatics tools and procedures available for genome research, briefly reviewing microarray technology and commonly applied experimental designs. A targeted experimental design based on the concept of genetical genomics is then presented and applied in order to study a known functional QTL responsible for chicken body weight. This approach contrasts the gene expression levels of two alternative QTL genotypes, hence narrowing the QTL-phenotype gap and giving a direct quantification of the link between the genotypes and the genetic responses. Potential candidate genes responsible for the chicken body weight QTL are identified by using the location of the genes, their expression and biological significance. In order to deal with the multiple sources of information and exploit the data effectively, a systematic approach and a relational database were developed to improve the annotation of the probes of the ARK-Genomics G. gallus 13K v4.0 cDNA array utilized in the experiment. To follow up the investigation of the targeted genetical genomics study, a detailed functional analysis is performed on the dataset. The aim is, first, to identify the downstream effects through the identification of functional variation found in pathways and, second, to achieve a further characterization of potential candidate genes by using comparative genomics and sequence analyses.
Finally, the investigation of the body weight QTL syntenic regions and their reported QTLs is presented.
APA, Harvard, Vancouver, ISO, and other styles
48

Garma, L. D. (Leonardo D. ). "Structural bioinformatics tools for the comparison and classification of protein interactions." Doctoral thesis, Oulun yliopisto, 2017. http://urn.fi/urn:isbn:9789526216065.

Full text
Abstract:
Abstract Most proteins carry out their functions through interactions with other molecules. Thus, proteins taking part in similar interactions are likely to carry out related functions. One way to determine whether two proteins do take part in similar interactions is by quantifying the likeness of their structures. This work focuses on the development of methods for the comparison of protein-protein and protein-ligand interactions, as well as their application to structure-based classification schemes. A method based on the MultiMer-align (or MM-align) program was developed and used to compare all known dimeric protein complexes. The results of the comparison demonstrate that the method improves over MM-align in a significant number of cases. The data was employed to classify the complexes, resulting in 1,761 different protein-protein interaction types. Through a statistical model, the number of existing protein-protein interaction types in nature was estimated at around 4,000. The model allowed the establishment of a relationship between the number of quaternary families (sequence-based groups of protein-protein complexes) and quaternary folds (structure-based groups). The interactions between proteins and small organic ligands were studied using sequence-independent methodologies. A new method was introduced to test three similarity metrics. The best of these metrics was subsequently employed, together with five other existing methodologies, to conduct an all-to-all comparison of all the known protein-FAD (Flavin-Adenine Dinucleotide) complexes. The results demonstrate that the new methodology best captures the similarities between complexes in terms of protein-ligand contacts. Based on the all-to-all comparison, the protein-FAD complexes were subsequently separated into 237 groups. In the majority of cases, the classification divided the complexes according to their annotated function.
Using a graph-based description of the FAD-binding sites, each group could be further characterized and uniquely described. The study demonstrates that the newly developed methods are superior to the existing ones. The results indicate that both the known protein-protein and the protein-FAD interactions can be classified into a reduced number of types and that in general terms these classifications are consistent with the proteins' functions<br>Tiivistelmä Suurin osa proteiinien toiminnasta tapahtuu vuorovaikutuksessa muiden molekyylien kanssa. Proteiinit, jotka osallistuvat samanlaisiin vuorovaikutuksiin todennäköisesti toimivat samalla tavalla. Kahden proteiinin todennäköisyys esiintyä samanlaisissa vuorovaikutustilanteissa voidaan määrittää tutkimalla niiden rakenteellista samankaltaisuutta. Tämä väitöskirjatyö käsittelee proteiini-proteiini- ja proteiini-ligandi -vuorovaikutusten vertailuun käytettyjen menetelmien kehitystä, ja niiden soveltamista rakenteeseen perustuvissa luokittelujärjestelmissä. Tunnettuja dimeerisiä proteiinikomplekseja tutkittiin uudella MultiMer-align-ohjelmaan (MM-align) perustuvalla menetelmällä. Vertailun tulokset osoittavat, että uusi menetelmä suoriutui MM-alignia paremmin merkittävässä osassa tapauksista. Tuloksia käytettiin myös kompleksien luokitteluun, jonka tuloksena oli 1761 erilaista proteiinien välistä vuorovaikutustyyppiä. Luonnossa esiintyvien proteiinien välisten vuorovaikutusten määrän arvioitiin tilastollisen mallin avulla olevan noin 4000. Tilastollisen mallin avulla saatiin vertailtua sekä sekvenssin (”quaternary families”) sekä rakenteen (”quaternary folds”) mukaan ryhmiteltyjen proteiinikompleksien määriä. Proteiinien ja pienien orgaanisten ligandien välisiä vuorovaikutuksia tutkittiin sekvenssistä riippumattomilla menetelmillä. Uudella menetelmällä testattiin kolmea eri samankaltaisuutta mittaavaa metriikkaa. 
Näistä parasta käytettiin viiden muun tunnetun menetelmän kanssa vertailemaan kaikkia tunnettuja proteiini-FAD (Flavin-Adenine-Dinucleotide, flaviiniadeniinidinukleotidi) -komplekseja. Proteiini-ligandikontaktien osalta uusi menetelmä kuvasi kompleksien samankaltaisuutta muita menetelmiä paremmin. Vertailun tuloksia hyödyntäen proteiini-FAD-kompleksit luokiteltiin edelleen 237 ryhmään. Suurimmassa osassa tapauksista luokittelujärjestelmä oli onnistunut jakamaan kompleksit ryhmiin niiden toiminnallisuuden mukaisesti. Ryhmät voitiin määritellä yksikäsitteisesti kuvaamalla FAD:n sitoutumispaikka graafisesti. Väitöskirjatyö osoittaa, että siinä kehitetyt menetelmät ovat parempia kuin aikaisemmin käytetyt menetelmät. Tulokset osoittavat, että sekä proteiinien väliset että proteiini-FAD -vuorovaikutukset voidaan luokitella rajattuun määrään vuorovaikutustyyppejä ja yleisesti luokittelu on yhtenevä proteiinien toiminnan suhteen
APA, Harvard, Vancouver, ISO, and other styles
49

Mayol, Escuer Eduardo. "Development of bioinformatic tools for the study of membrane proteins." Doctoral thesis, Universitat Autònoma de Barcelona, 2019. http://hdl.handle.net/10803/667335.

Full text
Abstract:
Membrane proteins are fundamental elements of every known cell; accounting for a quarter of the genes in the human genome, they play essential roles in cell biology. About 50% of currently marketed drugs have a membrane protein as their target, and around a third of them target G-protein-coupled receptors (GPCRs). The difficulties and limitations of the experimental work required for microscopic studies of the membrane and of membrane proteins have motivated the use of computational methods. The scope of this thesis is to develop new bioinformatic tools for the study of membrane proteins, and of GPCRs in particular, that help characterize their structural features and understand their function. With regard to membrane proteins, a cornerstone of this thesis has been the creation of two databases for the main classes of membrane proteins: one for α-helical proteins (TMalphaDB) and another for β-barrel proteins (TMbetaDB).
These databases are used by a newly developed tool to find structural distortions induced by specific amino acid sequence motifs (http://lmc.uab.cat/tmalphadb and http://lmc.uab.cat/tmbetadb), and in the characterization of inter-residue interactions that occur in the transmembrane region of membrane proteins, with the aim of understanding the complexity and differential features of these proteins. Interactions involving Phe and Leu residues were found to be mainly responsible for the stabilization of the transmembrane region. Moreover, the energetic contributions of interactions between sulfur-containing amino acids (Met and Cys) and aliphatic or aromatic residues were analyzed. These interactions are often disregarded even though they can be stronger than aromatic-aromatic or aromatic-aliphatic interactions. Additionally, the G-protein-coupled receptor family, the most important family of membrane proteins, has been the focus of two web applications: one dedicated to the analysis of the conservation of amino acids or sequence motifs and of pair correlations (GPCR-SAS, http://lmc.uab.cat/gpcrsas), and one that places internal water molecules in receptor structures (HomolWat, http://lmc.uab.cat/HW). These web applications are pilot studies that can be extended to other membrane protein families in future projects. All these tools and analyses may help in the development of better structural models and contribute to the understanding of membrane proteins.
50

Torabi, Moghadam Behrooz. "Computational discovery of DNA methylation patterns as biomarkers of ageing, cancer, and mental disorders : Algorithms and Tools." Doctoral thesis, Uppsala universitet, Institutionen för cell- och molekylärbiologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-320720.

Full text
Abstract:
Epigenetics refers to mitotically heritable modifications of gene expression that occur without a change in the genetic code. A combination of molecular, chemical and environmental factors constituting the epigenome is involved, together with the genome, in setting up the unique functionality of each cell type. DNA methylation is the most studied epigenetic mark in mammals, in which a methyl group is added to the cytosine of a cytosine-phosphate-guanine dinucleotide, or CpG site. It has been shown to play a major role in various biological phenomena such as X-chromosome inactivation, regulation of gene expression, cell differentiation, and genomic imprinting. Furthermore, aberrant patterns of DNA methylation have been observed in various diseases, including cancer. In this thesis, we have utilized machine learning methods and developed new methods and tools to analyze DNA methylation patterns as biomarkers of ageing, cancer subtypes, and mental disorders. In Paper I, we introduced a pipeline of Monte Carlo Feature Selection and rule-based modeling using ROSETTA to identify combinations of CpG sites whose DNA methylation levels classify samples into different age intervals. The combinations of genes found to act together motivated us to develop an interactive pathway browser, named PiiL, for checking the methylation status of multiple genes in a pathway. The tool enhances the detection of differential patterns of DNA methylation and/or gene expression by quickly assessing large data sets. In Paper III, we developed a novel unsupervised clustering method, methylSaguaro, for analyzing various types of cancers and detecting cancer subtypes based on their DNA methylation patterns. Using this method, we confirmed previously reported findings that challenge the histological grouping of the patients, and proposed new subtypes based on DNA methylation patterns.
In Paper IV, we investigated DNA methylation patterns in a cohort of samples from schizophrenia patients and healthy controls, using all the methods introduced and developed in the first three papers.
