
Dissertations / Theses on the topic 'Bioinformatics applications'

Consult the top 50 dissertations / theses for your research on the topic 'Bioinformatics applications.'

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Peeters, Justine Kate. "Microarray bioinformatics and applications in oncology." Rotterdam: Erasmus University, 2008. http://hdl.handle.net/1765/12618.

2

Alvarez, Vega Marco. "Graph Kernels and Applications in Bioinformatics." DigitalCommons@USU, 2011. https://digitalcommons.usu.edu/etd/1185.

Abstract:
In recent years, machine learning has emerged as an important discipline. However, despite the popularity of machine learning techniques, data in the form of discrete structures are not fully exploited. For example, when data appear as graphs, the common choice is the transformation of such structures into feature vectors. This procedure, though convenient, does not always effectively capture the topological relationships inherent to the data; therefore, the power of the learning process may be insufficient. In this context, the use of kernel functions for graphs arises as an attractive way to deal with such structured objects. On the other hand, several entities in computational biology applications, such as gene products or proteins, may be naturally represented by graphs. Hence, the pressing need for algorithms that can deal with structured data poses the question of whether kernels for graphs can outperform existing methods on specific computational biology problems. In this dissertation, we address the challenges involved in solving two specific problems in computational biology in which the data are represented by graphs. First, we propose a novel approach for protein function prediction by modeling proteins as graphs. For each vertex in a protein graph, we compute an evolutionary profile, derived from multiple sequence alignments of the amino acid residues within that vertex. We then use a shortest-path graph kernel in conjunction with a support vector machine to predict protein function. We evaluate our approach on two instances of protein function prediction, namely, the discrimination of proteins as enzymes and the recognition of DNA-binding proteins. In both cases, our proposed approach achieves better prediction performance than existing methods. Second, we propose two novel semantic similarity measures for proteins based on the gene ontology. The first measure works directly on the gene ontology by combining the pairwise semantic similarity scores between the sets of annotating terms for a pair of input proteins. The second measure estimates protein semantic similarity using a shortest-path graph kernel to take advantage of the rich semantic knowledge contained within ontologies. Our comparison with other methods shows that both proposed semantic similarity measures are highly competitive, and the latter outperforms state-of-the-art methods. Furthermore, both methods are intrinsic to the gene ontology, in the sense that they do not rely on external sources to calculate similarities.
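As a rough, self-contained illustration of the shortest-path graph kernel idea referred to above (a toy sketch, not the authors' implementation, which works on protein graphs annotated with evolutionary profiles; the graph representation and function names here are my own), one classic variant applies a delta kernel to shortest-path lengths: the kernel value counts pairs of vertex pairs, one from each graph, whose path lengths agree.

```python
from collections import Counter, deque


def shortest_path_lengths(graph):
    """All-pairs shortest-path lengths of an unweighted graph via BFS.

    graph: dict mapping each vertex to an iterable of neighbours.
    Returns a Counter mapping path length -> number of ordered vertex pairs.
    """
    lengths = Counter()
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist.values():
            if d > 0:  # skip the trivial src-to-src pair
                lengths[d] += 1
    return lengths


def shortest_path_kernel(g1, g2):
    """Delta kernel on shortest-path lengths: sums, over every pair of
    vertex pairs (one from each graph), 1 if their path lengths match."""
    c1, c2 = shortest_path_lengths(g1), shortest_path_lengths(g2)
    return sum(c1[d] * c2[d] for d in c1)
```

For example, a triangle graph has six ordered vertex pairs at distance 1, while a three-vertex chain has four pairs at distance 1 and two at distance 2, so the triangle/chain kernel value is 6 * 4 = 24. The resulting kernel matrix can be fed directly to a kernelized SVM, as in the approach the abstract describes.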
3

Andersson, Claes. "Fusing Domain Knowledge with Data : Applications in Bioinformatics." Doctoral thesis, Uppsala universitet, Centrum för bioinformatik, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8477.

Abstract:
Massively parallel measurement techniques can be used for generating hypotheses about the molecular underpinnings of a biological system. This thesis investigates how domain knowledge can be fused with data from different sources in order to generate more sophisticated hypotheses and improved analyses. We find our applications in the related fields of cell cycle regulation and cancer chemotherapy. In our cell cycle studies, we design a detector of periodic expression and use it to generate hypotheses about transcriptional regulation during the course of the cell cycle in synchronized yeast cultures, as well as to investigate whether domain knowledge about gene function can explain whether a gene is periodically expressed or not. We then generate hypotheses that suggest how periodic expression that depends on how the cells were perturbed into synchrony is regulated. The hypotheses suggest where and which transcription factors bind upstream of genes that are regulated by the cell cycle. In our cancer chemotherapy investigations, we first study how a method for identifying co-regulated genes associated with chemoresponse to drugs in cell lines is affected by domain knowledge about the genetic relationships between the cell lines. We then turn our attention to problems that arise in microarray-based predictive medicine, where there are typically few samples available for learning the predictor, and study two different means of alleviating the inherent trade-off between the allocation of design and test samples. First, we investigate whether independent tests on the design data can be used to improve estimates of a predictor's performance without introducing a bias in the estimate. Then, motivated by recent developments in microarray-based predictive medicine, we propose an algorithm that can use unlabeled data for selecting features and consequently improve predictor performance without wasting valuable labeled data.
4

Eklund, Martin. "eScience Approaches to Model Selection and Assessment : Applications in Bioinformatics." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-109437.

5

Lui, Thomas Wing Hong. "Integrated database mining with applications to bioinformatics." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ58354.pdf.

6

Wu, Man-kit Edward, and 胡文傑. "Improved indexes for next generation bioinformatics applications." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B43224222.

7

Liu, Chi-man, and 廖志敏. "Efficient solutions for bioinformatics applications using GPUs." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2015. http://hdl.handle.net/10722/211138.

Abstract:
Over the past few years, DNA sequencing technology has been advancing at such a fast pace that computer hardware and software can hardly meet the ever-increasing demand for sequence analysis. A natural approach to boosting analysis efficiency is parallelization, which divides the problem into smaller ones that are solved simultaneously on multiple execution units. Common architectures such as multi-core CPUs and clusters can increase throughput to some extent, but their hardware setup and maintenance costs are prohibitive. Fortunately, the newly emerged general-purpose GPU programming paradigm gives us a low-cost alternative for parallelization. This thesis presents GPU-accelerated algorithms for several problems in bioinformatics, along with implementations that demonstrate their power in handling enormous volumes of data; the GPU has totally different limitations and optimization techniques than the CPU. The first tool presented is SOAP3-dp, a DNA short-read aligner highly optimized for speed. Prior to SOAP3-dp, the fastest short-read aligner was its predecessor SOAP2, which was capable of aligning 1 million 100-bp reads in 5 minutes. SOAP3-dp beats this record by aligning the same volume in only 10 seconds. The key to unlocking this unprecedented speed is the revamped BWT engine underlying SOAP3-dp: all data structures and their associated operations have been tailor-made for the GPU to achieve optimized performance. Experiments show that SOAP3-dp not only excels in speed but also outperforms other aligners in both alignment sensitivity and accuracy. The next tools are for constructing data structures, namely the Burrows-Wheeler transform (BWT) and de Bruijn graphs (DBGs), to facilitate genome assembly from short reads, especially for large metagenomics data. The BWT index for a set of short reads has recently found use in string-graph assemblers [44], as it provides a succinct way of representing huge string graphs which would otherwise exceed the main-memory limit. Constructing the BWT index for a million reads is by itself not an easy task, let alone optimizing it for the GPU. Another class of assemblers, the DBG-based assemblers, faces the same problem. This thesis presents construction algorithms for both the BWT and DBGs in succinct form. In our experiments, we constructed the succinct DBG for a metagenomics data set with over 200 gigabases in 3 hours, and the resulting DBG consumed only 31.2 GB of memory. We also constructed the BWT index for 10 million 100-bp reads in 40 minutes using 4 quad-core machines. Lastly, we introduce a SNP detection tool, iSNPcall, which detects SNPs from a set of reads. Given a set of user-supplied annotated SNPs, iSNPcall focuses only on alignments covering these SNPs, which greatly accelerates the detection of SNPs at the prescribed loci. The annotated SNPs also help us distinguish sequencing errors from authentic SNP alleles easily. This is in contrast to the traditional de novo method, which aligns reads onto the reference genome and then filters inauthentic mismatches according to some probabilities. In comparisons on several applications, iSNPcall was found to give higher accuracy than the de novo method, especially for samples with low coverage.
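For readers unfamiliar with the Burrows-Wheeler transform the abstract builds on, a naive construction sorts all rotations of the input and keeps the last column. This sketch is O(n² log n) and purely illustrative; it bears no resemblance to the succinct, GPU-oriented construction engines the thesis describes.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations.

    Appends the sentinel '$' (assumed lexicographically smallest and
    absent from the input), sorts all cyclic rotations, and returns
    the string formed by the last character of each rotation.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)
```

For instance, `bwt("banana")` yields `"annb$aa"`, the textbook example; the transform groups equal characters together, which is what makes the index both compressible and searchable via backward search.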
8

Wu, Man-kit Edward. "Improved indexes for next generation bioinformatics applications." E-thesis, The University of Hong Kong (HKUTO), 2009. http://sunzi.lib.hku.hk/hkuto/record/B43224222.

9

Zhang, Yi. "Novel Applications of Machine Learning in Bioinformatics." UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/83.

Abstract:
Technological advances in next-generation sequencing and biomedical imaging have led to a rapid increase in biomedical data dimension and acquisition rate, which is challenging the conventional data analysis strategies. Modern machine learning techniques promise to leverage large data sets for finding hidden patterns within them, and for making accurate predictions. This dissertation aims to design novel machine learning-based models to transform biomedical big data into valuable biological insights. The research presented in this dissertation focuses on three bioinformatics domains: splice junction classification, gene regulatory network reconstruction, and lesion detection in mammograms. A critical step in defining gene structures and mRNA transcript variants is to accurately identify splice junctions. In the first work, we built the first deep learning-based splice junction classifier, DeepSplice. It outperforms the state-of-the-art classification tools in terms of both classification accuracy and computational efficiency. To uncover transcription factors governing metabolic reprogramming in non-small-cell lung cancer patients, we developed TFmeta, a machine learning approach to reconstruct relationships between transcription factors and their target genes in the second work. Our approach achieves the best performance on benchmark data sets. In the third work, we designed deep learning-based architectures to perform lesion detection in both 2D and 3D whole mammogram images.
10

Kumar, Deept. "Redescription Mining: Algorithms and Applications in Bioinformatics." Diss., Virginia Tech, 2007. http://hdl.handle.net/10919/27518.

Abstract:
Scientific data mining aims to extract useful knowledge from the massive datasets curated through computational science efforts, e.g., in bioinformatics, cosmology, the geographic sciences, and computational chemistry. In the recent past, we have witnessed major transformations of these applied sciences into data-driven endeavors. In particular, scientists are now faced with an overload of vocabularies for describing domain entities. All of these vocabularies offer alternative and mostly complementary (sometimes even contradictory) ways to organize information, and each vocabulary provides a different perspective on the problem being studied. To further knowledge discovery, computational scientists need tools that help them reason uniformly across vocabularies, integrate multiple forms of characterizing datasets, and situate knowledge gained from one study in terms of others. This dissertation defines a new pattern class called redescriptions that provides high-level capabilities for reasoning across domain vocabularies. A redescription is a shift of vocabulary, or a different way of communicating the same information; redescription mining finds concerted sets of objects that can be defined in (at least) two ways using given descriptors. We present the CARTwheels algorithm for mining redescriptions by exploiting equivalences of partitions induced by distinct descriptor classes, as well as applications of CARTwheels to several bioinformatics datasets. We then outline how we can build more complex data mining operations by cascading redescriptions to realize a story, leading to a new data mining capability called storytelling. Besides applications to characterizing gene sets, we showcase its uses in other datasets as well. Finally, we extend the core CARTwheels algorithm by introducing a theoretical framework, based on partitions, to systematically explore the redescription space; by generalizing from mining redescriptions (and stories) within a single domain to relating descriptors across different domains, to support complex relational data mining scenarios; and by exploiting the structure of the underlying descriptor space to yield more effective algorithms for specific classes of datasets.
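To make the notion of a redescription concrete, here is a deliberately simplified sketch (my own, not CARTwheels, which alternates decision-tree growth over partitions): two descriptors from different vocabularies redescribe each other when the object sets they induce nearly coincide, which can be scored with Jaccard similarity. The descriptor names and threshold below are illustrative.

```python
def redescriptions(desc_a, desc_b, min_jaccard=0.8):
    """Find descriptor pairs (one per vocabulary) whose object sets
    nearly coincide.

    desc_a, desc_b: dicts mapping descriptor name -> set of object ids.
    Returns (name_a, name_b, jaccard) triples with jaccard >= min_jaccard.
    """
    pairs = []
    for name_a, set_a in desc_a.items():
        for name_b, set_b in desc_b.items():
            union = set_a | set_b
            if not union:
                continue  # two empty descriptors carry no information
            jaccard = len(set_a & set_b) / len(union)
            if jaccard >= min_jaccard:
                pairs.append((name_a, name_b, round(jaccard, 3)))
    return pairs
```

For example, a pathway descriptor covering genes {1, 2, 3, 4} and a motif descriptor covering {1, 2, 3, 4, 5} overlap with Jaccard 0.8 and would be reported as an (approximate) redescription, while a disjoint descriptor would not.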
11

Lindskog, Mats. "Computational analyses of biological sequences -applications to antibody-based proteomics and gene family characterization." Doctoral thesis, KTH, School of Biotechnology (BIO), 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-527.

Abstract:
Following the completion of the human genome sequence, post-genomic efforts have shifted the focus towards the analysis of the encoded proteome. Several different systematic proteomics approaches have emerged, for instance antibody-based proteomics initiatives, where antibodies are used to functionally explore the human proteome. One such effort is HPR (the Swedish Human Proteome Resource), where affinity-purified polyclonal antibodies are generated and subsequently used for protein expression and localization studies in normal and diseased tissues. The antibodies are directed towards protein fragments, PrESTs (Protein Epitope Signature Tags), which are selected based on criteria favourable to subsequent laboratory procedures.

This thesis describes the development of novel software (Bishop) to facilitate the selection of proper protein fragments, as well as to ensure high-throughput processing of selected target proteins. The majority of proteins were successfully processed by this approach; however, the design strategy resulted in a number of fall-outs. These proteins comprised alternative splice variants as well as proteins exhibiting high sequence similarity to other human proteins. Alternative strategies were developed for processing these proteins. The strategy for handling alternative splice variants included the development of additional software and was validated by comparing the immunohistochemical staining patterns obtained with antibodies generated towards the same target protein. Processing of high-sequence-similarity proteins was enabled by assembling human proteins into clusters according to their pairwise sequence identities. Each cluster was represented by a single PrEST located in the region of highest sequence similarity among all cluster members, thereby representing the entire cluster. This strategy was validated by identifying all proteins within a cluster using antibodies directed to such cluster-specific PrESTs in Western blot analysis. In addition, the PrEST design success rates for more than 4,000 genes were evaluated.

Several genomes other than the human genome have been finished; currently more than 300 genomes are fully sequenced. Following the release of the tree model organism black cottonwood (Populus trichocarpa), a bioinformatic analysis identified previously unknown cellulose synthases (CesAs) and revealed a total of 18 CesA family members. These genes are thought to have arisen from several rounds of genome duplication. This number is significantly higher than in previously studied plant genomes, which comprise only ten CesA family members. Moreover, the identification of orthologous ESTs belonging to the closely related hybrid aspen (P. tremula x tremuloides) for two pairs of CesAs suggests that they are actively transcribed. This indicates that a number of paralogs have preserved their functionality following extensive genome duplication events in the tree's evolutionary history.
12

Ramraj, Varun. "Exploiting whole-PDB analysis in novel bioinformatics applications." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:6c59c813-2a4c-440c-940b-d334c02dd075.

Abstract:
The Protein Data Bank (PDB) is the definitive electronic repository for experimentally derived protein structures, composed mainly of those determined by X-ray crystallography. Approximately 200 new structures are added weekly to the PDB, and at the time of writing it contains approximately 97,000 structures. This represents an expanding wealth of high-quality information, but there seem to be few bioinformatics tools that consider and analyse these data as an ensemble. This thesis explores the development of three efficient, fast algorithms and software implementations to study protein structure using the entire PDB. The first project is a crystal-form matching tool that takes a unit cell and quickly (< 1 second) retrieves the most closely related matches from the PDB. The unit-cell matches are combined with sequence alignments using a novel Family Clustering Algorithm to display the results in a user-friendly way. The software tool, Nearest-cell, has been incorporated into the X-ray data collection pipeline at the Diamond Light Source, and is also available as a public web service. The bulk of the thesis is devoted to the study and prediction of protein disorder. An initial attempt to update and extend an existing predictor, RONN, exposed the limitations of the method, and a novel predictor (called MoreRONN) was developed that incorporates a novel sequence-based clustering approach to disorder data inferred from the PDB and DisProt. MoreRONN is now clearly the best-in-class disorder predictor and will soon be offered as a public web service. The third project explores the development of a clustering algorithm for protein structural fragments that can work on the scale of the whole PDB. While protein structures have long been clustered into loose families, there has to date been no comprehensive analytical clustering of short (~6-residue) fragments. A novel fragment clustering tool was built that is now leading to a public database of fragment families and representative structural fragments that should prove extremely helpful for both basic understanding and experimentation. Together, these three projects exemplify how cutting-edge computational approaches applied to extensive protein structure libraries can provide user-friendly tools that address critical everyday issues for structural biologists.
13

Novotny, Marian. "Applications of Structural Bioinformatics for the Structural Genomics Era." Doctoral thesis, Uppsala: Acta Universitatis Upsaliensis, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-7593.

14

Kwon, Deukwoo. "Wavelet methods and statistical applications: network security and bioinformatics." Texas A&M University, 2005. http://hdl.handle.net/1969.1/2654.

Abstract:
Wavelet methods possess versatile properties for statistical applications. We explore the advantages of using wavelets in analyses in two different research areas. First, we develop an integrated tool for online detection of network anomalies. We consider statistical change-point detection algorithms, both for local changes in the variance and for jump detection, and propose modified versions of these algorithms based on moving-window techniques. We investigate their performance on simulated data and on network traffic data with several superimposed attacks. All detection methods are based on wavelet packet transformations. We also propose a Bayesian model for the analysis of high-throughput data where the outcome of interest has a natural ordering. The method provides a unified approach to identifying relevant markers and predicting class memberships. This is accomplished by building a stochastic search variable selection method into an ordinal model. We apply the methodology to the analysis of proteomic studies in prostate cancer. We explore wavelet-based techniques to remove noise from the protein mass spectra. The goal is to identify protein markers associated with prostate-specific antigen (PSA) level, an ordinal diagnostic measure currently used to stratify patients into different risk groups.
15

Hocking, Toby Dylan. "Learning algorithms and statistical software, with applications to bioinformatics." PhD thesis, École normale supérieure de Cachan - ENS Cachan, 2012. http://tel.archives-ouvertes.fr/tel-00906029.

Abstract:
Statistical machine learning is a branch of mathematics concerned with developing algorithms for data analysis. This thesis presents new mathematical models and statistical software, and is organized into two parts. In the first part, I present several new algorithms for clustering and segmentation. Clustering and segmentation are a class of techniques that attempt to find structures in data. I discuss the following contributions, with a focus on applications to cancer data from bioinformatics. In the second part, I focus on statistical software contributions which are practical for use in everyday data analysis.
16

Trombetti, Gabriele Antonio <1977>. "Enabling computationally intensive bioinformatics applications on the Grid platform." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2008. http://amsdottorato.unibo.it/922/.

Abstract:
Bioinformatics is a recent and emerging discipline which aims at studying biological problems through computational approaches. Most branches of bioinformatics, such as genomics, proteomics and molecular dynamics, are particularly computationally intensive, requiring huge amounts of computational resources for running algorithms of ever-increasing complexity over data of ever-increasing size. In the search for computational power, the EGEE Grid platform, the world's largest community of interconnected clusters load-balanced as a whole, seems particularly promising and is considered the new hope for satisfying the ever-increasing computational requirements of bioinformatics, as well as physics and other computational sciences. The EGEE platform, however, is rather new and not yet free of problems. In addition, the specific requirements of bioinformatics need to be addressed in order to use this new platform effectively for bioinformatics tasks. In my three years of Ph.D. work, I addressed numerous aspects of this Grid platform, with particular attention to those needed by the bioinformatics domain. I created three major frameworks, Vnas, GridDBManager and SETest, plus an additional smaller standalone solution, to enhance the support for bioinformatics applications in the Grid environment and to reduce the effort needed to create new applications, additionally addressing numerous existing Grid issues and performing a series of optimizations. The Vnas framework is an advanced system for the submission and monitoring of Grid jobs that provides an abstraction with reliability over the Grid platform. In addition, Vnas greatly simplifies the development of new Grid applications by providing a callback system that eases the creation of arbitrarily complex multistage computational pipelines, and provides an abstracted virtual sandbox which bypasses Grid limitations. Vnas also reduces the usage of Grid bandwidth and storage resources by transparently detecting the equality of virtual sandbox files based on content, across different submissions, even when performed by different users. BGBlast, an evolution of the earlier project GridBlast, now provides a Grid Database Manager (GridDBManager) component for managing and automatically updating biological flat-file databases in the Grid environment. GridDBManager offers novel features such as an adaptive replication algorithm that constantly optimizes the number of replicas of the managed databases in the Grid environment, balancing response times (performance) against storage costs according to a programmed cost formula. GridDBManager also provides highly optimized automated management of older versions of the databases based on reverse delta files, which reduces the storage costs required to keep such older versions available in the Grid environment by two orders of magnitude. The SETest framework provides a way for the user to test and regression-test Python applications riddled with side effects (a common case with Grid computational pipelines), which could not easily be tested using the more standard methods of unit testing or test cases. The technique is based on a new concept of datasets containing invocations and results of filtered calls. The framework hence significantly accelerates the development of new applications and computational pipelines for the Grid environment, and reduces the effort required for maintenance. An analysis of the impact of these solutions is provided in this thesis. This Ph.D. work led to various publications in journals and conference proceedings, as reported in the Appendix. I also orally presented my work at numerous international conferences related to Grid computing and bioinformatics.
17

Veanes, Margus. "Identification of novel loss of heterozygosity collateral lethality genes for potential applications in cancer." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-433768.

Abstract:
Over the course of this project, I demonstrate the utility of a 4-phase analysis pipeline in the context of cancer therapy and the associated search for antineoplastic drug candidates. I showcase a repeatable means of generating lists of potential targets which may be used in conjunction with methods like small-molecule screening as part of a search for broadly effective antineoplastic agents. Using publicly available variant call format (VCF) data sourced from the 1000 Genomes Project, global human population-wide data for non-sex chromosomes were filtered and transformed in a 4-phase process to obtain high-population-frequency, heterozygous, nonsynonymous single nucleotide variants (nsSNVs) residing in functional domains of proteins. Through manual filtration combined with software-assisted annotation, I obtained a ranked list of the 50 top-scoring annotated variants across the human autosome, all residing in known protein domains. Additionally, a single top variant was selected for proof-of-concept structure prediction and visualization. When the methodology outlined herein is coupled to additional loss-of-heterozygosity (LOH) prevalence data across cancer genomes, it may be used to identify candidate variants which collectively represent potential loss-of-heterozygosity-based collateral lethalities (CL) in the underlying cancer. Furthermore, if subsequent methods like small-molecule screening succeed in finding molecules targeting a structural aspect of one of these variants, any subsequently developed therapeutic approaches may possess broader therapeutic utility depending on the strictness of the initial heterozygosity filtering threshold applied at the onset of the project pipeline. When combined with additional cancer data, the recreation of such gene lists at other degrees of heterozygosity thresholding can allow for the creation of lists of autosomal loss-of-heterozygosity gene candidates, representing potential collateral lethality targets with varied degrees of utility depending on the strictness of the initial filtration threshold.
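A minimal sketch of the kind of first-phase VCF filtering the abstract describes might look like the following. This is illustrative only: the column layout follows the VCF 4.x fixed fields, but the frequency threshold, function name and output tuple are my own assumptions, and the real pipeline applies several further phases (heterozygosity, functional-domain and annotation filters).

```python
def high_frequency_snvs(vcf_lines, min_af=0.4):
    """Keep biallelic SNVs whose alternate-allele frequency (the AF key
    in the INFO column) is at least `min_af`.

    vcf_lines: iterable of raw VCF text lines (headers start with '#').
    Returns (chrom, pos, ref, alt, af) tuples for the retained variants.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt, *rest = line.rstrip("\n").split("\t")
        info = rest[2]  # fixed columns: QUAL, FILTER, INFO
        if len(ref) != 1 or len(alt) != 1:
            continue  # not a biallelic single-nucleotide variant
        af = None
        for field in info.split(";"):
            if field.startswith("AF="):
                af = float(field.split("=", 1)[1])
        if af is not None and af >= min_af:
            kept.append((chrom, int(pos), ref, alt, af))
    return kept
```

A record such as `1  100  rs1  A  G  50  PASS  AF=0.45;DP=30` passes this filter, while indels and rare variants are dropped; in the actual pipeline the surviving variants would then feed the later annotation and ranking phases.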
18

Kasap, Server. "High performance reconfigurable architectures for bioinformatics and computational biology applications." Thesis, University of Edinburgh, 2010. http://hdl.handle.net/1842/24757.

Abstract:
The field of Bioinformatics and Computational Biology (BCB), a relatively new discipline which spans the boundaries of biology, computer science and engineering, aims to develop systems that help organise, store, retrieve and analyse genomic and other biological information in a convenient and speedy way. This new discipline emerged mainly as a result of the Human Genome Project, which succeeded in sequencing the complete DNA of the human genome, hence making it possible to address many problems which were impossible to even contemplate before, with a plethora of applications including disease diagnosis, drug engineering, biomaterial engineering and the genetic engineering of plants and animals, all with a real impact on the quality of life of ordinary individuals. Due to the sheer immensity of the data sets involved in BCB algorithms (often measured in tens or hundreds of gigabytes) as well as their computational demands (often measured in tera-ops), high-performance supercomputers and computer clusters have been used as implementation platforms for high-performance BCB computing. However, the high cost as well as the lack of suitable programming interfaces for these platforms still impedes wider adoption of this technology in the BCB community. Moreover, with increased heat dissipation, supercomputers are now often augmented with special-purpose hardware (ASICs) in order to speed up their operations while reducing their power dissipation. However, since ASICs are fully customised to implement particular tasks/algorithms, they suffer from increased development times, higher non-recurring engineering (NRE) costs, and inflexibility, as they cannot be reused to implement tasks/algorithms other than those they were designed to perform. On the other hand, Field Programmable Gate Arrays (FPGAs) have recently been proposed as a viable alternative implementation platform for BCB applications due to their flexible computing and memory architecture, which gives them ASIC-like performance with the added feature of programmability. In order to counter the aforementioned limitations of both supercomputers and ASICs, this research proposes the use of state-of-the-art reprogrammable system-on-chip technology, in the form of platform FPGAs, as a relatively low-cost, high-performance and reprogrammable implementation platform for BCB applications. This research project aims to develop a sophisticated library of FPGA architectures for bio-sequence analysis, phylogenetic analysis, and molecular dynamics simulation.
APA, Harvard, Vancouver, ISO, and other styles
19

Lee, Semin. "Molecular characterization of protein-nucleic acid interfaces : applications in bioinformatics." Thesis, University of Cambridge, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609284.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Sigurgeirsson, Benjamín. "Analysis of RNA and DNA sequencing data : Improved bioinformatics applications." Doctoral thesis, KTH, Genteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-184158.

Full text
Abstract:
Massively parallel sequencing has rapidly revolutionized DNA and RNA research. Sample preparations are steadfastly advancing, sequencing costs have plummeted and throughput is ever growing. This progress has resulted in exponential growth in data generation with a corresponding demand for bioinformatic solutions. This thesis addresses methodological aspects of this sequencing revolution and applies it to selected biological topics. Papers I and II are technical in nature and concern sample preparation and data analysis of RNA sequencing data. Paper I is focused on RNA degradation and paper II on generating strand-specific RNA-seq libraries. Papers III and IV deal with current biological issues. In paper III, whole exomes of cancer patients undergoing chemotherapy are sequenced and their genetic variants associated to their toxicity-induced adverse drug reactions. In paper IV a comprehensive view of the gene expression of the endometrium is assessed from two time points of the menstrual cycle. Together these papers show relevant aspects of contemporary sequencing technologies and how they can be applied to diverse biological topics.
APA, Harvard, Vancouver, ISO, and other styles
21

Cleary, Alan Michael. "Computational Pan-Genomics: Algorithms and Applications." Thesis, Montana State University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10792396.

Full text
Abstract:
As the cost of sequencing DNA continues to drop, the number of sequenced genomes rapidly grows. In the recent past, the cost dropped so low that it is no longer prohibitively expensive to sequence multiple genomes for the same species. This has led to a shift from the single reference genome per species paradigm to the more comprehensive pan-genomics approach, where populations of genomes from one or more species are analyzed together. The total genomic content of a population is vast, requiring algorithms for analysis that are more sophisticated and scalable than existing methods. In this dissertation, we explore new algorithms and their applications to pan-genome analysis, both at the nucleotide and genic resolutions. Specifically, we present the Approximate Frequent Subpaths and Frequented Regions problems as a means of mining syntenic blocks from pan-genomic de Bruijn graphs and provide efficient algorithms for mining these structures. We then explore a variety of analyses that mining synteny blocks from pan-genomic data enables, including meaningful visualization, genome classification, and multidimensional scaling. We also present a novel interactive data mining tool for pan-genome analysis, the Genome Context Viewer, which allows users to explore pan-genomic data distributed across a heterogeneous set of data providers by using gene family annotations as a unit of search and comparison. Using this approach, the tool is able to perform traditionally cumbersome analyses on-demand in a federated manner.
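The pan-genomic de Bruijn graph at the heart of this mining problem is easy to sketch. The following toy is our own illustration (the function name, k-mer size and genome labels are hypothetical, not the dissertation's code): each edge of the graph records which genomes traverse it, which is the raw material for finding frequented regions.

```python
from collections import defaultdict

def pan_debruijn(genomes, k=4):
    """Build a de Bruijn graph over several genomes; each edge
    (a k-mer, stored as its (k-1)-mer prefix/suffix pair)
    remembers which genomes support it."""
    edges = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[(kmer[:-1], kmer[1:])].add(name)
    return edges

# Two toy "genomes" sharing a prefix but diverging afterwards.
genomes = {"g1": "ACGTACGT", "g2": "ACGTTCGT"}
graph = pan_debruijn(genomes, k=4)

# Edges supported by every genome are candidates for highly
# frequented (syntenic) regions.
shared = [e for e, s in graph.items() if len(s) == len(genomes)]
```

In this toy, only the edge ("ACG", "CGT") is shared by both genomes, so it is the sole candidate for a frequented region.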
APA, Harvard, Vancouver, ISO, and other styles
22

Radwan, Ahmed M. "Information Integration in a Grid Environment: Applications in the Bioinformatics Domain." Scholarly Repository, 2010. http://scholarlyrepository.miami.edu/oa_dissertations/509.

Full text
Abstract:
Grid computing emerged as a framework for supporting complex operations over large datasets; it enables the harnessing of large numbers of processors working in parallel to solve computing problems that typically spread across various domains. We focus on the problems of data management in a grid/cloud environment. The broader context of designing a services oriented architecture (SOA) for information integration is studied, identifying the main components for realizing this architecture. The BioFederator is a web services-based data federation architecture for bioinformatics applications. Based on collaborations with bioinformatics researchers, several domain-specific data federation challenges and needs are identified. The BioFederator addresses such challenges and provides an architecture that incorporates a series of utility services; these address issues like automatic workflow composition, domain semantics, and the distributed nature of the data. The design also incorporates a series of data-oriented services that facilitate the actual integration of data. Schema integration is a core problem in the BioFederator context. Previous methods for schema integration rely on the exploration, implicit or explicit, of the multiple design choices that are possible for the integrated schema. Such exploration relies heavily on user interaction; thus, it is time consuming and labor intensive. Furthermore, previous methods have ignored the additional information that typically results from the schema matching process, that is, the weights and in some cases the directions that are associated with the correspondences. We propose a more automatic approach to schema integration that is based on the use of directed and weighted correspondences between the concepts that appear in the source schemas. A key component of our approach is a ranking mechanism for the automatic generation of the best candidate schemas. 
The algorithm gives more weight to schemas that combine the concepts with higher similarity or coverage. Thus, the algorithm makes certain decisions that otherwise would likely be taken by a human expert. We show that the algorithm runs in polynomial time and moreover has good performance in practice. The proposed methods and algorithms are compared to the state of the art approaches. The BioFederator design, services, and usage scenarios are discussed. We demonstrate how our architecture can be leveraged on real world bioinformatics applications. We performed a whole human genome annotation for nucleosome exclusion regions. The resulting annotations were studied and correlated with tissue specificity, gene density and other important gene regulation features. We also study data processing models on grid environments. MapReduce is one popular parallel programming model that is proven to scale. However, using the low-level MapReduce for general data processing tasks poses the problem of developing, maintaining and reusing custom low-level user code. Several frameworks have emerged to address this problem; these frameworks share a top-down approach, where a high-level language is used to describe the problem semantics, and the framework takes care of translating this problem description into the MapReduce constructs. We highlight several issues in the existing approaches and alternatively propose a novel refined MapReduce model that addresses the maintainability and reusability issues, without sacrificing the low-level controllability offered by directly writing MapReduce code. We present MapReduce-LEGOS (MR-LEGOS), an explicit model for composing MapReduce constructs from simpler components, namely, "Maplets", "Reducelets" and optionally "Combinelets". Maplets and Reducelets are standard MapReduce constructs that can be composed to define aggregated constructs describing the problem semantics.
This composition can be viewed as defining a micro-workflow inside the MapReduce job. Using the proposed model, complex problem semantics can be defined in the encompassing micro-workflow provided by MR-LEGOS while keeping the building blocks simple. We discuss the design details, its main features and usage scenarios. Through experimental evaluation, we show that the proposed design is highly scalable and has good performance in practice.
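The Maplet/Reducelet composition idea can be made concrete with a minimal single-process sketch. Only the Maplet, Reducelet and Combinelet vocabulary comes from the abstract; the runner below is our illustrative assumption, not the dissertation's code.

```python
from collections import defaultdict

def run_mapreduce(records, maplet, reducelet, combinelet=None):
    """Tiny in-memory MapReduce runner composing the three kinds
    of building blocks into one job (a micro-workflow)."""
    groups = defaultdict(list)
    for rec in records:                 # map phase
        for key, value in maplet(rec):
            groups[key].append(value)
    if combinelet is not None:          # optional local aggregation
        groups = {k: [combinelet(vs)] for k, vs in groups.items()}
    return {k: reducelet(k, vs) for k, vs in groups.items()}  # reduce phase

# Word count composed from two small, reusable building blocks.
def word_maplet(line):
    for word in line.split():
        yield word.lower(), 1

def sum_reducelet(key, values):
    return sum(values)

counts = run_mapreduce(["a b a", "b c"], word_maplet, sum_reducelet)
# counts == {'a': 2, 'b': 2, 'c': 1}
```

The point of the composition is that `word_maplet` and `sum_reducelet` stay simple and reusable, while the aggregated job semantics live in how they are wired together.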
APA, Harvard, Vancouver, ISO, and other styles
23

Ma, Chun-Wai. "Aboav-Weaire law in complex network and its applications in bioinformatics /." View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?PHYS%202005%20MA.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Kässens, Jan Christian [Verfasser]. "A Hybrid-parallel Architecture for Applications in Bioinformatics / Jan Christian Kässens." Kiel : Universitätsbibliothek Kiel, 2017. http://d-nb.info/1152321552/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Lardenois, Aurélie, and Olivier Poch. "Development and applications of an integrated bioinformatics approach for promoter analysis." Strasbourg : Université Louis Pasteur, 2007. http://eprints-scd-ulp.u-strasbg.fr:8080/649/01/Lardenois2006.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Lardenois, Aurélie. "Development and applications of an integrated bioinformatics approach for promoter analysis." Université Louis Pasteur (Strasbourg) (1971-2008), 2006. https://publication-theses.unistra.fr/public/theses_doctorat/2006/LARDENOIS_Aurelie_2006.pdf.

Full text
Abstract:
The exponential accumulation of high-throughput experimental data and complete genome sequences has greatly encouraged promoter sequence analysis through bioinformatics. To date, bioinformatics approaches have been used by biologists to facilitate the identification of regulatory motifs in promoter regions before engaging in time-consuming biochemical characterizations. However, the emergence of a huge amount of experimental data, prediction programs and complementary methods means that integrative approaches have become essential to improve in silico promoter analysis. In this context, we have developed PromAn, a versatile and integrative tool which provides a wide range of state-of-the-art promoter analyses. The program requires minimal prior knowledge of the input genomic sequence and includes an evaluation of the evolutionary conservation of promoter regions, a validation of the transcriptional start sites as well as a prediction of potentially active transcription factor binding sites. PromAn has been implemented in two expert-guided versions (local and web server) as well as a high-throughput automatic version that is used in combination with gene groups assumed to be co-regulated. In the context of a number of collaborations with different research groups, the efficiency of PromAn has been demonstrated in strong synergy with experimental validations through the localization and identification of bona-fide transcription factor binding sites. Hopefully, the PromAn high-throughput version will facilitate the understanding of complete regulatory networks and their impact on human health and diseases.
APA, Harvard, Vancouver, ISO, and other styles
27

Liu, Pengyu. "Extracting Rules from Trained Machine Learning Models with Applications in Bioinformatics." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/264678.

Full text
Abstract:
Kyoto University. Doctor of Informatics (new system, course doctorate; degree no. 23397). Graduate School of Informatics, Department of Intelligence Science and Technology. Examiners: Prof. Tatsuya Akutsu (chief), Prof. Akihiro Yamamoto, Prof. Hisashi Kashima. Qualified under Article 4, Paragraph 1 of the Degree Regulations. DFAM
APA, Harvard, Vancouver, ISO, and other styles
28

Jung, Min Kyung. "Statistical methods for biological applications." [Bloomington, Ind.] : Indiana University, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3278454.

Full text
Abstract:
Thesis (Ph.D.)--Indiana University, Dept. of Mathematics, 2007. Source: Dissertation Abstracts International, Volume: 68-10, Section: B, page: 6740. Adviser: Elizabeth A. Housworth. Title from dissertation home page (viewed May 20, 2008).
APA, Harvard, Vancouver, ISO, and other styles
29

Martínez, Barrio Álvaro. "Novel Bioinformatics Applications for Protein Allergology, Genome-Wide Association and Retrovirology Studies." Doctoral thesis, Uppsala universitet, Centrum för bioinformatik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-111932.

Full text
Abstract:
Recently, the pace of growth in the amount of data sources within the Life Sciences has increased exponentially, posing a difficult problem for the efficient management of their integration. The data avalanche we are experiencing may mark a turning point in science, with a change of orientation from proprietary to publicly available data and a concomitant acceptance of studies based on the latter. To investigate these issues, a Network of Excellence (EMBRACE) was launched with the aim of integrating the major databases and the most popular bioinformatics software tools. The focus of this thesis is therefore to approach the problem of seamlessly integrating varied data sources and/or distributed research tools. In paper I, we have developed a web service to facilitate allergenicity risk assessment, based on allergen descriptors, in order to characterize proteins with the potential for sensitization and cross-reactivity. In paper II, a web service was developed which uses a lightweight protocol to integrate human endogenous retrovirus (ERV) data within a public genome browser. This new data catalogue and many other publicly available sources were integrated and tested in a bioinformatics-rich client application. In paper III, GeneFinder, a distributed tool for genome-wide association studies, was developed and tested. Useful information based on a particular genomic region can be easily retrieved and assessed. Finally, in paper IV, we developed a prototype pipeline to mine the dog genome for endogenous retroviruses and display the transcriptional landscape of these retroviral integrations. Moreover, we further characterized a group that until this point was believed to be primate-specific. Our results also revealed that the dog has been very effective in protecting itself from such integrations.
This work integrates different applications in the fields of protein allergology, biotechnology, genome association studies and endogenous retroviruses.
EMBRACE NoE EU FP6
APA, Harvard, Vancouver, ISO, and other styles
30

Liu, Feng. "Platform Independent Real-Time X3D Shaders and their Applications in Bioinformatics Visualization." Digital Archive @ GSU, 2007. http://digitalarchive.gsu.edu/cs_diss/24.

Full text
Abstract:
Since the introduction of programmable Graphics Processing Units (GPUs) and procedural shaders, hardware vendors have each developed their own individual real-time shading language standard. None of these shading languages is fully platform independent. Although this real-time programmable shader technology could be developed into 3D applications on a single system, this platform-dependent limitation keeps the shader technology away from 3D Internet applications. The primary purpose of this dissertation is to design a framework for translating different shader formats to platform-independent shaders and embedding them into the eXtensible 3D (X3D) scene for 3D web applications. This framework includes a back-end core shader converter, which translates shaders among different shading languages with a middle XML layer. Also included is a shader library containing a basic set of shaders that developers can load and add shaders to. This framework will then be applied to some applications in Biomolecular Visualization.
APA, Harvard, Vancouver, ISO, and other styles
31

Martínez, Barrio Álvaro. "Novel Bioinformatics Applications for Protein Allergology, Genome-Wide Association and Retrovirology Studies." Uppsala : Acta Universitatis Upsaliensis, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-111932.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Wu, Ho-chun, and 胡皓竣. "New algorithms in factor analysis : applications, model selection and findings in bioinformatics." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/205839.

Full text
Abstract:
Advancements in microelectronic devices and computational and storage technologies enable the collection of high-volume, high-speed and high-dimensional data in many applications. Due to the high dimensionality of these measurements, the exact dependence of the observations on the various parameters or variables may not be known. Factor analysis (FA) is a useful multivariate technique to exploit the redundancies among observations and reveal their dependence on some latent variables called factors. Some major issues of conventional FA are the high arithmetic complexity for real-time online implementation, the assumption of static system parameters, the demand for interval forecasting, robustness against outlying observations and model selection in problems with high dimension but low number of samples (HDLS). This thesis addresses these issues and proposes new extensions to existing FA algorithms. First, in order to reduce the arithmetic complexity, we propose new recursive FA (RFA) algorithms that recursively compute only the dominant Principal Components (PCs) and eigenvalues in the major subspace tracked by efficient subspace tracking algorithms. Specifically, two new approaches are proposed for updating the PCs and eigenvalues in the classical fault detection problem with different tradeoffs between accuracy and arithmetic complexity, namely rank-1 modification and deflation. They significantly reduce the online arithmetic complexity and allow adaptation to time-varying system parameters. Second, we extend the RFA algorithm to forecasting of time series and propose a new recursive dynamic factor analysis (RDFA) algorithm for electricity price forecasting. While the PCs are recursively tracked by the subspace algorithm, a random walk or a state dynamical model can be incorporated to describe the latest state of the time-varying auto-regressive (AR) model built from the factors.
This formulation can be solved by the celebrated Kalman filter (KF), which in turn allows future values to be forecasted with estimated confidence intervals. Third, we propose new robust covariance and outlier detection criteria to improve the robustness of the proposed RFA and RDFA algorithms against outlying observations based on the concept of robust M-estimation. Experimental results show that the proposed methods can effectively suppress the adverse contributions of the outliers on the factors and PCs. Finally, in order to improve the consistency of model selection and facilitate the estimation of p-values in HDLS problems, we propose a new automatic model selection method based on ridge partial least squares and recursive feature elimination. Furthermore, a novel performance criterion is proposed for ranking variables according to their consistency of being chosen in different perturbations of the samples. Using this criterion, the associated p-values can be estimated under the HDLS setting. Experimental results using real gene cancer microarray datasets show that improved prognosis can be obtained by the proposed approach as compared with conventional techniques. Furthermore, to quantify their statistical significance, the p-values of the identified genes are estimated and functional analysis of the significant genes found in the diffuse large B-cell lymphoma (DLBCL) gene microarray data is performed to validate the findings. While we focus on a few engineering problems, these algorithms are also applicable to other related applications.
Published or final version. Electrical and Electronic Engineering. Doctoral. Doctor of Philosophy.
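Subspace tracking of this kind is commonly built on stochastic updates of a dominant-eigenvector estimate. As a generic illustration only, here is the classic Oja rule, not the thesis's rank-1 modification or deflation procedures; the data stream and learning rate are hypothetical:

```python
import numpy as np

def oja_step(w, x, lr=0.05):
    """One Oja-rule update of an estimate w of the dominant
    principal component, followed by renormalisation."""
    y = float(w @ x)
    w = w + lr * y * (x - y * w)
    return w / np.linalg.norm(w)

# Toy data stream whose dominant variance direction is the first axis.
rng = np.random.default_rng(0)
w = np.array([1.0, 1.0]) / np.sqrt(2.0)
for _ in range(2000):
    x = np.array([3.0 * rng.standard_normal(),
                  0.1 * rng.standard_normal()])
    w = oja_step(w, x)
# w now points (up to sign) along the dominant axis [1, 0].
```

Each sample costs O(d) work, which is the appeal of recursive PC tracking over recomputing a full eigendecomposition at every time step.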
APA, Harvard, Vancouver, ISO, and other styles
33

Jiang, Tianwei. "Sequence alignment : algorithm development and applications /." View abstract or full-text, 2009. http://library.ust.hk/cgi/db/thesis.pl?ECED%202009%20JIANG.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Leung, Ho-yin. "Stochastic models for optimal control problems with applications." Click to view the E-thesis via HKUTO, 2009. http://sunzi.lib.hku.hk/hkuto/record/B42841781.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

SINHA, AMIT U. "Discovery and Analysis of Genomic Patterns: Applications to Transcription Factor Binding and Genome Rearrangement." University of Cincinnati / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1204227723.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Kumar, Chanchal. "Bioinformatics methods and applications for functional analysis of mass spectrometry based proteomics data." Diss., lmu, 2008. http://nbn-resolving.de/urn:nbn:de:bvb:19-124512.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Minas, Christopher. "Distance-based methods for detecting associations in structured data with applications in bioinformatics." Thesis, Imperial College London, 2012. http://hdl.handle.net/10044/1/25358.

Full text
Abstract:
In bioinformatics applications samples of biological variables of interest can take a variety of structures. For instance, in this thesis we consider vector-valued observations of multiple gene expression and genetic markers, curve-valued gene expression time courses, and graph-valued functional connectivity networks within the brain. This thesis considers three problems routinely encountered when dealing with such variables: detecting differences between populations, detecting predictive relationships between variables, and detecting association between variables. Distance-based approaches to these problems are considered, offering great flexibility over alternative approaches, such as traditional multivariate approaches which may be inappropriate. The notion of distance has been widely adopted in recent years to quantify the dissimilarity between samples, and suitable distance measures can be applied depending on the nature of the data and on the specific objectives of the study. For instance, for gene expression time courses modeled as time-dependent curves, distance measures can be specified to capture biologically meaningful aspects of these curves which may differ. On obtaining a distance matrix containing all pairwise distances between the samples of a given variable, many distance-based testing procedures can then be applied. The main inhibitor of their effective use in bioinformatics is that p-values are typically estimated by using Monte Carlo permutations. Thousands or even millions of tests need to be performed simultaneously, and time/computational constraints lead to a low number of permutations being enumerated for each test. The contributions of this thesis include the proposal of two new distance-based statistics, the DBF statistic for the problem of detecting differences between populations, and the GRV coefficient for the problem of detecting association between variables. 
In each case approximate null distributions are derived, allowing estimation of p-values with reduced computational cost, and through simulation these are shown to work well for a range of distances and data types. The tests are also demonstrated to be competitive with existing approaches. For the problem of detecting predictive relationships between variables, the approximate null distribution is derived for the routinely used distance-based pseudo F test, and through simulation this is shown to work well for a range of distances and data types. All tests are applied to real datasets, including a longitudinal human immune cell M. tuberculosis dataset, an Alzheimer's disease dataset, and an ovarian cancer dataset.
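The Monte Carlo permutation procedure whose cost motivates these approximate null distributions can be sketched as follows. This is an illustrative distance-based pseudo-F permutation test in the PERMANOVA style; the function names and toy data are ours, and this is not the thesis's DBF statistic itself:

```python
import numpy as np

def pseudo_f(D, labels):
    """Distance-based pseudo-F: between- vs within-group dispersion
    computed directly from a pairwise distance matrix D."""
    n = len(labels)
    total_ss = (D ** 2).sum() / (2 * n)        # D is symmetric
    within_ss = 0.0
    groups = sorted(set(labels))
    for g in groups:
        idx = np.where(labels == g)[0]
        sub = D[np.ix_(idx, idx)]
        within_ss += (sub ** 2).sum() / (2 * len(idx))
    between_ss = total_ss - within_ss
    a = len(groups)
    return (between_ss / (a - 1)) / (within_ss / (n - a))

def permutation_pvalue(D, labels, n_perm=999, seed=0):
    """Estimate a p-value by permuting group labels."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(D, labels)
    count = sum(pseudo_f(D, rng.permutation(labels)) >= observed
                for _ in range(n_perm))
    return (count + 1) / (n_perm + 1)

# Two clearly separated groups on a line (toy data).
points = np.array([0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3])
D = np.abs(points[:, None] - points[None, :])
labels = np.array(["A"] * 4 + ["B"] * 4)
p = permutation_pvalue(D, labels, n_perm=199)
```

Note the cost: every test requires `n_perm` recomputations of the statistic, and the smallest attainable p-value is 1/(n_perm + 1), which is exactly why large-scale simultaneous testing pushes toward the approximate null distributions the thesis derives.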
APA, Harvard, Vancouver, ISO, and other styles
38

Pridgeon, Carey. "Diverse applications of evolutionary computation in bioinformatics : hypermotifs and gene regulatory network inference." Thesis, University of Exeter, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.479210.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Indukuri, Kiran Kumar. "Fusion: a Visualization Framework for Interactive ILP Rule Mining with Applications to Bioinformatics." Thesis, Virginia Tech, 2004. http://hdl.handle.net/10919/36326.

Full text
Abstract:
Microarrays provide biologists an opportunity to find the expression profiles of thousands of genes simultaneously. Biologists try to understand the mechanisms underlying life processes by finding relationships between gene expression and functional categories. Fusion is a software system that aids biologists in performing microarray data analysis by providing them with both visual data exploration and data mining capabilities. Its multiple-view visual framework allows the user to choose different views for different types of data. Fusion uses Proteus, an Inductive Logic Programming (ILP) rule-finding algorithm, to mine relationships in the microarray data. Fusion allows the user to explore the data interactively, choose biases, run the data mining algorithms and visualize the discovered rules. Fusion has the capability to switch smoothly between interactive data exploration and batch data mining modes. This optimizes the knowledge discovery process by facilitating a synergy between the interactivity and usability of the visualization process and the pattern-finding abilities of ILP rule mining algorithms. Fusion was successful in helping biologists better understand the mechanisms underlying the acclimatization of certain varieties of Arabidopsis to ozone exposure.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
40

Nilsson, Roland. "Statistical Feature Selection : With Applications in Life Science." Doctoral thesis, Linköping : Department of Physcis, Chemistry and Biology, Linköping University, 2007. http://www.bibl.liu.se/liupubl/disp/disp2007/tek1090s.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Costello, James Christopher. "Data integration and applications of functional gene networks in Drosophila melanogaster." [Bloomington, Ind.] : Indiana University, 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3380070.

Full text
Abstract:
Thesis (Ph.D.)--Indiana University, Dept. of Informatics, 2009. Title from PDF t.p. (viewed on Jul 19, 2010). Source: Dissertation Abstracts International, Volume: 70-12, Section: B, page: 7296. Advisers: Mehmet M. Dalkilic; Justen R. Andrews.
APA, Harvard, Vancouver, ISO, and other styles
42

Leung, Ho-yin, and 梁浩賢. "Stochastic models for optimal control problems with applications." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B42841781.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Coutu-Nadeau, Charles. "Evaluating the usability of diabetes management iPad applications." Thesis, Weill Medical College of Cornell University, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=1526000.

Full text
Abstract:
Background: Diabetes is a major cause of morbidity and mortality in the United States. In 2012, 29.1 million people were estimated to have the condition, with type 2 diabetes accounting for 95% of all cases [1]. It is currently one of the most costly conditions in the country [2] and forecasts as a heavier burden for the U.S., with the prevalence expected to increase significantly [3]. For those who live with the disease, it is possible to manage diabetes in order to prevent or delay the onset of complications [4]. However, the self-management regimen is complex and impacts nearly every important aspect of one's life [5].
The ubiquitous nature of mobile technologies and the powerful capabilities of smartphones and tablets have led to significantly increased interest in the development and use of mobile health. Diabetes management is an application area where mobile devices could enhance the quality of life for people living with chronic illnesses [6]-[8], and usability is key to the adoption of such technologies [9], [10]. Past work has evaluated the usability of diabetes management apps for Android, iOS and Blackberry smartphones [11]-[14], despite the fact that no established method to evaluate the usability of mobile apps has emerged [15]. To our knowledge, this study is the first to evaluate the usability of diabetes management apps on iPad.
Methods: This study introduces a novel usability survey that is designed for mHealth and specific to the iOS operating system. The survey is built on previous usability findings [11]-[14], Nielsen heuristics [16] and the Apple iOS Human Interface Guidelines [17]. The new instrument was evaluated with three evaluators assessing ten iPad apps, selected because they were the most popular diabetes management apps on the Apple AppStore. A focus group was subsequently held to gather more insight on the usability of the apps and the survey itself. Statistical analysis using R and grounded theory were used to analyze the quantitative and qualitative results, respectively.
Results: The survey identified OneTouch Reveal by LifeScan Inc. and TactioHealth by Tactio Health Group as the most usable apps. GlucoMo by Artificial Life, Inc. and Diabetes in Check by Everyday Health, Inc. rated as the least usable apps. Setting up medication and editing blood glucose were the most problematic tasks. Some apps did not support all functions that were under review. Six main themes emerged from the focus group: the presentation of health information, aesthetic and minimalist design, flexibility and efficiency of data input, task feedback, intuitive design and app stability. These themes suggest important constructs of usability for mHealth apps.
Discussion and Conclusion: Mobile health developers and researchers should focus on the tasks, heuristics and underlying issues that were identified as most problematic throughout the study. Additionally, research should further inquire into the potentially critical relation between the information available on app markets and the usability of apps. Several signs point to the potential of the usability survey that was developed, but further adjustments and additional test iterations are warranted to validate its use as a reliable usability evaluation method.
APA, Harvard, Vancouver, ISO, and other styles
44

Niu, Yanwei. "Parallelization and performance optimization of bioinformatics and biomedical applications targeted to advanced computer architectures." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file 1.05 Mb., 143 p, 2005. http://wwwlib.umi.com/dissertations/fullcit/3181852.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Gonzalez, Galarza Faviel. "The development of a database and bioinformatics applications for the investigation of immune genes." Thesis, University of Liverpool, 2011. http://livrepository.liverpool.ac.uk/4973/.

Full text
Abstract:
The extensive allelic variability observed in several genes related to the immune response, and its significance in transplantation, disease association studies and diversity in human populations, has led the scientific community to analyse these variants among individuals. This thesis is focussed on the development of a database and software applications for the investigation of several immune genes and the frequencies of their corresponding alleles in worldwide human populations. The approach presented in this thesis includes the design of a relational database, a web interface, the design of models for data exchange and the development of online searching mechanisms for the analysis of allele, haplotype and genotype frequencies. At present, the database contains data from more than 1000 populations covering more than four million unrelated individuals. The repertoire of datasets available in the database encompasses different polymorphic regions such as Human Leukocyte Antigens (HLA), Killer-cell Immunoglobulin-like Receptors (KIR), Major histocompatibility complex Class I chain-related (MIC) genes and a number of cytokine gene polymorphisms. The work presented in this document has been shown to be a valuable resource for the medical and scientific communities. Acting as a primary source for the consultation of immune gene frequencies in worldwide populations, the database has been widely used in a variety of contexts by scientists working in histocompatibility, immunology, epidemiology, pharmacogenetics and population genetics, among many others. In the last year (August 2010 to August 2011), the website was accessed by 15,784 distinct users from 2,758 cities in 136 countries and has been cited in 168 peer-reviewed publications, demonstrating its wide international use.
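The core calculation behind the allele-frequency searches described above is a count of allele copies across genotyped individuals. A minimal sketch in Python; the HLA-A genotypes here are hypothetical and purely illustrative, not drawn from the database:

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Compute allele frequencies from a list of (allele1, allele2) genotypes."""
    counts = Counter(a for pair in genotypes for a in pair)
    total = sum(counts.values())  # two allele copies per individual
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical HLA-A genotypes for six unrelated individuals
genotypes = [
    ("A*01:01", "A*02:01"),
    ("A*02:01", "A*02:01"),
    ("A*01:01", "A*03:01"),
    ("A*02:01", "A*03:01"),
    ("A*01:01", "A*01:01"),
    ("A*02:01", "A*24:02"),
]

freqs = allele_frequencies(genotypes)
# 12 allele copies in total; A*02:01 appears 5 times -> frequency 5/12
```

Haplotype and genotype frequencies follow the same counting pattern over multi-locus records, typically with an expectation-maximization step when phase is unknown.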
APA, Harvard, Vancouver, ISO, and other styles
46

Morrison, Kevin S. "Topological Data Analysis and Applications to Influenza." Miami University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=miami1595864809447239.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Sköld, Karl. "Neuropeptidomics – Methods and Applications." Doctoral thesis, Uppsala University, Department of Pharmaceutical Biosciences, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-7276.

Full text
Abstract:
The sequencing of genomes has caused a growing demand for functional analysis of gene products. This research field, named proteomics, is derived from the term proteome, which by analogy to genome is defined as all proteins expressed by a cell or a tissue. Proteomics is, however, methodologically restricted to the analysis of proteins with higher molecular weights. The development of a technology which includes peptides of low molecular weight and small proteins is needed, since peptides play a central role in many biological processes.

To study endogenous peptides and hormones, the peptidome, an improved method comprising rapid deactivation in combination with nano-flow liquid chromatography (LC) and mass spectrometry (MS) was developed. The method has been used to investigate endogenous peptides in the brains of mouse and rat. Several novel peptides have been discovered together with known neuropeptides.

To elucidate the post mortem time influence on peptides and proteins, a time course study was performed using peptidomics and proteomics technologies. Already after three minutes a substantial amount of protein fragments emerged in the peptidomics study, and some endogenous peptides were drastically reduced with increasing post mortem time. Of about 1500 proteins investigated, 53 were found to be significantly changed at 10 minutes post mortem as compared to control. Moreover, using western blot, the level of MAPK phosphorylation was shown to decrease by 95% in the 10 minute post mortem sample.

A database, SwePep (a repository of endogenous peptides, hormones and small proteins), was constructed to facilitate identification using MS. The database also contains additional information about the peptides, such as physical properties. A method for analysis of LC-MS data, including scanning for, and further profiling of, biologically significant peptides was developed. We show that peptides present in different amounts in groups of samples can be automatically detected.

The peptidome approach was used to investigate levels of peptides in two animal models of Parkinson's disease. PEP-19 was found to be significantly decreased in the striatum of MPTP-lesioned parkinsonian mice. The localization and expression were further investigated by imaging MALDI MS and by in situ hybridization. The brain peptidome of reserpine-treated mice was investigated and displayed a number of significantly altered peptides. This thesis demonstrates that the peptidomics approach allows for the study of complex biochemical processes.
APA, Harvard, Vancouver, ISO, and other styles
48

Sun, Guoli. "Significant distinct branches of hierarchical trees| A framework for statistical analysis and applications to biological data." Thesis, State University of New York at Stony Brook, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3685086.

Full text
Abstract:
One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity.

We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. With each of the five datasets, there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques.

One dataset uses Cores Of Recurrent Events (CORE) to select features. CORE was developed with my participation in the course of this work. An R language implementation of the method is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/CORE/index.html.

Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/TBEST/index.html.
APA, Harvard, Vancouver, ISO, and other styles
49

Muggli, Martin D. "Enhancing Space and Time Efficiency of Genomics in Practice through Sophisticated Applications of the FM-Index." Thesis, Colorado State University, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10977737.

Full text
Abstract:
Genomic sequence data has become so easy to obtain that the computation needed to process it has become a bottleneck in the advancement of biological science. A data structure known as the FM-Index both compresses data and allows efficient querying, and can therefore be used to implement more efficient processing methods. In this work we apply advanced formulations of the FM-Index to existing problems and show that our methods exceed the performance of competing tools.
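The FM-Index mentioned above answers counting queries by backward search over the Burrows-Wheeler transform. A compact, illustrative Python version follows; the naive sorted-rotations BWT is suitable only for toy inputs (real tools build it from a suffix array), and this is a textbook sketch, not the advanced formulations the thesis develops:

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    text += "$"  # unique terminator, lexicographically smallest
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)

def fm_index(text):
    """Build the two tables backward search needs: C[c], the number of
    symbols smaller than c, and occ[c][i], the count of c in BWT[:i]."""
    L = bwt(text)
    alphabet = sorted(set(L))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += L.count(c)
    occ = {c: [0] * (len(L) + 1) for c in alphabet}
    for i, ch in enumerate(L):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(L)

def count(pattern, C, occ, n):
    """Backward search: number of occurrences of pattern in the text,
    in O(len(pattern)) table lookups, scanning the pattern right to left."""
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo

C, occ, n = fm_index("GATTACAGATTACA")
# count("GATTACA", ...) == 2, count("AGA", ...) == 1
```

The appeal for genomics is that occ can be stored in compressed, sampled form, so the index occupies roughly the space of the compressed text while still supporting these queries.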
APA, Harvard, Vancouver, ISO, and other styles
50

Choudhury, Salimur Rashid, and University of Lethbridge Faculty of Arts and Science. "Approximation algorithms for a graph-cut problem with applications to a clustering problem in bioinformatics." Thesis, Lethbridge, Alta. : University of Lethbridge, Deptartment of Mathematics and Computer Science, 2008, 2008. http://hdl.handle.net/10133/774.

Full text
Abstract:
Clusters in protein interaction networks can potentially help identify functional relationships among proteins. We study the clustering problem by modeling it as a graph cut problem. Given an edge-weighted graph, the goal is to partition the graph into a prescribed number of subsets obeying some capacity constraints, so as to maximize the total weight of the edges that are within a subset. Identification of a dense subset might shed some light on the biological function of all the proteins in the subset. We study integer programming formulations and exhibit large integrality gaps for various formulations. This is indicative of the difficulty of obtaining constant-factor approximation algorithms using the primal-dual schema. We propose three approximation algorithms for the problem. We evaluate the algorithms on the database of interacting proteins and on randomly generated graphs. Our experiments show that the algorithms are fast and have good performance ratios in practice.
xiii, 71 leaves : ill. ; 29 cm.
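The objective described above (k capacity-bounded subsets chosen to maximize the total weight of edges kept inside a subset) can be illustrated with a simple greedy heuristic; this is only a sketch of the problem on toy data, not one of the thesis's three approximation algorithms:

```python
def greedy_partition(edges, nodes, k, capacity):
    """Greedy heuristic: process edges heaviest first and try to keep
    both endpoints in the same capacity-bounded subset."""
    parts = [set() for _ in range(k)]
    placed = {}
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u in placed and v in placed:
            continue
        if u not in placed and v not in placed:
            # put both endpoints in the emptiest subset with room for two
            i = min(range(k), key=lambda j: len(parts[j]))
            if len(parts[i]) + 2 <= capacity:
                parts[i].update((u, v))
                placed[u] = placed[v] = i
        else:
            anchored, free = (u, v) if u in placed else (v, u)
            i = placed[anchored]
            if len(parts[i]) < capacity:
                parts[i].add(free)
                placed[free] = i
    # leftover nodes go wherever there is room
    for x in nodes:
        if x not in placed:
            i = min(range(k), key=lambda j: len(parts[j]))
            parts[i].add(x)
            placed[x] = i
    return parts

def intra_weight(parts, edges):
    """The objective: total weight of edges with both ends in one subset."""
    where = {x: i for i, p in enumerate(parts) for x in p}
    return sum(w for u, v, w in edges if where[u] == where[v])

# Two heavy triangles joined by one light edge: the greedy pass keeps
# each triangle intact and cuts only the c-d edge.
edges = [("a", "b", 5), ("b", "c", 4), ("a", "c", 4),
         ("d", "e", 5), ("e", "f", 4), ("d", "f", 4), ("c", "d", 1)]
parts = greedy_partition(edges, list("abcdef"), k=2, capacity=3)
# intra_weight(parts, edges) == 26 of the 27 total
```

Greedy heuristics like this carry no worst-case guarantee, which is precisely why the thesis studies integrality gaps and designs approximation algorithms instead.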
APA, Harvard, Vancouver, ISO, and other styles