Dissertations / Theses on the topic 'Computational biology, bioinformatics'

Consult the top 50 dissertations / theses for your research on the topic 'Computational biology, bioinformatics.'

1

Rajarathinam, Kayathri. "Nutraceuticals based computational medicinal chemistry." Licentiate thesis, KTH, Teoretisk kemi och biologi, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-122681.

Abstract:
In recent years, edible biomedicinal products called nutraceuticals have become increasingly popular with the pharmaceutical industry and with consumers. In the development of nutraceuticals, in silico approaches play an important role in structural elucidation, receptor-ligand interactions, drug design, etc., and critically help laboratory experiments avoid biological and financial risk. In this thesis, three nutraceuticals possessing antimicrobial and anticancer activities have been studied. Firstly, a tertiary structure was elucidated for a coagulant protein (MO2.1) of Moringa oleifera based on homology modeling, and its oligomerization, which is believed to interfere with its medicinal properties, was also studied. Secondly, the antimicrobial efficiency of a limonoid from the neem tree called ‘azadirachtin’ was studied with a bacterial (Proteus mirabilis) detoxification agent, glutathione S-transferase, to propose it as a potent drug candidate for urinary tract infections. Thirdly, sequence-specific binding activity was analyzed for a plant alkaloid called ‘palmatine’ for the purpose of developing intercalators in cancer therapy. Cumulatively, we have used in silico methods to propose the structure of an antimicrobial peptide and to understand the interactions of proteins and nucleic acids with these nutraceuticals.

2

Pettersson, Fredrik. "A multivariate approach to computational molecular biology." Doctoral thesis, Umeå : Univ, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-609.

3

Peng, Zeshan. "Structure comparison in bioinformatics." Thesis, The University of Hong Kong, 2006. http://sunzi.lib.hku.hk/hkuto/record/B36271299.

4

Peng, Zeshan, and 彭澤山. "Structure comparison in bioinformatics." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B36271299.

5

Björkholm, Patrik. "Method for recognizing local descriptors of protein structures using Hidden Markov Models." Thesis, Linköping University, The Department of Physics, Chemistry and Biology, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11408.

Abstract:

Being able to predict the sequence-structure relationship in proteins will extend the scope of many bioinformatics tools that rely on structure information. Here we use hidden Markov models (HMMs) to recognize and pinpoint the location in target sequences of local structural motifs (local descriptors of protein structure, LDPS). These substructures are composed of three or more segments of amino acid backbone structure that are close to each other in space but not necessarily along the amino acid sequence. We were able to align descriptors to their proper locations in 41.1% of the cases when using models built solely from amino acid information. Using models that also incorporated secondary structure information, we were able to assign 57.8% of the local descriptors to their proper locations. A further gain in performance was obtained by threading a profile through the hidden Markov models together with the secondary structure; with this material we were able to assign 58.5% of the descriptors to their proper locations. Hidden Markov models were thus shown to be able to locate LDPS in target sequences, and accuracy increases when secondary structure and the profile for the target sequence are used in the models.

6

Chawade, Aakash. "Inferring Gene Regulatory Networks in Cold-Acclimated Plants by Combinatorial Analysis of mRNA Expression Levels and Promoter Regions." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-20.

Abstract:

Understanding the cold acclimation process in plants may help us develop genetically engineered plants that are resistant to cold. The key to understanding this process is to study the genes, and thus the gene regulatory network, involved in cold acclimation. Most existing approaches [1-8] to deriving regulatory networks rely only on gene expression data. Since expression data are usually noisy and sparse, the networks generated by these approaches are usually incoherent and incomplete. Hence a new approach is proposed here that analyzes the promoter regions along with the expression data when inferring regulatory networks. In this approach, genes are grouped into sets if they contain similar over-represented motifs or motif pairs in their promoter regions and if their expression pattern follows the expression pattern of the regulating gene. The network thus derived is evaluated using known literature evidence, functional annotations and statistical tests.
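
The grouping rule in this abstract (shared promoter motif plus expression that follows the regulator) can be illustrated with a minimal sketch. It is not the thesis's implementation: Pearson correlation stands in for "follows the expression pattern", the motifs are assumed to have already been tested for over-representation, and the function names, the `min_corr` threshold, the gene names and the motif strings are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def group_targets(regulator_profile, expression, promoter_motifs, min_corr=0.8):
    """Group genes by shared promoter motif, keeping only genes whose
    expression profile correlates with the regulator's profile.

    expression:      gene -> list of expression values (same time points as regulator)
    promoter_motifs: gene -> set of motifs assumed to be over-represented in its promoter
    """
    groups = {}
    for gene, profile in expression.items():
        # Assumption: correlation >= min_corr means "follows the regulator".
        if pearson(regulator_profile, profile) < min_corr:
            continue
        for motif in promoter_motifs.get(gene, ()):
            groups.setdefault(motif, set()).add(gene)
    # Keep only motifs shared by at least two co-expressed genes.
    return {motif: genes for motif, genes in groups.items() if len(genes) >= 2}


# Hypothetical toy data.
regulator = [1, 2, 3, 4]
expression = {
    "geneA": [2, 4, 6, 8],
    "geneB": [1.1, 2.0, 3.2, 3.9],
    "geneC": [4, 3, 2, 1],  # anti-correlated, so it is filtered out
    "geneD": [1, 2, 3, 5],
}
promoter_motifs = {
    "geneA": {"CCGAC"},
    "geneB": {"CCGAC", "ACGTG"},
    "geneC": {"CCGAC"},
    "geneD": {"ACGTG"},
}
candidate_sets = group_targets(regulator, expression, promoter_motifs)
```

With this toy input, `candidate_sets` holds one gene set per motif; a real analysis would also need the over-representation test and the motif-pair case mentioned in the abstract.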

7

Muhammad, Ashfaq. "Design and Development of a Database for the Classification of Corynebacterium glutamicum Genes, Proteins, Mutants and Experimental Protocols." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-23.

Abstract:

Coryneform bacteria are widely distributed in nature and are rod-shaped, aerobic soil bacteria capable of growing on a variety of sugars and organic acids. Corynebacterium glutamicum is a nonpathogenic species of coryneform bacteria used for the industrial production of amino acids. There are three main publicly available genome annotations for C. glutamicum: Cg, Cgl and NCgl. These three annotations have different numbers of protein-coding genes and varying numbers of overlaps between similar genes. The original data are only available as text files. In this format it was not easy to search and compare the data among the different annotations, and it was impossible to run an extensive, multidimensional, customized search against different protein parameters. Because the data on the various publicly available biological database servers are inconsistent and redundant, it was not possible to compare all genome annotations for the construction of deletion and over-expression mutants, to represent genome information graphically (gene locations, neighboring genes, orientation on the direct or complementary strand, overlapping genes and gene lengths), or to produce graphical output for structure-function relations by comparing predicted trans-membrane domains (TMD) with functional protein domains and protein motifs. There was therefore a need for a system for managing the data on mutants and experimental setups. Although the genome sequence is known, until now no databank providing such a complete set of information has been available. We solved these problems by developing a standalone relational database software application covering data processing, protein-DNA sequence extraction and management of lab data. The result of the study is an application named CORYNEBASE that meets our aims and objectives.

8

Chen, Lei. "Construction of Evolutionary Tree Models for Oncogenesis of Endometrial Adenocarcinoma." Thesis, University of Skövde, School of Humanities and Informatics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-25.

Abstract:

Endometrial adenocarcinoma (EAC) is the fourth leading cause of carcinoma in women worldwide, but not much is known about the genetic factors involved in this complex disease. It is well known that, during the development of EAC, losses and gains of chromosomal regions do not occur completely at random, but partly through some flow of causality. In this work, we used three different algorithms based on the frequency of genomic alterations to construct 27 tree models of oncogenesis. So far, no study applying pathway models to microsatellite marker data has been reported. Data from genome-wide scans with microsatellite markers were classified into 9 data sets, according to two biological approaches (solid tumor cell and corresponding tissue culture) and three different genetic backgrounds provided by intercrossing the susceptible rat BDII strain with two normal rat strains. The tree models led to conclusions similar to those of a previous study: three main regions (I, II and III) and two subordinate regions (IV and V) are likely to be involved in EAC development. The tree models also provided further information about these regions, such as their likely order and relationships. The high consistency among the tree models and the relationship among the p19, Tp53 and Tp53 inducible protein genes provided supportive evidence for the reliability of the results.

9

Dodda, Srinivasa Rao. "Improvements and extensions of a web-tool for finding candidate genes associated with rheumatoid arthritis." Thesis, University of Skövde, School of Humanities and Informatics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-26.

Abstract:

Quantitative trait locus (QTL) analysis is a statistical method used to delimit genomic regions contributing to specific phenotypes. To further localize genes in such regions, a web tool called “Candidate Gene Capture” (CGC) was developed by Andersson et al. (2005). The CGC tool was based on the textual descriptions of genes in the human phenotype database OMIM. Even though the CGC tool works well, it was limited by a number of inconsistencies in the underlying database structure, by static web pages and by some gene descriptions in the OMIM database that lack a properly defined function. Hence, in this work the CGC tool was improved by redesigning its database structure, adding dynamic web pages and improving the prediction of unknown gene function by using exon analysis. The changes in database structure reduced the number of tables considerably, eliminated redundancies and made data retrieval more efficient. A new method for the prediction of gene function was proposed, based on the assumption that similarity between exon sequences is associated with biochemical function. Using Blast with 20,380 exon protein sequences and a threshold E-value of 0.01, 639 exon groups were obtained, with an average of 11 exons per group. When estimating the functional similarity, it was found that on average 72% of the exons in a group had at least one Gene Ontology (GO) term in common.

10

Huque, Enamul. "Shape Analysis and Measurement for the HeLa cell classification of cultured cells in high throughput screening." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-27.

Abstract:

Feature extraction by digital image analysis and cell classification is an important task in cell culture automation. In High Throughput Screening (HTS), where thousands of data points are generated and processed at once, features are extracted and cells are classified to decide whether the cell culture is proceeding smoothly. The culture is restarted if a problem is detected. In this thesis project, HeLa cells, which are human epithelial cancer cells, are selected for the experiment. The purpose is to classify two types of HeLa cells in culture: cells in cleavage, which are round floating cells (stressed or dead cells are also round and floating), and normally growing cells, which are attached to the substrate. As the number of cells in cleavage will always be smaller than the number of cells that are growing normally and attached to the substrate, the count of attached cells should be higher than that of round cells. Five different HeLa cell images are used. For each image, every single cell is obtained by image segmentation and isolation, and different mathematical features are computed for each cell. The feature set for this experiment is chosen so that the features are robust, discriminative and generalise well for classification. Almost all the features presented in this thesis are rotation, translation and scale invariant, so they are expected to perform well in discriminating objects or cells with any classification algorithm. Some new features are added which are believed to improve the classification result. The feature set is considerably broader than the restricted sets used in previous work. The features are implemented behind a common interface so that the library can be extended and integrated into other applications. The features are fed into a machine learning algorithm called Linear Discriminant Analysis (LDA) for classification. Cells are then classified as ‘Cells attached to the substrate’ (Cell Class A) or ‘Cells in cleavage’ (Cell Class B). Shape features are left out of or added to the LDA input to increase performance. On average, the classification accuracy, validated against visual classification, is higher than 95%.
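
As a hedged illustration of what a rotation-, translation- and scale-invariant shape feature looks like, the sketch below computes circularity, 4πA/P², for a cell outline given as a polygon. The abstract does not list the individual features, so circularity is an assumed example rather than a feature confirmed to be in the thesis, and the function names and the sample outline are hypothetical.

```python
from math import pi, hypot, cos, sin


def polygon_area_perimeter(points):
    """Return (area, perimeter) of a closed polygon given as (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1  # shoelace formula term
        perimeter += hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def circularity(points):
    """4*pi*A / P**2: 1.0 for a circle, smaller for elongated or ragged outlines."""
    area, perimeter = polygon_area_perimeter(points)
    return 4.0 * pi * area / perimeter ** 2


def transform(points, scale, angle, dx, dy):
    """Scale, rotate (radians) and translate an outline, to check invariance."""
    c, s = cos(angle), sin(angle)
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in points]


# Hypothetical outline: a unit square, and the same square moved, turned and enlarged.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved = transform(square, scale=3.0, angle=0.7, dx=10.0, dy=-4.0)
```

`circularity(square)` and `circularity(moved)` agree up to floating-point error, which is the property that lets such a feature describe a round floating cell or a spread attached cell regardless of where it lies in the image or how it is oriented.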

11

Naswa, Sudhir. "Representation of Biochemical Pathway Models : Issues relating conversion of model representation from SBML to a commercial tool." Thesis, University of Skövde, School of Humanities and Informatics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-28.

Abstract:

Background: Computational simulation of complex biological networks lies at the heart of systems biology, since it can confirm the conclusions drawn from experimental studies of biological networks and guide researchers to fresh hypotheses for further experimental validation. Since this iterative process helps in the development of more realistic system models, a variety of computational tools have been developed. In the absence of a common format for representing models, these tools were developed with different formats. As a result, the tools could not exchange models with one another, which led to the development of SBML, a standard exchange format for computational models of biochemical networks. Here the formats of SBML and of one commercial systems biology tool are compared to study the issues that may arise during conversion between them. A tool, StoP, has been developed to convert the SBML format to the format of the selected tool.

Results: The basic format of SBML representation which is in the form of listings of various elements of a biochemical reaction system differs from the representation of the selected tool which is location oriented. In spite of this difference the various components of biochemical pathways including multiple compartments, global parameters, reactants, products, modifiers, reactions, kinetic formulas and reaction parameters could be converted from the SBML representation to the representation of the selected tool. The MathML representation of the kinetic formula in an SBML model can be converted to the string format of the selected tool. Some features of the SBML are not present in the selected tool. Similarly, the ability of the selected tool to declare parameters for locations, which are global to those locations and their children, is not present in the SBML.

Conclusions: Differences in representations of pathway models may include differences in terminology, basic architecture and software capabilities, as well as the adoption of different standards for similar things. However, the overall similarity of the domain of pathway models makes it possible to interconvert these representations. The selected tool should add support for unit definitions, events and rules. It is recommended that SBML add a facility for parameter declaration at compartment level and that the selected tool add a facility for function declaration.

12

Poudel, Sagar. "GPCR-Directed Libraries for High Throughput Screening." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-29.

Abstract:

Guanine nucleotide binding protein (G-protein) coupled receptors (GPCRs), the largest receptor family, are enormously important for the pharmaceutical industry, as they are the target of 50-60% of all existing medicines. The discovery of many new GPCRs by the Human Genome Project opens up new opportunities for developing novel therapeutics. High throughput screening (HTS) of chemical libraries is a well-established method for finding new lead compounds in drug discovery. Despite some success, this approach has suffered from the near absence of more focused and specifically targeted libraries. To improve hit rates and to exploit the full potential of current corporate screening collections, this thesis work identified and analysed the critical drug-binding positions within the GPCRs, based on their overall sequence, their transmembrane regions and their drug-binding fingerprints. A classification based on drug-binding fingerprints was carried out as the basis for successful pharmacophore modelling and virtual screening, which facilitates the development of more specific and focused targeted libraries for HTS.

13

Anders, Patrizia. "A bioinformaticians view on the evolution of smell perception." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-30.

Abstract:

Background:

The origin of vertebrate sensory systems still holds many mysteries and thus poses challenges to bioinformatics. The evolution of the sense of smell in particular presents important puzzles, notably the question of whether or not the vomeronasal system is older than the main olfactory system. Here I compare receptor sequences of the two distinct systems in a phylogenetic study to determine their relationships among several different vertebrate species.

Results:

Receptors of the two olfactory systems share little sequence similarity and prove to be a challenge in multiple sequence alignment. However, recent dramatic improvements in alignment tools allow for better results and high confidence. Different strategies and tools were employed and compared to derive a high-quality alignment that holds information about the evolutionary relationships between the different receptor types. The resulting Maximum-Likelihood tree supports the theory that the vomeronasal system is an ancestor of the main olfactory system rather than an evolutionary novelty of tetrapods.

Conclusions:

The connections between the two systems of smell perception might be much more fundamental than the common architecture of receptors. A better understanding of these parallels is desirable, not only with respect to our view on evolution, but also in the context of the further exploration of the functionality and complexity of odor perception. Along the way, this work offers a practical protocol through the jungle of programs concerned with sequence data and phylogenetic reconstruction.

14

Pohl, Matin. "Using an ontology to enhance metabolic or signaling pathway comparisions by biological and chemical knowledge." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-32.

Abstract:

Motivation:

As genome-scale efforts to investigate the metabolic networks of various organisms are ongoing, the amount of pathway data is growing. At the same time, an increasing amount of gene expression data from microarrays is becoming available for reverse engineering, delivering, for example, hypothetical regulatory pathway data. Analysis tools are needed to keep this growing body of data manageable and to keep track of genuinely new information. One vital task is the comparison of pathways to detect similar functionalities and overlaps or, in the case of reverse engineering, to detect known data corroborating a hypothetical pathway. A comparison method that uses ontological knowledge about molecules and reactions offers a more biological point of view, which graph-theoretical approaches have so far lacked. Such an ontology-based comparison approach is described in this report.

Results:

An algorithm is introduced that compares pathways component by component. The method was applied to two selected databases, and the results showed that it is not satisfactory as a stand-alone method. Possibilities for further development are suggested, and steps toward an integrated method using several approaches are recommended.

Availability:

The source code, used database snapshots and pictures can be requested from the author.

15

Sentausa, Erwin. "Time course simulation replicability of SBML-supporting biochemical network simulation tools." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-33.

Abstract:

Background: Modelling and simulation are important tools for understanding biological systems. Numerous modelling and simulation software tools have been developed for integrating knowledge about the behaviour of a dynamic biological system described in mathematical form. The Systems Biology Markup Language (SBML) was created as a standard format for exchanging biochemical network models among tools. However, it is not yet certain whether SBML models can actually be used and exchanged among tools with different purposes and interfaces. In particular, it is not clear whether dynamic simulations of SBML models using different modelling and simulation packages are replicable.

Results: Time series simulations of published biological models in SBML format are performed using four modelling and simulation tools which support SBML to evaluate whether the tools correctly replicate the simulation results. Some of the tools do not successfully integrate some models. In the time series output of the successful simulations, there are differences between the tools.

Conclusions: Although SBML is widely supported among biochemical modelling and simulation tools, not all simulators can replicate time-course simulations of SBML models exactly. This inability to replicate simulation results may harm the peer-review process for biological modelling and simulation work and should be addressed, for example by specifying in the SBML model the exact algorithm or simulator to be used for replicating the simulation result.
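
One simple way to make "replicates the simulation result" operational is to compare two time courses of the same species point by point against a relative tolerance. The sketch below assumes both tools report values on the same time grid; the function names, the default tolerance and the sample values are hypothetical, and the abstract does not say which comparison criterion the thesis used.

```python
def max_relative_difference(series_a, series_b, floor=1e-12):
    """Largest pointwise relative difference between two time courses.

    Both series must be sampled at the same time points. `floor` avoids
    division by zero when both values are (near) zero.
    """
    if len(series_a) != len(series_b):
        raise ValueError("time courses must share the same time grid")
    worst = 0.0
    for a, b in zip(series_a, series_b):
        scale = max(abs(a), abs(b), floor)
        worst = max(worst, abs(a - b) / scale)
    return worst


def replicates(series_a, series_b, rel_tol=1e-6):
    """True if the two time courses agree within rel_tol at every time point."""
    return max_relative_difference(series_a, series_b) <= rel_tol


# Hypothetical outputs for one species from three simulators.
tool_1 = [1.0, 0.5, 0.25]
tool_2 = [1.0, 0.5, 0.2500001]  # differs only at solver-tolerance level
tool_3 = [1.0, 0.5, 0.26]       # visibly different trajectory
```

Here `replicates(tool_1, tool_2)` holds while `replicates(tool_1, tool_3)` does not. The choice of `rel_tol` decides whether small numerical differences between integrators count as a failure to replicate, so it should be reported alongside the result.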

16

Simu, Tiberiu. "A method for extracting pathways from Scansite-predicted protein-protein interactions." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-34.

Abstract:

Protein interaction is an important mechanism for cellular functionality. Computational methods for predicting protein interactions are available in many cases through public resources (for example, Scansite). These predictions can be further combined with other information sources to generate hypothetical pathways. However, building pathways with computational methods can be time-consuming, as it requires multiple iterations and the consolidation of data from different sources. We have tested whether it is possible to generate graphs of protein-protein interactions using only domain-motif interaction data, and the degree to which this process can be automated, by developing a program that aggregates, under user guidance, query results from different information sources. The data sources used are Scansite and SwissProt. The graphs are visualised with Osprey, an external program freely available for academic purposes. The graphs obtained by running the software show that although it is possible to combine publicly available data and theoretical protein-protein interaction predictions from Scansite, further efforts are needed to increase the biological plausibility of these collections of data. It is possible, however, to reduce the dimensionality of the obtained graphs by focusing the searches on a certain tissue of interest.

17

Mathew, Sumi. "A method to identify the non-coding RNA gene for U1 RNA in species in which it has not yet been found." Thesis, University of Skövde, School of Humanities and Informatics, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-37.

Abstract:

Background

Non-coding RNAs are RNA molecules that do not code for proteins but play structural, catalytic or regulatory roles in the organisms in which they are found. These RNAs generally conserve their secondary structure more than their primary sequence. Protein-coding genes can be found using sequence signals such as promoters, terminators, and start and stop codons. This is not the case for non-coding RNAs, in which these signals are weakly conserved, so identifying them is more challenging. Therefore a protocol is devised to identify U1 RNA in species not previously known to have it.

Results

Covariance models are sufficient to identify non-coding RNAs, but they are very slow, so a filtering step is needed beforehand to reduce the search space for these genes. For filtering, the protocol for identifying U1 RNA genes employs the pattern matcher RNABOB, which can conduct secondary structure pattern searches. The descriptor for RNABOB is generated automatically in such a way that it can also represent the bulges and interior loops in RNA helices. The protocol is compared with the Rfam and Weinberg & Ruzzo approaches and has been able to identify new U1 RNA homologues in the Apicomplexan group, where U1 RNA had not previously been found.

Conclusions

The method has been used to identify the gene for U1 RNA in certain species in which it has not been detected previously. The identified genes may be further analyzed by wet laboratory techniques for the confirmation of their existence.

18

Kasap, Server. "High performance reconfigurable architectures for bioinformatics and computational biology applications." Thesis, University of Edinburgh, 2010. http://hdl.handle.net/1842/24757.

Abstract:
The field of Bioinformatics and Computational Biology (BCB), a relatively new discipline which spans the boundaries of Biology, Computer Science and Engineering, aims to develop systems that help organise, store, retrieve and analyse genomic and other biological information in a convenient and speedy way. This new discipline emerged mainly as a result of the Human Genome Project, which succeeded in transcribing the complete DNA sequence of the human genome, hence making it possible to address many problems which were impossible to even contemplate before, with a plethora of applications including disease diagnosis, drug engineering, bio-material engineering and genetic engineering of plants and animals, all with a real impact on the quality of life of ordinary individuals. Due to the sheer size of the data sets involved in BCB algorithms (often measured in tens or hundreds of gigabytes) as well as their computational demands (often measured in Tera-Ops), high performance supercomputers and computer clusters have been used as implementation platforms for high performance BCB computing. However, the high cost of these platforms, as well as the lack of suitable programming interfaces for them, still impedes wider uptake of this technology in the BCB community. Moreover, with increased heat dissipation, supercomputers are now often augmented with special-purpose hardware (or ASICs) in order to speed up their operations while reducing their power dissipation. However, since ASICs are fully customised to implement particular tasks or algorithms, they suffer from increased development times, higher Non-Recurring-Engineering (NRE) costs, and inflexibility, as they cannot be reused to implement tasks or algorithms other than those they have been designed to perform.
On the other hand, Field Programmable Gate Arrays (FPGAs) have recently been proposed as a viable alternative implementation platform for BCB applications due to their flexible computing and memory architecture which gives them ASIC-like performance with the added programmability feature. In order to counter the aforementioned limitations of both supercomputers and ASICs, this research proposes the use of state-of-the-art reprogrammable system-on-chip technology, in the form of platform FPGAs, as a relatively low cost, high performance and reprogrammable implementation platform for BCB applications. This research project aims to develop a sophisticated library of FPGA architectures for bio-sequence analysis, phylogenetic analysis, and molecular dynamics simulation.
19

Eklund, Martin. "eScience Approaches to Model Selection and Assessment : Applications in Bioinformatics." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-109437.

20

Fauteux, François. "Computational DNA motif discovery in plant promoters." Thesis, McGill University, 2010. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=86926.

Abstract:
The regulation of gene expression is driven primarily by transcription factors binding to short DNA sequences. Here three studies related to cis-regulatory motif discovery in plant promoters are presented. In the first study, an exact discriminative seeding DNA motif discovery algorithm, which addresses key issues associated with popular DNA motif discovery algorithms, is proposed. The Seeder algorithm outperforms popular motif discovery tools on biological benchmark data. In the second study, the algorithm is applied to the identification of cis-regulatory motifs in seed storage protein gene promoters. Known and new motifs are discovered. In the third study, groups of orthologous genes are identified among five dicotyledonous plant species, and DNA motif discovery is carried out in the proximal promoter sequence within each group. The presence of three large clusters of groups of orthologous promoters sharing similar motifs is revealed.
Gene expression is regulated, in large part, by the binding of transcription factors to short DNA sequences. Three studies are presented on the in silico identification of regulatory motifs in the promoter sequences of plant genes. In the first study, an exact discriminative seeding algorithm is presented. The algorithm outperforms several popular algorithms when applied to biological benchmark data. In the second study, the algorithm is used to identify conserved cis-regulatory motifs in the promoters of seed storage protein genes in various plant species. Known motifs as well as new motifs are identified. In the third study, groups of orthologous genes are identified in five dicotyledonous species, and a search for cis-regulatory motifs is carried out in the proximal promoter sequences of each group. The presence of three large clusters of orthologous groups sharing similar motifs is demonstrated.
21

Che, Huiwen. "Evaluation of de novo assembly using PacBio long reads." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-302744.

Abstract:
New sequencing technologies show promise for the construction of complete and accurate genome sequences through de novo assembly, a process that joins overlapping reads into longer contiguous sequences without the need for a reference genome. High-quality de novo assembly leads to a better understanding of genetic variation. The purpose of this thesis is to evaluate human genome sequences obtained from the PacBio sequencing platform, a new technology suitable for de novo assembly of large genomes. The evaluation focuses on comparing sequence identity between our own de novo assemblies and the available human reference and, through that, benchmarking the accuracy of our data. Sequences that are absent from the reference genome are also investigated for potential unannotated genes. We also assess complex structural variation using different approaches. Our assemblies show high consensus with the human reference genome, with approximately 98.6% of the bases in the assemblies mapped to the human reference. We also detect more than ten thousand structural variants, including some large rearrangements, with respect to the reference.
22

Mattisson, Jonas, Sofia Gräsberg, Öhrling Sara Rydberg, Mohammed Al-Jaff, Iris Molin, and Eric Sandström. "Framtagning av unika gemensamma sekvenser hos koagulasnegativa stafylokocker." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-295974.

Abstract:
In this report we describe how we approached the problem of finding sequences shared by a set of coagulase-negative staphylococci (CoNS) in order, among other things, to distinguish them from their relative Staphylococcus aureus (S. aureus). The problem originates with the project client, Q-linea, which wants to be able to identify the infecting bacteria in cases of the blood disease sepsis. Unfortunately, we could not find sequences that worked for all of our selected staphylococci. We did, however, find several sequences that worked together in pairs to distinguish the staphylococcus group from S. aureus. To carry out all the comparisons, we designed and implemented a bioinformatics pipeline with a three-part optimization method to speed up the heavy computations.
23

Capuccini, Marco. "Structure-Based Virtual Screening in Spark." Thesis, Uppsala universitet, Institutionen för farmaceutisk biovetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-257028.

24

Pestana, Valeria. "Modeling drug response in cancer cell lines using genotype and high-throughput “omics” data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-166744.

25

Dong, Siyuan. "A time dependent adaptive learning process for estimating drug exposure from register data - applied to insulin and its analogues." Thesis, KTH, Beräkningsbiologi, CB, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-128438.

26

Ljungberg, Kajsa. "Numerical methods for mapping of multiple QTL." Licentiate thesis, Uppsala universitet, Avdelningen för teknisk databehandling, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-86133.

Abstract:
This thesis concerns numerical methods for mapping of multiple quantitative trait loci, QTL. Interactions between multiple genetic loci influencing important traits, such as growth rate in farm animals and predisposition to cancer in humans, make it necessary to search for several QTL simultaneously. Simultaneous search for n QTL involves solving an n-dimensional global optimization problem, where each evaluation of the objective function consists of solving a generalized least squares problem. In Paper A we present efficient algorithms, mainly based on updated QR factorizations, for evaluating the objective functions of different parametric QTL mapping methods. One of these algorithms reduces the computational work required for an important function class by one order of magnitude compared with the best of the methods used by other authors. In Paper B previously utilized techniques for finding the global optimum of the objective function are compared with a new approach based on the DIRECT algorithm of Jones et al. The new method gives accurate results in one order of magnitude less time than the best of the formerly employed algorithms. Using the algorithms presented in Papers A and B, simultaneous search for at least three QTL, including computation of the relevant empirical significance thresholds, can be performed routinely.
27

Mattisson, Jonas. "Identifying esophageal atresi associated variants from whole genome sequencing data." Thesis, Uppsala universitet, Institutionen för immunologi, genetik och patologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-345329.

Abstract:
Knowing the underlying cause of a genetic disorder could not only further our understanding of the disease itself and of the otherwise healthy mechanism that is disrupted, but could also potentially improve people’s lives. Even though whole genome sequencing has drastically improved the chances of discovering the cause, a comparison of the genomes of two unrelated individuals will find several million sequence variations. While most variants have no significant impact, a single variant that functionally impacts a gene is enough to cause a genetic disorder. This project therefore focused on filtering variants, from lists of several million possible causes down to the stage where they could feasibly be analysed manually one by one. Single-nucleotide variants, indels and structural variants were filtered, based on a dataset where single-nucleotide variants and indels had already been called. The more difficult process of structural variant discovery was also performed, but it required the application of four different tools to minimise the drawbacks of each separate discovery technique. The same three filtering approaches were applied to all variants: the intersection of datasets that should contain the same variant, the removal of variants in common with the general population, and the selection of variants impacting functionality. Each approach proved to be an efficient filtering step, with their combination reducing each list to only a couple of variants out of the original five million. Due to the lower accuracy and sensitivity of the structural variant analysis, this data will likely require more extensive manual analysis.
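The three filtering steps named in this abstract (intersection, removal of common variants, selection by functional impact) can be sketched as plain set operations. This is only an illustrative sketch, not the thesis's pipeline: the variant tuple layout, the `max_freq` cut-off, and the impact labels `"HIGH"` and `"MODERATE"` are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def filter_variants(
    call_sets: List[Set[Variant]],
    population_freq: Dict[Variant, float],
    impact: Dict[Variant, str],
    max_freq: float = 0.01,  # assumed cut-off, not taken from the thesis
    kept_impacts: FrozenSet[str] = frozenset({"HIGH", "MODERATE"}),  # assumed labels
) -> Set[Variant]:
    """Apply the three filters in sequence and return the surviving variants."""
    if not call_sets:
        return set()
    # Step 1: keep only variants present in every dataset that should contain them.
    shared = set.intersection(*call_sets)
    # Step 2: drop variants that are common in the general population.
    # Variants with no recorded frequency are treated as rare (frequency 0.0).
    rare = {v for v in shared if population_freq.get(v, 0.0) <= max_freq}
    # Step 3: keep only variants annotated with a functional impact of interest.
    return {v for v in rare if impact.get(v) in kept_impacts}


if __name__ == "__main__":
    a = ("chr1", 100, "A", "T")
    b = ("chr2", 200, "G", "C")
    c = ("chr3", 300, "T", "TA")
    survivors = filter_variants(
        call_sets=[{a, b, c}, {a, b}],
        population_freq={a: 0.0001, b: 0.2},
        impact={a: "HIGH", b: "HIGH", c: "HIGH"},
    )
    print(survivors)
```

In the usage example, `c` is dropped by the intersection step and `b` by the population-frequency step, so only `a` remains.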
28

Sandström, Eric. "Implementation of an automatic quality control of derived data files for NONMEM." Thesis, Uppsala universitet, Institutionen för farmaceutisk biovetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-375892.

Abstract:
A pharmacometric analysis must be based on correct data to be valid. Source clinical data is rarely ready to be modelled as is, but rather needs to be reprogrammed to fit the format required by the pharmacometric modelling software. The reprogramming steps include selecting the subsets of data relevant for modelling, deriving new information from the source and adjusting units and encoding. Sometimes, the source data may also be flawed, containing vague definitions and missing or confusing values. In either setting, the source data needs to be reprogrammed to remedy this, followed by extensive quality control to capture any errors or inconsistencies produced along the way. The quality control is a lengthy task which is often performed manually, either by the scientists conducting the pharmacometric study or by independent reviewers. This project presents an automatic data quality control with the purpose of aiding the data curation process, so as to minimize any potential errors that would otherwise have to be detected by the manual quality control. The automatic quality control is implemented as an R package and is specifically tailored to the needs of Pharmetheus.
29

Odelgard, Anna. "Coverage Analysis in Clinical Next-Generation Sequencing." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-379228.

Abstract:
With the introduction of next-generation sequencing (NGS), new tools had to be developed to work with new data formats, to handle data sizes larger than those of previous techniques, and to check the accuracy of the data. Coverage analysis is one important quality control for NGS data: the coverage indicates how many times each base pair has been sequenced and thus how trustworthy each base call is. For clinical purposes every base of interest must be quality controlled, as one wrong base call could affect the patient negatively. Software for coverage analysis with enough accuracy and detail for clinical applications is sparse. Several programs, such as Samtools, are able to calculate coverage values but do not further process this information in a useful way to produce a QC report for each base pair of interest. The goal of my master's thesis has therefore been to create a new coverage analysis report tool, named CAR tool, that extracts the coverage values from Samtools and uses this data to produce a report consisting of tables, lists and figures. CAR tool was created to replace the currently used tool, ExCID, at the Clinical Genomics facility at SciLifeLab in Uppsala and was developed to meet the needs of the bioinformaticians and clinicians. CAR tool is written in Python and launched from a terminal window. The main function of the tool is to display coverage breadth values for each region of interest and to extract all sub-regions below a chosen coverage depth threshold. The low coverage regions are then reported together with region name, start and stop positions, length and mean coverage value. To make the tool useful to as many users as possible, several settings can be chosen by entering different flags when calling the tool. Such settings include generating pie charts of each region’s coverage values, filtering reads and bases by quality, or writing your own entry to be used for the coverage calculation by Samtools.
The tool has been shown to find these low coverage regions very well. Most low coverage regions found are also found by ExCID, the currently used tool. Some differences did occur, however, and every such region was verified with IGV. The coverage values shown in IGV coincided with those found by CAR tool. CAR tool is written to find all low coverage regions even if they are only one base pair long, while ExCID instead seems to generate larger low coverage regions without taking very short low coverage regions into account. To read more about the functions and how to use CAR tool, I refer to the User instructions in the appendix and on GitHub at the repository anod6351
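The core computation described in this abstract, extracting every sub-region below a depth threshold and reporting its start, stop, length and mean coverage, can be sketched as follows. This is a minimal illustration and not CAR tool's actual code: the function names, the `LowRegion` record, and the assumption that per-base depths have already been extracted into a Python list (for example from Samtools output) are all hypothetical.

```python
from typing import List, NamedTuple


class LowRegion(NamedTuple):
    start: int         # position of the first low-coverage base
    stop: int          # position of the last low-coverage base (inclusive)
    length: int        # number of bases in the sub-region
    mean_depth: float  # mean coverage depth over the sub-region


def low_coverage_regions(depths: List[int], region_start: int, threshold: int) -> List[LowRegion]:
    """Return every maximal run of bases whose depth is below the threshold.

    depths[i] is the depth at position region_start + i. Runs of a single
    base are reported too.
    """
    regions: List[LowRegion] = []
    run: List[int] = []  # depths of the low-coverage run currently being extended
    for offset, depth in enumerate(depths):
        if depth < threshold:
            run.append(depth)
            continue
        if run:
            stop = region_start + offset - 1
            regions.append(LowRegion(stop - len(run) + 1, stop, len(run), sum(run) / len(run)))
            run = []
    if run:  # a run that extends to the end of the region
        stop = region_start + len(depths) - 1
        regions.append(LowRegion(stop - len(run) + 1, stop, len(run), sum(run) / len(run)))
    return regions


def coverage_breadth(depths: List[int], threshold: int) -> float:
    """Fraction of bases in the region covered at or above the threshold."""
    if not depths:
        return 0.0
    return sum(d >= threshold for d in depths) / len(depths)


if __name__ == "__main__":
    depths = [30, 5, 8, 40, 2]
    for region in low_coverage_regions(depths, region_start=100, threshold=10):
        print(region)
    print(coverage_breadth(depths, threshold=10))
```

Reporting single-base runs mirrors the behaviour the abstract attributes to CAR tool; a tool that merges or ignores very short runs would need an extra minimum-length parameter.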
30

Hillerton, Thomas. "Predicting adverse drug reactions in cancer treatment using a neural network based approach." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-15659.

31

Gevorgyan, Arusjak. "Development of a phylogenomic framework for the krill." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-355387.

Abstract:
Over the last few decades, many krill stocks have declined in size and number, likely as a consequence of global climate change (Siegel 2016). A major risk factor is the increased level of carbon dioxide (CO2) in the ocean. A collapse of the krill population has the potential to disrupt the ocean ecosystem, as krill are the main connection between primary producers such as phytoplankton and larger animals (Murphy et al. 2012). The aim of this project is to produce the first phylogenomic framework for krill, using powerful comparative bioinformatics and phylogenomic methods, in order to find and analyse the genes that help krill adapt to their environment. A problem with these studies is that we still do not have access to a reference genome sequence for any krill species. To strengthen and increase trust in our studies, two different pipelines were run, each with a different Orthology Assessment Toolkit (OAT), Orthograph and UPhO, in order to establish orthology relationships between transcripts/genes. Since UPhO produces well-supported trees where the majority of the gene trees match the species tree, it is recommended as the appropriate OAT for generating a robust molecular phylogeny of krill. The second aim of this project was to estimate the level of positive selection in E. superba in order to lay a foundation for understanding the level of selection acting on protein-coding sequences in krill. As expected, the level of selection was quite high in E. superba, which indicates that krill are adapted to the changing environment by positive selection rather than by genetic drift.
32

Säde, Viktor, Linn Beckman, Gustav Ahlström, Rebecka Berglin, Frida Forssell, Albin Lundin, and Ida Wettergren. "Visualiseringsverktyg för en proteindatabas." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-412096.

33

Stenerlöw, Oskar. "Artefact detection in microstructures using image analysis." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-417342.

Abstract:
Gyros Protein Technologies AB produces instruments designed to perform automated immunoassays on plastic CDs with microstructures. While the process is generally very robust, the company had noticed that some runs on the instruments encountered problems. They hypothesised that this had to do with the chamber on the CD to which the sample is added. It was believed that the chamber was not being filled properly, leaving it either completely empty or containing a small amount of air rather than liquid. This project aimed to investigate this hypothesis and to develop an image analysis solution that could reliably detect these occurrences. An image analysis script was developed which mainly utilised template matching and Canny edge detection to assess the presence of air. The analysis had great success in detecting empty chambers and large bubbles of air, while it had some trouble discerning small bubbles from dirt on top of the CD. Evaluated on a test set of 1305 images annotated by two people, the analysis scored an accuracy of 96.8 % and 99.5 % respectively.
34

Liang, Jiarong. "Federated Learning for Bioimage Classification." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-420615.

35

Bernedal, Nordström Clara. "Metabolic Modelling of Differential Drug Response to Proteasome Inhibitors in Glioblastoma Multiforme." Thesis, Uppsala universitet, Institutionen för farmaceutisk biovetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-450089.

Abstract:
This project builds upon a previous study (Johansson et al. 2020) that tested multiple drugs on glioblastoma cell lines and found a marked division in drug response to proteasome inhibitors. The aim of this project was to gain better insight into the differences between the processes of the two drug response subgroups by creating and comparing genome-scale metabolic models (GEMs) of the two subgroups. To do this, genome-scale metabolic models were made for each cell line and then merged according to proteasome inhibitor response to obtain two general models. With models for each cell line and two general drug response models, comparisons could be made. Overall, the differences between cell lines were larger than the differences between drug responses, but some differences could still be seen. Some differences in the number of reactions in subsystems were found between the two general GEMs, where the urea cycle subsystem showed the largest difference between the two models. Another difference was in the metabolic activity of the models, where the sensitive model passed ten tasks which the resistant model could not. The last and most important comparison was the essentiality analysis, which gave a multitude of essential genes but only twelve genes that were unique to one of the two general GEMs: nine genes for the resistant model and three for the sensitive model. Of these genes, CYP51A1 and FDFT1 for the resistant model, and RBP1 and CYP27A1 for the sensitive model, had already appeared in at least one study regarding glioblastoma or proteasome inhibitors. Since some of these genes have already been found interesting for proteasome inhibitor (PI) or glioblastoma treatment, the unique genes from the essentiality analysis could be worth investigating further in the future.
36

Annett, Alva. "Single Cell Methods and Cell Hashing for High Throughput Drug Screens." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-451848.

37

Mansouri, Ahmad. "Computational modeling of osteopontin peptide binding to hydroxyapatite." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=104546.

Abstract:
Osteopontin (OPN), a secreted, noncollagenous, acidic, and mineral-binding phosphoprotein, is composed of 314 amino acids (in humans), mostly glutamate, aspartate and serine. It is prominently associated with biominerals and has a regulatory effect on the crystal growth of hydroxyapatite (HAP), the mineral phase of bones and teeth. Recent studies have revealed that OPN contains an acidic, serine- and aspartate-rich motif (ASARM), which potently inhibits mineralization of osteoblast cultures in a phosphate-dependent manner. ASARM peptides accumulate in hypophosphatemia patients whose distinguishing clinical feature is soft bones (osteomalacia). To understand the mechanism by which OPN and the acidic and negatively charged peptides from OPN inhibit the mineralization process by adsorbing to HAP crystal surfaces, we modeled the binding by computational studies. Computational simulations allow for assessing the mechanism by which polyelectrolytes, such as OPN and its peptides, can inhibit mineralization. We used the RosettaSurface protocol to examine human OPN-ASARM peptide (DDSHQSDESHHSDESDEL) binding to flat surfaces of HAP mineral and determined binding affinities, specificities, and structure for ASARM-Sp0 (without phosphoserine) and two phosphorylated forms of ASARM (ASARM-Sp3 and ASARM-Sp5, with 3 and 5 phosphoserines, respectively). Our simulations show an increase in adsorption of ASARM to HAP when the peptide is phosphorylated. Moreover, ASARM and its phosphorylated counterparts show preferential adsorption to the (100) and (010) crystallographic orientations of HAP compared to the (001) orientation. Besides the "flat" surfaces of the HAP crystal, "active sites" such as steps, kinks, and vacancies play deterministic roles in adsorption of foreign molecules and ultimately affect the process of crystal growth.
We examined phosphorylated ASARM (DDSpHQSDESHHSpDESpDEL / ASARM-Sp3) binding to HAP mineral with and without vacancies to determine the following: the changes in binding affinity attributable to the phosphate vacancies, the effect of vacancies' geometry in adsorption of the peptide, and the structural changes of ASARM-Sp3 upon adsorption to these surfaces. Our results suggest that the presence of phosphate vacancies on (100) surface increases the adsorption energies of ASARM-Sp3 more than two-fold, and the increase in adsorption energies is related to the number of vacancies available on the surface. The adsorption on the surfaces was mostly mediated through ASARM-Sp3 phosphate groups, which were oriented towards the phosphate vacancies of the crystal surface. In addition, different geometry of the phosphate vacancies was shown to have influence in changing the adsorption energies of ASARM-Sp3. These results indicate that "active sites" present on the surface of a growing crystal can influence the adsorption of biological molecules. More specifically, peptides such as ASARM-Sp3 have side chains (phosphate groups) that can fill the vacancies (phosphate vacancies), driving their adsorption.
38

Ling, Cheng. "High performance bioinformatics and computational biology on general-purpose graphics processing units." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/6260.

Abstract:
Bioinformatics and Computational Biology (BCB) is a relatively new multidisciplinary field which brings together many aspects of the fields of biology, computer science, statistics, and engineering. Bioinformatics extracts useful information from biological data and makes these more intuitive and understandable by applying principles of information sciences, while computational biology harnesses computational approaches and technologies to answer biological questions conveniently. Recent years have seen an explosion of the size of biological data at a rate which outpaces the rate of increases in the computational power of mainstream computer technologies, namely general purpose processors (GPPs). The aim of this thesis is to explore the use of off-the-shelf Graphics Processing Unit (GPU) technology in the high performance and efficient implementation of BCB applications in order to meet the demands of biological data increases at affordable cost. The thesis presents detailed design and implementations of GPU solutions for a number of BCB algorithms in two widely used BCB applications, namely biological sequence alignment and phylogenetic analysis. Biological sequence alignment can be used to determine the potential information about a newly discovered biological sequence from other well-known sequences through similarity comparison. On the other hand, phylogenetic analysis is concerned with the investigation of the evolution and relationships among organisms, and has many uses in the fields of system biology and comparative genomics. In molecular-based phylogenetic analysis, the relationship between species is estimated by inferring the common history of their genes and then phylogenetic trees are constructed to illustrate evolutionary relationships among genes and organisms. 
However, both biological sequence alignment and phylogenetic analysis are computationally expensive applications as their computing and memory requirements grow polynomially or even worse with the size of sequence databases. The thesis firstly presents a multi-threaded parallel design of the Smith-Waterman (SW) algorithm alongside an implementation on NVIDIA GPUs. A novel technique is put forward to solve the restriction on the length of the query sequence in previous GPU-based implementations of the SW algorithm. Based on this implementation, the difference between two main task parallelization approaches (inter-task and intra-task parallelization) is presented. The resulting GPU implementation matches the speed of existing GPU implementations while providing more flexibility, i.e. flexible length of sequences in real world applications. It also outperforms an equivalent GPP-based implementation by 15x-20x. After this, the thesis presents the first reported multi-threaded design and GPU implementation of the Gapped BLAST with Two-Hit method algorithm, which is widely used for aligning biological sequences heuristically. This achieved up to 3x speed-up compared to the most optimised GPP implementations. The thesis then presents a multi-threaded design and GPU implementation of a Neighbor-Joining (NJ)-based method for phylogenetic tree construction and multiple sequence alignment (MSA). This achieves 8x-20x speed-up compared to an equivalent GPP implementation based on the widely used ClustalW software. The NJ method however only gives one possible tree which strongly depends on the evolutionary model used. A more advanced method uses maximum likelihood (ML) for scoring phylogenies with Markov Chain Monte Carlo (MCMC)-based Bayesian inference.
The latter was the subject of another multi-threaded design and GPU implementation presented in this thesis, which achieved 4x-8x speed up compared to an equivalent GPP implementation based on the widely used MrBayes software. Finally, the thesis presents a general evaluation of the designs and implementations achieved in this work as a step towards the evaluation of GPU technology in BCB computing, in the context of other computer technologies including GPPs and Field Programmable Gate Arrays (FPGA) technology.
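For readers unfamiliar with the Smith-Waterman algorithm that this abstract parallelizes, the underlying dynamic-programming recurrence can be sketched on a single CPU thread as follows. This is a plain reference sketch, not the thesis's GPU design: it assumes a linear gap penalty, and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are illustrative choices rather than values from the thesis.

```python
def smith_waterman_score(query: str, target: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between query and target.

    Uses the Smith-Waterman recurrence with a linear gap penalty and keeps
    only two rows of the score matrix in memory.
    """
    prev = [0] * (len(target) + 1)  # scores for the previous query character
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 (empty target prefix)
        for j, t in enumerate(target, start=1):
            diagonal = prev[j - 1] + (match if q == t else mismatch)
            score = max(
                0,               # start a new local alignment here
                diagonal,        # align q with t
                prev[j] + gap,   # gap in the target
                curr[j - 1] + gap,  # gap in the query
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. That property is what makes the parallelization within a single alignment possible, while aligning many sequence pairs at once is the other natural source of parallelism.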
39

Keller, Jens. "Clustering biological data using a hybrid approach : Composition of clusterings from different features." Thesis, University of Skövde, School of Humanities and Informatics, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-1078.

Abstract:

Clustering of data is a well-researched topic in computer science. Many approaches have been designed for different tasks. In biology many of these approaches are hierarchical and the result is usually represented in dendrograms, e.g. phylogenetic trees. However, many non-hierarchical clustering algorithms are also well-established in biology. The approach in this thesis is based on such common algorithms. The algorithm which was implemented as part of this thesis uses a non-hierarchical graph clustering algorithm to compute a hierarchical clustering in a top-down fashion. It performs the graph clustering iteratively, with a previously computed cluster as the input set. The innovation is that it focuses on another feature of the data in each step and clusters the data according to this feature. Common hierarchical approaches in biology cluster, for example, a set of genes according to the similarity of their sequences. The clustering then reflects a partitioning of the genes according to their sequence similarity. The approach introduced in this thesis uses many features of the same objects. These features can be various, in biology for instance similarities of the sequences, of gene expression or of motif occurrences in the promoter region. As part of this thesis, not only was the algorithm itself implemented and evaluated, but also a complete software package providing a graphical user interface. The software was implemented as a framework providing the basic functionality, with the algorithm as a plug-in extending the framework. The software is meant to be extended in the future, integrating a set of algorithms and analysis tools related to the process of clustering and analysing data not necessarily related to biology.

The thesis deals with topics in biology, data mining and software engineering and is divided into six chapters. The first chapter gives an introduction to the task and the biological background. It gives an overview of common clustering approaches and explains the differences between them. Chapter two shows the idea behind the new clustering approach and points out differences and similarities between it and common clustering approaches. The third chapter discusses the aspects concerning the software, including the algorithm. It illustrates the architecture and analyses the clustering algorithm. After the implementation the software was evaluated, which is described in the fourth chapter, pointing out observations made due to the use of the new algorithm. Furthermore this chapter discusses differences and similarities to related clustering algorithms and software. The thesis ends with the last two chapters, namely conclusions and suggestions for future work. Readers who are interested in repeating the experiments which were made as part of this thesis can contact the author via e-mail, to get the relevant data for the evaluation, scripts or source code.

APA, Harvard, Vancouver, ISO, and other styles
40

Ranjard, Louis. "Computational biology of bird song evolution." e-Thesis University of Auckland, 2010. http://hdl.handle.net/2292/5719.

Full text
Abstract:
Individuals of a given population share more behavioural traits with each other than with members of other populations. For example, in humans, traditions are specific to regions or countries. These cultural relationships can tell us about the history of the populations, their origin and the amount of exchange between them. In birds, regional dialects have been described in many species. However, the mechanisms by which dialects form in populations are not fully understood because they are difficult to analyse experimentally. Translocated populations, with their known histories, offer an opportunity to study these mechanisms. From the study of bird vocalisations we can make inferences regarding population structure and relationships as well as their history, individual behavioural state, neuronal and physiological mechanisms or development of neuronal learning. To achieve this, cross-disciplinary approaches are necessary, combining field work, bioacoustic methods, statistical tools such as machine learning, ecological knowledge and phylogenetic methods. Here, I will describe computational methods for the treatment and classification of bird vocalisations and will use them to depict the relationships between bird populations. First, I discretise the data in order to define the cultural traits. Then phylogenetic tree-building methods are used. Two approaches are possible: first, to map these traits onto known phylogenies and, second, to directly build the phylogeny of these traits. I describe the application of these methods to test several hypotheses on bird song evolution related to both their history and the mechanisms by which they evolve. Evidence for the presence of dialects in the Puget Sound white-crowned sparrow (Zonotrichia leucophrys pugetensis) is provided on the basis of the syllable content of the songs.
The absence of vocal sexual dimorphism is reported in the Australasian gannet (or takapu, Morus serrator), a member of the Sulidae family for which extensive sexual dimorphism has been reported in other species. Subsequently, convergence between the begging calls of several cuckoo species and their respective hosts is suggested by various bioacoustic methods. In addition, the male calls of the hihi (or stitchbird, Notiomystis cincta) are analysed in an island population. The corresponding pattern of variation suggests a post-dispersal acquisition of calls via learning which is in agreement with the most related species in the revised phylogeny of the hihi. Finally, the mechanisms of song evolution are depicted in translocated populations of tieke (or saddleback, Philesturnus carunculatus rufusater), resulting in the development of island dialects.
APA, Harvard, Vancouver, ISO, and other styles
41

Podowski, Raf M. "Applied bioinformatics for gene characterization /." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-818-5/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Weirather, Jason Lee. "Computational approaches to the study of human trypanosomatid infections." Thesis, The University of Iowa, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3609102.

Full text
Abstract:

Trypanosomatids cause human diseases such as leishmaniasis and African trypanosomiasis. Trypanosomatids are protists from the order Trypanosomatida and include species of the genera Trypanosoma and Leishmania, which occupy a similar ecological niche. Both have digenic life-stages, alternating between an insect vector and a range of mammalian hosts. However, the strategies used to subvert the host immune system differ greatly, as do the clinical outcomes of infections between species. The genomes of both the host and the parasite instruct us about strategies the pathogens use to subvert the human immune system, and adaptations by the human host allowing us to better survive infections. We have applied unsupervised learning algorithms to aid visualization of amino acid sequence similarity and the potential for recombination events within Trypanosoma brucei's large repertoire of variant surface glycoproteins (VSGs). Methods developed here reveal five groups of VSGs within a single sequenced genome of T. brucei, indicating many likely recombination events occurring between VSGs of the same type, but not between those of different types. These tools and methods can be broadly applied to identify groups of non-coding regulatory sequences within other Trypanosomatid genomes. To aid in the detection, quantification, and species identification of Leishmania DNA isolated from environmental or clinical specimens, we developed a set of quantitative-PCR primers and probes targeting a taxonomically and geographically broad spectrum of Leishmania species. This assay has been applied to DNA extracted from both human and canine hosts as well as the sand fly vector, demonstrating its flexibility and utility in a variety of research applications.
Within the host genomes, fine mapping SNP analysis was performed to detect polymorphisms in a family study of subjects in a region of Northeast Brazil that is endemic for Leishmania infantum chagasi, the parasite causing visceral leishmaniasis. These studies identified associations between genetic loci and the development of visceral leishmaniasis, with a single polymorphism associated with an asymptomatic outcome after infection. The methods and results presented here have capitalized on the large amount of genomics data becoming available that will improve our understanding of both parasite and host genetics and their role in human disease.

APA, Harvard, Vancouver, ISO, and other styles
43

Nettelblad, Jessica. "Haploid Selection in Animals." Thesis, Uppsala universitet, Evolutionsbiologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-362821.

Full text
Abstract:
Haploid selection in animal sperm is a somewhat controversial topic, but recent evidence might shed experimental light on the matter. This thesis investigates the possibility to detect any genetic selection in an artificial setting for zebrafish sperm from a single individual. I analyse pooled data acquired from whole-genome sequencing for two distinct groups of short- and long-lived sperm, trying to identify shifts in allele frequencies. I augment this by designing an accurate computer simulation of selection, that manipulates selection strength and takes biological aspects like linkage and sequence coverage into account. This allows large scale testing and the generation of null distributions for any test metric. The main conclusion is that selection has to be extremely strong to be detectable unless one would explicitly account for genetic linkage, as opposed to the straightforward per-marker approaches that formed the initial basis for our analyses.
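The idea of generating a null distribution for a per-marker test metric by simulation can be sketched as follows. This is a deliberately simplified illustration, not the simulation described in the thesis: it treats a single heterozygous marker, models read sampling as independent draws at a fixed coverage, and ignores linkage entirely. The function names, the coverage of 30 and the observed shift of 0.4 are assumptions made for the example.

```python
import random


def sample_reads(rng: random.Random, allele_freq: float, coverage: int) -> int:
    """Number of reads carrying the reference allele among `coverage` reads."""
    return sum(rng.random() < allele_freq for _ in range(coverage))


def freq_shift(rng: random.Random, coverage: int, shift: float) -> float:
    """Absolute allele-frequency difference between the two sperm pools.

    The short-lived pool is sampled at frequency 0.5 (heterozygous male);
    the long-lived pool at 0.5 + shift, where shift > 0 mimics selection.
    """
    short_lived = sample_reads(rng, 0.5, coverage) / coverage
    long_lived = sample_reads(rng, 0.5 + shift, coverage) / coverage
    return abs(long_lived - short_lived)


def null_distribution(rng: random.Random, coverage: int, n_sim: int) -> list:
    """Simulated test metric under no selection (shift = 0)."""
    return sorted(freq_shift(rng, coverage, 0.0) for _ in range(n_sim))


def empirical_p(null: list, observed: float) -> float:
    """Empirical p-value with the usual +1 correction."""
    exceed = sum(1 for x in null if x >= observed)
    return (exceed + 1) / (len(null) + 1)


rng = random.Random(1)  # fixed seed so the run is reproducible
null = null_distribution(rng, coverage=30, n_sim=200)
p_value = empirical_p(null, observed=0.4)
```

Because sampling noise at realistic coverage is large, a per-marker metric like this one needs a strong shift to stand out from the null, which is consistent with the abstract's conclusion about per-marker approaches.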
APA, Harvard, Vancouver, ISO, and other styles
44

Chen, Xiaoyu 1974. "Computational detection of tissue-specific cis-regulatory modules." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=97927.

Full text
Abstract:
A cis-regulatory module (CRM) is a DNA region of a few hundred base pairs that consists of clustering of several transcription factor binding sites and regulates the expression of a nearby gene. This thesis presents a new computational approach to CRM detection.
It is believed that tissue-specific CRMs tend to regulate nearby genes in a certain tissue and that they consist of binding sites for transcription factors (TFs) that are also expressed in that tissue. These facts allow us to make use of tissue-specific gene expression data to detect tissue-specific CRMs and improve the specificity of module prediction.
We build a Bayesian network to integrate the sequence information about TF binding sites and the expression information about TFs and regulated genes. The network is then used to infer whether a given genomic region indeed has regulatory activity in a given tissue. A novel EM algorithm incorporating probability tree learning is proposed to train the Bayesian network in an unsupervised way. A new probability tree learning algorithm is developed to learn the conditional probability distribution for a variable in the network that has a large number of hidden variables as its parents.
Our approach is evaluated using biological data, and the results show that it is able to correctly discriminate among human liver-specific modules, erythroid-specific modules, and negative-control regions, even though no prior knowledge about the TFs and the target genes is employed in our algorithm. On a genome-wide scale, our network is trained to identify tissue-specific CRMs in ten tissues. Some known tissue-specific modules are rediscovered, and a set of novel modules is predicted to be related to tissue-specific expression.
APA, Harvard, Vancouver, ISO, and other styles
45

Jauhiainen, Alexandra. "Evaluation and Development of Methods for Identification of Biochemical Networks." Thesis, Linköping University, The Department of Physics, Chemistry and Biology, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2811.

Full text
Abstract:

Systems biology is an area concerned with understanding biology on a systems level, where the structure and dynamics of the system are in focus. Knowledge about the structure and dynamics of biological systems is fundamental information about cells and interactions within cells and also plays an increasingly important role in medical applications.

System identification deals with the problem of constructing a model of a system from data, and an extensive theory exists, particularly for the identification of linear systems.

This is a master's thesis in systems biology treating identification of biochemical systems. Methods based on both local parameter perturbation data and time series data have been tested and evaluated in silico.

The advantage of local parameter perturbation data methods proved to be that they demand less complex data, but the drawbacks are the reduced information content of this data and sensitivity to noise. Methods employing time series data are generally more robust to noise but the lack of available data limits the use of these methods.

The work has been conducted at the Fraunhofer-Chalmers Research Centre for Industrial Mathematics in Göteborg, and at the division of Computational Biology at the Department of Physics and Measurement Technology, Biology, and Chemistry at Linköping University during the autumn of 2004.

APA, Harvard, Vancouver, ISO, and other styles
46

Guturu, Harendra. "Deciphering human gene regulation using computational and statistical methods." Thesis, Stanford University, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3581147.

Full text
Abstract:

It is estimated that at least 10-20% of the mammalian genome is dedicated towards regulating the 1-2% of the genome that codes for proteins. This non-coding, regulatory layer is a necessity for the development of complex organisms, but is poorly understood compared to the genetic code used to translate coding DNA into proteins. In this dissertation, I will discuss methods developed to better understand the gene regulatory layer. I begin, in Chapter 1, with a broad overview of gene regulation, the motivation for studying it, the state of the art in its historical context, and where to look forward.

In Chapter 2, I discuss a computational method developed to detect transcription factor (TF) complexes. The method compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid transcription factor (TF) complexes. Structural data were integrated to explore overlapping motif arrangements while ensuring physical plausibility of the TF complex. Using this approach, I predicted 422 physically realistic TF complex motifs at 18% false discovery rate (FDR). I found that the set of complexes is enriched in known TF complexes. Additionally, novel complexes were supported by chromatin immunoprecipitation sequencing (ChIP-seq) datasets. Analysis of the structural modeling revealed three cooperativity mechanisms and a tendency of TF pairs to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. The TF complexes and associated binding site predictions are made available as a web resource at http://complex.stanford.edu.

Next, in Chapter 3, I discuss how gene enrichment analysis can be applied to genome-wide conserved binding sites to successfully infer regulatory functions for a given TF complex. A genomic screen predicted 732,568 combinatorial binding sites for 422 TF complex motifs. From these predictions, I inferred 2,440 functional roles, which are consistent with known functional roles of TF complexes. In these functional associations, I found interesting themes such as promiscuous partnering of TFs (such as ETS) in the same functional context (T cells). Additionally, functional enrichment identified two novel TF complex motifs associated with spinal cord patterning genes and mammary gland development genes, respectively. Based on these predictions, I discovered novel spinal cord patterning enhancers (5/9, 56% validation rate) and enhancers active in MCF7 cells (11/19, 53% validation rate). This set replete with thousands of additional predictions will serve as a powerful guide for future studies of regulatory patterns and their functional roles.

Then, in Chapter 4, I outline a method developed to predict disease susceptibility due to gene mis-regulation. The method interrogates ensembles of conserved binding sites of regulatory factors disrupted by an individual's variants and then looks for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each is reflective of their very different medical histories. These results suggest that erosion of gene regulation results in function specific mutation loads that manifest as disease predispositions in a familial lineage. Additionally, this aggregate analysis method addresses the problem that although many human diseases have a genetic component involving many loci, the majority of studies are statistically underpowered to isolate the many contributing loci.

Finally, I conclude in Chapter 5 with a summary of my findings throughout my research and future directions of research based on my findings.

APA, Harvard, Vancouver, ISO, and other styles
47

Ganesan, Abhishekapriya. "The role of RFX-target genes in neurodevelopmental and psychiatric disorders." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-445493.

Full text
Abstract:
Neurodevelopmental disorders such as autism spectrum disorder (ASD) and psychiatric disorders, for example, schizophrenia (SCZ) represent a large spectrum of disorders that manifest through cognitive and behavioural problems. ASD and SCZ are both highly heritable, and some phenotypic similarities between ASD and SCZ have sparked an interest in understanding their genetic commonalities. The genetics of both disorders exhibit significant heterogeneity. Developments in genomics and systems biology continually increase our understanding of these disorders. Recently, pathogenic genetic variants in the regulatory factor X (RFX) family of transcription factors have been identified in a number of ASD cases. In this thesis, common genetic variants and expression patterns of genes identified to have a conserved promoter X-Box motif region, a binding site of RFX factors, are studied. Significant common variants identified through expression quantitative trait loci (eQTLs) and genome wide association studies (GWAS) are mapped to the regulatory regions of these genes and analysed for putative enrichment. In addition, single-cell RNA sequencing data is utilised to examine enrichment of cell types having high X-Box gene expression in the developing human cortex. Through the study, genes that have eQTLs or SNPs in the genomic regulatory regions of the X-Box genes have been identified. While there were no eQTLs or GWAS SNPs in the X-Box motifs, in the X-Box promoter regions some common variants were found. By hypergeometric distribution testing and the subsequent p-values obtained, all of these distributions are statistically under-enriched. Further, major cell types in the cortical region with increased expression of the X-Box genes and most expressed genes among these enriched cell types have been identified. Among the 11 cell types seven were found to be enriched for X-Box genes and many of the most expressed genes in these cell-types were similar.
A further study into the cell types and genes identified, along with additional systems biological data analysis, could reveal a larger list of X-Box genes involved in ASD and SCZ and the specific roles of these genes.
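The hypergeometric test mentioned in this abstract can be written out in a few lines. The sketch below is a generic illustration and not the thesis's analysis: the symbols `N` (background regions), `K` (background regions carrying a variant), `n` (regions in the set of interest) and `k` (regions in that set carrying a variant), as well as the toy numbers, are assumptions. A small lower-tail probability indicates under-enrichment, a small upper-tail probability indicates enrichment.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) when drawing n of N items without replacement, K of them marked."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def lower_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X <= k): small values suggest under-enrichment."""
    lowest = max(0, n - (N - K))  # fewest marked items a draw can contain
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))


def upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): small values suggest enrichment."""
    highest = min(K, n)  # most marked items a draw can contain
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))


# Toy numbers (hypothetical): 10 regions, 5 with a variant, 5 regions drawn,
# none of the drawn regions carrying a variant.
p_under = lower_tail(0, N=10, K=5, n=5)
p_over = upper_tail(0, N=10, K=5, n=5)
```

For genome-scale counts a statistics library is preferable to this direct formula, since the exact integer binomials become very large.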
APA, Harvard, Vancouver, ISO, and other styles
48

Antonsson, Elin, William Eulau, Louise Fitkin, Jennifer Johansson, Fredrik Levin, Sara Lundqvist, and Elin Palm. "Framtidens biomarkörer : En prioritering av proteinerna i det humana plasmaproteomet." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-384709.

Full text
Abstract:
In this report, we rank possible protein biomarkers based on different criteria for use in Olink Proteomics’ protein panels. We started off with a list compiled through the Human Plasma Proteome Project (HPPP) and have in different ways used this to obtain the final results. To complete this task we compared the list with Olink’s and its competitors’ protein catalogs, identified diseases beyond Olink’s coverage and the proteins linked with these. We also created a scoring system used to facilitate detection of good biomarkers. From this, we have concluded that Olink should focus on proteins that the competitors have in their catalogs and proteins that can be found in many pathways and are linked with many diseases. From each of the methods used, we have been able to identify a number of proteins that we recommend Olink to investigate further.
APA, Harvard, Vancouver, ISO, and other styles
49

Zacharouli, Markella-Achilleia. "Characterization of immune infiltrate in early breast cancer based on a multiplex imaging method." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-417716.

Full text
Abstract:
Breast cancer is the most common type of cancer among women worldwide. Multiple studies have reported the role of tumor-immune interactions and mechanisms that the immune system uses to combat tumor cells. Therapies based on the immune response are evolving over time, but more research is required to understand and identify the patterns and relationships within the tumor microenvironment. This study aims to characterize immune cell expression patterns using a multiplex method and to investigate the way different subpopulations in breast cancer patients’ tissue samples are correlated with clinicopathological characteristics. The results of this study indicate that there must be an association between immune cell composition and clinicopathological characteristics (Estrogen Receptor Status (ER+/ER-), Progesterone Receptor (PR+/PR-), Grade (I, II, III), which is a way to characterize the cancer cells on how similar they look to normal ones, Menopause, Tumor size, Nodal status, HR status, HER2), but validation in a larger patient population is required in order to evaluate the role of the immune infiltration as a predictive / prognostic biomarker in early breast cancer.
APA, Harvard, Vancouver, ISO, and other styles
50

Fernandez, Daniel. "Cell States and Cell Fate: Statistical and Computational Models in (Epi)Genomics." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:14226043.

Full text
Abstract:
This dissertation develops and applies several statistical and computational methods to the analysis of Next Generation Sequencing (NGS) data in order to gain a better understanding of our biology. In the rest of the chapter we introduce key concepts in molecular biology, and recent technological developments that help us better understand this complex science, which, in turn, provide the foundation and motivation for the subsequent chapters. In the second chapter we present the problem of estimating gene/isoform expression at the allelic level, and different models to solve this problem. First, we describe the observed data and the computational workflow to process the data. Next, we propose frequentist and Bayesian models motivated by the central dogma of molecular biology and the data generating process (DGP) for RNA-Seq. We develop EM and Gibbs sampling approaches to estimate gene and transcript-specific expression from our proposed models. Finally, we present the performance of our models in simulations and we end with the analysis of experimental RNA-Seq data at the allelic level. In the third chapter we present our paired factorial experimental design to study parentally biased gene/isoform expression in the mouse cerebellum, and dynamic changes of this pattern between young and adult stages of cerebellar development. We present a Bayesian variable selection model to estimate the difference in expression between the paternal and maternal genes, while incorporating relevant factors and their interactions into the model. Next, we apply our model to our experimental data, and further on we validate our predictions using pyrosequencing follow-up experiments. We subsequently applied our model to the pyrosequencing data across multiple brain regions. Our method, combined with the validation experiments, allowed us to find novel imprinted genes, and investigate, for the first time, imprinting dynamics across brain regions and across development.
In the fourth chapter we move from the controlled experiments in mouse isogenic lines to the highly variant world of human genetics in observational studies. In this chapter we introduce a Bayesian Regression Allelic Imbalance Model, BRAIM, that estimates the imbalance coming from two major sources: cis-regulation and imprinting. We model the cis-effect as an additive effect for the heterozygous group and we model the parent-of-origin effect with a latent variable that indicates to which parent a given allele belongs. Next, we show the performance of the model under simulation scenarios, and finally we apply the model to several experiments across multiple tissues and multiple individuals. In the fifth chapter we characterize the transcriptional regulation and gene expression of in-vitro Embryonic Stem Cells (ESCs), and two related in-vivo cells: the Inner Cell Mass (ICM) tissue, and the embryonic tissue at day 6.5. Our objective is twofold. First, we would like to understand the differences in gene expression between the ESCs and their in-vivo counterpart from where these cells were derived (ICM). Second, we want to characterize the active transcriptional regulatory regions using several histone modifications and to connect such regulatory activity with gene expression. In this chapter we used several statistical and computational methods to analyze and visualize the data, and it provides a good showcase of how, by combining several methods of analysis, we can delve into interesting developmental biology.
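The EM approach to expression estimation mentioned in the second chapter can be illustrated with the textbook version of the problem, in which some reads are compatible with more than one transcript. The sketch below is not the dissertation's model: it ignores alleles, transcript lengths and sequencing errors, and the function name `em_abundance` and the toy read set are assumptions.

```python
from typing import List, Sequence, Tuple


def em_abundance(reads: Sequence[Tuple[int, ...]], n_transcripts: int, n_iter: int = 200) -> List[float]:
    """Estimate relative transcript abundances from ambiguous read assignments.

    Each read is given as the tuple of transcript indices it is compatible with.
    """
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        expected = [0.0] * n_transcripts
        for compatible in reads:
            total = sum(theta[t] for t in compatible)
            for t in compatible:
                expected[t] += theta[t] / total
        # M-step: abundances are the normalised expected read counts.
        theta = [count / len(reads) for count in expected]
    return theta


# Toy data (hypothetical): 6 reads unique to transcript 0, 2 unique to
# transcript 1, and 4 reads compatible with both.
reads = [(0,)] * 6 + [(1,)] * 2 + [(0, 1)] * 4
theta = em_abundance(reads, n_transcripts=2)
```

In this toy case the ambiguous reads end up divided in the same 3:1 ratio as the uniquely assigned reads, so the estimates settle at about 0.75 and 0.25.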
APA, Harvard, Vancouver, ISO, and other styles