Dissertations / Theses: 'Multiple alignment'

1

Starrett, Dean. "Optimal Alignment of Multiple Sequence Alignments." Diss., The University of Arizona, 2008. http://hdl.handle.net/10150/194840.

Abstract:

An essential tool in biology is the alignment of multiple sequences. Biologists use multiple sequence alignments for tasks such as predicting protein structure and function, reconstructing phylogenetic trees, and finding motifs. Constructing high-quality multiple alignments is computationally hard, both in theory and in practice, and is typically done using heuristic methods. The majority of state-of-the-art multiple alignment programs employ a form-and-polish strategy: in the construction phase, an initial multiple alignment is formed by progressively merging smaller alignments, starting with single sequences; then, in a local-search phase, the resulting alignment is polished by repeatedly splitting it into smaller alignments and re-merging. This merging of alignments, the basic computational problem in the construction and local-search phases of the best multiple alignment heuristics, is called the Aligning Alignments Problem. Under the sum-of-pairs objective for scoring multiple alignments, this problem may seem to be a simple extension of two-sequence alignment. It is proven here, however, that with affine gap costs (which are recognized as necessary to get biologically informative alignments) the problem is NP-complete when gaps are counted exactly. Interestingly, this form of multiple alignment is polynomial-time solvable when we relax the exact count, showing that exact gap counts themselves are inherently hard in multiple sequence alignment. Unlike general multiple alignment, however, we show that Aligning Alignments with affine gap costs and exact counts is tractable in practice, by demonstrating an effective algorithm and a fast implementation. Our software AlignAlign is both time- and space-efficient on biological data. Computational experiments on biological data show that instances derived from standard benchmark suites can be optimally aligned with surprising efficiency, and experiments on simulated data show that time and space both scale well.
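To make the sum-of-pairs objective with affine gap costs concrete, the following is a minimal Python sketch. It is not AlignAlign and does not solve the Aligning Alignments Problem; it only scores a given alignment. The function names, the unit mismatch cost, and the gap-open and gap-extend values are illustrative assumptions. Gaps are counted exactly by looking at each induced pairwise alignment, that is, after dropping columns in which both rows hold a gap.

```python
from itertools import combinations

GAP = "-"

def pair_cost(row_a, row_b, mismatch=1, gap_open=3, gap_extend=1):
    """Affine-gap cost of the pairwise alignment induced by two rows.

    All cost parameters are illustrative assumptions, not values from the thesis.
    """
    cost = 0
    prev = None  # row that held the gap in the previous kept column, or None
    for x, y in zip(row_a, row_b):
        if x == GAP and y == GAP:
            continue  # this column vanishes in the induced pairwise alignment
        if x == GAP or y == GAP:
            state = "a" if x == GAP else "b"
            if state != prev:
                cost += gap_open  # a new gap starts here
            cost += gap_extend
            prev = state
        else:
            if x != y:
                cost += mismatch
            prev = None
    return cost

def sum_of_pairs(alignment, **params):
    """Sum the induced pairwise costs over all pairs of rows."""
    return sum(pair_cost(a, b, **params) for a, b in combinations(alignment, 2))

total = sum_of_pairs(["AC-GT", "A--GT", "ACCGT"])
```

Note how `pair_cost("AC-GT", "A---T")` charges a single gap opening: the middle column is dropped, so the two gap characters on either side of it form one gap of length two in the induced alignment.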

2

Sammeth, Michael. "Integrated multiple sequence alignment." [S.l.] : [s.n.], 2005. http://deposit.ddb.de/cgi-bin/dokserv?idn=98148767X.

3

Auer, Jens. "Metaheuristic Multiple Sequence Alignment Optimisation." Thesis, University of Skövde, School of Humanities and Informatics, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-899.

Abstract:

The ability to tackle NP-hard problems has been greatly extended by the introduction of metaheuristics (see Blum & Roli (2003) for a summary of most metaheuristics): general, problem-independent optimisation algorithms that extend the hill-climbing local search approach to escape local minima. One of these algorithms is Iterated Local Search (ILS) (Lourenco et al., 2002; Stützle, 1999a, p. 25ff), a recent, easy-to-implement but powerful algorithm with results comparable or superior to other state-of-the-art methods for many combinatorial optimisation problems, among them the Traveling Salesman Problem (TSP) and the Quadratic Assignment Problem (QAP). ILS iteratively samples local minima by modifying the current local minimum and restarting a local search procedure on this modified solution. This thesis will show how ILS can be implemented for multiple sequence alignment (MSA). After that, ILS will be evaluated and compared to other MSA algorithms using BAliBASE (Thomson et al., 1999), a set of manually refined alignments used in most recent publications of algorithms and in at least two MSA algorithm surveys. The runtime behaviour will be evaluated using runtime distributions.

The quality of alignments produced by ILS is at least as good as that of the best algorithms available and significantly superior to previously published metaheuristics for MSA, Tabu Search and Genetic Algorithm (SAGA). On average, ILS performed best in five out of eight test cases, second in one test set and third in the remaining two. A drawback of all iterative methods for MSA is the long runtime needed to produce good alignments. ILS needs considerably less runtime than Tabu Search and SAGA, but cannot compete with progressive or consistency-based methods, e.g. ClustalW or T-COFFEE.
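The ILS loop described in this abstract (perturb the current local minimum, then restart local search from the perturbed solution) can be sketched generically in Python. This is not the thesis implementation: the toy objective over the integers 0..100, the perturbation, and the acceptance rule are all hypothetical stand-ins. For MSA, the solution would be an alignment and the cost an alignment score to optimise.

```python
import random

def local_search(x, cost, neighbors):
    """Greedy descent: move to the best neighbor until none improves the cost."""
    while True:
        best = min(neighbors(x), key=cost)
        if cost(best) >= cost(x):
            return x  # x is a local minimum
        x = best

def iterated_local_search(start, cost, neighbors, perturb, iterations=200, seed=0):
    """Sample local minima by perturbing the current one and re-running local search."""
    rng = random.Random(seed)
    current = local_search(start, cost, neighbors)
    for _ in range(iterations):
        candidate = local_search(perturb(current, rng), cost, neighbors)
        if cost(candidate) <= cost(current):  # accept only non-worsening minima
            current = candidate
    return current

# Hypothetical toy objective: every multiple of 5 is a local minimum,
# and 40 is the global minimum.
def toy_cost(x):
    return abs(x - 42) + (0 if x % 5 == 0 else 15)

def toy_neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 100]

def toy_perturb(x, rng):
    return min(100, max(0, x + rng.randint(-10, 10)))

result = iterated_local_search(0, toy_cost, toy_neighbors, toy_perturb)
```

Plain local search started at 0 stays at 0, because both the start and every other multiple of 5 are local minima; the perturbation step is what lets the search move between them. The acceptance criterion used here (keep the candidate only if it is no worse) is one common choice among several.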

4

Siu, Wing-yan (蕭穎欣). "Multiple structural alignment for proteins." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2008. http://hub.hku.hk/bib/B4068748X.

5

Siu, Wing-yan. "Multiple structural alignment for proteins." Thesis, The University of Hong Kong, 2008. http://sunzi.lib.hku.hk/hkuto/record/B4068748X.

6

Carroll, Hyrum D. "Biologically Relevant Multiple Sequence Alignment." Diss., Brigham Young University, 2008. http://contentdm.lib.byu.edu/ETD/image/etd2623.pdf.

7

Nguyen, Ken D. "Multiple Biological Sequence Alignment: Scoring Functions, Algorithms, and Evaluations." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/cs_diss/62.

Abstract:

Aligning multiple biological sequences such as protein sequences or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. Unfortunately, multiple sequence alignment (MSA) is NP-complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes. In this dissertation, we have designed a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates stereochemical properties of sequence residues and their substitution probabilities into a tree-structured scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm for use in our three new multiple sequence alignment algorithms. One of our alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in a progressive fashion. The use of a dynamic weighted tree allows errors in the early alignment stages to be corrected in subsequent stages. The other two algorithms utilize sequence knowledge bases and sequence consistency to produce biologically meaningful sequence alignments. To improve the speed of multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Analytically, our parallel algorithm is the fastest progressive multiple sequence alignment algorithm.

8

Zola, Jaroslaw. "Parallel server for multiple sequence alignment." Grenoble INPG, 2005. http://www.theses.fr/2005INPG0187.

Abstract:

In this doctoral work we investigate the application of parallel processing techniques, web caching in particular, as a means of improving the efficiency of multiple sequence alignment. We develop a generic framework for managing a local or distributed cache, and we design a decentralised caching system that stores intermediate results as well as sequence alignments. Finally, we build a parallel server for multiple sequence alignment that uses these techniques to speed up the processing of large sequence sets (thousands of sequences, each thousands of base pairs long). The server is based on PhylTree, a generic scheme developed at the Laboratory ID-IMAG that constructs the alignment and the phylogeny simultaneously. The caching system has been implemented; the software is available and has been used outside the laboratory for several other applications. We also propose some extensions of PhylTree, for example the application of simulated annealing to improve the efficiency of phylogenetic analysis.

9

DeBlasio, Daniel Frank. "Parameter Advising for Multiple Sequence Alignment." Diss., The University of Arizona, 2016. http://hdl.handle.net/10150/612932.

Abstract:

The problem of aligning multiple protein sequences is essential to many biological analyses, but most standard formulations of the problem are NP-complete. Due to both the difficulty of the problem and its practical importance, there are many heuristic multiple sequence aligners that a researcher has at their disposal. A basic issue that frequently arises is that each of these alignment tools has a multitude of parameters that must be set, and which greatly affect the quality of the alignment produced. Most users rely on the default parameter setting that comes with the aligner, which is optimal on average, but can produce a low-quality alignment for the given inputs. This dissertation develops an approach called parameter advising to find a parameter setting that produces a high-quality alignment for each given input. A parameter advisor aligns the input sequences for each choice in a collection of parameter settings, and then selects the best alignment from the resulting alignments produced. A parameter advisor has two major components: (i) an advisor set of parameter choices that are given to the aligner, and (ii) an accuracy estimator that is used to rank alignments produced by the aligner. Alignment accuracy is measured with respect to a known reference alignment; in practice a reference alignment is not available, and we can only estimate accuracy. We develop a new accuracy estimator that we call Facet (short for "feature-based accuracy estimator") that computes an accuracy estimate as a linear combination of efficiently computable feature functions, whose coefficients are learned by solving a large-scale linear programming problem. We also develop an efficient approximation algorithm for finding an advisor set of a given cardinality for a fixed estimator; the cardinality should ideally be small, as the aligner is invoked for each parameter choice in the set.
Using Facet for parameter advising boosts advising accuracy by almost 20% beyond using a single default parameter choice for the hardest-to-align benchmarks. This dissertation further applies parameter advising in two ways: (i) to ensemble alignment, which uses the advising process on a collection of aligners to choose both the aligner and its parameter settings, and (ii) to adaptive local realignment, which can align different regions of the input sequences with distinct parameter choices to conform to mutation rates as they vary across the lengths of the sequences.
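The two components of a parameter advisor described above (an advisor set and an accuracy estimator that is a linear combination of feature values) can be sketched as follows. This is a minimal illustration, not Facet: the aligner, the feature functions, the feature names, and the coefficient values are all hypothetical stand-ins introduced so that the example runs on its own.

```python
def estimate_accuracy(features, coefficients):
    """Linear combination of feature values, both given as dicts keyed by feature name."""
    return sum(weight * features[name] for name, weight in coefficients.items())

def advise(sequences, advisor_set, align, featurize, coefficients):
    """Align once per parameter choice and keep the alignment with the highest estimate."""
    best = None
    for params in advisor_set:
        alignment = align(sequences, params)
        estimate = estimate_accuracy(featurize(alignment), coefficients)
        if best is None or estimate > best[0]:
            best = (estimate, params, alignment)
    return best

# Hypothetical stand-ins: a fake aligner and fake feature functions.
def fake_align(sequences, params):
    return {"rows": list(sequences), "gap_open": params["gap_open"]}

def fake_featurize(alignment):
    return {
        "feature_a": 1.0 - abs(alignment["gap_open"] - 10) / 10,
        "feature_b": alignment["gap_open"] / 100,
    }

coefficients = {"feature_a": 0.8, "feature_b": -0.2}  # made-up values, not learned
advisor_set = [{"gap_open": 5}, {"gap_open": 10}, {"gap_open": 15}]

estimate, params, alignment = advise(
    ["ACGT", "AGT"], advisor_set, fake_align, fake_featurize, coefficients
)
```

Because the aligner runs once per entry in `advisor_set`, the running time grows linearly with the size of the set, which is why the abstract stresses keeping its cardinality small.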

10

Guasco, Luciano M. "Multiple sequence alignment correction using constraints." Master's thesis, Faculdade de Ciências e Tecnologia, 2010. http://hdl.handle.net/10362/5143.

Abstract:

Work presented within the scope of the European Master in Computational Logics, as a partial requirement for obtaining the degree of Master in Computational Logics.
One of the most important fields in bioinformatics has been the study of protein sequence alignments. The study of homologous proteins, related by evolution, shows the conservation of many amino acids because of their functional and structural importance. One particular relationship between amino acid sites, in the same sequence or in different sequences, is protein coevolution, interest in which has increased as a consequence of the mathematical and computational methods used to understand the spatial, functional and evolutionary dependencies between amino acid sites. The principle of coevolution means that some amino acids are related through evolution because mutations in one site can create evolutionary pressures to select compensatory mutations in other sites that are functionally or structurally related. Using current methods to detect coevolution, specifically mutual information techniques from information theory, we show in this work that much of the information between coevolved sites is lost because of mistakes in the multiple sequence alignment of variable regions. Moreover, we show that using these statistical methods to detect coevolved sites in multiple sequence alignments results in a high rate of false positives. Because of the number of errors in the detection of coevolved sites from multiple sequence alignments, we propose a method to improve the detection of coevolved sites, and we implement an algorithm that fixes such sites by correcting the misalignment produced at those specific locations. The detection part of our work is based on the mutual information between sites that are presumed to have coevolved because of their high statistical correlation score. With this information we search for possible misalignments in those regions caused by the incorrect matching of amino acids during the alignment.
The re-alignment part is based on constraint programming techniques, to avoid the combinatorial complexity that arises when one amino acid can be aligned with many others and to avoid inconsistencies in the alignments. In this work, we present a framework to impose constraints over the sequences, and we show how it is possible to compute alignments based on different criteria just by setting constraints between the amino acids. This framework can be applied not only to improve the alignment and detection of coevolved regions, but also to any desired constraints that may be used to express functional or structural relations among the amino acids in multiple sequences. We also show that after we fix these misalignments using constraint-based techniques, the correlation between coevolved sites increases and, in general, the new alignment is closer to the correct alignment than the original MSA. Finally, we outline possible lines of future research aimed at overcoming some drawbacks detected during this work.
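The mutual information score mentioned above can be computed between two alignment columns as in the following Python sketch. It uses the textbook definition with plain frequency estimates; the function names are illustrative, and treating the gap character as an ordinary symbol is a simplifying assumption rather than something stated in the thesis.

```python
from collections import Counter
from math import log2

def column(alignment, i):
    """Return column i of an alignment given as a list of equal-length rows."""
    return [row[i] for row in alignment]

def mutual_information(col_x, col_y):
    """Mutual information in bits between two columns, from observed frequencies."""
    n = len(col_x)
    px = Counter(col_x)
    py = Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

alignment = ["AKL", "AKL", "GEL", "GEL"]
covarying = mutual_information(column(alignment, 0), column(alignment, 1))
conserved = mutual_information(column(alignment, 0), column(alignment, 2))
```

In this toy alignment the first two columns change together (A with K, G with E), so their mutual information is one bit, while a fully conserved column carries no information about any other column. A misaligned residue in either column would break the pairing and lower the score, which is the effect the abstract describes.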

11

Ye, Yongtao (叶永滔). "Aligning multiple sequences adaptively." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/206465.

Abstract:

With the rapid development of genome sequencing, an ever-increasing number of molecular biology analyses, such as motif detection, phylogeny inference and structure prediction, rely on the construction of an accurate multiple sequence alignment (MSA). Although many methods have been developed during the last two decades, most of them may perform poorly on some types of inputs, in particular when families of sequences fall below thirty percent similarity. Therefore, this thesis introduces two effective approaches to improve the overall quality of multiple sequence alignment. First, by considering the similarity of the input sequences, we propose an adaptive approach to compute better substitution matrices for each pair of sequences, and then apply the progressive alignment method to align them. For example, for inputs with high similarity, we consider the whole sequences and align them with a global pair hidden Markov model, while for those with moderately low similarity, we may ignore the flank regions and use local pair hidden Markov models to align them. To test the effectiveness of this approach, we have implemented a multiple sequence alignment tool called GLProbs and compared its performance with a dozen leading tools on three benchmark alignment databases; GLProbs' alignments have the best scores in almost all tests. We have also evaluated the practicability of GLProbs' alignments by applying the tool to three biological applications, namely phylogenetic tree reconstruction, protein secondary structure prediction and the detection of high-risk members for cervical cancer in the HPV-E6 family, and the results are very encouraging. Second, based on our previous study, we propose another new tool, PnpProbs, which constructs better multiple sequence alignments through better handling of guide trees. It classifies input sequences into two types: normally related sequences and distantly related sequences.
For normally related sequences, it uses an adaptive approach to construct the guide tree and, based on this guide tree, aligns the sequences progressively. To be more precise, it first estimates the input's discrepancy by computing the standard deviation of the sequences' percent identities, and based on this estimate, it chooses the best method to construct the guide tree. For distantly related sequences, PnpProbs abandons the guide tree; instead it uses the non-progressive sequence annealing method to construct the multiple sequence alignment. By combining the strengths of the progressive and non-progressive methods, and with a better way to construct the guide tree, PnpProbs significantly improves the quality of multiple sequence alignments, not only for general input sequences but also for those that are very distantly related. With these encouraging empirical results, our software tools are gradually gaining recognition in the community. For example, GLProbs has been incorporated into the JAva Bioinformatics Analysis Web Services system (JABAWS).
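The discrepancy estimate described above (the standard deviation of pairwise percent identities, used to pick a guide-tree method) might look roughly like this in Python. This is a hedged sketch, not PnpProbs code: it assumes the sequences are already padded to equal length (for instance, rows taken from pairwise alignments), uses the population standard deviation, and the threshold and the labels `method_a` and `method_b` are placeholders because the abstract does not name the actual methods or cutoff.

```python
from itertools import combinations
from statistics import pstdev

def percent_identity(row_a, row_b):
    """Percent of compared positions that hold identical residues.

    Assumes equal-length rows; positions where both rows have a gap are skipped.
    """
    pairs = [(x, y) for x, y in zip(row_a, row_b) if not (x == "-" and y == "-")]
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)

def choose_guide_tree_method(rows, threshold=10.0):
    """Return a placeholder method label and the spread of percent identities."""
    identities = [percent_identity(a, b) for a, b in combinations(rows, 2)]
    spread = pstdev(identities)
    # Hypothetical decision rule: the real cutoff and methods are not given in the abstract.
    method = "method_a" if spread < threshold else "method_b"
    return method, spread

method, spread = choose_guide_tree_method(["ACGT", "ACGT", "ACGA"])
```

Here the three pairwise identities are 100, 75 and 75 percent, so the spread is large relative to the placeholder threshold and the second label is returned.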
Degree: Master of Philosophy (Computer Science)

12

Lu, Yue. "Improving the quality of multiple sequence alignment." [College Station, Tex. : Texas A&M University, 2008. http://hdl.handle.net/1969.1/ETD-TAMU-3111.

13

Rodrigues, Kaio Wagner Lima. "Removing DUST using multiple alignment of sequences." Universidade Federal do Amazonas, 2016. https://tede.ufam.edu.br/handle/tede/6557.

Abstract:

FAPEAM - Fundação de Amparo à Pesquisa do Estado do Amazonas
A large number of URLs collected by web crawlers correspond to pages with duplicate or near-duplicate contents. These duplicate URLs, generically known as DUST (Different URLs with Similar Text), adversely impact search engines, since crawling, storing and using such data wastes resources, produces low-quality rankings and degrades the user experience. To deal with this problem, several studies have proposed methods to detect and remove duplicate documents without fetching their contents. To accomplish this, the proposed methods learn normalization rules that transform all duplicate URLs into the same canonical form. This information can be used by crawlers to avoid fetching DUST. A challenging aspect of this strategy is to efficiently derive the minimum set of rules that achieves the largest reductions with the smallest false-positive rate. As most methods are based on pairwise analysis, the quality of the rules is affected by the criterion used to select the examples and by the availability of representative examples in the training sets. To avoid processing large numbers of URLs, they employ techniques such as random sampling, or look for DUST only within sites, which prevents the generation of rules involving multiple DNS names. As a consequence of these issues, current methods are very susceptible to noise and, in many cases, derive rules that are very specific. In this thesis, we present a new approach to deriving quality rules that takes advantage of a multi-sequence alignment strategy. We demonstrate that a full multi-sequence alignment of URLs with duplicated content, performed before the rules are generated, can lead to the deployment of very effective rules. Experimental results demonstrate that our approach achieved larger reductions in the number of duplicate URLs than our best baseline in two different web collections, while also being much faster. We also present a distributed version of our method, using the MapReduce framework, and demonstrate its scalability by evaluating it on a set of 7.37 million URLs.

14

Rausch, Tobias. "Dissecting multiple sequence alignment methods: the analysis, design and development of generic multiple sequence alignment components in SeqAn." Berlin: Freie Universität Berlin, 2010. http://d-nb.info/1024541460/34.

15

Lightner, Carin Ann. "A Tabu Search Approach to Multiple Sequence Alignment." NCSU, 2008. http://www.lib.ncsu.edu/theses/available/etd-05312008-191232/.

Abstract:

Sequence alignment methods are used to detect and quantify similarities between different DNA and protein sequences that may have evolved from a common ancestor. Effective sequence alignment methodologies also provide insight into the structure and function of a sequence and are the first step in constructing evolutionary trees. In this dissertation, we use a tabu search approach to multiple sequence alignment. A tabu search is a heuristic approach that uses adaptive memory features to align multiple sequences. The adaptive memory feature, a tabu list, helps the search process avoid locally optimal solutions and explore the solution space in an efficient manner. We develop two main tabu searches that progressively align sequences. A randomly generated bifurcating tree guides the alignment. The objective is to optimize the alignment score computed using either the sum-of-pairs or the parsimony scoring function. The use of a parsimony scoring function provides insight into the homology between sequences in the alignment. We also explore iterative refinement techniques such as a hidden Markov model and an intensification heuristic to further improve the alignment. This approach to multiple sequence alignment provides improved alignments as compared to several other methods.
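How a tabu list helps a search leave a local optimum can be shown with a small generic Python sketch. It is not the dissertation's algorithm: the toy objective over the integers 0..100, the neighborhood, and the tenure value are hypothetical stand-ins for alignments, alignment moves, and whatever memory settings the author used.

```python
from collections import deque

def tabu_search(start, cost, neighbors, tenure=5, iterations=100):
    """Always take the best non-tabu move, even when it worsens the cost."""
    current = best = start
    tabu = deque([start], maxlen=tenure)  # short-term memory of recent solutions
    for _ in range(iterations):
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)
        tabu.append(current)
        if cost(current) < cost(best):
            best = current  # remember the best solution seen so far
    return best

# Hypothetical toy objective: every multiple of 5 is a local minimum,
# and 40 is the global minimum.
def toy_cost(x):
    return abs(x - 42) + (0 if x % 5 == 0 else 15)

def toy_neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 100]

best = tabu_search(0, toy_cost, toy_neighbors)
```

A plain descent started at 0 would stop immediately, since 0 is itself a local minimum. Because recently visited solutions are forbidden, the tabu search is forced to keep moving, accepts worse intermediate solutions, and eventually records 40 as the best solution seen.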

16

Zhang, Xiaodong. "A Local Improvement Algorithm for Multiple Sequence Alignment." Ohio University / OhioLINK, 2003. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1049485762.

17

Sun, Hong. "DETECTING MULTIPLE PROTEIN FOLDING TRAJECTORIES AND STRUCTURAL ALIGNMENT." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1319744262.

18

Floden, Evan 1985. "Alignment uncertainty, regressive alignment and large scale deployment." Doctoral thesis, Universitat Pompeu Fabra, 2018. http://hdl.handle.net/10803/665300.

Abstract:

A multiple sequence alignment (MSA) provides a description of the relationship between biological sequences where columns represent a shared ancestry through an implied set of evolutionary events. The majority of research in the field has focused on improving the accuracy of alignments within the progressive alignment framework and has allowed for powerful inferences including phylogenetic reconstruction, homology modelling and disease prediction. Notwithstanding this, when applied to modern genomics datasets, often comprising tens of thousands of sequences, new challenges arise in the construction of accurate MSAs. These issues can be generalised to form three basic problems. Foremost, as the number of sequences increases, progressive alignment methodologies exhibit a dramatic decrease in alignment accuracy. Additionally, for any given dataset many possible MSA solutions exist, a problem which is exacerbated with an increasing number of sequences due to alignment uncertainty. Finally, technical difficulties hamper the deployment of such genomic analysis workflows, especially in a reproducible manner, often presenting a high barrier for even skilled practitioners. This work aims to address this trifecta of problems through a web server for fast homology-extension-based MSA, two new methods for improved phylogenetic bootstrap supports incorporating alignment uncertainty, a novel alignment procedure that improves large-scale alignments, termed regressive MSA, and finally a workflow framework that enables the deployment of large-scale reproducible analyses across clusters and clouds, titled Nextflow. Together, this work can be seen to provide both conceptual and technical advances which deliver substantial improvements to existing MSA methods and the resulting inferences.

19

Zhang, Ching. "Genetic algorithm approaches for efficient multiple molecular sequence alignment." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape17/PQDD_0013/NQ30660.pdf.

20

Zhou, Rong. "Memory-efficient graph search applied to multiple sequence alignment." Diss., Mississippi State : Mississippi State University, 2005. http://library.msstate.edu/etd/show.asp?etd=etd-06282005-015428.

21

Lloyd, G. Scott. "Accelerated Large-Scale Multiple Sequence Alignment with Reconfigurable Computing." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/2729.

Full text

Abstract:

Multiple Sequence Alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. The time to compute an optimal MSA grows exponentially with respect to the number of sequences. Consequently, producing timely results on large problems requires more efficient algorithms and the use of parallel computing resources. Reconfigurable computing hardware provides one approach to the acceleration of biological sequence alignment. Other acceleration methods typically encounter scaling problems that arise from the overhead of inter-process communication and from the lack of parallelism. Reconfigurable computing allows a greater scale of parallelism with many custom processing elements that have a low-overhead interconnect. The proposed parallel algorithms and architecture accelerate the most computationally demanding portions of MSA. An overall speedup of up to 150 has been demonstrated on a large data set when compared to a single processor. The reduced runtime for MSA allows researchers to solve the larger problems that confront biologists today.

22

Herman, Joseph L. "Multiple sequence analysis in the presence of alignment uncertainty." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:88a56d9f-a96e-48e3-b8dc-a73f3efc8472.

Full text

Abstract:

Sequence alignment is one of the most intensely studied problems in bioinformatics, and is an important step in a wide range of analyses. An issue that has gained much attention in recent years is the fact that downstream analyses are often highly sensitive to the specific choice of alignment. One way to address this is to jointly sample alignments along with other parameters of interest. In order to extend the range of applicability of this approach, the first chapter of this thesis introduces a probabilistic evolutionary model for protein structures on a phylogenetic tree; since protein structures typically diverge much more slowly than sequences, this allows for more reliable detection of remote homologies, improving the accuracy of the resulting alignments and trees, and reducing sensitivity of the results to the choice of dataset. In order to carry out inference under such a model, a number of new Markov chain Monte Carlo approaches are developed, allowing for more efficient convergence and mixing on the high-dimensional parameter space. The second part of the thesis presents a directed acyclic graph (DAG)-based approach for representing a collection of sampled alignments. This DAG representation allows the initial collection of samples to be used to generate a larger set of alignments under the same approximate distribution, enabling posterior alignment probabilities to be estimated reliably from a reasonable number of samples. If desired, summary alignments can then be generated as maximum-weight paths through the DAG, under various types of loss or scoring functions. The acyclic nature of the graph also permits various other types of algorithms to be easily adapted to operate on the entire set of alignments in the DAG. In the final part of this work, methodology is introduced for alignment-DAG-based sequence annotation using hidden Markov models, and RNA secondary structure prediction using stochastic context-free grammars. 
Results on test datasets indicate that the additional information contained within the DAG allows for improved predictions, resulting in substantial gains over simply analysing a set of alignments one by one.
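
The idea of reading a summary alignment off as a maximum-weight path through a DAG can be illustrated with a small sketch. It is not the thesis's implementation: the edge list format, the function name `max_weight_path`, and the use of plain numeric edge weights (standing in for whatever loss or scoring function is chosen) are all assumptions made for illustration.

```python
from collections import defaultdict


def max_weight_path(edges, source, sink):
    """Return (total_weight, path) for the heaviest source-to-sink path.

    `edges` is an iterable of (u, v, weight) triples describing a DAG.
    This is a hypothetical, simplified stand-in for an alignment DAG.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: produce a topological order of the nodes.
    order = []
    ready = [n for n in nodes if indeg[n] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # Relax edges in topological order, keeping the best predecessor.
    best = {source: 0}
    back = {}
    for u in order:
        if u not in best:
            continue  # not reachable from the source
        for v, w in succ[u]:
            if v not in best or best[u] + w > best[v]:
                best[v] = best[u] + w
                back[v] = u

    if sink not in best:
        raise ValueError("sink is not reachable from source")

    # Walk the predecessor links back from the sink.
    path = [sink]
    while path[-1] != source:
        path.append(back[path[-1]])
    return best[sink], path[::-1]
```

Because the graph is acyclic, one pass in topological order suffices, which is the property the abstract points to when it says other algorithms adapt easily to the DAG.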

23

Ahmed, Nova. "Parallel Algorithm for Memory Efficient Pairwise and Multiple Genome Alignment in Distributed Environment." Digital Archive @ GSU, 2004. http://digitalarchive.gsu.edu/cs_theses/2.

Full text

Abstract:

Genome sequence alignment problems are very important from the perspective of computational biology. These problems involve large amounts of data and are both memory intensive and computation intensive. In the literature, two separate algorithms have been studied and improved: one is a pairwise sequence alignment algorithm, which aligns pairs of genome sequences with memory reduction and parallelism for the computation; the other is a multiple sequence alignment algorithm, which aligns multiple genome sequences and is also parallelized efficiently so that the workload of the alignment program is well distributed. The parallel applications can be launched in different environments, of which shared memory is very well suited to these kinds of applications. However, shared-memory environments are limited in memory capacity and scalability, and the machines are very costly. A better approach is to use a cluster of computers, and the cluster environment can be further extended to a grid environment so that scalability is improved by introducing multiple clusters. Here the grid environment is studied alongside the shared-memory and cluster environments for the two applications. For carefully designed algorithms, the grid environment is comparable in performance to other distributed environments, and it sometimes outperforms them because of the resource limitations those environments have.

24

Höhl, Michael. "Is multiple sequence alignment required for accurate inference of phylogeny? /." [St. Lucia, Qld.], 2006. http://www.library.uq.edu.au/pdfserve.php?image=thesisabs/absthe19790.pdf.

Full text

25

DeBlasio, Daniel. "NEW COMPUTATIONAL APPROACHES FOR MULTIPLE RNA ALIGNMENT AND RNA SEARCH." Master's thesis, University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4070.

Full text

Abstract:

In this thesis we explore the theory and history behind RNA alignment. Normal sequence alignments as studied by computer scientists can be completed in $O(n^2)$ time in the naive case. The process involves taking two input sequences and finding the list of edits that can transform one sequence into the other. This process is applied to biology in many forms, such as the creation of multiple alignments and the search of genomic sequences. When you take into account the RNA sequence structure, the problem becomes even harder. Multiple RNA structure alignment is particularly challenging because covarying mutations make sequence information alone insufficient. Existing tools for multiple RNA alignments first generate pair-wise RNA structure alignments and then build the multiple alignment using only the sequence information. Here we present PMFastR, an algorithm which iteratively uses a sequence-structure alignment procedure to build a multiple RNA structure alignment. PMFastR also has low memory consumption, allowing for the alignment of large sequences such as 16S and 23S rRNA. Specifically, we reduce the memory consumption to $\sim O(band^2*m)$ where $band$ is the banding size. Other solutions are $\sim O(n^2*m)$ where $n$ and $m$ are the lengths of the target and query respectively. The algorithm also provides a method to utilize a multi-core environment. We present results on benchmark data sets from BRAliBase, which show that PMFastR outperforms other state-of-the-art programs. Furthermore, we regenerate 607 Rfam seed alignments and show that our automated process creates similar multiple alignments to the manually-curated Rfam seed alignments. While these methods can also be applied directly to genome sequence search, the abundance of new multiple species genome alignments presents a new area for exploration. Many multiple alignments of whole genomes are available and these alignments keep growing in size. 
These alignments can provide more information to the searcher than just a single sequence. Using the methodology from sequence-structure alignment we developed AlnAlign, which searches an entire genome alignment using RNA sequence structure. While programs have been readily available to align alignments, this is the first to our knowledge that is specifically designed for RNA sequences. This algorithm is presented only in theory and is yet to be tested.
M.S.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science MS
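
The $O(n^2)$ pairwise step mentioned in the abstract, finding the cheapest list of edits that turns one sequence into the other, can be sketched as a textbook edit-distance dynamic program. This is only the plain sequence case with unit costs (an assumption); it is not PMFastR and it ignores RNA structure. The function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `a` into `b` (unit costs, an illustrative choice)."""
    # prev[j] holds the distance between the current prefix of `a`
    # (one character shorter) and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of `a`
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]
```

Keeping only two rows reduces memory to linear in the shorter dimension while the running time stays quadratic. Restricting the inner loop to a diagonal band is a separate memory-saving idea in the spirit of the banding the abstract describes, and is not shown here.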

26

Sarver, Michael. "STRUCTURE-BASED MULTIPLE RNA SEQUENCE ALIGNMENT AND FINDING RNA MOTIFS." Bowling Green State University / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1151076710.

Full text

27

DeBlasio, Dan, and John Kececioglu. "Core column prediction for protein multiple sequence alignments." BIOMED CENTRAL LTD, 2017. http://hdl.handle.net/10150/623957.

Full text

Abstract:

Background: In a computed protein multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference alignment are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the known three-dimensional structures of the proteins. Typically the accuracy of a protein multiple sequence alignment that has been computed for a benchmark is only measured with respect to the core columns of the reference alignment. When computing an alignment in practice, however, a reference alignment is not known, so the coreness of its columns can only be predicted. Results: We develop for the first time a predictor of column coreness for protein multiple sequence alignments. This allows us to predict which columns of a computed alignment are core, and hence better estimate the alignment's accuracy. Our approach to predicting coreness is similar to nearest-neighbor classification from machine learning, except we transform nearest-neighbor distances into a coreness prediction via a regression function, and we learn an appropriate distance function through a new optimization formulation that solves a large-scale linear programming problem. We apply our coreness predictor to parameter advising, the task of choosing parameter values for an aligner's scoring function to obtain a more accurate alignment of a specific set of sequences. We show that for this task, our predictor strongly outperforms other column-confidence estimators from the literature, and affords a substantial boost in alignment accuracy.

28

He, Jintai. "MULTIPLE SEQUENCES ALIGNMENT FOR PHYLOGENETIC TREE CONSTRUCTION USING GRAPHICS PROCESSING UNITS." Available to subscribers only, 2008. http://proquest.umi.com/pqdweb?did=1674095441&sid=1&Fmt=2&clientId=1509&RQT=309&VName=PQD.

Full text

Abstract:

Thesis (M.S.)--Southern Illinois University Carbondale, 2008.
"Department of Computer Science." Keywords: GPU computing, Sequence alignment. Includes bibliographical references (p. 34). Also available online.

29

Kemena, Carsten 1983. "Improving the accuracy and the efficiency of multiple sequence alignment methods." Doctoral thesis, Universitat Pompeu Fabra, 2012. http://hdl.handle.net/10803/128678.

Full text

Abstract:

Sequence alignment is one of the basic methods for comparing biological sequences and the cornerstone of a wide range of analyses, often serving as the first step in later studies. Because of this privileged position at the beginning of many studies, its accuracy is of great importance; in fact, every result based on an alignment depends heavily on the alignment's quality. This has been confirmed in several recent papers investigating the effect of the choice of alignment method on phylogenetic reconstruction and the estimation of positive selection. In this thesis, I present several projects dedicated to developing more accurate multiple sequence alignments and to evaluating them. Specifically, I addressed the evaluation of structural protein alignments, the accurate structural alignment of RNA sequences, and the alignment of large sequence data sets.

30

Elias, Isaac. "Computational problems in evolution : Multiple alignment, genome rearrangements, and tree reconstruction." Doctoral thesis, KTH, Numerical Analysis and Computer Science, NADA, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4170.

Full text

Abstract:

Reconstructing the evolutionary history of a set of species is a fundamental problem in biology. This thesis concerns computational problems that arise in different settings and stages of phylogenetic tree reconstruction, but also in other contexts. The contributions include:

• A new distance-based tree reconstruction method with optimal reconstruction radius and optimal runtime complexity. Included in the result is a greatly simplified proof that the NJ algorithm also has optimal reconstruction radius. (co-author Jens Lagergren)

• NP-hardness results for the most common variations of Multiple Alignment. In particular, it is shown that SP-score, Star Alignment, and Tree Alignment are NP-hard for all metric symbol distances over all binary or larger alphabets.

• A 1.375-approximation algorithm for Sorting By Transpositions (SBT). SBT is the problem of sorting a permutation using as few block-transpositions as possible. The complexity of this problem is still open and it was a ten-year-old open problem to improve the best known 1.5-approximation ratio. The 1.375-approximation algorithm is based on a new upper bound on the diameter of 3-permutations. Moreover, a new lower bound on the transposition diameter of the symmetric group is presented and the exact transposition diameter of simple permutations is determined. (co-author Tzvika Hartman)

• Approximation, fixed-parameter tractable, and fast heuristic algorithms for two variants of the Ancestral Maximum Likelihood (AML) problem: when the phylogenetic tree is known and when it is unknown. AML is the problem of reconstructing the most likely genetic sequences of extinct ancestors along with the most likely mutation probabilities on the edges, given the phylogenetic tree and sequences at the leaves. (co-author Tamir Tuller)

• An algorithm for computing the number of mutational events between aligned DNA sequences which is several hundred times faster than the famous Phylip packages. Since pairwise distance estimation is a bottleneck in distance-based phylogeny reconstruction, the new algorithm improves the overall running time of many distance-based methods by a factor of several hundred. (co-author Jens Lagergren)
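
To make the last contribution concrete, the quantity being estimated for each pair of aligned DNA sequences can be sketched as follows. This is a plain, unoptimized illustration and not the thesis's fast algorithm: it counts mismatching columns (the p-distance) and applies the standard Jukes-Cantor correction. Choosing Jukes-Cantor, skipping gapped columns, and the function names are all assumptions made for the example.

```python
import math


def p_distance(x: str, y: str, gap: str = "-") -> float:
    """Fraction of compared alignment columns in which x and y differ.

    Columns containing a gap in either sequence are skipped
    (an illustrative convention, not taken from the thesis).
    """
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(x, y):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site from p-distance."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

A distance-based method evaluates this for every pair of sequences, which is why the abstract describes pairwise distance estimation as a bottleneck.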

31

Wang, Jingjing. "A PARALLEL APPROACH TO MULTIPLE SEQUENCES ALIGNMENT AND PHYLOGENETIC TREE LABELING." OpenSIUC, 2010. https://opensiuc.lib.siu.edu/theses/246.

Full text

Abstract:

An evolutionary tree represents the relationships among a group of species or DNA or protein sequences, and plays a fundamental role in biological lineage research. High-quality tree construction relies heavily on optimal multiple sequence alignment (MSA), which aligns three or more sequences simultaneously to derive their similarity. Conversely, a good tree can also be used to guide the MSA process. Because of the high computational cost of both MSA and tree construction, parallel approaches are exploited to utilize the enormous computing power and memory housed in a supercomputer or Linux cluster. In this paper, a divide-and-conquer-based parallel algorithm is first designed and implemented to perform optimal three-sequence alignment with reduced memory cost. Second, all internal nodes of a phylogenetic tree produced by parallel maximum-likelihood inference software are labeled using the parallel MSA. This tree node labeling process is carried out from the top down and is also parallelized to fully utilize the numerous cores and nodes in a high-performance computing facility.
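
For orientation, optimal three-sequence alignment can be written as a dynamic program over a three-dimensional table. The sketch below is the plain sequential version with cubic time and memory, not the parallel, memory-reduced divide-and-conquer algorithm of the thesis. The sum-of-pairs column score, the scoring values, and the function names are assumptions chosen for illustration.

```python
from itertools import product


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Score one pair of symbols within an alignment column."""
    if a == "-" and b == "-":
        return 0  # two gaps facing each other cost nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def align3_score(x: str, y: str, z: str) -> int:
    """Optimal sum-of-pairs score for aligning three sequences."""
    # Each move says which of the three sequences contribute a symbol
    # to the next column; the all-gap column (0, 0, 0) is excluded.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    best = {(0, 0, 0): 0}
    cells = product(range(len(x) + 1), range(len(y) + 1), range(len(z) + 1))
    for i, j, k in cells:  # lexicographic order: predecessors come first
        if (i, j, k) == (0, 0, 0):
            continue
        top = float("-inf")
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            a = x[pi] if di else "-"
            b = y[pj] if dj else "-"
            c = z[pk] if dk else "-"
            column = pair_score(a, b) + pair_score(a, c) + pair_score(b, c)
            top = max(top, best[(pi, pj, pk)] + column)
        best[(i, j, k)] = top
    return best[(len(x), len(y), len(z))]
```

The table has one cell per triple of prefix lengths, so memory grows with the product of the three lengths; that growth is the cost a reduced-memory formulation sets out to avoid.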

32

Choi, Kwangbom. "P-Coffee a new divide-and-conquer method for multiple sequence alignment /." NCSU, 2005. http://www.lib.ncsu.edu/theses/available/etd-01182005-060947/.

Full text

Abstract:

We describe a new divide-and-conquer method, P-Coffee, for alignment of multiple sequences. P-Coffee first identifies candidate alignment columns using a position-specific substitution matrix (the T-Coffee extended library), tests those columns, and accepts only qualified ones. Accepted columns not only constitute a final alignment solution but also divide a given sequence set into partitions. The same procedure is recursively applied to each partition until all the alignment columns are collected. In P-Coffee, we minimized the sources of bias by aligning all the sequences simultaneously without requiring any heuristic function to optimize, phylogenetic tree, or gap cost scheme. In this research, we show the performance of our approach by comparing our results with those of T-Coffee using the 144 test sets provided in BAliBASE v1.0. P-Coffee outperformed T-Coffee in accuracy, especially for the more complicated test sets.

33

Paten, Benedict John. "Large-scale multiple alignment and transcriptionally-associated pattern discovery in vertebrate genomes." Thesis, University of Cambridge, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.612811.

Full text

34

Lassmann, Timo. "Algorithms for building and evaluating multiple sequence alignments /." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-887-8/.

Full text

35

Wheeler, Travis John. "EFFICIENT CONSTRUCTION OF ACCURATE MULTIPLE ALIGNMENTS AND LARGE-SCALE PHYLOGENIES." Diss., The University of Arizona, 2009. http://hdl.handle.net/10150/195143.

Full text

Abstract:

A central focus of computational biology is to organize and make use of vast stores of molecular sequence data. Two of the most studied and fundamental problems in the field are sequence alignment and phylogeny inference. The problem of multiple sequence alignment is to take a set of DNA, RNA, or protein sequences and identify related segments of these sequences. Perhaps the most common use of alignments of multiple sequences is as input for methods designed to infer a phylogeny, or tree describing the evolutionary history of the sequences. The two problems are circularly related: standard phylogeny inference methods take a multiple sequence alignment as input, while computation of a rudimentary phylogeny is a step in the standard multiple sequence alignment method. Efficient computation of high-quality alignments, and of high-quality phylogenies based on those alignments, are both open problems in the field of computational biology. The first part of the dissertation gives details of my efforts to identify a best-of-breed method for each stage of the standard form-and-polish heuristic for aligning multiple sequences; the result of these efforts is a tool, called Opal, that achieves state-of-the-art 84.7% accuracy on the BAliBASE alignment benchmark. The second part of the dissertation describes a new algorithm that dramatically increases the speed and scalability of a common method for phylogeny inference called neighbor-joining; this algorithm is implemented in a new tool, called NINJA, which is more than an order of magnitude faster than a very fast implementation of the canonical algorithm, for example building a tree on 218,000 sequences in under 6 days using a single-processor computer.
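
As background for the neighbor-joining part, the sketch below shows the pair-selection criterion of the canonical algorithm, the step whose exhaustive scan dominates its running time. It does not show how NINJA speeds this up, which the abstract does not describe. The function name `nj_pick_pair` and the list-of-lists matrix format are assumptions.

```python
def nj_pick_pair(d):
    """Return (q, i, j) for the pair of taxa the canonical
    neighbor-joining step would merge next.

    `d` is a symmetric distance matrix given as a list of lists.
    The criterion minimized is
        Q(i, j) = (n - 2) * d[i][j] - sum_k d[i][k] - sum_k d[j][k].
    """
    n = len(d)
    if n < 2:
        raise ValueError("need at least two taxa")
    row_sum = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sum[i] - row_sum[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best
```

Scanning all pairs costs quadratic time per merge and there are on the order of n merges, so the canonical algorithm is cubic overall, which is what makes trees on hundreds of thousands of sequences expensive.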

36

Chiri, Ali. "Business Alignment Strategies for Middle East Real Estate Construction Projects." ScholarWorks, 2017. https://scholarworks.waldenu.edu/dissertations/4609.

Full text

Abstract:

In the Middle East real estate industry, 46% of projects fail in terms of strategic dimensions. Based on the dynamic capabilities approach and contingency approach, the purpose of this exploratory multiple case study was to identify the successful strategies project leaders used to improve the alignment of projects with business strategy. Data were collected from 7 Skype semistructured interviews with real estate construction project leaders from 3 real estate organizations ranked among the top 10 in the Middle East. Public organizational documents were used for methodological triangulation. A thematic coding approach was adopted following a nonlinear sequential process that involved four stages: (a) reading and preparing the collected data, (b) coding, (c) abstracting the codes into conceptual categories, and (d) identifying the themes' relationships and patterns and creating a thematic map. The 4 themes identified were the (a) flow of strategy, (b) governance of projects during the development phase, (c) governance of projects during the delivery phase, and (d) measurement of project performance and strategic success. The results confirmed the idiosyncratic nature of the selected contexts and the need to increase some dynamic capabilities' dimensions. The contribution of this study to positive social change includes improved community lifestyle and environmental quality.

37

Helal, Manal Computer Science & Engineering Faculty of Engineering UNSW. "Indexing and partitioning schemes for distributed tensor computing with application to multiple sequence alignment." Awarded by:University of New South Wales. Computer Science & Engineering, 2009. http://handle.unsw.edu.au/1959.4/44781.

Full text

Abstract:

This thesis investigates indexing and partitioning schemes for high dimensional scientific computational problems. Building on the foundation offered by Mathematics of Arrays (MoA) for tensor-based computation, the ultimate contribution of the thesis is a unified partitioning scheme that works invariant of the dataset dimension and shape. Consequently, portability is ensured between different high performance machines, cluster architectures, and potentially computational grids. The Multiple Sequence Alignment (MSA) problem in computational biology has an optimal dynamic programming based solution, but it becomes computationally infeasible as its dimensionality (the number of sequences) increases. Even sub-optimal approximations may be unmanageable for more than eight sequences. Furthermore, no existing MSA algorithms have been formulated in a manner invariant over the number of sequences. This thesis presents an optimal distributed MSA method based on MoA. The latter offers a set of constructs that help represent multidimensional arrays in memory in a linear, concise and efficient way. Using MoA allows the partitioning of the dynamic programming algorithm to be expressed independently of dimension. MSA is the highest dimensional scientific problem considered for MoA-based partitioning to date. Two partitioning schemes are presented: the first is a master/slave approach which is based on both master/slave scheduling and slave/slave coupling. The second approach is a peer-to-peer design, in which the scheduling and dependency communication are calculated independently by each process, with no need for a master scheduler. A search space reduction technique is introduced to cater for the exponential expansion as the problem dimensionality increases. This technique relies on defining a hyper-diagonal through the tensor space, and choosing a band of neighbouring partitions around the diagonal to score. 
In contrast, other sub-optimal methods in the literature only consider projections on the surface of the hyper-cube. The resulting massively parallel design produces a scalable solution that has been implemented on high performance machines and cluster architectures. Experimental results for these implementations are presented for both simulated and real datasets. Comparisons between the reduced search space technique of this thesis with other sub-optimal methods for the MSA problem are presented.
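
Two ideas from this abstract lend themselves to a small illustration: storing a multidimensional scoring tensor linearly in a way that does not depend on the number of dimensions, and restricting work to a band around the hyper-diagonal. The sketch uses ordinary row-major indexing, not the actual MoA constructs, and the band test assumes sequences of roughly equal length. All function names are hypothetical.

```python
def flat_index(index, shape):
    """Row-major offset of a multidimensional index, for any rank."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    offset = 0
    for i, extent in zip(index, shape):
        if not 0 <= i < extent:
            raise IndexError("index out of bounds")
        offset = offset * extent + i
    return offset


def unflatten(offset, shape):
    """Inverse of flat_index: recover the multidimensional index."""
    index = []
    for extent in reversed(shape):
        offset, i = divmod(offset, extent)
        index.append(i)
    return tuple(reversed(index))


def in_diagonal_band(index, width):
    """True if the cell lies within `width` of the main hyper-diagonal,
    measured as the spread between its largest and smallest coordinate
    (a simplification that assumes near-equal sequence lengths)."""
    return max(index) - min(index) <= width
```

Because none of the three functions hard-codes the rank, the same code handles three sequences or eight, which is the kind of dimension independence the abstract attributes to the MoA-based formulation.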

38

Ying, Chung Li, and 鍾立穎. "Multiple Sequence Alignment using Pairwise Suboptimal Alignment." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/78872248303598965142.

Full text

Abstract:

Master's
National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering
94
Multiple sequence alignment is an important tool for analyzing biological sequences, with uses ranging from searching a database for similar sequences to studying protein structure. The optimal solution found by dynamic programming is not always the biologically correct one as the number of sequences increases. An alternative is the progressive algorithm, which combines the most similar sequences first and then adds the next most similar sequence. However, different orders of combining the sequences yield different alignments. Because the optimal alignment is not always the best alignment biologically, combining pairwise suboptimal alignments makes it possible to find a better solution. The method can also reduce the time complexity. Moreover, a better alignment may be found by spending a little more time trying all combinations.

39

Wang, Shu 1973. "On multiple sequence alignment." Thesis, 2007. http://hdl.handle.net/2152/3715.

Full text

Abstract:

The tremendous increase in biological sequence data presents us with an opportunity to understand the molecular and cellular basis for cellular life. Comparative studies of these sequences have the potential, when applied with sufficient rigor, to decipher the structure, function, and evolution of cellular components. The accuracy and detail of these studies are directly proportional to the quality of these sequence alignments. Given the large number of sequences per family of interest, and the increasing number of families to study, improving the speed, accuracy and scalability of MSA is becoming an increasingly important task. In the past, much of the interest has been in global MSA. In recent years, the focus has shifted from global MSA to local MSA. Local MSA is needed to align variable sequences from different families/species. In this dissertation, we developed two new algorithms for fast and scalable local MSA, a three-way-consistency-based MSA and a biclustering-based MSA. The first MSA algorithm is a three-way-Consistency-Based MSA (CBMSA). CBMSA applies alignment consistency heuristics in the form of a new three-way alignment to MSA. While the three-way consistency approach is able to maintain the same time complexity as the traditional pairwise consistency approach, it provides more reliable consistency information and better alignment quality. We quantify the benefit of using three-way consistency as compared to pairwise consistency. We have also compared CBMSA to a suite of leading MSA programs and CBMSA consistently performs favorably. We also developed another new MSA algorithm, a biclustering-based MSA. Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in MSA is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. 
Simultaneously, the grouping of the sequences can impact the alignment; precisely the kind of dual situation biclustering algorithms are intended to address. We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was compared with a suite of leading MSA programs. With respect to quantitative measures of MSA, BlockMSA scores comparable to or better than the other leading MSA programs. With respect to biological validation of MSA, the other leading MSA programs lag BlockMSA in their ability to identify the most highly conserved regions.

40

Han-Chiou, Yu, and 邱毓翰. "Constrained Multiple Sequences Alignment." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/18516751892104681181.

Full text

Abstract:

Master's
National Tsing Hua University
Department of Computer Science
90
We design a new algorithm for computing a constrained multiple sequence alignment (CMSA), which guarantees that the generated alignment satisfies user-specified constraints that certain residues be aligned together. The first step of our strategy is to design a constrained pairwise sequence alignment. Next, based on the concept of progressive alignment, we use the constrained pairwise sequence alignment to progressively merge the sequences. The time complexity of our CMSA algorithm for aligning $K$ sequences is $O(Kn^4)$, where $n$ is the maximum length of the sequences. We tested our algorithm on RNase sequences with known structure, and in our experiments all the important active-site residues are well aligned together.

41

Xiao, Bo Weng. "Block Alignment: An Approach for Multiple Sequence Alignment Containing Clusters." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/19616460760964410139.

Full text

Abstract:

Master's
National Chi Nan University
Department of Computer Science and Information Engineering
92
Multiple sequence alignment is a fundamental problem in computational molecular biology. It is known to be NP-hard, so finding an optimal solution takes a great deal of time. Progressive methods offer an acceptable solution within a reasonable wait. These methods perform pairwise alignments and then combine them into a multiple sequence alignment. In this thesis, we focus on multiple sequence alignment containing clusters. We take a different viewpoint on sequence alignment and use a matrix to represent each sequence. After two sequences (matrices) are aligned, the result of the alignment is again represented by a matrix, and the original two sequences (matrices) are discarded. That is, the result of aligning a set of sequences is always treated as a block and represented by a matrix. This differs from the older approaches, in which only two sequences are aligned rather than one group of aligned sequences with another group of aligned sequences. In this thesis, we present experimental results that test our proposed method. Block alignment outperforms the progressive methods for sequences containing clusters.

42

Hsu, Ta-Chin, and 徐大欽. "Multiple Sequence Alignment via Phylogenetic Tree." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/59691564789727790301.

Abstract:

Master's thesis
Tunghai University
Department of Mathematics
Academic year 94
Simultaneous alignment of many nucleotide or amino acid sequences is a necessary tool for analyzing sequence structure in molecular biology. Multiple sequence alignment can be used to find and depict the conserved domains in a protein family, to test whether a new sequence is homologous to an existing protein family, to help predict the secondary or tertiary structure of a new sequence, and to serve as the first step in molecular evolutionary analysis. The large amounts of computing time and space required by multidimensional dynamic programming make multiple sequence alignment difficult to perform. The practical methods presented so far are only heuristic, and there is no assurance that their results have any biological meaning. However, evaluating a multiple sequence alignment always depends on a scoring scheme and gap penalty that reflect the evolutionary relationship. Therefore, using phylogenetic trees to guide multiple sequence alignment is an essential way to capture the evolutionary relationship. This thesis discusses the simultaneous construction of a phylogenetic tree and a multiple sequence alignment based on the principle of parsimony. This idea was first put forward by Sankoff in 1973. In 1989, Hein presented a feasible algorithm that further uses sequence graphs combined with dynamic programming. We modified the way Hein constructed the guide tree in his algorithm: from the edit distances for every pair of sequences, we build the guide tree by the neighbor-joining method and use it to guide the order of alignment. We also use three kinds of perturbation of the tree topology to reconstruct the phylogenetic tree and multiple alignment according to the principle of parsimony.
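
The pairwise edit distances that feed the guide-tree step can be sketched as follows. This is a minimal illustration assuming unit-cost Levenshtein distance; the abstract does not state the cost model used in the thesis, and the function names `edit_distance` and `distance_matrix` are hypothetical. The resulting symmetric matrix is the kind of input a neighbor-joining routine would take.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance (assumed cost model, not the thesis's)."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete x
                            curr[j - 1] + 1,            # insert y
                            prev[j - 1] + (x != y)))    # match or substitute
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """Symmetric matrix of pairwise edit distances for a list of sequences."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = edit_distance(seqs[i], seqs[j])
    return d


if __name__ == "__main__":
    for row in distance_matrix(["ACGT", "AGT", "ACGA"]):
        print(row)
```

Only two rows of the table are kept at a time, so each pairwise distance needs memory linear in the shorter dimension rather than quadratic.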

43

Wang, San-yuan, and 王三源. "Constrained Multiple RNA Secondary Structure Alignment." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/w9z3st.

Abstract:

Master's thesis
Providence University
Graduate Institute of Information Management
Academic year 94
Multiple RNA secondary structure alignment and multiple sequence alignment are two important problems in computational biology. The present criteria for these two problems focus only on an absolute optimal solution, such as the longest arc-preserving common subsequence and tree edit distance. In the constrained version of the sequence alignment problem, a set of sequences is aligned such that specified structures must be aligned together. This is desirable when one knows which structures are necessary for some function to work, and it is the goal of the sequence alignment with constraints problem. In this thesis, we propose three multiple RNA secondary structure alignment algorithms for different applications. 1. The algorithm directed at structure constraints lets the user-specified constraints be aligned together. Its time complexity is O(rkn^2 + k^2 n), where r, k, and n are the number of constraints, the number of sequences, and the length of the longest sequence, respectively. 2. In the algorithm directed at sequence constraints, we add annotation characters to the RNA sequences according to the structure information to produce annotated RNA sequences, and then align the annotated RNA sequences with constraints. The user obtains not only the similarity of the RNAs but also the relationships within the RNA family. Its time complexity is O(rk^2 n^2). 3. We also propose the stem-loop algorithm, which is directed at the features of tRNA structure. Its time complexity is O(k^2 q + kq^2), where q is the number of stem-loops.

44

Pátek, Zdeněk. "Multiple sequence alignment pomocí genetických algoritmů." Master's thesis, 2012. http://www.nusl.cz/ntk/nusl-304119.

Abstract:

Title: Multiple sequence alignment using genetic algorithms Author: Zdeněk Pátek Department: Department of Software and Computer Science Education Supervisor: RNDr. František Mráz, CSc. Abstract: The thesis addresses the problem of multiple sequence alignment (MSA). It contains the specification of the proposed method MSAMS, which finds motifs in biological sequences, splits the sequences into blocks using the motifs, solves MSA on the blocks, and finally assembles the global alignment from the aligned blocks and motifs. Motif search and MSA are both solved using genetic algorithms. The thesis describes the implementation of the method, the configuration of its settings, benchmarking on the BAliBASE database, and a comparison with the ClustalW program. Experimental results showed that MSAMS can discover better alignments than ClustalW. Keywords: multiple sequence alignment, motif finding, genetic algorithms, ClustalW

45

林世杰. "Multiple resolution and scale sequence alignment." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/74036857840582056102.

Abstract:

Master's thesis
National Tsing Hua University
Department of Computer Science
Academic year 89
We design a new sequence alignment system, based on L. Delcher's [1] idea, for aligning very long sequences, because traditional sequence alignment methods [12] require too much time and memory on very long sequences. We use the longest common subsequence (LCS) instead of a suffix tree [5], which lets us solve sequence alignment problems more flexibly but takes more time than using a suffix tree. We improve the LCS computation by using a hash table, which trades off time against space. We also identify some properties of the LCS that let us combine the results of two shorter LCS computations into an approximate result for a longer one. Our experiments show that the method is effective in practice.
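
One well-known way to speed up LCS with a hash table is the Hunt-Szymanski-style approach sketched below. It is offered only as an illustration of the time/space trade-off mentioned above; the abstract does not describe the thesis's actual hash-table scheme, and the function name `lcs_length` is hypothetical. A dictionary maps each character to its positions in the second string, so only matching position pairs are visited.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Hunt-Szymanski-style sketch (an assumed stand-in, not the thesis's method):
    a hash table of character positions in b replaces the full DP table.
    """
    positions = defaultdict(list)
    for j, ch in enumerate(b):
        positions[ch].append(j)

    # tails[k] = smallest index in b at which a common subsequence
    # of length k + 1 can end, given the prefix of a processed so far.
    tails = []
    for ch in a:
        # Visit matches right to left so one character of a is used at most once.
        for j in reversed(positions.get(ch, ())):
            k = bisect_left(tails, j)
            if k == len(tails):
                tails.append(j)
            else:
                tails[k] = j
    return len(tails)


if __name__ == "__main__":
    print(lcs_length("AGGTAB", "GXTXAYB"))
```

The extra memory goes to the position table, while the running time depends on the number of matching character pairs rather than on the full product of the two lengths.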

46

"CSA-X: Modularized Constrained Multiple Sequence Alignment." Thesis, 2015. http://hdl.handle.net/10388/ETD-2015-10-2276.

Abstract:

Imposing additional constraints on multiple sequence alignment (MSA) algorithms can often produce more biologically meaningful alignments. Hence, various constrained multiple sequence alignment (CMSA) algorithms have been developed in the literature, where researchers used anchor points, regular expressions, or context-free grammars to specify the constraints, and the alignments produced are forced to align around segments that match the constraints. In this thesis, we propose CSA-X, a modularized program for constrained multiple sequence alignment that accepts constraints in the form of regular expressions. It uses an arbitrary underlying multiple sequence alignment program to generate alignments, and is therefore modular. The name CSA-X refers to our proposed program generally, where the letter X is substituted with the name of the (non-constrained) multiple sequence alignment algorithm used as the underlying MSA engine. We compare the accuracy of our program with another constrained multiple sequence alignment program, RE-MuSiC, which similarly uses regular expressions for constraints. In addition, comparisons are made to the underlying MSA programs (without constraints). The BAliBASE 3.0 benchmark database is used to assess the performance of the proposed program CSA-X, other MSA programs, and the CMSA programs considered in this study. Based on the results presented herein, CSA-X outperforms RE-MuSiC and scores well against the underlying alignment programs. The results also show that regular expression constraints, if chosen well and created from the least conserved regions of the correct alignments, improve the alignment accuracy. In this study, ProbCons and T-Coffee are used as the underlying MSA programs in CSA-X, and the accuracy of the alignments is measured in terms of Q score and TC score.
On average, CSA-X used with constraints identified from the least conserved regions of the correct alignments achieves results that are 17.65% higher for Q score and 23.7% higher for TC score compared to RE-MuSiC. In fact, CSA-X with ProbCons (CSA-PC) achieves a higher score in over 97.9% of the cases for Q score, and over 96.4% of the cases for TC score. In addition, CSA-X with T-Coffee (CSA-TCOF) achieves a higher score in over 97.7% of the cases for Q score, and over 94.8% of the cases for TC score. Furthermore, CSA-X with regular expressions created from the least conserved regions of the correct alignments achieves higher accuracy scores than standalone ProbCons and T-Coffee. To measure the statistical significance of the CSA-X results, the Wilcoxon rank-sum test and the Wilcoxon signed-rank test are performed, and these tests show that CSA-X results for the least conserved regular expression constraint sets from the correct BAliBASE 3.0 alignments are significantly different from those of RE-MuSiC, ProbCons, and T-Coffee.
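
The idea of a regular-expression constraint can be illustrated with a short sketch. It assumes one constraint and uses the first match in each sequence as the anchor; it is not CSA-X's or RE-MuSiC's implementation, and the helper names `find_anchor_spans` and `split_at_anchor` are hypothetical. Once each sequence is split at its anchor, the flanking segments could be handed to any unconstrained MSA program.

```python
import re


def find_anchor_spans(sequences, pattern):
    """Return the (start, end) span of the first regex match in each sequence,
    or None where the constraint does not occur (assumed: first match wins)."""
    regex = re.compile(pattern)
    spans = []
    for seq in sequences:
        match = regex.search(seq)
        spans.append(match.span() if match else None)
    return spans


def split_at_anchor(seq, span):
    """Split a sequence into (prefix, anchored segment, suffix)."""
    start, end = span
    return seq[:start], seq[start:end], seq[end:]


if __name__ == "__main__":
    seqs = ["AAGCTTA", "GGCTT", "TTTT"]
    spans = find_anchor_spans(seqs, r"GC.T")
    print(spans)
    print(split_at_anchor(seqs[0], spans[0]))
```

A sequence with no match is reported as None here so the caller can decide whether to drop the constraint or align that sequence without it.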

47

Jiang, Yanan master of cellular and molecular biology. "Manual alignment of IVS sequences and its implication in multiple sequence alignment." Thesis, 2011. http://hdl.handle.net/2152/ETD-UT-2011-12-4706.

Abstract:

It is recognized that iterative comparative analysis of large-scale homologous RNAs significantly promotes the understanding of an RNA family. The Gutell lab is renowned for maintaining high-quality RNA sequence alignments and accurately predicted RNA secondary structures using this approach. While the currently available alignment and structure data are mainly obtained by trained domain experts with extensive manual effort, it is highly desirable that this process be automated and replicable, given the exponentially growing number of RNA sequences and the amount of time required for expert training. In this thesis, we learn the processes involved in comparative analysis by manually aligning a non-coding RNA family, IVS sequences, under the supervision of Dr. Gutell. Each process is then simulated by mathematical objective functions and algorithms. We also evaluate the currently available RNA analysis packages that target each of the processes. Finally, we propose a new RNA sequence alignment algorithm that incorporates structure information and can be extended to different alignment tasks.

48

Sammeth, Michael [Verfasser]. "Integrated multiple sequence alignment / by Michael Sammeth." 2005. http://d-nb.info/98148767X/34.

49

Chen, Ying-li, and 陳穎立. "Multiple Alignment Analysis: Using Recursive Dynamic Programming." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/88573776921218031318.

Abstract:

Master's thesis
I-Shou University
Department of Information Management, Master's Program
Academic year 93
Sequence alignment analysis is among the most important tasks in gene and protein research. Dynamic programming is one of the most widely used algorithms for pairwise sequence alignment. However, when the number of sequences k is greater than 2, two obstacles hinder the application of this method: (1) the computing time and storage space requirements grow as O(2^k · n^k) and O(n^k), respectively, where n is the sequence length; (2) the code is difficult to write for a dynamic (variable) k. This study presents a modified algorithm, recursive dynamic programming, to resolve the second obstacle.
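
The two obstacles can be made concrete with a memoized recursion that works for any k without k nested loops. This is a hedged sketch, not the thesis's algorithm: it assumes a sum-of-pairs score with linear gap costs, the scoring values are arbitrary, and the name `msa_score` is hypothetical. Each of the up to (n+1)^k index tuples tries 2^k - 1 predecessor moves, which mirrors the O(2^k · n^k) time and O(n^k) space quoted above.

```python
from functools import lru_cache
from itertools import combinations, product


def msa_score(seqs, match=1, mismatch=-1, gap=-2):
    """Optimal sum-of-pairs score for aligning k sequences (assumed scoring)."""
    k = len(seqs)
    # Every non-empty subset of sequences may advance in one column: 2^k - 1 moves.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]

    def column_score(col):
        total = 0
        for x, y in combinations(col, 2):
            if x == "-" and y == "-":
                continue                    # gap-gap pairs cost nothing
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
        return total

    @lru_cache(maxsize=None)
    def best(idx):
        # idx[i] = number of characters of seqs[i] already aligned.
        if not any(idx):
            return 0
        result = float("-inf")
        for m in moves:
            if any(step > i for step, i in zip(m, idx)):
                continue                    # would step before the start of a sequence
            prev = tuple(i - step for i, step in zip(idx, m))
            col = tuple(s[i - 1] if step else "-"
                        for s, i, step in zip(seqs, idx, m))
            result = max(result, best(prev) + column_score(col))
        return result

    return best(tuple(len(s) for s in seqs))


if __name__ == "__main__":
    print(msa_score(["ACGT", "AGT", "ACT"]))
```

Because the index tuple and the move list are built from k at run time, the same code handles two, three, or more sequences; only the exponential cost limits how far it scales.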

50

Cheng, Mei-Ling, and 程美齡. "Improving Multiple Alignment of RNA Tertiary Structures." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/52922523689895268330.

Abstract:

Master's thesis
National Chiao Tung University
Institute of Bioinformatics and Systems Biology
Academic year 100
Recently, there is a fast growing interest in noncoding RNAs (ncRNAs) because they play a lot of essential roles in many cellular processes, even though the transcripts of these ncRNAs are not translated into proteins. Actually, the function of most available ncRNAs is still unknown and needs to be determined. Since molecular structures are typically more evolutionarily conserved than sequences, detecting structural similarities among RNA three-dimensional (3D) structures can bring more significant insights into their functional and even evolutionary relationships that would not be detected by sequence information alone. Therefore, the purpose of this study is to design a software tool that can efficiently and accurately compute the structural similarity of multiple RNA 3D structures. Our method first uses a new structure alphabet to transform RNA 3D structures into 1D SA-encoded sequences and then uses a traditional multiple sequence alignment tool CLUSTAL W and a new BLOSUM-like scoring matrix to align the SA-encoded sequences of several RNAs for detecting the structural similarity of these RNAs. Next, we have implemented the above method into a software tool called iMARTS. Finally, we have tested iMARTS on some RNA 3D structures and compared its experimental results with those obtained by our previously developed tool MARTS. Consequently, the experimental results show that iMARTS indeed has a better performance when compared with MARTS. Therefore, we believe that iMARTS can serve as a useful tool in the study of structural biology.
