Journal articles on the topic 'Sequence alignment Sequence analysis'

Consult the top 50 journal articles for your research on the topic 'Sequence alignment Sequence analysis.'

1

Staritzbichler, René, Edoardo Sarti, Emily Yaklich, et al. "Refining pairwise sequence alignments of membrane proteins by the incorporation of anchors." PLOS ONE 16, no. 4 (2021): e0239881. http://dx.doi.org/10.1371/journal.pone.0239881.

Abstract:
The alignment of primary sequences is a fundamental step in the analysis of protein structure, function, and evolution, and in the generation of homology-based models. Integral membrane proteins pose a significant challenge for such sequence alignment approaches, because their evolutionary relationships can be very remote, and because a high content of hydrophobic amino acids reduces their complexity. Frequently, biochemical or biophysical data is available that informs the optimum alignment, for example, indicating specific positions that share common functional or structural roles. Currently, if those positions are not correctly matched by a standard pairwise sequence alignment procedure, the incorporation of such information into the alignment is typically addressed in an ad hoc manner, with manual adjustments. However, such modifications are problematic because they reduce the robustness and reproducibility of the aligned regions either side of the newly matched positions. Previous studies have introduced restraints as a means to impose the matching of positions during sequence alignments, originally in the context of genome assembly. Here we introduce position restraints, or “anchors” as a feature in our alignment tool AlignMe, providing an aid to pairwise global sequence alignment of alpha-helical membrane proteins. Applying this approach to realistic scenarios involving distantly-related and low complexity sequences, we illustrate how the addition of anchors can be used to modify alignments, while still maintaining the reproducibility and rigor of the rest of the alignment. Anchored alignments can be generated using the online version of AlignMe available at www.bioinfo.mpg.de/AlignMe/.
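As an illustration of what a position restraint does, the following minimal Python sketch forces two chosen positions to be matched by aligning the segments on either side of the anchor independently and summing the scores. It is not AlignMe's implementation: the function names `nw_score` and `anchored_score`, the unit match/mismatch/gap scores, and the hard (rather than weighted) constraint are all assumptions made for the example.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                          prev[j] + gap,     # gap in b
                          curr[j - 1] + gap) # gap in a
        prev = curr
    return prev[-1]


def anchored_score(a, b, anchor, match=1, mismatch=-1, gap=-1):
    """Best global score among alignments that pair a[i] with b[j].

    `anchor` is a hypothetical (i, j) tuple of 0-based positions.
    The prefixes and suffixes are aligned separately, so the anchor
    pair is matched no matter what the unconstrained optimum would be.
    """
    i, j = anchor
    pair = match if a[i] == b[j] else mismatch
    left = nw_score(a[:i], b[:j], match, mismatch, gap)
    right = nw_score(a[i + 1:], b[j + 1:], match, mismatch, gap)
    return left + pair + right


print(nw_score("ACGT", "ACGT"))                # unconstrained
print(anchored_score("ACGT", "ACGT", (1, 1)))  # anchor agrees with the optimum
print(anchored_score("ACGT", "ACGT", (0, 1)))  # anchor forces a worse alignment
```

Because the anchored score is a maximum over a subset of all alignments, it can never exceed the unconstrained score; the regions on either side of the anchor are still aligned by the same reproducible procedure.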
2

Ji, Guo Li, Jing Ci Yao, Zi Jiang Yang, and Cong Ting Ye. "LemK_MSA: A Multiple Sequence Alignment Method with Sequence Vectorization Based on Lempel-Ziv." Applied Mechanics and Materials 284-287 (January 2013): 3203–7. http://dx.doi.org/10.4028/www.scientific.net/amm.284-287.3203.

Abstract:
In this paper, we propose a method for multiple sequence alignment, LemK_MSA, which integrates Lempel-Ziv based sequence vectorization and k-means clustering analysis. LemK_MSA converts multiple sequence alignment into the alignment of corresponding 10-dimensional vectors using 10 types of copy modes. It then uses the k-means algorithm and the NJ algorithm to divide the sequences into several groups and calculate a guide tree for each group from the vectors of its sequences. A complete guide tree for multiple sequence alignment can be constructed by merging the guide trees of all groups. Thus, the time efficiency of multiple sequence alignment, especially for large-scale sequence sets, can be improved. High-throughput mouse antibody sequences are used to validate the proposed method. Compared to ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times as efficient while maintaining alignment accuracy. LemK_MSA also provides an effective method to analyze the evolutionary relationships and structural features among high-throughput sequences.
3

Barton, Geoffrey J. "Protein Sequence Alignment Techniques." Acta Crystallographica Section D Biological Crystallography 54, no. 6 (1998): 1139–46. http://dx.doi.org/10.1107/s0907444998008324.

Abstract:
The basic algorithms for alignment of two or more protein sequences are explained. Alternative methods for scoring substitutions and gaps (insertions and deletions) are described, as are global and local alignment methods. Multiple alignment techniques are explained, including methods for profile comparison. A summary is given of programs for the alignment and analysis of protein sequences, either from sequence alone, or from three-dimensional structure.
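To make the global versus local distinction concrete, here is a minimal Python sketch of local alignment scoring in the style of Smith and Waterman. The function name `local_alignment_score` and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for illustration, not parameters taken from the article.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Unlike global alignment, every cell is floored at zero, so an
    alignment may start and end anywhere in either sequence.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,                  # restart the alignment here
                          prev[j - 1] + s,    # extend diagonally
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


# The shorter string occurs exactly inside the longer one, so the local
# score is 7 matches x 2 and the unmatched flank costs nothing.
print(local_alignment_score("AAAGATTACA", "GATTACA"))
```

A global method would instead charge gap penalties for the three unmatched leading residues, which is why local alignment is the usual choice when only a shared region is expected.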
4

Wilson, W. C. "Activity Pattern Analysis by Means of Sequence-Alignment Methods." Environment and Planning A: Economy and Space 30, no. 6 (1998): 1017–38. http://dx.doi.org/10.1068/a301017.

Abstract:
The author describes a method of comparing sequences of characters, called sequence alignment or string matching, and illustrates its use in the analysis of daily activity patterns derived from time-use diaries. It allows definition of measures of similarity or distance between complete sequences, called global alignment, or the evaluation of the best fit of short sequences within long sequences, called local alignment. Alignments may be done pairwise to develop similarity or distance matrices that describe the relatedness of individuals in the set of sequences being examined. Pairwise alignment methods may be extended to many individuals by using multiple alignment analysis. A number of elementary hand-worked examples are provided. The basic concepts are discussed in terms of the problems of time-use research and the method is illustrated by examining diary data from a survey conducted in Reading, England. The CLUSTAL software used for the alignments was written for molecular biological research. The method offers a powerful technique for analyzing the full richness of diary data without discarding the details of episode ordering, duration, or transition. It is also possible to extend the analysis to include the context of activities, such as the presence of other persons or the location, but such extensions would require software designed for social science rather than biochemical problems. The method also offers a challenge to researchers to begin to develop theories about the determinants of daily behavior as a whole, rather than about participation in single activities or about time-budget totals.
5

Ren, Jie, Xin Bai, Yang Young Lu, et al. "Alignment-Free Sequence Analysis and Applications." Annual Review of Biomedical Data Science 1, no. 1 (2018): 93–114. http://dx.doi.org/10.1146/annurev-biodatasci-080917-013431.

Abstract:
Genome and metagenome comparisons based on large amounts of next-generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads. Alignment-free approaches based on the counts of word patterns in NGS data do not depend on the complete genome and are generally computationally efficient. Thus, they contribute significantly to genome and metagenome comparison. Recently, novel statistical approaches have been developed for the comparison of both long and shotgun sequences. These approaches have been applied to many problems, including the comparison of gene regulatory regions, genome sequences, metagenomes, binning contigs in metagenomic data, identification of virus–host interactions, and detection of horizontal gene transfers. We provide an updated review of these applications and other related developments of word count–based approaches for alignment-free sequence analysis.
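A word-count comparison of this kind can be sketched in a few lines of Python. The example below counts overlapping k-mers and computes the plain D2 statistic (the inner product of two k-mer count vectors), which is only the simplest member of this family; the function names `kmer_counts` and `d2` are assumptions for the example, and the normalised variants discussed in the literature are not implemented here.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x, y, k):
    """Plain D2 statistic: sum over shared words of count_x * count_y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())


print(kmer_counts("ACGAC", 2))  # AC occurs twice, CG and GA once each
print(d2("ACGAC", "ACAC", 2))   # only AC is shared: 2 * 2
```

No alignment is computed at any point, so the cost grows with the total sequence length rather than with the product of the two lengths.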
6

Cook, Jonathan P., and Malcolm A. McCrae. "Sequence analysis of the guanylyltransferase (VP3) of group A rotaviruses." Journal of General Virology 85, no. 4 (2004): 929–32. http://dx.doi.org/10.1099/vir.0.19629-0.

Abstract:
The RNA segment encoding the guanylyltransferase (VP3) from 12 group A rotavirus isolates has been sequenced following RT-PCR and molecular cloning of the full-length amplicons produced. Alignment of the derived amino acid sequences including those of the four VP3 sequences available from GenBank revealed two levels of sequence divergence. Virus isolates from humans showed greater than 94 % sequence identity, whereas those isolated from different mammalian species showed as low as 79 % sequence identity. The exceptions were avian virus isolates, which diverged ∼45 % from those of mammalian origin, and the human virus isolates DS1 and 69M, which showed much closer (over 90 %) identity to viruses of bovine origin, suggesting that these human isolates may have undergone recent reassortment events with a bovine virus. Analysis of the sequences for a putative enzymic active site has revealed that the KXTAMDXEXP and KXXGNNH motifs around amino acids 385 and 545, respectively, are conserved across both group A and C rotaviruses.
7

Asare, James Owusu, Justice Kwame Appati, and Kwaku Darkwah. "Formulation and Analysis of Patterns in a Score Matrix for Global Sequence Alignment." International Journal of Mathematics and Mathematical Sciences 2020 (June 1, 2020): 1–9. http://dx.doi.org/10.1155/2020/3858057.

Abstract:
Global sequence alignment is one of the most basic pairwise sequence alignment procedures used in molecular biology to understand the similarity that arises among the structure, function, or evolutionary relationship between two nucleotide sequences. The general algorithm associated with global sequence alignment is the dynamic programming algorithm of Needleman and Wunsch. In this paper, patterns are exploited in the score matrix of the Needleman–Wunsch algorithm. With the help of some examples, the general patterns realized are formulated as new a priori propositions and corollaries that are established for both equal and unequal length comparisons of any two arbitrary sequences.
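For readers who want to inspect such a score matrix directly, the following Python sketch fills the Needleman–Wunsch matrix for two sequences. The function name `nw_score_matrix` and the scores (match 1, mismatch -1, gap -1) are assumptions for illustration; the regularities pointed out in the comments are simple observations about this scoring scheme, not the propositions established in the paper.

```python
def nw_score_matrix(a, b, match=1, mismatch=-1, gap=-1):
    """Return the full Needleman-Wunsch score matrix F for a and b.

    F[i][j] is the best score for aligning a[:i] with b[:j].
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap  # first column: a[:i] against nothing
    for j in range(1, cols):
        F[0][j] = j * gap  # first row: nothing against b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # (mis)match
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a
    return F


for row in nw_score_matrix("AC", "AC"):
    print(row)
# [0, -1, -2]
# [-1, 1, 0]
# [-2, 0, 2]
```

Even this tiny case shows two regular patterns: the first row and column are multiples of the gap penalty, and for identical sequences the main diagonal grows by the match score at each step.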
8

Eskin, Eleazar, and Sagi Snir. "Incorporating Homologues into Sequence Embeddings for Protein Analysis." Journal of Bioinformatics and Computational Biology 5, no. 3 (2007): 717–38. http://dx.doi.org/10.1142/s0219720007002734.

Abstract:
Statistical and learning techniques are becoming increasingly popular for different tasks in bioinformatics. Many of the most powerful statistical and learning techniques are applicable to points in a Euclidean space but not directly applicable to discrete sequences such as protein sequences. One way to apply these techniques to protein sequences is to embed the sequences into a Euclidean space and then apply these techniques to the embedded points. In this work we introduce a biologically motivated sequence embedding, the homology kernel, which takes into account intuitions from local alignment, sequence homology, and predicted secondary structure. This embedding allows us to directly apply learning techniques to protein sequences. We apply the homology kernel in several ways. We demonstrate how the homology kernel can be used for protein family classification and outperforms state-of-the-art methods for remote homology detection. We show that the homology kernel can be used for secondary structure prediction and is competitive with popular secondary structure prediction methods. Finally, we show how the homology kernel can be used to incorporate information from homologous sequences in local sequence alignment.
9

Aadland, Kelsey, and Bryan Kolaczkowski. "Alignment-Integrated Reconstruction of Ancestral Sequences Improves Accuracy." Genome Biology and Evolution 12, no. 9 (2020): 1549–65. http://dx.doi.org/10.1093/gbe/evaa164.

Abstract:
Ancestral sequence reconstruction (ASR) uses an alignment of extant protein sequences, a phylogeny describing the history of the protein family and a model of the molecular-evolutionary process to infer the sequences of ancient proteins, allowing researchers to directly investigate the impact of sequence evolution on protein structure and function. Like all statistical inferences, ASR can be sensitive to violations of its underlying assumptions. Previous studies have shown that, whereas phylogenetic uncertainty has only a very weak impact on ASR accuracy, uncertainty in the protein sequence alignment can more strongly affect inferred ancestral sequences. Here, we show that errors in sequence alignment can produce errors in ASR across a range of realistic and simplified evolutionary scenarios. Importantly, sequence reconstruction errors can lead to errors in estimates of structural and functional properties of ancestral proteins, potentially undermining the reliability of analyses relying on ASR. We introduce an alignment-integrated ASR approach that combines information from many different sequence alignments. We show that integrating alignment uncertainty improves ASR accuracy and the accuracy of downstream structural and functional inferences, often performing as well as highly accurate structure-guided alignment. Given the growing evidence that sequence alignment errors can impact the reliability of ASR studies, we recommend that future studies incorporate approaches to mitigate the impact of alignment uncertainty. Probabilistic modeling of insertion and deletion events has the potential to radically improve ASR accuracy when the model reflects the true underlying evolutionary history, but further studies are required to thoroughly evaluate the reliability of these approaches under realistic conditions.
10

Lebsir, Rabah, Abdesslem Layeb, and Tahi Fariza. "A Greedy Clustering Algorithm for Multiple Sequence Alignment." International Journal of Cognitive Informatics and Natural Intelligence 15, no. 4 (2021): 1–17. http://dx.doi.org/10.4018/ijcini.20211001.oa41.

Abstract:
This paper presents a strategy to tackle the Multiple Sequence Alignment (MSA) problem, one of the most important tasks in biological sequence analysis. Its role is to align the sequences in their entirety to derive relationships and common characteristics among a set of protein or nucleotide sequences. The MSA problem has been proved to be NP-hard. The proposed strategy incorporates a new idea based on the well-known divide-and-conquer paradigm. This paper presents a novel method of clustering sequences as a preliminary step to improve the final alignment; this decomposition can be used as an optimization procedure with any MSA aligner to explore promising alignments of the search space. The authors propose aligning the clusters in a parallel and distributed way in order to benefit from parallel architectures. The strategy was tested using classical benchmarks such as BAliBASE, Sabre, Prefab4 and Oxm, and the experimental results show that it gives good results compared to other aligners.
11

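Kholiq, Hibban, Mamika Ujianita Romdhini, and Marliadi Susanto. "Algoritma Needleman-Wunsch dalam Menentukan Tingkat Kemiripan Urutan DNA Rusa Timor (Cervus timorensis) dan Rusa Merah (Cervus elaphus)" [The Needleman-Wunsch Algorithm in Determining the Similarity Level of the DNA Sequences of Timor Deer (Cervus timorensis) and Red Deer (Cervus elaphus)]. EIGEN MATHEMATICS JOURNAL 3, no. 2 (2020): 125. http://dx.doi.org/10.29303/emj.v3i2.65.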

Abstract:
Sequence alignment is a basic method in sequence analysis. This method is used to determine the similarity level of DNA sequences. The Needleman-Wunsch algorithm can be used to solve the sequence alignment problem. This research shows that the relation T(i, j) used in the Needleman-Wunsch algorithm is a function T: ℕ0 × ℕ0 → ℤ, and that T(i, j) is a recursive function. The DNA sequence data used are from the Timor deer, an identity symbol of the province of West Nusa Tenggara, and, as a comparison, the red deer, a typical deer of the European continent. The DNA sequence data were obtained from BLAST (Basic Local Alignment Search Tool). The optimal alignment obtained spans 666 base pairs, with 322 matches, 230 mismatches and 114 gaps, meaning that the two DNA sequences have a 48% similarity (322/666).
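The similarity figure reported above is the number of matching columns divided by the total number of alignment columns. A small Python sketch of that bookkeeping follows; the function names `alignment_stats` and `similarity`, and the toy aligned strings, are assumptions for the example, while the counts 322, 230 and 114 are the ones quoted in the abstract.

```python
def alignment_stats(aligned_a, aligned_b, gap_char="-"):
    """Count (matches, mismatches, gaps) over the columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap_char or y == gap_char:
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


def similarity(matches, mismatches, gaps):
    """Fraction of alignment columns that are matches."""
    return matches / (matches + mismatches + gaps)


print(alignment_stats("GATT-ACA", "GACTTAC-"))  # toy alignment
print(round(similarity(322, 230, 114) * 100))   # counts from the abstract
```

Note that this definition counts gap columns in the denominator; excluding them would give a higher percentage for the same alignment.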
12

Humphrey, Sam, Alastair Kerr, Magnus Rattray, Caroline Dive, and Crispin J. Miller. "A model of k-mer surprisal to quantify local sequence information content surrounding splice regions." PeerJ 8 (November 4, 2020): e10063. http://dx.doi.org/10.7717/peerj.10063.

Abstract:
Molecular sequences carry information. Analysis of sequence conservation between homologous loci is a proven approach with which to explore the information content of molecular sequences. This is often done using multiple sequence alignments to support comparisons between homologous loci. These methods therefore rely on sufficient underlying sequence similarity with which to construct a representative alignment. Here we describe a method using a formal metric of information, surprisal, to analyse biological sub-sequences without alignment constraints. We applied our model to the genomes of five different species to reveal similar patterns across a panel of eukaryotes. As the surprisal of a sub-sequence is inversely proportional to its occurrence within the genome, the optimal size of the sub-sequences was selected for each species under consideration. With the model optimized, we found a strong correlation between surprisal and CG dinucleotide usage. The utility of our model was tested by examining the sequences of genes known to undergo splicing. We demonstrate that our model can identify biological features of interest such as known donor and acceptor sites. Analysis across all annotated coding exon junctions in Homo sapiens reveals the information content of coding exons to be greater than the surrounding intron regions, a consequence of increased suppression of the CG dinucleotide in intronic space. Sequences within coding regions proximal to exon junctions exhibited novel patterns within DNA and coding mRNA that are not a function of the encoded amino acid sequence. Our findings are consistent with the presence of secondary information encoding features such as DNA and RNA binding sites, multiplexed through the coding sequence and independent of the information required to define the corresponding amino-acid sequence. 
We conclude that surprisal provides a complementary methodology with which to locate regions of interest in the genome, particularly in situations that lack an appropriate multiple sequence alignment.
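In information theory, the surprisal of an event with probability p is -log2(p), so rarer sub-sequences carry more bits. The Python sketch below estimates p for each k-mer as its relative frequency in a sequence; this is a simplified illustration under that assumption, and the function name `kmer_surprisal` is hypothetical. The authors' actual model and their per-species choice of k may differ.

```python
import math
from collections import Counter


def kmer_surprisal(genome, k):
    """Map each observed k-mer to -log2 of its relative frequency."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    return {word: -math.log2(c / total) for word, c in counts.items()}


table = kmer_surprisal("AAAC", 1)
print(table["C"])  # seen once in four positions: 2.0 bits
print(table["A"])  # seen three times: fewer bits than "C"
```

Sliding such a table along a gene gives a per-position information profile that needs no alignment to other sequences.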
13

Tyson, Hugh. "Relationships between amino acid sequences determined through optimum alignments, clustering, and specific distance patterns: application to a group of scorpion toxins." Genome 35, no. 2 (1992): 360–71. http://dx.doi.org/10.1139/g92-055.

Abstract:
Optimum alignment in all pairwise combinations among a group of amino acid sequences generated a distance matrix. These distances were clustered to evaluate relationships among the sequences. The degree of relationship among sequences was also evaluated by calculating specific distances from the distance matrix and examining correlations between patterns of specific distances for pairs of sequences. The sequences examined were a group of 20 amino acid sequences of scorpion toxins originally published and analyzed by M.J. Dufton and H. Rochat in 1984. Alignment gap penalties were constant for all 190 pairwise sequence alignments and were chosen after assessing the impact of changing penalties on resultant distances. The total distances generated by the 190 pairwise sequence alignments were clustered using complete (farthest neighbour) linkage. The square, symmetrical input distance matrix is analogous to diallel cross data where reciprocal and parental values are absent. Diallel analysis methods provided analogues for the distance matrix to genetical specific combining abilities, namely specific distances between all sequence pairs that are independent of the average distances shown by individual sequences. Correlation of specific distance patterns, with transformation to modified z values and a stringent probability level, were used to delineate subgroups of related sequences. These were compared with complete linkage clustering results. Excellent agreement between the two approaches was found. Three originally outlying sequences were placed within the four new subgroups. Key words: sequence alignment, specific distances, sequence relationships.
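The figure of 190 alignments is simply the number of unordered pairs among 20 sequences. The Python sketch below checks that count and shows a bare-bones complete (farthest neighbour) linkage clustering on a small symmetric distance matrix. The function name `complete_linkage` and the 4 x 4 matrix are invented for illustration and are not data from the study.

```python
from itertools import combinations

# 20 sequences give 20 * 19 / 2 unordered pairs.
print(len(list(combinations(range(20), 2))))


def complete_linkage(D, n_clusters):
    """Agglomerate items 0..n-1 until n_clusters (>= 1) clusters remain.

    D is a square symmetric distance matrix. The distance between two
    clusters is the largest distance between any of their members.
    """
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        # Choose the pair of clusters whose farthest members are closest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: max(D[i][j]
                              for i in clusters[p[0]]
                              for j in clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


# Hypothetical distances: items 0 and 1 are close, as are items 2 and 3.
D = [[0, 1, 9, 8],
     [1, 0, 7, 9],
     [9, 7, 0, 2],
     [8, 9, 2, 0]]
print(complete_linkage(D, 2))
```

This naive version rescans all cluster pairs at every step, which is adequate for a 20 x 20 matrix but would be replaced by a library routine for larger data sets.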
14

Wilson, Clarke. "Analysis of Travel Behavior Using Sequence Alignment Methods." Transportation Research Record: Journal of the Transportation Research Board 1645, no. 1 (1998): 52–59. http://dx.doi.org/10.3141/1645-07.

Abstract:
Sequence alignment methods are applied to daily activity data derived from the Statistics Canada 1992 General Social Survey on Time Use, with special emphasis on travel episodes and the activities that generate travel. Sequence alignment is a combinatorial procedure that gives a quantitative measure of the similarity of character sequences, which may be used to represent daily activity patterns. It accommodates all the details supplied from activity diaries including the ordering of activity episodes, their duration, and patterns of transitions from one activity to another. Analysis of daily activity patterns by using such methods offers a new way of improving understanding of travel behavior. Such an understanding is especially critical when public transport policy is being driven increasingly by budget constraints, and traffic management through congestion is considered an acceptable response to increasing travel demands. The method successfully identifies groupings of behavioral patterns, which then may be further described by using multivariate analysis of sociodemographic characteristics. A key issue in the application of the method is to determine the circumstances in which activity sequences should or should not reflect episode duration.
15

Fakankun, Irene, Brian Fristensky, and David B. Levin. "Genome Sequence Analysis of the Oleaginous Yeast, Rhodotorula diobovata, and Comparison of the Carotenogenic and Oleaginous Pathway Genes and Gene Products with Other Oleaginous Yeasts." Journal of Fungi 7, no. 4 (2021): 320. http://dx.doi.org/10.3390/jof7040320.

Abstract:
Rhodotorula diobovata is an oleaginous and carotenogenic yeast, useful for diverse biotechnological applications. To understand the molecular basis of its potential applications, the genome was sequenced using the Illumina MiSeq and Ion Torrent platforms, assembled by ABySS, and annotated using the JGI annotation pipeline. The genome size, 21.1 MB, was similar to that of the biotechnological “workhorse”, R. toruloides. Comparative analyses of the R. diobovata genome sequence with those of other Rhodotorula species, Yarrowia lipolytica, Phaffia rhodozyma, Lipomyces starkeyi, and Sporidiobolus salmonicolor, were conducted, with emphasis on the carotenoid and neutral lipid biosynthesis pathways. Amino acid sequence alignments of key enzymes in the lipid biosynthesis pathway revealed why the activity of malic enzyme and ATP-citrate lyase may be ambiguous in Y. lipolytica and L. starkeyi. Phylogenetic analysis showed a close relationship between R. diobovata and R. graminis WP1. Dot-plot analysis of the coding sequences of the genes crtYB and ME1 corroborated sequence homologies between sequences from R. diobovata and R. graminis. There was, however, nonsequential alignment between crtYB CDS sequences from R. diobovata and those from X. dendrorhous. This research presents the first genome analysis of R. diobovata with a focus on its biotechnological potential as a lipid and carotenoid producer.
16

Cavanaugh, David, and Krishnan Chittur. "A hydrophobic proclivity index for protein alignments." F1000Research 4 (October 21, 2015): 1097. http://dx.doi.org/10.12688/f1000research.6348.1.

Abstract:
Sequence alignment algorithms are fundamental to modern bioinformatics. Sequence alignments are widely used in diverse applications such as phylogenetic analysis, database searches for related sequences to aid identification of unknown protein domain structures and classification of proteins and protein domains. Additionally, alignment algorithms are integral to the location of related proteins to secure understanding of unknown protein functions, to suggest the folded structure of proteins of unknown structure from location of homologous proteins and/or by locating homologous domains of known 3D structure. For proteins, alignment algorithms depend on information about amino acid substitutions that allows for matching sequences that are similar, but not exact. When primary sequence percent identity falls below about 25%, algorithms often fail to identify proteins that may have similar 3D structure. We have created a hydrophobicity scale and a matching dynamic programming algorithm called TMATCH (unpublished report) that is able to match proteins with remote homologs with similar secondary/tertiary structure, even with very low primary sequence matches. In this paper, we describe how we arrived at the hydrophobic scale, how it provides much more information than percent identity matches and some of the implications for better alignments and understanding protein structure.
17

Cavanaugh, David, and Krishnan Chittur. "A hydrophobic proclivity index for protein alignments." F1000Research 4 (October 15, 2020): 1097. http://dx.doi.org/10.12688/f1000research.6348.2.

Abstract:
Sequence alignment algorithms are fundamental to modern bioinformatics. Sequence alignments are widely used in diverse applications such as phylogenetic analysis, database searches for related sequences to aid identification of unknown protein domain structures and classification of proteins and protein domains. Additionally, alignment algorithms are integral to the location of related proteins to secure understanding of unknown protein functions, to suggest the folded structure of proteins of unknown structure from location of homologous proteins and/or by locating homologous domains of known 3D structure. For proteins, alignment algorithms depend on information about amino acid substitutions that allows for matching sequences that are similar, but not exact. When primary sequence percent identity falls below about 25%, algorithms often fail to identify proteins that may have similar 3D structure. We have created a hydrophobicity scale and a matching dynamic programming algorithm called TMATCH (preprint report) that is able to match proteins with remote homologs with similar secondary/tertiary structure, even with very low primary sequence matches. In this paper, we describe how we arrived at the hydrophobic scale, how it provides much more information than percent identity matches and some of the implications for better alignments and understanding protein structure.
18

Phillips, Aloysius, Daniel Janies, and Ward Wheeler. "Multiple Sequence Alignment in Phylogenetic Analysis." Molecular Phylogenetics and Evolution 16, no. 3 (2000): 317–30. http://dx.doi.org/10.1006/mpev.2000.0785.

19

Lu, Yang Young, Kujin Tang, Jie Ren, Jed A. Fuhrman, Michael S. Waterman, and Fengzhu Sun. "CAFE: aCcelerated Alignment-FrEe sequence analysis." Nucleic Acids Research 45, W1 (2017): W554–W559. http://dx.doi.org/10.1093/nar/gkx351.

20

Muller, Jean, Yukako Oma, Laurent Vallar, Evelyne Friederich, Olivier Poch, and Barbara Winsor. "Sequence and Comparative Genomic Analysis of Actin-related Proteins." Molecular Biology of the Cell 16, no. 12 (2005): 5736–48. http://dx.doi.org/10.1091/mbc.e05-06-0508.

Abstract:
Actin-related proteins (ARPs) are key players in cytoskeleton activities and nuclear functions. Two complexes, ARP2/3 and ARP1/11, also known as dynactin, are implicated in actin dynamics and in microtubule-based trafficking, respectively. ARP4 to ARP9 are components of many chromatin-modulating complexes. Conventional actins and ARPs codefine a large family of homologous proteins, the actin superfamily, with a tertiary structure known as the actin fold. Because ARPs and actin share high sequence conservation, clear family definition requires distinct features to easily and systematically identify each subfamily. In this study we performed an in depth sequence and comparative genomic analysis of ARP subfamilies. A high-quality multiple alignment of ∼700 complete protein sequences homologous to actin, including 148 ARP sequences, allowed us to extend the ARP classification to new organisms. Sequence alignments revealed conserved residues, motifs, and inserted sequence signatures to define each ARP subfamily. These discriminative characteristics allowed us to develop ARPAnno ( http://bips.u-strasbg.fr/ARPAnno ), a new web server dedicated to the annotation of ARP sequences. Analyses of sequence conservation among actins and ARPs highlight part of the actin fold and suggest interactions between ARPs and actin-binding proteins. Finally, analysis of ARP distribution across eukaryotic phyla emphasizes the central importance of nuclear ARPs, particularly the multifunctional ARP4.
21

Steinke, Dirk, Miguel Vences, Walter Salzburger, and Axel Meyer. "TaxI: a software tool for DNA barcoding using distance methods." Philosophical Transactions of the Royal Society B: Biological Sciences 360, no. 1462 (2005): 1975–80. http://dx.doi.org/10.1098/rstb.2005.1729.

Abstract:
DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding.
22

Kaur, Navjot, Rajbir Singh Cheema, and Harmandeep Singh. "Multiple Sequence Alignment and Profile Analysis of Protein Family Using Hidden Markov Model." International Journal of Scientific Research 2, no. 6 (2012): 208–11. http://dx.doi.org/10.15373/22778179/june2013/66.

23

Jeon, Yoon-Seong, Kihyun Lee, Sang-Cheol Park, et al. "EzEditor: a versatile sequence alignment editor for both rRNA- and protein-coding genes." International Journal of Systematic and Evolutionary Microbiology 64, Pt_2 (2014): 689–91. http://dx.doi.org/10.1099/ijs.0.059360-0.

Abstract:
EzEditor is a Java-based molecular sequence editor allowing manipulation of both DNA and protein sequence alignments for phylogenetic analysis. It has multiple features optimized to connect initial computer-generated multiple alignment and subsequent phylogenetic analysis by providing manual editing with reference to biological information specific to the genes under consideration. It provides various functionalities for editing rRNA alignments using secondary structure information. In addition, it supports simultaneous editing of both DNA sequences and their translated protein sequences for protein-coding genes. EzEditor is, to our knowledge, the first sequence editing software designed for both rRNA- and protein-coding genes with the visualization of biologically relevant information and should be useful in molecular phylogenetic studies. EzEditor is based on Java, can be run on all major computer operating systems and is freely available from http://sw.ezbiocloud.net/ezeditor/.
24

Morrison, David A., Matthew J. Morgan, and Scot A. Kelchner. "Molecular homology and multiple-sequence alignment: an analysis of concepts and practice." Australian Systematic Botany 28, no. 1 (2015): 46. http://dx.doi.org/10.1071/sb15001.

Abstract:
Sequence alignment is just as much a part of phylogenetics as is tree building, although it is often viewed solely as a necessary tool to construct trees. However, alignment for the purpose of phylogenetic inference is primarily about homology, as it is the procedure that expresses homology relationships among the characters, rather than the historical relationships of the taxa. Molecular homology is rather vaguely defined and understood, despite its importance in the molecular age. Indeed, homology has rarely been evaluated with respect to nucleotide sequence alignments, in spite of the fact that nucleotides are the only data that directly represent genotype. All other molecular data represent phenotype, just as do morphology and anatomy. Thus, efforts to improve sequence alignment for phylogenetic purposes should involve a more refined use of the homology concept at a molecular level. To this end, we present examples of molecular-data levels at which homology might be considered, and arrange them in a hierarchy. The concept that we propose has many levels, which link directly to the developmental and morphological components of homology. Of note, there is no simple relationship between gene homology and nucleotide homology. We also propose terminology with which to better describe and discuss molecular homology at these levels. Our over-arching conceptual framework is then used to shed light on the multitude of automated procedures that have been created for multiple-sequence alignment. Sequence alignment needs to be based on aligning homologous nucleotides, without necessary reference to homology at any other level of the hierarchy. In particular, inference of nucleotide homology involves deriving a plausible scenario for molecular change among the set of sequences. 
Our clarifications should allow the development of a procedure that specifically addresses homology, which is required when performing alignment for phylogenetic purposes, but which does not yet exist.
25

Kanagarajadurai, Karuppiah, Singaravelu Kalaimathy, Paramasivam Nagarajan, and Ramanathan Sowdhamini. "PASS2." International Journal of Knowledge Discovery in Bioinformatics 2, no. 4 (2011): 53–66. http://dx.doi.org/10.4018/jkdb.2011100104.

Abstract:
A detailed comparison of protein domains that belong to families and superfamilies shows that structure is better conserved than sequence during evolutionary divergence. Sequence alignments, guided by structural features, permit a better sampling of the protein sequence space and effective construction of libraries for fold recognition. Sequence alignments are useful evolutionary models in defining structure-function relationships for protein superfamilies. The PASS2 database, maintained by the authors, presents alignments of proteins related at the superfamily level and characterised by low sequence similarity. The number of new superfamilies increased to 47% compared with the previous PASS2 version, which shows the crucial importance of updating the PASS2 database. In the current release of the PASS2 database, they align protein superfamilies using a structural alignment protocol. The authors also introduce two alignment assessment methods that depend on the average structural deviations of domains and the extent of conserved secondary structures. They also integrate new and important structural and sequence features at the superfamily level into the database. These features are conserved-unconserved blocks in proteins, spatial distribution of sequences using principal component analysis and a statistical view for each superfamily. The authors suggest that highly structurally deviant superfamily members could be removed as outliers, so that such extreme distant relationships will not obscure the alignment. They report a nearly-automated, updated version of the superfamily alignment database, consisting of 1776 superfamilies and 9536 protein domains, that is in direct correspondence with the SCOP (1.73) database.
26

Abbas, Ali Hadi, Haider Abas AL saegh, and Furkan Sabbar ALaraji. "Sequence diversity and evolution of infectious bursal disease virus in Iraq." F1000Research 10 (April 16, 2021): 293. http://dx.doi.org/10.12688/f1000research.28421.1.

Abstract:
Background: Infectious Bursal Disease (IBD) is a highly infectious disease which causes huge economic losses to the poultry industry due to the direct impact of the illness and indirect consequences such as decreasing the general immunity of the flock, leaving it naive to other diseases. In Iraq, IBD is highly prevalent despite vaccination programs, yet studies on sequence diversity of the causative virus are still rare. Methods: A sample from Bursa of Fabricius from an IBD outbreak in a flock in the city of Najaf in Iraq was smeared on an FTA card. Amplicons of targeted regions in VP1 and VP2 genes were generated and sequenced. Sequences were then compared with other local and global sequences downloaded from GenBank repositories. Sequence alignment and DNA sequence analyses were performed using MUSCLE, UGENE and MEGAx software. The molecular clock and sequence evolutionary analyses were carried out using MEGAx tools. Results: The strain sequenced in this study belongs to a very virulent Infectious Bursal Disease Virus (vvIBDV), as the DNA and phylogenetic analysis of VP1 and VP2 gene sequences showed a mutual clustering with similar sequences belonging to vvIBDV genogroup 3. Analyses of the hypervariable region of the VP2 gene (hvVP2) of IBDV isolates from Iraq indicate the presence of sequence diversity. Interestingly, the two vaccine strains Ventri IBDV Plus and ABIC MB71 that showed the highest sequence similarity to the local isolates in the hvVP2 region are not used in routine vaccination against IBDV in Iraq. Conclusion: Sequences of vvIBDV in Iraq are diverse. Remarkably, some of the available vaccine strains show high sequence similarity with local strains in Iraq; however, they are not included in the routine vaccination programs. Analysis of more samples involving more geographical regions is needed to draw a detailed map of antigenic diversity of IBDV in Iraq.
27

Abbas, Ali Hadi, Haider Abas AL saegh, and Furkan Sabbar ALaraji. "Sequence diversity and evolution of infectious bursal disease virus in Iraq." F1000Research 10 (September 2, 2021): 293. http://dx.doi.org/10.12688/f1000research.28421.2.

28

Bhattacharyya, Debnath, Bijoy Kumar Mandal, and Tai-hoon Kim. "Designing a Bioengine for Detection and Analysis of Base String on an Affected Sequence in High-Concentration Regions." BioMed Research International 2013 (2013): 1–7. http://dx.doi.org/10.1155/2013/372646.

Abstract:
We design an algorithm for a bioengine. The program searches for optimal alignments between two sequences: the host sequence (normal plant) and the query sequence (virus). Searching for homologues has become a routine operation on biological sequences, here carried out in 4 × 4 combinations with different subsequence lengths (word sizes). The program takes advantage of the high degree of homology between such sequences to construct an alignment of the matching regions. Its main aim is to detect overlapping reading frames. The program also helps select highly infected clones by finding the highest-matching regions with the fewest gaps or mismatch zones, as well as unique virus clone matches. It is a small, portable, interactive front-end program intended to find the regions of matching between the host sequence and query subsequences. All operations are carried out in a fraction of a second, depending on the required task and on the sequence length.
29

Ahola, Virpi, Tero Aittokallio, Esa Uusipaikka, and Mauno Vihinen. "Statistical Methods for Identifying Conserved Residues in Multiple Sequence Alignment." Statistical Applications in Genetics and Molecular Biology 3, no. 1 (2004): 1–28. http://dx.doi.org/10.2202/1544-6115.1074.

Abstract:
The assessment of residue conservation in a multiple sequence alignment is a central issue in bioinformatics. Conserved residues and regions are used to determine structural and functional motifs or evolutionary relationships between the sequences of a multiple sequence alignment. For this reason, residue conservation is a valuable measure for database and motif search or for estimating the quality of alignments. In this paper, we present statistical methods for identifying conserved residues in multiple sequence alignments. While most earlier studies examine the positional conservation of the alignment, we focus on the detection of individual conserved residues at a position. The major advantages of multiple comparison methods originate from their ability to select conserved residues simultaneously and to consider the variability of the residue estimates. Large-scale simulations were used for the comparative analysis of the methods. Practical performance was studied by comparing the structurally and functionally important residues of Src homology 2 (SH2) domains to the assignments of the conservation indices. The applicability of the indices was also compared in three additional protein families comprising different degrees of entropy and variability in alignment positions. The results indicate that statistical multiple comparison methods are sensitive and reliable in identifying conserved residues.
30

Beccari, T., J. Hoade, A. Orlacchio, and J. L. Stirling. "Cloning and sequence analysis of a cDNA encoding the α-subunit of mouse β-N-acetylhexosaminidase and comparison with the human enzyme." Biochemical Journal 285, no. 2 (1992): 593–96. http://dx.doi.org/10.1042/bj2850593.

Abstract:
cDNAs encoding the mouse beta-N-acetylhexosaminidase alpha-subunit were isolated from a mouse testis library. The longest of these (1.7 kb) was sequenced and showed 83% similarity with the human alpha-subunit cDNA sequence. The 5′ end of the coding sequence was obtained from a genomic DNA clone. Alignment of the human and mouse sequences showed that all three putative N-glycosylation sites are conserved, but that the mouse alpha-subunit has an additional site towards the C-terminus. All eight cysteines in the human sequence are conserved in the mouse. There are an additional two cysteines in the mouse alpha-subunit signal peptide. All amino acids affected in Tay-Sachs-disease mutations are conserved in the mouse.
31

Shabaan, Amr M., Magdy M. Mohamed, Mohga S. Abdallah, Hayat M. Ibrahim, and Amr M. Karim. "Analysis of Schistosoma mansoni genes using the expressed sequence tag approach." Acta Biochimica Polonica 50, no. 1 (2003): 259–68. http://dx.doi.org/10.18388/abp.2003_3735.

Abstract:
Expressed sequence tags (ESTs) are partial cDNA sequences read from both ends of random expressed gene fragments used for discovering new genes. DNA libraries from four different developmental stages of Schistosoma mansoni used in this study generated 141 ESTs representing about 2.5% of S. mansoni sequences in dbEST. Sequencing was done by the dideoxy chain termination method. The sequences were submitted to GenBank for homology searching in nonredundant databases using Basic Local Alignment Search Tool for DNA (BLASTN) alignment and for protein (BLASTX) alignment at the National Center for Biotechnology Information (NCBI). Among submitted ESTs, 29 were derived from lambdagt11 sporocyst library, 70 from lambdaZap adult worm library, 31 from lambdaZap cercarial library, and 11 from lambdaZap female B worm library. Homology search revealed that eight (5.6%) ESTs shared homology to previously identified S. mansoni genes in dbEST, 15 (10.6%) are homologous to known genes in other organisms, 116 (81.7%) showed no significant sequence homology in the databases, and the remaining sequences (2.1%) showed low homologies to rRNA or mitochondrial DNA sequences. Thus, among the 141 ESTs studied, 116 sequences are derived from novel, uncharacterized S. mansoni genes. Those 116 ESTs are important for identification of coding regions in the sequences, helping in mapping of schistosome genome, and identifying genes of immunological and pharmacological significance.
32

Solano-Roman, A., C. Cruz-Castillo, D. Offenhuber, and A. Colubri. "NX4: a web-based visualization of large multiple sequence alignments." Bioinformatics 35, no. 22 (2019): 4800–4802. http://dx.doi.org/10.1093/bioinformatics/btz457.

Abstract:
Summary: Multiple Sequence Alignments (MSAs) are a fundamental operation in genome analysis. However, MSA visualizations such as sequence logos and matrix representations have changed little since the nineties and are not well suited for displaying large-scale alignments. We propose a novel, web-based MSA visualization tool called NX4, which can handle genome alignments comprising thousands of sequences. NX4 calculates the frequency of each nucleotide along the alignment and visually summarizes the results using a color-blind friendly palette that helps identify regions of high genetic diversity. NX4 also provides the user with additional assistance in finding these regions with a ‘focus + context’ mechanism that uses a line chart of the Shannon entropy across the alignment. The tool offers geneticists an easy-to-use and scalable analysis for large MSA studies. Availability and implementation: NX4 is freely available at https://www.nx4.io, and its source code at https://github.com/NX4/nx4. Supplementary information: Supplementary data are available at Bioinformatics online.
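The per-column Shannon entropy mentioned in this abstract can be sketched in Python as follows. This is a generic illustration, not NX4's code: the choice to ignore gap characters, the base-2 logarithm, and the function names `column_entropy` and `alignment_entropy` are assumptions made for the example.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column.

    Assumption: gap characters ('-') are ignored when counting residues.
    """
    counts = Counter(ch for ch in column.upper() if ch != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0  # column made only of gaps
    # H = sum over residues of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(msa: list[str]) -> list[float]:
    """Entropy of every column of an alignment given as equal-length rows."""
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(col)) for col in zip(*msa)]


# Hypothetical toy alignment of four sequences.
msa = ["ACGT", "ACGA", "ACCA", "AGCA"]
print(alignment_entropy(msa))
```

A fully conserved column scores 0 bits, and a nucleotide column with all four bases at equal frequency scores 2 bits, so plotting these values along the alignment highlights the variable regions.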
33

Grivet, L., J. C. Glaszmann, and P. Arruda. "Sequence polymorphism from EST data in sugarcane: a fine analysis of 6-phosphogluconate dehydrogenase genes." Genetics and Molecular Biology 24, no. 1-4 (2001): 161–67. http://dx.doi.org/10.1590/s1415-47572001000100022.

Abstract:
This paper presents preliminary results demonstrating the use of the sugarcane expressed sequence tag (EST) database (SUCEST) to detect single nucleotide polymorphisms (SNPs) inside 6-phosphogluconate dehydrogenase genes (Pgds). Sixty-four Pgd-related EST sequences were identified and partitioned into two clear-cut sets of 14 and 50 ESTs, probably corresponding to two genes, A and B, respectively. Alignment of A sequences allowed the detection of a single SNP, while alignment of B sequences permitted the detection of 39 reliable SNPs, 27 of which were in the coding sequence of the gene. Thirty-eight SNPs were binucleotidic and a single one was trinucleotidic. Nine insertions/deletions from one to 72 base pairs long were also detected in the noncoding 3’ and 5’ sequences. The soundness and the consequences of those preliminary observations on sequence polymorphism in sugarcane are discussed.
34

Morrison, David A. "Multiple sequence alignment for phylogenetic purposes." Australian Systematic Botany 19, no. 6 (2006): 479. http://dx.doi.org/10.1071/sb06020.

Abstract:
I have addressed the biological rather than bioinformatics aspects of molecular sequence alignment by covering a series of topics that have been under-valued, particularly within the context of phylogenetic analysis. First, phylogenetic analysis is only one of the many objectives of sequence alignment, and the most appropriate multiple alignment may not be the same for all of these purposes. Phylogenetic alignment thus occupies a specific place within a broader context. Second, homology assessment plays an intricate role in phylogenetic analysis, with sequence alignment consisting of primary homology assessment and tree building being secondary homology assessment. The objective of phylogenetic alignment thus distinguishes it from other sorts of alignment. Third, I summarise what is known about the serious limitations of using phenetic similarity as a criterion for automated multiple alignment, and provide an overview of what is currently being done to improve these computerised procedures. This synthesises information that is apparently not widely known among phylogeneticists. Fourth, I then consider the recent development of automated procedures for combining alignment and tree building, thus integrating primary and secondary homology assessment. Finally, I outline various strategies for increasing the biological content of sequence alignment procedures, which consists of taking into account known evolutionary processes when making alignment decisions. These procedures can be objective and repeatable, and can involve computerised algorithms to automate much of the work. Perhaps the most important suggestion is that alignment should be seen as a process where new sequences are added to a pre-existing alignment that has been manually curated by the biologist.
35

Begum, RA, MT Alam, H. Jahan, and MS Alam. "Partial sequence analysis of mitochondrial cytochrome B gene of Labeo calbasu of Bangladesh." Journal of Biodiversity Conservation and Bioresource Management 5, no. 1 (2019): 25–30. http://dx.doi.org/10.3329/jbcbm.v5i1.42182.

Abstract:
Labeo calbasu (Family Cyprinidae) was studied at the DNA level to assess genetic diversity within and between species. The mitochondrial cytochrome b (cyt-b) gene of L. calbasu was sequenced and compared to the corresponding sequences of other Labeo species. DNA was isolated from a tissue sample of L. calbasu using the phenol:chloroform extraction method. Forward and reverse primers were designed to amplify the target region of the cytochrome b gene. A standard PCR protocol was used for the amplification of the desired region. Then, the forward and reverse sequences obtained were aligned and edited to finalize a length of 510 nucleotides, which was submitted to the NCBI GenBank database. Nucleotide BLAST of this sequence at NCBI returned 100% sequence similarity with the L. calbasu sequence of the same region of the cyt-b gene. Multiple sequence alignment of the sequence with seven more Labeo species sequences revealed 120 polymorphic sites, which are markers of diversity among the species and might be used in molecular identification of the Labeo species. A constructed phylogenetic tree showed the relationships among the Labeo species. This research demonstrated the usefulness of a mitochondrial DNA-based approach in species identification. Further, the data will provide appropriate background for studying within-species genetic diversity of Labeo species in general and of L. calbasu in particular.
 J. Biodivers. Conserv. Bioresour. Manag. 2019, 5(1): 25-30
36

Macas, Jiří, Pavel Neumann, Petr Novák, and Jiming Jiang. "Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data." Bioinformatics 26, no. 17 (2010): 2101–8. http://dx.doi.org/10.1093/bioinformatics/btq343.

Abstract:
Motivation: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to methodological limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free approach utilizing k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next generation sequencing technologies. Results: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences, including 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and the whole genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. Reconstructed consensus sequences as well as the assessment of overall divergence of k-mer spectra revealed high similarity of the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. On the other hand, considerable differences were revealed when these methods were used for comparison of CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of the clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats. Contact: macas@umbr.cas.cz Supplementary information: Supplementary data are available at Bioinformatics online.
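The alignment-free idea in this abstract, counting k-mers across a read set rather than aligning monomers, can be sketched in a few lines of Python. The sketch is a simplified assumption-laden illustration: it counts k-mers on the given strand only (no reverse complements), skips windows containing ambiguous bases, and the names `kmer_spectrum` and `kmer_frequencies` are hypothetical. The authors' actual pipeline is not described here.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_spectrum(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurring in a collection of reads.

    Assumptions: forward strand only; windows containing characters
    other than A, C, G, T (for example N) are skipped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID_BASES:
                counts[kmer] += 1
    return counts


def kmer_frequencies(counts: Counter) -> dict[str, float]:
    """Convert raw k-mer counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical toy reads.
counts = kmer_spectrum(["ACGTACGT", "ACGN"], k=3)
print(counts.most_common(2))
```

The most frequent k-mers in such a spectrum correspond to the most conserved stretches of a repeat, which is how the abstract describes locating conserved regions before consensus reconstruction.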
37

Manoharan, Malini, Sayyed Auwn Muhammad, and Ramanathan Sowdhamini. "Sequence Analysis and Evolutionary Studies of Reelin Proteins." Bioinformatics and Biology Insights 9 (January 2015): BBI.S26530. http://dx.doi.org/10.4137/bbi.s26530.

Abstract:
The reelin gene is conserved across many vertebrate species, including humans. The protein product of this gene plays several important roles in early brain development and regulation of neural network plasticity of a matured brain structure. With an extended structure of 3461 amino acid sequences, consisting of eight reelin repeats, the human reelin sequence stands out as an exceptional model for evolutionary studies. In this study, sequence analysis of the human reelin and its homologues and reelin sequences from 104 other species is described in detail. Interesting sequence conservation patterns of individual repeats have been highlighted. Sequence phylogeny of the reelin sequences indicates a pattern similar to the evolution of the species, thereby serving as a highly conserved family for evolutionary purposes. Multiple sequence alignment of different reelin domain repeats, derived from homologues, suggests specific functions for individual repeats and high sequence conservation across reelin repeats from different organisms, albeit with few unusual domain architectures. A three-dimensional structural model of the full-length human reelin is now available that provides clues on residues at the dimer interface.
38

Aurahs, Ralf, Markus Göker, Guido W. Grimm, et al. "Using the Multiple Analysis Approach to Reconstruct Phylogenetic Relationships among Planktonic Foraminifera from Highly Divergent and Length-polymorphic SSU rDNA Sequences." Bioinformatics and Biology Insights 3 (January 2009): BBI.S3334. http://dx.doi.org/10.4137/bbi.s3334.

Abstract:
The high sequence divergence within the small subunit ribosomal RNA gene (SSU rDNA) of foraminifera makes it difficult to establish the homology of individual nucleotides across taxa. Alignment-based approaches so far relied on time-consuming manual alignments and discarded up to 50% of the sequenced nucleotides prior to phylogenetic inference. Here, we investigate the potential of the multiple analysis approach to infer a molecular phylogeny of all modern planktonic foraminiferal taxa by using a matrix of 146 new and 153 previously published SSU rDNA sequences. Our multiple analysis approach is based on eleven different automated alignments, analysed separately under the maximum likelihood criterion. The high degree of congruence between the phylogenies derived from our novel approach, traditional manually homologized culled alignments and the fossil record indicates that poorly resolved nucleotide homology does not represent the most significant obstacle when exploring the phylogenetic structure of the SSU rDNA in planktonic foraminifera. We show that approaches designed to extract phylogenetically valuable signals from complete sequences show more promise to resolve the backbone of the planktonic foraminifer tree than attempts to establish strictly homologous base calls in a manual alignment.
39

Shu, Jian-Jun, and Yajing Li. "Hypercomplex Cross-Correlation of DNA Sequences." Journal of Biological Systems 18, no. 4 (2010): 711–25. http://dx.doi.org/10.1142/s0218339010003470.

Abstract:
A hypercomplex representation of DNA is proposed to facilitate comparing DNA sequences with fuzzy composition. With the hypercomplex number representation, conventional sequence analysis methods, such as dot matrix analysis, dynamic programming, and the cross-correlation method, have been extended and improved to align DNA sequences with fuzzy composition. The hypercomplex dot matrix analysis can provide more control over the degree of alignment desired. A new scoring system has been proposed to accommodate the hypercomplex number representation of DNA and integrated with the dynamic programming alignment method. By using hypercomplex cross-correlation, the match and mismatch alignment information between two aligned DNA sequences is stored separately in the real part and the imaginary parts of the result, respectively. The mismatch alignment information is very useful for refining consensus-sequence-based motif scanning.
40

Mohamed, Eman M., Hamdy M. Mousa, and Arabi E. keshk. "Comparative Analysis of Multiple Sequence Alignment Tools." International Journal of Information Technology and Computer Science 10, no. 8 (2018): 24–30. http://dx.doi.org/10.5815/ijitcs.2018.08.04.

41

Rani, Sita, and Simarjeet Kaur. "Cluster Analysis Method for Multiple Sequence Alignment." International Journal of Computer Applications 43, no. 14 (2012): 19–25. http://dx.doi.org/10.5120/6171-8595.

42

Fan-Yun, Lin, Hu Yin-Gang, Song Guo-Qi, Zhang Hong, Liu Tian-Ming, and He Bei-Ru. "Isolation and analysis of genes induced by rehydration after serious drought in broomcorn millet (Panicum miliaceum L.) by SSH." Chinese Journal of Agricultural Biotechnology 3, no. 3 (2006): 237–42. http://dx.doi.org/10.1079/cjb2006119.

Abstract:
In order to investigate the molecular mechanism of rehydration after serious drought in broomcorn millet (Panicum miliaceum L.), a forward subtracted cDNA library was constructed between normally watered leaves and rehydrated leaves after serious drought conditions, using the suppressive subtraction hybridization (SSH) technique. A total of 60 positive clones were picked out at random from the subtracted library and sequenced, and redundant sequences were removed after sequence alignment. Based on the results of sequence homology comparison and function querying, 32 expressed sequence tags (ESTs) were highly homologous with known ESTs. Most of those sequences were related to either abiotic or biotic stress in plants. Of those sequences, 11 ESTs were homologous with ESTs in rat (Rattus norvegicus) liver after partial hepatectomy. The BLAST result for proteins revealed that 28 ESTs were similar to known proteins. The functions of these proteins mainly involve signal transduction, transcription and protein processing. This experiment demonstrated that a range of specific genes was induced and expressed in broomcorn millet during the rehydration stage after serious drought.
43

Geremia, Roberto A., E. Alejandro Petroni, Luis Ielpi, and Bernard Henrissat. "Towards a classification of glycosyltransferases based on amino acid sequence similarities: prokaryotic α-mannosyltransferases." Biochemical Journal 318, no. 1 (1996): 133–38. http://dx.doi.org/10.1042/bj3180133.

Abstract:
A number of genes encoding bacterial glycosyltransferases have been sequenced during the last few years, but their low sequence similarity has prevented a straightforward grouping of these enzymes into families. The sequences of several bacterial α-mannosyltransferases have been compared using current alignment algorithms as well as hydrophobic cluster analysis (HCA). These sequences show a similarity which is significant but too low to be reliably aligned using automatic alignment methods. However, a region spanning approx. 270 residues in these proteins could be aligned by HCA, and several invariant amino acid residues were identified. These features were also found in several other glycosyltransferases, as well as in proteins of unknown function present in sequence databases. This similarity most probably reflects the existence of a family of proteins with conserved structural and mechanistic features. It is argued that the present IUBMB classification of glycosyltransferases could be complemented by a classification of these enzymes based on sequence similarities analogous to that which we proposed for glycosyl hydrolases [Henrissat, B. (1991) Biochem. J. 280, 309–316].
44

Rautiainen, Mikko, Veli Mäkinen, and Tobias Marschall. "Bit-parallel sequence-to-graph alignment." Bioinformatics 35, no. 19 (2019): 3599–607. http://dx.doi.org/10.1093/bioinformatics/btz162.

Abstract:
Motivation: Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan-genome and hence the genetic variation present in a population. Being able to align sequencing reads to such graphs is a key step for many analyses and its applications include genome assembly, read error correction and variant calling with respect to a variation graph. Results: We generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers’ bitvector algorithm for semi-global alignment. These linear algorithms are both based on processing w sequence characters with a constant number of operations, where w is the word size of the machine (commonly 64), and achieve a speedup of up to w over naive algorithms. For a graph with |V| nodes and |E| edges and a sequence of length m, our bitvector-based graph alignment algorithm reaches a worst case runtime of O(|V| + ⌈m/w⌉|E| log w) for acyclic graphs and O(|V| + m|E| log w) for arbitrary cyclic graphs. We apply it to five different types of graphs and observe a speedup between 3-fold and 20-fold compared with a previous (asymptotically optimal) alignment algorithm. Availability and implementation: https://github.com/maickrau/GraphAligner. Supplementary information: Supplementary data are available at Bioinformatics online.
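
As an illustration of the bit-parallel idea mentioned in this abstract, the following is a minimal sketch of the classic Shift-And algorithm for exact matching on a plain linear sequence. It is not the authors' graph generalization and is not taken from GraphAligner; the function name `shift_and` is hypothetical. Python integers are unbounded, so the sketch does not model the machine word size w.

```python
def shift_and(text: str, pattern: str) -> list[int]:
    """Return the start index of every exact occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []

    # masks[c] has bit i set when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0              # bit i set: pattern[:i+1] matches a suffix of the text read so far
    hits = []
    for j, ch in enumerate(text):
        # Extend every partial match by one character and start a new one.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits
```

Each text character is handled with a shift, an OR, and an AND, which is where the speedup over a character-by-character comparison comes from when the pattern fits in one machine word.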
45

Kouser and Lalitha Rangarajan. "Promoter Sequence Analysis through No Gap Multiple Sequence Alignment of Motif Pairs." Procedia Computer Science 58 (2015): 356–62. http://dx.doi.org/10.1016/j.procs.2015.08.031.

46

Zhan, Qing, Yilei Fu, Qinghua Jiang, Bo Liu, Jiajie Peng, and Yadong Wang. "SpliVert: A Protein Multiple Sequence Alignment Refinement Method Based on Splitting-Splicing Vertically." Protein & Peptide Letters 27, no. 4 (2020): 295–302. http://dx.doi.org/10.2174/0929866526666190806143959.

Abstract:
Background: Multiple Sequence Alignment (MSA) is a fundamental task in bioinformatics and is required for many biological analysis tasks. The more accurate the alignments are, the more credible the downstream analyses. Most protein MSA algorithms refine an alignment by dividing it into two groups horizontally and then realigning the two groups. However, this strategy does not consider that different regions of the sequences have different conservation; this property may lead to incorrect residue-residue or residue-gap pairs, which cannot be corrected by this strategy. Objective: In this article, our motivation is to develop a novel refinement method based on splitting-splicing vertically. Method: Here, we present a novel refinement method based on splitting-splicing vertically, called SpliVert. For an alignment, we split it vertically into 3 parts, remove the gap characters in the middle, realign the middle part alone, and splice the realigned middle part with the other two initial pieces to obtain a refined alignment. In the realignment procedure of our method, the aligner focuses only on a certain part, ignoring the disturbance of the other parts, which could help fix the incorrect pairs. Results: We tested our refinement strategy for 2 leading MSA tools on 3 standard benchmarks, according to the commonly used average SP (and TC) score. The results show that given appropriate proportions to split the initial alignment, the average scores are increased comparably or slightly after using our method. We also compared the alignments refined by our method with alignments directly refined by the original alignment tools. The results suggest that using our SpliVert method to refine alignments can also outperform direct use of the original alignment tools. Conclusion: The results reveal that splitting vertically and realigning part of the alignment is a good strategy for the refinement of protein multiple sequence alignments.
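
The split, realign, and splice steps described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the SpliVert implementation: the function names are hypothetical, the column boundaries `start` and `end` are chosen by the caller, and `pad_right` is a trivial placeholder standing in for a real MSA tool.

```python
from typing import Callable

def pad_right(seqs: list[str]) -> list[str]:
    """Placeholder 'aligner' (assumption): pads with gaps to equal width.
    A real workflow would call an actual MSA tool here."""
    width = max((len(s) for s in seqs), default=0)
    return [s.ljust(width, "-") for s in seqs]

def split_realign_splice(
    alignment: list[str],
    start: int,
    end: int,
    realign: Callable[[list[str]], list[str]] = pad_right,
) -> list[str]:
    """Split an alignment into columns [0, start), [start, end), [end, width),
    realign only the middle block, and splice the three blocks back together."""
    left = [row[:start] for row in alignment]
    # Remove gap characters from the middle block before realigning it alone.
    middle = [row[start:end].replace("-", "") for row in alignment]
    right = [row[end:] for row in alignment]
    realigned = realign(middle)
    return [l + m + r for l, m, r in zip(left, realigned, right)]
```

For example, `split_realign_splice(["ACG-T-AC", "AC-GTTAC"], 2, 6)` leaves the two outer blocks untouched and only rewrites the four middle columns.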
47

PENG, YUNG-HSING, CHANG-BIAU YANG, KUO-TSUNG TSENG, and KUO-SI HUANG. "AN ALGORITHM AND APPLICATIONS TO SEQUENCE ALIGNMENT WITH WEIGHTED CONSTRAINTS." International Journal of Foundations of Computer Science 21, no. 01 (2010): 51–59. http://dx.doi.org/10.1142/s012905411000712x.

Abstract:
Given two sequences S1, S2, and a constrained sequence C, a longest common subsequence of S1, S2 with restriction to C is called a constrained longest common subsequence of S1 and S2 with C. At the same time, an optimal alignment of S1, S2 with restriction to C is called a constrained pairwise sequence alignment of S1 and S2 with C. Previous algorithms have shown that the constrained longest common subsequence problem is a special case of the constrained pairwise sequence alignment problem, and that both of them can be solved in O(rnm) time, where r, n, and m represent the lengths of C, S1, and S2, respectively. In this paper, we extend the definition of constrained pairwise sequence alignment to a more flexible version, called weighted constrained pairwise sequence alignment, in which some constraints might be ignored. We first give an O(rnm)-time algorithm for solving the weighted constrained pairwise sequence alignment problem, then show that our extension can be adopted to solve some constraint-related problems that cannot be solved by previous algorithms for the constrained longest common subsequence problem or the constrained pairwise sequence alignment problem. Therefore, in contrast to previous results, our extension is a new and suitable model for sequence analysis.
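
To make the O(rnm) bound concrete, here is a minimal sketch of the textbook dynamic program for the unweighted constrained longest common subsequence problem. It is not the weighted algorithm introduced in the paper; the function name and the use of -1 as an "infeasible" sentinel are assumptions made for illustration.

```python
from typing import Optional

def constrained_lcs_length(s1: str, s2: str, c: str) -> Optional[int]:
    """Length of the longest common subsequence of s1 and s2 that contains c
    as a subsequence, or None if no such common subsequence exists."""
    n, m, r = len(s1), len(s2), len(c)
    NEG = -1  # sentinel: no feasible common subsequence
    # dp[k][i][j]: best length for s1[:i], s2[:j] containing c[:k].
    dp = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in range(r + 1)]
    for k in range(r + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                if i == 0 or j == 0:
                    dp[k][i][j] = 0 if k == 0 else NEG
                    continue
                # Skip the last character of s1 or of s2.
                best = max(dp[k][i - 1][j], dp[k][i][j - 1])
                if s1[i - 1] == s2[j - 1]:
                    # Match that does not consume a constraint character.
                    if dp[k][i - 1][j - 1] != NEG:
                        best = max(best, dp[k][i - 1][j - 1] + 1)
                    # Match that also satisfies constraint character c[k-1].
                    if k > 0 and s1[i - 1] == c[k - 1] and dp[k - 1][i - 1][j - 1] != NEG:
                        best = max(best, dp[k - 1][i - 1][j - 1] + 1)
                dp[k][i][j] = best
    result = dp[r][n][m]
    return None if result == NEG else result
```

The table has (r+1)(n+1)(m+1) cells and each cell is filled in constant time, which gives the O(rnm) running time quoted in the abstract.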
48

Vulandari, Retno Tri, Sri Siswanti, Andriani Kusumaningrum Kusumawijaya, and Kumaratih Sandradewi. "Implementation of Basic Local Alignment Search for Detection H1N1 Sequence Alignment." International Journal of Trends in Mathematics Education Research 2, no. 1 (2019): 9. http://dx.doi.org/10.33122/ijtmer.v2i1.20.

Abstract:
Bioinformatics is a science that studies the management and analysis of biological information. It applies mathematics, statistics, and informatics to solve biological problems. Bioinformatics can store data generated by genome projects in an orderly way and with a high degree of accuracy. Basic local alignment search is one of the methods used to align molecular sequence data. In 2009, swine flu, a virus that attacks the respiratory tract, spread around the world, so published research on diverse virus DNA sequences from different endemic countries was retrieved. This study therefore explains the sequence alignment process for the H1N1 swine flu virus. H1N1 Weiss AF 250365.2 and H1N1 Swine AF250364.2 have a 90% similarity level.
49

Urgese, Gianvito, Emanuele Parisi, Orazio Scicolone, Santa Di Cataldo, and Elisa Ficarra. "BioSeqZip: a collapser of NGS redundant reads for the optimization of sequence analysis." Bioinformatics 36, no. 9 (2020): 2705–11. http://dx.doi.org/10.1093/bioinformatics/btaa051.

Abstract:
Motivation: High-throughput next-generation sequencing can generate huge sequence files, whose analysis requires alignment algorithms that are typically very demanding in terms of memory and computational resources. This is a significant issue, especially for machines with limited hardware capabilities. As the redundancy of the sequences typically increases with coverage, collapsing such files into compact sets of non-redundant reads has the 2-fold advantage of reducing file size and speeding up the alignment by avoiding mapping the same sequence multiple times. Method: BioSeqZip generates compact and sorted lists of alignment-ready non-redundant sequences, keeping track of their occurrences in the raw files as well as of their quality score information. By exploiting a memory-constrained external sorting algorithm, it can be executed on either single- or multi-sample datasets even on computers with medium computational capabilities. On request, it can even re-expand the compacted files to their original state. Results: Our extensive experiments on RNA-Seq data show that BioSeqZip considerably brings down the computational costs of a standard sequence analysis pipeline, with particular benefits for the alignment procedures that typically have the highest requirements in terms of memory and execution time. In our tests, BioSeqZip was able to compact 2.7 billion reads into 963 million unique tags, reducing the size of sequence files by up to 70% and speeding up the alignment by at least 50%. Availability and implementation: BioSeqZip is available at https://github.com/bioinformatics-polito/BioSeqZip. Supplementary information: Supplementary data are available at Bioinformatics online.
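
The collapsing idea in this abstract can be illustrated with a minimal in-memory sketch. It is not BioSeqZip's implementation: the tool is described as using a memory-constrained external sort and as tracking quality scores, neither of which is modeled here, and the function names are hypothetical.

```python
from collections import Counter

def collapse_reads(reads: list[str]) -> list[tuple[str, int]]:
    """Collapse identical reads into a sorted list of (sequence, count) pairs."""
    counts = Counter(reads)
    return sorted(counts.items())

def expand_reads(collapsed: list[tuple[str, int]]) -> list[str]:
    """Re-expand collapsed reads; the original read order is not restored."""
    return [seq for seq, n in collapsed for _ in range(n)]
```

An aligner then only needs to map each unique sequence once, and the stored count can be used to recover per-read statistics afterwards.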
50

Coutinho, Luiz L., Lakshmi K. Matukumalli, Tad S. Sonstegard, et al. "Discovery and profiling of bovine microRNAs from immune-related and embryonic tissues." Physiological Genomics 29, no. 1 (2007): 35–43. http://dx.doi.org/10.1152/physiolgenomics.00081.2006.

Abstract:
MicroRNAs are small ∼22 nucleotide-long noncoding RNAs capable of controlling gene expression by inhibiting translation. Alignment of human microRNA stem-loop sequences (mir) against a recent draft sequence assembly of the bovine genome resulted in identification of 334 predicted bovine mir. We sequenced five tissue-specific cDNA libraries derived from the small RNA fractions of bovine embryo, thymus, small intestine, and lymph node to validate these predictions and identify new mir. This strategy combined with comparative sequence analysis identified 129 sequences that corresponded to mature microRNAs (miR). A total of 107 sequences aligned to known human mir, and 100 of these matched expressed miR. The other seven sequences represented novel miR expressed from the complementary strand of previously characterized human mir. The 22 sequences without matches displayed characteristic mir secondary structures when folded in silico, and 10 of these retained sequence conservation with other vertebrate species. Expression analysis based on sequence identity counts revealed that some miR were preferentially expressed in certain tissues, while bta-miR-26a and bta-miR-103 were prevalent in all tissues examined. These results support the premise that species differences in regulation of gene expression by miR occur primarily at the level of expression and processing.