Journal articles on the topic 'Bioinformatics. Indexing'


Consult the top 50 journal articles for your research on the topic 'Bioinformatics. Indexing.'


1

Morgulis, Aleksandr, George Coulouris, Yan Raytselis, Thomas L. Madden, Richa Agarwala, and Alejandro A. Schäffer. "Database indexing for production MegaBLAST searches." Bioinformatics 24, no. 16 (2008): 1757–64. http://dx.doi.org/10.1093/bioinformatics/btn322.

2

Morgulis, A., G. Coulouris, Y. Raytselis, T. L. Madden, R. Agarwala, and A. A. Schaffer. "Database indexing for production MegaBLAST searches." Bioinformatics 24, no. 24 (2008): 2942. http://dx.doi.org/10.1093/bioinformatics/btn554.

3

Dai, Suyang, Ronghui You, Zhiyong Lu, Xiaodi Huang, Hiroshi Mamitsuka, and Shanfeng Zhu. "FullMeSH: improving large-scale MeSH indexing with full text." Bioinformatics 36, no. 5 (2019): 1533–41. http://dx.doi.org/10.1093/bioinformatics/btz756.

Abstract:
Motivation: With the rapidly growing biomedical literature, automatically indexing biomedical articles by Medical Subject Heading (MeSH), namely MeSH indexing, has become increasingly important for facilitating hypothesis generation and knowledge discovery. Over the past years, many large-scale MeSH indexing approaches have been proposed, such as Medical Text Indexer, MeSHLabeler, DeepMeSH and MeSHProbeNet. However, the performance of these methods is hampered by using limited information, i.e. only the title and abstract of biomedical articles. Results: We propose FullMeSH, a large-scale MeSH indexing method taking advantage of the recent increase in the availability of full text articles. Compared to DeepMeSH and other state-of-the-art methods, FullMeSH has three novelties: (i) Instead of using a full text as a whole, FullMeSH segments it into several sections with their normalized titles in order to distinguish their contributions to the overall performance. (ii) FullMeSH integrates the evidence from different sections in a ‘learning to rank’ framework by combining the sparse and deep semantic representations. (iii) FullMeSH trains an Attention-based Convolutional Neural Network for each section, which achieves better performance on infrequent MeSH headings. FullMeSH has been developed and empirically trained on the entire set of 1.4 million full-text articles in the PubMed Central Open Access subset. It achieved a Micro F-measure of 66.76% on a test set of 10 000 articles, which was 3.3% and 6.4% higher than DeepMeSH and MeSHLabeler, respectively. Furthermore, FullMeSH demonstrated an average improvement of 4.7% over DeepMeSH for indexing Check Tags, a set of most frequently indexed MeSH headings. Availability and implementation: The software is available upon request. Supplementary information: Supplementary data are available at Bioinformatics online.
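
The Micro F-measure quoted in this abstract pools true positives, false positives and false negatives over all articles before computing precision and recall. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `micro_f_measure` and the toy label sets are assumptions made for illustration.

```python
def micro_f_measure(true_labels, predicted_labels):
    """Micro-averaged F-measure for multi-label predictions.

    Both arguments are sequences of sets, one set of labels per article.
    Counts are pooled over all articles before precision/recall are computed.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two articles with made-up heading sets.
truth = [{"Humans", "Mice"}, {"Humans"}]
pred = [{"Humans"}, {"Humans", "Animals"}]
score = micro_f_measure(truth, pred)
```

Because the counts are pooled, frequent headings dominate the micro average, which is one reason MeSH indexing papers often report macro-averaged measures alongside it.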
4

Lam, T. W., W. K. Sung, S. L. Tam, C. K. Wong, and S. M. Yiu. "Compressed indexing and local alignment of DNA." Bioinformatics 24, no. 6 (2008): 791–97. http://dx.doi.org/10.1093/bioinformatics/btn032.

5

Gülsoy, Günhan, and Tamer Kahveci. "RINQ: Reference-based Indexing for Network Queries." Bioinformatics 27, no. 13 (2011): i149–i158. http://dx.doi.org/10.1093/bioinformatics/btr203.

6

Klötzl, Fabian, and Bernhard Haubold. "Phylonium: fast estimation of evolutionary distances from large samples of similar genomes." Bioinformatics 36, no. 7 (2019): 2040–46. http://dx.doi.org/10.1093/bioinformatics/btz903.

Abstract:
Motivation: Tracking disease outbreaks by whole-genome sequencing leads to the collection of large samples of closely related sequences. Five years ago, we published a method to accurately compute all pairwise distances for such samples by indexing each sequence. Since indexing is slow, we now ask whether it is possible to achieve similar accuracy when indexing only a single sequence. Results: We have implemented this idea in the program phylonium and show that it is as accurate as its predecessor and roughly 100 times faster when applied to all 2678 Escherichia coli genomes contained in ENSEMBL. One of the best published programs for rapidly computing pairwise distances, mash, analyzes the same dataset four times faster but, with default settings, it is less accurate than phylonium. Availability and implementation: Phylonium runs under the UNIX command line; its C++ sources and documentation are available from github.com/evolbioinf/phylonium. Supplementary information: Supplementary data are available at Bioinformatics online.
7

Garzon, Max H., Kiran C. Bobba, Andrew Neel, and Vinhthuy Phan. "DNA-Based Indexing." International Journal of Nanotechnology and Molecular Computation 2, no. 3 (2010): 25–45. http://dx.doi.org/10.4018/jnmc.2010070102.

Abstract:
DNA has been acknowledged as a suitable medium for massively parallel computing and as a “smart” glue for self-assembly. In this paper, a third capability of DNA is described in detail as memory capable of encoding and processing large amounts of data so that information can be retrieved associatively based on content. The technique is based on a novel representation of data on DNA that can shed light on the way DNA, RNA and other biomolecules encode information, which may be potentially important in applications to fields like bioinformatics and genetics, and natural language processing. Analyses are also provided of the sensitivity, robustness, and bounds on the theoretical capacity of the memories. Finally, the potential use of the memories is illustrated with two applications, one in genomic analysis for identification and classification, another in information retrieval from text data in abiotic form.
8

Chang, Xian, Jordan Eizenga, Adam M. Novak, Jouni Sirén, and Benedict Paten. "Distance indexing and seed clustering in sequence graphs." Bioinformatics 36, Supplement_1 (2020): i146–i153. http://dx.doi.org/10.1093/bioinformatics/btaa446.

Abstract:
Motivation: Graph representations of genomes are capable of expressing more genetic variation and can therefore better represent a population than standard linear genomes. However, due to the greater complexity of genome graphs relative to linear genomes, some functions that are trivial on linear genomes become much more difficult in genome graphs. Calculating distance is one such function that is simple in a linear genome but complicated in a graph context. In read mapping algorithms such distance calculations are fundamental to determining if seed alignments could belong to the same mapping. Results: We have developed an algorithm for quickly calculating the minimum distance between positions on a sequence graph using a minimum distance index. We have also developed an algorithm that uses the distance index to cluster seeds on a graph. We demonstrate that our implementations of these algorithms are efficient and practical to use for a new generation of mapping algorithms based upon genome graphs. Availability and implementation: Our algorithms have been implemented as part of the vg toolkit and are available at https://github.com/vgteam/vg.
9

Camoglu, O., T. Kahveci, and A. K. Singh. "PSI: indexing protein structures for fast similarity search." Bioinformatics 19, Suppl 1 (2003): i81–i83. http://dx.doi.org/10.1093/bioinformatics/btg1009.

10

Liu, F., T. K. Jenssen, V. Nygaard, J. Sack, and E. Hovig. "FigSearch: a figure legend indexing and classification system." Bioinformatics 20, no. 16 (2004): 2880–82. http://dx.doi.org/10.1093/bioinformatics/bth316.

11

Xun, Guangxu, Kishlay Jha, Ye Yuan, Yaqing Wang, and Aidong Zhang. "MeSHProbeNet: a self-attentive probe net for MeSH indexing." Bioinformatics 35, no. 19 (2019): 3794–802. http://dx.doi.org/10.1093/bioinformatics/btz142.

Abstract:
Motivation: MEDLINE is the primary bibliographic database maintained by National Library of Medicine (NLM). MEDLINE citations are indexed with Medical Subject Headings (MeSH), which is a controlled vocabulary curated by the NLM experts. This greatly facilitates the applications of biomedical research and knowledge discovery. Currently, MeSH indexing is manually performed by human experts. To reduce the time and monetary cost associated with manual annotation, many automatic MeSH indexing systems have been proposed to assist manual annotation, including DeepMeSH and NLM’s official model Medical Text Indexer (MTI). However, the existing models usually rely on the intermediate results of other models and suffer from efficiency issues. We propose an end-to-end framework, MeSHProbeNet (formerly named as xgx), which utilizes deep learning and self-attentive MeSH probes to index MeSH terms. Each MeSH probe enables the model to extract one specific aspect of biomedical knowledge from an input article, thus comprehensive biomedical information can be extracted with different MeSH probes and interpretability can be achieved at word level. MeSH terms are finally recommended with a unified classifier, making MeSHProbeNet both time efficient and space efficient. Results: MeSHProbeNet won the first place in the latest batch of Task A in the 2018 BioASQ challenge. The result on the last test set of the challenge is reported in this paper. Compared with other state-of-the-art models, such as MTI and DeepMeSH, MeSHProbeNet achieves the highest scores in all the F-measures, including Example Based F-Measure, Macro F-Measure, Micro F-Measure, Hierarchical F-Measure and Lowest Common Ancestor F-measure. We also intuitively show how MeSHProbeNet is able to extract comprehensive biomedical knowledge from an input article.
12

Kahveci, T., V. Ljosa, and A. K. Singh. "Speeding up whole-genome alignment by indexing frequency vectors." Bioinformatics 20, no. 13 (2004): 2122–34. http://dx.doi.org/10.1093/bioinformatics/bth212.

13

Homayouni, R., K. Heinrich, L. Wei, and M. W. Berry. "Gene clustering by Latent Semantic Indexing of MEDLINE abstracts." Bioinformatics 21, no. 1 (2004): 104–15. http://dx.doi.org/10.1093/bioinformatics/bth464.

14

Peng, Shengwen, Ronghui You, Hongning Wang, Chengxiang Zhai, Hiroshi Mamitsuka, and Shanfeng Zhu. "DeepMeSH: deep semantic representation for improving large-scale MeSH indexing." Bioinformatics 32, no. 12 (2016): i70–i79. http://dx.doi.org/10.1093/bioinformatics/btw294.

15

Marchet, Camille, Zamin Iqbal, Daniel Gautheret, Mikaël Salson, and Rayan Chikhi. "REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets." Bioinformatics 36, Supplement_1 (2020): i177–i185. http://dx.doi.org/10.1093/bioinformatics/btaa487.

Abstract:
Motivation: In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets. Results: We used REINDEER to index the abundances of sequences within 2585 human RNA-seq experiments in 45 h using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of ∼4 billion distinct k-mers across 2585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph of each dataset, then conceptually merges those de Bruijn graphs into a single global one. Then, REINDEER constructs and indexes monotigs, which in a nutshell are groups of k-mers of similar abundances. Availability and implementation: https://github.com/kamimrcht/REINDEER. Supplementary information: Supplementary data are available at Bioinformatics online.
16

Etzold, Thure, and Patrick Argos. "SRS—an indexing and retrieval tool for flat file data libraries." Bioinformatics 9, no. 1 (1993): 49–57. http://dx.doi.org/10.1093/bioinformatics/9.1.49.

17

Tchechmedjiev, Andon, Amine Abdaoui, Vincent Emonet, Soumia Melzi, Jitendra Jonnagaddala, and Clement Jonquet. "Enhanced functionalities for annotating and indexing clinical text with the NCBO Annotator+." Bioinformatics 34, no. 11 (2018): 1962–65. http://dx.doi.org/10.1093/bioinformatics/bty009.

18

Walenz, Brian, and Liliana Florea. "Sim4db and Leaff: utilities for fast batch spliced alignment and sequence indexing." Bioinformatics 27, no. 13 (2011): 1869–70. http://dx.doi.org/10.1093/bioinformatics/btr285.

19

Ramu, C. "SIR: a simple indexing and retrieval system for biological flat file databases." Bioinformatics 17, no. 8 (2001): 756–58. http://dx.doi.org/10.1093/bioinformatics/17.8.756.

20

Liu, Ke, Shengwen Peng, Junqiu Wu, Chengxiang Zhai, Hiroshi Mamitsuka, and Shanfeng Zhu. "MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence." Bioinformatics 31, no. 12 (2015): i339–i347. http://dx.doi.org/10.1093/bioinformatics/btv237.

21

Zhang, Meng, Liang Hu, and Yi Zhang. "Weighted Automata for Full-Text Indexing." International Journal of Foundations of Computer Science 22, no. 04 (2011): 921–43. http://dx.doi.org/10.1142/s0129054111008490.

Abstract:
Full-text index structures are widely used in string matching and bioinformatics. These structures such as DAWGs and suffix trees allow fast searches on texts. In this paper, we present a new partition of the factors of a word, called a consistent minimal linear partition. Based on this partition, we introduce the weighted directed word graph (WDWG), a space-economical full-text index. WDWGs are basically cyclic, which means that they may accept infinite strings. But by assigning weights to edges, the acceptable strings are limited only to the factors of the input string. For a given word w, any factor of w can be indexed by a state of the WDWG and its length. A WDWG of w has at most |w| states and 2|w| - 1 transition edges. We present an on-line algorithm to construct a WDWG for a given word in time linear in the length of the word. Our experiment shows the size of WDWGs is smaller than that of DAWGs for many data sets including DNA sequences, Chinese texts and English texts.
22

Piro, Vitor C., Temesgen H. Dadi, Enrico Seiler, Knut Reinert, and Bernhard Y. Renard. "ganon: precise metagenomics classification against large and up-to-date sets of reference sequences." Bioinformatics 36, Supplement_1 (2020): i12–i20. http://dx.doi.org/10.1093/bioinformatics/btaa458.

Abstract:
Motivation: The exponential growth of assembled genome sequences greatly benefits metagenomics studies. However, currently available methods struggle to manage the increasing amount of sequences and their frequent updates. Indexing the current RefSeq can take days and hundreds of GB of memory on large servers. Few methods address these issues thus far, and even though many can theoretically handle large amounts of references, time/memory requirements are prohibitive in practice. As a result, many studies that require sequence classification use often outdated and almost never truly up-to-date indices. Results: Motivated by those limitations, we created ganon, a k-mer-based read classification tool that uses Interleaved Bloom Filters in conjunction with a taxonomic clustering and a k-mer counting/filtering scheme. Ganon provides an efficient method for indexing references, keeping them updated. It requires <55 min to index the complete RefSeq of bacteria, archaea, fungi and viruses. The tool can further keep these indices up-to-date in a fraction of the time necessary to create them. Ganon makes it possible to query against very large reference sets and therefore it classifies significantly more reads and identifies more species than similar methods. When classifying a high-complexity CAMI challenge dataset against complete genomes from RefSeq, ganon shows strongly increased precision with equal or better sensitivity compared with state-of-the-art tools. With the same dataset against the complete RefSeq, ganon improved the F1-score by 65% at the genus level. It supports taxonomy- and assembly-level classification, multiple indices and hierarchical classification. Availability and implementation: The software is open-source and available at: https://gitlab.com/rki_bioinformatics/ganon. Supplementary information: Supplementary data are available at Bioinformatics online.
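
The abstract above relies on Bloom filters over k-mers. As a rough illustration of the basic idea only, here is a plain (not interleaved) Bloom filter holding the k-mers of one reference. This is not ganon's implementation; the class `BloomFilter`, the helper `kmers`, the SHA-256-based hashing and all parameter values are assumptions chosen for the sketch.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(sequence, k):
    """Yield every length-k substring of sequence."""
    return (sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Hypothetical toy reference; real references are whole genomes.
reference = "ACGTACGGATTC"
bf = BloomFilter(num_bits=1024, num_hashes=3)
for kmer in kmers(reference, 4):
    bf.add(kmer)
```

A query k-mer that was inserted is always reported as present; a k-mer that was never inserted is usually, but not always, reported as absent, which is why classifiers built on such filters typically require several matching k-mers per read.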
23

Kern, Fabian, Tobias Fehlmann, and Andreas Keller. "On the lifetime of bioinformatics web services." Nucleic Acids Research 48, no. 22 (2020): 12523–33. http://dx.doi.org/10.1093/nar/gkaa1125.

Abstract:
Web services are used through all disciplines in life sciences and the online landscape is growing by hundreds of novel servers annually. However, availability varies, and maintenance practices are largely inconsistent. We screened the availability of 2396 web tools published during the past 10 years. All servers were accessed over 133 days and 318 668 index files were stored in a local database. The number of accessible tools almost linearly increases in time with highest availability for 2019 and 2020 (∼90%) and lowest for tools published in 2010 (∼50%). In a 133-day test frame, 31% of tools were always working, 48.4% occasionally and 20.6% never. Consecutive downtimes were typically below 5 days with a median of 1 day, and unevenly distributed over the weekdays. A rescue experiment on 47 tools that were published from 2019 onwards but never accessible showed that 51.1% of the tools could be restored in due time. We found a positive association between the number of citations and the probability of a web server being reachable. We then determined common challenges and formulated categorical recommendations for researchers planning to develop web-based resources. As implication of our study, we propose to develop a repository for automatic API testing and sustainability indexing.
24

Navarro, Gonzalo. "Indexing Highly Repetitive String Collections, Part II." ACM Computing Surveys 54, no. 2 (2021): 1–32. http://dx.doi.org/10.1145/3432999.

Abstract:
Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through applications like bioinformatics, the string collections experienced a growth that outperforms Moore’s Law and challenges our ability of handling them even in compressed form. It turns out, fortunately, that many of these rapidly growing string collections are highly repetitive, so that their information content is orders of magnitude lower than their plain size. The statistical compression methods used for classical collections, however, are blind to this repetitiveness, and therefore a new set of techniques has been developed to properly exploit it. The resulting indexes form a new generation of data structures able to handle the huge repetitive string collections that we are facing. In this survey, formed by two parts, we cover the algorithmic developments that have led to these data structures. In this second part, we describe the fundamental algorithmic ideas and data structures that form the base of all the existing indexes, and the various concrete structures that have been proposed, comparing them both in theoretical and practical aspects, and uncovering some new combinations. We conclude with the current challenges in this fascinating field.
25

Wehbe, Firas H., Steven H. Brown, Pierre P. Massion, Cynthia S. Gadd, Daniel R. Masys, and Constantin F. Aliferis. "A Novel Information Retrieval Model for High-Throughput Molecular Medicine Modalities." Cancer Informatics 8 (January 2009): CIN.S964. http://dx.doi.org/10.4137/cin.s964.

Abstract:
Significant research has been devoted to predicting diagnosis, prognosis, and response to treatment using high-throughput assays. Rapid translation into clinical results hinges upon efficient access to up-to-date and high-quality molecular medicine modalities. We first explain why this goal is inadequately supported by existing databases and portals and then introduce a novel semantic indexing and information retrieval model for clinical bioinformatics. The formalism provides the means for indexing a variety of relevant objects (e.g. papers, algorithms, signatures, datasets) and includes a model of the research processes that creates and validates these objects in order to support their systematic presentation once retrieved. We test the applicability of the model by constructing proof-of-concept encodings and visual presentations of evidence and modalities in molecular profiling and prognosis of: (a) diffuse large B-cell lymphoma (DLBCL) and (b) breast cancer.
26

Navarro, Gonzalo. "Indexing Highly Repetitive String Collections, Part I." ACM Computing Surveys 54, no. 2 (2021): 1–31. http://dx.doi.org/10.1145/3434399.

Abstract:
Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through applications like bioinformatics, the string collections experienced a growth that outperforms Moore’s Law and challenges our ability to handle them even in compressed form. It turns out, fortunately, that many of these rapidly growing string collections are highly repetitive, so that their information content is orders of magnitude lower than their plain size. The statistical compression methods used for classical collections, however, are blind to this repetitiveness, and therefore a new set of techniques has been developed to properly exploit it. The resulting indexes form a new generation of data structures able to handle the huge repetitive string collections that we are facing. In this survey, formed by two parts, we cover the algorithmic developments that have led to these data structures. In this first part, we describe the distinct compression paradigms that have been used to exploit repetitiveness, and the algorithmic techniques that provide direct access to the compressed strings. In the quest for an ideal measure of repetitiveness, we uncover a fascinating web of relations between those measures, as well as the limits up to which the data can be recovered, and up to which direct access to the compressed data can be provided. This is the basic aspect of indexability, which is covered in the second part of this survey.
27

Crochemore, Maxime, Alessio Langiu, and M. Sohel Rahman. "Indexing a sequence for mapping reads with a single mismatch." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 372, no. 2016 (2014): 20130167. http://dx.doi.org/10.1098/rsta.2013.0167.

Abstract:
Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of the next generation sequencing. In the sequel, we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in time and space and can answer subsequent queries in time. Here, n is the length of the sequence, m is the length of the read, 0< ε <1 and is the optimal output size.
28

Su, X., J. Xu, and K. Ning. "Meta-Storms: efficient search for similar microbial communities based on a novel indexing scheme and similarity score for metagenomic data." Bioinformatics 28, no. 19 (2012): 2493–501. http://dx.doi.org/10.1093/bioinformatics/bts470.

29

Blumenthal, David B., Nicolas Boria, Sébastien Bougleux, Luc Brun, Johann Gamper, and Benoit Gaüzère. "Scalable generalized median graph estimation and its manifold use in bioinformatics, clustering, classification, and indexing." Information Systems 100 (September 2021): 101766. http://dx.doi.org/10.1016/j.is.2021.101766.

30

Kuksa, Pavel P., Chien-Yueh Lee, Alexandre Amlie-Wolf, et al. "SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants." Bioinformatics 36, no. 12 (2020): 3879–81. http://dx.doi.org/10.1093/bioinformatics/btaa246.

Abstract:
Summary: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline characterizing non-coding genome-wide association study (GWAS) association findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional datasets across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that SparkINFERNO is more than 60 times efficient and scales with data size and amount of computational resources. Availability and implementation: SparkINFERNO runs on clusters or a single server with Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno. Contact: lswang@pennmedicine.upenn.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
31

Shibuya, Yoshihiro, and Matteo Comin. "Indexing k-mers in linear space for quality value compression." Journal of Bioinformatics and Computational Biology 17, no. 05 (2019): 1940011. http://dx.doi.org/10.1142/s0219720019400110.

Abstract:
Many bioinformatics tools heavily rely on k-mer dictionaries to describe the composition of sequences and allow for faster reference-free algorithms or look-ups. Unfortunately, naive k-mer dictionaries are very memory-inefficient, requiring very large amount of storage space to save each k-mer. This problem is generally worsened by the necessity of an index for fast queries. In this work, we discuss how to build an indexed linear reference containing a set of input k-mers and its application to the compression of quality scores in FASTQ files. Most of the entropies of sequencing data lie in the quality scores, and thus they are difficult to compress. Here, we present an application to improve the compressibility of quality values while preserving the information for SNP calling. We show how a dictionary of significant k-mers, obtained from SNP databases and multiple genomes, can be indexed in linear space and used to improve the compression of quality value. Availability: The software is freely available at https://github.com/yhhshb/yalff.
32

Lian, Qiuyu, Hongyi Xin, Jianzhu Ma, et al. "Artificial-cell-type aware cell-type classification in CITE-seq." Bioinformatics 36, Supplement_1 (2020): i542–i550. http://dx.doi.org/10.1093/bioinformatics/btaa467.

Abstract:
Motivation: Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq), couples the measurement of surface marker proteins with simultaneous sequencing of mRNA at single cell level, which brings accurate cell surface phenotyping to single-cell transcriptomics. Unfortunately, multiplets in CITE-seq datasets create artificial cell types (ACT) and complicate the automation of cell surface phenotyping. Results: We propose CITE-sort, an artificial-cell-type aware surface marker clustering method for CITE-seq. CITE-sort is aware of and is robust to multiplet-induced ACT. We benchmarked CITE-sort with real and simulated CITE-seq datasets and compared CITE-sort against canonical clustering methods. We show that CITE-sort produces the best clustering performance across the board. CITE-sort not only accurately identifies real biological cell types (BCT) but also consistently and reliably separates multiplet-induced artificial-cell-type droplet clusters from real BCT droplet clusters. In addition, CITE-sort organizes its clustering process with a binary tree, which facilitates easy interpretation and verification of its clustering result and simplifies cell-type annotation with domain knowledge in CITE-seq. Availability and implementation: http://github.com/QiuyuLian/CITE-sort. Supplementary information: Supplementary data is available at Bioinformatics online.
33

Jung, H. Y., and H. G. Cho. "An automatic block and spot indexing with k-nearest neighbors graph for microarray image analysis." Bioinformatics 18, Suppl 2 (2002): S141–S151. http://dx.doi.org/10.1093/bioinformatics/18.suppl_2.s141.

34

Maharaj, Sridevi, Brennan Tracy, and Wayne B. Hayes. "BLANT—fast graphlet sampling tool." Bioinformatics 35, no. 24 (2019): 5363–64. http://dx.doi.org/10.1093/bioinformatics/btz603.

Abstract:
Summary: BLAST creates local sequence alignments by first building a database of small k-letter sub-sequences called k-mers. Identical k-mers from different regions provide ‘seeds’ for longer local alignments. This seed-and-extend heuristic makes BLAST extremely fast and has led to its almost exclusive use despite the existence of more accurate, but slower, algorithms. In this paper, we introduce the Basic Local Alignment for Networks Tool (BLANT). BLANT is the analog of BLAST, but for networks: given an input graph, it samples small, induced, k-node sub-graphs called k-graphlets. Graphlets have been used to classify networks, quantify structure, align networks both locally and globally, identify topology-function relationships and build taxonomic trees without the use of sequences. Given an input network, BLANT produces millions of graphlet samples in seconds, orders of magnitude faster than existing methods. BLANT offers sampled graphlets in various forms: distributions of graphlets or their orbits; graphlet degree or graphlet orbit degree vectors, the latter being compatible with ORCA; or an index to be used as the basis for seed-and-extend local alignments. We demonstrate BLANT’s usefulness by using its indexing mode to find functional similarity between yeast and human PPI networks. Availability and implementation: BLANT is written in C and is available at https://github.com/waynebhayes/BLANT/releases. Supplementary information: Supplementary data are available at Bioinformatics online.
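
The k-mer seeding step that this abstract attributes to BLAST can be sketched in a few lines. This is a simplified illustration of exact-match seeding only, without the extension phase, and it is neither BLAST's nor BLANT's actual code; the function names `build_kmer_index` and `find_seeds` and the toy sequences are assumptions.

```python
from collections import defaultdict


def build_kmer_index(sequence, k):
    """Map each k-mer in sequence to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_seeds(query, index, k):
    """Return (query_pos, target_pos) pairs that share an identical k-mer."""
    seeds = []
    for q in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict.
        for t in index.get(query[q:q + k], []):
            seeds.append((q, t))
    return seeds


# Hypothetical toy sequences.
target = "ACGTACGGA"
index = build_kmer_index(target, k=3)
seeds = find_seeds("TTACGG", index, k=3)
# seeds is [(1, 3), (2, 0), (2, 4), (3, 5)]
```

Each seed is a candidate anchor; a seed-and-extend aligner would then try to grow an alignment outward from it, and seeds lying on the same diagonal (here query positions 1 to 3 against target positions 3 to 5) hint at a longer shared region.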
35

Kim, Hani Jieun, Yingxin Lin, Thomas A. Geddes, Jean Yee Hwa Yang, and Pengyi Yang. "CiteFuse enables multi-modal analysis of CITE-seq data." Bioinformatics 36, no. 14 (2020): 4137–43. http://dx.doi.org/10.1093/bioinformatics/btaa282.

Abstract:
Motivation: Multi-modal profiling of single cells represents one of the latest technological advancements in molecular biology. Among various single-cell multi-modal strategies, cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) allows simultaneous quantification of two distinct species: RNA and cell-surface proteins. Here, we introduce CiteFuse, a streamlined package consisting of a suite of tools for doublet detection, modality integration, clustering, differential RNA and protein expression analysis, antibody-derived tag evaluation, ligand–receptor interaction analysis and interactive web-based visualization of CITE-seq data. Results: We demonstrate the capacity of CiteFuse to integrate the two data modalities and its relative advantage against data generated from single-modality profiling using both simulations and real-world CITE-seq data. Furthermore, we illustrate a novel doublet detection method based on a combined index of cell hashing and transcriptome data. Finally, we demonstrate CiteFuse for predicting ligand–receptor interactions by using multi-modal CITE-seq data. Collectively, we demonstrate the utility and effectiveness of CiteFuse for the integrative analysis of transcriptome and epitope profiles from CITE-seq data. Availability and implementation: CiteFuse is freely available at http://shiny.maths.usyd.edu.au/CiteFuse/ as an online web service and at https://github.com/SydneyBioX/CiteFuse/ as an R package. Contact: pengyi.yang@sydney.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
36

Mwambe, Othmar Othmar, Phan Xuan Tan, and Eiji Kamioka. "Bioinformatics-Based Adaptive System towards Real-Time Dynamic E-learning Content Personalization." Education Sciences 10, no. 2 (2020): 42. http://dx.doi.org/10.3390/educsci10020042.

Abstract:
Adaptive Educational Hypermedia Systems (AEHS) play a crucial role in supporting adaptive learning and immensely outperform learner-control based systems. AEHS’ page indexing and hyperspace rely mostly on navigation supports which provide the learners with a user-friendly interactive learning environment. Such AEHS features provide the systems with a unique ability to adapt to learners’ preferences. However, obtaining timely and accurate information for their adaptive decision-making process is still a challenge due to the dynamic understanding of the individual learner. This causes spontaneous changes in learners’ learning styles, which makes it hard for system developers to integrate learning objects with learning styles on a real-time basis. Thus, in previous research studies, multi-level navigation supports have been applied to solve this problem. However, this approach destroys learners’ motivation by imposing time and work overload on them. To address such a challenge, this study proposes a bioinformatics-based adaptive navigation support that is initiated by the alternation of learners’ motivation states on a real-time basis. An EyeTracking sensor and adaptive time-locked Learning Objects (LOs) were used. Hence, learners’ pupil size dilation and reading and reaction time were used for the adaptation process and evaluation. The results show that the proposed approach improved the AEHS adaptive process and increased learners’ performance up to 78%.
37

Comin, Matteo, Carlo Ferrari, and Concettina Guerra. "Grid Deployment of Bioinformatics Applications: A Case Study in Protein Similarity Determination." Parallel Processing Letters 14, no. 02 (2004): 163–76. http://dx.doi.org/10.1142/s0129626404001817.

Abstract:
In this paper we present a scenario for the grid immersion of the procedures that solve the protein structural similarity determination problem. The emphasis is on the way various computational components and data resources are tied together into a workflow to be executed on a grid. The grid deployment has been organized according to the bag-of-service model: a set of different modules (with their data set) is made available to the application designers. Each module deals with a specific subproblem using a proper protein data representation. At the design level, the process of task selection produces a first general workflow that establishes which subproblems need to be solved and their temporal relations. A further refinement requires selecting, for each previously identified task, a procedure that solves it: the choice is made among different available methods and representations. The final outcome is an instance of the workflow ready for execution on a grid. Our approach to protein structure comparison is based on a combination of indexing and dynamic programming techniques to achieve fast and reliable matching. All the components have been implemented on a grid infrastructure using Globus, and the overall tool has been tested by choosing proteins from different fold classes. The obtained results are compared against SCOP, a standard tool for the classification of known proteins.
38

Liu, Yuansheng, Leo Yu Zhang, and Jinyan Li. "Fast detection of maximal exact matches via fixed sampling of query K-mers and Bloom filtering of index K-mers." Bioinformatics 35, no. 22 (2019): 4560–67. http://dx.doi.org/10.1093/bioinformatics/btz273.

Abstract:
Motivation: Detection of maximal exact matches (MEMs) between two long sequences is a fundamental problem in pairwise reference-query genome comparisons. To efficiently compare larger and larger genomes, reducing the number of indexed k-mers as well as the number of query k-mers has been adopted as a mainstream approach, which saves computational resources by avoiding a significant number of unnecessary matches. Results: Under this framework, we proposed a new method to detect all MEMs from a pair of genomes. The method first performs a fixed sampling of k-mers on the query sequence, and adds these selected k-mers to a Bloom filter. Then all the k-mers of the reference sequence are tested by the Bloom filter. If a k-mer passes the test, it is inserted into a hash table for indexing. Compared with the existing methods, far fewer query k-mers are generated and far fewer k-mers are inserted into the index to avoid unnecessary matches, leading to an efficient matching process and memory usage savings. Experiments on large genomes demonstrate that our method is at least 1.8 times faster than the best of the existing algorithms. This performance is mainly attributed to the key novelty of our method that the fixed k-mer sampling must be conducted on the query sequence and the index k-mers are filtered from the reference sequence via a Bloom filter. Availability and implementation: https://github.com/yuansliu/bfMEM Supplementary information: Supplementary data are available at Bioinformatics online.
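The pipeline described in the abstract (sample query k-mers, put them in a Bloom filter, index only the reference k-mers that pass the filter, then extend seeds) can be sketched in a few lines of Python. This is an illustrative toy, not the bfMEM implementation: the class and function names, the Bloom filter sizing, and the sampling step `min_len - k + 1` (which ensures every match of at least `min_len` characters contains a sampled k-mer) are assumptions made for this sketch.

```python
import hashlib
from collections import defaultdict

class BloomFilter:
    """Tiny Bloom filter over strings (sizes are arbitrary toy values)."""
    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        for i in range(self.n_hashes):
            h = hashlib.blake2b(item.encode(), digest_size=8, salt=bytes([i]))
            yield int.from_bytes(h.digest(), "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))

def find_mems(ref, query, k, min_len):
    """Return sorted (ref_start, query_start, length) for MEMs >= min_len."""
    step = min_len - k + 1              # fixed sampling interval on the query
    assert step >= 1
    sampled = range(0, len(query) - k + 1, step)
    bloom = BloomFilter()
    for q in sampled:
        bloom.add(query[q:q + k])
    index = defaultdict(list)           # hash table: k-mer -> reference positions
    for r in range(len(ref) - k + 1):
        kmer = ref[r:r + k]
        if kmer in bloom:               # false positives only cost extra entries
            index[kmer].append(r)
    mems = set()
    for q in sampled:
        for r in index.get(query[q:q + k], ()):
            rs, qs = r, q               # extend the seed to the left
            while rs > 0 and qs > 0 and ref[rs - 1] == query[qs - 1]:
                rs, qs = rs - 1, qs - 1
            re_, qe = r + k, q + k      # extend the seed to the right
            while re_ < len(ref) and qe < len(query) and ref[re_] == query[qe]:
                re_, qe = re_ + 1, qe + 1
            if re_ - rs >= min_len:
                mems.add((rs, qs, re_ - rs))
    return sorted(mems)

print(find_mems("TTTACGTACGGATTT", "CCCACGTACGGACCC", k=4, min_len=6))
# → [(3, 3, 9)]
```

Because lookups go through the exact hash table, Bloom filter false positives never produce wrong matches; they only make the index slightly larger than necessary.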
39

Wang, Tianduanyi, Sandor Szedmak, Haishan Wang, et al. "Modeling drug combination effects via latent tensor reconstruction." Bioinformatics 37, Supplement_1 (2021): i93–i101. http://dx.doi.org/10.1093/bioinformatics/btab308.

Abstract:
Motivation: Combination therapies have emerged as a powerful treatment modality to overcome drug resistance and improve treatment efficacy. However, the number of possible drug combinations increases very rapidly with the number of individual drugs in consideration, which makes the comprehensive experimental screening infeasible in practice. Machine-learning models offer time- and cost-efficient means to aid this process by prioritizing the most effective drug combinations for further pre-clinical and clinical validation. However, the complexity of the underlying interaction patterns across multiple drug doses and in different cellular contexts poses challenges to the predictive modeling of drug combination effects. Results: We introduce comboLTR, a highly time-efficient method for learning complex, non-linear target functions for describing the responses of therapeutic agent combinations in various doses and cancer cell-contexts. The method is based on a polynomial regression via powerful latent tensor reconstruction. It uses a combination of recommender system-style features indexing the data tensor of response values in different contexts, and chemical and multi-omics features as inputs. We demonstrate that comboLTR outperforms state-of-the-art methods in terms of predictive performance and running time, and produces highly accurate results even in the challenging and practical inference scenario where full dose–response matrices are predicted for completely new drug combinations with no available combination and monotherapy response measurements in any training cell line. Availability and implementation: comboLTR code is available at https://github.com/aalto-ics-kepaco/ComboLTR. Supplementary information: Supplementary data are available at Bioinformatics online.
40

Danciu, Daniel, Mikhail Karasikov, Harun Mustafa, André Kahles, and Gunnar Rätsch. "Topology-based sparsification of graph annotations." Bioinformatics 37, Supplement_1 (2021): i169–i176. http://dx.doi.org/10.1093/bioinformatics/btab330.

Abstract:
Motivation: Since the amount of published biological sequencing data is growing exponentially, efficient methods for storing and indexing this data are more needed than ever to truly benefit from this invaluable resource for biomedical research. Labeled de Bruijn graphs are a frequently-used approach for representing large sets of sequencing data. While significant progress has been made to succinctly represent the graph itself, efficient methods for storing labels on such graphs are still rapidly evolving. Results: In this article, we present RowDiff, a new technique for compacting graph labels by leveraging expected similarities in annotations of vertices adjacent in the graph. RowDiff can be constructed in linear time relative to the number of vertices and labels in the graph, and in space proportional to the graph size. In addition, construction can be efficiently parallelized and distributed, making the technique applicable to graphs with trillions of nodes. RowDiff can be viewed as an intermediary sparsification step of the original annotation matrix and can thus naturally be combined with existing generic schemes for compressed binary matrices. Experiments on 10 000 RNA-seq datasets show that RowDiff combined with multi-BRWT results in a 30% reduction in annotation footprint over Mantis-MST, the previously known most compact annotation representation. Experiments on the sparser Fungi subset of the RefSeq collection show that applying RowDiff sparsification reduces the size of individual annotation columns stored as compressed bit vectors by an average factor of 42. When combining RowDiff with a multi-BRWT representation, the resulting annotation is 26 times smaller than Mantis-MST. Availability and implementation: RowDiff is implemented in C++ within the MetaGraph framework. The source code and the data used in the experiments are publicly available at https://github.com/ratschlab/row_diff.
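The core idea, storing for each vertex only how its labels differ from those of an adjacent vertex, can be illustrated with a toy Python sketch. This is not the RowDiff implementation (which is C++ inside MetaGraph): the function names, the explicit `successor` map and the hand-picked `anchors` set (vertices whose full label set is kept) are assumptions for illustration only.

```python
def rowdiff_encode(rows, successor, anchors):
    """Replace each non-anchor row by its symmetric difference with the
    row of the vertex's successor; anchors keep their full label set."""
    encoded = {}
    for v, labels in rows.items():
        if v in anchors:
            encoded[v] = set(labels)
        else:
            encoded[v] = set(labels) ^ set(rows[successor[v]])
    return encoded

def rowdiff_decode(v, encoded, successor, anchors):
    """Rebuild the labels of v by XOR-ing the stored diffs along the
    successor path until an anchor is reached."""
    labels = set()
    while True:
        labels ^= encoded[v]
        if v in anchors:
            return labels
        v = successor[v]

# Hypothetical path of four vertices with slowly changing annotations.
rows = {0: {"A", "B"}, 1: {"A", "B"}, 2: {"A", "B", "C"}, 3: {"A"}}
successor = {0: 1, 1: 2, 2: 3}
anchors = {3}

encoded = rowdiff_encode(rows, successor, anchors)
stored = sum(len(s) for s in encoded.values())    # 4 set bits after encoding
original = sum(len(s) for s in rows.values())     # 8 set bits before
```

When neighbouring vertices share most labels, the diffs are nearly empty, which is why the encoded matrix is sparser. The price is that decoding a row has to walk to the nearest anchor, so anchor spacing trades space against query time.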
41

Prezza, Nicola. "Subpath Queries on Compressed Graphs: A Survey." Algorithms 14, no. 1 (2021): 14. http://dx.doi.org/10.3390/a14010014.

Abstract:
Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in T in time proportional to the query’s length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index can be stored in a space proportional to the text’s entropy. These contributions had an enormous impact in bioinformatics: today, virtually any DNA aligner employs compressed indexes. Recent trends considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite state automata can be considered as a particular case of labeled graphs, these findings created a bridge between the fields of compressed indexing and regular language theory, ultimately making it possible to index regular languages and promising to shed new light on problems such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today’s compressed indexes for labeled graphs and regular languages.
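The count-and-locate problem stated above can be illustrated with a plain suffix array, used here as a simple stand-in for the suffix trees and compressed indexes the survey covers. The sketch is deliberately naive: construction sorts whole suffixes, and each query takes O(m log n) string comparisons rather than time proportional to the pattern length alone. Function names are hypothetical.

```python
def build_suffix_array(text):
    """Off-line pre-processing: start positions of all suffixes, sorted
    lexicographically (naive construction, fine for short texts)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def locate(text, sa, pattern):
    """Return the sorted start positions of every occurrence of pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                       # first suffix with prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                       # first suffix with prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])          # counting is just lo - start

text = "GATTACATTAC"
sa = build_suffix_array(text)
print(locate(text, sa, "TTAC"))
# → [2, 7]
```

All suffixes that start with the pattern are contiguous in the sorted order, so two binary searches delimit them; compressed indexes such as the FM-index exploit the same contiguity while storing far less than the suffix array itself.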
42

Tin Thein Thwel and G. R. Sinha. "Efficient Data Deduplication Mechanism for Genomic Data." CSVTU International Journal of Biotechnology Bioinformatics and Biomedical 4, no. 2 (2019): 52–58. http://dx.doi.org/10.30732/ijbbb.20190402004.

Abstract:
In the age of data science, many people access health-related information and diagnoses using information technology, including telemedicine. Therefore, many researchers are attempting to work with medical experts as well as in the bioinformatics area. In the bioinformatics field, handling human genomic data, such as collecting, storing and processing it, has become essential. Genomic data refers to the genome and DNA data of an organism. Unavoidably, genomic data require a huge amount of storage for the customized software to analyze. Recently, genome researchers have been raising alarms over big data. This research paper attempts a significant reduction of data storage by applying a data deduplication process to a genomic data set. Data deduplication, ‘dedupe’ for short, can reduce the amount of storage because of its single-instance storage nature. Therefore, data deduplication becomes one of the solutions for optimizing the huge amount of storage space needed for genome storage. We have implemented a data deduplication method and applied it to genomic data; the deduplication was performed successfully by using a secure hash algorithm, a B+ tree and a sub-file-level chunking algorithm. The methods were implemented in an integrated approach. The files are separated into different chunks with the help of the Two Threshold Two Divisors algorithm, and a hash function is used to get chunk identifiers. Indexing keys are constructed using the identifiers in a B+ tree-like index structure. This system can reduce the storage space significantly when duplicated data exist. The preliminary testing was done using NCBI datasets.
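The chunk, hash and index steps can be sketched as follows. This is a simplified illustration and not the authors' system: fixed-size chunking stands in for the Two Threshold Two Divisors algorithm, SHA-256 is used as the secure hash, and a Python dict stands in for the B+ tree-like index. All names and the chunk size are assumptions.

```python
import hashlib

def deduplicate(data: bytes, chunk_size: int, store: dict) -> list:
    """Split data into chunks, store each distinct chunk once, and return
    the list of chunk identifiers (the 'recipe') needed to rebuild it."""
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()   # chunk identifier
        store.setdefault(key, chunk)              # single-instance storage
        recipe.append(key)
    return recipe

def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original bytes from the recipe and the chunk store."""
    return b"".join(store[key] for key in recipe)

# Hypothetical, highly repetitive sequence data.
data = b"ACGT" * 100 + b"TTTT" * 2
store = {}
recipe = deduplicate(data, chunk_size=8, store=store)
stored_bytes = sum(len(c) for c in store.values())
print(len(data), len(recipe), len(store), stored_bytes)
# → 408 51 2 16
```

Fixed-size chunks lose their alignment as soon as a single byte is inserted, which is the usual reason for preferring content-defined chunking schemes on real data.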
43

Medina-Franco, José L. "Expanding the Chemical Information Science gateway." F1000Research 10 (April 16, 2021): 294. http://dx.doi.org/10.12688/f1000research.52192.1.

Abstract:
As chemical information evolves, impacting many chemistry areas, effective ways to disseminate results by the scientific community are also changing. Thus, publication schemes adapt to meet the needs of researchers across disciplines to share high-quality data, information, and knowledge. Since 2015, the F1000Research Chemical Information Science (CIS) gateway has offered an open and unique model to disseminate science at the interface of chemoinformatics, bioinformatics, and several other informatic-related disciplines. In response to the evolution of chemical information science, the F1000Research CIS gateway has incorporated new members to the advisory board. It is also reinforcing and expanding the gateway areas with a particular focus on machine learning and metabolomics. The range of available article types, availability of data, exposure within complementary multidisciplinary F1000Research gateways, and indexing in major bibliographic databases increases the visibility of all contributions. As part of progressing open science in this field, we look forward to your high-quality contributions to the CIS gateway.
44

Brown, S. H., G. Wright, and P. L. Elkin. "Biomedical Informatics: We Are What We Publish." Methods of Information in Medicine 52, no. 06 (2013): 538–46. http://dx.doi.org/10.3414/me13-01-0041.

Abstract:
Summary. Introduction: This article is part of a For-Discussion-Section of Methods of Information in Medicine on “Biomedical Informatics: We are what we publish”. It is introduced by an editorial and followed by a commentary paper with invited comments. In subsequent issues the discussion may continue through letters to the editor. Objective: Informatics experts have attempted to define the field via consensus projects, which has led to consensus statements by both AMIA and IMIA. We add to the output of this process the results of a study of the PubMed publications with abstracts from the field of Biomedical Informatics. Methods: We took the terms from the AMIA consensus document and the terms from the IMIA definitions of the field of Biomedical Informatics and combined them through human review to create the Health Informatics Ontology. We built a terminology server using the Intelligent Natural Language Processor (iNLP). Then we downloaded the entire set of articles in Medline identified by searching the literature by “Medical Informatics” OR “Bioinformatics”. The articles were parsed by the joint AMIA / IMIA terminology and then again using SNOMED CT, and the Bioinformatics articles were also parsed using the HGNC Ontology. Results: We identified 153,580 articles using “Medical Informatics” and 20,573 articles using “Bioinformatics”. This resulted in 168,298 unique articles and an overlap of 5,855 articles. Of these, 62,244 articles (37%) had titles and abstracts that contained at least one concept from the Health Informatics Ontology. SNOMED CT indexing showed that the field interacts with almost all clinical fields of medicine. Conclusions: Further defining the field by what we publish can add value to the consensus-driven processes that have been the mainstay of the efforts to date. Next steps should be to extract terms from the literature that are uncovered and create class hierarchies and relationships for this content.
We should also examine the high-occurrence MeSH terms as markers to define Biomedical Informatics. Greater understanding of the Biomedical Informatics literature has the potential to lead to improved self-awareness for our field.
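The article counts reported in the Results follow from simple set arithmetic, which a short Python check makes explicit. The variable names are illustrative; the figures are those quoted in the abstract.

```python
medical_informatics = 153_580   # articles matching "Medical Informatics"
bioinformatics = 20_573         # articles matching "Bioinformatics"
overlap = 5_855                 # articles matching both queries
with_concept = 62_244           # articles with at least one ontology concept

# Size of the union = sum of both result sets minus the double-counted overlap.
unique_articles = medical_informatics + bioinformatics - overlap
share = with_concept / unique_articles

print(unique_articles, f"{share:.0%}")
# → 168298 37%
```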
45

Hosseininia, Nayereh, Soudabeh Boroumand, and Majid Haghparast. "Novel Nanometric Reversible Low Power Bidirectional Universal Logarithmic Barrel Shifter with Overflow and Zero Flags." Journal of Circuits, Systems and Computers 24, no. 04 (2015): 1550049. http://dx.doi.org/10.1142/s0218126615500498.

Abstract:
One of the most important issues in designing VLSI circuits is power consumption. Reversible logic which is widely utilized in quantum computing, low power CMOS design, optical information processing, bioinformatics and nanotechnology-based systems decreases power loss. A reversible circuit has zero internal power dissipation because it does not lose information. Reversible barrel shifters are required to construct reversible embedded digital signal and general-purpose processors. Data shifting is often used in high-speed/low-power error-control applications, floating point normalization, address decoding and bit indexing. This paper proposes a novel reversible bidirectional universal barrel shifter which is applied in high speed and high performance applications. The proposed barrel shifter is designed in a single circuit with overflow and zero flags. It performs three operations consisting of rotating, logical and arithmetic shifting that transfers and shifts data in both directions. The design is evaluated and formulated in terms of number of garbage outputs, number of constant inputs, quantum cost, number of reversible gates and hardware complexity. All the scales are in nanometric area.
46

Cretoiu, Dragos, Simona Roatesi, Ion Bica, et al. "Simulation and Modeling of Telocytes Behavior in Signaling and Intercellular Communication Processes." International Journal of Molecular Sciences 21, no. 7 (2020): 2615. http://dx.doi.org/10.3390/ijms21072615.

Abstract:
Background: Telocytes (TCs) are unique interstitial or stromal cells of mesodermal origin, defined by long cellular extensions called telopodes (Tps) which form a network, connecting them to surrounding cells. TCs were previously found around stem and progenitor cells, and were thought to be most likely involved in local tissue metabolic equilibrium and regeneration. The roles of telocytes are still under scientific scrutiny, with existing studies suggesting they possess various functions depending on their location. Methods: Human myometrium biopsies were collected from pregnant and non-pregnant women; telocytes were then investigated in myometrial interstitial cell cultures based on morphological criteria and later prepared for time-lapse microscopy. Semi-analytical and numerical solutions were developed to highlight the geometric characteristics and the behavior of telocytes. Results: Results were gathered in a database which would further allow efficient telocyte tracking and indexing in a content-based image retrieval (CBIR) of digital medical images. Mathematical analysis revealed pivotal information regarding the homogeneity, hardness and resistance of telocytes’ structure. Cellular activity models were monitored in vitro, therefore supporting the creation of databases of telocyte images. Conclusions: The obtained images were analyzed, using segmentation techniques and mathematical models in conjunction with computer simulation, in order to depict TC behavior in relation to surrounding cells. This paper brings an important contribution to the development of bioinformatics systems by creating software-based telocyte models that could be used both for diagnostic and educational purposes.
47

Correia, Damien, Olivia Doppelt-Azeroual, Jean-Baptiste Denis, Mathias Vandenbogaert, and Valérie Caro. "MetaGenSense: A web application for analysis and visualization of high throughput sequencing metagenomic data." F1000Research 4 (April 2, 2015): 86. http://dx.doi.org/10.12688/f1000research.6139.1.

Abstract:
The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most often, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and, eventually, classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user’s input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS.
For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the trend of Galaxy, the interface enables the sharing of scientific results with fellow team members.
48

Correia, Damien, Olivia Doppelt-Azeroual, Jean-Baptiste Denis, Mathias Vandenbogaert, and Valérie Caro. "MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data." F1000Research 4 (August 22, 2016): 86. http://dx.doi.org/10.12688/f1000research.6139.2.

49

Correia, Damien, Olivia Doppelt-Azeroual, Jean-Baptiste Denis, Mathias Vandenbogaert, and Valérie Caro. "MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data." F1000Research 4 (December 1, 2016): 86. http://dx.doi.org/10.12688/f1000research.6139.3.

50

Kaushik, Manish, Ramandeep Sharma, Sindhu Veetil, Sandeep Srivastava, and Suneel Kateriya. "Modular Diversity of the BLUF Proteins and Their Potential for the Development of Diverse Optogenetic Tools." Applied Sciences 9, no. 18 (2019): 3924. http://dx.doi.org/10.3390/app9183924.

Abstract:
Organisms can respond to varying light conditions using a wide range of sensory photoreceptors. These photoreceptors can be standalone proteins or represent a module in multidomain proteins, where one or more modules sense light as an input signal which is converted into an output response via structural rearrangements in these receptors. The output signals are utilized downstream by effector proteins or multiprotein clusters to modulate their activity, which could further affect specific interactions, gene regulation or enzymatic catalysis. The blue-light using flavin (BLUF) photosensory module is an autonomous unit that is naturally distributed among functionally distinct proteins. In this study, we identified 34 BLUF photoreceptors of prokaryotic and eukaryotic origin from available bioinformatics sequence databases. Interestingly, our analysis shows diverse BLUF-effector arrangements with a functional association that was previously unknown or thought to be rare among the BLUF class of sensory proteins, such as endonucleases, tet repressor family (tetR), regulators of G-protein signaling, GAL4 transcription family and several other previously unidentified effectors, such as RhoGEF, Phosphatidyl-Ethanolamine Binding protein (PBP), ankyrin and leucine-rich repeats. Interaction studies and the indexing of BLUF domains further show the diversity of BLUF-effector combinations. These diverse modular architectures highlight how the organism’s behaviour, cellular processes, and distinct cellular outputs are regulated by integrating BLUF sensing modules in combination with a plethora of diverse signatures. Our analysis highlights the modular diversity of BLUF containing proteins and opens the possibility of creating a rational design of novel functional chimeras using a BLUF architecture with relevant cellular effectors. 
Thus, the BLUF domain could be a potential candidate for the development of powerful novel optogenetic tools for its application in modulating diverse cell signaling.