
Journal articles on the topic 'Bioinformatics pipeline'



Consult the top 50 journal articles for your research on the topic 'Bioinformatics pipeline.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Ewels, Philip, Felix Krueger, Max Käller, and Simon Andrews. "Cluster Flow: A user-friendly bioinformatics workflow tool." F1000Research 5 (December 6, 2016): 2824. http://dx.doi.org/10.12688/f1000research.10335.1.

Full text
Abstract:
Pipeline tools are becoming increasingly important within the field of bioinformatics. Using a pipeline manager to manage and run workflows comprised of multiple tools reduces workload and makes analysis results more reproducible. Existing tools require significant work to install and get running, typically needing pipeline scripts to be written from scratch before running any analysis. We present Cluster Flow, a simple and flexible bioinformatics pipeline tool designed to be quick and easy to install. Cluster Flow comes with 40 modules for common NGS processing steps, ready to work out of the box. Pipelines are assembled using these modules with a simple syntax that can be easily modified as required. Core helper functions automate many common NGS procedures, making running pipelines simple. Cluster Flow is available with a GNU GPLv3 license on GitHub. Documentation, examples and an online demo are available at http://clusterflow.io.
APA, Harvard, Vancouver, ISO, and other styles
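The Cluster Flow abstract above describes assembling pipelines from ready-made modules for common NGS steps. As a rough illustration of that general idea (this is not Cluster Flow's module syntax; the tool names, flags and file names below are hypothetical placeholders), a linear pipeline can be expressed as an ordered list of commands whose outputs feed the next step:

```python
import subprocess
from pathlib import Path

# Hypothetical three-step NGS pipeline: quality control -> trimming -> alignment.
# The tool names and flags are placeholders, not Cluster Flow modules.
STEPS = [
    ("quality control", "qc_tool {reads}"),
    ("adapter trimming", "trim_tool --in {reads} --out {sample}.trimmed.fq.gz"),
    ("alignment", "align_tool --reads {sample}.trimmed.fq.gz --out {sample}.bam"),
]

def run_pipeline(reads: str) -> None:
    sample = Path(reads).name.split(".")[0]
    for step_name, template in STEPS:
        cmd = template.format(reads=reads, sample=sample)
        print(f"[{step_name}] {cmd}")
        subprocess.run(cmd, shell=True, check=True)  # abort on the first failing step

if __name__ == "__main__":
    run_pipeline("sample1.fastq.gz")
```

A real pipeline manager adds what this sketch omits: cluster submission, dependency tracking between steps, and resumption after failures.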
2

Ewels, Philip, Felix Krueger, Max Käller, and Simon Andrews. "Cluster Flow: A user-friendly bioinformatics workflow tool." F1000Research 5 (May 2, 2017): 2824. http://dx.doi.org/10.12688/f1000research.10335.2.

Full text
Abstract:
Pipeline tools are becoming increasingly important within the field of bioinformatics. Using a pipeline manager to manage and run workflows comprised of multiple tools reduces workload and makes analysis results more reproducible. Existing tools require significant work to install and get running, typically needing pipeline scripts to be written from scratch before running any analysis. We present Cluster Flow, a simple and flexible bioinformatics pipeline tool designed to be quick and easy to install. Cluster Flow comes with 40 modules for common NGS processing steps, ready to work out of the box. Pipelines are assembled using these modules with a simple syntax that can be easily modified as required. Core helper functions automate many common NGS procedures, making running pipelines simple. Cluster Flow is available with a GNU GPLv3 license on GitHub. Documentation, examples and an online demo are available at http://clusterflow.io.
APA, Harvard, Vancouver, ISO, and other styles
3

SoRelle, Jeffrey A., Megan Wachsmann, and Brandi L. Cantarel. "Assembling and Validating Bioinformatic Pipelines for Next-Generation Sequencing Clinical Assays." Archives of Pathology & Laboratory Medicine 144, no. 9 (2020): 1118–30. http://dx.doi.org/10.5858/arpa.2019-0476-ra.

Full text
Abstract:
Context.— Clinical next-generation sequencing (NGS) is being rapidly adopted, but analysis and interpretation of large data sets prompt new challenges for a clinical laboratory setting. Clinical NGS results rely heavily on the bioinformatics pipeline for identifying genetic variation in complex samples. The choice of bioinformatics algorithms, genome assembly, and genetic annotation databases are important for determining genetic alterations associated with disease. The analysis methods are often tuned to the assay to maximize accuracy. Once a pipeline has been developed, it must be validated to determine accuracy and reproducibility for samples similar to real-world cases. In silico proficiency testing or institutional data exchange will ensure consistency among clinical laboratories. Objective.— To provide molecular pathologists a step-by-step guide to bioinformatics analysis and validation design in order to navigate the regulatory and validation standards of implementing a bioinformatic pipeline as a part of a new clinical NGS assay. Data Sources.— This guide uses published studies on genomic analysis, bioinformatics methods, and methods comparison studies to inform the reader on what resources, including open source software tools and databases, are available for genetic variant detection and interpretation. Conclusions.— This review covers 4 key concepts: (1) bioinformatic analysis design for detecting genetic variation, (2) the resources for assessing genetic effects, (3) analysis validation assessment experiments and data sets, including a diverse set of samples to mimic real-world challenges that assess accuracy and reproducibility, and (4) if concordance between clinical laboratories will be improved by proficiency testing designed to test bioinformatic pipelines.
APA, Harvard, Vancouver, ISO, and other styles
4

Pal, Soumitra, and Teresa M. Przytycka. "Bioinformatics pipeline using JUDI: Just Do It!" Bioinformatics 36, no. 8 (2019): 2572–74. http://dx.doi.org/10.1093/bioinformatics/btz956.

Full text
Abstract:
Summary: Large-scale data analysis in bioinformatics requires pipelined execution of multiple software tools. Generally, each stage in a pipeline takes considerable computing resources, and several workflow management systems (WMS), e.g. Snakemake, Nextflow, Common Workflow Language, Galaxy, etc., have been developed to ensure optimum execution of the stages across two invocations of the pipeline. However, when the pipeline needs to be executed with different settings of parameters, e.g. thresholds, underlying algorithms, etc., these WMS require significant scripting to ensure an optimal execution. We developed JUDI on top of DoIt, a Python-based WMS, to systematically handle parameter settings based on the principles of database management systems. Using a novel modular approach that encapsulates a parameter database in each task and file associated with a pipeline stage, JUDI simplifies plug-and-play of the pipeline stages. For a typical pipeline with n parameters, JUDI reduces the number of lines of scripting required by a factor of O(n). With properly designed parameter databases, JUDI not only enables reproducing research under published values of parameters but also facilitates exploring newer results under novel parameter settings. Availability and implementation: https://github.com/ncbi/JUDI. Supplementary information: Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
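JUDI's key idea, per the abstract, is treating parameter settings as a database attached to each task and file, so that one pipeline definition covers every combination of settings. A minimal sketch of that expansion step in plain Python follows; the parameter names and values are invented for illustration and this is not JUDI's API:

```python
from itertools import product

# Hypothetical parameter database for a small pipeline.
PARAMS = {
    "threshold": [0.01, 0.05],
    "aligner": ["bwa", "bowtie2"],
    "kmer": [21, 31],
}

def expand(param_db):
    """Yield one task configuration per combination of parameter values."""
    names = sorted(param_db)
    for values in product(*(param_db[name] for name in names)):
        yield dict(zip(names, values))

for cfg in expand(PARAMS):
    # Encode the settings in the output name so results from different
    # parameter combinations never overwrite each other.
    out = "_".join(f"{k}-{v}" for k, v in sorted(cfg.items())) + ".out"
    print(cfg, "->", out)
```

Each configuration maps to its own output name, which is the property that lets results produced under different settings coexist and be compared.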
5

Mshvidobadze, Tinatin. "Bioinformatics as Emerging Tool and Pipeline Frameworks." Science Progress and Research 1, no. 4 (2021): 411–15. http://dx.doi.org/10.52152/spr/2021.162.

Full text
Abstract:
In this article, we will discuss the areas of origin of bioinformatics in the human health care system. Due to the growing network of biological information databases such as human genomes, transcriptomics and proteomics, bioinformatics has become the approach of choice in forensic sciences. High-throughput bioinformatic analyses increasingly rely on pipeline frameworks to process sequence data and metadata. Here we survey and compare the design philosophies of several current pipeline frameworks.
APA, Harvard, Vancouver, ISO, and other styles
6

Afiahayati, Stefanus Bernard, Gunadi, et al. "A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains." Genes 13, no. 8 (2022): 1330. http://dx.doi.org/10.3390/genes13081330.

Full text
Abstract:
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a newly emerging virus well known as the major cause of the worldwide pandemic of Coronavirus Disease 2019 (COVID-19). Major breakthroughs in the Next Generation Sequencing (NGS) field followed the first release of a full-length SARS-CoV-2 genome on 10 January 2020, with the hope of turning the tables on the worsening pandemic situation. Previous studies in respiratory virus characterization required mapping of raw sequences to the human genome in the downstream bioinformatics pipeline as part of metagenomic principles. Illumina, as the major player in the NGS arena, took action by releasing guidelines for an improved enrichment kit called the Respiratory Virus Oligo Panel (RVOP), based on a hybridization capture method capable of capturing targeted respiratory viruses, including SARS-CoV-2, and therefore allowing raw sequence data to be mapped directly to the SARS-CoV-2 genome in the downstream bioinformatics pipeline. Consequently, two bioinformatics pipelines emerged, with no previous studies benchmarking them. This study focuses on gaining insight into and understanding of the target enrichment workflow by Illumina through the utilization of two different bioinformatics pipelines, named the ‘Fast Pipeline’ and the ‘Normal Pipeline’, applied to SARS-CoV-2 strains isolated from Yogyakarta and Central Java, Indonesia. Overall, both pipelines work well in the characterization of SARS-CoV-2 samples, including in the identification of major studied nucleotide substitutions and amino acid mutations. A higher number of reads mapped to the SARS-CoV-2 genome in the Fast Pipeline, which contributed to higher coverage depth and a larger number of identified variants (SNPs, insertions, and deletions). The Fast Pipeline therefore works well in situations where time is a critical factor, whereas the Normal Pipeline requires more time because it also maps reads to the human genome. Certain limitations were identified in the pipeline algorithms, and it is highly recommended that future studies design the pipeline in an integrated framework, for instance by using Nextflow, a workflow framework, to combine all scripts into one fully integrated pipeline.
APA, Harvard, Vancouver, ISO, and other styles
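The abstract above attributes the Fast Pipeline's greater coverage depth to the higher number of reads mapped to the SARS-CoV-2 genome. As a back-of-the-envelope check of that relationship, mean depth can be approximated from mapped read count, read length and genome length; the read counts below are made up, while the roughly 30 kb genome size is the published SARS-CoV-2 reference length:

```python
def mean_depth(mapped_reads: int, read_length: int, genome_length: int) -> float:
    """Approximate mean coverage depth as total mapped bases / genome length."""
    return mapped_reads * read_length / genome_length

# Illustrative numbers only; 29,903 bp is the SARS-CoV-2 reference genome length.
fast = mean_depth(mapped_reads=400_000, read_length=150, genome_length=29_903)
normal = mean_depth(mapped_reads=250_000, read_length=150, genome_length=29_903)
print(f"fast ~{fast:.0f}x, normal ~{normal:.0f}x")  # more mapped reads -> deeper coverage
```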
7

Cervera, Alejandra, Ville Rantanen, Kristian Ovaska, et al. "Anduril 2: upgraded large-scale data integration framework." Bioinformatics 35, no. 19 (2019): 3815–17. http://dx.doi.org/10.1093/bioinformatics/btz133.

Full text
Abstract:
Summary: Anduril is an analysis and integration framework that facilitates the design, use, parallelization and reproducibility of bioinformatics workflows. Anduril has been upgraded to use Scala for pipeline construction, which simplifies software maintenance and facilitates the design of complex pipelines. Additionally, Anduril’s bioinformatics repository has been expanded with multiple components and tutorial pipelines for next-generation sequencing data analysis. Availability and implementation: Freely available at http://anduril.org. Supplementary information: Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
8

Allain, Fabrice, Julien Roméjon, Philippe La Rosa, Frédéric Jarlier, Nicolas Servant, and Philippe Hupé. "Geniac: Automatic Configuration GENerator and Installer for nextflow pipelines." Open Research Europe 1 (July 2, 2021): 76. http://dx.doi.org/10.12688/openreseurope.13861.1.

Full text
Abstract:
With the advent of high-throughput biotechnological platforms and their ever-growing capacity, life science has turned into a digitized, computational and data-intensive discipline. As a consequence, standard analysis with a bioinformatics pipeline in the context of routine production has become a challenge such that the data can be processed in real-time and delivered to the end-users as fast as possible. The usage of workflow management systems along with packaging systems and containerization technologies offers an opportunity to tackle this challenge. While very powerful, they can be used and combined in multiple ways, thus increasing their usage complexity. Therefore, guidelines and protocols are required in order to detail how the source code of the bioinformatics pipeline should be written and organized to ensure its usability, maintainability, interoperability, sustainability, portability, reproducibility, scalability and efficiency. Capitalizing on Nextflow, Conda, Docker, Singularity and the nf-core initiative, we propose a set of best practices along the development life cycle of the bioinformatics pipeline and deployment for production operations which address different expert communities, including i) the bioinformaticians and statisticians, ii) the software engineers, and iii) the data managers and core facility engineers. We implemented Geniac (Automatic Configuration GENerator and Installer for nextflow pipelines), which consists of a toolbox with three components: i) technical documentation available at https://geniac.readthedocs.io to detail coding guidelines for the bioinformatics pipeline with Nextflow, ii) a linter to check that the code respects the guidelines, and iii) an add-on to generate configuration files, build the containers and deploy the pipeline. The Geniac toolbox aims at the harmonization of development practices across developers and the automation of the generation of configuration files and containers by parsing the source code of the Nextflow pipeline. The Geniac toolbox and two demo pipelines are available on GitHub. This article presents the main functionalities of Geniac.
APA, Harvard, Vancouver, ISO, and other styles
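Geniac ships a linter that checks Nextflow source code against its coding guidelines. The sketch below illustrates the general shape of such a check (scan process blocks and flag missing directives); the rule, regular expression and reporting format are invented for illustration and do not reflect Geniac's actual linter:

```python
import re
import sys
from pathlib import Path

# Hypothetical guideline: every Nextflow process should declare a 'label' directive.
PROCESS_RE = re.compile(r"process\s+(\w+)\s*{(.*?)\n}", re.DOTALL)

def lint_file(path: Path) -> int:
    """Return the number of guideline violations found in one .nf file."""
    issues = 0
    for name, body in PROCESS_RE.findall(path.read_text()):
        if "label" not in body:
            print(f"{path}: process '{name}' has no label directive")
            issues += 1
    return issues

if __name__ == "__main__":
    pipeline_dir = Path(sys.argv[1])
    total = sum(lint_file(nf) for nf in sorted(pipeline_dir.rglob("*.nf")))
    sys.exit(1 if total else 0)
```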
9

Parmen, Adibah, Mohd Noor Mat Isa, Farah Fadwa Benbelgacem, Hamzah Mohd Salleh, and Ibrahim Ali Noorbatcha. "Comparative Metagenomics Analysis of Palm Oil Mill Effluent (POME) Using Three Different Bioinformatics Pipelines." IIUM Engineering Journal 20, no. 1 (2019): 1–11. http://dx.doi.org/10.31436/iiumej.v20i1.909.

Full text
Abstract:
The substantial cost reduction and massive production of next-generation sequencing (NGS) data have contributed to the rapid growth of metagenomics. However, the massive amount of data produced by NGS has revealed challenges in handling the existing bioinformatics tools for metagenomics. Therefore, in this research we investigated an identical set of DNA metagenomics data from a palm oil mill effluent (POME) sample using three different freeware bioinformatics pipeline websites, metagenomics RAST server (MG-RAST), Integrated Microbial Genomes with Microbiome Samples (IMG/M) and European Bioinformatics Institute (EBI) Metagenomics, in terms of taxonomic assignment and functional analysis. We found that MG-RAST is the quickest among these three pipelines. However, in terms of analysis results, IMG/M provides a greater variety of phyla with wider percent identities for taxonomic assignment, the highest functional annotation counts for carbohydrate, amino acid, lipid, and coenzyme transport and metabolism, and the highest total number of glycoside hydrolase enzymes. For identifying the conserved domains and families involved, EBI Metagenomics is more appropriate. All three bioinformatics pipelines have their own specialties and can be used alternately or at the same time, based on the user’s functional preference.
APA, Harvard, Vancouver, ISO, and other styles
10

Van Neste, Leander, James G. Herman, Kornel E. Schuebel, et al. "A Bioinformatics Pipeline for Cancer Epigenetics." Current Bioinformatics 5, no. 3 (2010): 153–63. http://dx.doi.org/10.2174/157489310792006710.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Lacroix, Zoé, Christophe Legendre, Louiqa Raschid, and Ben Snyder. "BIPASS: BioInformatics Pipeline Alternative Splicing Services." Nucleic Acids Research 35, suppl. 2 (2007): W292–W296. http://dx.doi.org/10.1093/nar/gkm344.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Kwon, ChangHyuk, Jason Kim, and Jaegyoon Ahn. "DockerBIO: web application for efficient use of bioinformatics Docker images." PeerJ 6 (November 27, 2018): e5954. http://dx.doi.org/10.7717/peerj.5954.

Full text
Abstract:
Background and Objective: Docker is a lightweight containerization program that shows almost the same performance as a local environment. Recently, many bioinformatics tools have been distributed as Docker images that include complex settings such as libraries, configurations, and data if needed, as well as the actual tools. Users can simply download and run them without making the effort to compile and configure them, and can obtain reproducible results. In spite of these advantages, several problems remain. First, there is a lack of clear standards for the distribution of Docker images, and Docker Hub often provides multiple images with the same objective but different uses. For these reasons, it can be difficult for users to learn how to select and use them. Second, Docker images are often not suitable as components of a pipeline, because many of them include big data. Moreover, a group of users can have difficulties when sharing a pipeline composed of Docker images: members of the group may modify scripts or use different versions of the data, which causes inconsistent results. Methods and Results: To handle the problems described above, we developed a Java web application, DockerBIO, which provides reliable, verified, lightweight Docker images for various bioinformatics tools and for various kinds of reference data. With DockerBIO, users can easily build a pipeline with tools and data registered at DockerBIO, and if necessary, users can easily register new tools or data. Built pipelines are registered in DockerBIO, which provides an efficient running environment for the pipelines registered at DockerBIO. This enables user groups to run their pipelines without expending much effort to copy and modify them.
APA, Harvard, Vancouver, ISO, and other styles
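DockerBIO assembles pipelines from verified Docker images. A minimal sketch of the underlying pattern, running each containerized step with `docker run` against a shared working directory so later steps see earlier outputs, is shown below; the image names and in-container commands are hypothetical and this is not DockerBIO's implementation:

```python
import subprocess
from pathlib import Path

workdir = Path("analysis").resolve()
workdir.mkdir(exist_ok=True)

# Hypothetical containerized steps: (Docker image, command run inside the container).
steps = [
    ("example/qc:1.0", ["qc", "--in", "/data/sample.fastq.gz", "--out", "/data/qc_report.txt"]),
    ("example/aligner:1.0", ["align", "--in", "/data/sample.fastq.gz", "--out", "/data/sample.bam"]),
]

for image, command in steps:
    # Mount the shared working directory so each step sees the previous step's outputs.
    subprocess.run(
        ["docker", "run", "--rm", "-v", f"{workdir}:/data", image, *command],
        check=True,
    )
```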
13

Allain, Fabrice, Julien Roméjon, Philippe La Rosa, Frédéric Jarlier, Nicolas Servant, and Philippe Hupé. "Geniac: Automatic Configuration GENerator and Installer for nextflow pipelines." Open Research Europe 1 (February 21, 2022): 76. http://dx.doi.org/10.12688/openreseurope.13861.2.

Full text
Abstract:
With the advent of high-throughput biotechnological platforms and their ever-growing capacity, life science has turned into a digitized, computational and data-intensive discipline. As a consequence, standard analysis with a bioinformatics pipeline in the context of routine production has become a challenge such that the data can be processed in real-time and delivered to the end-users as fast as possible. The usage of workflow management systems along with packaging systems and containerization technologies offers an opportunity to tackle this challenge. While very powerful, they can be used and combined in many ways, which may differ from one developer to another. Therefore, promoting the homogeneity of the workflow implementation requires guidelines and protocols which detail how the source code of the bioinformatics pipeline should be written and organized to ensure its usability, maintainability, interoperability, sustainability, portability, reproducibility, scalability and efficiency. Capitalizing on Nextflow, Conda, Docker, Singularity and the nf-core initiative, we propose a set of best practices along the development life cycle of the bioinformatics pipeline and deployment for production operations which target different expert communities, including i) the bioinformaticians and statisticians, ii) the software engineers, and iii) the data managers and core facility engineers. We implemented Geniac (Automatic Configuration GENerator and Installer for nextflow pipelines), which consists of a toolbox with three components: i) technical documentation available at https://geniac.readthedocs.io to detail coding guidelines for the bioinformatics pipeline with Nextflow, ii) a command-line interface with a linter to check that the code respects the guidelines, and iii) an add-on to generate configuration files, build the containers and deploy the pipeline. The Geniac toolbox aims at the harmonization of development practices across developers and the automation of the generation of configuration files and containers by parsing the source code of the Nextflow pipeline.
APA, Harvard, Vancouver, ISO, and other styles
14

Navarro, José Fernández, Joel Sjöstrand, Fredrik Salmén, Joakim Lundeberg, and Patrik L. Ståhl. "ST Pipeline: an automated pipeline for spatial mapping of unique transcripts." Bioinformatics 33, no. 16 (2017): 2591–93. http://dx.doi.org/10.1093/bioinformatics/btx211.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Rieder, Dietmar, Georgios Fotakis, Markus Ausserhofer, et al. "nextNEOpi: a comprehensive pipeline for computational neoantigen prediction." Bioinformatics 38, no. 4 (2021): 1131–32. http://dx.doi.org/10.1093/bioinformatics/btab759.

Full text
Abstract:
Summary: Somatic mutations and gene fusions can produce immunogenic neoantigens mediating anticancer immune responses. However, their computational prediction from sequencing data requires complex computational workflows to identify tumor-specific aberrations, derive the resulting peptides, infer patients’ Human Leukocyte Antigen types and predict neoepitopes binding to them, together with a set of features underlying their immunogenicity. Here, we present nextNEOpi (nextflow NEOantigen prediction pipeline), a comprehensive and fully automated bioinformatic pipeline to predict tumor neoantigens from raw DNA and RNA sequencing data. In addition, nextNEOpi quantifies neoepitope- and patient-specific features associated with tumor immunogenicity and response to immunotherapy. Availability and implementation: nextNEOpi source code and documentation are available at https://github.com/icbi-lab/nextNEOpi. Contact: dietmar.rieder@i-med.ac.at or francesca.finotello@uibk.ac.at. Supplementary information: Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
16

Li, Yiyuan, David C. Molik, and Michael E. Pfrender. "EPPS, a metabarcoding bioinformatics pipeline using Nextflow." Biodiversity Science 27, no. 5 (2019): 567–75. http://dx.doi.org/10.17520/biods.2018211.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Hatzis, C. "Bioinformatics analysis pipeline for exome sequencing data." AACR Education book 2014, no. 1 (2014): 131–34. http://dx.doi.org/10.1158/aacr.edb-14-6406.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Mshvidobadze, Tinatin. "Bioinformatics as emerging tool and pipeline framework." Science Progress and Research 2, no. 1 (2022): 491–95. http://dx.doi.org/10.52152/spr/2022.156.

Full text
Abstract:
Biological data are being produced at a phenomenal rate, and storing, analyzing and interpreting such data in a meaningful way is assuming greater significance. In this article, we will discuss the areas of origin of bioinformatics in the human health care system. Due to the growing network of biological information databases such as human genomes, transcriptomics and proteomics, bioinformatics has become the approach of choice in forensic sciences. High-throughput bioinformatic analyses increasingly rely on pipeline frameworks to process sequence data and metadata. Here we survey and compare the design philosophies of several current pipeline frameworks. The contributions from the fields of biological and medical sciences have facilitated a tremendous increase in data on the various aspects highlighted above in the text. The results of genomic research will bring a revolution to the field of medicine. The links between various databases of biological and medical significance are important, and bioinformatics plays a vital role in this direction.
APA, Harvard, Vancouver, ISO, and other styles
19

Tahir Ul Qamar, Muhammad, Xitong Zhu, Feng Xing, and Ling-Ling Chen. "ppsPCP: a plant presence/absence variants scanner and pan-genome construction pipeline." Bioinformatics 35, no. 20 (2019): 4156–58. http://dx.doi.org/10.1093/bioinformatics/btz168.

Full text
Abstract:
Summary: Since the idea of pan-genomics emerged, several tools and pipelines have been introduced for prokaryotic pan-genomics. However, not a single comprehensive pipeline has been reported which could overcome the multiple challenges associated with eukaryotic pan-genomics. To aid eukaryotic pan-genomic studies, here we present the ppsPCP pipeline, which is designed for eukaryotes, especially plants. It is capable of scanning presence/absence variants (PAVs) and constructing a fully annotated pan-genome. We believe that with these unique features of PAV scanning and building a pan-genome together with its annotation, ppsPCP will be useful for plant pan-genomic studies and will aid researchers in studying genetic/phenotypic variations and genomic diversity. Availability and implementation: ppsPCP is freely available at GitHub, DOI: https://doi.org/10.5281/zenodo.2567390, and at the webpage http://cbi.hzau.edu.cn/ppsPCP/. Supplementary information: Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
20

Levy, Joshua J., Alexander J. Titus, Lucas A. Salas, and Brock C. Christensen. "PyMethylProcess—convenient high-throughput preprocessing workflow for DNA methylation data." Bioinformatics 35, no. 24 (2019): 5379–81. http://dx.doi.org/10.1093/bioinformatics/btz594.

Full text
Abstract:
Summary: Performing highly parallelized preprocessing of methylation array data using Python can accelerate data preparation for downstream methylation analyses, including large-scale, production-ready machine learning pipelines. We present a highly reproducible, scalable pipeline (PyMethylProcess) that can be quickly set up and deployed through Docker and pip. Availability and implementation: Project home page: https://github.com/Christensen-Lab-Dartmouth/PyMethylProcess. Available on PyPI (pymethylprocess) and Docker (joshualevy44/pymethylprocess). Supplementary information: Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
21

Kohlbacher, O., K. Reinert, C. Gröpl, et al. "TOPP – the OpenMS proteomics pipeline." Bioinformatics 23, no. 2 (2007): e191–e197. http://dx.doi.org/10.1093/bioinformatics/btl299.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Zhao, Yongbing, Jiayan Wu, Junhui Yang, Shixiang Sun, Jingfa Xiao, and Jun Yu. "PGAP: pan-genomes analysis pipeline." Bioinformatics 28, no. 3 (2011): 416–18. http://dx.doi.org/10.1093/bioinformatics/btr655.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Bartschat, S., S. Kehr, H. Tafer, P. F. Stadler, and J. Hertel. "snoStrip: a snoRNA annotation pipeline." Bioinformatics 30, no. 1 (2013): 115–16. http://dx.doi.org/10.1093/bioinformatics/btt604.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Blumenthal, David B., Lorenzo Viola, Markus List, Jan Baumbach, Paolo Tieri, and Tim Kacprowski. "EpiGEN: an epistasis simulation pipeline." Bioinformatics 36, no. 19 (2020): 4957–59. http://dx.doi.org/10.1093/bioinformatics/btaa245.

Full text
Abstract:
Summary: Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies. Existing simulators are limited, as they do not account for linkage disequilibrium (LD), support only limited interaction models of single nucleotide polymorphisms (SNPs) and dichotomous phenotypes, or depend on proprietary software. In contrast, EpiGEN supports SNP interactions of arbitrary order, produces realistic LD patterns and generates both categorical and quantitative phenotypes. Availability and implementation: EpiGEN is implemented in Python 3 and is freely available at https://github.com/baumbachlab/epigen. Supplementary information: Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
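EpiGEN simulates genotypes and phenotypes driven by SNP interactions. The toy example below shows what a purely epistatic effect on a quantitative phenotype looks like in code; it uses independent SNPs, so it ignores the linkage disequilibrium modelling that EpiGEN provides, and the effect sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # simulated individuals

# Genotypes coded 0/1/2 for two independent SNPs with minor allele frequency 0.3.
snp1 = rng.binomial(2, 0.3, size=n)
snp2 = rng.binomial(2, 0.3, size=n)

# Quantitative phenotype: small marginal effects plus a purely epistatic term and noise.
phenotype = (
    0.1 * snp1
    + 0.1 * snp2
    + 0.8 * (snp1 * snp2)       # interaction (epistasis) term, arbitrary effect size
    + rng.normal(0.0, 1.0, size=n)
)

r = np.corrcoef(snp1 * snp2, phenotype)[0, 1]
print(f"variance explained by the interaction term (rough): {r ** 2:.3f}")
```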
25

Hudek, A. K., J. Cheung, A. P. Boright, and S. W. Scherer. "Genescript: DNA sequence annotation pipeline." Bioinformatics 19, no. 9 (2003): 1177–78. http://dx.doi.org/10.1093/bioinformatics/btg134.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Rey, Carine, Philippe Veber, Bastien Boussau, and Marie Sémon. "CAARS: comparative assembly and annotation of RNA-Seq data." Bioinformatics 35, no. 13 (2018): 2199–207. http://dx.doi.org/10.1093/bioinformatics/bty903.

Full text
Abstract:
Motivation: RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species for improving RNA-Seq assembly, annotation and gene family reconstruction. Results: We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. Then, CAARS incorporates transcripts into gene families, builds gene alignments and trees and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as reference, a difficult case for this kind of analysis. We showed CAARS assemblies are more complete and accurate than those assembled by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity on a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, directly usable for downstream comparative analyses. Availability and implementation: CAARS is implemented in Python and OCaml and is freely available at https://github.com/carinerey/caars. Supplementary information: Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
27

Christensen, Paul A., Sishir Subedi, Kristi Pepper, et al. "Development and validation of Houston Methodist Variant Viewer version 3: updates to our application for interpretation of next-generation sequencing data." JAMIA Open 3, no. 2 (2020): 299–305. http://dx.doi.org/10.1093/jamiaopen/ooaa004.

Full text
Abstract:
Objectives: Informatics tools that support next-generation sequencing workflows are essential to deliver timely interpretation of somatic variants in cancer. Here, we describe significant updates to our laboratory-developed bioinformatics pipelines and data management application termed Houston Methodist Variant Viewer (HMVV). Materials and Methods: We collected feature requests and workflow improvement suggestions from the end-users of HMVV version 1. Over 1.5 years, we iteratively implemented these features in five sequential updates to HMVV version 3. Results: We improved the performance and data throughput of the application while reducing the opportunity for manual data entry errors. We enabled end-user workflows for pipeline monitoring, variant interpretation and annotation, and integration with our laboratory information system. System maintenance was improved through enhanced defect reporting, heightened data security, and improved modularity in the code and system environments. Discussion and Conclusion: Validation of each HMVV update was performed according to expert guidelines. We enabled an 8× reduction in the bioinformatics pipeline computation time for our longest running assay. Our molecular pathologists can interpret the assay results at least 2 days sooner than was previously possible. The application and pipeline code are publicly available at https://github.com/hmvv.
APA, Harvard, Vancouver, ISO, and other styles
28

Fuchs, Maximilian, Fabian Philipp Kreutzer, Lorenz A. Kapsner, et al. "Integrative Bioinformatic Analyses of Global Transcriptome Data Decipher Novel Molecular Insights into Cardiac Anti-Fibrotic Therapies." International Journal of Molecular Sciences 21, no. 13 (2020): 4727. http://dx.doi.org/10.3390/ijms21134727.

Full text
Abstract:
Integrative bioinformatics is an emerging field in the big data era, offering a steadily increasing number of algorithms and analysis tools. However, for researchers in experimental life sciences it is often difficult to follow and properly apply the bioinformatical methods in order to unravel the complexity and systemic effects of omics data. Here, we present an integrative bioinformatics pipeline to decipher crucial biological insights from global transcriptome profiling data to validate innovative therapeutics. It is available as a web application for an interactive and simplified analysis without the need for programming skills or deep bioinformatics background. The approach was applied to an ex vivo cardiac model treated with natural anti-fibrotic compounds and we obtained new mechanistic insights into their anti-fibrotic action and molecular interplay with miRNAs in cardiac fibrosis. Several gene pathways associated with proliferation, extracellular matrix processes and wound healing were altered, and we could identify micro (mi) RNA-21-5p and miRNA-223-3p as key molecular components related to the anti-fibrotic treatment. Importantly, our pipeline is not restricted to a specific cell type or disease and can be broadly applied to better understand the unprecedented level of complexity in big data research.
APA, Harvard, Vancouver, ISO, and other styles
29

Fiers, Mark WEJ, Ate van der Burgt, Erwin Datema, Joost CW de Groot, and Roeland CHJ van Ham. "High-throughput bioinformatics with the Cyrille2 pipeline system." BMC Bioinformatics 9, no. 1 (2008): 96. http://dx.doi.org/10.1186/1471-2105-9-96.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Hughes, Graham M., and Emma C. Teeling. "AGILE: an assembled genome mining pipeline." Bioinformatics 35, no. 7 (2018): 1252–54. http://dx.doi.org/10.1093/bioinformatics/bty781.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Zhang, Z., N. Carriero, D. Zheng, J. Karro, P. M. Harrison, and M. Gerstein. "PseudoPipe: an automated pseudogene identification pipeline." Bioinformatics 22, no. 12 (2006): 1437–39. http://dx.doi.org/10.1093/bioinformatics/btl116.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Trudgian, D. C., B. Thomas, S. J. McGowan, B. M. Kessler, M. Salek, and O. Acuto. "CPFP: a central proteomics facilities pipeline." Bioinformatics 26, no. 8 (2010): 1131–32. http://dx.doi.org/10.1093/bioinformatics/btq081.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

DeMaere, M. Z., F. M. Lauro, T. Thomas, S. Yau, and R. Cavicchioli. "Simple high-throughput annotation pipeline (SHAP)." Bioinformatics 27, no. 17 (2011): 2431–32. http://dx.doi.org/10.1093/bioinformatics/btr411.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Knowles, D. G., M. Roder, A. Merkel, and R. Guigo. "Grape RNA-Seq analysis pipeline environment." Bioinformatics 29, no. 5 (2013): 614–21. http://dx.doi.org/10.1093/bioinformatics/btt016.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Weisman, David, Michie Yasuda, and Jennifer L. Bowen. "FunFrame: functional gene ecological analysis pipeline." Bioinformatics 29, no. 9 (2013): 1212–14. http://dx.doi.org/10.1093/bioinformatics/btt123.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Morris, Tiffany J., Lee M. Butcher, Andrew Feber, et al. "ChAMP: 450k Chip Analysis Methylation Pipeline." Bioinformatics 30, no. 3 (2013): 428–30. http://dx.doi.org/10.1093/bioinformatics/btt684.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Lee, Soohyun, Jeremy Johnson, Carl Vitzthum, Koray Kırlı, Burak H. Alver, and Peter J. Park. "Tibanna: software for scalable execution of portable pipelines on the cloud." Bioinformatics 35, no. 21 (2019): 4424–26. http://dx.doi.org/10.1093/bioinformatics/btz379.

Full text
Abstract:
Summary: We introduce Tibanna, an open-source software tool for automated execution of bioinformatics pipelines on Amazon Web Services (AWS). Tibanna accepts reproducible and portable pipeline standards including Common Workflow Language (CWL), Workflow Description Language (WDL) and Docker. It adopts a strategy of isolation and optimization of individual executions, combined with a serverless scheduling approach. Pipelines are executed and monitored using local commands or the Python Application Programming Interface (API), and cloud configuration is automatically handled. Tibanna is well suited for projects with a range of computational requirements, including those with large and widely fluctuating loads. Notably, it has been used to process terabytes of data for the 4D Nucleome (4DN) Network. Availability and implementation: Source code is available on GitHub at https://github.com/4dn-dcic/tibanna. Supplementary information: Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
38

Ecale Zhou, Carol L., Stephanie Malfatti, Jeffrey Kimbrel, et al. "multiPhATE: bioinformatics pipeline for functional annotation of phage isolates." Bioinformatics 35, no. 21 (2019): 4402–4. http://dx.doi.org/10.1093/bioinformatics/btz258.

Full text
Abstract:
Summary: To address the need for improved phage annotation tools that scale, we created an automated throughput annotation pipeline: multiple-genome Phage Annotation Toolkit and Evaluator (multiPhATE). multiPhATE is a throughput pipeline driver that invokes an annotation pipeline (PhATE) across a user-specified set of phage genomes. This tool incorporates a de novo phage gene calling algorithm and assigns putative functions to gene calls using protein-, virus- and phage-centric databases. multiPhATE’s modular construction allows the user to implement all or any portion of the analyses by acquiring local instances of the desired databases and specifying the desired analyses in a configuration file. We demonstrate multiPhATE by annotating two newly sequenced Yersinia pestis phage genomes. Within multiPhATE, the PhATE processing pipeline can be readily implemented across multiple processors, making it adaptable for throughput sequencing projects. Software documentation assists the user in configuring the system. Availability and implementation: multiPhATE was implemented in Python 3.7, and runs as a command-line code under Linux or Unix. multiPhATE is freely available under an open-source BSD3 license from https://github.com/carolzhou/multiPhATE. Instructions for acquiring the databases and third-party codes used by multiPhATE are included in the distribution README file. Users may report bugs by submitting to the GitHub issues page associated with the multiPhATE distribution. Supplementary information: Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
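multiPhATE lets the user switch individual analyses on or off through a configuration file. The snippet below sketches that configuration-driven selection pattern with Python's standard configparser; the section and option names are invented and do not reflect multiPhATE's actual configuration format:

```python
import configparser

# Hypothetical configuration listing which annotation analyses should run.
CONFIG_TEXT = """
[analyses]
gene_calling = yes
blast_virus_db = yes
blast_phage_db = no
hmm_search = yes
"""

def enabled_analyses(text: str) -> list:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["analyses"]
    # getboolean() accepts yes/no, true/false, on/off.
    return [name for name in section if section.getboolean(name)]

for analysis in enabled_analyses(CONFIG_TEXT):
    print(f"would run: {analysis}")
```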
39

Christensen, Paul, Sishir Subedi, Heather Hendrickson, et al. "Updates to a Lab-Developed Bioinformatics Pipeline and Application for Interpretation of Clinical Next-Generation Sequencing Panels." American Journal of Clinical Pathology 152, Supplement 1 (2019): S9–S10. http://dx.doi.org/10.1093/ajcp/aqz112.019.

Full text
Abstract:
Objectives: Our goal was to enhance our next-generation sequencing (NGS) molecular oncology workflow from sequencing to analysis through improvements to our custom-built and previously described NGS application. Methods: Over 1 year, we collected feedback regarding workflow pain-points and feature requests from all end users of our NGS application. The application consists of a series of scripted pipelines, a MySQL database, and a Java Graphic User Interface (GUI); the end users include molecular pathologists (MPs), medical technologist/medical laboratory technologists (MTs/MLTs), and the molecular laboratory manager. These feedback data were used to engineer significant changes to the pipelines and software architecture. These architecture changes provided the backbone to a suite of feature enhancements aimed to improve turnaround time, decrease manual processes, and increase efficiency for the molecular laboratory staff and directors. Summary: The key software architecture changes include implementing support for multiple environments, refactoring common code in the different pipelines, migrating from a per-run pipeline model to a per-sample pipeline model, and key updates to the MySQL database. These changes enabled development of many technical and user experience improvements. We eliminated the need for the pipelines to be launched manually from the Linux command line. Multiple pipelines can be executed concurrently. We created a per-sample pipeline status monitor. Sample entry is integrated with our Laboratory Information System (LIS) barcodes, thus reducing the possibility of transcription errors. We developed quality assurance reports. Socket-based integration with Integrated Genomics Viewer (IGV) was enhanced. We enabled rapid loading of key alignment data into IGV over a wireless network. Features to support resident and fellow driven variant and gene annotation reporting were developed. Support for additional clinical databases was implemented. Conclusions: The designed feature enhancements to our previously reported NGS application have added significant sophistication and safety to our clinical NGS workflow. For example, our NGS consensus conference can be held in a conference room over a wireless network, and a trainee can prepare and present each case without ever leaving the application. To date, we have analyzed 2,540 samples using three different assays (TruSight Myeloid Sequencing Panel, AmpliSeq Cancer Hotspot Panel, GlioSeq) and four sequencing instruments (NextSeq, MiSeq, Proton, PGM) in this application. The code is freely available on GitHub.
APA, Harvard, Vancouver, ISO, and other styles
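One of the architecture changes described above is the migration from a per-run to a per-sample pipeline model, with multiple pipelines executing concurrently. A generic sketch of that pattern using a process pool follows; `analyze_sample` is a placeholder for the real per-sample work, and none of this is the HMVV code:

```python
from concurrent.futures import ProcessPoolExecutor

def analyze_sample(sample_id: str) -> str:
    # Placeholder for the real per-sample work (alignment, variant calling, annotation...).
    return f"{sample_id}: done"

def run_per_sample(samples, workers: int = 4) -> None:
    # Per-sample model: each sample is an independent job, so one slow or failed
    # sample no longer blocks the rest of the sequencing run.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(analyze_sample, samples):
            print(result)

if __name__ == "__main__":
    run_per_sample([f"S{i:03d}" for i in range(1, 9)])
```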
40

Chen, Junfang, Dietmar Lippold, Josef Frank, William Rayner, Andreas Meyer-Lindenberg, and Emanuel Schwarz. "Gimpute: an efficient genetic data imputation pipeline." Bioinformatics 35, no. 8 (2018): 1433–35. http://dx.doi.org/10.1093/bioinformatics/bty814.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Du, P., W. A. Kibbe, and S. M. Lin. "lumi: a pipeline for processing Illumina microarray." Bioinformatics 24, no. 13 (2008): 1547–48. http://dx.doi.org/10.1093/bioinformatics/btn224.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Wegrzyn, J. L., J. M. Lee, J. Liechty, and D. B. Neale. "PineSAP – sequence alignment and SNP identification pipeline." Bioinformatics 25, no. 19 (2009): 2609–10. http://dx.doi.org/10.1093/bioinformatics/btp477.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Torres-García, Wandaliz, Siyuan Zheng, Andrey Sivachenko, et al. "PRADA: pipeline for RNA sequencing data analysis." Bioinformatics 30, no. 15 (2014): 2224–26. http://dx.doi.org/10.1093/bioinformatics/btu169.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Pedersen, B., T. F. Hsieh, C. Ibarra, and R. L. Fischer. "MethylCoder: software pipeline for bisulfite-treated sequences." Bioinformatics 27, no. 17 (2011): 2435–36. http://dx.doi.org/10.1093/bioinformatics/btr394.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Peat, Gareth, William Jones, Michael Nuhn, et al. "The Open Targets post-GWAS analysis pipeline." Bioinformatics 36, no. 9 (2020): 2936–37. http://dx.doi.org/10.1093/bioinformatics/btaa020.

Full text
Abstract:
Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore requires crossing the genetics results with functional data. Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.
APA, Harvard, Vancouver, ISO, and other styles
46

Chetnik, Kelsey, Elisa Benedetti, Daniel P. Gomari, et al. "maplet: an extensible R toolbox for modular and reproducible metabolomics pipelines." Bioinformatics 38, no. 4 (2021): 1168–70. http://dx.doi.org/10.1093/bioinformatics/btab741.

Full text
Abstract:
This article presents maplet, an open-source R package for the creation of highly customizable, fully reproducible statistical pipelines for metabolomics data analysis. It builds on the SummarizedExperiment data structure to create a centralized pipeline framework for storing data, analysis steps, results and visualizations. maplet’s key design feature is its modularity, which offers several advantages, such as ensuring code quality through the maintenance of individual functions and promoting collaborative development by removing technical barriers to code contribution. With over 90 functions, the package includes a wide range of functionalities, covering many widely used statistical approaches and data visualization techniques. Availability and implementation: The maplet package is implemented in R and freely available at https://github.com/krumsieklab/maplet.
APA, Harvard, Vancouver, ISO, and other styles
47

Graña, Osvaldo, Hugo López-Fernández, Florentino Fdez-Riverola, David González Pisano, and Daniel Glez-Peña. "Bicycle: a bioinformatics pipeline to analyze bisulfite sequencing data." Bioinformatics 34, no. 8 (2017): 1414–15. http://dx.doi.org/10.1093/bioinformatics/btx778.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Yao, Li, Heming Wang, Yuanyuan Song, and Guangchao Sui. "BioQueue: a novel pipeline framework to accelerate bioinformatics analysis." Bioinformatics 33, no. 20 (2017): 3286–88. http://dx.doi.org/10.1093/bioinformatics/btx403.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Tung, Nguyen Van, Nguyen Thi Kim Lien, and Nguyen Huy Hoang. "A comparison of three variant calling pipelines using simulated data." Academia Journal of Biology 43, no. 2 (2021): 47–53. http://dx.doi.org/10.15625/2615-9023/16006.

Full text
Abstract:
Advances in next-generation sequencing allow us to perform DNA sequencing rapidly at a relatively low cost. Multiple bioinformatics methods have been developed to identify genomic variants from whole genome or whole exome sequencing data. The development of better variant calling methodologies is limited by the difficulty of assessing the accuracy and completeness of a new method. Normally, computational methods can be benchmarked using simulated data, which allow us to generate as much data as desired under controlled scenarios. In this study, we compared three variant calling pipelines, Samtools/VarScan, Samtools/Bcftools, and Picard/GATK, using two simulated datasets. The results showed a significant difference between the three pipelines in both cases. In the Chromosome 6 dataset, the GATK and Bcftools pipelines detected more than 90% of variants, while only 82.19% of mutations were detected by VarScan. In the NA12878 dataset, the results showed that the GATK pipeline was more sensitive than the Bcftools and VarScan pipelines. All pipelines showed a high positive predictive value. Moreover, by the measure of run time, VarScan ranked highest, but GATK has an option for multithreading, which is a way to make a program run faster. Therefore, GATK is more effective than Bcftools and VarScan for variant calling with a lower-coverage dataset.
APA, Harvard, Vancouver, ISO, and other styles
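The comparison above reports detection rates and positive predictive values against simulated truth sets. When call sets are normalized to (chromosome, position, ref, alt) tuples, these metrics reduce to simple set arithmetic, as in the following sketch with made-up variants:

```python
def benchmark(called: set, truth: set) -> dict:
    """Sensitivity and positive predictive value (PPV) of a variant call set."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but missed
    return {
        "sensitivity": tp / (tp + fn) if truth else 0.0,
        "ppv": tp / (tp + fp) if called else 0.0,
    }

truth = {("chr6", 100, "A", "G"), ("chr6", 250, "C", "T"), ("chr6", 900, "G", "A")}
calls = {("chr6", 100, "A", "G"), ("chr6", 900, "G", "A"), ("chr6", 1200, "T", "C")}
print(benchmark(calls, truth))  # sensitivity 2/3, PPV 2/3 for these made-up sets
```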
50

Becker, Daniela, Denny Popp, Hauke Harms, and Florian Centler. "A Modular Metagenomics Pipeline Allowing for the Inclusion of Prior Knowledge Using the Example of Anaerobic Digestion." Microorganisms 8, no. 5 (2020): 669. http://dx.doi.org/10.3390/microorganisms8050669.

Full text
Abstract:
Metagenomics analysis revealing the composition and functional repertoire of complex microbial communities typically relies on large amounts of sequence data. Numerous analysis strategies and computational tools are available for their analysis. Fully integrated automated analysis pipelines such as MG-RAST or MEGAN6 are user-friendly but not designed for integrating specific knowledge on the biological system under study. In order to facilitate the consideration of such knowledge, we introduce a modular, adaptable analysis pipeline combining existing tools. We applied the novel pipeline to simulated mock data sets focusing on anaerobic digestion microbiomes and compare results to those obtained with established automated analysis pipelines. We find that the analysis strategy and choice of tools and parameters have a strong effect on the inferred taxonomic community composition, but not on the inferred functional profile. By including prior knowledge, computational costs can be decreased while improving result accuracy. While automated off-the-shelf analysis pipelines are easy to apply and require no knowledge on the microbial system under study, custom-made pipelines require more preparation time and bioinformatics expertise. This extra effort is minimized by our modular, flexible, custom-made pipeline, which can be adapted to different scenarios and can take available knowledge on the microbial system under study into account.
APA, Harvard, Vancouver, ISO, and other styles
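The modular pipeline described above is built so that individual steps can be swapped and prior knowledge about the community, such as a restricted reference database, can be plugged in. The sketch below shows one generic way to express that design in Python, a registry of interchangeable step implementations; the step names, functions and database labels are hypothetical:

```python
# Interchangeable implementations for one pipeline stage. Injecting prior knowledge,
# here a reference database restricted to taxa expected in anaerobic digesters,
# only changes this mapping, not the rest of the pipeline.
def classify_generic(reads, database="full_reference"):
    return f"classified {reads} against {database}"

def classify_with_prior_knowledge(reads, database="anaerobic_digestion_subset"):
    return f"classified {reads} against {database}"

REGISTRY = {
    "taxonomic_classification": {
        "generic": classify_generic,
        "prior_knowledge": classify_with_prior_knowledge,
    },
}

def run(reads, choices):
    """Run each configured stage with the selected implementation."""
    return {stage: REGISTRY[stage][variant](reads) for stage, variant in choices.items()}

print(run("ad_sample_reads.fastq", {"taxonomic_classification": "prior_knowledge"}))
```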
