Journal articles on the topic 'Reference protein-coding alignments'

The top 30 journal articles on the topic 'Reference protein-coding alignments' are listed below, with abstracts where available.

1

Jeon, Yoon-Seong, Kihyun Lee, Sang-Cheol Park, Bong-Soo Kim, Yong-Joon Cho, Sung-Min Ha, and Jongsik Chun. "EzEditor: a versatile sequence alignment editor for both rRNA- and protein-coding genes." International Journal of Systematic and Evolutionary Microbiology 64, no. Pt 2 (February 1, 2014): 689–91. http://dx.doi.org/10.1099/ijs.0.059360-0.

Abstract:
EzEditor is a Java-based molecular sequence editor allowing manipulation of both DNA and protein sequence alignments for phylogenetic analysis. It has multiple features optimized to connect initial computer-generated multiple alignment and subsequent phylogenetic analysis by providing manual editing with reference to biological information specific to the genes under consideration. It provides various functionalities for editing rRNA alignments using secondary structure information. In addition, it supports simultaneous editing of both DNA sequences and their translated protein sequences for protein-coding genes. EzEditor is, to our knowledge, the first sequence editing software designed for both rRNA- and protein-coding genes with the visualization of biologically relevant information and should be useful in molecular phylogenetic studies. EzEditor is based on Java, can be run on all major computer operating systems and is freely available from http://sw.ezbiocloud.net/ezeditor/.
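Simultaneous DNA and protein editing relies on keeping nucleotide and amino-acid columns in register, which only works if gaps in the DNA alignment are inserted as whole codons. The Python sketch below illustrates that idea under stated assumptions; it is not EzEditor's code (EzEditor is written in Java), and the function name `translate_codon_aligned` is hypothetical. It assumes the standard genetic code (NCBI translation table 1) and maps each `---` gap codon to a single `-` in the protein row.

```python
# Standard genetic code (NCBI translation table 1), indexed in TCAG order:
# codon index = 16 * first + 4 * second + third.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_codon_aligned(row: str) -> str:
    """Translate one gapped, codon-aligned DNA row into a protein row.

    Assumes the row length is a multiple of 3 and that gaps were inserted as
    whole codons ("---"), so DNA and protein columns stay in register.
    """
    if len(row) % 3 != 0:
        raise ValueError("row length must be a multiple of 3")
    protein = []
    for start in range(0, len(row), 3):
        codon = row[start:start + 3].upper()
        if codon == "---":
            protein.append("-")  # one gap codon -> one protein gap
        elif "-" in codon:
            raise ValueError(f"partial-codon gap at column {start + 1}")
        else:
            # Codons with ambiguous bases (e.g. N) are not in the table -> X.
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


print(translate_codon_aligned("ATG---AGATAA"))  # M-R*
```

Rejecting gaps that split a codon, rather than silently shifting the reading frame, is the design choice that keeps the two views aligned column for column.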
2

Staats, Martijn, and Jan A. L. van Kan. "Genome Update of Botrytis cinerea Strains B05.10 and T4." Eukaryotic Cell 11, no. 11 (October 26, 2012): 1413–14. http://dx.doi.org/10.1128/ec.00164-12.

Abstract:
We report here an update of the Botrytis cinerea strains B05.10 and T4 genomes, as well as an automated preliminary gene structure annotation. High-coverage de novo assemblies and reference-based alignments led to a correction of wrong base calls, elimination of sequence gaps, and improved joining of contigs. The new assemblies have substantially lower numbers of scaffolds and a concomitant increase in the N50. The list of protein-coding genes was generated using the evidence-driven gene predictor Augustus, with expressed sequence tag evidence and RNA-Seq data as input.
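N50 is the contiguity statistic behind the reported increase: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly size. A minimal Python sketch, using made-up contig lengths rather than the Botrytis data; the "at least half" convention is an assumption, since some tools use a strict inequality.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    Sequences are taken from longest to shortest; the N50 is the length of the
    sequence at which the running total first reaches half the total length.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise AssertionError("unreachable: the full sum always reaches half")


# Hypothetical contig lengths in kb (not data from the study).
print(n50([10, 80, 30, 70, 20, 50, 40]))  # 70
```

Here the total is 300 kb; the running sum reaches 150 kb at the second-longest contig, so N50 = 70 kb.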
3

Zhang, Xiaoyu, Irene M. Kaplow, Morgan Wirthlin, Tae Yoon Park, and Andreas R. Pfenning. "HALPER facilitates the identification of regulatory element orthologs across species." Bioinformatics 36, no. 15 (May 14, 2020): 4339–40. http://dx.doi.org/10.1093/bioinformatics/btaa493.

Abstract:
Summary: Diverse traits have evolved through cis-regulatory changes in genome sequence that influence the magnitude, timing and cell type-specificity of gene expression. Advances in high-throughput sequencing and regulatory genomics have led to the identification of regulatory elements in individual species, but these genomic regions remain difficult to align across taxonomic orders due to their lack of sequence conservation relative to protein-coding genes. The groundwork for tracing the evolution of regulatory elements is provided by the recent assembly of hundreds of genomes, the generation of reference-free Cactus multiple sequence alignments of these genomes, and the development of the halLiftover tool for mapping regions across these alignments. We present halLiftover Post-processing for the Evolution of Regulatory Elements (HALPER), a tool for constructing contiguous regulatory element orthologs from the outputs of halLiftover. We anticipate that this tool will enable users to efficiently identify orthologs of regulatory elements across hundreds of species, providing novel insights into the evolution of traits that have evolved through gene expression. Availability and implementation: HALPER is implemented in Python and available on GitHub: https://github.com/pfenninglab/halLiftover-postprocessing. Supplementary information: Supplementary data are available at Bioinformatics online.
4

Zheng, Hewei, Xueying Zhao, Hong Wang, Yu Ding, Xiaoyan Lu, Guosi Zhang, Jiaxin Yang, et al. "Location deviations of DNA functional elements affected SNP mapping in the published databases and references." Briefings in Bioinformatics 21, no. 4 (August 2, 2019): 1293–301. http://dx.doi.org/10.1093/bib/bbz073.

Abstract:
The recent extensive application of next-generation sequencing has led to the rapid accumulation of multiple types of data for functional DNA elements. With the advent of precision medicine, the fine-mapping of risk loci based on these elements has become of paramount importance. In this study, we obtained the human reference genome (GRCh38) and the main DNA sequence elements, including protein-coding genes, miRNAs, lncRNAs and single nucleotide polymorphism flanking sequences, from different repositories. We then realigned these elements to identify their exact locations on the genome. Overall, 5%–20% of all sequence element locations deviated among databases, on the scale of kilobase-pair to megabase-pair. These deviations even affected the selection of genome-wide association study risk-associated genes. Our results implied that the location information for functional DNA elements may deviate among public databases. Researchers should take care when using cross-database sources and should perform pilot sequence alignments before element location-based studies.
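A cross-database location check of the kind recommended here can be sketched as a comparison of the coordinates two sources give for the same element identifiers. This Python sketch is a hypothetical illustration, not the study's pipeline (which realigned element sequences to GRCh38); it assumes both sources already use the same assembly and coordinate convention, and the function name and `tolerance` parameter are invented for the example.

```python
from typing import Dict, Optional, Tuple

# (chromosome, start, end); both sources are assumed to use the same assembly
# and the same coordinate convention.
Locus = Tuple[str, int, int]


def location_deviations(
    source_a: Dict[str, Locus],
    source_b: Dict[str, Locus],
    tolerance: int = 0,
) -> Dict[str, Optional[int]]:
    """Flag elements whose locations disagree between two sources.

    Returns element id -> largest start/end offset in bp, or None when the
    two sources place the element on different chromosomes.
    """
    flagged: Dict[str, Optional[int]] = {}
    for element in source_a.keys() & source_b.keys():  # shared identifiers only
        chrom_a, start_a, end_a = source_a[element]
        chrom_b, start_b, end_b = source_b[element]
        if chrom_a != chrom_b:
            flagged[element] = None
            continue
        offset = max(abs(start_a - start_b), abs(end_a - end_b))
        if offset > tolerance:
            flagged[element] = offset
    return flagged


# Hypothetical toy records, not real database entries.
a = {"g1": ("chr1", 100, 200), "g2": ("chr2", 500, 900), "g3": ("chr3", 10, 20)}
b = {"g1": ("chr1", 100, 200), "g2": ("chr2", 2500, 2900), "g3": ("chrX", 10, 20)}
flagged = location_deviations(a, b)
print(len(flagged) / len(a.keys() & b.keys()))  # fraction of shared elements that deviate
```

In the toy data, g2 is offset by 2,000 bp and g3 is placed on a different chromosome, so two of the three shared elements are flagged.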
5

Ortiz-Romero, Pablo L., Gonzalo Gomez-Lopez, Sagrario Gómez de Benito, Veronica Monsalvez, Jose P. Vaque, Nerea Martinez, Ignacio Varela, et al. "Mutations in PLCG1 Is a Frequent Event in Cutaneous T-Cell Lymphomas." Blood 120, no. 21 (November 16, 2012): 300. http://dx.doi.org/10.1182/blood.v120.21.300.300.

Abstract:
Background: Cutaneous T-cell lymphoma (CTCL) is a heterogeneous group of diseases characterized by clonal expansion of malignant T-cells in the skin. The two predominant clinical forms of CTCL are mycosis fungoides (MF) and Sezary syndrome (SS). Tumor-stage MF has an unfavorable prognosis with a 10-year survival of approximately 40%. The molecular pathogenesis of CTCL is still largely unknown, although some data suggest that signalling from the T-cell receptor (TCR) is a driving force. However, the molecular mechanisms responsible for this activation have not been fully clarified. Methods: Based on the hypothesis that TCR activation may depend, at least in part, on somatic mutations, we investigated this by deep sequencing of a selection of genes belonging to the TCR pathway or related pathways, such as NFkB and JAK/STAT. A target enrichment method using the SureSelect system (Agilent) was used to enrich for exons and regulatory regions of 524 genes belonging to these pathways. Paired normal and tumoral DNA from 2 tumoral-MF, 5 erythrodermic-MF and 4 SS patients was processed and sequenced with a Genome Analyzer GA2 (Illumina) (PE-42bp). Sequencing data were first checked with FastQC and aligned to the human reference genome (GRCh37) using BWA and BFAST. Somatic variants were identified using GATK, and SNPs available in dbSNP 135 (hg19) and the 1000 Genomes Project were then filtered out of the VCF output files. The GATK QUAL field was used to rank the selected somatic variants. Biological impact predictions for detected variants were obtained from the Ensembl Variant Effect Predictor. Putative variants were manually reviewed and validated by capillary sequencing. Immunohistochemical analysis for NFAT, p50, p52 and STAT·p was also performed. qPCR genotyping for specific variants was performed in a new cohort of 60 CTCL patients including SS and tumoral MFs.
Results: Several mutations were found in essential genes of the Treg and Th17 regulatory pathways, NFkB and JAK/STAT, among others. PLCG1 was found mutated in three samples, two of them sharing the same mutation affecting one of the PLCG1 protein catalytic domains. This mutation was further analyzed by qPCR genotyping in the new series of patients and was detected in 20% of samples. PLCG1-mutated cases showed strong nuclear immunostaining for NFAT, p50 and p52 in paraffin sections. Additionally, immunological studies performed by flow cytometry in CTCL cell lines showed aberrant coexpression of TH17 and Treg phenotypes. Conclusions: Activation of the TCR in CTCL might be partially dependent on the acquisition of somatic mutations in the coding region of genes known to play an essential role in T-cell differentiation processes and acquisition of TH17 and Treg phenotypes. Especially relevant is the finding that the catalytic domain of PLCG1 is frequently mutated in tumoral MF samples. Disclosures: No relevant conflicts of interest to declare.
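The filter-then-rank step in the Methods (remove variants already catalogued in dbSNP or the 1000 Genomes Project, then rank the remainder by the VCF QUAL field) can be sketched in plain Python. This is an illustrative assumption, not the authors' GATK workflow: the known-variant set is a hypothetical in-memory set of (CHROM, POS, REF, ALT) tuples, multi-allelic ALT fields are compared as whole strings, and a missing QUAL (`.`) is treated as 0.

```python
from typing import Iterable, List, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def rank_novel_variants(
    vcf_lines: Iterable[str], known: Set[VariantKey]
) -> List[Tuple[VariantKey, float]]:
    """Drop known variants from VCF data lines and rank the rest by QUAL, highest first."""
    ranked = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information (##) and header (#CHROM) lines
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        key = (chrom, int(pos), ref, alt)
        if key in known:
            continue  # already catalogued, e.g. in dbSNP
        score = 0.0 if qual == "." else float(qual)  # "." marks a missing QUAL
        ranked.append((key, score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical records, not data from the study.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\trs1\tC\tT\t99\tPASS\t.",
    "chr2\t300\t.\tG\tA\t80\tPASS\t.",
]
known_sites = {("chr1", 200, "C", "T")}
for key, score in rank_novel_variants(vcf, known_sites):
    print(key, score)
```

Matching on the full (CHROM, POS, REF, ALT) tuple rather than position alone avoids discarding a novel allele that happens to sit at a catalogued site.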
6

Halo, Julia V., Amanda L. Pendleton, Feichen Shen, Aurélien J. Doucet, Thomas Derrien, Christophe Hitte, Laura E. Kirby, et al. "Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes." Proceedings of the National Academy of Sciences 118, no. 11 (March 8, 2021): e2016274118. http://dx.doi.org/10.1073/pnas.2016274118.

Abstract:
Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture. Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3′ end of LINE-1_Cfs (i.e., LINE-1_Cf 3′-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation.
7

Marques, João P., Fernando A. Seixas, Liliana Farelo, Colin M. Callahan, Jeffrey M. Good, W. Ian Montgomery, Neil Reid, Paulo C. Alves, Pierre Boursot, and José Melo-Ferreira. "An Annotated Draft Genome of the Mountain Hare (Lepus timidus)." Genome Biology and Evolution 12, no. 1 (December 13, 2019): 3656–62. http://dx.doi.org/10.1093/gbe/evz273.

Abstract:
Hares (genus Lepus) provide clear examples of repeated and often massive introgressive hybridization and striking local adaptations. Genomic studies on this group have so far relied on comparisons to the European rabbit (Oryctolagus cuniculus) reference genome. Here, we report the first de novo draft reference genome for a hare species, the mountain hare (Lepus timidus), and evaluate the efficacy of whole-genome re-sequencing analyses using the new reference versus using the rabbit reference genome. The genome was assembled using the ALLPATHS-LG protocol with a combination of overlapping pair and mate-pair Illumina sequencing (77× coverage). The assembly contained 32,294 scaffolds with a total length of 2.7 Gb and a scaffold N50 of 3.4 Mb. Re-scaffolding based on the rabbit reference reduced the total number of scaffolds to 4,205 with a scaffold N50 of 194 Mb. A correspondence was found between 22 of these hare scaffolds and the rabbit chromosomes, based on gene content and direct alignment. We annotated 24,578 protein-coding genes by combining ab initio predictions, homology search, and transcriptome data, of which 683 were solely derived from hare-specific transcriptome data. The hare reference genome is therefore a new resource to discover and investigate hare-specific variation. Similar estimates of heterozygosity and inferred demographic history profiles were obtained when mapping hare whole-genome re-sequencing data to the new hare draft genome or to alternative references based on the rabbit genome. Our results validate previous reference-based strategies and suggest that the chromosome-scale hare draft genome should enable chromosome-wide analyses and genome scans on hares.
8

Seid, Jerome, Larisa Lozovatsky, Patrick G. Gallagher, and Karin E. Finberg. "Identification of a Novel SLC40A1 Arg88Ile Mutation in a Patient with Familial Iron Overload Treated By Phlebotomy." Blood 126, no. 23 (December 3, 2015): 954. http://dx.doi.org/10.1182/blood.v126.23.954.954.

Abstract:
INTRODUCTION: The cellular iron exporter ferroportin, encoded by the SLC40A1 gene, plays a key role in systemic iron regulation by mediating the absorption of dietary iron from duodenal enterocytes and the release of macrophage iron stores into the plasma. SLC40A1 mutations result in a clinically heterogeneous iron overload disorder exhibiting autosomal dominant transmission. Mutations that impair iron export function result in a classical ferroportin disease phenotype characterized by hyperferritinemia, normal transferrin saturation, and macrophage iron loading, while mutations that impair the regulation of ferroportin by hepcidin result in a non-classical form of disease exhibiting high transferrin saturation and hepatocellular iron loading. Here we report the clinical phenotype of a male patient with a personal and family history of iron overload who was found to harbor a novel SLC40A1 mutation. CLINICAL HISTORY: A 39-year-old male of Italian descent came to clinical attention after laboratory evidence of iron overload was detected at the time of a routine physical exam. Serum ferritin was markedly elevated at 5018 ng/mL, while transferrin iron saturation was within the normal range at 42%. A complete blood count revealed hemoglobin 16.1 g/dL, hematocrit 48.7%, and MCV 96.7 fL. Liver function tests revealed mild elevation of transaminases (AST 50 U/L, ALT 114 U/L). HBV and HCV serologies, as well as an anti-nuclear antibody screen, were negative. Rheumatoid factor, ceruloplasmin, and alpha-fetoprotein were within the normal range. Genetic testing for the HFE C282Y, H63D, and S65C variants was negative. Abdominal ultrasound revealed a somewhat coarse and echogenic liver. The patient's past medical and surgical histories were non-contributory. Family history was notable for a 63-year-old father with non-HFE hemochromatosis treated by phlebotomy, as well as a possible history of iron overload in the paternal grandmother.
There was no known family history of hepatocellular carcinoma. The patient reported consuming ≤ 5 alcoholic beverages per week. Review of systems was negative, and no organomegaly was detected on physical exam. The patient began undergoing phlebotomy approximately every 2 weeks, which now has been well tolerated for almost two years. His most recent ferritin level was within the normal range (328 ng/mL). METHODS: Using genomic DNA extracted from peripheral blood as template, all coding regions and intron-exon boundaries of SLC40A1 were amplified by polymerase chain reaction and analyzed by bidirectional Sanger sequencing. Sequence chromatograms were analyzed using Sequencher software. This study was approved by the Yale University Human Investigation Committee (protocol #010412377). RESULTS: A heterozygous single nucleotide substitution (c.263G>T, encoding p.Arg88Ile) was identified in exon 3 of the SLC40A1 gene (nomenclature per Ensembl reference transcript ENST00000261024). Ferroportin is a predicted multipass transmembrane protein, and Arg88 resides between two predicted transmembrane domains. Protein sequence alignments reveal that amino acid 88 is conserved as an arginine in ferroportin homologs in species as evolutionarily distant as Xenopus laevis and Danio rerio, suggesting that this residue is required for normal ferroportin function. The p.Arg88Ile variant has not been reported in the 1000 Genomes Project or the Exome Aggregation Consortium, demonstrating that it is not a polymorphism in the general population. The mutation predictor algorithms PolyPhen2, SIFT, and MutationTaster all strongly predicted this mutation to be damaging. DISCUSSION: The ferroportin variant detected in this case (p.Arg88Ile) represents the third non-synonymous substitution at ferroportin residue 88 detected in patients with iron overload phenotypes. 
p.Arg88Gly was identified in a 38-year-old male who became anemic under a phlebotomy program (Cunat S, et al. Clin Chem 2007), while p.Arg88Thr was detected in multiple affected members of a single kindred in whom serial phlebotomy was well tolerated (Bach V, et al. Blood Cells Mol Dis 2006). Collectively, these findings suggest that the particular amino acid substitution at residue 88 may influence the degree of cellular iron sequestration. Future work will assess the effect of the Arg88Ile substitution on ferroportin function. Disclosures: No relevant conflicts of interest to declare.
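The variant nomenclature can be checked with a little arithmetic. In HGVS coding-DNA numbering, c.1 is the A of the ATG start codon, so c.263 falls in codon floor((263 - 1) / 3) + 1 = 88, at the second base of that codon, which is consistent with p.Arg88Ile. The Python sketch below performs that calculation and also shows, as an inference from the standard genetic code only (not checked against the ENST00000261024 sequence), that AGA is the only arginine codon that becomes an isoleucine codon after a G>T change at the second position. The helper names are hypothetical.

```python
from typing import List, Tuple


def codon_of(c_position: int) -> Tuple[int, int]:
    """Map an HGVS coding-DNA position (c.N) to (codon number, base within codon).

    c.1 is the A of the ATG start codon, so codon n covers c.(3n-2) to c.3n.
    """
    if c_position < 1:
        raise ValueError("coding positions start at c.1")
    return (c_position - 1) // 3 + 1, (c_position - 1) % 3 + 1


# Only the six arginine codons and their second-position G>T products are
# needed here; amino acids follow the standard genetic code.
ARG_CODONS = ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]
PRODUCT = {"CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
           "ATA": "Ile", "ATG": "Met"}


def substitute(codon: str, base_in_codon: int, new_base: str) -> str:
    """Return the codon with one 1-based position replaced."""
    i = base_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_number, base = codon_of(263)  # c.263G>T
candidates: List[str] = [
    c for c in ARG_CODONS
    if c[base - 1] == "G" and PRODUCT[substitute(c, base, "T")] == "Ile"
]
print(codon_number, base, candidates)  # 88 2 ['AGA']
```

The other second-position G>T products are leucine (from the four CGN codons) or methionine (from AGG), which is why only one reference codon fits the reported protein change.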
9

Ameur, Adam, Huiwen Che, Marcel Martin, Ignas Bunikis, Johan Dahlberg, Ida Höijer, Susana Häggqvist, et al. "De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data." Genes 9, no. 10 (October 9, 2018): 486. http://dx.doi.org/10.3390/genes9100486.

Abstract:
The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.
10

Taylor, Rebecca S., Rebekah L. Horn, Xi Zhang, G. Brian Golding, Micheline Manseau, and Paul J. Wilson. "The Caribou (Rangifer tarandus) Genome." Genes 10, no. 7 (July 17, 2019): 540. http://dx.doi.org/10.3390/genes10070540.

Abstract:
Rangifer tarandus, known as caribou or reindeer, is a widespread circumpolar species that shows significant variability in its morphology, ecology, and genetics. A genome was sequenced from a male boreal caribou (R. t. caribou) from Manitoba, Canada. Both paired-end and Chicago libraries were constructed and sequenced on Illumina platforms. The final assembly consists of approximately 2.205 Gb and has a scaffold N50 of 11.765 Mb. BUSCO (Benchmarking Universal Single-Copy Orthologs) reconstructed 3820 (93.1%) complete mammalian genes, and genome annotation identified the locations of 33,177 protein-coding genes. An alignment to the bovine genome was carried out, indicating sequence coverage on all bovine chromosomes. A high-quality reference genome will be invaluable for evolutionary research and for conservation efforts for the species. Further information about the genome, including a FASTA file of the assembly and the annotation files, is available on our caribou genome website. Raw sequence data are available at the National Center for Biotechnology Information (NCBI) under the BioProject accession number PRJNA549927.
11

De Vega, Jose, Iain Donnison, Sarah Dyer, and Kerrie Farrar. "Draft genome assembly of the biofuel grass crop Miscanthus sacchariflorus." F1000Research 10 (January 18, 2021): 29. http://dx.doi.org/10.12688/f1000research.44714.1.

Abstract:
Miscanthus sacchariflorus (Maxim.) Hack. is a highly productive C4 perennial rhizomatous biofuel grass crop. M. sacchariflorus is among the most widely distributed species in the genus, particularly at cold northern latitudes, and is one of the progenitor species of the commercial M. × giganteus genotypes. We generated a 2.54 Gb whole-genome assembly of the diploid M. sacchariflorus cv. “Robustus 297” genotype, which represented ~59% of the expected total genome size. We later anchored this assembly using the chromosomes from the M. sinensis genome to generate a second assembly with improved contiguity. We annotated 86,767 and 69,049 protein-coding genes in the unanchored and anchored assemblies, respectively. We estimated our assemblies included ~85% of the M. sacchariflorus genes based on homology and core markers. The utility of the new reference for genomic studies was evidenced by a 99% alignment rate of the RNA-seq reads from the same genotype. The raw data, unanchored and anchored assemblies, and respective gene annotations are publicly available.
12

Gram, Trine, and Peter Ahrens. "Improved Diagnostic PCR Assay for Actinobacillus pleuropneumoniae Based on the Nucleotide Sequence of an Outer Membrane Lipoprotein." Journal of Clinical Microbiology 36, no. 2 (1998): 443–48. http://dx.doi.org/10.1128/jcm.36.2.443-448.1998.

Abstract:
The gene (omlA) coding for an outer membrane protein of Actinobacillus pleuropneumoniae serotypes 1 and 5 has been described earlier and has formed the basis for development of a specific PCR assay. The corresponding regions of all 12 A. pleuropneumoniae reference strains of biovar 1 were sequenced. Alignment of the sequences revealed conserved terminal and variable middle regions, which divided the reference strains into four distinct groups. Primers were selected from the conserved 5′ and 3′ termini of the gene. A 950-bp amplicon was obtained from each of 102 tested field isolates of A. pleuropneumoniae obtained from lungs. Their identity was verified by sequencing approximately 500 bp of the amplification product from 50 of the A. pleuropneumoniae isolates, which all showed the expected DNA sequence characteristic of the serotype. To test the specificity of the reaction, 23 other bacterial species related to A. pleuropneumoniae or isolated from pigs were assayed. They were all found negative in the PCR, as were tonsil cultures from 50 pigs of an A. pleuropneumoniae-negative herd. The sensitivity assessed by agarose gel analysis of the PCR product was 10² CFU/PCR test tube. The specificity and sensitivity of this PCR compared to those of culture suggest the use of this PCR for routine identification of A. pleuropneumoniae.
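Choosing primers from the conserved 5′ and 3′ termini of an alignment can be illustrated by scoring each alignment column and measuring how far full conservation extends from each end. The Python sketch below is a simplified, hypothetical illustration with toy sequences; it is not the authors' procedure, and real primer design would also weigh melting temperature, GC content and specificity.

```python
from typing import List, Tuple


def column_identity(alignment: List[str]) -> List[float]:
    """For each column, the fraction of sequences sharing the most common character.

    Gap characters are treated like any other state.
    """
    if not alignment:
        raise ValueError("empty alignment")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for col in range(width):
        column = [seq[col] for seq in alignment]
        top = max(column.count(ch) for ch in set(column))
        scores.append(top / len(alignment))
    return scores


def conserved_termini(alignment: List[str]) -> Tuple[int, int]:
    """Lengths of the fully conserved blocks at the 5' and 3' ends."""
    scores = column_identity(alignment)
    five = 0
    while five < len(scores) and scores[five] == 1.0:
        five += 1
    three = 0
    # Stop before reaching columns already counted in the 5' block.
    while three < len(scores) - five and scores[-1 - three] == 1.0:
        three += 1
    return five, three


toy = ["ATGCCGTTAGC", "ATGCAGTCAGC", "ATGCTGTGAGC"]  # hypothetical sequences
print(conserved_termini(toy))  # (4, 3)
```

For these three toy sequences the first four and the last three columns are identical, so those blocks would be the candidate primer sites; the variable middle columns are what would separate the groups.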
13

Saha, Indrajit, Nimisha Ghosh, Ayan Pradhan, Nikhil Sharma, Debasree Maity, and Kaushik Mitra. "Whole genome analysis of more than 10 000 SARS-CoV-2 virus unveils global genetic diversity and target region of NSP6." Briefings in Bioinformatics 22, no. 2 (March 2021): 1106–21. http://dx.doi.org/10.1093/bib/bbab025.

Abstract:
Whole genome analysis of SARS-CoV-2 is important to identify its genetic diversity. Moreover, accurate detection of SARS-CoV-2 is required for its correct diagnosis. To address these needs, we first analysed 10 664 publicly available complete or near-complete SARS-CoV-2 genomes from 73 countries to find mutation points in the coding regions, as substitutions, deletions, insertions and single nucleotide polymorphisms (SNPs), both globally and country-wise. To this end, multiple sequence alignment is performed together with the reference sequence from NCBI. Once the alignment is done, a consensus sequence is built and used to analyse each genomic sequence to identify the unique mutation points as substitutions, deletions, insertions and SNPs globally, resulting in 7209, 11700, 119 and 53 such mutation points, respectively. Second, within these categories, unique mutations for individual countries are determined with respect to the other 72 countries. In the case of India, 385, 867, 1 and 11 unique substitutions, deletions, insertions and SNPs are present in 566 SARS-CoV-2 genomes, while 458, 1343, 8 and 52 mutation points in these categories are shared with other countries. Among mutations present in more than 10% of the virus population, the most frequent ones common to India and the rest of the world are L37F, P323L, F506L, S507G, D614G and Q57H in NSP6, RdRp, Exon, Spike and ORF3a, respectively. For India, the other most frequent mutation points are T1198K, A97V, T315N and P13L in NSP3, RdRp, Spike and ORF8, respectively. These mutations are further visualised in protein structures, and phylogenetic analysis has been done to show the diversity in virus genomes. Third, a web application is provided for searching mutation points globally and country-wise. Finally, we have identified a potential conserved target region in the coding region of ORF1ab, specifically in the NSP6 gene.
Subsequently, we have provided primers and probes based on that conserved region so that they can be used for detecting SARS-CoV-2. Contact: indrajit@nitttrkol.ac.in. Supplementary information: Supplementary data are available at http://www.nitttrkol.ac.in/indrajit/projects/COVID-Mutation-10K
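The consensus-then-compare procedure described in this abstract can be sketched for a small gapped alignment: build a majority-rule consensus column by column, then label each column where an aligned genome differs from the reference row as a substitution, deletion or insertion. This Python sketch is a simplification under stated assumptions, not the authors' code: it works in alignment coordinates, does not separate SNPs from other substitutions as the paper does, and breaks consensus ties by first occurrence.

```python
from collections import Counter
from typing import List, Tuple


def majority_consensus(alignment: List[str]) -> str:
    """Most frequent character per column; ties go to the first one encountered."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))


def classify_differences(reference: str, genome: str) -> List[Tuple[int, str]]:
    """Label aligned columns where `genome` differs from `reference`.

    Columns are 1-based alignment positions, not reference coordinates.
    """
    if len(reference) != len(genome):
        raise ValueError("rows must come from the same alignment")
    events = []
    for column, (ref_char, gen_char) in enumerate(zip(reference, genome), start=1):
        if ref_char == gen_char:
            continue
        if gen_char == "-":
            events.append((column, "deletion"))
        elif ref_char == "-":
            events.append((column, "insertion"))
        else:
            events.append((column, "substitution"))
    return events


rows = ["ACGT-A", "ACGTTA", "AC-TTA"]  # hypothetical; row 0 plays the reference
print(majority_consensus(rows))  # ACGTTA
print(classify_differences(rows[0], rows[2]))  # [(3, 'deletion'), (5, 'insertion')]
```

Reporting alignment columns keeps the sketch simple; mapping events back to reference coordinates would require skipping the reference row's gap columns.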
14

Lee, Taeyoung, Moon Young Kim, Jungmin Ha, and Suk-Ha Lee. "Detection of large sequence insertions by a hybrid approach that combine de novo assembly and resequencing of medium-coverage genome sequences." Genome 61, no. 10 (October 2018): 745–54. http://dx.doi.org/10.1139/gen-2018-0027.

Abstract:
Large sequence insertion (LSI) is one of the structural variations (SVs) that may cause phenotypic differences in plants. To identify the LSIs using medium-coverage sequencing data of four wild soybean (Glycine soja) genotypes, we designed a hybrid approach combining de novo assembly and read mapping. Total reads and reads with both ends unmapped were independently assembled into “ordinary contigs” and “orphan contigs”, respectively, and subjected to pairwise alignment and stringent filtering. This approach predicted 24 LSIs averaging 2682 bp in size, with no overlap with SVs detected by Pindel, BreakDancer, or ScanIndel, and they were validated by PCR. Compared with the soybean (Glycine max) reference genome, 20 LSIs were located outside genic regions. One of the four LSIs within a genic region, LSI05, is located in the coding DNA sequence region of a protein kinase superfamily gene (Glyma.08G123500). It caused delayed translation initiation and loss of 24 amino acids in the wild soybean genotype CW12. LSI05 was more frequently observed in 29 G. soja accessions than in 34 G. max accessions. Identified LSIs would be genomic resources harboring novel gene contents for studying SVs and improving crops. Moreover, our cost-efficient approach may be applicable to other plant species.
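Selecting "reads with both ends unmapped" corresponds directly to SAM FLAG bits: 0x1 (read paired), 0x4 (this segment unmapped) and 0x8 (next segment unmapped). The Python sketch below collects the names of such orphan pairs from SAM text lines; it is a hypothetical illustration of that selection step only, not the authors' pipeline, and it does not perform the subsequent assembly. With samtools, `samtools view -f 12` applies a similar filter by keeping records that have both the 0x4 and 0x8 bits set.

```python
from typing import Iterable, Set

# FLAG bits defined in the SAM format specification.
PAIRED = 0x1         # template has multiple segments (paired-end read)
UNMAPPED = 0x4       # this segment is unmapped
MATE_UNMAPPED = 0x8  # the next segment in the template is unmapped


def orphan_pair_names(sam_lines: Iterable[str]) -> Set[str]:
    """Names of paired reads for which neither mate aligned to the reference."""
    both_unmapped = UNMAPPED | MATE_UNMAPPED
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if (flag & PAIRED) and (flag & both_unmapped) == both_unmapped:
            names.add(qname)
    return names


# Hypothetical SAM records.
sam = [
    "@HD\tVN:1.6",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",    # 77 = 0x1|0x4|0x8|0x40
    "r1\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",   # 141 = 0x1|0x4|0x8|0x80
    "r2\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",  # properly mapped pair
    "r3\t73\tchr1\t100\t60\t4M\t=\t100\t0\tACGT\tIIII",    # mapped, mate unmapped
]
print(orphan_pair_names(sam))  # {'r1'}
```

Pairs like r3, where one mate maps, are excluded; only templates with no anchoring alignment are passed on as candidate orphan sequence.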
15

Tasma, I. Made, Dani Satyawan, Habib Rijzaani, Ida Rosdianti, Puji Lestari, and Rubiyo Rubiyo. "GENOMIC VARIATION OF FIVE INDONESIAN CACAO (Theobroma cacao L.) VARIETIES BASED ON ANALYSIS USING NEXT GENERATION SEQUENCING." Indonesian Journal of Agricultural Science 17, no. 2 (May 9, 2017): 57. http://dx.doi.org/10.21082/ijas.v17n2.2016.p57-64.

Abstract:
Indonesian cacao productivity is still low, mainly due to the lack of availability of superior cacao planting materials. A new breeding method is necessary to expedite cacao yield improvement programs. To date, no study has characterized Indonesian cacao varieties at the whole-genome level. The objective of this study was to characterize the genomic variation of five superior Indonesian cacao varieties using next-generation sequencing. The genetic materials used were five Indonesian cacao varieties: ICCRI2, ICCRI3, ICCRI4, SUL2 and ICS13. Genome sequences were mapped to the cacao reference genome sequence of the Criollo variety. Sequence alignment and genomic variation discovery were performed using Bowtie2 and the mpileup command of Samtools, respectively. A total of 2,326,088 single nucleotide polymorphisms (SNPs) and 362,081 insertions and deletions (indels) were obtained. On average, one DNA variant was identified every 121 nucleotides of the genome sequence. Most of the DNA variants were located outside genes. Only 347,907 SNPs and indels (13.18%) were located within protein-coding regions (exons). Among the DNA variants within exons, 188,949 SNPs caused missense mutations and 1,535 SNPs induced nonsense mutations. Unique gene-based SNPs that can be used as fingerprints for particular cacao varieties were also discovered. The DNA variants obtained are excellent DNA marker resources to support cacao breeding programs. The SNPs discovered are useful for developing genome-wide SNP chips for gene and QTL tagging of important traits, to expedite the national cacao breeding program.
16

Ho, Eric S. K., Howard C. H. Chow, Chris T. L. Chan, Ruibang Luo, Henry C. M. Leung, Siu Ming Yiu, Francis Y. L. Chin, Yok Lam Kwong, and Anskar Y. H. Leung. "Whole Genome Sequencing On Donor Cell Leukemia in a Patient with Multiple Myeloma Identified Gene Mutations That May Provide Insights to Leukemogenesis." Blood 120, no. 21 (November 16, 2012): 2414. http://dx.doi.org/10.1182/blood.v120.21.2414.2414.

Abstract:
Donor cell leukemia (DCL) is a rare occurrence and refers to leukemia of donor origin in patients who have received allogeneic hematopoietic stem cell transplantation (HSCT). We have previously described a male patient with IgG-κ myeloma who received non-myeloablative allogeneic HSCT from an HLA-matched brother and developed complex-karyotype acute myeloid leukemia (AML) of donor origin 10 years after transplantation. He achieved complete remission (CR) with standard induction and consolidation chemotherapy but relapsed one year later. We hypothesized that a comparison of the donor HSCs before transplantation (pre-leukemic) and the subsequent AML at the whole-genome level would provide a unique dataset that may shed light on the pathogenesis of leukemia. DNA was extracted from an aliquot of donor mobilized peripheral blood mononuclear cells (mPBMNC) frozen before transplantation, as well as from unfractionated cells and CD34+ myeloblasts of the patient's bone marrow at diagnosis and at subsequent relapse of AML. The complete donor origin of the AML was confirmed by PCR of polymorphic STRs. Whole-genome sequencing (WGS) was performed, generating paired-end reads on an Illumina HiSeq 2000. Reads were aligned to the human reference genome (hg19, NCBI37) with SOAP3 and analysed to detect single nucleotide variants (SNVs), small insertions and deletions (indels) and copy number variations (CNVs). Selected genes remaining after filtering were independently validated by Sanger sequencing. A total of 835M and 810M 100 bp paired-end reads with an insert size of 500 bp were generated from donor mPBMNC and CD34+ myeloblasts of the relapsed DCL, with respective mean depths of 43.2X and 42.6X after alignment. Digital karyotyping based on read depth was consistent with conventional cytogenetic study. A total of 3,979,582 SNVs and 1,020,717 indels were detected in both samples.
Based on the Catalogue of Somatic Mutations in Cancer (COSMIC), and excluding Asian-specific wild-type variants annotated in the 1000 Genomes Project, 11 SNVs and 15 indels were identified within the coding sequences of genes with potential roles as tumor suppressors or oncogenes. In addition, 128,752 SNVs and 56,330 indels were detected exclusively in DCL. Putative non-pathogenic SNPs and changes located outside gene regions were filtered out. Within gene regions, intronic SNVs and synonymous mutations were also filtered out. A total of 142 non-synonymous SNVs (139 missense and 3 nonsense mutations) were identified, of which 25 were considered statistically high-confidence and 17 could be confirmed by Sanger sequencing. Twelve of these were also identified in the whole bone marrow (BM) sample of DCL at diagnosis. These candidates include a transcription factor (SALL1), metabolic enzymes (UGT1A5, SPEG), membrane proteins (TMC6, SCN3A), a cytoskeletal protein (MYH10), a ribonucleoprotein (RAVER1), a secreted protein (WNT7A), a protein involved in DNA damage repair (APLF) and others (PRPF8, ZNF518B and MKRN3). Twenty-six indels were identified in the coding region, of which 5 were considered statistically high-confidence; however, only one indel could be confirmed by Sanger sequencing in the relapse sample, and it was not present in the diagnostic sample. WGS of paired pre-leukemic (donor HSC) and leukemic (DCL) human samples has given us a unique opportunity to dissect the genetic changes in HSCs that may contribute to the initiation of AML with complex karyotype. The potential impact of the bone marrow microenvironment in this patient with myeloma in inducing DCL is also being evaluated. Disclosures: No relevant conflicts of interest to declare.
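The filtering described above hinges on sorting coding SNVs into synonymous, missense and nonsense changes. A minimal Python sketch of that classification follows, assuming a coding sequence (CDS) string that is in frame from its first base, a 0-based position and the standard genetic code; `classify_snv` is a hypothetical helper for illustration, not part of SOAP3 or any tool used in the study.

```python
# Sketch: classify a single-nucleotide substitution inside a coding sequence (CDS).
# Assumptions: the CDS string is in frame from its first base, positions are 0-based,
# and the standard genetic code (NCBI translation table 1) applies.

BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Return 'synonymous', 'missense', 'nonsense' or 'stop_loss' (hypothetical helper)."""
    cds, alt = cds.upper(), alt.upper()
    if cds[pos] == alt:
        raise ValueError("alternate allele equals the reference base")
    start = pos - pos % 3                     # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position lies in an incomplete trailing codon")
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"                     # premature stop codon
    if ref_aa == "*":
        return "stop_loss"                    # stop codon replaced by an amino acid
    return "missense"
```

For example, `classify_snv("ATGGCTTGGTAA", 8, "A")` turns TGG (Trp) into TGA (stop) and returns `"nonsense"`. Real pipelines also handle ambiguous bases, splice sites and multiple transcripts, which this sketch ignores.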
17

Tessier, Laurence, Olivier Côté, and Dorothee Bienzle. "Sequence variant analysis of RNA sequences in severe equine asthma." PeerJ 6 (October 11, 2018): e5759. http://dx.doi.org/10.7717/peerj.5759.

Abstract:
Background Severe equine asthma is a chronic inflammatory disease of the lung in horses similar to low-Th2 late-onset asthma in humans. This study aimed to determine the utility of RNA-Seq to call gene sequence variants, and to identify sequence variants of potential relevance to the pathogenesis of asthma. Methods RNA-Seq data were generated from endobronchial biopsies collected from six asthmatic and seven non-asthmatic horses before and after challenge (26 samples total). Sequences were aligned to the equine genome with Spliced Transcripts Alignment to Reference software. Read preparation for sequence variant calling was performed with Picard tools and Genome Analysis Toolkit (GATK). Sequence variants were called and filtered using GATK and Ensembl Variant Effect Predictor (VEP) tools, and two RNA-Seq predicted sequence variants were investigated with both PCR and Sanger sequencing. Supplementary analysis of novel sequence variant selection with VEP was based on a score of <0.01 predicted with Sorting Intolerant from Tolerant software, missense nature, location within the protein coding sequence and presence in all asthmatic individuals. For select variants, effect on protein function was assessed with Polymorphism Phenotyping 2 and screening for non-acceptable polymorphism 2 software. Sequences were aligned and 3D protein structures predicted with Geneious software. Difference in allele frequency between the groups was assessed using a Pearson’s Chi-squared test with Yates’ continuity correction, and difference in genotype frequency was calculated using the Fisher’s exact test for count data. Results RNA-Seq variant calling and filtering correctly identified substitution variants in PACRG and RTTN. Sanger sequencing confirmed that the PACRG substitution was appropriately identified in all 26 samples while the RTTN substitution was identified correctly in 24 of 26 samples. 
These variants of uncertain significance had substitutions predicted to result in loss of function and to be non-neutral. The amino acid substitutions were projected to cause no change in hydrophobicity or isoelectric point in PACRG, and a change in both for RTTN. For PACRG, no difference in allele frequency between the two groups was detected, but a higher proportion of asthmatic horses than non-asthmatic animals carried the altered RTTN allele. Discussion RNA-Seq was sensitive and specific for calling gene sequence variants in this disease model. Even moderate coverage (<10–20 counts per million) yielded correct identification in 92% of samples, suggesting RNA-Seq may be suitable for detecting sequence variants in low-coverage samples. The impact of the amino acid alterations in PACRG and RTTN proteins, and the possible association of these sequence variants with asthma, are of uncertain significance, but their role in ciliary function may be of future interest.
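The allele-frequency comparison above used Pearson's chi-squared test with Yates' continuity correction. For a 2×2 table of allele counts the statistic has a closed form, sketched below in Python; the function name and the table layout (rows as groups, columns as alleles) are assumptions for illustration, not the authors' code.

```python
import math


def yates_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Chi-squared test with Yates' continuity correction for the table [[a, b], [c, d]].

    Rows might be groups (e.g. asthmatic vs non-asthmatic) and columns alleles;
    returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    # Closed form: n * (|ad - bc| - n/2)^2 / (row1 * row2 * col1 * col2);
    # the correction is capped so it never pushes |ad - bc| below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    statistic = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

With few observations per cell, Fisher's exact test (which the study used for genotype frequencies) is usually the safer choice.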
18

Ashari, Khalidah Syahirah, Najwa Syahirah Roslan, Abdul Rahman Omar, Mohd Hair Bejo, Aini Ideris, and Nurulfiza Mat Isa. "Genome sequencing and analysis of Salmonella enterica subsp. enterica serovar Stanley UPM 517: Insights on its virulence-associated elements and their potentials as vaccine candidates." PeerJ 7 (June 28, 2019): e6948. http://dx.doi.org/10.7717/peerj.6948.

Abstract:
Salmonella enterica subsp. enterica serovar Stanley (S. Stanley) is a pathogen that contaminates food and is associated with Salmonella outbreaks in a variety of hosts, such as humans and farm animals, through products like dairy items and vegetables. Although several vaccines based on Salmonella strains have been constructed, none has been developed for serovar Stanley to date. This study presents the results of genome sequencing and analysis of our S. Stanley UPM 517 strain, isolated from fecal swabs of 21-day-old healthy commercial chickens in Perak, Malaysia, using Salmonella enterica subsp. enterica serovar Typhimurium LT2 (S. Typhimurium LT2) as a reference for comparison. First, the Salmonella Stanley UPM 517 genome was sequenced and assembled into contigs. This was followed by scaffolding and gap filling. The draft genome was annotated and aligned with S. Typhimurium LT2. Other virulence elements estimated in this study included Salmonella pathogenicity islands, resistance genes, prophages, virulence factors, plasmid regions, restriction-modification sites and the CRISPR-Cas system. The S. Stanley UPM 517 draft genome had a length of 4,736,817 bp, with 4,730 coding sequences and 58 RNAs. Genomic analysis of this strain revealed antimicrobial resistance properties toward a wide variety of antibiotics. Tcf and ste, two fimbrial virulence clusters involved in the intestinal colonization of humans and broilers, respectively, which are not found in S. Typhimurium LT2, were atypically discovered in the S. Stanley UPM 517 genome. There were seven Salmonella pathogenicity islands (SPIs) within the draft genome, which contained the virulence factors associated with Salmonella infection (except SPI-14).
Five intact prophage regions, mostly comprising the protein-encoding Gifsy-1, Fels-1, RE-2010 and SEN34 prophages, were also encoded in the draft genome. Type I–III restriction-modification sites and a CRISPR-Cas system of the Type I-E subtype were also identified. As this strain exhibited resistance toward numerous antibiotics, we identified several genes that could potentially be removed when constructing a vaccine candidate to restrain and lessen the prevalence of salmonellosis and to serve as an alternative to antibiotics.
19

Soverini, Simona, Angela Poerio, Alberto Ferrarini, Ilaria Iacobucci, Marco Sazzini, Joannah Score, Enrico Giacomelli, et al. "Whole-Transcriptome Sequencing In Chronic Myeloid Leukemia Reveals Novel Gene Mutations That May Be Associated with Disease Pathogenesis and Progression." Blood 116, no. 21 (November 19, 2010): 885. http://dx.doi.org/10.1182/blood.v116.21.885.885.

Abstract:
Philadelphia-positive (Ph+) chronic myeloid leukemia (CML) has always been regarded as a genetically homogeneous disease. However, the fact that a proportion of patients (pts), especially in the high Sokal risk setting, fail tyrosine kinase inhibitor therapy and progress to blast crisis (BC) suggests that a certain degree of heterogeneity exists. It can be hypothesized that genetic factors in addition to the Philadelphia chromosome may be present in these pts. To address this issue, we are currently using massively parallel sequencing to perform a qualitative and quantitative survey of the whole transcriptome of Ph+ CML cells at diagnosis and at progression to BC. Results are being integrated with a genome-wide search for copy number alterations using Affymetrix SNP 6.0 arrays. We used a Solexa Illumina Genome Analyzer to scan the transcriptome of a CML patient at diagnosis, at remission (major molecular response) and at progression from chronic phase (CP) to lymphoid BC. Both custom scripts and published algorithms were used for read alignment against the human reference genome, single nucleotide variant (SNV) calling, identification of alternative splicing events and fusion transcripts, and digital gene expression profiling. Comparing the SNVs identified in the diagnosis and relapse samples with those detected in the remission sample (representing inherited sequence variants not specific to the Ph+ clone) identified eight missense mutations at diagnosis affecting the coding sequences of AMPD3 (encoding adenosine monophosphate deaminase 3), SUCNR1 (succinate receptor 1), FANCD2 (Fanconi anemia, complementation group D2), INCENP (inner centromere protein), BSPRY (B-box and SPRY domain containing), HEXDC (hexosaminidase containing), NUDT9 (ADP-ribose diphosphatase) and KIAA2018 (encoding a protein with predicted DNA binding and transcriptional regulation activity) genes.
Six of these mutations (FANCD2, INCENP, BSPRY, HEXDC, NUDT9) were also detected in the Ph+ clone that re-emerged at disease progression, together with seven additional missense mutations affecting the coding sequences of IDH2 (isocitrate dehydrogenase isoform 2), DECR1 (2,4-dienoyl CoA reductase 1), C4Orf14 (mitochondrial nitric oxide synthase), MRM1 (mitochondrial rRNA methyltransferase 1), PRKD2 (protein kinase D2), TCHP (mitostatin) and ABL1 genes. Digital gene expression analysis showed downregulation of SUCNR1, which might be a consequence of the P292A mutation we detected. IDH2, MRM1, AMPD3 and KIAA2018 mutations were found in additional pts. The IDH2 R140Q mutation was detected in 3/75 (4%) myeloid BC, 1/31 (3.2%) lymphoid BC, 0/34 Ph+ ALL and 0/23 Philadelphia-negative (Ph-) ALL pts. The MRM1 C120S mutation was found in 6/70 (9%) additional BC pts (2 lymphoid and 4 myeloid). AMPD3 and KIAA2018 were found to harbour the same point mutations (N334S and S1818G, respectively) in 1 out of 20 additional CP patients analyzed. Massively parallel sequencing of the sample collected at diagnosis also revealed that the Bcr-Abl kinase domain was already harbouring point mutations at low levels (E308D, A344G, R386S), but not the T315I mutation that was selected at disease progression. Point mutations in untranslated regions where miRNAs are known to bind were also detected and are currently under validation. Digital gene expression profiling comparing progression to diagnosis showed significant expression changes, including upregulation of 134 genes and downregulation of 88 genes. In particular, we observed upregulation of the B-cell developmental factor PAX5, its interactor Lef-1 and its targets IRF4, BLNK, Bik, EBF1, CD79A, CD79B, CD19, VpreB1, VpreB3, BOB1, RAG1 and RAG2; upregulation of PAX9; and upregulation of WNT3A, WNT9A and GLI3 with downregulation of SFRP1, resulting in aberrant activation of the Wnt signalling pathway.
In summary, our preliminary data highlight putative key genes whose deregulation may be recurrent in a subset of CML patients and may be linked to disease pathogenesis or progression. Their actual role in CML is currently being explored. Massively parallel sequencing of additional patients is ongoing. Supported by European LeukemiaNet, AIL, AIRC, Fondazione Del Monte di Bologna e Ravenna, FIRB 2006, PRIN 2008, Ateneo RFO grants. Disclosures: Baccarani: NOVARTIS: Honoraria; BRISTOL MYERS SQUIBB: Honoraria. Martinelli: Novartis: Consultancy, Honoraria; BMS: Consultancy, Honoraria; Pfizer: Consultancy.
20

Sato, Kazuhiro, Martin Mascher, Axel Himmelbach, Georg Haberer, Manuel Spannagl, and Nils Stein. "Chromosome-scale assembly of wild barley accession “OUH602”." G3 Genes|Genomes|Genetics, July 13, 2021. http://dx.doi.org/10.1093/g3journal/jkab244.

Abstract:
Barley (Hordeum vulgare) was domesticated from its wild ancestral form ca. 10,000 years ago in the Fertile Crescent and is widely cultivated throughout the world, except in tropical areas. The genome size of both cultivated barley and its conspecific wild ancestor is approximately 5 Gb. High-quality chromosome-level assemblies of 19 cultivated and one wild barley genotype were recently established by pan-genome analysis. Here, we release another equivalent short-read assembly of the wild barley accession “OUH602.” A series of genetic and genomic resources were developed for this genotype in prior studies. Our assembly contains more than 4.4 Gb of sequence, with a scaffold N50 value of over 10 Mb. The haplotype shows high collinearity with the most recently updated barley reference genome, “Morex” V3, with some inversions. Gene projections based on “Morex” gene models revealed 46,807 protein-coding sequences and 43,375 protein-coding genes. Alignments to publicly available sequences of bacterial artificial chromosome (BAC) clones of “OUH602” confirm the high accuracy of the assembly. Because more loci of interest have been identified in “OUH602,” the release of this assembly, with detailed genomic information, should accelerate gene identification and the utilization of this key wild barley accession.
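The scaffold N50 quoted above (and the contig and scaffold N50 values in the potato assembly below) is the length L such that sequences of length L or longer contain at least half of all assembled bases. A minimal Python sketch, assuming a plain list of sequence lengths; the function name `n50` is our own.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    covered = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    raise AssertionError("unreachable for a non-empty list")
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, because 10 + 9 + 8 = 27 is half of the 54 total bases.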
21

Wang, Chao, Ola Wallerman, Maja-Louise Arendt, Elisabeth Sundström, Åsa Karlsson, Jessika Nordin, Suvi Mäkeläinen, et al. "A novel canine reference genome resolves genomic architecture and uncovers transcript complexity." Communications Biology 4, no. 1 (February 10, 2021). http://dx.doi.org/10.1038/s42003-021-01698-x.

Abstract:
We present GSD_1.0, a high-quality domestic dog reference genome with chromosome-length scaffolds and contiguity increased 55-fold over CanFam3.1. Annotation with newly generated and existing long- and short-read RNA-seq, miRNA-seq and ATAC-seq data revealed that 32.1% of lifted-over CanFam3.1 gaps harboured previously hidden functional elements in GSD_1.0, including promoters, genes and miRNAs. A catalogue of canine “dark” regions was made to facilitate mapping rescue. Alignment in these regions is difficult, but we demonstrate that they harbour trait-associated variation. Key genomic regions were completed, including the Dog Leucocyte Antigen (DLA), T Cell Receptor (TCR) and 366 COSMIC cancer genes. 10x linked-read sequencing of 27 dogs (19 breeds) uncovered 22.1 million SNPs, indels and larger structural variants. Subsequent intersection with protein-coding genes showed that 1.4% of these could directly influence gene products and so provide a source of normal or aberrant phenotypic modifications.
22

Linh, Nguyen Thuy, Luu Han Ly, Nguyen Thuy Duong, and Huynh Thi Thu Hue. "Isolation and characterization of a c-repeat binding factor gene from Tevang-1 maize cultivar." TAP CHI SINH HOC 41, no. 3 (July 26, 2019). http://dx.doi.org/10.15625/0866-7160/v41n3.13782.

Abstract:
C-repeat binding factor (CBF) proteins are transcription factors involved in plant responses to abiotic stresses, especially low-temperature conditions. In this research, a CBF3-coding gene was isolated from a cold-acclimated maize variety, Zea mays var. Tevang-1, and denoted ZmCBF3tv. The isolated gene shared 96.49% homology with the B73 reference gene and had no intron in the coding sequence. Using bioinformatic tools, a number of variations in the nucleotide and amino acid sequences were identified. An alignment between ZmCBF3tv and other CBF/DREB1 proteins from various species revealed functional regions and typical features, such as the nuclear localization signal (NLS), the AP2 DNA-binding domain, and acidic amino-acid-rich segments. Additionally, a phylogenetic analysis based on the AP2 domain showed that the maize CBF3 transcription factor was most similar to that of rice and closely related to other DREB1/CBF proteins of monocots. The ZmCBF3tv product is suggested to function as a CBF/DREB1 transcription factor.
23

Linh, Nguyen Thuy, Luu Han Ly, Nguyen Thuy Duong, and Huynh Thi Thu Hue. "Isolation and characterization of a c-repeat binding factor gene from Tevang-1 maize cultivar." ACADEMIA JOURNAL OF BIOLOGY 41, no. 3 (July 26, 2019). http://dx.doi.org/10.15625/2615-9023/v41n3.13782.

Abstract:
C-repeat binding factor (CBF) proteins are transcription factors involved in plant responses to abiotic stresses, especially low-temperature conditions. In this research, a CBF3-coding gene was isolated from a cold-acclimated maize variety, Zea mays var. Tevang-1, and denoted ZmCBF3tv. The isolated gene shared 96.49% homology with the B73 reference gene and had no intron in the coding sequence. Using bioinformatic tools, a number of variations in the nucleotide and amino acid sequences were identified. An alignment between ZmCBF3tv and other CBF/DREB1 proteins from various species revealed functional regions and typical features, such as the nuclear localization signal (NLS), the AP2 DNA-binding domain, and acidic amino-acid-rich segments. Additionally, a phylogenetic analysis based on the AP2 domain showed that the maize CBF3 transcription factor was most similar to that of rice and closely related to other DREB1/CBF proteins of monocots. The ZmCBF3tv product is suggested to function as a CBF/DREB1 transcription factor.
24

Kuo, Richard I., Yuanyuan Cheng, Runxuan Zhang, John W. S. Brown, Jacqueline Smith, Alan L. Archibald, and David W. Burt. "Illuminating the dark side of the human transcriptome with long read transcript sequencing." BMC Genomics 21, no. 1 (October 30, 2020). http://dx.doi.org/10.1186/s12864-020-07123-7.

Abstract:
Background The human transcriptome annotation is regarded as one of the most complete of any eukaryotic species. However, limitations in sequencing technologies have biased the annotation toward multi-exonic protein-coding genes. Accurate high-throughput long-read transcript sequencing can now provide additional evidence for rare transcripts and genes, such as mono-exonic and non-coding genes, that were previously either undetectable or impossible to differentiate from sequencing noise. Results We developed the Transcriptome Annotation by Modular Algorithms (TAMA) software to leverage the power of long-read transcript sequencing and address the issues with current data processing pipelines. TAMA achieved high sensitivity and precision for gene and transcript model predictions in both reference-guided and unguided approaches in our benchmark tests using simulated Pacific Biosciences (PacBio) and Nanopore sequencing data and real PacBio datasets. By analyzing PacBio Sequel II Iso-Seq sequencing data of the Universal Human Reference RNA (UHRR) using TAMA and other commonly used tools, we found that the convention of using alignment identity to measure error correction performance does not reflect the actual gain in accuracy of predicted transcript models. In addition, inter-read error correction can cause major changes to read mapping, potentially resulting in over 6,000 erroneous gene model predictions in the Iso-Seq-based human genome annotation. Using TAMA’s genome-assembly-based error correction and gene feature evidence, we predicted 2566 putative novel non-coding genes and 1557 putative novel protein-coding gene models. Conclusions Long-read transcript sequencing data has the power to identify novel genes within the highly annotated human genome. Parameter tuning and the extensive output information of the TAMA software package allow in-depth exploration of eukaryotic transcriptomes.
We found long-read-based evidence for thousands of unannotated genes within the human genome. Further development in sequencing library preparation and data processing is required to differentiate sequencing noise from real genes in long-read RNA sequencing data.
25

Chan, Abigail Hui En, Kittipong Chaisiri, Sompob Saralamba, Serge Morand, and Urusa Thaenkham. "Assessing the suitability of mitochondrial and nuclear DNA genetic markers for molecular systematics and species identification of helminths." Parasites & Vectors 14, no. 1 (May 1, 2021). http://dx.doi.org/10.1186/s13071-021-04737-y.

Abstract:
Background Genetic markers are employed widely in molecular studies, and their utility depends on the degree of sequence variation, which dictates the type of application for which they are suited. Consequently, assessing the suitability of a genetic marker for a specific application is complicated by its properties and its usage across studies. To provide a yardstick for future users, in this study we assess the suitability of genetic markers for molecular systematics and species identification in helminths and provide an estimate of the cut-off genetic distances per taxonomic level. Methods We assessed four classes of genetic markers, namely nuclear ribosomal internal transcribed spacers, nuclear rRNA, mitochondrial rRNA and mitochondrial protein-coding genes, based on properties that are important for species identification and molecular systematics. For molecular identification, these properties are inter-species sequence variation, length of reference sequences, ease of sequence alignment and ease of designing universal primers. For molecular systematics, the properties are average genetic distance from order/suborder to species level, the number of monophyletic clades at the order/suborder level, length of reference sequences, ease of sequence alignment, ease of designing universal primers and absence of nucleotide substitution saturation. Cut-off genetic distances were estimated using the ‘K-means’ clustering algorithm. Results The nuclear rRNA genes exhibited the lowest sequence variation, whereas the mitochondrial genes exhibited relatively higher variation across the three groups of helminths. Also, the nuclear and mitochondrial rRNA genes were the best genetic markers for helminth molecular systematics, whereas the mitochondrial protein-coding and rRNA genes were suitable for molecular identification.
Evidence from the wide range of genetic distances among nematodes also revealed that a general gauge of genetic distances might not be adequate. Conclusion This study assessed the suitability of DNA genetic markers for application in molecular systematics and molecular identification of helminths. We provide a novel way of analyzing genetic distances to generate suitable cut-off values for each taxonomic level using the ‘K-means’ clustering algorithm. The estimated cut-off genetic distance values, together with the summary of the utility and limitations of each class of genetic markers, provide useful information for researchers conducting molecular studies on helminths.
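The cut-off genetic distances above were estimated with K-means clustering. The Python sketch below runs plain Lloyd's k-means on one-dimensional distance values; the deterministic initialization and the rule of placing each cut-off midway between adjacent cluster centers are our assumptions for illustration, since the abstract does not state how cut-offs were derived from the clusters.

```python
def kmeans_1d(values: list[float], k: int, iterations: int = 100) -> list[float]:
    """Lloyd's k-means on scalar values (e.g. pairwise genetic distances); returns sorted centers."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    data = sorted(values)
    # Assumed deterministic initialization: pick k evenly spaced order statistics.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters: list[list[float]] = [[] for _ in range(k)]
        for v in data:
            # Assign each value to its nearest current center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Recompute centers as cluster means; empty clusters keep their previous center.
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def midpoint_cutoffs(centers: list[float]) -> list[float]:
    """Hypothetical rule: one cut-off halfway between each pair of adjacent centers."""
    return [(lo + hi) / 2 for lo, hi in zip(centers, centers[1:])]
```

On toy distances `[0.01, 0.02, 0.03, 0.20, 0.21, 0.22]` with `k=2`, the centers settle near 0.02 and 0.21, giving a single cut-off near 0.115.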
26

Pham, Gina M., John P. Hamilton, Joshua C. Wood, Joseph T. Burke, Hainan Zhao, Brieanne Vaillancourt, Shujun Ou, Jiming Jiang, and C. Robin Buell. "Construction of a chromosome-scale long-read reference genome assembly for potato." GigaScience 9, no. 9 (September 2020). http://dx.doi.org/10.1093/gigascience/giaa100.

Abstract:
Background Worldwide, the cultivated potato, Solanum tuberosum L., is the No. 1 vegetable crop and a critical food security crop. The genome sequence of DM1–3 516 R44, a doubled monoploid clone of S. tuberosum Group Phureja, was published in 2011 using a whole-genome shotgun sequencing approach with short-read sequence data. Current advanced sequencing technologies now permit generation of near-complete, high-quality chromosome-scale genome assemblies at minimal cost. Findings Here, we present an updated version of the DM1–3 516 R44 genome sequence (v6.1) using Oxford Nanopore Technologies long reads coupled with proximity-by-ligation scaffolding (Hi-C), yielding a chromosome-scale assembly. The new (v6.1) assembly represents 741.6 Mb of sequence (87.8%) of the estimated 844 Mb genome, of which 741.5 Mb is non-gapped with 731.2 Mb anchored to the 12 chromosomes. Use of Oxford Nanopore Technologies full-length complementary DNA sequencing enabled annotation of 32,917 high-confidence protein-coding genes encoding 44,851 gene models that had a significantly improved representation of conserved orthologs compared with the previous annotation. The new assembly has improved contiguity with a 595-fold increase in N50 contig size, 99% reduction in the number of contigs, a 44-fold increase in N50 scaffold size, and an LTR Assembly Index score of 13.56, placing it in the category of reference genome quality. The improved assembly also permitted annotation of the centromeres via alignment to sequencing reads derived from CENH3 nucleosomes. Conclusions Access to advanced sequencing technologies and improved software permitted generation of a high-quality, long-read, chromosome-scale assembly and improved annotation dataset for the reference genotype of potato that will facilitate research aimed at improving agronomic traits and understanding genome evolution.
27

Pelia, Ranjit, Suresh Venkateswaran, Jason D. Matthews, Yael Haberman, David J. Cutler, Jeffrey S. Hyams, Lee A. Denson, and Subra Kugathasan. "Profiling non-coding RNA levels with clinical classifiers in pediatric Crohn’s disease." BMC Medical Genomics 14, no. 1 (July 29, 2021). http://dx.doi.org/10.1186/s12920-021-01041-7.

Abstract:
Background Crohn’s disease (CD) is a heritable chronic inflammatory disorder. Non-coding RNAs (ncRNAs) play an important role in epigenetic regulation by affecting gene expression, but can also directly affect protein function, thus having a substantial impact on biological processes. We investigated whether ncRNAs at diagnosis are dysregulated in CD at different disease locations and with different future disease behaviors, to determine whether ncRNA signatures can serve as an index of outcomes. Methods Using subjects from the RISK cohort, we analyzed ncRNA from the ileal biopsies of 345 CD patients and 71 non-IBD controls, and ncRNA from the rectal biopsies of 329 CD patients and 61 non-IBD controls. Sequence alignment was performed with the STAR package using human genome version 38 (hg38) as the reference. Differential expression (DE) analysis was performed with the edgeR package, and DE ncRNAs were identified with thresholds of fold change (FC) > 2 and FDR < 0.05 after multiple-testing correction. Results In total, we identified 130 CD-specific DE ncRNAs (89 in ileum and 41 in rectum) compared with non-IBD controls. Similarly, 35 DE ncRNAs were identified between B1 and B2 in ileum, whereas no differences among CD disease behaviors were observed in rectum. We also found inflammation-specific ncRNAs between inflamed and non-inflamed groups in ileal biopsies. Overall, we observed that expression of mir1244-2, mir1244-3, mir1244-4 and RN7SL2 was increased in CD, regardless of disease behavior, location or inflammatory status. Lastly, we tested baseline ncRNA expression as a potential tool to predict disease status, disease behavior and disease inflammation at 3-year follow-up. Conclusions We have identified ncRNAs that are specific to disease location, disease behavior and disease inflammation in CD. Both ileal and rectal ncRNAs change over the course of CD, specifically during disease progression in the intestinal mucosa.
Collectively, our findings show changes in ncRNA during CD and may have clinical utility in the early identification and characterization of disease progression.
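The differential expression criteria above (FC > 2, FDR < 0.05) can be sketched as a filter over per-feature statistics. edgeR itself is an R package; the Python below only illustrates the thresholding step, assuming edgeR-style log2 fold changes and raw p-values, with Benjamini-Hochberg adjustment standing in for the multiple-test correction. Reading "FC > 2" as a two-fold change in either direction is our assumption, and both function names are hypothetical.

```python
import math


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_de(features: list[str], log2_fc: list[float], p_values: list[float],
              fc_cutoff: float = 2.0, fdr_cutoff: float = 0.05) -> list[str]:
    """Keep features with a > fc_cutoff-fold change (either direction) and FDR < fdr_cutoff."""
    fdr = benjamini_hochberg(p_values)
    min_lfc = math.log2(fc_cutoff)   # a two-fold change corresponds to |log2 FC| > 1
    return [f for f, lfc, q in zip(features, log2_fc, fdr)
            if abs(lfc) > min_lfc and q < fdr_cutoff]
```

In practice edgeR's `topTags` already reports BH-adjusted FDR values, so only the fold-change and FDR thresholds would need applying.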
28

Amador, María de Lourdes Moreno, Julio Masaru Iehisa, Carolina Sousa Martín, and Francisco Barro Losada. "Functional genomic characterization of immunogenic gluten proteins from oat cultivars that differ in toxicity for celiac disease." Proceedings of the Nutrition Society 79, OCE2 (2020). http://dx.doi.org/10.1017/s0029665120004991.

Abstract:
Introduction: Human consumption of oats has increased due to their nutritional value and health benefits. Oat is a rich source of protein and contains high levels of minerals, lipids and β-glucan, a mixed-linkage polysaccharide that forms an important part of oat dietary fiber, as well as various other phytoconstituents such as flavonoids and sterols. Pharmacological activities reported for oats include antioxidant, anti-inflammatory, antidiabetic and anticholesterolaemic effects. The safety of oats in a gluten-free diet has been debated for several years. Previous studies suggested that oats may induce an immunological response in celiac patients, and others confirmed that habitual oat consumption is not possible because of its toxicity. Our research group found oat cultivars with different immunotoxic potential against the G12 monoclonal antibody, which may explain the different clinical responses observed in patients with celiac disease. In this study, we characterized the transcriptomes of non-toxic and toxic varieties by massive sequencing. Materials and Methods: The transcriptomes of both oat varieties were sequenced on an Illumina HiSeq™ 2000. For assembly, criteria of overlap > 40% and similarity > 95% were used. Functional annotations were inferred by similarity to UniProt reference proteins. The minimum similarity threshold required for annotating a transcript was a BLAST e-value lower than 10⁻¹⁰. UniRef90 was used for the selection of annotated proteins. Results: We found 17 and 11 loci in the non-toxic and toxic varieties, respectively. We selected a set of 239,983 reference proteins downloaded from UniProt belonging to the BEP clade taxonomic node. Only proteins representative of UniRef90 clusters were used. Immunotoxic epitopes in the coding sequences were identified by alignment with the canonical T-cell-recognized epitopes, allowing one to three mismatches.
We identified a total of 24 epitopes with an average of 2 modifications in the genome of the toxic variety with respect to the non-toxic. The epitope variants DQ2.5-ave-1β and DQ2.5-glia-α3 were the most repeated.Discussion:The presence of epitopes in the toxic oat variety that are not present in non-toxic variety could be related with the immunotoxic potential found in our previous assays and also with the different clinical responses in celiacs consuming oats. Preliminary results suggest that a depth study based on searching epitopes found in toxic oat variety could help to the identification of real oat varieties available for celiac patients, and therefore, their incorporation in improvement programs to obtain commercial lines without toxicity.
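The epitope search described above aligns coding sequences with canonical T-cell epitopes while tolerating a few mismatches. A simple ungapped Hamming-distance scan can illustrate the idea. This is a minimal sketch under stated assumptions and is not the authors' pipeline. It assumes already-translated protein sequences and equal-length, gap-free comparisons. The names `scan_epitopes`, `EpitopeHit` and the toy peptides are hypothetical placeholders; they are not real oat proteins or DQ2.5 epitope sequences.

```python
from typing import NamedTuple


class EpitopeHit(NamedTuple):
    """A protein window that resembles a canonical epitope (hypothetical type)."""
    epitope_name: str
    start: int        # 0-based start of the window in the protein
    window: str       # the matching protein substring
    mismatches: int   # substitutions relative to the canonical epitope


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def scan_epitopes(protein: str, epitopes: dict[str, str],
                  max_mismatches: int = 3) -> list[EpitopeHit]:
    """Slide each canonical epitope along the protein without gaps and
    report every window within max_mismatches substitutions (exact
    matches, i.e. 0 mismatches, are included)."""
    hits: list[EpitopeHit] = []
    for name, canonical in epitopes.items():
        k = len(canonical)
        # If the epitope is longer than the protein, this range is empty.
        for start in range(len(protein) - k + 1):
            window = protein[start:start + k]
            d = hamming(window, canonical)
            if d <= max_mismatches:
                hits.append(EpitopeHit(name, start, window, d))
    return hits


if __name__ == "__main__":
    # Hypothetical placeholder sequences, NOT real oat proteins or DQ2.5 epitopes.
    toy_protein = "MKTPQPQLPYGGPQAQLAYSS"
    toy_epitopes = {"EPI-A": "PQPQLPY"}
    for hit in scan_epitopes(toy_protein, toy_epitopes):
        print(hit.epitope_name, hit.start, hit.window, hit.mismatches)
    # EPI-A 3 PQPQLPY 0
    # EPI-A 12 PQAQLAY 2
```

The scan compares residues position by position, so it does not model insertions or deletions. A gapped aligner would be needed if indels between variants and canonical epitopes matter.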
29

Hertzman, Rebecca J., Pooja Deshpande, Shay Leary, Yueran Li, Ramesh Ram, Abha Chopra, Don Cooper, et al. "Visual Genomics Analysis Studio as a Tool to Analyze Multiomic Data." Frontiers in Genetics 12 (June 17, 2021). http://dx.doi.org/10.3389/fgene.2021.642012.

Abstract:
Type B adverse drug reactions (ADRs) are iatrogenic immune-mediated syndromes with mechanistic etiologies that remain incompletely understood. Some of the most severe ADRs, including delayed drug hypersensitivity reactions, are T-cell mediated and restricted by specific human leukocyte antigen risk alleles and sometimes by public or oligoclonal T-cell receptors (TCRs), which are central to the immunopathogenesis of the tissue-damaging response. However, the specific cellular signatures of the effector, regulatory, and accessory immune populations that mediate disease, define reaction phenotype, and determine severity have not been defined. The recent development of single-cell platforms that bring together advances in genomics and immunology provides the tools to simultaneously examine the full transcriptome, TCRs, and surface protein markers of highly heterogeneous immune cell populations at the site of the pathological response at single-cell resolution. However, the need for advanced bioinformatics expertise and for computational hardware and software has often limited the ability of investigators who understand the diseases and biological models to exploit these new approaches. Here we describe the features and use of a state-of-the-art, fully integrated application for the analysis and visualization of multiomic single-cell data called Visual Genomics Analysis Studio (VGAS). This unique, user-friendly, Windows-based graphical user interface is specifically designed to enable investigators to interrogate their own data. While VGAS also includes tools for sequence alignment and for identifying associations with host or organism genetic polymorphisms, in this review we focus on its application to the analysis of single-cell TCR–RNA–Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE)-seq, enabling holistic cellular characterization through the unbiased transcriptome and a select surface proteome.
Critically, VGAS does not require user-directed coding or access to high-performance computers; instead, it incorporates performance-optimized hidden code to provide fast, intuitive, application-based tools for data analysis and for producing high-resolution, publication-ready graphics on standard-specification laptops. Specifically, it allows analysis of comprehensive single-cell TCR sequencing (scTCR-seq) data, detailing (i) functional pairings of α–β heterodimer TCRs, (ii) one-click histograms to display entropy and gene rearrangements, and (iii) Circos and Sankey plots to visualize clonality and dominance. For unbiased single-cell RNA sequencing (scRNA-seq) analyses, users extract cell transcriptome signatures according to global structure via principal component analysis, t-distributed stochastic neighbor embedding, or uniform manifold approximation and projection plots, with an overlay of scTCR-seq data enabling identification and selection of the immunodominant TCR-expressing populations. Further integration with similar sequence-based detection of surface protein markers using oligo-labeled antibodies (CITE-seq) provides a comparative understanding of surface protein expression, with differential gene or protein analyses visualized using volcano plot or heatmap functions. These data can be compared with reference cell atlases or suitable controls to reveal discrete disease-specific subsets, from epithelial to tissue-resident memory T-cells, and activation status, from senescence through exhaustion, with more detailed transcript expression displayed as violin and box plots. Importantly, guided tutorial videos are available, as are regular application updates based on the latest advances in bioinformatics and on user feedback.
30

Lücking, Robert, Miko Nadel, Elena Araujo, and Alice Gerlach. "Two decades of DNA barcoding in the genus Usnea (Parmeliaceae): how useful and reliable is the ITS?" Plant and Fungal Systematics, December 29, 2020, 303–57. http://dx.doi.org/10.35535/pfsyst-2020-0025.

Abstract:
We present an exhaustive analysis of the ITS barcoding marker in the genus Usnea s.lat., separated into Dolichousnea, Eumitria, and Usnea including the subgenus Neuropogon, analyzing 1,751 accessions. We found only a few low-quality accessions, whereas for many accessions the voucher information and the accuracy and precision of identifications were of subpar quality. We provide an updated voucher table, alignment, and phylogenetic tree to facilitate DNA barcoding of Usnea, either locally or through curated databases such as UNITE. Taxonomic and geographic coverage was moderate: while Dolichousnea and subgenus Neuropogon were well represented among ITS data, sampling for Eumitria and Usnea s.str. was sparse and biased towards certain lineages and geographic regions, such as Antarctica, Europe, and South America. North America, Africa, Asia, and Oceania were undersampled. A peculiar situation arose with New Zealand, which was represented by a large number of ITS accessions from across both major islands, most of which were left unidentified. The species pair Usnea antarctica vs. U. aurantiacoatra was the most sampled clade, including numerous ITS accessions from taxonomic and ecological studies. However, published analyses of highly resolved microsatellite and RADseq markers showed that ITS was not able to properly resolve the two species present in this complex. While lack of resolution appears to be an issue with ITS in recently evolving species complexes, we did not find evidence of gene duplication (paralogs) or hybridization for this marker. Comparison with other markers demonstrated that IGS and RPB1 in particular are useful to complement ITS-based phylogenies. Both IGS and RPB1 provided better backbone resolution and support than ITS; while IGS also showed better resolution and support at the species level, RPB1 was less resolved and delineated for larger species complexes. The nuLSU was of limited use, providing neither resolution nor backbone support.
The other three commonly employed protein-coding markers, TUB2, RPB2, and MCM7, showed variable evidence of possible gene duplication and paralog formation, particularly MCM7, and these markers should be used with care, especially in multimarker coalescence approaches. Difficult morphospecies that did not form coherent clades with ITS or other markers posed a substantial challenge, suggesting various levels of cryptic speciation; the most notorious example is the U. cornuta complex. In these cases, the available data suggest that multimarker approaches using ITS, IGS, and RPB1 help to assess distinct lineages. Overall, ITS was found to be a good first approximation for assessing species delimitation and recognition in Usnea s.lat., as long as the data are carefully analyzed and reference sequences are critically assessed rather than taken at face value. In difficult groups, we recommend IGS as a secondary barcode marker, with the option to employ more resource-intensive approaches, such as RADseq, in species complexes involving so-called species pairs or other cases of disparate morphology not reflected in ITS or IGS. Attempts should be made to close taxonomic and geographic gaps, especially for the latter two markers, in particular in Eumitria and Usnea s.str. and in the highly diverse areas of North and Central America, Africa, Asia, and Oceania.