Journal articles on the topic 'Proteins Bioinformatics. Computational biology'


1

Hawley, Robert G., Yuzhong Chen, Irene Riz, and Chen Zeng. "An Integrated Bioinformatics and Computational Biology Approach Identifies New BH3-Only Protein Candidates." Open Biology Journal 5, no. 1 (2012): 6–16. http://dx.doi.org/10.2174/1874196701205010006.

Abstract:
In this study, we utilized an integrated bioinformatics and computational biology approach in search of new BH3-only proteins belonging to the BCL2 family of apoptotic regulators. The BH3 (BCL2 homology 3) domain mediates specific binding interactions among various BCL2 family members. It is composed of an amphipathic α-helical region of approximately 13 residues that has only a few amino acids that are highly conserved across all members. Using a generalized motif, we performed a genome-wide search for novel BH3-containing proteins in the NCBI Consensus Coding Sequence (CCDS) database. In addition to known pro-apoptotic BH3-only proteins, 197 proteins were recovered that satisfied the search criteria. These were categorized according to α-helical content and predictive binding to BCL-xL (encoded by BCL2L1) and MCL-1, two representative anti-apoptotic BCL2 family members, using position-specific scoring matrix models. Notably, the list is enriched for proteins associated with autophagy as well as a broad spectrum of cellular stress responses such as endoplasmic reticulum stress, oxidative stress, antiviral defense, and the DNA damage response. Several potential novel BH3-containing proteins are highlighted. In particular, the analysis strongly suggests that the apoptosis inhibitor and DNA damage response regulator, AVEN, which was originally isolated as a BCL-xL-interacting protein, is a functional BH3-only protein representing a distinct subclass of BCL2 family members.
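The two computational steps named in this abstract, a motif search followed by position-specific scoring matrix (PSSM) scoring, can be sketched roughly as below. This is a minimal illustration only: the regular expression, the toy training peptides, the uniform background and the pseudocount are all assumptions made for the example, not the generalized motif or the models used by the authors.

```python
import math
import re

# Hypothetical, simplified BH3-like pattern for illustration only; it is NOT
# the generalized motif used in the paper. The lookahead allows overlapping hits.
MOTIF = re.compile(r"(?=(L.{3}[GAS]D))")

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def find_motif_hits(seq):
    """Return (start, matched_window) for every match, including overlapping ones."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq)]


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length peptides, assuming a uniform background."""
    background = 1.0 / len(ALPHABET)
    pssm = []
    for pos in range(len(aligned[0])):
        column = [peptide[pos] for peptide in aligned]
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def score(pssm, window):
    """Sum the per-position log-odds scores of a window of the same length as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, window))


if __name__ == "__main__":
    # Toy training peptides (made up), standing in for known binders.
    pssm = build_pssm(["LRRAGD", "LKKASD", "LEEIGD"])
    for start, window in find_motif_hits("MMLRRAGDEE"):
        print(start, window, round(score(pssm, window), 2))
```

In a genome-wide setting the same two functions would be applied to every protein sequence in the database, with hits ranked by their PSSM score against each binding partner.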
2

Mih, Nathan, Elizabeth Brunk, Ke Chen, et al. "ssbio: a Python framework for structural systems biology." Bioinformatics 34, no. 12 (2018): 2155–57. http://dx.doi.org/10.1093/bioinformatics/bty077.

Abstract:
Summary: Working with protein structures at the genome-scale has been challenging in a variety of ways. Here, we present ssbio, a Python package that provides a framework to easily work with structural information in the context of genome-scale network reconstructions, which can contain thousands of individual proteins. The ssbio package provides an automated pipeline to construct high quality genome-scale models with protein structures (GEM-PROs), wrappers to popular third-party programs to compute associated protein properties, and methods to visualize and annotate structures directly in Jupyter notebooks, thus lowering the barrier of linking 3D structural data with established systems workflows. Availability and implementation: ssbio is implemented in Python and available to download under the MIT license at http://github.com/SBRG/ssbio. Documentation and Jupyter notebook tutorials are available at http://ssbio.readthedocs.io/en/latest/. Interactive notebooks can be launched using Binder at https://mybinder.org/v2/gh/SBRG/ssbio/master?filepath=Binder.ipynb. Supplementary information: Supplementary data are available at Bioinformatics online.
3

Cameron, J. M., T. Hurd, and B. H. Robinson. "Computational identification of human mitochondrial proteins based on homology to yeast mitochondrially targeted proteins." Bioinformatics 21, no. 9 (2005): 1825–30. http://dx.doi.org/10.1093/bioinformatics/bti280.

4

Gabaldón, Toni. "Computational approaches for the prediction of protein function in the mitochondrion." American Journal of Physiology-Cell Physiology 291, no. 6 (2006): C1121–C1128. http://dx.doi.org/10.1152/ajpcell.00225.2006.

Abstract:
Understanding a complex biological system, such as the mitochondrion, requires the identification of the complete repertoire of proteins targeted to the organelle, the characterization of these, and finally, the elucidation of the functional and physical interactions that occur within the mitochondrion. In the last decade, significant developments have contributed to increase our understanding of the mitochondrion, and among these, computational research has played a significant role. Not only general bioinformatics tools have been applied in the context of the mitochondrion, but also some computational techniques have been specifically developed to address problems that arose from within the mitochondrial research field. In this review the contribution of bioinformatics to mitochondrial biology is addressed through a survey of current computational methods that can be applied to predict which proteins will be localized to the mitochondrion and to unravel their functional interactions.
5

Likić, Vladimir A., Malcolm J. McConville, Trevor Lithgow, and Antony Bacic. "Systems Biology: The Next Frontier for Bioinformatics." Advances in Bioinformatics 2010 (February 9, 2010): 1–10. http://dx.doi.org/10.1155/2010/268925.

Abstract:
Biochemical systems biology augments more traditional disciplines, such as genomics, biochemistry and molecular biology, by championing (i) mathematical and computational modeling; (ii) the application of traditional engineering practices in the analysis of biochemical systems; and in the past decade increasingly (iii) the use of near-comprehensive data sets derived from ‘omics platform technologies, in particular “downstream” technologies relative to genome sequencing, including transcriptomics, proteomics and metabolomics. The future progress in understanding biological principles will increasingly depend on the development of temporal and spatial analytical techniques that will provide high-resolution data for systems analyses. To date, particularly successful were strategies involving (a) quantitative measurements of cellular components at the mRNA, protein and metabolite levels, as well as in vivo metabolic reaction rates, (b) development of mathematical models that integrate biochemical knowledge with the information generated by high-throughput experiments, and (c) applications to microbial organisms. The inevitable role bioinformatics plays in modern systems biology puts mathematical and computational sciences as an equal partner to analytical and experimental biology. Furthermore, mathematical and computational models are expected to become increasingly prevalent representations of our knowledge about specific biochemical systems.
6

PERES LOPES, GRAZIELA MIÊ, and SANDRO JOSÉ DE SOUZA. "DISSECTING THE HUMAN SPLICEOSOME THROUGH BIOINFORMATICS AND PROTEOMICS APPROACHES." Journal of Bioinformatics and Computational Biology 01, no. 04 (2004): 743–50. http://dx.doi.org/10.1142/s0219720004000405.

Abstract:
The precise excision of introns from mRNAs is executed by the spliceosome, a cellular machinery composed of five small nuclear RNAs and hundreds of proteins. In the last few years, several groups have used proteomics and computational biology tools to characterize the components of the human spliceosome. These reports have identified essentially all known splicing factors and several new proteins. The composition of the human spliceosome confirms the link between splicing and other steps in gene expression. Here we comment on these reports and discuss the perspectives for the coming years.
7

Segura, Joan, Ruben Sanchez-Garcia, C. O. S. Sorzano, and J. M. Carazo. "3DBIONOTES v3.0: crossing molecular and structural biology data with genomic variations." Bioinformatics 35, no. 18 (2019): 3512–13. http://dx.doi.org/10.1093/bioinformatics/btz118.

Abstract:
Motivation: Many diseases are associated with single nucleotide polymorphisms that affect critical regions of proteins such as binding sites or post-translational modifications. Therefore, analysing genomic variants with structural and molecular biology data is a powerful framework for elucidating the potential causes of such diseases. Results: A new version of our web framework 3DBIONOTES is presented. This version offers new tools to analyse and visualize protein annotations and genomic variants, including a contingency analysis of variants and amino acid features by means of a Fisher exact test, the integration of a gene annotation viewer to highlight protein features on gene sequences and a protein–protein interaction viewer to display protein annotations at network level. Availability and implementation: The web server is available at https://3dbionotes.cnb.csic.es. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: Spanish National Institute for Bioinformatics (INB ELIXIR-ES) and Biocomputing Unit, National Centre of Biotechnology (CSIC)/Instruct Image Processing Centre, C/ Darwin nº 3, Campus of Cantoblanco, 28049 Madrid, Spain.
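The contingency analysis mentioned in this abstract can be illustrated with a small, standard-library sketch of a one-sided Fisher exact test on a 2×2 table. The table layout (variant vs. non-variant residues, inside vs. outside an annotated feature), the function name and the example counts are assumptions for illustration; the abstract does not describe how 3DBIONOTES builds its tables or which alternative hypothesis it uses.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for enrichment in a 2x2 table.

    Assumed layout (hypothetical, for illustration):
        a = variant residues inside the feature
        b = variant residues outside the feature
        c = non-variant residues inside the feature
        d = non-variant residues outside the feature
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # all variant residues
    col1 = a + c          # all residues inside the feature
    n = a + b + c + d     # all residues
    denom = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / denom


if __name__ == "__main__":
    # Made-up counts: 8 of 10 variants fall inside a feature covering 20 of 100 residues.
    print(fisher_exact_greater(8, 2, 12, 78))
```

A small p-value here would suggest that variants fall inside the feature more often than expected by chance given the margins of the table.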
8

Simoncini, David, Kam Y. J. Zhang, Thomas Schiex, and Sophie Barbe. "A structural homology approach for computational protein design with flexible backbone." Bioinformatics 35, no. 14 (2018): 2418–26. http://dx.doi.org/10.1093/bioinformatics/bty975.

Abstract:
Motivation: Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. Energy functions, however, remain imperfect, and injecting relevant information from known structures in the design process should lead to improved designs. Results: We introduce Shades, a data-driven CPD method that exploits local structural environments in known protein structures together with energy to guide sequence design, while sampling side-chain and backbone conformations to accommodate mutations. Shades (Structural Homology Algorithm for protein DESign) is based on customized libraries of non-contiguous in-contact amino acid residue motifs. We have tested Shades on a public benchmark of 40 proteins selected from different protein families. When excluding homologous proteins, Shades achieved a protein sequence recovery of 30% and a protein sequence similarity of 46% on average, compared with the PFAM protein family of the target protein. When homologous structures were added, the wild-type sequence recovery rate reached 93%. Availability and implementation: Shades source code is available at https://bitbucket.org/satsumaimo/shades as a patch for Rosetta 3.8 with a curated protein structure database and ITEM library creation software. Supplementary information: Supplementary data are available at Bioinformatics online.
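Sequence recovery, the headline metric in this abstract, is commonly computed as the percentage of positions at which a designed sequence carries the same residue as a reference sequence. The sketch below shows that pairwise definition only; the function name is hypothetical, and it does not reproduce the paper's comparison against a PFAM family or its similarity measure, which would additionally need a substitution matrix.

```python
def sequence_recovery(designed, reference):
    """Percentage of aligned positions where the designed residue equals the reference.

    Assumes both sequences are already aligned and gap-free (same length).
    """
    if len(designed) != len(reference):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(d == r for d, r in zip(designed, reference))
    return 100.0 * matches / len(reference)


if __name__ == "__main__":
    # Toy sequences (made up): three of four positions are identical.
    print(sequence_recovery("ACDE", "ACDF"))
```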
9

Bensmail, Halima, and Abdelali Haoudi. "Postgenomics: Proteomics and Bioinformatics in Cancer Research." Journal of Biomedicine and Biotechnology 2003, no. 4 (2003): 217–30. http://dx.doi.org/10.1155/s1110724303209207.

Abstract:
Now that the human genome is completed, the characterization of the proteins encoded by the sequence remains a challenging task. The study of the complete protein complement of the genome, the “proteome,” referred to as proteomics, will be essential if new therapeutic drugs and new disease biomarkers for early diagnosis are to be developed. Research efforts are already underway to develop the technology necessary to compare the specific protein profiles of diseased versus nondiseased states. These technologies provide a wealth of information and rapidly generate large quantities of data. Processing the large amounts of data will lead to useful predictive mathematical descriptions of biological systems which will permit rapid identification of novel therapeutic targets and identification of metabolic disorders. Here, we present an overview of the current status and future research approaches in defining the cancer cell's proteome in combination with different bioinformatics and computational biology tools toward a better understanding of health and disease.
10

Orlando, Gabriele, Daniele Raimondi, Francesco Tabaro, Francesco Codicè, Yves Moreau, and Wim F. Vranken. "Computational identification of prion-like RNA-binding proteins that form liquid phase-separated condensates." Bioinformatics 35, no. 22 (2019): 4617–23. http://dx.doi.org/10.1093/bioinformatics/btz274.

Abstract:
Motivation: Eukaryotic cells contain different membrane-delimited compartments, which are crucial for the biochemical reactions necessary to sustain cell life. Recent studies showed that cells can also trigger the formation of membraneless organelles composed of phase-separated proteins to respond to various stimuli. These condensates provide new ways to control the reactions, and phase-separation proteins (PSPs) are thus revolutionizing how cellular organization is conceived. The small number of experimentally validated proteins, and the difficulty in discovering them, remain bottlenecks in PSP research. Results: Here we present PSPer, the first in-silico screening tool for prion-like RNA-binding PSPs. We show that it can prioritize PSPs among proteins containing similar RNA-binding domains, intrinsically disordered regions and prions. PSPer is thus suitable to screen proteomes, identifying the most likely PSPs for further experimental investigation. Moreover, its predictions are fully interpretable in the sense that it assigns specific functional regions to the predicted proteins, providing valuable information for experimental investigation of targeted mutations on these regions. Finally, we show that it can estimate the ability of artificially designed proteins to form condensates (r=−0.87), thus providing an in-silico screening tool for protein design experiments. Availability and implementation: PSPer is available at bio2byte.com/psp. Supplementary information: Supplementary data are available at Bioinformatics online.
11

Morse, Thomas M. "Article Commentary: Neuroinformatics: From Bioinformatics to Databasing the Brain." Bioinformatics and Biology Insights 2 (January 2008): BBI.S540. http://dx.doi.org/10.4137/bbi.s540.

Abstract:
Neuroinformatics seeks to create and maintain web-accessible databases of experimental and computational data, together with innovative software tools, essential for understanding the nervous system in its normal function and in neurological disorders. Neuroinformatics includes traditional bioinformatics of gene and protein sequences in the brain; atlases of brain anatomy and localization of genes and proteins; imaging of brain cells; brain imaging by positron emission tomography (PET), functional magnetic resonance imaging (fMRI), electroencephalography (EEG), magnetoencephalography (MEG) and other methods; many electrophysiological recording methods; and clinical neurological data, among others. Building neuroinformatics databases and tools presents difficult challenges because they span a wide range of spatial scales and types of data stored and analyzed. Traditional bioinformatics, by comparison, focuses primarily on genomic and proteomic data (which of course also presents difficult challenges). Much of bioinformatics analysis focuses on sequences (DNA, RNA, and protein molecules), as the type of data that are stored, compared, and sometimes modeled. Bioinformatics is undergoing explosive growth with the addition, for example, of databases that catalog interactions between proteins, of databases that track the evolution of genes, and of systems biology databases which contain models of all aspects of organisms. This commentary briefly reviews neuroinformatics with clarification of its relationship to traditional and modern bioinformatics.
12

Collins, Kodi, and Tandy Warnow. "PASTA for proteins." Bioinformatics 34, no. 22 (2018): 3939–41. http://dx.doi.org/10.1093/bioinformatics/bty495.

13

Liu, Zhi-Ping. "Predicting lncRNA-protein Interactions by Machine Learning Methods: A Review." Current Bioinformatics 15, no. 8 (2021): 831–40. http://dx.doi.org/10.2174/1574893615666200224095925.

Abstract:
In this work, a review of predicting lncRNA-protein interactions by bioinformatics methods is provided with a focus on machine learning. Firstly, a computational framework for predicting lncRNA-protein interactions is presented. Then, the currently available data resources for the predictions have been listed. The existing methods will be reviewed by introducing their crucial steps in the prediction framework. The key functions of lncRNA, e.g., mediator on transcriptional regulation, are often involved in interacting with proteins. The interactions with proteins provide a tunnel of leveraging the molecular cooperativity for fulfilling crucial functions. Thus, the important directions in bioinformatics have been highlighted for identifying essential lncRNA-protein interactions and deciphering the dysfunctional importance of lncRNA, especially in carcinogenesis.
14

Pagès, Guillaume, and Sergei Grudinin. "DeepSymmetry: using 3D convolutional networks for identification of tandem repeats and internal symmetries in protein structures." Bioinformatics 35, no. 24 (2019): 5113–20. http://dx.doi.org/10.1093/bioinformatics/btz454.

Abstract:
Motivation: Thanks to the recent advances in structural biology, nowadays 3D structures of various proteins are solved on a routine basis. A large portion of these structures contain structural repetitions or internal symmetries. To understand the evolution mechanisms of these proteins and how structural repetitions affect the protein function, we need to be able to detect such proteins very robustly. As deep learning is particularly suited to deal with spatially organized data, we applied it to the detection of proteins with structural repetitions. Results: We present DeepSymmetry, a versatile method based on 3D convolutional networks that detects structural repetitions in proteins and their density maps. Our method is designed to identify tandem repeat proteins, proteins with internal symmetries, symmetries in the raw density maps, their symmetry order and also the corresponding symmetry axes. Detection of symmetry axes is based on learning 6D Veronese mappings of 3D vectors, and the median angular error of axis determination is less than one degree. We demonstrate the capabilities of our method on benchmarks with tandem-repeated proteins and also with symmetrical assemblies. For example, we have discovered about 7800 putative tandem repeat proteins in the PDB. Availability and implementation: The method is available at https://team.inria.fr/nano-d/software/deepsymmetry. It consists of a C++ executable that transforms molecular structures into volumetric density maps, and a Python code based on the TensorFlow framework for applying the DeepSymmetry model to these maps. Supplementary information: Supplementary data are available at Bioinformatics online.
15

Miotto, Mattia, Pier Paolo Olimpieri, Lorenzo Di Rienzo, et al. "Insights on protein thermal stability: a graph representation of molecular interactions." Bioinformatics 35, no. 15 (2018): 2569–77. http://dx.doi.org/10.1093/bioinformatics/bty1011.

Abstract:
Motivation: Understanding the molecular mechanisms of thermal stability is a challenge in protein biology. Indeed, knowing the temperature at which proteins are stable has important theoretical implications, which are intimately linked with properties of the native fold, and a wide range of potential applications from drug design to the optimization of enzyme activity. Results: Here, we present a novel graph-theoretical framework to assess thermal stability based on the structure without any a priori information. In this approach we describe proteins as energy-weighted graphs and compare them using ensembles of interaction networks. Investigating the position of specific interactions within the 3D native structure, we developed a parameter-free network descriptor that distinguishes thermostable from mesostable proteins with an accuracy of 76% and an area under the receiver operating characteristic curve of 78%. Availability and implementation: Code is available upon request to edoardo.milanetti@uniroma1.it. Supplementary information: Supplementary data are available at Bioinformatics online.
16

Cui, Juan, Qi Liu, David Puett, and Ying Xu. "Computational prediction of human proteins that can be secreted into the bloodstream." Bioinformatics 24, no. 20 (2008): 2370–75. http://dx.doi.org/10.1093/bioinformatics/btn418.

17

Goncearenco, Alexander, and Igor N. Berezovsky. "Computational reconstruction of primordial prototypes of elementary functional loops in modern proteins." Bioinformatics 27, no. 17 (2011): 2368–75. http://dx.doi.org/10.1093/bioinformatics/btr396.

18

Zhu, Q., Y. Deng, P. Vanka, S. J. Brown, S. Muthukrishnan, and K. J. Kramer. "Computational identification of novel chitinase-like proteins in the Drosophila melanogaster genome." Bioinformatics 20, no. 2 (2004): 161–69. http://dx.doi.org/10.1093/bioinformatics/bth020.

19

Aszódi, A., and W. R. Taylor. "Connection topology of proteins." Bioinformatics 9, no. 5 (1993): 523–29. http://dx.doi.org/10.1093/bioinformatics/9.5.523.

20

Ramakrishnan, Reshmi, Bert Houben, Frederic Rousseau, and Joost Schymkowitz. "Differential proteostatic regulation of insoluble and abundant proteins." Bioinformatics 35, no. 20 (2019): 4098–107. http://dx.doi.org/10.1093/bioinformatics/btz214.

Abstract:
Motivation: Despite intense effort, it has been difficult to explain chaperone dependencies of proteins from sequence or structural properties. Results: We constructed a database collecting all publicly available experimental chaperone interaction and dependency data for the Escherichia coli proteome, and enriched it with an extensive set of protein-specific as well as cell-context-dependent proteostatic parameters. Employing this new resource, we performed a comprehensive meta-analysis of the key determinants of chaperone interaction. Our study confirms that GroEL client proteins are biased toward insoluble proteins of low abundance, but for client proteins of the Trigger Factor/DnaK axis, we instead find that cellular parameters such as high protein abundance, translational efficiency and mRNA turnover are key determinants. We experimentally confirmed the finding that chaperone dependence is a function of translation rate and not protein-intrinsic parameters by tuning chaperone dependence of Green Fluorescent Protein (GFP) in E. coli by synonymous mutations only. The juxtaposition of both protein-intrinsic and cell-contextual chaperone triage mechanisms explains how the E. coli proteome combines reliable production of abundant and conserved proteins with the evolution of diverging metabolic functions. Availability and implementation: The database will be made available via http://phdb.switchlab.org. Supplementary information: Supplementary data are available at Bioinformatics online.
21

Iqbal, Muhammad Javed, Ibrahima Faye, Brahim Belhaouari Samir, and Abas Md Said. "Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics." Scientific World Journal 2014 (2014): 1–12. http://dx.doi.org/10.1155/2014/173869.

Abstract:
Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced proteins and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of the large number of newly discovered proteins. The existing classification results are unsatisfactory due to the huge number of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth.
22

Dinkel, Holger, and Heinrich Sticht. "A computational strategy for the prediction of functional linear peptide motifs in proteins." Bioinformatics 23, no. 24 (2007): 3297–303. http://dx.doi.org/10.1093/bioinformatics/btm524.

23

Jain, Aashish, and Daisuke Kihara. "Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences." Bioinformatics 35, no. 5 (2018): 753–59. http://dx.doi.org/10.1093/bioinformatics/bty704.

Abstract:
Motivation: Function annotation of proteins is fundamental in contemporary biology across fields including genomics, molecular biology, biochemistry, systems biology and bioinformatics. Function prediction is indispensable in providing clues for interpreting omics-scale data as well as in assisting biologists to build hypotheses for designing experiments. As sequencing genomes is now routine due to the rapid advancement of sequencing technologies, computational protein function prediction methods have become increasingly important. A conventional method of annotating a protein sequence is to transfer functions from top hits of a homology search; however, this approach has substantial shortcomings including a low coverage in genome annotation. Results: Here we have developed Phylo-PFP, a new sequence-based protein function prediction method, which mines functional information from a broad range of similar sequences, including those with a low sequence similarity identified by a PSI-BLAST search. To evaluate functional similarity between identified sequences and the query protein more accurately, Phylo-PFP reranks retrieved sequences by considering their phylogenetic distance. Compared to Phylo-PFP's predecessor, PFP, which was among the top ranked methods in the second round of the Critical Assessment of Functional Annotation (CAFA2), Phylo-PFP demonstrated substantial improvement in prediction accuracy. Phylo-PFP was further shown to outperform prediction programs to date that were ranked top in CAFA2. Availability and implementation: Phylo-PFP web server is available at http://kiharalab.org/phylo_pfp.php. Supplementary information: Supplementary data are available at Bioinformatics online.
24

Sverud, O., and R. M. MacCallum. "Towards optimal views of proteins." Bioinformatics 19, no. 7 (2003): 882–88. http://dx.doi.org/10.1093/bioinformatics/btg100.

25

SARAI, AKINORI, JORG SIEBERS, SAMUEL SELVARAJ, M. MICHAEL GROMIHA, and HIDETOSHI KONO. "INTEGRATION OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY TO UNDERSTAND PROTEIN-DNA RECOGNITION MECHANISM." Journal of Bioinformatics and Computational Biology 03, no. 01 (2005): 169–83. http://dx.doi.org/10.1142/s0219720005000965.

Abstract:
Transcription factors play an essential role in gene regulation in higher organisms, binding to multiple target sequences and regulating multiple genes in a complex manner. In order to decipher the mechanism of gene regulation, it is important to understand the molecular mechanism of protein-DNA recognition. Here we describe a strategy to approach this problem, using various methods in bioinformatics and computational biology. We have used a knowledge-based approach, utilizing rapidly increasing structural data of protein-DNA complexes, to derive empirical potential functions for the specific interactions between bases and amino acids as well as for DNA conformation, from statistical analyses of the structural data. These statistical potentials are then used to quantify the specificity of protein-DNA recognition. The quantification of specificity has enabled us to establish the structure-function analysis of transcription factors, such as the effects of binding cooperativity on target recognition. The method is also applied to real genome sequences, predicting potential target sites. We are also using computer simulations of protein-DNA interactions and DNA conformation in order to complement the empirical method. The integration of these approaches will provide deeper insight into the mechanism of protein-DNA recognition and improve the target prediction of transcription factors.
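A knowledge-based (statistical) potential of the kind described in this abstract is often written in inverse-Boltzmann form, E(b, a) = −ln[f_obs(b, a) / (f(b)·f(a))], where f_obs is the observed frequency of a base–amino acid contact and f(b)·f(a) is the frequency expected if bases and amino acids paired independently. The sketch below applies that generic textbook form to a flat list of contacts. It is an assumption-laden illustration, not the authors' potential: their method may depend on spatial distributions and reference states that the abstract does not specify, and the function name and toy data are made up.

```python
import math
from collections import Counter


def statistical_potential(contacts):
    """Inverse-Boltzmann energies (in units of kT) for observed (base, amino_acid) pairs.

    contacts: list of (base, amino_acid) tuples observed in complex structures.
    Negative values mean the pair occurs more often than expected by chance.
    Only pairs that were observed at least once receive an energy.
    """
    total = len(contacts)
    if total == 0:
        raise ValueError("at least one contact is required")
    pair_counts = Counter(contacts)
    base_counts = Counter(base for base, _ in contacts)
    aa_counts = Counter(aa for _, aa in contacts)
    energies = {}
    for (base, aa), n_pair in pair_counts.items():
        observed = n_pair / total
        expected = (base_counts[base] / total) * (aa_counts[aa] / total)
        energies[(base, aa)] = -math.log(observed / expected)
    return energies


if __name__ == "__main__":
    # Toy contact list (made up): guanine-arginine contacts dominate.
    toy = [("G", "R")] * 3 + [("A", "N")]
    for pair, energy in sorted(statistical_potential(toy).items()):
        print(pair, round(energy, 3))
```

Summing such pair energies over the contacts a candidate DNA sequence would make in a given complex gives one simple way to compare how well different target sequences fit the protein.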
26

Orlov, Yuriy L., Ancha V. Baranova, and Tatiana V. Tatarinova. "Bioinformatics Methods in Medical Genetics and Genomics." International Journal of Molecular Sciences 21, no. 17 (2020): 6224. http://dx.doi.org/10.3390/ijms21176224.

Abstract:
Medical genomics relies on next-gen sequencing methods to decipher underlying molecular mechanisms of gene expression. This special issue collects materials originally presented at the “Centenary of Human Population Genetics” Conference-2019, in Moscow. Here we present some recent developments in computational methods tested on actual medical genetics problems dissected through genomics, transcriptomics and proteomics data analysis, gene networks, protein–protein interactions and biomedical literature mining. We have selected materials based on systems biology approaches, database mining. These methods and algorithms were discussed at the Digital Medical Forum-2019, organized by I.M. Sechenov First Moscow State Medical University presenting bioinformatics approaches for the drug targets discovery in cancer, its computational support, and digitalization of medical research, as well as at “Systems Biology and Bioinformatics”-2019 (SBB-2019) Young Scientists School in Novosibirsk, Russia. Selected recent advancements discussed at these events in the medical genomics and genetics areas are based on novel bioinformatics tools.
27

Pruess, Manuela, and Rolf Apweiler. "Bioinformatics Resources for In Silico Proteome Analysis." Journal of Biomedicine and Biotechnology 2003, no. 4 (2003): 231–36. http://dx.doi.org/10.1155/s1110724303209219.

Abstract:
In the growing field of proteomics, tools for the in silico analysis of proteins and even of whole proteomes are of crucial importance to make best use of the accumulating amount of data. To utilise this data for healthcare and drug development, first the characteristics of proteomes of entire species—mainly the human—have to be understood, before secondly differentiation between individuals can be surveyed. Specialised databases about nucleic acid sequences, protein sequences, protein tertiary structure, genome analysis, and proteome analysis represent useful resources for analysis, characterisation, and classification of protein sequences. Different from most proteomics tools focusing on similarity searches, structure analysis and prediction, detection of specific regions, alignments, data mining, 2D PAGE analysis, or protein modelling, respectively, comprehensive databases like the proteome analysis database benefit from the information stored in different databases and make use of different protein analysis tools to provide computational analysis of whole proteomes.
28

Khan, Abdul Arif, and Zakir Khan. "COVID-2019-associated overexpressed Prevotella proteins mediated host–pathogen interactions and their role in coronavirus outbreak." Bioinformatics 36, no. 13 (2020): 4065–69. http://dx.doi.org/10.1093/bioinformatics/btaa285.

Abstract:
Motivation: The outbreak of COVID-2019 initiated at Wuhan, China has become a global threat by rapid transmission and severe fatalities. Recent studies have uncovered whole genome sequence of SARS-CoV-2 (causing COVID-2019). In addition, lung metagenomic studies on infected patients revealed overrepresented Prevotella spp. producing certain proteins in abundance. We performed host–pathogen protein–protein interaction analysis between SARS-CoV-2 and overrepresented Prevotella proteins with human proteome. We also performed functional overrepresentation analysis of interacting proteins to understand their role in COVID-2019 severity. Results: It was found that overexpressed Prevotella proteins can promote viral infection. As per the results, Prevotella proteins, but not viral proteins, are involved in multiple interactions with NF-kB, which is involved in increasing clinical severity of COVID-2019. Prevotella may have role in COVID-2019 outbreak and should be given importance for understanding disease mechanisms and improving treatment outcomes. Supplementary information: Supplementary data are available at Bioinformatics online.
29

Tiwari, Arvind Kumar, and Rajeev Srivastava. "A Survey of Computational Intelligence Techniques in Protein Function Prediction." International Journal of Proteomics 2014 (December 11, 2014): 1–22. http://dx.doi.org/10.1155/2014/845479.

Abstract:
During the past, there was a massive growth of knowledge of unknown proteins with the advancement of high throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, the homology based approaches were used to predict the protein function, but they failed when a new protein was different from the previous one. Therefore, to alleviate the problems associated with homology based traditional approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a state-of-the-art comprehensive review of various computational intelligence techniques for protein function predictions using sequence, structure, protein-protein interaction network, and gene expression data used in wide areas of applications such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the result obtained by many researchers to solve these problems by using computational intelligence techniques with appropriate datasets to improve the prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction.
30

Long, Wei, Yang Yang, and Hong-Bin Shen. "ImPLoc: a multi-instance deep learning model for the prediction of protein subcellular localization based on immunohistochemistry images." Bioinformatics 36, no. 7 (2019): 2244–50. http://dx.doi.org/10.1093/bioinformatics/btz909.

Abstract:
Motivation: The tissue atlas of the human protein atlas (HPA) houses immunohistochemistry (IHC) images visualizing the protein distribution from the tissue level down to the cell level, which provide an important resource to study human spatial proteome. Especially, the protein subcellular localization patterns revealed by these images are helpful for understanding protein functions, and the differential localization analysis across normal and cancer tissues lead to new cancer biomarkers. However, computational tools for processing images in this database are highly underdeveloped. The recognition of the localization patterns suffers from the variation in image quality and the difficulty in detecting microscopic targets. Results: We propose a deep multi-instance multi-label model, ImPLoc, to predict the subcellular locations from IHC images. In this model, we employ a deep convolutional neural network-based feature extractor to represent image features, and design a multi-head self-attention encoder to aggregate multiple feature vectors for subsequent prediction. We construct a benchmark dataset of 1186 proteins including 7855 images from HPA and 6 subcellular locations. The experimental results show that ImPLoc achieves significant enhancement on the prediction accuracy compared with the current computational methods. We further apply ImPLoc to a test set of 889 proteins with images from both normal and cancer tissues, and obtain 8 differentially localized proteins with a significance level of 0.05. Availability and implementation: https://github.com/yl2019lw/ImPloc. Supplementary information: Supplementary data are available at Bioinformatics online.
31

Contreras-Moreira, B., and J. Collado-Vides. "Comparative footprinting of DNA-binding proteins." Bioinformatics 22, no. 14 (2006): e74–e80. http://dx.doi.org/10.1093/bioinformatics/btl215.

32

Bannen, R. M., V. Suresh, G. N. Phillips, S. J. Wright, and J. C. Mitchell. "Optimal design of thermally stable proteins." Bioinformatics 24, no. 20 (2008): 2339–43. http://dx.doi.org/10.1093/bioinformatics/btn450.

33

Goffard, N., V. Garcia, F. Iragne, A. Groppi, and A. de Daruvar. "IPPRED: server for proteins interactions inference." Bioinformatics 19, no. 7 (2003): 903–4. http://dx.doi.org/10.1093/bioinformatics/btg091.

34

Arnold, R., T. Rattei, P. Tischler, M. D. Truong, V. Stumpflen, and W. Mewes. "SIMAP--The similarity matrix of proteins." Bioinformatics 21, Suppl 2 (2005): ii42–ii46. http://dx.doi.org/10.1093/bioinformatics/bti1107.

35

Leluk, J., L. Konieczny, and I. Roterman. "Search for structural similarity in proteins." Bioinformatics 19, no. 1 (2003): 117–24. http://dx.doi.org/10.1093/bioinformatics/19.1.117.

36

Counsell, Damian. "Workshop—Predicting the Structure of Biological Molecules." Comparative and Functional Genomics 5, no. 6-7 (2004): 480–90. http://dx.doi.org/10.1002/cfg.414.

Abstract:
This April, in Cambridge (UK), principal investigators from the Mathematical Biology Group of the Medical Research Council's National Institute of Medical Research organized a workshop in structural bioinformatics at the Centre for Mathematical Sciences. Bioinformatics researchers of several nationalities from labs around the country presented and discussed their computational work in biomolecular structure prediction and analysis, and in protein evolution. The meeting was intensive and lively and gave attendees an overview of the healthy state of protein bioinformatics in the UK.
37

Dai, Bowen, and Chris Bailey-Kellogg. "Protein interaction interface region prediction by geometric deep learning." Bioinformatics 37, no. 17 (2021): 2580–88. http://dx.doi.org/10.1093/bioinformatics/btab154.

Abstract:
Motivation: Protein–protein interactions drive wide-ranging molecular processes, and characterizing at the atomic level how proteins interact (beyond just the fact that they interact) can provide key insights into understanding and controlling this machinery. Unfortunately, experimental determination of three-dimensional protein complex structures remains difficult and does not scale to the increasingly large sets of proteins whose interactions are of interest. Computational methods are thus required to meet the demands of large-scale, high-throughput prediction of how proteins interact, but unfortunately, both physical modeling and machine learning methods suffer from poor precision and/or recall. Results: In order to improve performance in predicting protein interaction interfaces, we leverage the best properties of both data- and physics-driven methods to develop a unified Geometric Deep Neural Network, ‘PInet’ (Protein Interface Network). PInet consumes pairs of point clouds encoding the structures of two partner proteins, in order to predict their structural regions mediating interaction. To make such predictions, PInet learns and utilizes models capturing both geometrical and physicochemical molecular surface complementarity. In application to a set of benchmarks, PInet simultaneously predicts the interface regions on both interacting proteins, achieving performance equivalent to or even much better than the state-of-the-art predictor for each dataset. Furthermore, since PInet is based on joint segmentation of a representation of a protein surfaces, its predictions are meaningful in terms of the underlying physical complementarity driving molecular recognition. Availability and implementation: PInet scripts and models are available at https://github.com/FTD007/PInet. Supplementary information: Supplementary data are available at Bioinformatics online.
38

Zhang, Zheng, Fen Yu, Yuanqiang Zou, et al. "Phage protein receptors have multiple interaction partners and high expressions." Bioinformatics 36, no. 10 (2020): 2975–79. http://dx.doi.org/10.1093/bioinformatics/btaa123.

Abstract:
Motivation: Receptors on host cells play a critical role in viral infection. How phages select receptors is still unknown. Results: Here, we manually curated a high-quality database named phageReceptor, including 427 pairs of phage–host receptor interactions, 341 unique viral species or sub-species and 69 bacterial species. Sugars and proteins were most widely used by phages as receptors. The receptor usage of phages in Gram-positive bacteria was different from that in Gram-negative bacteria. Most protein receptors were located on the outer membrane. The phage protein receptors (PPRs) were highly diverse in their structures, and had little sequence identity and no common protein domain with mammalian virus receptors. Further functional characterization of PPRs in Escherichia coli showed that they had larger node degrees and betweennesses in the protein–protein interaction network, and higher expression levels, than other outer membrane proteins, plasma membrane proteins or other intracellular proteins. These findings were consistent with what was observed for mammalian virus receptors reported in previous studies, suggesting that viral protein receptors tend to have multiple interaction partners and high expressions. The study deepens our understanding of virus–host interactions. Availability and implementation: phageReceptor is publicly available from: http://www.computationalbiology.cn/phageReceptor/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.
39

Suplatov, D. A., and V. K. Švedas. "Study of Functional and Allosteric Sites in Protein Superfamilies." Acta Naturae 7, no. 4 (2015): 34–45. http://dx.doi.org/10.32607/20758251-2015-7-4-34-45.

Abstract:
The interaction of proteins (enzymes) with a variety of low-molecular-weight compounds, as well as protein-protein interactions, is the most important factor in the regulation of their functional properties. To date, research effort has routinely focused on studying ligand binding to the functional sites of proteins (active sites of enzymes), whereas the molecular mechanisms of allosteric regulation, as well as binding to other pockets and cavities in protein structures, remained poorly understood. Recent studies have shown that allostery may be an intrinsic property of virtually all proteins. Novel approaches are needed to systematically analyze the architecture and role of various binding sites and establish the relationship between structure, function, and regulation. Computational biology, bioinformatics, and molecular modeling can be used to search for new regulatory centers, characterize their structural peculiarities, as well as compare different pockets in homologous proteins, study the molecular mechanisms of allostery, and understand the communication between topologically independent binding sites in protein structures. The establishment of an evolutionary relationship between different binding centers within protein superfamilies and the discovery of new functional and allosteric (regulatory) sites using computational approaches can improve our understanding of the structure-function relationship in proteins and provide new opportunities for drug design and enzyme engineering.
40

Lafita, Aleix, Pengfei Tian, Robert B. Best, and Alex Bateman. "TADOSS: computational estimation of tandem domain swap stability." Bioinformatics 35, no. 14 (2018): 2507–8. http://dx.doi.org/10.1093/bioinformatics/bty974.

Abstract:
Summary: Proteins with highly similar tandem domains have shown an increased propensity for misfolding and aggregation. Several molecular explanations have been put forward, such as swapping of adjacent domains, but there is a lack of computational tools to systematically analyze them. We present the TAndem DOmain Swap Stability predictor (TADOSS), a method to computationally estimate the stability of tandem domain-swapped conformations from the structures of single domains, based on previous coarse-grained simulation studies. The tool is able to discriminate domains susceptible to domain swapping and to identify structural regions with high propensity to form hinge loops. TADOSS is a scalable method and suitable for large scale analyses. Availability and implementation: Source code and documentation are freely available under an MIT license on GitHub at https://github.com/lafita/tadoss. Supplementary information: Supplementary data are available at Bioinformatics online.
41

Khaldi, Nora. "Bioinformatics approaches for identifying new therapeutic bioactive peptides in food." Functional Foods in Health and Disease 2, no. 10 (2012): 325. http://dx.doi.org/10.31989/ffhd.v2i10.80.

Abstract:
The traditional methods for mining foods for bioactive peptides are tedious and long. Similar to the drug industry, the length of time to identify and deliver a commercial health ingredient that reduces disease symptoms can take anything between 5 and 10 years. Reducing this time and effort is crucial in order to create new commercially viable products with clear and important health benefits. In the past few years, bioinformatics, the science that brings together fast computational biology and efficient genome mining, has appeared as the long-awaited solution to this problem. By quickly mining food genomes for characteristics of certain food therapeutic ingredients, researchers can potentially find new ones in a matter of a few weeks. Yet, surprisingly, very little success has been achieved so far using bioinformatics in mining for food bioactives. The absence of food-specific bioinformatic mining tools, the slow integration of both experimental mining and bioinformatics, and the important difference between different experimental platforms are some of the reasons for the slow progress of bioinformatics in the field of functional food and more specifically in bioactive peptide discovery. In this paper I discuss some methods that could be easily translated, using a rational peptide bioinformatics design, to food bioactive peptide mining. I highlight the need for an integrated food peptide database. I also discuss how to better integrate experimental work with bioinformatics in order to improve the mining of food for bioactive peptides, therefore achieving higher success rates. Keywords: bioactive peptides, bioinformatics, mining food, therapeutic properties, food proteins, functional food.
42

Xu, Ying-Ying, Hong-Bin Shen, and Robert F. Murphy. "Learning complex subcellular distribution patterns of proteins via analysis of immunohistochemistry images." Bioinformatics 36, no. 6 (2019): 1908–14. http://dx.doi.org/10.1093/bioinformatics/btz844.

Abstract:
Motivation: Systematic and comprehensive analysis of protein subcellular location as a critical part of proteomics (‘location proteomics’) has been studied for many years, but annotating protein subcellular locations and understanding variation of the location patterns across various cell types and states is still challenging. Results: In this work, we used immunohistochemistry images from the Human Protein Atlas as the source of subcellular location information, and built classification models for the complex protein spatial distribution in normal and cancerous tissues. The models can automatically estimate the fractions of protein in different subcellular locations, and can help to quantify the changes of protein distribution from normal to cancer tissues. In addition, we examined the extent to which different annotated protein pathways and complexes showed similarity in the locations of their member proteins, and then predicted new potential proteins for these networks. Availability and implementation: The dataset and code are available at: www.csbio.sjtu.edu.cn/bioinf/complexsubcellularpatterns. Supplementary information: Supplementary data are available at Bioinformatics online.
43

Modi, M., N. G. Jadeja, and K. Zala. "FMFinder: A Functional Module Detector for PPI Networks." Engineering, Technology & Applied Science Research 7, no. 5 (2017): 2022–25. http://dx.doi.org/10.48084/etasr.1347.

Abstract:
Bioinformatics is an integrated area of data mining, statistics and computational biology. Protein-Protein Interaction (PPI) network is the most important biological process in living beings. In this network a protein module interacts with another module and so on, forming a large network of proteins. The same set of proteins which takes part in the organic courses of biological actions is detected through the Function Module Detection method. Clustering process when applied in PPI networks is made of proteins which are part of a larger communication network. As a result of this, we can define the limits for module detection as well as clarify the construction of a PPI network. For understanding the bio-mechanism of various living beings, a detailed study of FMFinder detection by clustering process is called for.
44

Rao, Allam Appa, Hanuman Thota, Ramamurthy Adapala, et al. "Proteomic Analysis in Diabetic Cardiomyopathy using Bioinformatics Approach." Bioinformatics and Biology Insights 2 (January 2008): BBI.S313. http://dx.doi.org/10.4137/bbi.s313.

Abstract:
Diabetic cardiomyopathy is a distinct clinical entity that produces asymptomatic heart failure in diabetic patients without evidence of coronary artery disease and hypertension. Abnormalities in diabetic cardiomyopathy include: myocardial hypertrophy, impairment of contractile proteins, accumulation of extracellular matrix proteins, formation of advanced glycation end products, and decreased left ventricular compliance. These abnormalities lead to the most common clinical presentation of diabetic cardiomyopathy in the form of diastolic dysfunction. We evaluated the role of various proteins that are likely to be involved in diabetic cardiomyopathy by employing multiple sequence alignment using ClustalW tool and constructed a Phylogenetic tree using functional protein sequences extracted from NCBI. Phylogenetic tree was constructed using Neighbour-Joining Algorithm in bioinformatics approach. These results suggest a causal relationship between altered calcium homeostasis and diabetic cardiomyopathy that implies that efforts directed to normalize calcium homeostasis could form a novel therapeutic approach.
45

Wang, Kai, Nan Lyu, Hongjuan Diao, et al. "GM-DockZn: a geometry matching-based docking algorithm for zinc proteins." Bioinformatics 36, no. 13 (2020): 4004–11. http://dx.doi.org/10.1093/bioinformatics/btaa292.

Abstract:
Motivation: Molecular docking is a widely used technique for large-scale virtual screening of the interactions between small-molecule ligands and their target proteins. However, docking methods often perform poorly for metalloproteins due to additional complexity from the three-way interactions among amino-acid residues, metal ions and ligands. This is a significant problem because zinc proteins alone comprise about 10% of all available protein structures in the protein databank. Here, we developed GM-DockZn that is dedicated for ligand docking to zinc proteins. Unlike the existing docking methods developed specifically for zinc proteins, GM-DockZn samples ligand conformations directly using a geometric grid around the ideal zinc-coordination positions of seven discovered coordination motifs, which were found from the survey of known zinc proteins complexed with a single ligand. Results: GM-DockZn has the best performance in sampling near-native poses with correct coordination atoms and numbers within the top 50 and top 10 predictions when compared to several state-of-the-art techniques. This is true not only for a non-redundant dataset of zinc proteins but also for a homolog set of different ligand and zinc-coordination systems for the same zinc proteins. Similar superior performance of GM-DockZn for near-native-pose sampling was also observed for docking to apo-structures and cross-docking between different ligand complex structures of the same protein. The highest success rate for sampling nearest near-native poses within top 5 and top 1 was achieved by combining GM-DockZn for conformational sampling with GOLD for ranking. The proposed geometry-based sampling technique will be useful for ligand docking to other metalloproteins. Availability and implementation: GM-DockZn is freely available at www.qmclab.com/ for academic users. Supplementary information: Supplementary data are available at Bioinformatics online.
46

Chen, Muhao, Chelsea J. T. Ju, Guangyu Zhou, et al. "Multifaceted protein–protein interaction prediction based on Siamese residual RCNN." Bioinformatics 35, no. 14 (2019): i305–i314. http://dx.doi.org/10.1093/bioinformatics/btz328.

Abstract:
Motivation: Sequence-based protein–protein interaction (PPI) prediction represents a fundamental computational biology problem. To address this problem, extensive research efforts have been made to extract predefined features from the sequences. Based on these features, statistical algorithms are learned to classify the PPIs. However, such explicit features are usually costly to extract, and typically have limited coverage on the PPI information. Results: We present an end-to-end framework, PIPR (Protein–Protein Interaction Prediction Based on Siamese Residual RCNN), for PPI predictions using only the protein sequences. PIPR incorporates a deep residual recurrent convolutional neural network in the Siamese architecture, which leverages both robust local features and contextualized information, which are significant for capturing the mutual influence of proteins sequences. PIPR relieves the data pre-processing efforts that are required by other systems, and generalizes well to different application scenarios. Experimental evaluations show that PIPR outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows a promising performance on more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short. Availability and implementation: The implementation is available at https://github.com/muhaochen/seq_ppi.git. Supplementary information: Supplementary data are available at Bioinformatics online.
47

Chen, Lifan, Xiaoqin Tan, Dingyan Wang, et al. "TransformerCPI: improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments." Bioinformatics 36, no. 16 (2020): 4406–14. http://dx.doi.org/10.1093/bioinformatics/btaa524.

Abstract:
Motivation: Identifying compound–protein interaction (CPI) is a crucial task in drug discovery and chemogenomics studies, and proteins without three-dimensional structure account for a large part of potential biological targets, which requires developing methods using only protein sequence information to predict CPI. However, sequence-based CPI models may face some specific pitfalls, including using inappropriate datasets, hidden ligand bias and splitting datasets inappropriately, resulting in overestimation of their prediction performance. Results: To address these issues, we here constructed new datasets specific for CPI prediction, proposed a novel transformer neural network named TransformerCPI, and introduced a more rigorous label reversal experiment to test whether a model learns true interaction features. TransformerCPI achieved much improved performance on the new experiments, and it can be deconvolved to highlight important interacting regions of protein sequences and compound atoms, which may contribute chemical biology studies with useful guidance for further ligand structural optimization. Availability and implementation: https://github.com/lifanchen-simm/transformerCPI.
48

Durairaj, Janani, Mehmet Akdel, Dick de Ridder, and Aalt D. J. van Dijk. "Geometricus represents protein structures as shape-mers derived from moment invariants." Bioinformatics 36, Supplement_2 (2020): i718–i725. http://dx.doi.org/10.1093/bioinformatics/btaa839.

Abstract:
Motivation: As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment based, which is both time-consuming and ineffective for distantly related proteins. On the other hand, library- or model-based approaches depend on a small library of fragments or require the use of a trained model, both of which may not generalize well. Results: We present Geometricus, a novel and universally applicable approach to embedding proteins in a fixed-dimensional space. The approach is fast, accurate, and interpretable. Geometricus uses a set of 3D moment invariants to discretize fragments of protein structures into shape-mers, which are then counted to describe the full structure as a vector of counts. We demonstrate the applicability of this approach in various tasks, ranging from fast structure similarity search, unsupervised clustering and structure classification across proteins from different superfamilies as well as within the same family. Availability and implementation: Python code available at https://git.wur.nl/durai001/geometricus.
49

Quadeer, Ahmed A., David Morales-Jimenez, and Matthew R. McKay. "RocaSec: a standalone GUI-based package for robust co-evolutionary analysis of proteins." Bioinformatics 36, no. 7 (2019): 2262–63. http://dx.doi.org/10.1093/bioinformatics/btz890.

Abstract:
Summary: Patterns of mutational correlations, learnt from protein sequences, have been shown to be informative of co-evolutionary sectors that are tightly linked to functional and/or structural properties of proteins. Previously, we developed a statistical inference method, robust co-evolutionary analysis (RoCA), to reliably predict co-evolutionary sectors of proteins, while controlling for statistical errors caused by limited data. RoCA was demonstrated on multiple viral proteins, with the inferred sectors showing close correspondences with experimentally-known biochemical domains. To facilitate seamless use of RoCA and promote more widespread application to protein data, here we present a standalone cross-platform package ‘RocaSec’ which features an easy-to-use GUI. The package only requires the multiple sequence alignment of a protein for inferring the co-evolutionary sectors. In addition, when information on the protein biochemical domains is provided, RocaSec returns the corresponding statistical association between the inferred sectors and biochemical domains. Availability and implementation: The RocaSec software is publicly available under the MIT License at https://github.com/ahmedaq/RocaSec. Supplementary information: Supplementary data are available at Bioinformatics online.
50

Zervou, Michaela Areti, Effrosyni Doutsi, Pavlos Pavlidis, and Panagiotis Tsakalides. "Structural classification of proteins based on the computationally efficient recurrence quantification analysis and horizontal visibility graphs." Bioinformatics 37, no. 13 (2021): 1796–804. http://dx.doi.org/10.1093/bioinformatics/btab407.

Abstract:
Motivation: Protein structural class prediction is one of the most significant problems in bioinformatics, as it has a prominent role in understanding the function and evolution of proteins. Designing a computationally efficient but at the same time accurate prediction method remains a pressing issue, especially for sequences that we cannot obtain a sufficient amount of homologous information from existing protein sequence databases. Several studies demonstrate the potential of utilizing chaos game representation along with time series analysis tools such as recurrence quantification analysis, complex networks, horizontal visibility graphs (HVG) and others. However, the majority of existing works involve a large amount of features and they require an exhaustive, time consuming search of the optimal parameters. To address the aforementioned problems, this work adopts the generalized multidimensional recurrence quantification analysis (GmdRQA) as an efficient tool that enables to process concurrently a multidimensional time series and reduce the number of features. In addition, two data-driven algorithms, namely average mutual information and false nearest neighbors, are utilized to define in a fast yet precise manner the optimal GmdRQA parameters. Results: The classification accuracy is improved by the combination of GmdRQA with the HVG. Experimental evaluation on a real benchmark dataset demonstrates that our methods achieve similar performance with the state-of-the-art but with a smaller computational cost. Availability and implementation: The code to reproduce all the results is available at https://github.com/aretiz/protein_structure_classification/tree/main. Supplementary information: Supplementary data are available at Bioinformatics online.
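For readers unfamiliar with the horizontal visibility graph named in this abstract, the following minimal Python sketch builds one from a numeric series using the standard definition: two points are linked when every value strictly between them is lower than both. It is not taken from the authors' repository; the function names and the toy series are hypothetical, and the chaos game representation and GmdRQA steps are not shown.

```python
def horizontal_visibility_edges(series):
    """Return HVG edges (i, j), i < j, for a sequence of numbers.

    Points i and j are connected when all values strictly between
    them are smaller than min(series[i], series[j]).
    """
    edges = []
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")  # max of values between i and j
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.append((i, j))
            highest_between = max(highest_between, series[j])
            # Once a value reaches series[i], nothing further is visible.
            if highest_between >= series[i]:
                break
    return edges


def degree_sequence(series):
    """Node degrees of the HVG, a simple feature vector for a series."""
    degrees = [0] * len(series)
    for i, j in horizontal_visibility_edges(series):
        degrees[i] += 1
        degrees[j] += 1
    return degrees


if __name__ == "__main__":
    toy_series = [3, 1, 2, 4]  # made-up values for illustration
    print(horizontal_visibility_edges(toy_series))
    print(degree_sequence(toy_series))
```

Summary statistics of the degree sequence are one common way to turn such a graph into classifier features; whether the paper uses exactly these features is not stated in the abstract.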