Dissertations / Theses: 'Protein Sequence Analysis'

1

Abhiman, Saraswathi. "Prediction of function shift in protein families." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-869-X/.

2

Parsons, Jeremy David. "Computer analysis of molecular sequences." Thesis, University of Cambridge, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.282922.

3

Boscott, Paul Edmond. "Sequence analysis in protein structure prediction." Thesis, University of Oxford, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.386870.

4

Hollich, Volker. "Orthology and protein domain architecture evolution." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-783-9/.

5

Lassmann, Timo. "Algorithms for building and evaluating multiple sequence alignments." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-887-8/.

6

Tuason, Maria Clarita. "Functional analysis of Proteolipid Protein regulatory sequence." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=101805.

Abstract:

Myelin is an evolutionarily late acquisition of the vertebrate nervous system that speeds electrochemical signaling in mature nerve fibers by providing insulation in the form of a lipid-rich multilamellar sheath. Proteolipid Protein (PLP) is the most abundant protein in mature mammalian central nervous system myelin, where it serves as a structural component in addition to other, as yet undefined, roles. It is coordinately regulated at the transcriptional level with other myelin genes such as Myelin Basic Protein (MBP). The major components of MBP transcriptional regulation have been defined using a strategy of targeted transgenesis at the hypoxanthine phosphoribosyl transferase (HPRT) locus, which allows quantitative and qualitative in vivo analysis of the transcriptional control exerted by conserved non-protein-coding sequences in transgenic mice. This study describes the localization and functional characterization of PLP conserved non-protein-coding sequences. Conservation was surveyed using alignments of genome sequences from a number of vertebrates at varying evolutionary distances from mouse. Aligned sequences were also scanned for clusters of conserved consensus binding sites for Sox10, Krox20, Gtx and bHLH transcription factors, which play key roles in nervous system development. Dissection of the conserved non-protein-coding sequence produced a series of 10 reporter constructs for locating PLP regulatory elements. This series includes highly conserved regions, some of which contain clusters of transcription factor consensus sites, as well as less conserved regions that previous investigations suggested have regulatory activity. Notably, in vivo evidence of the importance of intron 1 for expression in the nervous system led to subsequent deletion-transfection analyses revealing a seemingly potent enhancer within the intron, the antisilencer/enhancer (ASE), which is functionally validated in this study.
Of this series, 8 of the selected regions have been amplified successfully and cloned into HPRT targeting constructs with a minimal hsp promoter and LacZ reporter. All 8 constructs have been transfected into ES cells. Homologous recombinants with the transgene docked at HPRT were selected, and chimeras have been analyzed for 3 of these constructs. To our surprise, neither a construct containing 2 kb of 5' flanking sequence nor a construct containing the highly conserved intron 3 was able to drive expression at the peak of myelination. A construct containing the ASE was unexpectedly shown to drive expression in cells scattered within the central grey matter of the spinal cord, in a pattern intriguingly similar to that seen embryonically from migrating oligodendrocyte progenitors. The lack of expression from the first 2 constructs suggests that PLP regulatory elements may be interdependent. We anticipate that delivery into the germline, followed by developmental analysis of the full series of constructs, will shed light on the emerging picture of partnerships between regulatory elements and will also reveal the identity of the cells driving expression from the ASE. Understanding PLP transcriptional control may lead to therapeutic interventions, as the associated diseases result predominantly from imbalances in gene dosage that lead to abnormal levels of PLP protein.

7

Russell, Robert Bruce. "Computer analysis of protein sequence and structure." Thesis, University of Oxford, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.358736.

8

Hamby, Stephen Edward. "Data mining techniques for protein sequence analysis." Thesis, University of Nottingham, 2010. http://eprints.nottingham.ac.uk/11498/.

Abstract:

This thesis concerns two areas of bioinformatics related by their role in protein structure and function: protein structure prediction and post-translational modification of proteins. The dihedral angles Ψ and Φ are predicted using support vector regression. For the prediction of Ψ dihedral angles, the addition of structural information is examined, as is the normalisation of the Ψ and Φ dihedral angles. An application of the dihedral angles is then investigated. The relationship between dihedral angles and three-bond J couplings determined from NMR experiments is described by the Karplus equation. We investigate the determination of the correct solution of the Karplus equation using predicted Φ dihedral angles. Glycosylation is an important post-translational modification of proteins involved in many different facets of biology. The work here investigates the prediction of N-linked and O-linked glycosylation sites using the random forest machine learning algorithm and pairwise patterns in the data. This methodology produces more accurate results than state-of-the-art prediction methods. The black-box nature of random forest is addressed by using the Trepan algorithm to generate a decision tree with comprehensible rules that represents the decision-making process of the random forest. The predictions of our program GPP do not distinguish between glycans at a given glycosylation site. We therefore use farthest-first clustering, with the idea of classifying each glycosylation site by the sugar linking the glycan to the protein. This thesis demonstrates the prediction of protein backbone torsion angles and improves the current state of the art for the prediction of glycosylation sites. It also investigates potential applications and the interpretation of these methods.
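The idea of using a predicted Φ angle to choose among the solutions of the Karplus equation can be sketched as follows. This is a minimal illustration rather than the thesis's implementation: it assumes the common form J(Φ) = A cos²(Φ − 60°) + B cos(Φ − 60°) + C for the HN-Hα coupling, and the coefficient values, the grid step and all function names are assumptions made for this example.

```python
import math

# Illustrative Karplus coefficients in Hz and phase offset in degrees.
# These values are assumptions for the sketch, not taken from the thesis.
A, B, C = 6.51, -1.76, 1.60
PHASE = -60.0


def karplus(phi_deg):
    """Three-bond J coupling (Hz) predicted for a dihedral angle phi (degrees)."""
    theta = math.radians(phi_deg + PHASE)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C


def candidate_phis(j_obs, step=0.1):
    """All phi in [-180, 180) whose Karplus value matches j_obs.

    Scans a grid for sign changes of karplus(phi) - j_obs and refines
    each crossing by linear interpolation.
    """
    roots = []
    n = int(round(360.0 / step))
    prev_phi = -180.0
    prev_val = karplus(prev_phi) - j_obs
    for i in range(1, n + 1):
        phi = -180.0 + i * step
        val = karplus(phi) - j_obs
        if prev_val == 0.0:
            roots.append(prev_phi)
        elif prev_val * val < 0.0:
            roots.append(prev_phi + step * prev_val / (prev_val - val))
        prev_phi, prev_val = phi, val
    return roots


def angular_distance(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def pick_phi(j_obs, phi_predicted):
    """Choose the Karplus solution closest to a predicted phi, or None."""
    roots = candidate_phis(j_obs)
    if not roots:
        return None
    return min(roots, key=lambda r: angular_distance(r, phi_predicted))
```

A measured coupling can match up to four angles, so a prediction only needs to be accurate enough to land nearest the right one; `pick_phi(karplus(-63.37), -70.0)` selects the solution near −63.4° rather than the three alternatives.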

9

Maccallum, Robert Matthew. "Computational analysis of protein sequence and structure." Thesis, University College London (University of London), 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.285202.

10

Katti, M. V. "Analysis of simple sequence repeats in genome and protein sequences and development of computational tools for comparative promoter sequence analysis." Thesis (Ph.D.), CSIR-National Chemical Laboratory, Pune, 2001. http://dspace.ncl.res.in:8080/xmlui/handle/20.500.12252/2323.

11

Jonsson, Andreas. "Mass spectrometry in protein structure analysis." Stockholm, 2001. http://diss.kib.ki.se/2001/91-628-4716-3/.

12

Parry-Smith, David John. "Algorithms and data structures for protein sequence analysis." Thesis, University of Leeds, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.277404.

13

Chivian, Dylan Casey. "Application of information from homologous proteins for the prediction of protein structure." Thesis, Connect to this title online; UW restricted, 2005. http://hdl.handle.net/1773/9264.

14

Giorgini, Flaviano. "Functional analysis of the murine sequence-specific RNA binding protein MSY4." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/10293.

15

Ganapathy, Ashwin. "Computational analysis of protein identification using peptide mass fingerprinting approach." free to MU campus, to others for purchase, 2004. http://wwwlib.umi.com/cr/mo/fullcit?p1426056.

16

Oppermann, Madalina. "Chemical and mass spectrometrical methods in protein analysis." Stockholm, 2000. http://diss.kib.ki.se/2000/91-628-4542-x/.

17

Gilbert, Richard James. "Novel programs for protein sequence analysis and structure prediction." Thesis, University of Oxford, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.305431.

18

Sonnhammer, Erik Leonard Laage. "Classification of protein domain families for genomic sequence analysis." Thesis, Open University, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.336799.

19

Tångrot, Jeanette. "Structural Information and Hidden Markov Models for Biological Sequence Analysis." Doctoral thesis, Umeå universitet, Institutionen för datavetenskap, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1629.

Abstract:

Bioinformatics is a fast-developing field that uses computational methods to analyse and structure biological data. An important branch of bioinformatics is structure and function prediction of proteins, which is often based on finding relationships to already characterized proteins. It is known that two proteins with very similar sequences also share the same 3D structure. However, there are many proteins with similar structures that have no clear sequence similarity, which makes it difficult to find these relationships. In this thesis, two methods for annotating protein domains are presented, one aiming at assigning the correct domain family or families to a protein sequence, and the other aiming at fold recognition. Both methods use hidden Markov models (HMMs) to find related proteins, and both exploit the fact that structure is more conserved than sequence, but in two different ways. Most of the research presented in the thesis focuses on the structure-anchored HMMs, saHMMs. For each domain family, an saHMM is constructed from a multiple structure alignment of carefully selected representative domains, the saHMM-members. These saHMM-members are collected in the so-called "midnight ASTRAL set", and are chosen so that all saHMM-members within the same family have mutual sequence identities below a threshold of about 20%. In order to construct the midnight ASTRAL set and the saHMMs, a pipeline of software tools was developed. The saHMMs are shown to detect the correct family relationships with very high accuracy, and to perform better than the standard tool Pfam in assigning the correct domain families to new domain sequences. We also introduce the FI-score, which is used to measure the performance of the saHMMs in order to select the optimal model for each domain family. The saHMMs are made available for searching through the FISH server, and can be used for assigning family relationships to protein sequences.
The other approach presented in the thesis is secondary structure HMMs (ssHMMs). These HMMs are designed to use both the sequence and the predicted secondary structure of a query protein when scoring it against the model. A rigorous benchmark is used, which shows that HMMs made from multiple sequences result in better fold recognition than those based on single sequences. Adding secondary structure information to the HMMs improves the ability of fold recognition further, both when using true and predicted secondary structures for the query sequence.
Bioinformatics is a field in which computational and statistical methods are used to analyse and structure biological data. An important area of bioinformatics tries to predict the three-dimensional structure and function of a protein from its amino acid sequence and/or its similarities to other, already characterized, proteins. It is known that two proteins with similar amino acid sequences also have similar three-dimensional structures. That two proteins have similar structures does not, however, necessarily mean that their sequences are similar, which can make it difficult to find structural similarities from a protein's amino acid sequence. This thesis describes two methods for finding similarities between proteins: one focuses on determining which family of protein domains, with known 3D structure, a given sequence belongs to, while the other tries to predict a protein's fold, i.e. to give a rough picture of the protein's structure. Both methods use hidden Markov models (HMMs), a statistical method that can be used, among other things, to describe protein families. With the help of an HMM, one can predict whether a given protein sequence belongs to the family the model represents. Both methods also use structural information to increase the models' ability to recognize related sequences, but in different ways. Most of the work in the thesis concerns structure-anchored HMMs (saHMMs). The saHMMs are built from structure-based sequence alignments, which are generated from how the protein domains can be superimposed in space rather than from which amino acids make up their sequences. In each protein family, only a special, representative selection of domains is used. These are chosen so that, when the sequences are compared pairwise, no pair within the family has a sequence identity higher than about 20%. This selection is made to obtain as wide a spread as possible among the sequences within the family.
A software suite has been developed to select representatives for each family and then build saHMMs based on them. The saHMMs turn out to find the right family for a high proportion of the tested sequences, with almost no errors. They are also better than the widely used method Pfam at finding the right family for entirely new protein sequences. The saHMMs are available through the FISH server, which anyone can use over the Internet to find which family a protein of interest may belong to. The other method presented in the thesis is secondary structure HMMs (ssHMMs), which are built from ordinary multiple sequence alignments, but also from information about the secondary structures of the protein sequences in the family. When a protein sequence is compared with the ssHMM, a prediction of its secondary structure is used, and the calculated probability that the sequence belongs to the family is based on both the amino acid sequence and the secondary structure. A comparison shows that HMMs based on several sequences are better than those based on a single sequence at finding the right fold for a protein sequence. The HMMs become even better if the secondary structure is also taken into account, both when the true secondary structure is used and when a theoretically predicted one is used.
Jeanette Hargbo.
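The selection rule described in the abstract above, where no two representatives in a family share more than about 20% sequence identity, can be illustrated with a simple greedy filter. This is a sketch under stated assumptions, not the thesis's software pipeline: the sequences are assumed to be already aligned to equal length with `-` as the gap character, identity is counted over columns where neither sequence has a gap, and the function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def select_representatives(aligned, threshold=20.0):
    """Greedily keep domains whose identity to every kept domain is below threshold.

    `aligned` maps a domain name to its aligned sequence; insertion order
    decides which member of a similar pair is kept.
    """
    chosen = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[kept]) < threshold for kept in chosen):
            chosen.append(name)
    return chosen
```

A greedy pass depends on input order and does not guarantee the largest possible set; it only guarantees that every pair in the result is below the threshold.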

20

Nilsson, Johan. "Membrane protein topology: prediction, experimental mapping and genome-wide analysis." Stockholm, 2004. http://diss.kib.ki.se/2004/91-7349-963-3/.

21

Wang, Kai. "Novel computational methods for accurate quantitative and qualitative protein function prediction." Thesis, Connect to this title online; UW restricted, 2005. http://hdl.handle.net/1773/11488.

22

Reinhardt, Astrid. "Neural network-based methods for large scale protein sequence analysis." Thesis, University of Cambridge, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.624141.

23

Tubiana, Jérôme. "Restricted Boltzmann machines: from compositional representations to protein sequence analysis." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE039/document.

Abstract:

Restricted Boltzmann machines (RBMs) are graphical models that jointly learn a probability distribution and a representation of data. Despite their relatively simple architecture, they can faithfully reproduce complex data distributions such as the MNIST database of handwritten digits. Moreover, they are empirically known to learn compositional representations of data, i.e. representations that decompose configurations into their constitutive parts. However, not all variants of RBM perform equally well, and few theoretical arguments exist to explain these empirical observations. In the first part of this thesis, we ask how such a simple model can learn such complex probability distributions and representations. By analyzing an ensemble of RBMs with random weights using the replica method, we have characterised a compositional regime for RBMs and shown under which conditions (statistics of weights, choice of transfer function) it can and cannot arise. Both the qualitative and the quantitative predictions of this theoretical analysis agree with observations from RBMs trained on real data. In the second part, we present an application of RBMs to protein sequence analysis and design. Owing to their large size, proteins are very difficult to simulate physically, which makes it hard to predict their structure and function. It is, however, possible to infer information about a protein's structure from the way its sequence varies across organisms. For instance, two sites with strongly correlated mutations are often physically close in the structure, and graphical models such as Boltzmann machines can leverage these correlations to predict the spatial proximity of amino acids in the sequence. Here, we have shown on several synthetic and real protein families that, provided a compositional regime is enforced, RBMs can go beyond structure and extract extended motifs of coevolving amino acids that reflect phylogenetic, structural and functional constraints within proteins. Moreover, RBMs can be used to design new protein sequences with putative functional properties by recombining these motifs at will. Lastly, we have designed new training algorithms and model parametrizations that significantly improve RBM generative performance, to the point where it can compete with state-of-the-art generative models such as Generative Adversarial Networks or Variational Autoencoders on medium-scale data.
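For readers unfamiliar with the model, the sketch below shows how an RBM couples visible and hidden units and why the hidden units are easy to infer given the data. It assumes the textbook RBM with binary visible and hidden units and energy E(v, h) = −Σ aᵢvᵢ − Σ bⱼhⱼ − Σ vᵢWᵢⱼhⱼ; the thesis studies other transfer functions and parametrizations as well, so this is only the simplest case, and every name in the code is illustrative.

```python
import itertools
import math


def energy(v, h, W, a, b):
    """Energy of a joint configuration (v, h) of a binary RBM."""
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(bj * hj for bj, hj in zip(b, h))
    coupling = sum(v[i] * W[i][j] * h[j]
                   for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - coupling


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, b):
    """P(h_j = 1 | v) in closed form: hidden units are independent given v."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def hidden_probs_bruteforce(v, W, a, b):
    """The same conditional, obtained by summing Boltzmann weights over all h."""
    states = list(itertools.product([0, 1], repeat=len(b)))
    weights = [math.exp(-energy(v, h, W, a, b)) for h in states]
    z = sum(weights)
    return [sum(w for w, h in zip(weights, states) if h[j] == 1) / z
            for j in range(len(b))]
```

The two functions agree because the energy contains no hidden-hidden terms, which is the restriction that gives the model its name; the brute-force version is exponential in the number of hidden units and is included only as a check.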

24

Gane, Paul J. "A sequence, structure and electrostatic analysis of the disulphide oxidoreductases." Thesis, University of Kent, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.242888.

25

Russell, Rodney S. "Novel RNA and protein sequences involved in dimerization and packaging of HIV-1 genomic RNA." Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=85092.

Abstract:

During HIV-1 assembly, the Gag structural protein specifically encapsidates two copies of viral genomic RNA in the form of a dimer. An RNA stem-loop structure (SL1) in the 5' untranslated region, known as the dimerization initiation site (DIS), is important for dimerization and packaging of HIV-1 genomic RNA; however, the mechanisms involved are not fully understood. The major goal of this PhD study was to further understand HIV-1 RNA dimerization, and to study the role of the Gag protein in the dimerization and packaging processes. Despite the known involvement of the DIS in RNA dimerization, DIS-mutated viruses still contain significant levels of dimerized RNA, and electron microscopy studies suggest that the RNA molecules are linked at the extreme 5' end. We show here that RNA sequences on both sides of the DIS are also required for HIV-1 genome dimerization, suggesting that multiple RNA elements are involved. We have also examined the contribution of specific amino acids within Gag to the dimerization and packaging processes. Previous work showed that partial deletion of the DIS impaired viral replication capacity, but that this could largely be corrected by compensatory point mutations within Gag. To further elucidate the mechanism(s) of these compensatory mutations, we generated DIS mutants lacking the entire SL1, or only the SL1 loop sequences, and combined these deletions with various combinations of compensatory mutations. Analysis of virion-derived RNA showed that the relevant mutant viruses contained increased levels of spliced viral RNA compared to wild type, indicating that a defect in genome packaging specificity was present. However, this defect was corrected by our compensatory mutations, and a T12I substitution in p2 was shown to be solely responsible for this activity. These results suggest that the p2 spacer peptide plays a critical role in the specific packaging of viral genomic RNA. In summary, these findings provide new insig

26

Johnstone, Pamela. "Cloning and sequence analysis of rubella virus nonstructural protein coding region." Thesis, University of Surrey, 1994. http://epubs.surrey.ac.uk/844437/.

Abstract:

A reliable working methodology for the reverse transcription (RT) and polymerase chain reaction (PCR) amplification of rubella virus (RV) RNA was established. The effect of magnesium concentration and RNA concentration on the yield and specificity of PCR products was investigated. Factors involved in the design of efficient primers for PCR were also studied. RT primers designed to anneal specifically to the RV genome were shown to increase the yield of PCR product compared with RT-PCRs in which the RT reaction was primed by random hexamers. Using the RT-PCR technology, nonstructural (NS) protein coding regions of the wild-type strain Thomas and the vaccine strain Cendehill were amplified, cloned and sequenced. In addition, a region encompassing part of the 5' NC region and the start of the NS protein ORF, covering nucleotides 18 to 540, was amplified, cloned and sequenced for the wild-type strains Thomas, RB-1 and Machado and the vaccine strains Cendehill, RA27/3, HPV77.DE5 and TO-336. When the Cendehill and Thomas sequences were compared with the equivalent sequences in the Therien and M33 wild-type strains, three amino acids were found that were unique to the Cendehill vaccine strain. The sequences of part of the 5' NC and the 5' end of the NS coding regions of the above strains were compared with the equivalent sequences in the Therien and M33 strains. One amino acid substitution was found that was unique to RA27/3, and a second was identified that was present in both the RA27/3 and TO-336 vaccine strains. Nucleotide substitutions were also identified in an area of the 5' NC region that has been suggested to play a key role in the initiation of translation and positive-strand replication. The importance of all of these substitutions is discussed with particular reference to their possible roles in attenuation.
The suitability of the NS RV RT-PCR system developed in the early stages of these studies was examined with regard to its use in the amplification and detection of RV in clinical samples. The results obtained were in total agreement with those obtained using an RT-PCR system that detects the E1 RV gene, and also correlated well with other laboratory results. Possible future applications of the NS RV RT-PCR system are discussed, and the results of this thesis are placed in the context of possible future molecular biology studies in this field.

27

Di Domenico, Tomás. "Computational Analysis and Annotation of Proteome Data: Sequence, Structure, Function and Interactions." Doctoral thesis, Università degli studi di Padova, 2014. http://hdl.handle.net/11577/3423805.

Abstract:

With the advent of modern sequencing technologies, the amount of biological data available has begun to challenge our ability to process it. The development of new tools and methods has become essential for producing results from such a vast amount of information. This thesis focuses on the development of such computational tools and methods for the study of protein data. I first present the work done towards the understanding of intrinsic protein disorder. Through the development of novel disorder predictors, we were able to expand the available data sources to cover any protein of known sequence. By storing these predicted annotations together with data from other sources, we created MobiDB, a resource that provides a comprehensive view of the available disorder annotations for a protein of interest, covering all sequences in the UniProt database. Based on observations obtained from this resource, we proceeded to create a data analysis workflow with the goal of furthering our understanding of intrinsic protein disorder. The second part focuses on tandem repeat proteins. The RAPHAEL method was developed to assist in the identification of tandem repeat protein structures from PDB files. Identified repeat structures were then manually classified according to a formal classification schema and published as part of the RepeatsDB database. Finally, I describe the development of network-based tools for the analysis of protein data. RING allows the user to visualise and study the structure of a protein as a network of nodes linked by physico-chemical properties. The second method, PANADA, enables the user to create protein similarity networks and to assess the transferability of functional annotations between clusters of proteins.

28

Wistrand, Markus. "Hidden Markov models for remote protein homology detection." Stockholm, 2005. http://diss.kib.ki.se/2006/91-7140-598-4/.

29

Kosuk, Nicholas L. "Topological analysis of the F plasmid encoded TraD protein." Thesis, Connect to this title online; UW restricted, 1997. http://hdl.handle.net/1773/10244.

30

Chen, Sharon S. "Peptide sequence assignments by probabilistic peptide profile matching to an annotated peptide database." Thesis, Connect to this title online; UW restricted, 2005. http://hdl.handle.net/1773/8084.

31

Zhao, Zhiyu. "Robust and Efficient Algorithms for Protein 3-D Structure Alignment and Genome Sequence Comparison." ScholarWorks@UNO, 2008. http://scholarworks.uno.edu/td/851.

Abstract:

Sequence analysis and structure analysis are two of the fundamental areas of bioinformatics research. This dissertation discusses protein structure problems, including protein structure alignment and query, and genome sequence problems, including haplotype reconstruction and genome rearrangement. It first presents an algorithm for pairwise protein structure alignment that is tested with structures from the Protein Data Bank (PDB). In many cases it outperforms two other well-known algorithms, DaliLite and CE. The preliminary algorithm is a graph-theory based approach, which uses the concept of "stars" to reduce the complexity of clique-finding algorithms. The algorithm is then improved by introducing "double-center stars" in the graph and applying a self-learning strategy. The updated algorithm is tested with a much larger set of protein structures and shown to be an improvement in accuracy, especially in cases of weak similarity. A protein structure query algorithm is designed to search for similar structures in the PDB, using the improved alignment algorithm. It is compared with SSM and shows better performance with lower maximum and average Q-score for missing proteins. An interesting problem dealing with the calculation of the diameter of a 3-D sequence of points arose, and its connection to sublinear time computation is discussed. The diameter calculation of a 3-D sequence is approximated by a series of sublinear time deterministic, zero-error and bounded-error randomized algorithms, and a series of separations about the power of sublinear time computations is obtained. This dissertation also discusses two genome sequence related problems. A probabilistic model is proposed for reconstructing haplotypes from SNP matrices with incomplete and inconsistent errors. The experiments with simulated data show both high accuracy and speed, conforming to the theoretically provable efficiency and accuracy of the algorithm.
Finally, a genome rearrangement problem is studied. The concept of non-breaking similarity is introduced. Approximating the exemplar non-breaking similarity to within a factor of n^(1-ε) is proven to be NP-hard. Interestingly, for several practical cases, polynomial time algorithms are presented.

32

Hsieh, Jui-Cheng. "Structure-function analysis of the bacteriophage PRD1 DNA terminal protein: Nucleotide sequence, overexpression, and site-directed mutagenesis of the terminal protein gene." Diss., The University of Arizona, 1990. http://hdl.handle.net/10150/184974.

Abstract:

The nucleotide sequence of the PRD1 terminal protein gene has been determined. The coding region for PRD1 terminal protein is 777 base pairs long and encodes 259 amino acid residues (29,326 daltons). The deduced amino acid sequence of PRD1 terminal protein reveals no overall homology with other known terminal proteins or related proteins. A closer examination revealed that a highly conserved amino acid sequence, YSRLRT, exists among all identified DNA terminal proteins, including those of PRD1, PZA, Nf, φ29 and adenovirus. This is the first conserved amino acid sequence that has been found in all identified DNA terminal proteins. Not only is the YSRLRT sequence conserved, but its spatial location is similar as well. The significance of the YSRLRT sequence is therefore suggested by both its conserved spatial location and its high degree of homology across species. To study the structure-function relationship of the YSRLRT sequence of PRD1 terminal protein, in vitro site-directed mutagenesis was performed to determine the role of each amino acid in this conserved region. The PRD1 terminal protein and DNA polymerase genes were cloned into phagemid pEMBLex3, and the recombinant plasmid was used for constructing mutants. Eleven PRD1 terminal protein mutant clones were examined for their priming complex formation activities. Our results strongly indicate that the positively charged residue arginine-174 plays an important role in PRD1 terminal protein function. There are 13 tyrosine residues in the predicted PRD1 terminal protein. It was of interest to know which tyrosine is actually linked to the terminal nucleotide of the PRD1 DNA. We used a new approach involving replacing the tyrosine residues with phenylalanine residues in the carboxyl terminal portion of the protein. From these analyses, tyrosine-190 has been determined to be the most likely linkage site between terminal protein and PRD1 DNA.
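
The sizes quoted in this abstract can be cross-checked with a little arithmetic. The Python sketch below is an illustration only and not part of the thesis; the function name `codons_in` is hypothetical, and the sketch assumes that the 777 bp figure counts sense codons only (no stop codon), which is what 259 residues would imply.

```python
# Figures quoted in the abstract.
CODING_REGION_BP = 777   # length of the coding region in base pairs
RESIDUES = 259           # amino acid residues encoded
MASS_DA = 29_326         # molecular mass in daltons


def codons_in(length_bp: int) -> int:
    """Return the number of complete codons in a coding region.

    Hypothetical helper: raises if the length is not a multiple of three,
    since a coding region is read in triplets.
    """
    if length_bp % 3 != 0:
        raise ValueError("coding length is not a multiple of 3")
    return length_bp // 3


codon_count = codons_in(CODING_REGION_BP)

# Average mass per residue, a rough plausibility check on the quoted mass.
mean_residue_mass = MASS_DA / RESIDUES

print(codon_count, round(mean_residue_mass, 1))
```

Under that assumption the codon count matches the residue count, and the mean residue mass of roughly 113 Da is in the range usually seen for proteins.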

33

Dubey, Anshul. "Search and Analysis of the Sequence Space of a Protein Using Computational Tools." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/14115.

Abstract:

A new approach to the process of Directed Evolution is proposed, which utilizes different machine learning algorithms. Directed Evolution is a process of improving a protein for catalytic purposes by introducing random mutations in its sequence to create variants. Through these mutations, Directed Evolution explores the sequence space, which is defined as all the possible sequences for a given number of amino acids. Each variant sequence is assigned to one of two classes, positive or negative, according to its activity or stability. By employing machine learning algorithms for feature selection on the sequences of these variants, the attributes, or amino acids, that are important for the classification into positive or negative can be identified. Support Vector Machines (SVMs) were utilized to identify the important individual amino acids of a protein, which have to be preserved to maintain its activity. The results for the case of beta-lactamase show that such residues can be identified with high accuracy while using a small number of variant sequences. Another class of machine learning problems, Boolean Learning, was used to extend this approach to identifying interactions between the different amino acids in a protein's sequence using the variant sequences. It was shown through simulations that such interactions can be identified for any protein with a reasonable number of variant sequences. For experimental verification of this approach, two fluorescent proteins, mRFP and DsRed, were used to generate variants, which were screened for fluorescence. Using Boolean Learning, an interacting pair was identified, which was shown to be important for the fluorescence. It was also shown through experiments and simulations that knowing such pairs can increase the fraction of active variants in the library.
A Boolean Learning algorithm was also developed for this application, which can learn Boolean functions from data in the presence of classification noise.

34

Cote, Marie-Jose. "The human parainfluenza virus 3 fusion protein: Cloning, mapping, sequence analysis and expression." Thesis, University of Ottawa (Canada), 1989. http://hdl.handle.net/10393/20781.

35

Wang, Xiao-yu. "Deduced amino acid sequence and gene sequence of microvitellogenin, a female specific hemolymph and egg protein from the tobacco hornworm, Manduca sexta." Diss., The University of Arizona, 1988. http://hdl.handle.net/10150/184329.

Abstract:

Microvitellogenin is a female-specific yolk protein from the tobacco hornworm moth Manduca sexta. A cDNA library was constructed from poly(A)⁺ RNA isolated from adult female fat body. cDNA clones of mRNA for microvitellogenin were isolated by screening the cDNA library with antiserum against microvitellogenin. The results of Northern blot analysis and hybrid selection indicated that the cDNA clone was specific for microvitellogenin. The complete nucleotide sequence of the 834 base pair cDNA insert has been determined by the dideoxy chain termination method. When the deduced amino acid sequence was compared with the N-terminal sequence determined by Edman degradation, an amino terminal extension of 17 amino acids appeared to be a signal peptide. The cDNA sequence predicts that the mature microvitellogenin is a protein of 232 amino acids with a calculated molecular weight of 26,201. A comparison of the translated amino acid sequence with the sequences in the National Biomedical Research Foundation protein library did not establish any sequence similarity with known proteins. The microvitellogenin gene begins to be expressed in the fat body of females on the first day of the wandering (prepupal) stage, as determined by using the cDNA insert as a probe to hybridize with the mRNA for microvitellogenin. The cDNA probe was also used to screen a genomic library of M. sexta, yielding three genomic clones for microvitellogenin. One of them was characterized and found to contain the complete microvitellogenin gene. The gene sequence was determined. Comparison to the cDNA sequence showed that the microvitellogenin gene contains an intron near the 5'-end of the non-coding region. The 5'-flanking sequence of the gene has been compared to the same regions of yp genes of Drosophila and vitellogenin genes of locust; some similar sequences have been observed and are discussed.

36

Lutya, Portia Thandokazi. "Expression and purification of the novel protein domain DWNN." Thesis, University of the Western Cape, 2002. http://etd.uwc.ac.za/index.php?module=etd&amp.

Abstract:

Proteins play an important role in cells, as the morphology, function and activities of a cell depend on the proteins it expresses. The key to understanding how different proteins function lies in an understanding of their molecular structure. The overall aim of this thesis was the determination of the structure of DWNN domains. This thesis describes the preparation of samples of human DWNN suitable for structural analysis by nuclear magnetic resonance spectroscopy (NMR), as well as the NMR analysis.

37

Lim, Raelene. "Analysis of Madm, a novel adaptor protein that associates with Myeloid Leukemia Factor 1." Thesis, Curtin University, 2003. http://hdl.handle.net/20.500.11937/2269.

Abstract:

Myeloid Leukemia Factor 1 (Mlf1) is the murine homolog of MLF1, which was identified as a fusion gene with Nucleophosmin (NPM) resulting from the t(3;5)(q25.1;q34) translocation associated with acute myeloid leukemia and myelodysplastic syndrome (Yoneda-Kato et al., 1996). Mlf1 was independently isolated using cDNA representational difference analysis to identify genes up-regulated when an erythroleukemic cell line underwent a lineage switch to display a monoblastoid phenotype (Williams et al., 1999). Mlf1 has been shown to enhance myeloid differentiation and suppress erythroid differentiation; however, its mechanism of action is unknown. A yeast two-hybrid screen was employed to identify Mlf1-interacting proteins. This screen isolated a number of known proteins, as well as several novel molecules, that bound Mlf1. One of these was 14-3-3ζ, a member of a family of molecules that bind phosphoserine motifs and regulate the subcellular localization of partner proteins. Mlf1 contains a classic RSXSXP sequence for 14-3-3 binding and associated with 14-3-3ζ via this phosphorylated motif (Lim et al., 2002). The aim of this thesis was to characterise a novel Mlf1-interacting protein that had some homology to protein kinases and was named Mlf1 Adaptor Molecule (Madm). Adaptor proteins are molecules that possess no enzymatic or transcriptional activity, but instead mediate protein-protein interactions. Madm is encoded by a gene consisting of 18 exons, and promoter analysis suggested Madm expression might be widespread; indeed, Northern blotting of adult tissues and in situ hybridization of embryos demonstrated ubiquitous Madm expression. Significantly, the Madm protein sequence is highly conserved across diverse species.
Madm formed dimers, and although it contains a kinase-like domain, the protein lacks several critical residues required for catalytic activity, including an ATP-binding site. Purification of recombinant Madm revealed that the protein was not a kinase; however, studies in mammalian cells showed that Madm associated with a kinase and that Madm was phosphorylated on serine residues in vivo and in vitro. Madm also contains a nuclear localization sequence and a nuclear export sequence and was shown to localise to both cytoplasm and nucleus by subcellular fractionation and confocal microscopy. The presence of two nuclear receptor binding motifs (consensus MILL) suggests that Madm may have a functional role in the nucleus. Madm co-immunoprecipitated with Mlf1, and the two proteins co-localized in the cytoplasm. In addition, the Madm-associated kinase phosphorylated Mlf1 on serine residues, including the RSXSXP motif. In contrast to wild-type Mlf1, the oncogenic fusion protein NPM-MLF1 did not bind 14-3-3ζ and localized exclusively in the nucleus. Although Madm co-immunoprecipitated with NPM-MLF1, the binding mechanism was altered. As Mlf1 is able to reprogram erythroleukemic cells to display a monoblastoid phenotype and potentiate myeloid maturation (Williams et al., 1999), the effects of Madm on myeloid differentiation were investigated. However, unlike Mlf1, ectopic expression of Madm in M1 myeloid cells suppressed cytokine-induced differentiation.
In summary, the data presented in this thesis report on the cloning and characterization of a novel adaptor protein that is involved in the phosphorylation of the proto-oncoprotein Mlf1. Phosphorylation of Mlf1 is likely to affect its interaction with other proteins, such as 14-3-3ζ. Complex formation, therefore, may well alter the localization of Mlf1 and Madm, and influence hematopoietic differentiation.

38

Bresell, Anders. "Characterization of protein families, sequence patterns, and functional annotations in large data sets." Doctoral thesis, Linköping : Department of Physics, Chemistry and Biology, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10565.

39

Boscariol, Rya. "Studies on ovine CD4 : genomic sequence analysis and protein cleavage studies with cathepsin proteases." Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=81601.

Abstract:

Here we report the expression and purification of two recombinant Fasciola hepatica enzymes, catL2 and catL5, which were used to perform cleavage studies with substrates potentially encountered by the parasite in vivo: BSA, hIgG3K and the important T cell marker CD4. We examined the digestion products generated by the cleavage of human CD4 with catL5 using mass spectrometry and predicted candidate cleavage sites by performing a theoretical digest of the protein.
Ovine CD4 is also of interest to us as a target of F. hepatica cathepsin L activity. Here we confirm a recently reported ovine CD4 cDNA sequence and the existence of a single nucleotide polymorphism (T/C) within this sequence. The polymorphism translates to a serine-proline switch near the hinge region of the protein. Additionally, we have found that this polymorphism is also present in genomic DNA, suggesting that two alleles of CD4 exist in the ovine genome.

40

Fredriksson, Simon. "Proximity Ligation : Transforming protein analysis into nucleic acid detection through proximity-dependent ligation of DNA sequence tagged protein-binders." Doctoral thesis, Uppsala University, Department of Genetics and Pathology, 2002. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-2691.

Abstract:

A novel technology for protein detection, proximity ligation, has been developed along with improved methods for in situ synthesis of DNA microarrays. Proximity ligation enables a specific and quantitative transformation of proteins present in a sample into nucleic acid sequences. As pairs of so-called proximity probes bind the individual target protein molecules at distinct sites, these reagents are brought in close proximity. The probes consist of a protein specific binding part coupled to an oligonucleotide with either a free 3’- or 5’-end capable of hybridizing to a common connector oligonucleotide. When the probes are in proximity, promoted by target binding, then the DNA strands can be joined by enzymatic ligation. The nucleic acid sequence that is formed can then be amplified and quantitatively detected in a real-time monitored polymerase chain reaction. This convenient assay is simple to perform and allows highly sensitive protein detection. Parallel analysis of multiple proteins by DNA microarray technology is anticipated for proximity ligation and enabled by the information carrying ability of nucleic acids to define the individual proteins. Assays detecting cytokines using SELEX aptamers or antibodies, monoclonal and polyclonal, are presented in the thesis.

Microarrays synthesized in situ using photolithographic methods generate impure products due to damaged molecules and interrupted synthesis. Through a molecular inversion mechanism presented here, these impurities may be removed. At the end of synthesis, full-length oligonucleotides receive a functional group that can then be made to react with the solid support, forming an arched structure. The 3’-ends of the oligonucleotides are then cleaved, removing the impurities from the support and allowing the liberated 3’-hydroxyl to prime polymerase extension reactions from the inverted oligonucleotides. The effect of having pure oligonucleotide probes compared to ones contaminated with shorter variants was investigated in allele-specific hybridization reactions. Pure probes were shown to have a greater ability to discriminate between matched and singly mismatched targets at optimal hybridization temperatures.

41

Iqbal, Sumaiya. "Machine Learning based Protein Sequence to (un)Structure Mapping and Interaction Prediction." ScholarWorks@UNO, 2017. http://scholarworks.uno.edu/td/2379.

Abstract:

Proteins are the fundamental macromolecules within a cell that carry out most biological functions. The computational study of protein structure and function, using machine learning and data analytics, is essential to advancing life-science research, given the fast-growing volume of biological data and the complexity of analyzing it to discover meaningful insights. Mapping a protein’s primary sequence is not limited to its structure; we extend the mapping to its disordered component, known as Intrinsically Disordered Proteins or Regions (IDPs/IDRs), and hence to the dynamics involved, which help explain complex interactions within a cell that are otherwise obscured. The objective of this dissertation is to develop effective machine learning based tools to predict disordered proteins, their properties and dynamics, and their interaction paradigm by systematically mining and analyzing large-scale biological data. In this dissertation, we propose a robust framework to predict disordered proteins given only sequence information, using an optimized SVM with RBF kernel. Through appropriate reasoning, we highlight the structure-like behavior of IDPs in disease-associated complexes. Further, we develop a fast and effective predictor of the Accessible Surface Area (ASA) of protein residues, a useful structural property that defines a protein’s exposure to partners, using regularized regression with a 3rd-degree polynomial kernel function and a genetic algorithm. As a key outcome of this research, we then introduce a novel method to extract the position specific energy (PSEE) of protein residues by modeling the pairwise thermodynamic interactions and the hydrophobic effect. PSEE is found to be an effective feature in identifying the enthalpy-gain of the folded state of a protein and otherwise the neutral state of unstructured proteins.
Moreover, we study the transient peptide-protein interactions that involve the induced folding of short peptides, through disorder-to-order conformational changes, to bind to an appropriate partner. Using the stacked generalization ensemble technique, a suite of predictors is developed to identify, from protein sequence, the residue patterns of Peptide-Recognition Domains that can recognize and bind to peptide motifs and phospho-peptides with post-translational modifications (PTMs) of amino acids, which are responsible for critical human diseases. The biologically relevant case studies involved demonstrate the possibility of discovering new knowledge using the developed tools.

42

Capella, Gutiérrez Salvador Jesús 1985. "Analysis of multiple protein sequence alignments and phylogenetic trees in the context of phylogenomics studies." Doctoral thesis, Universitat Pompeu Fabra, 2012. http://hdl.handle.net/10803/97289.

Abstract:

Phylogenomics is a biological discipline which can be understood as the intersection of the fields of genomics and evolution. Its main focuses are the analysis of genomes through the evolutionary lens and the understanding of how different organisms relate to each other. Moreover, phylogenomics makes it possible to produce accurate functional annotations of newly sequenced genomes. This discipline has grown in response to the deluge of data coming from different genome projects. To achieve its objectives, phylogenomics depends heavily on the accuracy of the methods used to generate phylogenetic trees. Phylogenetic trees are the basic tool of this field and serve to represent how sequences or species relate to each other through common ancestry. During my thesis, I have centered my efforts on improving an automated pipeline to generate accurate phylogenetic trees and on their subsequent publication through a public database. Among the efforts to improve the pipeline, I have especially focused on the problem of multiple sequence alignment post-processing, which has been shown to be central to the reliability of subsequent analyses. Subsequently I have applied this pipeline, and a battery of other phylogenomics tools, to the study of the phylogenetic position of Microsporidia, a group of fast-evolving intracellular parasites. Due to their special genomic features, Microsporidia evolution constitutes one of the classical examples of challenging problems for phylogenomics. Finally, I have also used the pipeline as a part of a newly designed method for selecting robust combinations of phylogenetic gene markers. I have used this method for selecting optimal gene sets to assess the phylogenetic relationships within fungi and cyanobacteria, showing that the potential of these genes as phylogenetic markers goes well beyond the species used for their selection.
Phylogenomics is a biological discipline that can be understood as the intersection of the fields of genomics and evolution. Its area of study is the evolutionary analysis of genomes and how different species relate to each other. In addition, phylogenomics aims to functionally annotate newly sequenced genomes with high accuracy. Indeed, this discipline has grown rapidly in recent years in response to the flood of data from different genome projects. To achieve its objectives, phylogenomics depends to a large extent on the methods used to generate phylogenetic trees. Phylogenetic trees are the basic tools of phylogenomics and serve to represent how sequences and species relate to each other by descent. During my thesis, I focused my efforts on improving an automated pipeline (a set of programs executed in a controlled manner) that generates highly accurate phylogenetic trees, and on making these data available to the scientific community through a database. Among the efforts made to improve the pipeline, I focused especially on the post-processing of multiple sequence alignments prior to any analysis, since the quality of the alignment determines that of subsequent studies. In a more biological context, I used this pipeline together with other phylogenomic tools to study the phylogenetic position of Microsporidia. Given their special genomic features, the evolution of Microsporidia is one of the classic, hard-to-resolve problems in phylogenomics. Finally, I also used the pipeline as part of a new method for selecting optimal combinations of genes with potential as phylogenetic markers.
In fact, I used this method to identify sets of phylogenetic markers that allow the evolutionary relationships in Cyanobacteria and Fungi to be reconstructed with a high degree of accuracy. The most interesting aspect of this method is that it evaluates the reliability of the markers in species not used for their selection.

43

Roth, Christian [Verfasser]. "Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts / Christian Roth." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2021. http://nbn-resolving.de/urn:nbn:de:gbv:7-21.11130/00-1735-0000-0008-5912-0-2.

44

Roscoe, Benjamin P. "Analyses of All Possible Point Mutations within a Protein Reveals Relationships between Function and Experimental Fitness: A Dissertation." eScholarship@UMMS, 2014. https://escholarship.umassmed.edu/gsbs_diss/716.

Abstract:

The primary amino acid sequence of a protein governs its specific cellular functions. Since the cracking of the genetic code in the late 1950s, it has been possible to predict the amino acid sequence of a given protein from the DNA sequence of a gene. Nevertheless, the ability to predict a protein’s function from its primary sequence remains a great challenge in biology. In order to address this problem, we combined recent advances in next generation sequencing technologies with systematic mutagenesis strategies to assess the function of thousands of protein variants in a single experiment. Using this strategy, my dissertation describes the effects of most possible single point mutants in the multifunctional Ubiquitin protein in yeast. The effects of these mutants on the essential activation of ubiquitin by the ubiquitin activating protein (E1, Uba1p) as well as their effects on overall yeast growth were measured. Ubiquitin mutants defective for E1 activation were found to correlate with growth defects, although in a non-linear fashion. Further examination of select point mutants indicated that E1 activation deficiencies predict downstream defects in Ubiquitin function, resulting in the observed growth phenotypes. These results indicate that there may be selective pressure for the E1 enzyme to selectively activate ubiquitin protein variants that do not result in functional downstream defects. Additionally, I will describe the use of similar techniques to discover drug resistant mutants of the oncogenic protein BRAFV600E in human melanoma cell lines as an example of the widespread applicability of our strategy for addressing the relationship between protein function and biological fitness.
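
To give a sense of the scale of a systematic single-point-mutant library like the one this abstract describes, the Python sketch below enumerates every single amino acid substitution of a sequence. It is an illustration only, not the author's method: the function name `single_point_mutants`, the mutation label format and the six-residue toy peptide are all assumptions made for the example.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_mutants(seq: str):
    """Yield (label, mutant_sequence) for every single substitution.

    Labels use the conventional form <wild type><1-based position><mutant>,
    for example "M1A". Hypothetical helper for illustration.
    """
    for pos, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield f"{wild_type}{pos + 1}{aa}", seq[:pos] + aa + seq[pos + 1:]


# Toy peptide, chosen only to keep the example small.
toy_peptide = "MQIFVK"
mutants = dict(single_point_mutants(toy_peptide))

# Each position can be replaced by 19 other residues.
print(len(mutants), len(toy_peptide) * 19)
```

The library size grows as 19 times the sequence length, which is why a short protein already yields on the order of a thousand variants.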

45

Roscoe, Benjamin P. "Analyses of All Possible Point Mutations within a Protein Reveals Relationships between Function and Experimental Fitness: A Dissertation." eScholarship@UMMS, 2003. http://escholarship.umassmed.edu/gsbs_diss/716.

46

Duke, Jamie L. "Structural analysis of the EGR family of transcription factors : templates for predicting protein-DNA interactions /." Link to online version, 2006. https://ritdml.rit.edu/dspace/handle/1850/2296.

47

Lim, Raelene. "Analysis of Madm, a novel adaptor protein that associates with Myeloid Leukemia Factor 1." Curtin University of Technology, School of Biomedical Sciences, 2003. http://espace.library.curtin.edu.au:80/R/?func=dbin-jump-full&object_id=14294.

Abstract:

Myeloid Leukemia Factor 1 (Mlf1) is the murine homolog of MLF1, which was identified as a fusion gene with Nucleophosmin (NPM) resulting from the (3;5)(q25.1;q34) translocation associated with acute myeloid leukemia and myelodysplastic syndrome (Yoneda-Kato et al., 1996). Mlf1 was independently isolated using cDNA representational difference to identify genes up-regulated when an erythroleukemic cell line underwent a lineage switch to display a monoblastoid phenotype (Williams et al., 1999). Mlf1 has been shown to enhance myeloid differentiation and suppress erythroid differentiation; however, its mechanism of action is unknown. A yeast two hybrid screen was employed to identify Mlf1-interacting proteins. This screen isolated a number of known protein, as well as several novel molecules, that bound Mlf1. One of these was 14-3-3ξ, a member of a family of molecules that bind phosphoserine motifs and regulate the subcellular localization of partner proteins. Mlf1 contains a classic RSXSXP sequence for 14-3-3 binding and associated with 14-3-3ξ; via this phosphorylated motif (Lim et al., 2002). The aim of this thesis was to characterise a novel Mlf1-interacting protein that had some homology to protein kinases and was named Mlf1 Adaptor Molecule (Madm). Adaptor proteins are molecules that possess no enzymatic or transcriptional activity, but instead mediate protein-protein interactions. Madm is encoded by a gene consisting of 18 exons and promoter analysis suggested Madm expression might be widespread; indeed Northern blotting of adult tissues and in situ hybridization of embryos demonstrated ubiquitous Madm expression. Significantly, the Madm protein sequence is highly conserved across diverse species.
Madm formed dimers and, although it contains a kinase-like domain, the protein lacks several critical residues required for catalytic activity, including an ATP-binding site. Purification of recombinant Madm revealed that the protein was not a kinase; however, studies in mammalian cells showed that Madm associated with a kinase and that Madm was phosphorylated on serine residues in vivo and in vitro. Madm also contains a nuclear localization sequence and a nuclear export sequence and was shown to localise to both cytoplasm and nucleus by subcellular fractionation and confocal microscopy. The presence of two nuclear receptor binding motifs (consensus LXXLL) suggests that Madm may have a functional role in the nucleus. Madm co-immunoprecipitated with Mlf1 and co-localized with it in the cytoplasm. In addition, the Madm-associated kinase phosphorylated Mlf1 on serine residues, including the RSXSXP motif. In contrast to wild-type Mlf1, the oncogenic fusion protein NPM-MLF1 did not bind 14-3-3ζ and localized exclusively in the nucleus. Although Madm co-immunoprecipitated with NPM-MLF1, the binding mechanism was altered. As Mlf1 is able to reprogram erythroleukemic cells to display a monoblastoid phenotype and potentiate myeloid maturation (Williams et al., 1999), the effects of Madm on myeloid differentiation were investigated. However, unlike Mlf1, ectopic expression of Madm in M1 myeloid cells suppressed cytokine-induced differentiation.
In summary, the data presented in this thesis report on the cloning and characterization of a novel adaptor protein that is involved in the phosphorylation of the proto-oncoprotein Mlf1. Phosphorylation of Mlf1 is likely to affect its interaction with other proteins, such as 14-3-3ζ. Complex formation, therefore, may well alter the localization of Mlf1 and Madm, and influence hematopoietic differentiation.
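
As an illustration of how a consensus such as the RSXSXP sequence mentioned in this abstract can be located in a protein sequence, the following is a minimal Python sketch. It assumes that X stands for any residue; the function names and the toy sequence are hypothetical and are not taken from the thesis. A match only marks a candidate site, since binding also depends on phosphorylation, which a sequence scan cannot show.

```python
import re

def consensus_to_regex(consensus):
    """Translate a consensus such as 'RSXSXP' into a regex; X matches any residue."""
    body = "".join("." if ch == "X" else re.escape(ch) for ch in consensus.upper())
    # The lookahead lets overlapping occurrences be reported as well.
    return re.compile(f"(?=({body}))")

def find_motif(sequence, consensus):
    """Return (1-based start, matched residues) for every occurrence."""
    pattern = consensus_to_regex(consensus)
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]

# Hypothetical toy sequence for illustration; not the real Mlf1 sequence.
toy_sequence = "MARSVSAPGGRSASTPKK"
print(find_motif(toy_sequence, "RSXSXP"))  # [(3, 'RSVSAP'), (11, 'RSASTP')]
```

Any other consensus written with X as a wildcard can be passed to the same function.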

48

Hase, Manuela. "Molecular and ultrastructural analysis of Tpr, a nuclear pore complex-attached coiled-coil protein /." Stockholm, 2003. http://diss.kib.ki.se/2003/91-7349-525-5/.

49

Verzotto, Davide. "Advanced Computational Methods for Massive Biological Sequence Analysis." Doctoral thesis, Università degli studi di Padova, 2011. http://hdl.handle.net/11577/3426282.

Abstract:

With the advent of modern sequencing technologies, massive amounts of biological data, from protein sequences to entire genomes, are becoming increasingly available. This creates a need for the automatic analysis and classification of such huge collections of data, in order to enhance knowledge in the Life Sciences. Although many research efforts have been made to mathematically model this information, for example by finding patterns and similarities among protein or genome sequences, these approaches often lack structures that address specific biological issues. In this thesis, we present novel computational methods for three fundamental problems in molecular biology: the detection of remote evolutionary relationships among protein sequences, the identification of subtle biological signals in related genome or protein functional sites, and phylogeny reconstruction by means of whole-genome comparisons. The main contribution is a systematic analysis of patterns that may affect these tasks, leading to the design of practical and efficient new pattern discovery tools. We thus introduce two advanced paradigms of pattern discovery and filtering based on the insight that functional and conserved biological motifs, or patterns, should lie in different sites of sequences. This enables space-conscious approaches that avoid counting the same patterns multiple times. The first paradigm considered, namely irredundant common motifs, concerns the discovery of common patterns, for two sequences, that have occurrences not covered by other patterns, whose coverage is defined by means of specificity and extension. The second paradigm, namely underlying motifs, concerns the filtering of patterns, from a given set, that have occurrences not overlapping other patterns with higher priority, where priority is defined by lexicographic properties of patterns on the boundary between pattern matching and statistical analysis.
We develop three practical methods directly based on these advanced paradigms. Experimental results indicate that we are able to identify subtle similarities among biological sequences, using the same type of information only once. In particular, we employ the irredundant common motifs and the statistics based on these patterns to solve the remote protein homology detection problem. Results show that our approach, called Irredundant Class, outperforms state-of-the-art methods on a challenging benchmark for protein analysis. Afterwards, we establish how to compare and filter a large number of complex motifs (e.g., degenerate motifs) obtained from modern motif discovery tools, in order to identify subtle signals in different biological contexts. In this case we employ the notion of underlying motifs. Tests on large protein families indicate that we drastically reduce the number of motifs that scientists must inspect manually, further highlighting the actual functional motifs. Finally, we combine the two proposed paradigms to allow the comparison of whole genomes, and thus the construction of a novel and practical distance function. With our method, called Unic Subword Approach, we relate the regions of two genome sequences to each other by selecting motifs conserved during evolution. Experimental results show that our approach achieves better performance than other state-of-the-art methods in the whole-genome phylogeny reconstruction of viruses, prokaryotes, and unicellular eukaryotes, further identifying the major clades of these organisms.
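
The idea of keeping only patterns that have occurrences not overlapping higher-priority patterns can be sketched roughly as follows. This is a simplified greedy reading of the description in the abstract, not the thesis's actual algorithm: the numeric priority, the interval representation, and the function name are all assumptions made for illustration, whereas the thesis defines priority through lexicographic properties of the patterns.

```python
def filter_by_priority(motifs):
    """Greedy sketch: keep motifs that have an occurrence free of higher-priority ones.

    motifs: list of (name, priority, occurrences), where occurrences is a list
    of half-open (start, end) intervals and a larger priority wins.
    Returns a list of (name, free occurrences) for the motifs that are kept.
    """
    claimed = set()  # sequence positions covered by motifs kept so far
    kept = []
    for name, _priority, occurrences in sorted(motifs, key=lambda m: -m[1]):
        # An occurrence is free if it shares no position with a kept motif.
        free = [(s, e) for s, e in occurrences if claimed.isdisjoint(range(s, e))]
        if free:
            kept.append((name, free))
            # A kept motif claims all of its occurrences, free or not.
            for s, e in occurrences:
                claimed.update(range(s, e))
    return kept

# Hypothetical motifs: B overlaps A everywhere, C has one free occurrence.
example = [
    ("A", 3, [(0, 5), (20, 25)]),
    ("B", 2, [(3, 8)]),
    ("C", 1, [(8, 12), (22, 26)]),
]
print(filter_by_priority(example))
# [('A', [(0, 5), (20, 25)]), ('C', [(8, 12)])]
```

In this toy run, motif B is discarded because its only occurrence overlaps A, which is the sense in which such filtering reduces the number of motifs left for manual inspection.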

50

Gong, Ping Otsuka Anthony John. "Genetic and biochemical analysis of the interaction between unc-44 AO13 ankyrin and protein phosphatase 2A." Normal, Ill. : Illinois State University, 2005. http://wwwlib.umi.com/cr/ilstu/fullcit?p3196647.

Abstract:

Thesis (Ph. D.)--Illinois State University, 2005.
Title from title page screen, viewed September 26, 2006. Dissertation Committee: Anthony J. Otsuka (chair), Radheshyam Jayaswal, Kevin A. Edwards, David L. Williams, Hou Tak Cheung. Includes bibliographical references (leaves 110-124) and abstract. Also available in print.
