Dissertations / Theses on the topic 'Prediction of transcription factor binding sites'

Consult the top 50 dissertations / theses for your research on the topic 'Prediction of transcription factor binding sites.'

1

Robert, Christelle L. R. S. "Computational Prediction of Transcription Factor Binding Sites in Bacterial Genomes." Thesis, University of Dundee, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.521672.

2

Morozov, Vyacheslav. "Computational Methods for Inferring Transcription Factor Binding Sites." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/23382.

Abstract:
Position weight matrices (PWMs) have become a tool of choice for the identification of transcription factor binding sites in DNA sequences. PWMs are compiled from experimentally verified and aligned binding sequences. PWMs are then used to computationally discover novel putative binding sites for a given protein. DNA-binding proteins often show degeneracy in their binding requirements, and the overall binding specificity of many proteins is unknown and remains an active area of research. Although PWMs are more reliable predictors than consensus string matching, they generally result in a high number of false positive hits. A previous study introduced a novel method of PWM training that uses known motifs to sample additional putative binding sites from a proximal promoter area. The core idea was further developed, implemented and tested in this thesis in a large-scale application. Improved mono- and dinucleotide PWMs were computed for Drosophila melanogaster. The Matthews correlation coefficient was used as an optimization criterion in the PWM refinement algorithm. The new PWMs account for non-uniform background nucleotide distributions on the promoters and consider a larger number of new binding sites during the refinement steps. The optimization included the PWM motif length, the position on the promoter, the threshold value and the binding site location. The predictions obtained with the mono- and dinucleotide PWM versions were compared with those of the initial matrices and of conventional tools. The optimized PWMs predicted new binding sites with better accuracy than conventional PWMs.
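The PWM workflow this abstract describes can be illustrated with a minimal sketch. It is not the thesis's refinement algorithm: it only shows a mononucleotide log-odds PWM built against a non-uniform background, a threshold scan, and the Matthews correlation coefficient that the abstract names as the optimization criterion. The function names, the pseudocount of 0.5, and any sites or thresholds passed in are illustrative assumptions.

```python
import math


def build_pwm(sites, background, pseudocount=0.5):
    """Build a log-odds PWM (one dict per position) from aligned, equal-length sites.

    `background` maps each base to its (possibly non-uniform) frequency.
    The pseudocount value is an assumption chosen for illustration.
    """
    pwm = []
    for column in zip(*sites):
        scores = {}
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / (len(sites) + 4 * pseudocount)
            scores[base] = math.log2(freq / background[base])
        pwm.append(scores)
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

In a refinement loop of the kind the abstract outlines, one would vary the threshold (and possibly the motif length) and keep the setting that maximizes `matthews_cc` on labelled windows; that loop is left out here because the source does not specify it.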
3

Sealfon, Rachel (Rachel Sima). "Predicting enhancer regions and transcription factor binding sites in D. melanogaster." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/62434.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 71-75).
Identifying regions in the genome that have regulatory function is important to the fundamental biological problem of understanding the mechanisms through which a regulatory sequence drives specific spatial and temporal patterns of gene expression in early development. The modENCODE project aims to comprehensively identify functional elements in the C. elegans and D. melanogaster genomes. The genome-wide binding locations of all known transcription factors as well as of other DNA-binding proteins are currently being mapped within the context of this project [8]. The large quantity of new data that is becoming available through the modENCODE project and other experimental efforts offers the potential for gaining insight into the mechanisms of gene regulation. Developing improved approaches to identify functional regions and understand their architecture based on available experimental data represents a critical part of the modENCODE effort. Towards this goal, I use a machine learning approach to study the predictive power of experimental and sequence-based combinations of features for predicting enhancers and transcription factor binding sites.
by Rachel Sealfon.
S.M.
4

Sandelin, Albin. "In silico prediction of CIS-regulatory elements." Stockholm, 2004. http://diss.kib.ki.se/2004/91-7349-879-3/.

5

Jayaram, N. "Improving the prediction of transcription factor binding sites to aid the interpretation of non-coding single nucleotide variants." Thesis, University College London (University of London), 2017. http://discovery.ucl.ac.uk/1556214/.

Abstract:
Single nucleotide variants (SNVs) that occur in transcription factor binding sites (TFBSs) can disrupt the binding of transcription factors and alter gene expression, which can cause inherited diseases and act as driver SNVs in cancer. The identification of SNVs in TFBSs has historically been challenging given the limited number of experimentally characterised TFBSs. The recent ENCODE project has resulted in the availability of ChIP-Seq data that provide genome-wide sets of regions bound by transcription factors. These data have the potential to improve the identification of SNVs in TFBSs. However, as the ChIP-Seq data identify a broader range of DNA in which a transcription factor binds, computational prediction is required to identify the precise TFBS. Prediction of TFBSs involves scanning a DNA sequence with a Position Weight Matrix (PWM) using a pattern matching tool. This thesis focusses on the prediction of TFBSs by: (a) evaluating a set of locally-installable pattern-matching tools and identifying the best performing tool (FIMO), (b) using the ENCODE ChIP-Seq data to evaluate a set of de novo motif discovery tools that are used to derive PWMs and that can handle large volumes of data, (c) identifying the best performing tool (rGADEM), (d) using rGADEM to generate a set of PWMs from the ENCODE ChIP-Seq data and (e) finally checking that the selection of the best pattern matching tool is not unduly influenced by the choice of PWMs. These analyses were exploited to obtain a set of predicted TFBSs from the ENCODE ChIP-Seq data. The predicted TFBSs were utilised to analyse somatic cancer driver and passenger SNVs that occur in TFBSs. Clear signals in conservation, and therefore in Shannon entropy values, were identified and subsequently exploited to identify a threshold that can be used to prioritize somatic cancer driver SNVs for experimental validation.
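The Shannon entropy signal mentioned at the end of this abstract can be made concrete with a short sketch. The abstract does not say which alignment or columns the entropy was computed on, so the version below is only an assumption: it computes per-position entropy over a set of aligned DNA sites, where low entropy indicates a conserved position. The function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols observed in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def site_entropies(aligned_sites):
    """Entropy of every position across equal-length aligned sites."""
    return [column_entropy(col) for col in zip(*aligned_sites)]


def information_content(aligned_sites):
    """Per-position information content for DNA: 2 bits minus the entropy."""
    return [2.0 - h for h in site_entropies(aligned_sites)]
```

A variant falling at a position with entropy near 0 bits hits a strongly conserved base, which is the kind of signal a prioritization threshold could be placed on; the actual threshold used in the thesis is not given here.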
6

Rezwan, Faisal Ibne. "Improving computational predictions of Cis-regulatory binding sites in genomic data." Thesis, University of Hertfordshire, 2011. http://hdl.handle.net/2299/7133.

Abstract:
Cis-regulatory elements are the short regions of DNA to which specific regulatory proteins bind, and these interactions subsequently influence the level of transcription of associated genes by inhibiting or enhancing the transcription process. It is known that much of the genetic change underlying morphological evolution takes place in these regions, rather than in the coding regions of genes. Identifying these sites in a genome is a non-trivial problem. Experimental (wet-lab) methods for finding binding sites exist, but all have some limitations regarding their applicability, accuracy, availability or cost. Computational methods for predicting the position of binding sites, on the other hand, are less expensive and faster. Unfortunately, these algorithms perform rather poorly, some missing most binding sites and others over-predicting their presence. The aim of this thesis is to develop and improve computational approaches for the prediction of transcription factor binding sites (TFBSs) by integrating the results of computational algorithms and other sources of complementary biological evidence. Previous related work involved the use of machine learning algorithms for integrating predictions of TFBSs, with particular emphasis on the use of the Support Vector Machine (SVM). This thesis has built upon, extended and considerably improved this earlier work. Data from two organisms were used here. First, the relatively simple genome of yeast was used. In yeast, the binding sites are fairly well characterised and they are normally located near the genes that they regulate. The techniques used on the yeast genome were also tested on the more complex genome of the mouse. It is known that the regulatory mechanisms of the mouse are considerably more complex, and it was therefore interesting to investigate the techniques described here on such an organism.
The initial results were, however, not particularly encouraging: although a small improvement on the base algorithms could be obtained, the predictions were still of low quality. This was the case for both the yeast and mouse genomes. However, when the negatively labeled vectors in the training set were changed, a substantial improvement in performance was observed. The first change was to choose regions in the mouse genome that are distal to any gene (over 4000 base pairs away) as regions not containing binding sites. This produced a major improvement in performance. The second change was simply to use randomised training vectors, which contained no meaningful biological information, as the negative class. This gave some improvement on the yeast genome, but had a very substantial benefit for the mouse data, considerably improving on the aforementioned distal negative training data. In fact, the resulting classifier found over 80% of the binding sites in the test set, and 80% of its predictions were correct. The final experiment used an updated version of the yeast dataset, with more state-of-the-art algorithms and more recent TFBS annotation data. Here it was found that using randomised or distal negative examples once again gave very good results, comparable to the results obtained on the mouse genome. Another source of negative data was tried for this yeast data, namely vectors taken from intronic regions. Interestingly, this gave the best results.
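The two figures quoted in this abstract (over 80% of binding sites found, 80% of predictions correct) correspond to what is usually called recall and precision. A minimal sketch of both measures follows; representing sites as sets of positions, and the function name, are assumptions made for illustration and are not taken from the thesis.

```python
def precision_recall(predicted, actual):
    """Precision and recall for predicted versus annotated binding-site positions.

    Both arguments are iterables of hashable site identifiers (for example,
    genomic positions). Empty inputs yield 0.0 rather than a division error.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall
```

Reporting both numbers matters for this problem because, as the abstract notes, some base algorithms miss most sites (low recall) while others over-predict (low precision).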
7

Parmar, Victor. "Predicting transcription factor binding sites using phylogenetic footprinting and a probabilistic framework for evolutionary turnover." Thesis, McGill University, 2010. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=87000.

Abstract:
Identifying genomic locations of transcription-factor binding sites (TFBS), particularly in higher eukaryotic genomes, has been an enormous challenge. Computational methods involving identification of sequence conservation between related genomes have been the most successful since sites found in such highly conserved regions are more likely to be functional, i.e. are bound and regulate protein production. In this thesis, we present such a probabilistic algorithm for predicting TFBSs which also takes evolutionary turnovers into account. Our algorithm is validated via simulations and the results of its application on ChIP-chip data are presented.
Identifying transcription factor binding sites (TFBSs), particularly in the genomes of higher eukaryotes, has been an enormous challenge. Computational methods that identify sequence conservation between the genomes of different species have been very successful, because sites found in such highly conserved regions are probably functional (transcription factors attach to the genome at these sites and regulate protein production). In this thesis, we present a probabilistic algorithm for TFBS prediction that also takes evolutionary turnover into account. Our algorithm is validated through simulations, and the results of its application to ChIP-chip data are presented.
8

Kiełbasa, Szymon M. "Bioinformatics of eukaryotic gene regulation." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät I, 2006. http://dx.doi.org/10.18452/15562.

Abstract:
Elucidating the mechanisms that control gene expression is one of the most important problems in modern molecular biology. Detailed experimental studies are enormously laborious because of the complex and combinatorial interactions among the molecules involved. Bioinformatic methods are therefore indispensable. This dissertation presents three methods that improve the prediction of the regulatory elements of gene transcription. The first approach finds binding sites recognized by transcription factors. It searches for statistically over-represented short motifs in a set of promoter sequences and is applied successfully to the genome of baker's yeast. The analysis of gene regulation in higher eukaryotes, however, requires more advanced techniques. Various databases hold hundreds of profiles recognized by transcription factors. The similarity between them results in multiple predictions of a single binding site, which must be corrected afterwards. A method is presented that offers a way to reduce the number of profiles by identifying the similarities between them. The complex nature of the interactions between transcription factors nevertheless makes the prediction of binding sites difficult. Even with a reduced number of profiles to search, the prediction results are still highly error-prone. Drawing on independent sources of information reduces the frequency of false predictions. The third method described proposes a new approach that associates gene annotation with regulation by multiple transcription factors and the binding sites they recognize. The utility of this method is demonstrated on several well-known sets of transcription factors.
Understanding the mechanisms which control gene expression is one of the fundamental problems of molecular biology. Detailed experimental studies of regulation are laborious due to the complex and combinatorial nature of interactions among the molecules involved. Therefore, computational techniques are used to suggest candidate mechanisms for further investigation. This thesis presents three methods that improve the prediction of the regulation of gene transcription. The first approach finds binding sites recognized by a transcription factor based on statistical over-representation of short motifs in a set of promoter sequences. A successful application of this method to several gene families of yeast is shown. More advanced techniques are needed for the analysis of gene regulation in higher eukaryotes. Hundreds of profiles recognized by transcription factors are provided by libraries. Dependencies between them result in multiple predictions of the same binding sites, which later need to be filtered out. The second method presented here offers a way to reduce the number of profiles by identifying similarities between them. Still, the complex nature of interactions between transcription factors makes reliable prediction of binding sites difficult. Exploiting independent sources of information reduces the false prediction rate. The third method proposes a novel approach associating gene annotations with regulation by multiple transcription factors and the binding sites recognized by them. The utility of the method is demonstrated on several well-known sets of transcription factors. RNA interference provides a way of efficiently down-regulating gene expression. Difficulties in predicting efficient siRNA sequences motivated the development of a library containing siRNA sequences and related experimental details described in the literature. This library, presented in the last chapter, is publicly available at http://www.human-sirna-database.net
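The first method in this abstract rests on statistical over-representation of short motifs in promoter sequences. The sketch below shows one simple way to score that idea and is not the thesis's statistic: it counts every k-mer, estimates its expected count from mononucleotide frequencies of the input sequences, and reports a z-score under a binomial approximation. The function name and the choice of background model are assumptions.

```python
import math
from collections import Counter


def kmer_zscores(sequences, k):
    """Z-score of each observed k-mer count against a mononucleotide background.

    The background frequencies are estimated from the input sequences
    themselves, which is a simplifying assumption; a real analysis would
    normally use a separate background set.
    """
    base_counts = Counter("".join(sequences))
    total_bases = sum(base_counts.values())
    base_freq = {base: n / total_bases for base, n in base_counts.items()}

    observed = Counter()
    windows = 0
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            observed[seq[i:i + k]] += 1
            windows += 1

    scores = {}
    for kmer, count in observed.items():
        p = math.prod(base_freq[base] for base in kmer)  # chance of this k-mer per window
        expected = windows * p
        sd = math.sqrt(windows * p * (1 - p))
        scores[kmer] = (count - expected) / sd if sd > 0 else 0.0
    return scores
```

With very small expected counts the normal approximation behind a z-score is crude, so the output is better read as a ranking of candidate motifs than as a significance level.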
9

Gebhardt, Marie Luise. "Enrichment of miRNA targets in REST-regulated genes allows filtering of miRNA target predictions." Doctoral thesis, Humboldt-Universität zu Berlin, Lebenswissenschaftliche Fakultät, 2016. http://dx.doi.org/10.18452/17407.

Abstract:
Predictions of miRNA binding sites often contain a high percentage of false positives (24-70%). At the same time, it is difficult to measure the biological interactions of miRNAs and their target transcripts experimentally and genome-wide. This thesis therefore addressed the question of whether ChIP-sequencing data, which are becoming ever more abundant, can be used to filter predictions of miRNA binding sites. To this end, use was made of a network of miRNAs and transcription factors that jointly regulate target transcripts. First, various methods for assigning ChIP-sequencing peaks to target genes were tested. Target gene lists of the transcriptional repressor RE1-silencing transcription factor (REST/NRSF) were generated from ChIP-sequencing data. An algorithm that searches for over-represented miRNA target genes in REST gene lists, based on predictions from TargetScanHuman, was developed and applied. The detected 'enrichment' miRNAs were part of a REST-miRNA network regulated in many ways. Possible functions of the miRNAs were proposed, and their role in the shared network with REST and in the network motif formed with it (incoherent feedforward loop of type 2) was analyzed. It turned out that filtering the predictions is indeed possible, because genes regulated both by REST and by one or more 'enrichment' miRNAs have a higher proportion of true miRNA-transcript interactions.
Predictions of miRNA binding sites suffer from high false positive rates (24-70%), and measuring biological interactions of miRNAs and target transcripts on a genome-wide scale remains challenging. This thesis addressed the question of whether the ever-growing body of ChIP-sequencing data can be applied to filter miRNA target predictions by making use of the underlying regulatory network of miRNAs and transcription factors. First, different methods for associating ChIP-sequencing peaks with target genes were tested. Target gene lists of the transcriptional repressor RE1-silencing transcription factor (REST/NRSF) were generated by means of ChIP-sequencing data. An enrichment analysis tool based on predictions from TargetScanHuman was developed and applied to find ‘enrichment’-miRNAs with over-represented targets in the REST gene lists. The detected miRNAs were shown to be part of a highly regulated REST-miRNA network. Possible functions could be assigned to them, and their role in the regulatory network and in special network motifs (incoherent feedforward loop of type 2) was analyzed. It turned out that miRNA target predictions for genes shared by enrichment-miRNAs and REST had a higher proportion of true positive associations than the TargetScanHuman background; the procedure thus made filtering possible.
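An enrichment analysis of the kind described here asks whether a miRNA's predicted targets overlap a gene list more than chance would allow. The abstract does not state which test the tool uses, so the sketch below assumes a one-sided hypergeometric test, a common choice for this question. The function name and argument layout are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, n_targets, list_size, overlap):
    """One-sided hypergeometric p-value: P(overlap or more) by chance.

    universe_size: number of genes considered in total
    n_targets:     genes predicted as targets of the miRNA
    list_size:     genes in the list of interest (e.g. a REST target list)
    overlap:       genes that are both predicted targets and in the list
    """
    upper = min(n_targets, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(n_targets, i) * comb(universe_size - n_targets, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

Because many miRNAs are tested against the same gene list, the resulting p-values would normally need a multiple-testing correction before any miRNA is called enriched.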
10

Pape, Utz J. "Statistics for transcription factor binding sites." Berlin: Freie Universität Berlin, 2009. http://d-nb.info/1023329476/34.

11

Klein, Holger. "Co-occurrence of transcription factor binding sites." Berlin: Freie Universität Berlin, 2010. http://d-nb.info/1024541517/34.

12

Maynou, Fernàndez Joan. "Computational representation and discovery of transcription factor binding sites." Doctoral thesis, Universitat Politècnica de Catalunya, 2016. http://hdl.handle.net/10803/387550.

Abstract:
Understanding how, when, and where proteins are produced has been one of the major challenges in molecular biology. Studies of the control of gene expression are essential for a better understanding of protein synthesis. Gene regulation is a highly controlled process that starts with DNA transcription. This process operates at the level of the gene, the basic unit of heredity, which is copied into primary ribonucleic acid (RNA). This first step is controlled by the binding of specific proteins, called transcription factors (TFs), to a sequence of DNA (deoxyribonucleic acid) in the regulatory region of the gene. These DNA sequences are known as binding sites (BSs). Binding site motifs are usually very short (5 to 20 bp long) and highly degenerate. Such sequences are expected to occur at random every few hundred base pairs. In addition, a TF can bind to different sites. Because of this high variability, it is difficult to establish a consensus sequence. The study and identification of binding sites is important for clarifying the control of gene expression. Because of the importance of identifying binding site sequences, projects such as ENCODE (Encyclopedia of DNA Elements) have dedicated efforts to mapping binding sites for large sets of transcription factors in order to identify regulatory regions. In this thesis, we have approached the problem of binding site detection from another angle. We have developed a toolkit for binding motif detection based on linear and non-linear models. First, we characterized binding sites using different approaches. The first is based on the information contained in each binding site position. The second is based on the covariance model of an aligned set of binding site sequences. From these motif characterizations, we have proposed a new set of computational methods to detect binding sites.
First, a new method was developed based on a parametric uncertainty measure (Rényi entropy). This detection algorithm evaluates the variation in the total Rényi entropy of a set of sequences when a candidate sequence is assumed to be a true binding site belonging to the set. This method was found to perform especially well on transcription factors whose binding sites showed no correlation among positions. The correlation among binding site positions was considered through a linear model, Q-residuals, and two non-linear models, alpha-Divergence and SIGMA. Q-residuals is a novel motif-finding method that constructs a subspace based on the covariance of numerical DNA sequences. When the number of available sequences was small, Q-residuals performed significantly better and faster than all the other methodologies. Alpha-Divergence is based on the variation of the total parametric divergence in a set of aligned sequences with binding evidence when a candidate sequence is added. Given an optimal q-value, alpha-Divergence performed better than the other methodologies for most of the transcription factor binding sites studied. Finally, a new computational tool, SIGMA, was developed as a trade-off between the good generalisation properties of pure entropy methods and the ability of position-dependency metrics to improve detection power. In approximately 70% of the cases considered, SIGMA performed better, at comparable levels of computational resources, than the methods with which it was compared. This toolkit and the models for detecting a set of transcription factor binding sites (TFBSs) have been included in an R package called MEET.
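The Rényi entropy detector described above scores a candidate by how much the total entropy of the known sites changes when the candidate is added. The sketch below illustrates that idea only; it is not the MEET implementation. It assumes position-independent columns, raw frequencies without pseudocounts, and an order parameter named `q` with a default of 2.0, all of which are choices made here for illustration.

```python
import math
from collections import Counter


def renyi_entropy(probs, q):
    """Rényi entropy (bits) of order q; q == 1 falls back to Shannon entropy."""
    probs = [p for p in probs if p > 0]
    if q == 1:
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** q for p in probs)) / (1 - q)


def total_renyi(sites, q):
    """Sum of per-column Rényi entropies over equal-length aligned sites."""
    total = 0.0
    for column in zip(*sites):
        counts = Counter(column)
        total += renyi_entropy([n / len(column) for n in counts.values()], q)
    return total


def entropy_shift(sites, candidate, q=2.0):
    """Change in total Rényi entropy when the candidate joins the set of sites.

    A small shift suggests the candidate resembles the known sites;
    a large shift suggests it does not.
    """
    return total_renyi(list(sites) + [candidate], q) - total_renyi(list(sites), q)
```

A detector built on this would slide a window along a sequence, compute `entropy_shift` for each window, and flag windows below some cut-off; how that cut-off and the order `q` are chosen is not specified in the abstract.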
Information about how, when, and where proteins are produced has been one of the major challenges in molecular biology. Studies of the control of gene expression are essential for a better understanding of protein synthesis. Gene regulation is a highly controlled process that begins with the transcription of DNA. In this process, genes, the basic units of heredity, are copied into ribonucleic acid (RNA). The first step is controlled by the binding of proteins, called transcription factors (TFs), to a DNA (deoxyribonucleic acid) sequence in the regulatory region of the gene. These sequences are called binding sites and are specific to each protein. The binding of transcription factors to their corresponding binding sites initiates transcription. Binding sites are very short sequences (5 to 20 base pairs long) and highly degenerate. Such sequences can occur at random every hundred or so base pairs. In addition, a transcription factor can bind to different sites. Because of this high variability, it is difficult to establish a consensus sequence. The study and identification of binding sites is therefore important for understanding the control of gene expression. The importance of identifying regulatory sequences has led projects such as ENCODE (Encyclopedia of DNA Elements) to devote great effort to mapping the binding sequences of a large set of transcription factors in order to identify regulatory regions. Access to genomic sequences and advances in gene expression analysis technologies have also enabled the development of computational methods for motif discovery. Thanks to these advances, a large number of algorithms have been applied in recent years to motif discovery in prokaryotes and simple eukaryotes. Despite the simplicity of these organisms, the rate of false positives is high relative to true positives.
Therefore, studying more complex organisms requires methods with greater sensitivity. In this thesis we have approached the problem of binding sequence detection from different angles. Specifically, we have developed a set of tools for motif detection based on linear and non-linear models. Transcription factor binding sequences were characterized using two approaches. The first is based on the information inherent in each position of the binding sequences. The second, in contrast, characterizes the binding sequence by means of a covariance model. From both characterizations, we have proposed a new set of computational methods for detecting binding sequences. First, a new method was developed based on a parametric measure of uncertainty (Rényi entropy). This detection algorithm evaluates the total variation in the Rényi entropy of a set of binding sequences when a candidate sequence is added to the set. The method performed well for binding sequences with little or no correlation between positions. Correlation between positions was considered through one linear model, Q-residuals, and two non-linear models, alpha-Divergence and SIGMA. Q-residuals is a new motif-finding methodology based on constructing a subspace from the covariance of numerical DNA sequences. When the number of available sequences is small, Q-residuals performed significantly better and faster than the methodologies it was compared with. Alpha-Divergence evaluates the total variation of the parametric divergence in a set of binding sequences when a candidate sequence is added. Given an optimal q-value, alpha-Divergence performed better than the compared methodologies for most of the transcription factor binding sequences considered.
Finally, a new computational method, SIGMA, was developed in order to improve detection power
13

Sanchez, Galan Frauca Javier. "Large scale identification of transcription factor binding sites in DNA sequences." Thesis, McGill University, 2010. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=86960.

Abstract:
To date, gene regulation is still one of the most studied processes in molecular biology. Among its main actors, proteins called transcription factors play an essential role in controlling the rate of expression of genes by binding to specific sites on the DNA sequence. These sites are short in length (5 to 15 base pairs) and are called transcription factor binding sites (TFBSs). These interactions between proteins and DNA have a fundamental role at several stages of cell development and in response to stress conditions. Various computational methods that exploit specific characteristics of TFBSs have been developed and tested for the purpose of identifying TFBSs. Examples include the identification of TFBSs via phylogenetic footprinting, via cis-regulatory modules and via statistical over-representation.
In this thesis we present a new approach that uses elements of the three identification methods to develop a large-scale approach that assesses the over-representation of TFBSs in DNA sequences. Results of applying this new method are presented for five biological datasets, including a set of regions bound by estrogen receptor (ER). We also present new results, yet to be validated experimentally, from two interesting biological datasets. The first is a dataset containing coding regions under non-coding selection (called CRUNCS). The other is a set of genes regulated by proteins called angiopoietins.
Finally, a new public bioinformatics software tool for estimating the over-representation of TFBSs in DNA sequences, which we call the Genome-Wide Analysis of TFBS Over-Representation (GATOR), is introduced.
To date, gene regulation is still one of the most studied processes in molecular biology. One of its main categories of actors, proteins called transcription factors, plays an essential role in controlling the rate of gene expression by binding to specific sites on the DNA sequence. These sites are short sequences (5 to 15 base pairs) and are commonly called transcription factor binding sites (TFBSs). The interactions between these proteins and DNA play a fundamental role at several stages of cell development and in the response to various types of stress. Various computational methods that exploit specific characteristics of TFBSs have been developed and tested for the purpose of identifying such binding sites. Examples include the identification of TFBSs using phylogenetic footprinting, cis-regulatory modules, and statistical over-representation.
In this thesis we present a new approach that uses elements of the three aforementioned identification methods to develop a large-scale approach that assesses the over-representation of TFBSs in DNA sequences. Results of applying this new method are presented for five biological datasets. Among them are a set of binding-site regions bound by estrogen receptors (ER), a dataset containing coding regions under non-coding selection (called CRUNCS), and finally a set of genes regulated by proteins called angiopoietins.
Finally, we present new public bioinformatics software that estimates the over-representation of TFBSs in DNA sequences, which we have named the Genome-Wide Analysis of TFBS Over-Representation (GATOR).
14

Jaini, Suma. "Methods for functional characterization of transcription factor binding sites in bacteria." Thesis, Boston University, 2014. https://hdl.handle.net/2144/11097.

Abstract:
Thesis (Ph.D.)--Boston University
Understanding gene regulation is necessary to gain insight into and model important cellular processes, including disease. Our current inability to combat many diseases is partly due to an incomplete understanding of gene circuitry. The regulatory mechanisms of Mycobacterium tuberculosis, the causative agent of tuberculosis, are not properly understood. A transcriptional regulatory network (TRN), comprising transcription factors (TFs) and their target genes, provides a powerful framework for analyzing the complete regulatory system. Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-Seq) is becoming the method of choice for identifying genome-wide TFBS. We therefore use ChIP-Seq on known transcription factors to reconstruct the TRN of Mycobacterium tuberculosis (Mtb) and other bacteria. ChIP-Seq reveals various transcription factor binding sites (TFBS) but provides no information on the mechanism by which the corresponding TFs regulate the genes. Techniques that give more insight into these mechanisms include microarrays, knockout studies and qPCR. However, these techniques provide a static view of the network. They also provide information at the RNA level and mask regulation happening at the protein level. Therefore, to understand the mechanism of regulation at the protein level and to capture the network dynamics, we built a synthetic gene circuit in Mycobacterium smegmatis and defined input-output relationships between key TFs and their target promoters. We validated this system on KstR, a TF that is a known repressor. KstR regulates genes involved in cholesterol degradation and has been shown to de-repress itself and its regulon genes in the presence of cholesterol as well as in hypoxia, where there are no exogenous lipids. We explored the possibility that other by-products may be responsible for the de-repression of kstR and its regulon.
The data suggest that propionyl-CoA, a by-product of the degradation of cholesterol, odd-numbered fatty acids and branched-chain amino acids, causes the de-repression of kstR and its regulon. ChIP-Seq data on transcription factors in Mtb as well as E. coli show that many TFBS are located immediately upstream of open reading frame start sites, consistent with our understanding of prokaryotic gene regulation. However, the data also suggest that many TFBS are located inside and downstream of open reading frames. One of our hypotheses is that these novel TFBS might be indirect binding sites that mediate chromatin looping. Therefore, we developed a 3C (Chromosome Conformation Capture) method to understand regulation in the third dimension by analyzing chromosomal interactions. We optimized the protocol in E. coli and validated it using a known interaction mediated by the repressor GalR. We then identified two regions, 20 kbp apart, containing TFBS of StpA, a nucleoid-associated protein, that are not directly involved in regulating their downstream genes. Data from a 3C experiment on an E. coli strain with inducible StpA suggest that these two regions interact by an unknown mechanism. However, the interaction was not lost when a similar experiment was done in an StpA knockout strain, suggesting that StpA may not be the sole TF responsible for this interaction. Lastly, we developed a Hi-C method on E. coli genomic DNA to identify long-range interactions in a genome-wide and unbiased manner.
15

Gazzillo, Lisa Christine. "The Mapping of Transcription Factor Binding Sites in the Turkey Prolactin Gene." Thesis, Virginia Tech, 2000. http://hdl.handle.net/10919/35719.

Abstract:
The cessation of egg-laying during the incubation period of the turkey hen is a source of major economic loss to the turkey industry. In August of 2000 there were approximately 2.7 million turkey breeder hens in the United States. Since the value of one fertile turkey egg is $0.62, the loss of only one egg per hen per year would cost the industry $1.7 million. A number of management procedures have been implemented to control egg production and prevent incubation. However, these methods are labor intensive. The anterior pituitary hormone prolactin (PRL) is involved in the onset of incubation in the turkey hen. Levels of circulating PRL and PRL mRNA are 10X greater in photostimulated hens than in photorefractory hens, 20X greater in laying hens, and 100X greater in incubating hens. It would be useful to determine the molecular mechanisms controlling regulation of the turkey (t) PRL gene. This information could be used to modulate the release of PRL and thereby prevent the induction of the incubation period in turkey hens. Approximately 2 kilobases (kb) of the tPRL 5'-flanking region were examined by the electrophoretic mobility shift assay (EMSA) using nuclear extracts from turkey pituitaries and liver. Within this 2 kb fragment, only three regions of the tPRL gene were identified that participate in tissue- and sequence-specific DNA-protein interactions with nuclear extracts from turkey pituitaries. These are the regions from nucleotides (nt) -41 to -73, -105 to -137, and -175 to -199, named tprl-1, tprl-2 and tprl-3, respectively. Three shifted bands were observed using tprl-1 and tprl-2 while two shifted bands were seen using tprl-3. Competition EMSAs done on these three regions showed that in the presence of unlabeled, excess, specific competitor DNA, the proteins bound to competitor DNA and no shifted bands were observed. If the competitor was a nonspecific DNA sequence, then there was no effect on the shifted bands. 
When using labeled tprl-2 and unlabeled tprl-1 as competitor DNA, no shifted bands were observed. However, when using labeled tprl-1 and unlabeled tprl-2 as competitor DNA, only one of three shifted bands was eliminated. These data indicate that tprl-1 and tprl-2 bind both common and specific pituitary nuclear proteins and have different affinities for pituitary nuclear proteins. A supershift EMSA involving the addition of rabbit-anti-rat Pit-1 indicated that tPit-1 is a common pituitary nuclear protein that is bound to tprl-1 and tprl-2. However, this interaction may not occur in the turkey in vivo. The mapping of transcription factor binding sites in the tPRL 5'-flanking region is the first step toward the identification and isolation of factors that bind to and regulate transcription of the PRL gene.
Master of Science
16

Pairó, Castiñeira Erola. "Detection of Transcription Factor Binding Sites by Means of Multivariate Signal Processing Techniques." Doctoral thesis, Universitat de Barcelona, 2015. http://hdl.handle.net/10803/336663.

Abstract:
Gene expression is a complex and highly regulated process. Most of the regulation is controlled by short DNA sequences that can be bound by proteins called transcription factors (TFs). By binding to these sites, transcription factors can start the transcription of mRNA, stop it, or simply control the amount of mRNA produced. The DNA binding sites of these transcription factors have some specific characteristics: (1) they are short sequences, (2) they can be located anywhere in the genome and (3) they are degenerate, which means that some mutations in the binding site sequence do not alter its functionality. These characteristics make it impossible to search for a specific sequence in a specific region and create the need to model the binding sites in order to detect them. Due to the importance of gene expression in the study of cell differentiation and its implication in some genetic diseases, many computational models and experimental processes have appeared to model binding site motifs and then find them in a genome. The computational models can be divided into two main groups: motif discovery methods, which try to find binding sites within a set of co-regulated sequences without previous knowledge, and motif search methods, which use previously known sites to create a model and then try to locate binding sequences fitting this model. Most of the algorithms for binding site detection (both discovery and search) are based on position weight matrices (PWMs), which are matrices of the frequencies of each nucleotide at each position, and assume that positions are independent. Some others take interdependences into account, but they need many sequences for training and long computation times. The focus of this thesis is to use the conversion from symbolic to numerical DNA and previous knowledge of binding site sequences in order to construct models for DNA motifs.
In this context, known multivariate signal processing techniques can be ideal tools for constructing models that take interdependences into account without needing a large number of sequences or a long computation time. To characterize the transcription factors, the TF-protein relationships were studied, showing that most transcription factors regulate the expression of 5-10 genes and that, at the same time, most proteins are regulated by more than one TF. The study of interdependences between positions showed that more than 90% of the binding sites have significant interdependences, but that the percentage of interdependences is not enough to classify TFs according to structure. Converting DNA motif matrices into numerical matrices allows the use of Principal Component Analysis (PCA) to model the binding sites; PCA captures the information on interdependences in the covariance, a second-order statistic. Under the hypothesis that binding sites fit the PCA model better than genomic sequences do, the Q-residuals can be used to detect binding sites within the genome. Compared with PWMs, the Q-residuals detector performs at least as well, and the improvement in detection is significantly correlated with the percentage of positions with interdependences. The disadvantage of these PCA models is that they are difficult to interpret. Converting the symbolic DNA matrix into a numerical DNA cube allows the calculation of PARAFAC models, which are easier to interpret. Since PARAFAC models have unique solutions, their scores can be combined with the PARAFAC Q-residuals to construct a quadratic detector that also performs better than PSSM models. When the numerical detectors are compared with detectors that take interdependences into account, they perform better when few sequences are available, but they are more sensitive to the number of positions.
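As a point of reference for the PWM baseline mentioned in this abstract, here is a minimal sketch of a log-odds position weight matrix built from aligned known sites and used to scan a sequence. It illustrates the independent-positions model only, not the PCA or PARAFAC detectors developed in the thesis. The uniform background, the pseudocount of 1, the example sites, and all function names are assumptions made for illustration.

```python
from math import log2

BASES = "ACGT"


def build_pwm(sites, background=None, pseudocount=1.0):
    """Build a log-odds PWM from equal-length aligned binding sites."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against the background frequency.
        pwm.append({b: log2((counts[b] / total) / background[b]) for b in BASES})
    return pwm


def score(pwm, window):
    """Sum per-position log-odds; this is where independence is assumed."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score(pwm, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits


if __name__ == "__main__":
    # Hypothetical aligned sites, chosen only to make the example concrete.
    pwm = build_pwm(["TATAAT", "TATAAT", "TATGAT", "TACAAT"])
    print(scan(pwm, "GGGTATAATCCC", threshold=4.0))
```

Because each column contributes to the score separately, this model cannot represent the dependences between positions that the abstract reports in most binding sites.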
17

Schmidt, Jens. "Discovery of Putative STAT5 Transcription Factor Binding Sites in Mice with Diabetic Nephropathy." Ohio University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1385482459.

18

Lee, Tek Hyung. "A regulatory role for repeated decoy transcription factor binding sites in target gene expression." Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/76563.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Chemical Engineering, 2012.
Cataloged from PDF version of thesis.
Includes bibliographical references.
Repetitive DNA sequences are prevalent in both prokaryote and eukaryote genomes and the majority of repeats are concentrated in intergenic regions. These tandem repeats (TRs) are highly variable as the number of repeated units changes frequently due to recombination events and/or polymerase slippage during replication. While TRs have been traditionally regarded as non-functional 'junk' DNA, variability in the number of TRs present within or close to genes is known to lead to gross phenotypic changes and disease. However, whether intergenic TRs have a functional role is less understood. Recent studies reveal that many intergenic TRs contain transcription factor (TF) binding sites and that several TRs of TF binding sites indeed influence gene expression. A possible mechanism is that TRs serve as TF decoys, competing with a promoter for TF binding. We utilized a synthetic system in budding yeast to examine whether repeated binding sites serve as decoys and alter the expression of genes regulated by the sequestered TF. Combining experiments with kinetic modeling suggests that repeated decoy binding sites sequester activators more strongly than a promoter binding site, although both binding sites are identical in sequence. This strong binding converts a graded dose-response between activator and promoter to a sigmoidal-like response. We further find that the tight activator-decoy interaction becomes weaker with increasing activator levels, suggesting that the activator binding at the repeated decoy site array might be anti-cooperative. Finally, we show that the high affinity of repeated decoy sites qualitatively changes the behavior of a transcriptional positive feedback loop from a graded to a bimodal, all-or-none response. Taken together, these results indicate that repeated TF binding sites play an unappreciated role as a gene regulator.
Since repeated decoy sites are hypervariable in number, this variability can lead to qualitative changes in gene expression and potentially phenotypic variation over short evolutionary time scales.
by Tek Hyung Lee.
Ph.D.
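The decoy mechanism described in this abstract can be illustrated with a simple equilibrium titration model. This is a sketch under stated assumptions, not the kinetic model used in the thesis: the promoter site is assumed to bind a negligible share of the activator, decoy sites are treated as independent (so the anti-cooperativity reported above is not modeled), and all concentrations and dissociation constants are in the same arbitrary units. The function and parameter names are hypothetical.

```python
from math import sqrt


def free_activator(a_total, n_decoy, kd_decoy):
    """Free activator concentration when n_decoy independent sites compete.

    Solves the mass balance a + n_decoy * a / (a + kd_decoy) = a_total,
    a quadratic in a, and returns its non-negative root.
    """
    b = kd_decoy + n_decoy - a_total
    return (-b + sqrt(b * b + 4.0 * kd_decoy * a_total)) / 2.0


def promoter_occupancy(a_total, n_decoy, kd_decoy, kd_promoter):
    """Fraction of time the promoter site is bound by free activator."""
    a_free = free_activator(a_total, n_decoy, kd_decoy)
    return a_free / (a_free + kd_promoter)


if __name__ == "__main__":
    # Hypothetical parameters: tight decoys (kd 0.1) and a weaker promoter (kd 5).
    for a_total in (5, 25, 50, 75, 100):
        without = promoter_occupancy(a_total, 0, 0.1, 5.0)
        with_decoys = promoter_occupancy(a_total, 50, 0.1, 5.0)
        print(a_total, round(without, 3), round(with_decoys, 3))
```

With tight decoys, promoter occupancy stays low until the total activator exceeds the number of decoy sites and then rises steeply, which is one way a graded response can become threshold-like.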
19

Piper, Jason. "The demarcation of transcription factor binding sites through the analysis of DNase-seq data." Thesis, University of Warwick, 2014. http://wrap.warwick.ac.uk/71314/.

Abstract:
The expression of eukaryotic genes is controlled by non-coding regulatory elements such as promoters and enhancers, which bind sequence-specific DNA-binding proteins (transcription factors). In multicellular organisms, the characterisation of these elements is required in order to understand how a single genome is utilised to generate a multitude of cell types, and how aberrant regulation of transcription contributes to disease processes. This involves the identification of transcription factor binding sites within regulatory elements that are occupied in a defined regulatory context. Digestion with DNase I and the subsequent analysis of regions protected from digestion followed by high-throughput sequencing (DNase-seq footprinting), allows for the quantification of genome-wide transcription factor binding. However, the handful of methods for analysing DNase-seq data has not been extensively validated or benchmarked. This thesis describes a novel footprinting algorithm, Wellington, which is presented in the context of a comprehensive comparison of several other DNase-seq footprinting algorithms on a multitude of datasets. Wellington outperforms other methods in almost all situations. An open-source software package, pyDNase, that facilitates interacting with DNase-seq data and provides many tools for DNase-seq analysis is also presented. Wellington is used to perform footprinting on clinical samples to validate cell lines as a model system, and to identify the binding partners of the RUNX1/ETO fusion protein in t(8;21) AML. By expanding the Wellington method, differential footprinting is shown to be able to link differences in transcription factor binding at promoters to changes in gene expression. Applying this methodology to a range of haematopoietic cell types illustrates the ability for differential footprinting to identify key regulators in the haematopoietic lineage. 
These results represent advances in the methods available to analyse DNase-seq data (all of which have been released as free, open-source software) and demonstrate the power of integrating DNase-seq footprinting with other functional genomic assays to study transcriptional regulation.
20

Ochs, Sharon D. "Elucidating transcription factor regulation by TCDD within the hs1,2 enhancer." Wright State University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=wright1333992865.

21

Zandvakili, Arya. "The Role of Affinity and Arrangement of Transcription Factor Binding Sites in Determining Hox-regulated Gene Expression Patterns." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535708748728472.

22

Teo, William J. "Screening of potential upstream regulators and identification of DNA binding sites for the tooth transcription factor Krox-26." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp05/MQ63008.pdf.

23

Beach, Michael. "Unraveling the molecular physiology of the β-cell: genome wide analysis of binding sites for the transcription factor PDX1." Thesis, University of British Columbia, 2009. http://hdl.handle.net/2429/15879.

Abstract:
The selected expression of the genome determines distinct cell types, properties, and conditions. In the pancreatic β-cell, our knowledge of how this is regulated and maintained is incomplete. Deciphering the molecular physiology of the β-cell is critical to develop improvements for expanding pools of donor islets for transplantation, the most promising curative option for sufferers of diabetes. Genomic regulation is controlled primarily by transcription factors, of which pancreatic duodenal homeobox 1 (Pdx1) plays a critical role in both the developing and mature pancreas. As such, I begin to unlock the molecular physiology of the β-cell by identifying the binding sites of Pdx1 in pancreatic islets on a genome-wide scale through the use of chromatin immunoprecipitation followed by sequencing (ChIP-Seq). This provides the best picture of Pdx1 binding that has ever been assembled. Moreover, I identify a highly co-occurring relationship between Pdx1 and pre-B-cell leukemia homeobox 1 (Pbx1) in adult islets. The coupling of this data with other genome-wide analyses will prove invaluable to discovering novel transcriptional complexes and the genes they regulate. It will also contribute to the creation of an islet transcriptional network, thereby greatly enhancing our knowledge of β-cell regulation.
24

Tria, Fernando Domingues Kümmel. "Análise in silico de regiões promotoras de genes de Xylella fastidiosa." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/95/95131/tde-13082013-194053/.

Abstract:
Xylella fastidiosa is a gram-negative, non-flagellated bacterium and the causal agent of economically important diseases such as Pierce's disease in grapevines and citrus variegated chlorosis (CVC) in orange trees. The aim of the present work was to carry out in silico analyses of the promoter sequences of this phytopathogen's genes in an attempt to gather new evidence for a better understanding of the transcriptional regulation dynamics of its genes, including those involved in pathogenicity and virulence mechanisms. To that end, two strategies were used to predict cis-regulatory elements in promoter regions of the genome of the reference strain 9a5c, which is demonstrably associated with CVC. The first, known as phylogenetic footprinting, was used to identify regulatory elements conserved in promoters of orthologous transcription units, taking into account the gene set of X. fastidiosa and 7 comparative species. The criterion for identifying orthologous transcription units, that is, transcription units from distinct species whose promoters share cis-regulatory elements, was studied in parallel using regulatory information from the model bacteria Pseudomonas aeruginosa, Bacillus subtilis and Escherichia coli. The results of the phylogenetic footprinting analysis allowed us to access the transcriptional regulatory network of the species in a comprehensive (global) manner. A total of 2990 regulatory interactions were established, comprising 80 motifs distributed across the promoters of 56.8% of the transcription units in the X. fastidiosa genome. In the second strategy we retrieved experimentally validated regulatory information from E. coli and complemented the knowledge of ten X. fastidiosa regulons through a scanning methodology; some of these regulatory interactions had already been described in other studies.
We highlight the regulons of Fur and CRP, global transcriptional regulators, which proved responsible for modulating genes related to mechanisms of invasion and colonization of the plant host, among others. Finally, comparative analyses of corresponding regulatory regions between strains were performed, and differences possibly associated with phenotypic particularities were identified between 9a5c and J1a12, a non-virulent citrus isolate, and between 9a5c and Temecula1, a grapevine isolate that causes Pierce's disease.
Xylella fastidiosa is a gram-negative, non-flagellated bacterium responsible for causing economically important diseases such as Pierce's disease in grapevines and Citrus Variegated Chlorosis (CVC) in sweet orange trees. In the present work we performed in silico analysis on promoter sequences of protein-coding genes from this phytopathogen, including those involved in virulence and pathogenic mechanisms, in an attempt to better understand the underlying transcriptional regulatory dynamics. Two strategies for cis-regulatory element prediction were applied to promoter sequences from the 9a5c strain genome, a proven causal agent of CVC. The first one, known as phylogenetic footprinting, involved the prediction of regulatory motifs conserved in promoter sequences of orthologous transcription units from X. fastidiosa and a set of 7 comparative species. The criterion for identifying orthologous transcription units, i.e., those from different species whose promoter sequences share at least one common regulatory motif, was studied based on regulatory information available for model organisms: Pseudomonas aeruginosa, Bacillus subtilis and Escherichia coli. The results obtained with the phylogenetic footprinting analysis permitted us to access the underlying transcriptional regulatory network of the species in a comprehensive manner (genome-wide), with a total of 2990 regulatory interactions corresponding to 80 predicted motifs distributed across promoter sequences of 56.8% of all transcription units. In the second strategy, regulatory information from E. coli was recovered and used to expand the knowledge of ten regulons in X. fastidiosa through a scanning process; some of these regulatory interactions were previously described by independent studies. We emphasize some genes related to host invasion and colonization present in the Fur and CRP regulons, two global transcription regulators.
Lastly, comparative analyses of corresponding regulatory regions among strains were performed, and differences possibly associated with phenotypic variation were identified between 9a5c and J1a12, a non-virulent strain isolated from orange trees, and between 9a5c and Temecula1, a strain associated with Pierce's disease in grapevines.
25

Carbonari, Gioia <1983>. "Identification of the N-Linked Glycosylation Sites of the Transcription Factor Rest and Effect of Glycosylation on DNA Binding and Transcriptional Activity." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2012. http://amsdottorato.unibo.it/4288/.

Abstract:
REST is a zinc-finger transcription factor implicated in several processes such as maintenance of embryonic stem cell pluripotency and regulation of mitotic fidelity in non-neuronal cells [Chong et al., 1995]. The gene encodes a 116-kDa protein that acts as a molecular platform for co-repressor recruitment and promotes modifications of DNA and histones [Ballas, 2005]. REST showed different apparent molecular weights, consistent with the possible presence of post-translational modifications [Lee et al., 2000]. Among these the most common is glycosylation, the covalent attachment of carbohydrates during or after protein synthesis [Apweiler et al., 1999]. My thesis has ascertained, for the first time, the presence of glycan chains in the transcription factor REST. Through enzymatic deglycosylation and MS, the oligosaccharide composition of the glycan chains was evaluated: a complex mixture of glycans, composed of N-acetylgalactosamine, galactose and mannose, was observed, thus confirming the presence of O- and N-linked glycan chains. Glycosylation site mapping was done using an 18O-labeling method and MS/MS, and twelve potential N-glycosylation sites were identified. The most probable glycosylation target residues were mutated through site-directed mutagenesis and REST mutants were expressed in different cell lines. Variations in the protein molecular weight and in the ability of mutant REST to bind the RE-1 sequence were analyzed. Gene reporter assays showed that, altogether, removal of N-linked glycan chains causes loss of transcriptional repressor function, except for mutant N59, which showed a slight residual repressor activity in the presence of IGF-I. Taken together, these results demonstrate the presence of complex glycan chains in the transcription factor REST: I have depicted their composition, started defining their position on the protein backbone and identified their possible role in the functioning of the transcription factor.
Considering the crucial role of glycosylation and transcription factor activity in the aetiology of many diseases, any further knowledge could find important and interesting pharmacological applications.
26

Langer, Björn. "Phenotype-related regulatory element and transcription factor identification via phylogeny-aware discriminative sequence motif scoring." Doctoral thesis, Center for Systems Biology Dresden, 2017. https://tud.qucosa.de/id/qucosa%3A31172.

Abstract:
Understanding the connection between an organism’s genotype and its phenotype is a key question in evolutionary biology and genetics. It has been shown that many changes in morphological or other complex phenotypic traits result from changes in the expression pattern of key developmental genes rather than from changes in the genes themselves. Such altered gene expression often arises from changes in the gene regulatory regions. This usually means the loss of important transcription factor (TF) binding sites within these regulatory regions, because the interaction between TFs and specific sites on the DNA is a key element of gene regulation. An established approach for the genome-wide mapping of genomic regions to phenotypes is the Forward Genomics framework. This approach compares the genomic sequences of species with and without the phenotype of interest, based on two ideas: first, the initial loss of a phenotype relaxes selection on all phenotypically related genomic regions; second, this can happen independently in multiple species. Of interest are regions that diverged specifically in phenotype-loss species. Although this principle is general, the current implementation is well suited only to the identification of phenotype-related gene-coding regions and has limited applicability to regulatory regions. The reason is its reliance on sequence conservation as a divergence measure, which does not accurately measure functional divergence of regulatory elements. In this thesis, I developed REforge, a novel implementation of the Forward Genomics principle that takes into account functional information on regulatory elements in the form of known phenotype-related TFs. Considering the flexible organization of TF binding sites within a regulatory region, in terms of both strength and order, allows abstraction from the region’s sequence level to its functional level.
Thus, functional divergence of regulatory regions is directly compared to phenotypic divergence, which greatly improves performance compared to Forward Genomics, as I demonstrated on synthetic and real data. Additionally, I developed TFforge, which follows the same approach but aims at identifying the TFs relevant for the given phenotype. Given a multi-species alignment with a phenotype annotation and a set of regulatory regions, TFforge systematically searches for TFs whose changes in binding affinity between species fit the phenotype signature. The reported output is a ranking of the TFs according to their level of correspondence. I demonstrate proof of concept for this approach on both biological data and artificially generated regions. TFforge can be used as a standalone analysis tool and also to generate the input set of TFs for a subsequent REforge analysis. I demonstrate that REforge in combination with TFforge is able to substantially outperform standard Forward Genomics, even without foreknowledge of relevant TFs. Overall, the methods introduced in this thesis are examples of the power of computational tools in comparative genomics to catalyze biological insights. I not only describe the methods in detail but also validate them with an analysis of real data. REforge and TFforge are applicable to a wide range of phenotypes, each on its own for associating TFs and regulatory regions with a phenotype. Moreover, their combination in particular constitutes a valuable tool set for evo-devo studies with respect to gene regulatory network analyses.
27

Yang, Doo Seok. "Computational Study of Nucleosome Positioning Sequence Patterns and the Effects of the Nucleosome Positioning on the Availability of the Transcription Factor Binding Sites in Study Systems." Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/36580.

Abstract:
Nucleosomes, the primary unit of chromatin structure, are positioned either statistically or specifically. Statistical positioning denotes the arbitrary positioning of nucleosomes on DNA, consistent with the nucleosome’s broad coverage of the genome; however, there is evidence that nucleosomes are also positioned specifically at controlled positions. DNA sequences determine the specific nucleosome positions, and the presence or depletion of nucleosomes affects the availability of the DNA region to other proteins. The DNA sequences of H2A and H2A.Z nucleosomes in Drosophila were analysed in search of nucleosome positioning patterns. Dinucleotide patterns with 10 bp periodicity were identified from the DNA sequences of H2A nucleosomes. Compared with the yeast patterns, the Drosophila patterns had the same periodicity but different dinucleotides near the dyad, which was related to the difference in H3 structure between the two species. The nucleosome positioning patterns from the H2A.Z nucleosomes implied specific histone-DNA interactions, because the patterns deviated where the amino acids that differ between H2A and H2A.Z interact with the DNA. The Ly49 gene cluster was selected as a model system to study the interplay between nucleosomes and transcription factors. Ly49 proteins, the surface receptors on NK cells, display variegated expression patterns, and the bidirectional promoter Pro-1 is known as a key determinant of the stochastic expression of each Ly49 gene. The systematic analysis of nucleosome positions based on the genome sequences in the Ly49 gene cluster revealed that the repressing reverse Pro-1 promoters were open, while the activating forward Pro-1 promoters were covered by nucleosomes. Furthermore, specific nucleosome positions covered transcription factor binding sites. The covered factor binding sites were further examined by their periodic appearances on the nucleosome-covered sequences, which revealed the accessibility of the sites. 
The sequence analysis predicted that the regulation by the transcription factor AML-1 would be sensitive to nucleosome coverage; the prediction was confirmed by cell line experiments. The 10 bp periodic nucleosome positioning patterns interact with histones specifically. The long nucleosome positioning patterns coexist with the short sequence motifs for transcription factor binding sites, adding another layer of control to transcriptional regulation.
28

Bukka, Prasanna L. "Regulation of parathyroid hormone-related peptide gene expression in osteoblast-like cells : the role of an intronic minisatellite and Sp1 transcription factor binding sites in the promoter region." Thesis, McGill University, 2003. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=19451.

Abstract:
Skeletal development and homeostasis is comprised of complex events characterized by the presence of cell specific regulators and markers. Aberrant expression of any of the regulating factors usually results in metabolic bone disease. Therefore understanding the mechanisms of skeletal homeostasis may disclose new therapies for treating these disorders. While the complete pathway that defines normal osteoblast differentiation remains indeterminate, several regulatory factors have already been identified and characterized. Using gene targeting technology, the importance of parathyroid hormone-related peptide (PTHrP) in skeletal biology has been well established, as PTHrP^-/- and PTHrP^+/- mice demonstrate skeletal abnormalities. Several lines of evidence from in vivo and in vitro studies suggest that PTHrP has unique roles in osteoblast biology, independent from its well-established roles in chondrocytes. The presence of an intronic variable number tandem repeat sequence (VNTR) in the human PTHrP gene has been proposed to provide a novel mechanism of transcriptional gene regulation in osteoblasts. A preliminary survey indicated that osteoporotic patients with low bone mineral density have the shortest allele of the VNTR. In this work, we set out to investigate the role of the minisatellite in PTHrP expression and as a potential predictor of decreased bone mineral density. However, in a large sample of osteoporotic patients, a correlation between bone mineral density and VNTR length was not apparent. Also, the presence of the VNTR, or its length, seemed not to affect PTHrP promoter activity in osteoblast-like cells. EMSA experiments demonstrated the ability of the VNTR to bind proteins whose natures remain unclear. We also studied the effect of the newly identified zinc-finger transcription factor, Osterix, on the GC-rich PTHrP promoter. Osx did not bind the promoter or affect its transcriptional activity. 
However, unexpectedly, we observed that the transcription factor Cbfa1, the master regulator of osteoblast differentiation, decreases PTHrP gene expression, likely through a molecular intermediate. Our investigations have shed some light on the role of PTHrP in osteoblast differentiation, and its regulation in this process by polymorphic elements and transcription factors specific to osteoblasts. The potential role of these parameters in the homeostatic regulation of the skeleton and the development of metabolic bone diseases such as osteoporosis warrants further investigation.
29

Hassan, Faizule. "Adenovirus Mediated Delivery of Decoy Hyper Binding Sites for Sequestration of an Oncogenic Transcription Factor HMGA as a Potential Novel Cancer Therapy and Antibacterial Activity of Local Mushrooms." Miami University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=miami1511449587326648.

30

Repapi, Emmanouela. "An integrated genomic approach for the identification and analysis of single nucleotide polymorphisms that affect cancer in humans." Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:16f4482e-7f83-46c9-88d9-583c4154e044.

Abstract:
The identification of genetic variants such as single nucleotide polymorphisms (SNPs), which affect cancer progression, survival and response to treatments could help in the design of better prevention and treatment strategies. Genome-wide association studies (GWAS) have provided the first step of identifying SNPs associating with cancer risk. However, identifying the causal SNPs responsible for the associations has proven challenging, and GWAS have not been successful for time-to-event phenotypes such as cancer progression, due to the insurmountable obstacle of the large sample size needed. The aim of this thesis is to design and implement strategies that combine the identification of SNPs significantly associated with cancer, focusing on time-to-event phenotypes, with detailed bioinformatics analysis to allow for further experimental validation and modelling, to better understand cancer-associated genomic loci and accelerate their incorporation into the clinic. First, a methodology that utilises the Random Survival Forest is developed and combined with a bioinformatics analysis that ranks SNPs according to their potential to result in differential protein levels or activity, in order to identify SNPs that affect the progression of B-cell chronic lymphocytic leukaemia. Next, an analysis that aims to extend our understanding of the role of SNPs in mediating the cellular responses to chemotherapeutic agents is applied. SNPs that could associate with differential cellular growth responses in cancer cell line panels are identified, and their association with the differential survival of cancer patients is explored. Finally, the potential roles of SNPs in affecting the transcriptional regulation of key cancer genes resulting in differential cancer risk are assessed. 
This is done first by focusing on SNPs in an important transcription factor binding motif that has been shown to be extremely sensitive to single base pair changes (the E-box), and then by exploring the possibility that polymorphic transcription factor binding sites could underlie the significant associations noted in cancer GWAS.
31

Neto, Antonio Ferrão. "Predição computacional de sítios de ligação de fatores de transcrição baseada em gramáticas regulares estocásticas." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/95/95131/tde-02012018-144349/.

Abstract:
Fatores de transcrição (FT) são proteínas que se ligam em sequências específicas e bem conservadas de nucleotídeos no DNA, denominadas sítios de ligação dos fatores de transcrição (SLFT), localizadas em regiões de regulação gênica conhecidas como módulos cis-reguladores (CRM). Ao reconhecer o SLFT, o fator de transcrição se liga naquele sítio e influencia a transcrição gênica positiva ou negativamente. Existem técnicas experimentais para a identificação dos locais dos SLFTs em um genoma, como footprinting, ChIP-chip ou ChIP-seq. Entretanto, a execução de tais técnicas implica em custos e tempo elevados. Alternativamente, pode-se utilizar as sequências de SLFTs já conhecidas para um determinado fator de transcrição e aplicar técnicas de aprendizado computacional supervisionado para criar um modelo computacional para tal sítio e então realizar a predição computacional no genoma. Entretanto, a maioria das ferramentas computacionais existentes para esse fim considera independência entre as posições entre os nucleotídeos de um sítio - como as baseadas em PWMs (position weight matrix) - o que não é necessariamente verdade. Este projeto teve como objetivo avaliar a utilização de gramáticas regulares estocásticas (GRE) como técnica alternativa às PWMs neste problema, uma vez que GREs são capazes de caracterizar dependências entre posições consecutivas dos sítios. Embora as diferenças de desempenho tenham sido sutis, GREs parecem mesmo ser mais adequadas do que PWMs na presença de valores mais altos de dependência de bases, e PWMs nos demais casos. Por fim, uma ferramenta de predição computacional de SLFTs foi criada baseada tanto em GREs quanto em PWMs.
Transcription factors (TFs) are proteins that bind to specific and well-conserved sequences of nucleotides in the DNA, called transcription factor binding sites (TFBS), located in regions of gene regulation known as cis-regulatory modules (CRM). On recognizing a TFBS, the transcription factor binds to that site and positively or negatively influences gene transcription. There are experimental procedures for the identification of TFBS in a genome, such as footprinting, ChIP-chip or ChIP-Seq. However, these techniques are costly and time-consuming. Alternatively, one may take the TFBS sequences already known for a particular transcription factor and apply supervised machine learning techniques to create a computational model for that site, and then perform computational prediction in the genome. However, most existing software tools for this purpose, such as those based on PWMs (position weight matrices), assume independence between nucleotide positions in the site, which is not necessarily true. This project aimed to evaluate the use of stochastic regular grammars (SRGs) as an alternative technique to PWMs in this problem, since SRGs are able to characterize dependencies between consecutive positions in the sites. Although differences in performance were subtle, SRGs appear to be more suitable than PWMs in the presence of higher base dependency values, and PWMs in the other cases. Finally, a computational TFBS prediction tool was created based on both SRGs and PWMs.
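The contrast drawn in this abstract, between models that treat site positions independently and models that capture dependencies between consecutive positions, can be illustrated with a minimal sketch. This is not the thesis's implementation: a position-specific first-order Markov chain is used here as a simplified stand-in for a stochastic regular grammar, and the training sites, function names and pseudocount are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def train_pwm(sites, pseudocount=1.0):
    """Per-position base probabilities; positions are treated as independent."""
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        total = len(sites) + pseudocount * len(ALPHABET)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm

def pwm_log_prob(pwm, seq):
    """Log-probability of seq under the independent-position model."""
    return sum(math.log(col[b]) for col, b in zip(pwm, seq))

def train_markov(sites, pseudocount=1.0):
    """Position-specific first-order model: P(base at i | base at i-1)."""
    first = train_pwm(sites, pseudocount)[0]
    transitions = []
    for i in range(1, len(sites[0])):
        counts = Counter((s[i - 1], s[i]) for s in sites)
        table = {}
        for prev in ALPHABET:
            total = sum(counts[(prev, b)] for b in ALPHABET)
            total += pseudocount * len(ALPHABET)
            table[prev] = {b: (counts[(prev, b)] + pseudocount) / total
                           for b in ALPHABET}
        transitions.append(table)
    return first, transitions

def markov_log_prob(model, seq):
    """Log-probability of seq under the first-order model."""
    first, transitions = model
    logp = math.log(first[seq[0]])
    for i, table in enumerate(transitions, start=1):
        logp += math.log(table[seq[i - 1]][seq[i]])
    return logp

# Hypothetical training sites in which positions 2 and 3 co-vary (AC or GT).
sites = ["TACG", "TGTG"] * 5
pwm = train_pwm(sites)
markov = train_markov(sites)
for candidate in ("TACG", "TATG"):  # TATG mixes the two observed variants
    print(candidate, pwm_log_prob(pwm, candidate),
          markov_log_prob(markov, candidate))
```

On this toy data the PWM cannot tell the observed site from the unseen mixture, because every column looks the same to it, whereas the first-order model penalizes the unseen A-to-T step.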
32

Liao, Yi-Sian, and 廖一憲. "Prediction of Transcription Factor Binding Sites from Unaligned Gene Sequences." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/41692945232299039263.

Abstract:
碩士
國立清華大學
電機工程學系
97
Transcription factor binding sites (motifs) are helpful information for understanding the regulation of gene transcription. In fact, cDNA microarray hybridization (ChIP array) has become a popular tool for recognizing motifs in gene sequences. However, the ChIP array can only map the probable sequence to within 1-2 kilobases resolution. Our goal is to find the motif binding sites without knowing the motif length. To reach this goal, we designed a computational program, based on a discriminator and a binomial model, to find the most probable patterns. We compare its performance with that of the program called constraint-less Cosmo [1]. The simulation results indicate that our program performs better than Cosmo.
33

Benner, Philipp. "Combining Prior Information for the Prediction of Transcription Factor Binding Sites." 2015. https://ul.qucosa.de/id/qucosa%3A21541.

Abstract:
Despite the fact that each cell in an organism has the same genetic information, it is possible that cells fundamentally differ in their function. The molecular basis for the functional diversity of cells is governed by biochemical processes that regulate the expression of genes. Key to this regulatory process are proteins called transcription factors that recognize and bind specific DNA sequences of a few nucleotides. Here we tackle the problem of identifying the binding sites of a given transcription factor. The prediction of binding preferences from the structure of a transcription factor is still an unsolved problem. For that reason, binding sites are commonly identified by searching for overrepresented sites in a given collection of nucleotide sequences. Such sequences might be known regulatory regions of genes that are assumed to be coregulated, or they are obtained from so-called ChIP-seq experiments that identify approximately the sites that were bound by a given transcription factor. In both cases, the observed nucleotide sequences are much longer than the actual binding sites and computational tools are required to uncover the actual binding preferences of a factor. Aggravated by the fact that transcription factors recognize not only a single nucleotide sequence, the search for overrepresented patterns in a given collection of sequences has proven to be a challenging problem. Most computational methods merely relied on the given set of sequences, but additional information is required in order to make reliable predictions. Here, this information is obtained by looking at the evolution of nucleotide sequences. For that reason, each nucleotide sequence in the observed data is augmented by its orthologs, i.e. sequences from related species where the same transcription factor is present. 
By constructing multiple sequence alignments of the orthologous sequences it is possible to identify functional regions that are under selective pressure and therefore appear more conserved than others. The processing of the additional information exerted by ortholog sequences relies on a phylogenetic tree equipped with a nucleotide substitution model that not only carries information about the ancestry, but also about the expected similarity of functional sites. As a result, a Bayesian method for the identification of transcription factor binding sites is presented. The method relies on a phylogenetic tree that agrees with the assumptions of the nucleotide substitution process. Therefore, the problem of estimating phylogenetic trees is discussed first. The computation of point estimates relies on recent developments in Hadamard spaces. Second, the statistical model is presented that captures the enrichment and conservation of binding sites and other functional regions in the observed data. The performance of the method is evaluated on ChIP-seq data of transcription factors, where the binding preferences have been estimated in previous studies.
34

Liu, Chen-Yei, and 劉承業. "An integrated computational tool for predicting transcription factor binding and microRNA target sites of vertebrate genomes." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/15515659671163337497.

Abstract:
碩士
國立臺灣海洋大學
生物科技研究所
95
Both transcription factors and microRNAs play important roles in the regulation of gene expression. We developed a computational tool for analyzing the upstream and downstream sequences of genes of interest. We used PWMs (position weight matrices) based on current biochemical studies to identify transcription factor binding sites. We also used the Miranda program to analyze the 3’UTR sequences and spot potential microRNA target sequences, which might regulate the expression of the genes harboring them. Comparative genomics is employed, comparing sequences between species to detect the transcription factor binding sites and microRNA target sites conserved during evolution. Finally, we built a web tool to assist users in discovering the transcription factor binding sites and microRNA target sites for a given gene set.
35

Wu, Ping-Cheng, and 吳秉承. "Incorporating sequence motifs to improve accuracy of predicting transcription factor binding sites using ChIP-seq data." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/94459583035276354379.

Abstract:
碩士
國立臺灣大學
生物產業機電工程學研究所
104
Transcription factors (TFs) regulate gene expression in living organisms and influence multiple biological processes. Chromatin immunoprecipitation sequencing (ChIP-seq) is a technology that has been widely used to find the transcription factor binding sites (TFBSs) of a specific TF among the DNA sequences of a genome. However, the accuracy of the TFBSs identified by ChIP-seq has not been systematically evaluated. In this regard, this thesis utilized TFBS information provided by the TRANSFAC database to validate the TFBSs identified using ChIP-seq alone at multiple false discovery rate (FDR) cutoffs. Moreover, a method incorporating de novo motif discovery was proposed to improve the accuracy of the predicted TFBSs. ChIP-seq data sampled from different cell lines were collected from the ENCODE database. In general, ~60% of the peak regions identified using ChIP-seq alone with a strict FDR cutoff (FDR = 0) contained at least one TFBS of the specific TF across multiple cell lines. In addition, with our proposed method, the prediction accuracy improved over the results using ChIP-seq alone, though the degree of improvement was affected by the FDR cutoffs used and the motifs discovered. In conclusion, this thesis identified the accuracy problem of the ChIP-seq platform by observing the data on a large scale, and addressed this issue by proposing a method incorporating de novo motif discovery. The observed results can serve as an important foundation for developing bioinformatics tools for TFBS prediction in the future.
36

Austin, Ryan. "The de novo Prediction of Functionally Significant Sequence Motifs in Arabidopsis thaliana." Thesis, 2009. http://hdl.handle.net/1807/19021.

Abstract:
This thesis performs de novo predictions for functionally significant sequence motifs in the Arabidopsis genome under two separate contexts. Each study uses genomic positional information, statistical over-representation and several biologically contextual filters to maximize the visibility of biological signal in prediction results. Numerous literature-supported motifs are prevalent in the results of both studies, and a number of novel motif patterns possess a strong potential for in planta significance. The first study examines the statistical over-representation of C-terminal tripeptides as a means for identifying eukaryotic conserved protein targeting signatures. Comparative genomics is applied to the analysis of tripeptide frequencies in the C-terminus of 7 eukaryotic proteomes, while biological signal is maximized through the filtering of both simple sequences and homologous sequences present across protein families. The second study introduces a methodology for the effective prediction of transcription factor binding sites in Arabidopsis. A collection of motif prediction algorithms and a novel enumerative strategy are applied to the prediction of cis-acting regulatory elements within the promoters of genes found coexpressed within distinct tissues and under specific abiotic stress treatments. Overall, the analysis identifies 4 known motifs in expected contexts, 5 known motifs in novel contexts and 7 novel motifs with a high potential for biological function.
37

Liu, Kai-Wei, and 劉凱維. "Mapping of Transcription Factor Binding Sites and DNA-Binding Motifs." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/74621550301522822856.

Abstract:
碩士
國立臺灣大學
資訊工程學研究所
96
Transcription factors (TFs) play an essential role in gene regulation by activating or inhibiting the expression of the corresponding genes. Transcription factors carry out their functions by docking at a specific region in the DNA sequence, which is normally referred to as a transcription factor binding site (TFBS). Since the complete network of the interactions between TFs and genes is still largely unknown, figuring out the key residues in the DNA binding domain of a TF can provide biochemists with valuable information for the design of biochemical experiments to verify the interactions between the TF and the corresponding genes. Furthermore, with the key residues in the DNA binding domain identified, we can move to establish a mapping between the DNA binding motifs and the TFBS motifs. In the study reported in this thesis, we have proposed a novel approach to achieve the objectives mentioned above. The proposed approach begins with clustering the TFBSs with the same binding type. Then, sequence alignment with a strict criterion is applied to the corresponding DNA binding domains of the TFBSs in the same cluster in order to identify the key residues in the DNA binding domains. For those TFs whose tertiary structure is present in the Protein Data Bank (PDB), we have examined the physicochemical significance of the key residues identified.
38

Quon, Gerald T. "The landscape of false-positive transcription factor binding site predictions in yeast." 2007. http://link.library.utoronto.ca/eir/EIRdetail.cfm?Resources__ID=452973&T=F.

39

Shih, Chih-Yuan, and 石智遠. "Isolation and Characterization of Binding Sites for Transcription Factor FOXP2." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/46154906962594778308.

Abstract:
碩士
國立清華大學
生命科學系
92
FOXP2 belongs to a family of transcription factors containing the DNA-binding forkhead/winged helix domain. The known FOXP2 mutations that are associated with speech and language disorder are a missense mutation in the KE family and a t(5;7)(q22;q31.2) translocation with a breakpoint in individual CS. Nevertheless, nothing is known about the target genes regulated by FOXP2. In this study, using a whole genome PCR procedure, we searched human genomic DNA for potential FOXP2 target sites. A number of related sequences that interacted with FOXP2 were identified in vitro by band shift and DNase I footprint analysis. Transient transfection assays in 293T cells further confirmed that the FOXP2 binding sites could also function in vivo. Promoter database analysis revealed that FOXP2 binding sites are present in the upstream regions of several candidate target genes. A sequence comparison based on several of the novel sequences yielded a putative consensus binding sequence of 5’-TGTTTGT-3’. Remarkably, this sequence is similar to the consensus sequences for forkhead proteins. These DNA binding sites may help identify novel targets of FOXP2 and aid in further understanding FOXP2 function during the development of speech and language.
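As a rough illustration of how a consensus such as the 5’-TGTTTGT-3’ sequence reported above might be looked up in candidate upstream regions, the sketch below scans both DNA strands for the consensus, optionally tolerating mismatches. It is an assumption-laden toy: the function names, the example sequence and the mismatch parameter are hypothetical and are not taken from the thesis.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))

def find_consensus(seq, consensus="TGTTTGT", max_mismatches=0):
    """Return (position, strand, matched window, mismatches) for each hit.

    Positions are 0-based on the forward strand; a '-' hit means the
    window matches the reverse complement of the consensus.
    """
    seq = seq.upper()
    k = len(consensus)
    patterns = (("+", consensus), ("-", reverse_complement(consensus)))
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for strand, pattern in patterns:
            mismatches = sum(a != b for a, b in zip(window, pattern))
            if mismatches <= max_mismatches:
                hits.append((i, strand, window, mismatches))
    return hits

# Hypothetical upstream fragment with one site on each strand.
upstream = "AAATGTTTGTCCCACAAACAGG"
for hit in find_consensus(upstream):
    print(hit)
```

Exact string matching of this kind is the simplest possible scan; in practice a weight matrix or mismatch tolerance is usually preferred because binding sites are degenerate.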
40

LEE, M. I., and 李美宜. "Recognizing Cancer-related Genes based on Transcription Factor Binding Sites." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/11997758257999628220.

Abstract:
碩士
臺中健康暨管理學院
資訊科學與應用學系碩士班
93
The purpose of transcription factors (TFs) is to regulate the expression of other genes. They are also a key factor in whether or not mutation occurs in the promoter region. Current research on TFs mainly focuses on predicting motifs using algorithms such as Multiple EM for Motif Elicitation (MEME), genetic algorithms (GA), and the Gibbs sampler. In this thesis, we propose a new approach to predict possible cancer-related genes based on transcription factor binding sites (TFBS). The experimental TFBS that bind to promoter regions and the known cancer-related genes were collected from the TFSEARCH and CHIP websites, respectively. The TFBS that result in mutation of genes are selected. We then analyze the occurrence frequencies of these TFBS to investigate the relations between TFBS and possible cancer-related genes. We also discuss the two-factor case, analyzing the relations between pairs of TFBS and possible cancer-related genes. Our results show that the TFBS-based approach is a reliable method for recognizing possible cancer-related genes.
41

Hsu, Jen-Jay, and 徐振傑. "Prediction of DNA Binding Transcription Factor segments under Specified structure." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/46288973583523923659.

Abstract:
碩士
國立臺灣大學
資訊工程學研究所
96
This thesis discusses the design of a predictor aimed at identifying the secondary structures in a transcription factor that are involved in interaction with the DNA. In particular, the design of the predictor has been optimized for identifying the alpha-helix structures involved in interaction with the DNA due to their prevalence. In the design of the predictor, the support vector machine (SVM) was employed and the study reported in this thesis focused on the features exploited for making prediction. In the experiments conducted in this study, two datasets have been used. The first dataset was derived from the TF-DNA complexes deposited in the Protein Data Bank (PDB) and the second dataset was derived from the TF sequences deposited in SWISS-PROT. With respect to identifying the alpha-helix structures involved in interaction with the DNA, the predictor proposed in this thesis delivered sensitivity of 75%, precision of 80%, and specificity of 92% with the first dataset and sensitivity 65%, precision 85%, and specificity 98% with the second dataset.
42

Wu, Po-Chun, and 吳柏均. "Investigating Variations of Transcription Factor Binding Sites by 1000 Genomes Data." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/31281686353680898713.

Abstract:
碩士
國立臺灣大學
生醫電子與資訊學研究所
103
Gene regulation is essential for maintaining cellular functions. Therefore, how biological systems regulate gene expression is a very important research topic. Gene regulation can be divided into many parts, including gene expression, mRNA transcription and splicing, post-translational modification, etc. This study aims to explore the activation and inactivation of gene expression through the interaction between transcription factors and double-stranded DNA. Among the three billion base pairs of the human genome, biologically significant fragments such as genes or transcription factor binding sites account for only a small portion of the DNA. The size of transcription factor binding motifs is about 5 to 15 nucleotides. Accordingly, how to identify transcription factor binding sites and how they achieve gene regulation are very important research issues. Meanwhile, the binding strength between transcription factors and their binding sites may also affect the regulation of gene expression. In the 1990s, the Human Genome Sequencing Project was launched. Limited by the technology of the time, this project consumed a great deal of money and manpower. Finally, sequencing of the 23 human chromosomes, three billion bases in total, was completed in 2001. This was a considerable milestone in human genome research. With the development of biotechnology and the falling cost of computation, genome sequencing technology began to advance rapidly. In 2008, the 1000 Genomes Project started, planning to use faster and easier sequencing technology to sequence more than a thousand human genomes within three years. By 2012, a total of 1,092 human genomes had been published. The latest version of the project's dataset already contains 2,504 human genomes. The completion of the human genome allows researchers to perform high-throughput screening of transcription factor binding sites. 
The growing number of individual genome datasets provides a wealth of research themes, letting us glimpse the differences between individuals in transcription factor binding sites. The objective of this study is to use the data of the 1000 Genomes Project to explore individual variations in transcription factor binding sites and their possible applications in genetic tests. This study collected the binding site data of 34 human transcription factors in the JASPAR database and combined this information with the variant data of the 1000 Genomes Project to explore individual variations in transcription factor binding sites. The analysis shows that only about 3% of the positions in the JASPAR-denoted transcription factor binding sites have individual variations. Furthermore, the positions with individual variations are not consistent with the original motifs of the transcription factor binding sites. Some individual variations occur at positions where the corresponding motif implies that variation is not allowed. To further investigate the rationale behind this inconsistency, this study used an online tool named PiDNA, which predicts the binding motif of a DNA-binding protein using protein-DNA complex structures. This study employed such binding motifs to explore potential minor forms that might previously have been overlooked. Finally, this study discusses the future application of personal genetic diagnosis, and how to use existing bioinformatics tools and public databases to assess the importance of variants observed in transcription factor binding sites. It is expected that this study can provide novel insights for individual genetic tests in personalized medicine.
43

Wang, Mei-Huei, and 王美惠. "Analysis of Transcription Factor Binding Sites by Using Sequential Pattern Mining." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/59563074202747148930.

Abstract:
碩士
長庚大學
資訊管理研究所
94
Transcription is the process by which an RNA product is produced from a given DNA. In this process, transcription factors affect the expression of genes by binding to specific regions with consensus patterns in the upstream region of genes. Therefore, the consensus patterns are also known as transcription factor binding sites (TFBS). By analyzing how transcription factors act on DNA binding sites and how they collaborate in ordered coordination, we can gain insight into the gene regulation process. Many computational studies on the combinations of and the relationships between transcription factor binding sites are based on association rule mining. Results of the studies are provided to biologists for further research. However, the sequential order of the transcription factor binding sites cannot be mined by association rule mining, and the number of rules produced by association rule mining is enormous. This study uses the known TFBS in the TRANSFAC database to mark the TFBS positions in the upstream sequences of genes. A sequential pattern mining technique is proposed to analyze the permutation of transcription factor binding sites in the upstream region of genes. The differences between sequential pattern mining and association rule mining are explored. The results show that sequential pattern mining finds the combinations and permutations of transcription factor binding sites more efficiently and thus saves the time a biologist must otherwise spend on validating the experiment.
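The distinction this abstract draws, that association rules ignore the order of binding sites while sequential patterns retain it, can be shown with a small support-counting sketch. It is only a toy under stated assumptions: the promoter lists and TFBS labels are invented, and the functions count support for a single pair rather than implementing a full mining algorithm such as the one used in the thesis.

```python
def itemset_support(promoters, pair):
    """Fraction of promoters containing both TFBS labels, in any order."""
    a, b = pair
    return sum(a in p and b in p for p in promoters) / len(promoters)

def ordered_support(promoters, pair):
    """Fraction of promoters in which label a occurs somewhere before b."""
    a, b = pair

    def has_order(p):
        return any(x == a and b in p[i + 1:] for i, x in enumerate(p))

    return sum(has_order(p) for p in promoters) / len(promoters)

# Hypothetical promoters, each a list of TFBS labels in upstream order.
promoters = [
    ["SP1", "TATA", "AP1"],
    ["SP1", "AP1"],
    ["AP1", "SP1"],
    ["TATA", "SP1", "AP1"],
]

print(itemset_support(promoters, ("SP1", "AP1")))  # order ignored
print(ordered_support(promoters, ("SP1", "AP1")))  # SP1 before AP1
print(ordered_support(promoters, ("AP1", "SP1")))  # AP1 before SP1
```

In this toy set the unordered pair appears in every promoter, but the two orderings have very different support, which is the kind of information an itemset-based view discards.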
44

Zhao, Xiaoyan. "Improved Algorithms for Discovery of Transcription Factor Binding Sites in DNA Sequences." Thesis, 2010. http://hdl.handle.net/1969.1/ETD-TAMU-2010-12-8834.

Abstract:
Understanding the mechanisms that regulate gene expression is a major challenge in biology. One of the most important tasks in this challenge is to identify transcription factor binding sites (TFBS) in DNA sequences. The common representation of these binding sites is called a “motif,” and the TFBS discovery problem is also referred to as the motif finding problem in computer science. Despite extensive efforts in the past decade, none of the existing algorithms performs very well. This dissertation focuses on this difficult problem and proposes three new methods (MotifEnumerator, PosMotif, and Enrich) that improve on existing approaches. An improved pattern-driven algorithm, MotifEnumerator, is first proposed to detect the optimal motif with reduced time complexity compared to traditional exact pattern-driven approaches. This strategy is further extended to allow arbitrary don’t-care positions within a motif without much decrease in the solvable motif lengths. The performance of this algorithm is comparable to the best existing motif finding algorithms on a large benchmark set of samples. Another algorithm with further post-processing, PosMotif, is proposed; it uses a string representation that allows arbitrary ignored positions within the non-conserved portion of single motifs, and it uses Markov chains to model the background distributions of motifs of a certain length while skipping these positions within each Markov chain. Two post-processing steps that consider redundancy information are applied in this algorithm. PosMotif demonstrates improved performance compared to the five best existing motif finding algorithms on several large benchmark sets of samples. The third method, Enrich, is proposed to improve the performance of general motif finding algorithms by adding more sequences to the samples in the existing benchmark datasets. Five well-known motif finding algorithms were run on the original datasets and the enriched datasets, and the performance comparisons show a substantial general improvement on the enriched datasets.
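The pattern-driven idea with don't-care positions described in this abstract can be illustrated with a naive exhaustive search. This is only a sketch under stated assumptions, not MotifEnumerator itself: the function names, the use of `N` as the don't-care symbol, and the scoring rule (number of sequences containing the pattern) are hypothetical choices made here, and the loop has none of the complexity reductions the thesis claims.

```python
from itertools import product


def matches(pattern, window):
    """True if window matches pattern; 'N' in the pattern matches any base."""
    return all(p == "N" or p == w for p, w in zip(pattern, window))


def occurs_in(pattern, seq):
    """True if the pattern matches at least one window of seq."""
    k = len(pattern)
    return any(matches(pattern, seq[i:i + k]) for i in range(len(seq) - k + 1))


def enumerate_motifs(seqs, k, max_dont_care=1):
    """Score every length-k pattern over A, C, G, T, N (hypothetical scoring).

    The score is the number of sequences that contain the pattern. Patterns
    with more than max_dont_care 'N' positions are skipped. On equal scores,
    the pattern with fewer don't-care positions wins.
    Returns (best_pattern, best_score).
    """
    best_pattern, best_key = None, (-1, 0)
    for chars in product("ACGTN", repeat=k):
        pattern = "".join(chars)
        n_dont_care = pattern.count("N")
        if n_dont_care > max_dont_care:
            continue
        score = sum(occurs_in(pattern, s) for s in seqs)
        key = (score, -n_dont_care)
        if key > best_key:
            best_pattern, best_key = pattern, key
    return best_pattern, best_key[0]


if __name__ == "__main__":
    toy = ["TTACGTAA", "GGACCTGG", "CCACGTCC"]  # made-up sequences
    print(enumerate_motifs(toy, k=4, max_dont_care=1))
```

The search visits all 5^k candidate patterns, which is why exact pattern-driven methods are limited to short motifs unless the enumeration is pruned.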
45

Wei-Hao, Yuan. "Extracting Transcription Factor Binding Sites from Unaligned Gene Sequences with Statistical Models." 2006. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0016-1303200709314000.

46

Yuan, Wei-Hao, and 袁偉豪. "Extracting Transcription Factor Binding Sites from Unaligned Gene Sequences with Statistical Models." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/09117995586690727615.

Abstract:
Master's
National Tsing Hua University
Department of Electrical Engineering
94
Transcription factor binding sites (motifs) are crucial in the regulation of gene transcription. Recently, chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP array) has been used to identify potential regulatory sequences, but the procedure can only map the probable protein-DNA interaction loci to within 1-2 kilobases. To find the exact binding motifs, it is necessary to build a computational method that examines the ChIP-array binding sequences and searches for possible motifs representing the transcription factor binding sites. In this thesis, we design a program to find accurate motif sites in the yeast genome using dependency graphs and their expanded Bayesian networks. The program incorporates a binomial probability model to build significant initial motif sets. Finally, we compare our results with those obtained from well-known programs and show that our program outperforms these programs in consistency with known specificities.
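A binomial model for picking significant initial words can be sketched as follows. The abstract does not give the exact model, so this is an assumption-laden illustration: the function names are hypothetical, and the background probability is estimated here simply as the fraction of background sequences that contain the word.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def word_p_value(word, bound_seqs, background_seqs):
    """Binomial p-value for seeing the word in this many bound sequences.

    Assumption made for this sketch: p is the fraction of background
    sequences that contain the word at least once.
    """
    k = sum(word in s for s in bound_seqs)
    p = sum(word in s for s in background_seqs) / len(background_seqs)
    return binomial_tail(k, len(bound_seqs), p)


if __name__ == "__main__":
    bound = ["AACGTT", "CCACGT", "GGGGGG"]                  # made-up data
    background = ["ACGTAA", "TTTTTT", "CCCCCC", "GGGGGG"]   # made-up data
    print(word_p_value("ACGT", bound, background))
```

Words with a small p-value occur in the bound sequences more often than the background rate predicts, which makes them reasonable seeds for a more detailed motif model.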
47

Church, William David. "Mapping the YY1 and p65 binding sites on the transcription factor LSF." Thesis, 2013. https://hdl.handle.net/2144/14244.

Abstract:
Late SV40 factor (LSF) is a CP2 family transcription factor involved in cell cycle regulation. In liver cancer, LSF is an oncogene, in part due to its role in the upregulation of osteopontin, which leads to increased tumor size. As a result, LSF is a potential target for drug discovery. LSF binds the p65 subunit of the transcription factor NFkB and also the transcription factor Yin Yang 1 (YY1). In this thesis, I show that binding of both YY1 and p65 occurs at the ubiquitin-like domain of LSF in U2OS cell extracts. Interestingly, when phosphatase inhibitors are added during preparation of U2OS cell extracts, the binding of YY1 and p65 to LSF shifts from the ubiquitin-like domain of LSF to the DNA binding domain. An as yet unidentified docking protein may be responsible for this shift in binding. In an attempt to map the specific region of the LSF sequence that is involved in these interactions, I have developed a peptide identification assay that uses protease digestion, protein-mediated peptide capture, and LC ESI-MS. Through the use of this assay, I am confident that the sequence(s) involved in these LSF protein-protein interactions can be further defined.
48

Hsu, Chih-Kai, and 許智凱. "A two-way predicting computational tool website for transcription factor binding site and vertebrate genomes." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/60071928811643946109.

Abstract:
Master's
National Taiwan Ocean University
Institute of Biotechnology
97
Transcriptional control by transcription factors plays a central role in the regulation of gene expression. Given the gene lists involved in a biological phenomenon, as generated in a gene expression profiling experiment, a commonly asked question is which transcription factors control a battery of genes, and how. eMOG (Extraction of Motif or Gene) is a web tool that we developed to analyze the upstream sequences of a given gene list. eMOG scans the upstream sequences of genes and, by evaluating a probability score, discovers the over-represented known transcription factor binding sites (TFBS). Furthermore, eMOG allows users to supply TF names to predict the genes that are potentially regulated by the given TFs. Finally, the user can visualize the TFBS patterns on the upstream sequences of genes using Scalable Vector Graphics (SVG). We apply eMOG to 115 human genes whose upstream sequences are bound by the E2F family, a TF family that regulates entry into S phase of the cell cycle. eMOG revealed four TFBS (E2F, CREB, NF-Y, Nrf-1) that are over-represented in the upstream sequences of those 115 genes. Moreover, using reverse eMOG, we discovered another 27 genes that are potentially under the transcriptional control of these four TFs. Functional analysis of these 27 genes reveals that 14 genes are known to be directly related to cell cycle control and two genes are associated with membrane receptors. Interestingly, using the same approach, 26 mouse genes were discovered to be potentially under the transcriptional control of the same four TFs. Eleven of these 26 mouse genes are known to be related to cell cycle control.
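The abstract does not say how eMOG computes its probability score. One common way to test whether a known binding site is over-represented in a gene list is a hypergeometric tail probability, sketched below. Everything here is an assumption for illustration: the function names, the `sites_by_tf` mapping from TF name to the genes carrying its site upstream, the `alpha` cutoff, and the toy gene identifiers.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the binding site."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_sites(gene_list, background, sites_by_tf, alpha=0.05):
    """Return {tf: p_value} for TFs whose sites are over-represented.

    gene_list: genes of interest; background: all genes considered;
    sites_by_tf: hypothetical mapping TF name -> genes with that site upstream.
    """
    background = set(background)
    genes = set(gene_list) & background
    results = {}
    for tf, targets in sites_by_tf.items():
        targets = set(targets) & background
        k = len(genes & targets)
        p = hypergeom_tail(k, len(targets), len(genes), len(background))
        if p <= alpha:
            results[tf] = p
    return results


if __name__ == "__main__":
    background = [f"g{i}" for i in range(20)]   # made-up gene IDs
    gene_list = ["g0", "g1", "g2", "g3", "g4"]
    sites_by_tf = {
        "E2F": ["g0", "g1", "g2", "g3", "g10"],
        "OTHER": ["g0"] + [f"g{i}" for i in range(10, 19)],
    }
    print(enriched_sites(gene_list, background, sites_by_tf))
```

The reverse direction described in the abstract, going from TF names to candidate target genes, corresponds in this sketch to a plain lookup in `sites_by_tf`.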
49

Fu, Changjui, and 傅昶瑞. "A mixed 0-1 linear programming approach for finding transcription factor binding sites." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/80211213408434043290.

Abstract:
Doctoral
National Chiao Tung University
Institute of Information Management
95
The discrimination of transcription factor binding sites (TFBS) in multiple DNA sequences is essential for functional analysis of gene expression. Enumeration methods that search all possible patterns have the best precision among current algorithms but require exponential computational time and have difficulty searching for longer patterns. A predefined shared pattern can notably prune the search space, but such information is often unavailable. Finding unframed TFBS today still relies on heuristic approaches that compromise accuracy. To find TFBS effectively, this study develops a mixed 0-1 linear programming approach to solve a series of problems, including fixed-pattern TFBS finding, ambiguous-spacer TFBS finding, and pattern-free TFBS finding. The proposed method has the following advantages over current methods: (1) a pattern-driven instead of sample-driven (or sequence-driven) design; (2) a guaranteed globally optimal solution; (3) structural features of motifs can be embedded to facilitate the search process. With the pattern-free approach, we can also successfully determine TFBS within dispersed spacers. We run several experiments on each kind of TFBS finding program, and in these examples the real TFBS are successfully determined in an acceptable computational time.
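One generic way to cast fixed-length motif finding as a mixed 0-1 linear program is sketched below. It is an illustrative formulation and not the model from the thesis. All symbols are assumptions introduced here: $s_i$ is the $i$-th of $N$ sequences and has length $L_i$, $\ell$ is the motif length, $x_{i,j}$ selects start position $j$ in sequence $i$, $y_{k,b}$ selects base $b$ at motif position $k$, and the continuous variable $z_{i,j,k}$ counts a match at motif position $k$ of the window starting at $j$.

```latex
\begin{align*}
\max \quad & \sum_{i=1}^{N} \sum_{j=1}^{L_i-\ell+1} \sum_{k=1}^{\ell} z_{i,j,k} \\
\text{s.t.} \quad
 & \sum_{j=1}^{L_i-\ell+1} x_{i,j} = 1 && \forall i \\
 & \sum_{b \in \{A,C,G,T\}} y_{k,b} = 1 && \forall k \\
 & z_{i,j,k} \le x_{i,j} && \forall i,j,k \\
 & z_{i,j,k} \le y_{k,\,s_i[j+k-1]} && \forall i,j,k \\
 & x_{i,j},\ y_{k,b} \in \{0,1\}, \qquad 0 \le z_{i,j,k} \le 1
\end{align*}
```

Because the objective is maximized, each $z_{i,j,k}$ reaches 1 exactly when the selected window and the selected consensus base agree. The optimum is therefore the consensus string with the most matching positions when one site is chosen per sequence.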
50

Gonsalves, Sarah E. "Identification of Heat Shock Factor Binding Sites in the Drosophila Genome." Thesis, 2012. http://hdl.handle.net/1807/34017.

Abstract:
The heat shock response (HSR) is a highly conserved mechanism that enables organisms to survive environmental and pathophysiological stress. In Drosophila, the HSR is regulated by a single transcription factor, heat shock factor (HSF). During stress, HSF trimerizes and binds to over 200 loci on Drosophila polytene chromosomes with only nine mapping to major heat shock (HS) inducible gene loci. The function of HSF binding to the other sites in the genome is currently unknown. Some of these sites may contain yet unidentified “minor” HS genes. Interestingly, the binding of HSF also coincides with puff regression at some sites. Two such sites contain the major developmentally regulated genes Eip74 and Eip75: key regulators in the response to 20-hydroxyecdysone (20E), the main hormone responsible for the temporal co-ordination of post-embryonic development in Drosophila. Previous work in our and other labs indicates that the regression of non-HS puffs during the HSR is dependent on the presence of functional HSF. Using chromatin immunoprecipitation (ChIP) followed by hybridization to genome tiling arrays (ChIP-chip), I have identified 434 regions in the Drosophila Kc cell genome that are bound by HSF during HS, and have determined that 57% of these sites are located within the transcribed regions of genes. By examining the transcriptional response to HS in Kc cells and third instar larvae using expression microarrays, I found that only about 10% of all genes within 1250 bp of an HSF binding site are transcriptionally regulated by HS and many genes whose transcript levels change during HS do not appear to be near an HSF binding site. Furthermore, genes with an HSF binding site within their introns are significantly enriched (modified Fisher Exact p-value between 2.0 × 10⁻³ and 1.5 × 10⁻⁶) in gene ontology terms related to developmental processes and reproduction.
Using expression microarray technology, I characterized the transcriptional response to 20E and its structural analog ponasterone A. I have identified multiple HSF binding sites within Eip74 and Eip75, and show that induction of the HSR correlates with repression of these genes and all other 20E-inducible genes. Taken together, this work provides a basis for further investigation into the role of HSF binding to sites not associated with HS genes and its possible function as a repressor of gene transcription during conditions of stress and as a regulator of developmental genes under stress and non-stress conditions.