Dissertations / Theses on the topic 'Base de données distribuée'
Sahri, Soror. "Conception et implantation d'un système de bases de données distribuée & scalable : SD-SQL Server." Paris 9, 2006. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2006PA090013.
Our thesis elaborates on the design of a scalable distributed database system (SD-DBS). A novel feature of an SD-DBS is the concept of a scalable distributed relational table, a scalable table in short. Such a table accommodates dynamic splits of its segments at SD-DBS storage nodes. A split occurs when an insert makes a segment overflow, as in a B-tree file. Current DBMSs provide static partitioning only, requiring a cumbersome global reorganization from time to time. The transparency of the distribution of a scalable table is in this light an important step beyond the current technology. Our thesis explores the design issues of an SD-DBS by constructing a prototype termed SD-SQL Server. As its name indicates, it uses the services of SQL Server. SD-SQL Server repartitions a table when an insert overflows existing segments. With the comfort of a single-node SQL Server user, the SD-SQL Server user gets larger tables or faster response times through dynamic parallelism. We present the architecture of our system, its implementation and the performance analysis.
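To illustrate the dynamic-split behaviour this abstract describes, here is a minimal sketch in Python; the `Segment` class, the fixed `CAPACITY` and the midpoint split policy are illustrative assumptions, not SD-SQL Server's actual mechanism:

```python
# Toy model of a scalable table: a key-ordered segment splits when an
# insert makes it overflow, as a B-tree leaf would. Illustrative only.

CAPACITY = 4  # assumed per-segment capacity; real systems size this per node

class Segment:
    def __init__(self, low):
        self.low = low          # lowest key this segment is responsible for
        self.rows = {}          # key -> record

class ScalableTable:
    def __init__(self):
        self.segments = [Segment(low=float("-inf"))]  # starts as one segment

    def _find(self, key):
        # the segment with the largest lower bound not exceeding the key
        return max((s for s in self.segments if s.low <= key),
                   key=lambda s: s.low)

    def insert(self, key, record):
        seg = self._find(key)
        seg.rows[key] = record
        if len(seg.rows) > CAPACITY:        # overflow triggers a split
            keys = sorted(seg.rows)
            mid = keys[len(keys) // 2]
            new = Segment(low=mid)          # in an SD-DBS this lands on a new node
            for k in keys[len(keys) // 2:]:
                new.rows[k] = seg.rows.pop(k)
            self.segments.append(new)

t = ScalableTable()
for k in range(10):
    t.insert(k, f"row-{k}")
print(len(t.segments), "segments after 10 inserts")  # grows without global reorg
```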
Bessière, Hélène. "Assimilation de données variationnelle pour la modélisation hydrologique distribuée des crues à cinétique rapide." Phd thesis, Toulouse, INPT, 2008. http://oatao.univ-toulouse.fr/7761/1/bessiere1.pdf.
Moussa, Rim. "Contribution à la conception et l'implantation de la structure de données distribuée & scalable à haute disponibilité LH*RS." Paris 9, 2004. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2004PA090050.
De, Vlieger P. "Création d'un environnement de gestion de base de données " en grille ". Application à l'échange de données médicales." Phd thesis, Université d'Auvergne - Clermont-Ferrand I, 2011. http://tel.archives-ouvertes.fr/tel-00654660.
Maillot, Pierre. "Nouvelles méthodes pour l'évaluation, l'évolution et l'interrogation des bases du Web des données." Thesis, Angers, 2015. http://www.theses.fr/2015ANGE0007/document.
The web of data is a means to share and broadcast data that is both user-readable and machine-readable. This is possible thanks to RDF, which proposes formatting data into short sentences (subject, relation, object) called triples. Bases from the web of data, called RDF bases, are sets of triples. In an RDF base, the ontology (structural data) organizes the description of factual data. Since the creation of the web of data in 2001, the number and size of RDF bases have been constantly rising. This increase has accelerated since the emergence of linked data, which promotes the sharing and interlinking of publicly available bases by user communities. The exploitation (interrogation and editing) by these communities is done without an adequate solution to evaluate the quality of new data, check the current state of the bases, or query a set of bases together. This thesis proposes three methods to help the expansion, at the factual and ontological level, and the querying of bases from the web of data. We propose a method designed to help an expert check factual data in conflict with the ontology. Finally, we propose a method for distributed querying that limits the sending of queries to bases that may contain answers.
De, Vlieger Paul. "Création d'un environnement de gestion de base de données "en grille" : application à l'échange de données médicales." Phd thesis, Université d'Auvergne - Clermont-Ferrand I, 2011. http://tel.archives-ouvertes.fr/tel-00719688.
Antoine, Émilien. "Gestion des données distribuées avec le langage de règles: Webdamlog." Phd thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00908155.
Le, Sergent Thierry. "Méthodes d'exécution et machines virtuelles parallèles pour l'implantation distribuée du langage de programmation parallèle LCS." Toulouse 3, 1993. http://www.theses.fr/1993TOU30021.
Tian, Yongchao. "Accélérer la préparation des données pour l'analyse du big data." Thesis, Paris, ENST, 2017. http://www.theses.fr/2017ENST0017/document.
We are living in a big data world, where data is being generated in high volume, high velocity and high variety. Big data brings enormous values and benefits, so that data analytics has become a critically important driver of business success across all sectors. However, if the data is not analyzed fast enough, the benefits of big data will be limited or even lost. Despite the existence of many modern large-scale data analysis systems, data preparation, which is the most time-consuming process in data analytics, has not received sufficient attention yet. In this thesis, we study the problem of how to accelerate data preparation for big data analytics. In particular, we focus on two major data preparation steps, data loading and data cleaning. As the first contribution of this thesis, we design DiNoDB, a SQL-on-Hadoop system which achieves interactive-speed query execution without requiring data loading. Modern applications involve heavy batch processing jobs over large volumes of data and at the same time require efficient ad-hoc interactive analytics on temporary data generated in batch processing jobs. Existing solutions largely ignore the synergy between these two aspects, requiring the entire temporary dataset to be loaded to achieve interactive queries. In contrast, DiNoDB avoids the expensive data loading and transformation phase. The key innovation of DiNoDB is to piggyback the creation of metadata on the batch processing phase, which DiNoDB exploits to expedite the interactive queries. The second contribution is a distributed stream data cleaning system, called Bleach. Existing scalable data cleaning approaches rely on batch processing to improve data quality, which is very time-consuming in nature. We target stream data cleaning, in which data is cleaned incrementally in real time. Bleach is the first qualitative stream data cleaning system, which achieves both real-time violation detection and data repair on a dirty data stream. It relies on efficient, compact and distributed data structures to maintain the necessary state to clean data, and also supports rule dynamics. We demonstrate that the two resulting systems, DiNoDB and Bleach, achieve excellent performance compared to state-of-the-art approaches in our experimental evaluations, and can help data scientists significantly reduce the time they spend on data preparation.
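As an illustration of the incremental stream-cleaning idea (not Bleach's actual design), here is a minimal sketch that detects and repairs violations of a single rule while keeping compact state; the functional dependency zip -> city and the repair policy are assumed for the example:

```python
# Incremental violation detection on a dirty stream: maintain just enough
# state to check a rule in real time. The rule here is a functional
# dependency zip -> city; field names and repair policy are illustrative.

seen = {}  # zip -> first city observed (the compact cleaning state)

def clean(record):
    """Return the record, repairing the city if it violates zip -> city."""
    zip_code, city = record["zip"], record["city"]
    if zip_code not in seen:
        seen[zip_code] = city          # first witness fixes the expected value
        return record
    if city != seen[zip_code]:         # violation detected in real time
        record = dict(record, city=seen[zip_code])  # simple repair policy
    return record

stream = [
    {"zip": "75001", "city": "Paris"},
    {"zip": "75001", "city": "Pariss"},   # dirty tuple, repaired on the fly
    {"zip": "69001", "city": "Lyon"},
]
print([clean(r) for r in stream])
```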
Bechchi, Mounir. "Clustering-based Approximate Answering of Query Result in Large and Distributed Databases." Phd thesis, Université de Nantes, 2009. http://tel.archives-ouvertes.fr/tel-00475917.
Pontisso, Nadège. "Association cohérente de données dans les systèmes temps réel à base de composants - Application aux logiciels spatiaux." Phd thesis, Institut National Polytechnique de Toulouse - INPT, 2009. http://tel.archives-ouvertes.fr/tel-00459071.
Jachiet, Louis. "Sur la compilation des langages de requêtes pour le web des données : optimisation et évaluation distribuée de SPARQL." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM038/document.
The topic of my PhD is the compilation of web data query languages, more particularly the analysis and the distributed evaluation of such a language: SPARQL. My main contributions concern the evaluation of web data queries, especially for recursive queries and for distributed settings. In this thesis, I introduce the μ-algebra: a kind of relational algebra equipped with a fixpoint operator. I present its syntax, semantics, and a translation from SPARQL with Property Paths (a new feature of SPARQL allowing some form of recursion) to this μ-algebra. I then present a type system and show how μ-algebra terms can be rewritten to terms with equivalent semantics using either classical rewrite rules of the relational world or new rules that are specific to this μ-algebra. We demonstrate the correctness of these new rules, which are introduced to handle the rewriting of fixpoints: they allow pushing filters, joins and projections inside fixpoints, or combining several fixpoints (when some condition holds). I demonstrate how these terms can be evaluated, both from a general perspective and in the specific case of a distributed evaluation. I devise a cost model for μ-algebra terms inspired by this evaluation. With this cost model and this evaluator, several terms that are semantically equivalent can be seen as various Query Execution Plans (QEP) for a given query. I show that the μ-algebra and its rewrite rules allow reaching QEPs that are more efficient than all QEPs considered in other existing approaches, and confirm this by an experimental comparison of several query evaluators on SPARQL queries with recursion. I investigate the use of an efficient distributed framework (Spark) to build a fast distributed SPARQL query evaluator. It is based on a fragment of the μ-algebra, limited to operators that have a translation into fast Spark code. The result of this has been used to implement SPARQLGX, a state-of-the-art distributed SPARQL query evaluator. Finally, my last contribution concerns the estimation of the cardinality of solutions to a μ-algebra term. Such estimators are key in the optimization: most cost models for QEPs rely on them, and they are therefore necessary to determine the most efficient QEP. I specifically consider the conjunctive query fragment of the μ-algebra (which corresponds to the well-known Basic Graph Pattern fragment of SPARQL). I propose a new cardinality estimation based on statistics about the data and implemented the method in SPARQLGX. Experiments show that this method improves the performance of SPARQLGX.
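As a worked sketch of the fixpoint operator at the heart of such an algebra, here is semi-naive evaluation of a transitive closure, the kind of recursion a SPARQL Property Path like `knows+` compiles to (the μ-algebra's rewrite rules themselves are not reproduced here):

```python
# Semi-naive fixpoint evaluation. Each round joins only the newly derived
# tuples (the delta) with the base relation, until nothing new appears.

edges = {("a", "b"), ("b", "c"), ("c", "d")}

def transitive_closure(base):
    result = set(base)
    delta = set(base)
    while delta:
        # join the delta with the base relation, not the whole result:
        # this is what makes the fixpoint iteration efficient
        new = {(x, z) for (x, y) in delta for (y2, z) in base if y == y2}
        delta = new - result
        result |= delta
    return result

print(sorted(transitive_closure(edges)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```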
Sauquet, Dominique. "Lied : un modèle de données sémantique et temporel : son intégration dans une architecture distribuée et son utilisation pour des applications médicales." Châtenay-Malabry, Ecole centrale de Paris, 1998. http://www.theses.fr/1998ECAP0586.
Nguyen, Thi Thanh Quynh. "A new approach for distributed programming in smart grids." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAT079.
The main challenge of smart grid control and management is the amount of data to be processed. Traditional, centralized techniques, even if they offer the advantage of ease of management through their global grid vision, do not support in practice the continuous growth of data volumes (limited bandwidth, bottlenecks, amount of computation, etc.). The transition to decentralized (distributed) control and management, where the system is made up of a multitude of cooperating computing units, offers very good prospects (robustness, computation close to the producers and consumers of data, exploitation of data in all available resources), but remains challenging to implement. In fact, programming distributed algorithms requires taking into account the data exchanges and the synchronization of the participating units; this complexity increases with the number of units. In this thesis, we propose an innovative programming approach at a high level of abstraction masking these difficulties. First, we propose to abstract all smart grid computing units (smart meters, sensors, data concentrators, etc.) as a distributed database. Each computing unit hosts a local database, and only the data needed to continue the calculation is exchanged with other units, which decreases the use of the available bandwidth. The use of a declarative data handling language simplifies the programming of control and management applications. Besides, we also propose SmartLog, a rule-based language (based on the Datalog language and its derivatives) dedicated to these applications. It facilitates distributed programming of smart grid applications by immediately responding to any change in the data. Even with a language such as SmartLog, it is necessary to take into account the data exchange and the synchronization of the participants. This is why we then propose an approach that simplifies distributed programming. This approach, named CPDE for Centralized Programming and Distributed Execution, consists of two steps: (i) programming the centralized application in SmartLog, as this is easier, and (ii) translating the centralized program into a distributed program based on the actual location of the data. To do this, we propose a semi-automatic SmartLog rule distribution algorithm. In order to demonstrate the interest of CPDE, we conducted a comprehensive experiment using applications and algorithms actually used in smart grids, such as secondary control in isolated micro-grids or fair voltage regulation. The experiment was carried out on a real-time electrical network simulation platform, with an OPAL-RT simulation machine and a Raspberry Pi network representing the computing units (their performance is quite comparable to the real equipment). This experiment allowed validating the behaviour and performance of the distributed programs designed with CPDE, comparing them to their centralized versions in SmartLog and their reference versions implemented in Java. The impact of different parameters, such as the number of computing units or different data distribution alternatives, is studied as well.
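As a minimal sketch of the rule-based style SmartLog builds on (the relation names and the single aggregation rule below are invented for illustration, and SmartLog's actual syntax differs):

```python
# Tiny forward-chaining engine in the Datalog spirit: a rule is re-evaluated
# whenever its input facts change, which is how such languages "immediately
# respond to any change in the data". Relation names are invented.

facts = {("load", "meter1", 3.0), ("load", "meter2", 5.0)}

def rule_total_load(facts):
    """total_load(S) :- S = sum of V over load(M, V). A stand-in rule."""
    total = sum(v for (rel, _, v) in facts if rel == "load")
    return {("total_load", total)}

derived = rule_total_load(facts)
print(derived)                       # {('total_load', 8.0)}

facts.add(("load", "meter3", 2.5))   # a new reading arrives...
derived = rule_total_load(facts)     # ...and the rule fires again
print(derived)                       # {('total_load', 10.5)}
```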
Pierkot, Christelle. "Gestion de la Mise à Jour de Données Géographiques Répliquées." Phd thesis, Université Paul Sabatier - Toulouse III, 2008. http://tel.archives-ouvertes.fr/tel-00366442.
The military institution also uses spatial data to support decision-making. At each stage of a mission, geographic information of all kinds is employed (digital data, paper maps, aerial photographs, etc.) to help units in their strategic choices. Moreover, the use of communication networks facilitates the sharing and exchange of spatial data between producers and users located in different places. The information is not centralized: data is replicated on each site, and users may occasionally be disconnected from the network, for example when a mobile unit takes measurements in the field.
The main problem therefore concerns the management, in a military context, of a collaborative application allowing the asynchronous and symmetric update of replicated geographic data under an optimistic weak-consistency protocol. This requires defining a consistency model appropriate to the military context, a mechanism for detecting conflicting updates tied to the type of data handled, and procedures for reconciling divergent writes adapted to the needs of the units taking part in the mission.
A review of existing work shows that several protocols have been defined in the systems community (Cederqvist 2001; Kermarrec 2001) and the database community (Oracle 2003; Seshadri 2000) to manage data replication. However, the solutions provided are often tied to the specific needs of the application and are therefore not reusable in a different context, or they assume the existence of a reference server centralizing the data. The mechanisms used in geographic information systems to manage data and updates are not appropriate for our study either, since they assume that the data is locked for other users until the updates have been integrated (the check-in/check-out approach, ESRI 2004), or they use a centralized server holding the reference data (versioning, Cellary 1990).
Our objective is therefore to propose solutions allowing the consistent and, as far as possible, automatic integration of spatial data updates in an optimistic, multi-master, asynchronous replication environment.
We propose a global strategy for integrating spatial updates based on consistency checking coupled with update sessions. The originality of this strategy is that it relies on metadata to provide reconciliation solutions adapted to the particular context of a military mission.
The contribution of this thesis is twofold. First, it belongs to the field of spatial data update management, a field that remains very active owing to the complexity and heterogeneity of the data (we nevertheless limit our study to vector geographic data) and the relative youth of work on the subject. Second, it belongs to the field of managing the consistency of replicated data under an optimistic protocol, specifying in particular new algorithms for the detection and reconciliation of conflicting data in the application domain of geographic information.
Grazziottin, Ribeiro Helena. "Un service de règles actives pour fédérations de bases de données." Université Joseph Fourier (Grenoble), 2000. http://www.theses.fr/2000GRE10084.
Pomares, Alexandra. "Médiation et sélection de sources de données pour des organisations virtuelles distribuées à grande échelle." Phd thesis, Université de Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00717263.
Leon, Luna Claudia. "Contraintes d'intégrité et transactions imbriquées." Paris 6, 2001. http://www.theses.fr/2001PA066518.
Tekin, Ugur. "Approches de mise en œuvre d’une plateforme logicielle modulable et configurable d’analyse de données Big Data et de modélisation des systèmes à base d’objets connectés." Thesis, Bourgogne Franche-Comté, 2020. http://www.theses.fr/2020UBFCD050.
Currently, in the fields of Big Data and IoT, research has long remained focused on upstream issues related to communication protocols for data collection, or downstream issues related to the exploitation of collected data. Little attention has been paid to important issues that lie precisely between those upstream and downstream problems, such as deployment, the implementation of modular software architectures (at the Edge, Fog or Cloud Computing level), and fine control of IoT behaviors for better use of network bandwidth and the limited energy of devices. In the context of this work, the following questions that one has to deal with to implement Big Data / IoT technology solutions are considered. Firstly, what generic architecture could we adopt to implement a platform reconfigurable according to the functionalities expected by a client? In other words, how can we implement a multi-micro-service platform whose services could be unlocked or created on demand to customize an offer? Secondly, which formal modelling method should be used to overcome the obstacles encountered when deploying operational IoT projects and to optimize performance, considering that IoT-based systems can be viewed as discrete-event dynamic systems? To address the first issue, a modular and configurable Big Data analysis platform has been developed with essential functions: (a) data acquisition from IoT devices, but also information gathered from other sources and protocols, (b) control and processing components for their exploitation, and (c) massive data storage. A Data Science service has been developed, integrating AI approaches, for (d) knowledge discovery and (e) forecasting for diagnosis and decision support. In addition, the platform must be modular (i.e. offer the possibility of adding or removing micro-services on demand) and configurable (i.e. allow customizing the platform's parameters according to the needs and business rules of each user). To address the second issue, an approach based on Max-Plus algebra is developed to tackle real problems encountered in operational IoT projects. This has enabled the implementation of solutions that make the deployment and energy management of IoTs more efficient. This work has been carried out using an analytical modelling approach to establish criteria for monitoring and assessing the performance of the solutions.
Golenetskaya, Natalia. "Adressing scaling challenges in comparative genomics." Phd thesis, Université Sciences et Technologies - Bordeaux I, 2013. http://tel.archives-ouvertes.fr/tel-00865840.
Belfkih, Abderrahmen. "Contraintes temporelles dans les bases de données de capteurs sans fil." Thesis, Le Havre, 2016. http://www.theses.fr/2016LEHA0014/document.
In this thesis, we are interested in adding real-time constraints to Wireless Sensor Network Databases (WSNDB). Temporal consistency in a WSNDB must be ensured by respecting transaction deadlines and data temporal validity, so that sensor data reflect the current state of the environment. However, transmission and/or reception delays in the data collection process can lead to violations of data temporal validity. A database solution is most appropriate, and it should reconcile traditional database aspects with sensors and their environment. For this purpose, each sensor in the WSN is considered as a table in a distributed database, to which transactions (queries, updates, etc.) are applied. Transactions in a WSNDB require modifications to take into account the continuous data stream and real-time aspects. Our contribution in this thesis focuses on three parts: (i) a comparative study of temporal properties between periodic data collection based on a remote database and a query processing approach with a WSNDB, (ii) the proposition of a real-time query processing model, and (iii) the implementation of a real-time WSNDB, based on the techniques described in the second contribution.
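As a minimal sketch of the data-temporal-validity check such a model implies (the field layout and the validity interval are assumptions, not the thesis's API):

```python
import time

# A sensor reading is only usable while it still reflects the environment.
# A query over the "sensor table" rejects stale values instead of returning
# them, sketching the data temporal validity constraint.

VALIDITY = 5.0  # assumed validity interval in seconds

readings = {}  # sensor_id -> (value, timestamp)

def update(sensor_id, value):
    readings[sensor_id] = (value, time.monotonic())

def query(sensor_id):
    value, ts = readings[sensor_id]
    if time.monotonic() - ts > VALIDITY:
        raise TimeoutError(f"{sensor_id}: value no longer temporally valid")
    return value

update("temp-12", 21.4)
print(query("temp-12"))  # fresh enough, so it is returned
```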
Antoine, Emilien. "Distributed data management with a declarative rule-based language webdamlog." Phd thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00933808.
El, Merhebi Souad. "La gestion d'effet : une méthode de filtrage pour les environnements virtuels répartis." Toulouse 3, 2008. http://thesesups.ups-tlse.fr/243/1/El_Merhebi_Souad.pdf.
Distributed virtual environments (DVEs) are intended to provide an immersive experience to their users within a shared virtual environment. For this purpose, DVEs try to supply participants with coherent views of the shared world. This requires a heavy message exchange between participants, especially with the increasing popularity of massively multiplayer DVEs. This heavy message exchange consumes a lot of processing power and bandwidth, slowing down the system and limiting interactivity. Indeed, coherence, interactivity and scalability are basic requirements of DVEs. However, these requirements are conflicting, because coherence requires as much message exchange as possible, while interactivity and scalability demand that this exchange be reduced to a minimum. For this reason, the management of message exchange is essential for distributed virtual environments. To manage message exchange in an intelligent way, DVE systems use various filtering techniques. Among them, interest management techniques filter messages according to users' interests in the world. In this document, we present our interest management technique, effect management. This technique expresses the interests and manifestations of participants in various media through conscience and effect zones. When the conscience zone of a participant collides with the effect zone of another participant in a given medium, the first one becomes conscious of the second.
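As a minimal sketch of the conscience/effect-zone test (circular zones and the coordinates are illustrative assumptions):

```python
import math

# Effect management in miniature: participant A becomes conscious of B in a
# medium when A's conscience zone overlaps B's effect zone, so B's messages
# are only delivered to A in that case. Circular zones are an assumption.

def zones_collide(center_a, radius_a, center_b, radius_b):
    return math.dist(center_a, center_b) <= radius_a + radius_b

# A's conscience zone for the "visual" medium vs. B's visual effect zone
a_pos, a_conscience = (0.0, 0.0), 10.0
b_pos, b_effect = (12.0, 0.0), 3.0

if zones_collide(a_pos, a_conscience, b_pos, b_effect):
    print("A receives B's visual updates")      # message is worth sending
else:
    print("B's updates are filtered out for A")  # bandwidth is saved
```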
Saoudi, Massinissa. "Conception d'un réseau de capteurs sans fil pour des prises de décision à base de méthodes du Data Mining." Thesis, Brest, 2017. http://www.theses.fr/2017BRES0065/document.
Recently, Wireless Sensor Networks (WSNs) have emerged as one of the most exciting fields. However, the common challenge of all sensor network applications remains the vulnerability of sensor nodes, due to their characteristics and also to the nature of the generated data, which is of large volume, heterogeneous, and distributed. On the other hand, the need to process and extract knowledge from these large quantities of data motivated us to explore data mining techniques and develop new approaches to improve detection accuracy, the quality of information, the reduction of data size, and the extraction of knowledge from WSN datasets to help decision making. However, classical data mining methods are not directly applicable to WSNs because of their constraints. It is therefore necessary to meet the following objective: an efficient solution offering a good adaptation of data mining methods to the analysis of huge and continuously arriving data from WSNs, taking into account the constraints of the sensor nodes, which allows extracting knowledge in order to make better decisions. The contributions of this thesis focus mainly on the study of several distributed algorithms which can deal with the nature of sensed data and the resource constraints of sensor nodes, based on data mining algorithms that first use local computation at each node and then exchange messages with neighbors, in order to reach consensus on a global model. The different results obtained show that the proposed approaches considerably reduce energy consumption and communication cost, which extends the network lifetime. The results also indicate that the proposed approaches are extremely efficient in terms of model computation, latency, reduction of data size, adaptability, and event detection.
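As a sketch of the local-computation-plus-neighbor-exchange pattern this abstract describes (the line topology and the plain averaging rule are illustrative, not one of the thesis's algorithms):

```python
# Decentralized consensus by neighbor exchange: each node repeatedly
# replaces its value with the average of its own and its neighbors' values.
# This is the generic skeleton behind many distributed WSN mining methods.

neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a line topology
values = {0: 10.0, 1: 20.0, 2: 30.0, 3: 40.0}        # local sensor estimates

for _ in range(50):  # a few gossip rounds
    values = {
        n: (values[n] + sum(values[m] for m in neighbors[n]))
           / (1 + len(neighbors[n]))
        for n in values
    }

print({n: round(v, 2) for n, v in values.items()})
# all nodes converge to a common value (25.0 here), with no central site
```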
Ait, Lahcen Ayoub. "Développement d'Applications à Base de Composants avec une Approche Centrée sur les Données et dans une Architecture Orientée Service et Pair-à-Pair : Spécification, Analyse et Intergiciel." Phd thesis, Université Nice Sophia Antipolis, 2012. http://tel.archives-ouvertes.fr/tel-00766329.
Scotto, di Apollonia Gaëtan. "Une fédération de serveurs de calcul pour applications distribuées à base de composants hétérogènes et de connecteurs génériques." Lille 1, 2003. https://pepite-depot.univ-lille.fr/RESTREINT/Th_Num/2003/50376-2003-319.pdf.
Servajean, Maximilien. "Recommandation diversifiée et distribuée pour les données scientifiques." Thesis, Montpellier 2, 2014. http://www.theses.fr/2014MON20216/document.
In many fields, the novel technologies employed in information acquisition and measurement (e.g. automated phenotyping greenhouses) are behind a phenomenal creation of data. In particular, we focus on two real use cases: plant observations in botany and phenotyping data in biology. Our contributions can, however, be generalized to Web data. In addition to their huge volume, data are also distributed. Indeed, each user stores their data on many heterogeneous sites (e.g. personal computers, servers, the cloud), yet wants to be able to share it. In both use cases, collaborative solutions, including distributed search and recommendation techniques, could benefit the user. Thus, the global objective of this work is to define a set of techniques enabling the sharing and discovery of data in heterogeneous distributed environments, through the use of search and recommendation approaches. For this purpose, search and recommendation present users with sets of results, or recommendations, that are relevant both to the queries they submitted and to their profiles. Diversification techniques allow users to receive results with better novelty while avoiding redundant and repetitive content. By introducing a distance between the results presented to the user, diversity enables a broader set of relevant items to be returned. However, few works exploit profile diversity, which takes into account the users that share each item. In this work, we show that in some scenarios, considering profile diversity enables a substantial increase in result quality: surveys show that in more than 75% of the cases, users would prefer profile diversity to content diversity. Additionally, in order to address the problems related to data distribution among heterogeneous sites, two approaches are possible. First, P2P networks aim at establishing links between peers (nodes of the network), creating in this way an overlay network, where the peers directly connected to a given peer p are known as its neighbors. This overlay is used to process the queries submitted by each peer. However, in state-of-the-art solutions, the redundancy of peers across the various neighborhoods limits the capacity of the system to retrieve relevant items on the network for the queries submitted by users. In this work, we show that introducing diversity in the computation of the neighborhood, by increasing its coverage, enables a huge gain in terms of quality. By taking diversity into account, each peer in a given neighborhood has a higher probability of returning different results for a keyword query than the other peers in the neighborhood. Whenever a query is submitted by a peer, our approach can retrieve up to three times more relevant items than state-of-the-art solutions. The second category of approaches is called multi-site. Generally, in state-of-the-art multi-site solutions, the sites are homogeneous and consist of big data centers. In our context, we propose an approach enabling sharing among heterogeneous sites, such as small research team servers, personal computers or big sites in the cloud. A prototype bringing together all contributions has been developed, with two versions addressing each of the use cases considered in this thesis.
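As a minimal sketch of the profile-diversity intuition (the scores, sharer sets and greedy trade-off below are invented for illustration): rather than diversifying on content, the selection penalizes candidates whose sharer profiles overlap those already picked.

```python
# Greedy profile-diversified top-k: each item carries the set of users who
# share it; a candidate is penalized when its sharers overlap the sharers
# of already-selected items. Scores and sets are toy data.

items = {
    "doc1": (0.9, {"alice", "bob"}),
    "doc2": (0.8, {"alice", "bob"}),      # same sharers as doc1
    "doc3": (0.7, {"carol", "dan"}),      # a different community
}

def profile_distance(a, b):
    return 1 - len(a & b) / (len(a | b) or 1)   # Jaccard distance

def diversified_topk(items, k, lam=0.5):
    picked = []
    while len(picked) < min(k, len(items)):
        def gain(name):
            score, profile = items[name]
            if not picked:
                return score
            div = min(profile_distance(profile, items[p][1]) for p in picked)
            return lam * score + (1 - lam) * div   # relevance/diversity mix
        best = max((n for n in items if n not in picked), key=gain)
        picked.append(best)
    return picked

print(diversified_topk(items, k=2))   # ['doc1', 'doc3'], not the redundant doc2
```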
Ayed, Rihab. "Recherche d’information agrégative dans des bases de graphes distribuées." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1305.
In this research, we are interested in investigating issues related to query evaluation and optimization in the framework of aggregated search. Aggregated search is a new paradigm for accessing massively distributed information. It aims to produce answers to queries by combining fragments of information from different sources. The queries search for objects (documents) that do not exist as such in the targeted sources, but are built from fragments extracted from the different sources. The sources might not be specified in the query expression; they are dynamically discovered at runtime. In our work, we consider data dependencies to propose a framework for optimizing query evaluation over distributed graph-oriented data sources. For this purpose, we propose an approach for the document indexing/organizing process of aggregated search systems. We consider information retrieval systems that are graph oriented (RDF graphs). Using graph relationships, our work falls within relational aggregated search, where relationships are used to aggregate fragments of information. Our goal is to optimize the access to sources of information in an aggregated search system. These sources contain fragments of information that are partially relevant to the query. We aim at minimizing the number of sources to query, and at maximizing the aggregation operations within a same source. For this, we propose to reorganize the graph database(s) into partitions dedicated to aggregated queries. We use a semantic or structural clustering of RDF predicates. For structural clustering, we propose to use frequent subgraph mining algorithms; for this purpose, we performed a comparative study of their performance. For semantic clustering, we use the descriptive metadata of RDF predicates and apply semantic textual similarity methods to calculate their relatedness. Following the clustering, we define query decomposition rules based on the semantic/structural aspects of RDF predicates.
Termier, Alexandre. "Pattern mining rock: more, faster, better." Habilitation à diriger des recherches, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-01006195.
Subias, Audine. "Contribution au diagnostic des systèmes complexes." Habilitation à diriger des recherches, Université Paul Sabatier - Toulouse III, 2006. http://tel.archives-ouvertes.fr/tel-00134944.
El, Zoghby Nicole. "Fusion distribuée de données échangées dans un réseau de véhicules." Phd thesis, Université de Technologie de Compiègne, 2014. http://tel.archives-ouvertes.fr/tel-01070896.
Dia, Amadou Fall. "Filtrage sémantique et gestion distribuée de flux de données massives." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS495.
Our daily use of the Internet and related technologies generates, at rapid and variable speeds, large volumes of heterogeneous data issued from sensor networks, search engine logs, multimedia content sites, weather forecasting, geolocation, Internet of Things (IoT) applications, etc. Processing such data in conventional databases (relational database management systems) may be very expensive in terms of time and memory storage resources. To effectively respond to the needs of rapid decision-making, these streams require real-time processing. Data Stream Management Systems (DSMSs) evaluate queries on the recent data of a stream within structures called windows. The input data come in different formats, such as CSV, XML, RSS, or JSON. This heterogeneity barrier stems from the nature of the data streams and must be resolved. For this, several research groups have benefited from the advantages of semantic web technologies (RDF and SPARQL) by proposing RDF stream processing systems called RSPs. However, large volumes of RDF data, high input rates, concurrent queries, the combination of RDF streams with large volumes of stored RDF data, and expensive processing drastically reduce the performance of these systems. A new approach is required to considerably reduce the processing load of RDF data streams. In this thesis, we propose several complementary solutions to reduce the processing load in a centralized environment. An on-the-fly RDF graph stream sampling approach is proposed to reduce the data and processing load while preserving semantic links. This approach is deepened by adopting a graph-oriented summary approach to extract the most relevant information from RDF graphs, using centrality measures from social network analysis. We also adopt a compressed format for RDF data and propose an approach for querying compressed RDF data without a decompression phase. To ensure parallel and distributed data stream management, the presented work also proposes two solutions for reducing the processing load in a distributed environment: an engine and approaches for parallel and distributed processing of RDF graph streams. Finally, an optimized processing approach for static and dynamic data combination operations is also integrated into a new distributed RDF graph stream management system.
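As a minimal sketch of the centrality-driven sampling idea (degree centrality stands in for the social-network-analysis measures mentioned above, and the triples are toy data):

```python
# Sampling an RDF graph stream by keeping the triples whose nodes are most
# central. Degree centrality is a simple stand-in for the measures the
# thesis draws from social network analysis.

from collections import Counter

triples = [
    ("s1", "knows", "s2"), ("s1", "knows", "s3"),
    ("s2", "worksAt", "org1"), ("s4", "likes", "s5"),
]

def sample(triples, keep_ratio=0.5):
    degree = Counter()
    for s, _, o in triples:
        degree[s] += 1
        degree[o] += 1
    scored = sorted(triples,
                    key=lambda t: degree[t[0]] + degree[t[2]],
                    reverse=True)
    # keep the most central triples: semantic hubs survive the sampling
    return scored[: max(1, int(len(scored) * keep_ratio))]

print(sample(triples))
```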
Zouari, Mohamed. "Architecture logicielle pour l'adaptation distribuée : Application à la réplication de données." Phd thesis, Université Rennes 1, 2011. http://tel.archives-ouvertes.fr/tel-00652046.
Galilée, François. "Athapascan-1 : interprétation distribuée du flot de données d'un programme parallèle." Phd thesis, Grenoble INPG, 1999. http://tel.archives-ouvertes.fr/tel-00004832.
Brahem, Mariem. "Optimisation de requêtes spatiales et serveur de données distribué - Application à la gestion de masses de données en astronomie." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLV009/document.
The big scientific data generated by modern observation telescopes raises recurring performance problems, in spite of advances in distributed data management systems. The main reasons are the complexity of the systems and the difficulty of adapting access methods to the data. This thesis proposes new physical and logical optimizations to optimize execution plans of astronomical queries using transformation rules. These methods are integrated into ASTROIDE, a distributed system for large-scale astronomical data processing. ASTROIDE achieves scalability and efficiency by combining the benefits of distributed processing using Spark with the relevance of an astronomical query optimizer. It supports data access using ADQL, a commonly used astronomical query language. It implements astronomical query algorithms (cone search, kNN search, cross-match, and kNN join) tailored to the proposed physical data organization. Indeed, ASTROIDE offers a data partitioning technique that allows efficient processing of these queries by ensuring load balancing and eliminating irrelevant partitions. This partitioning uses an indexing technique adapted to astronomical data, in order to reduce query processing time.
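As a worked sketch of the cone-search predicate at the core of such astronomical queries (the ADQL layer and partition pruning are not shown): it keeps the objects whose angular separation from a target position falls within a radius.

```python
import math

# Cone search: keep sky objects within `radius` degrees of (ra0, dec0),
# using the haversine form of the angular separation, which is numerically
# stable for small angles. Catalog rows are toy data.

def angular_sep_deg(ra1, dec1, ra2, dec2):
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

catalog = [("obj1", 10.68, 41.27), ("obj2", 10.70, 41.26), ("obj3", 56.0, 24.1)]

def cone_search(catalog, ra0, dec0, radius):
    return [name for (name, ra, dec) in catalog
            if angular_sep_deg(ra, dec, ra0, dec0) <= radius]

print(cone_search(catalog, ra0=10.68, dec0=41.27, radius=0.1))
# ['obj1', 'obj2'] -- obj3 lies far outside the cone
```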
Jawad, Mohamed. "Data privacy in P2P Systems." Nantes, 2011. http://www.theses.fr/2011NANT2020.
Online peer-to-peer (P2P) communities such as professional ones (e.g., medical or research communities) are becoming popular due to increasing needs for data sharing. P2P environments offer valuable characteristics but limited guarantees when sharing sensitive data. They can be considered hostile because data can be accessed by everyone (including potentially malicious peers) and used for everything (e.g., for marketing or for activities against the owner's preferences or ethics). This thesis proposes a privacy service that allows sharing sensitive data in P2P systems while protecting their privacy. The first contribution consists in analyzing existing techniques for data privacy in P2P architectures. The second contribution is a privacy model for P2P systems, named PriMod, which allows data owners to specify their privacy preferences in privacy policies and to associate them with their data. The third contribution is the development of PriServ, a privacy service located on top of DHT-based P2P systems, which implements PriMod to prevent data privacy violations. Among other techniques, PriServ uses trust techniques to predict peers' behavior.
Moscu, Mircea. "Inférence distribuée de topologie de graphe à partir de flots de données." Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4081.
The second decade of the current millennium can be summarized in one short phrase: the advent of data. There has been a surge in the number of data sources: from audio-video streaming, social networks and the Internet of Things, to smartwatches, industrial equipment and personal vehicles, just to name a few. More often than not, these sources form networks in order to exchange information. As a direct consequence, the field of Graph Signal Processing has been thriving and evolving. Its aim: to process and make sense of all the surrounding data deluge. In this context, the main goal of this thesis is to develop methods and algorithms capable of using data streams, in a distributed fashion, in order to infer the underlying networks that link these streams. The estimated network topologies can then be used with tools developed for Graph Signal Processing in order to process and analyze data supported by graphs. After a brief introduction followed by motivating examples, we first develop and propose an online, distributed and adaptive algorithm for graph topology inference for data streams which are linearly dependent. An analysis of the method ensues, in order to establish relations between performance and the input parameters of the algorithm. We then run a set of experiments to validate the analysis and to compare the method's performance with that of another method proposed in the literature. The next contribution is an algorithm endowed with the same online, distributed and adaptive capacities, but adapted to inferring links between data that interact nonlinearly. As such, we propose a simple yet effective additive model which makes use of the reproducing-kernel machinery in order to model said nonlinearities. The results of its analysis are convincing, while experiments run on biomedical data yield estimated networks which exhibit behavior predicted by the medical literature. Finally, a third algorithm is proposed, which aims to improve the nonlinear model by allowing it to escape the constraints induced by additivity. As such, the newly proposed model is as general as possible, and makes use of a natural and intuitive manner of imposing link sparsity, based on the concept of partial derivatives. We analyze this proposed algorithm as well, in order to establish stability conditions and relations between its parameters and its performance. A set of experiments is run, showcasing how the general model is able to better capture nonlinear links in the data, while the estimated networks behave coherently with previous estimates.
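As a minimal sketch of online inference of linear links between streams, using an LMS-style update (the data model below is an assumption, not the thesis's exact algorithm):

```python
import random

# Online inference of linear links: node 0's signal is generated as a fixed
# combination of nodes 1 and 2; an LMS-style update recovers those weights
# from the stream, one sample at a time. A toy stand-in for the method.

true_w = [0.8, -0.4]              # hidden links from nodes 1 and 2 to node 0
w = [0.0, 0.0]                    # running estimate kept at node 0
mu = 0.05                         # step size

random.seed(1)
for _ in range(5000):
    x = [random.gauss(0, 1), random.gauss(0, 1)]      # neighbors' samples
    y = true_w[0] * x[0] + true_w[1] * x[1] + random.gauss(0, 0.05)
    err = y - (w[0] * x[0] + w[1] * x[1])             # prediction error
    w = [w[k] + mu * err * x[k] for k in range(2)]    # adaptive LMS update

print([round(v, 2) for v in w])   # close to [0.8, -0.4]: links recovered
```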
Gillet, Noel. "Optimisation de requêtes sur des données massives dans un environnement distribué." Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0553/document.
Distributed data stores are massively used in the current context of Big Data. In addition to providing data management features, those systems have to deal with an increasing amount of queries sent by distant users in order to perform data mining or data visualization operations. One of the main challenges is to evenly distribute the workload of queries between the nodes which compose these systems, in order to minimize the processing time. In this thesis, we tackle the problem of query allocation in a distributed environment. We consider that data are replicated and that a query can be handled only by a node storing the concerned data. First, near-optimal algorithmic proposals are given when communications between nodes are asynchronous. We also consider that some nodes can be faulty. Second, we study more deeply the impact of data replication on query processing. In particular, we present an algorithm which manages data replication based on the demand for the data. Combined with our allocation algorithm, we guarantee a near-optimal allocation. Finally, we focus on the impact of data replication when queries are received as a stream by the system. We carry out an experimental evaluation using the distributed database Apache Cassandra. The experiments confirm the interest of our algorithmic proposals in improving query processing compared to the native allocation scheme of Cassandra.
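As a minimal sketch of the allocation constraint described above: a query may only go to a node holding a replica of the data it touches, and among those the least-loaded node wins (placement and queries are toy data).

```python
# Allocate each incoming query to the least-loaded node among those that
# replicate the data the query touches.

replicas = {"itemA": {0, 1}, "itemB": {1, 2}, "itemC": {0, 2}}
load = {0: 0, 1: 0, 2: 0}

def allocate(item):
    candidates = replicas[item]                    # only nodes holding the data
    node = min(candidates, key=lambda n: load[n])  # greedy least-loaded choice
    load[node] += 1
    return node

queries = ["itemA", "itemA", "itemB", "itemC", "itemB", "itemA"]
print([allocate(q) for q in queries], "final load:", load)
```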
Mokadem, Riad. "Signatures algébriques dans la gestion de structures de données distribuées et scalables." Paris 9, 2006. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2006PA090014.
Recent years saw the emergence of new architectures involving multiple computers, and new concepts were proposed. Among the most popular are those of a multicomputer or a network of workstations and, more recently, of Peer-to-Peer and Grid Computing. This thesis consists of the design, implementation and performance measurements of a prototype SDDS manager, called SDDS-2005. It manages key-based ordered files in the distributed RAM of Windows machines forming a grid or P2P network. Our scheme can back up the RAM on each storage node onto the local disk. Our goal is to write only the data that has changed since the last backup. We also address record updates and non-key searches (scans). Their common denominator is the application of the properties of a new signature scheme that we call algebraic signatures, which are useful in this context. One then needs to find only the areas of the bucket that changed since the last backup. Our signature-based scheme for updating records at the SDDS client should prove its advantages in client-server based database systems in general. It holds the promise of interesting possibilities for transactional concurrency control, beyond the mere avoidance of lost updates. Thanks to the algebraic signatures, we also update only the data that has changed. Moreover, a partly pre-computed algebraic signature of a string encodes each symbol by its cumulative signature. Such signatures protect the SDDS data against incidental viewing by an unauthorized server administrator. The method appears attractive: it does not imply any storage overhead, is completely transparent for the servers, and occurs at the client. Next, our scheme provides fast string search (match) directly on encoded data at the SDDS servers, appearing as an alternative to known Karp-Rabin-type schemes. Scans can explore the storage nodes in parallel. They match records by their entire non-key content or by a substring, prefix, longest common prefix or longest common string. The search complexity is almost O(1) for prefix search. One may also use them to detect and localize silent corruption. These features should be of interest to P2P and grid computing. Then, we propose a novel string search algorithm called n-Gramme search. It appears to be among the fastest known, probably often the fastest one we know of. It costs only a small fraction of an existing record match, especially for larger string searches. The experiments prove the high efficiency of our implementation. Our backup scheme is substantially more efficient with the algebraic signatures. The signature calculus is itself substantially faster, the gain being about 30%. Experiments also prove that our cumulative pre-computing notably accelerates the string searches, which are faster than partial ones, at the expense of a higher encoding/decoding overhead. These are new alternatives to known Karp-Rabin-type schemes, and likely to be usually faster. The speed of string matches opens interesting perspectives for the popular join, group-by, rollup, and cube database operations. Our work has been the subject of five publications in international conferences [LMS03, LMS05a, LMS05b, ML06, l&al06]. For convenience, we have included the latest publications. The package termed SDDS-2005 is available for non-commercial use at http://ceria.dauphine.fr/. It builds on earlier versions of the prototype, the cumulative effort of several people, and includes the n-Gramme algorithm implementation. We have also presented our proposed prototype, SDDS-2005, at the Microsoft Research Academic Days 2006.
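As a minimal sketch of the delta-backup idea built on such signatures (the per-page polynomial hash over a prime field below is a simple stand-in for the thesis's algebraic signatures):

```python
# Back up only the bucket areas whose signature changed since the last
# backup. The per-page signature is a polynomial hash over a prime field,
# standing in for the algebraic signatures described above.

PRIME, ALPHA, PAGE = 2**31 - 1, 257, 16

def signature(data: bytes) -> int:
    sig = 0
    for byte in data:
        sig = (sig * ALPHA + byte) % PRIME
    return sig

def pages(buf: bytes):
    return [buf[i:i + PAGE] for i in range(0, len(buf), PAGE)]

old = b"record-1........record-2........record-3........"
new = b"record-1........record-2-CHANGEDrecord-3........"

old_sigs = [signature(p) for p in pages(old)]
dirty = [i for i, p in enumerate(pages(new)) if signature(p) != old_sigs[i]]
print("pages to write to disk:", dirty)   # only the changed area, here [1]
```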
Golenetskaya, Natalia. "Adresser les défis de passage à l'échelle en génomique comparée." Phd thesis, Université Sciences et Technologies - Bordeaux I, 2013. http://tel.archives-ouvertes.fr/tel-00859439.
Bechchi, Mounir. "Réponses approchées de résultat de requêtes par classification dans des bases de données volumineuses et distribuées." Nantes, 2009. http://www.theses.fr/2009NANT2033.
Database systems are increasingly used for interactive and exploratory data retrieval. In such retrieval, users' queries often result in too many answers, so users waste significant time and effort sifting and sorting through these answers to find the relevant ones. In this thesis, we first propose an efficient and effective algorithm coined Explore-Select-Rearrange Algorithm (ESRA), based on the SAINTETIQ model, to quickly provide users with hierarchical clustering schemas of their query results. SAINTETIQ is a domain-knowledge-based approach that provides multi-resolution summaries of structured data stored in a database. Each node (or summary) of the hierarchy provided by ESRA describes a subset of the result set in a user-friendly form based on domain knowledge. The user then navigates through this hierarchy in a top-down fashion, exploring the summaries of interest while ignoring the rest. Experimental results show that the ESRA algorithm is efficient and provides well-formed (tight and clearly separated) and well-organized clusters of query results. The ESRA algorithm assumes that the summary hierarchy of the queried data is already built using SAINTETIQ and available as input. However, SAINTETIQ requires full access to the data that is going to be summarized. This requirement severely limits the applicability of the ESRA algorithm in a distributed environment, where data is distributed across many sites and transmitting the data to a central site is not feasible or even desirable. The second contribution of this thesis is therefore a solution for summarizing distributed data without prior "unification" of the data sources. We assume that the sources maintain their own summary hierarchies (local models), and we propose new algorithms for merging them into a single final one (global model). An experimental study shows that our merging algorithms result in high-quality clustering schemas of the entire distributed data and are very efficient in terms of computation time.
Liroz-Gistau, Miguel. "Partitionnement dans les Systèmes de Gestion de Données Parallèles." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2013. http://tel.archives-ouvertes.fr/tel-00920615.
Bergougnoux, Patrick. "MIME, un environnement de développement coopératif pour applications distribuées." Toulouse 3, 1992. http://www.theses.fr/1992TOU30014.
Gros, Pierre-Emmanuel. "Etude et conception d'une plate-forme d'intégration et de visualisation de données génomiques et d'outils bioinformatiques." Paris 11, 2006. http://www.theses.fr/2006PA112139.
At the beginning of this millennium, the efforts of the industrial and academic worlds yielded a first version of the sequencing of the human genome. Opening one of these sequence files, the reader faces a text of several million characters "a", "t", "g", or "c", each one symbolizing one of the four bases which constitute DNA. This sequence of letters highlights how poorly we understand DNA. In order to better tackle this language, many databases of DNA sequences, annotations and experiments were built, and several information processing tools were written. The first part of this thesis addresses the problem of integrating bioinformatics tools. The approach adopted for the integration of tools is to merge a distributed architecture into the database engine. The other facet of integration concerns the integration of data coming from various biological databases. More precisely, our goal is to let a user integrate personal data (coming from an Excel file, a text file, etc.) with the data of "institutional" databases such as those of the NCBI or SwissProt. Lastly, we propose a semantic integration tool called "lysa". This tool proposes to explore a database not through the structure of the database but through the data within. The purpose of this exploration is to make it possible for the user to find the "semantic" links between data.
Manolescu, Goujot Ioana Gabriela. "Techniques d'optimisation pour l'interrogation des sources de données hétérogènes et distribuées." Versailles-St Quentin en Yvelines, 2001. http://www.theses.fr/2001VERS0027.
Benaissa, Adel. "Managing uncertain data over distributed environments." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCB003.
In recent years, data has become uncertain due to the flourishing of advanced technologies that participate continuously and increasingly in producing large amounts of incomplete data. Many modern applications where uncertainty occurs are distributed in nature, e.g., distributed sensor networks, information extraction, data integration, social networks, etc. Consequently, even though data uncertainty has been studied in the past in centralized settings, managing uncertainty over data in situ is still a challenging issue. In this work, we focus on data records that are composed of a set of descriptive attributes which are neither numeric nor inherently ordered in any way, namely categorical data. We propose two approaches to managing uncertain categorical data over distributed environments. These approaches are built upon a hierarchical indexing technique and a distributed algorithm to efficiently process queries on uncertain data in a distributed environment. In the first approach, we propose a distributed indexing technique based on an inverted index structure for efficiently searching uncertain categorical data over distributed environments. By leveraging this indexing technique, we address two kinds of queries on distributed uncertain databases: (1) a distributed probabilistic threshold query, whose answers satisfy the probabilistic threshold requirement, and (2) distributed top-k queries, optimizing the transfer of tuples from the distributed sources to the coordinator site as well as the processing time. Extensive experiments are conducted to verify the effectiveness and efficiency of the proposed method in terms of communication costs and response time. The second approach focuses on answering top-k queries and proposes a distributed algorithm, namely TDUD. Its aim is to efficiently answer top-k queries over distributed uncertain categorical data in a single round of communication. For that purpose, we enrich the global uncertain index provided in the first approach with richer summarizing information from the local indexes, and use it to minimize the amount of communication needed to answer a top-k query. Moreover, the approach maintains the mean sum dispersion of the probability distribution on each site, which is then merged at the coordinator site. Extensive experiments are conducted to verify the effectiveness and efficiency of the proposed method in terms of communication costs and response time. We show empirically that the related algorithm is near-optimal, in that it can typically retrieve the top-k query answers by communicating only about k tuples in a single round.
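As a minimal sketch of a probabilistic threshold query over uncertain categorical data (the flat, in-memory inverted index below is a toy stand-in for the hierarchical, distributed index of the thesis):

```python
# Uncertain categorical data: each tuple holds alternative values with
# probabilities. The inverted index maps a category value to the tuples
# (and probabilities) containing it; a threshold query keeps the tuples
# whose probability for the value reaches the threshold tau.

tuples = {
    "t1": {"red": 0.7, "blue": 0.3},
    "t2": {"red": 0.2, "green": 0.8},
    "t3": {"blue": 0.9, "red": 0.1},
}

# build the inverted index: value -> [(tuple_id, probability)]
index = {}
for tid, dist in tuples.items():
    for value, p in dist.items():
        index.setdefault(value, []).append((tid, p))

def threshold_query(value, tau):
    return [tid for (tid, p) in index.get(value, []) if p >= tau]

print(threshold_query("red", 0.5))   # ['t1'] -- t2 and t3 fall below 0.5
```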
Martinez, Medina Lourdes. "Optimisation des requêtes distribuées par apprentissage." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM015.
Distributed data systems are becoming increasingly complex. They interconnect devices (e.g. smartphones, tablets, etc.) that are heterogeneous, autonomous, either static or mobile, and subject to physical limitations. Such devices run applications (e.g. virtual games, social networks, etc.) for the online interaction of users producing and consuming data on demand or continuously. The characteristics of these systems add new dimensions to the query optimization problem, such as multiple optimization criteria, scarce information on data, and the lack of a global system view, among others. Traditional query optimization techniques focus on systems that are semi-autonomous or not autonomous at all. They rely on information about data and make strong assumptions about system behavior. Moreover, most of these techniques are centered on the optimization of execution time only. The difficulty of evaluating queries efficiently in today's applications motivates this work to revisit traditional query optimization techniques. This thesis faces these challenges by adapting the Case-Based Reasoning (CBR) paradigm to query processing, providing a way to optimize queries when there is no prior knowledge of the data. It focuses on optimizing queries using cases generated from the evaluation of similar past queries. A query case comprises: (i) the query, (ii) the query plan and (iii) the measures (computational resources consumed) of the query plan. The thesis also concerns the way the CBR process interacts with the query plan generation process. This process uses classical heuristics and makes decisions randomly (e.g. when there are no statistics for join ordering and the selection of algorithms or routing protocols). It also (re)uses cases (existing query plans) for similar query parts, improving query optimization and therefore evaluation efficiency. The propositions of this thesis have been validated within the CoBRa optimizer developed in the context of the UBIQUEST project.
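As a minimal sketch of the case-retrieval step in such a CBR optimizer (the query features and the similarity measure below are invented for illustration):

```python
# Case-based reasoning for query optimization in miniature: a case stores a
# query, its plan and its measured cost; a new query reuses the plan of the
# most similar past case. Query features and similarity are toy choices.

cases = [
    ({"tables": {"R", "S"}, "joins": 1}, "hash-join(R,S)", 120),
    ({"tables": {"R", "S", "T"}, "joins": 2}, "pipeline(R,S,T)", 340),
]

def similarity(q1, q2):
    shared = len(q1["tables"] & q2["tables"]) / len(q1["tables"] | q2["tables"])
    return shared - abs(q1["joins"] - q2["joins"])  # crude composite score

def retrieve_plan(query):
    best_case = max(cases, key=lambda c: similarity(query, c[0]))
    return best_case[1]          # reuse the plan of the closest past query

print(retrieve_plan({"tables": {"R", "S"}, "joins": 1}))  # hash-join(R,S)
```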
Colonna, François-Marie. "Intégration de données hétérogènes et distribuées sur le web et applications à la biologie." Aix-Marseille 3, 2008. http://www.theses.fr/2008AIX30050.
Over the past twenty years, the volume of data generated by genomics and biology has grown exponentially. The interoperation of publicly available or copyrighted data sources is difficult due to the syntactic and semantic heterogeneity between them. Thus, integrating heterogeneous data is nowadays one of the most important fields of research in databases, especially in the biological domain, for example for predictive medicine purposes. The work presented in this thesis is organised around two classes of integration problems. The first part of our work deals with joining data sets across several data sources; this method is based on a description of source capabilities using feature logics. The second part of our work is a contribution to the development of a BGLAV mediation architecture based on semi-structured data, for effortless and flexible data integration using the XQuery language.
Quiané-Ruiz, Jorge-Alnulfo. "Allocation de requêtes dans des systèmes d'information distribués avec des participants autonomes." Nantes, 2008. https://tel.archives-ouvertes.fr/tel-00464475.
In large-scale distributed information systems, where participants (consumers and providers) are autonomous and have special interests in some queries, query allocation is a challenge. Much work in this context has focused on distributing queries among providers in a way that maximizes overall performance (typically throughput and response time). However, participants usually have certain expectations with respect to the mediator which are not only performance-related. Such expectations mainly reflect their interests in allocating and performing queries, e.g. their interests towards providers (based on reputation, for example), quality of service, topics of interest, and relationships with other participants. In this context, because of participants' autonomy, dissatisfaction is a problem, since it may lead participants to leave the mediator. A participant's satisfaction means that the query allocation method meets its expectations. Thus, besides balancing the query load, preserving the participants' interests so that they are satisfied is also important. In this thesis, we address the query allocation problem in these environments and make the following main contributions. First, we provide a model to characterize the participants' perception of the system regarding their interests, and propose measures to evaluate the quality of query allocation methods. Second, we propose a framework for query allocation, called SbQA, that dynamically trades consumers' interests for providers' interests based on their satisfaction. Third, we propose a query allocation approach that allows a query allocation method (specifically SbQA) to scale up in terms of the number of mediators and participants, and hence of performed queries. Fourth, we propose a query replication method, called SbQR, that allows supporting participants' failures when allocating queries, while preserving participants' satisfaction and good system performance. Last but not least, we analytically and experimentally validate our proposals and demonstrate that they yield high efficiency while satisfying participants.
El, Attar Ali. "Estimation robuste des modèles de mélange sur des données distribuées." Phd thesis, Université de Nantes, 2012. http://tel.archives-ouvertes.fr/tel-00746118.