Dissertations / Theses on the topic 'Les données'
Consult the top 50 dissertations / theses for your research on the topic 'Les données.'
Chambon, Arthur. "Caractérisation logique de données : application aux données biologiques." Thesis, Angers, 2017. http://www.theses.fr/2017ANGE0030/document.
Analysis of groups of binary data is now a challenge given the amount of collected data. It can be achieved by logic-based approaches. These approaches identify subsets of relevant Boolean attributes to characterize the observations of a group and may help the user to better understand the properties of this group. This thesis presents an approach for characterizing groups of binary data by identifying a minimal subset of attributes that makes it possible to distinguish data from different groups. We precisely define the multiple characterization problem and propose new algorithms that can be used to solve its different variants. Our data characterization approach can be extended to the search for patterns in the framework of logical analysis of data. A pattern can be considered as a partial explanation of the positive observations that can be used by practitioners, for instance for diagnosis purposes. Many patterns may exist, and several preference criteria can be added in order to focus on more restricted sets of patterns (prime patterns, strong patterns, etc.). We propose a comparison between these two methodologies as well as algorithms for generating patterns. The purpose is also to study precisely the properties of the computed solutions with regard to the topological properties of the instances. Experiments are thus conducted on real biological data.
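The central task sketched in this abstract, finding a small subset of Boolean attributes that separates the observations of two groups, can be illustrated with a simple greedy, set-cover-style heuristic. The sketch below is only illustrative and is not the thesis's algorithm; the toy data and the greedy criterion are assumptions.

```python
from itertools import product

def greedy_discriminating_attributes(group_a, group_b):
    """Greedy sketch: pick Boolean attributes until every pair (a, b),
    with a in group_a and b in group_b, differs on at least one selected
    attribute (a set-cover-style heuristic, not the thesis's method)."""
    n_attrs = len(group_a[0])
    # Pairs of observations that still need to be separated.
    uncovered = set(product(range(len(group_a)), range(len(group_b))))
    selected = []
    while uncovered:
        # Attribute separating the largest number of remaining pairs.
        best = max(range(n_attrs),
                   key=lambda k: sum(group_a[i][k] != group_b[j][k] for i, j in uncovered))
        newly = {(i, j) for i, j in uncovered if group_a[i][best] != group_b[j][best]}
        if not newly:          # the groups are not separable on these attributes
            return None
        selected.append(best)
        uncovered -= newly
    return selected

# Toy binary data: rows are observations, columns are Boolean attributes.
positives = [(1, 0, 1, 1), (1, 1, 1, 0)]
negatives = [(0, 0, 1, 1), (1, 1, 0, 0)]
print(greedy_discriminating_attributes(positives, negatives))  # [0, 2]
```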
Castanié, Laurent. "Visualisation de données volumiques massives : application aux données sismiques." Thesis, Vandoeuvre-les-Nancy, INPL, 2006. http://www.theses.fr/2006INPL083N/document.
Seismic reflection data are a valuable source of information for the three-dimensional modeling of subsurface structures in the exploration-production of hydrocarbons. This work focuses on the implementation of visualization techniques for their interpretation. We face both qualitative and quantitative challenges. It is indeed necessary to consider (1) the particular nature of seismic data and the interpretation process, and (2) the size of the data. Our work focuses on these two distinct aspects: 1) From the qualitative point of view, we first highlight the main characteristics of seismic data. Based on this analysis, we implement a volume visualization technique adapted to the specificity of the data. We then focus on the multimodal aspect of interpretation, which consists in combining several sources of information (seismic and structural). Depending on the nature of these sources (strictly volumes, or both volumes and surfaces), we propose two different visualization systems. 2) From the quantitative point of view, we first define the main hardware constraints involved in seismic interpretation. Focused on these constraints, we implement a generic memory management system. Initially able to couple visualization and data processing on massive data volumes, it is then improved and specialised to build a dynamic system for distributed memory management on PC clusters. This latter version, dedicated to visualization, makes it possible to manipulate regional-scale seismic data (100-200 GB) in real time. The main aspects of this work are studied both in the scientific context of visualization and in the application context of geosciences and seismic interpretation.
Gomes Da Silva, Alzennyr. "Analyse des données évolutives : application aux données d'usage du Web." Phd thesis, Université Paris Dauphine - Paris IX, 2009. http://tel.archives-ouvertes.fr/tel-00445501.
Castelltort, Arnaud. "Historisation de données dans les bases de données NoSQL orientées graphes." Thesis, Montpellier 2, 2014. http://www.theses.fr/2014MON20076.
This thesis deals with data historization in the context of graphs. Graph data have been dealt with for many years, but their exploitation in information systems, especially in NoSQL engines, is recent. The emerging Big Data and 3V contexts (Variety, Volume, Velocity) have revealed the limits of classical relational databases. Historization, for its part, has long been considered as linked only with technical and backup issues, and more recently with decisional reasons (Business Intelligence). However, historization is now taking on more and more importance in management applications. In this framework, graph databases, which are often used, have received little attention regarding historization. Our first contribution consists in studying the impact of historized data in management information systems. This analysis relies on the hypothesis that historization is taking on more and more importance. Our second contribution aims at proposing an original model for managing historization in NoSQL graph databases. This proposition consists, on the one hand, in elaborating a unique and generic system for representing the history and, on the other hand, in proposing query features. We show that the system can support both simple and complex queries. Our contributions have been implemented and tested over synthetic and real databases.
Cadoret, Marine. "Analyse factorielle de données de catégorisation : Application aux données sensorielles." Rennes, Agrocampus Ouest, 2010. http://www.theses.fr/2010NSARG006.
In sensory analysis, holistic approaches in which objects are considered as a whole are increasingly used to collect data. Their interest comes, on the one hand, from their ability to acquire other types of information than those obtained by traditional profiling methods and, on the other hand, from the fact that they require no special skills, which makes them feasible by all subjects. Categorization (or free sorting), in which subjects are asked to provide a partition of the objects, belongs to these approaches. The first part of this work focuses on categorization data. After seeing that this method of data collection is relevant, we focus on the statistical analysis of these data through the search for Euclidean representations. The proposed methodology, which consists in using factorial methods such as Multiple Correspondence Analysis (MCA) or Multiple Factor Analysis (MFA), is also enriched with elements of validity. This methodology is then illustrated by the analysis of two data sets obtained from beers on the one hand and perfumes on the other. The second part is devoted to the study of two data collection methods related to categorization: sorted Napping® and hierarchical sorting. For both data collections, we are also interested in the statistical analysis, adopting an approach similar to the one used for categorization data. The last part is devoted to the implementation in the R software of functions to analyze the three kinds of data: categorization data, hierarchical sorting data and sorted Napping® data.
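As a minimal illustration of what a Euclidean representation of free-sorting data can look like, the sketch below builds a co-sorting dissimilarity between products and maps it to two dimensions with classical MDS. This is a deliberately simplified stand-in for the MCA/MFA methodology described above, and the toy partitions are invented.

```python
import numpy as np

# Free-sorting data: each subject partitions the same 5 products into groups.
# Invented toy partitions; entry s[i] is the group assigned to product i.
sortings = [
    [0, 0, 1, 1, 2],
    [0, 1, 1, 1, 2],
    [0, 0, 0, 1, 1],
]

n = len(sortings[0])
# Dissimilarity: share of subjects who put the two products in different groups.
d = np.zeros((n, n))
for s in sortings:
    for i in range(n):
        for j in range(n):
            d[i, j] += (s[i] != s[j])
d /= len(sortings)

# Classical MDS (Torgerson): double-centre the squared dissimilarities
# and keep the two leading eigenvectors as a 2-D Euclidean representation.
j_mat = np.eye(n) - np.ones((n, n)) / n
b = -0.5 * j_mat @ (d ** 2) @ j_mat
eigval, eigvec = np.linalg.eigh(b)
order = np.argsort(eigval)[::-1][:2]
coords = eigvec[:, order] * np.sqrt(np.maximum(eigval[order], 0.0))
print(np.round(coords, 2))   # one row of 2-D coordinates per product
```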
Gomes, da Silva Alzennyr. "Analyse des données évolutives : Application aux données d'usage du Web." Paris 9, 2009. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2009PA090047.
Nowadays, more and more organizations are becoming reliant on the Internet. The Web has become one of the most widespread platforms for information exchange and retrieval. The growing number of traces left behind by user transactions (e.g., customer purchases, user sessions, etc.) automatically increases the importance of usage data analysis. Indeed, the way in which a web site is visited can change over time. These changes can be related to temporal factors (day of the week, seasonality, periods of special offers, etc.). Consequently, the usage models must be continuously updated in order to reflect the current behaviour of the visitors. Such a task remains difficult when the temporal dimension is ignored or simply introduced into the data description as a numeric attribute. It is precisely on this challenge that the present thesis is focused. In order to deal with the problem of acquisition of real usage data, we propose a methodology for the automatic generation of artificial usage data over which one can control the occurrence of changes and thus analyse the efficiency of a change detection system. Guided by tracks born of some exploratory analyses, we propose a tilted-window approach for detecting and following up changes on evolving usage data. In order to measure the level of change, this approach applies two external evaluation indices based on the clustering extension. The proposed approach also characterizes the changes undergone by the usage groups (e.g., appearance, disappearance, fusion and split) at each timestamp. Moreover, the proposed approach is totally independent of the clustering method used and is able to manage kinds of data other than usage data. The effectiveness of this approach is evaluated on artificial data sets of different degrees of complexity and also on real data sets from different domains (academic, tourism, e-business and marketing).
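One way to picture the "level of change" idea mentioned above is to compare the partitions obtained on two consecutive time windows with an external clustering-comparison index. The sketch below uses a plain Rand index on an identical set of sessions, which is a simplification of the thesis's clustering-extension indices and tilted windows; the labels are invented.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """External comparison index between two partitions of the
    same set of user sessions (1.0 = identical grouping)."""
    agree = 0
    pairs = list(combinations(range(len(labels_a)), 2))
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += (same_a == same_b)
    return agree / len(pairs)

# Invented example: cluster labels of the same sessions in two consecutive windows.
window_t = [0, 0, 1, 1, 2, 2]
window_t1 = [0, 0, 1, 2, 2, 2]   # one session moved: a partial change
change_level = 1.0 - rand_index(window_t, window_t1)
print(f"change level between windows: {change_level:.2f}")
```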
Martin, Marie-Laure. "Données de survie." Paris 11, 2001. http://www.theses.fr/2001PA112335.
We consider two statistical problems arising in the estimation of the hazard function of cancer death in Hiroshima. The first problem is the estimation of the hazard function when the covariate is mismeasured. In Chapter 2, only grouped data are available, and the mismeasurement of the covariate is modeled as a misclassification. An easily implemented estimation procedure based on a generalization of the least squares method is devised for estimating simultaneously the parameters of the hazard function and the misclassification probabilities. The procedure is applied to take into account the mismeasurement of the dose of radiation in the estimation of the hazard function of solid cancer death in Hiroshima. In Chapter 3, the available data are individual data. We consider a model of excess relative risk, and we assume that the covariate is measured with a Gaussian additive error. We propose an estimation criterion based on the partial log-likelihood, and we show that the estimator obtained by maximization of this criterion is consistent and asymptotically Gaussian. Our result extends to other polynomial regression functions, to the Cox model and to the log-normal error model. The second problem is the non-parametric estimation of the hazard function. We consider the models of excess relative and absolute risk and propose a non-parametric estimation of the effect of the covariate using a model selection procedure, when the available data are stratified. We approximate the function of the covariate by a collection of spline functions and select the best one according to the Akaike Information Criterion. In the same way, we choose which of the excess relative risk and excess absolute risk models best fits the data. We apply our method to estimating the solid cancer and leukemia death hazard functions in Hiroshima.
Gaumer, Gaëtan. "Résumé de données en extraction de connaissances à partir des données (ECD) : application aux données relationnelles et textuelles." Nantes, 2003. http://www.theses.fr/2003NANT2025.
Voisard, Agnès. "Bases de données géographiques : du modèle de données à l'interface utilisateur." Paris 11, 1992. http://www.theses.fr/1992PA112354.
Périnel, Emmanuel. "Segmentation en analyse de données symboliques : le cas de données probabilistes." Paris 9, 1996. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1996PA090079.
Michel, Franck. "Intégrer des sources de données hétérogènes dans le Web de données." Thesis, Université Côte d'Azur (ComUE), 2017. http://www.theses.fr/2017AZUR4002/document.
To a great extent, the success of the Web of Data depends on the ability to reach legacy data locked in silos inaccessible from the web. In the last 15 years, various works have tackled the problem of exposing various structured data in the Resource Description Framework (RDF). Meanwhile, the overwhelming success of NoSQL databases has made the database landscape more diverse than ever. NoSQL databases are strong potential contributors of valuable linked open data. Hence, the object of this thesis is to enable RDF-based data integration over heterogeneous data sources and, in particular, to harness NoSQL databases to populate the Web of Data. We propose a generic mapping language, xR2RML, to describe the mapping of heterogeneous data sources into an arbitrary RDF representation. xR2RML relies on and extends previous works on the translation of RDBs, CSV/TSV and XML into RDF. With such an xR2RML mapping, we propose either to materialize RDF data or to dynamically evaluate SPARQL queries on the native database. In the latter case, we follow a two-step approach. The first step performs the translation of a SPARQL query into a pivot abstract query, based on the xR2RML mapping of the target database to RDF. In the second step, the abstract query is translated into a concrete query, taking into account the specificities of the database query language. Great care is taken of the query optimization opportunities, both at the abstract and the concrete levels. To demonstrate the effectiveness of our approach, we have developed a prototype implementation for MongoDB, the popular NoSQL document store. We have validated the method using a real-life use case in Digital Humanities.
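To give a flavour of the mapping-driven materialization mentioned above, the sketch below turns JSON documents (as a MongoDB collection would return them) into RDF-style triples from a small mapping description. The mapping structure, URI template and field names are invented for illustration and do not reproduce the actual xR2RML syntax.

```python
# Toy mapping of JSON documents into RDF-style triples. The mapping
# vocabulary below is invented; the real xR2RML language is far richer.
documents = [
    {"id": "p1", "name": "Ada Lovelace", "born": 1815},
    {"id": "p2", "name": "Alan Turing", "born": 1912},
]

mapping = {
    "subject_template": "http://example.org/person/{id}",
    "predicate_object_maps": [
        ("http://xmlns.com/foaf/0.1/name", "name"),
        ("http://example.org/vocab/birthYear", "born"),
    ],
}

def materialize(docs, m):
    """Generate (subject, predicate, object) triples from each document."""
    for doc in docs:
        subject = m["subject_template"].format(**doc)
        for predicate, field in m["predicate_object_maps"]:
            if field in doc:
                yield (subject, predicate, doc[field])

for triple in materialize(documents, mapping):
    print(triple)
```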
Hajjar, Chantal. "Cartes auto-organisatrices pour la classification de données symboliques mixtes, de données de type intervalle et de données discrétisées." Thesis, Supélec, 2014. http://www.theses.fr/2014SUPL0066/document.
This thesis concerns the clustering of symbolic data with bio-inspired geometric methods, more specifically with Self-Organizing Maps. We set up several learning algorithms for the self-organizing maps in order to cluster mixed-feature symbolic data as well as interval-valued data and binned data. Several simulated and real symbolic data sets, including two sets built as part of this thesis, are used to test the proposed methods. In addition, we propose a self-organizing map for binned data in order to accelerate the learning of standard maps, and we use the proposed method for image segmentation.
Lahbib, Dhafer. "Préparation non paramétrique des données pour la fouille de données multi-tables." Phd thesis, Université de Cergy Pontoise, 2012. http://tel.archives-ouvertes.fr/tel-00854142.
Benzine, Mehdi. "Combinaison sécurisée des données publiques et sensibles dans les bases de données." Versailles-St Quentin en Yvelines, 2010. http://www.theses.fr/2010VERS0024.
Protection of sensitive data is a major issue in the database field. Many software and hardware solutions have been designed to protect data when stored and during query processing. Moreover, it is also necessary to provide a secure way to combine sensitive data with public data. To achieve this goal, we designed a new storage and processing architecture. Our solution combines a main server that stores public data and a secure server dedicated to the storage and processing of sensitive data. The secure server is a hardware token which is basically a combination of (i) a secured microcontroller and (ii) a large external NAND Flash memory. Queries which combine public and sensitive data are split into two sub-queries: the first one deals with the public data, the second one with the sensitive data. Each sub-query is processed on the server storing the corresponding data. Finally, the data obtained by the computation of the sub-query on public data is sent to the secure server to be combined with the result of the computation on sensitive data. For security reasons, the final result is built on the secure server. This architecture resolves the security problems, because all the computations dealing with sensitive data are done by the secure server, but brings performance problems (little RAM, asymmetric cost of read/write operations, etc.). These problems are addressed by different query optimization strategies.
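A minimal sketch of the query-splitting principle described above: the public sub-query runs on the main server, its result is shipped to the secure server, and the final combination is always built on the secure side. The schema, data and function names are invented; this only illustrates the flow, not the thesis's architecture.

```python
# Invented toy data: public data on the main server, sensitive data on the
# secure token. The join combining both is always finalized on the secure side.
public_products = {          # main server: product catalogue (public)
    "p1": {"label": "insulin", "category": "diabetes"},
    "p2": {"label": "aspirin", "category": "pain"},
}
secure_prescriptions = [     # secure server: patient prescriptions (sensitive)
    {"patient": "alice", "product": "p1"},
    {"patient": "bob", "product": "p2"},
]

def public_subquery(category):
    """Runs on the main server: select public products of a given category."""
    return {pid for pid, p in public_products.items() if p["category"] == category}

def secure_subquery_and_merge(public_result):
    """Runs on the secure server: filter sensitive rows and build the
    final result there, so sensitive data never leaves the token."""
    return [r["patient"] for r in secure_prescriptions if r["product"] in public_result]

# "Which patients are prescribed a diabetes product?"
print(secure_subquery_and_merge(public_subquery("diabetes")))   # ['alice']
```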
Bazin, Cyril. "Tatouage de données géographiques et généralisation aux données devant préserver des contraintes." Caen, 2010. http://www.theses.fr/2010CAEN2006.
Digital watermarking is a fundamental process for intellectual property protection. It consists in inserting a mark into a digital document through slight modifications. The presence of this mark allows the owner of a document to prove the priority of his rights. The originality of our work is twofold. On the one hand, we use a local approach to ensure a priori that the quality of constrained documents is preserved during the watermark insertion. On the other hand, we propose a generic watermarking scheme. The manuscript is divided into three parts. Firstly, we introduce the basic concepts of digital watermarking for constrained data and the state of the art of geographical data watermarking. Secondly, we present our watermarking scheme for digital vector maps, often used in geographic information systems. This scheme preserves some topological and metric qualities of the document. The watermark is robust; it is resilient against geometric transformations and cropping. We give an efficient implementation that is validated by many experiments. Finally, we propose a generalization of the scheme for constrained data. This generic scheme will facilitate the design of watermarking schemes for new data types. We give a particular example of the application of the generic scheme to relational databases. In order to prove that it is possible to work directly on the generic scheme, we propose two detection protocols directly applicable to any implementation of the generic scheme.
Léonard, Michel. "Conception d'une structure de données dans les environnements de bases de données." Grenoble 1, 1988. http://tel.archives-ouvertes.fr/tel-00327370.
Haddad, Raja. "Apprentissage supervisé de données symboliques et l'adaptation aux données massives et distribuées." Thesis, Paris Sciences et Lettres (ComUE), 2016. http://www.theses.fr/2016PSLED028/document.
This thesis proposes new supervised methods for Symbolic Data Analysis (SDA) and extends this domain to Big Data. We start by creating a supervised method called HistSyr that automatically converts continuous variables into the histograms most discriminant for the classes of individuals. We also propose a new symbolic decision tree method that we call SyrTree. SyrTree accepts many types of input and target variables and can use all symbolic variables describing the target to construct the decision tree. Finally, we extend HistSyr to Big Data by creating a distributed method called CloudHistSyr. Using the Map/Reduce framework, CloudHistSyr creates the most discriminant histograms for data too big for HistSyr. We tested CloudHistSyr on Amazon Web Services. We show the efficiency of our method on simulated data and on actual car traffic data in Nantes. We conclude on the overall utility of CloudHistSyr which, through its results, allows the study of massive data using existing symbolic analysis methods.
Baghdadli, Amaria. "Syndrome d'Asperger : données actuelles." Montpellier 1, 1996. http://www.theses.fr/1996MON11104.
Marion, Cécile. "Données pharmacologiques du sumatriptan." Paris 5, 1993. http://www.theses.fr/1993PA05P051.
Medina Marquez, Alejandro. "L'analyse des données évolutives." Paris 9, 1985. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1985PA090022.
Sirdey, Christine. "Céruloplasmine : quelques données récentes." Paris 5, 1988. http://www.theses.fr/1988PA05P084.
Ykhlef, Mourad. "Interrogation des données semistructurées." Bordeaux 1, 1999. http://www.theses.fr/1999BOR1A640.
Arenou, Frédéric. "Contribution à la validation statistique des données d'Hipparcos : catalogue d'entrée et données préliminaires." Phd thesis, Observatoire de Paris, 1993. http://tel.archives-ouvertes.fr/tel-00010577.
Raïssi, Chedy. "Extraction de Séquences Fréquentes : Des Bases de Données Statiques aux Flots de Données." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2008. http://tel.archives-ouvertes.fr/tel-00351626.
Raissi, Chedy. "Extraction de séquences fréquentes : des bases de données statiques aux flots de données." Montpellier 2, 2008. http://www.theses.fr/2008MON20063.
Tran, Ba-Huy. "Une approche sémantique pour l'exploitation de données environnementales : application aux données d'un observatoire." Thesis, La Rochelle, 2017. http://www.theses.fr/2017LAROS025.
The need to collect long-term observations for research on environmental issues led to the establishment of the "Zones Ateliers" by the CNRS. Thus, for several years, many spatio-temporal databases have been collected by different teams of researchers. To facilitate transversal analysis of the different observations, it is desirable to cross-reference information from these data sources. Nevertheless, these sources are constructed independently of each other, which raises problems of data heterogeneity in the analysis. Therefore, this thesis proposes to study the potential of ontologies as objects of modeling, inference and interoperability. The aim is to provide experts in the field with a suitable method for exploiting heterogeneous data. Being applied in the environmental domain, the ontologies must take into account the spatio-temporal characteristics of these data. Given the need for modeling concepts and spatial and temporal operators, we rely on reusing existing ontologies of time and space. Then, a spatio-temporal data integration approach with a reasoning mechanism on the relations between these data is introduced. Finally, data mining methods have been adapted to spatio-temporal RDF data to discover new knowledge from the knowledge base. The approach was then applied within the Geminat prototype, which aims to help understand farming practices and their relationships with biodiversity in the "zone atelier Plaine et Val de Sèvre". From data integration to knowledge analysis, it provides the necessary elements to exploit heterogeneous spatio-temporal data as well as to discover new knowledge.
Bard, Sylvain. "Méthode d'évaluation de la qualité de données géographiques généralisées : application aux données urbaines." Paris 6, 2004. http://www.theses.fr/2004PA066004.
Laurent, Anne. "Bases de données multidimensionnelles floues et leur utilisation pour la fouille de données." Paris 6, 2002. http://www.theses.fr/2002PA066426.
Tos, Uras. "Réplication de données dans les systèmes de gestion de données à grande échelle." Thesis, Toulouse 3, 2017. http://www.theses.fr/2017TOU30066/document.
In recent years, the growing popularity of large-scale applications, e.g. scientific experiments, the Internet of Things and social networking, has led to the generation of large volumes of data. The management of this data presents a significant challenge as the data is heterogeneous and distributed on a large scale. In traditional systems, including distributed and parallel systems, peer-to-peer systems and grid systems, meeting objectives such as achieving acceptable performance while ensuring good availability of data is a major challenge for service providers, especially when the data is distributed around the world. In this context, data replication, as a well-known technique, allows: (i) increased data availability, (ii) reduced data access costs, and (iii) improved fault-tolerance. However, replicating data on all nodes is an unrealistic solution as it generates significant bandwidth consumption in addition to exhausting limited storage space. Defining good replication strategies is a solution to these problems. The data replication strategies that have been proposed for the traditional systems mentioned above are intended to improve performance for the user. They are difficult to adapt to cloud systems. Indeed, cloud providers aim to generate a profit in addition to meeting tenant requirements. Meeting the performance expectations of the tenants without sacrificing the provider's profit, as well as managing resource elasticity with a pay-as-you-go pricing model, are the fundamentals of cloud systems. In this thesis, we propose a data replication strategy that satisfies the requirements of the tenant, such as performance, while guaranteeing the economic profit of the provider. Based on a cost model, we estimate the response time required to execute a distributed database query. Data replication is only considered if, for any query, the estimated response time exceeds a threshold previously set in the contract between the provider and the tenant. The planned replication must then also be economically beneficial to the provider. In this context, we propose an economic model that takes into account both the expenditures and the revenues of the provider during the execution of any particular database query. Once replication is decided upon, a heuristic placement approach is used to find the placement of the new replicas in order to reduce access time. In addition, a dynamic adjustment of the number of replicas is adopted to allow elastic management of resources. The proposed strategy is validated in an experimental evaluation carried out in a simulation environment. Compared with another data replication strategy proposed for cloud systems, the analysis of the obtained results shows that the two compared strategies meet the performance objective for the tenant. Nevertheless, with our strategy, a replica is created only if the replication is profitable for the provider.
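A minimal sketch of the decision logic summarized above: estimate a query's response time with a simple cost model, and replicate only if the estimate violates the SLA threshold while the provider's estimated revenue still exceeds its expenditure. The cost formulas, rates and numbers below are invented; they only illustrate the shape of the decision, not the thesis's actual models.

```python
from dataclasses import dataclass

@dataclass
class QueryEstimate:
    data_gb: float          # data volume read by the query
    remote_fraction: float  # share of that data fetched from remote nodes

def estimate_response_time(q, local_gb_per_s=1.0, network_gb_per_s=0.1):
    """Very rough cost model: local scan plus remote transfers (invented rates)."""
    local = q.data_gb * (1 - q.remote_fraction) / local_gb_per_s
    remote = q.data_gb * q.remote_fraction / network_gb_per_s
    return local + remote

def should_replicate(q, sla_seconds, revenue, storage_cost, transfer_cost):
    """Replicate only if the SLA is violated AND the provider stays profitable."""
    too_slow = estimate_response_time(q) > sla_seconds
    profitable = revenue > storage_cost + transfer_cost
    return too_slow and profitable

q = QueryEstimate(data_gb=20.0, remote_fraction=0.6)
print(estimate_response_time(q))                       # 128 s with these rates
print(should_replicate(q, sla_seconds=60.0,
                       revenue=5.0, storage_cost=1.5, transfer_cost=2.0))  # True
```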
Barrois, Olivier. "Assimilation de données et modélisation stochastique dans la réanalyse des données géomagnétiques satellitaires." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAU030/document.
This thesis, entitled "Data Assimilation and Stochastic Modelling in Geomagnetic Satellite Data Reanalysis", intends to retrieve information on the state of the Earth's core at the core-mantle boundary by combining, first, spatial constraints coming from direct numerical simulations and, second, temporal information coming from stochastic equations. This purpose is achieved through inverse methods and a data assimilation augmented-state algorithm. The proposed algorithm is designed to be flexible, i.e. able to integrate several types of data or constraints, and to be simple, i.e. with low computation time and easy to modify. This work fits in with the other studies of the community on geomagnetic data assimilation, and with the opportunity to use the latest satellite data from the Swarm spacecraft (2014-...). We have worked in collaboration with Julien Aubert (IPGP), who provided the spatial constraints from the Coupled-Earth dynamo, and with Christopher C. Finlay (DTU) and Magnus Hammer (DTU), who provided the satellite and ground-observatory data. The major outcomes of this thesis are the design of a functional algorithm, validated through synthetic twin experiments (published), and applied, first, to the Gauss coefficients of a geomagnetic model and, second, to the measurements of the CHAMP and Swarm missions. My algorithm is able to retrieve information not only on the measured quantities, but also on unobserved quantities like the core flows or the magnetic diffusion. This work has led to the production of a magnetic field and core flow model at the core surface which is not classically regularized. The geomagnetic field model shows results that are globally similar to the CHAOS-6 reference field model and coherent with the other studies of the community. Thus, the maps of the magnetic field and the velocity field obtained confirm that the dipole decay is principally driven by advection, and display the persistent presence of the Atlantic gyre associated with a less energetic Pacific hemisphere. The inverted magnetic diffusion is concentrated under Indonesia and the Indian Ocean. Fundamentally, my thesis demonstrates the importance of taking into account the modelling errors in geomagnetic data assimilation; neglecting them leads to strong biases and an underestimation of the a posteriori errors. Finally, the work presented in this manuscript is preliminary, and it paves the way toward an increased use of satellite data, with, in particular, the free release of my code in order to compare the results with those obtained by the community.
Falip, Joris. "Structuration de données multidimensionnelles : une approche basée instance pour l'exploration de données médicales." Thesis, Reims, 2019. http://www.theses.fr/2019REIMS014/document.
A posteriori use of the medical data accumulated by practitioners represents a major challenge for clinical research as well as for personalized patient follow-up. However, health professionals lack the appropriate tools to easily explore, understand and manipulate their data. To address this, we propose an algorithm that structures elements by similarity and representativeness. This method allows individuals in a dataset to be grouped around representative and generic members that are able to subsume the elements and summarize the data. This approach processes each dimension individually before aggregating the results; it is adapted to high-dimensional data and offers transparent, interpretable and explainable results. The results we obtain are suitable for exploratory analysis and reasoning by analogy: the structure is similar to the organization of knowledge and the decision-making process used by experts. We then propose an anomaly detection algorithm that allows complex, high-dimensional anomalies to be detected by analyzing two-dimensional projections. This approach also provides interpretable results. We evaluate these two algorithms on real and simulated high-dimensional data with up to thousands of dimensions. We analyze the properties of the graphs resulting from the structuring of elements. We then describe a medical data pre-processing tool and a web application for physicians. Through this intuitive tool, we propose a visual structure of the elements to ease exploration. This decision-support prototype assists medical diagnosis by allowing the physician to navigate through the data and explore similar patients. It can also be used to test clinical hypotheses on a cohort of patients.
Chardonnens, Anne. "La gestion des données d'autorité archivistiques dans le cadre du Web de données." Doctoral thesis, Université Libre de Bruxelles, 2020. https://dipot.ulb.ac.be/dspace/bitstream/2013/315804/5/Contrat.pdf.
The subject of this thesis is the management of authority records for persons. The research was conducted in an archival context in transition, marked by the evolution of international standards of archival description and a shift towards the application of knowledge graphs. The aim of this thesis is to explore how the archival sector can benefit from the developments concerning Linked Data in order to ensure the sustainable management of authority records. Attention is devoted not only to the creation of the records and how they are made available, but also to their maintenance and their interlinking with other resources. The first part of this thesis addresses the state of the art of the developments concerning the international standards of archival description as well as those regarding the Wikibase ecosystem. The second part presents an analysis of the possibilities and limits associated with an approach in which the free software Wikibase is used. The analysis is based on an empirical study carried out with data of the Study and Documentation Centre War and Contemporary Society (CegeSoma). It explores the options that are available to institutions that have limited resources and that have not yet implemented Linked Data. Datasets that contain information on people linked to the Second World War were used to examine the different stages involved in the publication of data as Linked Open Data. The experiment carried out in the second part of the thesis shows how a knowledge base driven by software such as Wikibase streamlines the creation of multilingual structured authority data. Examples illustrate how these entities can then be reused and enriched by using external data in interfaces aimed at the general public. This thesis highlights the possibilities of Wikibase, particularly in the context of data maintenance, without ignoring the limitations associated with its use. Due to its empirical nature and the formulated recommendations, this thesis contributes to the efforts and reflections carried out within the framework of the transition of archival metadata.
Sansen, Joris. "La visualisation d’information pour les données massives : une approche par l’abstraction de données." Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0636/document.
The evolution and spread of technologies have led to a real explosion of information; our capacity to generate data and our need to analyze it have never been this strong. Still, the problems raised by such accumulation (storage, computation delays, diversity, speed of gathering/generation, etc.) are as strong as the data are big, complex and varied. Information visualization, by its ability to summarize and abridge data, naturally established itself as an appropriate approach. However, it does not solve the problems raised by Big Data. In fact, classical visualization techniques are rarely designed to handle such a mass of information. Moreover, the problems raised by data storage and computation time have repercussions on the analysis system; for example, the increasing distance between the data and the analyst: the place where the data is stored and the place where the user performs the analyses are rarely close. In this thesis, we focus on these issues and more particularly on adapting information visualization techniques to Big Data. First of all, we focus on relational data: how the existence of a relation between entities is conveyed, and how to improve this transmission for hierarchical data. Then, we focus on multivariate data and how to handle their complexity for the required computations. Finally, we present the methods we designed to make our techniques compatible with Big Data.
Sautot, Lucile. "Conception et implémentation semi-automatique des entrepôts de données : application aux données écologiques." Thesis, Dijon, 2015. http://www.theses.fr/2015DIJOS055/document.
This thesis concerns the semi-automatic design of data warehouses and the associated OLAP cubes for analyzing ecological data. The biological sciences, including ecology and agronomy, generate data that require an important collection effort: several years are often required to obtain a complete data set. Moreover, the objects and phenomena studied by these sciences are complex and require the recording of many parameters to be understood. Finally, the collection of complex data over a long time results in an increased risk of inconsistency. Thus, these sciences generate numerous and heterogeneous data, which can be inconsistent. It is therefore interesting to offer scientists who work in the life sciences information systems able to store and restore their data, particularly when those data have a significant volume. Among the existing tools, business intelligence tools, including online analytical systems (On-Line Analytical Processing: OLAP), particularly caught our attention because they are data analysis processes working on large historical collections (i.e. a data warehouse) to provide support to decision making. Business intelligence offers tools that allow users to explore large volumes of data, in order to discover patterns and knowledge within the data, and possibly confirm their hypotheses. However, OLAP systems are complex information systems whose implementation requires advanced skills in business intelligence. Thus, although they have interesting features to manage and analyze multidimensional data, their complexity makes them difficult to handle by potential users who are not computer scientists. In the literature, several studies have examined automatic multidimensional design, but the examples provided by these works were traditional data. Moreover, other articles address multidimensional modeling adapted to complex data (inconsistency, heterogeneous data, spatial objects, texts, images within a warehouse, etc.), but the proposed methods are rarely automatic. The aim of this thesis is to provide an automatic design method for data warehouses and OLAP cubes. This method must be able to take into account the inherent complexity of biological data. To test the prototypes proposed in this thesis, we prepared a data set concerning bird abundance along the Loire. This data set is structured as follows: (1) we have the census of 213 bird species (described with a set of qualitative factors, such as diet) at 198 points along the river over 4 census campaigns; (2) each of the 198 points is described by a set of environmental variables from different sources (land surveys, satellite images, GIS). These environmental variables raise the most important issues in terms of multidimensional modeling. They come from different sources, sometimes independent of the bird census campaigns, and are inconsistent in time and space. Moreover, they are heterogeneous: they can be qualitative factors, quantitative variables or spatial objects. Finally, these environmental data include a large number of attributes (158 selected variables) (...)
Ben Salem, Aïcha. "Qualité contextuelle des données : détection et nettoyage guidés par la sémantique des données." Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015USPCD054/document.
Nowadays, complex applications such as knowledge extraction, data mining, e-learning or web applications use heterogeneous and distributed data. The quality of any decision depends on the quality of the data used. The absence of rich, accurate and reliable data can potentially lead an organization to make bad decisions. The work covered in this thesis aims at assisting the user in his data quality approach. The goal is to better extract, mix, interpret and reuse data. For this, the data must be related to its semantic meaning, data types, constraints and comments. The first part deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, including the data and the metadata. Firstly, it consists of categorizing the data by assigning it to a category and possibly a sub-category, and secondly, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. These links detected between columns offer a better understanding of the source and of the alternatives for correcting data. This approach allows the automatic detection of a large number of syntactic and semantic anomalies. The second part is the data cleansing using the reports on anomalies returned by the first part. It allows corrections to be made within a column itself (data homogenization), between columns (semantic dependencies), and between lines (eliminating duplicates and similar data). Throughout this process, recommendations and analyses are provided to the user.
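A minimal sketch of the column-categorization step mentioned above: assign a semantic category to a column by majority vote of regular-expression recognizers. The categories, patterns and threshold below are invented and much simpler than the thesis's approach.

```python
import re
from collections import Counter

# Invented, deliberately simple recognizers for a few semantic categories.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "date_iso": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "phone_fr": re.compile(r"^0\d{9}$"),
}

def column_category(values, min_share=0.6):
    """Assign the dominant category if it covers enough of the column."""
    votes = Counter()
    for v in values:
        for name, pattern in PATTERNS.items():
            if pattern.match(v.strip()):
                votes[name] += 1
    if not votes:
        return "unknown"
    name, count = votes.most_common(1)[0]
    return name if count / len(values) >= min_share else "unknown"

column = ["alice@example.org", "bob@example.org", "not-an-email", "carol@example.org"]
print(column_category(column))   # 'email' (3 of 4 values match)
```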
Kerhervé, Brigitte. "Vues relationnelles : implantation dans les systèmes de gestion de bases de données centralisés et répartis." Paris 6, 1986. http://www.theses.fr/1986PA066090.
Boullé, Marc. "Recherche d'une représentation des données efficace pour la fouille des grandes bases de données." Phd thesis, Télécom ParisTech, 2007. http://pastel.archives-ouvertes.fr/pastel-00003023.
El Golli, Aïcha. "Extraction de données symboliques et cartes topologiques: application aux données ayant une structure complexe." Phd thesis, Université Paris Dauphine - Paris IX, 2004. http://tel.archives-ouvertes.fr/tel-00178900.
Curé, Olivier. "Relations entre bases de données et ontologies dans le cadre du web des données." Habilitation à diriger des recherches, Université Paris-Est, 2010. http://tel.archives-ouvertes.fr/tel-00843284.
Boutin, Denis. "Outils de mise à jour de données distribuées : application aux données à référence spatiale." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0016/MQ56871.pdf.
Full textCharmpi, Konstantina. "Méthodes statistiques pour la fouille de données dans les bases de données de génomique." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GRENM017/document.
Our focus is on statistical testing methods that compare a given vector of numeric values, indexed by all genes in the human genome, to a given set of genes, known for instance to be associated with a particular type of cancer. Among existing methods, Gene Set Enrichment Analysis (GSEA) is the most widely used. However, it has several drawbacks. Firstly, the calculation of p-values is very time-consuming and insufficiently precise. Secondly, like most other methods, it outputs a large number of significant results, the majority of which are not biologically meaningful. The two issues are addressed here by two new statistical procedures, the Weighted and Doubly Weighted Kolmogorov-Smirnov tests. The two tests have been applied both to simulated and real data, and compared with other existing procedures. Our conclusion is that, beyond their mathematical and algorithmic advantages, the WKS and DWKS tests could be more informative in many cases than the classical GSEA test, and efficiently address the issues that have led to their construction.
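For context, the sketch below computes a GSEA-style running-sum enrichment statistic: walk down the gene ranking, step up (weighted by the gene-level statistic) on genes in the set and down otherwise, and keep the maximum deviation. The weighting choice and the simulated data are assumptions; the WKS and DWKS tests themselves are not reproduced here.

```python
import numpy as np

def enrichment_score(scores, gene_set):
    """GSEA-style statistic: maximum deviation of a weighted running sum
    over the list of genes ranked by decreasing score."""
    order = np.argsort(scores)[::-1]              # gene indices, best first
    in_set = np.isin(order, list(gene_set))
    weights = np.abs(scores[order]) * in_set
    up = weights / weights.sum()                  # increments on set members
    down = (~in_set) / (~in_set).sum()            # decrements on the others
    running = np.cumsum(up - down)
    return running[np.argmax(np.abs(running))]

rng = np.random.default_rng(0)
scores = rng.normal(size=100)                     # invented gene-level statistics
scores[:10] += 2.0                                # genes 0..9 artificially enriched
print(round(float(enrichment_score(scores, set(range(10)))), 3))
```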
Barhoumi, Mohamed Adel. "Traitement des données manquantes dans les données de panel : cas des variables dépendantes dichotomiques." Thesis, Université Laval, 2006. http://www.theses.ulaval.ca/2006/23619/23619.pdf.
Aouiche, Kamel. "Techniques de fouille de données pour l'optimisation automatique des performances des entrepôts de données." Lyon 2, 2005. http://theses.univ-lyon2.fr/documents/lyon2/2005/aouiche_k.
With the development of databases in general and data warehouses in particular, it becomes very important to reduce the burden of administration. The aim of auto-administrative systems is to administer and adapt themselves automatically, without loss, or even with a gain, in performance. The idea of using data mining techniques to extract useful knowledge for administration from the data themselves has been in the air for some years; however, no such research had actually been carried out. As far as we know, it nevertheless remains a very promising approach, notably in the field of data warehousing, where the queries are very heterogeneous and cannot be interpreted easily. The aim of this thesis is to study auto-administration techniques in databases and data warehouses, mainly performance optimization techniques such as indexing and view materialization, and to look for a way of extracting, from the stored data themselves, useful knowledge to apply these techniques. We have designed a tool that finds an index and view configuration allowing data access time to be optimized. Our tool searches for frequent itemsets in a given workload and clusters the query workload to compute this index and view configuration. Finally, we have extended the performance optimization to XML data warehouses. In this area, we proposed an indexing technique that precomputes joins between XML facts and dimensions, and adapted our materialized view selection strategy to XML materialized views.
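A toy sketch of the idea of mining the workload for frequently co-used attributes and proposing them as candidate (composite) indexes. It simply counts attribute-set frequencies rather than running a full frequent-itemset algorithm, and the workload is invented; it is not the tool described in the abstract.

```python
from collections import Counter
from itertools import combinations

# Invented workload: the sets of attributes referenced by each query
# (e.g. selection and join predicates extracted from the SQL text).
workload = [
    {"store", "date", "product"},
    {"store", "date"},
    {"store", "date", "product"},
    {"customer", "date"},
    {"store", "product"},
]

def frequent_attribute_sets(queries, min_support=3, max_size=2):
    """Count how often each small attribute combination co-occurs in queries;
    frequent combinations become candidate (composite) indexes."""
    counts = Counter()
    for attrs in queries:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(attrs), size):
                counts[combo] += 1
    return [c for c, n in counts.items() if n >= min_support]

for candidate in frequent_attribute_sets(workload):
    print("candidate index on", candidate)
```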
Ammache, Zakaria. "L'alimentation entérale ultra précoce chez le brûlé : données expérimentales : données cliniques : étude de tolérance." Toulouse 3, 1992. http://www.theses.fr/1992TOU31062.
Ripoche, Hugues. "Une construction interactive d'interprétations de données : application aux bases de données de séquences génétiques." Montpellier 2, 1995. http://www.theses.fr/1995MON20248.
El Golli, Aïcha. "Extraction de données symboliques et cartes topologiques : Application aux données ayant une structure complexe." Paris 9, 2004. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2004PA090026.
Kezouit, Omar Abdelaziz. "Bases de données relationnelles et analyse de données : conception et réalisation d'un système intégré." Paris 11, 1987. http://www.theses.fr/1987PA112130.
Zelasco, José Francisco. "Gestion des données : contrôle de qualité des modèles numériques des bases de données géographiques." Thesis, Montpellier 2, 2010. http://www.theses.fr/2010MON20232.
A Digital Surface Model (DSM) is a numerical surface model formed by a set of points, arranged as a grid, used to study some physical surface: Digital Elevation Models (DEMs), or other possible applications such as a face or an anatomical organ. The study of the precision of these models, which is of particular interest for DEMs, has been the object of several studies in recent decades. Measuring the precision of a DSM, in relation to another model of the same physical surface, consists in estimating the expectation of the squared differences between pairs of points, called homologous points, one in each model, which correspond to the same feature of the physical surface. But these pairs are not easily discernible: the grids may not be coincident, and the differences between homologous points corresponding to benchmarks on the physical surface might be subject to special conditions, such as more careful measurements than on ordinary points, which imply a different precision. The procedure generally used to avoid these inconveniences has been to use the squares of the vertical distances between the models, which only address the vertical component of the error, thus giving a biased estimate when the surface is not horizontal. The Perpendicular Distance Evaluation Method (PDEM), which avoids this bias, provides estimates for the vertical and horizontal components of the error, and is thus a useful tool for the detection of discrepancies in Digital Surface Models (DSMs) like DEMs. The solution includes a special reference to the simplification which arises when the error does not vary in all horizontal directions. The PDEM is also assessed with DEMs obtained by means of the SAR interferometry technique.
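The sketch below contrasts the usual vertical-difference figure with a perpendicular (point-to-plane) distance on a tilted synthetic surface, to show why purely vertical differences give a larger, biased estimate on slopes. This is a strong simplification of the PDEM; the plane, noise level and grid are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Reference surface: a tilted plane z = a*x + b*y sampled on a grid.
a, b = 1.0, 0.5
x, y = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
z_ref = a * x + b * y

# Second DSM of the same surface, with vertical noise (std 0.2, invented).
z_model = z_ref + rng.normal(scale=0.2, size=z_ref.shape)

# Classical evaluation: RMS of vertical differences.
dz = z_model - z_ref
rms_vertical = np.sqrt(np.mean(dz ** 2))

# Perpendicular evaluation: point-to-plane distance to the reference plane,
# i.e. the vertical difference projected onto the surface normal.
cos_slope = 1.0 / np.sqrt(1.0 + a**2 + b**2)
rms_perpendicular = np.sqrt(np.mean((dz * cos_slope) ** 2))

print(f"RMS vertical:      {rms_vertical:.3f}")
print(f"RMS perpendicular: {rms_perpendicular:.3f}")   # smaller on a sloped surface
```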
Buslig, Leticia. "Méthodes stochastiques de modélisation de données : application à la reconstruction de données non régulières." Thesis, Aix-Marseille, 2014. http://www.theses.fr/2014AIXM4734/document.
Nguyen, Thanh Binh. "L'interrogation du web de données garantissant des réponses valides par rapport à des critères donnés." Thesis, Orléans, 2018. http://www.theses.fr/2018ORLE2053/document.
The term Linked Open Data (LOD) was first proposed by Tim Berners-Lee in 2006. Since then, LOD has evolved impressively, with thousands of datasets on the Web of Data, which has raised a number of challenges for the research community regarding how to retrieve and process LOD. In this thesis, we focus on the problem of the quality of data retrieved from various sources of the LOD, and we propose a context-driven querying system that guarantees the quality of answers with respect to a quality context defined by users. We define a fragment of constraints and propose two approaches, the naive one and the rewriting one, which allow us to dynamically filter valid answers at query time instead of validating them at the data source level. The naive approach performs the validation process by generating and evaluating sub-queries for each candidate answer w.r.t. each constraint, while the rewriting approach uses constraints as rewriting rules to reformulate the query into a set of auxiliary queries, such that the answers to the rewritten queries are not only answers to the original query but also valid answers w.r.t. all integrated constraints. The proof of the correctness and completeness of our rewriting system is presented after formalizing the notion of a valid answer w.r.t. a context. These two approaches have been evaluated and have shown the feasibility of our system. This is our main contribution: we extend the set of well-known query-rewriting systems (Chase, Chase & Backchase, PerfectRef, Xrewrite, etc.) with a new effective solution for the new purpose of filtering query results based on constraints in the user context. Moreover, we also enlarge the trigger condition of the constraints, compared with other works, by using the notion of one-way MGU.
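As an illustration of the "naive" approach only, the sketch below checks each candidate answer against each constraint of a user quality context and filters invalid ones at query time. The answers, constraints and field names are invented, and the rewriting approach is not sketched here.

```python
# Invented toy answers retrieved from several LOD sources, with provenance.
candidate_answers = [
    {"drug": "d1", "source": "http://example.org/sourceA", "approved": True},
    {"drug": "d2", "source": "http://example.org/sourceB", "approved": False},
    {"drug": "d3", "source": "http://example.org/sourceA", "approved": True},
]

# A user "quality context": constraints that every returned answer must satisfy.
constraints = [
    lambda ans: ans["approved"],                     # only approved drugs
    lambda ans: ans["source"].endswith("sourceA"),   # only a trusted source
]

def filter_valid(answers, ctx):
    """Naive approach: validate each candidate answer against each constraint
    at query time, instead of validating the data at the source level."""
    return [a for a in answers if all(c(a) for c in ctx)]

print([a["drug"] for a in filter_valid(candidate_answers, constraints)])  # ['d1', 'd3']
```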