Dissertations / Theses on the topic 'Qualità dei dati'
Consult the top 50 dissertations / theses for your research on the topic 'Qualità dei dati.'
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.
Morlini, Gabriele. "Analisi della qualità dei dati in un’enterprise architecture utilizzando un sistema d’inferenza." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2012. http://amslaurea.unibo.it/3398/.
Cristofaro, Roberta. "Analisi dell'effetto della qualità dei dati meteorologici sulla simulazione a lungo raggio della dispersione in atmosfera di inquinanti radioattivi." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020.
Zucchi, Erica. "La qualità del binario nelle linee AV/AC: studio dei dati rilevati dai treni diagnostici di RFI e analisi degli interventi manutentivi in previsione dell'aumento di velocità a 360 km/h." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amslaurea.unibo.it/6767/.
Cacchi, Alberto. "Valutazione dell'attività fisica tramite l'uso del Global Positioning System." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2017.
Kara, Madjid. "Data quality for the decision of the ambient systems." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLV009.
Data quality is a common concern in all information technology projects; it has become a complex research domain with the multiplicity and expansion of different data sources. Researchers have studied the modeling and evaluation of data; several approaches have been proposed, but they are limited to a specific field of use and do not offer a quality profile enabling us to evaluate a global quality model. Evaluation based on ISO quality models has emerged; however, these models do not guide their own use, and they must be adapted to each scenario without precise methods. Our work focuses on the data quality issues of an ambient system, where the time constraints on decision-making are tighter than in traditional applications. The main objective is to provide the decision-making system with a very specific view of the quality of the sensor data. We identify the quantifiable aspects of sensor data to link them to the appropriate metrics of our specified data quality model. Our work presents the following contributions: (i) creating a generic data quality model based on several existing data quality standards, (ii) formalizing the data quality models under an ontology, which allows the models from (i) to be integrated by specifying various links, named equivalence relations, between the criteria composing these models, (iii) proposing an instantiation algorithm to extract the specified data quality model from the generic data quality models, (iv) proposing a global evaluation approach of the specified data quality model using two processes: the first one consists in executing the metrics based on sensor data, and the second one recovers the result of the first process and uses the concept of fuzzy logic to evaluate the factors of our specified data quality model. Then, the expert defines weight values based on the interdependence table of the model to take into account the interaction between criteria, and uses the aggregation procedure to obtain a degree of confidence value. Based on the final result, the decisional component performs an analysis in order to make a decision.
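The second evaluation process described above can be illustrated with a minimal sketch: metric results computed from sensor data are mapped to fuzzy membership degrees and aggregated with expert-defined weights into a single degree of confidence. The metric names, membership functions, thresholds and weights below are hypothetical illustrations, not the model specified in the thesis.

```python
# Minimal sketch (not the thesis implementation): fuzzy-style aggregation of
# sensor data quality metrics into a single degree of confidence.
# Metric names, membership thresholds and weights are hypothetical.

def triangular(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(score):
    """Map a metric score in [0, 1] to degrees of 'low', 'medium', 'high'."""
    return {
        "low": triangular(score, -0.01, 0.0, 0.5),
        "medium": triangular(score, 0.0, 0.5, 1.0),
        "high": triangular(score, 0.5, 1.0, 1.01),
    }

def confidence_degree(metric_scores, weights):
    """Weighted aggregation of the 'high' membership of each quality factor."""
    total_w = sum(weights[m] for m in metric_scores)
    return sum(weights[m] * fuzzify(s)["high"] for m, s in metric_scores.items()) / total_w

# Hypothetical metric results computed from sensor data (first process).
scores = {"completeness": 0.92, "accuracy": 0.75, "timeliness": 0.40}
# Hypothetical expert weights reflecting the interdependence between criteria.
weights = {"completeness": 0.3, "accuracy": 0.5, "timeliness": 0.2}

print(f"degree of confidence = {confidence_degree(scores, weights):.2f}")
```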
Korcari, William. "Analisi del segnale temporale del sistema a tempo di volo dell'esperimento ALICE a LHC per le procedure di controllo di qualità dei dati." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14607/.
Peralta, Veronika. "Data Quality Evaluation in Data Integration Systems." PhD thesis, Université de Versailles-Saint Quentin en Yvelines, 2006. http://tel.archives-ouvertes.fr/tel-00325139.
Peralta Costabel, Veronika del Carmen. "Data quality evaluation in data integration systems." Versailles-St Quentin en Yvelines, 2006. http://www.theses.fr/2006VERS0020.
This thesis deals with data quality in Data Integration Systems (DIS). More precisely, we are interested in the problems of evaluating the quality of the data delivered to users in response to their queries, and of satisfying users' quality requirements. We also analyse the use of quality measures to improve the design of the DIS and, consequently, the quality of the data. Our approach consists in studying one quality factor at a time, analysing its relationship with the DIS, proposing techniques for its evaluation and proposing actions for its improvement. Among the quality factors that have been proposed, this thesis analyses two: data freshness and data accuracy.
Issa, Subhi. "Linked data quality : completeness and conciseness." Electronic Thesis or Diss., Paris, CNAM, 2019. http://www.theses.fr/2019CNAM1274.
The widespread adoption of Semantic Web technologies such as the Resource Description Framework (RDF) enables individuals to build their databases on the Web, to write vocabularies, and to define rules to arrange and explain the relationships between data according to the Linked Data principles. As a consequence, a large amount of structured and interlinked data is being generated daily. A close examination of the quality of this data is critical, especially if important research and professional decisions depend on it. The quality of Linked Data is an important aspect to indicate their fitness for use in applications. Several dimensions to assess the quality of Linked Data are identified, such as accuracy, completeness, provenance, and conciseness. This thesis focuses on assessing completeness and enhancing conciseness of Linked Data. In particular, we first proposed a completeness calculation approach based on a generated schema. Indeed, as a reference schema is required to assess completeness, we proposed a mining-based approach to derive a suitable schema (i.e., a set of properties) from data. This approach distinguishes between essential properties and marginal ones to generate, for a given dataset, a conceptual schema that meets the user's expectations regarding data completeness constraints. We implemented a prototype called "LOD-CM" to illustrate the process of deriving a conceptual schema of a dataset based on the user's requirements. We further proposed an approach to discover equivalent predicates to improve the conciseness of Linked Data. This approach is based, in addition to a statistical analysis, on a deep semantic analysis of data and on learning algorithms. We argue that studying the meaning of predicates can help to improve the accuracy of results. Finally, a set of experiments was conducted on real-world datasets to evaluate our proposed approaches.
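A minimal sketch of the completeness idea described above (not the LOD-CM prototype itself): a reference schema is mined from the data as the set of frequently used properties, and each entity description is scored by the fraction of that schema it covers. The entities, property names and support threshold are hypothetical.

```python
# Illustrative sketch only: completeness of entity descriptions against a
# schema (set of properties) mined from the data itself.

from collections import Counter

# Toy dataset: each entity is described by the set of properties it uses.
entities = {
    "ex:film1": {"rdf:type", "dbo:director", "dbo:starring", "dbo:runtime"},
    "ex:film2": {"rdf:type", "dbo:director", "dbo:starring"},
    "ex:film3": {"rdf:type", "dbo:starring"},
}

def mine_schema(entities, min_support=0.6):
    """Keep 'essential' properties: those used by at least min_support of entities."""
    counts = Counter(p for props in entities.values() for p in props)
    n = len(entities)
    return {p for p, c in counts.items() if c / n >= min_support}

def completeness(props, schema):
    """Fraction of the reference schema covered by one entity description."""
    return len(props & schema) / len(schema) if schema else 1.0

schema = mine_schema(entities)
for e, props in entities.items():
    print(e, f"completeness = {completeness(props, schema):.2f}")
print("dataset completeness =",
      round(sum(completeness(p, schema) for p in entities.values()) / len(entities), 2))
```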
Heguy, Xabier. "Extensions de BPMN 2.0 et méthode de gestion de la qualité pour l'interopérabilité des données." Thesis, Bordeaux, 2018. http://www.theses.fr/2018BORD0375/document.
Business Process Model and Notation (BPMN) is becoming the most widely used standard for business process modelling. One of the important upgrades of BPMN 2.0 with respect to BPMN 1.2 is the fact that Data Objects now handle semantic elements. Nevertheless, BPMN does not enable the representation of performance measurement in the case of interoperability problems in the exchanged data objects, which remains a limitation when using BPMN to express interoperability issues in enterprise processes. We propose to extend the Meta-Object Facility meta-model and the XML Schema Definition of BPMN, as well as the notation, in order to fill this gap. The extension, named performanceMeasurement, is defined using the BPMN Extension Mechanism. This new element allows representing performance measurement in the case of interoperability problems, as well as interoperability concerns that have been solved. We illustrate the use of this extension with an example from a real industrial case.
El Sibai, Rayane. "Sampling, qualification and analysis of data streams." Thesis, Sorbonne université, 2018. http://www.theses.fr/2018SORUS170/document.
An environmental monitoring system continuously collects and analyzes the data streams generated by environmental sensors. The goal of the monitoring process is to filter out useful and reliable information and to infer new knowledge that helps the network operator to quickly make the right decisions. This whole process, from data collection to data analysis, leads to two key problems: data volume and data quality. On the one hand, the throughput of the generated data streams has not stopped increasing over the last years, producing a large volume of data continuously sent to the monitoring system. The data arrival rate is very high compared to the available processing and storage capacities of the monitoring system. Thus, permanent and exhaustive storage of the data is very expensive, sometimes impossible. On the other hand, in real-world settings such as sensor environments, the data are often dirty: they contain noisy, erroneous and missing values, which can lead to faulty and defective results. In this thesis, we propose a solution called native filtering to deal with the problems of quality and data volume. Upon receipt of the data streams, the quality of the data is evaluated and improved in real time based on a data quality management model that we also propose in this thesis. Once qualified, the data are summarized using sampling algorithms. In particular, we focus on the analysis of the Chain-sample algorithm, which we compare against other reference algorithms such as probabilistic sampling, deterministic sampling, and weighted sampling. We also propose two new versions of the Chain-sample algorithm that significantly improve its execution time. Data stream analysis is also discussed in this thesis. We are particularly interested in anomaly detection. Two algorithms are studied: the Moran scatterplot for the detection of spatial anomalies and CUSUM for the detection of temporal anomalies. We have designed a method that improves the estimation of the start time and end time of the anomaly detected by CUSUM. Our work was validated by simulations and also by experimentation on two different real data sets: data issued from sensors in the water distribution network provided as part of the Waves project, and data relative to the bike sharing system (Velib).
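For the temporal anomaly detection mentioned above, a textbook one-sided CUSUM gives the flavour of the approach; this is only a sketch, not the refined variant developed in the thesis (which improves the estimation of the anomaly's start and end times). The target mean, allowance and threshold are hypothetical.

```python
# Minimal one-sided CUSUM sketch for detecting an upward shift in a stream.

def cusum_upper(stream, target_mean, k=0.5, h=5.0):
    """Return the indices at which the upper CUSUM statistic exceeds h."""
    s, alarms = 0.0, []
    for i, x in enumerate(stream):
        s = max(0.0, s + (x - target_mean - k))
        if s > h:
            alarms.append(i)
            s = 0.0  # restart accumulation after an alarm
    return alarms

# Hypothetical sensor readings with a shift starting at index 10.
readings = [10.1, 9.8, 10.2, 10.0, 9.9, 10.3, 9.7, 10.1, 10.0, 9.9,
            12.5, 12.8, 13.1, 12.9, 13.2, 12.7]
print("alarms at indices:", cusum_upper(readings, target_mean=10.0))
```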
Diallo, Thierno Mahamoudou. "Discovering data quality rules in a master data management context." Thesis, Lyon, INSA, 2013. http://www.theses.fr/2013ISAL0067.
Dirty data continues to be an important issue for companies. The Data Warehouse Institute [Eckerson, 2002], [Rockwell, 2012] stated that poor data costs US businesses $611 billion annually and that erroneously priced data in retail databases costs US customers $2.5 billion each year. Data quality becomes more and more critical. The database community pays particular attention to this subject, where a variety of integrity constraints like Conditional Functional Dependencies (CFD) have been studied for data cleaning. Repair techniques based on these constraints are precise at catching inconsistencies but limited in how to correct data exactly. Master data brings a new alternative for data cleaning with respect to its quality properties. Thanks to the growing importance of Master Data Management (MDM), a new class of data quality rules known as Editing Rules (ER) tells how to fix errors, pointing out which attributes are wrong and what values they should take. The intuition is to correct dirty data using high-quality data from the master. However, finding data quality rules is an expensive process that involves intensive manual effort. It remains unrealistic to rely on human designers. In this thesis, we develop pattern mining techniques for discovering ER from existing source relations with respect to master relations. In this setting, we propose a new semantics of ER taking advantage of both source and master data. Thanks to the semantics proposed in terms of satisfaction, the discovery problem of ER turns out to be strongly related to the discovery of both CFD and one-to-one correspondences between source and target attributes. We first attack the problem of discovering CFD. We concentrate our attention on the particular class of constant CFD, known to be very expressive for detecting inconsistencies. We extend some well-known concepts introduced for traditional Functional Dependencies to solve the discovery problem of CFD. Secondly, we propose a method based on INclusion Dependencies to extract one-to-one correspondences from source to master attributes before automatically building ER. Finally, we propose some heuristics for applying ER to clean data. We have implemented and evaluated our techniques on both real-life and synthetic databases. Experiments show the feasibility, scalability and robustness of our proposal.
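To make the notion of constant CFD concrete, here is a hedged sketch of how such a rule can be checked against a relation; the relation, attributes and the rule itself are hypothetical examples, and the sketch covers only violation detection, not the discovery or repair techniques developed in the thesis.

```python
# Checking tuples against a constant Conditional Functional Dependency (CFD).
# Constant CFD: ([zip] -> [city], (75001 || Paris))
# "Whenever zip = 75001, city must equal Paris."

cfd = {"lhs": {"zip": "75001"}, "rhs": {"city": "Paris"}}

tuples = [
    {"id": 1, "zip": "75001", "city": "Paris"},
    {"id": 2, "zip": "75001", "city": "Lyon"},    # violates the CFD
    {"id": 3, "zip": "69001", "city": "Lyon"},    # LHS pattern does not apply
]

def violations(rows, cfd):
    """Return the rows matching the LHS pattern but not the RHS constants."""
    bad = []
    for row in rows:
        if all(row.get(a) == v for a, v in cfd["lhs"].items()) and \
           any(row.get(a) != v for a, v in cfd["rhs"].items()):
            bad.append(row)
    return bad

for row in violations(tuples, cfd):
    print("inconsistent tuple:", row)
```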
Beretta, Valentina. "évaluation de la véracité des données : améliorer la découverte de la vérité en utilisant des connaissances a priori." Thesis, IMT Mines Alès, 2018. http://www.theses.fr/2018EMAL0002/document.
The notion of data veracity is increasingly getting attention due to the problem of misinformation and fake news. With more and more information published online, it is becoming essential to develop models that automatically evaluate information veracity. Indeed, the task of evaluating data veracity is very difficult for humans. They are affected by confirmation bias, which prevents them from objectively evaluating information reliability. Moreover, the amount of information that is available nowadays makes this task time-consuming. The computational power of computers is required. It is critical to develop methods that are able to automate this task. In this thesis we focus on Truth Discovery models. These approaches address the data veracity problem when conflicting values about the same properties of real-world entities are provided by multiple sources. They aim to identify which are the true claims among the set of conflicting ones. More precisely, they are unsupervised models based on the rationale that true information is provided by reliable sources and reliable sources provide true information. The main contribution of this thesis consists in improving Truth Discovery models by considering a priori knowledge expressed in ontologies. This knowledge may facilitate the identification of true claims. Two particular aspects of ontologies are considered. First of all, we explore the semantic dependencies that may exist among different values, i.e. the ordering of values through certain conceptual relationships. Indeed, two different values are not necessarily conflicting. They may represent the same concept, but with different levels of detail. In order to integrate this kind of knowledge into existing approaches, we use the mathematical models of partial order. Then, we consider recurrent patterns that can be derived from ontologies. This additional information indeed reinforces the confidence in certain values when certain recurrent patterns are observed. In this case, we model recurrent patterns using rules. Experiments conducted both on synthetic and real-world datasets show that a priori knowledge enhances existing models and paves the way towards a more reliable information world. Source code as well as synthetic and real-world datasets are freely available.
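The rationale stated above — reliable sources provide true information, and true information comes from reliable sources — can be sketched as a simple fixed-point iteration between source trustworthiness and claim confidence. This is a generic, hedged illustration of the kind of model the thesis improves upon; it does not include the ontological a priori knowledge that constitutes the thesis contribution, and the sources, claims and confidence formula are illustrative assumptions.

```python
# Generic truth-discovery iteration: trust and confidence reinforce each other.
import math

claims = {  # data item -> {claimed value -> sources asserting it}
    "capital_of_australia": {"Canberra": {"s1", "s2"}, "Sydney": {"s3"}},
    "capital_of_canada": {"Ottawa": {"s1", "s2"}, "Toronto": {"s3"}},
}
sources = {"s1", "s2", "s3"}
trust = {s: 0.5 for s in sources}  # initial source trustworthiness

for _ in range(20):
    # A claim is more credible when supported by (several) trusted sources.
    conf = {(item, v): 1.0 - math.prod(1.0 - trust[s] for s in srcs)
            for item, values in claims.items() for v, srcs in values.items()}
    # A source is more trustworthy when the claims it supports are credible.
    for s in sources:
        supported = [c for (item, v), c in conf.items() if s in claims[item][v]]
        trust[s] = sum(supported) / len(supported)

for item, values in claims.items():
    best = max(values, key=lambda v: conf[(item, v)])
    print(item, "->", best, f"(confidence {conf[(item, best)]:.2f})")
```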
Ben Salem, Aïcha. "Qualité contextuelle des données : détection et nettoyage guidés par la sémantique des données." Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015USPCD054/document.
Nowadays, complex applications such as knowledge extraction, data mining, e-learning or web applications use heterogeneous and distributed data. The quality of any decision depends on the quality of the data used. The absence of rich, accurate and reliable data can potentially lead an organization to make bad decisions. The subject covered in this thesis aims at assisting the user in their quality approach. The goal is to better extract, mix, interpret and reuse data. For this, the data must be related to its semantic meaning, data types, constraints and comments. The first part deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, including the data and the metadata. Firstly, it consists of categorizing the data by assigning it to a category and possibly a sub-category, and secondly, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. These links detected between columns offer a better understanding of the source and alternatives for correcting data. This approach allows automatic detection of a large number of syntactic and semantic anomalies. The second part is the data cleansing using the reports on anomalies returned by the first part. It allows corrections to be made within a column itself (data homogenization), between columns (semantic dependencies), and between lines (eliminating duplicates and similar data). Throughout this whole process, recommendations and analyses are provided to the user.
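A heavily simplified sketch of the categorization step described above: each column is assigned a dominant semantic category using recognition patterns, and values that do not conform are reported as candidate anomalies. The patterns, category names and sample column are hypothetical and far simpler than the approach developed in the thesis.

```python
# Assign a column a dominant semantic category and flag non-conforming values.
import re

PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "date_iso": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "phone_fr": re.compile(r"^0\d{9}$"),
}

def categorize(values):
    """Pick the category matching most values, then flag the non-conforming ones."""
    scores = {cat: sum(bool(rx.match(v)) for v in values) for cat, rx in PATTERNS.items()}
    best = max(scores, key=scores.get)
    anomalies = [v for v in values if not PATTERNS[best].match(v)]
    return best, anomalies

column = ["alice@example.org", "bob@example.org", "not-an-email", "carol@example.org"]
category, anomalies = categorize(column)
print("detected category:", category)
print("candidate anomalies:", anomalies)
```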
Pol, Adrian Alan. "Machine Learning Anomaly Detection Applications to Compact Muon Solenoid Data Quality Monitoring." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASS083.
The Data Quality Monitoring of High Energy Physics experiments is a crucial and demanding task to deliver high-quality data used for physics analysis. At the Compact Muon Solenoid experiment operating at the CERN Large Hadron Collider, the current quality assessment paradigm is based on the scrutiny of a large number of statistical tests. However, the ever-increasing detector complexity and the volume of monitoring data call for a paradigm shift. Here, Machine Learning techniques promise a breakthrough. This dissertation deals with the problem of automating Data Quality Monitoring scrutiny with Machine Learning Anomaly Detection methods. The high dimensionality of the data precludes the usage of classic detection methods, pointing to novel ones based on deep learning. Anomalies caused by detector malfunctioning are difficult to enumerate a priori and rare, limiting the amount of labeled data. This thesis explores the landscape of existing algorithms with particular attention to semi-supervised problems and demonstrates their validity and usefulness on real test cases using the experiment data. As part of this project, the monitoring infrastructure was further optimized and extended, delivering methods with higher sensitivity to various failure modes.
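Reconstruction-based anomaly detection of the kind discussed above can be sketched in a few lines. The thesis relies on deep autoencoders trained on monitoring data; in the hedged stand-in below, a linear PCA reconstruction plays the autoencoder's role just to show the principle (train on "good" histograms only, flag inputs with unusually large reconstruction error), and the data is synthetic.

```python
# Reconstruction-error anomaly scoring on synthetic monitoring histograms.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_bins = 50
template = np.sin(np.linspace(0, 3, n_bins))
good = rng.normal(loc=template, scale=0.05, size=(500, n_bins))  # "good" histograms

model = PCA(n_components=5).fit(good)  # learn the normal manifold (autoencoder stand-in)

def anomaly_score(x):
    """Mean squared reconstruction error of one monitoring histogram."""
    recon = model.inverse_transform(model.transform(x.reshape(1, -1)))[0]
    return float(np.mean((x - recon) ** 2))

threshold = np.quantile([anomaly_score(x) for x in good], 0.99)

normal = rng.normal(template, 0.05)
faulty = normal.copy()
faulty[20:30] = 0.0  # e.g. a dead detector region
print(f"normal histogram score: {anomaly_score(normal):.5f}")
print(f"faulty histogram score: {anomaly_score(faulty):.5f} (alarm threshold {threshold:.5f})")
```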
Gooch, Michael J. "Accuracy optimisation and error detection in automatically generated elevation models derived using digital photogrammetry." Thesis, Loughborough University, 1999. https://dspace.lboro.ac.uk/2134/7347.
Lamer, Antoine. "Contribution à la prévention des risques liés à l’anesthésie par la valorisation des informations hospitalières au sein d’un entrepôt de données." Thesis, Lille 2, 2015. http://www.theses.fr/2015LIL2S021/document.
Introduction: Hospital Information Systems (HIS) manage and register millions of data items related to patient care every day: biological results, vital signs, drug administrations, care processes... These data are stored by operational applications that provide remote access and a comprehensive picture of the Electronic Health Record. These data may also be used to serve other purposes such as clinical research or public health, particularly when integrated in a data warehouse. Some studies highlighted a statistical link between the compliance of quality indicators related to the anesthesia procedure and patient outcome during the hospital stay. In the University Hospital of Lille, the quality indicators, as well as the patient comorbidities during the post-operative period, could be assessed with data collected by applications of the HIS. The main objective of the work is to integrate data collected by operational applications in order to carry out clinical research studies. Methods: First, the data quality of information registered by the operational applications is evaluated with methods … by the literature or developed in this work. Then, data quality problems highlighted by the evaluation are managed during the integration step of the ETL process. New data are computed and aggregated in order to provide indicators of quality of care. Finally, two studies bring out the usability of the system. Results: Pertinent data from the HIS have been integrated in an anesthesia data warehouse. This system stores data about the hospital stay and interventions (drug administrations, vital signs …) since 2010. Aggregated data have been developed and used in two clinical research studies. The first study highlighted a statistical link between the induction and patient outcome. The second study evaluated the compliance of quality indicators of ventilation and the impact on comorbidity. Discussion: The data warehouse and the cleaning and integration methods developed as part of this work allow performing statistical analysis on more than 200 000 interventions. This system can be implemented with other applications used in the CHRU of Lille but also with Anesthesia Information Management Systems used by other hospitals.
Serrano Balderas, Eva Carmina. "Preprocessing and analysis of environmental data : Application to the water quality assessment of Mexican rivers." Thesis, Montpellier, 2017. http://www.theses.fr/2017MONTS082/document.
Data obtained from environmental surveys may be prone to different anomalies (i.e., incomplete, inconsistent, inaccurate or outlying data). These anomalies affect the quality of environmental data and can have considerable consequences when assessing environmental ecosystems. The selection of data preprocessing procedures is crucial to validate the results of statistical analysis; however, such selection is poorly defined. To address this question, the thesis focused on data acquisition and data preprocessing protocols in order to ensure the validity of the results of data analysis and, mainly, to recommend the most suitable sequence of preprocessing tasks. We propose to control every step in the data production process, from collection in the field to analysis. In the case of water quality assessment, this concerns the steps of chemical and hydrobiological analysis of samples, producing data that were subsequently analyzed by a set of statistical and data mining methods. The multidisciplinary contributions of the thesis are: (1) in environmental chemistry: a methodological procedure to determine the content of organochlorine pesticides in water samples using the SPE-GC-ECD (Solid Phase Extraction – Gas Chromatography – Electron Capture Detector) techniques; (2) in hydrobiology: a methodological procedure to assess the quality of the water of four Mexican rivers using macroinvertebrate-based biological indices; (3) in data science: a method to assess and guide the selection of preprocessing procedures for data produced from the two previous steps, as well as their analysis; and (4) the development of a fully integrated analytics environment in R for statistical analysis of environmental data in general, and for water quality data analytics in particular. Finally, within the context of this thesis, which was developed between Mexico and France, we have applied our methodological approaches to the specific case of the water quality assessment of the Mexican rivers Tula, Tamazula, Humaya and Culiacan.
Da Silva Carvalho, Paulo. "Plateforme visuelle pour l'intégration de données faiblement structurées et incertaines." Thesis, Tours, 2017. http://www.theses.fr/2017TOUR4020/document.
We hear a lot about Big Data, Open Data, Social Data, Scientific Data, etc. The importance currently given to data is, in general, very high. We are living in the era of massive data. The analysis of these data is important if the objective is to successfully extract value from them so that they can be used. The work presented in this thesis project is related to the understanding, assessment, correction/modification, management and finally the integration of data, in order to allow their exploitation and reuse. Our research is exclusively focused on Open Data and, more precisely, Open Data organized in tabular form (CSV being one of the most widely used formats in the Open Data domain). The first time the term Open Data appeared was in 1995, when the group GCDIS (Global Change Data and Information System) (from the United States) used this expression to encourage entities having the same interests and concerns to share their data [Data et System, 1995]. However, the Open Data movement has only recently undergone a sharp increase. It has become a popular phenomenon all over the world. Since the Open Data movement is recent, the field is still growing and its importance is considerable. The encouragement given by governments and public institutions to have their data published openly plays an important role at this level.
Chamekh, Fatma. "L’évolution du web de données basée sur un système multi-agents." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE3083/document.
In this thesis, we investigate the evolution of RDF datasets from documents and LOD. We identify the following issues: the integration of new triples, the proposition of changes by taking into account the data quality, and the management of different versions. To handle the complexity of the evolution of the web of data, we propose an agent-based argumentation framework. We assume that the agent specifications could facilitate the process of RDF dataset evolution. Agent technology is one of the most useful solutions to cope with a complex problem. The agents work as a team and are autonomous in the sense that they have the ability to decide for themselves which goals they should adopt and how these goals should be achieved. The agents use argumentation theory to reach a consensus about the best change alternative. In relation to this goal, we propose an argumentation model based on metrics related to the intrinsic dimensions. To keep a record of all the modifications that occur, we focus on resource versioning. In the case of a collaborative environment, several conflicts could be generated. To manage those conflicts, we define rules. The application domain is general medicine.
Djedaini, Mahfoud. "Automatic assessment of OLAP exploration quality." Thesis, Tours, 2017. http://www.theses.fr/2017TOUR4038/document.
In a Big Data context, traditional data analysis is becoming more and more tedious. Many approaches have been designed and developed to support analysts in their exploration tasks. However, there is no automatic, unified method for evaluating the quality of support for these different approaches. Current benchmarks focus mainly on the evaluation of systems in terms of temporal, energy or financial performance. In this thesis, we propose a model, based on supervised automatic learning methods, to evaluate the quality of an OLAP exploration. We use this model to build an evaluation benchmark of exploration support systems, the general principle of which is to allow these systems to generate explorations and then to evaluate them through the explorations they produce.
Plumejeaud, Christine. "Modèles et méthodes pour l'information spatio-temporelle évolutive." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00630984.
Sellier, Elodie. "Traitement de l'information issue d'un réseau de surveillance de la paralysie cérébrale : qualité et analyse des données." PhD thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00770324.
Nguyen, Thanh Binh. "L'interrogation du web de données garantissant des réponses valides par rapport à des critères donnés." Thesis, Orléans, 2018. http://www.theses.fr/2018ORLE2053/document.
The term Linked Open Data (LOD) was first proposed by Tim Berners-Lee in 2006. Since then, LOD has evolved impressively, with thousands of datasets on the Web of Data, which has raised a number of challenges for the research community to retrieve and process LOD. In this thesis, we focus on the problem of the quality of data retrieved from various sources of the LOD, and we propose a context-driven querying system that guarantees the quality of answers with respect to the quality context defined by users. We define a fragment of constraints and propose two approaches, the naive and the rewriting, which allow us to dynamically filter valid answers at query time instead of validating them at the data source level. The naive approach performs the validation process by generating and evaluating sub-queries for each candidate answer w.r.t. each constraint, while the rewriting approach uses constraints as rewriting rules to reformulate the query into a set of auxiliary queries such that the answers of the rewritten queries are not only the answers of the query but also valid answers w.r.t. all integrated constraints. The proof of the correctness and completeness of our rewriting system is presented after formalizing the notion of a valid answer w.r.t. a context. These two approaches have been evaluated and have shown the feasibility of our system. This is our main contribution: we extend the set of well-known query-rewriting systems (Chase, Chase & Backchase, PerfectRef, Xrewrite, etc.) with a new effective solution for the new purpose of filtering query results based on constraints in the user context. Moreover, we also enlarge the trigger condition of the constraint compared with other works by using the notion of one-way MGU.
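A hedged sketch of the naive approach described above: each candidate answer is kept only if every constraint of the user's quality context is satisfied, checked here by boolean sub-queries against a toy triple store. The triples, query and constraints are hypothetical, and the rewriting approach (the main contribution of the thesis) is not shown.

```python
# Naive constraint-based filtering of candidate answers over a toy triple store.

triples = {
    ("ex:b1", "ex:author", "ex:a1"), ("ex:b1", "ex:publisher", "ex:p1"),
    ("ex:b2", "ex:author", "ex:a2"),                      # no publisher -> invalid
    ("ex:p1", "ex:country", "ex:France"),
}

def ask(s, p, o=None):
    """Boolean sub-query: does a triple matching the pattern exist?"""
    return any(t[0] == s and t[1] == p and (o is None or t[2] == o) for t in triples)

# Query: all books having an author.
candidates = [s for (s, p, o) in triples if p == "ex:author"]

# Quality context: a valid book must have a publisher whose country is known.
constraints = [
    lambda b: ask(b, "ex:publisher"),
    lambda b: any(ask(o, "ex:country") for (s, p, o) in triples
                  if s == b and p == "ex:publisher"),
]

valid = [b for b in candidates if all(c(b) for c in constraints)]
print("valid answers:", valid)
```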
Maillot, Pierre. "Nouvelles méthodes pour l'évaluation, l'évolution et l'interrogation des bases du Web des données." Thesis, Angers, 2015. http://www.theses.fr/2015ANGE0007/document.
The web of data is a means to share and broadcast both human-readable and machine-readable data. This is possible thanks to RDF, which proposes formatting data into short sentences (subject, relation, object) called triples. Bases from the web of data, called RDF bases, are sets of triples. In an RDF base, the ontology – structural data – organizes the description of factual data. Since the creation of the web of data in 2001, the number and sizes of RDF bases have been constantly rising. This increase has accelerated since the appearance of linked data, which promotes the sharing and interlinking of publicly available bases by user communities. The exploitation – querying and editing – by these communities is carried out without an adequate solution to evaluate the quality of new data, check the current state of the bases or query a set of bases together. This thesis proposes three methods to help the expansion, at the factual and ontological levels, and the querying of bases from the web of data. We propose a method designed to help an expert check factual data in conflict with the ontology. Finally, we propose a method for distributed querying that limits the sending of queries to bases that may contain answers.
Plana Puig, Queralt. "Automated Data Collection and Management at Enhanced Lagoons for Wastewater Treatment." Master's thesis, Université Laval, 2015. http://hdl.handle.net/20.500.11794/26531.
Automated monitoring stations have been used to monitor and control wastewater treatment plants. Their capability to monitor at high frequency has become essential to reduce the negative impacts on the environment, since the wastewater characteristics have high spatial and temporal variability. Over the last few years, the technology used to build these automatic monitoring stations, for example the sensors, has been improved. However, the instrumentation is still expensive. Also, in wastewater applications, basic problems such as fouling, poor calibration or clogging frequently affect the reliability of the continuous on-line measurements. Thus, good maintenance of the instruments, as well as validation of the collected data to detect faults, is required. In the context of this thesis, in collaboration with Bionest®, a methodology has been developed to deal with these problems for two facultative/aerated lagoon case studies in Québec, with the objective of optimizing the maintenance activities, reducing the fraction of unreliable data and obtaining large representative data series.
Zaidi, Houda. "Amélioration de la qualité des données : correction sémantique des anomalies inter-colonnes." Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1094/document.
Data quality represents a major challenge because the cost of anomalies can be very high, especially for large databases in enterprises that need to exchange information between systems and integrate large amounts of data. Decision making using erroneous data has a negative impact on the activities of organizations. The quantity of data continues to increase, as do the risks of anomalies. The automatic correction of these anomalies is a topic that is becoming more important both in business and in the academic world. In this work, we propose an approach to better understand the semantics and the structure of the data. Our approach helps to automatically correct both intra-column and inter-column anomalies. We aim to improve the quality of data by processing null values and the semantic dependencies between columns.
Da Silva Veith, Alexandre. "Quality of Service Aware Mechanisms for (Re)Configuring Data Stream Processing Applications on Highly Distributed Infrastructure." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEN050/document.
A large part of big data is most valuable when analysed quickly, as it is generated. Under several emerging application scenarios, such as in smart cities, operational monitoring of large infrastructure, and the Internet of Things (IoT), continuous data streams must be processed under very short delays. In multiple domains, there is a need for processing data streams to detect patterns, identify failures, and gain insights. Data is often gathered and analysed by Data Stream Processing Engines (DSPEs). A DSPE commonly structures an application as a directed graph or dataflow. A dataflow has one or multiple sources (i.e., gateways or actuators); operators that perform transformations on the data (e.g., filtering); and sinks (i.e., queries that consume or store the data). Most complex operator transformations store information about previously received data as new data is streamed in. Also, a dataflow has stateless operators that consider only the current data. Traditionally, Data Stream Processing (DSP) applications were conceived to run in clusters of homogeneous resources or on the cloud. In a cloud deployment, the whole application is placed on a single cloud provider to benefit from virtually unlimited resources. This approach allows for elastic DSP applications with the ability to allocate additional resources or release idle capacity on demand during runtime to match the application requirements. We introduce a set of strategies to place operators onto cloud and edge while considering characteristics of resources and meeting the requirements of applications. In particular, we first decompose the application graph by identifying behaviours such as forks and joins, and then dynamically split the dataflow graph across edge and cloud. Comprehensive simulations and a real testbed considering multiple application settings demonstrate that our approach can improve the end-to-end latency by over 50% as well as other QoS metrics. The solution search space for operator reassignment can be enormous depending on the number of operators, streams, resources and network links. Moreover, it is important to minimise the cost of migration while improving latency. Reinforcement Learning (RL) and Monte-Carlo Tree Search (MCTS) have been used to tackle problems with large search spaces and states, performing at human level or better in games such as Go. We model the application reconfiguration problem as a Markov Decision Process (MDP) and investigate the use of RL and MCTS algorithms to devise reconfiguration plans that improve QoS metrics.
Hammond, Janelle K. "Méthodes des bases réduites pour la modélisation de la qualité de l'air urbaine." Thesis, Paris Est, 2017. http://www.theses.fr/2017PESC1230/document.
The principal objective of this thesis is the development of low-cost numerical tools for spatial mapping of pollutant concentrations from field observations and advanced deterministic models. With increased pollutant emissions and exposure due to mass urbanization and development worldwide, air quality measurement campaigns and epidemiology studies of the association between air pollution and adverse health effects have become increasingly common. However, as air pollution concentrations are highly variable spatially and temporally, the sensitivity and accuracy of these epidemiology studies are often deteriorated by exposure misclassification due to poor estimates of individual exposures. Data assimilation methods incorporate available measurement data and mathematical models to provide improved approximations of the concentration. These methods, when based on advanced deterministic air quality models (AQMs), could provide spatially rich small-scale approximations and can enable better estimates of effects and exposures. However, these methods can be computationally expensive. They require repeated solution of the model, which could itself be costly. In this work we investigate a combined reduced basis (RB) data assimilation method for use with advanced AQMs at urban scales. We want to diminish the cost of resolution, using RB arguments, and incorporate measurement data to improve the quality of the solution. We extend the Parameterized-Background Data-Weak (PBDW) method to physically based AQMs. This method can rapidly estimate "online" pollutant concentrations at urban scale, using available AQMs in a non-intrusive and computationally efficient manner, reducing computation times by factors up to hundreds. We apply this method in case studies representing urban residential pollution of PM2.5, and we study the stability of the method depending on the placement of air quality sensors. Results from the PBDW are compared to the Generalized Empirical Interpolation Method (GEIM) and a standard inverse problem, the adjoint method, in order to measure the efficiency of the method. This comparison shows possible improvement in precision and great improvement in computation cost with respect to classical methods. We find that the PBDW method shows promise for the real-time reconstruction of a pollution field in large-scale problems, providing state estimation with approximation error generally under 10% when applied to an imperfect model.
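For reference, the standard Parameterized-Background Data-Weak statement reads as follows; this is a sketch using common notation from the PBDW literature, which may differ from the thesis. The state estimate combines a background component from the reduced model space with an update component spanned by the Riesz representations of the sensor functionals.

```latex
% Standard PBDW statement (notation may differ from the thesis).
% Z_N: background (reduced) space built from the parameterized AQM;
% \ell_m: the M sensor functionals; q_m: their Riesz representations in U.
\[
\begin{aligned}
(z^*, \eta^*) &= \arg\min_{z \in Z_N,\ \eta \in \mathcal{U}} \ \|\eta\|_{\mathcal{U}}
\quad \text{s.t.} \quad \ell_m(z + \eta) = y_m^{\mathrm{obs}}, \quad m = 1, \dots, M, \\
u^* &= z^* + \eta^*, \qquad
\eta^* \in \mathcal{U}_M = \operatorname{span}\{q_1, \dots, q_M\}, \qquad
(q_m, v)_{\mathcal{U}} = \ell_m(v) \ \ \forall v \in \mathcal{U}.
\end{aligned}
\]
```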
Ferhat, Fouad. "Une analyse économique de la qualité et de l'efficience des universités et des systèmes universitaires : une comparaison au niveau international." Thesis, Paris 1, 2016. http://www.theses.fr/2016PA01E040/document.
This thesis aims to economically analyze the quality and efficiency of universities and university systems at an international level of comparison, by using input/output indicators and the Data Envelopment Analysis (DEA) method. The thesis is composed of four chapters. The first chapter, entitled "university rankings: a critical perspective", presents and evaluates the relevance of the input/output indicators used by most university rankings. It provides the opportunity to present a number of criticisms found in the literature and to focus on a common methodological problem in the rankings: the use of inputs as measures of university quality. This practice confuses means and results and ignores the basic concepts of accounting models in terms of production functions and efficiency. The second chapter, entitled "characteristics and rankings of universities: around some factors that can explain the differences in performance between universities", compares the results of two rankings, QS-Times and Shanghai, and offers a list of factors that may explain why there are such differences in quality between universities according to these rankings. [...] The third chapter, entitled "performance and efficiency of universities and their determinants: an evaluation using world university rankings and DEA methodology", evaluates, on the basis of a DEA methodology, the efficiency of 214 universities from 13 different countries, in order to determine whether the top-ranked universities in traditional rankings are also the universities that best utilize their financial and human resources. [...] The fourth chapter, titled "efficiency of university systems in 35 countries and its determinants: an assessment by DEA methodology and the calculation of Malmquist indices (2006-2012)", assesses the efficiency and performance of the university systems of 35 countries. It offers new scores for overall efficiency that complement the first two studies on this topic in the literature, by Agasisti (2011) and St. Aubyn et al. (2009). Compared to the article of Agasisti (2011), we identify five new developments in our study: the sample is larger (35 countries instead of 18), the observation period is updated, the evolution of efficiency between two periods is calculated, the number of inputs and outputs incorporated into each model is higher, and a specific model for evaluating the efficiency of research is proposed. Our study confirms the thesis that the university systems of Switzerland and the United Kingdom are the most efficient. It also shows, based on the calculation of Malmquist indices between 2006 and 2012, that the teaching efficiency of the 35 reviewed university systems tends to decline, while research efficiency and attractivity-reputation efficiency are rather increasing. This allows a better assessment of the impact of reforms inspired by the Shanghai ranking on university systems. These reforms led the academic staff of universities to abandon their focus on teaching in favor of research activities.
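The DEA scores mentioned above can be illustrated with a standard input-oriented CCR envelopment model under constant returns to scale, solved as a small linear program. This is a hedged sketch: the university inputs and outputs below are entirely made up, and it does not reproduce the specific models or data of the thesis.

```python
# Input-oriented CCR DEA efficiency scores via linear programming.
import numpy as np
from scipy.optimize import linprog

# Hypothetical data. Inputs: (academic staff, budget); Outputs: (graduates, publications).
X = np.array([[60, 10.0], [80, 15.0], [50, 9.0], [90, 20.0]])      # inputs
Y = np.array([[900, 120], [1000, 200], [850, 150], [1100, 160]])   # outputs
n, m, r = X.shape[0], X.shape[1], Y.shape[1]

def ccr_efficiency(o):
    """min theta s.t. sum_j lam_j x_j <= theta * x_o and sum_j lam_j y_j >= y_o."""
    c = np.r_[1.0, np.zeros(n)]                  # variables: [theta, lam_1..lam_n]
    A_in = np.c_[-X[o].reshape(-1, 1), X.T]      # inputs:  X^T lam - theta x_o <= 0
    A_out = np.c_[np.zeros((r, 1)), -Y.T]        # outputs: -Y^T lam <= -y_o
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(m), -Y[o]]
    bounds = [(None, None)] + [(0, None)] * n    # theta free, lambdas >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0]

for o in range(n):
    print(f"university {o}: efficiency = {ccr_efficiency(o):.3f}")
```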
Wang, Leye. "Facilitating mobile crowdsensing from both organizers’ and participants’ perspectives." Thesis, Evry, Institut national des télécommunications, 2016. http://www.theses.fr/2016TELE0008/document.
Mobile crowdsensing is a novel paradigm for urban sensing applications using a crowd of participants' sensor-equipped smartphones. To successfully complete mobile crowdsensing tasks, various concerns of participants and organizers need to be carefully considered. For participants, primary concerns include energy consumption, mobile data cost, privacy, etc. For organizers, data quality and budget are two critical concerns. In this dissertation, to address both participants' and organizers' concerns, two mobile crowdsensing mechanisms are proposed: collaborative data uploading and sparse mobile crowdsensing. In collaborative data uploading, participants help each other through opportunistic encounters and data relays in the data uploading process of crowdsensing, in order to save energy consumption, mobile data cost, etc. Specifically, two collaborative data uploading procedures are proposed: (1) effSense, which helps participants with a sufficient data plan to save energy consumption, and participants with a limited data plan to save mobile data cost; (2) ecoSense, which reduces the organizers' incentive refund that is paid to cover participants' mobile data cost. In sparse mobile crowdsensing, spatial and temporal correlations among sensed data are leveraged to significantly reduce the number of allocated tasks, and thus the organizers' budget, while still ensuring data quality. Specifically, a sparse crowdsensing task allocation framework, CCS-TA, is implemented with compressive sensing, active learning, and Bayesian inference techniques. Furthermore, differential privacy is introduced into sparse mobile crowdsensing to address participants' location privacy concerns.
Bouali, Tarek. "Platform for efficient and secure data collection and exploitation in intelligent vehicular networks." Thesis, Dijon, 2016. http://www.theses.fr/2016DIJOS003/document.
Nowadays, the automotive field is witnessing a tremendous evolution due to the increasing growth in communication technologies, environmental sensing & perception aptitudes, and storage & processing capacities that we can find in recent vehicles. Indeed, a car is becoming a kind of intelligent mobile agent able to perceive its environment, sense and process data using on-board systems and interact with other vehicles or existing infrastructure. These advancements stimulate the development of several kinds of applications to enhance driving safety and efficiency and make traveling more comfortable. However, developing such advanced applications relies heavily on the quality of the data and therefore can be realized only with the help of secure data collection and efficient data treatment and analysis. Data collection in a vehicular network has always been a real challenge due to the specific characteristics of these highly dynamic networks (frequently changing topology, vehicle speed and frequent fragmentation), which lead to opportunistic and short-lived communications. Security remains another weak aspect in these wireless networks, since they are by nature vulnerable to various kinds of attacks aiming to falsify collected data and affect their integrity. Furthermore, collected data are not understandable by themselves and cannot be interpreted and understood if directly shown to a driver or sent to other nodes in the network. They should be treated and analyzed to extract meaningful features and information to develop reliable applications. In addition, developed applications always have different requirements regarding quality of service (QoS). Several research investigations and projects have been conducted to overcome the aforementioned challenges. However, they are still not perfect and suffer from some weaknesses. For this reason, we focus our efforts in this thesis on developing a platform for secure and efficient data collection and exploitation to provide vehicular network users with efficient applications to ease their travel with protected and available connectivity. Therefore, we first propose a solution to deploy an optimized number of data harvesters to collect data from an urban area. Then, we propose a new secure intersection-based routing protocol to relay data to a destination in a secure manner, based on a monitoring architecture able to detect and evict malicious vehicles. This protocol is then enhanced with a new intrusion detection and prevention mechanism to decrease the vulnerability window and detect attackers before their attacks persist, using a Kalman filter. In the second part of this thesis, we concentrate on the exploitation of collected data by developing an application able to calculate the most economical itinerary in a refined manner for drivers and fleet management companies. This solution is based on several pieces of information that may affect fuel consumption, provided by vehicles and other Internet sources accessible via specific APIs, and aims to save money and time. Finally, a spatio-temporal mechanism allowing the best available communication medium to be chosen is developed. The latter is based on fuzzy logic to ensure a smooth and seamless handover, and considers information collected from the network, users and applications to preserve a high quality of service.
Saif, Abdulqawi. "Experimental Methods for the Evaluation of Big Data Systems." Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0001.
In the era of big data, many systems and applications are created to collect, store, and analyze massive data in multiple domains. Although these big data systems are subjected to multiple evaluations during their development life-cycle, academia and industry encourage further experimentation to ensure their quality of service and to understand their performance under various contexts and configurations. However, the experimental challenges of big data systems are not trivial. While many pieces of research still employ legacy experimental methods to face such challenges, we argue that experimentation activity can be improved by proposing flexible experimental methods. In this thesis, we address particular challenges to improve experimental context and observability for big data experiments. We firstly enable experiments to customize the performance of their environmental resources, encouraging researchers to perform scalable experiments over heterogeneous configurations. We then introduce two experimental tools, IOscope and MonEx, to improve observability. IOscope allows performing low-level observations on the I/O stack to detect potential performance issues in target systems, showing that high-level evaluation techniques should be accompanied by such complementary tools to understand systems' performance. In contrast, the MonEx framework works at higher levels to facilitate experimental data collection. MonEx opens directions to practice experiment-based monitoring independently of the underlying experimental environments. We finally apply statistics to improve experimental designs, reducing the number of experimental scenarios and obtaining a refined set of experimental factors as fast as possible. Finally, all contributions complement each other to facilitate the experimentation activity by covering almost all phases of the big data experiment life-cycle.
Girres, Jean-François. "Modèle d'estimation de l'imprécision des mesures géométriques de données géographiques." Thesis, Paris Est, 2012. http://www.theses.fr/2012PEST1080/document.
Many GIS applications are based on length and area measurements computed from the geometry of the objects of a geographic database (such as route planning or maps of population density, for example). However, no information concerning the imprecision of these measurements is currently communicated to the final user. Indeed, most of the indicators on geometric quality focus on positioning errors, but not on measurement errors, which are very frequent. In this context, this thesis seeks to develop methods for estimating the imprecision of geometric measurements of length and area, in order to inform a user for decision support. To achieve this objective, we propose a model to estimate the impacts of representation rules (cartographic projection, terrain, polygonal approximation of curves) and production processes (digitizing error, cartographic generalisation) on geometric measurements of length and area, according to the characteristics and the spatial context of the evaluated objects. Methods for acquiring knowledge about the evaluated data are also proposed to facilitate the parameterization of the model by the user. The combination of impacts to produce a global estimation of the imprecision of measurement is a complex problem, and we propose approaches to approximate the cumulated error bounds. The proposed model is implemented in the EstIM prototype (Estimation of the Imprecision of Measurements).
Maarof, Salman. "L'applicabilité du système de comptabilité nationale 1993 en Syrie." Thesis, Paris Est, 2011. http://www.theses.fr/2011PEST0061.
Although the 1993 SNA has been established for over fifteen years, some countries have still not implemented it, while others claim to implement it without applying it correctly. The difficulties in the implementation of the 1993 SNA are explained by a few main reasons, among which is the availability of data sources and databases. Syria has still not adopted the 1993 SNA, and the Syrian national accounts are still compiled according to the 1968 SNA. In our research, we analyzed the quality of national accounts data in order to assess the applicability of the 1993 SNA in the national accounts department of Syria. To achieve this goal, and taking advantage of the experiences that other countries offer us, it was necessary to analyze the quality of the national accounts data produced in the national accounts department. The work should enable us to know our ability to respond to the recommendations of the 1993 system. The 2008 SNA has recently been published; however, we believe that once we are able to generate sound data, we will also be able to apply any more developed system. It is essential to keep in mind that the goal is not merely to announce the implementation of the system, but to produce reliable data and to apply the SNA reasonably well. This research is not an end in itself but rather a starting point for the national accounts, in order to produce, in the future, sound data that reflect economic reality, for the establishment of economic strategies and the economic development of Syria.
Irain, Malik. "Plateforme d'analyse de performances des méthodes de localisation des données dans le cloud basées sur l'apprentissage automatique exploitant des délais de messages." Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30195.
Cloud usage is a necessity today, as the data produced and used by all types of users (individuals, companies, administrative structures) has become too large to be stored otherwise. It requires signing, explicitly or not, a contract with a cloud storage provider. This contract specifies the levels of quality of service required for various criteria. Among these criteria is the location of the data. However, this criterion is not easily verifiable by a user. This is why research in the field of data localization verification has led to several studies in recent years, but the proposed solutions can still be improved. The work proposed in this thesis consists in studying solutions for location verification by a user, i.e. solutions that estimate data location and operate using landmarks. The implemented approach can be summarized as follows: exploiting communication delays and using network time models to estimate, with some distance error, the data location. To this end, the work carried out is as follows: • A survey of the state of the art on the different methods used to provide users with location information. • The design of a unified notation for the methods studied in the survey, with a proposal of two scores to assess methods. • Implementation of a network measurement collection platform. Thanks to this platform, two datasets were collected, at both the national and the international level. These two data sets are used to evaluate the different methods presented in the state-of-the-art survey. • Implementation of an evaluation architecture based on the two data sets and the defined scores. This allows us to establish the quality of the methods (success rate) and the quality of the results (accuracy of the result) thanks to the proposed scores.
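A hedged sketch of the landmark-based approach summarized above: round-trip delays measured from landmarks are converted to distance estimates with a simple linear network time model, and the location minimizing the mismatch is searched numerically. The landmarks, delays and the delay-to-distance coefficient are illustrative assumptions, not the methods or scores evaluated in the thesis.

```python
# Delay-based location estimation from landmarks (illustrative only).
import math
from scipy.optimize import minimize

def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = math.sin((lat2 - lat1) / 2) ** 2 + \
        math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# Landmarks (lat, lon) and RTTs (ms) measured towards the storage server.
landmarks = {"paris": (48.85, 2.35), "frankfurt": (50.11, 8.68), "madrid": (40.42, -3.70)}
rtt_ms = {"paris": 9.0, "frankfurt": 3.0, "madrid": 22.0}

# Simple time model: distance ~ (RTT / 2) * 100 km/ms, an assumed effective
# propagation speed well below light speed in fibre (to account for routing).
dist_est = {k: (rtt / 2) * 100.0 for k, rtt in rtt_ms.items()}

def mismatch(pos):
    return sum((haversine_km(pos, landmarks[k]) - d) ** 2 for k, d in dist_est.items())

best = minimize(mismatch, x0=[48.0, 5.0], method="Nelder-Mead")
print("estimated location (lat, lon):", [round(v, 2) for v in best.x])
```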
Alili, Hiba. "Intégration de données basée sur la qualité pour l'enrichissement des sources de données locales dans le Service Lake." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLED019.
In the Big Data era, companies are moving away from traditional data-warehouse solutions, whereby expensive and time-consuming ETL (Extract, Transform, Load) processes are used, towards data lakes in order to manage their increasingly growing data. Yet the knowledge stored in companies' databases, even in the constructed data lakes, can never be complete and up-to-date, because of the continuous production of data. Local data sources often need to be augmented and enriched with information coming from external data sources. Unfortunately, the data enrichment process is one of the manual labors undertaken by experts, who enrich data by adding information based on their expertise or select relevant data sources to complete missing information. Such work can be tedious, expensive and time-consuming, making it very promising for automation. We present in this work an active user-centric data integration approach to automatically enrich local data sources, in which the missing information is leveraged on the fly from web sources using data services. Accordingly, our approach enables users to query for information about concepts that are not defined in the data source schema. In doing so, we take into consideration a set of user preferences such as the cost threshold and the response time necessary to compute the desired answers, while ensuring a good quality of the obtained results.
Hoang, Cong Tuan. "Prise en compte des fluctuations spatio-temporelles pluies-débits pour une meilleure gestion de la ressource en eau et une meilleure évaluation des risques." Phd thesis, Université Paris-Est, 2011. http://pastel.archives-ouvertes.fr/pastel-00658537.
Full textTaillandier, Patrick. "Révision automatique des connaissances guidant l'exploration informée d'arbres d'états : application au contexte de la généralisation de données géographiques." Phd thesis, Université Paris-Est, 2008. http://tel.archives-ouvertes.fr/tel-00481927.
Full textBen, Hassine Soumaya. "Évaluation et requêtage de données multisources : une approche guidée par la préférence et la qualité des données : application aux campagnes marketing B2B dans les bases de données de prospection." Thesis, Lyon 2, 2014. http://www.theses.fr/2014LYO22012/document.
Full textIn Business-to-Business (B-to-B) marketing campaigns, generating "the highest volume of sales at the lowest cost" and achieving the best return on investment (ROI) is a significant challenge. ROI performance depends on a set of subjective and objective factors such as dialogue strategy, invested budget, marketing technology and organisation, and above all data and, particularly, data quality. However, data issues in marketing databases are overwhelming, leading to insufficient target knowledge that handicaps B-to-B salespersons when interacting with prospects. B-to-B prospection data is indeed mainly structured through a set of independent, heterogeneous, separate and sometimes overlapping files that form a messy multisource prospect selection environment. Data quality thus appears as a crucial issue when dealing with prospection databases. Moreover, beyond data quality, the ROI metric mainly depends on campaign costs; given the vagueness of (direct and indirect) cost definition, we limit our focus to price considerations. Price and quality thus define the fundamental constraints data marketers consider when designing a marketing campaign file, as they typically look for the "best-qualified selection at the lowest price". However, this goal is not always reachable and compromises often have to be made. Such compromises must first be modelled and formalized, and then deployed for multisource selection issues. In this thesis, we propose a preference-driven selection approach for multisource environments that aims at: 1) modelling and quantifying decision makers' preferences, and 2) defining and optimizing a selection routine based on these preferences. Concretely, we first deal with the data marketer's quality preference modelling by appraising multisource data using robust evaluation criteria (quality dimensions) that are rigorously summarized into a global quality score. Based on this global quality score and on data price, we then apply a preference-based selection algorithm to return "the best-qualified records bearing the lowest possible price". An optimisation algorithm, BrokerACO, is finally run to generate the best selection result.
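The idea of collapsing several quality dimensions into a global score and then trading it off against price can be sketched as follows; the dimensions, weights and source figures are purely illustrative, and this simple weighted utility stands in for, but does not reproduce, the thesis's evaluation model or the BrokerACO optimizer.

```python
# Hypothetical multisource prospect files scored on three quality dimensions in [0, 1].
SOURCES = {
    "fileA": {"completeness": 0.92, "freshness": 0.70, "accuracy": 0.85, "price_per_record": 0.30},
    "fileB": {"completeness": 0.75, "freshness": 0.90, "accuracy": 0.80, "price_per_record": 0.18},
    "fileC": {"completeness": 0.60, "freshness": 0.95, "accuracy": 0.70, "price_per_record": 0.05},
}

# Weights expressing the data marketer's preferences (assumed values, summing to 1).
WEIGHTS = {"completeness": 0.5, "freshness": 0.2, "accuracy": 0.3}

def global_quality(scores):
    """Weighted aggregation of the quality dimensions into a single global score."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

def rank_sources(sources, alpha=0.7):
    """Rank sources by a simple utility mixing quality (weight alpha) and
    price (weight 1 - alpha); prices are normalized to [0, 1] before mixing."""
    max_price = max(s["price_per_record"] for s in sources.values())
    ranking = []
    for name, s in sources.items():
        utility = alpha * global_quality(s) - (1 - alpha) * s["price_per_record"] / max_price
        ranking.append((round(utility, 3), name))
    return sorted(ranking, reverse=True)

print(rank_sources(SOURCES))
```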
Robinson-Bryant, Federica. "Defining a Stakeholder-Relative Model to Measure Academic Department Efficiency at Achieving Quality in Higher Education." Doctoral diss., University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5842.
Full textPh.D. (Doctorate), Industrial Engineering and Management Systems, Engineering and Computer Science, Industrial Engineering
Ben, Khedher Anis. "Amélioration de la qualité des données produits échangées entre l'ingénierie et la production à travers l'intégration de systèmes d'information dédiés." Thesis, Lyon 2, 2012. http://www.theses.fr/2012LYO20012.
Full textThis research work contributes to improving the quality of the data exchanged between production and the engineering units dedicated to product design and production system design. This improvement is addressed by studying the interactions between product lifecycle management and production management. As these two concepts are supported, wholly or partly, by industrial information systems, the study of these interactions leads to the integration of the corresponding information systems (PLM, ERP and MES). In a highly competitive and globalized environment, companies are forced to innovate and reduce costs, especially production costs. Facing these challenges, the volume and change frequency of production data are increasing, due to the steady reduction of product lifetimes and time to market, the increasing customization of products and the generalization of continuous improvement in production. Consequently, all production data need to be formalized and managed, and these data should be provided to production operators and machines. After analyzing the data quality of each existing architecture and showing their inability to address this problem, an architecture based on the integration of the three information systems involved in production (PLM, ERP and MES) has been proposed. This architecture leads to two complementary sub-problems. The first is the development of an architecture based on Web services to improve the accessibility, safety and completeness of the exchanged data. The second is an integration architecture based on ontologies, offering integration mechanisms grounded in semantics in order to ensure the correct interpretation of the exchanged data. Finally, the model of a software tool supports the proposed solution and ensures that the integration of the data exchanged between engineering and production is actually carried out.
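Very schematically, the role of the semantic layer can be pictured as a shared vocabulary onto which each system's field names are mapped before data is exchanged; the field names and mapping below are invented and only hint at the ontology-based mechanisms described in the thesis.

```python
# Hypothetical mappings from system-specific field names to a shared vocabulary.
FIELD_MAPPINGS = {
    "ERP": {"itemNo": "product_id", "qty": "quantity", "dueDate": "due_date"},
    "MES": {"partRef": "product_id", "lotSize": "quantity", "deadline": "due_date"},
}

def to_shared_vocabulary(system, record):
    """Translate a record from a system-specific schema to the shared vocabulary,
    so that both sides interpret the exchanged data the same way."""
    mapping = FIELD_MAPPINGS[system]
    return {mapping[field]: value for field, value in record.items() if field in mapping}

erp_order = {"itemNo": "P-1042", "qty": 250, "dueDate": "2012-06-30"}
print(to_shared_vocabulary("ERP", erp_order))
# {'product_id': 'P-1042', 'quantity': 250, 'due_date': '2012-06-30'}
```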
Wolley, Chirine. "Apprentissage supervisé à partir des multiples annotateurs incertains." Thesis, Aix-Marseille, 2014. http://www.theses.fr/2014AIXM4070/document.
Full textIn supervised learning tasks, obtaining the ground-truth label for each instance of the training dataset can be difficult, time-consuming and/or expensive. With the advent of infrastructures such as the Internet, an increasing number of web services propose crowdsourcing as a way to collect a large enough set of labels from internet users. The use of these services provides an exceptional facility to collect labels from anonymous annotators, and thus considerably simplifies the process of building labeled datasets. Nonetheless, the main drawback of crowdsourcing services is their lack of control over the annotators and their inability to verify and control the accuracy of the labels and the level of expertise of each labeler. Hence, managing the annotators' uncertainty is key to learning from imperfect annotations. This thesis provides three algorithms for learning from multiple uncertain annotators. IGNORE generates a classifier that predicts the label of a new instance and evaluates the performance of each annotator according to their level of uncertainty. X-Ignore considers that the performance of the annotators depends both on their uncertainty and on the quality of the initial dataset to be annotated. Finally, ExpertS deals with the problem of annotator selection when generating the classifier: it identifies expert annotators and learns the classifier based only on their labels. We conducted a large set of experiments in order to evaluate our models, using both experimental and real-world medical data. The results demonstrate the performance and accuracy of our models compared to previous state-of-the-art solutions in this context.
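A heavily reduced illustration of the underlying idea (down-weighting uncertain annotators) is a vote in which each annotator's weight is its observed agreement with the majority; this sketch is not IGNORE, X-Ignore or ExpertS, and the binary labels are made up.

```python
from collections import Counter

# Hypothetical binary labels given by 3 annotators on 5 instances.
ANNOTATIONS = {
    "a1": [1, 1, 0, 1, 0],
    "a2": [1, 0, 0, 1, 0],
    "a3": [0, 0, 1, 1, 1],   # a noisier annotator
}

def majority_labels(annotations):
    """Plain majority vote per instance."""
    n = len(next(iter(annotations.values())))
    return [Counter(v[i] for v in annotations.values()).most_common(1)[0][0] for i in range(n)]

def annotator_weights(annotations):
    """Estimate each annotator's reliability as its agreement rate with the majority."""
    majority = majority_labels(annotations)
    return {a: sum(l == m for l, m in zip(labels, majority)) / len(majority)
            for a, labels in annotations.items()}

def weighted_vote(annotations):
    """Aggregate labels with reliability weights instead of a plain vote."""
    weights = annotator_weights(annotations)
    n = len(next(iter(annotations.values())))
    aggregated = []
    for i in range(n):
        score = sum(weights[a] * (1 if labels[i] == 1 else -1)
                    for a, labels in annotations.items())
        aggregated.append(1 if score >= 0 else 0)
    return aggregated, weights

print(weighted_vote(ANNOTATIONS))
```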
Gazdar, Kaouthar. "Institutions, développement financier et croissance économique dans la région MENA." Thesis, Reims, 2011. http://www.theses.fr/2011REIME002/document.
Full textThis thesis examines (i) the impact of banks and stock markets on economic growth, (ii) the effect of institutional quality in determining financial development, and (iii) how institutional quality affects the finance-growth nexus in the MENA region. To this end, we construct a yearly institutional index for MENA countries. Applying the generalized method-of-moments (GMM) estimators developed for dynamic panel data to a sample of 18 MENA countries over the 1984-2007 period, we find that both bank and stock market development are unimportant or even harmful for economic growth. Considering both panel data and instrumental variable (IV) estimation approaches, our results underline the importance of institutional quality in determining financial development in the MENA region. Moreover, our results show that institutional quality affects the finance-growth nexus in MENA countries: it mitigates the negative effect of financial development on economic growth. Therefore, our results provide empirical evidence that, in order for financial development to contribute to economic growth, MENA countries must possess a certain level of institutional quality. Examining the non-linear effect of institutional quality on the finance-growth nexus, our results show that banking sector development and growth exhibit an inverted-U-shaped relationship. However, we do not find the same pattern in the stock market-growth relationship.
Ferret, Laurie. "Anticoagulants oraux, réutilisation de données hospitalières informatisées dans une démarche de soutien à la qualité des soins." Thesis, Lille 2, 2015. http://www.theses.fr/2015LIL2S016/document.
Full textIntroduction: Oral anticoagulants raise major issues in terms of bleeding risk and appropriate use. The computerization of medical records offers the ability to access large databases that can be explored automatically. The objective of this work is to show how routinely collected data can be reused to study issues related to anticoagulants in an approach supporting quality of care. Methods: This work was carried out on the electronic data (97,355 records) of a community hospital. For each inpatient stay we have diagnostic, biological, drug and administrative data, as well as the discharge letters. The work is organized around three axes. Axis I: the objective is to evaluate the accuracy of the detection of factors that may increase the anticoagulant effect of vitamin K antagonists (VKA), using rules developed in the PSIP European project (grant agreement N° 216130). A case review over one year enabled the calculation of the positive predictive value and sensitivity of the rules. Axis II: we conducted a cohort study on data from 2007 to 2012 to determine the major elements involved in raising the bleeding risk related to VKA in clinical reality; cases were stays with an elevation of the INR beyond 5, controls were stays without one. Axis III: we used data reuse to study prescription quality. On the one hand we assessed compliance with the recommendations on thromboembolic risk management in atrial fibrillation (AF) in the elderly; on the other hand we investigated the prescription of direct oral anticoagulants. Results: Axis I: the positive predictive value of the rules intended to detect the factors favoring INR elevation under VKA treatment is 22.4%, and their sensitivity is 84.6%. The main contributive rules are those intended to detect an infectious syndrome and amiodarone. Axis II: the major factors increasing the INR under VKA treatment highlighted by the cohort study are infectious syndrome, cancer, hepatic insufficiency and hypoprotidemia. Axis III: the compliance rate with the recommendations in atrial fibrillation in the elderly is 47.8%; only 45% of patients receive oral anticoagulants, 22.9% do not receive any antithrombotic treatment at all and 32.1% receive platelet aggregation inhibitors. Direct oral anticoagulants are prescribed at inadequate dosages in 15% to 31.4% of patients, for dabigatran and rivaroxaban respectively. These errors are mainly underdosages in elderly patients with atrial fibrillation (82.6%). Discussion: the computerization of medical records has led to the creation of large medical databases, which can be used for various purposes, as we show in this work. In the first axis we have shown that rule-based decision support systems detect the contributing factors for VKA overdose with good sensitivity but a low positive predictive value. The second axis shows that the data can be used for exploratory purposes to identify factors associated with increased INR in patients receiving VKA in "real-life" practice. The third axis shows that rule-based systems can also be used to identify inappropriate prescribing with the aim of improving the quality of care. In the field of anticoagulation, this work opens up innovative perspectives for improving the quality of care.
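The two Axis I performance figures (positive predictive value and sensitivity) follow from a comparison of rule alerts with the case-review judgment; the computation itself is simple, as in the sketch below, where the counts are placeholders chosen to be consistent with the reported 22.4% and 84.6%, not the study's actual numbers.

```python
def ppv_and_sensitivity(true_positives, false_positives, false_negatives):
    """PPV = TP / (TP + FP); sensitivity = TP / (TP + FN)."""
    ppv = true_positives / (true_positives + false_positives)
    sensitivity = true_positives / (true_positives + false_negatives)
    return ppv, sensitivity

# Placeholder counts from a hypothetical one-year case review of rule alerts.
tp, fp, fn = 55, 190, 10
ppv, sens = ppv_and_sensitivity(tp, fp, fn)
print(f"PPV = {ppv:.1%}, sensitivity = {sens:.1%}")
```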
Nguyen, Hoang Viet Tuan. "Prise en compte de la qualité des données lors de l’extraction et de la sélection d’évolutions dans les séries temporelles de champs de déplacements en imagerie satellitaire." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAA011.
Full textThis PhD thesis deals with knowledge discovery from Displacement Field Time Series (DFTS) obtained by satellite imagery. Such series now occupy a central place in the study and monitoring of natural phenomena such as earthquakes, volcanic eruptions and glacier displacements. They are rich in both spatial and temporal information and can now be produced regularly at a lower cost thanks to space programs such as the European Copernicus program and its Sentinel satellites. Our proposals are based on the extraction of grouped frequent sequential patterns. These patterns, originally defined for knowledge extraction from Satellite Image Time Series (SITS), showed their potential in earlier work on analyzing DFTS. Nevertheless, they cannot use the confidence indices that come along with DFTS, and the swap randomization method used to select the most promising patterns does not take their spatiotemporal complementarity into account, since each pattern is evaluated individually. Our contribution is thus twofold. A first proposal aims to associate a reliability measure with each pattern by using the confidence indices. This measure allows selecting patterns whose occurrences in the data are on average sufficiently reliable. We propose a corresponding constraint-based extraction algorithm; it relies on an efficient search for the most reliable occurrences by dynamic programming and on a pruning of the search space provided by a partial push strategy. This new method has been implemented on the basis of the existing prototype SITS-P2miner, developed by the LISTIC and LIRIS laboratories to extract and rank grouped frequent sequential patterns. A second contribution addresses the selection of the most promising patterns. Based on an informational criterion, it makes it possible to take into account both the confidence indices and the way the patterns complement each other spatially and temporally. To this end, the confidence indices are interpreted as probabilities, and the DFTS are seen as probabilistic databases whose distributions are only partial. The informational gain associated with a pattern is then defined according to the ability of its occurrences to complete/refine the distributions characterizing the data. On this basis, a heuristic is proposed to select informative and complementary patterns. This method provides a set of weakly redundant patterns that are therefore easier to interpret than those provided by swap randomization. It has been implemented in a dedicated prototype. Both proposals are evaluated quantitatively and qualitatively using a reference DFTS covering Greenland glaciers, constructed from Landsat optical data, and another DFTS that we built from TerraSAR-X radar data covering the Mont-Blanc massif. In addition to being constructed from different data and remote sensing techniques, these series differ drastically in terms of confidence indices, the series covering the Mont-Blanc massif having very low confidence levels. In both cases, the proposed methods operate under standard resource consumption conditions (time, space), and the experts' knowledge of the studied areas is confirmed and completed.
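A deliberately simplified sketch of the first contribution's reliability idea (average confidence of a pattern's occurrences, then thresholding) is given below with made-up occurrences and indices; the actual algorithm additionally relies on dynamic programming and search-space pruning, which are not reproduced here.

```python
# Hypothetical patterns with, for each occurrence, the confidence indices of the
# displacement measures it covers (values in [0, 1] coming with the DFTS).
PATTERN_OCCURRENCES = {
    "p1": [[0.9, 0.8, 0.95], [0.85, 0.9, 0.8], [0.4, 0.5, 0.6]],
    "p2": [[0.3, 0.2, 0.4], [0.5, 0.35, 0.3]],
}

def occurrence_reliability(indices):
    """Reliability of a single occurrence: mean confidence of the measures it uses."""
    return sum(indices) / len(indices)

def pattern_reliability(occurrences):
    """Reliability of a pattern: mean reliability over its occurrences."""
    return sum(occurrence_reliability(o) for o in occurrences) / len(occurrences)

def reliable_patterns(pattern_occurrences, min_reliability=0.6):
    """Keep patterns whose occurrences are on average sufficiently reliable."""
    return {p: round(pattern_reliability(occs), 3)
            for p, occs in pattern_occurrences.items()
            if pattern_reliability(occs) >= min_reliability}

print(reliable_patterns(PATTERN_OCCURRENCES))
```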
Walstra, Jan. "Historical aerial photographs and digital photogrammetry for landslide assessment." Thesis, Loughborough University, 2006. https://dspace.lboro.ac.uk/2134/2501.
Full textGuemeida, Abdelbasset. "Contributions à une nouvelle approche de Recherche d'Information basée sur la métaphore de l'impédance et illustrée sur le domaine de la santé." Phd thesis, Université Paris-Est, 2009. http://tel.archives-ouvertes.fr/tel-00581322.
Full textHeurteau, Foulon Stéphanie. "Prévalence, qualité de vie et coût de la Leucémie Myéloïde Chronique en France Using healthcare claims data to analyze the prevalence of BCR-ABL-positive chronic myeloid leukemia in France: A nationwide population-based study Health state utility and quality of life measures in patients with chronic myeloid leukemia in France." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS574.
Full textChronic myeloid leukemia (CML) is a rare myeloproliferative neoplasm whose prognosis has been transformed since the 2000s by tyrosine kinase inhibitors (TKI). The dramatic increase in patients' life expectancy has led to an increase in the prevalence of CML. CML has become a chronic disease that requires daily TKI treatment for several years, but is compatible with a normal life span for the majority of patients. TKI are expensive treatments that, taken over the long term by an increasing number of patients, increase the economic burden of the disease, and they have side effects that affect patients' quality of life. In France, however, there is little data on the prevalence of CML, on its economic burden or on quality of life. The National Health Data System (Système National des Données de Santé, SNDS) is a healthcare claims database that covers 98.8% of the French population and contains exhaustive data on health care reimbursed by the Health Insurance. We built and validated an algorithm identifying patients with CML in the SNDS, based on their healthcare consumption, and estimated the prevalence of the disease on December 31st, 2014. On the population identified by the algorithm, we estimated the cost of TKI in 2013 and 2014 from a health insurance perspective. We also conducted a survey of CML patients to collect quality-of-life data using generic (EuroQol EQ-5D-3L), cancer-specific (EORTC-QLQ-C30) and CML-specific (EORTC-QLQ-CML-24) questionnaires. Utility values in CML patients were assessed using the French EQ-5D-3L value set. The algorithm identified 10,789 patients with CML in France in 2014, corresponding to a crude prevalence of 16.3 per 100,000 inhabitants [95% confidence interval 16.0-16.6]. In the 10,158 prevalent CML patients in 2013, reimbursements for TKI amounted to €238 million, all insurance schemes combined; this amount increased to €247 million for the 10,789 patients prevalent in 2014. In 2014, imatinib accounted for about 55% of TKI reimbursements, followed by nilotinib (22%) and dasatinib (22%). Quality of life in CML patients was significantly impaired compared to the general population of the same sex and age, mainly in the dimensions of social functioning, role functioning and cognitive functioning. Fatigue, dyspnea and pain were the symptoms with the largest deviation from general population norms. The mean utility score (standard deviation) was 0.72 (0.25) for patients in chronic phase and 0.84 (0.21) for patients in remission without treatment. Beyond the epidemiological, clinical and economic results, this work demonstrates that using a database such as the SNDS for research is feasible and relevant, but also complex, in rare diseases such as CML.
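The reported crude prevalence and its confidence interval follow from the case count and the covered population, as in this sketch; the population denominator used here (about 66.2 million covered persons) is an assumption, not necessarily the exact figure used in the study.

```python
import math

def crude_prevalence_per_100k(cases, population, z=1.96):
    """Crude prevalence per 100,000 with a normal-approximation 95% CI."""
    p = cases / population
    half_width = z * math.sqrt(p * (1 - p) / population)
    return 100_000 * p, 100_000 * (p - half_width), 100_000 * (p + half_width)

# 10,789 identified patients; denominator assumed to be ~66.2 million covered persons.
prevalence, low, high = crude_prevalence_per_100k(10_789, 66_200_000)
print(f"{prevalence:.1f} per 100,000 [95% CI {low:.1f}-{high:.1f}]")
```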
Yahyaoui, Hasna. "Méthode d'analyse de données pour le diagnostic a posteriori de défauts de production - Application au secteur de la microélectronique." Thesis, Saint-Etienne, EMSE, 2015. http://www.theses.fr/2015EMSE0795/document.
Full textControlling the performance of a manufacturing site and rapidly identifying the causes of quality loss remain a daily challenge for manufacturers, who face continuing competition. In this context, this thesis aims to provide an analytical approach for the rapid identification of defect origins, by exploring the data made available by the different quality control systems, such as FDC, metrology, parametric tests (PT) and Electrical Wafer Sorting (EWS). The proposed method, named CLARIF, combines three complementary data mining techniques, namely clustering, association rules and decision tree induction. The method is based on the unsupervised generation of a set of potentially problematic production modes, which are characterized by specific manufacturing conditions; it thus provides an analysis that goes down to the level of equipment operating parameters. The originality of the method consists in (1) a pre-treatment step to identify spatial patterns from quality control data, and (2) the unsupervised generation of candidate manufacturing modes to explain the quality loss case. We optimize the generation of association rules through the proposed ARCI algorithm, an adaptation of the well-known association rule mining algorithm APRIORI, designed to integrate the constraints specific to our problem and rule-filtering quality indicators, namely confidence, contribution and complexity, in order to identify the most interesting rules. Finally, we defined a Knowledge Discovery from Databases process that guides the user in applying CLARIF to explain both local and global quality loss problems.
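The rule-filtering step can be illustrated with a minimal sketch: mined rules (hard-coded and hypothetical here) are kept only if they reach a confidence threshold and stay under a complexity cap, taken here as the number of conditions in the antecedent. This only mimics the filtering idea and is not the ARCI algorithm.

```python
# Hypothetical association rules of the form antecedent -> "quality_loss",
# with their support and confidence as produced by a miner.
RULES = [
    ({"equipmentE12", "recipeR3"}, 0.04, 0.92),
    ({"equipmentE12", "recipeR3", "chamberC1", "shiftNight"}, 0.01, 0.95),
    ({"recipeR3"}, 0.20, 0.55),
]

def filter_rules(rules, min_confidence=0.8, max_complexity=3):
    """Keep rules that are confident enough and simple enough to be actionable."""
    kept = []
    for antecedent, support, confidence in rules:
        complexity = len(antecedent)  # number of manufacturing conditions involved
        if confidence >= min_confidence and complexity <= max_complexity:
            kept.append((sorted(antecedent), support, confidence))
    return sorted(kept, key=lambda r: r[2], reverse=True)

for rule in filter_rules(RULES):
    print(rule)
```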