Dissertations / Theses on the topic 'Flux de données sémantiques'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Flux de données sémantiques.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Chevalier, Jules. "Raisonnement incrémental sur des flux de données." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSES008/document.
In this thesis, we propose an architecture for incremental reasoning on triple streams. To ensure scalability, it is composed of independent modules, thus allowing parallel reasoning: several instances of the same rule can be executed simultaneously to enhance performance. We also focused our efforts on limiting the spread of duplicates in the system, a recurrent issue for reasoners. To achieve this, we designed a shared triplestore that allows each module to filter duplicates as soon as possible. The flow of triples through the independent modules of the architecture allows the reasoner to receive triple streams as input. Finally, our architecture is agnostic with respect to the fragment used for inference. We also present three inference modes for our architecture: the first infers all implicit knowledge as fast as possible; the second should be used when priority has to be given to the inference of a specific type of knowledge; the third aims to maximize the number of triples inferred per second. We implemented this architecture in Slider, an incremental reasoner natively supporting the fragments ρdf and RDFS that can easily be extended to more complex fragments. Our experiments show a 65% improvement over the reasoner OWLIM-SE. The recently published reasoner RDFox exhibits better performance, although it does not provide prioritized inference. We also conducted experiments showing that incremental reasoning systematically offers better performance than batch-based reasoning for all the ontologies and fragments used.
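To make the mechanism concrete, here is a minimal Python sketch of incremental rule-based reasoning over a triple stream with early duplicate filtering. It is an illustration of the principle only, not Slider's code, and it implements a single RDFS rule (subclass transitivity) against a shared "seen" set.

```python
# Minimal sketch: incremental application of rdfs11, i.e.
# (A subClassOf B) and (B subClassOf C) entail (A subClassOf C).
SUBCLASS = "rdfs:subClassOf"

class IncrementalReasoner:
    def __init__(self):
        self.seen = set()      # shared store: duplicates are dropped at once
        self.subs = {}         # B -> {A with (A subClassOf B) received}
        self.sups = {}         # A -> {B with (A subClassOf B) received}

    def push(self, triple):
        """Feed one triple; return it plus every newly entailed triple."""
        if triple in self.seen:
            return []                      # duplicate: filtered immediately
        self.seen.add(triple)
        s, p, o = triple
        new = [triple]
        if p == SUBCLASS:
            self.subs.setdefault(o, set()).add(s)
            self.sups.setdefault(s, set()).add(o)
            # close transitively against triples already received
            for c in list(self.sups.get(o, ())):
                new += self.push((s, SUBCLASS, c))
            for a in list(self.subs.get(s, ())):
                new += self.push((a, SUBCLASS, o))
        return new

r = IncrementalReasoner()
for t in [("Cat", SUBCLASS, "Mammal"), ("Mammal", SUBCLASS, "Animal")]:
    print(r.push(t))   # second call also yields (Cat subClassOf Animal)
```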
Belghaouti, Fethi. "Interopérabilité des systèmes distribués produisant des flux de données sémantiques au profit de l'aide à la prise de décision." Electronic Thesis or Diss., Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLL003.
Internet is an infinite source of data coming from sources such as social networks or sensors (home automation, smart city, autonomous vehicle, etc.). These heterogeneous and increasingly large data can be managed through semantic web technologies, which propose to homogenize and link these data and reason over them, and through data stream management systems, which mainly address the problems related to volume, volatility and continuous querying. The alliance of these two disciplines has seen the growth of semantic data stream management systems, also called RSP (RDF Stream Processing) systems. The objective of this thesis is to allow these systems, via new approaches and low-cost algorithms, to remain operational, and even become more efficient, for large input data volumes and/or with limited system resources. To reach this goal, our thesis is mainly focused on the issue of processing semantic data streams in a context of computer systems with limited resources. It directly contributes to answering the following research questions: (i) How to represent semantic data streams? And (ii) How to deal with input semantic data when their rates and/or volumes exceed the capabilities of the target system? As a first contribution, we propose an analysis of the data in semantic data streams in order to consider a succession of star graphs instead of just a succession of independent triples, thus preserving the links between the triples. By using this approach, we significantly improved the quality of the responses of some well-known sampling algorithms for load shedding. The analysis of the continuous query allows the optimisation of this solution by selecting the irrelevant data to be load-shedded first. In the second contribution, we propose an algorithm for detecting frequent RDF graph patterns in semantic data streams. We called it FreGraPaD, for Frequent RDF Graph Patterns Detection. It is a one-pass, memory-oriented and low-cost algorithm. It uses two main data structures: a bit vector to build and identify the RDF graph pattern, thus optimizing memory use; and a hash table for storing the patterns. The third contribution of our thesis consists of a deterministic load-shedding solution for RSP systems, called POL (Pattern Oriented Load-shedding for RDF Stream Processing systems). It uses very low-cost boolean operators, applied to the binary patterns built from the data and the continuous query, to determine which data are irrelevant and should be ejected upstream of the system. It guarantees a recall of 100%, reduces the system load and improves response time. Finally, in the fourth contribution, we propose Patorc (Pattern Oriented Compression for RSP systems). Patorc is an online compression tool for RDF streams. It is based on the frequent patterns present in RDF data streams, which it factorizes. It is a lossless compression solution that still allows querying without any need for decompression. This thesis provides solutions that allow the extension of existing RSP systems, making them able to scale in a Big Data context. These solutions allow RSP systems to deal with one or more semantic data streams arriving at different speeds without losing response quality, while ensuring their availability even beyond their physical limitations. The conducted experiments, supported by the obtained results, show that extending existing systems with the new solutions improves their performance. They illustrate the considerable decrease in engine response time, increasing the processing rate threshold while optimizing the use of system resources.
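As an illustration of the bit-vector-plus-hash-table idea behind FreGraPaD (a sketch in its spirit, not the thesis code; the property vocabulary and frequency threshold below are illustrative assumptions), one pass over star graphs can count pattern occurrences like this:

```python
# Each star graph (all triples sharing a subject) is mapped to a bit vector
# over a fixed property vocabulary; a hash table counts pattern frequencies.
PROPERTIES = ["rdf:type", "foaf:name", "foaf:knows", "dc:created"]
P_INDEX = {p: i for i, p in enumerate(PROPERTIES)}

def star_pattern(star_triples):
    """Encode which properties a star graph uses as an integer bit vector."""
    bits = 0
    for _s, p, _o in star_triples:
        if p in P_INDEX:
            bits |= 1 << P_INDEX[p]
    return bits

pattern_counts = {}   # hash table: bit vector -> frequency

def observe(star_triples, frequent_threshold=100):
    bits = star_pattern(star_triples)
    pattern_counts[bits] = pattern_counts.get(bits, 0) + 1
    return pattern_counts[bits] >= frequent_threshold   # frequent pattern?

observe([("ex:s1", "rdf:type", "foaf:Person"), ("ex:s1", "foaf:name", "Ana")])
```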
Dia, Amadou Fall. "Filtrage sémantique et gestion distribuée de flux de données massives." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS495.
Our daily use of the Internet and related technologies generates, at rapid and variable speeds, large volumes of heterogeneous data issued from sensor networks, search engine logs, multimedia content sites, weather forecasting, geolocation, Internet of Things (IoT) applications, etc. Processing such data in conventional relational database management systems may be very expensive in terms of time and memory resources. To effectively respond to the needs of rapid decision-making, these streams require real-time processing. Data Stream Management Systems (DSMSs) evaluate queries on the recent data of a stream within structures called windows. The input data come in different formats, such as CSV, XML, RSS, or JSON. This heterogeneity barrier comes from the nature of the data streams and must be resolved. To this end, several research groups have leveraged semantic web technologies (RDF and SPARQL) by proposing RDF data stream processing systems called RSPs. However, large volumes of RDF data, high-rate input streams, concurrent queries, the combination of RDF streams with large volumes of stored RDF data, and expensive processing drastically reduce the performance of these systems. A new approach is required to considerably reduce the processing load of RDF data streams. In this thesis, we propose several complementary solutions to reduce the processing load in a centralized environment. An on-the-fly sampling approach for RDF graph streams is proposed to reduce data and processing load while preserving semantic links. This approach is deepened by adopting a graph-oriented summary approach to extract the most relevant information from RDF graphs, using centrality measures from social network analysis. We also adopt a compressed format for RDF data and propose an approach for querying compressed RDF data without a decompression phase. To ensure parallel and distributed data stream management, this work also proposes two solutions for reducing the processing load in a distributed environment: an engine and approaches for parallel and distributed processing of RDF graph streams. Finally, an optimized processing approach for combining static and dynamic data is also integrated into a new distributed RDF graph stream management system.
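A minimal sketch of the load-reduction principle named here, sampling at the granularity of star subgraphs so the triples describing one subject stay together (the reservoir size is an arbitrary assumption, and this is an illustration rather than the thesis algorithm):

```python
import random

def sample_star_graphs(stream_of_stars, reservoir_size=1000):
    """Reservoir sampling over whole star subgraphs, never lone triples."""
    reservoir, n = [], 0
    for star in stream_of_stars:    # star = list of triples sharing a subject
        n += 1
        if len(reservoir) < reservoir_size:
            reservoir.append(star)
        else:
            j = random.randrange(n)          # classic reservoir-sampling step
            if j < reservoir_size:
                reservoir[j] = star          # evict a whole star graph
    return reservoir
```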
Belaid, Nabil. "Modélisation de services et de workflows sémantiques à base d'ontologies de services et d'indexations." Phd thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2011. https://tel.archives-ouvertes.fr/tel-00605153.
Services and workflows allow computer processing and information exchange. However, only the information relevant to their computer management (storage, delivery, etc.) is specified in syntactic description languages such as WSDL, BPEL or XPDL. Indeed, these descriptions do not explicitly link services and workflows to the functions they implement. To overcome these limitations, we propose an approach based on the definition of service ontologies (shared conceptualizations) and semantic indexations. Our proposal relies on ontology-based databases to store and index the different services and workflows. The implementation of our approach is a prototype that makes it possible to store, search, replace and reuse existing IT services and workflows, and to build new ones incrementally. This work is validated by an application to the geological modeling field.
Ren, Xiangnan. "Traitement et raisonnement distribués des flux RDF." Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1139/document.
Real-time processing of data streams emanating from sensors is becoming a common task in industrial scenarios. In an Internet of Things (IoT) context, data are emitted from heterogeneous stream sources, i.e., coming from different domains and data models. This requires that IoT applications efficiently handle data integration mechanisms. The processing of RDF data streams has hence become an important research field. This trend enables a wide range of innovative applications where the real-time and reasoning aspects are pervasive. The key implementation goal of such applications consists in efficiently handling massive incoming data streams and supporting advanced data analytics services like anomaly detection. However, a modern RSP engine has to address the volume and velocity characteristics encountered in the Big Data era. In an ongoing industrial project, we found that a 24/7 available stream processing engine usually faces massive data volumes and dynamically changing data structures and workload characteristics. These facts impact the engine's performance and reliability. To address these issues, we propose Strider, a hybrid adaptive distributed RDF stream processing engine that optimizes the logical query plan according to the state of the data streams. Strider has been designed to guarantee important industrial properties such as scalability, high availability, fault tolerance, high throughput and acceptable latency. These guarantees are obtained by designing the engine's architecture with state-of-the-art Apache components such as Spark and Kafka. Moreover, an increasing number of processing jobs executed over RSP engines require reasoning mechanisms. This usually comes at the cost of finding a trade-off between data throughput, latency and the computational cost of expressive inferences. Therefore, we extend Strider to support real-time RDFS+ (i.e., RDFS + owl:sameAs) reasoning. We combine Strider with a query rewriting approach for SPARQL that benefits from an intelligent encoding of the knowledge base. The system is evaluated along different dimensions and over multiple datasets to emphasize its performance. Finally, we step further to exploratory RDF stream reasoning with a fragment of Answer Set Programming. This part of our research work is mainly motivated by the fact that more and more streaming applications require more expressive and complex reasoning tasks. The main challenge is to cope with the large-volume and high-velocity dimensions in a scalable and inference-enabled manner. Recent efforts in this area are still missing the system scalability aspect of stream reasoning. Thus, we aim to explore the ability of modern distributed computing frameworks to process highly expressive knowledge inference queries over Big Data streams. To do so, we consider queries expressed as a positive fragment of LARS (a temporal logic framework based on Answer Set Programming) and propose solutions to process such queries based on the two main execution models adopted by major parallel and distributed execution frameworks: Bulk Synchronous Parallel (BSP) and Record-at-A-Time (RAT). We implement our solution, named BigSR, and conduct a series of evaluations. Our experiments show that BigSR achieves a throughput beyond a million triples per second using a rather small cluster of machines.
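To picture what owl:sameAs-aware answering means in practice, here is a minimal sketch of the underlying idea (an illustration with made-up IRIs, not Strider's rewriting): individuals are encoded by a representative of their sameAs equivalence class, so a query pattern matches any alias.

```python
sameas_pairs = [(":alice", ":a_smith"), (":a_smith", ":asmith42")]

parent = {}
def find(x):                       # union-find over sameAs assertions
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def union(x, y):
    parent[find(x)] = find(y)

for a, b in sameas_pairs:
    union(a, b)

def canonical(triple):
    s, p, o = triple
    return (find(s), p, find(o))   # encode triples by class representatives

data = [(":asmith42", ":worksAt", ":acme")]
index = {canonical(t) for t in data}
# The pattern (:alice :worksAt :acme) now matches despite using an alias:
print(canonical((":alice", ":worksAt", ":acme")) in index)   # True
```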
De, Oliveira Joffrey. "Gestion de graphes de connaissances dans l'informatique en périphérie : gestion de flux, autonomie et adaptabilité." Electronic Thesis or Diss., Université Gustave Eiffel, 2023. http://www.theses.fr/2023UEFL2069.
The research work carried out as part of this PhD thesis lies at the interface between the Semantic Web, databases and edge computing. Indeed, our objective is to design, develop and evaluate a database management system (DBMS) based on the W3C Resource Description Framework (RDF) data model, adapted to the terminals found in edge computing. The possible applications of such a system are numerous and cover a wide range of sectors such as industry, finance and medicine, to name but a few. As proof of this, the subject of this thesis was defined with the team from the Computer Science and Artificial Intelligence Laboratory (CSAI) at ENGIE Lab CRIGEN, ENGIE's research and development centre dedicated to green gases (hydrogen, biogas and liquefied gases), new uses of energy in cities and buildings, industry and emerging technologies (digital and artificial intelligence, drones and robots, nanotechnologies and sensors). CSAI financed this thesis as part of a CIFRE-type collaboration. The functionalities of a system satisfying these characteristics must enable anomalies and exceptional situations to be detected in a relevant and effective way from measurements taken by sensors and/or actuators. In an industrial context, this could mean detecting excessively high measurements, for example of pressure or flow rate in a gas distribution network, which could potentially compromise infrastructure or even the safety of individuals. This detection must be carried out using a user-friendly approach to enable as many users as possible, including non-programmers, to describe risk situations. The approach must therefore be declarative, not procedural, and must be based on a query language such as SPARQL. We believe that Semantic Web technologies can make a major contribution in this context. Indeed, the ability to infer implicit consequences from explicit data and knowledge is a means of creating new services that are distinguished by their ability to adjust to the circumstances encountered and to make autonomous decisions. This can be achieved by generating new queries in certain alarming situations, or by defining a minimal sub-graph of knowledge that an instance of our DBMS needs in order to respond to all of its queries. The design of such a DBMS must also take into account the inherent constraints of edge computing, i.e. the limits in terms of computing capacity, storage, bandwidth and sometimes energy (when the terminal is powered by a solar panel or a battery). Architectural and technological choices must therefore be made to meet these limitations. With regard to the representation of data and knowledge, our design choice fell on succinct data structures (SDS), which offer, among other advantages, the fact that they are very compact and do not require decompression during querying. Similarly, it was necessary to integrate data flow management within our DBMS, for example with support for windowing in continuous SPARQL queries, and for the various services supported by our system. Finally, as anomaly detection is an area where knowledge can evolve, we have integrated support for modifications to the knowledge graphs stored on the client instances of our DBMS. This support translates into an extension of certain SDS structures used in our prototype.
Giustozzi, Franco. "STEaMINg : semantic time evolving models for industry 4.0 Stream reasoning to improve decision-making in cognitive systems Smart condition monitoring for industry 4.0 manufacturing processes: an ontology-based approach." Thesis, Normandie, 2020. http://www.theses.fr/2020NORMIR13.
In Industry 4.0, factory assets and machines are equipped with sensors that collect data for effective condition monitoring. This is a difficult task, since it requires the integration and processing of heterogeneous data from different sources, with different temporal resolutions and underlying meanings. Ontologies have emerged as a pertinent method to deal with data integration and to represent manufacturing knowledge in a machine-interpretable way through the construction of semantic models. Moreover, the monitoring of industrial processes depends on the dynamic context of their execution. Under these circumstances, the semantic model must evolve in order to represent which situation(s) a resource is in during the execution of its tasks, to support decision making. This thesis studies the use of knowledge representation methods to build an evolving semantic model that represents the industrial domain, with an emphasis on context modeling to provide the notion of situation.
Ait, Oubelli Lynda. "Transformations sémantiques pour l'évolution des modèles de données." Thesis, Toulouse, INPT, 2020. http://www.theses.fr/2020INPT0040.
When developing a complex system, data models are the key to a successful engineering process because they contain and organize all the information manipulated by the different functions involved in system design. The fact that data models evolve throughout the design raises problems of maintenance of the data already produced. Our work addresses the issue of evolving data models in a model-driven engineering (MDE) environment. We focus on minimizing the impact of data model evolution on the system development process in the specific area of space engineering. In the space industry, model-driven engineering is a key approach for modeling data exchange with satellites. When preparing a space mission, the associated data models are often updated and must be compared from one version to another. Thus, as changes accumulate, it becomes difficult to follow them. New methods and techniques to understand and represent the differences and commonalities between different versions of a model are essential. Recent research deals with the evolution process between the two architectural layers (M2/M1) of MDE. In this thesis, we have explored the use of the (M1/M0) layers of the same architecture to define a set of complex operators, and their composition, that encapsulate both data model evolution and data migration. The use of these operators improves the quality of results when migrating data, ensuring the complete preservation of the information contained in the data. In the first part of this thesis, we focused on how to deal with structural differences during the evolution process. The proposed approach is based on the detection of differences and the construction of evolution operators. We then studied the performance of the model-driven approach on two space missions, named PHARAO and MICROSCOPE. Next, we presented a semantic observational approach to deal with the evolution of data models at the M1 level. The main interest of the proposed approach is the transposition of the problem of information accessibility in a data model into a path problem in a labeled directed graph. The approach proved able to capture all the evolutions of a data model in a list of logical operators instead of a non-exhaustive list of evolution operators. It is generic because, regardless of the type of the input data model, if the data model is correctly interpreted as a labeled directed graph and then projected onto a set of labeled transition systems, we can check the preservation of the information.
Chiky, Raja. "Résumé de flux de données distribués." Phd thesis, Télécom ParisTech, 2009. http://pastel.archives-ouvertes.fr/pastel-00005137.
Full textCsernel, Baptiste. "Résumé généraliste de flux de données." Paris, ENST, 2008. http://www.theses.fr/2008ENST0048.
This thesis deals with the creation and management of general-purpose summaries built from data streams. It is centered on the development of two algorithms, one designed to produce general-purpose summaries for a single data stream, and the other for three data streams sharing relational information. A data stream is defined as a real-time, continuous, ordered sequence of items. It is impossible to control the order in which items arrive, nor is it feasible to locally store a stream in its entirety. Such data streams appear in many applications, such as utility networks, IT, or monitoring tasks, for instance in meteorology, geology or even finance. The first step in this work is to define the meaning of a general-purpose data stream summary. The first property of such a summary is that it should be suitable for a variety of data mining and querying tasks. The second is that it should be possible to build, from the main summary, a summary concerning only a selected portion of the stream encountered so far. The first algorithm designed, StreamSamp, is a general-purpose summary algorithm dealing with a single data stream and based on the principle of sampling. The second algorithm, CrossStream, is a general-purpose summary algorithm dealing with three data streams sharing relational information with one another, one relation stream linking two entity streams. This algorithm is based on the use of micro-clusters, inspired by the CluStream algorithm designed by Aggarwal, combined with the use of Bloom filters. Both algorithms were implemented and tested against various sets of data to assess their performance in a number of situations.
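One ingredient of CrossStream named here, the Bloom filter, is easy to sketch (this is the textbook structure, not the thesis code): a compact set-membership structure that can remember, for instance, which entities a relation stream has already linked, at the cost of a small false-positive rate.

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits=8192, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        for i in range(self.k):          # k independent hash positions
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

seen = BloomFilter()
seen.add("customer:42")
print("customer:42" in seen, "customer:7" in seen)   # True False (w.h.p.)
```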
Chiky, Raja. "Résumé de flux de données distribués." Paris, ENST, 2009. https://pastel.hal.science/pastel-00005137.
In this thesis, we consider a distributed computing environment describing a collection of multiple remote sensors that feed a unique central server with numeric, uni-dimensional data streams (also called curves). The central server has a limited memory but should be able to compute aggregated values over any subset of the stream sources on a large time horizon including old and new data streams. Two approaches are studied to reduce the size of the data: (1) spatial sampling only considers a random sample of the sources observed at every instant; (2) temporal sampling considers all sources but samples the instants to be stored. In this thesis, we propose a new approach for temporally summarizing a set of distributed data streams: from the observation of what happens during a period t-1, we determine a data collection model to apply to the sensors for period t. The computation of aggregates involves statistical inference in the case of spatial sampling and interpolation in the case of temporal sampling. To the best of our knowledge, there is no method for estimating interpolation errors at each timestamp that takes into account curve features such as knowledge of the integral of the curve during the period. We propose two approaches: one uses the past of the data curve (naive approach) and the other uses a stochastic process for interpolation (stochastic approach).
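A minimal sketch of the temporal-sampling side (an illustration of the idea, with an arbitrary sampling step, not the thesis method): keep one point every few instants, rebuild the curve by linear interpolation, and measure the error this introduces.

```python
import numpy as np

rng = np.random.default_rng(0)
curve = np.cumsum(rng.normal(0, 1, 200))      # one sensor curve, 200 instants
t = np.arange(curve.size)

step = 10
kept_t, kept_v = t[::step], curve[::step]     # temporal sampling
rebuilt = np.interp(t, kept_t, kept_v)        # interpolation at every instant

print("stored points  :", kept_t.size, "of", t.size)
print("mean abs error :", np.abs(curve - rebuilt).mean())
print("aggregate error:", abs(curve.sum() - rebuilt.sum()))
```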
Peng, Tao. "Analyse de données IoT en flux." Electronic Thesis or Diss., Aix-Marseille, 2021. http://www.theses.fr/2021AIXM0649.
Since the advent of the IoT (Internet of Things), we have witnessed unprecedented growth in the amount of data generated by sensors. To exploit these data, we first need to model them, and then we need to develop analytical algorithms to process them. For the imputation of missing data from a sensor f, we propose ISTM (Incremental Space-Time Model), an incremental multiple linear regression model adapted to non-stationary data streams. ISTM updates its model by selecting: 1) data from sensors located in the neighborhood of f, and 2) the most recent near-past data gathered from f. To evaluate data trustworthiness, we propose DTOM (Data Trustworthiness Online Model), a prediction model that relies on online regression ensemble methods such as AddExp (Additive Expert) and BNNRW (Bagging NNRW) to assign a trust score in real time. DTOM consists of: 1) an initialization phase, 2) an estimation phase, and 3) a heuristic update phase. Finally, we are interested in predicting multiple-output STS (streaming time series) in the presence of imbalanced data, i.e. when there are more instances in one value interval than in another. We propose MORSTS, an online regression ensemble method with specific features: 1) the sub-models have multiple outputs, 2) it adopts a cost-sensitive strategy, i.e. incorrectly predicted instances are given a higher weight, and 3) it manages over-fitting by means of k-fold cross-validation. Experiments with real data have been conducted and the results were compared with well-known techniques.
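Incremental multiple linear regression on a non-stationary stream can be sketched with recursive least squares and a forgetting factor. This is a standard online technique given for illustration, not ISTM itself; the feature layout (e.g. neighbors' readings as inputs) is an assumption.

```python
import numpy as np

class OnlineLinearModel:
    def __init__(self, n_features, forgetting=0.99):
        self.w = np.zeros(n_features)
        self.P = np.eye(n_features) * 1e3    # inverse covariance estimate
        self.lam = forgetting                # <1 discounts old observations

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)      # RLS gain vector
        self.w += gain * (y - x @ self.w)
        self.P = (self.P - np.outer(gain, Px)) / self.lam

    def predict(self, x):
        return np.asarray(x, dtype=float) @ self.w

model = OnlineLinearModel(n_features=3)
for x, y in [([1.0, 0.2, 0.1], 1.5), ([1.0, 0.4, 0.0], 1.9)]:
    model.update(x, y)          # e.g. neighbors' readings -> sensor f reading
print(model.predict([1.0, 0.3, 0.05]))
```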
Folch, Helka. "Articuler les classifications sémantiques induites d'un domaine." Paris 13, 2002. http://www.theses.fr/2002PA132015.
Full textChambefort, Françoise. "Mimèsis du flux, exploration des potentialités narratives des flux de données." Thesis, Bourgogne Franche-Comté, 2020. http://www.theses.fr/2020UBFCC004.
Sometimes called stream art or data art, digital art seizes data streams as its raw material. Choosing a path of creative research, this thesis explores the story-telling potential of data streams. Structured around technical, social, semiotic and aesthetic approaches, its thinking draws on various fields of study: information and communication sciences, but also computer science, cognitive science, philosophy, sociology and narratology. The work Lucette, Gare de Clichy was especially designed to answer the research question. The form of the work allowed for two different versions of it: a screen version and a performance. It is studied in all its stages, from its creation process to the public's response to it. Jonathan Fletcher Moore's installation Artificial Killing Machine is also analyzed. First, our object of research, stories made from a real-time data stream, is defined, and the concept of data mills is crafted to refer to this type of work. Then four hypotheses are formulated and individually verified. If data mills are to be able to form a narrative representation, they must free themselves from the logic of action. Thus can fiction become powered by reality. The metaphor that links the data originating in reality and the crafted fiction generates in the viewer a shifting of focus between what is compared and what compares. This switching metaphor has the power to reinforce the meaning it carries. Data mills are therefore able to convey the contingency of life as experienced by a vulnerable individual, tossed back and forth between objective and subjective time.
Aseervatham, Sujeevan. "Apprentissage à base de Noyaux Sémantiques pour le Traitement de Données Textuelles." Phd thesis, Université Paris-Nord - Paris XIII, 2007. http://tel.archives-ouvertes.fr/tel-00274627.
In this thesis, we are mainly interested in two areas.
The first area concerns the study of problems related to the processing of structured textual data with kernel-based approaches. In this context, we present a semantic kernel for documents structured in sections, notably in the XML format. The kernel draws its semantic information from an external knowledge source, namely a thesaurus. Our kernel was tested on a corpus of medical documents with the UMLS medical thesaurus. In an international medical document categorization challenge, it was ranked among the 10 best-performing methods out of 44.
The second area concerns the study of latent concepts extracted by statistical methods such as latent semantic analysis (LSA). In a first part, we present kernels exploiting linguistic concepts from an external source and statistical concepts derived from LSA. We show that a kernel integrating both types of concepts improves performance. In a second part, we present a kernel using local LSAs to extract latent concepts, providing a finer representation of the documents.
Aseervatham, Sujeevan. "Apprentissage à base de noyaux sémantiques pour le traitement de données textuelles." Paris 13, 2007. https://theses.hal.science/tel-00274627.
Semantic kernel-based machine learning for textual data processing. Since the early eighties, statistical methods and, more specifically, machine learning for textual data processing have known a considerable growth of interest. This is mainly due to the fact that the number of documents to process is growing exponentially. Thus, expert-based methods have become too costly, and the research focus has shifted to machine learning-based methods. In this thesis, we focus on two main issues. The first one is the processing of semi-structured textual data with kernel-based methods. We present, in this context, a semantic kernel for documents structured by sections under the XML format. This kernel captures the semantic information with the use of an external source of knowledge, e.g., a thesaurus. Our kernel was evaluated on a medical document corpus with the UMLS thesaurus. It was ranked in the top ten of the best methods, according to the F1-score, among 44 algorithms at the 2007 CMC Medical NLP International Challenge. The second issue is the study of the use of latent concepts extracted by statistical methods such as Latent Semantic Analysis (LSA). We present, in a first part, kernels based on linguistic concepts from external sources and on latent concepts of the LSA. We show that a kernel integrating both kinds of concepts improves text categorization performance. Then, in a second part, we present a kernel that uses local LSAs to extract latent concepts. Local latent concepts are used to obtain a finer representation of the documents.
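The general LSA-kernel technique this thesis builds on can be sketched in a few lines (an illustration with toy documents, not the thesis kernel): documents are projected onto latent concepts by truncated SVD, and the kernel is a dot product in concept space.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "patient presents acute chest pain",
    "thorax pain reported by the patient",
    "routine maintenance of the gas turbine",
]

tfidf = TfidfVectorizer().fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0)
concepts = lsa.fit_transform(tfidf)          # documents in latent-concept space

def kernel(i, j):
    """Semantic similarity of documents i and j in concept space."""
    return float(concepts[i] @ concepts[j])

print(kernel(0, 1), kernel(0, 2))   # related medical docs should score higher
```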
Castagliola, Carole. "Héritage et valuation dans les réseaux sémantiques pour les bases de données objets." Compiègne, 1991. http://www.theses.fr/1991COMPD363.
Pedraza, Linares Esperanza. "SGBD sémantiques pour un environnement bureautique : intégrité et gestion de transactions." Grenoble 1, 1988. http://tel.archives-ouvertes.fr/tel-00009437.
Salperwyck, Christophe. "Apprentissage incrémental en ligne sur flux de données." Phd thesis, Université Charles de Gaulle - Lille III, 2012. http://tel.archives-ouvertes.fr/tel-00845655.
Dupont, Xavier. "Programmation par contraintes sur les flux de données." Caen, 2014. http://www.theses.fr/2014CAEN2016.
In this thesis, we investigate the generalisation of constraint programming on finite variables to stream variables. First, the concepts of streams, infinite sequences and infinite words have been extensively studied in the literature, and we propose a state of the art that covers language theory, classical and temporal logics, as well as the numerous formalisms that are strongly related to those. The comparison with temporal logics is a first step towards the unification of formalisms over streams, and because temporal logics are themselves numerous, their classification allows the extrapolation of our contributions to other contexts. The second goal involves identifying the features of existing formalisms that lend themselves to the techniques of constraint programming over finite variables. Compared to the expressivity of temporal logics, that of our formalism is more limited. This stems from the fact that constraint programming allows only the conjunction of constraints, and requires encapsulating disjunction into constraint propagators. Nevertheless, our formalism allows a gain in concision and the reuse of the concept of propagator in a temporal setting. The question of the generalisation of these results to more expressive logics is left open.
Hiscock, Thomas. "Microcontrôleur à flux chiffré d'instructions et de données." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLV074/document.
Embedded processors are today ubiquitous: dozens of them compose and orchestrate every technology surrounding us, from tablets to smartphones and a large number of invisible ones. At the core of these systems, processors gather data, process them and interact with the outside world. As such, they are expected to meet very strict safety and security requirements. From a security perspective, the task is even more difficult considering that the user has physical access to the device, allowing a wide range of specifically tailored attacks. Confidentiality, in terms of both software code and data, is one of the fundamental properties expected for such systems. The first contribution of this work is a software encryption method based on the control flow graph of the program. This enables the use of stream ciphers to provide lightweight and efficient encryption, suitable for constrained processors. The second contribution is a data encryption mechanism based on homomorphic encryption. With this scheme, sensitive data remain encrypted not only in memory, but also during computations. Finally, the integration and evaluation of these solutions on a Field Programmable Gate Array (FPGA) with some example programs are discussed.
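The flavor of per-block stream-cipher code encryption can be sketched as follows. This is an illustration of the general idea under assumed names (the key, block label and instruction bytes are all hypothetical), not the thesis design, which derives keystreams from the control flow graph itself.

```python
import hashlib

KEY = b"demo-secret-key"

def keystream(block_label, n):
    """Hash-based keystream expansion, one keystream per basic block."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(
            KEY + block_label + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_block(code_bytes, block_label):
    ks = keystream(block_label, len(code_bytes))
    return bytes(c ^ k for c, k in zip(code_bytes, ks))

block = b"\x55\x48\x89\xe5\x5d\xc3"      # hypothetical instruction bytes
enc = xor_block(block, b"BB0")           # encrypt basic block "BB0"
assert xor_block(enc, b"BB0") == block   # XOR stream cipher is symmetric
```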
Allesiardo, Robin. "Bandits Manchots sur Flux de Données Non Stationnaires." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS334/document.
The multi-armed bandit is a framework allowing the study of the trade-off between exploration and exploitation under partial feedback. At each turn t ∈ [1,T] of the game, a player has to choose an arm k_t in a set of K arms and receives a reward y_{k_t} drawn from a reward distribution D(µ_{k_t}) of mean µ_{k_t} and support [0,1]. This is a challenging problem, as the player only knows the reward associated with the played arm and does not know what the reward would have been had she played another arm. Before each play, she is confronted with the dilemma between exploration and exploitation: exploring increases the confidence of the reward estimators, while exploiting increases the cumulative reward by playing the empirical best arm (under the assumption that the empirical best arm is indeed the actual best arm). In the first part of the thesis, we tackle the multi-armed bandit problem when reward distributions are non-stationary. Firstly, we study the case where, even if reward distributions change during the game, the best arm stays the same. Secondly, we study the case where the best arm changes during the game. The second part of the thesis tackles the contextual bandit problem, where the means of the reward distributions now depend on the environment's current state. We study the use of neural networks and random forests in the case of contextual bandits. We then propose a meta-bandit-based approach for selecting online the best-performing expert during its learning.
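For reference, the classic stationary baseline this work starts from, UCB1, fits in a few lines (the standard algorithm, not the thesis's non-stationary variants, which would add forgetting or change detection on top):

```python
import math, random

def ucb1(reward_fns, horizon=10000):
    K = len(reward_fns)
    counts, sums, total = [0] * K, [0.0] * K, 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            k = t - 1                      # play each arm once first
        else:                              # empirical mean + confidence bonus
            k = max(range(K), key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2 * math.log(t) / counts[a]))
        y = reward_fns[k]()                # reward in [0, 1]
        counts[k] += 1; sums[k] += y; total += y
    return total, counts

random.seed(0)
arms = [lambda p=p: 1.0 if random.random() < p else 0.0
        for p in (0.2, 0.5, 0.7)]
total, counts = ucb1(arms)
print(counts)   # the 0.7 arm should dominate the plays
```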
Togbe, Maurras Ulbricht. "Détection distribuée d'anomalies dans les flux de données." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS400.
Anomaly detection is an important issue in many application areas such as healthcare, transportation and industry. It is a current topic that tries to meet the ever-increasing demand in different areas such as intrusion detection, fraud detection, etc. In this thesis, after a complete general state of the art, the unsupervised method Isolation Forest (IForest) is studied in depth, presenting limitations of it that had not been addressed in the literature. Our new version of IForest, called Majority Voting IForest, improves its execution time. Our ADWIN-based IForest ASD and NDKSWIN-based IForest ASD methods allow the detection of anomalies in data streams with better handling of concept drift. Finally, distributed anomaly detection using IForest is studied and evaluated. All our proposals have been validated by experiments on different datasets.
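A minimal sketch of Isolation Forest on a stream with plain sliding-window retraining (the naive baseline; the thesis variants replace this window management with ADWIN/NDKSWIN drift detection, and the window sizes below are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
stream = rng.normal(0, 1, (5000, 3))
stream[4321] = [8.0, 8.0, 8.0]               # an injected anomaly

window, retrain_every = 1000, 250
model = None
for i in range(0, len(stream), retrain_every):
    batch = stream[max(0, i - window):i]
    if len(batch) >= window:                 # refit on the latest window
        model = IsolationForest(n_estimators=100, random_state=0).fit(batch)
    if model is not None:
        chunk = stream[i:i + retrain_every]
        flags = model.predict(chunk)         # -1 = anomaly, 1 = normal
        for j in np.where(flags == -1)[0]:
            print("anomaly at index", i + j)
```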
Gillani, Syed. "Semantically-enabled stream processing and complex event processing over RDF graph streams." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSES055/document.
There is a paradigm shift in the nature and processing means of today's data: data used to be mostly static and stored in large databases to be queried. Today, with the advent of new applications and means of collecting data, most applications on the Web and in enterprises produce data in a continuous manner under the form of streams. Thus, the users of these applications expect to process a large volume of data with fresh, low-latency results. This has resulted in the introduction of Data Stream Management Systems (DSMSs) and of the Complex Event Processing (CEP) paradigm, both with distinctive aims: DSMSs are mostly employed to process traditional query operators (mostly stateless), while CEP systems focus on temporal pattern matching (stateful operators) to detect changes in the data that can be thought of as events. In the past decade or so, a number of scalable and performance-intensive DSMSs and CEP systems have been proposed. Most of them, however, are based on relational data models, which begs the question of support for heterogeneous data sources, i.e., the variety of the data. Work on RDF stream processing (RSP) systems partly addresses the challenge of variety by promoting the RDF data model. Nonetheless, challenges like volume and velocity are overlooked by existing approaches. These challenges require customised optimisations which consider RDF as a first-class citizen and scale the process of continuous graph pattern matching. To gain insights into these problems, this thesis focuses on developing scalable RDF graph stream processing and semantically-enabled CEP systems (i.e., Semantic Complex Event Processing, SCEP). In addition to our optimised algorithmic and data structure methodologies, we also contribute to the design of a new query language for SCEP. Our contributions in these two fields are as follows: • RDF Graph Stream Processing. We first propose an RDF graph stream model, where each data item/event within streams is comprised of an RDF graph (a set of RDF triples). Second, we implement customised indexing techniques and data structures to continuously process RDF graph streams in an incremental manner. • Semantic Complex Event Processing. We extend the idea of RDF graph stream processing to enable SCEP over such RDF graph streams, i.e., temporal pattern matching. Our first contribution in this context is to provide a new query language that encompasses the RDF graph stream model and employs a set of expressive temporal operators such as sequencing, Kleene-+, negation, optional, conjunction, disjunction and event selection strategies. Based on this, we implement a scalable system that employs a non-deterministic finite automata model to evaluate these operators in an optimised manner. We leverage techniques from diverse fields, such as relational query optimisation, incremental query processing, and sensor and social networks, in order to solve real-world problems. We have applied our proposed techniques to a wide range of real-world and synthetic datasets to extract knowledge from RDF structured data in motion. Our experimental evaluations confirm our theoretical insights and demonstrate the viability of our proposed methods.
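The non-deterministic automaton model named here can be illustrated with a tiny sequence matcher (a toy for the SEQ operator only, not the thesis engine): each event may start or extend several partial matches at once, which is where the non-determinism comes from.

```python
# Detect SEQ(A, B, C) over an event stream of (type, payload) pairs.
PATTERN = ["A", "B", "C"]

def seq_matches(stream):
    partial = []                  # list of (next_state_index, matched_events)
    for event_type, payload in stream:
        new_partial = []
        for state, matched in partial:
            if event_type == PATTERN[state]:
                run = (state + 1, matched + [(event_type, payload)])
                if run[0] == len(PATTERN):
                    yield run[1]                    # complete match
                else:
                    new_partial.append(run)
            new_partial.append((state, matched))    # keep partial run alive
        if event_type == PATTERN[0]:
            new_partial.append((1, [(event_type, payload)]))
        partial = new_partial

events = [("A", 1), ("X", 2), ("B", 3), ("A", 4), ("C", 5)]
for m in seq_matches(events):
    print(m)    # e.g. [('A', 1), ('B', 3), ('C', 5)]
```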
Coquil, David. "Conception et Mise en Oeuvre de Proxies Sémantiques et Coopératifs." Lyon, INSA, 2006. http://theses.insa-lyon.fr/publication/2006ISAL0020/these.pdf.
One major issue related to the large-scale deployment of distributed information systems such as the Web is that of efficient access to data, for which caches are a possible solution. Web caches exist at the client level, at the server level, and on intermediate servers, the proxies. The conception and implementation of efficient Web caches, and especially proxies, is the main focus of this thesis. Three performance improvement techniques are studied: replacement, prefetching and cooperation policies. Contrary to traditional approaches that mainly use low-level parameters, we apply semantic caching techniques based on the indexing of documents and on the analysis of user access patterns. Algorithms for measuring the usefulness of a document for a cache are detailed. This value, called temperature, is used to define a replacement policy and a prefetching heuristic. These techniques are used in a video server cache management application. A cooperative architecture based on the exchange of documents and of temperature monitoring results is defined. Another application of proxies and semantic caching is also presented in the context of content-based multimedia queries. Building on previous research on integrating content-based queries with classical databases, we define a cooperative architecture dedicated to distributed content-based multimedia queries whose basic components are cooperative proxies and semantic caches. Finally, an application of temperature to the management of cache indexes for the members of theme-based virtual communities is presented.
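A minimal sketch of a temperature-driven replacement policy (the temperature formula below, mixing access frequency with exponential cooling over time, is an assumption for illustration and not the thesis's measure):

```python
import time

class TemperatureCache:
    def __init__(self, capacity=3, half_life=60.0):
        self.capacity, self.half_life = capacity, half_life
        self.store = {}            # url -> (document, hit_count, last_access)

    def _temperature(self, url, now):
        _, hits, last = self.store[url]
        return hits * 0.5 ** ((now - last) / self.half_life)   # cooling

    def get(self, url, fetch):
        now = time.time()
        if url in self.store:
            doc, hits, _ = self.store[url]
            self.store[url] = (doc, hits + 1, now)
            return doc
        if len(self.store) >= self.capacity:    # evict the coldest document
            coldest = min(self.store, key=lambda u: self._temperature(u, now))
            del self.store[coldest]
        doc = fetch(url)
        self.store[url] = (doc, 1, now)
        return doc

cache = TemperatureCache()
page = cache.get("http://example.org/a", fetch=lambda u: f"<body of {u}>")
```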
Mokhtari, Noureddine. "Extraction et exploitation d'annotations sémantiques contextuelles à partir de texte." Nice, 2010. http://www.theses.fr/2010NICE4045.
This thesis falls within the framework of the European project SevenPro (Semantic Virtual Engineering Environment for Product Design), whose aim is to improve the engineering process of production in manufacturing companies through the acquisition, formalization and exploitation of knowledge. We propose a methodological approach and software for generating contextual semantic annotations from text. Our approach is based on ontologies and Semantic Web technologies. In the first part, we propose a model of the concept of "context" for text. This modeling can be seen as a projection of the various aspects of "context" covered by the definitions in the literature. We also propose a model of contextual semantic annotations, with the definition of the different types of contextual relationships that may exist in a text. Then, we propose a generic methodology for the generation of contextual semantic annotations, based on a domain ontology, that makes the best use of the knowledge contained in texts. The novelty of the methodology is that it uses natural language processing techniques and automatically generated extraction grammars for domain relations, concepts and property values in order to produce semantic annotations associated with contextual relations. In addition, we take into account the context of occurrence of semantic annotations during their generation. A system supporting this methodology has been implemented and evaluated.
Boudellal, Toufik. "Extraction de l'information à partir des flux de données." Saint-Etienne, 2006. http://www.theses.fr/2006STET4014.
The aim of this work is an attempt to solve a specific data stream mining problem: the adaptive analysis of data streams. The Web generation poses new challenges due to the complexity of data structures, for example the data issued from virtual galleries, credit card transactions, etc. Generally, such data are continuous in time, and their sizes are dynamic. We propose a new algorithm based on measures applied to adaptive data streams. The interpretation of results is possible thanks to such measures. In fact, we compare our algorithm experimentally to other adapted approaches that are considered fundamental in the field. A modified algorithm that is more useful in applications is also discussed. This thesis finishes with a set of suggestions about our future work on noisy data streams and another set of suggestions about future necessary work.
Gabsi, Nesrine. "Extension et interrogation de résumés de flux de données." Phd thesis, Télécom ParisTech, 2011. http://pastel.archives-ouvertes.fr/pastel-00613122.
Full textMarascu, Alice. "Extraction de motifs séquentiels dans les flux de données." Phd thesis, Université de Nice Sophia-Antipolis, 2009. http://tel.archives-ouvertes.fr/tel-00445894.
Full textPetit, Loïc. "Gestion de flux de données pour l'observation de systèmes." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00849106.
Full textGabsi, Nesrine. "Extension et interrogation de résumé de flux de données." Paris, Télécom ParisTech, 2011. http://pastel.archives-ouvertes.fr/pastel-00613122.
In the last few years, a new environment has emerged in which data have to be collected and processed instantly upon arrival. To handle the large volume of data associated with this environment, new data processing models and techniques have to be set up; they are referred to as data stream management. Data streams are usually continuous and voluminous, and cannot be stored integrally as persistent data. Many research works have handled this issue, and new systems called DSMSs (Data Stream Management Systems) have appeared. A DSMS evaluates continuous queries on a stream or a window (a finite subset of a stream). These queries have to be specified before the stream's arrival. Nevertheless, in some applications, some data may be required after their expiration from the DSMS's memory. In this case, the system cannot process such queries, as the data are definitely lost. To handle this issue, it is essential to keep a summary of the data stream. Many summary algorithms have been developed. The selection of a summarizing method depends on the kind of data and on the associated problem. In this thesis, we are first interested in the elaboration of a generic summary structure that strikes a compromise between summary construction time and the quality of the summary. We introduce a new summary approach which is more efficient for querying very old data. Then, we focus on the querying methods for these summaries. Our objective is to integrate the generic summary structure into the architecture of existing DSMSs. In this way, we extend the range of possible queries: the processing of queries on old stream data (expired data) becomes possible, as well as queries on new stream data. To this end, we introduce two approaches; the difference between them is the role played by the summary module when a query is evaluated.
Francik, Jaroslaw. "Surveillance du flux des données dans l'animation des algorithmes." Lille 1, 1999. https://pepite-depot.univ-lille.fr/LIBRE/Th_Num/1999/50376-1999-483-1.pdf.
Wipliez, Matthieu. "Infrastructure de compilation pour des programmes flux de données." Phd thesis, INSA de Rennes, 2010. http://tel.archives-ouvertes.fr/tel-00598914.
Full textWipliez, Matthieu. "Infrastructure de compilation pour des programmes flux de données." Phd thesis, Rennes, INSA, 2010. http://www.theses.fr/2010ISAR0033.
The work presented in this thesis takes place in a context of growing demand for better video quality (high-definition TV, home cinema, etc.) and unprecedented concern about power consumption. The limitations and lack of flexibility of current video standards make it increasingly long and complicated to implement standards on embedded systems. A new standard called Reconfigurable Video Coding aims to solve these problems by describing video coding with dataflow programs. A dataflow program is a program represented as a directed graph where vertices are computational units and edges represent the flow of data between vertices. This thesis presents a compilation infrastructure for dataflow programs that can compile these programs to a simple, high-level Intermediate Representation (IR). We show how this IR can be used to analyze, transform, and generate code for dataflow programs in many languages, from C to hardware description languages.
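The dataflow execution model defined above can be illustrated with a toy interpreter (a sketch of the model only, not the thesis IR): actors fire when their input FIFOs hold enough tokens, consuming and producing tokens along the graph's edges.

```python
from collections import deque

class Actor:
    def __init__(self, fn, inputs, output):
        self.fn, self.inputs, self.output = fn, inputs, output

    def try_fire(self):
        if all(q for q in self.inputs):          # one token per input port
            tokens = [q.popleft() for q in self.inputs]
            self.output.append(self.fn(*tokens))
            return True
        return False

src, mid, sink = deque([1, 2, 3, 4]), deque(), deque()
network = [
    Actor(lambda x: x * x, [src], mid),          # square each sample
    Actor(lambda x: x + 1, [mid], sink),         # then offset it
]

fired = True
while fired:                                     # run until quiescent
    fired = any(a.try_fire() for a in network)
print(list(sink))   # [2, 5, 10, 17]
```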
Bouachera, Leïla. "Les flux transfrontières de données et le droit international." Paris 1, 1987. http://www.theses.fr/1987PA010297.
The arrival of telematics in the late 60s was the origin of the increase in transborder data flows. Most of the research work on international TDF law has addressed only personal data. In the meantime, the impact of non-personal data flows on national security and integrity, national sovereignty, the cultural identity of peoples and the balance of trade has been neglected. There exists a legal vacuum which, unless filled soon, might give way to a no-holds-barred battle between the advocates of regulation and those of deregulation. Mention must be made, however, of the first signs of an international awareness of this problem: the TDF declaration approved on April 11, 1985, which opens up the debate into the complex sphere of non-personal data. Establishing a universal order based on a comprehensive and binding instrument is illusory because the interests of all the protagonists are too divergent. It would be preferable to establish a kind of soft law adapted to the intrinsic characteristics of the topics being regulated.
Lechervy, Alexis. "Apprentissage interactif et multi-classes pour la détection de concepts sémantiques dans les données multimédia." Phd thesis, Université de Cergy Pontoise, 2012. http://tel.archives-ouvertes.fr/tel-00781763.
Full textFrancis, Danny. "Représentations sémantiques d'images et de vidéos." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS605.
Recent research in Deep Learning has sent the quality of results in multimedia tasks rocketing: thanks to new big datasets of annotated images and videos, Deep Neural Networks (DNN) have outperformed other models in most cases. In this thesis, we aim at developing DNN models for automatically deriving semantic representations of images and videos. In particular, we focus on two main tasks: vision-text matching and image/video automatic captioning. Addressing the matching task can be done by comparing visual objects and texts in a visual space, a textual space or a multimodal space. Based on recent works on capsule networks, we define two novel models to address the vision-text matching problem: Recurrent Capsule Networks and Gated Recurrent Capsules. In image and video captioning, we have to tackle a challenging task where a visual object has to be analyzed and translated into a textual description in natural language. For that purpose, we propose two novel curriculum learning methods. Moreover, regarding video captioning, analyzing videos requires not only parsing still images, but also drawing correspondences through time. We propose a novel Learned Spatio-Temporal Adaptive Pooling method for video captioning that combines spatial and temporal analysis. Extensive experiments on standard datasets assess the interest of our models and methods with respect to existing works.
Chartron, Ghislaine. "Analyse des corpus de données textuelles, sondage de flux d'informations." Paris 7, 1988. http://www.theses.fr/1988PA077211.
Full textHotte, Sylvain. "Traitements spatiaux dans un contexte de flux massifs de données." Master's thesis, Université Laval, 2018. http://hdl.handle.net/20.500.11794/30956.
In recent years, we have witnessed a significant increase in the volume of data streams. The traditional way of processing this information is rendered inefficient or even impossible by this high volume of data. There is an increased interest in real-time data processing in order to derive greater value from the data. Since those data are often georeferenced, it becomes relevant to offer methods that enable spatial processing on big data streams. However, the subject of spatial processing in a context of Big Data streams has seldom been discussed in scientific research. The studies that have been done so far involve persistent data, and none of them deals with the case where two Big Data streams are in relation. The problem is therefore to determine how to adapt the processing of spatial operators when their parameters derive from two big spatial data streams. Our general objective is to explore the characteristics that allow the development of such analyses and to offer potential solutions. Our research has highlighted the factors influencing the adaptation of spatial processing in a context of Big Data streams. We have determined that adaptation methods can be grouped into categories according to the characteristics of the spatial operator, but also according to the characteristics of the data itself and how it is made available. We proposed general methods of spatial processing for each category in order to guide adaptation strategies. For one of these categories, where a binary operator has both operands coming from Big Data streams, we detailed a method allowing the use of spatial operators. In order to test the effectiveness and validity of the proposed method, we applied it to an intersection operator and to a proximity analysis operator, the "k" nearest neighbors. These tests made it possible to check the validity and to quantify the effectiveness of the proposed methods with respect to system scalability, i.e. increasing the number of processing cores. Our tests also made it possible to quantify the effect of varying the partitioning level on the performance of the processing flow. Our contribution will, hopefully, serve as a starting point for more complex spatial operator adaptations.
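The partitioning idea behind streaming spatial operators can be sketched with a simple grid (an illustration of the principle, not the thesis system; the cell size and the one-ring search are arbitrary assumptions, so true neighbors beyond the ring could be missed):

```python
import heapq, math

CELL = 10.0
grid = {}                                   # (cx, cy) -> list of points

def insert(point):
    key = (int(point[0] // CELL), int(point[1] // CELL))
    grid.setdefault(key, []).append(point)

def knn(q, k=3, ring=1):
    cx, cy = int(q[0] // CELL), int(q[1] // CELL)
    candidates = []
    for dx in range(-ring, ring + 1):       # scan the query cell + neighbors
        for dy in range(-ring, ring + 1):
            candidates.extend(grid.get((cx + dx, cy + dy), []))
    return heapq.nsmallest(k, candidates, key=lambda p: math.dist(q, p))

for p in [(1, 1), (2, 3), (14, 2), (55, 60), (3, 2)]:
    insert(p)                               # points arriving on the stream
print(knn((2, 2)))                          # -> the 3 closest nearby points
```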
Song, Ge. "Méthodes parallèles pour le traitement des flux de données continus." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLC059/document.
Full textWe live in a world where a vast amount of data is being continuously generated, and it arrives in a variety of ways. For example, every time we do a search on Google, purchase something on Amazon, click a 'like' on Facebook, upload an image on Instagram, or a sensor is activated, new data is generated. Data is different from simple numerical information: it now comes in a variety of forms. Isolated data, however, is valueless; it is when this huge amount of data is connected that it becomes very valuable for finding new insights. At the same time, data is time-sensitive: the most accurate and effective way of describing it is as a data stream, and if the latest data is not promptly processed, the opportunity of obtaining the most useful results is missed. A parallel and distributed system for processing large amounts of data streams in real time therefore has significant research value and promising applications. This thesis focuses on the study of parallel and continuous data stream Joins. We divide this problem into two categories: the first one is Data Driven Parallel and Continuous Join, and the second one is Query Driven Parallel and Continuous Join.
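For readers unfamiliar with continuous joins, the sketch below shows the classic symmetric hash join, a standard building block for the kind of parallel, windowed stream joins studied here; it is a generic textbook construction under assumed names, not the thesis implementation.

```python
# Hedged sketch of a symmetric hash join over two streams: each arriving tuple
# probes the opposite stream's hash table, then is inserted into its own, so
# results are produced continuously as data flows in.
from collections import defaultdict, deque

class SymmetricHashJoin:
    def __init__(self, window=1000):
        self.window = window
        self.state = (defaultdict(deque), defaultdict(deque))  # one table per stream

    def insert(self, side, key, tup, now):
        own, other = self.state[side], self.state[1 - side]
        bucket = other[key]
        # Probe the opposite table first, evicting expired tuples lazily.
        while bucket and now - bucket[0][0] > self.window:
            bucket.popleft()
        results = [(tup, t) for (_, t) in bucket]
        own[key].append((now, tup))
        return results

join = SymmetricHashJoin()
join.insert(0, "user42", {"click": "/home"}, now=1)
print(join.insert(1, "user42", {"purchase": "book"}, now=2))
# -> [({'purchase': 'book'}, {'click': '/home'})]
```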
Cailhol, Simon. "Planification interactive de trajectoire en Réalité Virtuelle sur la base de données géométriques, topologiques et sémantiques." Thesis, Toulouse, INPT, 2015. http://www.theses.fr/2015INPT0058/document.
Full textTo save time and money while designing new products, industry needs tools to design, test and validate the product using virtual prototypes. These virtual prototypes must make it possible to test the product at all Product Lifecycle Management (PLM) stages. Many operations in a product's lifecycle involve human manipulation of product components (product assembly, disassembly or maintenance). Due to the increasing integration of industrial products, these manipulations are performed in cluttered environments. Virtual Reality (VR) enables real operators to perform these operations on virtual prototypes. This research work introduces a novel path planning architecture allowing collaboration between a VR user and an automatic path planning system. The architecture is based on an original environment model including semantic, topological and geometric information. The automatic path planning process is split into two phases. First, coarse planning uses semantic and topological information to define a topological path. Then, fine planning uses semantic and geometric information to define a geometric trajectory within the topological path produced by the coarse planning. The collaboration between the VR user and the automatic path planner offers two modes: on the one hand, the user is guided along a pre-computed path through a haptic device; on the other hand, the user can move away from the proposed solution, thereby triggering a re-planning process. The efficiency and ergonomics of both interaction modes are improved thanks to control sharing methods. First, the authority of the automatic system is modulated to provide the user with firm guidance while he follows it, and to free the user (weakened guidance) when he explores possibly better ways. Second, when the user explores possibly better ways, his intents are predicted (thanks to geometric data associated with topological elements) and integrated into the re-planning process to guide the coarse planning. This thesis is divided into five chapters. The first one presents the industrial context that motivated this work. Following a description of environment modeling tools, the second chapter introduces the proposed multi-layer environment model. The third chapter presents path planning techniques from robotics research and details the two-phase path planning process developed. The fourth introduces previous work on interactive path planning and control sharing techniques before describing the interaction modes and control sharing techniques involved in our interactive path planner. Finally, the last chapter presents the experiments performed with our path planner and analyzes their results.
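As a toy illustration of the two-phase split described in this abstract, the sketch below runs a coarse Dijkstra search over a topological graph of zones, then a fine phase that chains geometric waypoints inside the retained zones. The graph, costs and refinement step are placeholder assumptions, not the thesis algorithms.

```python
# Sketch under assumptions: coarse (topological) planning followed by fine
# (geometric) planning restricted to the zones kept by the coarse phase.
import heapq

def coarse_plan(topology, start, goal):
    """Dijkstra over the topological graph: returns a sequence of zones."""
    frontier, seen = [(0.0, start, [start])], set()
    while frontier:
        cost, zone, path = heapq.heappop(frontier)
        if zone == goal:
            return path
        if zone in seen:
            continue
        seen.add(zone)
        for nxt, w in topology.get(zone, []):
            heapq.heappush(frontier, (cost + w, nxt, path + [nxt]))
    return None

def fine_plan(zone_path, waypoints):
    """Fine phase placeholder: chain the geometric waypoints of each zone."""
    return [p for z in zone_path for p in waypoints[z]]

topology = {"hall": [("corridor", 1.0)], "corridor": [("workshop", 2.0)]}
waypoints = {"hall": [(0, 0)], "corridor": [(2, 0), (4, 1)], "workshop": [(6, 1)]}
route = coarse_plan(topology, "hall", "workshop")
print(route, fine_plan(route, waypoints))
```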
Savinaud, Mickaël. "Recalage de flux de données cinématiques pour l'application à l'imagerie optique." Phd thesis, Ecole Centrale Paris, 2010. http://tel.archives-ouvertes.fr/tel-00545424.
Full textRoquier, Ghislain. "Etude de modèles flux de données pour la synthèse logicielle multiprocesseur." Rennes, INSA, 2004. http://www.theses.fr/2008ISAR0020.
Full textParallelism is a universal characteristic of modern computing platforms, from multi-core processors to programmable logic devices. The sequential programming paradigm is no longer suited to the context of parallel and distributed architectures. The work presented in this thesis finds its foundation in the AAA methodology, which builds parallel programs from high-level representations of both the application and the architecture. This work has made it possible to extend the class of applications that can be modelled, through the specification of new graph formalisms. The final part of the document presents our involvement in the MPEG RVC framework. The RVC standard intends to facilitate the building of the reference codecs of future MPEG standards; it relies on dataflow, with decoders built using a new dataflow language called CAL. This work enabled us to specify and develop a software synthesis tool that automatically translates dataflow programs written in CAL.
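To make the actor model concrete, below is a minimal Python sketch (assumed names, not the thesis tool or CAL syntax) of the dataflow execution principle that CAL makes explicit: actors fire as soon as enough tokens are available on their input queues, so scheduling is driven by data availability rather than by a sequential program order.

```python
# Toy dataflow network: actors consume tokens from an input queue, produce a
# token, and push it downstream. A naive scheduler fires whatever is ready.
from collections import deque

class Actor:
    def __init__(self, fn, arity):
        self.fn, self.arity = fn, arity
        self.inbox, self.outputs = deque(), []

    def fire(self):
        """Consume `arity` tokens, produce one, push it to connected actors."""
        if len(self.inbox) < self.arity:
            return False
        args = [self.inbox.popleft() for _ in range(self.arity)]
        token = self.fn(*args)
        for out in self.outputs:
            out.inbox.append(token)
        return True

adder = Actor(lambda a, b: a + b, arity=2)
doubler = Actor(lambda a: 2 * a, arity=1)
printer = Actor(print, arity=1)
adder.outputs.append(doubler)
doubler.outputs.append(printer)
adder.inbox.extend([3, 4])
while adder.fire() or doubler.fire() or printer.fire():  # round-robin scheduler
    pass  # prints 14
```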
Gauwin, Olivier. "Flux XML, Requêtes XPath et Automates." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2009. http://tel.archives-ouvertes.fr/tel-00421911.
Full textIn this thesis, we study query answering algorithms on XML streams. Our objective is to manage memory efficiently, in order to evaluate queries on voluminous data while using little memory. This task proves complex and requires significant restrictions on the query languages. We therefore study queries defined by deterministic automata or by fragments of the W3C standard XPath, rather than by more powerful languages such as the W3C standards XQuery and XSLT.
We first define Streaming Tree Automata (STAs), which operate on unranked trees in document order. We prove that they are equivalent to Nested Word Automata and to Pushdown Forest Automata. We then devise an earliest query answering algorithm for queries defined by deterministic STAs. Although it stores only the necessary candidates, this algorithm runs in polynomial time per stream event and per candidate. We thus obtain positive results for the streaming evaluation of queries defined by deterministic STAs. We measure the suitability of a query language for streaming evaluation via a new machine model, called Streaming Random Access Machines (SRAMs), and via a measure of the number of simultaneously alive candidates, called concurrency. We also show that it can be decided in polynomial time whether the concurrency of a query defined by a deterministic STA is bounded. Our proof is based on a reduction to the bounded valuedness problem for recognizable tree relations.
Regarding the W3C XPath standard, we show that even small syntactic fragments are not suitable for streaming evaluation, unless P=NP. The difficulties stem from the non-determinism of this language, as well as from the number of conjunctions and disjunctions. We define fragments of Forward XPath that avoid these problems, and prove, by polynomial-time compilation to deterministic STAs, that they are suitable for streaming evaluation.
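As an illustration of the kind of bounded-memory streaming evaluation described in this abstract, here is a minimal Python sketch that answers the simple Forward XPath query //book//title in a single pass over a stream, keeping only a stack of booleans instead of the document tree. The query, tags and helper names are illustrative assumptions, not the thesis algorithms, which handle general deterministic STAs.

```python
# Hedged illustration: one-pass evaluation of //book//title over an XML stream.
# Memory is bounded by the tree depth, not by the document size.
import io
import xml.etree.ElementTree as ET

def stream_match(xml_stream, outer="book", inner="title"):
    stack, inside_outer = [], 0
    for event, elem in ET.iterparse(xml_stream, events=("start", "end")):
        if event == "start":
            is_outer = elem.tag == outer
            stack.append(is_outer)
            inside_outer += is_outer
            if elem.tag == inner and inside_outer:
                yield elem  # answer known at the open tag: earliest selection
        else:
            inside_outer -= stack.pop()
            elem.clear()  # discard processed subtrees to bound memory

doc = io.BytesIO(b"<lib><book><title>SQL</title></book><title>no</title></lib>")
print([e.tag for e in stream_match(doc)])  # ['title']
```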
El, Haddadi Anass. "Fouille multidimensionnelle sur les données textuelles visant à extraire les réseaux sociaux et sémantiques pour leur exploitation via la téléphonie mobile." Toulouse 3, 2011. http://thesesups.ups-tlse.fr/1378/.
Full textCompetition is a fundamental concept of the liberal economy tradition that requires companies to resort to Competitive Intelligence (CI) in order to be advantageously positioned on the market, or simply to survive. Nevertheless, it is well known that it is not the strongest of organizations that survives, nor the most intelligent, but rather the one most adaptable to change, the dominant factor in society today. Companies are therefore required to remain constantly in a wakeful state, watching for any change in order to devise appropriate responses in real time. For a successful vigil, however, we should not be satisfied merely with monitoring opportunities; above all, we must anticipate risks. The external risk factors have never been so numerous: extremely dynamic and unpredictable markets, new entrants, mergers and acquisitions, sharp price reductions, rapid changes in consumption patterns and values, fragility of brands and their reputation. To face all these challenges, our research consists in proposing a Competitive Intelligence System (CIS) designed to provide online services. Through descriptive and exploratory statistical methods, Xplor EveryWhere displays, in a very short time, new strategic knowledge such as: the profile of the actors, their reputation, their relationships, their sites of action, their mobility, emerging issues and concepts, terminology, promising fields, etc. The need for security in Xplor EveryWhere arises from the strategic nature of the information conveyed, which has quite a substantial value. Such security should not be considered an additional option that a CIS provides merely to distinguish itself from others, especially as the leak of this information is not the result of inherent weaknesses in corporate computer systems but is, above all, an organizational issue. With Xplor EveryWhere we completed the reporting service, especially the mobility aspect. Lastly, with this system it is possible to view up-to-date information through real-time access to our strategic database server, itself fed daily by watchers, who can enter information at trade shows, during customer visits or after meetings.
Bernard, Luc. "Développement d'un jeu de structures de données et de contraintes sémantiques pour la compilation(séparée) du langage ADA." Doctoral thesis, Universite Libre de Bruxelles, 1985. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/213624.
Full textBenazouz, Mohamed. "Dimensionnement des mémoires pour les applications de traitement de flux de données." Paris 6, 2012. http://www.theses.fr/2012PA066067.
Full textAjib, Wessam. "Gestion d'un flux temporaire de données dans un réseau radio-mobile TDMA." Paris, ENST, 2001. http://www.theses.fr/2001ENST0001.
Full textChen, Xiaoyi. "Analyse de données de cytometrie de flux pour un grand nombre d'échantillons." Thesis, Cergy-Pontoise, 2015. http://www.theses.fr/2015CERG0777/document.
Full textIn the course of my Ph.D. work, I have developed and applied two new computational approaches for the automatic identification of cell populations in multi-parameter flow cytometry across a large number of samples. Both approaches were motivated by and applied to the LabEx "Milieu Intérieur" study (hereafter MI study). In this project, ten 8-color flow cytometry panels were standardized for the assessment of the major and minor cell populations present in peripheral whole blood, and data were collected and analyzed from a cohort of 1,000 healthy donors.
First, we aim at robust characterization of the major cellular components of the immune system. We report a computational pipeline, called FlowGM, which minimizes operator input, is insensitive to compensation settings, and can be adapted to different analytic panels. A Gaussian Mixture Model (GMM)-based approach was utilized for initial clustering, with the number of clusters determined using the Bayesian Information Criterion. Meta-clustering in a reference donor, by which we mean labeling clusters and merging those with the same label in a pre-selected representative donor, permitted automated identification of 24 cell populations across four panels. Cluster labels were then integrated into Flow Cytometry Standard (FCS) files, thus permitting comparisons to manual analysis by human experts. We show that cell counts and coefficients of variation (CV) are similar between FlowGM and conventional manual analysis of lymphocyte populations, but notably FlowGM provided improved discrimination of "hard-to-gate" monocyte and dendritic cell (DC) subsets. FlowGM thus provides rapid, high-dimensional analysis of cell phenotypes and is amenable to cohort studies.
Once cell counts were obtained across a large number of cohort donors, further analyses (for example, agreement with other methods, age and gender effects, etc.) were naturally required for the purpose of comprehensive evaluation, diagnosis and discovery. In the context of the MI project, the 1,000 healthy donors were stratified across gender (50% women and 50% men) and age (20-69 years of age). Analysis was streamlined using our established FlowGM approach, and the results were highly concordant with the gold-standard manual gating. More importantly, further precision of the CD16+ monocyte and cDC1 populations was achieved using FlowGM: CD14loCD16hi monocytes and HLADRhi cDC1 cells were consistently identified. We demonstrate that the counts of these two populations show a significant correlation with age. For the cell populations that are well known to be related to age, a multiple linear regression model was considered, and it is shown that our results provided higher regression coefficients. These findings establish a strong foundation for a comprehensive evaluation of our previous work.
When extending the FlowGM method to the detailed characterization of certain subpopulations where more variation is revealed across a large number of samples, for example the T cells, we find that the conventional EM algorithm initiated with reference clustering is insufficient to guarantee the alignment of clusters across all samples, due to the presence of technical and biological variation. We therefore improved FlowGM and present the FlowGMP pipeline to address this specific panel. We introduce a Bayesian mixture model by assuming a prior distribution on the component parameters and derive a penalized EM algorithm. Finally, the performance of FlowGMP on this difficult T cell panel, evaluated through a comparison between automated and manual analysis, shows that our method provides reliable and efficient identification of eleven T cell subpopulations across a large number of samples.
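Since the abstract explicitly describes GMM clustering with BIC-based selection of the number of clusters, the following sketch shows that core step on toy data using scikit-learn; the candidate range, data and seed are assumptions, and the actual FlowGM/FlowGMP pipelines add meta-clustering and a penalized EM not shown here.

```python
# Sketch of the core FlowGM idea as described: fit Gaussian mixtures to flow
# cytometry events and pick the number of clusters by BIC (lowest wins).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
events = np.vstack([                      # fake 2-marker events, 3 populations
    rng.normal([2, 8], 0.4, (500, 2)),
    rng.normal([7, 3], 0.6, (400, 2)),
    rng.normal([8, 9], 0.5, (300, 2)),
])

models = [GaussianMixture(n_components=k, random_state=0).fit(events)
          for k in range(1, 8)]
best = min(models, key=lambda m: m.bic(events))  # BIC-based model selection
print("chosen k =", best.n_components)           # expected: 3
labels = best.predict(events)                    # cluster label per event
```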