Dissertations / Theses on the topic 'Clustering (intelligence artificielle)'
Consult the top 39 dissertations / theses for your research on the topic 'Clustering (intelligence artificielle).'
Lévy, Loup-Noé. "Advanced Clustering and AI-Driven Decision Support Systems for Smart Energy Management." Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG027.
This thesis addresses the clustering of complex and heterogeneous energy systems within a Decision Support System (DSS). In Chapter 1, we delve into the theory of complex systems and their modeling, recognizing buildings as complex systems, specifically as sociotechnical complex systems. We examine the state of the art of the different agents involved in energy performance within the energy sector, identifying our case study as the Trusted Third Party for Energy Measurement and Performance (TTPEMP). Given our constraints, we opt to concentrate on the need for a DSS to provide energy recommendations. We compare this system to supervision and recommender systems, highlighting their differences and complementarities, and introduce the necessity for explainability in AI-aided decision-making (XAI). Acknowledging the complexity, numerosity, and heterogeneity of the buildings managed by the TTPEMP, we argue that clustering serves as a pivotal first step in developing a DSS, enabling tailored recommendations and diagnostics for homogeneous subgroups of buildings. In Chapter 2, we explore the state of the art of DSSs, emphasizing the need for governance in semi-automated systems for high-stakes decision-making. We investigate European regulations, highlighting the need for accuracy, reliability, and fairness in our decision system, and identify methodologies to address these needs, such as the DevOps methodology and data lineage. We propose a DSS architecture that addresses these requirements and the challenges posed by big data, featuring a distributed architecture comprising a data lake for heterogeneous data handling, datamarts for specific data selection and processing, and an ML factory populating a model library.
Different types of methods are selected for different needs, based on the specificities of the data and of the question to be answered. Chapter 3 focuses on clustering as the primary machine learning method in our architecture, essential for identifying homogeneous groups of buildings. Given that the data describing buildings combine numerical, categorical, and time-series attributes, we coin the term complex clustering for this combination of data types. After reviewing the state of the art, we identify the need for dimensionality reduction techniques and the most relevant mixed clustering methods. We also introduce pretopology as an innovative approach to mixed and complex data clustering. We argue that it allows for greater explainability and interactivity in the clustering, as it enables hierarchical clustering and the implementation of logical rules and custom proximity notions. The challenges of evaluating clustering are addressed, and adaptations of numerical clustering evaluation to mixed and complex clustering are proposed, taking into account the explainability of the methods. In the datasets and results chapter, we present the public, private, and generated datasets used for experimentation and discuss the clustering results. We analyze the computational performance of the algorithms and the quality of the clusters obtained on datasets varying in size, number of clusters, distribution, and number of categorical and numerical parameters. Pretopology and dimensionality reduction show promising results compared to state-of-the-art mixed data clustering methods. Finally, we discuss our system's limitations, including the limits of automation of the DSS at each step of the data flow. We focus on the critical role of data quality and the challenges in predicting the behavior of complex systems over time.
The objectivity of our clustering evaluation methods is challenged by the absence of ground truth and by the reliance on dimensionality reduction to adapt state-of-the-art metrics to complex data. We discuss possible issues regarding the chosen elbow method, as well as future work such as automating hyperparameter tuning and continuing the development of the DSS.
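The elbow method mentioned above chooses a number of clusters by looking for the "knee" of a decreasing quality curve. A common geometric heuristic, sketched below purely as an illustration (the thesis' exact variant is not reproduced), picks the point farthest from the straight line joining the first and last points of the within-cluster sum-of-squares (WCSS) curve:

```python
# Illustrative sketch of an elbow ("knee") heuristic: given WCSS values for
# k = 1..K, return the k whose point lies farthest from the line joining the
# first and last points of the curve. Names and details are assumptions.

def elbow_k(wcss):
    """Return the k (1-indexed) at the elbow of a decreasing WCSS curve."""
    n = len(wcss)
    x1, y1, x2, y2 = 1, wcss[0], n, wcss[-1]

    def dist(x0, y0):
        # Distance from (x0, y0) to the line through (x1, y1)-(x2, y2).
        num = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1)
        den = ((y2 - y1) ** 2 + (x2 - x1) ** 2) ** 0.5
        return num / den

    return max(range(1, n + 1), key=lambda k: dist(k, wcss[k - 1]))

# A curve that drops sharply until k = 3, then flattens:
print(elbow_k([100, 40, 15, 12, 10, 9]))  # -> 3
```

This purely geometric rule shares the limitation the abstract points out: without ground truth, the "knee" only reflects the chosen quality metric.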
Rastin, Parisa. "Automatic and Adaptive Learning for Relational Data Stream Clustering." Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCD052.
The research work presented in this thesis concerns the development of unsupervised learning approaches adapted to large relational and dynamic datasets. The combination of these three characteristics (size, complexity, and evolution) is a major challenge in the field of data mining, and few satisfactory solutions exist at the moment despite the obvious needs of companies. This is a real challenge, because approaches adapted to relational data have a quadratic complexity, unsuited to the analysis of dynamic data. We propose two complementary approaches for the analysis of this type of data. The first is able to detect well-separated clusters from a signal created during an incremental reordering of the dissimilarity matrix, with no parameter to choose (e.g., the number of clusters). The second uses support points among the objects to build a representation space in which representative prototypes of the clusters are defined. Finally, we apply the proposed approaches to the real-time profiling of connected users. Profiling tasks aim to recognize the "state of mind" of users through their navigation on different websites.
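The support-point idea in the second approach can be sketched as follows. This is a hedged illustration, not the thesis' selection rule: supports are chosen greedily (a medoid first, then farthest-point selection), and each object is then embedded as its vector of dissimilarities to the supports, turning relational data into a low-dimensional space where prototypes can live:

```python
# Illustrative sketch: select support points from a dissimilarity matrix and
# embed every object by its dissimilarities to them. The greedy rule below
# (medoid, then farthest-point) is an assumption for illustration only.

def select_supports(D, m):
    """Greedily pick m support indices from an n x n dissimilarity matrix D."""
    n = len(D)
    # Start from the object with the smallest total dissimilarity (a medoid).
    supports = [min(range(n), key=lambda i: sum(D[i]))]
    while len(supports) < m:
        # Add the object farthest from the current supports.
        supports.append(max(range(n),
                            key=lambda i: min(D[i][s] for s in supports)))
    return supports

def embed(D, supports):
    """Represent each object by its dissimilarities to the support points."""
    return [[D[i][s] for s in supports] for i in range(len(D))]

# Toy matrix: objects {0, 1} are close, {2, 3} are close, groups far apart.
D = [[0, 1, 9, 9],
     [1, 0, 9, 9],
     [9, 9, 0, 1],
     [9, 9, 1, 0]]
S = select_supports(D, 2)
print(S, embed(D, S))  # -> [0, 2] [[0, 9], [1, 9], [9, 0], [9, 1]]
```

The payoff is complexity: once embedded, prototype updates cost O(n·m) per pass instead of the quadratic cost of working on the full dissimilarity matrix.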
Guillon, Arthur. "Opérateurs de régularisation pour le subspace clustering flou." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS121.
Subspace clustering is a data mining task which consists in simultaneously identifying groups of similar data and making this similarity explicit, for example by selecting the features characteristic of each group. In this thesis, we consider a specific family of fuzzy subspace clustering models based on the minimization of a cost function. We identify three desirable qualities of clustering that are absent from the solutions computed by previous models, and propose simple penalty terms to encode these properties in the original cost functions. Some of these terms are non-differentiable, so the standard techniques of fuzzy clustering cannot be applied to minimize the new cost functions. We therefore propose a new, generic optimization algorithm which extends the standard approach by combining alternating optimization and proximal gradient descent. We then instantiate this algorithm with operators minimizing the three penalty terms and show that the resulting algorithms possess the corresponding qualities.
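Proximal gradient descent handles a non-differentiable penalty through its proximal operator. As a hedged, textbook-level illustration (the thesis' actual penalty terms differ), the proximal operator of an l1 sparsity penalty is the soft-thresholding function:

```python
# Illustrative sketch: the proximal operator of step * |x| is soft-thresholding,
# which shrinks x toward zero and sets small values exactly to zero. This is the
# classical l1 instance, not the thesis' specific penalty operators.
import math

def prox_l1(x, step):
    """Soft-thresholding: argmin_z (1/2)(z - x)^2 + step * |z|."""
    return math.copysign(max(abs(x) - step, 0.0), x)

print(prox_l1(3.0, 1.0))  # -> 2.0 (shrunk toward zero)
print(prox_l1(0.5, 1.0))  # -> 0.0 (small value zeroed out: sparsity)
```

In the alternating scheme described above, one such proximal step follows each gradient step on the smooth part of the cost, which is exactly what makes non-differentiable penalties tractable.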
Sarazin, Tugdual. "Apprentissage massivement distribué dans un environnement Big Data." Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCD050.
In recent years, the amount of data analysed by companies and research laboratories has increased sharply, opening the era of Big Data. However, these raw data are frequently uncategorized and difficult to use. This thesis aims to improve and ease the pre-processing and comprehension of such large amounts of data using unsupervised machine learning algorithms. The first part of this thesis is dedicated to a state of the art of clustering and biclustering algorithms and to an introduction to big data technologies, then introduces the design of the Self-Organizing Map clustering algorithm [Kohonen, 2001] in a big data environment. Our algorithm (SOM-MR) provides the same advantages as the original algorithm, namely the creation of a data visualization map based on data clusters. Moreover, it uses the Spark platform, which makes it able to process a large amount of data in a short time; thanks to the popularity of this platform, it easily fits in many data mining environments. We demonstrated this in our project "Square Predict", carried out in partnership with AXA insurance. The aim of this project was to provide a real-time data analysis platform in order to estimate the severity of natural disasters or improve knowledge of residential risks. Throughout this project, we proved the efficiency of our algorithm through its capacity to analyse and create visualizations out of a large volume of data coming from social networks and open data. The second part of this work is dedicated to a new biclustering algorithm. Biclustering consists in clustering observations and variables at the same time. In this contribution we put forward a new approach to biclustering based on self-organizing maps that can scale to large amounts of data (BiTM-MR). To reach this goal, this algorithm is also based on the Spark platform.
It brings out more information than the SOM-MR algorithm because, besides producing groups of observations, it also associates variables to these groups, thus creating bi-clusters of variables and observations.
Thépaut, Solène. "Problèmes de clustering liés à la synchronie en écologie : estimation de rang effectif et détection de ruptures sur les arbres." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS477/document.
In view of the current global changes widely caused by human activities, it becomes urgent to understand the drivers of the stability of communities. Synchrony between time series of abundances is one of the most important such mechanisms. This thesis offers three different angles to answer different questions linked to interspecific and spatial synchrony; the works presented find applications beyond the ecological frame. The first chapter is dedicated to the estimation of the effective rank of matrices over ℝ or ℂ, offering tools to measure the synchronization rate of observation matrices. In the second chapter, we build on existing work on the change-point detection problem on chains in order to propose algorithms that detect change-points on trees; the methods can be used with most data that can be represented as a tree. Finally, in order to study the link between interspecific synchrony and the long-term trends or traits of butterfly species, we offer in the last chapter adaptations of clustering and supervised machine learning methods, such as random forests and artificial neural networks, to ecological data.
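The chain case that the tree algorithms generalize can be illustrated with the classical least-squares detection of a single change-point. This is a hedged sketch of the building block only, not the thesis' tree method:

```python
# Illustrative sketch: find the single split index t that minimizes the total
# squared deviation of a series from two piecewise-constant segments. The tree
# generalization in the thesis is not reproduced here.

def best_split(series):
    """Return the index t that best splits series into two constant segments."""
    def sse(seg):
        mean = sum(seg) / len(seg)
        return sum((x - mean) ** 2 for x in seg)
    return min(range(1, len(series)),
               key=lambda t: sse(series[:t]) + sse(series[t:]))

# A series that jumps between index 3 and index 4:
print(best_split([0, 1, 0, 1, 9, 10, 9, 10]))  # -> 4
```

On a tree, the same least-squares criterion is evaluated over edges rather than positions in a sequence, which is what makes the extension non-trivial.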
Masmoudi, Nesrine. "Modèle bio-inspiré pour le clustering de graphes : application à la fouille de données et à la distribution de simulations." Thesis, Normandie, 2017. http://www.theses.fr/2017NORMLH26/document.
In this work, we present a novel method based on the behavior of real ants for solving the unsupervised non-hierarchical classification problem. This approach dynamically creates data groups. It is based on the concept of artificial ants that collectively exhibit complex movements while following simple local rules; each ant represents one data item in the algorithm. The movements of the ants aim to create homogeneous data groups that evolve together in a graph structure. We also propose a method for the incremental building of neighborhood graphs by artificial ants. The two proposed approaches are biomimetic algorithms that are hybrid in the sense that the initial search for the number of classes is performed by the classical K-means algorithm, which is used to initialize the first partition and the graph structure.
Sublemontier, Jacques-Henri. "Classification non supervisée : de la multiplicité des données à la multiplicité des analyses." Phd thesis, Université d'Orléans, 2012. http://tel.archives-ouvertes.fr/tel-00801555.
Falih, Issam. "Attributed Network Clustering : Application to recommender systems." Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCD011/document.
In the field of complex network analysis, much effort has been focused on identifying communities: groups of related nodes with dense internal connections and few external connections. In addition to connectivity information, which is mostly composed of different types of links, most real-world networks also contain node and/or edge attributes, which can be very relevant during the learning process for finding the groups of nodes, i.e., communities. In this case, two types of information are available: graph data representing the relationships between objects, and attribute information characterizing the objects, i.e., the nodes. Classic community detection and data clustering techniques handle one of the two types but not both; consequently, the resulting clustering may not only miss important information but also lead to inaccurate findings. Therefore, various methods have been developed to uncover communities in networks by combining structural and attribute information, such that nodes in a community are not only densely connected but also share similar attribute values. Such graph-shaped data is often referred to as an attributed graph. This thesis focuses on developing algorithms and models for attributed graphs. In the first part, I focus on the different types of edges, which represent different types of relations between vertices; I propose a new clustering algorithm and present a redefinition of the principal metrics for this type of network. I then tackle clustering using node attribute information, describing a new, original community detection algorithm that uncovers communities in node-attributed networks using structural and attribute information simultaneously. Finally, I propose a collaborative filtering model in which I apply the proposed clustering algorithms.
Boudane, Abdelhamid. "Fouille de données par contraintes." Thesis, Artois, 2018. http://www.theses.fr/2018ARTO0403/document.
In this thesis, we address the well-known clustering and association rule mining problems. Our first contribution introduces a new clustering framework in which complex objects are described by propositional formulas. First, we extend the two well-known k-means and hierarchical agglomerative clustering techniques to deal with these complex objects. Second, we introduce a new divisive algorithm for clustering objects represented explicitly by sets of models. Finally, we propose a propositional-satisfiability-based encoding of the problem of clustering propositional formulas without requiring an explicit representation of their models. In a second contribution, we propose a new propositional-satisfiability-based approach to mine association rules in a single step. The task is modeled as a propositional formula whose models correspond to the rules to be mined. To highlight the flexibility of our framework, we also address other variants, namely the closed, minimal non-redundant, most general, and indirect association rule mining tasks. Experiments on many datasets show that, on the majority of the considered tasks, our declarative approach achieves better performance than state-of-the-art specialized techniques.
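Whatever the solver, an association rule miner ultimately reasons about support and confidence. As a hedged illustration of those two quantities (computed directly here, not through the SAT encoding the thesis proposes):

```python
# Illustrative sketch: support and confidence of association rules, computed
# naively on a tiny transaction base. The thesis encodes rule mining as a
# propositional formula instead; this only shows the quantities being mined.

def support(transactions, itemset):
    """Fraction of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) on the transaction base."""
    return (support(transactions, antecedent | consequent)
            / support(transactions, antecedent))

T = [{"bread", "milk"}, {"bread", "butter"},
     {"bread", "milk", "butter"}, {"milk"}]
print(support(T, {"bread", "milk"}))       # -> 0.5
print(confidence(T, {"bread"}, {"milk"}))  # 2/3
```

In the declarative setting, each model of the encoding corresponds to one rule whose support and confidence clear the user's thresholds, so the enumeration above is delegated to the SAT solver.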
Boutalbi, Rafika. "Model-based tensor (co)-clustering and applications." Electronic Thesis or Diss., Université Paris Cité, 2020. https://wo.app.u-paris.fr/cgi-bin/WebObjects/TheseWeb.woa/wa/show?t=7172&f=55867.
Clustering, which seeks to group similar data points together according to a given criterion, is an important unsupervised learning technique for dealing with large-scale data. In particular, given a data matrix where rows represent objects and columns represent features, clustering partitions only one dimension of the matrix at a time, grouping either objects or features. Although successfully applied in several application domains, clustering techniques are often challenged by characteristics exhibited by some datasets, such as high dimensionality and sparsity. For such data, co-clustering techniques, which allow the simultaneous clustering of the rows and columns of a data matrix, have proven to be more beneficial. In particular, co-clustering exploits the inherent duality between the set of objects and the set of features, which makes it effective even when we are interested in the clustering of only one dimension of the data matrix. In addition, co-clustering turns out to be more efficient, since compressed matrices are used at each step of the process instead of the whole matrix as in traditional clustering. Although co-clustering approaches have been successfully applied in a variety of applications, existing approaches are specially tailored to datasets represented by two-way tables. However, in several real-world applications, two dimensions are not sufficient to represent the dataset. For example, in the article clustering problem, several kinds of information linked to the articles can be collected, such as common words, co-authors, and citations, which naturally lead to a tensorial representation. Intuitively, leveraging all this information should lead to better clustering quality; in particular, two articles that share a large set of words, authors, and citations are very likely to be similar.
Despite the great interest of tensor co-clustering models, research work in this context is extremely limited and relies, for the most part, on tensor factorization methods. Inspired by Jean-Paul Benzécri's famous statement, "The model must follow the data, and not vice versa", we have chosen in this thesis to rely on appropriate mixture models. More explicitly, we propose several new co-clustering models that are specially tailored to tensorial representations and robust to data sparsity. Our contribution can be summarized as follows. First, we extend the LBM (Latent Block Model) formalism to take tensorial structures into account. More specifically, we present the Tensor LBM (TLBM), a powerful tensor co-clustering model that we successfully applied to diverse kinds of data, and we show that the derived algorithm, VEM-T, reveals the most meaningful co-clusters in tensor data. Second, we develop a novel sparse TLBM taking sparsity into account, and extend its use to the management of multiple graphs (or multi-view graphs), leading to an implicit consensus clustering of multiple graphs. As a last contribution of this thesis, we propose a new co-clusterwise method that integrates co-clustering into a supervised learning framework. These contributions have been successfully evaluated on tensorial data from various fields, ranging from recommender systems, clustering of hyperspectral images, and categorization of documents to waste management optimization. They also open interesting and immediate avenues for future research, for instance the extension of the proposed models to tri-clustering and to multivariate time series.
Abba, Ari Ado Adamou. "Bio-inspired Solutions for Optimal Management in Wireless Sensor Networks." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLV044.
During the past few years, wireless sensor networks have witnessed increased interest in both industry and the scientific community, due to their potentially wide area of applications. However, sensor components are designed under extreme resource constraints, especially the limited power supply. It is therefore necessary to design low-power, scalable, and energy-efficient protocols in order to extend the lifetime of such networks. Cluster-based sensor networks are the most popular approach for optimizing the energy consumption of sensor nodes, which strongly influences the overall performance of the network. In addition, routing involves non-negligible operations that considerably affect the network lifetime and throughput. In this thesis, we address the clustering and routing problems with intelligent optimization methods drawn from biologically inspired computing, which provides powerful models that enable global intelligence through simple local behaviors. We propose a distributed clustering approach based on the nest-site selection process of a honeybee swarm: we formulate distributed clustering as a social decision-making process in which sensors act collectively to choose their cluster heads, and propose a multi-objective cost-based fitness function to achieve this choice. In the design of our algorithm, we focus on the distribution of load balancing among the cluster members in order to extend the network lifetime, making a tradeoff between the energy consumption and the quality of the communication links among sensors. We then propose a centralized cluster-based routing protocol for wireless sensor networks, using the fast and efficient search features of the artificial bee colony algorithm: the clustering is formulated as a linear programming problem, and the routing problem is solved with a proposed cost-based function.
We designed a multi-objective fitness function that uses the weighted-sum approach for the assignment of sensors to a cluster. The clustering algorithm efficiently builds clusters by making a tradeoff between the energy consumption and the quality of the communication links within clusters, while the routing is realized in a distributed manner. The proposed protocols have been extensively experimented on a number of topologies in various network scenarios, and the results, compared with well-known cluster-based routing protocols, demonstrate their effectiveness.
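The weighted-sum approach mentioned above can be sketched in a few lines. This is a hedged illustration only: the objective definitions and weights below are assumptions, not the thesis' fitness function:

```python
# Illustrative sketch of a weighted-sum multi-objective fitness for assigning a
# sensor to a candidate cluster head. Both objectives are assumed normalized to
# [0, 1] and combined linearly; weights and objectives are illustrative.

def fitness(residual_energy, link_quality, w_energy=0.5, w_link=0.5):
    """Lower is better: penalize low head energy and poor link quality."""
    return w_energy * (1.0 - residual_energy) + w_link * (1.0 - link_quality)

# Candidate heads described by (residual energy, link quality to this sensor):
heads = {"A": (0.9, 0.6), "B": (0.4, 0.9), "C": (0.8, 0.8)}
best = min(heads, key=lambda h: fitness(*heads[h]))
print(best)  # -> C (best balance of energy and link quality)
```

The weighted sum collapses the energy/link-quality tradeoff into a single scalar, which is what lets a swarm of sensors rank candidate heads with a simple local computation.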
Bubeck, Sébastien. "JEUX DE BANDITS ET FONDATIONS DU CLUSTERING." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2010. http://tel.archives-ouvertes.fr/tel-00845565.
Enjalbert Courrech, Nicolas. "Inférence post-sélection pour l'analyse des données transcriptomiques." Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSES199.
In the field of transcriptomics, technological advances such as microarrays and high-throughput sequencing have enabled the large-scale quantification of gene expression. These advances have raised statistical challenges, particularly in differential expression analysis, which aims to identify the genes that significantly differentiate two populations. However, traditional inference procedures lose their ability to control the false positive rate when biologists select a subset of genes. Post-hoc inference methods address this limitation by providing control over the number of false positives, even for arbitrary gene sets. The first contribution of this manuscript demonstrates the effectiveness of these methods for the differential analysis of transcriptomic data between two biological conditions, notably through the introduction of a linear-time algorithm for computing post-hoc bounds, adapted to the high dimensionality of the data. An interactive application was also developed to facilitate the selection and simultaneous evaluation of post-hoc bounds for sets of genes of interest. These contributions are presented in the first part of the manuscript. The technological evolution towards single-cell sequencing has raised new questions, particularly regarding the identification of genes whose expression distinguishes one cellular group from another. This problem is complex because the cell groups must first be estimated by a clustering method before performing a comparative test, leading to a circular analysis. In the second part of this manuscript, we present a review of post-clustering inference methods addressing this problem, as well as a numerical comparison of multivariate and marginal approaches for cluster comparison. Finally, we explore how the use of mixture models in the clustering step can be exploited in post-clustering tests, and discuss perspectives for applying these tests to transcriptomic data.
Ghesmoune, Mohammed. "Apprentissage non supervisé de flux de données massives : application aux Big Data d'assurance." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCD061/document.
The research outlined in this thesis concerns the development of approaches based on growing neural gas (GNG) for the clustering of data streams. We propose three algorithmic extensions of the GNG approach: sequential; distributed and parallel; and hierarchical; as well as a model for scalability using MapReduce and its application to learning clusters from real insurance Big Data in the form of a data stream. We first propose the G-Stream method. G-Stream, the "sequential" method, is a one-pass data stream clustering algorithm that discovers clusters of arbitrary shape without any assumption on the number of clusters. G-Stream uses an exponential fading function to reduce the impact of old data whose relevance diminishes over time; the links between the nodes are also weighted. A reservoir is used to hold distant observations temporarily, in order to reduce the movements of the nodes nearest to the observations. The batchStream algorithm is a micro-batch-based method for clustering data streams which defines a new cost function taking into account that subsets of observations arrive in discrete batches. The minimization of this function, which leads to a topological clustering, is carried out using dynamic clusters in two steps: an assignment step, which assigns each observation to a cluster, followed by an optimization step, which computes the prototype of each node. A scalable model using MapReduce is then proposed: the data stream clustering problem is decomposed into the elementary functions Map and Reduce, and the observations received in each sub-dataset (within a time interval) are processed through deterministic parallel operations to produce the intermediate states or the final clusters. The batchStream algorithm is validated on the insurance Big Data, and a predictive and analysis system is proposed by combining the clustering results of batchStream with decision trees.
The architecture and these different modules form the computational core of our Big Data project, called Square Predict. Our third extension, GH-Stream, uses a hierarchical and topological structure to perform both visualization and clustering tasks.
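The exponential fading used by G-Stream to down-weight old data can be sketched in one function. This is a hedged illustration of the general mechanism common to stream clustering methods, with an assumed base-2 form and decay rate, not the thesis' exact implementation:

```python
# Illustrative sketch: exponential fading for stream clustering. A weight
# recorded at time t0 contributes weight * 2^(-lam * (t - t0)) at time t, so
# old observations lose influence smoothly. lam (decay rate) is an assumption.

def faded_weight(weight, t0, t, lam=0.1):
    """Exponentially decayed contribution of a weight recorded at time t0."""
    return weight * 2.0 ** (-lam * (t - t0))

# With lam = 0.1, a weight halves every 10 time units:
print(faded_weight(8.0, t0=0, t=30, lam=0.1))  # -> 1.0 (three halvings)
```

Because the decay is multiplicative, node weights can be updated lazily (only when a node is touched), which is what keeps the method one-pass.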
Morvan, Anne. "Contributions to unsupervised learning from massive high-dimensional data streams : structuring, hashing and clustering." Electronic Thesis or Diss., Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLED033.
This thesis focuses on how to efficiently perform unsupervised machine learning, such as the fundamentally linked tasks of nearest neighbor search and clustering, under time and space constraints for high-dimensional datasets. First, a new theoretical framework reduces the space cost and increases the throughput of the data-independent cross-polytope LSH for approximate nearest neighbor search, with almost no loss of accuracy. Second, a novel streaming data-dependent method is designed to learn compact binary codes from high-dimensional data points in only one pass. Besides some theoretical guarantees, the quality of the obtained embeddings is assessed on the approximate nearest neighbor search task. Finally, a space-efficient, parameter-free clustering algorithm is conceived, based on the recovery of an approximate Minimum Spanning Tree of the sketched data dissimilarity graph, on which suitable cuts are performed.
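The MST-with-cuts idea behind that last algorithm can be sketched concretely. This is a hedged illustration of the classical scheme only: the thesis works on a sketched graph and chooses its cuts automatically, whereas here the number of clusters k is given and the heaviest MST edges are simply removed:

```python
# Illustrative sketch: build a Minimum Spanning Tree of a dense dissimilarity
# matrix with Prim's algorithm, then cut the k-1 heaviest edges to obtain k
# clusters. A simplified stand-in for the thesis' parameter-free algorithm.

def mst_edges(D):
    """Prim's algorithm on an n x n dissimilarity matrix; returns (w, i, j)."""
    n = len(D)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        w, i, j = min((D[i][j], i, j)
                      for i in in_tree for j in range(n) if j not in in_tree)
        in_tree.add(j)
        edges.append((w, i, j))
    return edges

def mst_clusters(D, k):
    """Cut the k-1 heaviest MST edges and return a cluster label per object."""
    keep = sorted(mst_edges(D))[:len(D) - k]  # drop the k-1 heaviest edges
    labels = list(range(len(D)))              # simple union-find forest
    def find(x):
        while labels[x] != x:
            x = labels[x]
        return x
    for _, i, j in keep:
        labels[find(i)] = find(j)
    return [find(x) for x in range(len(D))]

# Two obvious groups: {0, 1} and {2, 3}.
D = [[0, 1, 8, 8],
     [1, 0, 8, 8],
     [8, 8, 0, 1],
     [8, 8, 1, 0]]
labels = mst_clusters(D, 2)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
```

The appeal for streams is that the MST is a sparse summary (n-1 edges) of the full dissimilarity graph, so the cuts can be chosen without ever materializing the quadratic matrix.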
Darty, Kevin. "Évaluation de la qualité des comportements des agents en simulation : application à un simulateur de conduite en environnement virtuel." Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066339/document.
This thesis is situated in the context of multi-agent simulation and is interested in evaluating the ability of agents to reproduce human behaviors. This problem appears in many domains, such as virtual reality and embodied conversational agents. The dominant approach to evaluating these behaviors uses social science questionnaires; only a few approaches are based on artificial intelligence and automatic data analysis at the microscopic scale. We show in this thesis that the evaluation of behavior can benefit from both approaches when used jointly. First, we present a method for evaluating the quality of agents' behaviors which combines the artificial intelligence approach, based on the clustering of simulation logs, and the social science approach, which evaluates users through an annotation of the behaviors. We then present an algorithm that compares agents to humans in order to assess the capacities, lacks, and errors of the agent model, and provides metrics. We then make these behaviors explicit based on user categories. Finally, we present a cycle for the automatic calibration of the agents and an exploration of the parameter space. Our evaluation method can be used both for the analysis of a single agent model and for comparing several agent models. We applied this methodology to several driver behavior studies in order to analyze the road traffic simulation ARCHISIM, and we present the results obtained.
Peignier, Sergio. "Subspace clustering on static datasets and dynamic data streams using bio-inspired algorithms." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSEI071/document.
An important task that has been investigated in the context of high-dimensional data is subspace clustering. This data mining task is recognized as more general and complicated than standard clustering, since it aims to detect groups of similar objects, called clusters, and at the same time to find the subspaces where these similarities appear. Furthermore, subspace clustering approaches, like traditional clustering ones, have recently been extended to deal with data streams by updating the clustering models incrementally. The algorithms proposed in the literature rely on very different algorithmic foundations; among them, evolutionary algorithms have been under-explored, even though these techniques have proven valuable on other NP-hard problems. The aim of this thesis was to take advantage of new knowledge from evolutionary biology in order to conceive evolutionary subspace clustering algorithms for static datasets and dynamic data streams. Chameleoclust, the first algorithm developed in this work, takes advantage of the large degree of freedom provided by biology-like features such as a variable genome length, the existence of functional and non-functional elements, and mutation operators including chromosomal rearrangements. KymeroClust, our second algorithm, is a k-medians-based approach that relies on the duplication and divergence of genes, a cornerstone evolutionary mechanism. SubMorphoStream, the last one, tackles subspace clustering over dynamic data streams; it relies on two important mechanisms that favor the fast adaptation of bacteria to changing environments, namely gene amplification and the uptake of foreign genetic material. All these algorithms were compared to the main state-of-the-art techniques and obtained competitive results, suggesting that they are useful complementary tools in the analyst's toolbox.
In addition, two applications called EvoWave and EvoMove were developed to assess the capacity of these algorithms to address real-world problems. EvoWave analyzes Wi-Fi signals to detect different contexts. EvoMove, the second one, is a musical companion that produces sounds based on the clustering of dancer moves captured by motion sensors.
Riverain, Paul. "Integrating prior knowledge into unsupervised learning for railway transportation." Electronic Thesis or Diss., Université Paris Cité, 2022. http://www.theses.fr/2022UNIP7326.
Full text
In a transportation network, supervision plays a key role in ensuring smooth operations and satisfied passengers. This includes providing adequate passenger information, managing the security of the passengers, the fixed assets and the traction power systems, and supervising the traffic in real time. In this thesis, we address the design of new data-driven algorithmic tools to help urban railway operators in the task of supervising the transportation network. As many of the operators' decisions depend on how the users' trips are distributed over the network, we seek to provide them with synthetic information about the current passenger flow and its evolution, in order to help them supervise the traffic and the fixed assets. Given the entries and exits of the users on the network, the passenger flow can be seen as a discrete-time dynamic graph whose nodes are the stations of the network and whose edges count the number of passengers between any pair of stations. We thus aim at summarizing this dynamic graph using clustering techniques. Block Models, including the Stochastic Block Model and the Latent Block Model, are model-based approaches to co-clustering that appear well suited to this graph clustering task. Such a clustering, however, only depends on how the passenger flow is distributed over the network and does not include the expert knowledge of the operators. Consequently, we also seek to take into account contextual information such as station characteristics, the network topology or the operators' actions on train regulation when summarizing the passenger flow. We first review the main concepts our work is based on, as well as related work on unsupervised learning for passenger flow analysis. We then propose a formalization of the operational problem.
In our first contribution, we present an extension of the Stochastic Block Model (SBM) for discrete-time dynamic networks that takes into account the variability in node degrees, allowing us to model a broader class of networks. We derive an inference procedure based on Variational Expectation-Maximization that also provides a means to estimate the time-dependent degree corrections. For our second contribution, we propose to leverage prior knowledge in the form of pairwise semi-supervision in both row and column space to improve the clustering performance of algorithms derived from the Latent Block Model (LBM). We introduce a general probabilistic framework for incorporating Must-Link and Cannot-Link relationships into the LBM based on Hidden Markov Random Fields, and present two inference algorithms based on Variational and Classification EM. Finally, we present the application of the two previous algorithms to real-world passenger flow data. We then describe an interactive tool that we created to visualize the clusters obtained with the dynamic LBM and interpret them using the estimated parameters of the model. Next, we apply the co-clustering algorithms in three different contexts to analyze the passenger flow on different time scales. We present the practical aspects related to the use of these algorithms as well as possible use cases of the pairwise supervision. Finally, we discuss the limits of the proposed algorithms and present some perspectives.
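The Must-Link / Cannot-Link pairwise supervision used in this thesis can be illustrated, outside the probabilistic LBM setting it actually employs, with a much simpler constrained k-means sketch in the style of COP-KMeans: each point is assigned to the nearest cluster that does not violate a constraint. A minimal stdlib-Python sketch under those assumptions, not the thesis's inference algorithm:

```python
def violates(i, cluster, assign, must_link, cannot_link):
    """Would assigning point i to `cluster` break a pairwise constraint?"""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and other in assign and assign[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and assign.get(other) == cluster:
            return True
    return False

def cop_kmeans(points, k, must_link, cannot_link, iters=20):
    """Constrained k-means: nearest feasible cluster wins each assignment."""
    centers = [tuple(p) for p in points[:k]]  # naive init: first k points
    assign = {}
    for _ in range(iters):
        assign = {}
        for i, p in enumerate(points):
            # try clusters from nearest to farthest, skipping violations
            order = sorted(range(k), key=lambda c: sum(
                (x - y) ** 2 for x, y in zip(p, centers[c])))
            for c in order:
                if not violates(i, c, assign, must_link, cannot_link):
                    assign[i] = c
                    break
            else:
                raise ValueError("no feasible assignment for point %d" % i)
        # recompute centers from current members
        new_centers = []
        for c in range(k):
            members = [points[i] for i, cc in assign.items() if cc == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members)
                                         for x in zip(*members)))
            else:
                new_centers.append(centers[c])
        centers = new_centers
    return assign, centers
```

The LBM framework in the thesis encodes the same constraints probabilistically via Hidden Markov Random Fields rather than by hard assignment filtering as above.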
Geiler, Louis. "Deep learning for churn prediction." Electronic Thesis or Diss., Université Paris Cité, 2022. http://www.theses.fr/2022UNIP7333.
Full text
The problem of churn prediction has traditionally been a field of study for marketing. However, in the wake of technological advances, more and more data can be collected to analyze customer behavior. This manuscript was built in this frame, with a particular focus on machine learning. We first looked at the supervised learning problem and demonstrated that logistic regression, random forest and XGBoost taken as an ensemble offer the best results in terms of Area Under the Curve (AUC) among a wide range of traditional machine learning approaches. We also showed that re-sampling approaches are only effective in a local setting, not a global one. Subsequently, we aimed at refining our prediction by relying on customer segmentation: some customers leave a service because of a cost they deem too high, others because of a problem with customer service. Our approach was enriched with a novel deep neural network architecture that combines auto-encoders with the k-means approach. Going further, we focused on self-supervised learning in the tabular domain. More precisely, the proposed architecture was inspired by work on the SimCLR approach, which we altered with the Mean-Teacher model from semi-supervised learning. We showed through a win matrix the superiority of our approach with respect to the state of the art. Ultimately, we applied what we built in this manuscript in an industrial setting, that of Brigad. We alleviated the company's churn problem with a random forest optimized through grid search and threshold optimization, and proposed to interpret the results with SHAP (SHapley Additive exPlanations).
Renaud, Jeremy. "Amélioration de la prédiction des commandes des pharmacies auprès de la CERP RRM." Electronic Thesis or Diss., Bourgogne Franche-Comté, 2024. http://www.theses.fr/2024UBFCD010.
Full text
The CERP Rhin Rhone Mediterranee (CERP RRM) is a wholesale distributor responsible for ensuring pharmacies' supply. Despite recent advancements in hospital logistics, the pharmaceutical sector notably lacks decision support tools. The thesis aims to establish a predictive system that forecasts the orders of all CERP clients with the highest possible accuracy. The data primarily consist of time series. Initially, the thesis focused on a state-of-the-art review of time series prediction technologies, as well as of AI systems implemented in industrial sectors related to wholesale distribution. The main contribution of this thesis was to enhance CERP RRM predictions at multiple levels using machine learning techniques; our results demonstrate an improvement over the current method. The second contribution was a new method, based on sales-curve analysis, for grouping products together, developed to address the grouping of parapharmacy products within CERP RRM. The final contribution of this thesis is a comparative study of different natural language processing models implemented in a conversational assistant for the technical service of a pharmacy management software. This solution has shown promising results, approaching those of a human expert.
Makkhongkaew, Raywat. "Semi-supervised co-selection : instances and features : application to diagnosis of dry port by rail." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE1341.
Full text
We are drowning in massive data but starved for knowledge. It is well known, through the dimensionality trade-off, that more data brings more information but at a price in computational complexity, which has to be paid in some way. When the labeled sample size is too small to bring sufficient information about the target concept, supervised learning fails. Unsupervised learning can be an alternative, but as these algorithms ignore label information, important hints from labeled data are left out, which generally degrades their performance. Using both labeled and unlabeled data is expected to perform better in semi-supervised learning, which is well adapted to large-scale applications where labels are hard and costly to obtain. In addition, when data are large, feature selection and instance selection are two important dual operations for removing irrelevant information. Both tasks, combined with semi-supervised learning, are challenges for the machine learning and data mining communities in dimensionality reduction and knowledge retrieval. In this thesis, we focus on the co-selection of instances and features in the context of semi-supervised learning. In this context, co-selection becomes a more challenging problem, as the data contain labeled and unlabeled examples sampled from the same population. To perform such semi-supervised co-selection, we propose two unified frameworks that efficiently integrate the labeled and unlabeled parts into the co-selection process. The first framework is based on weighted constrained clustering and the second on similarity-preserving selection. Both approaches evaluate the usefulness of features and instances in order to select the most relevant ones simultaneously. Finally, we present a variety of empirical studies over high-dimensional data sets that are well known in the literature.
The results are promising and demonstrate the efficiency and effectiveness of the proposed approaches. In addition, the developed methods are validated on a real-world application, over data provided by the State Railway of Thailand (SRT). The purpose is to derive application models from our methodological contributions in order to diagnose the performance of rail dry port systems. First, we present the results of some ensemble methods applied to a first, fully labeled data set. Second, we show how our co-selection approaches can improve the performance of learning algorithms over partially labeled data provided by SRT.
Morvan, Anne. "Contributions to unsupervised learning from massive high-dimensional data streams : structuring, hashing and clustering." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLED033/document.
Full text
This thesis focuses on how to efficiently perform unsupervised machine learning, such as the fundamentally linked tasks of nearest neighbor search and clustering, under time and space constraints for high-dimensional datasets. First, a new theoretical framework reduces the space cost and increases the throughput of the data-independent Cross-polytope LSH for approximate nearest neighbor search, with almost no loss of accuracy. Second, a novel streaming, data-dependent method is designed to learn compact binary codes from high-dimensional data points in only one pass. Besides some theoretical guarantees, the quality of the obtained embeddings is assessed on the approximate nearest neighbor search task. Finally, a space-efficient, parameter-free clustering algorithm is designed, based on the recovery of an approximate Minimum Spanning Tree of the sketched data dissimilarity graph, on which suitable cuts are performed.
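The MST-with-cuts idea behind the final contribution can be illustrated in its simplest exact form: build the Minimum Spanning Tree of the complete Euclidean distance graph (Kruskal's algorithm), then remove the k-1 heaviest MST edges so the remaining connected components form k clusters. A stdlib-Python sketch for small point sets only; the thesis works on a sketched dissimilarity graph with a parameter-free cut criterion, which this toy version does not reproduce:

```python
import math
from itertools import combinations

def mst_edges(points):
    """Kruskal's algorithm on the complete Euclidean graph."""
    n = len(points)
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # edge joins two components: keep it
            parent[ri] = rj
            mst.append((w, i, j))
    return mst

def mst_clusters(points, k):
    """Cut the k-1 heaviest MST edges; components become cluster labels."""
    mst = sorted(mst_edges(points))
    kept = mst[: len(mst) - (k - 1)]
    parent = list(range(len(points)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, i, j in kept:
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]
```

Cutting heavy edges of an MST is the classical link between spanning trees and single-linkage clustering, which is what makes the approach parameter-friendly once a cut criterion is defined.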
Muliukov, Artem. "Étude croisée des cartes auto-organisatrices et des réseaux de neurones profonds pour l'apprentissage multimodal inspiré du cerveau." Electronic Thesis or Diss., Université Côte d'Azur, 2024. https://intranet-theses.unice.fr/2024COAZ4008.
Full text
Cortical plasticity is one of the main features enabling our capability to learn and adapt to our environment. Indeed, the cerebral cortex can self-organize through two distinct forms of plasticity: structural plasticity and synaptic plasticity. These mechanisms are very likely at the basis of an extremely interesting characteristic of human brain development: multimodal association. The brain uses spatio-temporal correlations between several modalities to structure the data and make sense of observations. Moreover, biological observations show that one modality can activate the internal representation of another modality when the two are correlated. To model such behavior, Edelman and Damasio proposed, respectively, the Reentry and Convergence-Divergence Zone frameworks, in which bi-directional neural communication can lead to both multimodal fusion (convergence) and inter-modal activation (divergence). Nevertheless, these frameworks do not provide a computational model at the neuron level, and only a few works tackle this issue of bio-inspired multimodal association, which is nonetheless necessary for a complete representation of the environment, especially when targeting autonomous and embedded intelligent systems. In this doctoral project, we pursue the exploration of brain-inspired computational models of self-organization for multimodal unsupervised learning in neuromorphic systems. These neuromorphic architectures derive their energy efficiency from the bio-inspired models they support, and for that reason we only consider learning rules based on local and distributed processing.
Ngo, Ha Nhi. "Apprentissage continu et prédiction coopérative basés sur les systèmes de multi-agents adaptatifs appliqués à la prévision de la dynamique du trafic." Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSES043.
Full text
The rapid development of the hardware, software and communication technologies of transportation systems has brought promising opportunities as well as significant challenges for human society. Alongside the improvement of transport quality, the growing number of vehicles has led to frequent congestion, particularly in large cities at peak hours. Traffic jams have many consequences in terms of economic cost, the environment, drivers' mental health and road safety. It is therefore important to forecast traffic dynamics and anticipate the onset of congestion, in order to prevent and mitigate disturbed traffic situations as well as dangerous collisions at the tail of a traffic jam. Nowadays, innovative intelligent transportation system technologies provide diverse, large-scale traffic datasets that are continuously collected and transferred between devices as real-time data streams. Consequently, many intelligent transportation services based on big data analysis have been developed, including traffic forecasting. However, traffic involves many varied and unpredictable factors that make modeling, analyzing and learning its historical evolution difficult. The system we propose therefore aims to fulfill the following five requirements of a traffic forecasting system: temporal analysis, spatial analysis, interpretability, stream analysis and adaptability to several data scales, so as to capture historical traffic patterns from data streams, provide an explicit explanation of input-output causality and support different applications with various scenarios.
To reach these objectives, we propose an agent model based on dynamic clustering and the theory of Adaptive Multi-Agent Systems (AMAS) in order to provide continuous learning and cooperative prediction mechanisms. The proposed agent model comprises two interdependent processes running in parallel: continuous local learning and cooperative prediction. The learning process aims to detect, at the agent level, different representative states from the received data streams. Based on dynamic clustering, this process continuously updates the learned database by adapting to new data. Simultaneously, the prediction process exploits the learned database to estimate the potential future states that may be observed. This process takes spatial dependency into account by integrating cooperation between agents and their neighborhood. The interactions between agents are designed on the basis of the AMAS theory, with a set of self-adaptation mechanisms including self-organization, self-correction and self-evolution, allowing the system to avoid perturbations, manage prediction quality and incorporate newly learned information into the prediction computation. Experiments conducted in the context of traffic dynamics forecasting evaluate the system on synthetic and real datasets at different scales and in different scenarios. The results show that our proposal outperforms existing methods when the traffic data exhibit strong variations. Moreover, the same conclusions drawn from different case studies reinforce the system's ability to adapt to multi-scale applications.
Claeys, Emmanuelle. "Clusterisation incrémentale, multicritères de données hétérogènes pour la personnalisation d’expérience utilisateur." Thesis, Strasbourg, 2019. http://www.theses.fr/2019STRAD039.
Full text
In many activity sectors (health, online sales, ...), designing an optimal solution from scratch for a given problem (finding a protocol to increase the cure rate, designing a web page to promote the purchase of one or more products, ...) is often very difficult or even impossible. To face this difficulty, designers (doctors, web designers, production engineers, ...) often work incrementally, by successive improvements of an existing solution. However, defining the most relevant changes remains a difficult problem. Therefore, a solution adopted more and more frequently is to constructively compare different alternatives (also called variations) in order to determine the best one through an A/B test. The idea is to deploy these alternatives and compare the results obtained, i.e. the respective rewards of each variation. To identify the optimal variation in the shortest possible time, many test methods use an automated dynamic allocation strategy, which quickly and automatically allocates the tested subjects to the most efficient variation through reinforcement learning algorithms (such as multi-armed bandit methods). These methods have shown their value in practice, but also limitations, including in particular too long a latency (i.e. the delay between the arrival of a subject to be tested and its allocation), a lack of explainability of the choices, and the lack of integration of an evolving context describing the subject's behavior before being tested. The overall objective of this thesis is to propose an understandable, generic A/B test method allowing dynamic real-time allocation that takes into account the subjects' static and temporal characteristics.
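The dynamic allocation strategy described in this abstract (a bandit steering each new subject towards the currently most promising variation) can be sketched with Thompson sampling on Bernoulli rewards: each arm keeps a Beta posterior over its conversion rate, and each visitor is sent to the arm whose sampled rate is highest. A minimal stdlib-Python illustration with made-up conversion rates, not the thesis's contextual method:

```python
import random

def thompson_ab_test(true_rates, n_visitors, seed=42):
    """Allocate n_visitors across variations by Thompson sampling.

    true_rates simulates the unknown conversion rate of each variation;
    returns per-arm (wins, losses) counts after the test."""
    rng = random.Random(seed)
    k = len(true_rates)
    wins = [0] * k
    losses = [0] * k
    for _ in range(n_visitors):
        # sample one plausible conversion rate per arm from Beta(w+1, l+1)
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1)
                   for a in range(k)]
        arm = samples.index(max(samples))
        # simulate the visitor's reaction to the chosen variation
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses
```

Because sampling concentrates on the best posterior as evidence accumulates, traffic drifts automatically towards the winning variation, which is exactly the exploration/exploitation trade-off the abstract refers to.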
Grollemund, Vincent. "Exploration et modélisation de données peu ou pas structurées." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS027.
Full text
Supervised learning models are usually trained on data with limited constraints. Unfortunately, in real-world use cases data are generally scarce, incomplete and biased, which hampers efficient model design. Such data can and should still be leveraged to discover relevant patterns, glean insight and develop meaningful conclusions. In this thesis, we investigate an unsupervised learning approach to isolate minority samples encompassed within a larger population. Our review includes two different use cases: Amyotrophic Lateral Sclerosis prognosis and the identification of potential innovation funding recipients. Despite their different purposes, these contexts face similar issues: poor availability of partial and unrepresentative samples. In both cases, the aim is to detect samples from a minority population: patients with a poorer 1-year prognosis, and companies that are more likely to be successful funding applicants. Data are projected into a lower-dimensional space using Uniform Manifold Approximation and Projection (UMAP), a nonlinear dimension reduction technique. Differences in data distributions are harnessed to isolate the target minority population, using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and alpha shapes. Correlations between input and target variables become visible within the projection space and minority samples are isolated from the remaining data. As a result, in spite of poor data quality, we provide additional insight into recently diagnosed patients and potential funding applicants.
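The density-based step of this pipeline can be illustrated with a minimal DBSCAN over 2-D points standing in for the UMAP projection space: dense regions become clusters, sparse points are labeled noise. A stdlib-Python sketch with toy coordinates, omitting the UMAP projection and alpha-shape steps the thesis actually uses:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point (-1 = noise)."""
    n = len(points)
    labels = [None] * n
    # precompute eps-neighborhoods (each point is its own neighbor)
    neighbors = [[j for j in range(n)
                  if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1          # provisional noise
            continue
        cluster += 1                # i is a core point: grow a cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reclaimed as border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # expand through core points
    return labels
```

In the thesis's setting, the minority population of interest corresponds to a dense region of the projection that DBSCAN separates from the surrounding noise.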
Sollenborn, Mikael. "Clustering and case-based reasoning for user stereotypes /." Västerås : Mälardalen University, 2004. http://www.mrtc.mdh.se/publications/0770.pdf.
Full text
Sîrbu, Adela-Maria. "Dynamic machine learning for supervised and unsupervised classification." Thesis, Rouen, INSA, 2016. http://www.theses.fr/2016ISAM0002/document.
Full text
The research direction we focus on in this thesis is the application of dynamic machine learning models to solve supervised and unsupervised classification problems. We live in a dynamic environment where data is continuously changing, and the need to obtain fast and accurate solutions to our problems has become a real necessity. The particular problems that we approach in the thesis are pedestrian recognition (a supervised classification problem) and the clustering of gene expression data (an unsupervised classification problem). These problems are representative of the two main types of classification, are very challenging, and have great importance in real life. The first research direction that we approach in the field of dynamic unsupervised classification is the dynamic clustering of gene expression data. Gene expression is the process by which the information from a gene is converted into functional gene products: proteins or RNA with different roles in the life of a cell. Modern microarray technology is nowadays used to experimentally measure the expression levels of thousands of genes, across different conditions and over time. Once the gene expression data has been gathered, the next step is to analyze it and extract useful biological information. One of the most popular approaches for analyzing gene expression data is clustering, which partitions a data set into groups whose members are similar to each other. In gene expression data sets, each gene is represented by its expression values (features) at distinct points in time, under the monitored conditions.
Gene clustering is at the foundation of genomic studies that aim to analyze the functions of genes, because genes that are similar in their expression levels are assumed to be relatively similar in biological function. The problem we address within the dynamic unsupervised classification research direction is the dynamic clustering of gene expression data. In our case, the term dynamic indicates that the data set is not static but subject to change. Still, as opposed to the incremental approaches from the literature, where the data set is enriched with new genes (instances) during the clustering process, our approaches tackle the case where new features (expression levels for new points in time) are added to the genes already existing in the data set. To the best of our knowledge, there are no approaches in the literature that deal with the problem of dynamic clustering of gene expression data defined in this way. In this context we introduce three dynamic clustering algorithms that are able to handle newly collected gene expression levels by starting from a previously obtained partition, without re-running the algorithm from scratch. Experimental evaluation shows that our method is faster and more accurate than applying the clustering algorithm from scratch on the feature-extended data set.
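The feature-dynamic setting described here, where new expression columns arrive for existing genes, can be illustrated with a warm-started k-means: when new time points are appended, the previous partition seeds the new run instead of starting from scratch. A stdlib-Python sketch under that simplification; the thesis's three algorithms are more elaborate than this toy warm start:

```python
def kmeans(rows, k, init_labels=None, iters=50):
    """Plain k-means; optionally warm-started from a previous partition."""
    n = len(rows)
    labels = list(init_labels) if init_labels else [i % k for i in range(n)]
    for _ in range(iters):
        centers = []
        for c in range(k):
            members = [rows[i] for i in range(n) if labels[i] == c]
            centers.append([sum(col) / len(members) for col in zip(*members)]
                           if members else list(rows[c]))
        new_labels = [min(range(k), key=lambda c: sum(
            (x - y) ** 2 for x, y in zip(row, centers[c]))) for row in rows]
        if new_labels == labels:   # converged
            break
        labels = new_labels
    return labels

def dynamic_cluster(old_rows, old_labels, new_cols, k):
    """New expression levels (columns) arrive for each gene: extend the
    rows and restart k-means from the previous partition."""
    extended = [list(row) + list(cols) for row, cols in zip(old_rows, new_cols)]
    return kmeans(extended, k, init_labels=old_labels)
```

Starting from the old partition typically means only a few reassignment iterations are needed, which is the source of the speed-up the abstract reports.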
Darty, Kevin. "Évaluation de la qualité des comportements des agents en simulation : application à un simulateur de conduite en environnement virtuel." Electronic Thesis or Diss., Paris 6, 2015. http://www.theses.fr/2015PA066339.
Full text
This thesis is set in the context of multi-agent simulation and is interested in evaluating the ability of agents to reproduce human behaviors. This problem appears in many domains, such as virtual reality and embodied conversational agents. The dominant approach to evaluating these behaviors uses social science questionnaires; only a few approaches are based on artificial intelligence and automatic data analysis at the microscopic scale. We show in this thesis that the evaluation of behavior can benefit from both approaches when they are used jointly. First, we present a method for evaluating the quality of agents' behaviors that combines the artificial intelligence approach, based on the clustering of simulation logs, and the social science approach, in which users evaluate behaviors through annotation. We then present an algorithm that compares agents to humans in order to assess the capacities, lacks and errors of the agent model, and provides metrics. We then make these behaviors explicit on the basis of user categories. Finally, we present a cycle for the automatic calibration of agents and an exploration of the parameter space. Our evaluation method can be used to analyze a single agent model or to compare several agent models. We applied this methodology to several driver behavior studies in order to analyze the road traffic simulation ARCHISIM, and we present the results obtained.
Amadou, Kountché Djibrilla. "Localisation dans les bâtiments des personnes handicapées et classification automatique de données par fourmis artificielles." Thesis, Tours, 2013. http://www.theses.fr/2013TOUR4021/document.
Full text
The concept of "smart" pervades more and more of our daily life. A typical example is the smartphone, which has become over the years an essential device. Soon, the city, the car and the home will become "smart" as well. This intelligence is manifested by the ability of the environment to interact and take decisions in its relationships with users and other environments, which requires information on the state changes occurring on both sides. Sensor networks allow these data to be collected, pre-processed and transmitted. Through some of their characteristics, sensor networks are close to swarm intelligence, in the sense that small entities with limited capabilities can cooperate automatically, in an unattended, decentralized and distributed manner, in order to accomplish complex tasks. These bio-inspired methods have served as a basis for the resolution of many problems, mostly in optimization, which inspired us to apply them to problems met in Ambient Assisted Living (AAL) and to the data clustering problem. AAL is a sub-field of context-aware services whose goal is to facilitate the everyday life of elderly and disabled people. These systems determine the context and then propose different kinds of services. We used two important elements of the context: position and disability. Although positioning reaches very good precision outdoors, it faces many challenges in indoor environments, due to electromagnetic wave propagation in harsh conditions, the cost of systems, interoperability, etc. Our work addresses the indoor positioning of disabled people using wireless sensor networks to measure characteristics of the electromagnetic wave (signal strength, time, angle) and estimate the position with geometric methods (triangulation, lateration), fingerprinting methods (k-nearest neighbors) and Bayesian filters (Kalman filter). The application is to offer AAL services such as navigation.
We therefore extend the definition of a sensor node to take into account any device in the environment capable of emitting and receiving a signal. We also studied the possibility of using artificial ants inspired by Pachycondyla apicalis for data clustering and for indoor localization, by casting this last problem as a data clustering problem. Finally, we proposed a system based on a middleware architecture.
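The fingerprinting technique mentioned in this abstract (k-nearest neighbors on a radio map) can be sketched as follows: a radio map pairs known positions with recorded signal-strength vectors, and a new measurement is located at the inverse-distance-weighted average of its k closest fingerprints in signal space. A stdlib-Python sketch with invented RSSI values; the thesis combines this with geometric methods and Kalman filtering:

```python
import math

def knn_fingerprint(radio_map, rssi, k=3):
    """Weighted k-NN fingerprinting.

    radio_map: list of ((x, y), [rssi_ap1, rssi_ap2, ...]) calibration points.
    rssi: the signal-strength vector measured at the unknown position."""
    # rank calibration points by distance in signal space
    ranked = sorted(radio_map, key=lambda fp: math.dist(fp[1], rssi))[:k]
    # inverse-distance weights (epsilon guards against division by zero)
    weights = [1.0 / (math.dist(fp[1], rssi) + 1e-9) for fp in ranked]
    total = sum(weights)
    x = sum(w * fp[0][0] for w, fp in zip(weights, ranked)) / total
    y = sum(w * fp[0][1] for w, fp in zip(weights, ranked)) / total
    return x, y
```

Fingerprinting sidesteps the harsh indoor propagation conditions the abstract mentions, because it matches measured signal patterns instead of modeling the wave geometry.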
Ait, Saada Mira. "Unsupervised learning from textual data with neural text representations." Electronic Thesis or Diss., Université Paris Cité, 2023. http://www.theses.fr/2023UNIP7122.
Full text
The digital era generates enormous amounts of unstructured data, such as images and documents, requiring specific processing methods to extract value from them. Textual data presents an additional challenge as it does not contain numerical values. Word embeddings are techniques that transform text into numerical data, enabling machine learning algorithms to process it. Unsupervised tasks are a major challenge in industry, as they allow value to be created from large amounts of data without requiring costly manual labeling. In this thesis we explore the use of Transformer models for unsupervised tasks such as clustering, anomaly detection and data visualization. We also propose methodologies to better exploit multi-layer Transformer models in an unsupervised context, to improve the quality and robustness of document clustering while avoiding the choice of which layer to use and of the number of classes. Additionally, we investigate Transformer language models and their application to clustering more deeply, examining in particular transfer learning methods that fine-tune pre-trained models on a different task to improve their quality for future tasks. We demonstrate through an empirical study that post-processing methods based on dimensionality reduction are more advantageous than the fine-tuning strategies proposed in the literature. Finally, we propose a framework for detecting text anomalies in French, adapted to two cases: one where the data concern a specific topic and one where the data have multiple sub-topics. In both cases, we obtain results superior to the state of the art with significantly lower computation time.
Morbieu, Stanislas. "Leveraging textual embeddings for unsupervised learning." Electronic Thesis or Diss., Université Paris Cité, 2020. http://www.theses.fr/2020UNIP5191.
Full text
Textual data is ubiquitous and a useful information pool for many companies. In particular, the web provides an almost inexhaustible source of textual data that can be used for recommendation systems, business or technological watch, information retrieval, etc. Recent advances in natural language processing have made it possible to capture the meaning of words in their context, improving automatic translation, text summarization and the classification of documents into predefined categories. However, most of these applications rely on significant human intervention to annotate corpora: in supervised classification, for instance, this annotation consists in providing the algorithm with examples of category assignments, from which it learns to reproduce human judgment on new documents. The aim of this thesis is to take advantage of these advances, which capture the semantics of text, in an unsupervised framework. The contributions of this thesis revolve around three main axes. First, we propose a method to transfer the information captured by a neural network to the co-clustering of documents and words. Co-clustering partitions the two dimensions of a data matrix simultaneously, forming both groups of similar documents and groups of coherent words. This facilitates the interpretation of a large corpus of documents, since groups of documents can be characterized by groups of words, thus summarizing a large body of text. More precisely, we train the Paragraph Vectors algorithm on an augmented dataset while varying the hyperparameters, cluster the documents from the different vector representations and apply a consensus algorithm to the different partitions. A constrained co-clustering of the co-occurrence matrix between terms and documents is then applied to preserve the consensus partition.
This method yields significantly better document partitions on various corpora and retains the interpretability offered by co-clustering. Second, we present a method for evaluating co-clustering algorithms that exploits vector representations of words called word embeddings. Word embeddings are vectors built from large volumes of text, with the key property that two semantically close words have embeddings that are close in cosine distance. Our method measures the agreement between the document partition and the word partition, thus providing a fully unsupervised measure of co-clustering quality. Third, we are interested in recommending classified ads. We present a system that recommends similar classified ads to the one being viewed. Ad descriptions are often short and syntactically incorrect, and the use of synonyms makes it difficult for traditional systems to measure semantic similarity accurately. In addition, the high renewal rate of still-valid ads (products not yet sold) requires design choices that keep computation time low. Our method, which is simple to implement, addresses this use case and is again based on word embeddings. Their use has advantages but also raises some difficulties: building such vectors requires choosing the values of several parameters, and the mismatch between the corpus on which the embeddings were built and the one on which they are used raises the problem of out-of-vocabulary words, which have no vector representation. To overcome these problems, we present an analysis of the impact of the different parameters on word embeddings, as well as a study of methods for dealing with out-of-vocabulary words.
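The embedding-based quality measure described above can be illustrated with a minimal sketch: given a word partition produced by co-clustering and pretrained word embeddings, semantically coherent word clusters should show high average intra-cluster cosine similarity. The function below is a toy proxy under that assumption, not the thesis's exact measure; the vectors and labels are synthetic.

```python
import numpy as np

def intra_cluster_cosine(vectors, labels):
    """Average pairwise cosine similarity of word vectors within each cluster.

    A score well above the corpus-wide average suggests the word clusters
    group semantically related terms (toy proxy, not the thesis's measure).
    """
    # Normalize so that dot products equal cosine similarities.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = []
    for c in np.unique(labels):
        members = unit[labels == c]
        if len(members) < 2:
            continue
        sims = members @ members.T            # pairwise cosines
        n = len(members)
        # Mean of the off-diagonal entries only (exclude self-similarity).
        scores.append((sims.sum() - n) / (n * (n - 1)))
    return float(np.mean(scores))

# Toy example: two tight directions -> two coherent "word clusters".
rng = np.random.default_rng(0)
base = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
vecs = np.vstack([base[i] + 0.05 * rng.standard_normal(3)
                  for i in [0, 0, 0, 1, 1, 1]])
labels = np.array([0, 0, 0, 1, 1, 1])
print(intra_cluster_cosine(vecs, labels))  # close to 1.0 for coherent clusters
```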
Wang, Kun. "Algorithmes et méthodes pour le diagnostic ex-situ et in-situ de systèmes piles à combustible haute température de type oxyde solide." Phd thesis, Université de Franche-Comté, 2012. http://tel.archives-ouvertes.fr/tel-01017170.
Full textStrubel, Nicolas. "Brake squeal : identification and influence of frictional contact localizations." Electronic Thesis or Diss., Université de Lille (2022-....), 2023. http://www.theses.fr/2023ULILN059.
Full textSqueal noises in brake systems are intense acoustic emissions that cause significant environmental nuisance and customer complaints. They are friction-induced vibration phenomena that depend on multiphysics and multiscale issues: the system structure, braking operational parameters, frictional contact interfaces and their temperature dependency, contact non-linearities, and tribological aspects all considerably affect squeal, making this unpleasant noise a complex problem to apprehend. In this work, the full-scale system is considered, and several principal tendencies are identified regarding the influence of contact localizations on acoustic emissions. NVH tests are conducted at several scales of interest, aiming to change the contact characteristics: pads are modified either at the macroscopic scale, with the intent of implicitly varying the load-bearing areas, or at the mesoscopic scale, tending to affect the evolution of the tribological circuit. The purpose is to identify pad parameters that influence squeal by affecting the tribolayer and producing noise-signature differences between the experiments. Heavily instrumented tests are carried out on a full-scale brake system with different pad shapes: enriched instrumentation based on in-operando thermal surface tracking gives access to additional solicitation information and makes it possible to follow the assumed load-bearing area. Clustering methods are used to manage the analysis of the thermal data. Experimentally correlated stability simulations are then conducted, investigating the impact of pad chamfer characteristics on squeal, the influence of the coefficient of friction, and the implementation of global pad wear shapes.
Furthermore, thermomechanical simulations are performed, introducing the previously cluster-defined contact areas into the models. Although considering the full brake system can involve severe experimental dispersion, initial correlations are observed between pads modified at different scales (pad shapes at the macroscopic scale, thermal treatments of the friction material at the mesoscopic scale) and noise characteristics. The enriched instrumented tests lead to the conclusion that contact localizations can evolve during NVH tests, depending on the solicitation variables. A particular link between braking operational parameters (pressure, temperature), contact localizations, and squeal features is established through clustering. Finally, the simulated tendencies tend to follow the experimental ones, and enriching the models with a more accurate contact description could improve the squeal prediction capability of such simulations.
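The use of clustering on thermal surface data can be sketched as follows: pixels of a temperature frame are clustered on position and temperature, and the hottest cluster is taken as a candidate load-bearing area. This is a generic k-means illustration, not the instrumentation pipeline of the thesis; the synthetic frame, the temperatures, and the scaling factor are all invented.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy thermal frame: a hot patch (assumed load-bearing contact) on a cooler pad.
frame = np.full((40, 60), 60.0)            # background pad temperature, degC
frame[10:20, 25:40] = 140.0                # hot spot from frictional contact
frame += np.random.default_rng(1).normal(0.0, 2.0, frame.shape)

# One feature vector per pixel: (row, col, scaled temperature). Temperature is
# scaled up so it dominates the spatial coordinates: we mainly want hot vs cold.
rows, cols = np.indices(frame.shape)
X = np.column_stack([rows.ravel(), cols.ravel(), 5.0 * frame.ravel()])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# The cluster with the highest temperature center is the candidate hot area.
hot = km.labels_.reshape(frame.shape) == np.argmax(km.cluster_centers_[:, 2])
print("hot-area fraction:", hot.mean())    # ~150 hot pixels out of 2400
```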
Appert, Gautier. "Information k-means, fragmentation and syntax analysis. A new approach to unsupervised machine learning." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAG011.
Full textInformation k-means is a new mathematical framework that extends the classical k-means criterion, using the Kullback divergence as a distortion measure. The fragmentation criterion is an even broader extension, where each signal is approximated by a combination of fragments instead of a single center. Using the fragmentation criterion as a distortion measure, we propose a new fragmentation algorithm for digital signals, conceived as a lossy data compression scheme. Our syntax analysis is based on two principles: factorization and relabeling of frequent patterns. It is an iterative scheme that decreases the length of the representation of the training set as much as possible at each step. It produces a syntax tree for each signal, providing a multi-level classification of the signal components. We tested the method on grey-level digital images, where it successfully labeled translated and rotated patterns. This suggests that transformation-invariant pattern recognition could be approached in a flexible way using a general-purpose data compression criterion. From a mathematical point of view, we derived two kinds of generalization bounds. First, we defined an implicit estimator based on an implicit statistical model related to our lossy data compression scheme. We proved a lemma relating the data compression rate and the distortion level of the compression algorithm to the excess risk of the statistical estimator, which explains why our syntax trees may be meaningful. Second, combining PAC-Bayesian lemmas with the kernel trick, we proved non-asymptotic dimension-free generalization bounds for the various information k-means and information fragmentation criteria we introduced.
For instance, in the special case of the classical k-means criterion, we get a non-asymptotic dimension-free generalization bound of order O((k log(k) / n)^{1/4}) that gives the best sufficient consistency condition, namely that the excess risk goes to zero when k log(k) / n goes to zero. Using a new kind of PAC-Bayesian chaining, we also proved a bound of order O(log(n/k) sqrt(k log(k) / n)).
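The information k-means criterion, in its simplest form, replaces the squared Euclidean distortion by the Kullback divergence between distributions; for KL(p || c), the optimal center of a cluster is then the average of its member distributions. The sketch below is a toy illustration of this criterion on synthetic histograms, not the fragmentation algorithm of the thesis.

```python
import numpy as np

def kl(p, c):
    """Kullback-Leibler divergence KL(p || c) for distributions on a finite set."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / c[mask])))

def information_kmeans(P, k, n_iter=50, seed=0):
    """Toy k-means with the Kullback divergence as distortion measure.

    Each row of P is a probability distribution. For this distortion, the
    center minimizing the summed divergence of a cluster is the average of
    its member distributions, so the update step is a plain mean.
    """
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest center in KL divergence.
        labels = np.array([np.argmin([kl(p, c) for c in centers]) for p in P])
        # Update step: average of the member distributions.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(axis=0)
    return labels, centers

# Two groups of histograms concentrated on different halves of the support.
P = np.array([[.6, .3, .05, .05], [.5, .4, .05, .05],
              [.05, .05, .3, .6], [.05, .05, .4, .5]])
labels, _ = information_kmeans(P, k=2)
print(labels)  # the first two rows share one cluster, the last two the other
```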
Navarro, Emmanuel. "Métrologie des graphes de terrain, application à la construction de ressources lexicales et à la recherche d'information." Phd thesis, Institut National Polytechnique de Toulouse - INPT, 2013. http://tel.archives-ouvertes.fr/tel-01020232.
Full textFaucheux, Lilith. "Learning from incomplete biomedical data : guiding the partition toward prognostic information." Electronic Thesis or Diss., Université Paris Cité, 2021. http://www.theses.fr/2021UNIP5242.
Full textThe topic of this thesis is partition learning in the context of incomplete data. Two methodological developments are presented, with two medical and biomedical applications. The first concerns unsupervised partition learning in the presence of incomplete data. Two types of incomplete data were considered: missing data and left-censored data (that is, values "lower than some detection threshold"), both handled within the multiple imputation (MI) framework. Multivariate imputation by chained equations (MICE) was used to perform tailored imputations for each type of incomplete data. Then, for each imputed dataset, unsupervised learning was performed with a data-driven number of clusters. Last, a consensus clustering algorithm was used to pool the partitions, as an alternative to Rubin's rules. The second development concerns semi-supervised partition learning on an incomplete dataset, combining the data structure with patient survival. The aim was to identify patient profiles that reflect both differences in the group structure extracted from the data and differences in the patients' prognosis. The supervised (prognostic value) and unsupervised (group structure) objectives were combined through Pareto multi-objective optimization. Missing data were handled, as above, through MI, with Rubin's rules used to combine the supervised and unsupervised objectives across the imputations, and the optimal partitions pooled using consensus clustering. Two applications are provided: one on the immunological landscape of the breast tumor microenvironment, and another on COVID-19 infection in the context of hematological disease.
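The first pipeline (multiple imputation, one partition per imputed dataset, then consensus pooling instead of Rubin's rules) can be sketched with standard tools. The data, the number of imputations, and the choice of k-means plus hierarchical consensus below are placeholder choices, not the thesis's exact implementation:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
# Two Gaussian groups with ~10% of values missing at random.
X = np.vstack([rng.normal(0, 0.5, (20, 3)), rng.normal(3, 0.5, (20, 3))])
X[rng.random(X.shape) < 0.1] = np.nan

# Multiple imputation: several stochastic completions of the dataset
# (MICE-style chained equations), then one partition per completion.
co = np.zeros((len(X), len(X)))
for m in range(5):
    Xm = IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X)
    lab = KMeans(n_clusters=2, n_init=10, random_state=m).fit_predict(Xm)
    co += lab[:, None] == lab[None, :]          # co-association counts

# Consensus clustering: cluster the co-association matrix to pool the
# partitions, since Rubin's rules are not defined for cluster labels.
dist = squareform(1 - co / 5, checks=False)
final = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
print(final[:20], final[20:])  # the two planted groups should separate
```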
Hadouche, Fadila. "Annotation syntaxico-sémantique des actants en corpus spécialisé." Thèse, 2010. http://hdl.handle.net/1866/5032.
Full textSemantic role annotation is a process that aims to assign labels such as Agent, Patient, Instrument, or Location to the actants or circumstants (also called arguments and adjuncts) of predicative lexical units. This process often requires rich lexical resources or corpora in which sentences have been annotated manually by linguists. Automatic approaches (statistical or machine learning) are corpus-based. Previous work was carried out mostly for English, which has rich resources such as PropBank, VerbNet and FrameNet that were used to build automated annotation systems. For other languages, where no corpora of annotated sentences are available, this type of annotation often relies on FrameNet by projection. A resource such as FrameNet is necessary for automated annotation systems, yet the manual annotation of a large number of sentences by linguists is tedious and time-consuming. We therefore propose an automated system to help linguists in this task, so that they only have to validate the proposed annotations. Our work focuses on verbs, which are more likely than other predicative units (adjectives and nouns) to be accompanied by actants realized in sentences. These verbs are specialized terms of the computer science and Internet domains (e.g., access, configure, browse, download) whose actantial structures have been annotated manually with semantic roles. The actantial structure is based on the principles of Explanatory and Combinatorial Lexicology (LEC) of Mel’čuk and appeals in part (with regard to semantic roles) to the notion of Frame Element described in Fillmore's theory of frame semantics (FS). What these two theories have in common is that they lead to the construction of dictionaries different from those resulting from traditional theories. These verbal units, manually annotated in several contexts, constitute the specialized corpus used in our work.
Our system, designed to automatically assign semantic roles to actants, is based on rules and on classifiers trained on more than 2300 contexts. We limited ourselves to a restricted list of roles, because some roles in our corpus do not have enough manually annotated examples. Our system addresses the roles Patient, Agent and Destination, for which the number of examples is greater than 300, and we created a class called Autre ("Other") that gathers the roles with fewer than 100 annotated examples. We subdivided the annotation task into the identification of participants (actants and circumstants) and the assignment of semantic roles to the actants that contribute to the sense of the verbal lexical unit. We parsed the sentences of the corpus with Syntex to extract syntactic information describing the participants of the verbal lexical unit in each sentence; this information is used as features in our learning model. For participant detection we proposed two techniques, one rule-based and one based on machine learning, and the same techniques are used to classify these participants into actants and circumstants. For the assignment of semantic roles to the actants, we proposed a semi-supervised instance partitioning (clustering) method, using CHAMELEON, an agglomerative hierarchical clustering algorithm, which we compared to semantic role classification.
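The classification side of such a pipeline can be sketched with a generic feature-based classifier. The features, training examples, and role labels below are invented stand-ins for the Syntex-derived features and the corpus; this is not the thesis's rule component or its CHAMELEON clustering:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in features: each candidate actant is described by shallow
# syntactic cues; the labels follow the reduced role inventory above.
train = [
    ({"dep": "subj", "prep": "-",    "pos": "N"}, "Agent"),
    ({"dep": "subj", "prep": "-",    "pos": "N"}, "Agent"),
    ({"dep": "obj",  "prep": "-",    "pos": "N"}, "Patient"),
    ({"dep": "obj",  "prep": "-",    "pos": "N"}, "Patient"),
    ({"dep": "pobj", "prep": "to",   "pos": "N"}, "Destination"),
    ({"dep": "pobj", "prep": "into", "pos": "N"}, "Destination"),
    ({"dep": "pobj", "prep": "with", "pos": "N"}, "Autre"),
]
X, y = zip(*train)

# One-hot encode the categorical features, then fit a multiclass classifier.
model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.predict([{"dep": "obj", "prep": "-", "pos": "N"}]))
```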
Martínez, Vargas Danae Mirel. "Régression de Cox avec partitions latentes issues du modèle de Potts." Thèse, 2019. http://hdl.handle.net/1866/22552.
Full text