Dissertations / Theses on the topic 'Apprentissage statistiques'
Solnon, Matthieu. "Apprentissage statistique multi-tâches." PhD thesis, Université Pierre et Marie Curie - Paris VI, 2013. http://tel.archives-ouvertes.fr/tel-00911498.
Vayatis, Nicolas. "Approches statistiques en apprentissage : boosting et ranking." Habilitation à diriger des recherches, Université Pierre et Marie Curie - Paris VI, 2006. http://tel.archives-ouvertes.fr/tel-00120738.
… selecting an estimator within a massive class such as the convex hull of a VC class. In the first part of the dissertation, we recall the interpretation of boosting algorithms as implementations of convex risk minimization principles and study their properties from this angle. In particular, we show the importance of regularization for obtaining consistent strategies. We also develop a new class of stochastic-gradient-type algorithms, called mirror descent algorithms with averaging, and assess their behavior through computer simulations. After presenting the fundamental principles of boosting, the second part addresses more advanced questions such as the derivation of oracle inequalities. We study the precise calibration of penalties as a function of the cost criteria used, present non-asymptotic results on the performance of penalized boosting estimators, notably fast rates under Mammen-Tsybakov-type margin conditions, and describe the approximation capabilities of boosting with decision stumps. The third part of the dissertation explores the ranking problem. An important issue in applications such as document retrieval or credit scoring is to order instances rather than to categorize them. We propose a simple formulation of this problem under which ranking can be interpreted as classification over pairs of observations. The difference in this case is that the empirical criteria are U-statistics, and we therefore develop classification theory adapted to this setting. We also explore generalizations of the ranking error that allow priors on the order of the instances, as in the case where only the "best" instances are of interest.
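For illustration, the pairwise formulation of ranking sketched in this abstract amounts to an empirical risk that is a U-statistic of order 2; in generic notation (assumed here, not quoted from the thesis), a scoring rule s misranks a pair when the order of its scores disagrees with the order of the labels:

```latex
L_n(s) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n}
         \mathbf{1}\left\{ \big(s(X_i) - s(X_j)\big)\big(Y_i - Y_j\big) < 0 \right\}
```

Minimizing L_n(s) is exactly a classification problem over the pairs ((X_i, X_j), sign(Y_i - Y_j)), which is why classification theory must be extended to U-statistic criteria.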
Dimeglio, Chloé. "Méthodes d'estimations statistiques et apprentissage pour l'imagerie agricole." Toulouse 3, 2013. http://www.theses.fr/2013TOU30110.
We have to provide reliable information on the acreage estimates of crop areas. We have time series of indices contained in satellite images, and thus sets of curves. We propose to segment the space in order to reduce the variability of our initial classes of curves. Then, we reduce the data volume and find a set of meaningful representative functions that characterizes the common behavior of each crop class. This method is close to the extraction of a "structural mean". We compare each unknown curve to a curve of the representative base and allocate each curve to the class of the nearest representative curve. At the last step, we learn the error of the estimates on known data and correct the first estimate by calibration.
Berny, Arnaud. "Apprentissage et optimisation statistiques. Application à la radiotéléphonie mobile." Nantes, 2000. http://www.theses.fr/2000NANT2081.
Roche, Mathieu. "Intégration de la construction de la terminologie de domaines spécialisés dans un processus global de fouille de textes." Paris 11, 2004. http://www.theses.fr/2004PA112330.
Information extraction from specialized texts requires the application of a complete text-mining process. One of the steps of this process is term detection. Terms are defined as groups of words representing a linguistic instance of some user-defined concept. For example, the term "data mining" evokes the concept of "computational technique". Initially, the task of terminology acquisition consists in extracting groups of words instantiating simple syntactic patterns such as Noun-Noun, Adjective-Noun, etc. One specificity of our algorithm is its iterative mode, used to build complex terms. For example, if at the first iteration the Noun-Noun term "data mining" is found, at the following step the term "data-mining application" can be obtained. Moreover, with EXIT (Iterative EXtraction of the Terminology), the expert stands at the center of the terminology extraction process and can intervene throughout. In addition to the iterative aspect of the system, many parameters were added. One of these parameters makes it possible to use various statistical criteria to rank the terms according to their relevance for the task at hand. Our approach was validated on four corpora differing in language, size, and field of specialty. Lastly, a method based on supervised machine learning is proposed in order to improve the quality of the obtained terminology.
Loustau, Sébastien. "Performances statistiques de méthodes à noyaux." PhD thesis, Université de Provence - Aix-Marseille I, 2008. http://tel.archives-ouvertes.fr/tel-00343377.
Regularization methods have proven their worth for solving classification problems, and the Support Vector Machine (SVM) algorithm is today their most popular representative. This thesis first studies the statistical performance of this algorithm and considers the problem of adaptation to the margin and to the complexity. These results are then extended to a new penalized empirical risk minimization procedure over Besov spaces. Finally, the last part focuses on a new model selection procedure: risk hull minimization (RHM). Introduced by L. Cavalier and Y. Golubev in the context of inverse problems, we seek to apply it to the classification setting.
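As context for the penalized procedure mentioned above, penalized empirical risk minimization over a class F takes the generic form below (notation assumed for illustration; SVM corresponds to the hinge loss and a squared RKHS-norm penalty):

```latex
\hat{f} \in \arg\min_{f \in \mathcal{F}}
\left\{ \frac{1}{n} \sum_{i=1}^{n} \ell\big(Y_i, f(X_i)\big) + \lambda \,\mathrm{pen}(f) \right\},
\qquad \ell_{\mathrm{hinge}}(y, u) = \max\big(0,\, 1 - y u\big).
```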
Szafranski, Marie. "Pénalités hiérarchiques pour l'intégration de connaissances dans les modèles statistiques." PhD thesis, Université de Technologie de Compiègne, 2008. http://tel.archives-ouvertes.fr/tel-00369025.
Full textSzafranski, Marie. "Pénalités hiérarchiques pour l'intégration de connaissances dans les modèles statistiques." Compiègne, 2008. http://www.theses.fr/2008COMP1770.
Supervised learning aims at predicting, but also at analyzing or interpreting, an observed phenomenon. Hierarchical penalization is a generic framework for integrating prior information in the fitting of statistical models. This prior information represents the relations shared by the characteristics of a given problem. In this thesis, the characteristics are organized in a two-level tree structure, which defines distinct groups. The assumption is that few (groups of) characteristics are involved in discriminating between observations. Thus, for a learning problem, the goal is to identify the relevant groups of characteristics and, at the same time, the significant characteristics within these groups. An adaptive penalization formulation is used to extract the significant components of each level. We show that the solution to this problem is equivalent to minimizing a problem regularized by a mixed norm. These two approaches have been used to study the convexity and sparsity properties of the method. The latter is derived in parametric and non-parametric function spaces. Experiments on brain-computer interface problems support our approach.
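The mixed norm underlying hierarchical penalization can be illustrated by its classical two-level instance (generic notation, an assumption of this note rather than the thesis's exact formula): with the features partitioned into groups β_g,

```latex
\Omega(\beta) \;=\; \sum_{g=1}^{G} w_g \,\lVert \beta_g \rVert_2
\;=\; \sum_{g=1}^{G} w_g \Big( \sum_{j \in g} \beta_{gj}^{2} \Big)^{1/2},
```

a penalty that drives entire groups to zero while keeping coefficients coupled within each selected group.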
Mathieu, Timothée. "M-estimation and Median of Means applied to statistical learning Robust classification via MOM minimization MONK – outlier-robust mean embedding estimation by median-of-means Excess risk bounds in robust empirical risk minimization." Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPASM002.
The main objective of this thesis is to study methods for robust statistical learning. Traditionally, in statistics, we use models or simplifying assumptions that allow us to represent the real world. However, deviations from these hypotheses can strongly disrupt the statistical analysis of a database. By robust statistics, we mean methods that can handle, on the one hand, so-called abnormal data (sensor errors, human errors) and, on the other hand, data of a highly variable nature. We apply robust techniques to statistical learning, giving theoretical efficiency results for the proposed methods as well as illustrations on simulated and real data.
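A minimal sketch of the median-of-means estimator central to such robust methods, written in Python with NumPy (illustrative only, not the thesis code):

```python
import numpy as np

def median_of_means(x: np.ndarray, n_blocks: int, seed: int = 0) -> float:
    """Median-of-means estimate of the mean of a 1-D sample.

    The sample is shuffled and split into blocks; the median of the
    block means is returned, so a few gross outliers can only spoil
    a minority of blocks.
    """
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(x), n_blocks)
    return float(np.median([b.mean() for b in blocks]))

# Usage: heavy-tailed data plus a handful of corrupted points.
sample = np.concatenate([np.random.standard_t(df=2, size=1000), [1e6] * 5])
print(median_of_means(sample, n_blocks=20))  # near 0, unlike sample.mean()
```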
Gosselin, Philippe-Henri. "Apprentissage interactif pour la recherche par le contenu dans les bases multimédias." Habilitation à diriger des recherches, Université de Cergy Pontoise, 2011. http://tel.archives-ouvertes.fr/tel-00660316.
Full textColin, Igor. "Adaptation des méthodes d’apprentissage aux U-statistiques." Electronic Thesis or Diss., Paris, ENST, 2016. http://www.theses.fr/2016ENST0070.
With the increasing availability of large amounts of data, computational complexity has become a keystone of many machine learning algorithms. Stochastic optimization algorithms and distributed/decentralized methods have been widely studied over the last decade and provide increased scalability for optimizing an empirical risk that is separable in the data sample. Yet, in a wide range of statistical learning problems, the risk is accurately estimated by U-statistics, i.e., functionals of the training data with low variance that take the form of averages over d-tuples. We first tackle the problem of sampling for empirical risk minimization. We show that empirical risks can be replaced by drastically computationally simpler Monte-Carlo estimates based on O(n) terms only, usually referred to as incomplete U-statistics, without damaging the learning rate. We establish uniform deviation results, and numerical examples show that such an approach surpasses more naive subsampling techniques. We then focus on decentralized estimation, where the data sample is distributed over a connected network. We introduce new synchronous and asynchronous randomized gossip algorithms which simultaneously propagate data across the network and maintain local estimates of the U-statistic of interest. We establish convergence rate bounds with explicit data- and network-dependent terms. Finally, we deal with the decentralized optimization of functions that depend on pairs of observations. Similarly to the estimation case, we introduce a method based on concurrent local updates and data propagation. Our theoretical analysis reveals that the proposed algorithms preserve the convergence rate of centralized dual averaging up to an additive bias term. Our simulations illustrate the practical interest of our approach.
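The incomplete U-statistic idea described above can be sketched for a degree-2 kernel as follows, with pairs drawn at random instead of enumerated exhaustively (a Python/NumPy illustration under assumed notation, not the thesis code):

```python
import numpy as np
from itertools import combinations

def complete_u_stat(x, h):
    """Average of h over all n-choose-2 pairs: O(n^2) terms."""
    return float(np.mean([h(x[i], x[j]) for i, j in combinations(range(len(x)), 2)]))

def incomplete_u_stat(x, h, n_pairs, seed=0):
    """Monte-Carlo average over n_pairs random pairs: O(n) terms suffice."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(x), size=(n_pairs, 2))
    idx = idx[idx[:, 0] != idx[:, 1]]  # drop degenerate pairs (i == j)
    return float(np.mean([h(x[i], x[j]) for i, j in idx]))

# Example kernel: h(a, b) = |a - b|, whose mean is the Gini mean difference.
x = np.random.normal(size=500)
h = lambda a, b: abs(a - b)
print(complete_u_stat(x, h), incomplete_u_stat(x, h, n_pairs=500))
```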
Mahler, Nicolas. "Machine learning methods for discrete multi-scale flows: application to finance." PhD thesis, École normale supérieure de Cachan - ENS Cachan, 2012. http://tel.archives-ouvertes.fr/tel-00749717.
Full textChamma, Ahmad. "Statistical interpretation of high-dimensional complex prediction models for biomedical data." Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG028.
Modern large health datasets represent population characteristics in multiple modalities, including brain imaging and socio-demographic data. These large cohorts make it possible to predict and understand individual outcomes, leading to promising results in the epidemiological context of forecasting/predicting the occurrence of diseases, health outcomes, or other events of interest. As data collection expands into different scientific domains, such as brain imaging and genomic analysis, variables are related by complex, possibly non-linear dependencies, along with high degrees of correlation. As a result, popular models such as linear and tree-based techniques are no longer effective in such high-dimensional settings. Powerful non-linear machine learning algorithms, such as Random Forests (RFs) and Deep Neural Networks (DNNs), have become important tools for characterizing inter-individual differences and predicting biomedical outcomes, such as brain age. Explaining the decision process of machine learning algorithms is crucial both to improve the performance of a model and to aid human understanding. This can be achieved by assessing the importance of variables. Traditionally, scientists have favored simple, transparent models such as linear regression, where the importance of variables can be easily measured by coefficients. However, with the use of more advanced methods, direct access to the internal structure has become limited and/or uninterpretable from a human perspective. As a result, these methods are often referred to as "black box" methods. Standard approaches based on Permutation Importance (PI) assess the importance of a variable by measuring the increase in the loss when the variable of interest is replaced by its permuted version. While these approaches increase the transparency of black-box models and provide statistical validity, they can produce unreliable importance assessments when variables are correlated. The goal of this work is to overcome the limitations of standard permutation importance by integrating conditional schemes. We therefore investigate two model-agnostic frameworks, Conditional Permutation Importance (CPI) and Block-Based Conditional Permutation Importance (BCPI), which effectively account for correlations between covariates and overcome the limitations of PI. We present two new algorithms designed to handle situations with correlated variables, whether grouped or ungrouped. Our theoretical and empirical results show that CPI provides computationally efficient and theoretically sound methods for evaluating individual variables. The CPI framework guarantees type-I error control and produces a concise selection of significant variables in large datasets. BCPI presents a strategy for managing both individual and grouped variables. It integrates statistical clustering and uses prior knowledge of grouping to adapt the DNN architecture using stacking techniques. This framework is robust and maintains type-I error control even in scenarios with highly correlated groups of variables, and it performs well on various benchmarks. Empirical evaluations of our methods on several biomedical datasets showed good face validity. Our methods have also been applied to multimodal brain data in addition to socio-demographics, paving the way for new discoveries and advances in the targeted areas. The CPI and BCPI frameworks are proposed as replacements for conventional permutation-based methods. They provide improved interpretability and reliability in estimating variable importance for high-performance machine learning models.
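For reference, the standard (marginal) permutation importance that CPI and BCPI refine can be sketched as below, assuming a scikit-learn-style regressor (an illustrative baseline; the conditional schemes of the thesis additionally permute each variable conditionally on the others):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean loss increase when each column is permuted, breaking its link to y."""
    rng = np.random.default_rng(seed)
    base = mean_squared_error(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        losses = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            losses.append(mean_squared_error(y, model.predict(Xp)))
        importances[j] = np.mean(losses) - base
    return importances

# Toy usage: only the first two covariates actually drive the outcome.
X = np.random.normal(size=(500, 5))
y = X[:, 0] + 2 * X[:, 1] + 0.1 * np.random.normal(size=500)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(permutation_importance(model, X, y).round(3))
```

With correlated columns, this marginal scheme is exactly where the unreliable assessments mentioned above arise, which motivates the conditional variants.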
Mallet, Grégory. "Méthodes statistiques pour la prédiction de température dans les composants hyperfréquences." PhD thesis, INSA de Rouen, 2010. http://tel.archives-ouvertes.fr/tel-00586089.
Full textLouis, Maxime. "Méthodes numériques et statistiques pour l'analyse de trajectoire dans un cadre de géométrie Riemannienne." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS570.
This PhD proposes new Riemannian geometry tools for the analysis of longitudinal observations of neuro-degenerative subjects. First, we propose a numerical scheme to compute the parallel transport along geodesics. This scheme is efficient as long as the co-metric can be computed efficiently. Then, we tackle the issue of Riemannian manifold learning. We provide some minimal theoretical sanity checks to illustrate that the procedure of Riemannian metric estimation can be relevant. We then propose to learn a Riemannian manifold so as to model subjects' progressions as geodesics on this manifold. This allows fast inference, extrapolation, and classification of the subjects.
Löser, Kevin. "Apprentissage non-supervisé de la morphologie des langues à l’aide de modèles bayésiens non-paramétriques." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS203/document.
A crucial issue in statistical natural language processing is sparsity, namely the fact that in a given learning corpus, most linguistic events have low occurrence frequencies, and that an infinite number of structures allowed by a language will never be observed in the corpus. Neural models have already contributed to solving this issue by inferring continuous word representations. These continuous representations allow one to structure the lexicon by inducing semantic or syntactic similarity between words. However, current neural models only partially solve the sparsity issue, because they require a vectorial representation for every word in the lexicon but are unable to infer sensible representations for unseen words. This issue is especially present in morphologically rich languages, where word formation processes yield a proliferation of possible word forms and little overlap between the lexicon observed during model training and the lexicon encountered during its use. Today, many languages besides English are used on the Web, and engineering translation systems that can handle morphologies very different from those of Western European languages has become a major challenge. The goal of this thesis is to develop new statistical models that are able to infer, in an unsupervised fashion, the word formation processes underlying an observed lexicon, in order to produce morphological analyses of new unseen word forms.
Zwald, Laurent. "Performances statistiques d'algorithmes d'apprentissage : "Kernel projection machine" et analyse en composantes principales à noyau." Paris 11, 2005. https://tel.archives-ouvertes.fr/tel-00012011.
This thesis takes place within the framework of statistical learning. It brings contributions to the machine learning community using modern statistical techniques based on progress in the study of empirical processes. The first part investigates the statistical properties of Kernel Principal Component Analysis (KPCA). The behavior of the reconstruction error is studied from a non-asymptotic point of view, and concentration inequalities for the eigenvalues of the kernel matrix are provided. All these results correspond to fast convergence rates. Non-asymptotic results concerning the eigenspaces of KPCA themselves are also provided. A new classification algorithm is designed in the second part: the Kernel Projection Machine (KPM), inspired by Support Vector Machines (SVM). It highlights that the selection of a vector space by a dimensionality reduction method such as KPCA provides suitable regularization. The choice of the vector space involved in the KPM is guided by statistical studies of model selection using penalized minimization of the empirical loss. This regularization procedure is intimately connected with the finite-dimensional projections studied in the statistical work of Birgé and Massart. The performances of KPM and SVM are then compared on some data sets. Each topic tackled in this thesis raises new questions.
Korba, Anna. "Learning from ranking data : theory and methods." Electronic Thesis or Diss., Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLT009.
Ranking data, i.e., ordered lists of items, naturally appear in a wide variety of situations, especially when the data come from human activities (ballots in political elections, survey answers, competition results) or in modern applications of data processing (search engines, recommendation systems). The design of machine learning algorithms tailored for these data is thus crucial. However, due to the absence of any vectorial structure on the space of rankings, and its explosive cardinality when the number of items increases, most of the classical methods from statistics and multivariate analysis cannot be applied in a direct manner, and a vast majority of the literature relies on parametric models. In this thesis, we propose a non-parametric theory and methods for ranking data. Our analysis relies heavily on two main tricks. The first one is the extensive use of the Kendall's tau distance, which decomposes rankings into pairwise comparisons. This enables us to analyze distributions over rankings through their pairwise marginals and through a specific assumption called transitivity, which prevents cycles in the preferences from happening. The second one is the extensive use of embeddings tailored to ranking data, mapping rankings to a vector space. Three different problems, unsupervised and supervised, have been addressed in this context: ranking aggregation, dimensionality reduction, and predicting rankings with features. The first part of this thesis focuses on the ranking aggregation problem, where the goal is to summarize a dataset of rankings by a consensus ranking. Among the many ways to state this problem stands out the Kemeny aggregation method, whose solutions have been shown to satisfy many desirable properties but can be NP-hard to compute. In this work, we have investigated the hardness of this problem in two ways. Firstly, we proposed a method to upper bound the Kendall's tau distance between any consensus candidate (typically the output of a tractable procedure) and a Kemeny consensus, on any dataset. Then, we cast the ranking aggregation problem in a rigorous statistical framework, reformulating it in terms of ranking distributions, and assessed the generalization ability of the empirical Kemeny consensus. The second part of this thesis is dedicated to machine learning problems which are shown to be closely related to ranking aggregation. The first one is dimensionality reduction for ranking data, for which we propose a mass-transportation approach to approximate any distribution on rankings by a distribution exhibiting a specific type of sparsity. The second one is the problem of predicting rankings with features, for which we investigated several methods. Our first proposal is to adapt piecewise constant methods to this problem, partitioning the feature space into regions and locally assigning a final label (a consensus ranking) to each region. Our second proposal is a structured prediction approach, relying on embedding maps for ranking data that enjoy theoretical and computational advantages.
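The Kendall's tau distance on which this analysis relies counts discordant pairs; a direct O(n²) Python sketch (illustrative; O(n log n) implementations exist):

```python
from itertools import combinations

def kendall_tau_distance(sigma, tau):
    """Number of item pairs ordered oppositely by the two rankings.

    sigma and tau give the rank of each item, e.g. sigma[i] is the
    rank of item i; the distance decomposes rankings into pairwise
    comparisons, as described above.
    """
    return sum(
        1
        for i, j in combinations(range(len(sigma)), 2)
        if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) < 0
    )

print(kendall_tau_distance([0, 1, 2, 3], [0, 1, 2, 3]))  # 0: identical rankings
print(kendall_tau_distance([0, 1, 2, 3], [3, 2, 1, 0]))  # 6: all pairs discordant
```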
Cornec, Matthieu. "Inégalités probabilistes pour l'estimateur de validation croisée dans le cadre de l'apprentissage statistique et Modèles statistiques appliqués à l'économie et à la finance." PhD thesis, Université de Nanterre - Paris X, 2009. http://tel.archives-ouvertes.fr/tel-00530876.
Full textAllain, Guillaume. "Prévision et analyse du trafic routier par des méthodes statistiques." Toulouse 3, 2008. http://thesesups.ups-tlse.fr/351/.
The industrial partner of this work is Mediamobile/V-trafic, a company which processes and broadcasts live road-traffic information. The goal of our work is to enhance traffic information with forecasting and spatial extension. Our approach is sometimes inspired by physical modelling of traffic dynamics, but it mainly uses statistical methods in order to propose self-organising and modular models suitable for industrial constraints. In the first part of this work, we describe a method to forecast traffic speed within a time frame of a few minutes up to several hours. Our method is based on the assumption that traffic on a road network can be summarized by a few typical profiles, linked to the users' periodic behaviors. We therefore make the assumption that the observed speed curves at each point of the network stem from a probabilistic mixture model. The following parts of our work present how we refine the general method. Medium-term forecasting uses variables built from the calendar; the mixture model still stands, and we additionally use a functional regression model to forecast speed curves. We then introduce a local regression model in order to simulate short-term traffic dynamics. The kernel function is built from real speed observations, and we integrate some knowledge about traffic dynamics. The last part of our work focuses on the analysis of speed data from vehicles in traffic. These observations are gathered sporadically in time and along the road segments. The resulting data are completed and smoothed by local polynomial regression.
Wang, Xuanzhou. "Détermination de classes de modalités de dégradation significatives pour le pronostic et la maintenance." Thesis, Troyes, 2013. http://www.theses.fr/2013TROY0022/document.
The work presented in this thesis deals with the problem of determining classes of systems according to their aging mode, with the aim of preventing failures and making maintenance decisions. The evolution of the observed deterioration levels of a system can be modeled by a parameterized stochastic process; a commonly used model is the Gamma process. We are interested in the case where the systems do not all age identically and the aging mode depends on the conditions of usage or on system properties, called the covariates. We then aim to group the systems that age similarly, taking the covariates into account, and to identify the parameters of the model associated with each class. The problem is first presented together with its constraints: irregular observation times, an arbitrary number of observations per degradation path, and the need to account for the covariates. Methods are then proposed. They combine a likelihood criterion in the space of the increments of deterioration levels and a coherence criterion in the space of the covariates. A normalization technique is introduced to control the relative importance of these two criteria. Experimental studies are performed to illustrate the effectiveness of the proposed methods.
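For context, the Gamma process commonly used for such degradation data has independent Gamma-distributed increments; in generic notation (assumed here, not quoted from the thesis):

```latex
% X(t): degradation level at time t, X(0) = 0;
% alpha (t - s): shape of the increment; beta: rate parameter.
X(t) - X(s) \sim \mathrm{Gamma}\big(\alpha\,(t - s),\; \beta\big),
\qquad 0 \le s < t,
```

so irregular observation times simply change the shape parameter of each increment, which is what makes a likelihood criterion over increments tractable.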
Chiapino, Maël. "Apprentissage de structures dans les valeurs extrêmes en grande dimension." Electronic Thesis or Diss., Paris, ENST, 2018. http://www.theses.fr/2018ENST0035.
We present and study unsupervised learning methods for multivariate extreme phenomena in high dimension. Considering a random vector whose marginals are all heavy-tailed, the study of its behavior in extreme regions is no longer possible via the usual methods, which involve finite means and variances. Multivariate extreme value theory provides a framework adapted to this study; in particular, it gives a theoretical basis for dimension reduction through the angular measure. The thesis is divided into two main parts: (i) reduce the dimension by finding a simplified dependence structure in extreme regions, a step which aims at recovering subgroups of features that are likely to exceed large thresholds simultaneously; and (ii) model the angular measure with a mixture distribution that follows a predefined dependence structure. These steps allow us to develop new clustering methods for extreme points in high dimension.
Guillouet, Brendan. "Apprentissage statistique : application au trafic routier à partir de données structurées et aux données massives." Thesis, Toulouse 3, 2016. http://www.theses.fr/2016TOU30205/document.
This thesis focuses on machine learning techniques for application to big data. We first consider trajectories defined as sequences of geolocalized data. A hierarchical clustering is then applied with a new distance between trajectories (the Symmetrized Segment-Path Distance), producing groups of trajectories which are then modeled with Gaussian mixtures in order to describe individual movements. This modeling can be used in a generic way to address the following road-traffic problems: final-destination, trip-time, and next-location prediction. These examples show that our model can be applied to different traffic environments and that, once learned, it can be applied to trajectories whose spatial and temporal characteristics are different. We also compare different technologies which enable the application of machine learning methods to massive volumes of data.
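The clustering step described above, hierarchical clustering on a precomputed trajectory distance, can be sketched with SciPy as follows (the distance values are hypothetical placeholders; computing the actual SSPD distance is a separate step):

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# D: symmetric matrix of pairwise trajectory distances (e.g. SSPD),
# computed elsewhere; hypothetical values for three trajectories.
D = np.array([[0.0, 1.2, 5.0],
              [1.2, 0.0, 4.8],
              [5.0, 4.8, 0.0]])

Z = linkage(squareform(D), method="average")     # agglomerative clustering
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into 2 groups
print(labels)  # the two nearby trajectories end up in the same cluster
```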
Azzi, Soumaya. "Surrogate modeling of stochastic simulators." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT009.
This thesis is a contribution to surrogate modeling and sensitivity analysis for stochastic simulators. Stochastic simulators are a particular type of computational model: they inherently contain sources of randomness and are generally computationally prohibitive. To overcome this limitation, this manuscript proposes a method to build a surrogate model for stochastic simulators based on the Karhunen-Loève expansion. This thesis also aims to perform sensitivity analysis on such computational models. This analysis consists in quantifying the influence of the input variables on the output of the model. Here, the stochastic simulator is represented by a stochastic process, and the sensitivity analysis is then performed on the differential entropy of this process. The proposed methods are applied to a stochastic simulator assessing a population's exposure to radio-frequency waves in a city. Randomness is an intrinsic characteristic of the stochastic city generator: a given set of city parameters (e.g., street width, building height, and anisotropy) does not define a unique city. The context of the electromagnetic dosimetry case study is presented, a surrogate model is built, and the sensitivity analysis is then performed using the proposed method.
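The Karhunen-Loève expansion on which the surrogate is built represents the simulator's random output as a series in uncorrelated random variables; schematically, in generic notation (an assumption of this note):

```latex
% mu: mean function; (lambda_i, varphi_i): eigenpairs of the covariance
% operator; xi_i: zero-mean, unit-variance, uncorrelated random variables.
Y(t, \omega) = \mu(t) + \sum_{i=1}^{\infty} \sqrt{\lambda_i}\;\xi_i(\omega)\,\varphi_i(t),
```

truncated after a few terms in practice, which is what makes the surrogate cheap to evaluate.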
Schreuder, Nicolas. "A study of some trade-offs in statistical learning : online learning, generative models and fairness." Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAG004.
Machine learning algorithms are celebrated for their impressive performance on many tasks that we thought were dedicated to human minds, from handwritten digit recognition (LeCun et al. 1990) to cancer prognosis (Kourou et al. 2015). Nevertheless, as machine learning becomes more and more ubiquitous in our daily lives, there is a growing need to precisely understand their behaviours and their limits. Statistical learning theory is the branch of machine learning which aims at providing a powerful modelling formalism for inference problems as well as a better understanding of the statistical properties of learning algorithms. Importantly, statistical learning theory allows one to (i) get a better understanding of the cases in which an algorithm performs well, (ii) quantify trade-offs inherent to learning for better-informed algorithmic choices, and (iii) provide insights to develop new algorithms which will eventually outperform existing ones or tackle new tasks. Relying on the statistical learning framework, this thesis presents contributions related to three different learning problems: online learning, learning generative models and, finally, fair learning. In the online learning setup, in which the sample size is not known in advance, we provide general anytime deviation bounds (or confidence intervals) whose width has the rate given by the Law of the Iterated Logarithm, for a general class of convex M-estimators comprising the mean, the median, quantiles, and Huber's M-estimators. Regarding generative models, we propose a convenient framework for studying adversarial generative models (Goodfellow et al. 2014) from a statistical perspective, to assess the impact of (eventual) low intrinsic dimensionality of the data on the error of the generative model. In our framework, we establish non-asymptotic risk bounds for the Empirical Risk Minimizer (ERM). Finally, our work on fair learning consists in a broad study of the Demographic Parity (DP) constraint, a popular constraint in the fair learning literature. DP essentially constrains predictors so that groups defined by a sensitive attribute (e.g., gender or ethnicity) are "treated the same". In particular, we propose a statistical minimax framework to precisely quantify the cost in risk of introducing this constraint in the regression setting.
Debèse, Nathalie. "Recalage de la navigation par apprentissage sur les données bathymétriques." Compiègne, 1992. http://www.theses.fr/1992COMPD538.
Full textJouvet, Denis. "Reconnaissance de mots connectes indépendamment du locuteur par des méthodes statistiques." Paris, ENST, 1988. http://www.theses.fr/1988ENST0006.
Full textLefort, Tanguy. "Label ambiguity in crowdsourcing for classification and expert feedback." Electronic Thesis or Diss., Université de Montpellier (2022-....), 2024. http://www.theses.fr/2024UMONS020.
While classification datasets are composed of more and more data, the need for human expertise to label them is still present. Crowdsourcing platforms are a way to gather expert feedback at a low cost. However, the quality of these labels is not always guaranteed. In this thesis, we focus on the problem of label ambiguity in crowdsourcing. Label ambiguity has mostly two sources: the worker's ability and the task's difficulty. We first present a new indicator, the WAUM (Weighted Area Under the Margin), to detect ambiguous tasks given to workers. Based on the existing AUM in the classical supervised setting, this lets us explore large datasets while focusing on tasks that might require more relevant expertise or should be discarded from the dataset. We then present a new open-source Python library, PeerAnnot, that we developed to handle crowdsourced datasets in image classification. We created a benchmark in the Benchopt library to evaluate our label aggregation strategies with more reproducible results. Finally, we present a case study on the Pl@ntNet dataset, where we evaluate the current state of the platform's label aggregation strategy and propose ways to improve it. This setting, with a large number of tasks, experts, and classes, is highly challenging for current crowdsourcing aggregation strategies. We report consistently better performance than competitors and propose a new aggregation strategy that could be used in the future to improve the quality of the Pl@ntNet dataset. We also release this large dataset of expert feedback, which could be used to improve the quality of current aggregation methods and provide a new benchmark.
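As a baseline for the aggregation strategies benchmarked above, plain majority voting can be sketched as follows (an illustration only; PeerAnnot's actual API is not assumed):

```python
from collections import Counter

def majority_vote(votes):
    """Aggregate crowdsourced labels task by task.

    votes maps task_id -> {worker_id: label}. Every worker counts
    equally and ties go to the smallest label -- exactly the naive
    behaviour that ability- and difficulty-aware strategies improve.
    """
    aggregated = {}
    for task, worker_labels in votes.items():
        counts = Counter(worker_labels.values())
        aggregated[task] = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))[0]
    return aggregated

votes = {0: {0: 2, 1: 2, 2: 1}, 1: {0: 0, 1: 1}}
print(majority_vote(votes))  # {0: 2, 1: 0} (tie on task 1 broken toward 0)
```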
Chesneau, Nicolas. "Learning to Recognize Actions with Weak Supervision." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM007/document.
With the rapid growth of digital video content, automatic video understanding has become an increasingly important task. Video understanding spans several applications such as web-video content analysis, autonomous vehicles, and human-machine interfaces (e.g., Kinect). This thesis makes contributions addressing two major problems in video understanding: webly-supervised action detection and human action localization. Webly-supervised action recognition aims to learn actions from video content on the internet, with no additional supervision. We propose a novel approach in this context, which leverages the synergy between visual video data and the associated textual metadata, to learn event classifiers with no manual annotations. Specifically, we first collect a video dataset with queries constructed automatically from textual descriptions of events, prune irrelevant videos with text and video data, and then learn the corresponding event classifiers. We show the importance of both main steps of our method, i.e., query generation and data pruning, with quantitative results. We evaluate this approach in the challenging setting where no manually annotated training set is available, i.e., EK0 in the TrecVid challenge, and show state-of-the-art results on the MED 2011 and 2013 datasets. In the second part of the thesis, we focus on human action localization, which involves recognizing actions that occur in a video, such as "drinking" or "phoning", as well as their spatial and temporal extent. We propose a new person-centric framework for action localization that tracks people in videos and extracts full-body human tubes, i.e., spatio-temporal regions localizing actions, even in the case of occlusions or truncations. The motivation is two-fold. First, it allows us to handle occlusions and camera viewpoint changes when localizing people, as it infers full-body localization. Second, it provides a better reference grid for extracting action information than standard human tubes, i.e., tubes which frame visible parts only. This is achieved by training a novel human part detector that scores visible parts while regressing full-body bounding boxes, even when they lie outside the frame. The core of our method is a convolutional neural network which learns part proposals specific to certain body parts. These are then combined to detect people robustly in each frame. Our tracking algorithm connects the image detections temporally to extract full-body human tubes. We evaluate our new tube extraction method on a recent challenging dataset, DALY, showing state-of-the-art results.
Lacombe, Théo. "Statistiques sur les descripteurs topologiques à base de transport optimal." Thesis, Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAX036.
Topological data analysis (TDA) allows one to extract rich information from structured data (such as graphs or time series) that occur in modern machine learning problems. This information is represented by descriptors such as persistence diagrams, which can be described as point measures supported on a half-plane. While persistence diagrams are not elements of a vector space, they can still be compared using partial matching metrics. The similarities between these metrics and those routinely used in optimal transport, another field of mathematics, have long been known, but a formal connection between the two fields was still missing. The purpose of this thesis is to clarify this connection and to develop new theoretical and computational tools to manipulate persistence diagrams, targeting statistical applications. First, we show how optimal partial transport with boundary, a variation of classic optimal transport theory, provides a formalism that encompasses standard metrics in TDA. We then showcase the benefits of this connection in different situations: a theoretical study and an algorithm for fast estimation of barycenters of persistence diagrams, the characterization of continuous linear representations of persistence diagrams and how to learn such representations using a neural network, and finally a stability result in the context of linearly averaging random persistence diagrams.
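The partial-matching metrics alluded to above can be written, for persistence diagrams D1 and D2, as a Wasserstein-type cost in which points may be matched to the diagonal Δ (generic notation, assumed for illustration):

```latex
d_p(D_1, D_2) = \Big( \inf_{\gamma \in \Gamma(D_1, D_2)}
  \sum_{x \in D_1 \cup \Delta} \lVert x - \gamma(x) \rVert^p \Big)^{1/p},
```

where Γ(D_1, D_2) denotes the bijections between D_1 ∪ Δ and D_2 ∪ Δ; the diagonal plays exactly the role of the boundary in optimal partial transport, which is the connection the thesis formalizes.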
Yahiaoui, Meriem. "Modèles statistiques avancés pour la segmentation non supervisée des images dégradées de l'iris." Electronic Thesis or Diss., Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLL006.
The iris is considered one of the most robust and efficient modalities in biometrics because of its low error rates. These performances were observed in controlled situations, which impose constraints during acquisition in order to obtain good-quality images. Relaxing these constraints, at least partially, degrades the quality of the acquired images and therefore the performance of these systems. One of the main solutions proposed in the literature to address these limits is a robust approach to iris segmentation. The main objective of this thesis is to propose original methods for the segmentation of degraded iris images. Markov chains have proven well suited to image segmentation problems. In this context, a feasibility study of unsupervised region-based segmentation of degraded iris images by Markov chains was performed. Different image transformations and different segmentation methods for parameter initialization were studied and compared. The optimal modeling was inserted into an iris recognition system (with grayscale images) for comparison with existing methods. Finally, an extension of the modeling based on hidden Markov chains was developed in order to achieve unsupervised segmentation of iris images acquired in visible light.
Boulfani, Fériel. "Caractérisation du comportement de systèmes électriques aéronautiques à partir d'analyses statistiques." Thesis, Toulouse 1, 2021. http://publications.ut-capitole.fr/43780/.
The characterization of electrical systems is an essential task in aeronautical design. It consists in particular of sizing the electrical components, defining maintenance frequencies, and finding the root causes of aircraft failures. Nowadays, these computations are made using electrical engineering theory and simulated physical models. The aim of this thesis is to use statistical approaches based on flight data and machine learning models to characterize the behavior of aeronautic electrical systems. In the first part, we estimate the maximal electrical consumption that the generator must supply, in order to optimize the generator size and better understand its real margin. Using extreme value theory, we estimate quantiles and compare them to the theoretical values computed by electrical engineers. In the second part, we compare different regularized procedures to predict the oil temperature of a generator in a functional data framework. In particular, this study makes it possible to understand the generator's behavior under extreme conditions that could not be reproduced physically. Finally, in the last part, we develop a predictive maintenance model that detects the abnormal behavior of a generator in order to anticipate failures. This model is based on variants of "Invariant Coordinate Selection" adapted to functional data.
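The extreme-quantile estimation mentioned in the first part is typically carried out with a peaks-over-threshold approach, fitting a generalized Pareto distribution to exceedances above a high threshold u; in standard notation (an assumption of this note, not the thesis's exact formula):

```latex
% n: sample size; N_u: number of exceedances above u;
% hat sigma, hat xi: fitted GPD scale and shape parameters.
\hat{q}_p = u + \frac{\hat{\sigma}}{\hat{\xi}}
  \left[ \left( \frac{n}{N_u}\,(1 - p) \right)^{-\hat{\xi}} - 1 \right],
\qquad p \to 1.
```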
Barreyre, Clementine. "Statistiques en grande dimension pour la détection d'anomalies dans les données fonctionnelles issues des satellites." Thesis, Toulouse, INSA, 2018. http://www.theses.fr/2018ISAT0009/document.
In this PhD, we have developed statistical methods to detect abnormal events in the functional data produced by a satellite throughout its lifecycle. The data we deal with come from two main phases in the satellite's life: telemetries and test data. A first part of this thesis was to understand how to highlight outliers thanks to projections onto functional bases. On these projections, we also applied several outlier detection methods, such as the One-Class SVM and the Local Outlier Factor (LOF). In addition to these two methods, we developed our own outlier detection method, taking into account the seasonality of the data we consider. Based on this study, we developed an original procedure to automatically select the most interesting coefficients in a semi-supervised framework for outlier detection, from a given projection. Our method is a multiple testing procedure where we apply a two-sample test to all levels of coefficients. We also chose to analyze the covariance matrices representing the covariance of the telemetries between themselves for outlier detection in multivariate data. For this purpose, we compare the covariances of a cluster of several telemetries deriving from two consecutive days, or from consecutive orbit periods. We applied three statistical tests targeting this same issue with different approaches, and also developed an original asymptotic test inspired by the first two. In addition to the proof of the convergence of this test, we demonstrate through examples that this new test is the most powerful. In this PhD, we have thus tackled several aspects of anomaly detection in functional data deriving from satellites. With each of these methods, we detected all the major anomalies, significantly improving the false discovery rate.
Sfikas, Giorgos. "Modèles statistiques non linéaires pour l'analyse de formes : application à l'imagerie cérébrale." PhD thesis, Université de Strasbourg, 2012. http://tel.archives-ouvertes.fr/tel-00789793.
Full textHarrison, Josquin. "Imagerie médicale, formes et statistiques pour la prédiction du risque d'accident vasculaire cérébral dans le cadre de la fibrillation atriale." Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4027.
Atrial Fibrillation (AF) is a complex heart disease of epidemic proportions. It is characterized by chaotic electrical activation, which creates a haemodynamic environment prone to clot formation and an increased risk of ischemic stroke. Although treatments and interventions exist to reduce stroke incidence, they often imply an increased risk of other complications or consist of invasive procedures. As such, stratifying stroke risk in AF is of crucial importance for clinical decision-making. However, current widely used risk scores rely only on basic patient information and show poor performance. Importantly, no known markers reflect the mechanistic process of stroke, all the while more and more patient data are routinely available. In parallel, many clinical experts have hypothesized that the Left Atrium (LA) plays an important role in stroke occurrence, yet have relied only on subjective measures to verify it. In this study, we aim at taking advantage of the evolving imaging stratification of patients to substantiate this claim. Linking the anatomy of the LA to the risk of stroke translates directly into a geometric problem. Thankfully, the study and analysis of shapes has a long mathematical history, in theory as well as in application, of which we can take full advantage. We first walk through the many facets of shape analysis, to realise that, while powerful, global methods lack clinically meaningful interpretations. We then set out to use these general tools to build a compact representation specific to the LA, enabling a more interpretable study. This first attempt allows us to identify key requirements for a realistic solution to the study of the LA; among them, any tool we build must be fast and robust enough for potentially large and prospective studies. Since the computational crux of our initial pipeline lies in the semantic segmentation of the anatomical parts of the LA, we focus on neural networks specifically designed for surfaces to accelerate this step. In particular, we show that representing input shapes using principal curvature is a better choice than what is currently used, regardless of the architecture. As we iteratively update our pipeline, we build on the semantic segmentation and the compact representation by proposing a set of expressive geometric features describing the LA, well in line with clinicians' expectations while offering the possibility of robust quantitative analysis. We make use of these local features to shed light on the complex relations between LA shape and stroke incidence, by conducting statistical analysis and classification using decision-tree-based methods. The results yield valuable insights for stroke prediction: a list of shape features directly linked to stroke patients; features that explain important indicators of haemodynamic disorder; and a better understanding of the impact of AF-state-related LA remodelling. Finally, we discuss other possible uses of the tools developed in this work, from larger cohort studies to integration into multi-modal models, as well as opening possibilities for precise sensitivity analysis of haemodynamic simulations, a valuable next step toward better understanding the mechanistic process of stroke.
Cherief-Abdellatif, Badr-Eddine. "Contributions to the theoretical study of variational inference and robustness." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAG001.
This PhD thesis deals with variational inference and robustness. More precisely, it focuses on the statistical properties of variational approximations and the design of efficient algorithms for computing them in an online fashion, and it investigates Maximum Mean Discrepancy based estimators as learning rules that are robust to model misspecification. In recent years, variational inference has been extensively studied from the computational viewpoint, but until very recently little attention had been paid in the literature to the theoretical properties of variational approximations. In this thesis, we investigate the consistency of variational approximations in various statistical models and the conditions that ensure it. In particular, we tackle the special cases of mixture models and deep neural networks. We also justify in theory the use of the ELBO maximization strategy, a model selection criterion that is widely used in the Variational Bayes community and is known to work well in practice. Moreover, Bayesian inference provides an attractive online-learning framework to analyze sequential data, and offers generalization guarantees which hold even under model mismatch and with adversaries. Unfortunately, exact Bayesian inference is rarely feasible in practice and approximation methods are usually employed, but do such methods preserve the generalization properties of Bayesian inference? In this thesis, we show that this is indeed the case for some variational inference algorithms. We propose new online, tempered variational algorithms and derive their generalization bounds. Our theoretical result relies on the convexity of the variational objective, but we argue that it should hold more generally and present empirical evidence in support of this. Our work presents theoretical justifications in favor of online algorithms that rely on approximate Bayesian methods. Another point addressed in this thesis is the design of a universal estimation procedure. This question is of major interest, in particular because it leads to robust estimators, a very active topic in statistics and machine learning. We tackle the problem of universal estimation using a minimum distance estimator based on the Maximum Mean Discrepancy. We show that the estimator is robust both to dependence and to the presence of outliers in the dataset. We also highlight the connections that may exist with minimum distance estimators based on the L2 distance. Finally, we provide a theoretical study of the stochastic gradient descent algorithm used to compute the estimator, and we support our findings with numerical simulations. We also propose a Bayesian version of our estimator, which we study from both theoretical and computational points of view.
Yahiaoui, Meriem. "Modèles statistiques avancés pour la segmentation non supervisée des images dégradées de l'iris." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLL006/document.
Full textThe iris is considered one of the most robust and efficient modalities in biometrics because of its low error rates. These performances, however, were observed in controlled situations, where constraints are imposed during acquisition to obtain good-quality images. Relaxing these constraints, at least partially, degrades the quality of the acquired images and therefore the performance of these systems. One of the main solutions proposed in the literature to address these limits is a robust approach to iris segmentation. The main objective of this thesis is to propose original methods for the segmentation of degraded iris images. Markov chains have been widely used to solve image segmentation problems. In this context, a feasibility study of unsupervised region-based segmentation of degraded iris images by Markov chains was performed. Different image transformations and different segmentation methods for parameter initialization were studied and compared. The best-performing model was integrated into an iris recognition system (on grayscale images) for comparison with existing methods. Finally, an extension of the model based on hidden Markov chains was developed in order to perform unsupervised segmentation of iris images acquired in visible light.
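A toy version of this unsupervised segmentation idea can be sketched with a two-state Gaussian hidden Markov model on a one-dimensional intensity profile. This sketch assumes the third-party hmmlearn package and synthetic data; the thesis's models and initialization schemes are considerably richer:

import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(1)

# Synthetic "scan line": dark iris pixels vs brighter background, with noise.
signal = np.concatenate([rng.normal(0.2, 0.05, 120),   # iris region
                         rng.normal(0.7, 0.08, 200),   # background
                         rng.normal(0.2, 0.05, 80)])   # iris again
X = signal.reshape(-1, 1)

# Two hidden states with Gaussian emissions; EM estimates all parameters
# without any labelled pixels (unsupervised segmentation).
hmm = GaussianHMM(n_components=2, covariance_type="diag", n_iter=100,
                  random_state=0)
hmm.fit(X)
labels = hmm.predict(X)  # most likely state sequence (Viterbi decoding)
print(np.unique(labels, return_counts=True))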
Depersin, Jules. "Statistical and Computational Complexities of Robust and High-Dimensional Estimation Problems." Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAG009.
Full textStatistical learning theory aims at providing a better understanding of the statistical properties of learning algorithms. These properties are often derived under the assumption that the underlying data are gathered by sampling independent and identically distributed Gaussian (or subgaussian) random variables; they can thus be drastically affected by the presence of gross errors (also called "outliers") in the data, or by the data being heavy-tailed. We are interested in procedures that retain good properties even when part of the data is corrupted or heavy-tailed, procedures that we call robust and that we often obtain in this thesis via the Median-of-Means heuristic. We are especially interested in procedures that are robust in high-dimensional set-ups, and we study (i) how dimensionality affects the statistical properties of robust procedures, and (ii) how dimensionality affects the computational complexity of the associated algorithms. In the study of the statistical properties (i), we find that for a large range of problems, the statistical complexity of a problem and its "robustness" can in a sense be "decoupled", leading to bounds where the dimension-dependent term is added to the term that depends on the corruption, rather than multiplied by it. We propose ways of measuring the statistical complexities of some problems in this corrupted framework, using for instance the VC-dimension, and we provide lower bounds for some of those problems. In the study of the computational complexity of the associated algorithms (ii), we show that in two special cases, namely robust mean estimation with respect to the Euclidean norm and robust regression, the associated optimization problems, which become exponentially hard with the dimension, can be relaxed to obtain tractable algorithms whose complexity is polynomial in the dimension.
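The Median-of-Means heuristic mentioned above is simple to state: split the sample into blocks, average within each block, and take the median of the block means. A minimal numpy sketch on a heavy-tailed, corrupted sample (block count and sample sizes are arbitrary):

import numpy as np

def median_of_means(x, n_blocks=10, rng=None):
    # Shuffle the data, split it into blocks, average each block,
    # and return the median of the block means.
    rng = np.random.default_rng(rng)
    x = rng.permutation(x)
    blocks = np.array_split(x, n_blocks)
    return np.median([b.mean() for b in blocks])

rng = np.random.default_rng(0)
# Heavy-tailed sample (Student t with 2 degrees of freedom) plus outliers.
x = np.concatenate([rng.standard_t(df=2, size=1000), np.full(5, 1e6)])
print(x.mean())                 # ruined by the outliers
print(median_of_means(x, 30))   # close to the true mean, 0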
Raja, Suleiman Raja Fazliza. "Méthodes de detection robustes avec apprentissage de dictionnaires. Applications à des données hyperspectrales." Thesis, Nice, 2014. http://www.theses.fr/2014NICE4121/document.
Full textThis Ph.D. dissertation deals with a "one among many" detection problem, in which one has to discriminate between pure noise under H0 and one among L known alternatives under H1. This work focuses on the study and implementation of robust reduced-dimension detection tests using optimized dictionaries; these detection methods are built on the Generalized Likelihood Ratio test. The proposed approaches are assessed mainly on hyperspectral data. The first part presents several technical topics related to the framework of this dissertation. The second part highlights the theoretical and algorithmic aspects of the proposed methods. Two issues linked to the large number of alternatives arise in this framework, and we propose dictionary learning techniques based on a robust criterion that minimizes the maximum power loss (a minimax criterion). In the case where the learned dictionary has K = 1 column, we show that the exact solution can be obtained; for the case K > 1, we propose three minimax learning algorithms. Finally, the third part of the manuscript presents several applications, chief among them astrophysical hyperspectral data from the Multi Unit Spectroscopic Explorer instrument. Numerical results show that the proposed algorithms are robust and that, for K > 1, they improve the minimax detection performance over the K = 1 case. Other possible applications, such as worst-case recognition of faces and handwritten digits, are also presented.
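The underlying test statistic is easy to sketch: for unit-norm alternatives with unknown amplitude, the GLR reduces to the maximum squared correlation between the observation and the dictionary columns. A minimal sketch with a random (unlearned) dictionary, purely for illustration:

import numpy as np

rng = np.random.default_rng(2)

n, L = 128, 40
# Dictionary of L known unit-norm alternatives (here, random profiles).
S = rng.standard_normal((n, L))
S /= np.linalg.norm(S, axis=0)

def glr_statistic(y, S):
    # GLR for "pure noise vs one among L known signals" with unknown
    # amplitude: maximise the matched-filter energy over the columns.
    correlations = S.T @ y
    return np.max(correlations ** 2)

sigma = 1.0
y0 = sigma * rng.standard_normal(n)                   # H0: noise only
y1 = 3.0 * S[:, 7] + sigma * rng.standard_normal(n)   # H1: 8th alternative
print(glr_statistic(y0, S), glr_statistic(y1, S))     # H1 value is far larger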
Zwald, Laurent. "PERFORMANCES STATISTIQUES D'ALGORITHMES D'APPRENTISSAGE : 'KERNEL PROJECTION MACHINE' ET ANALYSE EN COMPOSANTES PRINCIPALES A NOYAU." Phd thesis, Université Paris Sud - Paris XI, 2005. http://tel.archives-ouvertes.fr/tel-00012011.
Full textdes contributions à la communauté du machine learning en utilisant des
techniques de statistiques modernes basées sur des avancées dans l'étude
des processus empiriques. Dans une première partie, les propriétés statistiques de
l'analyse en composantes principales à noyau (KPCA) sont explorées. Le
comportement de l'erreur de reconstruction est étudié avec un point de vue
non-asymptotique et des inégalités de concentration des valeurs propres de la matrice de
Gram sont données. Tous ces résultats impliquent des vitesses de
convergence rapides. Des propriétés
non-asymptotiques concernant les espaces propres de la KPCA eux-mêmes sont également
proposées. Dans une deuxième partie, un nouvel
algorithme de classification a été
conçu : la Kernel Projection Machine (KPM).
Tout en s'inspirant des Support Vector Machines (SVM), il met en lumière que la sélection d'un espace vectoriel par une méthode de
réduction de la dimension telle que la KPCA régularise
convenablement. Le choix de l'espace vectoriel utilisé par la KPM est guidé par des études statistiques de sélection de modéle par minimisation pénalisée de la perte empirique. Ce
principe de régularisation est étroitement relié à la projection fini-dimensionnelle étudiée dans les travaux statistiques de
Birgé et Massart. Les performances de la KPM et de la SVM sont ensuite comparées sur différents jeux de données. Chaque thème abordé dans cette thèse soulève de nouvelles questions d'ordre théorique et pratique.
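A rough analogue of the KPM pipeline, projecting the data onto the first D kernel principal components and fitting a simple classifier in that subspace, can be sketched with scikit-learn. Here hold-out accuracy stands in for the penalized model-selection criterion of the thesis, and all parameter values are arbitrary:

import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# KPM-like pipeline: the projection dimension D acts as the regularizer.
for D in (1, 2, 5, 10, 20):
    kpca = KernelPCA(n_components=D, kernel="rbf", gamma=2.0)
    Z_tr = kpca.fit_transform(X_tr)
    Z_te = kpca.transform(X_te)
    clf = LogisticRegression().fit(Z_tr, y_tr)
    print(D, round(clf.score(Z_te, y_te), 3))  # pick D with the best score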
Salomon, Antoine. "Apprentissage stratégique statistique." Paris 13, 2010. http://www.theses.fr/2010PA132039.
Full textThis thesis studies strategic interaction between several agents facing an exploration vs. exploitation dilemma. In game theory, this situation is well described by models of bandit games. Each player faces a two-armed bandit machine, one arm being safe, the other risky. At each stage of the game, each player decides which arm to use. If he chooses the risky arm (exploration), he gets a random payoff which gives him partial information on the profitability of his machine. If he chooses the safe arm, he gets a known payoff, though possibly less than what he could have obtained from exploration. The profitability of the machine depends on an unknown state of nature, which can be learnt through exploration. Learning is a strategic issue: for instance, a player could benefit from others' information without taking risks himself. We study Nash equilibria of such games. We mainly ask whether equilibria are efficient: does a player gain significantly more from strategic interaction than he would alone? Is there some kind of cooperation that helps gather more information? Do players manage to acquire a good knowledge of the state of nature? The answers depend on what agents are able to observe of each other (actions and/or payoffs), and also on how the types of the machines are correlated. We also study how equilibria evolve as the number of players grows large; in particular, we ask whether this increase leads to better information and better gains.
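The single-player version of this dilemma is easy to simulate. Below is a minimal sketch (all payoff values and priors are arbitrary choices, not taken from the thesis) of a myopic player who explores the risky arm while its posterior expected payoff exceeds the safe payoff:

import numpy as np

rng = np.random.default_rng(3)

def play(safe_payoff=0.5, p_good=0.5, horizon=200):
    # The risky arm pays Bernoulli(0.8) in the good state, Bernoulli(0.2)
    # in the bad one; the safe arm always pays safe_payoff. The player
    # explores while the posterior belief keeps the risky arm attractive.
    good = rng.random() < p_good
    belief, total = p_good, 0.0
    for _ in range(horizon):
        expected_risky = 0.8 * belief + 0.2 * (1 - belief)
        if expected_risky > safe_payoff:          # explore
            reward = float(rng.random() < (0.8 if good else 0.2))
            total += reward
            # Bayes update of the belief about the state of nature.
            like_good = 0.8 if reward else 0.2
            like_bad = 0.2 if reward else 0.8
            belief = like_good * belief / (
                like_good * belief + like_bad * (1 - belief))
        else:                                     # exploit the safe arm
            total += safe_payoff
    return total / horizon

print(np.mean([play() for _ in range(2000)]))  # average payoff per stage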
Guinot, Florent. "Statistical learning for omics association and interaction studies based on blockwise feature compression." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLE029/document.
Full textSince the last decade, rapid advances in genotyping technologies have changed the way genes involved in mendelian disorders and complex diseases are mapped, moving from candidate-gene approaches to linkage disequilibrium mapping. In this context, Genome-Wide Association Studies (GWAS) aim at identifying genetic markers involved in the expression of complex diseases and occurring at different frequencies between unrelated samples of affected individuals and unaffected controls. These studies exploit the fact that it is easier to establish, from the general population, large cohorts of affected individuals sharing a genetic risk factor for a complex disease than within individual families, as is the case in traditional linkage analysis. From a statistical point of view, the standard approach in GWAS is based on hypothesis testing, with affected individuals being tested against healthy individuals at one or more markers. However, classical testing schemes are subject to false positives, that is, markers falsely identified as significant. One way around this problem is to apply a correction to the p-values obtained from the tests, which in return increases the risk of missing true associations that have only a small effect on the phenotype, as is usually the case in GWAS. Although GWAS have been successful in identifying genetic variants associated with complex multifactorial diseases (Crohn's disease, type I and II diabetes, coronary artery disease, …), only a small proportion of the phenotypic variation expected from classical family studies has been explained. This missing heritability may have multiple causes, among them strong correlations between genetic variants, population structure, epistasis (gene-by-gene interactions), and disease association with rare variants. The main objectives of this thesis are thus to develop new methodologies addressing part of the limitations mentioned above. More specifically, we developed two new approaches: the first is a block-wise approach to GWAS analysis which leverages the correlation structure among genomic variants to reduce the number of statistical hypotheses to be tested, while the second focuses on detecting interactions between groups of metagenomic and genetic markers to better understand the complex relationship between environment and genome in the expression of a given phenotype.
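The block-wise idea can be illustrated in a few lines: cluster correlated variants, compress each cluster into one summary variable, and test the summaries, so far fewer hypotheses are tested than with one test per variant. A toy sketch on synthetic genotypes (the thesis's actual compression and testing procedures are more sophisticated):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import ttest_ind

rng = np.random.default_rng(4)

# Toy genotype matrix: 200 individuals x 60 SNPs in 6 correlated blocks.
n, block, n_blocks = 200, 10, 6
base = rng.binomial(2, 0.3, (n, n_blocks)).astype(float)
G = np.repeat(base, block, axis=1) + rng.normal(0, 0.3, (n, n_blocks * block))
y = (base[:, 0] + rng.normal(0, 1, n) > 0.8).astype(int)  # block 1 is causal

# 1) Cluster SNPs by correlation so that each cluster is one LD block.
corr_dist = 1 - np.abs(np.corrcoef(G.T))
Z = linkage(corr_dist[np.triu_indices_from(corr_dist, k=1)], method="average")
labels = fcluster(Z, t=n_blocks, criterion="maxclust")

# 2) Compress each block to one summary variable and test it.
for c in np.unique(labels):
    summary = G[:, labels == c].mean(axis=1)
    t, p = ttest_ind(summary[y == 1], summary[y == 0])
    print(f"block {c}: p = {p:.2e}")  # only the causal block should stand out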
Chiapino, Maël. "Apprentissage de structures dans les valeurs extrêmes en grande dimension." Thesis, Paris, ENST, 2018. http://www.theses.fr/2018ENST0035/document.
Full textWe present and study unsupervised learning methods for multivariate extreme phenomena in high dimension. For a random vector whose marginals are all heavy-tailed, the study of its behavior in extreme regions is no longer possible via the usual methods, which rely on finite means and variances. Multivariate extreme value theory provides a framework adapted to this study; in particular, it gives a theoretical basis for dimension reduction through the angular measure. The thesis is divided into two main parts: - Reducing the dimension by finding a simplified dependence structure in extreme regions. This step aims at recovering the subgroups of features that are likely to exceed large thresholds simultaneously. - Modeling the angular measure with a mixture distribution that follows a predefined dependence structure. These steps allow us to develop new clustering methods for extreme points in high dimension.
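The first step, finding groups of coordinates that are large together, can be caricatured as follows: keep the observations whose sup-norm exceeds a high empirical quantile, and count which coordinates exceed a fraction of that threshold simultaneously. A toy numpy sketch, with an arbitrary tolerance factor playing the role of the thesis's thickened subcones:

import numpy as np
from collections import Counter

rng = np.random.default_rng(5)

# Heavy-tailed sample in dimension 5: coordinates {0,1} and {2,3,4} are
# asymptotically dependent within each group, independent across groups.
n = 20000
f1, f2 = 1 / rng.random(n), 1 / rng.random(n)   # Pareto(1) common factors
noise = 1 / rng.random((n, 5)) ** 0.5           # lighter-tailed noise
X = np.column_stack([f1, f1, f2, f2, f2]) + noise

# Keep the most extreme points and record their large coordinates.
radius = np.max(X, axis=1)
threshold = np.quantile(radius, 0.99)
extremes = X[radius > threshold]
patterns = Counter(tuple(np.nonzero(row > 0.3 * threshold)[0])
                   for row in extremes)
for pattern, count in patterns.most_common(5):
    print(pattern, count)  # (0, 1) and (2, 3, 4) should dominate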
Maza, Elie. "Prévision de trafic routier par des méthodes statistiques : espérance structurelle d’une fonction aléatoire." Toulouse 3, 2004. http://www.theses.fr/2004TOU30238.
Full textIn the first part of this thesis, we describe a travel-time forecasting method for the Parisian motorway network. This method is based on a mixture model, whose parameters are estimated by an automatic classification method combined with a learning scheme. The second part is devoted to the study of a semi-parametric curve translation model; estimation is carried out by an M-estimation method, and we show the consistency and asymptotic normality of the estimators. In the third part, we extend the curve warping model by considering that the warping functions arise from a random process. This enables us to define, in an intrinsic way, a concept of structural expectation, and thus to get around the non-identifiability of the model. We propose an empirical estimator of this structural expectation and show its consistency and asymptotic normality.
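The structural-expectation idea, averaging curves after undoing the random warpings rather than before, can be sketched for the simplest warping (a random shift), estimated here by circular cross-correlation. This is an illustration only; the thesis handles general random warping processes:

import numpy as np

rng = np.random.default_rng(6)

t = np.linspace(0, 1, 200)
pattern = np.exp(-((t - 0.5) ** 2) / 0.01)    # common structural shape

# Each observed curve is the pattern warped by a random shift, plus noise.
shifts = rng.integers(-30, 31, size=50)
curves = np.stack([np.roll(pattern, s) + rng.normal(0, 0.05, t.size)
                   for s in shifts])

naive_mean = curves.mean(axis=0)              # blurred by the warping

# Align each curve on the first one by maximising the circular
# cross-correlation (computed via FFT), then average.
ref = curves[0]
aligned = []
for c in curves:
    xcorr = np.real(np.fft.ifft(np.fft.fft(ref) * np.conj(np.fft.fft(c))))
    aligned.append(np.roll(c, int(np.argmax(xcorr))))
structural_mean = np.mean(aligned, axis=0)

# The naive mean flattens the peak; the aligned mean restores it.
print(naive_mean.max(), structural_mean.max())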
Smith, Isabelle. "Les comportements de jeu et l'illusion de contrôle chez des universitaires avec et sans maîtrise des statistiques et des probabilités." Doctoral thesis, Université Laval, 2019. http://hdl.handle.net/20.500.11794/35235.
Full textAfter 30 years of research, it has been shown empirically that cognitive distortions act as fundamental factors underlying gambling and gambling problems. They are explained mainly by a misunderstanding of the notions of chance, statistics and probabilities (SP) and by an illusion of control over the outcome of the game. This is why prevention and treatment programs for gambling problems have been built around the teaching of these mathematical concepts and the correction of cognitive distortions. Despite the common use of these intervention techniques with problem gamblers, studies of gambling attitudes and behaviors have not all concluded that having or acquiring SP knowledge decreases gambling habits. The first study of this thesis thus compared the gambling behavior of 45 university students and graduates demonstrating a reasonable mastery of SP to that of 29 people without demonstrated knowledge in this field of mathematics. The results show that the participation rate of the individuals surveyed is high, but that they gamble infrequently and invest little money, whether or not they have SP knowledge; in addition, they experience few gambling problems. The moderate contribution of SP knowledge to the gambling behaviors of an already highly educated and low-gambling university population is discussed, as is the recurrence of this absence of effect in the literature. These results deepen our understanding of how individuals with high levels of education still engage in gambling activities, although one might expect from them a better understanding of the issues related to gambling and, as a result, greater caution. That these people are tempted by gambling is surprising and raises many questions. Their level of education is high, but their gambling behavior does not reflect it, suggesting that some of their characteristics could lead them to overestimate their ability to control the outcome of the games, rather than other types of erroneous beliefs; this hypothesis, however, is neglected in the literature. From the data originally collected, the second study examines the relationship between the illusion of control over gambling and different cognitive and personality variables among 142 university students and graduates. It first draws a portrait of their gambling-related beliefs (illusion of control, gambler's fallacy and superstitions) and of other elements that can lead to an illusion of control: their degree of optimism, the internality of their locus of control, whether or not they have particular SP knowledge, and their degree of confidence in their understanding of gambling. Finally, in a multiple regression model, this study tests potential predictors of the illusion of control related to gambling within this sample. The results show an association between higher SP knowledge, fewer superstition-related misconceptions, and a higher degree of optimism. A strong negative association also exists between the illusion of control related to gambling and the degree of confidence in those gambling beliefs. Among these participants, the illusion of control over gambling is predicted by weaker SP knowledge, lower confidence in beliefs, and being male. The function of doubt about gambling beliefs in educated individuals is examined as a potential metacognitive protective factor. The thesis concludes with a discussion of the implications of these results for the understanding of gambling in a context of cognitive switching, in order to adapt prevention strategies. Finally, the strengths and limitations of the thesis are listed, and recommendations are made for variables and samples to be studied in the future.
Cottet, Vincent R. "Theoretical study of some statistical procedures applied to complex data." Electronic Thesis or Diss., Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLG002.
Full textThe main part of this thesis studies the theoretical and algorithmic aspects of three distinct statistical procedures. The first problem is binary matrix completion. We propose an estimator based on a variational approximation of a pseudo-Bayesian estimator, using a loss function different from those used in the literature, and we derive non-asymptotic risk bounds. The estimator is much faster to compute than with an MCMC method, and we show on examples that it is efficient in practice. In a second part, we study the theoretical properties of the regularized empirical risk minimizer for Lipschitz loss functions, which we then apply to logistic regression with SLOPE regularization and to matrix completion. The third chapter develops an Expectation-Propagation approximation for cases where the likelihood is not explicit, followed by an ABC approximation in a second stage. This procedure applies to many models and is both more precise and faster than the classic ABC approximation; it is used here in a spatial extremes model.
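For intuition, here is a minimal numpy sketch of binary matrix completion by low-rank factorization under the logistic loss (a Lipschitz loss); this gradient scheme is an illustration, not the variational pseudo-Bayesian estimator of the thesis, and all sizes and constants are arbitrary:

import numpy as np

rng = np.random.default_rng(8)

# Observe a fraction of the signs of a low-rank matrix, fit a rank-k
# factorisation by gradient descent on the regularized logistic loss.
m, n, k = 60, 50, 3
U0, V0 = rng.normal(size=(m, k)), rng.normal(size=(n, k))
M = np.sign(U0 @ V0.T)                  # ground-truth signs, entries in {-1, 1}
mask = rng.random((m, n)) < 0.3         # 30% of entries observed

U, V = 0.1 * rng.normal(size=(m, k)), 0.1 * rng.normal(size=(n, k))
lr, lam = 0.05, 0.01
for _ in range(500):
    Z = U @ V.T
    # Gradient of sum over observed entries of log(1 + exp(-M_ij Z_ij)),
    # clipped for numerical stability, plus a small ridge term.
    G = np.where(mask, -M / (1.0 + np.exp(np.clip(M * Z, -30, 30))), 0.0)
    U, V = U - lr * (G @ V + lam * U), V - lr * (G.T @ U + lam * V)

# Sign-recovery accuracy on the unobserved entries (well above chance).
print(np.mean(np.sign(U @ V.T)[~mask] == M[~mask]))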
Pinault, Florian. "Apprentissage par renforcement pour la généralisation des approches automatiques dans la conception des systèmes de dialogue oral." Phd thesis, Université d'Avignon, 2011. http://tel.archives-ouvertes.fr/tel-00933937.
Full textCarriere, Mathieu. "On Metric and Statistical Properties of Topological Descriptors for geometric Data." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS433/document.
Full textIn the context of supervised Machine Learning, finding alternate representations, or descriptors, for data is of primary interest since it can greatly enhance the performance of algorithms. Among them, topological descriptors focus on and encode the topological information contained in geometric data. One advantage of these descriptors is that, by their topological nature, they enjoy many desirable properties: for instance, they are invariant to continuous deformations of the data. Their main drawback, however, is that they often lack the structure and operations required by most Machine Learning algorithms, such as means or scalar products. In this thesis, we study the metric and statistical properties of the most common topological descriptors: persistence diagrams and Mappers. In particular, we show that the Mapper, which is empirically unstable, can be stabilized with an appropriate metric, which we then use to compute confidence regions and to tune its parameters automatically. Concerning persistence diagrams, we show that scalar products can be defined through kernel methods, by constructing two kernels, or embeddings, into finite- and infinite-dimensional Hilbert spaces.
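The kernel idea for persistence diagrams can be caricatured in a few lines: treat each diagram as a point cloud of (birth, death) pairs and take the inner product of their Gaussian kernel mean embeddings. This toy kernel (which ignores the diagonal weighting that actual persistence kernels use for stability) is only an illustration of the principle:

import numpy as np

def diagram_kernel(D1, D2, sigma=0.5):
    # Positive-definite kernel between persistence diagrams, seen as point
    # clouds of (birth, death) pairs: the inner product of their Gaussian
    # kernel mean embeddings.
    diff = D1[:, None, :] - D2[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)
    return np.sum(np.exp(-sq / (2 * sigma ** 2)))

# Two toy diagrams: one prominent loop vs two short-lived features.
D1 = np.array([[0.1, 0.9], [0.2, 0.3]])
D2 = np.array([[0.15, 0.35], [0.4, 0.55]])
print(diagram_kernel(D1, D1), diagram_kernel(D1, D2))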