Dissertations / Theses on the topic 'Kn plus proches voisins'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 41 dissertations / theses for your research on the topic 'Kn plus proches voisins.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Labrador, Boris. "Sur l'estimation fonctionnelle par le temps d'occupation." Paris 6, 2008. http://www.theses.fr/2008PA066462.
Full textLallich, Stéphane. "La méthode des plus proches voisins : de la dispersion spatiale à l'analyse multidimensionnelle." Saint-Etienne, 1989. http://www.theses.fr/1989STET4006.
Full textCzesnalowicz, Eric. "Applications de l'estimateur non paramétrique des K plus proches voisins en classification automatique multidimensionnelle." Lille 1, 1992. http://www.theses.fr/1992LIL10137.
Full textGan, Changquan. "Une approche de classification non supervisée basée sur la notion des K plus proches voisins." Compiègne, 1994. http://www.theses.fr/1994COMP765S.
Full textQamar, Ali Mustafa. "Mesures de similarité et cosinus généralisé : une approche d'apprentissage supervisé fondée sur les k plus proches voisins." Phd thesis, Université de Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00591988.
Full textDelaunay, Eric. "Conception et réalisation d'un accééerateur matériel pour la reconnaissance de formes basée sur l'algorithme des plus proches voisins." Paris 11, 1997. http://www.theses.fr/1997PA112396.
Full textTaïleb, Mounira. "NOHIS-tree nouvelle méthode de recherche de plus proches voisins : application à la recherche d'images par le contenu." Paris 11, 2008. http://www.theses.fr/2008PA112164.
Full textThe increasing of image databases requires the use of a content-based image retrieval system (CBIR). A such system consist first to describe automatically the images, visual properties of each image are represented as multidimensional vectors called descriptors. Next, finding similar images to the query image is achieved by searching for the nearest neighbors of each descriptor of the query image. In this thesis, we propose a new method for indexing multidimensional bases with the search algorithm of nearest neighbors adapted. The originality of our multidimensional index is the disposition of the bounding forms avoiding overlapping. Indeed, the overlapping is one of the main drawbacks that slow the search of nearest neighbors search. Our index with its search algorithm speeds the nearest neighbors search while doing an exact search. Our method has been integrated and tested within a real content-based image system. The results of tests carried out show the robustness of our method in terms of accuracy and speed in search time
Berrani, Sid-Ahmed. "Recherche approximative de plus proches voisins avec contrôle probabiliste de la précision ; application à la recherche d'images par le contenu." Phd thesis, Université Rennes 1, 2004. http://tel.archives-ouvertes.fr/tel-00532854.
Full textDo, Cao Tri. "Apprentissage de métrique temporelle multi-modale et multi-échelle pour la classification robuste de séries temporelles par plus proches voisins." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM028/document.
Full textThe definition of a metric between time series is inherent to several data analysis and mining tasks, including clustering, classification or forecasting. Time series data present naturally several characteristics, called modalities, covering their amplitude, behavior or frequential spectrum, that may be expressed with varying delays and at different temporal granularity and localization - exhibited globally or locally. Combining several modalities at multiple temporal scales to learn a holistic metric is a key challenge for many real temporal data applications. This PhD proposes a Multi-modal and Multi-scale Temporal Metric Learning (M2TML) approach for robust time series nearest neighbors classification. The solution is based on the embedding of pairs of time series into a pairwise dissimilarity space, in which a large margin optimization process is performed to learn the metric. The M2TML solution is proposed for both linear and non linear contexts, and is studied for different regularizers. A sparse and interpretable variant of the solution shows the ability of the learned temporal metric to localize accurately discriminative modalities as well as their temporal scales.A wide range of 30 public and challenging datasets, encompassing images, traces and ECG data, that are linearly or non linearly separable, are used to show the efficiency and the potential of M2TML for time series nearest neighbors classification
Servien, Rémi. "Estimation de régularité locale." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2010. http://tel.archives-ouvertes.fr/tel-00730491.
Full textKouahla, Zineddine. "Indexation dans les espaces métriques : index arborescent et parallélisation." Phd thesis, Université de Nantes, 2013. http://tel.archives-ouvertes.fr/tel-00912743.
Full textTuleau, Christine. "Sélection de variables pour la discrimination en grande dimension et classification de données fonctionnelles." Paris 11, 2005. https://tel.archives-ouvertes.fr/tel-00012008.
Full textThis thesis deals with nonparametric statistics and is related to classification and discrimination in high dimension, and more particularly on variable selection. A first part is devoted to variable selection through cart, both the regression and binary classification frameworks. The proposed exhaustive procedure is based on model selection which leads to “oracle” inequalities and allows to perform variable selection by penalized empirical contrast. A second part is motivated by an industrial problem. It consists of determining among the temporal signals, measured during experiments, those able to explain the subjective drivability, and then to define the ranges responsible for this relevance. The adopted methodology is articulated around the preprocessing of the signals, dimensionality reduction by compression using a common wavelet basis and selection of useful variables involving cart and a strategy step by step. A last part deals with functional data classification with k-nearest neighbors. The procedure consists of applying k-nearest neighbors on the coordinates of the projections of the data on a suitable chosen finite dimesional space. The procedure involves selecting simultaneously the space dimension and the number of neighbors. The traditional version of k-nearest neighbors and a slightly penalized version are theoretically considered. A study on real and simulated data shows that the introduction of a small penalty term stabilizes the selection while preserving good performance
Bereau, Martine. "Contribution de la théorie des sous-ensembles flous à la règle de discrimination des K plus proches voisins en mode partiellement supervisé." Grenoble 2 : ANRT, 1986. http://catalogue.bnf.fr/ark:/12148/cb37595968c.
Full textBéreau, Martine. "Contribution de la théorie des sous-ensembles flous à la règle de discrimination des K plus proches voisins en mode partiellement supervisé." Compiègne, 1986. http://www.theses.fr/1986COMPD032.
Full textAuclair, Adrien. "Méthodes rapides pour la recherche des plus proches voisins SIFT : application à la recherche d'images et contributions à la reconstruction 3D multi-vues." Paris 5, 2009. http://www.theses.fr/2009PA05S012.
Full textIn the first part of this thesis, we are concerned by the nearest neighbour problem, applied on local image descriptors. We restricted ourselves to the SIFT descriptors because of its efficiency. The application of this work is the retrieval of similar images in large databases. First, we compare performances of linear search, on CPU and on GPU (graphic processors), and also when using partial distances. Then, we propose new hash functions t solve the approximate nearest neighbours problem. The hash functions we propose are based on a selection of a few distinctive dimensions per point. For the application of near duplicate retrieval, our algorithm is more efficient than state-of-the-art algorithms. Tested on a database containing 500. 000 images, it finds similar images in less than 300ms. Eventually, we show that it fits very simply within a Bag-Of-Features approach, and it retrieves mor images than kmeans based vocabularies. In a second part, we propose several results on the problem of multi-view 3D reconstruction. We first propose a robust method to obtain the 3D reconstruction of a car from a video sequence. Our system uses the hypothesis that the car is in linear translation in order to fit a point cloud with polynomial surfaces. Then, we propose an algorithm, not dedicated to cars, that uses SIFT descriptors to obtain the 3D surface from images of an object. The descriptors correspondences are searched between input images and virtual images of the temporary object. With this method, the reconstructed surface converges to the true surface object
Debreuve, Eric. "Mesures de similarité statistiques et estimateurs par k plus proches voisins : une association pour gérer des descripteurs de haute dimension en traitement d'images et de vidéos." Habilitation à diriger des recherches, Université de Nice Sophia-Antipolis, 2009. http://tel.archives-ouvertes.fr/tel-00457710.
Full textLefèvre, Fabrice. "Estimation de probabilité non-paramétrique pour la reconnaissance markovienne de la parole." Paris 6, 2000. http://www.theses.fr/2000PA066281.
Full textTuleau, Christine. "SELECTION DE VARIABLES POUR LA DISCRIMINATION EN GRANDE DIMENSION ET CLASSIFICATION DE DONNEES FONCTIONNELLES." Phd thesis, Université Paris Sud - Paris XI, 2005. http://tel.archives-ouvertes.fr/tel-00012008.
Full textTrad, Riadh. "Découverte d'évènements par contenu visuel dans les médias sociaux." Thesis, Paris, ENST, 2013. http://www.theses.fr/2013ENST0030/document.
Full textThe ease of publishing content on social media sites brings to the Web an ever increasing amount of user generated content captured during, and associated with, real life events. Social media documents shared by users often reflect their personal experience of the event. Hence, an event can be seen as a set of personal and local views, recorded by different users. These event records are likely to exhibit similar facets of the event but also specific aspects. By linking different records of the same event occurrence we can enable rich search and browsing of social media events content. Specifically, linking all the occurrences of the same event would provide a general overview of the event. In this dissertation we present a content-based approach for leveraging the wealth of social media documents available on the Web for event identification and characterization. To match event occurrences in social media, we develop a new visual-based method for retrieving events in huge photocollections, typically in the context of User Generated Content. The main contributions of the thesis are the following : (1) a new visual-based method for retrieving events in photo collections, (2) a scalable and distributed framework for Nearest Neighbors Graph construction for high dimensional data, (3) a collaborative content-based filtering technique for selecting relevant social media documents for a given event
Kanj, Sawsan. "Méthodes d'apprentissage pour la classification multi label." Thesis, Compiègne, 2013. http://www.theses.fr/2013COMP2076.
Full textMulti-label classification is an extension of traditional single-label classification, where classes are not mutually exclusive, and each example can be assigned by several classes simultaneously . It is encountered in various modern applications such as scene classification and video annotation. the main objective of this thesis is the development of new techniques to adress the problem of multi-label classification that achieves promising classification performance. the first part of this manuscript studies the problem of multi-label classification in the context of the theory of belief functions. We propose a multi-label learning method that is able to take into account relationships between labels ant to classify new instances using the formalism of representation of uncertainty for set-valued variables. The second part deals withe the problem of prototype selection in the framework of multi-label learning. We propose an editing algorithm based on the k-nearest neighbor rule in order to purify training dataset and improve the performances of multi-label classification algorithms. Experimental results on synthetic and real-world datasets show the effectiveness of our approaches
Olivares, Javier. "Scaling out-of-core k-nearest neighbors computation on single machines." Thesis, Rennes 1, 2016. http://www.theses.fr/2016REN1S073/document.
Full textThe K-Nearest Neighbors (KNN) is an efficient method to find similar data among a large set of it. Over the years, a huge number of applications have used KNN's capabilities to discover similarities within the data generated in diverse areas such as business, medicine, music, and computer science. Despite years of research have brought several approaches of this algorithm, its implementation still remains a challenge, particularly today where the data is growing at unthinkable rates. In this context, running KNN on large datasets brings two major issues: huge memory footprints and very long runtimes. Because of these high costs in terms of computational resources and time, KNN state-of the-art works do not consider the fact that data can change over time, assuming always that the data remains static throughout the computation, which unfortunately does not conform to reality at all. In this thesis, we address these challenges in our contributions. Firstly, we propose an out-of-core approach to compute KNN on large datasets, using a commodity single PC. We advocate this approach as an inexpensive way to scale the KNN computation compared to the high cost of a distributed algorithm, both in terms of computational resources as well as coding, debugging and deployment effort. Secondly, we propose a multithreading out-of-core approach to face the challenges of computing KNN on data that changes rapidly and continuously over time. After a thorough evaluation, we observe that our main contributions address the challenges of computing the KNN on large datasets, leveraging the restricted resources of a single machine, decreasing runtimes compared to that of the baselines, and scaling the computation both on static and dynamic datasets
Marih, Mohamed. "Mise en œuvre de l'approximation diffuse et de la méthode des éléments diffus pour la résolution des problèmes de mécanique." Compiègne, 1994. http://www.theses.fr/1994COMPD767.
Full textDakkak, Mustapha. "Géo-localisation en environnement fermé des terminaux mobiles." Phd thesis, Université Paris-Est, 2012. http://tel.archives-ouvertes.fr/tel-00794586.
Full textAlves, do Valle Junior Eduardo. "Local-Descriptor Matching for Image Identification Systems." Cergy-Pontoise, 2008. http://biblioweb.u-cergy.fr/theses/08CERG0351.pdf.
Full textImage identification (or copy detection) consists in retrieving the original from which a query image possibly derives, as well as any related metadata, such as titles, authors, copyright information, etc. The task is challenging because of the variety of transformations that the original image may have suffered. Image identification systems based on local descriptors have shown excellent efficacy, but often suffer from efficiency issues, since hundreds, even thousands of descriptors, have to be matched in order to find a single image. The objective of our work is to provide fast methods for descriptor matching, by creating efficient ways to perform the k-nearest neighbours search in high-dimensional spaces. In this way, we can gain the advantages from the use of local descriptors, while minimising the efficiency issues. We propose three new methods for the k-nearest neighbours search: the 3-way trees — an improvement over the KD-trees using redundant, overlapping nodes; the projection KD-forests — a technique which uses multiple moderate dimensional KD-trees; and the multicurves, which is based on multiple moderate dimensional Hilbert space-filling curves. Those techniques try to reduce the amount of random access to the data, in order to be well adapted to the implementation in secondary memory
Jain, Himalaya. "Learning compact representations for large scale image search." Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1S027/document.
Full textThis thesis addresses the problem of large-scale image search. To tackle image search at large scale, it is required to encode images with compact representations which can be efficiently employed to compare images meaningfully. Obtaining such compact representation can be done either by compressing effective high dimensional representations or by learning compact representations in an end-to-end manner. The work in this thesis explores and advances in both of these directions. In our first contribution, we extend structured vector quantization approaches such as Product Quantization by proposing a weighted codeword sum representation. We test and verify the benefits of our approach for approximate nearest neighbor search on local and global image features which is an important way to approach large scale image search. Learning compact representation for image search recently got a lot of attention with various deep hashing based approaches being proposed. In such approaches, deep convolutional neural networks are learned to encode images into compact binary codes. In this thesis we propose a deep supervised learning approach for structured binary representation which is a reminiscent of structured vector quantization approaches such as PQ. Our approach benefits from asymmetric search over deep hashing approaches and gives a clear improvement for search accuracy at the same bit-rate. Inverted index is another important part of large scale search system apart from the compact representation. To this end, we extend our ideas for supervised compact representation learning for building inverted indexes. In this work we approach inverted indexing with supervised deep learning and make an attempt to unify the learning of inverted index and compact representation. We thoroughly evaluate all the proposed methods on various publicly available datasets. Our methods either outperform, or are competitive with the state-of-the-art
Zhu, Jie. "Entropic measures of connectivity with an application to intracerebral epileptic signals." Thesis, Rennes 1, 2016. http://www.theses.fr/2016REN1S006/document.
Full textThe work presented in this thesis deals with brain connectivity, including structural connectivity, functional connectivity and effective connectivity. These three types of connectivities are obviously linked, and their joint analysis can give us a better understanding on how brain structures and functions constrain each other. Our research particularly focuses on effective connectivity that defines connectivity graphs with information on causal links that may be direct or indirect, unidirectional or bidirectional. The main purpose of our work is to identify interactions between different brain areas from intracerebral recordings during the generation and propagation of seizure onsets, a major issue in the pre-surgical phase of epilepsy surgery treatment. Exploring effective connectivity generally follows two kinds of approaches, model-based techniques and data-driven ones. In this work, we address the question of improving the estimation of information-theoretic quantities, mainly mutual information and transfer entropy, based on k-Nearest Neighbors techniques. The proposed approaches we developed are first evaluated and compared with existing estimators on simulated signals including white noise processes, linear and nonlinear vectorial autoregressive processes, as well as realistic physiology-based models. Some of them are then applied on intracerebral electroencephalographic signals recorded on an epileptic patient, and compared with the well-known directed transfer function. The experimental results show that the proposed techniques improve the estimation of information-theoretic quantities for simulated signals, while the analysis is more difficult in real situations. Globally, the different estimators appear coherent and in accordance with the ground truth given by the clinical experts, the directed transfer function leading to interesting performance
Morvan, Anne. "Contributions to unsupervised learning from massive high-dimensional data streams : structuring, hashing and clustering." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLED033/document.
Full textThis thesis focuses on how to perform efficiently unsupervised machine learning such as the fundamentally linked nearest neighbor search and clustering task, under time and space constraints for high-dimensional datasets. First, a new theoretical framework reduces the space cost and increases the rate of flow of data-independent Cross-polytope LSH for the approximative nearest neighbor search with almost no loss of accuracy.Second, a novel streaming data-dependent method is designed to learn compact binary codes from high-dimensional data points in only one pass. Besides some theoretical guarantees, the quality of the obtained embeddings are accessed on the approximate nearest neighbors search task.Finally, a space-efficient parameter-free clustering algorithm is conceived, based on the recovery of an approximate Minimum Spanning Tree of the sketched data dissimilarity graph on which suitable cuts are performed
Laloë, Thomas. "Sur quelques problèmes d'apprentissage supervisé et non supervisé." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2009. http://tel.archives-ouvertes.fr/tel-00455528.
Full textChafik, Sanaa. "Machine learning techniques for content-based information retrieval." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLL008/document.
Full textThe amount of media data is growing at high speed with the fast growth of Internet and media resources. Performing an efficient similarity (nearest neighbor) search in such a large collection of data is a very challenging problem that the scientific community has been attempting to tackle. One of the most promising solutions to this fundamental problem is Content-Based Media Retrieval (CBMR) systems. The latter are search systems that perform the retrieval task in large media databases based on the content of the data. CBMR systems consist essentially of three major units, a Data Representation unit for feature representation learning, a Multidimensional Indexing unit for structuring the resulting feature space, and a Nearest Neighbor Search unit to perform efficient search. Media data (i.e. image, text, audio, video, etc.) can be represented by meaningful numeric information (i.e. multidimensional vector), called Feature Description, describing the overall content of the input data. The task of the second unit is to structure the resulting feature descriptor space into an index structure, where the third unit, effective nearest neighbor search, is performed.In this work, we address the problem of nearest neighbor search by proposing three Content-Based Media Retrieval approaches. Our three approaches are unsupervised, and thus can adapt to both labeled and unlabeled real-world datasets. They are based on a hashing indexing scheme to perform effective high dimensional nearest neighbor search. Unlike most recent existing hashing approaches, which favor indexing in Hamming space, our proposed methods provide index structures adapted to a real-space mapping. Although Hamming-based hashing methods achieve good accuracy-speed tradeoff, their accuracy drops owing to information loss during the binarization process. By contrast, real-space hashing approaches provide a more accurate approximation in the mapped real-space as they avoid the hard binary approximations.Our proposed approaches can be classified into shallow and deep approaches. In the former category, we propose two shallow hashing-based approaches namely, "Symmetries of the Cube Locality Sensitive Hashing" (SC-LSH) and "Cluster-based Data Oriented Hashing" (CDOH), based respectively on randomized-hashing and shallow learning-to-hash schemes. The SC-LSH method provides a solution to the space storage problem faced by most randomized-based hashing approaches. It consists of a semi-random scheme reducing partially the randomness effect of randomized hashing approaches, and thus the memory storage problem, while maintaining their efficiency in structuring heterogeneous spaces. The CDOH approach proposes to eliminate the randomness effect by combining machine learning techniques with the hashing concept. The CDOH outperforms the randomized hashing approaches in terms of computation time, memory space and search accuracy.The third approach is a deep learning-based hashing scheme, named "Unsupervised Deep Neuron-per-Neuron Hashing" (UDN2H). The UDN2H approach proposes to index individually the output of each neuron of the top layer of a deep unsupervised model, namely a Deep Autoencoder, with the aim of capturing the high level individual structure of each neuron output.Our three approaches, SC-LSH, CDOH and UDN2H, were proposed sequentially as the thesis was progressing, with an increasing level of complexity in terms of the developed models, and in terms of the effectiveness and the performances obtained on large real-world datasets
Vincent, Garcia. "Suivi d'objets d'intérêt dans une séquence d'images : des points saillants aux mesures statistiques." Phd thesis, Université de Nice Sophia-Antipolis, 2008. http://tel.archives-ouvertes.fr/tel-00374657.
Full textLa première méthode repose sur l'analyse de trajectoires temporelles de points saillants et réalise un suivi de régions d'intérêt. Des points saillants (typiquement des lieux de forte courbure des lignes isointensité) sont détectés dans toutes les images de la séquence. Les trajectoires sont construites en liant les points des images successives dont les voisinages sont cohérents. Notre contribution réside premièrement dans l'analyse des trajectoires sur un groupe d'images, ce qui améliore la qualité d'estimation du mouvement. De plus, nous utilisons une pondération spatio-temporelle pour chaque trajectoire qui permet d'ajouter une contrainte temporelle sur le mouvement tout en prenant en compte les déformations géométriques locales de l'objet ignorées par un modèle de mouvement global.
La seconde méthode réalise une segmentation spatio-temporelle. Elle repose sur l'estimation du mouvement du contour de l'objet en s'appuyant sur l'information contenue dans une couronne qui s'étend de part et d'autre de ce contour. Cette couronne nous renseigne sur le contraste entre le fond et l'objet dans un contexte local. C'est là notre première contribution. De plus, la mise en correspondance par une mesure de similarité statistique, à savoir l'entropie du résiduel, d'une portion de la couronne et d'une zone de l'image suivante dans la séquence permet d'améliorer le suivi tout en facilitant le choix de la taille optimale de la couronne.
Enfin, nous proposons une implémentation rapide d'une méthode de suivi de régions d'intérêt existante. Cette méthode repose sur l'utilisation d'une mesure de similarité statistique : la divergence de Kullback-Leibler. Cette divergence peut être estimée dans un espace de haute dimension à l'aide de multiples calculs de distances au k-ème plus proche voisin dans cet espace. Ces calculs étant très coûteux, nous proposons une implémentation parallèle sur GPU (grâce à l'interface logiciel CUDA de NVIDIA) de la recherche exhaustive des k plus proches voisins. Nous montrons que cette implémentation permet d'accélérer le suivi des objets, jusqu'à un facteur 15 par rapport à une implémentation de cette recherche nécessitant au préalable une structuration des données.
Viallon, Vivian. "Processus empiriques, estimation non paramétrique et données censurées." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2006. http://tel.archives-ouvertes.fr/tel-00119260.
Full textMittal, Nupur. "Data, learning and privacy in recommendation systems." Thesis, Rennes 1, 2016. http://www.theses.fr/2016REN1S084/document.
Full textRecommendation systems have gained tremendous popularity, both in academia and industry. They have evolved into many different varieties depending mostly on the techniques and ideas used in their implementation. This categorization also marks the boundary of their application domain. Regardless of the types of recommendation systems, they are complex and multi-disciplinary in nature, involving subjects like information retrieval, data cleansing and preprocessing, data mining etc. In our work, we identify three different challenges (among many possible) involved in the process of making recommendations and provide their solutions. We elaborate the challenges involved in obtaining user-demographic data, and processing it, to render it useful for making recommendations. The focus here is to make use of Online Social Networks to access publicly available user data, to help the recommendation systems. Using user-demographic data for the purpose of improving the personalized recommendations, has many other advantages, like dealing with the famous cold-start problem. It is also one of the founding pillars of hybrid recommendation systems. With the help of this work, we underline the importance of user’s publicly available information like tweets, posts, votes etc. to infer more private details about her. As the second challenge, we aim at improving the learning process of recommendation systems. Our goal is to provide a k-nearest neighbor method that deals with very large amount of datasets, surpassing billions of users. We propose a generic, fast and scalable k-NN graph construction algorithm that improves significantly the performance as compared to the state-of-the art approaches. Our idea is based on leveraging the bipartite nature of the underlying dataset, and use a preprocessing phase to reduce the number of similarity computations in later iterations. As a result, we gain a speed-up of 14 compared to other significant approaches from literature. Finally, we also consider the issue of privacy. Instead of directly viewing it under trivial recommendation systems, we analyze it on Online Social Networks. First, we reason how OSNs can be seen as a form of recommendation systems and how information dissemination is similar to broadcasting opinion/reviews in trivial recommendation systems. Following this parallelism, we identify privacy threat in information diffusion in OSNs and provide a privacy preserving algorithm for the same. Our algorithm Riposte quantifies the privacy in terms of differential privacy and with the help of experimental datasets, we demonstrate how Riposte maintains the desirable information diffusion properties of a network
Jiao, Lianmeng. "Classification of uncertain data in the framework of belief functions : nearest-neighbor-based and rule-based approaches." Thesis, Compiègne, 2015. http://www.theses.fr/2015COMP2222/document.
Full textIn many classification problems, data are inherently uncertain. The available training data might be imprecise, incomplete, even unreliable. Besides, partial expert knowledge characterizing the classification problem may also be available. These different types of uncertainty bring great challenges to classifier design. The theory of belief functions provides a well-founded and elegant framework to represent and combine a large variety of uncertain information. In this thesis, we use this theory to address the uncertain data classification problems based on two popular approaches, i.e., the k-nearest neighbor rule (kNN) andrule-based classification systems. For the kNN rule, one concern is that the imprecise training data in class over lapping regions may greatly affect its performance. An evidential editing version of the kNNrule was developed based on the theory of belief functions in order to well model the imprecise information for those samples in over lapping regions. Another consideration is that, sometimes, only an incomplete training data set is available, in which case the ideal behaviors of the kNN rule degrade dramatically. Motivated by this problem, we designedan evidential fusion scheme for combining a group of pairwise kNN classifiers developed based on locally learned pairwise distance metrics.For rule-based classification systems, in order to improving their performance in complex applications, we extended the traditional fuzzy rule-based classification system in the framework of belief functions and develop a belief rule-based classification system to address uncertain information in complex classification problems. Further, considering that in some applications, apart from training data collected by sensors, partial expert knowledge can also be available, a hybrid belief rule-based classification system was developed to make use of these two types of information jointly for classification
Peltier, Marie-Agnès. "Un système adaptatif de diagnostic d'évolution basé sur la reconnaissance des formes floues : application au diagnostic du comportement d'un conducteur automobile." Compiègne, 1993. http://www.theses.fr/1993COMPD634.
Full textZepeda, Salvatierra Joaquin. "Nouvelles méthodes de représentations parcimonieuses ; application à la compression et l'indexation d'images." Phd thesis, Université Rennes 1, 2010. http://tel.archives-ouvertes.fr/tel-00567851.
Full textDe, Moliner Anne. "Estimation robuste de courbes de consommmation électrique moyennes par sondage pour de petits domaines en présence de valeurs manquantes." Thesis, Bourgogne Franche-Comté, 2017. http://www.theses.fr/2017UBFCK021/document.
Full textIn this thesis, we address the problem of robust estimation of mean or total electricity consumption curves by sampling in a finite population for the entire population and for small areas. We are also interested in estimating mean curves by sampling in presence of partially missing trajectories.Indeed, many studies carried out in the French electricity company EDF, for marketing or power grid management purposes, are based on the analysis of mean or total electricity consumption curves at a fine time scale, for different groups of clients sharing some common characteristics.Because of privacy issues and financial costs, it is not possible to measure the electricity consumption curve of each customer so these mean curves are estimated using samples. In this thesis, we extend the work of Lardin (2012) on mean curve estimation by sampling by focusing on specific aspects of this problem such as robustness to influential units, small area estimation and estimation in presence of partially or totally unobserved curves.In order to build robust estimators of mean curves we adapt the unified approach to robust estimation in finite population proposed by Beaumont et al (2013) to the context of functional data. To that purpose we propose three approaches : application of the usual method for real variables on discretised curves, projection on Functional Spherical Principal Components or on a Wavelets basis and thirdly functional truncation of conditional biases based on the notion of depth.These methods are tested and compared to each other on real datasets and Mean Squared Error estimators are also proposed.Secondly we address the problem of small area estimation for functional means or totals. We introduce three methods: unit level linear mixed model applied on the scores of functional principal components analysis or on wavelets coefficients, functional regression and aggregation of individual curves predictions by functional regression trees or functional random forests. Robust versions of these estimators are then proposed by following the approach to robust estimation based on conditional biais presented before.Finally, we suggest four estimators of mean curves by sampling in presence of partially or totally unobserved trajectories. The first estimator is a reweighting estimator where the weights are determined using a temporal non parametric kernel smoothing adapted to the context of finite population and missing data and the other ones rely on imputation of missing data. Missing parts of the curves are determined either by using the smoothing estimator presented before, or by nearest neighbours imputation adapted to functional data or by a variant of linear interpolation which takes into account the mean trajectory of the entire sample. Variance approximations are proposed for each method and all the estimators are compared to each other on real datasets for various missing data scenarios
Pham, The Anh. "Détection robuste de jonctions et points d'intérêt dans les images et indexation rapide de caractéristiques dans un espace de grande dimension." Thesis, Tours, 2013. http://www.theses.fr/2013TOUR4023/document.
Full textLocal features are of central importance to deal with many different problems in image analysis and understanding including image registration, object detection and recognition, image retrieval, etc. Over the years, many local detectors have been presented to detect such features. Such a local detector usually works well for some particular applications but not all. Taking an application of image retrieval in large database as an example, an efficient method for detecting binary features should be preferred to other real-valued feature detection methods. The reason is easily seen: it is expected to have a reasonable precision of retrieval results but the time response must be as fast as possible. Generally, local features are used in combination with an indexing scheme. This is highly needed for the case where the dataset is composed of billions of data points, each of which is in a high-dimensional feature vector space
Guillaumin, Matthieu. "Données multimodales pour l'analyse d'image." Phd thesis, Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00522278/en/.
Full textAhmed, Mohamed Salem. "Contribution à la statistique spatiale et l'analyse de données fonctionnelles." Thesis, Lille 3, 2017. http://www.theses.fr/2017LIL30047/document.
Full textThis thesis is about statistical inference for spatial and/or functional data. Indeed, weare interested in estimation of unknown parameters of some models from random or nonrandom(stratified) samples composed of independent or spatially dependent variables.The specificity of the proposed methods lies in the fact that they take into considerationthe considered sample nature (stratified or spatial sample).We begin by studying data valued in a space of infinite dimension or so-called ”functionaldata”. First, we study a functional binary choice model explored in a case-controlor choice-based sample design context. The specificity of this study is that the proposedmethod takes into account the sampling scheme. We describe a conditional likelihoodfunction under the sampling distribution and a reduction of dimension strategy to definea feasible conditional maximum likelihood estimator of the model. Asymptotic propertiesof the proposed estimates as well as their application to simulated and real data are given.Secondly, we explore a functional linear autoregressive spatial model whose particularityis on the functional nature of the explanatory variable and the structure of the spatialdependence. The estimation procedure consists of reducing the infinite dimension of thefunctional variable and maximizing a quasi-likelihood function. We establish the consistencyand asymptotic normality of the estimator. The usefulness of the methodology isillustrated via simulations and an application to some real data.In the second part of the thesis, we address some estimation and prediction problemsof real random spatial variables. We start by generalizing the k-nearest neighbors method,namely k-NN, to predict a spatial process at non-observed locations using some covariates.The specificity of the proposed k-NN predictor lies in the fact that it is flexible and allowsa number of heterogeneity in the covariate. We establish the almost complete convergencewith rates of the spatial predictor whose performance is ensured by an application oversimulated and environmental data. In addition, we generalize the partially linear probitmodel of independent data to the spatial case. We use a linear process for disturbancesallowing various spatial dependencies and propose a semiparametric estimation approachbased on weighted likelihood and generalized method of moments methods. We establishthe consistency and asymptotic distribution of the proposed estimators and investigate thefinite sample performance of the estimators on simulated data. We end by an applicationof spatial binary choice models to identify UADT (Upper aerodigestive tract) cancer riskfactors in the north region of France which displays the highest rates of such cancerincidence and mortality of the country
Vincent, Pascal. "Modèles à noyaux à structure locale." Thèse, 2003. http://hdl.handle.net/1866/14543.
Full textVicente, Sergio. "Apprentissage statistique avec le processus ponctuel déterminantal." Thesis, 2021. http://hdl.handle.net/1866/25249.
Full textThis thesis presents the determinantal point process, a probabilistic model that captures repulsion between points of a certain space. This repulsion is encompassed by a similarity matrix, the kernel matrix, which selects which points are more similar and then less likely to appear in the same subset. This point process gives more weight to subsets characterized by a larger diversity of its elements, which is not the case with the traditional uniform random sampling. Diversity has become a key concept in domains such as medicine, sociology, forensic sciences and behavioral sciences. The determinantal point process is considered a promising alternative to traditional sampling methods, since it takes into account the diversity of selected elements. It is already actively used in machine learning as a subset selection method. Its application in statistics is illustrated with three papers. The first paper presents the consensus clustering, which consists in running a clustering algorithm on the same data, a large number of times. To sample the initials points of the algorithm, we propose the determinantal point process as a sampling method instead of a uniform random sampling and show that the former option produces better clustering results. The second paper extends the methodology developed in the first paper to large-data. Such datasets impose a computational burden since sampling with the determinantal point process is based on the spectral decomposition of the large kernel matrix. We introduce two methods to deal with this issue. These methods also produce better clustering results than consensus clustering based on a uniform sampling of initial points. The third paper addresses the problem of variable selection for the linear model and the logistic regression, when the number of predictors is large. A Bayesian approach is adopted, using Markov Chain Monte Carlo methods with Metropolis-Hasting algorithm. We show that setting the determinantal point process as the prior distribution for the model space selects a better final model than the model selected by a uniform prior on the model space.