Dissertations on the topic "Analyse exploratoire des données"
Consult the top 50 dissertations for research on the topic "Analyse exploratoire des données".
Verbanck, Marie. "Analyse exploratoire de données transcriptomiques : de leur visualisation à l'intégration d’information extérieure." Rennes, Agrocampus Ouest, 2013. http://www.theses.fr/2013NSARG011.
We propose new exploratory statistical methodologies dedicated to the analysis of transcriptomic data (DNA microarray data). Transcriptomic data provide an image of the transcriptome, which is itself the result of phenomena of activation or inhibition of gene expression. However, this image of the transcriptome is noisy. We therefore first focus on the issue of denoising transcriptomic data, in a visualisation framework. To do so, we propose a regularised version of principal component analysis, which better estimates and visualises the underlying signal of noisy data. In addition, one may wonder whether knowledge of the transcriptome alone is enough to understand the complexity of the relationships between genes. We therefore propose to actively integrate other sources of information about genes into the analysis of transcriptomic data. Two major mechanisms seem to be involved in the regulation of gene expression: regulatory proteins (for instance transcription factors) and regulatory networks on the one hand, and chromosomal localisation and genome architecture on the other. First, we focus on the regulation of gene expression by regulatory proteins and propose a gene clustering algorithm based on the integration of functional knowledge about genes, provided by Gene Ontology annotations. This algorithm yields clusters of genes that have both similar expression profiles and similar functional annotations, which makes them better candidates for interpretation. Second, we link the study of transcriptomic data to chromosomal localisation in a methodology developed in collaboration with geneticists.
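The shrinkage idea behind such a regularised PCA can be sketched as follows. This is a minimal illustration, not the authors' exact estimator: the particular shrinkage factor and the assumption that the noise variance `sigma2` and the signal rank are known are simplifications.

```python
# Denoising by shrinking singular values: small singular values, assumed
# to carry mostly noise, are damped before reconstruction.
import numpy as np

def regularised_pca_denoise(X, rank, sigma2):
    """Reconstruct X from its leading `rank` components, shrinking each
    retained singular value by an estimate of its noise share."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    n = X.shape[0]
    # keep the fraction of each component's variance above the noise level
    shrink = np.maximum(1 - (n * sigma2) / s[:rank] ** 2, 0.0)
    return (U[:, :rank] * (s[:rank] * shrink)) @ Vt[:rank] + mean

rng = np.random.default_rng(0)
signal = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 20))  # rank-2 signal
noisy = signal + rng.normal(scale=0.5, size=signal.shape)
denoised = regularised_pca_denoise(noisy, rank=2, sigma2=0.25)
err_noisy = np.linalg.norm(noisy - signal)
err_denoised = np.linalg.norm(denoised - signal)
print(err_denoised < err_noisy)  # shrinkage should reduce the error
```

On this toy rank-2 example the low-rank reconstruction discards most of the noise energy, so the denoised estimate is closer to the underlying signal than the raw data.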
Rigouste, Loïs. "Méthodes probabilistes pour l'analyse exploratoire de données textuelles." Phd thesis, Télécom ParisTech, 2006. http://pastel.archives-ouvertes.fr/pastel-00002424.
Bry, Xavier. "Une méthodologie exploratoire pour l'analyse et la synthèse d'un modèle explicatif : l'Analyse en Composantes Thématiques." Paris 9, 2004. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2004PA090055.
Ghalamallah, Ilhème. "Proposition d'un modèle d'analyse exploratoire multidimensionnelle dans un contexte d'intelligence économique." Toulouse 3, 2009. http://www.theses.fr/2009TOU30293.
A successful business is often conditioned by its ability to identify, collect, process and disseminate information for strategic purposes. Moreover, information and knowledge technologies impose constraints to which companies must adapt: a continuous stream of information, circulating ever faster, through increasingly complex techniques. The risk is being swamped by this information and no longer being able to distinguish the essential from the trivial. Indeed, with the advent of the new economy, the problems facing industrial and commercial enterprises have become very complex, and to be competitive a company must know how to manage its intangible capital. Competitive Intelligence (CI) is a response to the upheavals of the overall business environment, and more broadly of any organization. In an economy where everything moves faster and becomes more complex, the management of strategic information has become a key driver of overall business performance. CI is an organizational process through which a company can become more competitive by monitoring its environment and its dynamics. In this context, we found that much strategically significant information is relational: links between actors in the field, semantic networks, alliances, mergers, acquisitions, collaborations, co-occurrences of all kinds. Our work consists in proposing a model of multidimensional analysis dedicated to CI. This approach is based on knowledge extraction through the analysis of the evolution of relational databases. We offer a model for understanding the activity of actors in a given field, as well as their interactions, development and strategy, in a decision-making perspective. The approach rests on the design of a generic online information-analysis system that homogenises and organises textual data in relational form, and from there extracts implicit knowledge whose content and formatting are adapted to decision makers who are not specialists in knowledge extraction.
Guigourès, Romain. "Utilisation des modèles de co-clustering pour l'analyse exploratoire des données." Phd thesis, Université Panthéon-Sorbonne - Paris I, 2013. http://tel.archives-ouvertes.fr/tel-00935278.
Truong, Thérèse Quy Thy. "Le vandalisme de l’information géographique volontaire : analyse exploratoire et proposition d’une méthodologie de détection automatique." Thesis, Paris Est, 2020. http://www.theses.fr/2020PESC2009.
The quality of Volunteered Geographic Information (VGI) is currently a topic that concerns spatial data users as well as authoritative data producers who wish to exploit the benefits of crowdsourcing. Contrary to most authoritative databases, VGI provides open access to spatial data. However, VGI is prone to errors, and even to deliberate defacement perpetrated by ill-intentioned contributors. In the latter case, we may speak of cartographic vandalism, or carto-vandalism. This phenomenon is one of the main downsides of crowdsourcing, and despite the small number of incidents, it may be a barrier to the use of collaborative spatial data. This thesis follows an approach based on VGI quality; in particular, the objective of this work is to detect vandalism in collaborative spatial data. First, we formalize a definition of the concept of carto-vandalism. Then, assuming that corrupted spatial data come from malicious contributors, we demonstrate that qualifying contributors makes it possible to assess the corresponding contributed data. Finally, the experiments explore the ability of learning methods to detect carto-vandalism.
Guigourès, Romain. "Utilisation des modèles de co-clustering pour l'analyse exploratoire des données." Thesis, Paris 1, 2013. http://www.theses.fr/2013PA010070.
Co-clustering is a clustering technique aiming at simultaneously partitioning the rows and the columns of a data matrix. Among the existing approaches, MODL is suitable for processing huge data sets with several continuous or categorical variables; we use it as the baseline approach in this thesis. We discuss the reliability of applying such an approach to data mining problems like graph partitioning, temporal graph segmentation or curve clustering. MODL tracks very fine patterns in huge data sets, which makes the results difficult to study; that is why exploratory analysis tools must be defined in order to explore them. To help the user interpret the results, we define exploratory analysis tools aiming at simplifying the results so as to make an overall interpretation possible, tracking the most interesting patterns, determining the most representative values of the clusters, and visualizing the results. We investigate the asymptotic behavior of these exploratory analysis tools in order to make the connection with existing approaches. Finally, we highlight the value of MODL and of the exploratory analysis tools through an application to call detail records from the telecom operator Orange, collected in Ivory Coast.
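The row-and-column partitioning that co-clustering performs can be illustrated with a toy alternating scheme. Note this is only a sketch of the general idea: MODL itself optimizes a model-based criterion over huge mixed-type matrices, whereas the code below merely minimises squared error on a small numeric matrix with a deterministic initialisation.

```python
# Toy alternating co-clustering: rows and columns are repeatedly
# reassigned to the cluster whose block profile they match best.
import numpy as np

def coclusters(X, k_rows, k_cols, n_iter=10):
    r = np.arange(X.shape[0]) % k_rows   # deterministic initial row labels
    c = np.arange(X.shape[1]) % k_cols   # deterministic initial column labels
    for _ in range(n_iter):
        mu = np.zeros((k_rows, k_cols))  # mean of each (row, column) block
        for i in range(k_rows):
            for j in range(k_cols):
                block = X[np.ix_(r == i, c == j)]
                mu[i, j] = block.mean() if block.size else 0.0
        # reassign rows, then columns, to their best-fitting block profile
        r = np.array([min(range(k_rows),
                          key=lambda i: ((row - mu[i, c]) ** 2).sum())
                      for row in X])
        c = np.array([min(range(k_cols),
                          key=lambda j: ((X[:, jj] - mu[r, j]) ** 2).sum())
                      for jj in range(X.shape[1])])
    return r, c

# a matrix with two hidden row blocks and two hidden column blocks
X = np.kron(np.eye(2), np.ones((3, 3)))
r, c = coclusters(X, 2, 2)
print(r, c)  # rows 0-2 vs 3-5 and columns 0-2 vs 3-5 are separated
```

On this block-structured matrix the alternating scheme recovers the two row groups and the two column groups in a couple of iterations.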
Heymann, Sébastien. "Analyse exploratoire de flots de liens pour la détection d'événements." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2013. http://tel.archives-ouvertes.fr/tel-00994766.
Posse, Christian. "Analyse exploratoire de données et discrimination à l'aide de projection pursuit /." [S.l.] : [s.n.], 1993. http://library.epfl.ch/theses/?display=detail&nr=1124.
Moudden, Yassir. "Estimation de paramètres physiques de combustion par modélisation du signal d'ionisation et inversion paramétrique." Paris 11, 2003. http://www.theses.fr/2003PA112004.
The work described in this thesis investigates the possibility of constructing an indirect measurement algorithm for relevant combustion parameters based on ionization signal processing. Indeed, automobile manufacturers need low-cost combustion diagnostics to enhance engine control. Because of the extreme complexity of the physical phenomena from which the ionization signal originates, the traditional model-based approach appeared unrealistic and did not bring conclusive results. We hence turned to a blind statistical analysis of experimental data acquired on a test engine. Since the analysis of high-dimensional data is notoriously awkward, it is necessary first to reduce the apparent dimension of the signal data, while preserving the information useful for our estimation problem. The usual techniques, such as Principal Component Analysis and Projection Pursuit, are used to form and detect relevant variables. Further, a procedure for high-dimensional data analysis, derived as an extension of Exploratory Projection Pursuit, is suggested and shown to be a profitable tool. With this method, we seek interesting projections of high-dimensional data by optimizing probabilistic measures of dependence such as Mutual Information and the Hellinger divergence. Finally, results are presented that demonstrate the quality and the stability of the low-complexity in-cylinder peak pressure position estimators we derived, for a wide range of engine states.
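The projection-pursuit idea, searching for "interesting" low-dimensional projections, can be sketched with a crude random search. The thesis optimizes probabilistic dependence measures; the kurtosis-based index and the bimodal toy data below are illustrative stand-ins.

```python
# Projection pursuit by random search: try random unit directions and
# keep the one maximising a simple non-Gaussianity index.
import numpy as np

def pursuit_index(z):
    """Magnitude of excess kurtosis: 0 for Gaussian projections."""
    z = (z - z.mean()) / z.std()
    return abs((z ** 4).mean() - 3.0)

rng = np.random.default_rng(1)
n = 2000
# axis 0 carries a bimodal (interesting) structure; axis 1 is Gaussian noise
x0 = np.concatenate([rng.normal(-3, 1, n // 2), rng.normal(3, 1, n // 2)])
X = np.column_stack([x0, rng.normal(size=n)])

best_dir, best_score = None, -1.0
for _ in range(500):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    score = pursuit_index(X @ w)
    if score > best_score:
        best_dir, best_score = w, score
print(abs(best_dir[0]))  # close to 1: the bimodal axis is found
```

A bimodal mixture has strongly negative excess kurtosis while Gaussian projections score near zero, so the search converges on the structured direction.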
Pfaender, Fabien. "Spatialisation de l'information." Compiègne, 2009. http://www.theses.fr/2009COMP1813.
The goal of this work is to understand how information presentations affect cognition, so as to use them efficiently to mine data, synthesize information and explore large heterogeneous datasets. We chose an enactive approach as a conceptual framework to understand how information is perceived and how the way it is presented affects and transforms us. In enaction, the world as perceived by a subject is the result of a dynamic coupling between the organism and its environment. Perception itself emerges from the coupling between the subject's actions and its sensations. Following these principles, we propose that lines are a perceptive support for actions of reading that lead to complex perceptive gestures. Those gestures are the basis of what we call primary structures, which exist in every presentation of information. The structures are analyzed in terms of the constraints and liberties they offer, both for global gesture support and for local gesture variations. The five structures identified are the list, the diagram, the array, the node-and-edge graph, and the map. Primary structures can also be combined into secondary structures. Thus, knowing how primary and secondary structures are perceived, it becomes possible to understand the perceptive and cognitive effect of any spatialization of information. Finally, given the semiological principles we discovered, we were able to devise a systematic, spatialization-based method to explore complex systems and reveal their structure. The method and the semiology have been integrated and tested in a web exploration software we developed for the occasion.
Omidvar, Tehrani Behrooz. "Optimization-based User Group Management : Discovery, Analysis, Recommendation." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAM038/document.
User data is becoming increasingly available in multiple domains, ranging from phone usage traces to data on the social Web. User data is a special type of data, described by user demographics (e.g., age, gender, occupation) and user activities (e.g., rating, voting, watching a movie). The analysis of user data is appealing to scientists who work on population studies, online marketing, recommendations, and large-scale data analytics. However, analysis tools for user data are still lacking. In this thesis, we argue there is a unique opportunity to analyze user data in the form of user groups, in contrast with individual user analysis and with statistical analysis of the whole population. A group is defined as a set of users whose members have either common demographics or common activities. Group-level analysis reduces the amount of sparsity and noise in the data and leads to new insights. We propose a user group management framework consisting of the following components: user group discovery, analysis and recommendation. The very first step in our framework is group discovery: given raw user data, obtain user groups by optimizing one or more quality dimensions. The second component, analysis, is necessary to tackle the problem of information overload: the output of a user group discovery step often contains millions of user groups, and it is a tedious task for an analyst to skim over all produced groups. Thus we need analysis tools that provide valuable insights into this huge space of user groups. The final question in the framework is how to use the found groups. We investigate one of these applications, user group recommendation, by considering affinities between group members. All the contributions of the proposed framework are evaluated with an extensive set of experiments, for both quality and performance.
Paillé, Pierre. "Les études sur la paix dans les collèges et universités : une analyse des données, des débats et des courants, avec survol exploratoire de la situation au Québec." Mémoire, Université de Sherbrooke, 1988. http://hdl.handle.net/11143/9209.
Borderon, Marion. "Entre distance géographique et distance sociale : le risque de paludisme-infection en milieu urbain africain : l'exemple de l'agglomération de Dakar, Sénégal." Thesis, Aix-Marseille, 2016. http://www.theses.fr/2016AIXM3004/document.
This thesis applies an Exploratory Spatial Data Analysis (ESDA) approach to study a complex phenomenon in a data-scarce environment: malaria infection in Dakar. Each component of the malaria pathogenic system is necessary, but not sufficient when acting in isolation, to result in an infection. For malaria infection to occur, three components need to interact: the parasite, the vector, and the human host. The identification of areas where these three components can easily interact is therefore essential in the fight against malaria and for the improvement of programs for the prevention, control or elimination of the disease. ESDA, still rarely applied in developing countries, is thus both a research approach and a way to provide answers to global health challenges. It leads to observations, from different angles, of the social and spatial determinants of malaria infection, as well as to the examination of the interactions between its three components. Several streams of quantitative information were collected, both directly and indirectly related to the study of malaria. More specifically, multi-temporal satellite imagery, census data, and results from social and health surveys have been integrated into a Geographic Information System (GIS) to describe the city and its inhabitants. Combining these datasets has enabled the study of the spatial variability of the risk of malaria infection.
Loubier, Eloïse. "Analyse et visualisation de données relationnelles par morphing de graphe prenant en compte la dimension temporelle." Phd thesis, Université Paul Sabatier - Toulouse III, 2009. http://tel.archives-ouvertes.fr/tel-00423655.
Our work leads to the development of graphical techniques for understanding human activities, their interactions, and their evolution, from a decision-making perspective. We design a tool combining ease of use and analytical precision, based on two complementary types of visualization: static and dynamic.
The static aspect of our visualization model rests on a representation space in which the precepts of graph theory are applied. The use of specific semiologies, such as the choice of representation forms, granularity, and meaningful colors, allows a more accurate and more precise visualization of the data as a whole. With the user at the heart of our concerns, our contribution rests on providing specific functionalities that support the identification and detailed analysis of graph structures. We propose algorithms that make it possible to target the role of the data within the structure and to analyze their neighborhood, such as filtering, the k-core and transitivity, as well as to return to the source documents, partition the graph, or focus on its structural specificities.
A major characteristic of strategic data is its rapid evolution. Yet statistical analysis does not always make it possible to study this component, to anticipate the risks incurred, to identify the origin of a trend, or to observe the actors or terms playing a decisive role within evolving structures.
The major point of our contribution for dynamic graphs, which represent data that are both relational and temporal, is graph morphing. The objective is to bring out significant trends by first representing a global graph covering all periods, then producing an animation between the successive visualizations of the graphs attached to each period. This process makes it possible to identify structures or events, to situate them in time, and to read them predictively.
Our contribution thus allows the representation of information, and more particularly the identification, analysis and restitution of the underlying strategic structures which, at given moments, link the actors of a domain and the keywords and concepts they use.
Mahmoudysepehr, Mehdi. "Modélisation du comportement du tunnelier et impact sur son environnement." Thesis, Centrale Lille Institut, 2020. http://www.theses.fr/2020CLIL0028.
This PhD research work consists in understanding the behavior of the tunnel boring machine (TBM) according to the environment encountered, in order to propose safe, durable and high-quality solutions for the digging of the tunnel. The main objective of this doctoral work is to better understand the behavior of the TBM according to its environment. Thus, we explore how the TBM reacts to the different types of terrain and how it acts on the various elements of the tunnel structure (the segments, or voussoirs). This makes it possible to propose an intelligent and optimal dimensioning of the segments, together with adapted piloting instructions.
Loubier, Éloïse. "Analyse et visualisation de données relationnelles par morphing de graphe prenant en compte la dimension temporelle." Toulouse 3, 2009. http://thesesups.ups-tlse.fr/2264/.
With worldwide exchanges, companies face increasingly strong competition and masses of information flows. They have to remain continuously informed about innovations, competition strategies and markets, while keeping control of their environment. The development of the Internet and globalization have reinforced this requirement while also providing means to collect information, which, once summarized and synthesized, is generally in a relational form. To analyze such data, graph visualization gives users a relevant means of interpreting a form of knowledge that would otherwise be difficult to understand. The research we have carried out results in graphical techniques that allow understanding human activities, their interactions, and also their evolution, from a decisional point of view. We also designed a tool that combines ease of use and analysis precision, based on two types of complementary visualizations: static and dynamic. The static aspect of our visualization model rests on a representation space in which the precepts of graph theory are applied. Specific semiologies, such as the choice of representation forms, granularity, and significant colors, allow better and more precise visualizations of the data set. The user being a core component of our model, our work rests on the specification of new types of functionalities which support the detection and analysis of graph structures. We propose algorithms that make it possible to target the role of the data within the structure and to analyze their environment, such as the filtering tool, the k-core and transitivity, as well as to go back to the source documents and to focus on structural specificities. One of the main characteristics of strategic data is their strong evolution.
However, statistical analysis does not always make it possible to study this component, to anticipate the incurred risks, to identify the origin of a trend, or to observe the actors or terms having a decisive role in the evolving structures. With regard to dynamic graphs, our major contribution is to represent relational and temporal data at the same time, through what we call graph morphing. The objective is to emphasize the significant tendencies by first representing a graph that includes all the periods and then carrying out an animation between the successive visualizations of the graphs attached to each period. This process makes it possible to identify structures or events, to locate them temporally, and to make a predictive reading of them. Thus our contribution allows the representation of information and, more precisely, the identification, analysis, and restitution of the underlying strategic structures which connect the actors of a domain and the keywords and concepts they use, with respect to their evolution.
Irichabeau, Gabrielle. "Évaluation économique de la dépendance d'une activité au milieu naturel. L'exemple de l'ostréiculture arcachonnaise." Phd thesis, Université Montesquieu - Bordeaux IV, 2011. http://tel.archives-ouvertes.fr/tel-00662006.
Béranger, Boris. "Modélisation de la structure de dépendance d'extrêmes multivariés et spatiaux." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066004/document.
Projection of future extreme events is a major issue in a large number of areas, including the environment and risk management. Although univariate extreme value theory is well understood, there is an increase in complexity when trying to understand the joint extreme behavior of two or more variables. Particular interest is given to events that are spatial by nature, which defines an infinite-dimensional context. Under the assumption that events correspond marginally to univariate extremes, the main focus is then on the dependence structure that links them. First, we provide a review of parametric dependence models in the multivariate framework and illustrate different estimation strategies. The spatial extension of multivariate extremes is introduced through max-stable processes. We derive the finite-dimensional distribution of the widely used Brown-Resnick model, which permits inference via full and composite likelihood methods. We then use skew-symmetric distributions to develop a spectral representation of a wider max-stable model, the extremal skew-t model, from which most models available in the literature can be recovered. This model has the advantage of exhibiting skewness and nonstationarity, two properties often held by environmental spatial events, and it enables a larger spectrum of dependence structures. Indicators of extremal dependence can be calculated using its finite-dimensional distribution. Finally, we introduce a kernel-based non-parametric estimation procedure for univariate and multivariate tail densities and apply it to model selection. Our method is illustrated by the example of the selection of physical climate models.
Irichabeau, Gabrielle. "Evaluation économique de la dépendance d’une activité au milieu naturel : l'exemple de l'ostréiculture arcachonnaise." Thesis, Bordeaux 4, 2011. http://www.theses.fr/2011BOR40035/document.
Economic activities exhibit varying forms and degrees of dependency on the environment. The environment can act as a factor of production or as a constraint on the use of certain inputs, and the dependence may be related to the availability or quality of certain environmental resources. We explore the implications of different forms of dependency, bio-physico-chemical as well as legal. In the case of Arcachon Bay oyster farming, we examine the forms of dependence and measure them economically, through the economic impacts associated with the variable availability of living marine resources and with the natural productivity of the environment. The analysis of the socio-economic characteristics of Arcachon Bay oyster farms allows us to develop a typology of the farms and thus characterize the activity. A production-function approach is used to highlight the varying degrees of sensitivity to changes in the environmental conditions of production, while an evaluation by the hedonic price method determines the implicit prices of the environmental components of the value of oyster leases, also taking into account their geographical location.
Marie, Nicolas. "Recherche exploratoire basée sur des données liées." Thesis, Nice, 2014. http://www.theses.fr/2014NICE4129/document.
The general topic of this thesis is web search. It focuses on how to leverage data semantics for exploratory search. Exploratory search refers to cognitively demanding search tasks that are open-ended, multi-faceted and iterative, such as learning or topic investigation. Semantic data, and linked data in particular, offer new possibilities for solving complex search queries and information needs, including exploratory ones. In this context the linked open data cloud plays an important role by allowing advanced data processing and the elaboration of innovative interaction models. First, we present a state-of-the-art review of linked-data-based exploratory search approaches and systems. Then we propose a linked-data-based exploratory search solution mainly built on an associative retrieval algorithm. Starting from a spreading activation algorithm, we propose a new diffusion formula optimized for typed graphs. From this formalization we derive several advanced querying modes in order to solve complex exploratory search needs. We also propose an innovative software architecture based on two paradigmatic design choices: first, the results are computed at query time; second, the data are consumed remotely from distant SPARQL endpoints. This allows us to reach a high level of flexibility in terms of querying and data selection. We specified, designed and evaluated the Discovery Hub web application, which retrieves the results and presents them in an interface optimized for exploration. We evaluate our approach through several human evaluations, and we open the discussion about new ways to evaluate exploratory search engines.
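The associative-retrieval idea can be sketched with a basic spreading-activation pass over a small graph. The thesis's diffusion formula for typed graphs is more elaborate; the uniform decay scheme and the toy edges below are assumptions for illustration only.

```python
# Spreading activation: a seed node's activation diffuses along edges,
# attenuated by a decay factor and split among a node's neighbours.
from collections import defaultdict

def spread(edges, seed, decay=0.5, n_steps=3):
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
        graph[b].append(a)
    activation = defaultdict(float)
    activation[seed] = 1.0
    for _ in range(n_steps):
        nxt = defaultdict(float)
        for node, act in activation.items():
            nxt[node] += act                          # keep own activation
            for neigh in graph[node]:                 # diffuse to neighbours
                nxt[neigh] += decay * act / len(graph[node])
        activation = nxt
    return dict(activation)

edges = [("Paris", "France"), ("France", "Europe"), ("Paris", "Seine")]
act = spread(edges, "Paris")
ranked = sorted(act, key=act.get, reverse=True)
print(ranked[0])  # the seed stays most activated; near nodes rank next
```

Nodes close to the seed accumulate more activation than distant ones, which is the basis for ranking associated results.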
El, Moussawi Adnan. "Clustering exploratoire pour la segmentation de données clients." Thesis, Tours, 2018. http://www.theses.fr/2018TOUR4010/document.
The research work presented in this thesis focuses on exploring the multiplicity of clustering solutions. The goal is to provide marketing experts with an interactive tool for exploring customer data that takes expert preferences on the attribute space into account. We first give a definition of an exploratory clustering system. Then, we propose a new semi-supervised clustering method that considers the user's quantitative preferences on the analysis attributes and manages the sensitivity to these preferences. Our method takes advantage of metric learning to find a compromise solution that is both well adapted to the data structure and consistent with the expert's preferences. Finally, we propose a prototype of exploratory clustering for customer relationship data segmentation that integrates the proposed method, together with the visual and interaction components essential for implementing the exploratory clustering process.
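The effect of quantitative attribute preferences on a clustering can be illustrated by using the preferences as a fixed diagonal metric inside a plain k-means. This is a simplification of the idea only: the thesis learns a compromise metric between the data structure and the preferences, rather than taking the weights as given.

```python
# Preference-weighted k-means: attribute weights rescale the axes, so
# preferred attributes dominate the distance computation.
import numpy as np

def weighted_kmeans(X, k, weights, n_iter=20):
    Xw = X * np.sqrt(np.asarray(weights))          # apply the diagonal metric
    # deterministic seeding from evenly spaced points (a toy choice)
    centers = Xw[np.linspace(0, len(Xw) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        d = ((Xw[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = Xw[labels == j].mean(0)
    return labels

rng = np.random.default_rng(2)
# groups separated on attribute 0 only; attribute 1 is high-variance noise
X = np.vstack([rng.normal([0, 0], [0.3, 3], (30, 2)),
               rng.normal([4, 0], [0.3, 3], (30, 2))])
labels = weighted_kmeans(X, 2, weights=[1.0, 0.01])
print(len(set(labels[:30])), len(set(labels[30:])))  # each group is one cluster
```

Down-weighting the noisy attribute lets the clustering recover the structure carried by the preferred attribute; with equal weights the noise would blur the separation.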
Pimentel, Cachapuz Rocha Eduardo. "Analyse exploratoire des génomes bactériens." Versailles-St Quentin en Yvelines, 2000. http://www.theses.fr/2000VERSA001.
Dumouchel, Bernard. "Analyse exploratoire des effets de l'internationalisation des universités." Paris 8, 2001. http://www.theses.fr/2001PA082006.
Marine, Cadoret. "Analyse factorielle de données de catégorisation. : Application aux données sensorielles." Rennes, Agrocampus Ouest, 2010. http://www.theses.fr/2010NSARG006.
In sensory analysis, holistic approaches in which objects are considered as a whole are increasingly used to collect data. Their interest comes, on the one hand, from their ability to acquire types of information other than those obtained by traditional profiling methods and, on the other hand, from the fact that they require no special skills, which makes them feasible for all subjects. Categorization (or free sorting), in which subjects are asked to provide a partition of the objects, belongs to these approaches. The first part of this work focuses on categorization data. After showing that this method of data collection is relevant, we focus on the statistical analysis of these data through the search for Euclidean representations. The proposed methodology, which consists in using factorial methods such as Multiple Correspondence Analysis (MCA) or Multiple Factor Analysis (MFA), is enriched with elements of validity. This methodology is then illustrated by the analysis of two data sets, obtained from beers on the one hand and from perfumes on the other. The second part is devoted to the study of two data collection methods related to categorization: sorted Napping® and hierarchical sorting. For both, we again address the statistical analysis with an approach similar to the one used for categorization data. The last part is devoted to the implementation, in the R software, of functions to analyze the three kinds of data: categorization data, hierarchical sorting data and sorted Napping® data.
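The construction of a Euclidean representation from free-sorting data can be sketched by stacking each subject's partition as indicator columns and factorizing. This is a simplified stand-in for MCA: the chi-square row and column weightings of a real correspondence analysis are omitted, and the tiny data set is invented for illustration.

```python
# From sorting data to object coordinates: indicator coding + SVD.
import numpy as np

# 4 objects sorted by 3 subjects (each row is one subject's partition)
sorts = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 1, 1, 1]])

# build the objects x categories indicator matrix, one block per subject
blocks = []
for subj in sorts:
    cats = np.unique(subj)
    blocks.append((subj[:, None] == cats[None, :]).astype(float))
Z = np.hstack(blocks)

Zc = Z - Z.mean(axis=0)                      # centre (simplified weighting)
U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
coords = U[:, :2] * s[:2]                    # 2-D object configuration
d01 = np.linalg.norm(coords[0] - coords[1])  # objects usually sorted together
d03 = np.linalg.norm(coords[0] - coords[3])  # objects always sorted apart
print(d01 < d03)  # → True
```

Objects that subjects put in the same groups end up close in the factorial map, while objects always separated end up far apart, which is the property the factorial analysis of categorization data exploits.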
Derquenne, Christian. "Traitements statistiques de données catégorielles : recherche exploratoire de structures et modélisation de phénomènes." Paris 9, 2006. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2006PA090002.
The goal of this thesis is to present ten years of research work (1995-2005) on statistical methods for categorical data, with two approaches: the discovery of structures by exploratory data analysis, and the modeling of phenomena by inferential statistics. The first approach introduces new concepts for clustering mixed (numeric and/or categorical) variables. In the second approach, several new statistical tools are developed: a heteroskedastic logit model, statistical tests on the marginal and hierarchical contributions of explanatory variables, multivariate modeling of several categorical response variables (Partial Maximum Likelihood Regression) and path modeling on mixed variables (Partial Maximum Likelihood Approach). These methods have been applied to many real cases.
Gomes, Da Silva Alzennyr. "Analyse des données évolutives : application aux données d'usage du Web." Phd thesis, Université Paris Dauphine - Paris IX, 2009. http://tel.archives-ouvertes.fr/tel-00445501.
Gomes, da Silva Alzennyr. "Analyse des données évolutives : Application aux données d'usage du Web." Paris 9, 2009. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2009PA090047.
Nowadays, more and more organizations are becoming reliant on the Internet. The Web has become one of the most widespread platforms for information exchange and retrieval. The growing number of traces left behind by user transactions (e.g. customer purchases, user sessions, etc.) automatically increases the importance of usage data analysis. Indeed, the way in which a web site is visited can change over time. These changes can be related to temporal factors (day of the week, seasonality, periods of special offers, etc.). Consequently, the usage models must be continuously updated in order to reflect the current behaviour of the visitors. Such a task remains difficult when the temporal dimension is ignored or simply introduced into the data description as a numeric attribute. It is precisely on this challenge that the present thesis focuses. In order to deal with the problem of acquiring real usage data, we propose a methodology for the automatic generation of artificial usage data over which one can control the occurrence of changes and thus analyse the efficiency of a change detection system. Guided by insights from exploratory analyses, we propose a tilted-window approach for detecting and following up changes in evolving usage data. In order to measure the level of change, this approach applies two external evaluation indices based on the clustering extension. The proposed approach also characterizes the changes undergone by the usage groups (e.g. appearance, disappearance, fusion and split) at each timestamp. Moreover, it is totally independent of the clustering method used and is able to manage kinds of data other than usage data. The effectiveness of this approach is evaluated on artificial data sets of different degrees of complexity and also on real data sets from different domains (academic, tourism, e-business and marketing).
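A standard external index for comparing the partitions of two consecutive time windows is the Adjusted Rand Index: a value near 1 means the usage groups are stable, a low value signals a change. The sketch below is our own minimal illustration of that scoring step, not the thesis' actual evaluation indices.

```python
import numpy as np
from math import comb

def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand Index between two clusterings of the same objects."""
    _, a_idx = np.unique(labels_a, return_inverse=True)
    _, b_idx = np.unique(labels_b, return_inverse=True)
    # Contingency table of the two partitions.
    cont = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=int)
    np.add.at(cont, (a_idx, b_idx), 1)
    sum_ij = sum(comb(int(n), 2) for n in cont.ravel())
    sum_a = sum(comb(int(n), 2) for n in cont.sum(axis=1))
    sum_b = sum(comb(int(n), 2) for n in cont.sum(axis=0))
    total = comb(len(labels_a), 2)
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

# Two consecutive windows clustered independently: one session moved group.
old = np.array([0, 0, 0, 1, 1, 1])
new = np.array([0, 0, 1, 1, 1, 1])
score = adjusted_rand(old, new)
```

The index is invariant to cluster relabeling, which matters here because independent clusterings of successive windows carry no shared label scheme.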
Périnel, Emmanuel. "Segmentation en analyse de données symboliques : le cas de données probabilistes." Paris 9, 1996. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1996PA090079.
Sibony, Eric. "Analyse multirésolution de données de classements." Thesis, Paris, ENST, 2016. http://www.theses.fr/2016ENST0036/document.
This thesis introduces a multiresolution analysis framework for ranking data. Initiated in the 18th century in the context of elections, the analysis of ranking data has attracted major interest in many fields of the scientific literature: psychometry, statistics, economics, operations research, machine learning and computational social choice, among others. It has been further revitalized by modern applications such as recommender systems, where the goal is to infer users' preferences in order to make the best personalized suggestions. In these settings, users express their preferences only on small and varying subsets of a large catalog of items. The analysis of such incomplete rankings, however, poses both a great statistical and a great computational challenge, leading industrial actors to use methods that exploit only a fraction of the available information. This thesis introduces a new representation for the data which, by construction, overcomes these two challenges. Though it relies on results from combinatorics and algebraic topology, it shares several analogies with multiresolution analysis, offering a natural and efficient framework for the analysis of incomplete rankings. As it does not involve any assumption on the data, it already leads to estimators that outperform existing ones in small-scale settings, and it can be combined with many regularization procedures for large-scale settings. For all these reasons, we believe that this multiresolution representation paves the way for a wide range of future developments and applications.
Trudeau-Malo, Jennifer. "Analyse exploratoire de quatre Centres de la petite enfance au Nunavik." Master's thesis, Université Laval, 2016. http://hdl.handle.net/20.500.11794/27177.
This exploratory research, conducted in 2011, describes the implementation of several childcare centres within their socioeconomic context in four communities in Nunavik. The main objectives of this project are to analyze the community conditions, such as employment possibilities, health issues, childcare services and education, that engendered a demand for childcare centres, as well as to examine the impact of such services on community life. The development of childcare centres derives from a collaboration between the Kativik Regional Government (KRG) and Northern community members. Hence, we also explore the role the KRG played in the development and maintenance of such services in the North. To this day, few scientific studies deal with childcare centres in Nunavik, and so we worked in collaboration with the KRG to conduct this project.
Aaron, Catherine. "Connexité et analyse des données non linéaires." Phd thesis, Université Panthéon-Sorbonne - Paris I, 2005. http://tel.archives-ouvertes.fr/tel-00308495.
Darlay, Julien. "Analyse combinatoire de données : structures et optimisation." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00683651.
Operto, Grégory. "Analyse structurelle surfacique de données fonctionnelles cérébrales." Aix-Marseille 3, 2009. http://www.theses.fr/2009AIX30060.
Functional data acquired by magnetic resonance contain a measure of the activity at every location of the brain. Although many methods exist, the automatic analysis of these data remains an open problem. In particular, the vast majority of these methods consider the data in a volume-based fashion, in the 3D acquisition space. However, most of the activity is generated within the cortex, which can be considered as a surface. Considering the data on the cortical surface has many advantages: on the one hand, its geometry can be taken into account at every processing step; on the other hand, considering the whole volume reduces the detection power of the statistical tests usually employed. This thesis hence proposes an extension of the application field of volume-based methods to the surface-based domain by addressing problems such as projecting data onto the surface, performing surface-based multi-subject analyses, and estimating the validity of the results.
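The volume-to-surface projection step mentioned above can be illustrated by its simplest instance, a nearest-voxel sampler: each mesh vertex (in world coordinates) is mapped through the inverse of the scanner affine and the functional value of the closest voxel is read off. This is our own simplified sketch; real pipelines integrate along the cortical normal and handle partial-volume effects.

```python
import numpy as np

def project_to_surface(volume, affine, vertices):
    """Sample a functional volume at mesh vertices (nearest-voxel rule).

    volume   : 3D array of functional values (one time frame)
    affine   : 4x4 voxel-to-world matrix of the acquisition
    vertices : (n, 3) cortical mesh vertex coordinates, in world mm
    """
    world_to_voxel = np.linalg.inv(affine)
    homog = np.c_[vertices, np.ones(len(vertices))]       # homogeneous coords
    ijk = np.rint(homog @ world_to_voxel.T)[:, :3].astype(int)
    ijk = np.clip(ijk, 0, np.array(volume.shape) - 1)     # stay inside the grid
    return volume[ijk[:, 0], ijk[:, 1], ijk[:, 2]]
```

The result is a texture: one functional value per vertex, on which surface-based statistics can then be computed.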
Le, Béchec Antony. "Gestion, analyse et intégration des données transcriptomiques." Rennes 1, 2007. http://www.theses.fr/2007REN1S051.
Aiming at a better understanding of diseases, transcriptomic approaches allow the analysis of several thousand genes in a single experiment. To date, international standardization initiatives have allowed the whole scientific community to use the large quantities of data generated by transcriptomic approaches, and a large number of algorithms are available to process and analyze these data sets. However, the major remaining challenge is to provide biological interpretations for these large data sets. In particular, their integration with additional biological knowledge would certainly lead to an improved understanding of complex biological mechanisms. In my thesis work, I developed a novel and evolutive environment for the management and analysis of transcriptomic data. Micro@rray Integrated Application (M@IA) allows the management, processing and analysis of large-scale expression data sets. In addition, I elaborated a computational method to combine multiple data sources and represent differentially expressed gene networks as interaction graphs. Finally, I used a meta-analysis of gene expression data extracted from the literature to select and combine similar studies associated with the progression of liver cancer. In conclusion, this work provides a novel tool and original analytical methodologies, thus contributing to the emerging field of integrative biology, indispensable for a better understanding of complex pathophysiological processes.
Abdali, Abdelkebir. "Systèmes experts et analyse de données industrielles." Lyon, INSA, 1992. http://www.theses.fr/1992ISAL0032.
To analyse industrial process behaviour, many kinds of information are needed. As these are mostly numerical, statistical and data analysis methods are well suited to this activity. Their results must be interpreted together with other knowledge about the analysed process. Our work falls within the framework of the application of Artificial Intelligence techniques to Statistics. Its aim is to study the feasibility and development of statistical expert systems in the field of industrial processes. The prototype ALADIN is a knowledge-based system designed to be an intelligent assistant helping a non-specialist user analyze data collected on industrial processes. Written in Turbo-Prolog, it is coupled with the statistical package MODULAD. The architecture of this system is flexible, combining knowledge about plants in general, the studied process and statistical methods. Its validation was performed on continuous manufacturing processes (cement and cast iron processes). At present, it is limited to principal component analysis problems.
David, Claire. "Analyse de XML avec données non-bornées." Paris 7, 2009. http://www.theses.fr/2009PA077107.
The motivation of this work is the specification and static analysis of schemas for XML documents, paying special attention to data values. We consider words and trees whose positions are labeled both by a letter from a finite alphabet and by a data value from an infinite domain. Our goal is to find formalisms which offer good trade-offs between expressibility, decidability and complexity (for the satisfiability problem). We first study an extension of first-order logic with a binary predicate representing data equality. We obtain some interesting results when we consider the two-variable fragment. This approach is elegant but the complexity results are not encouraging. We then propose another formalism based on data patterns, which can be desired, forbidden, or any boolean combination thereof. We draw precisely the decidability frontier for various fragments of this model. The complexity results that we get, while still high, seem more amenable. In terms of expressivity these two approaches are orthogonal: the two-variable fragment of the extension of FO can express unary keys and unary foreign keys, while boolean combinations of data patterns can express arbitrary keys but cannot express foreign keys.
Bobin, Jérôme. "Diversité morphologique et analyse de données multivaluées." Paris 11, 2008. http://www.theses.fr/2008PA112121.
Carvalho, Francisco de. "Méthodes descriptives en analyse de données symboliques." Paris 9, 1992. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1992PA090025.
Royer, Jean-Jacques. "Analyse multivariable et filtrage des données régionalisées." Vandoeuvre-les-Nancy, INPL, 1988. http://www.theses.fr/1988NAN10312.
Faye, Papa Abdoulaye. "Planification et analyse de données spatio-temporelles." Thesis, Clermont-Ferrand 2, 2015. http://www.theses.fr/2015CLF22638/document.
Spatio-temporal modeling allows the prediction of a regionalized variable at unobserved points of a given field, based on observations of this variable at some points of the field at different times. In this thesis, we propose an approach which combines numerical and statistical models. Indeed, using Bayesian methods we combined the different sources of information: spatial information provided by the observations, temporal information provided by the black-box numerical model, and prior information on the phenomenon of interest. This approach allowed us to obtain a good prediction of the variable of interest and a good quantification of the uncertainty of this prediction. We also proposed a new method to construct experimental designs by establishing an optimality criterion based on the uncertainty and the expected value of the phenomenon.
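The Bayesian combination of a black-box prediction with a field observation can be illustrated, in the simplest conjugate-Gaussian case, by a precision-weighted average. This one-point sketch is our own simplification, far from the full spatio-temporal model of the thesis, but it shows why the posterior is both better centered and less uncertain than either source.

```python
def bayes_combine(model_mean, model_var, obs, obs_var):
    """Posterior of a Gaussian state given a Gaussian black-box prediction
    (the prior) and a noisy observation (the likelihood)."""
    gain = model_var / (model_var + obs_var)        # Kalman-style weight
    post_mean = model_mean + gain * (obs - model_mean)
    post_var = model_var * obs_var / (model_var + obs_var)
    return post_mean, post_var

# The posterior lies between the two sources, pulled toward the more
# precise one, and its variance is smaller than both input variances.
mean, var = bayes_combine(model_mean=10.0, model_var=4.0, obs=14.0, obs_var=1.0)
```

Here the observation is four times more precise than the model, so the posterior mean (13.2) sits much closer to the observation than to the model output.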
Jamal, Sara. "Analyse spectrale des données du sondage Euclid." Thesis, Aix-Marseille, 2017. http://www.theses.fr/2017AIXM0263.
Large-scale surveys such as Euclid will produce large data sets that will require the development of fully automated data-processing pipelines to analyze the data, extract crucial information and ensure that all requirements are met. In a survey, the redshift is an essential quantity to measure. Distinct methods to estimate redshifts exist in the literature, but there is no fully automated definition of a reliability criterion for redshift measurements. In this work, we first explored common techniques of spectral analysis, such as filtering and continuum extraction, that could be used as preprocessing to improve the accuracy of spectral feature measurements, then focused on developing a new methodology to automate the reliability assessment of spectroscopic redshift measurements by exploiting Machine Learning (ML) algorithms and features of the posterior redshift probability distribution function (PDF). Our idea consists in quantifying, through ML and zPDF descriptors, the reliability of a redshift measurement into distinct partitions that describe different levels of confidence. For example, a multimodal zPDF refers to multiple (plausible) redshift solutions, possibly with similar probabilities, while a strongly unimodal zPDF with a low dispersion and a unique, prominent peak depicts a more "reliable" redshift estimate. We assess that this new methodology could be very promising for next-generation large spectroscopic surveys on the ground and in space, such as Euclid and WFIRST.
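The kind of zPDF descriptors that feed such a classifier can be sketched in a few lines: count the significant modes, measure the dispersion, and record the best redshift. This is our own illustration of the idea; the actual descriptor set used in the thesis is richer.

```python
import numpy as np

def _integrate(y, x):
    # trapezoidal rule, kept explicit for self-containment
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def zpdf_descriptors(z, pdf, peak_frac=0.05):
    """Simple descriptors of a redshift PDF sampled on a grid z:
    number of significant modes, dispersion, and the best redshift."""
    pdf = pdf / _integrate(pdf, z)                 # normalize to unit area
    mean = _integrate(z * pdf, z)
    dispersion = float(np.sqrt(_integrate((z - mean) ** 2 * pdf, z)))
    # Local maxima that rise above a fraction of the highest peak.
    interior = (pdf[1:-1] > pdf[:-2]) & (pdf[1:-1] > pdf[2:])
    peaks = np.where(interior & (pdf[1:-1] > peak_frac * pdf.max()))[0] + 1
    return {"n_modes": int(len(peaks)),
            "dispersion": dispersion,
            "best_z": float(z[np.argmax(pdf)])}
```

A unimodal, low-dispersion zPDF then maps to a high-confidence partition, while a multimodal one maps to a lower level of confidence.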
Lambert, Thierry. "Réalisation d'un logiciel d'analyse de données." Paris 11, 1986. http://www.theses.fr/1986PA112274.
Richer, Gaëlle. "Passage à l'échelle pour la visualisation interactive exploratoire de données : approches par abstraction et par déformation spatiale." Thesis, Bordeaux, 2019. http://www.theses.fr/2019BORD0264/document.
Interactive visualization is helpful for exploring, understanding, and analyzing data. However, increasingly large and complex data challenges the efficiency of visualization systems, both visually and computationally. The visual challenge stems from human perceptual and cognitive limitations as well as screen-space limitations, while the computational challenge stems from the processing and memory limitations of standard computers. In this thesis, we present techniques addressing these two scalability issues for several interactive visualization applications. To address visual scalability requirements, we present a versatile spatial-distortion approach for linked emphasis on multiple views and an abstract, multi-scale representation based on parallel coordinates. Spatial distortion aims at alleviating the weakened emphasis effect of highlighting when applied to small visual elements. Multi-scale abstraction simplifies the representation while providing detail on demand, by pre-aggregating data at several levels of detail. To address computational scalability requirements and scale data processing to billions of items in interactive times, we use pre-computation and real-time computation on a remote distributed infrastructure. We present a system for multidimensional data exploration in which the interactions and the abstract representation comply with a visual-item budget and, in return, provide a guarantee on network-related interaction latencies. With the same goal, we compared several geometric reduction strategies for the reconstruction of density maps of large-scale point sets.
Fraisse, Bernard. "Automatisation, traitement du signal et recueil de données en diffraction x et analyse thermique : Exploitation, analyse et représentation des données." Montpellier 2, 1995. http://www.theses.fr/1995MON20152.
Kezouit, Omar Abdelaziz. "Bases de données relationnelles et analyse de données : conception et réalisation d'un système intégré." Paris 11, 1987. http://www.theses.fr/1987PA112130.
Gonzalez, Ignacio. "Analyse canonique régularisée pour des données fortement multidimensionnelles." Toulouse 3, 2007. http://thesesups.ups-tlse.fr/99/.
Motivated by the study of relationships between gene expressions and other biological variables, our work consists in presenting and developing a methodology answering this problem. Among the statistical methods treating this subject, Canonical Analysis (CA) seemed well adapted, but high dimensionality is at present one of the major obstacles for statistical techniques analysing data coming from microarrays. The main axis of this work was thus the search for solutions taking this crucial aspect into account in the implementation of CA. Among the approaches considered to handle this problem, we were interested in regularization methods. The method developed here, called Regularized Canonical Analysis (RCA), is based on the principle of ridge regularization initially introduced in multiple linear regression. Since RCA requires the choice of two regularization parameters, we proposed an M-fold cross-validation method to handle this problem. We presented in detail applications of RCA to highly multidimensional data coming from genomic studies as well as to data coming from other domains. Among other things, we were interested in visualizing the data in order to facilitate the interpretation of the results. For that purpose, we proposed several graphical methods: representations of variables (correlation graphs), representations of individuals, as well as alternative representations such as networks and heatmaps.
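The ridge-regularization principle behind RCA can be sketched in a few lines of numpy: each covariance block is made invertible by adding lam * I before whitening, which is what makes canonical analysis feasible when the number of variables exceeds the number of samples. This is an illustrative reconstruction under our own naming, not the thesis' implementation (which also selects the two parameters by M-fold cross-validation).

```python
import numpy as np

def regularized_canonical_correlations(X, Y, lam_x, lam_y, n_comp=2):
    """First canonical correlations of X and Y with ridge regularization."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + lam_x * np.eye(X.shape[1])   # regularized covariances
    Syy = Y.T @ Y / n + lam_y * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n
    # Whiten both blocks through Cholesky factors; the singular values of the
    # whitened cross-covariance are the canonical correlations.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    return np.linalg.svd(M, compute_uv=False)[:n_comp]

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
Y_linked = X @ rng.standard_normal((10, 4)) + 0.01 * rng.standard_normal((50, 4))
Y_noise = rng.standard_normal((50, 4))
rho_linked = regularized_canonical_correlations(X, Y_linked, 0.1, 0.1)
rho_noise = regularized_canonical_correlations(X, Y_noise, 0.1, 0.1)
```

The regularization deliberately shrinks the correlations below 1, trading a small bias for the ability to handle p, q larger than n without the spurious perfect correlations of unregularized CA.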
Bazin, Gurvan. "Analyse différée des données du SuperNova Legacy Survey." Paris 7, 2008. http://www.theses.fr/2008PA077135.
The SuperNova Legacy Survey (SNLS) experiment observed type Ia supernovae (SNe Ia) during 5 years. Its aim is to constrain cosmological parameters. The online reduction pipeline is based on spectroscopic identification of each supernova. Systematically using spectroscopy requires a sufficient signal-to-noise level. Thus, it can lead to selection biases and would not be possible for future surveys. This PhD thesis reports a complementary method of data reduction based on a completely photometric selection. This analysis, more efficient at selecting faint events, approximately doubles the SNe Ia sample of the SNLS. The method shows a clear bias in the spectroscopic selection: brighter SNe Ia are systematically selected beyond a redshift of 0.7. On the other hand, no important impact on cosmology was found, so the corrections for the intrinsic variability of SNe Ia luminosity are robust. In addition, this work is a first step in studying the feasibility of such a purely photometric analysis for cosmology, a promising method for future projects.
Hapdey, Sébastien. "Analyse de données multi-isotopiques en imagerie monophotonique." Paris 11, 2002. http://www.theses.fr/2002PA11TO35.
Feydy, Jean. "Analyse de données géométriques, au-delà des convolutions." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASN017.
Geometric data analysis, beyond convolutions. To model interactions between points, a simple option is to rely on weighted sums known as convolutions. Over the last decade, this operation has become a building block for deep learning architectures with an impact on many applied fields. We should not forget, however, that the convolution product is far from being the be-all and end-all of computational mathematics. To let researchers explore new directions, we present robust, efficient and principled implementations of three underrated operations:
1. Generic manipulations of distance-like matrices, including kernel matrix-vector products and nearest-neighbor searches.
2. Optimal transport, which generalizes sorting to spaces of dimension D > 1.
3. Hamiltonian geodesic shooting, which replaces linear interpolation when no relevant algebraic structure can be defined on a metric space of features.
Our PyTorch/NumPy routines fully support automatic differentiation and scale up to millions of samples in seconds. They generally outperform baseline GPU implementations with x10 to x1,000 speed-ups and keep linear instead of quadratic memory footprints. These new tools are packaged in the KeOps (kernel methods) and GeomLoss (optimal transport) libraries, with applications that range from machine learning to medical imaging. Documentation is available at: www.kernel-operations.io/keops and /geomloss
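The first operation listed above, a kernel matrix-vector product that never stores the full (N, M) matrix, can be sketched in plain numpy by processing the rows in blocks. KeOps does this symbolically on GPU; the chunked CPU version below is our own illustration of the memory-saving idea, not the library's code.

```python
import numpy as np

def gaussian_kernel_matvec(x, y, b, sigma=1.0, chunk=256):
    """Compute a_i = sum_j exp(-|x_i - y_j|^2 / (2 sigma^2)) * b_j,
    building only a (chunk, M) slice of the kernel matrix at a time."""
    out = np.zeros((x.shape[0],) + b.shape[1:])
    for start in range(0, x.shape[0], chunk):
        xi = x[start:start + chunk]
        # Squared distances between this block of x and all of y.
        sq_dists = ((xi[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
        out[start:start + chunk] = np.exp(-sq_dists / (2 * sigma ** 2)) @ b
    return out
```

Memory stays linear in N + M instead of quadratic, which is exactly the footprint property claimed for the KeOps routines.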