Dissertations / Theses on the topic 'Arbres de données'
Consult the top 50 dissertations / theses for your research on the topic 'Arbres de données.'
Brossier, Gildas. "Problèmes de représentation de données par des arbres." Rennes 2, 1986. http://www.theses.fr/1986REN20014.
First, we study the properties of distance tables associated with tree representations, and the relations between these distances. Then we define ordered representations, construct a class of ordering algorithms, and study their optimality properties under different conditions. The decomposition properties of distance tables allow us to construct fast algorithms for representations with certain optimality properties, and we extend these results to the case where the data are asymmetric matrices. Finally, in the case of rectangular matrices, we establish necessary and sufficient conditions for the simultaneous representation of two data sets; when these conditions are not satisfied, we propose approximation algorithms.
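The tree-representability of a distance table studied here is classically characterized by the four-point condition: a dissimilarity is exactly representable by a tree iff, for every quadruple of points, the two largest of the three pairwise sums are equal. A minimal check of that condition (an illustration, not Brossier's algorithms; the function name is mine):

```python
from itertools import combinations

def is_additive(d, tol=1e-9):
    """Four-point condition: d is representable by an (additive) tree
    iff for every quadruple the two largest of the three sums
    d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k) are equal."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        s = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if abs(s[2] - s[1]) > tol:
            return False
    return True

# Path metric on the path 0-1-2-3 (unit edges): additive.
path = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
# Shortest-path metric on a 4-cycle (unit edges): not additive.
cycle = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
print(is_additive(path), is_additive(cycle))  # True False
```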
Acosta, Francisco. "Les arbres balances : spécification, performances et contrôle de concurrence." Montpellier 2, 1991. http://www.theses.fr/1991MON20201.
Dufort, Julie. "Estimation automatisée de la hauteur des arbres à partir de données d'altimétrie laser." Mémoire, École de technologie supérieure, 2000. http://espace.etsmtl.ca/855/1/DUFORT_Julie.pdf.
Germain, Christian. "Etude algébrique, combinatoire et algorithmique de certaines structures non associatives (magmas, arbres, parenthésages)." Dijon, 1996. http://www.theses.fr/1996DIJOS018.
Flitti, Farid. "Techniques de réduction de données et analyse d'images multispectrales astronomiques par arbres de Markov." Phd thesis, Université Louis Pasteur - Strasbourg I, 2005. http://tel.archives-ouvertes.fr/tel-00156963.
Flitti, Farid. "Techniques de réduction de données et analyse d'images multispéctrales astronomiques par arbres de Markov." Université Louis Pasteur (Strasbourg) (1971-2008), 2005. https://publication-theses.unistra.fr/public/theses_doctorat/2005/FLITTI_Farid_2005.pdf.
The development of astronomical multispectral sensors provides data of great richness. Nevertheless, the classification of multidimensional images is often limited by the Hughes phenomenon: as dimensionality increases, the number of model parameters grows and the precision of their estimates inevitably falls, so the quality of the segmentation decreases dramatically. It is thus imperative to discard redundant information in order to carry out robust segmentation or classification. In this thesis, we propose two methods for multispectral image dimensionality reduction: 1) band grouping followed by local projections; 2) radio cube reduction by a mixture of Gaussians model. We also propose a joint reduction/segmentation scheme based on the regularization of the mixture of probabilistic principal component analyzers (MPPCA). For the segmentation task, we use a Bayesian approach based on hierarchical Markov models, namely the hidden Markov tree and the pairwise Markov tree. These models allow fast and exact computation of the a posteriori probabilities. For the data-driven term, we use three formulations: 1) the classical multidimensional Gaussian distribution; 2) the multidimensional generalized Gaussian distribution formulated using copula theory; 3) the likelihood of the probabilistic PCA model (within the framework of the regularized MPPCA). The major contribution of this work is the introduction of various hierarchical Markov models for multidimensional and multiresolution data segmentation. Their application to data resulting from wavelet analysis, adapted to the astronomical context, enabled us to develop new denoising and fusion techniques for multispectral astronomical images. All our algorithms are unsupervised and were validated on synthetic and real images.
Galluccio, Laurent. "Analyse et segmentation de données non supervisées à l'aide de graphe." Nice, 2010. http://www.theses.fr/2010NICE4022.
This thesis presents new data segmentation and clustering methods applied to astrophysical data. A priori information such as the number of classes or the underlying data distribution is not necessarily known. Many classification methods in the astrophysics community are based on a priori knowledge or on observations already made on the data; the resulting classifications depend on this information and are limited by expert knowledge. The goal of developing clustering algorithms is to remove these limitations, so as to potentially detect new classes. The main approach chosen in this thesis is the use of a graph built on the data: the minimum spanning tree (MST). By connecting the points with segments, we build a structure which encapsulates the neighbourhood relations between each pair of points. We propose a method to estimate both the number and the position of clusters by exploring the connections of the MST. A data partition is obtained by using this information to initialize clustering algorithms. A new class of multi-rooted MSTs is then introduced; from their construction, new distance measures are derived which take into account both the local and the global data neighbourhood. A clustering method which combines the results of multiple partitions computed on the multi-rooted trees is also presented. The proposed methods are validated on benchmarks and applied to astrophysical datasets.
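In the spirit of the MST-based clustering this abstract describes (a generic sketch, not Galluccio's actual method; all names are mine), cutting the longest edges of a Euclidean MST splits the data into clusters:

```python
import math
from collections import defaultdict

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph (O(n^2))."""
    n = len(points)
    # best[v] = (weight, endpoint already in tree) of the cheapest edge to v
    best = {v: (math.dist(points[0], points[v]), 0) for v in range(1, n)}
    edges = []
    while best:
        u = min(best, key=lambda v: best[v][0])
        w, parent = best.pop(u)
        edges.append((w, parent, u))
        for v in best:
            d = math.dist(points[u], points[v])
            if d < best[v][0]:
                best[v] = (d, u)
    return edges

def mst_clusters(points, k):
    """Cut the k-1 longest MST edges; connected components give k clusters."""
    keep = sorted(mst_edges(points))[:len(points) - k]
    adj = defaultdict(list)
    for _, u, v in keep:
        adj[u].append(v)
        adj[v].append(u)
    label, labels = 0, {}
    for start in range(len(points)):
        if start in labels:
            continue
        stack = [start]                      # depth-first component walk
        while stack:
            u = stack.pop()
            if u not in labels:
                labels[u] = label
                stack.extend(adj[u])
        label += 1
    return [labels[i] for i in range(len(points))]

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(mst_clusters(pts, 2))  # [0, 0, 0, 1, 1]
```

The thesis goes further (estimating the number of clusters from the MST itself, multi-rooted variants); this only shows the basic cut-the-longest-edges idea.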
Fournier, Dominique. "Etude de la qualité de données à partir de l'apprentissage automatique : application aux arbres d'induction." Caen, 2001. http://www.theses.fr/2001CAEN2048.
Boneva, Iovka. "Expressivité, satisfiabilité et model checking d'une logique spatiale pour arbres non ordonnés." Lille 1, 2006. https://ori-nuxeo.univ-lille1.fr/nuxeo/site/esupversions/dffac6b2-50d6-4e6d-9e4c-f8f5731c75e2.
Jabbour-Hattab, Jean. "Une approche probabiliste du profil des arbres binaires de recherche." Versailles-St Quentin en Yvelines, 2001. http://www.theses.fr/2001VERS002V.
Auber, David. "Outils de visualisation de larges structures de données." Bordeaux 1, 2002. http://www.theses.fr/2002BOR12607.
Haddad, Raja. "Apprentissage supervisé de données symboliques et l'adaptation aux données massives et distribuées." Thesis, Paris Sciences et Lettres (ComUE), 2016. http://www.theses.fr/2016PSLED028/document.
This thesis proposes new supervised methods for Symbolic Data Analysis (SDA) and extends the domain to Big Data. We start by creating a supervised method called HistSyr that automatically converts continuous variables into the histograms that are most discriminant for the classes of individuals. We also propose a new symbolic decision tree method that we call SyrTree. SyrTree accepts many types of input and target variables and can use all the symbolic variables describing the target to construct the decision tree. Finally, we extend HistSyr to Big Data by creating a distributed method called CloudHistSyr. Using the Map/Reduce framework, CloudHistSyr creates the most discriminant histograms for data too large for HistSyr. We tested CloudHistSyr on Amazon Web Services. We show the efficiency of our method on simulated data and on actual car traffic data from Nantes. We conclude on the overall utility of CloudHistSyr, which, through its results, allows the study of massive data using existing symbolic analysis methods.
Travers, Nicolas. "Optimisation extensible dans un médiateur de données semi-structurées." Versailles-St Quentin en Yvelines, 2006. http://www.theses.fr/2006VERS0049.
This thesis proposes to evaluate XQuery queries in a mediation context. The mediator must federate several heterogeneous data sources with an appropriate query model, on top of which an optimization framework must be defined to increase performance. The well-known tree-pattern model can represent a subset of XPath queries in tree form, but because of the complexity of XQuery, no model had been proposed that could represent all the structural components of the language. We therefore propose a new logical model for XQuery queries called TGV. It aims at supporting the whole of XQuery in a canonical form, in order to cover more of the XQuery specification; this form allows queries to be translated into our TGV model in a unique way. The model takes a distributed, heterogeneous context into account and eases the optimization process: it integrates transformation rules, cost evaluation, and the execution of XQuery queries. The TGV can be used as a basis for processing XQuery queries, since it is flexible: it provides abstract data types which can be implemented according to the underlying data model. Moreover, it allows user-defined annotations as well as cost-related annotations for cost estimation. Although the model is useful, it must cope with the intricacies of the XQuery specification. TGVs are illustrated in this thesis with several figures based on the W3C's use cases. Finally, a framework for defining transformation rules is added to the extensible optimizer to increase the performance of the XLive mediator. The XLive mediation system has been developed at the PRISM laboratory.
Vera, Carine. "Modèles linéaires mixtes multiphasiques pour l'analyse de données longitudinales : Application à la croissance des plantes." Montpellier 2, 2004. http://www.theses.fr/2004MON20161.
Fayad, Ibrahim. "Estimation de la hauteur des arbres à l'échelle régionale : application à la Guyane Française." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS143/document.
Remote sensing has facilitated the techniques used for mapping, modelling and understanding forest parameters. Remote sensing applications usually use information from either passive optical systems or active radar sensors. These systems have shown satisfactory results for estimating, for example, aboveground biomass in some biomes, but they present significant limitations for ecological applications, since their sensitivity has been shown to be limited in forests with medium levels of aboveground biomass. On the other hand, LiDAR remote sensing has been shown to be a good technique for estimating forest parameters such as canopy height and aboveground biomass. Whilst airborne LiDAR data are in general very dense, they are only available over small areas due to the cost of their acquisition; spaceborne LiDAR data acquired by the Geoscience Laser Altimeter System (GLAS) have low acquisition density but global geographical cover. It is therefore valuable to analyze the relevance of integrating canopy heights estimated from LiDAR sensors with ancillary data (geological, meteorological, slope, vegetation indices, etc.) in order to produce a forest canopy height map with good precision and high spatial resolution. In addition, estimating forest canopy height from large-footprint satellite LiDAR waveforms is challenging, given the complex interaction between LiDAR waveforms, terrain, and vegetation, especially in dense tropical and equatorial forests. The research carried out in this thesis therefore aimed to: 1) estimate and validate canopy heights using raw data from airborne LiDAR, and then evaluate the potential of spaceborne GLAS LiDAR data for estimating forest canopy heights; 2) evaluate the potential of fusing LiDAR data (either spaceborne or airborne) with ancillary data for forest canopy height estimation at very large scales.
This research work was carried out over French Guiana. The estimation of canopy heights using the airborne LiDAR data showed an RMSE of 1.6 m. Next, the potential of GLAS for estimating canopy heights was assessed using multiple linear (ML) and random forest (RF) regressions on waveform metrics and on a principal component analysis (PCA) of the waveforms. Results showed canopy height estimates of similar precision using either the LiDAR metrics or the principal components (RMSE ~ 3.6 m). However, a regression model (ML or RF) based on the PCA of waveform samples is an interesting alternative for canopy height estimation, as it does not require extracting metrics from the LiDAR waveforms, which are in general difficult to derive in dense forests such as those in French Guiana. Next, canopy heights extracted from both airborne and spaceborne LiDAR were used to map canopy heights from available mapped environmental data (geological, meteorological, slope, vegetation indices, etc.). Results showed an RMSE on the canopy height estimates of 6.5 m for the GLAS dataset and of 5.8 m for the airborne LiDAR dataset. Then, in order to improve the precision of the canopy height estimates, regression-kriging (kriging of the random forest regression residuals) was used. Results indicated a decrease in the RMSE from 6.5 to 4.2 m for the regression-kriging map from the GLAS dataset, and from 5.8 to 1.8 m for the map from the airborne LiDAR dataset. Finally, in order to study the impact of the spatial sampling of future LiDAR missions on the precision of canopy height estimates, six subsets were derived from the airborne LiDAR dataset with flight line spacings of 5, 10, 20, 30, 40 and 50 km (corresponding to 0.29, 0.11, 0.08, 0.05, 0.04, and 0.03 points/km², respectively).
Results indicated that, using the regression-kriging approach, the precision of the canopy height map was 1.8 m with a flight line spacing of 5 km, degrading to an RMSE of 4.8 m for the 50 km flight line spacing.
Seck, Djamal. "Arbres de décisions symboliques, outils de validations et d'aide à l'interprétation." Thesis, Paris 9, 2012. http://www.theses.fr/2012PA090067.
In this thesis, we propose the STREE methodology for the construction of decision trees with symbolic data. This data type allows us to characterize higher-level individuals, which may be classes or categories of individuals, or concepts in the sense of the Galois lattice. The values of the variables, called symbolic variables, may be sets, intervals or histograms. The recursive partitioning criterion is a combination of a criterion related to the explanatory variables and a criterion related to the dependent variable. The first criterion is the variation of the variance of the explanatory variables; when applied alone, STREE acts as a top-down clustering methodology. The second criterion enables us to build a decision tree. It is expressed as the variation of the Gini index if the dependent variable is nominal, and as the variation of the variance if the dependent variable is continuous or symbolic. Conventional data are a special case of symbolic data, on which STREE also obtains good results: it has performed well on several UCI data sets compared to conventional Data Mining methodologies such as CART, C4.5, Naive Bayes, KNN, MLP and SVM. The STREE methodology also allows for the construction of ensembles of symbolic decision trees, either by bagging or by boosting. The use of such ensembles is designed to overcome shortcomings of individual decision trees and to obtain a final decision that is in principle more reliable than that obtained from a single tree.
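The Gini-variation splitting criterion mentioned for a nominal dependent variable can be sketched generically (not STREE itself; function names are mine): a split is scored by how much it decreases the size-weighted Gini impurity.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(labels, left_idx):
    """Decrease of the Gini index when splitting `labels` into the
    subset of indices `left_idx` and its complement, each weighted
    by its relative size."""
    left = [labels[i] for i in left_idx]
    right = [labels[i] for i in range(len(labels)) if i not in left_idx]
    n = len(labels)
    return (gini(labels)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

y = ['a', 'a', 'b', 'b']
print(gini_gain(y, {0, 1}))  # a perfect split removes all impurity: 0.5
```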
Tallieu, Clara. "État sanitaire et croissance radiale des arbres : Analyse spatiale et temporelle des données du réseau systématique de suivi des dommages forestiers." Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0185.
For the past 30 years, annual visual assessments of crown condition on the French part of the transnational monitoring network (ICP Forests, level 1) have been essential for monitoring forest health. However, the use of crown condition as an indicator of tree health is regularly questioned, for methodological reasons but also because of the lack of knowledge on the determinism of crown condition and its functional impact on the tree. In this context, using the crown condition records of 9 tree species (deciduous and coniferous) spread over more than 300 plots in France, we 1) described and interpreted the spatial and inter-annual variations of leaf loss, and 2) discussed the use of crown condition as an indicator of tree health based on the joint analysis of inter-annual variations of leaf loss and radial growth. The analysis of spatial variations in crown condition between plots showed multiple relationships with edaphic and climatic factors, but with relatively moderate explanatory power. The study of inter-annual variations in crown condition confirmed that the climatic factors of the previous year control the crown condition of the current year. However, compared to radial growth, crown condition presents a less dynamic response to climate that is inconsistent between trees in the same plot. The joint analysis of the two signals showed only a weak link between growth and crown condition: a decrease in tree growth was observed only in the case of substantial leaf loss during years of extreme climatic hazards (dry or cold). Moreover, introducing leaf loss as a predictor of radial growth had little or no significant effect for beech and fir. Finally, the evidence of the major influence of age on leaf loss precludes interpreting raw crown condition as an indicator of tree health.
Daniel-Vatonne, Marie-Christine. "Les termes : un modèle de représentation et structuration de données symboliques." Montpellier 2, 1993. http://www.theses.fr/1993MON20031.
Tournier, Nicolas. "Synchronisation pour l'insertion de données dans des maillages 3D." Thesis, Montpellier 2, 2014. http://www.theses.fr/2014MON20221/document.
Data security is one of the main issues in computer science. We need to develop solutions for confidentiality, communication, fingerprinting or identification applications, for example. In this thesis, carried out with STRATEGIES S.A., the method chosen to protect 3D meshes is watermarking. Watermarking is divided into two steps, embedding and extraction, and both require a synchronization phase. Synchronization is one of the most important steps for 3D meshes because it finds the areas available for embedding information and orders them. The whole thesis is devoted to this synchronization step. First of all, we propose a classification of watermarking techniques based on the type of synchronization method rather than on evaluation criteria such as robustness or capacity. Then, starting from methods based on the Euclidean minimum spanning tree, we propose a theoretical analysis of the mobility of the vertices in this kind of structure. We first explain the reasons for the sensitivity of the structure, and then propose another scheme based on the Euclidean minimum spanning tree that takes its fragility into account.
Albert, Isabelle. "Inférence bayesienne par les methodes de Monte Carlo par chaînes de Markov et arbres de régression pour l'analyse statistique des données corrélées." Paris 11, 1998. http://www.theses.fr/1998PA11T020.
Syla, Burhan. "Relais de perte de synchronisme par exploration de données." Thesis, Université Laval, 2012. http://www.theses.ulaval.ca/2012/29102/29102.pdf.
The goal of this document is to verify the feasibility of an out-of-step relay using data mining and decision trees. Using EMTP-RV and the Anderson network, 180 simulations were run while varying the place of the short circuit, its length, its type and the load flow. For these simulations, 39 electrical and 8 mechanical measurements were made. The simulations were then classified as stable or unstable using the center of inertia of angle and of speed. With MATLAB, 33 new variables were created from the first 39, and then, with KNIME, decision trees such as C4.5, CART, AdaBoost, ADTree and random forest were simulated, and sampling time versus performance was compared. Using Consistency Subset Eval, Symmetrical Uncert Attribute Set Eval and Correlation-based Feature Subset Selection, the features were reduced and the simulations were visualised using the validation set. Results show that a sampling frequency of 240 Hz and 28 variables are enough to obtain a mean area under the curve of 0.9591 for the training and validation sets of the 4 generators.
Gardy, Danièle. "Bases de données, allocations aléatoires : quelques analyses de performances." Paris 11, 1989. http://www.theses.fr/1989PA112221.
This thesis is devoted to the analysis of some parameters of interest for estimating the performance of computer systems, most notably database systems. The unifying features are the description of the phenomena to be studied in terms of random allocations and the systematic use of methods from the average-case analysis of algorithms. We associate a generating function with each parameter of interest, which we use to derive an asymptotic expression of this parameter. The main problem studied in this work is the estimation of the sizes of derived relations in a relational database framework. We show that this is closely related to the so-called "occupancy problem" in urn models, a classical tool of discrete probability theory. We characterize the conditional distribution of the size of a relation derived from relations whose sizes are known, and give conditions which ensure the asymptotic normality of the limiting distribution. We next study the implementation of "logical" relations by multi-attribute or doubly chained trees, for which we give results on the complexity of a random orthogonal range query. Finally, we study some "dynamic" random allocation phenomena, such as the birthday problem, which models the occurrence of collisions in hashing, and a model of the Least Recently Used cache memory algorithm.
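The birthday problem mentioned as a model for hashing collisions has a short exact form: the probability that n keys thrown uniformly into m slots all land in distinct slots is the falling product m(m-1)...(m-n+1)/m^n. A sketch (the function name is mine):

```python
def collision_probability(n_keys, n_slots):
    """Probability that at least two of n_keys uniformly random keys
    hash to the same one of n_slots (the birthday problem)."""
    p_distinct = 1.0
    for i in range(n_keys):
        p_distinct *= (n_slots - i) / n_slots
    return 1.0 - p_distinct

# The classic instance: 23 keys, 365 slots already gives > 1/2.
print(round(collision_probability(23, 365), 4))  # 0.5073
```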
Verma, Kumar Neeraj. "Automates d'arbres bidirectionnels modulo théories équationnelles." Cachan, Ecole normale supérieure, 2003. http://www.theses.fr/2003DENS0027.
Blin, Lélia. "K-partionnement de graphes du séquentiel au distribué." Paris 8, 2001. http://www.theses.fr/2001PA081993.
Full textTusa, jumbo Eduardo Alejandro. "Apport de la fusion LiDAR - hyperspectral pour la caractérisation géométrique et radiométrique des arbres." Thesis, Université Grenoble Alpes, 2020. https://tel.archives-ouvertes.fr/tel-03212453.
Mountain forests provide environmental ecosystem services (EES) to communities: the supply of recreational landscapes, protection against natural hazards, and support for biodiversity conservation, among others. The preservation of these EES through space and time requires a good characterization of the resources. Especially in mountains, stands are very heterogeneous and timber harvesting is economically possible thanks to trees of higher value. This is why we want to be able to map each tree and estimate its characteristics, including quality, which is related to its shape and growth conditions. Field inventories cannot provide wall-to-wall coverage of detailed tree-level information on a large scale. On the other hand, remote sensing tools are a promising technology because of their time efficiency and affordable cost for studying forest areas. LiDAR data provide detailed information on the vertical distribution and location of the trees, but are limited for mapping species. Hyperspectral data capture absorption features in the canopy reflectance spectrum, but are not effective for characterizing tree geometry. Hyperspectral and LiDAR systems thus provide independent and complementary data that are relevant for the assessment of biophysical and biochemical attributes of forest areas. This PhD thesis deals with the fusion of LiDAR and hyperspectral data to characterize individual forest trees. The leading idea is to improve methods for deriving forest information at tree level by extracting geometric and radiometric features.
The contributions of this research work rely on: i) an updated review of fusion methods for LiDAR and hyperspectral data in forest monitoring; ii) an improved 3D segmentation algorithm for delineating individual tree crowns, based on Adaptive Mean Shift (AMS3D) and an ellipsoidal crown shape model; iii) a criterion for feature selection based on the random forest score, 5-fold cross-validation and a cumulative error function, for forest tree species classification. The two main methods used to derive forest information at tree level are tested with remote sensing data acquired in the French Alps.
Del, Razo Lopez Federico. "Recherche de sous-structures arborescentes ordonnées fréquentes au sein de bases de données semi-structurées." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2007. http://tel.archives-ouvertes.fr/tel-00203608.
The goal of this thesis is to propose a method for extracting frequent subtrees. The approach is based on a compact representation of trees that aims to reduce memory consumption during the mining process. In particular, we present a new technique for generating candidate subtrees that aims to reduce their number. Furthermore, we propose different algorithms to validate the support of candidate subtrees in a database according to various types of tree inclusion constraints: induced, embedded and fuzzy. Finally, we apply our algorithms to synthetic and real datasets and present the results obtained.
Jabiri, Fouad. "Applications de méthodes de classification non supervisées à la détection d'anomalies." Master's thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/67914.
In this thesis, we first present the binary tree partitioning algorithm and isolation forests. Binary trees are very popular classifiers in supervised machine learning. The isolation forest belongs to the family of unsupervised methods; it is an ensemble of binary trees used together to isolate outlying instances. Subsequently, we present the approach we have named "exponential smoothing" (or "pooling"). This technique consists in encoding sequences of variables of different lengths into a single vector of fixed size. The objective of this thesis is to apply the isolation forest algorithm to identify anomalies in the insurance claim forms available in the database of a large Canadian insurance company, in order to detect cases of fraud. However, a form is a sequence of claims, each characterized by a set of variables, so it is impossible to apply the isolation forest algorithm directly to this kind of data. It is for this reason that we apply exponential smoothing. Our application effectively isolates abnormal claims and forms, and we find that the latter tend to be audited by the company more often than regular forms.
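The pooling step described here — turning a variable-length sequence of claims into one fixed-size vector — can be sketched with a plain mean over each variable (the thesis's "exponential smoothing" presumably weights claims; this simpler mean-pooling and the function name are my own illustration):

```python
def pool(claims, n_features):
    """Encode a variable-length sequence of claims (each a list of
    n_features numbers) into one fixed-size vector by averaging each
    feature over the sequence."""
    if not claims:
        return [0.0] * n_features
    return [sum(c[f] for c in claims) / len(claims) for f in range(n_features)]

form_a = [[1.0, 200.0], [3.0, 400.0]]   # two claims, two variables each
form_b = [[2.0, 100.0]]                 # a single claim
print(pool(form_a, 2), pool(form_b, 2))  # [2.0, 300.0] [2.0, 100.0]
```

The pooled vectors all have the same length, so they can then be fed to an unsupervised outlier detector such as an isolation forest (e.g. scikit-learn's `IsolationForest`).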
Amri, Anis. "Autour de quelques statistiques sur les arbres binaires de recherche et sur les automates déterministes." Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0301.
This PhD thesis is divided into two independent parts. In the first part, we provide an asymptotic analysis of some statistics on binary search trees. In the second part, we study the coupon collector problem with a constraint. In the first part, following the model introduced by Aguech, Lasmar and Mahmoud [Probab. Engrg. Inform. Sci. 21 (2007) 133–141], the weighted depth of a node in a labelled rooted tree is the sum of all labels on the path connecting the node to the root. We analyze the following statistics: the weighted depths of nodes with given labels, of the last inserted node, and of nodes ordered as visited by the depth-first search process, as well as the weighted path length, the weighted Wiener index, and the weighted depths of nodes with at most one child, in a random binary search tree. In the second part, we study the asymptotic shape of the completion curve of the collection conditioned on T_n ≤ (1+Λ) n ln n, Λ > 0, where T_n ≃ n ln n is the time needed to complete the collection. As an application to accessible automata, we provide a new derivation of a formula due to Korshunov [Kor78, Kor86].
Gauwin, Olivier. "Flux XML, Requêtes XPath et Automates." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2009. http://tel.archives-ouvertes.fr/tel-00421911.
In this thesis, we study algorithms for evaluating queries over XML streams. Our objective is to manage memory efficiently, so as to be able to evaluate queries over large data while using little memory. This task is complex and requires significant restrictions on the query languages. We therefore study queries defined by deterministic automata or by fragments of the W3C XPath standard, rather than by more powerful languages such as the W3C standards XQuery and XSLT.
We first define Streaming Tree Automata (STAs), which operate on unranked trees in document order. We prove that they are equivalent to Nested Word Automata and to Pushdown Forest Automata. We then develop an earliest query answering algorithm for queries defined by deterministic STAs. Although it stores only the necessary candidates, this algorithm runs in polynomial time per stream event and per candidate. We therefore obtain positive results for the streaming evaluation of queries defined by deterministic STAs. We measure the suitability of a query language for streaming evaluation via a new machine model, called Streaming Random Access Machines (SRAMs), and via a measure of the number of simultaneously alive candidates, called concurrency. We also show that it is decidable in polynomial time whether the concurrency of a query defined by a deterministic STA is bounded. Our proof is based on a reduction to the bounded valuedness problem for recognizable tree relations.
Concerning the W3C XPath standard, we show that even small syntactic fragments are not suited to streaming evaluation, unless P=NP. The difficulties come from the non-determinism of the language, as well as from its number of conjunctions and disjunctions. We define fragments of Forward XPath that avoid these problems, and prove, by compilation to deterministic STAs in polynomial time, that they are suited to streaming evaluation.
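The push-on-open/pop-on-close discipline of STAs (equivalently, of nested word automata) can be illustrated by a minimal stack run over an event stream — an illustration only, not the earliest query answering algorithm; the event encoding is my own:

```python
def stream_run(events):
    """Process a stream of ('open', tag) / ('close', tag) events with a
    stack, as a streaming tree automaton does: push on open, pop on
    close.  Returns the maximum depth reached; raises on ill-nested or
    incomplete input."""
    stack, max_depth = [], 0
    for kind, tag in events:
        if kind == 'open':
            stack.append(tag)
            max_depth = max(max_depth, len(stack))
        else:  # 'close'
            if not stack or stack.pop() != tag:
                raise ValueError('ill-nested stream')
    if stack:
        raise ValueError('unclosed elements')
    return max_depth

# <a><b/><b><c/></b></a> as an event stream:
doc = [('open', 'a'), ('open', 'b'), ('close', 'b'),
       ('open', 'b'), ('open', 'c'), ('close', 'c'),
       ('close', 'b'), ('close', 'a')]
print(stream_run(doc))  # 3
```

The point of the stack is that memory is bounded by the depth of the document, not its length — the property the streaming algorithms above exploit.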
Hawarah, Lamis. "Une approche probabiliste pour le classement d'objets incomplètement connus dans un arbre de décision." Phd thesis, Université Joseph Fourier (Grenoble), 2008. http://tel.archives-ouvertes.fr/tel-00335313.
We explain our method and test it on real databases. We compare our results with those given by the C4.5 method and by AAO.
We also propose an algorithm, based on the k-nearest-neighbours method, which computes for each object of the test set its frequency in the training set. We compare these frequencies with the classification results given by our approach, C4.5 and AAO. Finally, we compute the complexity of constructing the attribute trees, as well as the complexity of classifying an incomplete object using our approach, C4.5 and AAO.
Nguyen, Kim. "Langage de combinateurs pour XML : conception, typage, implantation." Paris 11, 2008. http://www.theses.fr/2008PA112071.
Full textThis thesis details the theoretical and practical study of a language of combinators for XML. XML documents, a de facto standard used to represent heterogeneous data in a structured and generic way so that they can easily be shared by many programs, are usually manipulated with general-purpose languages (Java, C, ...). Alongside these, one finds specialised languages, designed specifically to deal with XML documents (retrieving information from a document, transforming one document format into another, ...). We focus on statically typed languages. It is indeed possible to specify the "shape" of a document (sets of tags, order, ...) by means of a schema. Statically typed languages perform a static analysis of the source code of the program to ensure that every operation is valid with respect to the schema of the processed documents. The analysis is said to be static because it relies only on the source code of the program, not on any runtime information or document sample. This thesis presents the theoretical foundations of a language for manipulating XML documents in a statically typed way. It also features a practical study as well as an implementation of the formal language. Lastly, it presents many use cases of type-based optimisation in the context of XML processing (transformation, loading of a document in memory, ...).
Tchougong, Ngongang Rodrigue. "Grammaires attribuées comme transducteurs d'arbres et leur composition descriptionnelle." Rennes 1, 2012. http://www.theses.fr/2012REN1S006.
Full textAttribute grammars, originally introduced by Knuth to describe syntax-directed semantics, were given a modular presentation by Ganzinger and Giegerich in the form of attribute coupled grammars. The composition of such grammars, called descriptional composition, is akin to optimisation techniques for functional programs, such as deforestation, which eliminates intermediate data structures when composing functions. In this work, we present a higher-order functional approach to attribute evaluation based on the local dependencies between synthesized and inherited attributes. This translation, which is not syntax-directed and therefore not amenable to deforestation techniques, nevertheless yields a direct implementation of attribute grammars in a lazy higher-order functional language. We alternatively present a first-order functional translation in which the input tree and its context are represented simultaneously by a tree over an extended signature. We show that, through this translation, the descriptional composition of attribute grammars reduces to a simple composition of tree transducers.
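Knuth's original motivating example, evaluating a binary numeral with one inherited attribute (the digit's weight) and one synthesized attribute (the value), gives a feel for the attribute grammars discussed above; this toy encoding is illustrative only and far simpler than the higher-order translation of the thesis:

```python
# Toy attribute grammar in the style of Knuth's binary-numeral example:
# each leaf digit receives an inherited attribute `weight` (its power of two)
# and every node computes a synthesized attribute `value`.
def value(node, weight):
    kind = node[0]
    if kind == "digit":
        return node[1] * 2 ** weight            # synthesized from the inherited weight
    left, right, right_len = node[1], node[2], node[3]
    # inherited attributes flow down: the left subtree is shifted left
    return value(left, weight + right_len) + value(right, weight)

# the numeral 1101 as a tree: ((1·2^3)+(1·2^2)) + ((0·2^1)+(1·2^0))
d = lambda b: ("digit", b)
node = lambda l, r, n: ("pair", l, r, n)
tree = node(node(d(1), d(1), 1), node(d(0), d(1), 1), 2)
print(value(tree, 0))  # → 13
```

Here the recursion makes the local dependencies explicit: the inherited `weight` is threaded down while the synthesized `value` is propagated up, which is exactly the kind of dependency structure the functional translations in the thesis exploit.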
Mondal, Kartick Chandra. "Algorithmes pour la fouille de données et la bio-informatique." Thesis, Nice, 2013. http://www.theses.fr/2013NICE4049.
Full textKnowledge pattern extraction is one of the major topics in the data mining and background knowledge integration domains. Among data mining techniques, association rule mining and bi-clustering are two major complementary tasks for these topics. These tasks have gained much importance in many domains in recent years; however, no approach had been proposed to perform them in a single process. This poses the problems of the resources required (memory, execution times and data accesses) to perform independent extractions, and of the unification of the different results. We propose an original approach for extracting different categories of knowledge patterns while using minimal resources. This approach is based on the frequent closed patterns theoretical framework and uses a novel suffix-tree-based data structure to extract conceptual minimal representations of association rules, bi-clusters and classification rules. These patterns extend the classical frameworks of association rules, classification rules and bi-clusters, as the data objects supporting each pattern and the hierarchical relationships between patterns are also extracted. The approach was applied to the analysis of HIV-1 and human protein-protein interaction data. Analyzing such inter-species protein interactions is a recent major challenge in computational biology, and databases integrating heterogeneous interaction information and biological background knowledge on proteins have been constructed. Experimental results show that the proposed approach can efficiently process these databases and that the extracted conceptual patterns can help the understanding and analysis of the nature of relationships between interacting proteins.
Samuelides, Mathias. "Automates d'arbres à jetons." Phd thesis, Université Paris-Diderot - Paris VII, 2007. http://tel.archives-ouvertes.fr/tel-00255024.
Full textA first contribution was to prove that the deterministic variants of the two models of pebble tree automata are closed under complementation. We then give a new presentation of the proof of the characterization of the strong model of pebble tree automata established by Engelfriet and Hoogeboom.
Another contribution was to show that the two pebble automata models are equivalent, that the expressive power of pebble tree automata increases with the number of pebbles, and that it is not always possible to determinize a tree-walking automaton, even when a fixed number of pebbles may be added.
A final contribution was to prove that the emptiness and inclusion problems are n-EXPTIME-complete for the classes of automata with n pebbles, with n greater than 1.
Fournier, Jonathan. "Exploitation de données tridimensionnelles pour la cartographie et l'exploration autonome d'environnements urbains." Thesis, Université Laval, 2007. http://www.theses.ulaval.ca/2007/24421/24421.pdf.
Full textMorin, Anne. "Arbres pour donnees multinomiales." Rennes 1, 1989. http://www.theses.fr/1989REN10048.
Full textSaita, Cristian-Augustin. "Groupements d'objets multidimensionnels étendus avec un modèle de coût adaptatif aux requêtes." Versailles-St Quentin en Yvelines, 2006. http://www.theses.fr/2006VERS0007.
Full textWe propose a method for clustering extended multidimensional objects, based on a query-adaptive cost model, to speed up the execution of spatial range queries (e.g., intersection, containment). Our work was motivated by the emergence of many selective information dissemination applications, which raise new challenges for multidimensional indexing. In this context, existing indexing approaches (e.g., R-trees) do not meet application requirements such as scalability (many objects with high dimensionality and spatial extents), search performance (high query rates), update performance (frequent insertions and deletions of objects) and adaptivity (to the distribution of objects and queries, and to system parameters). In our method, we relax several properties specific to classical tree-based index structures (i.e., balancing of the tree and of the partitioning, minimal bounding of objects) in favour of a clustering strategy driven by an adaptive cost model. This cost model accounts for the characteristics of the execution platform, the spatial distribution of the objects and, above all, the spatial distribution of the queries. More precisely, the query distribution determines the most selective and discriminating dimensions to use when clustering the objects. We validated our approach through experimental performance studies involving large collections of objects and range queries with uniform and non-uniform distributions.
Qureshi, Taimur. "Contributions to decision tree based learning." Thesis, Lyon 2, 2010. http://www.theses.fr/2010LYO20051/document.
Full textAdvances in data acquisition, storage and learning technologies address the challenge of systematically automating learning techniques in order to extract valid and usable knowledge. The knowledge discovery process comprises the following steps: data selection, data preparation, data transformation, data mining and, finally, interpretation and validation of the results. In this thesis, we develop techniques that contribute to data preparation and transformation, as well as data mining methods for knowledge extraction. Throughout, we aim to improve prediction accuracy across the whole learning process. Our work centres on decision trees: we introduce several preprocessing approaches and transformation techniques, such as discretization, fuzzy partitioning and dimensionality reduction, to improve the performance of decision trees. These techniques can, however, also be used in other learning methods; discretization, for instance, can serve Bayesian classification. In the data mining process, the data preparation phase typically takes up 80 percent of the time and is critical to the quality of the modelling. The discretization of continuous attributes thus remains a very important problem, affecting the accuracy, complexity, variance and comprehensibility of induced models. In this thesis, we propose and develop techniques based on resampling. We also study alternatives such as fuzzy partitioning for fuzzy decision tree induction.
Fuzzy logic is thus incorporated into the induction process to increase model accuracy and reduce variance while preserving interpretability. Finally, we adopt a topological learning scheme aimed at non-linear dimensionality reduction: we modify a manifold-based learning technique to investigate whether classification accuracy and interpretability can be improved.
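As a baseline for the discretization problem discussed above, equal-frequency binning, one of the standard heuristics that resampling-based methods aim to improve on, can be sketched as follows (illustrative only, not the method proposed in the thesis):

```python
def equal_frequency_cuts(values, n_bins):
    """Cut points splitting a continuous attribute into n_bins bins
    holding (roughly) the same number of observations."""
    v = sorted(values)
    return [v[len(v) * i // n_bins] for i in range(1, n_bins)]

def discretize(x, cuts):
    # index of the bin containing x (values equal to a cut go to the upper bin)
    return sum(x >= c for c in cuts)

values = [0.5, 1.2, 1.9, 2.3, 3.7, 4.1, 5.0, 6.4, 7.7]
cuts = equal_frequency_cuts(values, 3)
print(cuts)                                             # → [2.3, 5.0]
print([discretize(x, cuts) for x in (1.0, 3.0, 9.9)])   # → [0, 1, 2]
```

Such unsupervised cuts ignore the class labels entirely, which is precisely why supervised and resampling-based discretizations can yield more accurate induced models.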
Ehrhardt, Adrien. "Formalisation et étude de problématiques de scoring en risque de crédit : inférence de rejet, discrétisation de variables et interactions, arbres de régression logistique." Thesis, Lille 1, 2019. http://www.theses.fr/2019LIL1I051/document.
Full textThis manuscript deals with model-based statistical learning in the binary classification setting. As an application, credit scoring is examined at length, with special attention to its specificities. Proposed and existing approaches are illustrated on real data from Crédit Agricole Consumer Finance, a financial institution specializing in consumer loans which funded this PhD through a CIFRE agreement. First, we consider the so-called reject inference problem, which aims at taking advantage of the information collected on rejected credit applicants for whom no repayment performance can be observed (i.e. unlabelled observations). This industrial problem led to a research one by reinterpreting unlabelled observations as an information loss that can be compensated for by modelling missing data. This interpretation sheds light on existing reject inference methods and allows us to conclude that none of them should be recommended, since they lack the proper modelling assumptions that would make them amenable to classical statistical model selection tools. Next, another industrial problem was tackled: the discretization of continuous features, or the grouping of levels of categorical features, before any modelling step. This is motivated by practical (interpretability) and theoretical (predictive power) reasons. To perform these quantizations, ad hoc heuristics are often used, which are empirical and time-consuming for practitioners. Quantization is recast here as a latent variable problem, bringing us back to model selection. The high combinatorics of this model space called for a new cost-effective and automatic exploration strategy, involving either a particular neural network architecture or a Stochastic-EM algorithm, with precise statistical guarantees. Third, as an extension of the preceding problem, interactions of covariates may be introduced in order to improve predictive performance.
This task, until now handled manually by practitioners and highly combinatorial, carries an increased risk of missing a “good” model. It is performed here with a Metropolis-Hastings sampling procedure which finds the best interactions automatically while retaining its standard convergence properties, so that good predictive performance is guaranteed. Finally, in contrast to the preceding problems, which each tackled a particular scorecard, we consider the scoring system as a whole. It generally consists of a tree-like structure composed of many scorecards (each relative to a particular population segment), which is often not optimized but rather imposed by the company's culture and/or history. Again, ad hoc industrial procedures are used, which lead to suboptimal performance. We propose some lines of approach to optimize this logistic regression tree, which yield good empirical performance and open new research directions illustrating the predictive strength and interpretability of a mix of parametric and non-parametric models. The manuscript concludes with a discussion of potential scientific obstacles, among which is high dimensionality (in the number of features). The financial industry is indeed investing massively in unstructured data storage, which to this day remains largely unused for credit scoring applications. Exploiting it will require statistical guarantees in order to achieve the hoped-for additional predictive performance.
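The idea of exploring a combinatorial space of candidate interactions with a Metropolis-Hastings random walk can be caricatured as follows (a toy sketch with an invented score function; all names and numbers are illustrative, not the thesis procedure):

```python
import math, random

def metropolis_best_subset(score, n_items, steps=2000, seed=0):
    """Random-walk Metropolis-Hastings over subsets (bitmasks) of candidate
    interactions, keeping the best-scoring subset visited.
    Toy sketch: the target distribution is proportional to exp(score)."""
    rng = random.Random(seed)
    state = 0
    best = (score(state), state)
    for _ in range(steps):
        proposal = state ^ (1 << rng.randrange(n_items))  # flip one candidate
        diff = score(proposal) - score(state)
        if diff >= 0 or rng.random() < math.exp(diff):    # MH acceptance rule
            state = proposal
        best = max(best, (score(state), state))
    return best[1]

# invented log-score: rewards a subset containing interactions 0 and 2,
# with a penalty on subset size
score = lambda s: 3.0 * ((s >> 0) & 1) + 3.0 * ((s >> 2) & 1) - bin(s).count("1")
print(metropolis_best_subset(score, n_items=4))
```

Because the single-flip proposal is symmetric, the acceptance rule min(1, exp(diff)) is the standard Metropolis-Hastings one; in a real scoring application the score would be a penalized model-selection criterion rather than this hand-made function.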
Michel, Pierre. "Sélection d'items en classification non supervisée et questionnaires informatisés adaptatifs : applications à des données de qualité de vie liée à la santé." Thesis, Aix-Marseille, 2016. http://www.theses.fr/2016AIXM4097/document.
Full textAn adaptive test provides a valid measure of patients' quality of life while reducing the number of items to be completed. This approach depends on the models used, which sometimes rest on unverifiable assumptions. We propose an alternative approach based on decision trees that makes no such assumptions and requires less computation time for item administration. We present different simulations that demonstrate the relevance of our approach. We present an unsupervised classification method called CUBT. CUBT comprises three steps to obtain an optimal partition of a data set. The first step grows a tree by recursively dividing the data set. The second step joins pairs of terminal nodes of the tree. The third step aggregates terminal nodes that do not come from the same split. Different simulations are presented to compare CUBT with other approaches, and we define heuristics for the choice of CUBT's parameters. CUBT identifies the variables that are active in the construction of the tree. Although some variables may be irrelevant, they may still be competitive with the active variables. It is therefore essential to rank the variables according to an importance score to determine their relevance in a given model. We present a method, based on CUBT and competitive binary splits, to measure the importance of variables and define a variable importance score. We analyse the efficiency and stability of this new index, comparing it with other methods.
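The growing step of a divisive clustering tree in the spirit of CUBT can be sketched for 1-D data, with a within-node deviance criterion chosen here purely for illustration (CUBT itself also joins and aggregates leaves, which this sketch omits):

```python
def grow(points, min_size=2):
    """Recursively split a set of 1-D points at the cut that most reduces
    within-node deviance; returns the leaves (a partition of the data).
    Illustrative sketch only."""
    pts = sorted(points)
    def deviance(p):
        m = sum(p) / len(p)
        return sum((x - m) ** 2 for x in p)
    if len(pts) < 2 * min_size:
        return [pts]
    # best admissible cut position (each side keeps at least min_size points)
    best = min(range(min_size, len(pts) - min_size + 1),
               key=lambda i: deviance(pts[:i]) + deviance(pts[i:]))
    if deviance(pts[:best]) + deviance(pts[best:]) >= deviance(pts):
        return [pts]        # no split improves the criterion
    return grow(pts[:best], min_size) + grow(pts[best:], min_size)

print(grow([0.0, 0.1, 0.2, 5.0, 5.1, 5.2]))  # → [[0.0, 0.1, 0.2], [5.0, 5.1, 5.2]]
```

On this toy sample the two well-separated groups are recovered as the two leaves of the tree, which is the kind of partition the later joining and aggregation steps of CUBT would then refine.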
Candillier, Christophe. "Méthodes d'Extraction de Connaissances à partir de Données (ECD) appliquées aux Systèmes d'Information Géographiques (SIG)." Phd thesis, Université de Nantes, 2006. http://tel.archives-ouvertes.fr/tel-00101491.
Full textValero, Mathieu. "Enhancing performance and reliability of tree based P2P overlays." Paris 6, 2011. http://www.theses.fr/2011PA066600.
Full textGroz, Benoît. "XML security views : queries, updates and schemas." Thesis, Lille 1, 2012. http://www.theses.fr/2012LIL10143/document.
Full textThe evolution of web technologies and social trends fostered a shift from traditional enterprise databases to web services and online data. While making data more readily available to users, this evolution also raises additional security concerns regarding the privacy of users and, more generally, the disclosure of sensitive information. The implementation of appropriate access control models is one approach to mitigating the threat. We investigate an access control model based on (non-materialized) XML views, as presented among others by Fan et al. The simplicity of such views, and in particular the absence of arithmetic features and restructuring, facilitates their modelling with tree alignments. Our objective is therefore to investigate how to manipulate such views efficiently, using formal methods, especially query rewriting and tree automata. Our research follows essentially three directions: we first develop new algorithms to assess the expressivity of views, in terms of determinacy, query rewriting and certain answers. We show that those problems, although undecidable in our most general setting, can be decided under reasonable restrictions. Then we address the problem of handling updates in the security view framework. Last, we investigate the classical issues raised by schemata, focusing on the specific "determinism" requirements of DTDs and XML Schemas. In particular, we survey some techniques to approximate the set of all possible view documents with a DTD, and we provide new algorithms to check whether the content models of a DTD are deterministic.
Chen, Xiao. "Contrôle et optimisation de la perception humaine sur les vêtements virtuels par évaluation sensorielle et apprentissage de données expérimentales." Thesis, Lille 1, 2015. http://www.theses.fr/2015LIL10019/document.
Full textUnder exacerbated worldwide competition, the mass customization or personalization of products is becoming an important strategy for companies to enhance the perceived value of their products. However, current online customization experiences are not fully satisfying for consumers, because the choices are mostly limited to colors and motifs; the sensory aspects of products, particularly the material's appearance and hand as well as the garment fit, are barely addressed. In this PhD project, we propose a new collaborative design platform. It permits merchants, designers and consumers to share a new experience in the development of highly valued personalized garments without extra industrial costs. The construction of this platform consists of several parts. First, we selected, through a sensory experiment, an appropriate 3D garment CAD software in terms of rendering quality. Then we proposed an active learning-based experimental design in order to find the most appropriate values of the fabric technical parameters, minimizing the overall perceptual difference between real and virtual fabrics in static and dynamic scenarios. Afterwards, we quantitatively characterized human perception of virtual garments using a number of normalized sensory descriptors. These descriptors involve not only the appearance and hand of the fabric but also the garment fit. The corresponding sensory data were collected through two dedicated sensory experiments. By learning from the experimental data, two models were established. The first characterizes the relationship between the appearance and hand perception of virtual fabrics and the corresponding technical parameters that constitute the inputs of the 3D garment CAD software. The second concerns the relationship between virtual garment fit perception and the pattern design parameters.
These two models constitute the main components of the collaborative design platform. Using this platform, we realized a number of garments meeting consumers' personalized perceptual requirements.
ARMAND, Stéphane. "Analyse Quantifiée de la Marche : extraction de connaissances à partir de données pour l'aide à l'interprétation clinique de la marche digitigrade." Phd thesis, Université de Valenciennes et du Hainaut-Cambresis, 2005. http://tel.archives-ouvertes.fr/tel-00010618.
Full textGaudel, Romaric. "Paramètres d'ordre et sélection de modèles en apprentissage : caractérisation des modèles et sélection d'attributs." Phd thesis, Université Paris Sud - Paris XI, 2010. http://tel.archives-ouvertes.fr/tel-00549090.
Full textCostermans, Christian. "Calcul symbolique non commutatif : analyse des constantes d'arbre de fouille." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2008. http://tel.archives-ouvertes.fr/tel-00338482.
Full textSince our work aims to apply symbolic methods to the study of these random variables, we replace multi-indices by encodings over distinct alphabets, and rely on important results from combinatorics on words, applying them to our multiple harmonic sums (MHS) and to polylogarithm functions, which are variants of the ordinary generating functions of MHS. In the convergent cases, the two objects converge (as z tends to 1 and as N tends to infinity, respectively) to the same limit, called a polyzeta. For the divergent cases, the use of noncommutative generating series allows us to establish an Abel-like theorem exhibiting a common limit. This theorem gives an explicit form for the generalized Euler constants associated with divergent MHS and thus yields a very efficient algorithm to compute their asymptotic expansions.
Finally, we propose applications of harmonic sums to multidimensional data structures, for which our approach yields exact computations that can subsequently be evaluated asymptotically with ease.
Huynh, Lê Duy. "Taking into account inclusion and adjacency information in morphological hierarchical representations, with application to the extraction of text in natural images and videos." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS341.
Full textThe inclusion and adjacency relationships between image regions usually carry contextual information. The latter is widely used, since it tells how regions are arranged in images; the former is usually not taken into account, although it parallels the object-background relationship. The mathematical morphology framework provides several hierarchical image representations. They include the Tree of Shapes (ToS), which encodes the inclusion of level lines, and the hierarchies of segmentation (e.g., alpha-tree, BPT), which are useful in the analysis of the adjacency relationship. In this work, we take advantage of both inclusion and adjacency information in these representations for computer vision applications. We introduce the spatial alignment graph w.r.t. inclusion, constructed by adding a new adjacency relationship to the nodes of the ToS. In a simple ToS, such as our Tree of Shapes of Laplacian sign, which encodes the inclusion of Morphological Laplacian 0-crossings, this graph reduces to a disconnected graph where each connected component is a semantic group. In other cases, e.g., the classic ToS, the spatial alignment graph is more complex. To address this issue, we expand shape-space morphology. Our expansion has two primary results: 1) it allows the manipulation of any graph of shapes; 2) it allows any tree filtering strategy proposed by the connected operators framework. With this expansion, the spatial graph can be analyzed with the help of an alpha-tree. We demonstrate our method on the application of text detection. The experimental results show the efficiency and effectiveness of our methods, which are appealing for mobile applications.
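The alpha-tree analysis mentioned above rests on alpha-connectivity: neighbouring elements are merged whenever their dissimilarity does not exceed alpha. A minimal 1-D union-find sketch (illustrative only; real alpha-trees build the full hierarchy over all alpha values on a 2-D pixel grid):

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def alpha_partition(values, alpha):
    """alpha-connected components of a 1-D 'image': merge neighbours
    whose grey-level difference is at most alpha."""
    uf = UnionFind(len(values))
    for i in range(len(values) - 1):
        if abs(values[i] - values[i + 1]) <= alpha:
            uf.union(i, i + 1)
    roots = [uf.find(i) for i in range(len(values))]
    order = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [order[r] for r in roots]

print(alpha_partition([10, 11, 12, 40, 41, 90], alpha=2))  # → [0, 0, 0, 1, 1, 2]
```

Sweeping alpha from 0 upwards makes these partitions coarser and coarser, and nesting them yields exactly the alpha-tree hierarchy used to analyze the spatial alignment graph.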
Atighehchi, Kevin. "Contributions à l'efficacité des mécanismes cryptographiques." Thesis, Aix-Marseille, 2015. http://www.theses.fr/2015AIXM4037.
Full textThe need for continuing innovation in terms of performance and resource savings impels us to optimize the design and use of cryptographic mechanisms. This leads us to consider several aspects in this dissertation: parallel cryptographic algorithms, incremental cryptographic algorithms and authenticated dictionaries. In the context of parallel cryptography we are interested in hash functions. In particular, we show which tree structures to use to reach an optimal running time and, for this running time, how to decrease the number of processors involved. We also explore alternative (sub-optimal) tree structures which decrease the number of synchronizations in multithreaded implementations while balancing the workload among threads as evenly as possible. Incremental cryptographic schemes allow the efficient updating of cryptographic forms when some blocks of the corresponding documents change. We show that existing incremental schemes overly restrict the possible modification operations. We then introduce new algorithms which use them as black boxes to allow a broad range of modification operations, while preserving a privacy property on these operations. We then turn our attention to authenticated dictionaries, which are used to authenticate answers to queries on a dictionary by providing users with an authentication proof for each answer. We focus on authenticated dictionaries based on hash trees and propose a solution to remedy their main shortcoming, the size of the proofs provided to users.
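Hash-tree-based authenticated dictionaries, as discussed above, attach to each answer a Merkle-style proof: the sibling hashes on the path from the answer's leaf to the root. A minimal sketch (this is generic Merkle hashing, not the thesis's specific constructions):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes on the path from leaf `index` up to the root."""
    level, proof = [h(leaf) for leaf in leaves], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1
        proof.append((level[sib], sib < index))   # (hash, is_left_sibling)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    acc = h(leaf)
    for sibling, is_left in proof:
        acc = h(sibling + acc) if is_left else h(acc + sibling)
    return acc == root

leaves = [b"ant", b"bee", b"cat", b"dog", b"eel"]
root = merkle_root(leaves)
print(verify(b"cat", merkle_proof(leaves, 2), root))  # → True
print(verify(b"fox", merkle_proof(leaves, 2), root))  # → False
```

The proof size is logarithmic in the number of leaves, which is precisely the cost the thesis seeks to reduce further.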
Laurence, Grégoire. "Normalisation et Apprentissage de Transductions d'Arbres en Mots." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2014. http://tel.archives-ouvertes.fr/tel-01053084.
Full text