Academic literature on the topic "Fouille de Données Textuelles Hétérogènes"
Journal articles on the topic "Fouille de Données Textuelles Hétérogènes"
Ganachaud, Clément, Ludovic Seifert, and David Adé. "L’importation de méthodes non-supervisées en fouille de données dans le programme de recherche empirique et technologique du cours d’action : Apports et réflexions critiques". Staps N° 141, no. 3 (January 17, 2024): 97–108. http://dx.doi.org/10.3917/sta.141.0097.
Reboul, Marianne, and Alexandre Gefen. "Mesures et savoirs : Quelles méthodes pour l’histoire culturelle à l’heure du big data ?" Semiotica 2019, no. 230 (October 25, 2019): 97–120. http://dx.doi.org/10.1515/sem-2018-0103.
Dunoyer, C., M. Morell, and D. Pellecuer. "Évaluation de l’intérêt de l’utilisation d’outils de fouille de données textuelles pour le codage du PMSI". Revue d'Épidémiologie et de Santé Publique 66 (March 2018): S26–S27. http://dx.doi.org/10.1016/j.respe.2018.01.056.
Longhi, Julien. "Types de discours, formes textuelles et normes sémantiques : expression et doxa dans un corpus de données hétérogènes". Langages 187, no. 3 (2012): 41. http://dx.doi.org/10.3917/lang.187.0041.
Mélou, Cécile, and François Melou. "Faites de votre vocation un métier : engagez-vous ! Quand les convictions rencontrent la responsabilité". Sociographe N° 88, no. 4 (November 29, 2024): XIX–XXXIII. https://doi.org/10.3917/graph1.088.xix.
Forest, Dominic, and Michael E. Sinatra. "Lire à l’ère du numérique Le nénuphar et l’araignée de Claire Legendre". Sens public, December 22, 2016. http://dx.doi.org/10.7202/1044409ar.
Gauld, C., and J. A. Micoulaud-Franchi. "Analyse en réseau par fouille de données textuelles systématique du concept de psychiatrie personnalisée et de précision". L'Encéphale, November 2020. http://dx.doi.org/10.1016/j.encep.2020.08.008.
Theses on the topic "Fouille de Données Textuelles Hétérogènes"
Alencar Medeiros, Gabriel Henrique. "PreDiViD: Towards the Prediction of the Dissemination of Viral Disease contagion in a pandemic setting". Electronic Thesis or Diss., Normandie, 2025. http://www.theses.fr/2025NORMR005.
Event-Based Surveillance (EBS) systems are essential for detecting and tracking emerging health phenomena such as epidemics and public health crises. However, they face limitations, including strong dependence on human expertise, challenges in processing heterogeneous textual data, and insufficient consideration of spatiotemporal dynamics. To overcome these issues, we propose a hybrid approach combining knowledge-driven and data-driven methodologies, anchored in the Propagation Phenomena Ontology (PropaPhen) and the Description-Detection-Prediction Framework (DDPF), to enhance the description, detection, and prediction of propagation phenomena. PropaPhen is a FAIR ontology designed to model the spatiotemporal spread of phenomena. It has been specialized in the biomedical domain through the integration of UMLS and World-KG, leading to the creation of the BioPropaPhenKG knowledge graph. The DDPF framework consists of three modules: description, which generates domain-specific ontologies; detection, which applies relation extraction techniques to heterogeneous textual sources; and prediction, which uses advanced clustering methods. Tested on COVID-19 and Monkeypox datasets and validated against WHO data, DDPF demonstrated its effectiveness in detecting and predicting spatiotemporal clusters. Its modular architecture ensures scalability and adaptability to various domains, opening perspectives in public health, environmental monitoring, and social phenomena.
Tisserant, Guillaume. "Généralisation de données textuelles adaptée à la classification automatique". Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS231/document.
Text classification has long been studied: early on, documents of different types were grouped together to centralize knowledge, and classification and indexing systems were created to make it easy to find documents based on readers' needs. With the growing number of documents and the advent of computers and the internet, the implementation of automatic text classification systems has become a critical issue. However, textual data, complex and rich by nature, are difficult to process automatically. In this context, this thesis proposes an original methodology to organize and facilitate access to textual information. Our automatic classification approach and our semantic information extraction enable us to quickly find relevant information. Specifically, this manuscript presents new forms of text representation that facilitate processing for automatic classification. A partial generalization of textual data (the GenDesc approach) based on statistical and morphosyntactic criteria is proposed. Moreover, this thesis focuses on phrase construction and on the use of semantic information to improve the representation of documents. We demonstrate through numerous experiments the relevance and genericity of our proposals and show that they improve classification results. Finally, as social networks are growing rapidly, a method for the automatic generation of semantic hashtags is proposed. Our approach is based on statistical measures, semantic resources, and syntactic information. The generated hashtags can then be exploited for information retrieval tasks over large volumes of data.
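The partial-generalization idea from this abstract can be illustrated with a minimal sketch (not the actual GenDesc algorithm; here a simple document-frequency criterion stands in for the thesis's statistical and morphosyntactic criteria, and the POS tags are supplied by hand):

```python
# Hedged sketch: partial generalization of tokens in the spirit of GenDesc.
# Tokens that occur in too many documents are replaced by their POS tag;
# rarer, more discriminative tokens are kept as-is.
from collections import Counter

def generalize(docs_tokens, pos_tags, df_threshold=0.5):
    """Replace tokens whose document frequency exceeds df_threshold
    by their part-of-speech tag (a partial generalization of the corpus)."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        for w in set(tokens):
            df[w] += 1
    def gen(w):
        # "X" is a fallback tag for words missing from the hand-made lexicon
        return pos_tags.get(w, "X") if df[w] / n_docs > df_threshold else w
    return [[gen(w) for w in tokens] for tokens in docs_tokens]

docs = [["the", "cat", "sleeps"], ["the", "dog", "barks"]]
pos = {"the": "DET", "cat": "NOUN", "dog": "NOUN",
       "sleeps": "VERB", "barks": "VERB"}
print(generalize(docs, pos))  # "the" is generalized to DET; content words stay
```

A real pipeline would obtain the tags from a POS tagger and tune the criterion per task; the threshold here is only illustrative.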
Azé, Jérôme. "Extraction de Connaissances à partir de Données Numériques et Textuelles". PhD thesis, Université Paris Sud - Paris XI, 2003. http://tel.archives-ouvertes.fr/tel-00011196.
Texto completoL'analyse de telles données est souvent contrainte par la définition d'un support minimal utilisé pour filtrer les connaissances non intéressantes.
Les experts des données ont souvent des difficultés pour déterminer ce support.
Nous avons proposé une méthode permettant de ne pas fixer un support minimal et fondée sur l'utilisation de mesures de qualité.
Nous nous sommes focalisés sur l'extraction de connaissances de la forme "règles d'association".
Ces règles doivent vérifier un ou plusieurs critères de qualité pour être considérées comme intéressantes et proposées à l'expert.
Nous avons proposé deux mesures de qualité combinant différents critères et permettant d'extraire des règles intéressantes.
Nous avons ainsi pu proposer un algorithme permettant d'extraire ces règles sans utiliser la contrainte du support minimal.
Le comportement de notre algorithme a été étudié en présence de données bruitées et nous avons pu mettre en évidence la difficulté d'extraire automatiquement des connaissances fiables à partir de données bruitées.
Une des solutions que nous avons proposée consiste à évaluer la résistance au bruit de chaque règle et d'en informer l'expert lors de l'analyse et de la validation des connaissances obtenues.
Enfin, une étude sur des données réelles a été effectuée dans le cadre d'un processus de fouille de textes.
Les connaissances recherchées dans ces textes sont des règles d'association entre des concepts définis par l'expert et propres au domaine étudié.
Nous avons proposé un outil permettant d'extraire les connaissances et d'assister l'expert lors de la validation de celles-ci.
Les différents résultats obtenus montrent qu'il est possible d'obtenir des connaissances intéressantes à partir de données textuelles en minimisant la sollicitation de l'expert dans la phase d'extraction des règles d'association.
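The idea of ranking association rules by quality measures instead of pruning by a minimum support can be sketched as follows (a toy stand-in: confidence and lift replace the thesis's own composite measures):

```python
# Hedged sketch: mine single-antecedent association rules with no
# minimum-support filter, ranking them by a quality measure instead.
from itertools import combinations
from collections import Counter

def association_rules(transactions):
    """Enumerate rules A -> B over a list of item sets and score each with
    confidence and lift; no minimum-support pruning is applied."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = set(t)
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), co in pair_count.items():
        for ant, cons in ((a, b), (b, a)):
            conf = co / item_count[ant]           # P(cons | ant)
            lift = conf / (item_count[cons] / n)  # vs. chance co-occurrence
            rules.append((ant, cons, conf, lift))
    return sorted(rules, key=lambda r: -r[3])     # ranked by lift, not support

transactions = [{"bread", "butter"}, {"bread", "butter", "milk"},
                {"milk"}, {"bread"}]
rules = association_rules(transactions)
print(rules[0])  # the rule with the highest lift, whatever its support
```

Even a rare rule surfaces here if its quality score is high, which is the point of dropping the support constraint.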
Fize, Jacques. "Mise en correspondance de données textuelles hétérogènes fondée sur la dimension spatiale". Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS099.
With the rise of Big Data, handling the Volume, Velocity (growth and evolution), and Variety of data concentrates the efforts of research communities to exploit these new resources, which have become so important that they are considered the new "black gold". In recent years, volume and velocity have become manageable aspects of the data, unlike variety, which remains a major challenge. This thesis presents two contributions in the field of heterogeneous data matching, with a focus on the spatial dimension. The first contribution is based on a two-step process for matching heterogeneous textual data: georepresentation and geomatching. In the first phase, we propose to represent the spatial dimension of each document in a corpus through a dedicated structure, the Spatial Textual Representation (STR). This graph representation is composed of the spatial entities identified in the document, as well as the spatial relationships they maintain. To identify the spatial entities of a document and their spatial relationships, we propose a dedicated resource called Geodict. The second phase, geomatching, computes the similarity between the generated representations (STRs). Given the graph nature of the STR structure, different graph-matching algorithms were studied. To assess the relevance of a match, we propose a set of six criteria based on a definition of the spatial similarity between two documents. The second contribution is based on the thematic dimension of textual data and its participation in the spatial matching process. We propose to identify the themes that appear in the same contextual window as certain spatial entities, with the objective of inducing some of the implicit spatial similarities between documents. To do this, we propose to extend the STR structure with two concepts: the thematic entity and the thematic relationship.
A thematic entity represents a concept specific to a particular field (e.g., agronomy or medicine) and is represented by the different spellings present in a terminology resource, in this case a vocabulary. A thematic relationship links a spatial entity to a thematic entity when they appear in the same window. The selected vocabularies and the new form of STR integrating the thematic dimension are evaluated according to their coverage of the studied corpora, as well as their contribution to the heterogeneous textual matching process along the spatial dimension.
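A minimal sketch of the geomatching idea, under strong simplifying assumptions (a real STR is a graph matched with dedicated algorithms and six criteria; here each document is reduced to a set of spatial entities plus relation triples, compared with a blended Jaccard index):

```python
# Hedged sketch: toy similarity between two STR-like representations,
# each given as (set of spatial entities, set of relation triples).
def str_similarity(str_a, str_b, alpha=0.5):
    """Blend a Jaccard index over spatial entities with one over
    spatial relations; alpha weights the entity part."""
    def jaccard(x, y):
        return len(x & y) / len(x | y) if x | y else 1.0
    ents_a, rels_a = str_a
    ents_b, rels_b = str_b
    return alpha * jaccard(ents_a, ents_b) + (1 - alpha) * jaccard(rels_a, rels_b)

doc1 = ({"Montpellier", "France"}, {("Montpellier", "inside", "France")})
doc2 = ({"Montpellier", "Occitanie"}, {("Montpellier", "inside", "Occitanie")})
print(str_similarity(doc1, doc2))  # entities overlap, relations do not
```

The entity and relation names above are invented examples; the thesis's actual criteria operate on richer graph structure than two flat sets.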
Holat, Pierre. "Fouille de motifs et modélisation statistique pour l'extraction de connaissances textuelles". Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCD045.
In natural language processing, two main approaches are used: machine learning and data mining. In this context, cross-referencing pattern-based data mining methods with statistical machine learning methods is a promising but hardly explored avenue. In this thesis, we present three major contributions: the introduction of delta-free patterns, used as statistical model features; the introduction of a semantic similarity constraint for the mining, computed using a statistical model; and the introduction of sequential labeling rules, created from the patterns and selected by a statistical model.
Séguéla, Julie. "Fouille de données textuelles et systèmes de recommandation appliqués aux offres d'emploi diffusées sur le web". Thesis, Paris, CNAM, 2012. http://www.theses.fr/2012CNAM0801/document.
In recent years, the expansion of e-recruitment has led to the multiplication of web channels dedicated to job postings. In an economic context where cost control is fundamental, assessing and comparing the performance of recruitment channels has become necessary. The purpose of this work is to develop a decision-making tool intended to guide recruiters while they are posting a job on the Internet: the tool provides recruiters with the expected performance on job boards for a given job offer. First, we identify the potential predictors of a recruiting campaign's performance. Then, we apply text mining techniques to job offer texts in order to structure postings and extract information relevant to improving their description in a predictive model. The performance prediction algorithm is based on a hybrid recommender system, suitable for the cold-start problem. The hybrid system, based on a supervised similarity measure, outperforms standard multivariate models. Our experiments are conducted on a real dataset coming from a job posting database.
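The cold-start strategy described above, predicting the performance of a new posting from its content alone, can be sketched as follows (a hypothetical content-based stand-in; the thesis's hybrid system uses a supervised similarity measure, not plain cosine):

```python
# Hedged sketch: content-based cold-start prediction for a new job posting.
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(u[w] * v[w] for w in u if w in v)
    den = math.sqrt(sum(x * x for x in u.values())) * \
          math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def predict_performance(new_posting, history):
    """Predict the performance of an unseen posting as the
    similarity-weighted mean of past observed performances;
    `history` is a list of (bag_of_words, performance) pairs."""
    weighted = [(cosine(new_posting, bow), perf) for bow, perf in history]
    total = sum(w for w, _ in weighted)
    return sum(w * p for w, p in weighted) / total if total else None

past = [(Counter("python developer remote".split()), 120),
        (Counter("sales manager paris".split()), 40)]
new = Counter("senior python developer".split())
print(predict_performance(new, past))  # close to 120: the similar posting dominates
```

The performance figures and vocabulary are invented; the point is only that a brand-new item gets a prediction from its text, with no interaction history required.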
Zenasni, Sarah. "Extraction d'information spatiale à partir de données textuelles non-standards". Thesis, Montpellier, 2018. http://www.theses.fr/2018MONTS076/document.
The extraction of spatial information from textual data has become an important research topic in the field of Natural Language Processing (NLP). It meets a crucial need in the information society, in particular to improve the efficiency of Information Retrieval (IR) systems for different applications (tourism, spatial planning, opinion analysis, etc.). Such systems require a detailed analysis of the spatial information contained in the available textual data (web pages, e-mails, tweets, SMS, etc.). However, the multitude and variety of these data, as well as the regular emergence of new forms of writing, make the automatic extraction of information from such corpora difficult. To meet these challenges, we propose, in this thesis, new text mining approaches allowing the automatic identification of variants of spatial entities and relations from textual data of mediated communication. These approaches rest on three main contributions that provide intelligent navigation methods. The first focuses on the recognition and identification of spatial entities in short-message corpora (SMS, tweets) characterized by weakly standardized writing. The second is dedicated to the identification of new forms/variants of spatial relations in these specific corpora. Finally, the third concerns the identification of the semantic relations associated with textual spatial information.
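Matching weakly standardized spellings of spatial entities against a gazetteer can be sketched with simple string similarity (an illustrative stand-in, not the thesis's method; the gazetteer and threshold are assumptions):

```python
# Hedged sketch: resolve noisy SMS/tweet spellings of place names against
# a tiny hand-made gazetteer using a string-similarity ratio.
from difflib import SequenceMatcher

GAZETTEER = ["montpellier", "marseille", "paris"]  # illustrative only

def match_spatial_entity(token, threshold=0.8):
    """Return the gazetteer entry most similar to `token`, or None when
    no entry is similar enough to count as a spatial-entity variant."""
    token = token.lower()
    best = max(GAZETTEER,
               key=lambda place: SequenceMatcher(None, token, place).ratio())
    score = SequenceMatcher(None, token, best).ratio()
    return best if score >= threshold else None

print(match_spatial_entity("Montpelier"))  # common misspelling still resolves
print(match_spatial_entity("lol"))         # no plausible place: None
```

Real mediated-communication corpora need far more than edit distance (phonetic variants, abbreviations, context), which is precisely what the thesis investigates.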
Pantin, Jérémie. "Détection et caractérisation sémantique de données textuelles aberrantes". Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS347.pdf.
Machine learning addresses the problem of handling dedicated tasks over a wide variety of data. Such algorithms can be simple or difficult to handle depending on the data. Low-dimensional data (two or three dimensions) with an intuitive representation (say, the average baguette price by year) are easier for a human to interpret and explain than data with thousands of dimensions. For low-dimensional data, an error produces a visible shift away from normal data, but the high-dimensional case is different. Outlier detection (also called anomaly detection or novelty detection) is the study of outlying observations in order to separate the normal from the abnormal. The methods that perform this task are algorithms and models based on data distributions. Different families of approaches can be found in the outlier detection literature, and they are mainly independent of ground truth: they perform outlier analysis by modeling the principal behavior of the majority of observations, so that data differing from the normal distribution are considered noise or outliers. We detail the application of outlier detection to text. Despite recent progress in natural language processing, computers still lack a profound understanding of human language in the absence of context. For instance, the sentence "A smile is a curve that sets everything straight" has several levels of understanding, and a machine may struggle to choose the right level of reading. This thesis presents the analysis of high-dimensional outliers, applied to text. Recent advances in anomaly and outlier detection are not well represented for text data, and we propose to highlight the main differences with high-dimensional outliers. We also study ensemble methods, which are nearly nonexistent in the literature for our context. Finally, an application of outlier detection to improving abstractive summarization results is conducted.
We propose GenTO, a method that prepares and generates splits of data into which anomalies and outliers are inserted. Based on this method, an evaluation and benchmark of outlier detection approaches on documents is proposed. The proposed taxonomy allows us to identify difficult, hierarchized outliers that the literature has tackled without naming them as such. Moreover, learning without supervision often makes models depend on some hyperparameter; for instance, the Local Outlier Factor relies on the k nearest neighbors to compute a local density, so choosing the right value of k is crucial. In this regard, we explore the influence of this parameter for text data. While choosing a single model can introduce obvious bias on real-world data, ensemble methods mitigate this problem and are particularly effective for outlier analysis: selecting several values of one hyperparameter can help detect strong outliers. Feature importance is then tackled, as it can help a human understand the output of a black-box model; the interpretability of outlier detection models is thus questioned. We find that for numerous datasets a small number of features can be selected as an oracle, and that associating complete models with restrained models helps mitigate the black-box effect of some approaches. In some cases, outlier detection amounts to noise removal or anomaly detection, and some applications benefit from this: mail spam detection and fake news detection are two examples, but we propose to use outlier detection approaches for weak-signal exploration in a marketing project. We find that models from the literature help improve unsupervised abstractive summarization, and also help find weak signals in text.
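The hyperparameter-ensembling idea, averaging an outlier score over several values of k to reduce sensitivity to any single choice, can be sketched in one dimension (a simple k-NN-distance score stands in for the Local Outlier Factor):

```python
# Hedged sketch: ensemble of k-NN-distance outlier scores over several k,
# in the spirit of hyperparameter ensembles for outlier detection.
def knn_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour (1-D toy data)."""
    dists = sorted(abs(points[i] - p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def ensemble_outlier_scores(points, ks=(1, 2, 3)):
    """Average the k-NN-distance score over several values of k,
    mitigating the sensitivity of a single-k detector."""
    return [sum(knn_distance(points, i, k) for k in ks) / len(ks)
            for i in range(len(points))]

data = [1.0, 1.1, 0.9, 1.05, 10.0]  # the last point is an obvious outlier
scores = ensemble_outlier_scores(data)
print(scores.index(max(scores)))  # index of the strongest outlier
```

LOF itself normalizes by the neighbours' local densities rather than using the raw distance; the ensembling-over-k pattern is the same either way.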
Hussain, Syed Fawad. "Une nouvelle mesure de co-similarité : applications aux données textuelles et génomique". PhD thesis, Grenoble, 2010. http://www.theses.fr/2010GRENM049.
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into homogeneous and contrasted groups (clusters). As datasets become larger and more varied, adaptations of existing algorithms are required to maintain cluster quality. In this regard, high-dimensional data pose problems for traditional clustering algorithms, known as the curse of dimensionality. This thesis proposes a co-similarity-based algorithm built on the concept of higher-order co-occurrences, which are extracted from the given data. In text analysis, for example, document similarity is calculated based on word similarity, which in turn is calculated on the basis of document similarity. Using this iterative approach, we can bring similar documents closer together even if they do not share the same words but share similar words, without needing external linguistic resources such as a thesaurus. Furthermore, the approach can be extended to incorporate prior knowledge from a training dataset for the task of text categorization: category labels from the training set can be used to influence similarity measures between words so as to better classify the incoming test dataset among the different categories. Thus, the same conceptual approach, which can be expressed in the framework of graph theory, can be used for both clustering and categorization tasks depending on the amount of prior information available. Our results show a significant increase in accuracy with respect to the state of the art of both one-way and two-way clustering on the different datasets that were tested.
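The iterative co-similarity idea, documents are similar if they contain similar words, and words are similar if they occur in similar documents, can be sketched over a small document-term matrix (a simplified χ-Sim-style iteration, not the thesis's exact algorithm):

```python
# Hedged sketch: alternate between deriving a document-similarity matrix SR
# from a word-similarity matrix SC and vice versa, over a doc-term matrix A.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def normalize(S):
    """Scale a similarity matrix so that its diagonal entries become 1."""
    return [[S[i][j] / ((S[i][i] * S[j][j]) ** 0.5 or 1.0)
             for j in range(len(S))] for i in range(len(S))]

def cosimilarity(A, iterations=3):
    """SR = norm(A * SC * A^T), SC = norm(A^T * SR * A), repeated."""
    n_words = len(A[0])
    SC = [[float(i == j) for j in range(n_words)] for i in range(n_words)]
    for _ in range(iterations):
        SR = normalize(matmul(matmul(A, SC), transpose(A)))
        SC = normalize(matmul(matmul(transpose(A), SR), A))
    return SR

# Docs 0 and 1 share no words directly, but both share words with doc 2:
A = [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 1, 0]]
SR = cosimilarity(A)
print(SR[0][1] > 0)  # higher-order co-occurrence induces a nonzero similarity
```

With the identity as the initial word similarity, the first pass reduces to ordinary co-occurrence; subsequent passes propagate similarity through shared vocabulary, which is the higher-order effect the abstract describes.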
Book chapters on the topic "Fouille de Données Textuelles Hétérogènes"
Kaddour, Cyrille Ben, François Capron, and Olivier Labat. "Archéologie des habitats ruraux alto-médiévaux en Eure-et-Loir". In L’archéologie des ve-xiie siècles en région Centre-Val de Loire, 127–41. Tours: Fédération pour l’édition de la Revue archéologique du Centre de la France, 2024. https://doi.org/10.4000/13ibo.
Marchaisseau, Vincent, Muriel Boulen, Adrien Gonnet, Blandine Lecomte-Schmitt, Willy Tegel, Pierre Testard, and Cédric Roms. "Le drainage d’une zone humide intégrée à la ville de Troyes (xiiie-xviiie s.)". In L’eau dans les villes d’Europe au Moyen Âge (IVe-XVe siècle) : un vecteur de transformation de l’espace urbain, 55–69. Tours: Fédération pour l’édition de la Revue archéologique du Centre de la France, 2023. https://doi.org/10.4000/1377c.
Lecroère, Thomas, Hervé Sellès, and Vincent Acheré. "Les enceintes urbaines de Chartres, entre Bas-Empire et haut Moyen Âge. État des connaissances et hypothèses de tracés". In L’archéologie des ve-xiie siècles en région Centre-Val de Loire, 19–36. Tours: Fédération pour l’édition de la Revue archéologique du Centre de la France, 2024. https://doi.org/10.4000/13ib6.
Lechevrel, Nadège. "Chapitre 5. Fouille de données textuelles et recherche documentaire automatiques pour l’histoire des théories linguistiques". In Apparenter la pensée ?, 219. Editions Matériologiques, 2014. http://dx.doi.org/10.3917/edmat.charb.2014.01.0219.