Dissertations / Theses on the topic 'Résumés'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Résumés.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.
Yahiaoui, Itheri. "Construction automatique de résumés vidéos." Paris, ENST, 2003. http://www.theses.fr/2003ENST0038.
The ever-growing availability of multimedia data creates a strong requirement for efficient tools to manipulate and present data in an effective manner. Automatic video summarization tools aim at creating, with little or no human interaction, short versions that contain the salient information of the original video. The key issue here is to identify what should be kept in the summary and how relevant information can be automatically extracted. In this thesis, we present a new approach for the automatic construction and evaluation of video summaries. This approach is based on a task that we feel relevant to many applications of summaries: the user is asked to identify whether a short clip comes from an original audio-video sequence or not, using only the knowledge of the summary (rather than the full sequence). The performance of the user is measured by the percentage of correct decisions over all possible clips taken from the original sequence. We call this task a Maximum Recollection Task (MRT), in the sense that the summary should let the user identify as many clips as possible. The best summary is therefore chosen according to a Maximum Recollection Principle (MRP). In this work, we have extended the MRP to propose different methods of summary creation according to the type of media used. First, we present a method for the automatic construction of video summaries based on visual information only. Then we compare several methodologies for multi-video summary construction, where the focus is not necessarily on what is important in a video, but rather on what distinguishes this video from the others. We also illustrate the adaptation of this principle to build summaries from text documents. Finally, we present a framework in which text and video are combined during the construction of summaries of audio-video sequences.
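The Maximum Recollection Principle can be read computationally: among candidate summaries, prefer the one that lets a user model correctly attribute the largest fraction of clips. The sketch below is a minimal, hypothetical illustration of that reading; the frame features, the similarity threshold and the greedy selection are assumptions for the example, not the thesis's actual algorithm.

```python
import random

def recollection_rate(summary, clips, threshold=1.0):
    """Fraction of clips a user-model 'recognizes': a clip is deemed
    recollected if some frame of the summary is close to its centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    hits = 0
    for clip in clips:
        centroid = [sum(xs) / len(xs) for xs in zip(*clip)]
        if any(dist(frame, centroid) <= threshold for frame in summary):
            hits += 1
    return hits / len(clips)

def greedy_mrp_summary(frames, clips, k):
    """Greedily add the frame that most improves the recollection rate."""
    summary, candidates = [], list(frames)
    for _ in range(k):
        best = max(candidates,
                   key=lambda f: recollection_rate(summary + [f], clips))
        summary.append(best)
        candidates.remove(best)
    return summary

# Toy data: 2-D 'feature vectors' for frames; clips are short frame runs.
random.seed(0)
frames = [[random.random() * 10, random.random() * 10] for _ in range(50)]
clips = [frames[i:i + 5] for i in range(0, 50, 5)]
summary = greedy_mrp_summary(frames, clips, k=3)
print(recollection_rate(summary, clips))
```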
Saggion, Horacio. "Génération automatique de résumés par analyse sélective." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0017/NQ55476.pdf.
Gabsi, Nesrine. "Extension et interrogation de résumés de flux de données." Phd thesis, Télécom ParisTech, 2011. http://pastel.archives-ouvertes.fr/pastel-00613122.
Boly, Aliou. "Fonctions d'oubli et résumés dans les entrepôts de données." Paris, ENST, 2006. http://www.theses.fr/2006ENST0049.
The amount of data stored in data warehouses grows very quickly, so that they become saturated. To overcome this problem, the usual solution is to archive older data when new data arrive and no space is left. This solution is not satisfactory, because data mining analyses based on long-term historical data become impossible: such analyses cannot be run on archived data without re-loading them into the data warehouse, and the cost of loading back a large archived dataset is too high to be incurred for a single analysis. Archived data must therefore be considered lost as far as data mining applications are concerned. In this thesis, we propose a solution to this problem: a language is defined to specify forgetting functions on older data. The specifications include the definition of summaries of deleted data, stating what data should be present in the data warehouse at each point in time. These summaries are aggregates and samples of deleted data and are kept in the data warehouse. The goal of these forgetting functions is to control the size of the data warehouse; this control is provided both for the aggregate summaries and for the samples. The specification language for forgetting functions is defined in the context of relational databases. Once forgetting functions have been specified, the data warehouse is automatically updated in order to follow the specifications. This thesis presents the specification language, the structure of the summaries, the algorithms to update the data warehouse, and the possibility of performing interesting analyses of historical data.
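Such a forgetting function can be pictured as a policy that, past a given age, replaces detail rows with aggregates plus a bounded sample. The following sketch illustrates the idea in plain Python; the thesis's language operates on relational databases, and the 30-day grouping and the sample size here are assumptions for the example.

```python
import random
from collections import defaultdict

def apply_forgetting(rows, horizon, sample_size=3, seed=42):
    """rows: list of (day, value). Rows older than `horizon` days are
    deleted; in their place we keep per-period aggregates and a sample."""
    rng = random.Random(seed)
    recent = [(d, v) for d, v in rows if d >= horizon]
    old = [(d, v) for d, v in rows if d < horizon]
    groups = defaultdict(list)
    for d, v in old:
        groups[d // 30].append(v)            # aggregate by 30-day period
    aggregates = {p: (len(vs), sum(vs), min(vs), max(vs))
                  for p, vs in groups.items()}
    sample = rng.sample(old, min(sample_size, len(old)))
    return recent, aggregates, sample

rows = [(day, day % 7) for day in range(120)]
recent, aggs, sample = apply_forgetting(rows, horizon=90)
print(len(recent), aggs[0], sample)   # details kept, summary of the rest
```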
Boukadida, Haykel. "Création automatique de résumés vidéo par programmation par contraintes." Thesis, Rennes 1, 2015. http://www.theses.fr/2015REN1S074/document.
This thesis focuses on automatic video summarization. The idea is to create an adaptive video summary that takes into account a set of rules defined on the audiovisual content on the one hand, and that adapts to the user's preferences on the other hand. We propose a novel approach that treats automatic video summarization as a constraint satisfaction problem (CSP), with constraint programming as the solving paradigm. A set of general rules for summary production is defined by an expert; these production rules concern the multimedia content of the input video and are expressed as constraints to be satisfied. The final user can then define additional constraints (such as the desired duration of the summary) or set high-level parameters that affect the constraints already defined by the expert. This approach has several advantages: it clearly separates the summary production rules (the problem modeling) from the summary generation algorithm (the problem solving by the CSP solver). The summary can hence be adapted without revising the whole generation process; for instance, users can adapt the summary to the target application and to their preferences by adding a constraint or modifying an existing one, without changing the summary generation algorithm. We propose three models of video representation that differ in their flexibility and efficiency. Besides the originality of each of the three proposed models, an additional contribution of this thesis is an extensive comparative study of their performance and of the quality of the resulting summaries, using objective and subjective measures. Finally, to assess the quality of automatically generated summaries, the proposed approach was evaluated in a large-scale user study involving more than 60 people. All these experiments were performed within the challenging application of automatic tennis match summarization.
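To make the separation between model and solver concrete, the toy sketch below selects video segments under two constraints, with a brute-force search standing in for a real CSP solver; the segment attributes and the two constraints are invented for illustration.

```python
from itertools import combinations

# Each segment: (id, duration_s, interest_score, is_replay)
segments = [(0, 20, 0.9, False), (1, 15, 0.4, True),
            (2, 30, 0.8, False), (3, 10, 0.7, False), (4, 25, 0.6, True)]

MAX_DURATION = 60          # user constraint: summary length
MAX_REPLAYS = 1            # expert rule: at most one replay

def satisfies(selection):
    return (sum(s[1] for s in selection) <= MAX_DURATION
            and sum(s[3] for s in selection) <= MAX_REPLAYS)

def best_summary(segments):
    """Enumerate all subsets (a CSP solver would prune instead)
    and keep the feasible one with the highest total interest."""
    best, best_score = (), -1.0
    for r in range(1, len(segments) + 1):
        for sel in combinations(segments, r):
            if satisfies(sel) and sum(s[2] for s in sel) > best_score:
                best, best_score = sel, sum(s[2] for s in sel)
    return best

print([s[0] for s in best_summary(segments)])
```

Changing a rule (say, MAX_REPLAYS) alters the model only; the search procedure stays untouched, which is exactly the adaptation property the thesis argues for.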
Norrby, Sara. "Using Morphological Analysis in an Information Retrieval System for Résumés." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189578.
This thesis investigates how the use of morphological analysis, such as lemmatization, affects the performance of an information retrieval system for résumés (CVs) written in Swedish. It also discusses how the relevance of a résumé can be assessed; the retrieval system is evaluated in terms of precision and recall, as well as discounted cumulative gain, a measure of ranking quality. The results show that morphological analysis has a positive effect when the query contains many Swedish words. When the query contains many names of technologies, using morphology turns out to be detrimental, especially compound splitting. Lemmatization had a positive effect in some cases, whereas compound splitting had only a negative effect.
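Discounted cumulative gain, the ranking measure used in this evaluation, takes only a few lines to compute. A minimal sketch with made-up relevance grades:

```python
import math

def dcg(relevances):
    """DCG: graded relevance discounted by log2 of the rank (0-based here)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """Normalized DCG: 1.0 means the ranking is ideal."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(round(ndcg([3, 2, 3, 0, 1, 2]), 3))   # ranking quality in [0, 1]
```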
Sanabria, Rosas Laura Melissa. "Détection et caractérisation des moments saillants pour les résumés automatiques." Thesis, Université Côte d'Azur, 2021. http://www.theses.fr/2021COAZ4104.
Video content is present in an ever-increasing number of fields, both scientific and commercial. Sports, particularly soccer, is one of the industries that has invested the most in video analytics, owing to the massive popularity of the game. Although several state-of-the-art methods rely on handcrafted heuristics to generate summaries of soccer games, they have shown that multiple modalities help detect the best actions of the game. The field of general-purpose video summarization, on the other hand, has advanced rapidly, offering several deep learning approaches; however, many of them rely on properties that do not hold for sports videos. Video content has long been the main source for automatic tasks in soccer, but the data recording all the events happening on the field have lately become very important in sports analytics, since these event data provide richer information and require less processing. Considering that in automatic sports summarization the goal is not only to show the most important actions of the game, but also to evoke as much emotion as a human editor would, we propose a method to generate the summary of a soccer match video by exploiting the event metadata of the entire match and the content broadcast on TV. We have designed an architecture introducing (1) a Multiple Instance Learning method that takes into account the sequential dependency among events, (2) a hierarchical multimodal attention layer that grasps the importance of each event in an action, and (3) a method to automatically generate multiple summaries of a soccer match by sampling from a ranking distribution, providing candidate summaries that are similar enough yet varied enough to give the final user different options. We also address some additional challenges in sports summarization. Based on the internal signals of an attention model that uses event data as input, we propose a method to analyze the interpretability of our model through a graphical representation of actions, where the x-axis represents the sequence of events and the y-axis the weight learned by the attention layer. This representation gives the editor a new tool containing meaningful information for deciding whether an action is important. We also propose the use of keyword spotting and boosting techniques to detect every time a player is mentioned by the commentators, as a remedy for missing event data.
Moyse, Gilles. "Résumés linguistiques de données numériques : interprétabilité et périodicité de séries." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066526/document.
Our research is in the field of fuzzy linguistic summaries (FLS), which generate natural language sentences to describe very large amounts of numerical data, providing concise and intelligible views of these data. We first focus on the interpretability of FLS, crucial for providing end-users with an easily understandable text, but hard to achieve due to its linguistic form. Going beyond existing works on this topic, which are based on the basic components of FLS, we propose a general approach to the interpretability of summaries, considering them globally as groups of sentences, and focus more specifically on their consistency. In order to guarantee consistency in the framework of standard fuzzy logic, we introduce a new model of oppositions between increasingly complex sentences. The model allows us to show that these consistency properties can be satisfied by selecting a specific negation approach. Moreover, based on this model, we design a 4-dimensional cube displaying all the possible oppositions between sentences in a FLS and show that it generalizes several existing logical opposition structures. We then consider data in the form of numerical series and focus on linguistic summaries of their periodicity: the sentences we propose indicate the extent to which the series are periodic and offer an appropriate linguistic expression of their periods. The proposed extraction method, called DPE for Detection of Periodic Events, splits the data adaptively and without any prior information, using tools from mathematical morphology. The segments are then exploited to compute the period and the periodicity, measuring the quality of the estimation and the extent to which the series is periodic. Finally, DPE returns descriptive sentences of the form "Approximately every 2 hours, the customer arrival is important". Experiments with artificial and real data show the relevance of the proposed DPE method. From an algorithmic point of view, we propose an incremental and efficient implementation of DPE based on established update formulas, which makes DPE scalable and able to process real-time data streams. We also present an extension of DPE based on the concept of local periodicity, allowing the identification of locally periodic subsequences in a numerical series by means of an original statistical test. The method, validated on artificial and real data, returns natural language sentences that extract information of the form "Every two weeks during the first semester of the year, sales are high".
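A fuzzy linguistic summary of the protoform "Q of the records are A" receives a truth degree by passing the aggregated membership degrees through a fuzzy quantifier. A minimal sketch of that evaluation, with illustrative membership functions that are not those of the thesis:

```python
def mu_high(x):
    """Fuzzy membership of 'high' for values in [0, 100]."""
    return max(0.0, min(1.0, (x - 50) / 30))

def mu_most(p):
    """Fuzzy quantifier 'most' over a proportion p in [0, 1]."""
    return max(0.0, min(1.0, (p - 0.3) / 0.5))

def truth_degree(values, mu_a, mu_q):
    """Truth of 'Q of the records are A' (Zadeh-style evaluation)."""
    proportion = sum(mu_a(v) for v in values) / len(values)
    return mu_q(proportion)

sales = [72, 85, 90, 40, 66, 95, 78]
print(round(truth_degree(sales, mu_high, mu_most), 2))  # 'most sales are high'
```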
Molina, Villegas Alejandro. "Compression automatique de phrases : une étude vers la génération de résumés." Phd thesis, Université d'Avignon, 2013. http://tel.archives-ouvertes.fr/tel-00998924.
Hébrail, Georges. "Définition de résumés et incertitude dans les grandes bases de données." Paris 11, 1987. http://www.theses.fr/1987PA112223.
Two apparently different problems are addressed in this study: building summaries of a database and modelling the errors contained in a database. A model of database summaries is proposed: the summaries are physically stored in the database as redundant data and automatically updated when changes occur in the database. The cost of updating the summaries is shown to be low. It is then possible to extract synthetic information from the database with a response time that is independent of the size of the database. The multiple applications of summaries in a database are also presented: extraction of synthetic information, query optimization, data security, checking of integrity constraints, and distributed databases. A model for representing the errors contained in a database is then proposed. The model, based on a probabilistic approach, leads to a computation of the effect of errors on the results of database queries. The links between these two problems are pointed out: a single concept is used both for the definition of the summaries and for the representation of errors, and particular summaries are required to compute the error associated with a query. The study is independent of the data model (relational, network, hierarchical); its results are nevertheless applied to the relational model. The best area of application for the developed concepts is that of very large databases.
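The low update cost rests on aggregates being incrementally maintainable: an insertion or deletion touches a few counters, not the whole table. A minimal sketch of the idea, with an assumed grouping key:

```python
class IncrementalSummary:
    """Keeps count/sum per group; updates in O(1) per database change."""
    def __init__(self):
        self.stats = {}                      # group -> [count, total]

    def insert(self, group, value):
        c = self.stats.setdefault(group, [0, 0.0])
        c[0] += 1
        c[1] += value

    def delete(self, group, value):
        c = self.stats[group]
        c[0] -= 1
        c[1] -= value

    def mean(self, group):
        count, total = self.stats[group]
        return total / count if count else None

s = IncrementalSummary()
s.insert("paris", 10.0); s.insert("paris", 20.0); s.insert("nice", 5.0)
s.delete("paris", 10.0)
print(s.mean("paris"))   # 20.0, obtained without rescanning any base data
```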
Godbout, Mathieu. "Approches par bandit pour la génération automatique de résumés de textes." Master's thesis, Université Laval, 2021. http://hdl.handle.net/20.500.11794/69488.
This thesis discusses the use of bandit methods to solve the problem of training extractive summarization models. Extractive models, which build summaries by selecting sentences from an original document, are difficult to train because the target summary of a document is usually not built in an extractive way. For this reason, we propose to view the production of extractive summaries as different bandit problems, for which there exist algorithms that can be leveraged to train summarization models. In this work, BanditSum is first presented, an approach drawn from the literature that views the generation of summaries for a set of documents as a contextual bandit problem. Next, we introduce CombiSum, a new algorithm that formulates the generation of the summary of a single document as a combinatorial bandit. By exploiting the combinatorial formulation, CombiSum manages to incorporate the notion of the extractive potential of each sentence of a document into its training. Finally, we propose LinCombiSum, the linear variant of CombiSum, which exploits the similarities between sentences in a document and uses the linear combinatorial bandit formulation instead.
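To make the bandit view concrete, the sketch below trains an epsilon-greedy agent whose arms are sentences and whose reward is word overlap with a reference summary. It is a deliberately simplified stand-in for BanditSum and CombiSum, not a reproduction of them.

```python
import random

def reward(selected, reference_words):
    """Toy reward: recall of reference words by the selected sentences."""
    chosen = set(w for s in selected for w in s.split())
    return len(chosen & reference_words) / len(reference_words)

def epsilon_greedy_summarizer(sentences, reference_words, k=2,
                              epsilon=0.2, episodes=500, seed=1):
    rng = random.Random(seed)
    value = [0.0] * len(sentences)        # estimated value per sentence
    pulls = [0] * len(sentences)
    for _ in range(episodes):
        if rng.random() < epsilon:        # explore a random subset
            arms = rng.sample(range(len(sentences)), k)
        else:                             # exploit current estimates
            arms = sorted(range(len(sentences)),
                          key=lambda i: value[i], reverse=True)[:k]
        r = reward([sentences[i] for i in arms], reference_words)
        for i in arms:                    # incremental mean update
            pulls[i] += 1
            value[i] += (r - value[i]) / pulls[i]
    return sorted(range(len(sentences)),
                  key=lambda i: value[i], reverse=True)[:k]

docs = ["the cat sat on the mat", "stock markets fell sharply today",
        "central banks reacted to the fall", "the mat was red"]
ref = set("markets fell and central banks reacted".split())
print(epsilon_greedy_summarizer(docs, ref))   # indices of selected sentences
```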
Bordet, Geneviève. "Étude contrastive de résumés de thèse dans une perspective d'analyse de genre." Phd thesis, Université Paris-Diderot - Paris VII, 2011. http://tel.archives-ouvertes.fr/tel-00650637.
Naoum, Lamiaa. "Un modèle multidimensionnel pour un processus d'analyse en ligne de résumés flous." Phd thesis, Université de Nantes, 2006. http://tel.archives-ouvertes.fr/tel-00481046.
Zneika, Mussab. "Interrogation du web sémantique à l'aide de résumés de graphes de données." Thesis, Cergy-Pontoise, 2019. http://www.theses.fr/2019CERG1010.
The amount of available RDF data grows fast both in size and complexity, making RDF Knowledge Bases (KBs) with millions or even billions of triples commonplace; for example, more than 1000 datasets are now published as part of the Linked Open Data (LOD) cloud, which contains more than 62 billion RDF triples, forming big and complex RDF data graphs. This explosion in the size, complexity and number of available RDF KBs and the emergence of Linked Datasets have made querying, exploring, visualizing and understanding the data in these KBs difficult, both from a human perspective (when trying to visualize) and from a machine perspective (when trying to query or compute). To tackle this problem, we propose a method for summarizing large RDF KBs, based on representing the RDF graph using the (best) top-k approximate RDF graph patterns. The method, named SemSum+, extracts the meaningful/descriptive information from RDF KBs and produces a succinct overview of them. It extracts from the RDF graph an RDF schema that describes the actual contents of the KB, which has various advantages even over an existing schema, which might be only partially used by the data in the KB. While computing the approximate RDF graph patterns, we also add information on the number of instances each pattern represents. So, when querying the RDF summary graph, we can easily identify whether the necessary information is present, and whether it is present in numbers significant enough to be included in a federated query result. The proposed method does not require the initial schema of the KB and works equally well when there is no schema information at all (a realistic situation with modern KBs, which are constructed either ad hoc or by merging fragments of other existing KBs). It also works equally well with homogeneous RDF graphs (having the same structure) and heterogeneous ones (having different structures, possibly resulting from data described under different schemas/ontologies). Given that RDF graphs can be large and complex, methods that compute the summary by fitting the whole graph in the memory of a (however large) machine will not scale. To overcome this problem, we propose, as part of this thesis, a parallel framework that provides a scalable parallel version of our method, allowing us to compute the summaries of any RDF graph regardless of size; we generalize this framework so that it can be used by any approximate pattern mining algorithm that needs parallelization. Working on this problem also introduced us to the issue of measuring the quality of the produced summaries. Given that various algorithms in the literature can be used to summarize RDF graphs, we need to understand which one is better suited for a specific task or a specific RDF KB; however, the literature lacks widely accepted evaluation criteria and extensive empirical evaluations. We therefore provide a comprehensive quality framework for RDF graph summarization to cover this gap. The framework allows a better, deeper and more complete understanding of the quality of the different summaries and facilitates their comparison. It is independent of the way RDF summarization algorithms work and makes no assumptions on the type or structure of the input or of the final results. It provides a set of metrics that help us understand not only whether a summary is valid, but also how it compares to others in terms of the specified quality characteristics. The framework has the ability, experimentally validated, to capture subtle differences among summaries and to produce metrics that reflect them; it was used to provide an extensive experimental evaluation and comparison of our method.
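One way to picture pattern-based RDF summarization: treat the set of properties attached to each subject as a candidate pattern, count how many instances share it, and keep the top-k. This is a rough, hypothetical approximation of SemSum+'s much richer approximate-pattern mining:

```python
from collections import Counter

triples = [
    ("alice", "rdf:type", "Person"), ("alice", "worksFor", "acme"),
    ("bob", "rdf:type", "Person"), ("bob", "worksFor", "acme"),
    ("acme", "rdf:type", "Company"), ("acme", "locatedIn", "paris"),
]

def topk_property_patterns(triples, k=2):
    """Group subjects by their property set; report the k most frequent
    patterns together with the number of instances each one covers."""
    props = {}
    for s, p, o in triples:
        props.setdefault(s, set()).add(p)
    patterns = Counter(frozenset(ps) for ps in props.values())
    return patterns.most_common(k)

for pattern, count in topk_property_patterns(triples):
    print(sorted(pattern), "->", count, "instance(s)")
```

Keeping the instance counts alongside each pattern is what lets a query planner decide, from the summary alone, whether a source is worth contacting.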
Cicchetti, Rosine. "Contribution à la modélisation des résumés dans les bases de données statistiques." Nice, 1990. http://www.theses.fr/1990NICE4394.
Castillo, Reitz Maria. "Etude d'un système d'extraction et présentation de résumés pour les bases de données." Montpellier 2, 1994. http://www.theses.fr/1994MON20277.
Lopez, Guillen Karla Ivon. "Contributions aux résumés visuels des bases de données géographiques basés sur les chorèmes." Lyon, INSA, 2010. http://www.theses.fr/2010ISAL0055.
When dealing with complex situations, as in political, economic and demographic trends, the use of visual metaphors is a very effective way to help users discover relationships and new knowledge. Traditional cartography is an essential tool to describe facts and relations over a territory: geographic concepts are associated with graphic symbols that help readers get an immediate understanding of the data represented. From a geographic database it is common to extract multiple maps (cartographic renderings of all the data). This thesis is part of an international research project whose objective is to study an innovative mapping solution that can represent the existing situation as well as dynamics, movement and change, in order to extract synthetic visual summaries of geographic databases. The proposed solution is based on the concept of chorem, defined by Brunet as a schematic representation of a territory; a chorem provides an instant snapshot of relevant information and gives expert users an overview of objects and phenomena. First, we provide a formal definition and classification of chorems in terms of structure and meaning, to standardize both their construction and their use. Then a data mining phase is launched to extract the most significant patterns, which form the basis of the chorems. A system to generate chorematic maps from available data sets is then described, and an XML-based language, called ChorML, is specified, allowing communication between the modules of the system (data mining to extract chorems, visualization of chorems). Level 0 of the language corresponds to the content of the database in the GML standard; level 1 describes the extracted patterns and chorems; and level 2 is the visualization, in the SVG standard. In addition, the language integrates external information (e.g., names of seas and surrounding countries) and topological constraints to be met in the display.
Ndiaye, Marie. "Exploration de grands ensembles de motifs." Thesis, Tours, 2010. http://www.theses.fr/2010TOUR4029/document.
The abundance of patterns generated by knowledge extraction algorithms is a major problem in data mining. To facilitate the exploration of these patterns, two approaches are often used: the first is to summarize the sets of extracted patterns, and the second relies on the construction of visual representations of the patterns. However, the summaries are not structured, and they are proposed without an exploration method; furthermore, the visualizations do not provide an overview of the pattern sets. We define a generic framework that combines the advantages of both approaches. It allows building summaries of pattern sets at different levels of detail. These summaries provide an overview of the pattern sets and are structured in the form of cubes, on which OLAP navigational operators can be applied in order to explore the pattern sets. Moreover, we propose an algorithm that provides a summary of good quality whose size is below a given threshold. Finally, we instantiate our framework with association rules.
Ngom, Bassirou. "FreeCore : un système d'indexation de résumés de document sur une Table de Hachage Distribuée (DHT)." Thesis, Sorbonne université, 2018. http://www.theses.fr/2018SORUS180/document.
This thesis examines the problem of indexing and searching in Distributed Hash Tables (DHTs). It provides a distributed system for storing document summaries based on their content. Concretely, the thesis uses Bloom filters (BFs) to represent document summaries and proposes an efficient method for inserting and retrieving documents represented by BFs in an index distributed over a DHT. Content-based storage has a dual advantage: it groups similar documents together, and it finds and retrieves them more quickly by using Bloom filters for keyword searches. However, processing a keyword query represented by a Bloom filter is a difficult operation and requires a mechanism to locate the Bloom filters that represent the documents stored in the DHT. The thesis therefore proposes two Bloom filter index schemes distributed over a DHT. The first combines the principles of content-based indexing and inverted lists and addresses the large amount of data stored by content-based indexes: by using long Bloom filters, it stores documents on a large number of servers and indexes them using less space. The second index scheme efficiently supports superset query processing (keyword queries) using a prefix tree. This solution exploits the distribution of the data and proposes a configurable distribution function that indexes documents with a balanced binary tree, so that documents are distributed efficiently over the indexing servers. In addition, as a third solution, the thesis proposes an efficient method for locating documents containing a set of keywords; compared to solutions of the same category, it performs subset searches at a lower cost and can be considered a solid foundation for superset query processing in over-DHT index systems. Finally, the thesis proposes a prototype of a peer-to-peer system for indexing content and searching by keywords. This prototype, ready to be deployed in a real environment, was experimented with PeerSim, which made it possible to measure the theoretical performance of the algorithms developed throughout the thesis.
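The core building block is easy to reproduce: a Bloom filter stores a document's keyword set in a fixed-size bit array, and a keyword query is a candidate match when all of its bits are set (false positives are possible, false negatives are not). A minimal sketch, not the FreeCore implementation:

```python
import hashlib

class BloomFilter:
    def __init__(self, m=256, k=4):
        self.m, self.k = m, k
        self.bits = 0                        # bit array packed in an int

    def _positions(self, item):
        for i in range(self.k):              # k independent hash positions
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# A document summary = its keyword set; a query word matches if it may be present.
doc = BloomFilter()
for word in ["distributed", "hash", "table", "indexing"]:
    doc.add(word)
print(doc.might_contain("hash"), doc.might_contain("bloom"))  # True False (likely)
```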
Bernié, Jean-Paul. "Approche sémiologique et pragmatique du texte d'idées. Préalable linguistique à la didactique du résumé." Toulouse 2, 1991. http://www.theses.fr/1991TOU20029.
Raschia, Guillaume. "SaintEtiq : une approche floue pour la génération de résumés à partir de bases de données relationnelles." Nantes, 2001. http://www.theses.fr/2001NANT2099.
Chaar, Nouira Sana-Leila. "Filtrage d'information pour la construction de résumés multi-documents guidée par le profil utilisateur : le système REDUIT." Université de Marne-la-Vallée, 2004. http://www.theses.fr/2004MARN0223.
In this work, we present an information filtering method that selects from a set of documents the excerpts most significant with respect to a user profile. The method takes into account the topical heterogeneity of a user's information needs to produce a multi-document summary specific to their requirements. These information needs are represented by a user profile structured from a topical viewpoint: a profile is a set of terms grouped into topically homogeneous subsets, each subset representing a sub-topic of the profile. Summarization is based on extracting the segments most likely to match the profile. The first step is document filtering: the input documents are pre-processed, both to select and normalize their content words and to segment them into topically coherent segments, and the profile is matched against these topical segments. The result of this matching is used first to discard the documents with no relation to the profile and then to select the excerpts related to it. This selection step also relies on detecting the vocabulary of segments that is closely linked to the profile. When a global compatibility between the profile and a document is found, an additional topical analysis is performed to expand the vocabulary defining each sub-topic of the profile with those terms of the document that are linked to this sub-topic but not already present in the profile. This expansion is a way to select more reliably the excerpts linked to profiles, but also to select excerpts that may bring new and interesting information about their topics. The third step performs information fusion by detecting and removing redundancies among the selected segments, first among the segments of a single document and then among the segments coming from all the selected documents. Finally, the fourth step is turned toward users: the selected segments are ranked according to their significance, both from the viewpoint of the profile and from that of the documents, and they are pruned to limit the amount of text to read. This method is implemented in the REDUIT system, whose evaluation showed that taking into account the topical heterogeneity of profiles can improve the results at the different steps of building a multi-document summary.
Voglozin, W. Amenel. "Le résumé linguistique de données structurées comme support pour l'interrogation." Phd thesis, Université de Nantes, 2007. http://tel.archives-ouvertes.fr/tel-00481049.
Palvadeau, Sophie. "Approche contrastive de la rédaction scientifique - Les consignes éditoriales et les résumés de revues japonaises et françaises de chimie." Phd thesis, Ecole des Hautes Etudes en Sciences Sociales (EHESS), 2006. http://tel.archives-ouvertes.fr/tel-00816797.
Full textMotta, Jesus Antonio. "VENCE : un modèle performant d'extraction de résumés basé sur une approche d'apprentissage automatique renforcée par de la connaissance ontologique." Doctoral thesis, Université Laval, 2014. http://hdl.handle.net/20.500.11794/26076.
Several methods and techniques of artificial intelligence for information extraction, pattern recognition and data mining are used for summary extraction. In particular, new machine learning models enriched with ontological knowledge allow the extraction of the sentences containing the greatest amount of information from a corpus. The corpus is considered as a set of sentences on which different optimization methods are applied to identify the most important attributes. These provide a training set from which a machine learning algorithm can induce a classification function able to discriminate the sentences of a new corpus according to their information content. Currently, even though the results are interesting, the effectiveness of models based on this approach is still low, especially in the discriminating power of the classification functions. In this thesis, a new model based on this approach is proposed, and its effectiveness is improved by adding ontological knowledge to the training set. The originality of this model is described through three papers. The first paper shows how linear techniques can be applied in an original way to optimize the workspace in the context of extractive summarization. The second explains how to insert ontological knowledge to significantly improve the performance of the classification functions; this is done by inserting lexical chains built from ontological knowledge into the training set. The third describes VENCE, the new machine learning model that extracts the sentences with the highest information content in order to produce summaries. The performance of VENCE is assessed by comparing its results with those produced by current commercial and public software, as well as those published in very recent scientific articles. The usual metrics (recall, precision and F-measure) and the ROUGE toolkit showed the superiority of VENCE. This model could benefit other information extraction contexts, for instance the definition of models for sentiment analysis.
Veilex, Florence. "Approche expérimentale des processus humains de compréhension en vue d'une indexation automatique des résumés scientifiques : application à un corpus de géologie." Grenoble 2, 1985. http://www.theses.fr/1985GRE2A005.
Hayek, Rabab. "Techniques de localisation et de résumé des données dans les systèmes P2P." Phd thesis, Université de Nantes, 2009. http://tel.archives-ouvertes.fr/tel-00475913.
Full textArion, Andrei. "XML access modules : towards physical data independence in XML databases." Paris 11, 2007. http://www.theses.fr/2007PA112288.
The purpose of this thesis is to design a framework for achieving physical data independence in XML databases. We first propose XML Access Modules - a rich tree pattern language featuring multiple returned nodes, nesting, structural identifiers and optional nodes - and show how it can be used to uniformly describe a large set of XML storage schemes, indices and materialized views. The second part of this thesis focuses on the problem of XQuery rewriting using XML Access Modules. As a first step of our rewriting approach, we present an algorithm to extract XML Access Module patterns from XQuery, and we show that the patterns we identify are strictly larger than in previous works; in particular, they may span over nested XQuery blocks. We characterize the complexity of tree pattern containment (a key subproblem of rewriting) and of rewriting itself, under the constraints expressed by a structural summary, whose enhanced form also entails integrity constraints. We also show how to exploit the structural identifiers in the view definitions in order to enhance the rewriting opportunities.
Duclos, Cartolano Catherine. "Représentation de l'information pharmaco-thérapeutique des résumés des caractéristiques produit des médicaments : apport des méthodes de traitement automatique du langage naturel, développement, validation et utilisation de modèles." Paris 5, 2003. http://www.theses.fr/2003PA05CD01.
Laurent, Anne. "Bases de données multidimensionnelles floues et leur utilisation pour la fouille de données." Paris 6, 2002. http://www.theses.fr/2002PA066426.
Full textElisabeth, Erol. "Fouille de données spatio-temporelles, résumés de données et apprentissage automatique : application au système de recommandations touristique, données médicales et détection des transactions atypiques dans le domaine financier." Thesis, Antilles, 2021. http://www.theses.fr/2021ANTI0607.
Data mining is one of the components of Customer Relationship Management (CRM) widely deployed in companies. It is the process of extracting interesting, non-trivial, implicit, previously unknown and potentially useful knowledge from data, relying on algorithms from various scientific disciplines (statistics, artificial intelligence, databases) to build models from data stored in data warehouses. The models, built from clusters, serve to improve knowledge of the customer in the generic sense, to predict customer behavior and to optimize the proposed offer. Since these models are intended for users who are specialists in the data field, researchers in health economics and management sciences, or professionals in the sector studied, this research work emphasizes the usability of data mining environments. This thesis is concerned with spatio-temporal data mining; it highlights an original approach to data processing with the aim of enriching practical knowledge in the field. The work includes an application component in four chapters, corresponding to four systems developed: a model for setting up a recommendation system based on the collection of GPS positioning data; a data summary tool optimized for fast responses to queries of the French hospital information system programme (PMSI); a machine learning tool for fighting money laundering in the financial system; and a model for predicting activity in weather-dependent very small enterprises (tourism, transport, leisure, commerce, etc.). The problem here is to identify classification algorithms and neural networks for data analysis aimed at adapting the company's strategy to economic changes.
Goulet, Marie-Josée. "Analyse d'évaluations en résumé automatique : proposition d'une terminologie française, description des paramètres expérimentaux et recommandations." Thesis, Université Laval, 2008. http://www.theses.ulaval.ca/2008/25346/25346.pdf.
Full textOudni, Amal. "Fouille de données par extraction de motifs graduels : contextualisation et enrichissement." Thesis, Paris 6, 2014. http://www.theses.fr/2014PA066437/document.
This thesis belongs to the framework of knowledge extraction and data mining applied to numerical or fuzzy data in order to extract linguistic summaries in the form of gradual itemsets, which express correlations between attribute values, of the form "the more the temperature increases, the more the pressure increases". Our goal is to contextualize and enrich these gradual itemsets by proposing different types of additional information, so as to increase their quality and provide a better interpretation. We propose four new types of itemsets. First, reinforced gradual itemsets, in the case of fuzzy data, perform a contextualization by integrating additional attributes linguistically introduced by the expression "all the more", as in "the more the temperature decreases, the more the volume of air decreases, all the more its density increases"; reinforcement is interpreted as increased validity of the gradual itemset. We also study the extension of the concept of reinforcement to association rules, discussing possible interpretations and showing their limited contribution. We then propose to process the contradictory itemsets that arise, for example, when "the more the temperature increases, the more the humidity increases" and "the more the temperature increases, the more the humidity decreases" are extracted simultaneously. To manage these contradictions, we define a constrained variant of the gradual itemset support which depends not only on the considered itemset but also on its potential contradictors, and we propose two extraction methods: the first filters after all itemsets have been generated, the second integrates the filtering process within the generation step. We further introduce characterized gradual itemsets, defined by adding a clause linguistically introduced by the expression "especially if", as in "the more the temperature decreases, the more the humidity decreases, especially if the temperature varies in [0, 10] °C": the additional clause specifies value ranges on which the validity of the itemset is increased. We formalize the quality of this enrichment as a trade-off between two constraints imposed on the identified interval, namely high validity and large size, as well as an extension taking into account the data density, and we propose a method to automatically extract characterized gradual itemsets, based on appropriate mathematical morphology tools and the definition of a suitable filter and transcription.
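The support of a gradual itemset such as "the more X increases, the more Y increases" is commonly measured by the fraction of object pairs ordered concordantly on both attributes. A minimal sketch of that pair-based definition (one of several in the literature; the thesis's constrained variant refines it):

```python
from itertools import combinations

def gradual_support(data, x, y):
    """Fraction of pairs (i, j) with data[i][x] < data[j][x]
    and data[i][y] < data[j][y]: support of 'the more x, the more y'."""
    pairs = list(combinations(data, 2))
    concordant = sum(1 for a, b in pairs
                     if (a[x] < b[x] and a[y] < b[y])
                     or (b[x] < a[x] and b[y] < a[y]))
    return concordant / len(pairs)

records = [{"temp": 10, "pressure": 1.0}, {"temp": 15, "pressure": 1.2},
           {"temp": 20, "pressure": 1.3}, {"temp": 25, "pressure": 1.1}]
print(gradual_support(records, "temp", "pressure"))  # 4 of 6 pairs concordant
```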
Massé, Luc de. "Evaluation de la charge de travail des soignants et du handicap des patients au cours de la rééducation après pose de prothèse totale de hanche à partir de résumés informatisés." Montpellier 1, 1989. http://www.theses.fr/1989MON11079.
Vroonland, Joy Phelps. "The Evaluation of Academic Vitae in Low, Moderate, and High Paradigm Academic Disciplines." Thesis, University of North Texas, 1992. https://digital.library.unt.edu/ark:/67531/metadc278603/.
Lundgreen, Michael Scott. "A career preparation curriculum for [the] San Bernardino Employment Development Department." CSUSB ScholarWorks, 2002. https://scholarworks.lib.csusb.edu/etd-project/2121.
Bechchi, Mounir. "Clustering-based Approximate Answering of Query Result in Large and Distributed Databases." Phd thesis, Université de Nantes, 2009. http://tel.archives-ouvertes.fr/tel-00475917.
Full textOudni, Amal. "Fouille de données par extraction de motifs graduels : contextualisation et enrichissement." Electronic Thesis or Diss., Paris 6, 2014. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2014PA066437.pdf.
Full textThis thesis's works belongs to the framework of knowledge extraction and data mining applied to numerical or fuzzy data in order to extract linguistic summaries in the form of gradual itemsets: the latter express correlation between attribute values of the form « the more the temperature increases, the more the pressure increases ». Our goal is to contextualize and enrich these gradual itemsets by proposing different types of additional information so as to increase their quality and provide a better interpretation. We propose four types of new itemsets: first of all, reinforced gradual itemsets, in the case of fuzzy data, perform a contextualization by integrating additional attributes linguistically introduced by the expression « all the more ». They can be illustrated by the example « the more the temperature decreases, the more the volume of air decreases, all the more its density increases ». Reinforcement is interpreted as increased validity of the gradual itemset. In addition, we study the extension of the concept of reinforcement to association rules, discussing their possible interpretations and showing their limited contribution. We then propose to process the contradictory itemsets that arise for example in the case of simultaneous extraction of « the more the temperature increases, the more the humidity increases » and « the more the temperature increases, the less the humidity decreases ». To manage these contradictions, we define a constrained variant of the gradual itemset support, which, in particular, does not only depend on the considered itemset, but also on its potential contradictors. We also propose two extraction methods: the first one consists in filtering, after all itemsets have been generated, and the second one integrates the filtering process within the generation step. We introduce characterized gradual itemsets, defined by adding a clause linguistically introduced by the expression « especially if » that can be illustrated by a sentence such as « the more the temperature decreases, the more the humidity decreases, especially if the temperature varies in [0, 10] °C »: the additional clause precise value ranges on which the validity of the itemset is increased. We formalize the quality of this enrichment as a trade-off between two constraints imposed to identified interval, namely a high validity and a high size, as well as an extension taking into account the data density. We propose a method to automatically extract characterized gradual based on appropriate mathematical morphology tools and the definition of an appropriate filter and transcription
Farzindar, Atefeh. "Résumé automatique de textes juridiques." Paris 4, 2005. http://www.theses.fr/2005PA040032.
We have developed a summarization system, called LetSum, for producing short summaries of legal decisions, in collaboration with the lawyers of the Public Law Research Center of Université de Montréal. Our method is based on a manual analysis of judgments, comparing manually written summaries with source documents, and extracts the most important units by identifying the thematic structure of the document. The summary is produced in four steps: 1. Thematic segmentation detects the thematic structure of a judgment; we distinguish seven themes: Decision data (the complete reference of the decision and the relation between the parties), Introduction (who? did what? to whom?), Context (recomposes the story from the facts and events), Submission (presents the points of view of the parties), Issues (identifies the questions of law), Juridical Analysis (describes the analysis of the judge), and Conclusion (the final decision of the court). 2. Filtering identifies parts of the text that can be eliminated without losing information relevant for the summary, such as citations. 3. Selection builds a list of the best candidate units for each structural level of the summary. 4. Production chooses the units for the final summary and combines them in order to produce a summary of about 10% of the judgment. Evaluations of 120 summaries by 12 lawyers showed the quality of the summaries produced by LetSum, which were judged excellent.
Mnasri, Maali. "Résumé automatique multi-document dynamique." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS342/document.
This thesis focuses on automatic text summarization, and particularly on update summarization. This research problem aims to produce a differential summary of a set of new documents with regard to a set of old documents assumed to be known. It thus adds two issues to the task of generic automatic summarization: the temporal dimension of the information and the history of the user. In this context, the work presented here is based on an extractive approach using integer linear programming (ILP) and is organized around two main axes: detecting redundancy between the selected information and the user history, and maximizing the saliency of the selection. For the first axis, we were particularly interested in exploiting inter-sentence similarities to detect redundancies between the information in the new documents and that present in the already known ones, by defining a method of semantic clustering of sentences. For the second axis, we studied the impact of taking into account the discursive structure of documents, in the framework of Rhetorical Structure Theory (RST), to favor the selection of the information considered most important. The benefit of the methods thus defined was demonstrated in evaluations carried out on the data of the TAC and DUC campaigns. Finally, the integration of these semantic and discursive criteria through a delayed fusion mechanism proved the complementarity of the two axes and the benefit of their combination.
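The selection step can be pictured as a budgeted optimization: maximize the salience of the chosen sentences while penalizing overlap with the user's history. The thesis solves this with ILP; the sketch below substitutes a greedy heuristic so the example stays self-contained, with toy word-overlap scores.

```python
def overlap(a, b):
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(1, len(wa | wb))        # Jaccard similarity

def update_summary(candidates, history, budget=2, redundancy_threshold=0.5):
    """Greedy stand-in for the ILP: pick salient sentences that are not
    redundant with the history nor with already selected sentences."""
    scored = sorted(candidates, key=lambda s: len(set(s.split())), reverse=True)
    selected = []
    for sent in scored:
        known = history + selected
        if all(overlap(sent, k) < redundancy_threshold for k in known):
            selected.append(sent)
        if len(selected) == budget:
            break
    return selected

history = ["the storm hit the coast on monday"]
new_docs = ["the storm hit the coast on monday evening",
            "power outages affected thousands of homes",
            "authorities announced an evacuation plan for tuesday"]
print(update_summary(new_docs, history))   # only genuinely new information
```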
Idrissi, Najlae. "La navigation dans les bases d'images : prise en compte des attributs de texture." Phd thesis, Nantes, 2008. https://archive.bu.univ-nantes.fr/pollux/show/show?id=84546d6f-7efd-4662-ba95-e74b15907689.
This work contributes to the field of content-based image retrieval (CBIR), particularly texture-based retrieval. The main goal is to enable the user to navigate through a large image database without formulating a query in a specific language. To achieve this, we divided the work into two main parts. The first part involves the extraction of a texture model made of relevant attributes; we study two texture models, co-occurrence matrices and Tamura's attributes, and the selection and validation of the model features are based on several applications proposed in this thesis, after reducing the dimension of the representation space. Navigation is then achieved using Galois lattices with a simple HTML interface, passing through a phase that interprets the numerical texture model into a semantic model; the transcription from numerical values to semantics is treated as a problem of discretization of continuous attributes. Another problem occurs when the size of the image database increases: the performance of the navigation system deteriorates. To overcome this, we propose summarization techniques to create summaries that help users navigate through target collections instead of the whole database.
Bossard, Aurélien. "Contribution au résumé automatique multi-documents." Phd thesis, Université Paris-Nord - Paris XIII, 2010. http://tel.archives-ouvertes.fr/tel-00573567.
Csernel, Baptiste. "Résumé généraliste de flux de données." Paris, ENST, 2008. http://www.theses.fr/2008ENST0048.
This thesis deals with the creation and management of general-purpose summaries built from data streams. It is centered on the development of two algorithms, one designed to produce general-purpose summaries of a single data stream, and the other of three data streams sharing relational information. A data stream is defined as a real-time, continuous, ordered sequence of items; it is impossible to control the order in which items arrive, nor is it feasible to store a stream locally in its entirety. Such data streams appear in many applications, such as utility networks and IT, or in monitoring tasks, for instance in meteorology, geology or finance. The first step in this work is to define what a general-purpose data stream summary means. Its first property is that it should be suitable for a variety of data mining and querying tasks; the second is that it should be possible to build, from the main summary, a summary concerning only a selected portion of the stream encountered so far. The first algorithm designed, StreamSamp, is a general-purpose summary algorithm dealing with a single data stream and based on sampling. The second, CrossStream, deals with three data streams sharing relational information with one another (one relation stream linking two entity streams); it is based on micro-clusters, inspired by the CluStream algorithm designed by Aggarwal, combined with Bloom filters. Both algorithms were implemented and tested against various data sets to assess their performance in a number of situations.
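StreamSamp's sampling angle can be illustrated with the classic reservoir algorithm, which maintains a uniform fixed-size sample over an unbounded stream. This textbook sketch (Vitter's Algorithm R) stands in for the thesis's more elaborate multi-resolution scheme:

```python
import random

def reservoir_sample(stream, k, seed=7):
    """Keep a uniform random sample of size k from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for n, item in enumerate(stream):
        if n < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, n)        # replace with decreasing probability
            if j < k:
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(1_000_000), k=5))
```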
Chiky, Raja. "Résumé de flux de données distribués." Paris, ENST, 2009. https://pastel.hal.science/pastel-00005137.
In this thesis, we consider a distributed computing environment describing a collection of multiple remote sensors that feed a unique central server with numeric, uni-dimensional data streams (also called curves). The central server has limited memory but should be able to compute aggregated values over any subset of the stream sources on a large time horizon, including old and new data. Two approaches are studied to reduce the size of the data: (1) spatial sampling considers only a random sample of the sources observed at every instant; (2) temporal sampling considers all sources but samples the instants to be stored. In this thesis, we propose a new approach for temporally summarizing a set of distributed data streams: from the observation of what happens during a period t-1, we determine a data collection model to apply to the sensors for period t. The computation of aggregates involves statistical inference in the case of spatial sampling and interpolation in the case of temporal sampling. To the best of our knowledge, there is no method for estimating interpolation errors at each timestamp that takes into account curve features such as knowledge of the integral of the curve over the period. We propose two approaches: one uses the past of the data curve (naive approach), and the other uses a stochastic process for interpolation (stochastic approach).
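The temporal-sampling side can be made concrete: if only some instants of a curve are stored, intermediate values are interpolated and the period's aggregate estimated, for instance with the trapezoidal rule. A minimal sketch; the naive and stochastic error models of the thesis are not reproduced here:

```python
def interpolate(samples, t):
    """Linear interpolation of a curve known only at sampled instants."""
    (t0, v0) = max(s for s in samples if s[0] <= t)
    (t1, v1) = min(s for s in samples if s[0] >= t)
    if t0 == t1:
        return v0
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def estimated_integral(samples):
    """Trapezoidal estimate of the curve's integral over the period."""
    total = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        total += (v0 + v1) / 2 * (t1 - t0)
    return total

samples = [(0, 2.0), (10, 4.0), (20, 3.0), (30, 5.0)]   # stored instants
print(interpolate(samples, 15), estimated_integral(samples))
```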
Bahri, Maroua. "Improving IoT data stream analytics using summarization techniques." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT017.
With the evolution of technology, the use of smart Internet-of-Things (IoT) devices, sensors and social networks results in an overwhelming volume of IoT data streams, generated daily from several applications, that can be transformed into valuable information through machine learning tasks. In practice, multiple critical issues arise when extracting useful knowledge from these evolving data streams, chiefly that the stream needs to be handled and processed efficiently. In this context, this thesis aims to improve the performance (in terms of memory and time) of existing data mining algorithms on streams. We focus on the classification task in the streaming framework, which is challenging principally due to the high - and increasing - data dimensionality, in addition to the potentially infinite amount of data; both aspects make classification harder. The first part of the thesis surveys the current state of the art of classification and dimensionality reduction techniques as applied to the stream setting, providing an updated view of the most recent work in this vibrant area. In the second part, we detail our contributions to the field of classification in streams by developing novel approaches based on summarization techniques, aiming to reduce the computational resources required by existing classifiers with no - or minor - loss of classification accuracy. To handle high-dimensional data streams and make classifiers efficient, we incorporate an internal preprocessing step that incrementally reduces the dimensionality of the input data before feeding them to the learning stage. We present several approaches applied to several classification tasks: Naive Bayes enhanced with sketches and the hashing trick, k-NN using compressed sensing and UMAP, and their integration into ensemble methods.
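The hashing trick mentioned for the Naive Bayes variant maps an unbounded feature space onto a fixed-size vector, so per-class statistics stay bounded however the stream evolves. A minimal sketch of the idea, not the thesis's actual sketch-based classifier:

```python
import hashlib
import math

def hashed_features(tokens, dim=16):
    """Project arbitrarily many tokens onto a fixed-size count vector."""
    vec = [0] * dim
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1
    return vec

class HashedNB:
    """Tiny multinomial Naive-Bayes-style classifier over hashed features."""
    def __init__(self, dim=16):
        self.dim = dim
        self.counts = {}                      # class -> per-bucket counts

    def learn_one(self, tokens, label):
        """Streaming update: O(dim) memory per class, whatever the vocabulary."""
        c = self.counts.setdefault(label, [1] * self.dim)  # Laplace smoothing
        for i, v in enumerate(hashed_features(tokens, self.dim)):
            c[i] += v

    def predict_one(self, tokens):
        x = hashed_features(tokens, self.dim)
        def log_likelihood(label):
            c = self.counts[label]
            total = sum(c)
            return sum(xi * math.log(ci / total) for xi, ci in zip(x, c))
        return max(self.counts, key=log_likelihood)

nb = HashedNB()
nb.learn_one("cheap pills buy now".split(), "spam")
nb.learn_one("meeting agenda for monday".split(), "ham")
print(nb.predict_one("buy cheap pills".split()))   # expected: spam
```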