Academic literature on the topic 'Automatic multi-document summarization'

Create an accurate reference in APA, MLA, Chicago, Harvard, and other citation styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Automatic multi-document summarization.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Journal articles on the topic "Automatic multi-document summarization"

1

Kumar. "Automatic Multi Document Summarization Approaches." Journal of Computer Science 8, no. 1 (January 1, 2012): 133–40. http://dx.doi.org/10.3844/jcssp.2012.133.140.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Faguo Zhou. "Research on Chinese Multi-document Automatic Summarization Algorithms." International Journal of Advancements in Computing Technology 4, no. 23 (December 31, 2012): 43–49. http://dx.doi.org/10.4156/ijact.vol4.issue23.6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Diedrichsen, Elke. "Linguistic challenges in automatic summarization technology." Journal of Computer-Assisted Linguistic Research 1, no. 1 (June 26, 2017): 40. http://dx.doi.org/10.4995/jclr.2017.7787.

Full text
Abstract:
Automatic summarization is a field of Natural Language Processing that is increasingly used in industry today. The goal of the summarization process is to create a summary of one document or a multiplicity of documents that will retain the sense and the most important aspects while reducing the length considerably, to a size that may be user-defined. One differentiates between extraction-based and abstraction-based summarization. In an extraction-based system, the words and sentences are copied out of the original source without any modification. An abstraction-based summary can compress, fuse or paraphrase sections of the source document. As of today, most summarization systems are extractive. Automatic document summarization technology presents interesting challenges for Natural Language Processing. It works on the basis of coreference resolution, discourse analysis, named entity recognition (NER), information extraction (IE), natural language understanding, topic segmentation and recognition, word segmentation and part-of-speech tagging. This study will overview some current approaches to the implementation of auto summarization technology and discuss the state of the art of the most important NLP tasks involved in them. We will pay particular attention to current methods of sentence extraction and compression for single and multi-document summarization, as these applications are based on theories of syntax and discourse and their implementation therefore requires a solid background in linguistics. Summarization technologies are also used for image collection summarization and video summarization, but the scope of this paper will be limited to document summarization.
APA, Harvard, Vancouver, ISO, and other styles
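The abstract's distinction between extraction-based and abstraction-based systems is easy to make concrete. Below is a minimal, illustrative sketch of an extractive summarizer (the stop-word list and scoring are toy choices, not taken from any paper listed here): it copies sentences out of the source unmodified, which is exactly the extractive behavior described above.

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    """Score each sentence by the average document frequency of its
    content words and return the top-n sentences in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z]+", text.lower())
    stop = {"the", "a", "an", "of", "in", "on", "is", "are", "and", "to"}
    freq = Counter(w for w in words if w not in stop)

    def score(sentence):
        toks = [w for w in re.findall(r"[a-z]+", sentence.lower())
                if w not in stop]
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Re-emit in original document order, verbatim (extractive, not abstractive).
    return [s for s in sentences if s in top]
```

An abstractive system would instead compress or paraphrase these sentences; as the abstract notes, most deployed systems remain extractive.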
4

Dief, Nada A., Ali E. Al-Desouky, Amr Aly Eldin, and Asmaa M. El-Said. "An Adaptive Semantic Descriptive Model for Multi-Document Representation to Enhance Generic Summarization." International Journal of Software Engineering and Knowledge Engineering 27, no. 01 (February 2017): 23–48. http://dx.doi.org/10.1142/s0218194017500024.

Full text
Abstract:
Due to the increasing accessibility of online data and the availability of thousands of documents on the Internet, it becomes very difficult for a human to review and analyze each document manually. The sheer size of such documents and data presents a significant challenge for users. Providing automatic summaries of specific topics helps users to overcome this problem. Most current extractive multi-document summarization systems can successfully extract summary sentences; however, many limitations exist, including redundancy, inaccurate extraction of important sentences, low coverage, and poor coherence among the selected sentences. This paper introduces an adaptive extractive multi-document generic (EMDG) methodology for automatic text summarization. The framework relies on a novel sentence similarity measure, a discriminative sentence selection method for sentence scoring, and a reordering technique for the extracted sentences after redundant ones are removed. Extensive experiments on the summarization benchmark datasets DUC2005, DUC2006 and DUC2007, validated with ROUGE, show that the proposed EMDG methodology is more effective than current extractive multi-document summarization systems and outperforms the baseline techniques, with generated summaries characterized by high coverage and cohesion.
APA, Harvard, Vancouver, ISO, and other styles
5

Rahamat Basha, S., J. Keziya Rani, and J. J. C. Prasad Yadav. "A Novel Summarization-based Approach for Feature Reduction Enhancing Text Classification Accuracy." Engineering, Technology & Applied Science Research 9, no. 6 (December 1, 2019): 5001–5. http://dx.doi.org/10.48084/etasr.3173.

Full text
Abstract:
Automatic summarization is the process of shortening one (in single document summarization) or multiple documents (in multi-document summarization). In this paper, a new feature selection method for the nearest neighbor classifier by summarizing the original training documents based on sentence importance measure is proposed. Our approach for single document summarization uses two measures for sentence similarity: the frequency of the terms in one sentence and the similarity of that sentence to other sentences. All sentences were ranked accordingly and the sentences with top ranks (with a threshold constraint) were selected for summarization. The summary of every document in the corpus is taken into a new document used for the summarization evaluation process.
APA, Harvard, Vancouver, ISO, and other styles
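The two sentence measures described in the entry above, term frequency within the document and similarity of a sentence to the other sentences, can be sketched as follows. This is a simplified illustration under our own tokenization and scoring choices, not the authors' implementation:

```python
import re
from collections import Counter

def rank_sentences(text, top_k=2):
    """Rank sentences by two signals: average document frequency of the
    sentence's terms, plus its mean cosine overlap with the other
    sentences. The top-k sentences are returned in original order."""
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sents]
    total = Counter()
    for bag in bags:
        total.update(bag)

    def cosine(a, b):
        num = sum(a[w] * b[w] for w in set(a) & set(b))
        den = (sum(v * v for v in a.values()) ** 0.5) * \
              (sum(v * v for v in b.values()) ** 0.5)
        return num / den if den else 0.0

    scores = []
    for i, bag in enumerate(bags):
        term_freq = sum(total[w] for w in bag) / max(len(bag), 1)
        sim = sum(cosine(bag, other) for j, other in enumerate(bags) if j != i)
        scores.append(term_freq + sim / max(len(bags) - 1, 1))

    ranked = sorted(range(len(sents)), key=lambda i: scores[i], reverse=True)
    return [sents[i] for i in sorted(ranked[:top_k])]
```

A threshold on the combined score, as in the paper, would replace the fixed `top_k` cutoff used here.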
6

Kongara, Srinivasa Rao, Dasika Sree Rama Chandra Murthy, and Gangadhara Rao Kancherla. "An Automatic Text Summarization Method with the Concern of Covering Complete Formation." Recent Advances in Computer Science and Communications 13, no. 5 (November 5, 2020): 977–86. http://dx.doi.org/10.2174/2213275912666190716105347.

Full text
Abstract:
Background: Text summarization is the process of generating a short description of an entire document that would otherwise be too long to read in full. It provides a convenient way of extracting the most useful information alongside a short summary of the document. Existing research addressed this with the Fuzzy Rule-based Automated Summarization Method (FRASM), which has limitations that restrict its applicability to real-world applications: it is suitable only for single-document summarization, whereas many applications, such as research industries, need to summarize information from multiple documents. Methods: This paper proposes the Multi-document Automated Summarization Method (MDASM), a framework that produces an accurate summary from multiple documents, in contrast to the single-document summarization of the existing system. Initially, document clustering is performed using a modified k-means algorithm to group documents with similar meaning, identified by frequent-term measurement. After clustering, pre-processing is performed with hybrid TF-IDF and singular value decomposition to eliminate irrelevant content and retain the required content. Sentence measurement is then done by introducing an additional metric, title measurement, alongside the existing metrics, to retrieve the most similar sentences more accurately. Finally, a fuzzy rule system is applied to perform text summarization. Results: The evaluation was conducted in the MATLAB simulation environment and shows that the proposed method yields more accurate summaries than the existing method. 
MDASM achieves 89.28% accuracy, 89.28% precision, 89.36% recall, and a 70% F-measure, outperforming FRASM. Conclusion: The summarization process carried out in this work provides an accurate summarized outcome.
APA, Harvard, Vancouver, ISO, and other styles
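The pre-processing step above builds on TF-IDF weighting. The hybrid TF-IDF and SVD variants used in MDASM are not reproduced here; the sketch below shows only the standard TF-IDF formula (tf × log(N/df)) that such variants extend:

```python
import math
import re
from collections import Counter

def tfidf(docs):
    """Per-document TF-IDF weights: tf(w, d) * log(N / df(w)), where df
    counts how many documents contain the word."""
    N = len(docs)
    tokenised = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    df = Counter()
    for toks in tokenised:
        df.update(set(toks))  # document frequency: one count per document
    weights = []
    for toks in tokenised:
        tf = Counter(toks)
        weights.append({w: (tf[w] / len(toks)) * math.log(N / df[w])
                        for w in tf})
    return weights
```

Words occurring in every document get weight 0, which is how TF-IDF filters the "irrelevant content" the abstract mentions.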
7

Chali, Yllias, and Sadid A. Hasan. "Query-focused multi-document summarization: automatic data annotations and supervised learning approaches." Natural Language Engineering 18, no. 1 (April 7, 2011): 109–45. http://dx.doi.org/10.1017/s1351324911000167.

Full text
Abstract:
In this paper, we apply different supervised learning techniques to build query-focused multi-document summarization systems, where the task is to produce automatic summaries in response to a given query or specific information request stated by the user. A huge amount of labeled data is a prerequisite for supervised training. It is expensive and time-consuming when humans perform the labeling task manually. Automatic labeling can be a good remedy to this problem. We employ five different automatic annotation techniques to build extracts from human abstracts using ROUGE, Basic Element overlap, syntactic similarity measure, semantic similarity measure, and Extended String Subsequence Kernel. The supervised methods we use are Support Vector Machines, Conditional Random Fields, Hidden Markov Models, Maximum Entropy, and two ensemble-based approaches. During different experiments, we analyze the impact of automatic labeling methods on the performance of the applied supervised methods. To our knowledge, no other study has deeply investigated and compared the effects of using different automatic annotation techniques on different supervised learning approaches in the domain of query-focused multi-document summarization.
APA, Harvard, Vancouver, ISO, and other styles
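The core idea of the annotation step above, turning human abstracts into sentence-level labels for supervised training, can be illustrated with a crude unigram-overlap stand-in for the ROUGE-based technique the paper actually uses (the threshold and tokenization here are arbitrary choices):

```python
import re

def label_sentences(doc_sentences, abstract, threshold=0.5):
    """Label each sentence 1 (summary-worthy) or 0 by the fraction of its
    words that also occur in a human abstract -- a rough unigram stand-in
    for ROUGE-based automatic annotation."""
    abstract_words = set(re.findall(r"[a-z]+", abstract.lower()))
    labels = []
    for sentence in doc_sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        overlap = len(words & abstract_words) / len(words) if words else 0.0
        labels.append(1 if overlap >= threshold else 0)
    return labels
```

The resulting (sentence, label) pairs would then feed a supervised learner such as an SVM or CRF, as in the paper.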
8

Manju, K., S. David Peter, and Sumam Idicula. "A Framework for Generating Extractive Summary from Multiple Malayalam Documents." Information 12, no. 1 (January 18, 2021): 41. http://dx.doi.org/10.3390/info12010041.

Full text
Abstract:
Automatic extractive text summarization retrieves a subset of data that represents most notable sentences in the entire document. In the era of digital explosion, which is mostly unstructured textual data, there is a demand for users to understand the huge amount of text in a short time; this demands the need for an automatic text summarizer. From summaries, the users get the idea of the entire content of the document and can decide whether to read the entire document or not. This work mainly focuses on generating a summary from multiple news documents. In this case, the summary helps to reduce the redundant news from the different newspapers. A multi-document summary is more challenging than a single-document summary since it has to solve the problem of overlapping information among sentences from different documents. Extractive text summarization yields the sensitive part of the document by neglecting the irrelevant and redundant sentences. In this paper, we propose a framework for extracting a summary from multiple documents in the Malayalam Language. Also, since the multi-document summarization data set is sparse, methods based on deep learning are difficult to apply. The proposed work discusses the performance of existing standard algorithms in multi-document summarization of the Malayalam Language. We propose a sentence extraction algorithm that selects the top ranked sentences with maximum diversity. The system is found to perform well in terms of precision, recall, and F-measure on multiple input documents.
APA, Harvard, Vancouver, ISO, and other styles
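The "top ranked sentences with maximum diversity" selection described above is in the spirit of maximal marginal relevance (MMR). The sketch below is a generic MMR implementation, not the authors' algorithm; `lam` is an assumed trade-off parameter between relevance and redundancy:

```python
import re
from collections import Counter

def mmr_select(sentences, k=2, lam=0.5):
    """Greedy maximal-marginal-relevance selection: each step picks the
    sentence most similar to the collection centroid and least similar
    to the sentences already chosen (lam trades the two off)."""
    bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    centroid = Counter()
    for bag in bags:
        centroid.update(bag)

    def cosine(a, b):
        num = sum(a[w] * b[w] for w in set(a) & set(b))
        den = (sum(v * v for v in a.values()) ** 0.5) * \
              (sum(v * v for v in b.values()) ** 0.5)
        return num / den if den else 0.0

    chosen, candidates = [], list(range(len(sentences)))
    while candidates and len(chosen) < k:
        def mmr(i):
            relevance = cosine(bags[i], centroid)
            redundancy = max((cosine(bags[i], bags[j]) for j in chosen),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        chosen.append(best)
        candidates.remove(best)
    return [sentences[i] for i in chosen]
```

The redundancy penalty is what suppresses overlapping information across source documents, the central multi-document problem the abstract identifies.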
9

Fejer, Hamzah Noori, and Nazlia Omar. "Automatic Multi-Document Arabic Text Summarization Using Clustering and Keyphrase Extraction." Journal of Artificial Intelligence 8, no. 1 (December 15, 2014): 1–9. http://dx.doi.org/10.3923/jai.2015.1.9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Park, Sun, ByungRea Cha, and DongUn An. "Automatic Multi-document Summarization Based on Clustering and Nonnegative Matrix Factorization." IETE Technical Review 27, no. 2 (2010): 167. http://dx.doi.org/10.4103/0256-4602.60169.

Full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "Automatic multi-document summarization"

1

Ou, Shiyan, Christopher S. G. Khoo, and Dion H. Goh. "Automatic multi-document summarization for digital libraries." School of Communication & Information, Nanyang Technological University, 2006. http://hdl.handle.net/10150/106042.

Full text
Abstract:
With the rapid growth of the World Wide Web and online information services, more and more information is available and accessible online. Automatic summarization is an indispensable solution to reduce the information overload problem. Multi-document summarization is useful to provide an overview of a topic and allow users to zoom in for more details on aspects of interest. This paper reports three types of multi-document summaries generated for a set of research abstracts, using different summarization approaches: a sentence-based summary generated by a MEAD summarization system that extracts important sentences using various features, another sentence-based summary generated by extracting research objective sentences, and a variable-based summary focusing on research concepts and relationships. A user evaluation was carried out to compare the three types of summaries. The evaluation results indicated that the majority of users (70%) preferred the variable-based summary, while 55% of the users preferred the research objective summary, and only 25% preferred the MEAD summary.
APA, Harvard, Vancouver, ISO, and other styles
2

Kipp, Darren. "Shallow semantics for topic-oriented multi-document automatic text summarization." Thesis, University of Ottawa (Canada), 2008. http://hdl.handle.net/10393/27772.

Full text
Abstract:
There are presently a number of NLP tools available which can provide semantic information about a sentence. Connexor Machinese Semantics is one of the most elaborate such tools in terms of the information it provides. It has been hypothesized that semantic analysis of sentences is required in order to make significant improvements in automatic summarization. Elaborate semantic analysis is still not particularly feasible. In this thesis, I will look at which shallow semantic features available from an off-the-shelf semantic analysis tool might improve the responsiveness of a summary. The aim of this work is to use the information made available as an intermediary approach to improving the responsiveness of summaries. While this approach is not likely to perform as well as full semantic analysis, it is considerably easier to achieve and could provide an important stepping stone in the direction of deeper semantic analysis. As a significant portion of this task, we develop mechanisms in various programming languages to view, process, and extract relevant information and features from the data.
APA, Harvard, Vancouver, ISO, and other styles
3

Camargo, Renata Tironi de. "Investigação de estratégias de sumarização humana multidocumento." Universidade Federal de São Carlos, 2013. https://repositorio.ufscar.br/handle/ufscar/5781.

Full text
Abstract:
Universidade Federal de Minas Gerais
Multi-document human summarization (MHS), i.e. the manual production of a summary from a collection of texts from different sources on the same subject, is a little-explored linguistic task. Since single-document summaries comprise information with recurrent features that can reveal summarization strategies, we investigated multi-document summaries in order to identify MHS strategies. To that end, the source-text sentences of the CSTNews corpus (CARDOSO et al., 2011), which contains 50 clusters of news texts and their multi-document summaries in Portuguese, were manually aligned to their human summaries. The alignment thus revealed the origin of the information selected to compose the summaries. To determine whether the selected information shows recurrent features, the aligned (and non-aligned) sentences were semi-automatically characterized by a set of linguistic attributes identified in related work. These attributes capture the content-selection strategies of single-document summarization and the available clues about MHS. Through manual analysis of the characterizations of the aligned and non-aligned sentences, we identified that the selected sentences commonly share certain attributes, such as sentence location in the text and redundancy. This observation was confirmed by a set of formal rules learned from the same characterizations by a Machine Learning (ML) algorithm; these rules thus encode MHS strategies. When learned and tested on CSTNews, the rules reached a precision of 71.25%. To assess the relevance of the rules, we performed 2 kinds of intrinsic evaluation: (i) verification of the occurrence of the same strategies in another corpus, and (ii) comparison of the quality of summaries produced by the MHS strategies with the quality of summaries produced by different strategies. 
Regarding evaluation (i), performed automatically by ML, the rules learned from CSTNews were tested on a different newspaper corpus and reached a precision of 70%, very close to that obtained on the training corpus (CSTNews). Regarding evaluation (ii), the quality, manually assessed by 10 computational linguists, was considered better than that of the other summaries. Besides describing features of multi-document summaries, this work has the potential to support multi-document automatic summarization, helping it become more linguistically motivated. That task consists of automatically generating multi-document summaries and has so far been based on adapting strategies identified in single-document summarization, or on unconfirmed clues about MHS. Based on this work, the automatic content-selection step of multi-document summarization methods may instead rely on strategies systematically identified in MHS.
APA, Harvard, Vancouver, ISO, and other styles
4

Zacarias, Andressa Caroline Inácio. "Investigação de métodos de sumarização automática multidocumento baseados em hierarquias conceituais." Universidade Federal de São Carlos, 2016. https://repositorio.ufscar.br/handle/ufscar/7974.

Full text
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Automatic Multi-Document Summarization (MDS) aims at creating a single, coherent and cohesive summary from a collection of texts from different sources on the same topic. Creating these summaries, in general extracts (informative and generic), requires selecting the most important sentences from the collection. For this, one may use superficial (statistical) linguistic knowledge or deep knowledge. Deep methods, although more expensive and less robust, produce more informative extracts with higher linguistic quality. For Portuguese, the only deep methods that use lexical-conceptual knowledge select content based on the frequency of occurrence of concepts in the collection. Given the potential of semantic-conceptual knowledge, we propose to investigate MDS methods that start by representing the lexical concepts of the source texts in a hierarchy and then exploit certain hierarchical properties capable of distinguishing the most relevant concepts (i.e., the topics of a collection of texts) from the others. Specifically, 3 of the 50 collections of CSTNews (the multi-document reference corpus of Portuguese) were selected, and the nouns occurring in the source texts of each collection were manually indexed to the concepts of the Princeton WordNet (WN.Pr), yielding a hierarchy with the concepts derived from the collection plus concepts inherited from WN.Pr for its construction. The hierarchy concepts were characterized by 5 graph (relevance) metrics potentially useful for identifying the concepts that should compose a summary: Centrality, Simple Frequency, Cumulative Frequency, Closeness and Level. This characterization was analyzed manually and by machine learning (ML) algorithms to verify which measures are most suitable for identifying the relevant concepts of the collection. 
As a result, the Centrality measure was discarded and the others were used to propose content-selection methods for MDS. Specifically, 2 sentence selection methods were proposed, which constitute the extractive methods: (i) CFSumm, whose content selection is based exclusively on the Simple Frequency metric, and (ii) LCHSumm, whose selection is based on rules learned by ML algorithms using all 4 relevant measures as attributes. These methods were intrinsically evaluated for informativeness, using the ROUGE package of measures, and for linguistic quality, based on the criteria of the TAC conference, using the 6 human abstracts available for each CSTNews collection. Furthermore, the summaries generated by the proposed methods were compared to the extracts generated by the GistSumm summarizer, taken as the baseline. Both methods obtained satisfactory results compared to the GistSumm baseline, and CFSumm outperformed LCHSumm.
FAPESP 2014/12817-4
APA, Harvard, Vancouver, ISO, and other styles
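Three of the hierarchy metrics named in the abstract above (Simple Frequency, Cumulative Frequency and Level) have straightforward definitions on a concept tree; the sketch below illustrates them on a toy hierarchy. Centrality and Closeness, which need the full graph structure, are omitted, and the data layout is our own assumption:

```python
def hierarchy_metrics(children, counts, root):
    """For a concept hierarchy (parent -> children), compute three of the
    measures named above: Simple Frequency (the concept's own occurrence
    count), Cumulative Frequency (own count plus all descendants') and
    Level (depth below the root)."""
    metrics = {}

    def walk(node, depth):
        cumulative = counts.get(node, 0)
        for child in children.get(node, []):
            cumulative += walk(child, depth + 1)
        metrics[node] = {"simple": counts.get(node, 0),
                         "cumulative": cumulative,
                         "level": depth}
        return cumulative

    walk(root, 0)
    return metrics
```

A high cumulative frequency at a shallow level flags a concept as a likely collection topic, which is the intuition behind the CFSumm selection method.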
5

Tosta, Fabricio Elder da Silva. "Aplicação de conhecimento léxico-conceitual na sumarização multidocumento multilíngue." Universidade Federal de São Carlos, 2014. https://repositorio.ufscar.br/handle/ufscar/5796.

Full text
Abstract:
Financiadora de Estudos e Projetos
Traditionally, Multilingual Multi-document Automatic Summarization (MMAS) is a computational application that, from a single collection of source-texts on the same subject/topic in at least two languages, produces an informative and generic summary (extract) in one of these languages. The simplest methods automatically translate the source-texts and, from the resulting monolingual collection, apply content-selection strategies based on shallow and/or deep linguistic knowledge. MMAS applications therefore need to identify the main information of the collection while avoiding redundancy, but also to handle the problems caused by machine translation (MT) of the full source-texts. Looking for alternatives to this traditional scenario, we investigated two methods (Methods 1 and 2) which, being based on deep linguistic knowledge at the lexical-conceptual level, avoid full MT of the source-texts while generating informative and cohesive/coherent summaries. In these methods, content selection starts with scoring and ranking the original sentences by the frequency of occurrence in the collection of the concepts expressed by their common nouns. In Method 1, only the best-scored, non-redundant sentences in the user's language are selected for the extract, until the compression rate is reached. In Method 2, the best-ranked, non-redundant original sentences are selected without privileging the user's language; sentences not in the user's language are automatically translated. To produce and evaluate the automatic summaries of Methods 1 and 2, the CM2News corpus was built: 20 collections of news texts, each with 1 original text in English and 1 original text in Portuguese on the same topic. 
The common nouns of CM2News were identified through morphosyntactic annotation, and the corpus was then semi-automatically annotated with Princeton WordNet concepts using the Mulsen graphical editor, developed especially for this task. For Method 1, only the best-ranked sentences in Portuguese were selected until the compression rate was reached. For Method 2, the best-ranked sentences were selected regardless of the user's language; selected English sentences were automatically translated into Portuguese by the Bing translator. Methods 1 and 2 were evaluated intrinsically for linguistic quality and informativeness. For linguistic quality, 15 computational linguists manually analyzed the grammaticality, non-redundancy, referential clarity, focus and structure/coherence of the summaries; for informativeness, the summaries were automatically compared to reference summaries using ROUGE measures. In both evaluations, Method 1 performed better, which may be explained by its selecting sentences from a single source text. Both lexical-conceptual methods also outperformed simpler MMAS methods that adopt full MT of the source-texts. Finally, besides the promising results on applying lexical-conceptual knowledge, this work produced important resources and tools for MMAS, such as the CM2News corpus and the Mulsen editor.
APA, Harvard, Vancouver, ISO, and other styles
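The scoring-and-selection procedure described in the abstract can be sketched as follows. This is a simplified illustration, not the thesis implementation: plain word overlap stands in for the WordNet concept annotation, and the greedy loop and 50% redundancy threshold are assumptions.

```python
from collections import Counter

def rank_sentences(sentences, compression_rate=0.3):
    """Score each sentence by the collection-wide frequency of its
    'concepts' (approximated here by lowercased words), then greedily
    select top-ranked, non-redundant sentences until the compression
    budget is reached -- a Method 1/2-style sketch."""
    # Collection-wide concept frequencies (word types stand in for
    # the WordNet concepts used in the thesis).
    concept_freq = Counter(w for s in sentences for w in set(s.lower().split()))
    scored = sorted(sentences,
                    key=lambda s: sum(concept_freq[w] for w in set(s.lower().split())),
                    reverse=True)
    budget = int(compression_rate * sum(len(s.split()) for s in sentences))
    summary, seen = [], set()
    for s in scored:
        words = set(s.lower().split())
        # Skip near-redundant sentences (>50% word overlap with the summary).
        if seen and len(words & seen) / len(words) > 0.5:
            continue
        # Skip sentences that would exceed the length budget.
        if sum(len(x.split()) for x in summary) + len(s.split()) > budget:
            continue
        summary.append(s)
        seen |= words
    return summary
```

Method 1 would apply this only to sentences in the user's language; Method 2 would run it over the whole bilingual collection and translate any selected foreign-language sentence afterwards.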

Books on the topic "Automatic multi-document summarization"

1

Hovy, Eduard. Text Summarization. Edited by Ruslan Mitkov. Oxford University Press, 2012. http://dx.doi.org/10.1093/oxfordhb/9780199276349.013.0032.

Full text
Abstract:
This article describes research and development on the automated creation of summaries of one or more texts. It defines the concept of a summary and presents an overview of the principal approaches in summarization. It describes the design, implementation, and performance of various summarization systems. The stages of automated text summarization are topic identification, interpretation, and summary generation, each with its own sub-stages. Due to the challenges involved, multi-document summarization is much less developed than single-document summarization. The article reviews particular techniques used in several summarization systems and, finally, assesses methods of evaluating summaries, covering evaluation strategies from previous evaluation studies to the two basic-measures method. Because summaries are highly task- and genre-specific, no single measure covers all cases of evaluation.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Automatic multi-document summarization"

1

Torres-Moreno, Juan-Manuel. "Guided Multi-Document Summarization." In Automatic Text Summarization, 109–50. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2014. http://dx.doi.org/10.1002/9781119004752.ch4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Ravindra, G., N. Balakrishnan, and K. R. Ramakrishnan. "Multi-document Automatic Text Summarization Using Entropy Estimates." In SOFSEM 2004: Theory and Practice of Computer Science, 289–300. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-24618-3_25.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Canhasi, Ercan, and Igor Kononenko. "Automatic Extractive Multi-document Summarization Based on Archetypal Analysis." In Signals and Communication Technology, 75–88. Berlin, Heidelberg: Springer Berlin Heidelberg, 2015. http://dx.doi.org/10.1007/978-3-662-48331-2_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Xu, Yong-Dong, Fang Xu, Guang-Ri Quan, and Ya-Dong Wang. "Multi-Document Automatic Summarization Based on the Hierarchical Topics." In Lecture Notes in Electrical Engineering, 323–29. New York, NY: Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-2185-6_40.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Roul, Rajendra Kumar, Samarth Mehrotra, Yash Pungaliya, and Jajati Keshari Sahoo. "A New Automatic Multi-document Text Summarization using Topic Modeling." In Distributed Computing and Internet Technology, 212–21. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-05366-6_17.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Yin, Wenpeng, Yulong Pei, and Lian’en Huang. "Automatic Multi-document Summarization Based on New Sentence Similarity Measures." In Lecture Notes in Computer Science, 832–37. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-32695-0_81.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

He, Qi, Hong-Wei Hao, and Xu-Cheng Yin. "Query-Based Automatic Multi-document Summarization Extraction Method for Web Pages." In Advances in Intelligent and Soft Computing, 107–12. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-28314-7_15.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Debnath, Dipanwita, and Ranjita Das. "Automatic Citation Contextualization Based Scientific Document Summarization Using Multi-objective Differential Evolution." In Advanced Techniques for IoT Applications, 289–301. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-4435-1_28.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Abazari Kia, Mahsa. "Automated Multi-document Text Summarization from Heterogeneous Data Sources." In Lecture Notes in Computer Science, 667–71. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-72240-1_78.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Mirchev, Uri, and Mark Last. "Multi-Document Summarization by Extended Graph Text Representation and Importance Refinement." In Advances in Data Mining and Database Management, 28–53. IGI Global, 2014. http://dx.doi.org/10.4018/978-1-4666-5019-0.ch002.

Full text
Abstract:
Automatic multi-document summarization is aimed at recognizing important text content in a collection of topic-related documents and representing it in the form of a short abstract or extract. This chapter presents a novel approach to the multi-document summarization problem, focusing on the generic summarization task. The proposed SentRel (Sentence Relations) multi-document summarization algorithm assigns importance scores to documents and sentences in a collection based on two aspects: static and dynamic. In the static aspect, the significance score is recursively inferred from a novel, tripartite graph representation of the text corpus. In the dynamic aspect, the significance score is continuously refined with respect to the current summary content. The resulting summary is generated in the form of complete sentences exactly as they appear in the summarized documents, ensuring the summary's grammatical correctness. The proposed algorithm is evaluated on the TAC 2011 dataset using DUC 2001 for training and DUC 2004 for parameter tuning. The SentRel ROUGE-1 and ROUGE-2 scores are comparable to state-of-the-art summarization systems, which require a different set of textual entities.
APA, Harvard, Vancouver, ISO, and other styles
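Several entries above report ROUGE-1 and ROUGE-2 scores. A minimal sketch of the underlying measure, recall-oriented n-gram overlap (ROUGE-N), can clarify what those numbers mean; this simplified version uses whitespace tokenization and a single reference, unlike the full ROUGE toolkit:

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Recall-oriented n-gram overlap between a candidate summary
    and a reference summary (simplified ROUGE-N)."""
    def ngrams(text, n):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Clipped overlap: each reference n-gram can be matched at most
    # as many times as it occurs in the reference.
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(1, sum(ref.values()))
```

For example, `rouge_n("the cat sat", "the cat sat on the mat", 1)` yields 0.5, since three of the six reference unigrams are recalled.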

Conference papers on the topic "Automatic multi-document summarization"

1

Yong-Dong, Xu, Quan Guang-Ri, Zhang Ting-Bin, and Wang Ya-Dong. "Used Hierarchical Topic to Generate Multi-document Automatic Summarization." In 2011 International Conference on Intelligent Computation Technology and Automation (ICICTA). IEEE, 2011. http://dx.doi.org/10.1109/icicta.2011.84.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Yapinus, Glorian, Alva Erwin, Maulhikmah Galinium, and Wahyu Muliady. "Automatic multi-document summarization for Indonesian documents using hybrid abstractive-extractive summarization technique." In 2014 6th International Conference on Information Technology and Electrical Engineering (ICITEE). IEEE, 2014. http://dx.doi.org/10.1109/iciteed.2014.7007896.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Peyrard, Maxime, and Judith Eckle-Kohler. "Supervised Learning of Automatic Pyramid for Optimization-Based Multi-Document Summarization." In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/p17-1100.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Gomes, Laerth, and Hilário Oliveira. "A Multi-document Summarization System for News Articles in Portuguese using Integer Linear Programming." In Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2019. http://dx.doi.org/10.5753/eniac.2019.9320.

Full text
Abstract:
Automatic Text Summarization (ATS) has seen intense research in recent years. Its importance stems from the fact that ATS systems can aid in processing large amounts of textual documents. The ATS task aims to create a summary of one or more documents by extracting their most relevant information. Despite the existence of several works, research on ATS systems for documents written in Brazilian Portuguese is still scarce. In this paper, we propose a multi-document summarization system that follows a concept-based approach and uses Integer Linear Programming to generate summaries from news articles written in Portuguese. Experiments on the CSTNews corpus were performed to evaluate different aspects of the proposed system. The experimental results, measured with ROUGE, show that the developed system achieves encouraging results, outperforming other works in the literature.
APA, Harvard, Vancouver, ISO, and other styles
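The concept-based ILP approach mentioned in the abstract selects the sentence subset that maximizes the total weight of distinct covered concepts under a length budget (in the style of Gillick and Favre's formulation). A toy exact solver under assumed inputs: sentence words stand in for concepts, and exhaustive search over subsets replaces a real ILP solver, which only works for very small collections.

```python
from itertools import combinations

def ilp_summary(sentences, concept_weights, max_words):
    """Exact solution of the concept-coverage objective: maximize the
    summed weight of distinct covered concepts subject to a word-count
    budget. Real systems hand this to an ILP solver; brute force is
    fine for a toy collection."""
    best, best_score = [], -1.0
    for r in range(len(sentences) + 1):
        for subset in combinations(range(len(sentences)), r):
            length = sum(len(sentences[i].split()) for i in subset)
            if length > max_words:
                continue  # violates the length constraint
            covered = set()
            for i in subset:
                covered |= set(sentences[i].lower().split())
            # Each concept counts once, no matter how often it appears.
            score = sum(concept_weights.get(c, 0.0) for c in covered)
            if score > best_score:
                best, best_score = list(subset), score
    return [sentences[i] for i in best], best_score
```

The key design choice, counting each concept once, is what makes the objective reward coverage over repetition and thus discourages redundant sentence picks.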
5

Liu, Yan, Ying Li, Chengcheng Hu, and Yongbin Wang. "An Method of Improved HLDA-Based Multi-document Automatic Summarization of Chinese News." In 2019 6th International Conference on Dependable Systems and Their Applications (DSA). IEEE, 2020. http://dx.doi.org/10.1109/dsa.2019.00068.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Rioux, Cody, Sadid A. Hasan, and Yllias Chali. "Fear the REAPER: A System for Automatic Multi-Document Summarization with Reinforcement Learning." In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA: Association for Computational Linguistics, 2014. http://dx.doi.org/10.3115/v1/d14-1075.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Bai, Hao, and De-xiang Zhou. "Multi-document Relationship Model for a same subject and its application in automatic summarization." In 2010 Second International Conference on Computational Intelligence and Natural Computing (CINC). IEEE, 2010. http://dx.doi.org/10.1109/cinc.2010.5643796.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Li, Sujian, and Wei Wang. "Automatic Topic-oriented Multi-document Summarization with Combination of Query-dependent and Query-independent Rankers." In 2007 IEEE International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE '07). IEEE, 2007. http://dx.doi.org/10.1109/nlpke.2007.4368068.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Lin, Chin-Yew, and Eduard Hovy. "Automated multi-document summarization in NeATS." In the second international conference. Morristown, NJ, USA: Association for Computational Linguistics, 2002. http://dx.doi.org/10.3115/1289189.1289255.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Sahoo, Deepak, Rakesh Balabantaray, Mridumoni Phukon, and Saibali Saikia. "Aspect based multi-document summarization." In 2016 International Conference on Computing, Communication and Automation (ICCCA). IEEE, 2016. http://dx.doi.org/10.1109/ccaa.2016.7813838.

Full text
APA, Harvard, Vancouver, ISO, and other styles