To see the other types of publications on this topic, follow the link: Multi-document summarization.

Journal articles on the topic 'Multi-document summarization'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 journal articles for your research on the topic 'Multi-document summarization.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Singh, Sandhya, Kevin Patel, Krishnanjan Bhattacharjee, Hemant Darbari, and Seema Verma. "Towards Better Single Document Summarization using Multi-Document Summarization Approach." International Journal of Computer Sciences and Engineering 7, no. 5 (2019): 695–703. http://dx.doi.org/10.26438/ijcse/v7i5.695703.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Jayaraman, Tamilselvan, et al. "Brainstorm optimization for multi-document summarization." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 10 (2021): 7607–19. http://dx.doi.org/10.17762/turcomat.v12i10.5670.

Full text
Abstract:
Document summarization is one of the solutions for mining the appropriate information from a huge number of documents. In this study, a brainstorm optimization (BSO) based multi-document summarizer (MDSBSO) is proposed to solve the problem of multi-document summarization. The proposed MDSBSO is compared with two other multi-document summarization algorithms based on particle swarm optimization (PSO) and bacterial foraging optimization (BFO). To evaluate the performance of the proposed multi-document summarizer, two well-known benchmark Document Understanding Conference (DUC) datasets are used. The performance of the compared algorithms is evaluated using ROUGE evaluation metrics. The experimental analysis clearly shows that the proposed MDSBSO summarization algorithm produces significant improvements over the other summarization algorithms.
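Many entries in this list evaluate summaries with ROUGE. As a rough illustration only (not the official ROUGE toolkit, which additionally handles stemming, stopword options, and variants such as ROUGE-L), ROUGE-N recall can be sketched as the fraction of reference n-grams that also appear in the candidate summary:

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: overlapping n-grams divided by n-grams in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())
```

For example, the candidate "the cat sat" covers 3 of the 6 unigram occurrences in the reference "the cat sat on the mat", giving a ROUGE-1 recall of 0.5.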
APA, Harvard, Vancouver, ISO, and other styles
3

D’Silva, Suzanne, Neha Joshi, Sudha Rao, Sangeetha Venkatraman, and Seema Shrawne. "Improved Algorithms for Document Classification & Query-based Multi-Document Summarization." International Journal of Engineering and Technology 3, no. 4 (2011): 404–9. http://dx.doi.org/10.7763/ijet.2011.v3.261.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

LIU, Mei-Ling, De-Quan ZHENG, Tie-Jun ZHAO, and Yang YU. "Dynamic Multi-Document Summarization Model." Journal of Software 23, no. 2 (2012): 289–98. http://dx.doi.org/10.3724/sp.j.1001.2012.03999.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Kumar. "Automatic Multi Document Summarization Approaches." Journal of Computer Science 8, no. 1 (2012): 133–40. http://dx.doi.org/10.3844/jcssp.2012.133.140.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Wang, Dingding, and Tao Li. "Weighted consensus multi-document summarization." Information Processing & Management 48, no. 3 (2012): 513–23. http://dx.doi.org/10.1016/j.ipm.2011.07.003.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

dos Santos Marujo, Luís Carlos. "Event-based Multi-document Summarization." ACM SIGIR Forum 49, no. 2 (2016): 148–49. http://dx.doi.org/10.1145/2888422.2888448.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Li, Jingxuan, Lei Li, and Tao Li. "Multi-document summarization via submodularity." Applied Intelligence 37, no. 3 (2012): 420–30. http://dx.doi.org/10.1007/s10489-012-0336-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Atkinson, John, and Ricardo Munoz. "Rhetorics-based multi-document summarization." Expert Systems with Applications 40, no. 11 (2013): 4346–52. http://dx.doi.org/10.1016/j.eswa.2013.01.017.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Thanh, Tam Doan, Tan Minh Nguyen, Thai Binh Nguyen, et al. "Graph-based and generative approaches to multi-document summarization." Journal of Computer Science and Cybernetics 40, no. 3 (2024): 203–17. https://doi.org/10.15625/1813-9663/18353.

Full text
Abstract:
Multi-document summarization is a challenging problem in the Natural Language Processing field that has drawn a lot of interest from the research community. In this paper, we propose a two-phase pipeline to tackle the Vietnamese abstractive multi-document summarization task. The initial phase of the pipeline involves an extractive summarization stage including two different systems. The first system employs a hybrid model based on the TextRank algorithm and a text correlation consideration mechanism. The second system is a modified version of SummPip - an unsupervised graph-based method for multi-document summarization. The second phase of the pipeline is abstractive summarization models. Particularly, generative models are applied to produce abstractive summaries from previous phase outputs. The proposed method achieves competitive results as we surpassed many strong research teams to finish the first rank in the AbMusu task - Vietnamese abstractive multi-document summarization, organized in the VLSP 2022 workshop.
APA, Harvard, Vancouver, ISO, and other styles
11

Garg, Srashti, and Dr Akash Saxena. "Novel Algorithm for Multi-document Summarization using Lexical Concept." International Journal of Trend in Scientific Research and Development 2, no. 3 (2018): 2115–19. http://dx.doi.org/10.31142/ijtsrd11644.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Zhang, Xin, Qiyi Wei, Qing Song, and Pengzhou Zhang. "TOMDS (Topic-Oriented Multi-Document Summarization): Enabling Personalized Customization of Multi-Document Summaries." Applied Sciences 14, no. 5 (2024): 1880. http://dx.doi.org/10.3390/app14051880.

Full text
Abstract:
In a multi-document summarization task, if the user can decide on the summary topic, the generated summary can better align with the reader’s specific needs and preferences. This paper addresses the issue of overly general content generation by common multi-document summarization models and proposes a topic-oriented multi-document summarization (TOMDS) approach. The method is divided into two stages: extraction and abstraction. During the extractive stage, it primarily identifies and retrieves paragraphs relevant to the designated topic, subsequently sorting them based on their relevance to the topic and forming an initial subset of documents. In the abstractive stage, building upon the transformer architecture, the process includes two parts: encoding and decoding. In the encoding part, we integrated an external discourse parsing module that focuses on both micro-level within-paragraph semantic relationships and macro-level inter-paragraph connections, effectively combining these with the implicit relationships in the source document to produce more enriched semantic features. In the decoding part, we incorporated a topic-aware attention mechanism that dynamically zeroes in on information pertinent to the chosen topic, thus guiding the summary generation process more effectively. The proposed model was primarily evaluated using the standard text summary dataset WikiSum. The experimental results show that our model significantly enhanced the thematic relevance and flexibility of the summaries and improved the accuracy of grammatical and semantic comprehension in the generated summaries.
APA, Harvard, Vancouver, ISO, and other styles
13

Neduncheli, R., R. Muthucumar, and E. Saranathan. "Evaluation of Multi Document Summarization Techniques." Research Journal of Applied Sciences 7, no. 4 (2012): 229–33. http://dx.doi.org/10.3923/rjasci.2012.229.233.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Bhagat, Kalyani. "Multi Document summarization using EM Clustering." IOSR Journal of Engineering 4, no. 5 (2014): 45–50. http://dx.doi.org/10.9790/3021-04564550.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Qiang, Ji-Peng, Ping Chen, Wei Ding, Fei Xie, and Xindong Wu. "Multi-document summarization using closed patterns." Knowledge-Based Systems 99 (May 2016): 28–38. http://dx.doi.org/10.1016/j.knosys.2016.01.030.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Na, Liu, Tang Di, Lu Ying, Tang Xiao-Jun, and Wang Hai-Wen. "Topic-sensitive multi-document summarization algorithm." Computer Science and Information Systems 12, no. 4 (2015): 1375–89. http://dx.doi.org/10.2298/csis140815060n.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) has recently been used to generate topics for text corpora. However, not all the estimated topics are of equal importance or correspond to genuine themes of the domain. Some of the topics can be a collection of irrelevant words or represent insignificant themes. This paper proposes a topic-sensitive algorithm for multi-document summarization. The algorithm uses the LDA model and a weighted linear combination strategy to identify significant topics, which are used in sentence weight calculation. Each topic is measured by three different LDA criteria, and topic significance is evaluated by combining the multiple criteria in a weighted linear combination. In addition to topic features, the proposed approach also considers statistical features such as term frequency, sentence position, and sentence length. It not only highlights the advantages of statistical features, but also cooperates with the topic model. The experiments showed that the proposed algorithm achieves better performance than other state-of-the-art algorithms on the DUC2002 corpus.
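A weighted linear combination of topic and statistical features, of the general kind this abstract describes, can be sketched as follows. The feature set, the weights `alpha`/`beta`/`gamma`, and the assumption that per-sentence topic weights are precomputed (e.g. by an LDA model) are all illustrative, not the paper's actual formulation:

```python
def sentence_score(sentence, position, n_sentences, topic_weight, term_freq,
                   alpha=0.5, beta=0.3, gamma=0.2):
    """Hypothetical linear combination of a precomputed topic-significance
    weight with two statistical features (term frequency and sentence
    position); a sentence-length feature could be added the same way."""
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    # average corpus term frequency of the sentence's words
    tf_score = sum(term_freq.get(t, 0.0) for t in tokens) / len(tokens)
    # earlier sentences score higher (a common position heuristic)
    pos_score = 1.0 - position / max(n_sentences, 1)
    return alpha * topic_weight + beta * tf_score + gamma * pos_score
```

With this scheme, a sentence that leads the document and sits in a significant topic outscores a later sentence with the same words, which is the intended effect of mixing topic and position features.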
APA, Harvard, Vancouver, ISO, and other styles
17

Carenini, Giuseppe, Jackie Chi Kit Cheung, and Adam Pauls. "MULTI-DOCUMENT SUMMARIZATION OF EVALUATIVE TEXT." Computational Intelligence 29, no. 4 (2012): 545–76. http://dx.doi.org/10.1111/j.1467-8640.2012.00417.x.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Wei, Furu, Wenjie Li, Qin Lu, and Yanxiang He. "A document-sensitive graph model for multi-document summarization." Knowledge and Information Systems 22, no. 2 (2009): 245–59. http://dx.doi.org/10.1007/s10115-009-0194-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Mamidala, Kishore Kumar, and Suresh Kumar Sanampudi. "A Novel Framework for Multi-Document Temporal Summarization (MDTS)." Emerging Science Journal 5, no. 2 (2021): 184–90. http://dx.doi.org/10.28991/esj-2021-01268.

Full text
Abstract:
The Internet contains a massive amount of information, and handling it is a tedious task. Summarization plays a crucial role in extracting or abstracting key content from multiple sources while preserving its meaning, thereby reducing the complexity of handling the information. Multi-document summarization gives the gist of content collected from multiple documents, while temporal summarization concentrates on temporally related events. This paper proposes a Multi-Document Temporal Summarization (MDTS) technique that generates a summary based on temporally related events extracted from multiple documents. The technique extracts events with their time stamps, using TimeML standard tags; these event-times are stored in a structured database for easier processing. Sentence ranking methods are built based on the frequency of event occurrences in each sentence, and sentence similarity measures are computed to eliminate redundant sentences from the extracted summary. Depending on the required summary length, top-ranked sentences are selected to form the summary. Experiments are conducted on the DUC 2006 and DUC 2007 datasets released for the multi-document summarization task, and the extracted summaries are evaluated using ROUGE to determine precision, recall, and F-measure of the generated summaries. The performance of the proposed method is compared with particle swarm optimization-based summarization (PSOS), cat swarm optimization-based summarization (CSOS), and cuckoo search-based multi-document summarization (MDSCSA); MDTS is found to perform better than the other methods.
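The redundancy-removal step this abstract mentions (computing sentence similarity to drop near-duplicate sentences from the summary) is commonly realised as a greedy selector over ranked sentences. In this sketch, word-set Jaccard similarity and the 0.5 threshold are illustrative stand-ins, not the paper's actual measure:

```python
def jaccard(a, b):
    """Word-set Jaccard similarity between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa or sb else 0.0

def select_summary(ranked_sentences, max_sentences, sim_threshold=0.5):
    """Greedily take top-ranked sentences, skipping any sentence too
    similar to one already selected (simple redundancy removal)."""
    summary = []
    for sent in ranked_sentences:  # assumed sorted by rank, best first
        if all(jaccard(sent, chosen) < sim_threshold for chosen in summary):
            summary.append(sent)
        if len(summary) == max_sentences:
            break
    return summary
```

Given two near-duplicate top-ranked sentences, the selector keeps the first and moves on to the next genuinely different sentence, so the length budget is spent on distinct content.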
APA, Harvard, Vancouver, ISO, and other styles
20

Tran, Mai Vu, Hoang Quynh Le, Duy Cat Can, and Quoc An Nguyen. "A data challenge for Vietnamese abstractive multi-document summarization." Journal of Computer Science and Cybernetics 40, no. 4 (2024): 347–62. https://doi.org/10.15625/1813-9663/18291.

Full text
Abstract:
This paper provides an overview of the Vietnamese abstractive multi-document summarization shared task (AbMuSu) for Vietnamese news, which is hosted at the 9th annual workshop on Vietnamese Language and Speech Processing (VLSP 2022). The main goal of this shared task is to develop automated summarization systems that can generate abstractive summaries for a given set of documents on a specific topic. The input consists of several news documents on the same topic, and the output is a related abstractive summary. The focus of the AbMuSu shared task is solely on Vietnamese news summarization. To this end, a human-annotated dataset comprising 1,839 documents in 600 clusters, collected from Vietnamese news in 8 categories, has been developed. Participating models are evaluated and ranked based on their ROUGE2-F1 score, which is the most common evaluation metric for document summarization problems.
APA, Harvard, Vancouver, ISO, and other styles
21

Nasari, Maulin, and Abba Suganda Girsang. "Automated multi-document summarization using extractive-abstractive approaches." International Journal of Informatics and Communication Technology 13, no. 3 (2024): 400–409. https://doi.org/10.11591/ijict.v13i3.pp400-409.

Full text
Abstract:
This study presents a multi-document text summarizing system that employs a hybrid approach, including both extractive and abstractive methods. The goal of document summarizing is to create a coherent and comprehensive summary that captures the essential information contained in the document. The difficulty in multi-document text summarization lies in the lengthy nature of the input material and the potential for redundant information. This study utilises a combination of methods to address this issue. This study uses the TextRank algorithm as an extractor for each document to condense the input sequence. This extractor is designed to retrieve crucial sentences from each document, which are then aggregated and utilised as input for the abstractor. This study uses bidirectional and auto-regressive transformers (BART) as an abstractor. This abstractor serves to condense the primary sentences in each document into a more cohesive summary. The evaluation of this text summarizing system was conducted using the ROUGE measure. The research yields ROUGE R1 and R2 scores of 41.95 and 14.81, respectively.
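The extractive stage of such a pipeline — TextRank over a sentence-similarity graph — can be sketched in plain Python. The BART abstractor stage is omitted here; the similarity function follows Mihalcea and Tarau's original TextRank formulation, and the damping factor and iteration count are conventional defaults, not this study's settings:

```python
import math

def similarity(s1, s2):
    """TextRank sentence similarity: shared words, normalised by the
    log lengths of the two sentences."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if len(w1) < 2 or len(w2) < 2:
        return 0.0
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))

def textrank(sentences, damping=0.85, iterations=50):
    """Run PageRank over the sentence-similarity graph and return the
    sentences sorted by score, best first."""
    n = len(sentences)
    sim = [[similarity(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)]
           for i, a in enumerate(sentences)]
    out_sum = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [(1 - damping) + damping * sum(
                      sim[j][i] / out_sum[j] * scores[j]
                      for j in range(n) if out_sum[j] > 0)
                  for i in range(n)]
    return [s for _, s in sorted(zip(scores, sentences), reverse=True)]
```

In a multi-document setting, the top-ranked sentences from each document would then be concatenated and passed to the abstractor as its input sequence.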
APA, Harvard, Vancouver, ISO, and other styles
22

K. Adi Narayana Reddy. "Multi-Document Summarization using Discrete Bat Optimization." Journal of Electrical Systems 20, no. 7s (2024): 831–42. http://dx.doi.org/10.52783/jes.3457.

Full text
Abstract:
With the World Wide Web, we now have a wide range of data that was previously unavailable. Therefore, it has become a complex problem to find useful information in large datasets. In recent years, text summarization has emerged as a viable option for mining relevant data from massive collections of texts. We may classify summarizing as either "single document" or "multi document" depending on how many source documents we are working with. Finding an accurate summary from a collection of documents is more difficult for researchers than doing it from a single document. For this reason, this research proposes a Discrete Bat algorithm Optimization based multi document summarizer (DBAT-MDS) to tackle the issue of multi document summarizing. Comparisons are made between the proposed DBAT-MDS based model and three different summarization algorithms that take their inspiration from the natural world. All methods are evaluated in relation to the benchmark Document Understanding Conference (DUC) datasets using a variety of criteria, such as the ROUGE score and the F score. Compared to the other summarizers used in the experiment, the suggested method performs much better.
APA, Harvard, Vancouver, ISO, and other styles
23

Sarkar, Kamal, and Santanu Dam. "Exploiting Semantic Term Relations in Text Summarization." International Journal of Information Retrieval Research 12, no. 1 (2022): 1–18. http://dx.doi.org/10.4018/ijirr.289607.

Full text
Abstract:
The traditional frequency-based approach to creating multi-document extractive summaries ranks sentences based on scores computed by summing up the TF*IDF weights of the words contained in each sentence. In this approach, TF (term frequency) is calculated based on how frequently a term (word) occurs in the input, and TF calculated in this way does not take into account the semantic relations among terms. In this paper, we propose methods that exploit semantic term relations for improving the sentence ranking and redundancy removal steps of a summarization system. Our proposed summarization system has been tested on the DUC 2003 and DUC 2004 benchmark multi-document summarization datasets. The experimental results reveal that the performance of our multi-document text summarizer is significantly improved when a distributional term similarity measure is used for finding semantic term relations. Our multi-document text summarizer also outperforms some well-known summarization baselines to which it is compared.
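The frequency-based baseline this abstract starts from — scoring each sentence by the summed TF*IDF weights of its words — can be sketched as follows. This is a plain sketch without the paper's semantic term-relation extension; whitespace tokenisation and computing TF over the whole input are simplifications:

```python
import math
from collections import Counter

def tfidf_sentence_scores(documents):
    """Score each sentence by the sum of TF*IDF weights of its words.
    `documents` is a list of documents, each a list of sentence strings."""
    # document frequency: in how many input documents each word appears
    df = Counter()
    for doc in documents:
        df.update(set(w for sent in doc for w in sent.lower().split()))
    n_docs = len(documents)
    # term frequency over the whole multi-document input
    tf = Counter(w for doc in documents for sent in doc
                 for w in sent.lower().split())
    scores = {}
    for doc in documents:
        for sent in doc:
            scores[sent] = sum(tf[w] * math.log(n_docs / df[w])
                               for w in sent.lower().split())
    return scores
```

Words that occur in every document get an IDF of zero, so sentences made of corpus-wide common words sink in the ranking while sentences with frequent but document-specific words rise.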
APA, Harvard, Vancouver, ISO, and other styles
24

Rahamat Basha, S., J. Keziya Rani, and J. J. C. Prasad Yadav. "A Novel Summarization-based Approach for Feature Reduction Enhancing Text Classification Accuracy." Engineering, Technology & Applied Science Research 9, no. 6 (2019): 5001–5. http://dx.doi.org/10.48084/etasr.3173.

Full text
Abstract:
Automatic summarization is the process of shortening one (in single document summarization) or multiple documents (in multi-document summarization). In this paper, a new feature selection method for the nearest neighbor classifier by summarizing the original training documents based on sentence importance measure is proposed. Our approach for single document summarization uses two measures for sentence similarity: the frequency of the terms in one sentence and the similarity of that sentence to other sentences. All sentences were ranked accordingly and the sentences with top ranks (with a threshold constraint) were selected for summarization. The summary of every document in the corpus is taken into a new document used for the summarization evaluation process.
APA, Harvard, Vancouver, ISO, and other styles
25

Rahamat, Basha S., Rani J. Keziya, and Yadav J. J. C. Prasad. "A Novel Summarization-based Approach for Feature Reduction Enhancing Text Classification Accuracy." Engineering, Technology & Applied Science Research 9, no. 6 (2019): 5001–5. https://doi.org/10.5281/zenodo.3566535.

Full text
Abstract:
Automatic summarization is the process of shortening one (in single document summarization) or multiple documents (in multi-document summarization). In this paper, a new feature selection method for the nearest neighbor classifier by summarizing the original training documents based on sentence importance measure is proposed. Our approach for single document summarization uses two measures for sentence similarity: the frequency of the terms in one sentence and the similarity of that sentence to other sentences. All sentences were ranked accordingly and the sentences with top ranks (with a threshold constraint) were selected for summarization. The summary of every document in the corpus is taken into a new document used for the summarization evaluation process.
APA, Harvard, Vancouver, ISO, and other styles
26

HIRAO, TSUTOMU, HIDETO KAZAWA, HIDEKI ISOZAKI, EISAKU MAEDA, and YUJI MATSUMOTO. "Machine Learning Approach to Multi-Document Summarization." Journal of Natural Language Processing 10, no. 1 (2003): 81–108. http://dx.doi.org/10.5715/jnlp.10.81.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Yan, Xiaodong, Yiqin Wang, Wei Song, Xiaobing Zhao, A. Run, and Yang Yanxing. "Unsupervised Graph-Based Tibetan Multi-Document Summarization." Computers, Materials & Continua 73, no. 1 (2022): 1769–81. http://dx.doi.org/10.32604/cmc.2022.027301.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Soriyan, Abimbola, and Theresa Omodunbi. "Trends in Multi-document Summarization System Methods." International Journal of Computer Applications 97, no. 16 (2014): 46–52. http://dx.doi.org/10.5120/17095-7804.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Qin, Bing. "Sentences Optimum Selection for Multi-Document Summarization." Journal of Computer Research and Development 43, no. 6 (2006): 1129. http://dx.doi.org/10.1360/crad20060625.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Tohalino, Jorge V., and Diego R. Amancio. "Extractive multi-document summarization using multilayer networks." Physica A: Statistical Mechanics and its Applications 503 (August 2018): 526–39. http://dx.doi.org/10.1016/j.physa.2018.03.013.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Harabagiu, Sanda, and Finley Lacatusu. "Using topic themes for multi-document summarization." ACM Transactions on Information Systems 28, no. 3 (2010): 1–47. http://dx.doi.org/10.1145/1777432.1777436.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

He, Ruifang, Jiliang Tang, Pinghua Gong, Qinghua Hu, and Bo Wang. "Multi-document summarization via group sparse learning." Information Sciences 349-350 (July 2016): 12–24. http://dx.doi.org/10.1016/j.ins.2016.02.032.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Chaudhary, Nidhi, and Shalini Kapoor. "Key Phrase Extraction Based Multi-Document Summarization." International Journal of Engineering Trends and Technology 13, no. 4 (2014): 148–53. http://dx.doi.org/10.14445/22315381/ijett-v13p232.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Rautray, Rasmita, Rakesh Chandra Balabantaray, Rasmita Dash, and Rajashree Dash. "CSMDSE-Cuckoo Search Based Multi Document Summary Extractor." International Journal of Cognitive Informatics and Natural Intelligence 13, no. 4 (2019): 56–70. http://dx.doi.org/10.4018/ijcini.2019100103.

Full text
Abstract:
In the current scenario, managing the web of useful information has become challenging, because a large amount of information related to many fields is online. Text summarization is considered one of the solutions for extracting pertinent text from vast documents. Hence, a novel Cuckoo Search-based multi-document summary extractor (CSMDSE) is presented to handle the multi-document summarization (MDS) problem. The proposed CSMDSE is compared with a few other swarm-based summary extractors: the Cat Swarm Optimization based Extractor (CSOE), Particle Swarm Optimization based Extractor (PSOE), Improved Particle Swarm Optimization based Extractor (IPSOE), and Ant Colony Optimization based Extractor (ACOE). Finally, CSMDSE is compared in simulation with the other techniques on traditional benchmark datasets for the summarization problem. The experimental analysis clearly indicates that CSMDSE performs better than the other summary extractors discussed in this study.
APA, Harvard, Vancouver, ISO, and other styles
35

Rautaray, Jyotirmayee, Sangram Panigrahi, and Ajit Kumar Nayak. "Integrating particle swarm optimization with backtracking search optimization feature extraction with two-dimensional convolutional neural network and attention-based stacked bidirectional long short-term memory classifier for effective single and multi-document summarization." PeerJ Computer Science 10 (December 12, 2024): e2435. https://doi.org/10.7717/peerj-cs.2435.

Full text
Abstract:
The internet now offers a vast amount of information, which makes finding relevant data quite challenging. Text summarization has become a prominent and effective method for gleaning important information from numerous documents. Summarization techniques are categorized into single-document and multi-document. Single-document summarization (SDS) targets a single document, whereas multi-document summarization (MDS) combines information from several sources, posing a greater challenge for researchers to create precise summaries. In the realm of automatic text summarization, advanced methods such as evolutionary algorithms, deep learning, and clustering have demonstrated promising outcomes. This study introduces an improved Particle Swarm Optimization with Backtracking Search Optimization (PSOBSA) designed for feature extraction. For classification, it recommends a two-dimensional convolutional neural network (2D CNN) along with an attention-based stacked bidirectional long short-term memory (ABS-BiLSTM) model to generate new summarized sentences by analyzing entire sentences. The model's performance is assessed using datasets from DUC 2002, 2003, and 2005 for single-document summarization, and from DUC 2002, 2003, and 2005, Multi-News, and CNN/Daily Mail for multi-document summarization. It is compared against seven advanced techniques: particle swarm optimization (PSO), cat swarm optimization (CSO), long short-term memory with convolutional neural networks (LSTM-CNN), support vector regression (SVR), the bee swarm algorithm (BSA), ant colony optimization (ACO), and the firefly algorithm (FFA). The evaluation metrics include ROUGE score, BLEU score, cohesion, sensitivity, positive predictive value, readability, and best-, worst-, and average-case performance, to ensure coherence, non-redundancy, and grammatical correctness. The experimental findings demonstrate that the suggested model works better than the other summarizing techniques examined in this research.
APA, Harvard, Vancouver, ISO, and other styles
36

Takale, Sheetal A., Prakash J. Kulkarni, and Sahil K. Shah. "An Intelligent Web Search Using Multi-Document Summarization." International Journal of Information Retrieval Research 6, no. 2 (2016): 41–65. http://dx.doi.org/10.4018/ijirr.2016040103.

Full text
Abstract:
Information available on the internet is huge, diverse and dynamic. Current Search Engine is doing the task of intelligent help to the users of the internet. For a query, it provides a listing of best matching or relevant web pages. However, information for the query is often spread across multiple pages which are returned by the search engine. This degrades the quality of search results. So, the search engines are drowning in information, but starving for knowledge. Here, we present a query focused extractive summarization of search engine results. We propose a two level summarization process: identification of relevant theme clusters, and selection of top ranking sentences to form summarized result for user query. A new approach to semantic similarity computation using semantic roles and semantic meaning is proposed. Document clustering is effectively achieved by application of MDL principle and sentence clustering and ranking is done by using SNMF. Experiments conducted demonstrate the effectiveness of system in semantic text understanding, document clustering and summarization.
APA, Harvard, Vancouver, ISO, and other styles
37

Garg, Srashti, and Dr Akash Saxena. "Novel Algorithm for Multi-document Summarization using Lexical Concept." International Journal of Trend in Scientific Research and Development 2, no. 3 (2018): 2115–19. https://doi.org/10.31142/ijtsrd11644.

Full text
Abstract:
Text summarization is a necessity for society: we are surrounded by various documents which, if summarized, would not only save our time but also let us go through more documents in the same time. In this paper we present a novel approach for multiple-document summarization using lexical chains, taking adjectives, adverbs, nouns, etc. into account when forming the chains. Together with that, a better approach is used for tagging, which yields better recall when the results are compared with the base paper.
APA, Harvard, Vancouver, ISO, and other styles
38

Yang, Guangbing. "Using Contextual Topic Model for a Query-Focused Multi-Document Summarizer." International Journal on Artificial Intelligence Tools 25, no. 01 (2016): 1660002. http://dx.doi.org/10.1142/s0218213016600022.

Full text
Abstract:
Oft-decried information overload is a serious problem that negatively impacts the comprehension of information in the digital age. Text summarization is a helpful process that can be used to alleviate this problem. With the aim of enhancing the performance of multi-document summarization, this study proposes a novel approach to the multi-document summarization problem based on a mixture model, consisting of a contextual topic model from a Bayesian hierarchical topic modeling family for selecting candidate summary sentences, and a regression model in machine learning for generating the summary. By investigating hierarchical topics and their correlations with respect to the lexical co-occurrences of words, the proposed contextual topic model can determine the relevance of sentences more effectively, recognize latent topics, and arrange them hierarchically. The quantitative evaluation results from a practical application demonstrate that a system implementing this model can significantly improve the performance of summarization and make it comparable to state-of-the-art summarization systems.
APA, Harvard, Vancouver, ISO, and other styles
39

Zuhair, Hussein Ali, Kawther Hussein Ahmed, Kareem Abass Haithem, and Fadel Elham. "Extractive multi document summarization using harmony search algorithm." TELKOMNIKA Telecommunication, Computing, Electronics and Control 19, no. 1 (2021): 89–95. https://doi.org/10.12928/TELKOMNIKA.v19i1.15766.

Full text
Abstract:
The exponential growth of information on the internet makes it troublesome for users to get valuable information, and text summarization is a process for overcoming this problem. An adequate summary must have wide coverage, high diversity, and high readability. In this article, a new method for multi-document summarization is proposed, based on a harmony search algorithm that optimizes coverage, diversity, and readability. On the benchmark Text Analysis Conference (TAC-2011) dataset, the ROUGE package was used to measure the effectiveness of the proposed model, and the calculated results support the effectiveness of the proposed approach.
APA, Harvard, Vancouver, ISO, and other styles
40

Bewoor, M. S., and S. H. Patil. "Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms." Engineering, Technology & Applied Science Research 8, no. 1 (2018): 2562–67. http://dx.doi.org/10.48084/etasr.1775.

Full text
Abstract:
The availability of various digital sources has created a demand for text mining mechanisms. Effective summary generation mechanisms are needed in order to utilize relevant information from often overwhelming digital data sources. In this view, this paper conducts a survey of various single as well as multi-document text summarization techniques. It also provides analysis of treating a query sentence as a common one, segmented from documents for text summarization. Experimental results show the degree of effectiveness in text summarization over different clustering algorithms.
APA, Harvard, Vancouver, ISO, and other styles
41

Nasari, Maulin, and Abba Suganda Girsang. "Automated multi-document summarization using extractive-abstractive approaches." International Journal of Informatics and Communication Technology (IJ-ICT) 13, no. 3 (2024): 400. http://dx.doi.org/10.11591/ijict.v13i3.pp400-409.

Full text
Abstract:
This study presents a multi-document text summarization system that employs a hybrid approach combining extractive and abstractive methods. The goal of document summarization is to create a coherent and comprehensive summary that captures the essential information in the documents. The difficulty in multi-document text summarization lies in the length of the input material and the potential for redundant information. This study uses a combination of methods to address the issue: the TextRank algorithm serves as an extractor for each document, condensing the input sequence by retrieving crucial sentences, which are then aggregated and used as input to the abstractor. Bidirectional and auto-regressive transformers (BART) serve as the abstractor, condensing the primary sentences from each document into a more cohesive summary. The system was evaluated using the ROUGE measure, yielding ROUGE R1 and R2 scores of 41.95 and 14.81, respectively.
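The extractive stage of such a pipeline can be sketched in plain Python as TextRank: score sentences by power iteration over a word-overlap similarity graph, then keep the top-ranked ones as input for the abstractor. This is a generic TextRank sketch under standard settings, not the study's code:

```python
import math
import re

def sentence_similarity(a, b):
    """TextRank word-overlap similarity, normalized by log sentence lengths."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))

def textrank(sentences, d=0.85, iters=50):
    """Power-iteration scoring over the sentence-similarity graph."""
    n = len(sentences)
    sim = [[0.0 if i == j else sentence_similarity(sentences[i], sentences[j])
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(sim[j][i] / out_weight[j] * scores[j]
                                    for j in range(n) if out_weight[j] > 0)
                  for i in range(n)]
    return scores

def extract_top(sentences, k=2):
    """Return the k highest-scoring sentences, preserving original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

In the hybrid setup the abstract describes, the output of `extract_top` for each document would be concatenated and fed to a BART model for abstractive rewriting.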
APA, Harvard, Vancouver, ISO, and other styles
42

Lucky, Henry, and Derwin Suhartono. "Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization." Journal of Information and Communication Technology 21, no. 1 (2021): 71–94. http://dx.doi.org/10.32890/jict2022.21.1.4.

Full text
Abstract:
Text summarization aims to condense text by removing less useful information so that information can be obtained quickly and precisely. In Indonesian abstractive text summarization, research has mostly focused on multi-document summarization, whose methods do not work optimally for single-document summarization. As the public summarization datasets and works in English focus on single-document summarization, this study emphasized Indonesian single-document summarization. Abstractive text summarization studies in English frequently use Bidirectional Encoder Representations from Transformers (BERT), and since an Indonesian BERT checkpoint is available, it was employed in this study. This study investigated the use of Indonesian BERT in abstractive text summarization on the IndoSum dataset using the BERTSum model, proceeding through various combinations of model encoders, embedding sizes, and decoders. Evaluation results showed that models with a larger embedding size and a Generative Pre-Training (GPT)-like decoder improved the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score and BERTScore of the model results.
APA, Harvard, Vancouver, ISO, and other styles
43

DeYoung, Jay, Stephanie C. Martinez, Iain J. Marshall, and Byron C. Wallace. "Do Multi-Document Summarization Models Synthesize?" Transactions of the Association for Computational Linguistics 12 (2024): 1043–62. http://dx.doi.org/10.1162/tacl_a_00687.

Full text
Abstract:
Multi-document summarization entails producing concise synopses of collections of inputs. For some applications, the synopsis should accurately synthesize inputs with respect to a key aspect, e.g., a synopsis of film reviews written about a particular movie should reflect the average critic consensus. As a more consequential example, narrative summaries that accompany biomedical systematic reviews of clinical trial results should accurately summarize the potentially conflicting results from individual trials. In this paper we ask: To what extent do modern multi-document summarization models implicitly perform this sort of synthesis? We run experiments over opinion and evidence synthesis datasets using a suite of summarization models, from fine-tuned transformers to GPT-4. We find that existing models partially perform synthesis, but imperfectly: Even the best performing models are over-sensitive to changes in input ordering and under-sensitive to changes in input compositions (e.g., ratio of positive to negative reviews). We propose a simple, general, effective method for improving model synthesis capabilities by generating an explicitly diverse set of candidate outputs, and then selecting from these the string best aligned with the expected aggregate measure for the inputs, or abstaining when the model produces no good candidate.
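The selection step this abstract proposes, picking from diverse candidates the one whose aggregate measure best matches the inputs and abstaining otherwise, reduces to a simple argmin. The scorer and tolerance below are illustrative assumptions, not the paper's exact procedure:

```python
def select_candidate(candidates, candidate_scores, target, tol=0.15):
    """Return the candidate whose aggregate score (e.g., predicted sentiment)
    is closest to the target computed from the inputs (e.g., fraction of
    positive reviews); return None (abstain) if none is within tolerance."""
    best, best_score = min(zip(candidates, candidate_scores),
                           key=lambda cs: abs(cs[1] - target))
    return best if abs(best_score - target) <= tol else None
```

In practice the candidate scores would come from an external aggregate estimator (e.g., a sentiment classifier run over each candidate summary), and the target from aggregating the same measure over the inputs.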
APA, Harvard, Vancouver, ISO, and other styles
44

Sanchez-Gómez, Jesús Manuel, Miguel Ángel Vega-Rodríguez, and Carlos J. Pérez. "A Decomposition-based Multi-Objective Optimization Approach for Extractive Multi-Document Text Summarization." Applied Soft Computing 91 (June 5, 2020): 106231. https://doi.org/10.1016/j.asoc.2020.106231.

Full text
Abstract:
Currently, due to the overflow of textual information on the Internet, automatic text summarization methods are becoming increasingly important in many fields of knowledge. Extractive multi-document text summarization approaches are intended to automatically generate summaries from a document collection, covering the main content and avoiding redundant information. These approaches can be addressed through optimization techniques. In the scientific literature, most of them are single-objective optimization approaches, but recently multi-objective approaches have been developed and they have improved the existing single-objective results. In addition, in the field of multi-objective optimization, decomposition-based approaches are increasingly being applied with success. For this reason, a Multi-Objective Artificial Bee Colony algorithm based on Decomposition (MOABC/D) is proposed to solve the extractive multi-document text summarization problem. An asynchronous parallel design of the MOABC/D algorithm has been implemented in order to take advantage of multi-core architectures. Experiments have been carried out with Document Understanding Conferences (DUC) datasets, and the results have been evaluated with Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The obtained results improve the existing ones in the scientific literature for ROUGE-1, ROUGE-2, and ROUGE-L scores, also reporting a very good speedup.
APA, Harvard, Vancouver, ISO, and other styles
45

Bewoor, Mrunal S., and Suhas H. Patil. "Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms." Engineering, Technology & Applied Science Research 8, no. 1 (2018): 2562–67. https://doi.org/10.5281/zenodo.1207394.

Full text
Abstract:
The availability of various digital sources has created a demand for text mining mechanisms. Effective summary generation mechanisms are needed in order to utilize relevant information from often overwhelming digital data sources. In this view, this paper conducts a survey of various single as well as multi-document text summarization techniques. It also provides analysis of treating a query sentence as a common one, segmented from documents for text summarization. Experimental results show the degree of effectiveness in text summarization over different clustering algorithms.
APA, Harvard, Vancouver, ISO, and other styles
46

Mustamiin, Muhamad, Ahmad Lubis Ghozali, and Muhammad Lukman Sifa. "Peringkasan Multi-dokumen menggunakan Metode Pengelompokkan berbasis Hirarki dengan Multi-level Divisive Coefficient." Jurnal Teknologi Informasi dan Ilmu Komputer 5, no. 6 (2018): 697. http://dx.doi.org/10.25126/jtiik.2018561149.

Full text
Abstract:
Summarization is one part of information retrieval that aims to obtain information quickly and efficiently by producing the essence of a document. Documents, especially report documents, grow daily as activities and events take place. As the need for information becomes more urgent and the number of documents keeps increasing, the need for document summarization rises accordingly. Summarization used to summarize more than one document is called multi-document summarization. To prevent repetitive information from being delivered in multi-document summarization, a grouping process is necessary to ensure that the delivered information is varied and covers all parts of the documents. Hierarchical clustering with a multi-level divisive coefficient can be used to group parts/sentences of the documents with variety and depth adjusted to the user's level of information need. Across different summarization compression levels, summarization using hierarchical clustering with a multi-level divisive coefficient produced fairly good results, with an f-measure of 0.398, while summarization with a single-level divisive coefficient reached an f-measure of only 0.335.
APA, Harvard, Vancouver, ISO, and other styles
47

Fanani, Aris, Yuniar Farida, Putra Prima Arhandi, M. Mahaputra Hidayat, Abdul Muhid, and Billy Montolalu. "Regression model focused on query for multi documents summarization based on significance of the sentence position." TELKOMNIKA Telecommunication, Computing, Electronics and Control 17, no. 6 (2019): 3050–56. https://doi.org/10.12928/TELKOMNIKA.v17i6.12494.

Full text
Abstract:
Document summarization is needed to get information effectively and efficiently. One way to obtain a document summary is by applying machine learning techniques. This paper proposes the application of regression models to query-focused multi-document summarization based on the significance of the sentence position. The method used is Support Vector Regression (SVR), which estimates the weight of each sentence in a document set to be included in the summary, based on previously defined sentence features. A series of evaluations was performed on the DUC 2005 dataset. The test results yield summaries with average precision and recall values of 0.0580 and 0.0590 measured with ROUGE-2, and 0.0997 and 0.1019 measured with ROUGE-SU4. The proposed regression model can measure the significance of sentence position in documents well.
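The feature side of such a query-focused regression model can be sketched with two of the kinds of features the abstract mentions, position significance and query overlap, combined by a weighted sum. The weights here are hard-coded illustrative values standing in for a trained SVR's prediction; the feature set and weights are assumptions:

```python
import re

def features(sentence, index, n_sentences, query):
    """Two illustrative per-sentence features for query-focused summarization."""
    words = set(re.findall(r"\w+", sentence.lower()))
    q = set(re.findall(r"\w+", query.lower()))
    position = (n_sentences - index) / n_sentences   # earlier sentences score higher
    overlap = len(words & q) / max(len(q), 1)        # fraction of query terms covered
    return [position, overlap]

def score_sentences(sentences, query, weights=(0.4, 0.6)):
    """Weighted feature combination standing in for a trained SVR's output."""
    return [sum(w * f for w, f in zip(weights, features(s, i, len(sentences), query)))
            for i, s in enumerate(sentences)]
```

In the actual method, the weights would be learned by fitting an SVR to sentence-level targets derived from reference summaries rather than fixed by hand.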
APA, Harvard, Vancouver, ISO, and other styles
48

KITAJIMA, Risa, and Ichiro KOBAYASHI. "Graph based Multi-Document Summarization with Latent Topics." Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 25, no. 6 (2013): 914–23. http://dx.doi.org/10.3156/jsoft.25.914.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Dharan A C, Dharani. "Multi-Document Summarization Using K-Medoids Clustering Approach." International Journal for Research in Applied Science and Engineering Technology V, no. II (2017): 639–41. http://dx.doi.org/10.22214/ijraset.2017.2096.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Abid, Azal Minshed. "Multi-Document Text Summarization Using Deep Belief Network." International Journal of Advances in Scientific Research and Engineering 08, no. 08 (2022): 56–65. http://dx.doi.org/10.31695/ijasre.2022.8.8.7.

Full text
Abstract:
Recently, a great deal of information has become available on the Internet, which makes it difficult for users to find what they are looking for. Extractive text summarization methods are designed to reduce the amount of text in a document collection by focusing on the most important information and reducing redundancy, without affecting the main ideas and the meaning of the original text. This paper proposes a new automatic, generic, and extractive multi-document summarization model aimed at producing a sufficiently informative summary. The proposed model extracts nine different features from each sentence in the document collection and introduces them as input to a Deep Belief Network (DBN) that classifies sentences as either important or unimportant. Only the important sentences pass to the next phase, where a graph is constructed and the PageRank algorithm assigns scores to the graph sentences. The sentences with the highest scores are selected to create the summary document. The performance of the proposed model was evaluated on the DUC-2004 (Task 2) dataset using the ROUGE measure. The experimental results demonstrate that the proposed model is more effective than the baseline method and some state-of-the-art methods, with ROUGE-1 reaching 0.4032 and ROUGE-2 reaching 0.1021.
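The abstract does not enumerate its nine features, but the kind of per-sentence feature vector such a classifier consumes can be sketched with a few common choices. The four features below (position, length, title overlap, mean term frequency) are typical examples, not necessarily the authors' set:

```python
import re
from collections import Counter  # used to build the document term-frequency table

def sentence_features(sentence, index, n_sentences, title, doc_tf):
    """Illustrative per-sentence feature vector for an importance classifier."""
    words = re.findall(r"\w+", sentence.lower())
    wset = set(words)
    title_words = set(re.findall(r"\w+", title.lower()))
    return [
        (n_sentences - index) / n_sentences,                  # position in document
        min(len(words) / 25.0, 1.0),                          # normalized length
        len(wset & title_words) / max(len(title_words), 1),   # title-word overlap
        sum(doc_tf[w] for w in words) / max(len(words), 1),   # mean term frequency
    ]
```

In the model the abstract describes, vectors like these would be fed to the DBN, and the sentences it labels important would then be re-scored with PageRank over a sentence graph.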
APA, Harvard, Vancouver, ISO, and other styles