To see the other types of publications on this topic, follow the link: News summarization.

Journal articles on the topic 'News summarization'

Consult the top 50 journal articles for your research on the topic 'News summarization.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Chang, Hsien-Tsung, Shu-Wei Liu, and Nilamadhab Mishra. "A tracking and summarization system for online Chinese news topics." Aslib Journal of Information Management 67, no. 6 (2015): 687–99. http://dx.doi.org/10.1108/ajim-10-2014-0147.

Abstract:
Purpose – The purpose of this paper is to design and implement new tracking and summarization algorithms for Chinese news content. Based on the proposed methods and algorithms, the authors extract the important sentences that are contained in topic stories and list those sentences according to timestamp order to ensure ease of understanding and to visualize multiple news stories on a single screen. Design/methodology/approach – This paper encompasses an investigational approach that implements a new Dynamic Centroid Summarization algorithm in addition to a Term Frequency (TF)-Density algorithm to empirically compute three target parameters, i.e., recall, precision, and F-measure. Findings – The proposed TF-Density algorithm is implemented and compared with the well-known algorithms Term Frequency-Inverse Word Frequency (TF-IWF) and Term Frequency-Inverse Document Frequency (TF-IDF). Three test data sets are configured from Chinese news web sites for use during the investigation, and two important findings are obtained that help the authors provide more precision and efficiency when recognizing the important words in the text. First, the authors evaluate three topic tracking algorithms, i.e., TF-Density, TF-IDF, and TF-IWF, with the said target parameters and find that the recall, precision, and F-measure of the proposed TF-Density algorithm is better than those of the TF-IWF and TF-IDF algorithms. In the context of the second finding, the authors implement a blind test approach to obtain the results of topic summarizations and find that the proposed Dynamic Centroid Summarization process can more accurately select topic sentences than the LexRank process. Research limitations/implications – The results show that the tracking and summarization algorithms for news topics can provide more precise and convenient results for users tracking the news. The analysis and implications are limited to Chinese news content from Chinese news web sites such as Apple Library, UDN, and well-known portals like Yahoo and Google. Originality/value – The research provides an empirical analysis of Chinese news content through the proposed TF-Density and Dynamic Centroid Summarization algorithms. It focusses on improving the means of summarizing a set of news stories to appear for browsing on a single screen and carries implications for innovative word measurements in practice.
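As a small worked example of the evaluation described above, the sketch below computes recall, precision, and F-measure for an extractive summarizer by comparing the set of sentence indices a system selects against a gold-standard selection. It is a generic illustration in Python, not the paper's TF-Density or Dynamic Centroid Summarization code, and the indices are invented.

    def prf(selected, gold):
        """Precision, recall, and F-measure over sets of selected sentence indices."""
        selected, gold = set(selected), set(gold)
        tp = len(selected & gold)
        precision = tp / len(selected) if selected else 0.0
        recall = tp / len(gold) if gold else 0.0
        f = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        return precision, recall, f

    # Hypothetical case: the system picked sentences 0, 2, 5; the gold summary uses 0, 1, 2, 7.
    print(prf([0, 2, 5], [0, 1, 2, 7]))   # approximately (0.667, 0.5, 0.571)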
2

ber, Bam, and Micah Jason. "News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages." Bonfring International Journal of Data Mining 7, no. 2 (2017): 11–15. http://dx.doi.org/10.9756/bijdm.8339.

3

M., Nafees Muneera, and P. Sriramya. "Extractive Text Summarization for Social News using Hybrid Techniques in Opinion Mining." International Journal of Engineering and Advanced Technology (IJEAT) 9, no. 3 (2020): 2109–15. https://doi.org/10.35940/ijeat.B3356.029320.

Abstract:
Presently, almost all enterprises are building text data in abundance, savoring the benefits of the big data concept, but in reality it is not practically possible to go through all of this data and these documents for decision making because of time constraints. There is therefore an intense need for an approach that can summarize the complete textual content and serve as an alternative to the original. By adopting such summarization approaches, the accuracy of data retrieval over summarized content via search queries can be enhanced compared to searching over the broad range of original textual content. Many text summarization techniques have been formulated, each with its own pros and cons. The present work focuses on a comprehensive review of extractive text summarization methods for news, also taking into account data appended dynamically. It recommends a hybrid text summarization technique that blends CRF (conditional random fields) and LSA (latent semantic analysis), yielding a cohesive, low-redundancy, coherent summary with in-depth information. The hybrid technique extracts five sentence types: positive and negative, statements, questions, suggestions, and comments. LSA extracts hidden semantic structures within words and sentences and is commonly utilized in the summarization process. CRF, a statistical modeling technique, adopts machine learning (ML) to offer structured detection and provides multiple options for evaluating opinion summarization, thereby identifying the most appropriate algorithm for news text summarization given the heavy volume of the datasets.
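To make the LSA side of such a hybrid concrete, here is a minimal sketch of LSA-style extractive selection with scikit-learn: sentences are projected into a latent semantic space via truncated SVD and the most salient ones are kept. It only illustrates the LSA component, not the CRF-LSA pipeline proposed in the paper, and the toy sentences are invented.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    sentences = [
        "The council approved the new public transport budget on Tuesday.",
        "Residents welcomed the decision to expand bus routes.",
        "Unrelated celebrity gossip dominated social media the same day.",
        "The budget allocates extra funds for night services.",
    ]

    X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    svd = TruncatedSVD(n_components=2, random_state=0)
    latent = svd.fit_transform(X)              # sentence coordinates in latent semantic space

    scores = np.linalg.norm(latent, axis=1)    # salience: magnitude in the latent space
    top = sorted(np.argsort(scores)[-2:])      # keep the two most salient, in original order
    print([sentences[i] for i in top])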
4

Li, Chih-Yuan, Soon Ae Chun, and James Geller. "Perspective-Based Microblog Summarization." Information 16, no. 4 (2025): 285. https://doi.org/10.3390/info16040285.

Abstract:
Social media allows people to express and share a variety of experiences, opinions, beliefs, interpretations, or viewpoints on a single topic. Summarizing a collection of social media posts (microblogs) on one topic may be challenging and can result in an incoherent summary due to multiple perspectives from different users. We introduce a novel approach to microblog summarization, the Multiple-View Summarization Framework (MVSF), designed to efficiently generate multiple summaries from the same social media dataset depending on chosen perspectives and deliver personalized and fine-grained summaries. The MVSF leverages component-of-perspective computing, which can recognize the perspectives expressed in microblogs, such as sentiments, political orientations, or unreliable opinions (fake news). The perspective computing can filter social media data to summarize them according to specific user-selected perspectives. For the summarization methods, our framework implements three extractive summarization methods: Entity-based, Social Signal-based, and Triple-based. We conduct comparative evaluations of MVSF summarizations against state-of-the-art summarization models, including BertSum, SBert, T5, and Bart-Large-CNN, by using a gold-standard BBC news dataset and Rouge scores. Furthermore, we utilize a dataset of 18,047 tweets about COVID-19 vaccines to demonstrate the applications of MVSF. Our contributions include the innovative approach of using user perspectives in summarization methods as a unified framework, capable of generating multiple summaries that reflect different perspectives, in contrast to prior approaches of generating one-size-fits-all summaries for one dataset. The practical implication of MVSF is that it offers users diverse perspectives from social media data. Our prototype web application is also implemented using ChatGPT to show the feasibility of our approach.
5

Arya, Chandrakala, Manoj Diwakar, Prabhishek Singh, Vijendra Singh, Seifedine Kadry, and Jungeun Kim. "Multi-Document News Web Page Summarization Using Content Extraction and Lexical Chain Based Key Phrase Extraction." Mathematics 11, no. 8 (2023): 1762. http://dx.doi.org/10.3390/math11081762.

Abstract:
In the area of text summarization, there have been significant advances recently. In the meantime, the current trend in text summarization is focused more on news summarization. Therefore, developing a synthesis approach capable of extracting, comparing, and ranking sentences is vital to create a summary of various news articles in the context of erroneous online data. It is necessary, however, for the news summarization system to be able to deal with multi-document summaries due to content redundancy. This paper presents a method for summarizing multi-document news web pages based on similarity models and sentence ranking, where relevant sentences are extracted from the original article. English-language articles are collected from five news websites that cover the same topic and event. According to our experimental results, our approach provides better results than other recent methods for summarizing news.
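The redundancy problem raised here is often handled with a greedy, MMR-style selection step: each pick favours sentences similar to the document centroid while penalizing similarity to sentences already chosen. The sketch below shows that general idea with scikit-learn TF-IDF vectors; it is not the ranking model used in the paper, and the lambda weight and toy sentences are assumptions.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def mmr_select(sentences, k=2, lam=0.7):
        X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
        centroid = np.asarray(X.mean(axis=0))               # document-collection centroid
        relevance = cosine_similarity(X, centroid).ravel()
        pairwise = cosine_similarity(X)
        chosen = []
        while len(chosen) < k:
            best, best_score = None, float("-inf")
            for i in range(len(sentences)):
                if i in chosen:
                    continue
                redundancy = max((pairwise[i][j] for j in chosen), default=0.0)
                score = lam * relevance[i] - (1 - lam) * redundancy   # MMR trade-off
                if score > best_score:
                    best, best_score = i, score
            chosen.append(best)
        return [sentences[i] for i in sorted(chosen)]

    articles = [
        "Heavy rain flooded several streets in the city centre on Monday.",
        "Monday's heavy rain caused flooding across the downtown area.",   # near-duplicate
        "Authorities opened two shelters for displaced residents.",
    ]
    print(mmr_select(articles, k=2))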
6

Arora, Amita, Ashlesha Gupta, Manvi Siwach, et al. "Web-Based News Straining and Summarization Using Machine Learning Enabled Communication Techniques for Large-Scale 5G Networks." Wireless Communications and Mobile Computing 2022 (June 23, 2022): 1–15. http://dx.doi.org/10.1155/2022/3792816.

Abstract:
In recent times, text summarization has gained enormous attention from the research community. Among the many uses of natural language processing, text summarization has emerged as a critical component in information retrieval. In particular, within the past two decades, many attempts have been undertaken by researchers to provide robust, useful summaries of their findings. Text summarizing may be described as automatically constructing a summary version of a given document while keeping the most important information included within the content itself. This method also aids users in quickly grasping the fundamental notions of information sources. The current trend in text summarizing, on the other hand, is increasingly focused on the area of news summaries. The first work in summarizing was done using a single-document summary as a starting point. The summarizing of a single document generates a summary of a single paper. As research advanced, mainly due to the vast quantity of information available on the internet, the concept of multidocument summarization evolved. Multidocument summarization generates summaries from a large number of source papers that are all about the same subject or the same event. Because of content duplication, however, news summarization systems do not cope well with multidocument news summarization. Using the Naive Bayes classifier for classification, news websites were distinguished from non-news web pages by extracting content, structure, and URL characteristics; the classifier was then used to differentiate between the two groups. A comparison is also made between the Naive Bayes classifier and the SMO and J48 classifiers on the same dataset, and the findings demonstrate that it performs much better than the other two. The important content is then extracted from the correctly classified newscast web pages, and this extracted relevant content is used for keyphrase extraction from the news articles. A keyphrase can be a single word or a combination of words representing a significant concept of the news article. Our proposed approach to keyphrase extraction is based on identifying candidate phrases from the news articles and choosing the highest-weight candidate phrases using a weight formula. The weight formula includes features such as TF-IDF, phrase position, and the construction of lexical chains to represent the semantic relations between words using WordNet. The proposed approach shows promising results compared to the other existing techniques.
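As a rough sketch of the news versus non-news classification step, the snippet below trains a multinomial Naive Bayes model on TF-IDF features of page text with scikit-learn. The tiny labelled examples are invented, and the paper's full feature set (structure and URL characteristics in addition to content) is not reproduced here.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    pages = [
        "Breaking: parliament votes on the new tax bill tonight",        # news
        "Minister announces flood relief package for farmers",           # news
        "Sign up for our newsletter and manage your account settings",   # non-news
        "Privacy policy and cookie preferences for this website",        # non-news
    ]
    labels = ["news", "news", "nonnews", "nonnews"]

    clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
    clf.fit(pages, labels)
    print(clf.predict(["Election results expected late on Sunday"]))     # -> ['news']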
7

Darnoto, Brian Rizqi Paradisiaca, Daniel Siahaan, and Diana Purwitasari. "Automated Detection of Persuasive Content in Electronic News." Informatics 10, no. 4 (2023): 86. http://dx.doi.org/10.3390/informatics10040086.

Abstract:
Persuasive content in online news contains elements that aim to persuade its readers and may not necessarily include factual information. Since a news article has only some sentences that indicate persuasiveness, it is quite challenging to differentiate news with or without persuasive content. Recognizing persuasive sentences with a text summarization and classification approach is important to understand persuasive messages effectively. Text summarization identifies arguments and key points, while classification separates persuasive sentences based on the linguistic and semantic features used. Our proposed architecture uses text summarization to shorten sentences without persuasive content and then classifier models to detect those with persuasive indications. In this paper, we compare the performance of latent semantic analysis (LSA) and TextRank as text summarization methods, of which TextRank outperformed LSA in all trials, as well as two classifiers, a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network. We have prepared a dataset (±1,700 items, manually labeled for persuasiveness) consisting of news articles written in the Indonesian language collected from a nationwide electronic news portal. Comparative studies in our experimental results show that the TextRank–BERT–BiLSTM model achieved the highest accuracy of 95% in detecting persuasive news. The text summarization methods were able to generate detailed and precise summaries of the news articles, and the deep learning models were able to effectively differentiate between persuasive news and real news.
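For readers unfamiliar with TextRank, the following minimal sketch shows the usual graph formulation: sentences become nodes, TF-IDF cosine similarities become edge weights, and PageRank scores decide which sentences survive. It is a generic illustration with networkx and scikit-learn, not the TextRank-BERT-BiLSTM pipeline evaluated in the paper, and the example sentences are invented.

    import networkx as nx
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def textrank(sentences, k=2):
        X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
        sim = cosine_similarity(X)                   # sentence-to-sentence similarity matrix
        graph = nx.from_numpy_array(sim)             # weighted, fully connected sentence graph
        scores = nx.pagerank(graph, weight="weight")
        ranked = sorted(scores, key=scores.get, reverse=True)[:k]
        return [sentences[i] for i in sorted(ranked)]   # keep document order

    doc = [
        "The government unveiled a new education reform on Friday.",
        "The reform increases funding for rural schools.",
        "Officials said teacher training will also be expanded.",
        "Critics argue the plan ignores urban classrooms.",
    ]
    print(textrank(doc))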
8

Falahah, Falahah, Ari Fajar Santoso, and Abdullah Fajar. "Implementasi Tunelling pada Perancangan Sistem Peringkasan dan Klasifikasi Berita Otomatis menggunakan Textrank dan KNN." TEMATIK 11, no. 1 (2024): 25–32. http://dx.doi.org/10.38204/tematik.v11i1.1840.

Abstract:
News summarization is very important in the news analysis process. However, in the summarization process there are often obstacles, such as the large number of news stories and the need for news classification. This research aims to build a simple web-based system that can be used to summarize and classify news, which will be very useful in the news analysis process. The proposed summarization method is TextRank, and the news classification method is KNN. This system is expected to provide an automatic summarization function to make it easier to analyze news content. The data used as the basis for classification modeling is three months of sports news, and the classification determines whether the news belongs to one of three branches of sport, namely football, racket sports, or basketball. Testing of the summarization model using TextRank was carried out by applying ROUGE-1 and ROUGE-2, with results of 0.79 and 0.67. Meanwhile, testing the classification model using KNN yields 0.9866 for k=3 and 0.9666 for k=5, so k=3 is used. The system is built using a web scraping library, TextRank, stopwords from PySastrawi, scikit-learn for the classification module using the KNN algorithm, and ngrok for publishing the web-based application. By using ngrok, we can expose the application through the internet with a temporary public URL, without requiring hosting.
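A minimal version of the KNN classification step can be sketched with scikit-learn as below, using TF-IDF features and k=3 as reported in the paper. The tiny Indonesian-flavoured training snippets and their labels are invented for illustration and stand in for the three-month sports news corpus.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    texts = [
        "gol kemenangan dicetak pada menit akhir pertandingan liga",        # football
        "striker itu mencetak dua gol dalam laga tandang",                  # football
        "petenis unggulan menang straight set di babak final",              # racket sports
        "kejuaraan bulu tangkis ganda putra berakhir tiga gim",             # racket sports
        "tim basket menang lewat tembakan tiga angka di kuarter akhir",     # basketball
        "point guard mencatat double-double saat melawan juara bertahan",   # basketball
    ]
    labels = ["football", "football", "racket", "racket", "basketball", "basketball"]

    knn = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=3))
    knn.fit(texts, labels)
    print(knn.predict(["kiper melakukan penyelamatan gemilang sebelum gol penentu"]))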
9

Kim, Myeong-Kwon, and Sangrok Lee. "Implementation of KoBART-Based Real-Time Long-News Summarization System Using Text Segmentation." Korea Industrial Technology Convergence Society 29, no. 3 (2024): 27–35. http://dx.doi.org/10.29279/jitr.k.2024.29.3.27.

Abstract:
In this study, a real-time long-news summarization system is implemented based on the KoBART model. Owing to its characteristics, the KoBART model cannot summarize news with a token length of 1024 or more. Hence, we implemented a method of dividing long news into paragraphs, summarizing the divided paragraphs, and then re-summarizing the summarized sentences. First, we evaluated the performance using an AI Hub dataset to validate the implemented two-stage summarization method. However, because the token length of most of the news provided in the AI Hub dataset is 1024 or less, we analyzed the performance for long news by applying a dataset provided by Hugging Face with token lengths of 1024 or more. When summarizing long news with a token length of 1024 or more by dividing it into 512-token paragraphs, the average ROUGE score is 33.99% and the runtime required for summarization is 0.8492 s. Therefore, we confirmed that the implemented long-news summarization system can provide real-time services, even for long news with a token length of 1024 or more.
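The two-stage idea (summarize fixed-size chunks, then re-summarize the concatenated partial summaries) can be sketched with the Hugging Face transformers pipeline as below. The English BART model is only a stand-in for KoBART, the word-based chunking is a crude substitute for 512-token segmentation, and the length limits are arbitrary assumptions.

    from transformers import pipeline

    # English stand-in model; the paper uses KoBART, whose input is limited to 1024 tokens.
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    def chunk_text(text, max_words=400):
        # Crude word-based chunking as a substitute for 512-token paragraph segmentation.
        words = text.split()
        return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

    def two_stage_summary(article):
        # Stage 1: summarize each chunk of the long article.
        partials = [summarizer(c, max_length=120, min_length=30, truncation=True)[0]["summary_text"]
                    for c in chunk_text(article)]
        # Stage 2: re-summarize the concatenated partial summaries.
        joined = " ".join(partials)
        return summarizer(joined, max_length=150, min_length=40, truncation=True)[0]["summary_text"]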
10

Lwin, Soe Soe, and Khin Thandar Nwet. "Myanmar news summarization using different word representations." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 3 (2021): 2285. http://dx.doi.org/10.11591/ijece.v11i3.pp2285-2292.

Abstract:
There is an enormous amount of information available in different forms of sources and genres. In order to extract useful information from a massive amount of data, an automatic mechanism is required. Text summarization systems assist with content reduction by keeping the important information and filtering the non-important parts of the text. Good document representation is really important in text summarization to get relevant information. Bag-of-words cannot capture word similarity in terms of syntactic and semantic relationships. Word embedding can give good document representation to capture and encode the semantic relations between words. Therefore, a centroid based on word embedding representation is employed in this paper. Myanmar news summarization based on different word embeddings is proposed. In this paper, Myanmar local and international news are summarized using a centroid-based word embedding summarizer, exploiting the effectiveness of the word embedding representation approach. Experiments were done on Myanmar local and international news datasets using different word embedding models, and the results are compared with the performance of bag-of-words summarization. Centroid summarization using word embedding performs comprehensively better than centroid summarization using bag-of-words.
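A bare-bones centroid-embedding summarizer looks roughly like the sketch below: averaged word vectors give each sentence an embedding, the mean of the sentence embeddings is the document centroid, and the sentences closest to the centroid are selected. Small English GloVe vectors from gensim are used purely as a stand-in for the Myanmar embeddings trained in the paper.

    import numpy as np
    import gensim.downloader as api

    wv = api.load("glove-wiki-gigaword-50")   # small pretrained English vectors as a stand-in

    def sent_vec(sentence):
        vecs = [wv[w] for w in sentence.lower().split() if w in wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

    def centroid_summary(sentences, k=2):
        mat = np.array([sent_vec(s) for s in sentences])
        centroid = mat.mean(axis=0)
        sims = mat @ centroid / (np.linalg.norm(mat, axis=1) * np.linalg.norm(centroid) + 1e-9)
        top = sorted(np.argsort(sims)[-k:])   # most centroid-like sentences, in original order
        return [sentences[i] for i in top]

    print(centroid_summary([
        "The storm damaged dozens of homes along the coast.",
        "Rescue teams evacuated residents overnight.",
        "A completely unrelated celebrity interview aired today.",
    ]))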
11

Rananavare, Laxmi B., and P. Venkata Subba Reddy. "Automatic News Article Summarization." International Journal of Computer Sciences and Engineering 6, no. 2 (2018): 230–37. http://dx.doi.org/10.26438/ijcse/v6i2.230237.

12

Javed, Hira, Nadeem Akhtar, and M. M. Sufyan Beg. "Multimodal news document summarization." Journal of Information and Optimization Sciences 45, no. 4 (2024): 959–68. http://dx.doi.org/10.47974/jios-1619.

Abstract:
With the increase in multimedia content, the domain of multimodal processing is experiencing constant growth. The question of whether combining these modalities is beneficial may come up. In this work, we investigate this by working on multi-modal content for obtaining quality summaries. We have conducted several experiments on the extractive summarization process employing asynchronous text, audio, image, and video. Information present in the multimedia content has been leveraged to bridge the semantic gaps between different modes. Vision Transformers and BERT have been used for the image-matching and similarity-checking tasks. Furthermore, audio transcriptions have been used for incorporating the audio information in the summaries. The obtained news summaries have been evaluated with Rouge Score and a comparative analysis has been done.
13

Ahuir, Vicent, José-Ángel González, Lluís-F. Hurtado, and Encarna Segarra. "Abstractive Summarizers Become Emotional on News Summarization." Applied Sciences 14, no. 2 (2024): 713. http://dx.doi.org/10.3390/app14020713.

Abstract:
Emotions are central to understanding contemporary journalism; however, they are overlooked in automatic news summarization. Actually, summaries are an entry point to the source article that could favor some emotions to captivate the reader. Nevertheless, the emotional content of summarization corpora and the emotional behavior of summarization models are still unexplored. In this work, we explore the usage of established methodologies to study the emotional content of summarization corpora and the emotional behavior of summarization models. Using these methodologies, we study the emotional content of two widely used summarization corpora: Cnn/Dailymail and Xsum, and the capabilities of three state-of-the-art transformer-based abstractive systems for eliciting emotions in the generated summaries: Bart, Pegasus, and T5. The main significant findings are as follows: (i) emotions are persistent in the two summarization corpora, (ii) summarizers approach moderately well the emotions of the reference summaries, and (iii) more than 75% of the emotions introduced by novel words in generated summaries are present in the reference ones. The combined use of these methodologies has allowed us to conduct a satisfactory study of the emotional content in news summarization.
14

Prakash, Anand. "Enhancing News Article Summarization with Machine Learning." International Journal for Global Academic & Scientific Research 3, no. 4 (2025): 20–34. https://doi.org/10.55938/ijgasr.v3i4.152.

Abstract:
Due to the increasing volume of news content available online, automated summarization of news articles has gained significant attention in recent years. This paper presents a comprehensive overview of the methodology and implementation of a machine learning-based approach for summarizing news articles using Python. The proposed approach involves preprocessing the text, extracting relevant features, and training a machine learning model to generate concise and informative summaries. The implementation utilizes Python libraries such as NLTK and TensorFlow for text processing and model training. The study also discusses the limitations of the approach and suggests future research directions to enhance the performance of automated news summarization systems. Our research demonstrates the effectiveness of the machine learning-based approach in generating summaries that are coherent and informative. Overall, the research demonstrates the potential of machine learning in automating the summarization of news articles and suggests avenues for further improvement in this field. The preprocessing steps include tokenization, stop-word removal, and stemming to prepare the text for feature extraction. The study compares the performance of the proposed approach with baseline models, showing its superiority in terms of summary quality. The research contributes to the field of automated news summarization by providing a practical and effective approach that can be used to summarize news articles automatically, saving time and effort for news readers and editors alike.
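The preprocessing steps named here (tokenization, stop-word removal, and stemming) map directly onto standard NLTK calls; a minimal sketch follows. It illustrates the generic steps only and is not the paper's implementation.

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize

    nltk.download("punkt", quiet=True)
    nltk.download("punkt_tab", quiet=True)     # needed by newer NLTK releases
    nltk.download("stopwords", quiet=True)

    def preprocess(text):
        tokens = word_tokenize(text.lower())                                  # tokenization
        stops = set(stopwords.words("english"))
        tokens = [t for t in tokens if t.isalpha() and t not in stops]        # stop-word removal
        stemmer = PorterStemmer()
        return [stemmer.stem(t) for t in tokens]                              # stemming

    print(preprocess("Automated summarization of news articles has gained significant attention."))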
15

Kartamanah, Fatih Fauzan, Aldy Rialdy Atmadja, and Ichsan Budiman. "Analyzing PEGASUS Model Performance with ROUGE on Indonesian News Summarization." sinkron 9, no. 1 (2025): 31–42. https://doi.org/10.33395/sinkron.v9i1.14303.

Abstract:
Text summarization technology has been rapidly advancing, playing a vital role in improving information accessibility and reducing reading time within Natural Language Processing (NLP) research. There are two primary approaches to text summarization: extractive and abstractive. Extractive methods focus on selecting key sentences or phrases directly from the source text, while abstractive summarization generates new sentences that capture the essence of the content. Abstractive summarization, although more flexible, poses greater challenges in maintaining coherence and contextual relevance due to its complexity. This study aims to enhance automated abstractive summarization for Indonesian-language online news articles by employing the PEGASUS (Pre-training with Extracted Gap-sentences Sequences for Abstractive Summarization) model, which leverages an encoder-decoder architecture optimized for summarization tasks. The dataset utilized consists of 193,883 articles from Liputan6, a prominent Indonesian news platform. The model was fine-tuned and evaluated using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric, focusing on F-1 scores for ROUGE-1, ROUGE-2, and ROUGE-L. The results demonstrated the model's ability to generate coherent and informative summaries, achieving ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.439, 0.183, and 0.406, respectively. These findings underscore the potential of the PEGASUS model in addressing the challenges of abstractive summarization for low-resource languages like Indonesian language, offering a significant contribution to summarization quality for online news content.
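For reference, ROUGE F-1 scores of the kind reported here can be computed with the rouge-score package as sketched below. The reference and generated strings are placeholders, and the built-in stemmer is English-oriented, so it is left off for Indonesian text.

    # pip install rouge-score
    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)
    reference = "placeholder gold summary of the news article"       # invented strings
    generated = "placeholder summary produced by the model"
    scores = scorer.score(reference, generated)
    for name, s in scores.items():
        print(name, round(s.fmeasure, 3))   # the F-1 component, as reported in the paper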
16

Tran, Mai Vu, Hoang Quynh Le, Duy Cat Can, and Quoc An Nguyen. "A data challenge for Vietnamese abstractive multi-document summarization." Journal of Computer Science and Cybernetics 40, no. 4 (2024): 347–62. https://doi.org/10.15625/1813-9663/18291.

Abstract:
This paper provides an overview of the Vietnamese abstractive multi-document summarization shared task (AbMuSu) for Vietnamese news, which is hosted at the 9th annual workshop on Vietnamese Language and Speech Processing (VLSP 2022). The main goal of this shared task is to develop automated summarization systems that can generate abstractive summaries for a given set of documents on a specific topic. The input consists of several news documents on the same topic, and the output is a related abstractive summary. The focus of the AbMuSu shared task is solely on Vietnamese news summarization. To this end, a human-annotated dataset comprising 1,839 documents in 600 clusters, collected from Vietnamese news in 8 categories, has been developed. Participating models are evaluated and ranked based on their ROUGE2-F1 score, which is the most common evaluation metric for document summarization problems.
17

Karnan, K., and L. R. Aravind Babu. "Text Mining and Natural Language Processing Frameworks for Enhanced Fake News Detection, Sentiment Analysis, and Automated Summarization in Social Media." International Journal of Basic and Applied Sciences 14, no. 2 (2025): 107–12. https://doi.org/10.14419/hgj17c14.

Abstract:
Efficient text summarization, public sentiment analysis, and fake news detection have become difficult tasks due to the exponential growth of digital content. Sentiment analysis aids in assessing trends and public opinion, while fake news detection is crucial for combating false information. To alleviate information overload, automated text summarization extracts important information from long documents. This study examines three sophisticated natural language processing (NLP) models: 1) a BiLSTM-based sentiment analysis model that uses Word2Vec embeddings and bidirectional LSTM units to better understand context and classify text into positive, negative, or neutral sentiments; 2) a BiLSTM-CNN-based fake news detection model that combines a 1D CNN for spatial pattern recognition with a BiLSTM for sequential feature extraction, followed by a sigmoid classifier, to differentiate real from fake news; and 3) a hybrid extractive-abstractive summarization model that uses TF-IDF-based sentence weighting for extractive summarization and a Transformer-based encoder-decoder for abstractive summarization. The outcomes are measured using metrics such as BLEU and ROUGE. These models improve the online user experience, decision-making, and misinformation detection in text mining applications.
18

Zhang, Xin, Qiyi Wei, Bin Zheng, Jiefeng Liu, and Pengzhou Zhang. "FrameSum: Leveraging Framing Theory and Deep Learning for Enhanced News Text Summarization." Applied Sciences 14, no. 17 (2024): 7548. http://dx.doi.org/10.3390/app14177548.

Abstract:
Framing theory is a widely accepted theoretical framework in the field of news communication studies, frequently employed to analyze the content of news reports. This paper innovatively introduces framing theory into the text summarization task and proposes a news text summarization method based on framing theory to address the global context of rapidly increasing speed and scale of information dissemination. Traditional text summarization methods often overlook the implicit deep-level semantic content and situational frames in news texts, and the method proposed in this paper aims to fill this gap. Our deep learning-based news frame identification module can automatically identify frame elements in the text and predict the dominant frame of the text. The frame-aware summarization generation model (FrameSum) can incorporate the identified frame feature into the text representation and attention mechanism, ensuring that the generated summary focuses on the core content of the news report while maintaining high information coverage, readability, and objectivity. Through empirical studies on the standard CNN/Daily Mail dataset, we found that this method performs significantly better in improving summary quality and maintaining the accuracy of news facts.
19

Soe, Soe Lwin, and Thandar Nwet Khin. "Myanmar news summarization using different word representations." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 3 (2021): 2285–92. https://doi.org/10.11591/ijece.v11i3.pp2285-2292.

Abstract:
There is an enormous amount of information available in different forms of sources and genres. In order to extract useful information from a massive amount of data, an automatic mechanism is required. Text summarization systems assist with content reduction by keeping the important information and filtering the non-important parts of the text. Good document representation is really important in text summarization to get relevant information. Bag-of-words cannot capture word similarity in terms of syntactic and semantic relationships. Word embedding can give good document representation to capture and encode the semantic relations between words. Therefore, a centroid based on word embedding representation is employed in this paper. Myanmar news summarization based on different word embeddings is proposed. In this paper, Myanmar local and international news are summarized using a centroid-based word embedding summarizer, exploiting the effectiveness of the word embedding representation approach. Experiments were done on Myanmar local and international news datasets using different word embedding models, and the results are compared with the performance of bag-of-words summarization. Centroid summarization using word embedding performs comprehensively better than centroid summarization using bag-of-words.
20

Upadhye, Ashwinee. "Automatic News Summarization using Web Scraping." International Journal of Scientific Research in Engineering and Management 08, no. 05 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem34832.

Abstract:
In an era of information overload, accessing relevant news content efficiently is crucial. However, language barriers often hinder users from accessing news articles in languages they understand. To address this challenge, we propose an automatic news summarization and translation system. This system aggregates news articles from multiple URLs, summarizes them into concise summaries, and translates the summaries into the user's preferred language. By leveraging web scraping, natural language processing, and translation integration techniques, our system aims to provide users with access to relevant news content in their preferred language, thereby overcoming language barriers and fostering global connectivity. Keywords— automatic news summarization, translation integration, web scraping, natural language processing, language barriers, information accessibility, global connectivity, user preferences, multilingual content, web-based news aggregation
21

Ramón-Hernández, Alejandro, Alfredo Simón-Cuevas, María Matilde García Lorenzo, Leticia Arco, and Jesús Serrano-Guerrero. "Towards Context-Aware Opinion Summarization for Monitoring Social Impact of News." Information 11, no. 11 (2020): 535. http://dx.doi.org/10.3390/info11110535.

Abstract:
Opinion mining and summarization of the increasing user-generated content on different digital platforms (e.g., news platforms) are playing significant roles in the success of government programs and initiatives in digital governance, through extracting and analyzing citizens' sentiments for decision-making. Opinion mining provides the sentiment from contents, whereas summarization aims to condense the most relevant information. However, most of the reported opinion summarization methods are conceived to obtain generic summaries, and the context that originates the opinions (e.g., the news) has not usually been considered. In this paper, we present a context-aware opinion summarization model for monitoring the opinions generated from news. In this approach, topic modeling and the news content are combined to determine the “importance” of opinionated sentences. The effectiveness of the different developed settings of our model was evaluated through several experiments carried out over Spanish news and opinions collected from a real news platform. The obtained results show that our model can generate opinion summaries focused on essential aspects of the news, as well as cover the main topics in the opinionated texts well. The integration of term clustering, word embeddings, and the similarity-based sentence-to-news scoring turned out to be the most promising and effective setting of our model.
22

Kopeć, Mateusz. "Three-step coreference-based summarizer for Polish news texts." Poznan Studies in Contemporary Linguistics 55, no. 2 (2019): 397–443. http://dx.doi.org/10.1515/psicl-2019-0015.

Abstract:
This article addresses the problem of automatic summarization of press articles in Polish. The main novelty of this research lies in the proposal of a three-step summarization algorithm which benefits from using coreference information. In the related work section, all coreference-based approaches to summarization are presented. Then we describe in detail all publicly available summarization tools developed for the Polish language. We state the problem of single-document press article summarization for Polish, describing the training and evaluation dataset: the Polish Summaries Corpus. Next, a new coreference-based extractive summarization system, NICOLAS, is introduced. Its algorithm utilises advanced third-party preprocessing tools to extract the coreference information from the text to be summarized. This information is transformed into a complex set of features related to coreference concepts (mentions and coreference clusters) that are used for training the summarization system (on the basis of a manually prepared gold summaries corpus). The proposed solution is compared to the best publicly available summarization systems for the Polish language and to two state-of-the-art tools developed for English but adapted to Polish for this article. The NICOLAS summarization system obtains the best scores, for selected metrics outperforming the other systems in a statistically significant way. The evaluation also contains the calculation of interesting upper bounds: human performance and the theoretical upper bound.
23

Wu, Xindong, Fei Xie, Gongqing Wu, and Wei Ding. "PNFS: Personalized Web News Filtering and Summarization." International Journal on Artificial Intelligence Tools 22, no. 05 (2013): 1360007. http://dx.doi.org/10.1142/s0218213013600075.

Abstract:
Information on the World Wide Web is congested with large amounts of news content. Recommending, filtering, and summarization of Web news have become hot topics of research in Web intelligence, aiming to find interesting news for users and give concise content for reading. This paper presents our research on developing the Personalized News Filtering and Summarization system (PNFS). An embedded learning component of PNFS induces a user interest model and recommends personalized news. Two Web news recommendation methods are proposed to keep tracking news and find topically interesting news for users. A keyword knowledge base is maintained and provides real-time updates to reflect the news topic information and the user's interest preferences. Non-news content irrelevant to the news Web page is filtered out. A keyword extraction method based on lexical chains is proposed that uses semantic similarity and the relatedness degree to represent the semantic relations between words. Word sense disambiguation is also performed in the built lexical chains. Experiments on Web news pages and journal articles show that the proposed keyword extraction method is effective. An example run of our PNFS system demonstrates the superiority of this Web intelligence system.
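The WordNet-based semantic similarity that underpins such lexical chains can be illustrated with NLTK as below. This is a generic path-similarity example, not the relatedness measure or word sense disambiguation used in PNFS, and the word pairs are arbitrary.

    import nltk
    from nltk.corpus import wordnet as wn

    nltk.download("wordnet", quiet=True)

    def semantic_similarity(word_a, word_b):
        # Highest path similarity over all noun senses of the two words (0..1, 0 if unrelated).
        synsets_a = wn.synsets(word_a, pos=wn.NOUN)
        synsets_b = wn.synsets(word_b, pos=wn.NOUN)
        scores = [sa.path_similarity(sb) for sa in synsets_a for sb in synsets_b]
        return max([s for s in scores if s is not None], default=0.0)

    print(semantic_similarity("news", "article"))
    print(semantic_similarity("news", "banana"))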
24

Astuti, Rahma Hayuning, Muljono Muljono, and Sutriawan Sutriawan. "Indonesian News Text Summarization Using MBART Algorithm." Scientific Journal of Informatics 11, no. 1 (2024): 155–64. http://dx.doi.org/10.15294/sji.v11i1.49224.

Abstract:
Purpose: Technology advancements have led to the production of a large amount of textual data. There are numerous locations where one can find textual information sources, including blogs, news portals, and websites. Kompas, BBC, Liputan 6, CNN, and other news portals are a few websites that offer news in Indonesian. The purpose of this study was to explore the effectiveness of using mBART in text summarization for Bahasa Indonesia. Methods: This study uses mBART, a transformer architecture, and performs fine-tuning to generate news article summaries in Bahasa Indonesia. Evaluation was conducted using the ROUGE method to assess the quality of the summaries produced. Results: Evaluation using the ROUGE metric showed better results, with ROUGE-1 of 35.94, ROUGE-2 of 16.43, and ROUGE-L of 29.91. However, the performance of the model is still not optimal compared to existing text summarization models for other languages. Novelty: The novelty of this research lies in the use of mBART for text summarization, specifically adapted for Bahasa Indonesia. In addition, the findings also contribute to understanding the challenges and opportunities of improving text summarization techniques in the Indonesian context.
25

Wang, Chengyu, Xiaofeng He, and Aoying Zhou. "Event phase oriented news summarization." World Wide Web 21, no. 4 (2017): 1069–92. http://dx.doi.org/10.1007/s11280-017-0501-x.

26

Li, Zechao, Jinhui Tang, Xueming Wang, Jing Liu, and Hanqing Lu. "Multimedia News Summarization in Search." ACM Transactions on Intelligent Systems and Technology 7, no. 3 (2016): 1–20. http://dx.doi.org/10.1145/2822907.

27

Amato, Flora, Vincenzo Moscato, Antonio Picariello, Giancarlo Sperlí, Antonio D’Acierno, and Antonio Penta. "Semantic summarization of web news." Encyclopedia with Semantic Computing and Robotic Intelligence 01, no. 01 (2017): 1630006. http://dx.doi.org/10.1142/s2425038416300068.

Abstract:
In this paper, we present a general framework for retrieving relevant information from newspapers that exploits a novel summarization algorithm based on a deep semantic analysis of texts. In particular, we extract from each Web document a set of triples (subject, predicate, object) that are then used to build a summary through an unsupervised clustering algorithm exploiting the notion of semantic similarity. Finally, we leverage the centroids of clusters to determine the most significant summary sentences using some heuristics. Several experiments are carried out using the standard DUC methodology and ROUGE software and show how the proposed method outperforms several summarization systems in terms of recall and readability.
28

Fauzi, Ahmad. "Penerapan Algoritma Text Mining dan Lexrank dalam Meringkas Teks Secara Otomatis." Bulletin of Data Science 1, no. 2 (2022): 65–72. https://doi.org/10.47065/bulletinds.v1i2.1359.

Abstract:
The growth of online media and news has allowed researchers to automate work in the field of text summarization. News should offer a quick and concise account, but in reality digital news is not well organized and it takes a long time to find the essence of a story. Document summarization is an effective way to get information from a document without reading the entire document. However, document summarization for Indonesian is still relatively underdeveloped compared to other languages. This study develops automatic document summarization using a graph-based method, namely the LexRank algorithm, which is evaluated using Indonesian news data obtained from liputan6.com. The number of sentences extracted is 25%-50% of the total sentences in the document. The LexRank summary results, ordered by highest weight, are D2 = 1.433, D10 = 1.289, D3 = 1.253, ..., D8 = 0.673. The highest-weighted sentences are then arranged according to their original order to obtain the summary of the news.
29

Kevin, Sherilyn. "News Summarization of BBC Articles: A Multi-Category Approach." International Journal of Scientific Research in Engineering and Management 08, no. 01 (2024): 1–10. http://dx.doi.org/10.55041/ijsrem28129.

Abstract:
In this research project, we explore the application of advanced natural language processing techniques to automatically summarize news articles from the BBC. The dataset comprises five distinct categories— business, entertainment, politics, sport, and tech—each containing a wealth of information. Our primary goal is to develop an efficient and accurate news summarization system using state-of-the-art language models. We employ the Hugging Face Transformers library to create a summarization pipeline capable of extracting key information from lengthy news articles.
30

Hiray, Neha. "Newspaper Summarizer using Natural Language Processing and Machine Learning." International Journal of Scientific Research in Engineering and Management 09, no. 02 (2025): 1–9. https://doi.org/10.55041/ijsrem41838.

Abstract:
In the era of digital information, users are inundated with news articles from numerous sources, resulting in information overload and an overwhelming user experience. This research presents an advanced, real-time Newspaper Aggregator that utilizes Natural Language Processing (NLP) and Machine Learning (ML) techniques to collect, process, and personalize news articles from diverse sources in real-time. The aggregator’s architecture integrates several NLP models to achieve comprehensive news handling: topic modeling categorizes articles into predefined topics such as Politics, Sports, and Technology using Latent Dirichlet Allocation (LDA), while sentiment analysis, powered by BERT, classifies public sentiment as Positive, Negative, or Neutral, capturing nuanced perspectives. The system’s summarization module leverages PEGASUS and Text Rank to deliver coherent, concise summaries, improving information accessibility and reducing reading time. Additionally, the recommendation engine employs a hybrid filtering approach, combining collaborative and content-based filtering, to provide personalized news recommendations based on user history and article characteristics. Our methodology includes systematic data collection, text pre-processing, topic categorization, sentiment classification, summarization, and real-time recommendation, followed by rigorous evaluation. The aggregator achieves high accuracy across tasks: BERT-driven sentiment analysis achieves 92% accuracy, LDA models yield coherent topic clusters, and summarization evaluations produce a ROUGE-L score of 0.75, all of which underscore the system's reliability in managing dynamic news content. Performance testing indicates that this Newspaper Aggregator offers a significant improvement in user relevance and engagement compared to traditional keyword-based systems. Overall, this study establishes a foundation for intelligent, real-time news aggregation, providing users with a streamlined, personalized news experience. KEYWORDS: Real-time news aggregation, Natural Language Processing (NLP), Machine Learning (ML), topic modeling, sentiment analysis, BERT, Latent Dirichlet Allocation (LDA), text summarization, PEGASUS, Text Rank, recommendation systems, collaborative filtering, content-based filtering, personalized news, information overload, news categorization, user relevance, article classification, hybrid recommendation model.
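The LDA topic-categorization step can be sketched with scikit-learn as follows; the three toy documents and the choice of three topics are assumptions for illustration and do not reproduce the aggregator's trained models.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "parliament passed the new budget bill after a long debate",        # politics-flavoured
        "the striker scored twice as the team won the league match",        # sports-flavoured
        "the startup released a new smartphone with an improved chip",      # tech-flavoured
    ]

    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=3, random_state=0)
    doc_topics = lda.fit_transform(X)          # per-document topic distributions

    terms = vectorizer.get_feature_names_out()
    for k, component in enumerate(lda.components_):
        top_terms = [terms[i] for i in component.argsort()[-4:][::-1]]
        print(f"topic {k}: {top_terms}")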
31

More, Ojasvi Sanjay. "Newspaper Summarizer using NLP and ML." International Journal of Scientific Research in Engineering and Management 09, no. 06 (2025): 1–9. https://doi.org/10.55041/ijsrem50571.

Abstract:
Abstract—In the era of digital information, users are inundated with news articles from numerous sources, resulting in information overload and an overwhelming user experience. This research presents an advanced, real-time Newspaper Aggregator that utilizes Natural Language Processing (NLP) and Machine Learning (ML) techniques to collect, process, and personalize news articles from diverse sources in real-time. The aggregator’s architecture integrates several NLP models to achieve comprehensive news handling: topic modeling categorizes articles into predefined topics such as Politics, Sports, and Technology using Latent Dirichlet Allocation (LDA), while sentiment analysis, powered by BERT, classifies public sentiment as Positive, Negative, or Neutral, capturing nuanced perspectives. The system’s summarization module leverages PEGASUS and Text Rank to deliver coherent, concise summaries, improving information accessibility and reducing reading time. Additionally, the recommendation engine employs a hybrid filtering approach, combining collaborative and content-based filtering, to provide personalized news recommendations based on user history and article characteristics. Our methodology includes systematic data collection, text pre-processing, topic categorization, sentiment classification, summarization, and real-time recommendation, followed by rigorous evaluation. The aggregator achieves high accuracy across tasks: BERT-driven sentiment analysis achieves 92% accuracy, LDA models yield coherent topic clusters, and summarization evaluations produce a ROUGE-L score of 0.75, all of which underscore the system's reliability in managing dynamic news content. Performance testing indicates that this Newspaper Aggregator offers a significant improvement in user relevance and engagement compared to traditional keyword-based systems. Overall, this study establishes a foundation for intelligent, real-time news aggregation, providing users with a streamlined, personalized news experience. Keywords— Real-time news aggregation, Natural Language Processing (NLP), Machine Learning (ML), topic modeling, sentiment analysis, BERT, Latent Dirichlet Allocation (LDA), text summarization, PEGASUS, Text Rank, recommendation systems, collaborative filtering, content-based filtering, personalized news, information overload, news categorization, user relevance, article classification, hybrid recommendation model.
32

Sufi, Fahim, and Musleh Alsulami. "AI-Driven Chatbot for Real-Time News Automation." Mathematics 13, no. 5 (2025): 850. https://doi.org/10.3390/math13050850.

Abstract:
The rapid expansion of digital news sources has necessitated intelligent systems capable of filtering, analyzing, and deriving meaningful insights from vast amounts of information in real time. This study presents an AI-driven chatbot designed for real-time news automation, integrating advanced natural language processing techniques, knowledge graphs, and generative AI models to improve news summarization and correlation analysis. The chatbot processes over 1,306,518 news reports spanning from 25 September 2023 to 17 February 2025, categorizing them into 15 primary event categories and extracting key insights through structured analysis. By employing state-of-the-art machine learning techniques, the system enables real-time classification, interactive query-based exploration, and automated event correlation. The chatbot demonstrated high accuracy in both summarization and correlation tasks, achieving an average F1 score of 0.94 for summarization and 0.92 for correlation analysis. Summarization queries were processed within an average response time of 9 s, while correlation analyses required approximately 21 s per query. The chatbot’s ability to generate real-time, concise news summaries and uncover hidden relationships between events makes it a valuable tool for applications in disaster response, policy analysis, cybersecurity, and public communication. This research contributes to the field of AI-driven news analytics by bridging the gap between static news retrieval platforms and interactive conversational agents. Future work will focus on expanding multilingual support, enhancing misinformation detection, and optimizing computational efficiency for broader real-world applicability. The proposed chatbot stands as a scalable and adaptive solution for real-time decision support in dynamic information environments.
33

Apurva, D. Dhawale, B. Kulkarni Sonali, and M. Kumbhakarna Vaishali. "Automatic Pre-Processing of Marathi Text for Summarization." International Journal of Engineering and Advanced Technology (IJEAT) 10, no. 1 (2020): 230–34. https://doi.org/10.35940/ijeat.A1803.1010120.

Abstract:
Text summarization is a technique where the original large text is condensed into a smaller version without changing its abstract meaning. Text summarization is typically done for common foreign and regional languages, but infrequent work has been observed for the Marathi language. As the amount of e-content on the web is increasing drastically, users face difficulty in reading newspaper articles and extracting and sorting their different perspectives. We focus on educational, political, and sports news for summarization, which will be helpful for students who are appearing for competitive exams. This paper explores preprocessing techniques for Marathi e-news articles.
34

Abdullah, Moch Zawaruddin, and Chastine Fatichah. "Feature-based POS tagging and sentence relevance for news multi-document summarization in Bahasa Indonesia." Bulletin of Electrical Engineering and Informatics 11, no. 1 (2022): 541–49. http://dx.doi.org/10.11591/eei.v11i1.3275.

Abstract:
Sentence extraction in news document summarization determines representative sentences primarily by employing the news feature known as the news feature score (NeFS). NeFS can select meaningful sentences by analyzing the frequency and similarity of phrases, while neglecting grammatical information and sentence relevance to the title. The presence of instructive content is indicated by grammatical information carried by part of speech (POS). POS tagging is the process of giving a meaningful tag to each term based on qualified data and even surrounding words. Sentence relevance to the title is intended to determine the sentence's level of connectivity to the title in terms of both word-based and meaning-based similarity, primarily for news documents in Bahasa Indonesia. In this study, we present an alternative sentence weighting method by incorporating news features, POS tagging, and sentence relevance to the title. Sentence extraction based on news features, POS tagging, and sentence relevance is introduced to extract the representative sentences. The experimental results on 11 groups of Indonesian news documents are compared with the news feature score with grammatical information approach (NeFGIS) method. The proposed method achieved better results, with f-score increases for ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4 of 1.84%, 3.03%, 3.85%, and 2.08%, respectively.
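POS tagging of the kind used for the grammatical-information feature can be illustrated with NLTK's English tagger, as in the sketch below. The paper works on Bahasa Indonesia, so this is only a stand-in showing how content-bearing tags might be filtered.

    import nltk
    from nltk import pos_tag, word_tokenize

    nltk.download("punkt", quiet=True)
    nltk.download("punkt_tab", quiet=True)                        # newer NLTK releases
    nltk.download("averaged_perceptron_tagger", quiet=True)
    nltk.download("averaged_perceptron_tagger_eng", quiet=True)   # newer NLTK releases

    sentence = "The committee approved the new summarization benchmark on Monday."
    tags = pos_tag(word_tokenize(sentence))
    print(tags)   # e.g. [('The', 'DT'), ('committee', 'NN'), ('approved', 'VBD'), ...]

    # Keep only content-bearing tags (nouns, verbs, adjectives) as candidate informative terms.
    content = [(w, t) for w, t in tags if t.startswith(("NN", "VB", "JJ"))]
    print(content)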
35

Moch, Zawaruddin Abdullah, and Fatichah Chastine. "Feature-based POS tagging and sentence relevance for news multi-document summarization in Bahasa Indonesia." Bulletin of Electrical Engineering and Informatics 11, no. 1 (2022): 541–49. https://doi.org/10.11591/eei.v11i1.3275.

Abstract:
Sentence extraction in news document summarization determines representative sentences primarily by employing the news feature known as the news feature score (NeFS). NeFS can select meaningful sentences by analyzing the frequency and similarity of phrases, while neglecting grammatical information and sentence relevance to the title. The presence of instructive content is indicated by grammatical information carried by part of speech (POS). POS tagging is the process of giving a meaningful tag to each term based on qualified data and even surrounding words. Sentence relevance to the title is intended to determine the sentence's level of connectivity to the title in terms of both word-based and meaning-based similarity, primarily for news documents in Bahasa Indonesia. In this study, we present an alternative sentence weighting method by incorporating news features, POS tagging, and sentence relevance to the title. Sentence extraction based on news features, POS tagging, and sentence relevance is introduced to extract the representative sentences. The experimental results on 11 groups of Indonesian news documents are compared with the news feature score with grammatical information approach (NeFGIS) method. The proposed method achieved better results, with f-score increases for ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4 of 1.84%, 3.03%, 3.85%, and 2.08%, respectively.
36

Laksana, Made Dwiki Budi, AAIN Eka Karyawati, Luh Arida Ayu Rahning Putri, I. Wayan Santiyasa, Ngurah Agus Sanjaya ER, and I. Gusti Agung Gede Arya Kadnyanan. "Text Summarization terhadap Berita Bahasa Indonesia menggunakan Dual Encoding." JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) 11, no. 2 (2022): 339. http://dx.doi.org/10.24843/jlk.2022.v11.i02.p13.

Abstract:
Text summarization, or automatic text summarization, lets readers receive information quickly without having to read the entire news text, so readers gain more time to read other news texts. Text summarization can use two techniques, namely extractive and abstractive techniques. Abstractive techniques aim to produce summary sentences from concepts, in the way humans take the essence of a document they read. In this study, the authors build an abstractive summarization model using the Dual Encoding method consisting of GRUs. The evaluation was carried out using K-Fold Cross Validation with 5 folds. Using K-Fold Cross Validation, the ROUGE-1, ROUGE-2, and ROUGE-L values were 0.2127749, 0.119851, and 0.1880595, respectively. For testing with new data, the ROUGE-1, ROUGE-2, and ROUGE-L values were 0.3387776, 0.2395176, and 0.3077376, respectively.
APA, Harvard, Vancouver, ISO, and other styles
37

Kondath, Manju, David Peter Suseelan, and Sumam Mary Idicula. "Extractive summarization of Malayalam documents using latent Dirichlet allocation: An experience." Journal of Intelligent Systems 31, no. 1 (2022): 393–406. http://dx.doi.org/10.1515/jisys-2022-0027.

Full text
Abstract:
Automatic text summarization (ATS) extracts information from a source text and presents it to the user in a condensed form while preserving its primary content. Many text summarization approaches have been investigated in the literature for highly resourced languages. At the same time, ATS is a complicated and challenging task for under-resourced languages like Malayalam; the lack of a standard corpus and of adequate processing tools poses challenges for language processing. In the absence of a standard corpus, we have developed a dataset consisting of Malayalam news articles. This article proposes an extractive, topic modeling-based multi-document text summarization approach for Malayalam news documents. We first cluster the contents based on latent topics identified using the latent Dirichlet allocation (LDA) topic modeling technique. Then, by adopting the vector space model, the topic vectors and sentence vectors of the given document are generated, and sentences are ranked according to the relevance status value between the document's topic and sentence vectors. The resulting summary is optimized for non-redundancy. Evaluation results on Malayalam news articles show that the summary generated by the proposed method is closer to human-generated summaries than those of existing text summarization methods.
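
As a rough illustration of the LDA-based ranking idea described above, the sketch below trains a small gensim LDA model, builds topic vectors for each sentence and for the whole document, and ranks sentences by cosine similarity. The toy English sentences and the two-topic setting are assumptions for brevity; this is not the authors' Malayalam pipeline.

```python
# Minimal LDA-based sentence ranking: rank sentences by the cosine similarity of
# their topic distribution to the whole document's topic distribution.
import numpy as np
from gensim import corpora, models

sentences = [
    "The council approved the new flood-control budget on Monday.",
    "Heavy rain caused flooding in several districts last week.",
    "The local football club won its third match of the season.",
]
tokens = [s.lower().split() for s in sentences]
dictionary = corpora.Dictionary(tokens)
bows = [dictionary.doc2bow(t) for t in tokens]
lda = models.LdaModel(bows, num_topics=2, id2word=dictionary, random_state=0)

def topic_vector(bow, k=2):
    vec = np.zeros(k)
    for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
        vec[topic_id] = prob
    return vec

doc_vec = topic_vector(dictionary.doc2bow([w for t in tokens for w in t]))
scores = [np.dot(topic_vector(b), doc_vec) /
          (np.linalg.norm(topic_vector(b)) * np.linalg.norm(doc_vec))
          for b in bows]
ranked = sorted(zip(scores, sentences), reverse=True)
print(ranked[0][1])  # highest-ranked sentence becomes part of the summary
```
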
APA, Harvard, Vancouver, ISO, and other styles
38

Timalsina, Bipin, Nawaraj Paudel, and Tej Bahadur Shahi. "Attention based Recurrent Neural Network for Nepali Text Summarization." Journal of Institute of Science and Technology 27, no. 1 (2022): 141–48. http://dx.doi.org/10.3126/jist.v27i1.46709.

Full text
Abstract:
Automatic text summarization has been a challenging topic in natural language processing (NLP), as it demands preserving important information while condensing a large text into a summary. Extractive and abstractive text summarization are the two widely investigated approaches. In extractive summarization, important sentences are extracted from the large text and combined to create a summary, whereas abstractive summarization creates a summary that focuses on meaning rather than surface content; abstractive summarization has therefore gained more attention from researchers in the recent past. However, text summarization is still largely unexplored for the Nepali language. To this end, we propose an abstractive text summarization model for Nepali text. We first create a Nepali text dataset by scraping Nepali news from online news portals. Second, we design a deep learning-based text summarization model based on an encoder-decoder recurrent neural network with attention; more precisely, Long Short-Term Memory (LSTM) cells are used in the encoder and decoder layers. Third, we build nine different models by varying hyper-parameters such as the number of hidden layers and the number of nodes. Finally, we report the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score for each model to evaluate their performance. Among the nine models created by adjusting the number of layers and hidden states, the model with a single-layer encoder and 256 hidden states outperformed all others, with F-score values of 15.74, 3.29, and 15.21 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively.
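
The encoder-decoder-with-attention architecture described above can be sketched in Keras as follows; the vocabulary size, sequence lengths, and hidden dimensions are placeholder values, and this generic single-layer model is only an approximation of the paper's tuned configurations.

```python
# Generic LSTM encoder-decoder with additive (Bahdanau-style) attention in Keras.
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size, emb_dim, units, max_src, max_tgt = 20000, 128, 256, 400, 40  # placeholders

# Encoder: embeds the source article and returns per-token states plus final state.
enc_in = layers.Input(shape=(max_src,))
enc_emb = layers.Embedding(vocab_size, emb_dim)(enc_in)
enc_out, state_h, state_c = layers.LSTM(units, return_sequences=True,
                                        return_state=True)(enc_emb)

# Decoder: generates the summary conditioned on the encoder's final state.
dec_in = layers.Input(shape=(max_tgt,))
dec_emb = layers.Embedding(vocab_size, emb_dim)(dec_in)
dec_out, _, _ = layers.LSTM(units, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])

# Attention over encoder outputs, concatenated with decoder states.
context = layers.AdditiveAttention()([dec_out, enc_out])
concat = layers.Concatenate(axis=-1)([dec_out, context])
probs = layers.TimeDistributed(layers.Dense(vocab_size, activation="softmax"))(concat)

model = Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```
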
APA, Harvard, Vancouver, ISO, and other styles
39

Shilpa Serasiya. "Abstractive Gujarati Text Summarization Using Sequence-To-Sequence Model and Attention Mechanism." Journal of Information Systems Engineering and Management 10, no. 41s (2025): 754–62. https://doi.org/10.52783/jisem.v10i41s.7998.

Full text
Abstract:
Introduction: In recent years, text summarization has been one of the prominent problems of natural language processing (NLP). It produces a consolidated brief of a large text document. Extractive and abstractive are the two output-based summarization techniques. For Indian languages, much research has been carried out on extractive summarization, while the performance of abstractive summarization remains a challenge for a language like Gujarati. With the rise of digital Gujarati news portals, automatic summarization can provide concise versions of news articles and make it easier for readers to grasp key information quickly. Objectives: We aim to create an effective and efficient abstractive text summarizer for Gujarati text that can generate an understandable and expressive summary. Methods: Our model is a sequence-to-sequence model using an encoder-decoder architecture with an attention mechanism; the LSTM-based, attention-equipped encoder-decoder generates human-like sentences carrying the core information of the original documents. Results: Our experiments demonstrated the effectiveness of the proposed model, reaching an accuracy of up to 87% and reducing the loss to 0.48 for Gujarati text. Novelty: In NLP terms, Gujarati is a low-resource language for researchers, especially for text summarization, so we created our own dataset by collecting Gujarati text data such as news articles and their headlines from online/offline resources like daily newspapers. Because Gujarati has unique grammatical structures and morphology, we also proposed a pre-processor (GujProc) specific to Gujarati to handle its linguistic characteristics.
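
Before such a sequence-to-sequence model can be trained, article/headline pairs must be tokenized and padded. The snippet below shows a generic Keras preprocessing step; the placeholder strings and the chosen lengths are assumptions, and it does not reproduce the paper's Gujarati-specific GujProc pre-processor.

```python
# Generic tokenize-and-pad step for article/headline pairs (placeholders only).
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

articles = ["...", "..."]      # Gujarati article bodies (placeholder strings)
headlines = ["...", "..."]     # corresponding headlines, used as target summaries

tok = Tokenizer(num_words=20000, oov_token="<unk>", filters="")  # keep all characters
tok.fit_on_texts(articles + headlines)

X = pad_sequences(tok.texts_to_sequences(articles), maxlen=300, padding="post")
y = pad_sequences(tok.texts_to_sequences(headlines), maxlen=20, padding="post")
print(X.shape, y.shape)   # padded integer sequences ready for a seq2seq model
```
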
APA, Harvard, Vancouver, ISO, and other styles
40

Mohsin, Muhammad, Shazad Latif, Muhammad Haneef, et al. "Improved Text Summarization of News Articles Using GA-HC and PSO-HC." Applied Sciences 11, no. 22 (2021): 10511. http://dx.doi.org/10.3390/app112210511.

Full text
Abstract:
Automatic Text Summarization (ATS) is gaining attention because a large volume of data is being generated at an exponential rate. Due to easy internet availability globally, a large amount of data is generated by social networking websites, news websites, and blogs. Manual summarization is time consuming, and it is difficult to read and summarize a large amount of content; automatic text summarization is the solution to this problem. This study proposes two automatic text summarization models: Genetic Algorithm with Hierarchical Clustering (GA-HC) and Particle Swarm Optimization with Hierarchical Clustering (PSO-HC). The proposed models use a word embedding model with a hierarchical clustering algorithm to group sentences conveying almost the same meaning. Modified GA and adaptive PSO based sentence ranking models are proposed for summarizing news text documents. Simulations are conducted and compared with the other algorithms under study to evaluate the performance of the proposed methodology, and the results validate its superior performance.
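
The clustering stage described above, grouping sentences that convey almost the same meaning, can be approximated as follows; TF-IDF vectors stand in for the paper's word-embedding model, and the GA/PSO ranking step is not shown.

```python
# Group near-duplicate sentences with hierarchical clustering, then keep one
# representative per cluster (the member closest to the cluster centroid).
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "The storm damaged dozens of homes along the coast.",
    "Dozens of coastal homes were damaged by the storm.",
    "Rescue teams evacuated residents overnight.",
    "Residents were evacuated by rescue teams during the night.",
    "Markets reopened on Tuesday after the holiday.",
]
X = TfidfVectorizer().fit_transform(sentences).toarray()
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

summary = []
for c in sorted(set(labels)):
    idx = np.where(labels == c)[0]
    centroid = X[idx].mean(axis=0)
    best = idx[np.argmin(np.linalg.norm(X[idx] - centroid, axis=1))]
    summary.append(sentences[best])
print(summary)
```
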
APA, Harvard, Vancouver, ISO, and other styles
41

Tuhpatussania, Siti, Ema Utami, and Anggit Dwi Hartanto. "COMPARISON OF LEXRANK ALGORITHM AND MAXIMUM MARGINAL RELEVANCE IN SUMMARY OF INDONESIAN NEWS TEXT IN ONLINE NEWS PORTALS." Jurnal Pilar Nusa Mandiri 18, no. 2 (2022): 187–92. http://dx.doi.org/10.33480/pilar.v18i2.3190.

Full text
Abstract:
The presence of online media has displaced print media for news readers seeking information that is fast, accurate, and easy to access. However, a problem arises because lengthy news texts make readers reluctant to read the news in full, so the information they obtain may be less accurate. For this reason, this study addresses automatic text summarization and compares the Maximum Marginal Relevance (MMR) algorithm and the LexRank algorithm on summaries of Indonesian news texts from the online news portal graphanews.com. The comparison of the text summarization results using F-measure, precision, and recall shows that the MMR algorithm performs better, with an F-measure of 91.65%, precision of 91.08%, and recall of 92.23%.
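
For reference, a compact version of the MMR selection rule compared in this study is sketched below: each step picks the sentence that best balances relevance to the document against redundancy with sentences already selected. The lambda value and the toy sentences are illustrative, not the study's settings.

```python
# Maximal Marginal Relevance (MMR) sentence selection over TF-IDF vectors.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_summary(sentences, k=2, lambda_=0.7):
    X = TfidfVectorizer().fit_transform(sentences)
    doc = X.mean(axis=0).A                          # document centroid
    relevance = cosine_similarity(X, doc).ravel()
    pairwise = cosine_similarity(X)
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(pairwise[i, j] for j in selected) if selected else 0.0
            return lambda_ * relevance[i] - (1 - lambda_) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in sorted(selected)]

print(mmr_summary([
    "Floods hit the capital after days of heavy rain.",
    "Heavy rain caused floods across the capital city.",
    "Schools will remain closed until the water recedes.",
]))
```
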
APA, Harvard, Vancouver, ISO, and other styles
42

Kalbande, Anusha. "Summarization and Sentiment Analysis for Financial News." International Journal for Research in Applied Science and Engineering Technology 9, no. 10 (2021): 88–90. http://dx.doi.org/10.22214/ijraset.2021.38345.

Full text
Abstract:
Data is growing at an unimaginable speed around us, but what part of it is really useful information? Business leaders, financial analysts, stock market enthusiasts, researchers, and others often need to go through a plethora of news articles and data every day, and this time spent may not even result in any fruitful insights. With such a huge volume of data, it is difficult to gain precise, relevant information and to interpret the overall sentiment portrayed by an article. The proposed method conceptualizes a tool that takes financial news from selected and trusted online sources as input and gives a summary of it along with a basic positive, negative, or neutral sentiment. It is assumed that the tool's user is familiar with the company's profile. Based on the input (company name/symbol) given by the user, the corresponding news articles are fetched using web scraping. These articles are then summarized to provide succinct, to-the-point information, and an overall sentiment about the company is derived from the important features in the articles. Keywords: Financial News; Summarization; Sentiment Analysis.
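
A minimal sentiment step of the kind described above can be sketched with NLTK's VADER analyzer; the headlines and thresholds below are illustrative, and the paper's actual feature-based sentiment model may differ.

```python
# Label fetched financial headlines as positive/negative/neutral with VADER.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

headlines = [
    "Company X posts record quarterly profit, beats estimates",
    "Regulators fine Company X over disclosure lapses",
]
for h in headlines:
    c = sia.polarity_scores(h)["compound"]
    label = "positive" if c >= 0.05 else "negative" if c <= -0.05 else "neutral"
    print(f"{label:8s} {c:+.2f}  {h}")
```
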
APA, Harvard, Vancouver, ISO, and other styles
43

Falahah. "Summarization and Classification of Sports News using Textrank and KNN." Journal of Systems Engineering and Information Technology (JOSEIT) 3, no. 1 (2024): 23–29. https://doi.org/10.29207/joseit.v3i1.5706.

Full text
Abstract:
The news summarization process is critical in news analysis. However, there are frequently barriers to it, such as the large number of news articles and the need for news classification. The goal of this study is to develop a news summarization and categorization model that is valuable in the news analysis process. TextRank is the proposed summarization approach, and KNN is used for news classification. The resulting model can automatically summarize and group news, making content analysis easier. Sports news from July to August 2023 is used as the study object, and supervised classification identifies whether an article belongs to one of three branches: soccer, badminton/tennis, or basketball. Classification is carried out with the KNN algorithm by training the model on 500 categorized news items. Modeling with k = 3 and k = 5 yields precision of about 0.9866 and 0.9666, respectively. Applying the model to unseen text shows that it can correctly predict categories as long as the news content falls into the three specified categories, but it fails for content outside these categories.
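
The two stages described above, TextRank-style ranking and KNN classification, can be sketched as follows; the tiny training set is purely illustrative, and only k = 3 is shown.

```python
# TextRank-style sentence ranking via PageRank over a similarity graph,
# plus a KNN topic classifier over TF-IDF features.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import KNeighborsClassifier

# --- TextRank-style extractive summary ---
sentences = [
    "The home side scored twice in the second half.",
    "Two second-half goals sealed the win for the home side.",
    "The coach praised the goalkeeper after the match.",
]
sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
ranks = nx.pagerank(nx.from_numpy_array(sim))
top = max(ranks, key=ranks.get)
print("summary sentence:", sentences[top])

# --- KNN topic classification with k = 3 ---
train_texts = ["goal penalty striker", "smash rally racket", "dunk rebound court"] * 3
train_labels = ["soccer", "badminton/tennis", "basketball"] * 3
vec = TfidfVectorizer().fit(train_texts)
knn = KNeighborsClassifier(n_neighbors=3).fit(vec.transform(train_texts), train_labels)
print(knn.predict(vec.transform(["the striker scored a late goal"])))
```
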
APA, Harvard, Vancouver, ISO, and other styles
44

Barzilay, Regina, and Kathleen R. McKeown. "Sentence Fusion for Multidocument News Summarization." Computational Linguistics 31, no. 3 (2005): 297–328. http://dx.doi.org/10.1162/089120105774321091.

Full text
Abstract:
A system that can produce informative summaries, highlighting common information found in many online documents, will help Web users to pinpoint information that they need without extensive reading. In this article, we introduce sentence fusion, a novel text-to-text generation technique for synthesizing common information across documents. Sentence fusion involves bottom-up local multisequence alignment to identify phrases conveying similar information and statistical generation to combine common phrases into a sentence. Sentence fusion moves the summarization field from the use of purely extractive methods to the generation of abstracts that contain sentences not found in any of the input documents and can synthesize information across sources.
APA, Harvard, Vancouver, ISO, and other styles
45

Tabak, Feride Savaroğlu, and Vesile Evrim. "Event-based summarization of news articles." Turkish Journal of Electrical Engineering & Computer Sciences 28, no. 2 (2020): 850–64. http://dx.doi.org/10.3906/elk-1904-98.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Gangi Reddy, Revanth, Heba Elfardy, Hou Pong Chan, Kevin Small, and Heng Ji. "SumREN: Summarizing Reported Speech about Events in News." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (2023): 12808–17. http://dx.doi.org/10.1609/aaai.v37i11.26506.

Full text
Abstract:
A primary objective of news articles is to establish the factual record for an event, frequently achieved by conveying both the details of the specified event (i.e., the 5 Ws; Who, What, Where, When and Why regarding the event) and how people reacted to it (i.e., reported statements). However, existing work on news summarization almost exclusively focuses on the event details. In this work, we propose the novel task of summarizing the reactions of different speakers, as expressed by their reported statements, to a given event. To this end, we create a new multi-document summarization benchmark, SumREN, comprising 745 summaries of reported statements from various public figures obtained from 633 news articles discussing 132 events. We propose an automatic silver-training data generation approach for our task, which helps smaller models like BART achieve GPT-3 level performance on this task. Finally, we introduce a pipeline-based framework for summarizing reported speech, which we empirically show to generate summaries that are more abstractive and factual than baseline query-focused summarization approaches.
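
For context, the kind of pretrained BART summarizer that the paper's silver-data approach builds on can be invoked as below; this generic checkpoint and the made-up reported statements are not the SumREN pipeline itself.

```python
# Generic abstractive summarization of concatenated reported statements with BART.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
reported = (
    'The mayor said the evacuation order "was unavoidable given the forecast." '
    'A spokesperson for the governor added that state funds "will be released this week." '
    'Local residents told reporters they were "relieved but frustrated by the delays."'
)
print(summarizer(reported, max_length=60, min_length=15, do_sample=False)[0]["summary_text"])
```
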
APA, Harvard, Vancouver, ISO, and other styles
47

Salunke, Anushree B., Eisha Saini, Sanskruti Shinde, Pooja Tumma, and Suchita Suresh Dange. "NewsIN: A News Summarizer and Analyzer." International Journal for Research in Applied Science and Engineering Technology 10, no. 12 (2022): 695–706. http://dx.doi.org/10.22214/ijraset.2022.47997.

Full text
Abstract:
A summary condenses a lengthy document by highlighting its salient features. It helps readers understand the content just by reading the summary, saving time and helping them decide whether to go through the entire document. Summaries should be shorter than the original article, so only pertinent information from the article should be selected. The main goal of a newspaper article summary is for readers to walk away knowing what the article is about without having to read it in full. This work proposes a news article summarization system that automatically accesses information from various local online newspapers and summarizes information from heterogeneous articles. To support ad-hoc keyword-based extraction of news articles, the system uses a tailor-made web crawler that searches websites for relevant articles. Computational linguistics techniques, mainly triplet extraction, semantic similarity calculation, and OPTICS clustering with DBSCAN, are used alongside a sentence selection heuristic to generate coherent and cogent summaries irrespective of the number of articles supplied to the engine. Performance evaluation is done using the ROUGE metric. Rapid progress in digital data acquisition techniques has led to a huge volume of news data on news websites, and most of these digital news collections lack summaries. As a result, online newspaper readers are overloaded with lengthy text documents, and it is tedious for humans to generate an abstract for a news event manually, since this requires rigorous analysis of the news documents. An achievable solution to this problem is condensing digital news collections and extracting only their essence in the form of an automatically generated summary, which allows readers to make effective decisions in less time. Graph-based algorithms for text summarization have proven very successful over other methods for producing multi-document summaries; a summary generated from knowledge graphs is more in line with human reading habits and follows the logic of human reasoning. Given the fast-growing need to retrieve information in abstract form, we propose a novel approach to abstractive news summarization using knowledge graphs to meet the need for a more accurate automatic abstractive news summarizer and analyzer.
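
The OPTICS-based clustering step mentioned above can be approximated as follows; TF-IDF similarity stands in for the paper's semantic similarity calculation, and the sentences and parameter values are made up.

```python
# Group semantically similar sentences with OPTICS, extracting DBSCAN-style
# clusters, before a selection heuristic would pick summary sentences.
from sklearn.cluster import OPTICS
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "The bill passed the senate with a narrow majority.",
    "Senators narrowly approved the bill on Thursday.",
    "Protesters gathered outside the parliament building.",
    "Crowds of protesters assembled near parliament.",
    "Oil prices rose slightly in early trading.",
]
X = TfidfVectorizer().fit_transform(sentences).toarray()
labels = OPTICS(min_samples=2, metric="cosine",
                cluster_method="dbscan", eps=0.7).fit_predict(X)
for label, s in zip(labels, sentences):
    print(label, s)   # sentences sharing a label are candidates for one summary point
```
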
APA, Harvard, Vancouver, ISO, and other styles
48

Bhurhani, SK B. A. "News Aggregator: The World at Your Finger Tips." International Journal for Research in Applied Science and Engineering Technology 13, no. 3 (2025): 3019–25. https://doi.org/10.22214/ijraset.2025.67944.

Full text
Abstract:
The rapid growth of digital news platforms has led to an overwhelming influx of information, making it challenging for users to access relevant, unbiased, and credible news. Traditional news aggregation methods struggle to personalize content effectively while filtering misinformation. This research proposes an AI-powered news aggregator system that leverages natural language processing (NLP), machine learning, and web scraping techniques to collect, categorize, and summarize news articles from multiple sources in real time. The system utilizes topic modeling and user preference-based recommendations to ensure personalized and diverse news delivery. Unlike conventional news aggregators, this model incorporates fake news detection algorithms and bias evaluation metrics to enhance credibility and minimize the spread of misinformation. The proposed system employs recurrent neural networks (RNN) and Transformer-based architectures like BERT for text processing, ensuring high accuracy in classification and summarization. Performance evaluation is conducted on parameters such as precision, recall, F1-score, and computational efficiency, comparing results with existing state-of-the-art news aggregation models. The system achieves 95% accuracy in news scraping, 86% in fake news detection, and a 78% ROUGE score for summarization, demonstrating its potential to revolutionize news consumption.
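
The precision/recall/F1 evaluation cited above can be reproduced for any classifier output with scikit-learn; the toy labels below are illustrative, not the system's actual results.

```python
# Precision, recall, and F1 for a fake-news classifier's predictions.
from sklearn.metrics import classification_report

y_true = ["real", "fake", "real", "real", "fake", "fake", "real", "fake"]
y_pred = ["real", "fake", "real", "fake", "fake", "fake", "real", "real"]
print(classification_report(y_true, y_pred, digits=3))
```
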
APA, Harvard, Vancouver, ISO, and other styles
49

Li, Nian, Qing Xi Peng, Li Yin, and Li Ping Wang. "Semi-Supervised Method for News Summarization in Microblog." Applied Mechanics and Materials 556-562 (May 2014): 5918–21. http://dx.doi.org/10.4028/www.scientific.net/amm.556-562.5918.

Full text
Abstract:
With the development of the Internet, an increasing amount of user-generated content provides valuable information to the public. Microblogs are a new platform where people discuss all kinds of topics, and they provide a good opportunity for researchers to explore online public opinion. News collection and summarization have attracted a lot of research previously; however, manual labeling is impractical because the task is time-consuming. In this paper, we focus on news summarization with few labeled samples and propose a semi-supervised learning method to tackle the problem. We employ the co-training method to extract news information: posts and replies in the microblog are treated as two independent views to train a classification model, and the entity, time, place, and incident of each news item are identified as well. Experimental results on different datasets show that the proposed method outperforms the baseline methods.
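
A compact illustration of co-training over the two views mentioned above (posts and replies) follows; the data, the Naive Bayes base learners, and the confidence rule are assumptions for the sketch, not the authors' exact method.

```python
# Co-training sketch: each view's classifier labels the unlabeled example it is
# most confident about, growing the labeled set used by both views.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

posts = ["earthquake hits city", "new phone released", "flood warning issued", "game review out"]
replies = ["hope everyone is safe", "can't wait to buy it", "stay away from the river", "graphics look great"]
labels = np.array([1, 0, -1, -1])        # 1 = news event, 0 = not, -1 = unlabeled

v1, v2 = TfidfVectorizer(), TfidfVectorizer()
X1, X2 = v1.fit_transform(posts), v2.fit_transform(replies)

labeled = labels != -1
for _ in range(2):                        # a couple of co-training rounds
    c1 = MultinomialNB().fit(X1[labeled], labels[labeled])
    c2 = MultinomialNB().fit(X2[labeled], labels[labeled])
    unlabeled = np.where(labels == -1)[0]
    if len(unlabeled) == 0:
        break
    for clf, X in ((c1, X1), (c2, X2)):
        probs = clf.predict_proba(X[unlabeled])
        best = unlabeled[np.argmax(probs.max(axis=1))]
        labels[best] = clf.predict(X[best])[0]
        labeled = labels != -1
        unlabeled = np.where(labels == -1)[0]
        if len(unlabeled) == 0:
            break
print(labels)
```
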
APA, Harvard, Vancouver, ISO, and other styles
50

Joshi, Chiranjeevi. "Summarization and Translation Using NLP." International Journal for Research in Applied Science and Engineering Technology 12, no. 5 (2024): 555–58. http://dx.doi.org/10.22214/ijraset.2024.61391.

Full text
Abstract:
Text summarization and translation are two critical tasks in natural language processing with significant applications in various domains such as news aggregation, document summarization, machine translation, and information retrieval. In recent years, there has been remarkable progress in the development of techniques and models for both tasks, leveraging advancements in deep learning and neural network architectures. This paper presents a comprehensive review and comparative analysis of state-of-the-art methods in text summarization and translation. First, we provide an overview of the different approaches to text summarization, including extractive, abstractive, and hybrid methods, highlighting their strengths and weaknesses. We discuss various evaluation metrics and datasets commonly used for benchmarking summarization systems, shedding light on the challenges and opportunities in this field.
APA, Harvard, Vancouver, ISO, and other styles
