Academic literature on the topic 'Document Summarization'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Document Summarization.'

Journal articles on the topic "Document Summarization"

1

Rahamat Basha, S., J. Keziya Rani, and J. J. C. Prasad Yadav. "A Novel Summarization-based Approach for Feature Reduction Enhancing Text Classification Accuracy." Engineering, Technology & Applied Science Research 9, no. 6 (December 1, 2019): 5001–5. http://dx.doi.org/10.48084/etasr.3173.

Abstract:
Automatic summarization is the process of shortening one (in single document summarization) or multiple documents (in multi-document summarization). In this paper, a new feature selection method for the nearest neighbor classifier by summarizing the original training documents based on sentence importance measure is proposed. Our approach for single document summarization uses two measures for sentence similarity: the frequency of the terms in one sentence and the similarity of that sentence to other sentences. All sentences were ranked accordingly and the sentences with top ranks (with a threshold constraint) were selected for summarization. The summary of every document in the corpus is taken into a new document used for the summarization evaluation process.
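
The sentence ranking described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the way the two measures are combined, and the `threshold` parameter are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, threshold=0.5):
    """Keep sentences whose score is at least `threshold` times the best score.

    `threshold` is a hypothetical cutoff; the abstract only mentions
    "a threshold constraint" without giving its form.
    """
    bags = [Counter(tokenize(s)) for s in sentences]
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag)
    scores = []
    for i, bag in enumerate(bags):
        length = sum(bag.values()) or 1
        # Measure 1: average document-level frequency of the sentence's terms.
        tf_score = sum(doc_freq[w] * c for w, c in bag.items()) / length
        # Measure 2: average similarity of this sentence to all other sentences.
        others = [cosine(bag, o) for j, o in enumerate(bags) if j != i]
        sim_score = sum(others) / len(others) if others else 0.0
        scores.append(tf_score * (1.0 + sim_score))  # assumed combination
    top = max(scores) if scores else 0.0
    return [s for s, sc in zip(sentences, scores) if top and sc / top >= threshold]
```

Selected sentences are returned in document order, so the result reads as a shortened version of the input.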
2

Singh, Sandhya, Kevin Patel, Krishnanjan Bhattacharjee, Hemant Darbari, and Seema Verma. "Towards Better Single Document Summarization using Multi-Document Summarization Approach." International Journal of Computer Sciences and Engineering 7, no. 5 (May 31, 2019): 695–703. http://dx.doi.org/10.26438/ijcse/v7i5.695703.

3

Kongara, Srinivasa Rao, Dasika Sree Rama Chandra Murthy, and Gangadhara Rao Kancherla. "An Automatic Text Summarization Method with the Concern of Covering Complete Formation." Recent Advances in Computer Science and Communications 13, no. 5 (November 5, 2020): 977–86. http://dx.doi.org/10.2174/2213275912666190716105347.

Abstract:
Background: Text summarization is the process of generating a short description of an entire document that is itself harder to read. It provides a convenient way to extract the most useful information and produce a short summary of the documents. Existing research addressed this with the Fuzzy Rule-based Automated Summarization Method (FRASM). That work has several limitations that may restrict its applicability to real-world applications: it is suitable only for single-document summarization, whereas many applications, such as those in research industries, need to summarize information from multiple documents. Methods: This paper proposes the Multi-document Automated Summarization Method (MDASM), a summarization framework intended to produce accurate summaries from multiple documents. First, document clustering is performed using a modified k-means algorithm to group documents with similar meaning, identified through a frequent-term measurement. After clustering, pre-processing applies a hybrid of TF-IDF and singular value decomposition to eliminate irrelevant content and retain the required content. Sentence measurement is then done by adding a title measurement to the metrics of the existing work, in order to retrieve the most similar sentences more accurately. Finally, a fuzzy rule system is applied to perform text summarization. Results: The evaluation was conducted in the MATLAB simulation environment and indicates that the proposed method yields more accurate summarization than the existing method. MDASM reports increases of 89.28% in accuracy, 89.28% in precision, 89.36% in recall, and 70% in F-measure over FRASM. Conclusion: The summarization process carried out in this work provides an accurate summarized outcome.
4

Diedrichsen, Elke. "Linguistic challenges in automatic summarization technology." Journal of Computer-Assisted Linguistic Research 1, no. 1 (June 26, 2017): 40. http://dx.doi.org/10.4995/jclr.2017.7787.

Abstract:
Automatic summarization is a field of Natural Language Processing that is increasingly used in industry today. The goal of the summarization process is to create a summary of one document or a multiplicity of documents that will retain the sense and the most important aspects while reducing the length considerably, to a size that may be user-defined. One differentiates between extraction-based and abstraction-based summarization. In an extraction-based system, the words and sentences are copied out of the original source without any modification. An abstraction-based summary can compress, fuse or paraphrase sections of the source document. As of today, most summarization systems are extractive. Automatic document summarization technology presents interesting challenges for Natural Language Processing. It works on the basis of coreference resolution, discourse analysis, named entity recognition (NER), information extraction (IE), natural language understanding, topic segmentation and recognition, word segmentation and part-of-speech tagging. This study will overview some current approaches to the implementation of auto summarization technology and discuss the state of the art of the most important NLP tasks involved in them. We will pay particular attention to current methods of sentence extraction and compression for single and multi-document summarization, as these applications are based on theories of syntax and discourse and their implementation therefore requires a solid background in linguistics. Summarization technologies are also used for image collection summarization and video summarization, but the scope of this paper will be limited to document summarization.
5

D’Silva, Suzanne, Neha Joshi, Sudha Rao, Sangeetha Venkatraman, and Seema Shrawne. "Improved Algorithms for Document Classification & Query-based Multi-Document Summarization." International Journal of Engineering and Technology 3, no. 4 (2011): 404–9. http://dx.doi.org/10.7763/ijet.2011.v3.261.

6

Vikas, A., Pradyumna G.V.N, and Tahir Ahmed Shaik. "Text Summarization." International Journal of Engineering and Computer Science 9, no. 2 (February 3, 2020): 24940–45. http://dx.doi.org/10.18535/ijecs/v9i2.4437.

Abstract:
In this new era, where a tremendous amount of information is available on the internet, it is important to provide improved mechanisms for extracting information quickly and efficiently. It is very difficult for human beings to manually summarize large text documents. Plenty of text material is available on the internet, so there is a problem of searching for relevant documents among those available and of absorbing the relevant information from them. To solve these two problems, automatic text summarization is necessary. Text summarization is the process of identifying the most important, meaningful information in a document or set of related documents and compressing it into a shorter version that preserves the overall meaning.
7

Sirohi, Neeraj Kumar, Dr Mamta Bansal, and Dr S. N. Rajan Rajan. "Text Summarization Approaches Using Machine Learning & LSTM." Revista Gestão Inovação e Tecnologias 11, no. 4 (September 1, 2021): 5010–26. http://dx.doi.org/10.47059/revistageintec.v11i4.2526.

Abstract:
A massive amount of online textual data is generated in social media, on the web, and in other information-centric applications. To select the vital information from a large text, one needs to study the full article and generate a summary without losing critical information; this process is called summarization. Text summarization can be done by a human, which requires expertise in the subject area and is tedious and time-consuming, or by a system, which is known as automatic text summarization. There are two main categories of automatic text summarization: abstractive and extractive. An extractive summary is produced by picking important, highly ranked sentences and words from the text document, whereas the sentences and words in a summary generated by an abstractive method may not be present in the original text. This article mainly focuses on the different automatic text summarization (ATS) techniques that have been proposed to date. The paper begins with a concise introduction to automatic text summarization, then discusses recent developments in extractive and abstractive methods, moves on to a literature survey, and finally presents the proposed technique, which uses an LSTM encoder-decoder for abstractive text summarization, along with some directions for future work.
8

Manju, K., S. David Peter, and Sumam Idicula. "A Framework for Generating Extractive Summary from Multiple Malayalam Documents." Information 12, no. 1 (January 18, 2021): 41. http://dx.doi.org/10.3390/info12010041.

Abstract:
Automatic extractive text summarization retrieves a subset of data that represents most notable sentences in the entire document. In the era of digital explosion, which is mostly unstructured textual data, there is a demand for users to understand the huge amount of text in a short time; this demands the need for an automatic text summarizer. From summaries, the users get the idea of the entire content of the document and can decide whether to read the entire document or not. This work mainly focuses on generating a summary from multiple news documents. In this case, the summary helps to reduce the redundant news from the different newspapers. A multi-document summary is more challenging than a single-document summary since it has to solve the problem of overlapping information among sentences from different documents. Extractive text summarization yields the sensitive part of the document by neglecting the irrelevant and redundant sentences. In this paper, we propose a framework for extracting a summary from multiple documents in the Malayalam Language. Also, since the multi-document summarization data set is sparse, methods based on deep learning are difficult to apply. The proposed work discusses the performance of existing standard algorithms in multi-document summarization of the Malayalam Language. We propose a sentence extraction algorithm that selects the top ranked sentences with maximum diversity. The system is found to perform well in terms of precision, recall, and F-measure on multiple input documents.
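
Selecting top-ranked sentences with maximum diversity can be sketched as a greedy, MMR-style selection. This is a swapped-in illustration, not the paper's algorithm: the Jaccard word-overlap measure, the `trade_off` weight, and the externally supplied `scores` are all assumptions.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (assumed measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_diverse(sentences, scores, k=2, trade_off=0.7):
    """Greedily pick k sentences, trading rank score against redundancy.

    `scores` holds one precomputed rank score per sentence; `trade_off`
    (hypothetical) weights the rank score against similarity to earlier picks.
    """
    chosen = []
    candidates = list(range(len(sentences)))
    while candidates and len(chosen) < k:
        def mmr(i):
            # Redundancy = highest similarity to any sentence already chosen.
            redundancy = max(
                (jaccard(sentences[i], sentences[j]) for j in chosen),
                default=0.0,
            )
            return trade_off * scores[i] - (1 - trade_off) * redundancy
        best = max(candidates, key=mmr)
        chosen.append(best)
        candidates.remove(best)
    return [sentences[i] for i in chosen]
```

With near-duplicate sentences from different newspapers, the second pick is pushed toward a sentence that adds new information rather than the next-highest score.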
9

Mamidala, Kishore Kumar, and Suresh Kumar Sanampudi. "A Novel Framework for Multi-Document Temporal Summarization (MDTS)." Emerging Science Journal 5, no. 2 (April 1, 2021): 184–90. http://dx.doi.org/10.28991/esj-2021-01268.

Abstract:
The Internet, or Web, contains a massive amount of information, and handling it is a tedious task. Summarization plays a crucial role in extracting or abstracting key content from multiple sources while preserving its meaning, thereby reducing the complexity of handling the information. Multi-document summarization gives the gist of the content collected from multiple documents. Temporal summarization concentrates on temporally related events. This paper proposes a Multi-Document Temporal Summarization (MDTS) technique that generates the summary based on temporally related events extracted from multiple documents. The technique extracts events with their time stamps. TimeML standard tags are used in extracting events and times. These event-times are stored in a structured database form for easier operations. Sentence ranking methods are built based on the frequency of event occurrences in the sentence. Sentence similarity measures are computed to eliminate redundant sentences in an extracted summary. Depending on the required summary length, top-ranked sentences are selected to form the summary. Experiments are conducted on the DUC 2006 and DUC 2007 data sets, which were released for the multi-document summarization task. The extracted summaries are evaluated using ROUGE to determine the precision, recall and F-measure of the generated summaries. The performance of the proposed method is compared with a particle swarm optimization-based algorithm (PSOS), cat swarm optimization-based summarization (CSOS), and cuckoo search-based multi-document summarization (MDSCSA). The performance of MDTS is found to be better than that of the other methods.
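
The ROUGE evaluation mentioned in this abstract reports precision, recall and F-measure. A minimal sketch of the unigram variant (ROUGE-1) follows; it assumes whitespace tokenization and a single reference summary, and it is not the official ROUGE toolkit.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Unigram-overlap precision, recall and F1 for one candidate/reference pair."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per word (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, `rouge_1("the cat sat", "the cat sat on the mat")` matches all three candidate words against a six-word reference, giving full precision and half recall.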
10

Yadav, Avaneesh Kumar, Ashish Kumar Maurya, Ranvijay, and Rama Shankar Yadav. "Extractive Text Summarization Using Recent Approaches: A Survey." Ingénierie des systèmes d'information 26, no. 1 (February 28, 2021): 109–21. http://dx.doi.org/10.18280/isi.260112.

Abstract:
In this era of growing digital media, the volume of text data increases day by day from various sources and may contain entire documents, books, articles, etc. This amount of text is a source of information that may be insignificant, redundant, and sometimes may not carry any meaningful representation. Therefore, we require some techniques and tools that can automatically summarize the enormous amounts of text data and help us to decide whether they are useful or not. Text summarization is a process that generates a brief version of the document in the form of a meaningful summary. It can be classified into abstractive text summarization and extractive text summarization. Abstractive text summarization generates an abstract type of summary from the given document. In extractive text summarization, a summary is created from the given document that contains crucial sentences of the document. Many authors proposed various techniques for both types of text summarization. This paper presents a survey of extractive text summarization on graphical-based techniques. Specifically, it focuses on unsupervised and supervised techniques. This paper shows the recent works and advances on them and focuses on the strength and weaknesses of surveys of previous works in tabular form. At last, it concentrates on the evaluation measure techniques of summary.

Dissertations / Theses on the topic "Document Summarization"

1

Tohalino, Jorge Andoni Valverde. "Extractive document summarization using complex networks." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-24102018-155954/.

Abstract:
Due to the large amount of textual information available on the Internet, the task of automatic document summarization has gained significant importance. Document summarization became important because its focus is the development of techniques aimed at finding relevant and concise content in large volumes of information without changing its original meaning. The purpose of this Master's work is to use network theory concepts for extractive document summarization, for both Single Document Summarization (SDS) and Multi-Document Summarization (MDS). In this work, the documents are modeled as networks, where sentences are represented as nodes, with the aim of extracting the most relevant sentences through the use of ranking algorithms. The edges between nodes are established in different ways. The first approach for edge calculation is based on the number of common nouns between two sentences (network nodes). Another approach to creating an edge is through the similarity between two sentences. In order to calculate the similarity of such sentences, we used the vector space model based on Tf-Idf weighting and word embeddings for the vector representation of the sentences. Also, we make a distinction between edges linking sentences from different documents (inter-layer) and those connecting sentences from the same document (intra-layer) by using multilayer network models for the Multi-Document Summarization task. In this approach, each network layer represents a document of the document set that will be summarized. In addition to the measurements typically used in complex networks, such as node degree, clustering coefficient and shortest paths, the network characterization is also guided by dynamical measurements of complex networks, including symmetry, accessibility and absorption time. The generated summaries were evaluated using different corpora for both Portuguese and English. The ROUGE-1 metric was used for the validation of generated summaries. 
The results suggest that simpler models like Noun and Tf-Idf based networks achieved a better performance in comparison to those models based on word embeddings. Also, excellent results were achieved by using the multilayered representation of documents for MDS. Finally, we concluded that several measurements could be used to improve the characterization of networks for the summarization task.
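
The noun-based network described in this abstract can be sketched as follows. Since real noun extraction needs a part-of-speech tagger, the sketch uses a crude stand-in (words of at least `min_len` letters, a hypothetical parameter) and ranks sentences by weighted node degree only; it is an illustration under those assumptions, not the thesis implementation.

```python
import re


def content_words(sentence, min_len=4):
    """Crude stand-in for noun extraction: lowercase words of min_len+ letters."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if len(w) >= min_len}


def rank_by_degree(sentences):
    """Build a sentence network and rank nodes by weighted degree.

    Each sentence is a node; the edge weight between two sentences is the
    number of words they share. Returns (sentence, strength) pairs, best first.
    """
    words = [content_words(s) for s in sentences]
    n = len(sentences)
    strength = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            weight = len(words[i] & words[j])  # shared-word count as edge weight
            strength[i] += weight
            strength[j] += weight
    order = sorted(range(n), key=lambda i: (-strength[i], i))
    return [(sentences[i], strength[i]) for i in order]
```

Other measurements named in the abstract, such as clustering coefficient or accessibility, would replace the `strength` score while the network construction stays the same.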
2

Ou, Shiyan, Christopher S. G. Khoo, and Dion H. Goh. "Automatic multi-document summarization for digital libraries." School of Communication & Information, Nanyang Technological University, 2006. http://hdl.handle.net/10150/106042.

Abstract:
With the rapid growth of the World Wide Web and online information services, more and more information is available and accessible online. Automatic summarization is an indispensable solution to reduce the information overload problem. Multi-document summarization is useful to provide an overview of a topic and allow users to zoom in for more details on aspects of interest. This paper reports three types of multi-document summaries generated for a set of research abstracts, using different summarization approaches: a sentence-based summary generated by a MEAD summarization system that extracts important sentences using various features, another sentence-based summary generated by extracting research objective sentences, and a variable-based summary focusing on research concepts and relationships. A user evaluation was carried out to compare the three types of summaries. The evaluation results indicated that the majority of users (70%) preferred the variable-based summary, while 55% of the users preferred the research objective summary, and only 25% preferred the MEAD summary.
3

Huang, Fang. "Multi-document summarization with latent semantic analysis." Thesis, University of Sheffield, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.419255.

4

Grant, Harald. "Extractive Multi-document Summarization of News Articles." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-158275.

Abstract:
Publicly available data grows exponentially through web services and technological advancements. Multi-document summarization (MDS) can be used to comprehend large data streams. In this research, the area of multi-document summarization is investigated. Multiple systems for extractive multi-document summarization are implemented using modern techniques, in the form of the pre-trained BERT language model for word embeddings and sentence classification. This is combined with well-proven techniques, in the form of the TextRank ranking algorithm, the Waterfall architecture and anti-redundancy filtering. The systems are evaluated on the DUC-2002, 2006 and 2007 datasets using the ROUGE metric. The results show that the BM25 sentence representation implemented in the TextRank model using the Waterfall architecture and an anti-redundancy technique outperforms the other implementations, providing results competitive with other state-of-the-art systems. A cohesive model is derived from the leading system and tried in a user study using a real-world application. The user study is conducted using a real-time news detection application with users from the news domain. The study shows a clear preference for cohesive summaries in extractive multi-document summarization: the cohesive summary is preferred in the majority of cases.
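
The TextRank step named in this abstract is a PageRank-style iteration over a sentence-similarity graph. The sketch below takes a precomputed similarity matrix as input, so it is agnostic to how similarity is obtained (the thesis reports BM25; any symmetric measure works here). The `damping` and `iterations` defaults are conventional values, assumed rather than taken from the thesis.

```python
def textrank(similarity, damping=0.85, iterations=50):
    """Score sentences by power iteration over a similarity matrix.

    `similarity[i][j]` is the non-negative edge weight between sentences
    i and j; diagonal entries are ignored.
    """
    n = len(similarity)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    # Total outgoing edge weight per node, excluding self-loops.
    out_weight = [
        sum(w for k, w in enumerate(row) if k != j)
        for j, row in enumerate(similarity)
    ]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            incoming = sum(
                similarity[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if j != i and out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores
```

Sentences are then sorted by score, and an anti-redundancy filter of the kind the abstract mentions would be applied to the top of that ranking.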
5

Geiss, Johanna. "Latent semantic sentence clustering for multi-document summarization." Thesis, University of Cambridge, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609761.

6

Chellal, Abdelhamid. "Event summarization on social media stream: retrospective and prospective tweet summarization." Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30118/document.

Abstract:
User-generated content on social media such as Twitter often provides the latest news before traditional media, which makes it possible to obtain a retrospective summary of events and to be updated in a timely fashion whenever a new development occurs. However, social media, while a valuable source of information, can also be overwhelming given the volume and velocity of published information. To shield users from irrelevant and redundant posts, retrospective summarization and prospective notification (real-time summarization) were introduced as two complementary tasks of information seeking on document streams. The former aims to select a list of relevant and non-redundant tweets that capture "what happened". In the latter, systems monitor the live stream of posts and push relevant and novel notifications as soon as possible. Our work falls within these frameworks and focuses on developing tweet summarization approaches for the two aforementioned scenarios. It aims to provide summaries that capture the key aspects of the event of interest, helping users to acquire information efficiently and to follow the development of long-running events on social media. Nevertheless, the tweet summarization task faces many challenges that stem, on the one hand, from the high volume, velocity and variety of the published information and, on the other hand, from the quality of tweets, which can vary significantly. In prospective notification, the core task is detecting relevance and novelty in real time. For timeliness, a system may choose to push new updates in real time or to trade timeliness for higher notification quality. Our contributions address these levels. First, we introduce the Word Similarity Extended Boolean Model (WSEBM), a relevance model that does not rely on stream statistics and takes advantage of a word embedding model. We use word similarity instead of traditional weighting techniques, and thereby overcome the shortness and word mismatch issues in tweets. The intuition behind our proposal is that the context-aware similarity measure in word2vec can match different words with the same meaning, which offsets the word mismatch issue when calculating the similarity between a tweet and a topic. Second, we propose to compute the novelty score of an incoming tweet against all words of the tweets already pushed to the user, instead of using pairwise tweet comparison. The proposed novelty detection method scales better and reduces execution time, which suits real-time tweet filtering. Third, we propose an adaptive Learning to Filter approach that leverages social signals as well as query-dependent features. To overcome the issue of setting a relevance threshold, we use a binary classifier that predicts the relevance of the incoming tweet. In addition, we show the gain that can be achieved by taking advantage of ongoing relevance feedback. Finally, we adopt a real-time push strategy and show that the proposed approach achieves promising performance in terms of quality (relevance and novelty) at a low latency cost, whereas state-of-the-art approaches tend to trade latency for higher quality. This thesis also explores a novel approach to generating a retrospective summary that follows a different paradigm from the majority of state-of-the-art methods. We treat summary generation as an optimization problem that takes into account topical and temporal diversity. Tweets are filtered and incrementally clustered into two cluster types: topical clusters based on content similarity and temporal clusters based on publication time. Summary generation is formulated as an integer linear program in which the unknown variables are binary, the objective function is to be maximized, and the constraints ensure that at most one post per cluster is selected within the defined summary length limit.
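
The integer linear formulation stated at the end of this abstract can be written out as a sketch. The abstract gives only the constraint structure; the linear objective with per-tweet scores $s_i$, the tweet lengths $\ell_i$, the cluster sets $C_k$ (topical and temporal alike), and the length limit $L$ are notation assumed here for illustration.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{n} s_i \, x_i
  && \text{(assumed objective: total score of selected tweets)} \\
\text{s.t.} \quad & \sum_{i \in C_k} x_i \le 1
  && \text{for every cluster } C_k \\
& \sum_{i=1}^{n} \ell_i \, x_i \le L
  && \text{(summary length limit)} \\
& x_i \in \{0, 1\}
  && i = 1, \dots, n
\end{aligned}
```

Here $x_i = 1$ means tweet $i$ is included in the summary. Because each tweet belongs to one topical and one temporal cluster, the first constraint family enforces both kinds of diversity at once.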
7

Linhares Pontes, Elvys. "Compressive Cross-Language Text Summarization." Thesis, Avignon, 2018. http://www.theses.fr/2018AVIG0232/document.

Abstract:
The popularization of social networks and digital documents has quickly increased the information available on the Internet. However, this huge amount of data cannot be analyzed manually. Natural Language Processing (NLP) analyzes the interactions between computers and human languages in order to process and analyze natural language data. NLP techniques draw on a variety of methods, including linguistics, semantics and statistics, to extract entities and relationships and to understand a document. Among the several NLP applications, this thesis focuses on cross-language text summarization, which produces a summary in a language different from that of the source documents. We also analyze other NLP tasks (word representation, semantic similarity, and sentence and multi-sentence compression) to generate more stable and informative cross-lingual summaries.
Most NLP applications (including all types of text summarization) use some kind of similarity measure to analyze and compare the meaning of words, chunks, sentences and texts. One way to analyze this similarity is to generate a representation of the sentences that captures their meaning. The meaning of a sentence is defined by several elements, such as the context of words and expressions, the order of words and the preceding information. Simple metrics, such as the cosine measure and Euclidean distance, provide a measure of similarity between two sentences; however, they do not analyze the order of words or multi-word sequences. To address these problems, we propose a neural network model that combines recurrent and convolutional neural networks to estimate the semantic similarity of a pair of sentences (or texts) based on the local and general contexts of words. On the dataset analyzed, our model predicted better similarity scores than the baselines by better analyzing the local and general meanings of words and multi-word expressions.
In order to remove redundancies and non-relevant information from similar sentences, we propose a multi-sentence compression method that fuses similar sentences into short, correct compressions containing their main information. We model clusters of similar sentences as word graphs. Then, we apply an integer linear programming model that guides the compression of these clusters based on a list of keywords; we look for a path in the word graph that has good cohesion and contains the maximum number of keywords. Our approach outperformed the baselines by generating more informative and more correct compressions for French, Portuguese and Spanish.
Finally, we combine these methods to build a cross-language text summarization system. Our system is an {English, French, Portuguese, Spanish}-to-{English, French} cross-language text summarization framework that analyzes the information in both the source and target languages to identify the most relevant sentences. Inspired by compressive text summarization methods in monolingual analysis, we adapt our multi-sentence compression method to this problem so as to keep only the main information. Our system proves effective at compressing redundant information and preserving relevant information. It improves informativeness scores without losing grammatical quality for French-to-English cross-lingual summaries. For {English, French, Portuguese, Spanish}-to-{English, French} cross-lingual summaries, our system significantly outperforms state-of-the-art extractive baselines for all these languages. In addition, we analyze the cross-language summarization of automatic video transcripts. Our approach achieved better and more stable ROUGE scores even for these documents, which contain grammatical errors and inaccurate or missing information.
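The abstract's remark that simple metrics such as cosine similarity ignore word order can be checked with a small bag-of-words sketch. The function name `cosine_similarity`, the whitespace tokenizer, and the example sentences are assumptions made for illustration; they are not taken from the thesis.

```python
import math
from collections import Counter


def cosine_similarity(sentence_a, sentence_b):
    """Cosine similarity between bag-of-words count vectors of two sentences."""
    a = Counter(sentence_a.lower().split())
    b = Counter(sentence_b.lower().split())
    # Dot product over the words the two sentences share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Same words, opposite meaning: the count vectors are identical,
# so the score is (up to floating-point rounding) the maximum of 1.
score = cosine_similarity("the dog bit the man", "the man bit the dog")
```

Because both sentences map to the same count vector, the measure cannot tell them apart, which is the gap the abstract's recurrent and convolutional model is meant to address.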
8

Kipp, Darren. "Shallow semantics for topic-oriented multi-document automatic text summarization." Thesis, University of Ottawa (Canada), 2008. http://hdl.handle.net/10393/27772.

Abstract:
There are presently a number of NLP tools available that can provide semantic information about a sentence. Connexor Machinese Semantics is one of the most elaborate of these tools in terms of the information it provides. It has been hypothesized that semantic analysis of sentences is required in order to make significant improvements in automatic summarization. Elaborate semantic analysis is still not particularly feasible. In this thesis, I look at what shallow semantic features are available from an off-the-shelf semantic analysis tool that might improve the responsiveness of a summary. The aim of this work is to use the information made available as an intermediary approach to improving the responsiveness of summaries. While this approach is not likely to perform as well as full semantic analysis, it is considerably easier to achieve and could provide an important stepping stone in the direction of deeper semantic analysis. As a significant portion of this task, we develop mechanisms in various programming languages to view, process, and extract relevant information and features from the data.
9

Hennig, Leonhard. "Content Modeling for Automatic Document Summarization." Supervisor: Sahin Albayrak. Berlin: Universitätsbibliothek der Technischen Universität Berlin, 2011. http://d-nb.info/1017593698/34.

10

Tsai, Chun-I. "A Study on Neural Network Modeling Techniques for Automatic Document Summarization." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-395940.

Abstract:
With the Internet becoming widespread, countless articles and multimedia content have filled our daily lives. How to effectively acquire the knowledge we seek has become an unavoidable issue. To help people grasp the main theme of a document faster, many studies are dedicated to automatic document summarization, which aims to condense one or more documents into a short text while keeping as much of the essential content as possible. Automatic document summarization can be categorized as extractive or abstractive. Extractive summarization selects the most relevant set of sentences up to a target ratio and assembles them into a concise summary. Abstractive summarization, on the other hand, produces an abstract after understanding the key concepts of a document. The recent past has seen a surge of interest in developing deep neural network-based supervised methods for both types of automatic summarization. This thesis continues this line of work and explores two kinds of frameworks, which integrate a convolutional neural network (CNN), long short-term memory (LSTM) and a multilayer perceptron (MLP) for extractive speech summarization. The empirical results seem to demonstrate the effectiveness of neural summarizers when compared with other conventional supervised methods. Finally, to further explore the ability of neural networks, we experiment with sequence-to-sequence neural networks for abstractive summarization and analyze the results.
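The extractive step described above (select the most relevant sentences up to a target ratio, then assemble them) can be sketched in a few lines. Note the substitution: the thesis scores sentences with neural models (CNN, LSTM, MLP), whereas this sketch uses average term frequency as a stand-in scorer. The function name `extractive_summary`, the regex-based sentence splitter, and the default `ratio` are assumptions for illustration only.

```python
import math
import re
from collections import Counter


def extractive_summary(text, ratio=0.3):
    """Keep the top `ratio` share of sentences, ranked by average term
    frequency (an assumed stand-in scorer), in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    freq = Counter(w for tokens in tokenized for w in tokens)

    def score(tokens):
        # Average document-level frequency of the sentence's words.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    k = max(1, math.ceil(ratio * len(sentences)))  # target number of sentences
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore document order for readability
    return " ".join(sentences[i] for i in chosen)
```

Swapping `score` for a trained model's relevance prediction would leave the selection and assembly logic unchanged, which is why the ranking function is kept separate from the ratio-based cut-off.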

Books on the topic "Document Summarization"

1

Hovy, Eduard. Text Summarization. Edited by Ruslan Mitkov. Oxford University Press, 2012. http://dx.doi.org/10.1093/oxfordhb/9780199276349.013.0032.

Abstract:
This article describes research and development on the automated creation of summaries of one or more texts. It defines the concept of a summary and presents an overview of the principal approaches to summarization. It describes the design, implementation, and performance of various summarization systems. The stages of automated text summarization are topic identification, interpretation, and summary generation, each with its own substages. Because of the challenges involved, multi-document summarization is much less developed than single-document summarization. This article reviews particular techniques used in several summarization systems. Finally, it assesses methods of evaluating summaries, reviewing evaluation strategies from previous evaluation studies through to the two-basic-measures method. Because summaries are so task- and genre-specific, no single measurement covers all cases of evaluation.
2

Innovative Document Summarization Techniques: Revolutionizing Knowledge Understanding. Idea Group, U.S., 2014.

3

Jacquemin, Christian, and Didier Bourigault. Term Extraction and Automatic Indexing. Edited by Ruslan Mitkov. Oxford University Press, 2012. http://dx.doi.org/10.1093/oxfordhb/9780199276349.013.0033.

Abstract:
Terms are pervasive in scientific and technical documents, and their identification is a crucial issue for any application dealing with the analysis, understanding, generation, or translation of such documents. In particular, the ever-growing mass of specialized documentation available online, in industrial and governmental archives or in digital libraries, calls for advances in terminology processing for tasks such as information retrieval, cross-language querying, indexing of multimedia documents, translation aids, and document routing and summarization. This article presents a new domain of research and development in natural language processing (NLP) that is concerned with the representation, acquisition, and recognition of terms. It begins by presenting the basic notions behind the concept of 'terms', ranging from the classical view to more recent concepts. There are two main areas of research involving terminology in NLP: term acquisition and term recognition. Finally, this article presents recent advances and prospects in term acquisition and automatic indexing.
4

Hirschman, Lynette, and Inderjeet Mani. Evaluation. Edited by Ruslan Mitkov. Oxford University Press, 2012. http://dx.doi.org/10.1093/oxfordhb/9780199276349.013.0022.

Abstract:
The commercial success of natural language (NL) technology has raised the technical criticality of evaluation. Choices of evaluation methods depend on the software life cycle, which typically charts four stages: research, advanced prototype, operational prototype, and product. At the prototype stage, embedded evaluation can prove helpful. Analysis components can be loosely grouped into segmentation, tagging, information extraction, and document threading. Output technologies such as text summarization can be evaluated in terms of intrinsic and extrinsic measures, the former checking for quality and informativeness and the latter for efficiency and acceptability in some tasks. 'Post-edit measures', commonly used in machine translation, determine the amount of correction required to obtain a desirable output. Evaluation of interactive systems typically evaluates the system and the user as one team and must account for subject variability, which requires running enough subjects to obtain statistical validity and hence incurs substantial costs. Evaluation, being a social activity, creates a community for internal technical comparison via shared evaluation criteria.
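As a rough illustration of a post-edit measure, the amount of correction can be approximated by the word-level edit distance between a system output and its human-corrected version. This is a generic Levenshtein sketch, not a measure defined in the chapter; the names `word_edit_distance` and `post_edit_rate`, and the choice to normalize by the length of the corrected text, are assumptions.

```python
def word_edit_distance(system_output, post_edited):
    """Minimum number of word insertions, deletions and substitutions
    needed to turn the system output into the post-edited text."""
    src = system_output.split()
    tgt = post_edited.split()
    prev = list(range(len(tgt) + 1))  # distances from the empty prefix of src
    for i, s in enumerate(src, 1):
        curr = [i]
        for j, t in enumerate(tgt, 1):
            cost = 0 if s == t else 1
            curr.append(min(prev[j] + 1,          # delete s
                            curr[j - 1] + 1,      # insert t
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def post_edit_rate(system_output, post_edited):
    """Edits per word of the corrected text (assumed normalization)."""
    length = len(post_edited.split())
    if length == 0:
        return 0.0
    return word_edit_distance(system_output, post_edited) / length
```

A lower rate means the output needed less correction; the same function applies to a generated summary and its edited version as well as to translations.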

Book chapters on the topic "Document Summarization"

1

Torres-Moreno, Juan-Manuel. "Single-Document Summarization." In Automatic Text Summarization, 53–108. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2014. http://dx.doi.org/10.1002/9781119004752.ch3.

2

Torres-Moreno, Juan-Manuel. "Evaluating Document Summaries." In Automatic Text Summarization, 243–73. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2014. http://dx.doi.org/10.1002/9781119004752.ch8.

3

Torres-Moreno, Juan-Manuel. "Guided Multi-Document Summarization." In Automatic Text Summarization, 109–50. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2014. http://dx.doi.org/10.1002/9781119004752.ch4.

4

Ramanathan, Krishnan, Yogesh Sankarasubramaniam, Nidhi Mathur, and Ajay Gupta. "Document Summarization using Wikipedia." In Proceedings of the First International Conference on Intelligent Human Computer Interaction, 254–60. New Delhi: Springer India, 2009. http://dx.doi.org/10.1007/978-81-8489-203-1_25.

5

Sonawane, Sheetal, Archana Ghotkar, and Sonam Hinge. "Context-Based Multi-document Summarization." In Advances in Intelligent Systems and Computing, 153–65. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-1540-4_16.

6

Bathija, Richeeka, Pranav Agarwal, Rakshith Somanna, and G. B. Pallavi. "Multi-document Text Summarization Tool." In Evolutionary Computing and Mobile Sustainable Networks, 683–91. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-5258-8_63.

7

Kumaresh, Nandhini, and Balasundaram Sadhu Ramakrishnan. "Graph Based Single Document Summarization." In Lecture Notes in Computer Science, 32–35. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-27872-3_5.

8

Carrillo-Mendoza, Pabel, Hiram Calvo, and Alexander Gelbukh. "Intra-document and Inter-document Redundancy in Multi-document Summarization." In Advances in Computational Intelligence, 105–15. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-62434-1_9.

9

Wan, Xiaojun. "Document-Based HITS Model for Multi-document Summarization." In PRICAI 2008: Trends in Artificial Intelligence, 454–65. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-89197-0_42.

10

Afantenos, Stergos D., Irene Doura, Eleni Kapellou, and Vangelis Karkaletsis. "Exploiting Cross-Document Relations for Multi-document Evolving Summarization." In Methods and Applications of Artificial Intelligence, 410–19. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-24674-9_43.


Conference papers on the topic "Document Summarization"

1

Wang, Fu Lee, Tak-Lam Wong, and Aston Nai Hong Mak. "Organization of Documents for Multiple Document Summarization." In 2008 Seventh International Conference on Web-based Learning, ICWL. IEEE, 2008. http://dx.doi.org/10.1109/icwl.2008.6.

2

Jin, Hanqi, and Xiaojun Wan. "Abstractive Multi-Document Summarization via Joint Learning with Single-Document Summarization." In Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA, USA: Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.findings-emnlp.231.

3

Christensen, Janara, Stephen Soderland, Gagan Bansal, and Mausam. "Hierarchical Summarization: Scaling Up Multi-Document Summarization." In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2014. http://dx.doi.org/10.3115/v1/p14-1085.

4

Ranjitha, N. S., and Jagadish S. Kallimani. "Abstractive multi-document summarization." In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, 2017. http://dx.doi.org/10.1109/icacci.2017.8126086.

5

Hu, Meishan, Aixin Sun, and Ee-Peng Lim. "Comments-oriented document summarization." In the 31st annual international ACM SIGIR conference. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1390334.1390385.

6

Kishore, V. V. Krishna, and Pramod Kumar Singh. "Multiple data document summarization." In 2017 Conference on Information and Communication Technology (CICT). IEEE, 2017. http://dx.doi.org/10.1109/infocomtech.2017.8340602.

7

Zhu, Junyan, Can Wang, Xiaofei He, Jiajun Bu, Chun Chen, Shujie Shang, Mingcheng Qu, and Gang Lu. "Tag-oriented document summarization." In the 18th international conference. New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1526709.1526925.

8

Wang, Feng, and Bernard Merialdo. "Multi-document video summarization." In 2009 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2009. http://dx.doi.org/10.1109/icme.2009.5202747.

9

Yapinus, Glorian, Alva Erwin, Maulhikmah Galinium, and Wahyu Muliady. "Automatic multi-document summarization for Indonesian documents using hybrid abstractive-extractive summarization technique." In 2014 6th International Conference on Information Technology and Electrical Engineering (ICITEE). IEEE, 2014. http://dx.doi.org/10.1109/iciteed.2014.7007896.

10

Naveen, Gopal K. R., and Prema Nedungadi. "Query-based Multi-Document Summarization by Clustering of Documents." In the 2014 International Conference. New York, New York, USA: ACM Press, 2014. http://dx.doi.org/10.1145/2660859.2660972.


Reports on the topic "Document Summarization"

1

Sekine, Satoshi, and Chikashi Nobata. A Survey for Multi-Document Summarization. Fort Belvoir, VA: Defense Technical Information Center, January 2003. http://dx.doi.org/10.21236/ada460234.

2

Siddharthan, Advaith, Ani Nenkova, and Kathleen McKeown. Syntactic Simplification for Improving Content Selection in Multi-Document Summarization. Fort Belvoir, VA: Defense Technical Information Center, January 2004. http://dx.doi.org/10.21236/ada457833.

3

Kaplin, David B. Automatic Summarization with Sloth (Summarizes Lengthy Documents and Outputs The Highlights). Fort Belvoir, VA: Defense Technical Information Center, November 2002. http://dx.doi.org/10.21236/ada408523.
