Academic literature on the topic 'Sentence compression'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Sentence compression.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Sentence compression"

1

Kamigaito, Hidetaka, and Manabu Okumura. "Syntactically Look-Ahead Attention Network for Sentence Compression." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 8050–57. http://dx.doi.org/10.1609/aaai.v34i05.6315.

Abstract:
Sentence compression is the task of compressing a long sentence into a short one by deleting redundant words. In sequence-to-sequence (Seq2Seq) based models, the decoder unidirectionally decides to retain or delete words. Thus, it cannot usually explicitly capture the relationships between decoded words and unseen words that will be decoded in the future time steps. Therefore, to avoid generating ungrammatical sentences, the decoder sometimes drops important words in compressing sentences. To solve this problem, we propose a novel Seq2Seq model, syntactically look-ahead attention network (SLAHAN), that can generate informative summaries by explicitly tracking both dependency parent and child words during decoding and capturing important words that will be decoded in the future. The results of the automatic evaluation on the Google sentence compression dataset showed that SLAHAN achieved the best kept-token-based-F1, ROUGE-1, ROUGE-2 and ROUGE-L scores of 85.5, 79.3, 71.3 and 79.1, respectively. SLAHAN also improved the summarization performance on longer sentences. Furthermore, in the human evaluation, SLAHAN improved informativeness without losing readability.
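For illustration, the deletion-based formulation that SLAHAN and similar Seq2Seq models address can be sketched in a few lines of Python. This is a minimal sketch of the task interface only, not the authors' model; the keep/delete labels are hard-coded here but would normally come from a trained tagger.

```python
def compress(tokens, keep_labels):
    """Apply per-token keep (1) / delete (0) decisions."""
    return " ".join(t for t, k in zip(tokens, keep_labels) if k == 1)

tokens = "The company , which was founded in 1998 , reported record profits".split()
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # illustrative, not model output
print(compress(tokens, labels))  # -> "The company reported record profits"
```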
2

Choi, Su Jeong, Ian Jung, Seyoung Park, and Seong-Bae Park. "Abstractive Sentence Compression with Event Attention." Applied Sciences 9, no. 19 (2019): 3949. http://dx.doi.org/10.3390/app9193949.

Abstract:
Sentence compression aims at generating a shorter sentence from a long and complex source sentence while preserving the important content of the source sentence. Since it provides enhanced comprehensibility and readability to readers, sentence compression is required for summarizing news articles, in which event words play a key role in delivering the meaning of the source sentence. Therefore, this paper proposes an abstractive sentence compression model with event attention. In compressing a sentence of a news article, event words should be preserved as important information. To this end, event attention is proposed, which focuses on the event words of the source sentence in generating a compressed sentence. The global information in the source sentence is as significant as event words, since it captures the information of the whole source sentence. Thus, the proposed model generates a compressed sentence by combining both attentions. According to the experimental results, the proposed model outperforms both the normal sequence-to-sequence model and the pointer generator on three datasets, namely the MSR dataset, the Filippova dataset, and a Korean sentence compression dataset. In particular, it shows a 122% higher BLEU score than the sequence-to-sequence model. Therefore, the proposed model is effective in sentence compression.
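The combination of a global attention and an event-restricted attention can be shown schematically. The NumPy sketch below uses our own toy dimensions and dot-product scoring, not the paper's architecture: it computes one context vector over all source positions and one masked to event-word positions, then concatenates them.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
T, d = 6, 8                                # source length, hidden size
H = rng.normal(size=(T, d))                # encoder states (toy values)
s = rng.normal(size=d)                     # current decoder state
event_mask = np.array([0, 1, 0, 0, 1, 0])  # 1 = event word, from some tagger

scores = H @ s                             # dot-product attention scores
global_ctx = softmax(scores) @ H           # attends over the whole sentence
event_scores = np.where(event_mask == 1, scores, -np.inf)
event_ctx = softmax(event_scores) @ H      # attends over event words only

context = np.concatenate([global_ctx, event_ctx])  # passed on to the decoder
print(context.shape)                       # (16,)
```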
3

Clarke, J., and M. Lapata. "Global Inference for Sentence Compression: An Integer Linear Programming Approach." Journal of Artificial Intelligence Research 31 (March 11, 2008): 399–429. http://dx.doi.org/10.1613/jair.2433.

Abstract:
Sentence compression holds promise for many applications ranging from summarization to subtitle generation. Our work views sentence compression as an optimization problem and uses integer linear programming (ILP) to infer globally optimal compressions in the presence of linguistically motivated constraints. We show how previous formulations of sentence compression can be recast as ILPs and extend these models with novel global constraints. Experimental results on written and spoken texts demonstrate improvements over state-of-the-art models.
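The ILP formulation is concrete enough to demo in miniature. Below is a toy instance in the spirit of this paper, written with the PuLP library (pip install pulp); the relevance weights and the two constraints are invented for illustration, whereas the actual models use richer scores and many more linguistically motivated constraints.

```python
import pulp

tokens = ["He", "quickly", "ran", "to", "the", "old", "station"]
weight = [1.0, 0.2, 1.5, 0.8, 0.6, 0.1, 1.2]   # toy relevance scores

prob = pulp.LpProblem("sentence_compression", pulp.LpMaximize)
d = pulp.LpVariable.dicts("keep", range(len(tokens)), cat="Binary")

prob += pulp.lpSum(weight[i] * d[i] for i in range(len(tokens)))   # objective
prob += pulp.lpSum(d[i] for i in range(len(tokens))) <= 5          # length cap
prob += d[2] == 1       # global constraint: keep the main verb "ran"
prob += d[4] <= d[6]    # keep determiner "the" only with its noun "station"

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(" ".join(t for i, t in enumerate(tokens) if d[i].value() == 1))
# -> "He ran to the station"
```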
4

Alias, Suraya, Mohd Shamrie Sainin, and Siti Khaotijah Mohammad. "A Syntactic-Based Sentence Validation Technique for Malay Text Summarizer." Journal of Information and Communication Technology 20, no. 3 (2021): 329–52. http://dx.doi.org/10.32890/jict2021.20.3.3.

Abstract:
In the Automatic Text Summarization domain, a Sentence Compression (SC) technique is applied to the summary sentence to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in the sentence and to remove the unnecessary parts without sacrificing the sentence's grammaticality. The development of Malay Natural Language Processing (NLP) tools is still under study, with limited open access. The issue is the lack of a benchmark dataset in the Malay language for evaluating the quality of summaries and for validating the compressed sentences produced by a summarizer model. Hence, our paper outlines a Syntactic-based Sentence Validation technique for Malay sentences that refers to the Malay grammar pattern. In this work, we propose a new derivation set of syntactic rules based on the main Malay word classes to validate a Malay sentence that undergoes the SC procedure. We experimented on a Malay dataset of 100 news articles covering the natural disaster and events domain to find the optimal compression rate and its effect on the summary content. An automatic evaluation using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) produced an average F-measure of 0.5826 and an average recall of 0.5925, with the optimal compression rate at a confidence (Conf) value of 0.5. Furthermore, a manual evaluation by a group of Malay experts gave the compressed summary sentences a good grammaticality score of 4.11 and a readability score of 4.12 out of 5. This demonstrates the reliability of the proposed technique for validating Malay sentences, with promising summary content and readability results.
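Since several entries in this list report ROUGE scores, a bare-bones ROUGE-1 computation is shown below for reference. It is deliberately simplified: unigram overlap with a single reference, no stemming or stopword handling; the toy Malay word pair is our own example.

```python
from collections import Counter

def rouge_1(candidate, reference):
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())             # clipped unigram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return recall, precision, f1

r, p, f = rouge_1("banjir melanda kawasan itu",
                  "banjir besar melanda kawasan itu semalam")
print(round(r, 4), round(f, 4))  # 0.6667 0.8
```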
5

Clarke, James, and Mirella Lapata. "Discourse Constraints for Document Compression." Computational Linguistics 36, no. 3 (2010): 411–41. http://dx.doi.org/10.1162/coli_a_00004.

Abstract:
Sentence compression holds promise for many applications ranging from summarization to subtitle generation. The task is typically performed on isolated sentences without taking the surrounding context into account, even though most applications would operate over entire documents. In this article we present a discourse-informed model which is capable of producing document compressions that are coherent and informative. Our model is inspired by theories of local coherence and formulated within the framework of integer linear programming. Experimental results show significant improvements over a state-of-the-art discourse agnostic approach.
6

Sahoo, Deepak, and Rakesh Chandra Balabantaray. "Single-Sentence Compression Using XGBoost." International Journal of Information Retrieval Research 9, no. 3 (2019): 1–11. http://dx.doi.org/10.4018/ijirr.2019070101.

Abstract:
Sentence compression is the task of presenting a sentence in fewer words than the original without changing its meaning. Recent work on sentence compression formulates the problem as an integer linear programming (ILP) problem and then solves it using an external ILP solver, which suffers from slow running time. In this article, the sentence compression task is formulated as a two-class classification problem, and a gradient boosting technique is used to solve it. Different models are created using two different datasets, and the best model is taken for evaluation. The quality of compression is measured using two important quality measures: informativeness and compression rate. The approach achieves 70.2 percent informativeness and a 38.62 percent compression rate.
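The two-class formulation described here is easy to sketch. The snippet below uses our own toy features and labels, not the paper's feature set (requires the xgboost package): a gradient-boosted classifier learns to predict keep/delete per token from a tiny hand-made feature matrix.

```python
import numpy as np
from xgboost import XGBClassifier

# Toy per-token features: [relative position, word length, is_stopword]
X_train = np.array([
    [0.0, 3, 1], [0.2, 7, 0], [0.4, 2, 1], [0.6, 9, 0], [0.8, 4, 0],
    [0.0, 4, 1], [0.3, 8, 0], [0.6, 3, 1], [0.9, 6, 0],
])
y_train = np.array([0, 1, 0, 1, 1, 0, 1, 0, 1])  # gold keep(1)/delete(0)

clf = XGBClassifier(n_estimators=20, max_depth=3, eval_metric="logloss")
clf.fit(X_train, y_train)

X_new = np.array([[0.1, 6, 0], [0.5, 2, 1]])
print(clf.predict(X_new))  # 1 = keep the token, 0 = delete it
```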
7

Cohn, T. A., and M. Lapata. "Sentence Compression as Tree Transduction." Journal of Artificial Intelligence Research 34 (April 24, 2009): 637–74. http://dx.doi.org/10.1613/jair.2655.

Abstract:
This paper presents a tree-to-tree transduction method for sentence compression. Our model is based on synchronous tree substitution grammar, a formalism that allows local distortion of the tree topology and can thus naturally capture structural mismatches. We describe an algorithm for decoding in this framework and show how the model can be trained discriminatively within a large margin framework. Experimental results on sentence compression bring significant improvements over a state-of-the-art model.
8

Knight, Kevin, and Daniel Marcu. "Summarization beyond sentence extraction: A probabilistic approach to sentence compression." Artificial Intelligence 139, no. 1 (2002): 91–107. http://dx.doi.org/10.1016/s0004-3702(02)00222-9.

9

Li, Zuchao, Rui Wang, Kehai Chen, et al. "Explicit Sentence Compression for Neural Machine Translation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 8311–18. http://dx.doi.org/10.1609/aaai.v34i05.6347.

Abstract:
State-of-the-art Transformer-based neural machine translation (NMT) systems still follow a standard encoder-decoder framework, in which source sentence representation can be done well by an encoder with a self-attention mechanism. Though a Transformer-based encoder may effectively capture general information in its resulting source sentence representation, the backbone information, which stands for the gist of a sentence, is not specifically focused on. In this paper, we propose an explicit sentence compression method to enhance the source sentence representation for NMT. In practice, an explicit sentence compression goal is used to learn the backbone information in a sentence. We propose three ways, including backbone source-side fusion, target-side fusion, and both-side fusion, to integrate the compressed sentence into NMT. Our empirical tests on the WMT English-to-French and English-to-German translation tasks show that the proposed sentence compression method significantly improves translation performance over strong baselines.
10

Souza, Pamela E., Kathryn H. Arehart, James M. Kates, Naomi B. H. Croghan, and Namita Gehani. "Exploring the Limits of Frequency Lowering." Journal of Speech, Language, and Hearing Research 56, no. 5 (2013): 1349–63. http://dx.doi.org/10.1044/1092-4388(2013/12-0151).

Abstract:
Purpose: This study examined how frequency lowering affected sentence intelligibility and quality for adults with postlingually acquired, mild-to-moderate hearing loss. Method: Listeners included adults aged 60–92 years with sloping sensorineural hearing loss and a control group of similarly aged adults with normal hearing. Sentences were presented in quiet and babble at a range of signal-to-noise ratios. Intelligibility and quality were measured with varying amounts of frequency lowering, implemented using a form of frequency compression. Results: Moderate amounts of compression, particularly with high cutoff frequencies, had minimal effects on sentence intelligibility. Listeners with the greatest high-frequency hearing loss showed the greatest benefit. Sentence intelligibility decreased with more compression. Listeners were more affected by a given set of parameters in noise than in quiet. In quiet, any amount of compression resulted in lower speech quality for most listeners, with the greatest degradation for listeners with better high-frequency hearing. Quality ratings were lower with background noise, and in noise, the effect of changing compression parameters was small. Conclusions: The benefits of frequency lowering in adults were affected by the compression parameters as well as individual hearing thresholds. The data are consistent with the idea that frequency lowering can be viewed in terms of an improved-audibility versus increased-distortion trade-off.

Dissertations / Theses on the topic "Sentence compression"

1

Egawa, Seiji, Yoshihide Kato, and Shigeki Matsubara. "Sentence Compression by Structural Conversion of Parse Tree." IEEE, 2008. http://hdl.handle.net/2237/12140.

2

Nóbrega, Fernando Antônio Asevêdo. "Sumarização Automática de Atualização para a língua portuguesa." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-30072018-090806/.

Abstract:
The huge amount of textual data available on the web is an ideal scenario for many Natural Language Processing applications, such as Update Summarization (US), which aims to produce a summary from a collection of related texts under the assumption that the reader has some previous knowledge about their subject. A good update summary must therefore convey the most relevant, novel, and updated content relative to that prior knowledge. The task presents many research challenges, mainly in the content selection and synthesis stages. Although there are several approaches to US, with different levels of theoretical and computational complexity, few of them use deep linguistic knowledge, which may assist the identification of the most relevant and novel content. Furthermore, summarization methods commonly apply an extractive synthesis approach, in which sentences from the source texts are selected and arranged into the summary without modification. Since some segments of the selected sentences may contain redundant or irrelevant content, this can limit the informativeness of the summary. Recent efforts have therefore turned to compressive synthesis, in which some segments of the selected sentences are removed before insertion into the summary. In this context, this PhD research investigated the use of linguistic knowledge, namely Cross-document Structure Theory (CST), subtopic segmentation, and Named Entity Recognition, in distinct content selection approaches with extractive and compressive synthesis, aiming at more informative update summaries. With Portuguese as the main language of study, three new corpora were compiled: CSTNews-Update, which enables US experiments, and PCSC-Pares and G1-Pares, for developing and evaluating sentence compression methods. Summarization experiments were also carried out for English. The experiments showed that subtopic segmentation was effective for producing more informative summaries, though only in a few content selection approaches. In addition, simplifications of the DualSum method based on subtopic distributions were proposed; these methods achieved very satisfactory results at lower computational cost. For compressive summaries, several sentence compression methods based on machine learning were developed; the best one outperformed a state-of-the-art approach based on deep learning. Prior to this work, most research on automatic summarization for Portuguese addressed single-document or multi-document summarization with extractive synthesis, largely because of the lack of resources for expanding the field. The contributions of this work thus fall into three areas: US methods informed by linguistic knowledge, sentence compression methods, and new resources for Portuguese.
3

Matsubara, Shigeki, Yoshihide Kato, and Seiji Egawa. "Sentence Compression by Removing Recursive Structure from Parse Tree." Springer, 2008. http://hdl.handle.net/2237/15113.

4

Clarke, James. "Global inference for sentence compression : an integer linear programming approach." Thesis, University of Edinburgh, 2008. http://hdl.handle.net/1842/2384.

Abstract:
In this thesis we develop models for sentence compression. This text rewriting task has recently attracted a lot of attention due to its relevance for applications (e.g., summarisation) and simple formulation by means of word deletion. Previous models for sentence compression have been inherently local and thus fail to capture the long range dependencies and complex interactions involved in text rewriting. We present a solution by framing the task as an optimisation problem with local and global constraints and recast existing compression models into this framework. Using the constraints we instil syntactic, semantic and discourse knowledge that the models otherwise fail to capture. We show that the addition of constraints allows relatively simple local models to reach state-of-the-art performance for sentence compression. The thesis provides a detailed study of sentence compression and its models. The differences between automatic and manually created compression corpora are assessed along with how compression varies across written and spoken text. We also discuss various techniques for automatically and manually evaluating compression output against a gold standard. Models are reviewed based on their assumptions, training requirements, and scalability. We introduce a general method for extending previous approaches to allow for more global models. This is achieved through the optimisation framework of Integer Linear Programming (ILP). We reformulate three compression models: an unsupervised model, a semi-supervised model and a fully supervised model as ILP problems and augment them with constraints. These constraints are intuitive for the compression task and are both syntactically and semantically motivated. We demonstrate how they improve compression quality and reduce the requirements on training material. Finally, we delve into document compression where the task is to compress every sentence of a document and use the resulting summary as a replacement for the original document. For document-based compression we investigate discourse information and its application to the compression task. Two discourse theories, Centering and lexical chains, are used to automatically annotate documents. These annotations are then used in our compression framework to impose additional constraints on the resulting document. The goal is to preserve the discourse structure of the original document and most of its content. We show how a discourse informed compression model can outperform a discourse agnostic state-of-the-art model using a question answering evaluation paradigm.
5

Linhares Pontes, Elvys. "Compressive Cross-Language Text Summarization." Thesis, Avignon, 2018. http://www.theses.fr/2018AVIG0232/document.

Abstract:
The popularization of social networks and digital documents has quickly increased the information available on the Internet. However, this huge amount of data cannot be analyzed manually. Among the many applications of Natural Language Processing (NLP), this thesis is concerned with cross-language text summarization, i.e., the production of summaries in a language different from that of the source documents. We also analyze other NLP tasks (word embedding representation, semantic similarity, and sentence and multi-sentence compression) to generate more stable and informative cross-lingual summaries. Most NLP applications, including all types of text summarization, use some similarity measure to analyze and compare the meaning of words, chunks, sentences, and texts. One way to analyze this similarity is to generate a representation of these sentences that captures their meaning. The meaning of a sentence is defined by several elements, such as the context of words and expressions, word order, and preceding information. Simple metrics, such as cosine similarity and Euclidean distance, provide a measure of similarity between two sentences, but they do not analyze the order of words or multi-word expressions. In view of these problems, we propose a neural network model that combines recurrent and convolutional neural networks to estimate the semantic similarity of a pair of sentences (or texts) based on the local and general contexts of words. On the analyzed dataset, our model predicted better similarity scores than the baselines by better capturing the local and general meanings of words and multi-word expressions. In order to remove redundancies and non-relevant information from similar sentences, we further propose a multi-sentence compression method that fuses similar sentences into short, correct compressions containing their main information. We model clusters of similar sentences as word graphs and apply an integer linear programming model that guides the compression of these clusters based on a list of keywords: we look for a path in the word graph that has good cohesion and contains as many keywords as possible. Our approach outperformed baselines by generating more informative and more correct compressions for French, Portuguese, and Spanish. Finally, we combine these methods to build a cross-language text summarization system. Our system is an {English, French, Portuguese, Spanish}-to-{English, French} framework that analyzes the information in both the source and the target language to identify the most relevant sentences. Inspired by compressive text summarization methods in monolingual analysis, we adapt our multi-sentence compression method to this problem to keep only the main information. Our system proves to be a good alternative for compressing redundant information and preserving relevant information, improving informativeness scores without losing grammatical quality for French-to-English cross-lingual summaries. For {English, French, Portuguese, Spanish}-to-{English, French} summaries, our system significantly outperforms state-of-the-art extractive baselines for all these languages. In addition, an experiment on automatic video transcripts shows that our approach achieves better and more stable ROUGE scores even for documents with grammatical errors and inaccurate or missing information.
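The word-graph view of multi-sentence compression lends itself to a compact sketch. The toy below (networkx; our own edge weighting, far simpler than the keyword-guided ILP of the thesis) merges two sentences into a graph, halves the weight of edges supported by both, and reads a compression off the cheapest start-to-end path.

```python
import networkx as nx

sentences = [
    "the president spoke to reporters on Tuesday".split(),
    "the president spoke at parliament".split(),
]

G = nx.DiGraph()
for sent in sentences:
    path = ["<s>"] + sent + ["</s>"]
    for a, b in zip(path, path[1:]):
        if G.has_edge(a, b):
            G[a][b]["weight"] *= 0.5   # cheaper if supported by more sentences
        else:
            G.add_edge(a, b, weight=1.0)

compression = nx.shortest_path(G, "<s>", "</s>", weight="weight")
print(" ".join(compression[1:-1]))     # -> "the president spoke at parliament"
```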
6

Yamangil, Elif. "Rich Linguistic Structure from Large-Scale Web Data." Thesis, Harvard University, 2013. http://dissertations.umi.com/gsas.harvard:11162.

Abstract:
The past two decades have shown an unexpected effectiveness of Web-scale data in natural language processing. Even the simplest models, when paired with unprecedented amounts of unstructured and unlabeled Web data, have been shown to outperform sophisticated ones. It has been argued that the effectiveness of Web-scale data has undermined the necessity of sophisticated modeling or laborious data set curation. In this thesis, we argue for and illustrate an alternative view, that Web-scale data not only serves to improve the performance of simple models, but also can allow the use of qualitatively more sophisticated models that would not be deployable otherwise, leading to even further performance gains.
7

Molina, Villegas Alejandro. "Compression automatique de phrases : une étude vers la génération de résumés." Phd thesis, Université d'Avignon, 2013. http://tel.archives-ouvertes.fr/tel-00998924.

Abstract:
This study presents a new approach for automatic summary generation, one of the main challenges in Natural Language Processing. Although the topic has been studied for half a century, it remains current, since no one has yet managed to automatically create summaries comparable in quality to those produced by humans. In this context, research on automatic summarization has split into two broad categories: extractive summarization and abstractive summarization. In the former, sentences are ranked so that the best ones make up the final summary. However, the sentences selected for the summary often carry secondary information, so a finer-grained analysis is necessary. We propose a method for automatic sentence compression based on the deletion of fragments within sentences. From an annotated corpus, we built a linear model to predict the deletion of these fragments from simple features. Our method takes three principles into account: relevance of content (informativeness), quality of content (grammaticality), and length (compression rate). To measure the informativeness of fragments, we use a technique inspired by statistical physics: textual energy. For grammaticality, we propose to use probabilistic language models. The proposed method is able to generate correct summaries in Spanish. The results of this study raise several interesting issues regarding summarization by sentence compression. We observed that the task is, in general, highly subjective: there is no single optimal compression but several possible correct compressions. We therefore consider that the results of this study open the discussion on the subjectivity of informativeness and its influence on automatic summarization.
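The core predictive step described above, a linear model deciding whether a sentence fragment can be dropped, can be mocked up with scikit-learn. The features, values, and labels below are invented; the thesis itself scores informativeness with textual energy and grammaticality with probabilistic language models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy per-fragment features: [relative length, informativeness score,
# fluency of the sentence once the fragment is removed]
X = np.array([
    [0.10, 0.2, 0.9], [0.45, 0.8, 0.3], [0.20, 0.1, 0.8],
    [0.35, 0.9, 0.4], [0.15, 0.3, 0.7], [0.50, 0.7, 0.2],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = fragment can be deleted

model = LogisticRegression().fit(X, y)
print(model.predict([[0.12, 0.25, 0.85]]))  # short, uninformative -> likely 1
```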
8

Shang, Guokan. "Spoken Language Understanding for Abstractive Meeting Summarization Unsupervised Abstractive Meeting Summarization with Multi-Sentence Compression and Budgeted Submodular Maximization. Energy-based Self-attentive Learning of Abstractive Communities for Spoken Language Understanding Speaker-change Aware CRF for Dialogue Act Classification." Thesis, Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAX011.

Abstract:
With the impressive progress that has been made in transcribing spoken language, it is becoming increasingly possible to exploit transcribed data for tasks that require comprehension of what is said in a conversation. The work in this dissertation, carried out in the context of a project devoted to the development of a meeting assistant, contributes to ongoing efforts to teach machines to understand multi-party meeting speech. We have focused on the challenge of automatically generating abstractive meeting summaries. We first present our results on Abstractive Meeting Summarization (AMS), which aims to take a meeting transcription as input and produce an abstractive summary as output. We introduce a fully unsupervised framework for this task based on multi-sentence compression and budgeted submodular maximization. We also leverage recent advances in word embeddings and graph degeneracy applied to NLP to take exterior semantic knowledge into account and to design custom diversity and informativeness measures. Next, we discuss our work on Dialogue Act Classification (DAC), whose goal is to assign each utterance in a discourse a label that represents its communicative intention. DAC yields annotations that are useful for a wide variety of tasks, including AMS. We propose a modified neural Conditional Random Field (CRF) layer that takes into account not only the sequence of utterances in a discourse, but also speaker information and, in particular, whether there has been a change of speaker from one utterance to the next. The third part of the dissertation focuses on Abstractive Community Detection (ACD), a sub-task of AMS, in which utterances in a conversation are grouped according to whether they can be jointly summarized by a common abstractive sentence. We provide a novel approach to ACD in which we first introduce a neural contextual utterance encoder featuring three types of self-attention mechanisms and then train it using the siamese and triplet energy-based meta-architectures. We further propose a general sampling scheme that enables the triplet architecture to capture subtle patterns (e.g., overlapping and nested clusters).
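Budgeted submodular maximization, one ingredient of the unsupervised framework above, is commonly approximated with a cost-scaled greedy rule. The sketch below uses a toy word-coverage objective of our own, not the thesis's diversity and informativeness measures.

```python
def greedy_budgeted(candidates, costs, budget):
    """Greedily pick items maximizing coverage gain per unit cost."""
    covered, chosen, spent = set(), [], 0.0
    while True:
        best, best_ratio = None, 0.0
        for i, words in enumerate(candidates):
            if i in chosen or spent + costs[i] > budget:
                continue
            ratio = len(set(words) - covered) / costs[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            return chosen
        chosen.append(best)
        covered |= set(candidates[best])
        spent += costs[best]

cands = [["budget", "vote", "delayed"], ["vote", "delayed"],
         ["chair", "opened", "meeting"]]
print(greedy_budgeted(cands, costs=[3.0, 2.0, 3.0], budget=6.0))  # -> [0, 2]
```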
9

Zajic, David Michael. "Multiple Alternative Sentence Compressions as a tool for automatic summarization tasks." College Park, Md. : University of Maryland, 2007. http://hdl.handle.net/1903/6729.

Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2007. Thesis research directed by: Computer Science. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
10

Perera, Paththamestrige. "Syntactic Sentence Compression for Text Summarization." Thesis, 2013. http://spectrum.library.concordia.ca/977725/1/Paththamestrige_MSc_F2013.pdf.

Abstract:
Automatic text summarization is a dynamic area in Natural Language Processing that has gained much attention in the past few decades. As a vast amount of data is accumulating and becoming available online, providing automatic summaries of specific subjects/topics has become an important user requirement. To encourage the growth of this research area, several shared tasks are held annually and different types of benchmarks are made available. Early work on automatic text summarization focused on improving the relevance of the summary content, but the trend is now moving towards generating more abstractive and coherent summaries. As a result, sentence simplification has become a prominent requirement in automatic summarization. This thesis presents our work on sentence compression using syntactic pruning methods in order to improve automatic text summarization. Sentence compression has several applications in Natural Language Processing, such as text simplification, topic and subtitle generation, removal of redundant information, and text summarization. Effective sentence compression techniques can contribute to text summarization by simplifying texts, avoiding redundant and irrelevant information, and allowing more space for useful information. In our work, we have focused on pruning individual sentences using their phrase structure grammar representations. We have implemented several types of pruning techniques, and the results were evaluated in the context of automatic summarization using standard evaluation metrics. In addition, we have performed a series of human evaluations and a comparison with other sentence compression techniques used in automatic summarization. Our results show that our syntactic pruning techniques achieve compression rates similar to those of previous work and to what humans achieve. However, the automatic evaluation using ROUGE shows that any type of sentence compression causes a decrease in content compared to the original summary, and extra content addition does not show a significant improvement in ROUGE. The human evaluation shows that our syntactic pruning techniques remove syntactic structures similar to those humans remove, and inter-annotator content evaluation using ROUGE shows that our techniques perform well compared to other baseline techniques. Moreover, when we evaluate our techniques with a grammar-structure-based F-measure, the results show that our pruning techniques perform better and seem to approximate human techniques better than baseline techniques.
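Syntactic pruning of the kind evaluated in this thesis can be illustrated on a phrase structure tree with NLTK (pip install nltk). The prune labels below (PP, ADVP) are our example choices, not the thesis's exact rule set.

```python
from nltk import Tree

PRUNE_LABELS = {"PP", "ADVP"}

def prune(tree):
    """Recursively drop subtrees whose label is in PRUNE_LABELS."""
    if not isinstance(tree, Tree):
        return tree
    kept = [prune(child) for child in tree
            if not (isinstance(child, Tree) and child.label() in PRUNE_LABELS)]
    return Tree(tree.label(), kept)

parse = Tree.fromstring(
    "(S (NP (DT the) (NN committee)) "
    "(VP (VBD met) (PP (IN in) (NP (NNP Geneva))) (ADVP (RB yesterday))))")
print(" ".join(prune(parse).leaves()))  # -> "the committee met"
```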

Book chapters on the topic "Sentence compression"

1

Molina, Alejandro, Juan-Manuel Torres-Moreno, Eric SanJuan, Iria da Cunha, and Gerardo Eugenio Sierra Martínez. "Discursive Sentence Compression." In Computational Linguistics and Intelligent Text Processing. Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-37256-8_33.

2

Nóbrega, Fernando A. A., Alipio M. Jorge, Pavel Brazdil, and Thiago A. S. Pardo. "Sentence Compression for Portuguese." In Lecture Notes in Computer Science. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-41505-1_26.

3

Sahoo, Deepak, and Rakesh Chandra Balabantaray. "Single-Sentence Compression Using SVM." In Soft Computing in Data Analytics. Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-0514-6_48.

4

Molina, Alejandro, Juan-Manuel Torres-Moreno, Eric SanJuan, Iria da Cunha, Gerardo Sierra, and Patricia Velázquez-Morales. "Discourse Segmentation for Sentence Compression." In Advances in Artificial Intelligence. Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-25324-9_27.

5

Mehta, Parth, and Prasenjit Majumder. "Neural Model for Sentence Compression." In From Extractive to Abstractive Summarization: A Journey. Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-8934-4_7.

6

Wang, Liangguo, Jing Jiang, and Lejian Liao. "Sentence Compression with Reinforcement Learning." In Knowledge Science, Engineering and Management. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99365-2_1.

7

Nayeem, Mir Tafseer, Tanvir Ahmed Fuad, and Yllias Chali. "Neural Diverse Abstractive Sentence Compression Generation." In Lecture Notes in Computer Science. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-15719-7_14.

8

Niu, Yi-Shuai, Xi-Wei Hu, Yu You, Faouzi Mohamed Benammour, and Hu Zhang. "Sentence Compression via DC Programming Approach." In Advances in Intelligent Systems and Computing. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-21803-4_35.

9

Li, Peng, and Yinglin Wang. "Sentence Compression with Natural Language Generation." In Advances in Intelligent and Soft Computing. Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-25661-5_46.

10

Zhang, Chunliang, Minghan Hu, Tong Xiao, Xue Jiang, Lixin Shi, and Jingbo Zhu. "Chinese Sentence Compression: Corpus and Evaluation." In Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-41491-6_24.


Conference papers on the topic "Sentence compression"

1

Kuvshinova, T. "Sentence Compression for Russian: Dataset and Baselines." In International Conference on Computational Linguistics and Intellectual Technologies "Dialogue". Russian State University for the Humanities, 2020. http://dx.doi.org/10.28995/2075-7182-2020-19-517-528.

Abstract:
Sentence compression is the task of removing redundant information from a sentence while preserving its original meaning. In this paper, we approach deletion-based sentence compression for the Russian language. We use data from a plagiarism detection corpus (ParaPlag) to create a corpus for sentence compression in Russian of almost 3,000 pairs of sentences. We align source sentences and their compressions using the Needleman-Wunsch algorithm and perform human evaluation of the corpus for readability and informativeness. We then use a bidirectional LSTM, a typical baseline for the problem, to solve the sentence compression task for Russian. We also experiment with RuBert and Bert-multilingual; for the latter, we use transfer learning, first pretraining the model on English data, which improves performance. We conduct human evaluation of readability and informativeness and perform error analysis for the models. We achieve an F-measure of 74.8%, readability of 3.88, and informativeness of 3.47 (out of 5) on test data. We also implement a post-hoc syntax-based evaluator, which can detect some incorrect compressions, increasing the overall quality of the system. We provide the data and baseline results for future studies.
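The alignment step described above, turning (source, compression) pairs into per-token deletion labels, can be reproduced with a textbook Needleman-Wunsch implementation; the scoring values below are our assumption, not the paper's.

```python
def align_labels(source, compressed, match=1, gap=-1, mismatch=-1):
    """Label each source token keep (1) / delete (0) via global alignment."""
    n, m = len(source), len(compressed)
    F = [[0] * (m + 1) for _ in range(n + 1)]      # DP score table
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if source[i - 1] == compressed[j - 1] else mismatch
            F[i][j] = max(F[i-1][j-1] + s, F[i-1][j] + gap, F[i][j-1] + gap)
    labels, i, j = [0] * n, n, m                   # default label: delete
    while i > 0 and j > 0:                         # trace back the best path
        s = match if source[i - 1] == compressed[j - 1] else mismatch
        if s == match and F[i][j] == F[i-1][j-1] + s:
            labels[i - 1] = 1                      # token kept in compression
            i, j = i - 1, j - 1
        elif F[i][j] == F[i-1][j] + gap:
            i -= 1                                 # source token vs. gap: delete
        elif F[i][j] == F[i-1][j-1] + s:
            i, j = i - 1, j - 1                    # mismatch: token not kept
        else:
            j -= 1
    return labels

src = "the old committee finally met in Geneva".split()
out = "the committee met".split()
print(align_labels(src, out))  # -> [1, 0, 1, 0, 1, 0, 0]
```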
2

Clarke, James, and Mirella Lapata. "Models for sentence compression." In the 21st International Conference. Association for Computational Linguistics, 2006. http://dx.doi.org/10.3115/1220175.1220223.

3

Chen, Jinguang, Tingting He, Zhuoming Gui, and Fang Li. "Probabilistic unsupervised Chinese sentence compression." In 2009 IEEE International Conference on Granular Computing (GRC). IEEE, 2009. http://dx.doi.org/10.1109/grc.2009.5255158.

4

Pourgholamali, Fatemeh, and Mohsen Kahani. "Semantic role based sentence compression." In 2012 2nd International eConference on Computer and Knowledge Engineering (ICCKE). IEEE, 2012. http://dx.doi.org/10.1109/iccke.2012.6395380.

5

Filippova, Katja, and Michael Strube. "Dependency tree based sentence compression." In the Fifth International Natural Language Generation Conference. Association for Computational Linguistics, 2008. http://dx.doi.org/10.3115/1708322.1708329.

6

Cohn, Trevor, and Mirella Lapata. "Sentence compression beyond word deletion." In the 22nd International Conference. Association for Computational Linguistics, 2008. http://dx.doi.org/10.3115/1599081.1599099.

7

"Automatic Summarization Based on Sentence Morpho-Syntactic Structure: Narrative Sentences Compression." In The 2nd International Workshop on Natural Language Understanding and Cognitive Science. SciTePress - Science and and Technology Publications, 2005. http://dx.doi.org/10.5220/0002570201610167.

8

Lin, Chin-Yew. "Improving summarization performance by sentence compression." In the sixth international workshop. Association for Computational Linguistics, 2003. http://dx.doi.org/10.3115/1118935.1118936.

9

Zhao, Yang, Xiaoyu Shen, Wei Bi, and Akiko Aizawa. "Unsupervised Rewriter for Multi-Sentence Compression." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2019. http://dx.doi.org/10.18653/v1/p19-1216.

10

Filippova, Katja, Enrique Alfonseca, Carlos A. Colmenares, Lukasz Kaiser, and Oriol Vinyals. "Sentence Compression by Deletion with LSTMs." In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2015. http://dx.doi.org/10.18653/v1/d15-1042.
