Dissertations / Theses on the topic 'News summarization'

Listed below are the top 39 dissertations and theses on the topic 'News summarization.'

1

Lehto, Niko, and Mikael Sjödin. "Automatic text summarization of Swedish news articles." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159972.

Abstract:
With an increasing amount of textual information available, there is also an increased need to make this information more accessible. Our paper describes a modified TextRank model and investigates the methods available for using automatic text summarization to create summaries of Swedish news articles. To evaluate our model we focused on intrinsic evaluation methods: in part through content evaluation, in the form of measuring referential clarity and non-redundancy, and in part through text quality evaluation measures, in the form of keyword retention and ROUGE evaluation. The results indicate that stemming and improved stop-word handling can have a positive effect on ROUGE scores. The addition of redundancy checks also appears to help avoid repetition of information, although keyword retention decreased somewhat. Lastly, all methods had some trouble with dangling anaphora, showing a need for further work on anaphora resolution.
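The sentence-ranking idea at the core of this work can be illustrated in a few lines of Python. The sketch below is a generic TextRank-style extractive summarizer, not the thesis' implementation; the Swedish-specific stemming, stop-word handling and redundancy checks evaluated there are omitted.

```python
# Minimal TextRank-style extractive summarization sketch (assumes the
# document is already split into sentences; no language-specific steps).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import networkx as nx

def textrank_summary(sentences, n=3):
    # Build a sentence-similarity graph from TF-IDF cosine similarities.
    tfidf = TfidfVectorizer().fit_transform(sentences)
    graph = nx.from_numpy_array(cosine_similarity(tfidf))
    scores = nx.pagerank(graph)  # PageRank over the similarity graph
    ranked = sorted(range(len(sentences)), key=scores.get, reverse=True)
    # Return the top-n sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:n])]
```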
2

Grant, Harald. "Extractive Multi-document Summarization of News Articles." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-158275.

Abstract:
Publicly available data grows exponentially through web services and technological advancements. Multi-document summarization (MDS) can be used to comprehend such large data streams. In this research, the area of multi-document summarization is investigated. Multiple systems for extractive multi-document summarization are implemented using modern techniques, in the form of the pre-trained BERT language model for word embeddings and sentence classification, combined with well-proven techniques in the form of the TextRank ranking algorithm, the Waterfall architecture and anti-redundancy filtering. The systems are evaluated on the DUC 2002, 2006 and 2007 datasets using the ROUGE metric. The results show that the BM25 sentence representation implemented in the TextRank model, using the Waterfall architecture and an anti-redundancy technique, outperforms the other implementations and provides results competitive with other state-of-the-art systems. A cohesive model is derived from the leading system and tried in a user study using a real-time news detection application with users from the news domain. The study shows a clear preference for cohesive summaries in the case of extractive multi-document summarization, with the cohesive summary preferred in the majority of cases.
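The anti-redundancy filtering mentioned above is commonly realized as a greedy, MMR-style selection. A sketch under the assumption that each sentence already has a relevance score and a unit-normalized vector; the BERT/BM25 representations and the Waterfall merging used in the thesis are not reproduced here.

```python
# Greedy MMR-style selection: trade off relevance against redundancy.
import numpy as np

def mmr_select(vectors, relevance, k=5, lam=0.7):
    selected, candidates = [], list(range(len(vectors)))
    while candidates and len(selected) < k:
        def mmr(i):
            # Redundancy = highest similarity to an already chosen sentence.
            redundancy = max((float(np.dot(vectors[i], vectors[j]))
                              for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected  # indices of the chosen sentences
```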
3

Biniam, Thomas Indrias, and Adam Morén. "Extractive Text Summarization of Norwegian News Articles Using BERT." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176598.

Abstract:
Extractive text summarization has over the years been an important research area in natural language processing, and numerous methods have been proposed for extracting information from text documents. Recent works have shown great success on English summarization tasks by fine-tuning the language model BERT using large summarization datasets. However, less research has been done for low-resource languages. This work contributes by investigating how BERT can be used for Norwegian text summarization. Two models are developed by applying a modified BERT architecture, called BERTSum, on pre-trained Norwegian and multilingual BERT. The resulting models predict key sentences from articles to generate bullet-point summaries. These models are evaluated with the automatic metric ROUGE, and in this evaluation the multilingual BERT model outperforms the Norwegian one. The multilingual model is further evaluated in a human evaluation by journalists, revealing that the generated summaries are not entirely satisfactory in some aspects. With some improvements, however, the model could be a valuable tool for journalists who edit and rewrite generated summaries, saving time and workload. (The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.)
4

Kantzola, Evangelia. "Extractive Text Summarization of Greek News Articles Based on Sentence-Clusters." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-420291.

Abstract:
This thesis introduces an extractive summarization system for Greek news articles based on sentence clustering. The main purpose is to evaluate the impact of three different types of text representation, Word2Vec embeddings, TF-IDF and LASER embeddings, on the summarization task. Using these techniques, we build three versions of the initial summarizer. Moreover, we create a new corpus of gold-standard summaries to evaluate against the system summaries; this new collection of reference summaries is merged with part of the MultiLing Pilot 2011 corpus to constitute our main dataset. We perform both automatic and human evaluation. Our automatic ROUGE results suggest that System A, which employs averaged Word2Vec vectors to create sentence embeddings, outperforms the other two systems by yielding higher ROUGE-L F-scores. Contrary to our initial hypotheses, System C, using LASER embeddings, fails to surpass even the Word2Vec method, sometimes showing weak sentence representation. With regard to the scores obtained in the manual evaluation task, we observe that System A (averaged Word2Vec) and System C (LASER) tend to produce more coherent and adequate summaries than System B (TF-IDF). Furthermore, the majority of system summaries are rated very high with respect to non-redundancy. Overall, System A, utilizing averaged Word2Vec embeddings, performs quite successfully according to both evaluations.
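The cluster-then-select pipeline can be sketched as follows, with TF-IDF vectors standing in for the Word2Vec and LASER sentence embeddings the thesis compares; one representative sentence is taken per cluster.

```python
# Cluster sentence vectors and pick the sentence nearest each centroid.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_summary(sentences, n_clusters=3):
    X = TfidfVectorizer().fit_transform(sentences).toarray()
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    picked = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        picked.append(int(members[np.argmin(dists)]))
    return [sentences[i] for i in sorted(picked)]  # original document order
```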
5

Hassel, Martin. "Resource Lean and Portable Automatic Text Summarization." Doctoral thesis, Stockholm : Numerisk analys och datalogi Numerical Analysis and Computer Science, Kungliga Tekniska högskolan, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4414.

6

Kan'an, Tarek Ghaze. "Arabic News Text Classification and Summarization: A Case of the Electronic Library Institute SeerQ (ELISQ)." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/74272.

Abstract:
Arabic news articles in heterogeneous electronic collections are difficult for users to work with. Two problems are that they are not categorized in a way that would aid browsing, and that there are no summaries or detailed metadata records that could be easier to work with than full articles. To address the first problem, schema mapping techniques were adapted to construct a simple taxonomy for Arabic news stories that is compatible with the subject codes of the International Press Telecommunications Council. So that each article would be labeled with the proper taxonomy category, automatic classification methods were researched to identify the most appropriate one. Experiments showed that the best features for classification resulted from a new tailored stemming approach (a new Arabic light stemmer called P-Stemmer). When coupled with binary classification using SVM, the newly developed approach proved superior to state-of-the-art techniques. To address the second problem, summarization, preliminary work was done with English corpora in the context of a new Problem Based Learning (PBL) course wherein students produced template summaries of big text collections. The techniques used in the course were extended to work with Arabic news. Due to the lack of high-quality tools for Named Entity Recognition (NER) and topic identification for Arabic, two new tools were constructed: RenA for Arabic NER, and ALDA for Arabic topic extraction (using the Latent Dirichlet Allocation algorithm). Controlled experiments with each of RenA and ALDA, involving Arabic speakers and a randomly selected corpus of 1000 Qatari news articles, showed that the tools produced very good results (names, organizations, locations, and topics). The categorization, NER, topic identification, and additional information extraction techniques were then combined to produce approximately 120,000 summaries for Qatari news articles, which are searchable, along with the articles, using LucidWorks Fusion, which builds upon Solr. Evaluation of the summaries showed high ratings based on the 1000-article test corpus. Contributions of this research with Arabic news articles thus include a new test corpus, taxonomy, light stemmer, classification approach, NER tool, topic identification tool, and template-based summarizer, all shown through experimentation to be highly effective.
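The classification stage pairs tailored stemming with per-category binary SVM classifiers. A minimal sketch, with a placeholder where a light stemmer such as P-Stemmer would plug in, and toy training data standing in for a labeled corpus:

```python
# Binary text categorization: TF-IDF features + linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def light_stem(text):
    # Placeholder: a real system would apply an Arabic light stemmer here.
    return text

docs = ["first training article ...", "second training article ..."]  # toy data
labels = [1, 0]  # 1 = article belongs to the taxonomy category

clf = make_pipeline(TfidfVectorizer(preprocessor=light_stem), LinearSVC())
clf.fit(docs, labels)
print(clf.predict(["unseen article text ..."]))
```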
7

Keneshloo, Yaser. "Addressing Challenges of Modern News Agencies via Predictive Modeling, Deep Learning, and Transfer Learning." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/91910.

Abstract:
Today's news agencies are moving from traditional journalism, where publishing just a few news articles per day was sufficient, to modern content generation mechanisms, which create thousands of news pieces every day. With the growth of these modern news agencies comes the arduous task of properly handling the massive amount of data generated around each news article. News agencies are therefore constantly seeking solutions to facilitate and automate some of the tasks previously done by humans. In this dissertation, we provide solutions to two broad problems, helping a news agency not only gain a wider view of reader behaviour around an article but also giving editors automated tools for summarizing news articles. These two disjoint problems aim at improving the reading experience by helping content generators monitor and focus on poorly performing content while allowing them to promote the well-performing pieces. We first address the task of popularity prediction of news articles via a combination of regression, classification, and clustering models. We then address the problem of generating automated text summaries for long news articles using deep learning models. The first problem helps content developers understand how a news article performs over the long run, while the second provides automated tools for generating summaries of each news article.

General audience abstract: Nowadays, everyone is exposed to an immense amount of information from social media, blog posts, and online news portals. Among these sources, news agencies are one of the main content providers for people around the world. Contemporary news agencies are moving from traditional journalism to modern techniques from different angles, either by building smart tools to track readers' reactions around a specific news article or by providing automated tools that help editors deliver higher-quality content. These systems should not only scale well with the growth of readership but also be able to process ad-hoc requests, since most policies and decisions in these agencies are made based on the results of these analytical tools. As part of this movement towards smart journalism, we have worked with The Washington Post on tools for predicting the popularity of a news article and on an automated text summarization model. We develop a model that monitors each news article after publication and predicts the number of views the article will receive within the next 24 hours. This helps content creators promote potentially viral articles on the front page or on social media, and gives editors early warning of poorly performing articles so they can edit them for better exposure. Meanwhile, news agencies generate more than a thousand articles per day, and writing three to four summary sentences for each piece is not only becoming infeasible but is also expensive and time-consuming. We therefore also develop a separate automated text summarization model, which generates summaries by selecting the most salient sentences in a news article and paraphrasing them into shorter sentences that can serve as a summary of the entire document.
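The popularity-prediction component can be illustrated as a simple regression over early engagement signals. The feature set, toy data and model choice below are illustrative assumptions, not the dissertation's exact setup.

```python
# Predict next-24h views from hypothetical early engagement features.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Columns (assumed): views_1h, shares_1h, comments_1h, article_age_hours
X_train = np.array([[120, 4, 2, 1.0],
                    [900, 50, 30, 1.0],
                    [40, 1, 0, 2.0]])
y_train = np.array([1500, 20000, 300])  # observed views in the next 24h

model = GradientBoostingRegressor().fit(X_train, y_train)
print(model.predict(np.array([[300, 12, 5, 1.0]])))
```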
8

Monsen, Julius. "Building high-quality datasets for abstractive text summarization : A filtering‐based method applied on Swedish news articles." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176352.

Abstract:
With an increasing amount of information on the internet, automatic text summarization could potentially make content more readily available to a larger variety of people. Training and evaluating text summarization models requires datasets of sufficient size and quality. Today, most such datasets are in English, and for minor languages such as Swedish it is not easy to obtain corresponding datasets with handwritten summaries. This thesis proposes methods for compiling high-quality datasets suitable for abstractive summarization from a large amount of noisy data through characterization and filtering. The data consist of Swedish news articles and their preambles, which are used here as summaries. Different filtering techniques are applied, yielding five different datasets. Furthermore, summarization models are implemented by warm-starting an encoder-decoder model with BERT checkpoints and fine-tuning it on the different datasets. The fine-tuned models are evaluated with ROUGE metrics and BERTScore. All models achieve significantly better results when evaluated on filtered test data than on unfiltered test data. Moreover, the models trained on the most heavily filtered, and hence smallest, dataset achieve the best results on the filtered test data. The trade-off between dataset size and quality and other methodological implications of the data characterization, the filtering and the model implementation are discussed, leading to suggestions for future research.
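The filtering step can be sketched as a per-pair acceptance test over article/preamble pairs. The compression-ratio and lexical-overlap heuristics below, and their thresholds, are assumptions for illustration rather than the thesis' exact criteria.

```python
# Keep an (article, summary) pair only if it passes simple quality checks.
def keep_pair(article, summary, min_ratio=0.05, max_ratio=0.3, max_overlap=0.9):
    a_tokens, s_tokens = article.split(), summary.split()
    # Summary should be much shorter than the article, but not trivially short.
    ratio = len(s_tokens) / max(len(a_tokens), 1)
    # A near-verbatim preamble is a poor target for abstractive training.
    overlap = len(set(s_tokens) & set(a_tokens)) / max(len(set(s_tokens)), 1)
    return min_ratio <= ratio <= max_ratio and overlap <= max_overlap

raw_pairs = [("long article text ...", "short preamble ...")]  # toy data
dataset = [(a, s) for a, s in raw_pairs if keep_pair(a, s)]
```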
9

Duan, Yijun. "History-related Knowledge Extraction from Temporal Text Collections." Kyoto University, 2020. http://hdl.handle.net/2433/253410.

Doctoral dissertation (Informatics), Department of Social Informatics, Graduate School of Informatics, Kyoto University.
10

Cahuina, Edward Jorge Yuri Cayllahua. "A new method for static video summarization using visual words and video temporal segmentation." Repositório Institucional da UFOP, 2013. http://www.repositorio.ufop.br/handle/123456789/4216.

Abstract:
Master's dissertation, Graduate Program in Computer Science, Universidade Federal de Ouro Preto. During the last years, a continuous demand for and creation of digital video information has occurred, causing exponential growth of digital video content. A lot of research has been done to increase the usability of such a large volume of videos. Video summarization, in particular, has been proposed to rapidly browse large video collections; it has also been used to index and access video content efficiently. To summarize any type of video, researchers have relied on visual features contained in frames, extracted with either local or global descriptors. Nonetheless, no extensive evaluation has been made of the usefulness of both types of descriptors in video summarization. One important contribution of this dissertation is to propose a method for semantic video summarization that can produce meaningful and informative video summaries. We perform a wide evaluation using over 100 videos in order to reach a stronger position on the performance of local descriptors in semantic video summarization. According to our experiments, our proposed method using local descriptors and temporal video segmentation produces better summaries than methods that do not. We also find that color information is of marginal importance when using local descriptors to produce video summaries.
11

Marinone, Emilio. "Evaluation of New Features for Extractive Summarization of Meeting Transcripts : Improvement of meeting summarization based on functional segmentation, introducing topic model, named entities and domain specific frequency measure." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-249560.

Abstract:
Automatic summarization of meeting transcripts has been widely studied in the last two decades, achieving continuous improvements in terms of the standard summarization metric (ROUGE). A user study has shown that people noticeably prefer abstractive summarization over the extractive approach. However, a fluent and informative abstract depends heavily on the performance of the information extraction method(s) applied. In this work, basic concepts useful for understanding meeting summarization methods, such as part-of-speech (POS) tagging, named entity recognition (NER), frequency and similarity measures, and topic models, are introduced together with a broad literature analysis. The proposed method takes inspiration from the current unsupervised extractive state of the art and introduces new features that improve on the baseline. It is based on functional segmentation, meaning that it first aims to divide the preprocessed source transcript into monologues and dialogues. Two different approaches are then used to extract the most important sentences from each segment, whose concatenation, together with redundancy reduction, creates the final summary. Results show that a topic model trained on an extended corpus, some variations in the proposed parameters and the consideration of word tags improve the performance in terms of ROUGE precision, recall and F-measure, outperforming the currently best performing unsupervised extractive summarization method in terms of ROUGE-1 precision and F-measure. A subjective evaluation of the generated summaries demonstrates that the current unsupervised framework is not yet accurate enough for commercial use, but the newly introduced features can help supervised methods achieve acceptable performance. A much larger, non-artificially constructed meeting dataset with reference summaries is also needed for training supervised methods, as well as a more accurate algorithm evaluation. The source code is available on GitHub: https://github.com/marinone94/ThesisMeetingSummarization
12

Mello, Rafael Ferreira Leite de. "A solution to extractive summarization based on document type and a new measure for sentence similarity." Universidade Federal de Pernambuco, 2015. https://repositorio.ufpe.br/handle/123456789/15257.

Abstract:
The Internet is an enormous and fast-growing digital repository encompassing billions of documents on a diversity of subjects, of varying quality, reliability, etc., and it is increasingly difficult to scavenge useful information from it. It is therefore necessary to provide automatic techniques that allow users to save time and resources. Automatic text summarization techniques may offer a way out of this problem. Text summarization (TS) aims to automatically compress one or more documents to present their main ideas in less space; TS platforms receive one or more documents as input and generate a summary. In recent years, a variety of text summarization methods have been proposed. However, due to the different document types (such as news, blogs, and scientific articles) it is difficult to create a general TS application that produces expressive summaries for every type. Another relevant, related problem is measuring the degree of similarity between sentences, which is used in applications such as text summarization, information retrieval, image retrieval, text categorization, and machine translation. Recent works report several efforts to evaluate sentence similarity by representing sentences as bag-of-words vectors or as trees of the syntactic information among words. However, most of these approaches do not take sentence meaning and word order into consideration. This thesis proposes: (i) a new text summarization solution which identifies the document type before performing summarization, and (ii) a new sentence similarity measure based on lexical, syntactic and semantic evaluation to deal with the meaning and word order problems. Prior identification of document types allows the summarization solution to select the methods most suitable for each type of text. The thesis also performs a detailed assessment of the most used text summarization methods to select those that create the most informative summaries in the news, blog and scientific article contexts. The proposed sentence similarity measure is completely unsupervised and achieves results similar to human annotators on the dataset proposed by Li et al. It was also satisfactorily applied to evaluate the similarity between summaries and to eliminate redundancy in multi-document summarization.
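A toy version of a combined sentence-similarity measure is sketched below, mixing a lexical (token-overlap) component with a word-order component; the actual lexical/syntactic/semantic formulation in the thesis is richer, and the 0.8/0.2 weighting is an assumption.

```python
# Combined sentence similarity: lexical overlap + word-order agreement.
def lexical_sim(s1, s2):
    t1, t2 = set(s1.split()), set(s2.split())
    return len(t1 & t2) / max(len(t1 | t2), 1)  # Jaccard overlap

def order_sim(s1, s2):
    # Fraction of shared-word pairs appearing in the same relative order.
    common = [w for w in s1.split() if w in s2.split()]
    pos2 = {w: i for i, w in enumerate(s2.split())}
    pairs = len(common) * (len(common) - 1) // 2
    if pairs == 0:
        return 1.0
    inversions = sum(pos2[a] > pos2[b]
                     for i, a in enumerate(common) for b in common[i + 1:])
    return 1.0 - inversions / pairs

def sentence_sim(s1, s2, alpha=0.8):
    return alpha * lexical_sim(s1, s2) + (1 - alpha) * order_sim(s1, s2)

print(sentence_sim("the cat sat on the mat", "on the mat the cat sat"))
```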
13

Chiu, Chung-Ren (邱中人). "Chinese News Summarization." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/12254466664744727912.

14

Lee, Hsiang-Pin (李祥賓). "Text Summarization on News." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/54130654842944705212.

Abstract:
Master's thesis, Soochow University, Department of Computer Science. The swift development of information technology and the Internet has resulted in a problem of information overload, so it is imperative to find a way to help users browse through documents efficiently and effectively. Text summarization could be a remedy for this problem. Traditional text summarization is usually done manually; however, this costs a great deal of human effort and cannot satisfy real-time demands, so it is necessary to automate the process. This thesis presents three methods of text summarization on the Reuters news corpus. First, we use information retrieval techniques to collect the important vocabulary of the document (the Important Vocabulary Extract Policy). Second, we determine the significance of a sentence from its position in the document (the Optimal Position Policy). Last, we expand the vocabulary of the title (the Title Expand Policy). To express the concept of the document, we extract its important vocabulary and analyze its structure to find which positions the document subject occupies. Moreover, since the title is especially significant, we expand the title's related vocabulary using WordNet and use the expanded set of words to find appropriate sentences for the summary. We design separate experiments for the three summarization methods, and the resulting summaries are evaluated via text categorization. Experimental results indicate that all of the methods achieve acceptable performance. Finally, the thesis proposes a method that combines the Optimal Position and Title Expand policies: against a baseline precision rate of 65.6%, the combined method achieves a precision rate of 71.9%, a 9.6% relative improvement.
15

Yang, Jeng-Yuan (楊政遠). "Statistical Chinese News Summarization." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/tu9p36.

Abstract:
Master's thesis, National Taipei University of Technology, Graduate Institute of Computer Science and Information Engineering. With the growing number of news articles published around the world every day, it would help users if the time needed to read news articles could be reduced. There are two general ways to summarize documents: multi-document summarization and single-document summarization. Multi-document news summarization resembles a 'hot topics of the week' list, presenting only the most important news reports, while single-document news summarization is closer to a short abstract that helps readers quickly grasp the overall idea of an article. The focus of single-document news summarization is to remove as many unimportant words as possible and preserve only the major keywords. In this thesis, we focus on single-document summarization of Chinese news articles with statistical methods. The proposed architecture is as follows. First, auxiliary vocabularies are collected from news articles and included in the system dictionary, and the original articles are kept along with them. The vocabulary is stored as word bigrams, together with document and term frequencies, which are then used to calculate the importance of sentences and select the most representative ones as the summary. In our experiments, we adopted only news articles in the 'science and technology' category, since new terms are easily obtained there. The experimental results showed that the summaries generated by our system can be effectively clustered with the original news articles, and that they greatly reduce the time needed to read the news, achieving the major goal of the proposed system: reducing news reading time.
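The statistical scoring can be sketched as ranking sentences by the summed weight of their word bigrams, with a TF-IDF-style weight computed from term and document frequencies. Tokenization and dictionary construction are simplified assumptions here.

```python
# Rank sentences by summed bigram TF-IDF weight.
from collections import Counter
import math

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

def rank_by_bigrams(docs_tokens, sentences_tokens):
    n_docs = len(docs_tokens)
    df = Counter(b for doc in docs_tokens for b in set(bigrams(doc)))
    tf = Counter(b for s in sentences_tokens for b in bigrams(s))
    def weight(b):
        return tf[b] * math.log(n_docs / (1 + df[b]) + 1)  # smoothed IDF
    return sorted(sentences_tokens,
                  key=lambda s: sum(weight(b) for b in bigrams(s)),
                  reverse=True)
```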
16

Liu, Cheng-Chang (劉政璋). "Concept Cluster Based News Document Summarization." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/22477396909604899181.

Abstract:
Master's thesis, National Chiao Tung University, Institute of Computer and Information Science. A multi-document summarization system can reduce the time a user needs to read a large number of documents. In general, a summarization system selects salient features from one or many documents to compose a summary, in the hope that the generated summary helps a user understand the meaning of the documents. This thesis proposes a method to analyze the semantics of news documents in two phases. The first phase attempts to discover the subtle topics, called concepts, hidden in documents. Because similar nouns, verbs, and adjectives usually co-occur with the same representative term, we describe a concept by the terms around it and use a semantic network to describe concepts more accurately. The second phase distinguishes the concepts discovered in the first phase by their word senses: the K-means clustering algorithm gathers concepts with the same sense into the same cluster, which alleviates word sense ambiguity and merges concepts with similar senses. After these two phases, we weight sentences with five features and order them by weight: cluster size, sentence location, tf-idf, the distance between a sentence and the centre of its cluster, and the similarity between a sentence and its cluster. We evaluate the performance of our method using the news documents and evaluation tool of the Document Understanding Conference 2003 (DUC 2003).
17

Liu, Shu Wei (劉書瑋). "Detection and Summarization System for News Topics." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/56164626413580247961.

Abstract:
Master's thesis, Chang Gung University, Department of Computer Science and Information Engineering. People love to track the news out of curiosity. In the past, news faded over time in newspapers and was hard to track; now, due to the rapid growth of the Internet, news is continuously shared and stored in large amounts, providing an opportunity to browse a topic containing many news stories on a single screen. However, a topic may contain too many stories for a user to read in reasonable time, so we extract the important sentences contained in a topic's stories and list them in temporal order, letting users more conveniently understand what has happened within the topic. In this paper, we implement a detection and summarization system for news topics with a new word-weighting scheme and a new summarization algorithm for Chinese news content. The word-weighting scheme, named TF-Density, modifies the well-known TF-IDF and TF-IWF algorithms to recognize important words in a text more precisely and efficiently. The summarization algorithm, called Dynamic Centroid Summarization, adapts the traditional centroid-based approach to the characteristics of Chinese articles. Our experiments show that the system provides more precise and convenient results for users tracking the news they care about.
18

Chen, Shin-Chia (陳信嘉). "Intelligent Location-based Mobile News Service System with Automatic News Summarization." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/25881681257588172806.

Abstract:
Master's thesis, National Taiwan Normal University, Department of Industrial Education. Due to the fast development of wireless networks and mobile computing technologies, mobile information services make it more convenient to obtain useful information in our daily life, and they will become more and more important in the future. A Location-Based Service (LBS) system can sense a user's position and deliver location-specific information to individual users, thus helping them obtain useful information. Developing an LBS system requires, on top of the information service and delivery structure, two additional elements: an information provider on the server side and location awareness on the client side. This study focused on developing an LBS system that provides location-based news with a news summarization service to individual users. To this end, it proposes a novel multi-document summarization method based on fuzzy theory to deliver summarized news to mobile devices. Moreover, to sense user location precisely, the study also proposes a practicable location-awareness classifier that identifies a user's position from GPS (Global Positioning System) signals. Finally, the proposed multi-document summarization method and user location identification scheme are integrated into an 'Intelligent Location-based Mobile News Service System with Automatic News Summarization' that provides users with location-based information services. According to the experimental results for multi-document summarization and user location identification, up to 86% of the news summaries are of good quality and the accuracy rate of the location-awareness classifier reaches 90%, showing that the proposed system can be successfully applied to location-based services in real-world applications.
19

Hu, Jia-Yu (胡家瑜). "Monitoring the Progressive News Topic with Storyline-based Summarization." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/97686924752003486877.

20

Liang, Chia-Hao (梁家豪). "Topic Retrospection with Storyline-based Summarization on News Reports." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/00287816586744189530.

Abstract:
Master's thesis, National Sun Yat-sen University, Institute of Information Management. Electronic newspapers have become a main source for online news readers. Facing numerous stories, readers need support to review a topic in a short time. Because previous research in Topic Detection and Tracking (TDT) considered only how to identify events and presented the results as news titles and keywords, a summarized text that presents event evolution is needed for general news readers to retrospect the events under a news topic. This thesis proposes a topic retrospection process and implements the SToRe system, which identifies the various events under a news topic and constructs their relationships to compose a summary that sketches the event evolution within the topic. It consists of three main functions: event identification, main storyline construction and storyline-based summarization. The constructed main storyline removes irrelevant events and presents a main theme. The summarization extracts representative sentences and takes the main theme as the template for composing the summary. The summary not only provides enough information to comprehend the development of a topic, but also serves as an index to help readers find more detailed information. A lab experiment evaluates the SToRe system in a question-and-answer (Q&A) setting; the results show that it helps news readers capture the development of a topic more effectively and efficiently.
21

Fang, Tzu-Wei (方子維). "An Effective Summarization and Browsing Tool for News Videos." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/83400702737791907058.

22

Wu, Chao-Chung (吳潮崇). "An Automatic Summarization System for On-line News Articles." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/82720558630791982148.

Abstract:
Master's thesis, National Taipei University of Technology, Department of Electrical Engineering. News websites conveniently allow readers to peruse news online. Owing to rapid updating, the number of news articles is so massive that readers have to spend much time searching for the news they want; accumulating related information and summarizing news events can shorten readers' access time. This thesis proposes a novel automatic summarization system for retrieving online news articles efficiently. The proposed system collects news, calculates keyword weights, clusters news events, and then summarizes them automatically; particular attention is paid to clustering similar news events according to news characteristics and generating a summary for each cluster. In addition, this thesis proposes two summary generation approaches: the similarity value density approach and the keyword value density approach. In experiments, the summary generated by the proposed system is compared with the one chosen by a majority of readers, and evaluations of the two proposed approaches verify the suitability of the system. The proposed system can be adopted to collect, categorize, retrieve and cluster news events; besides reducing search and retrieval time on websites, it enables readers to quickly access news events via a summary and thereby manage news retrieval effectively.
23

Liu, Shih-Hung (劉士弘). "Improved Language Modeling Approaches for Mandarin Broadcast News Extractive Summarization." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/28709009924818514664.

Abstract:
Doctoral dissertation, National Taiwan University, Graduate Institute of Electrical Engineering. Extractive speech summarization aims to select an indicative set of sentences from a spoken document so as to succinctly cover the most important aspects of the document, and has garnered much research over the years. In this dissertation, we cast extractive speech summarization as an ad-hoc information retrieval (IR) problem and investigate various language modeling (LM) methods for important sentence selection. The main contributions are four-fold. First, we propose a novel clarity measure for use in important sentence selection, which quantifies the thematic specificity of each individual sentence and is deemed a crucial indicator orthogonal to the relevance measure provided by LM-based methods. Second, we explore a novel sentence modeling paradigm built on the notion of relevance, where the relationship between a candidate summary sentence and the spoken document to be summarized is unveiled through different granularities of context for relevance modeling; in addition, not only lexical but also topical cues inherent in the spoken document are exploited for sentence modeling. Third, we explore a novel approach that generates overlapping clusters to extract sentence-relatedness information from the document to be summarized, which can be used not only to enhance the estimation of various sentence models but also to exploit sentence-level structural relationships for better summarization performance. Fourth, we explore several effective formulations of proximity cues and propose a position-aware language modeling framework using various granularities of position-specific information for sentence modeling. Extensive experiments are conducted on a Mandarin broadcast news summarization dataset with Mandarin large-vocabulary continuous speech recognition (LVCSR), and the empirical results demonstrate the performance merits of our methods compared to several existing well-developed and state-of-the-art methods.
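The LM-based selection criterion can be illustrated by scoring each sentence with the likelihood that its unigram language model (with add-one smoothing) generates the document's word distribution; the clarity, relevance and position-aware models in the dissertation go well beyond this sketch.

```python
# Rank sentences by how well each sentence's unigram LM "explains" the document.
from collections import Counter
import math

def lm_score(sentence, doc_counts, vocab_size):
    sent_counts = Counter(sentence.split())
    sent_len = sum(sent_counts.values())
    doc_len = sum(doc_counts.values())
    score = 0.0
    for word, count in doc_counts.items():
        # Add-one smoothed probability of the word under the sentence LM.
        p = (sent_counts[word] + 1) / (sent_len + vocab_size)
        score += (count / doc_len) * math.log(p)
    return score

def rank_sentences(sentences):
    doc_counts = Counter(w for s in sentences for w in s.split())
    vocab_size = len(doc_counts)
    return sorted(sentences,
                  key=lambda s: lm_score(s, doc_counts, vocab_size),
                  reverse=True)
```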
24

Gobaan, Raveendran. "Crawling, Collecting, and Condensing News Comments." Thesis, 2013. http://hdl.handle.net/10012/8050.

Abstract:
Traditionally, public opinion and policy are assessed by issuing surveys and performing censuses designed to measure what the public thinks about a certain topic. Within the past five years, social networks such as Facebook and Twitter have gained traction for collecting public opinion about current events. Academic research on Facebook data proves difficult since the platform is generally closed, while Twitter restricts the conversation of its users, making it difficult to extract large-scale concepts from the microblogging infrastructure. News comments, by contrast, provide a rich source of discourse from individuals who are passionate about an issue. Due to the overhead of commenting, the population of commenters is necessarily biased towards individuals who have either strong opinions on a topic or in-depth knowledge of the issue, and their comments often collect insight derived from reading multiple articles on the topic. Unfortunately, the commenting systems employed by news companies are not implemented by a single entity and are often stored and generated using AJAX, which causes traditional crawlers to ignore them. To make matters worse they are often noisy, containing spam, poor grammar, and excessive typos; and due to the anonymity of comment systems, conversations can be derailed by malicious users or by the commenters' inherent biases. In this thesis we discuss the design and creation of a crawler that extracts comments from domains across the internet. For practical purposes we create a semi-automatic parser generator and describe how our system employs user feedback to predict which remote procedure calls are used to load comments. By reducing comment systems to remote procedure calls, we simplify the internet into a much simpler space, where we can focus on the data almost independently from its presentation, and can thus quickly create high-fidelity parsers to extract comments from a web page. We then show the system's usefulness by attempting to extract meaningful opinions from the large collections we gather. Doing so in real time turns out to foil traditional summarization systems, which are designed to handle dozens of well-formed documents, so we create a new algorithm, KLSum+, that outperforms all its competitors in efficiency while generally scoring well on the ROUGE SU4 metric. The algorithm factors in background models to boost accuracy, yet performs over 50 times faster than alternatives. Furthermore, using the summaries we see that the collected data can provide useful insight into public opinion and even surface the key points of discourse.
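KLSum, which KLSum+ builds on, greedily adds the sentence whose inclusion minimizes the KL divergence between the document collection's unigram distribution and the summary's. A compact sketch, without the background models and efficiency gains introduced in the thesis:

```python
# Greedy KLSum: pick sentences that keep the summary's word distribution
# close (in KL divergence) to the full collection's distribution.
from collections import Counter
import math

def normalize(counts):
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl(p, q, eps=1e-9):
    return sum(pw * math.log(pw / q.get(w, eps)) for w, pw in p.items())

def klsum(sentences, budget=3):
    doc_dist = normalize(Counter(w for s in sentences for w in s.split()))
    summary, pool = [], list(sentences)
    while pool and len(summary) < budget:
        def kl_if_added(s):
            cand = Counter(w for t in summary + [s] for w in t.split())
            return kl(doc_dist, normalize(cand))
        best = min(pool, key=kl_if_added)
        summary.append(best)
        pool.remove(best)
    return summary
```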
25

Tsai, Chun-Yu. "Multimodal News Summarization, Tracking and Annotation Incorporating Tensor Analysis of Memes." Thesis, 2017. https://doi.org/10.7916/D8FF44N7.

Abstract:
We demonstrate four novel multimodal methods for efficient video summarization and comprehensive cross-cultural news video understanding. First, for quick video browsing, we demonstrate a multimedia event recounting system. Based on nine people-oriented design principles, it summarizes YouTube-like videos into short visual segments (8-12 sec) and textual words (fewer than 10 terms). In the 2013 TRECVID Multimedia Event Recounting competition, this system placed first in recognition time efficiency, while remaining above average in description accuracy. Second, we demonstrate the summarization of large amounts of online international news videos. In order to understand international events such as the Ebola virus, AirAsia Flight 8501 and the Zika virus comprehensively, we present a novel and efficient constrained tensor factorization algorithm that first represents a video archive of multimedia news stories concerning a news event as a sparse tensor of order 4, whose dimensions correspond to extracted visual memes, verbal tags, time periods, and cultures. The iterative algorithm approximately but accurately extracts coherent quad-clusters, each of which represents a significant summary of an important independent aspect of the news event. We give examples of quad-clusters extracted from tensors with at least 10^8 entries derived from international news coverage, and show the method is fast, can be tuned to prefer any subset of its four dimensions, and exceeds three existing methods in performance. Third, noting that the co-occurrence of visual memes and tags in our summarization result is sparse, we show how to model cross-cultural visual meme influence based on normalized PageRank, which more accurately captures the rates at which visual memes are reposted in a specified time period in a specified culture. Lastly, we establish the correspondences of videos and text descriptions in different cultures through reliable visual cues, detect culture-specific tags for visual memes, and then annotate videos in a cultural setting. Starting with a video having little or no text in one culture (say, the US), we select candidate annotations from the text of another culture (say, China) to annotate the US video; by analyzing the similarity of images annotated by those candidates, we derive a set of proper tags from the viewpoint of the other culture. We illustrate culture-based annotation with examples from segments of international news, and evaluate the generated tags by cross-cultural tag frequency, tag precision, and user studies.
26

Lai, Jimmy (賴駿銘). "MPEG-4 News Video Summarization Based on Spatial and Motion Features." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/38167966422552451834.

Abstract:
Master's thesis, National Chung Cheng University, Graduate Institute of Electrical Engineering. Multimedia information has come into widespread use with the growing development of software and hardware technology, bringing difficulty in its use and digestion; assimilating this great deal of information has become an important issue in a world of information surplus. This thesis concerns MPEG-4 news video summarization. The key issues include the pre-processing of MPEG-4 video: (1) the reliability and modification of motion vectors (MVs), (2) the estimation of MVs in I-frames, and (3) the reconstruction of P-frame DCT coefficients. The need for pre-processing results from the lack of motion information in I-frames and of intra-coded DCT coefficients in P-frames; the summarization system benefits greatly from these procedures, which make it possible to carry out the entire process in the compressed domain. The presentation of 'anchor video with news audio' is the main objective of this thesis. We first detect anchor shots by analyzing color and motion features extracted in the compressed domain. The subsequent news content analysis involves (1) flash detection, (2) zoom detection, and (3) time allocation. In the summarization algorithm, we classify all news shots into 'normal' and 'special' events; the Lagrangian multiplier method is then applied to adjust the video playback speed according to frame activity and shot duration. The adjusted playback speed maintains an equilibrium between human perception and viewing time for the goal of video summarization. Finally, an appropriate summarized news video is constructed by combining the condensed news video with the anchor shot audio. The characteristics of this thesis are: 1. the establishment of video rectification in the compressed domain and a method to extract spatial and motion features; 2. an appropriate algorithm for anchor shot detection and news content analysis; 3. a news video summarization system based on frame activity and the adjustment of playback speed (shot time allocation).
27

Kau, Jia-liang (郭家良). "A Study and Implementation of News Event Clustering and Summarization Search Engine." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/06309080359305491455.

Abstract:
Master's thesis, National Yunlin University of Science and Technology, Department of Information Management. This research applies the Topic Detection and Tracking (TDT) technique, which combines clustering and classification, to group news reports into clusters based on news events. In addition to the traditional way of browsing based on category classification, this study provides a more intuitive way of reading news, which is expected to give readers a more complete picture of a specific news event. To give readers a complete report on an event, the system groups each incoming news report into its own news event, and to reduce the probability of clustering errors, it utilizes a data mining technique, association rules, to screen out unrelated news reports. Furthermore, the research provides a brief summary of each event using multi-document summarization techniques. The evaluation results show that seventy percent of users agree that news reports for the same news event are well connected and that the title of each news event helps in understanding the content. Moreover, about eighty percent of users think the event summary is useful for understanding the news event, and seventy percent of users prefer retrieval with association rules to traditional keyword search.
28

Hsu, Ming-Chung (徐銘忠). "A Study on Ontology-based Document Summarization System for Chinese Stock News." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/82265734387494044642.

Full text
Abstract:
Master's thesis, Tunghai University, Department of Computer Science and Information Engineering, academic year 92.
With the rapid evolution of the Internet, people can conveniently gather information using browsers. This results in information overload, and users often do not know how to deal with such massive amounts of data, so obtaining correct information efficiently and effectively has become an important issue. Document summarization technologies provide concise, compact content by filtering out redundant and less important information in a document; with them, people can grasp the key meaning of a document in a very short time rather than spending a long time reading the full text. They have therefore attracted researchers' attention, especially in information retrieval. Conceptually, document summarization techniques fall into two classes: extraction and abstraction. Most past research has focused on only one of them. This thesis proposes a combination of the two, named Abstraction From Extraction (AFE), for a specific domain based on domain ontologies. Extraction is performed first, using statistical methods to rank each sentence in the document; the highest-ranked sentences are the document's feature sentences. The structures of the most important feature sentences are then compared with sentence patterns prepared in advance from the characteristics of the target domain, and those matching the patterns are summarized, letting users decide whether to read the full text and thereby saving time in choosing the right information.
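As a rough illustration of the AFE pipeline (our simplification; the thesis' statistical ranking and ontology-derived patterns are richer than this), frequency-based sentence ranking followed by pattern filtering might look like:

```python
import re
from collections import Counter

def afe_summary(sentences, patterns, top_k=3):
    """Sketch of the extraction step followed by pattern filtering (our
    simplification of AFE): rank sentences by summed term frequency,
    keep the top-k feature sentences, then keep only those matching a
    domain sentence pattern."""
    tf = Counter(w for s in sentences for w in s.lower().split())
    scored = sorted(sentences,
                    key=lambda s: sum(tf[w] for w in s.lower().split()),
                    reverse=True)
    return [s for s in scored[:top_k]
            if any(re.search(p, s) for p in patterns)]

# Hypothetical stock-news pattern: "<verb of movement> <number>%".
patterns = [r"(rose|fell|gained|dropped)\s+\d+(\.\d+)?%"]
sents = ["TSMC shares rose 3.2% on strong chip demand.",
         "Analysts were divided on the outlook.",
         "The index fell 1.1% in early trading."]
print(afe_summary(sents, patterns))
```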
29

Chen, Hua-Tsung, and 陳華總. "Object and Color Based Video Representation for Automatic Model-Free News Summarization." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/21309150489866087398.

Full text
Abstract:
Master's thesis, National Chiao Tung University, Department of Computer Science and Information Engineering, academic year 91.
News summarization gives users a good starting point for navigating the news stories of interest at different levels of detail, without viewing the entire video, through a structured table of video content that pairs each leading anchorperson shot with the subsequent shots of the corresponding story. In this thesis, we propose a novel approach for automatically detecting anchorperson shots, the key component of news parsing and summarization, and we develop a news summarization system for browsing and retrieving news stories. A hierarchical shot-filtering approach extracts anchorperson shots by considering the spatio-temporal variance of both color and objects in consecutive frames. With anchorperson shots successfully identified, the table of video content can be structured to summarize news videos without any human intervention. Experimental results on extensive test sequences show a high precision-recall rate, demonstrating the effectiveness and feasibility of the proposed system.
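A toy version of the color half of this idea, assuming our own histogram representation and threshold rather than the thesis' actual features, could be:

```python
import numpy as np

def is_anchor_shot(frames, bins=16, threshold=0.05):
    """Illustrative filter (our own simplification of the spatio-temporal
    idea): anchorperson shots have a nearly static studio scene, so color
    histograms of consecutive frames barely change. We sum the per-bin
    variance of normalized histograms across the shot's frames."""
    hists = []
    for frame in frames:  # frame: HxWx3 uint8 array
        h, _ = np.histogramdd(frame.reshape(-1, 3),
                              bins=(bins,) * 3, range=((0, 256),) * 3)
        hists.append(h.ravel() / h.sum())
    return np.var(np.stack(hists), axis=0).sum() < threshold

# A static synthetic "shot" passes; one whose dominant color drifts does not.
static = [np.full((48, 64, 3), 120, np.uint8)] * 10
moving = [np.full((48, 64, 3), 25 * i, np.uint8) for i in range(10)]
print(is_anchor_shot(static), is_anchor_shot(moving))
```

A real detector would also use the object and motion cues the abstract mentions; the color test alone over-accepts other static shots.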
30

Chen, Yea-Juan, and 陳雅絹. "A Study on Ontology-based fuzzy Agent for Chinese e-News Summarization." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/11067688702199461022.

Full text
Abstract:
Master's thesis, Chang Jung Christian University, Graduate School of Business and Operations Management, academic year 91.
An Ontology-based Fuzzy Event Extraction (OFEE) agent for Chinese e-News summarization is proposed in this thesis. The OFEE agent comprises a Retrieval Agent (RA), a Document Processing Agent (DPA), and a Fuzzy Inference Agent (FIA) that together perform event extraction for Chinese e-News summarization. First, the RA automatically retrieves Internet e-News periodically, stores it in the e-News repository, and sends it to the DPA for document processing. The DPA then processes the retrieved e-News with the Chinese part-of-speech (POS) tagger provided by CKIP and filters the resulting Chinese term set with a Chinese term filter. Next, the FIA and the Event Ontology Filter (EOF) extract the e-News event ontology based on the Chinese term set and the domain ontology. Finally, the Summarization Agent (SA) summarizes the e-News from the extracted event ontology. Simulations show that the proposed method summarizes Chinese weather e-News effectively.
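For illustration only, a toy fuzzy step of this kind, with an invented membership function, rule, and cutoff (the OFEE agent's actual rule base is not described here), might be:

```python
def triangular(x, a, b, c):
    """Standard triangular fuzzy membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_event_terms(term_freqs, ontology_terms):
    """Toy fuzzification-plus-ontology-filter step: a term's normalized
    frequency is mapped to a 'high frequency' degree, and terms that are
    sufficiently 'high' and appear in the domain ontology are kept as
    event terms."""
    max_f = max(term_freqs.values())
    events = {}
    for term, f in term_freqs.items():
        high = triangular(f / max_f, 0.3, 1.0, 1.7)  # degree of 'high frequency'
        if term in ontology_terms and high > 0.5:
            events[term] = round(high, 2)
    return events

freqs = {"typhoon": 9, "rainfall": 6, "reporter": 2, "warning": 5}
print(fuzzy_event_terms(freqs, {"typhoon", "rainfall", "warning"}))
```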
31

Lin, Shu-Ling, and 林淑鈴. "A STUDY OF INTEGRATING AUTOMATIC SUMMARIZATION INTO A RSS READER FOR CHINESE NEWS." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/67650031024224934096.

Full text
Abstract:
Master's thesis, National Chiao Tung University, Degree Program of Digital Library and Information, academic year 102.
In the modern digital era, with the prevalence of smartphones, reading news on mobile phones has become a habit for many people. Despite this convenience, small displays cannot show the full content of each news article. An RSS reader lets people subscribe to and read news on mobile devices in the easiest way, and most RSS feeds carry the first few lines of each article. However, when users subscribe to many news channels, hot-topic stories can easily take over the entire page of their reader, so filtering news by user preference becomes an important issue. This study proposes a novel automatic news summarization system for RSS readers. Using the two major Chinese RSS news channels in Taiwan as an example, the system retrieves full news articles, processes them with the CKIP Chinese word segmentation technology, and clusters them with the topic detection and tracking techniques of MEAD to filter out repetitive articles. A multi-document summarization technique then summarizes the articles in each topic cluster for comfortable viewing on mobile devices. Finally, the study presents a mobile RSS reader application that gives users a consistent viewing experience across all kinds of smartphones and tablet PCs.
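An end-to-end sketch of such a pipeline, with whitespace tokenization standing in for CKIP segmentation and ordinary agglomerative clustering standing in for MEAD's topic grouping (the feed URL and all parameters are placeholders), might look like:

```python
import feedparser
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

def summarize_feed(feed_url, n_topics=3):
    """Fetch an RSS feed, group its items into topics, and return one
    representative snippet per topic. Whitespace tokenization stands in
    for CKIP segmentation; agglomerative clustering stands in for
    MEAD-style topic grouping; all parameters are placeholders."""
    entries = feedparser.parse(feed_url).entries
    texts = [e.title + " " + getattr(e, "summary", "") for e in entries]
    X = TfidfVectorizer().fit_transform(texts)
    labels = AgglomerativeClustering(n_clusters=n_topics).fit_predict(X.toarray())
    summaries = {}
    for k in range(n_topics):
        idx = np.where(labels == k)[0]
        centroid = X[idx].mean(axis=0)
        # The item closest to the topic centroid serves as the topic's summary.
        best = idx[np.argmax(np.asarray(X[idx] @ centroid.T).ravel())]
        summaries[k] = texts[best][:120]
    return summaries

# summarize_feed("https://example.com/news.rss")  # hypothetical feed URL
```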
32

Chen, Chingru, and 陳靜如. "Using the Mandarin Daily News for summarization instruction to improve third graders' reading comprehension." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/00185713285439673725.

Full text
Abstract:
Master's thesis, National Chung Cheng University, Graduate Institute of Education, academic year 99.
The purposes of this study were to explore the effects of summarization instruction on third graders' reading comprehension and to examine how the instruction affected students with different comprehension abilities. The study adopted a quasi-experimental design. The subjects were two third-grade classes from an elementary school in Tainan County. The experimental group was the researcher's class of 29 students, who received summarization instruction for 16 sessions over 8 weeks; the control group was another class of 30 students, who used a self-reading strategy. The reading materials were chosen from the Mandarin Daily News, with the content modified to meet the students' needs. The results were analyzed with two-way ANCOVA. The findings were as follows: 1. Summarization instruction enhanced third graders' reading comprehension. 2. Students with low comprehension ability benefited more from the instruction than those with high comprehension ability. 3. In the posttest effect on reading comprehension, there was no interaction between instruction strategy and comprehension ability. 4. Summarization instruction had a significant effect on third graders' literal and inferential comprehension. 5. Text-structure instruction helped students' reading comprehension and summary ability. 6. Using articles from the Mandarin Daily News for summarization instruction helped increase students' reading interest and comprehension. Based on these results, the study offers suggestions for teaching and future research.
33

Chen, Yu-Jen, and 陳俞任. "SocFeedViewer: A Novel Visualization Technique for Social News Feeds Summarization on Large-Scale Social Network Services." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/60165888016632806948.

Full text
Abstract:
Master's thesis, National Taiwan University, Graduate Institute of Electrical Engineering, academic year 99.
Online social network services such as Facebook and Twitter have become increasingly popular, and more and more users regularly read the latest news feeds and interact with friends on these sites. However, once the number of friends and subscribed pages grows large, users receive hundreds of messages a day and are overwhelmed by information overload. To alleviate this problem, we propose a novel visualization technique for summarizing social news feeds on large-scale social web services. The proposed system, SocFeedViewer, produces an egocentric network graph from the news feeds generated in an arbitrary period of time; the graph gives an overview of everyone who generated news feeds during that period. To enhance the reading experience, we incorporate community detection, connectivity analysis, and importance analysis so that users can preferentially browse the news feeds that are more significant and interesting. We implement a real-world application and use the real social data of several volunteers to verify the usefulness of SocFeedViewer.
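A minimal sketch of building such an egocentric graph, assuming an invented feed-item format (networkx's greedy modularity algorithm stands in for whatever community detection the thesis actually uses):

```python
import networkx as nx
from networkx.algorithms import community

def build_feed_graph(ego, feed_items):
    """Build an egocentric graph from a window of feed items: nodes are
    friends who posted, edges to the ego mark authorship, and weighted
    edges between friends count their interactions. The feed-item format
    is invented for this sketch."""
    g = nx.Graph()
    g.add_node(ego)
    for item in feed_items:
        g.add_edge(ego, item["author"])
        for other in item.get("commenters", []):
            w = g.get_edge_data(item["author"], other, default={}).get("weight", 0)
            g.add_edge(item["author"], other, weight=w + 1)
    # Communities group the active friends; degree is a crude importance score.
    return g, community.greedy_modularity_communities(g), dict(g.degree())

feed = [{"author": "alice", "commenters": ["bob", "carol"]},
        {"author": "bob", "commenters": ["alice"]},
        {"author": "dave", "commenters": []}]
g, comms, importance = build_feed_graph("me", feed)
print([sorted(c) for c in comms], importance)
```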
34

Wei, Ling-Yu, and 魏玲玉. "Dynamic Clustering and Multi-Document Summarization Based on the Concept of Document Warehousing — Using Chinese News Articles as Examples." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/24394459454373718346.

Full text
Abstract:
Master's thesis, National Kaohsiung First University of Science and Technology, Graduate Institute of Information Management, academic year 92.
Traditionally, the query results returned by inverted-index full-text search systems are huge and unsorted, making it hard for users to determine what information the collection actually contains; for contemporary document searching over the Internet, such systems no longer satisfy users' needs. With the ever-growing sea of electronic documents, a mechanism that integrates large numbers of documents to support efficient browsing and querying is indispensable. In this thesis, we propose a general framework for document clustering and multi-document summarization based on the concept of document warehousing. On this framework we implemented a prototype system, DNCSS (Dynamic News Clustering and Summarization System), as a testbed for our approach. The system adopts the concept of data warehousing, modeling text-oriented documents from multi-dimensional viewpoints. The constructed document warehouse serves as the system's main repository and flexibly organizes document structure information for users' searching and querying. The documents retrieved from the warehouse are further grouped by clustering techniques to provide a more organized structure, and the system generates a multi-document summary for each cluster to help users find the information they need more efficiently. We collected articles from the most popular online news sources in Taiwan as test data. The evaluation shows that our approach effectively relieves users of reading large amounts of related news and drawing the necessary conclusions themselves.
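To make the document-warehouse idea concrete, here is a toy sketch in which metadata fields act as dimensions for slicing and a centroid heuristic stands in for the system's multi-document summarizer (the class and field names are invented):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

class DocumentWarehouse:
    """Toy document warehouse (our illustration of the concept): documents
    are the facts, and metadata such as date, source, and category are the
    dimensions they can be sliced by before clustering or summarization."""
    def __init__(self):
        self.docs = []
    def load(self, text, **dims):
        self.docs.append({"text": text, **dims})
    def slice(self, **dims):
        return [d["text"] for d in self.docs
                if all(d.get(k) == v for k, v in dims.items())]

def centroid_summary(texts, n_sentences=2):
    """Pick the sentences closest to the TF-IDF centroid of a slice."""
    sents = [s.strip() for t in texts for s in t.split(".") if s.strip()]
    X = TfidfVectorizer().fit_transform(sents)
    scores = np.asarray(X @ X.mean(axis=0).T).ravel()
    return [sents[i] for i in np.argsort(scores)[::-1][:n_sentences]]

dw = DocumentWarehouse()
dw.load("The storm flooded roads. Schools were closed.", source="A", topic="storm")
dw.load("Flooded roads closed schools across the city.", source="B", topic="storm")
print(centroid_summary(dw.slice(topic="storm")))
```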
35

Brus, Tomáš. "Sumarizace českých textů z více zdrojů." Master's thesis, 2012. http://www.nusl.cz/ntk/nusl-313876.

Full text
Abstract:
This work focuses on the summarization task for a set of articles on the same topic. It discusses several possible approaches to summarization and ways to assess their quality. The implementation of the described algorithms and their application to selected texts form part of this work. The input texts come from several Czech news servers and are represented as deep syntactic trees (the so-called tectogrammatical layer).
36

Wung, Hung_chia, and 翁鴻加. "Some New Approaches to Multi-document Summarization." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/39815575583792443924.

Full text
37

Ouyang, Jessica Jin. "Adapting Automatic Summarization to New Sources of Information." Thesis, 2019. https://doi.org/10.7916/d8-5nar-6b61.

Full text
Abstract:
English-language news articles are no longer necessarily the best source of information. The Web allows information to spread more quickly and travel farther: first-person accounts of breaking news events pop up on social media, and foreign-language news articles are accessible to, if not immediately understandable by, English-speaking users. This thesis focuses on developing automatic summarization techniques for these new sources of information. We focus on summarizing two specific new sources of information: personal narratives, first-person accounts of exciting or unusual events that are readily found in blog entries and other social media posts, and non-English documents, which must first be translated into English, often introducing translation errors that complicate the summarization process. Personal narratives are a very new area of interest in natural language processing research, and they present two key challenges for summarization. First, unlike many news articles, whose lead sentences serve as summaries of the most important ideas in the articles, personal narratives provide no such shortcuts for determining where important information occurs within them; second, personal narratives are written informally and colloquially, and unlike news articles, they are rarely edited, so they require heavier editing and rewriting during the summarization process. Non-English documents, whether news or narrative, present yet another source of difficulty on top of any challenges inherent to their genre: they must be translated into English, potentially introducing translation errors and disfluencies that must be identified and corrected during summarization. The bulk of this thesis is dedicated to addressing the challenges of summarizing personal narratives found on the Web. We develop a two-stage summarization system for personal narrative that first extracts sentences containing important content and then rewrites those sentences into summary-appropriate forms. Our content extraction system is inspired by contextualist narrative theory, using changes in writing style throughout a narrative to detect sentences containing important information; it outperforms both graph-based and neural network approaches to sentence extraction for this genre. Our paraphrasing system rewrites the extracted sentences into shorter, standalone summary sentences, learning to mimic the paraphrasing choices of human summarizers more closely than can traditional lexicon- or translation-based paraphrasing approaches. We conclude with a chapter dedicated to summarizing non-English documents written in low-resource languages, documents that would otherwise be unreadable for English-speaking users. We develop a cross-lingual summarization system that performs even heavier editing and rewriting than does our personal narrative paraphrasing system; we create and train on large amounts of synthetic errorful translations of foreign-language documents. Our approach produces fluent English summaries from disfluent translations of non-English documents, and it generalizes across languages.
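As a toy rendering of the style-change idea (the thesis' actual features and models are far richer), one could score each sentence by how sharply its surface style departs from that of its neighbours:

```python
import numpy as np

def style_features(sentence):
    """Crude surface-style features (the thesis uses far richer ones):
    word count, mean word length, and exclamation/question-mark count."""
    words = sentence.split()
    mean_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return np.array([len(words), mean_len,
                     sum(sentence.count(c) for c in "!?")], dtype=float)

def extract_by_style_change(sentences, top_k=2):
    """Score each sentence by how sharply its style departs from its
    neighbours, an illustrative reading of the contextualist idea that
    writing style shifts at a narrative's key moments."""
    feats = np.stack([style_features(s) for s in sentences])
    feats = (feats - feats.mean(0)) / (feats.std(0) + 1e-9)
    scores = []
    for i in range(len(sentences)):
        ctx = [j for j in (i - 1, i + 1) if 0 <= j < len(sentences)]
        scores.append(np.mean([np.linalg.norm(feats[i] - feats[j]) for j in ctx]))
    keep = sorted(np.argsort(scores)[::-1][:top_k])
    return [sentences[i] for i in keep]

story = ["I was walking home like any other day.",
         "The sky looked ordinary and grey.",
         "Then... bang!",
         "A car had jumped the curb right behind me.",
         "I still think about it sometimes."]
print(extract_by_style_change(story))
```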
38

Chi, Pi-Chuan, and 冀碧娟. "The Effects of Differentiated Summarization Strategy Teaching on New Immigrant Students’ Ability of Summarizing Texts." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/00716166480291537017.

Full text
Abstract:
Master's thesis, MingDao University, Graduate Institute of Curriculum and Instruction, academic year 101.
The purpose of the study was to examine the effects of differentiated summarization strategy teaching on the text-summarizing ability of fourth-grade new immigrant students. The study was designed as a quasi-experiment with a one-group pretest-posttest design. The subjects were 5 fourth-grade new immigrant students in Yunlin County, Taiwan, who received instruction in differentiated summarization strategies for 12 weeks. Before the experimental instruction, the participants were divided into low and high reading comprehension groups based on the results of a reading comprehension screening test. The participants took a summary skill test before and after the experimental sessions and also completed a post-instruction interview exploring their summarization process. The findings were as follows: 1. Differentiated summarization strategy teaching can enhance fourth-grade new immigrant students' ability to summarize texts. 2. It can narrow the gap in summarizing ability between the general and new immigrant groups. 3. The new immigrant students found it easier to learn the rules of grouping similar words and deleting trivial or repeated information. 4. They found it harder to learn the rules of judging important information and polishing sentences.
39

Zhang, Lei. "New data analytics and visualization methods in personal data mining, cancer data analysis and sports data visualization." 2017. http://scholarworks.gsu.edu/cs_diss/126.

Full text
Abstract:
In this dissertation, we discuss a reading profiling system, a biological data visualization system, and a sports visualization system. Self-tracking is becoming increasingly popular in the field of personal informatics, and reading profiling can serve as a personal data collection method. We present UUAT, an unintrusive user attention tracking system. In UUAT, we use user interaction data to develop technologies that help pinpoint a user's reading region (RR); based on the computed RR and the interaction data, UUAT can identify a reader's reading struggles or interests. A biomarker is a measurable substance that may be used as an indicator of a particular disease. We developed CancerVis for visual and interactive analysis of cancer data and demonstrate how to apply this platform in cancer biomarker research. CancerVis provides interactive multiple views of a dataset from different perspectives; the views are synchronized so that users can easily link them to the same data entry. Furthermore, CancerVis supports data mining practices in cancer biomarker research, such as visualization of optimal cutpoints and cut-through exploration. Tennis match summarization helps after-the-fact sports consumers digest a match of interest. We developed TennisVis, a comprehensive match summarization and visualization platform. TennisVis offers chart graphs so that a client can quickly get match facts, while supporting varied queries over tennis points (such as volley shots or many-shot rallies) to satisfy the diverse preferences of tennis fans. Furthermore, TennisVis offers video clips for every single tennis point, and a recommendation rating is computed for each play. A case study shows that TennisVis identifies more than 75% of the tennis points in a full-length match.