
Dissertations / Theses on the topic 'Automated Text Categorization'

Author: Grafiati

Published: 4 June 2021

Last updated: 16 February 2022


Consult the top 30 dissertations / theses for your research on the topic 'Automated Text Categorization.'


1

Wirantono, Marcel. "Automated text categorization with collaboratively tagged data." Thesis, University of Ottawa (Canada), 2009. http://hdl.handle.net/10393/28116.


Abstract:

The recent popularity of collaborative tagging as a component of retrieval systems has led us to study such a system. Similar to text categorization, albeit in a less centralized fashion, collaborative tagging relies on humans to annotate documents with metadata descriptions, i.e. tags. For that reason, this thesis attempts to extend the tagging process with more consistent non-human annotations in the form of automatic text categorization. In applying automatic text categorization to collaboratively tagged data, we have created two sets of experiments. The first experiment compares two classification methods, Naive Bayes and Support Vector Machine (SVM), in a straightforward 1-vs.-all classification. The results of the comparison establish a baseline and allow us to make important observations, such as the benefit of using a maximum-margin classifier (SVM) in annotating concepts with skewed document distributions. For the second experiment, we found that the lack of structure in tagging had limited our learning approach to the simple 1-vs.-all setting. Inspired by the application of hierarchical categorization in web directories [15], we introduce in our second experiment a categorization approach that automatically builds a hierarchy from the tag space and incorporates it into the training and classification process. Unlike previous hierarchical categorizations that rely on human-generated hierarchies, our hierarchical approach relies on an artificial hierarchy that is created from tag usage analysis. After the method was applied to the dataset, we compared the results of the new method with the baseline results from the first experiment. Based on that comparison, we observed that our hierarchical approach improves not only the quality of predictions but also the efficiency (total training and classification time) of our automatic text categorization system.
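The 1-vs.-all setting mentioned in this abstract can be illustrated with a small sketch. The code below is not the thesis's implementation: it assumes a multinomial Naive Bayes model with add-one smoothing, and the toy documents, tags, and all function names are hypothetical. One binary model is trained per tag (documents carrying the tag against all others), and a document receives every tag whose model returns positive log-odds.

```python
import math
from collections import Counter

def train_binary_nb(docs, labels):
    """Train a two-class multinomial Naive Bayes model with add-one smoothing.

    docs: list of token lists; labels: list of booleans (True = has the tag).
    """
    vocab = {tok for doc in docs for tok in doc}
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
        n_docs[y] += 1
    model = {"vocab": vocab, "log_prior": {}, "log_like": {}}
    for y in (True, False):
        # Smoothed prior, so a class with no examples does not produce log(0).
        model["log_prior"][y] = math.log((n_docs[y] + 1) / (len(docs) + 2))
        total = sum(counts[y].values()) + len(vocab)
        model["log_like"][y] = {t: math.log((counts[y][t] + 1) / total) for t in vocab}
    return model

def log_odds(model, doc):
    """Log-odds that doc carries the tag; out-of-vocabulary tokens are ignored."""
    score = model["log_prior"][True] - model["log_prior"][False]
    for tok in doc:
        if tok in model["vocab"]:
            score += model["log_like"][True][tok] - model["log_like"][False][tok]
    return score

def train_one_vs_all(docs, tag_sets):
    """Train one binary model per tag: documents with the tag vs. all others."""
    tags = sorted(set().union(*tag_sets))
    return {tag: train_binary_nb(docs, [tag in ts for ts in tag_sets]) for tag in tags}

def predict_tags(models, doc):
    """Return every tag whose binary model gives positive log-odds."""
    return sorted(tag for tag, m in models.items() if log_odds(m, doc) > 0)

# Hypothetical toy data: four tagged documents.
docs = [
    "python code function loop".split(),
    "python script code bug".split(),
    "recipe pasta tomato sauce".split(),
    "recipe cake sugar flour".split(),
]
tag_sets = [{"programming"}, {"programming"}, {"cooking"}, {"cooking"}]
models = train_one_vs_all(docs, tag_sets)
print(predict_tags(models, "python loop bug".split()))
```

Because each tag gets its own independent binary decision, a document can receive several tags or none, which is one reason this setting suits tag data that has no predefined structure.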


2

Eramo, Mark D., and Christopher M. Sutter. "Automated psychological categorization via linguistic processing system." Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2004. http://library.nps.navy.mil/uhtbin/hyperion/04Sep%5FEramo.pdf.


Abstract:

Thesis (M.S. in Information Technology Management and M.S. in Information Systems and Operations)--Naval Postgraduate School, Sept. 2004.
Thesis advisor(s): Raymond Buettner, Magdi Kamel. Includes bibliographical references (p. 115-122). Also available online.


3

Sutter, Christopher M., and Mark D. Eramo. "Automated psychological categorization via linguistic processing system." Thesis, Monterey, California. Naval Postgraduate School, 2004. http://hdl.handle.net/10945/1439.


Abstract:

Approved for public release; distribution is unlimited
Influencing one's adversary has always been an objective in warfare. However, to date the majority of influence operations have been geared toward the masses or to very small numbers of individuals. Although marginally effective, this approach is inadequate with respect to larger numbers of high value targets and to specific subsets of the population. Limited human resources have prevented a more tailored approach, which would focus on segmentation, because individual targeting demands significant time from psychological analysts. This research examined whether or not Information Technology (IT) tools, specializing in text mining, are robust enough to automate the categorization/segmentation of individual profiles for the purpose of psychological operations (PSYOP). Research indicated that only a handful of software applications claimed to provide adequate functionality to perform these tasks. Text mining via neural networks was determined to be the best approach given the constraints of the profile data and the desired output. Five software applications were tested and evaluated for their ability to reproduce the results of a social psychologist. Through statistical analysis, it was concluded that the tested applications are not currently mature enough to produce accurate results that would enable automated segmentation of individual profiles based on supervised linguistic processing.
Captain, United States Marine Corps
Lieutenant, United States Navy


4

SOARES, FABIO DE AZEVEDO. "AUTOMATIC TEXT CATEGORIZATION BASED ON TEXT MINING." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2013. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23213@1.


Abstract:

PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
Text Categorization, one of the tasks performed in Text Mining, can be described as obtaining a function that assigns a document to the previously defined category to which it belongs. The main goal of building a taxonomy of documents is to make relevant information easier to obtain. However, implementing and executing a Text Categorization process is not a trivial task: Text Mining tools are still maturing and demand considerable technical expertise to use. In addition, the language in which the documents are written plays an important role in a Text Mining process and must be handled with attention to its particularities, yet there is a great lack of tools that handle Brazilian Portuguese adequately. The main aims of this work are therefore to research, propose, implement and evaluate a Text Mining framework for Automatic Text Categorization that supports the knowledge discovery process and provides linguistic processing for Brazilian Portuguese.


5

Hall, Scott R. "Automatic text categorization applied to E-mail." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2002. http://library.nps.navy.mil/uhtbin/hyperion-image/02sep%5FHall.pdf.


6

Demirtas, Kezban. "Automatic Video Categorization And Summarization." Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/3/12611113/index.pdf.


Abstract:

In this thesis, we perform automatic video categorization and summarization using the subtitles of videos. We propose two methods for video categorization. The first method performs unsupervised categorization by applying natural language processing techniques to video subtitles, using the WordNet lexical database and WordNet domains. The method starts with text preprocessing. Then a keyword extraction algorithm and a word sense disambiguation method are applied. The WordNet domains that correspond to the correct senses of the keywords are extracted, and the video is assigned a category label based on the extracted domains. The second method follows the same steps to extract the WordNet domains of a video but performs categorization with a learning module. Experiments with documentary videos give promising results in discovering the correct categories of videos. Video summarization algorithms present condensed versions of a full-length video by identifying its most significant parts. We propose a video summarization method that uses the subtitles of videos and text summarization techniques: we identify significant sentences in the subtitles and then compose a video summary from the video parts corresponding to these summary sentences.


7

Eklund, Johan. "With or without context : Automatic text categorization using semantic kernels." Doctoral thesis, Högskolan i Borås, Akademin för bibliotek, information, pedagogik och IT, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-8949.


Abstract:

In this thesis text categorization is investigated in four dimensions of analysis: theoretically as well as empirically, and as a manual as well as a machine-based process. In the first four chapters we look at the theoretical foundation of subject classification of text documents, with a certain focus on classification as a procedure for organizing documents in libraries. A working hypothesis used in the theoretical analysis is that classification of documents is a process that involves translations between statements in different languages, both natural and artificial. We further investigate the close relationships between structures in classification languages and the order relations and topological structures that arise from classification. A classification algorithm that gets a special focus in the subsequent chapters is the support vector machine (SVM), which in its original formulation is a binary classifier in linear vector spaces, but has been extended to handle classification problems for which the categories are not linearly separable. To this end the algorithm utilizes a category of functions called kernels, which induce feature spaces by means of high-dimensional and often non-linear maps. For the empirical part of this study we investigate the classification performance of semantic kernels generated by different measures of semantic similarity. One category of such measures is based on the latent semantic analysis and the random indexing methods, which generates term vectors by using co-occurrence data from text collections. Another semantic measure used in this study is pointwise mutual information. In addition to the empirical study of semantic kernels we also investigate the performance of a term weighting scheme called divergence from randomness, that has hitherto received little attention within the area of automatic text categorization. 
The result of the empirical part of this study shows that the semantic kernels generally outperform the “standard” (non-semantic) linear kernel, especially for small training sets. A conclusion that can be drawn with respect to the investigated datasets is therefore that semantic information in the kernel in general improves its classification performance, and that the difference between the standard kernel and the semantic kernels is particularly large for small training sets. Another clear trend in the result is that the divergence from randomness weighting scheme yields a classification performance surpassing that of the common tf-idf weighting scheme.
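To make the idea of a semantic kernel concrete, the following sketch builds term vectors from pointwise mutual information (PMI) and compares documents through those vectors. It is an illustration under stated assumptions, not the thesis's method: co-occurrence is counted at document level, only positive PMI values are kept, each term also keeps its own axis, and the toy corpus and all function names are hypothetical. Since the kernel is an inner product of explicit document embeddings, it is positive semi-definite by construction.

```python
import math
from collections import Counter
from itertools import combinations

def ppmi_vectors(corpus):
    """Map each term to a sparse vector of positive PMI values.

    Two terms co-occur when they appear in the same document, so
    PMI(a, b) = log( n_ab * N / (df_a * df_b) ) with N documents.
    """
    n_docs = len(corpus)
    df = Counter()      # document frequency of each term
    co_df = Counter()   # document frequency of each unordered term pair
    for doc in corpus:
        terms = sorted(set(doc))
        df.update(terms)
        co_df.update(combinations(terms, 2))
    vectors = {t: {} for t in df}
    for (a, b), n_ab in co_df.items():
        pmi = math.log((n_ab * n_docs) / (df[a] * df[b]))
        if pmi > 0:
            vectors[a][b] = pmi
            vectors[b][a] = pmi
    return vectors

def embed(doc, vectors):
    """Sum term vectors over the document; each term also keeps its own axis."""
    out = Counter()
    for term in doc:
        out[term] += 1.0
        for other, weight in vectors.get(term, {}).items():
            out[other] += weight
    return out

def linear_kernel(doc_a, doc_b):
    """Standard bag-of-words inner product (no semantic information)."""
    ca, cb = Counter(doc_a), Counter(doc_b)
    return float(sum(ca[t] * cb[t] for t in ca))

def semantic_kernel(doc_a, doc_b, vectors):
    """Inner product of the PMI-enriched document embeddings."""
    ea, eb = embed(doc_a, vectors), embed(doc_b, vectors)
    return sum(ea[t] * eb[t] for t in ea)

# Hypothetical toy corpus used only to estimate co-occurrence statistics.
corpus = [
    ["car", "engine", "road"],
    ["automobile", "engine", "road"],
    ["cat", "pet", "fur"],
    ["dog", "pet", "fur"],
]
vectors = ppmi_vectors(corpus)
print(linear_kernel(["car"], ["automobile"]))
print(semantic_kernel(["car"], ["automobile"], vectors))
print(semantic_kernel(["car"], ["cat"], vectors))
```

In this toy setting the linear kernel sees no overlap between "car" and "automobile", while the semantic kernel gives them a positive similarity because both co-occur with "engine" and "road". This mirrors the intuition for why semantic information can help most when little training data is available.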


8

Borggren, Lukas. "Automatic Categorization of News Articles With Contextualized Language Models." Thesis, Linköpings universitet, Artificiell intelligens och integrerade datorsystem, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177004.


Abstract:

This thesis investigates how pre-trained contextualized language models can be adapted for multi-label text classification of Swedish news articles. Various classifiers are built on pre-trained BERT and ELECTRA models, exploring global and local classifier approaches. Furthermore, the effects of domain specialization, using additional metadata features and model compression are investigated. Several hundred thousand news articles are gathered to create unlabeled and labeled datasets for pre-training and fine-tuning, respectively. The findings show that a local classifier approach is superior to a global classifier approach and that BERT outperforms ELECTRA significantly. Notably, a baseline classifier built on SVMs yields competitive performance. The effect of further in-domain pre-training varies; ELECTRA’s performance improves while BERT’s is largely unaffected. It is found that utilizing metadata features in combination with text representations improves performance. Both BERT and ELECTRA exhibit robustness to quantization and pruning, allowing model sizes to be cut in half without any performance loss.


9

Zhang, Xueying. "Rough set theory based automatic text categorization and the handling of semantic heterogeneity." Bonn Informationszentrum Sozialwiss, 2006. http://deposit.ddb.de/cgi-bin/dokserv?id=2704442&prov=M&dok_var=1&dok_ext=htm.


10

Pereira, Dennis V. "Automatic Lexicon Generation for Unsupervised Part-of-Speech Tagging Using Only Unannotated Text." Thesis, Virginia Tech, 1999. http://hdl.handle.net/10919/10094.


Abstract:

With the growing number of textual resources available, the ability to understand them becomes critical. An essential first step in understanding these sources is the ability to identify the parts-of-speech in each sentence. The goal of this research is to propose, improve, and implement an algorithm capable of finding terms (words in a corpus) that are used in similar ways--a term categorizer. Such a term categorizer can be used to find a particular part-of-speech, i.e. nouns in a corpus, and generate a lexicon. The proposed work is not dependent on any external sources of information, such as dictionaries, and it shows a significant improvement (~30%) over an existing method of categorization. More importantly, the proposed algorithm can be applied as a component of an unsupervised part-of-speech tagger, making it truly unsupervised, requiring only unannotated text. The algorithm is discussed in detail, along with its background, and its performance. Experimentation shows that the proposed algorithm performs within 3% of the baseline, the Penn-TreeBank Lexicon.
Master of Science


11

Chung, EunKyung. "A Framework of Automatic Subject Term Assignment: An Indexing Conception-Based Approach." Thesis, University of North Texas, 2006. https://digital.library.unt.edu/ark:/67531/metadc5473/.


Abstract:

The purpose of this dissertation is to examine whether an understanding of the subject indexing processes conducted by human indexers has a positive impact on the effectiveness of automatic subject term assignment through text categorization (TC). More specifically, human indexers' subject indexing approaches or conceptions, in conjunction with semantic sources, were explored in the context of a typical scientific journal article data set. Based on the premise that subject indexing approaches or conceptions with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. For the purpose of this study, three hypotheses were tested: 1) the effectiveness of semantic sources, 2) the effectiveness of an indexing conception-based framework, and 3) the effectiveness of each of three indexing conception-based approaches (the content-oriented, the document-oriented, and the domain-oriented approaches). The experiments were conducted using a support vector machine implementation in WEKA (Witten & Frank, 2000). The experimental results showed that cited works, source title, and title were as effective as the full text, while keywords were more effective than the full text. In addition, the findings showed that an indexing conception-based framework was more effective than the full text. In particular, the content-oriented and the document-oriented indexing approaches were found to be more effective than the full text. Among the three indexing conception-based approaches, the content-oriented approach and the document-oriented approach were more effective than the domain-oriented approach. In other words, in the context of a typical scientific journal article data set, the objective contents and authors' intentions received more focus than the possible users' needs.
The research findings of this study indicate that incorporating human indexers' indexing approaches or conceptions in conjunction with semantic sources has a positive impact on the effectiveness of automatic subject term assignment.


12

Maguluri, Naga Sai Nikhil. "Multi-Class Classification of Textual Data: Detection and Mitigation of Cheating in Massively Multiplayer Online Role Playing Games." Wright State University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=wright1494248022049882.


13

Faulstich, Lukas C., Peter F. Stadler, Caroline Thurner, and Christina Witwer. "litsift: Automated Text Categorization in Bibliographic Search." 2003. https://ul.qucosa.de/id/qucosa%3A32597.


Abstract:

In bioinformatics there exist research topics that cannot be uniquely characterized by a set of key words because relevant key words are (i) also heavily used in other contexts and (ii) often omitted in relevant documents because the context is clear to the target audience. Information retrieval interfaces such as entrez/Pubmed produce either low precision or low recall in this case. To yield a high recall at a reasonable precision, the results of a broad information retrieval search have to be filtered to remove irrelevant documents. We use automated text categorization for this purpose. In this study we use the topic of conserved secondary RNA structures in viral genomes as running example. Pubmed result sets for two virus groups, Picornaviridae and Flaviviridae, have been manually labeled by human experts. We evaluated various classifiers from the Weka toolkit together with different feature selection methods to assess whether classifiers trained on documents dedicated to one virus group can be successfully applied to filter literature on other virus groups. Our results indicate that in this domain a bibliographic search tool trained on a reference corpus may significantly reduce the amount of time needed for extensive literature recherches.
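The trade-off described in this abstract, filtering a broad search to raise precision while keeping recall, can be shown with a short calculation. The sketch below is illustrative only: the document ids and both result sets are hypothetical and do not come from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {2, 4, 6, 11}           # hypothetical expert-labeled relevant document ids
broad_search = set(range(1, 11))   # hypothetical broad query result: ids 1..10
filtered = {2, 4, 6, 7}            # hypothetical subset kept by a trained classifier

print(precision_recall(broad_search, relevant))
print(precision_recall(filtered, relevant))
```

In this made-up case the classifier discards six irrelevant documents and keeps one, so precision rises from 0.3 to 0.75 while recall stays at 0.75. A real filter would typically lose some recall as well, which is why both figures need to be reported together.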


14

Wei, Yuan-Gu, and 魏源谷. "A Study of Multiple Classifier Systems in Automated Text Categorization." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/58157330409643777309.


Abstract:

Master's thesis
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
90
Automatic text categorization, defined as the task of assigning predefined class (category) labels to text documents, is one of the main techniques for organizing and locating information in huge text collections such as those found on the Internet. Many approaches, including linear classifiers, decision trees, Bayesian methods, neural networks and support vector machines, have been extensively studied and used to implement classifier systems for text categorization as well as for web page classification. Although a great deal of effort has been spent on each of these methods, we are reaching the limit of further performance improvement. Multiple classifier systems, which aim to combine the strengths of individual classifiers to improve overall performance, have been widely studied recently. In this thesis, we study the development of multiple classifier systems for automated text categorization. We investigate and propose various approaches to fundamental issues such as classifier combination, classifier subset selection, and static and dynamic classifier selection. We use our ideas to develop efficient combination-based as well as selection-based multiple classifier systems. Experiments show that our approaches significantly improve on the classification accuracy of individual classifiers for web page collections from web portals. In addition, we propose a cascaded class reduction method in which a sequence of classifiers is cascaded to successively reduce the set of possible classes. We show that by cascading Naive Bayes and SVMs, we can improve the classification accuracy of SVMs while reducing their running time.
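The cascaded class reduction idea can be sketched as a sequence of scorers, each of which prunes the candidate classes before the next, more expensive scorer runs. The code below is a minimal sketch under assumptions: the two toy scorers merely stand in for Naive Bayes and an SVM, and the keyword sets, weights, and function names are all hypothetical rather than taken from the thesis.

```python
def cascade_classify(doc, classes, stage_scorers, keep_sizes):
    """Run scorers in sequence, each one pruning the candidate class set.

    stage_scorers: functions (doc, class_name) -> score, cheapest first.
    keep_sizes: number of candidates that survive each stage; the last should be 1.
    """
    candidates = list(classes)
    for scorer, keep in zip(stage_scorers, keep_sizes):
        ranked = sorted(candidates, key=lambda c: scorer(doc, c), reverse=True)
        candidates = ranked[:keep]
    return candidates[0]

# Hypothetical cheap first stage: count keyword overlap per class.
KEYWORDS = {
    "sports": {"game", "team", "score"},
    "finance": {"stock", "market", "score"},
    "travel": {"flight", "hotel"},
    "health": {"diet", "doctor"},
}

def coarse_score(doc, cls):
    return len(set(doc) & KEYWORDS[cls])

# Hypothetical costly second stage: weighted term scores; calls are recorded
# so we can see how many classes it actually had to evaluate.
WEIGHTS = {
    "sports": {"team": 2.0, "score": 0.5},
    "finance": {"market": 1.0, "score": 0.5},
    "travel": {"hotel": 1.0},
    "health": {},
}
fine_calls = []

def fine_score(doc, cls):
    fine_calls.append(cls)
    return sum(WEIGHTS[cls].get(tok, 0.0) for tok in doc)

doc = ["team", "score", "market", "hotel"]
label = cascade_classify(doc, KEYWORDS, [coarse_score, fine_score], [2, 1])
print(label, sorted(fine_calls))
```

Here the second stage scores only the two classes that survive the first stage instead of all four, which is the source of the running-time saving. Accuracy can also improve when the first stage removes classes that the second stage would otherwise confuse.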


15

Silva, Sara Alexandra Teixeira da. "Automatization of incident categorization." Master's thesis, 2018. http://hdl.handle.net/10071/17585.


Abstract:

To keep up with the growing number of incidents created in an organization, more resources have been needed to ensure that all incidents are managed. Incident Management comprises several activities, one of which is Incident Categorization. By combining Natural Language Processing and Text Mining techniques with Machine Learning algorithms, we propose to improve this activity and, with it, the Incident Management Process. To that end, we propose replacing the manual Categorization sub-process of the Incident Management Process with an automatic sub-process that requires no human interaction. The goal of this dissertation is to propose a solution that categorizes incidents correctly and automatically. We use real data provided by a company which, for privacy reasons, is not named in the dissertation. The datasets consist of correctly categorized incidents, which allows us to apply supervised learning algorithms. The intended output is a method that combines Natural Language Processing techniques with the classification algorithms that perform best on the data. Finally, the proposed method is compared with the categorization currently in use, to determine whether our proposal really improves the Incident Management Process and what advantages automation brings.


16

Hsu, Ya-Fen, and 許雅芬. "Automatic Text Categorization on News." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/91208800987400778267.


Abstract:

Master's thesis
Soochow University
Department of Information Science
90
Nowadays, people are eager to get new information, but they cannot easily and efficiently find what they want in such a huge amount of data. Documents therefore have to be classified so that users can search efficiently within the category a document belongs to. Traditionally, experts read a document and assign specific categories to it. However, this consumes a lot of resources and is not economical, so an automatic text classifier is needed to help the classification process. Automatic text categorization is the task of assigning predefined categories to free-text documents. Text classification always involves two important steps: the first is feature selection, and the second is the selection of a relevance function. Here we propose two techniques to improve classification precision: using co-occurrence terms as features, and considering the positions at which bigrams occur. For comparison, the experiments also cover several other feature selection methods, including single-term features, bigram features, segmentation features, and the positions at which segmented words occur. The experimental results show that the strategy using co-occurrences as features performs relatively well: compared with pure bigrams, performance improves by about 15% on average. The experiments also confirm our observation that bigrams are more representative than single terms. Furthermore, the positions of keywords are positively related to their importance.
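As a rough illustration of bigram features that take position into account, the sketch below extracts character bigrams and gives extra weight to those near the start of the text. The weighting scheme is an assumption made for this example (the abstract does not specify one), and the `lead` and `lead_weight` parameters and the sample string are hypothetical.

```python
from collections import Counter

def position_weighted_bigrams(text, lead=10, lead_weight=2.0):
    """Count character bigrams, weighting those that start within the lead.

    A bigram starting at one of the first `lead` character positions adds
    `lead_weight` to its feature value; any later occurrence adds 1.0.
    Whitespace is dropped before bigrams are formed.
    """
    feats = Counter()
    chars = [c for c in text if not c.isspace()]
    for i in range(len(chars) - 1):
        bigram = chars[i] + chars[i + 1]
        feats[bigram] += lead_weight if i < lead else 1.0
    return feats

# Hypothetical news snippet: "stock market" appears at the start and again later.
feats = position_weighted_bigrams("股市上漲股市", lead=2)
print(feats["股市"], feats["上漲"])
```

The bigram "股市" scores 3.0 (2.0 for the early occurrence plus 1.0 for the later one), while "上漲" scores 1.0, so terms that appear in the opening of an article contribute more to its feature vector.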


17

"Automatic text categorization for information filtering." 1998. http://library.cuhk.edu.hk/record=b5889734.


Abstract:

Ho Chao Yang.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1998.
Includes bibliographical references (leaves 157-163).
Abstract also in Chinese.
Abstract
Acknowledgment
List of Figures
List of Tables
Chapter 1 --- Introduction
Chapter 1.1 --- Automatic Document Categorization
Chapter 1.2 --- Information Filtering
Chapter 1.3 --- Contributions
Chapter 1.4 --- Organization of the Thesis
Chapter 2 --- Related Work
Chapter 2.1 --- Existing Automatic Document Categorization Approaches
Chapter 2.1.1 --- Rule-Based Approach
Chapter 2.1.2 --- Similarity-Based Approach
Chapter 2.2 --- Existing Information Filtering Approaches
Chapter 2.2.1 --- Information Filtering Systems
Chapter 2.2.2 --- Filtering in TREC
Chapter 3 --- Document Pre-Processing
Chapter 3.1 --- Document Representation
Chapter 3.2 --- Classification Scheme Learning Strategy
Chapter 4 --- A New Approach - IBRI
Chapter 4.1 --- Overview of Our New IBRI Approach
Chapter 4.2 --- The IBRI Representation and Definitions
Chapter 4.3 --- The IBRI Learning Algorithm
Chapter 5 --- IBRI Experiments
Chapter 5.1 --- Experimental Setup
Chapter 5.2 --- Evaluation Metric
Chapter 5.3 --- Results
Chapter 6 --- A New Approach - GIS
Chapter 6.1 --- Motivation of GIS
Chapter 6.2 --- Similarity-Based Learning
Chapter 6.3 --- The Generalized Instance Set Algorithm (GIS)
Chapter 6.4 --- Using GIS Classifiers for Classification
Chapter 6.5 --- Time Complexity
Chapter 7 --- GIS Experiments
Chapter 7.1 --- Experimental Setup
Chapter 7.2 --- Results
Chapter 8 --- A New Information Filtering Approach Based on GIS
Chapter 8.1 --- Information Filtering Systems
Chapter 8.2 --- GIS-Based Information Filtering
Chapter 9 --- Experiments on GIS-based Information Filtering
Chapter 9.1 --- Experimental Setup
Chapter 9.2 --- Results
Chapter 10 --- Conclusions and Future Work
Chapter 10.1 --- Conclusions
Chapter 10.2 --- Future Work
Appendix A --- Sample Documents in the corpora
Appendix B --- Details of Experimental Results of GIS
Appendix C --- Computational Time of Reuters-21578 Experiments


18

"Training example adaptation for text categorization." 2005. http://library.cuhk.edu.hk/record=b5892711.


Abstract:

Ko Hon Man.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2005.
Includes bibliographical references (leaves 68-72).
Abstracts in English and Chinese.
Chapter 1 --- Introduction
Chapter 1.1 --- Background and Motivation
Chapter 1.2 --- Thesis Organization
Chapter 2 --- Related Work
Chapter 2.1 --- Semi-supervised learning
Chapter 2.2 --- Hierarchical Categorization
Chapter 3 --- Framework Overview
Chapter 4 --- Inherent Concept Detection
Chapter 4.1 --- Data Preprocessing
Chapter 4.2 --- Concept Detection Algorithm
Chapter 4.3 --- Kernel-based Distance Measure
Chapter 5 --- Training Example Discovery from Unlabeled Documents
Chapter 5.1 --- Training Document Discovery
Chapter 5.2 --- Automatically determining the number of extracted positive examples
Chapter 5.3 --- Classification Model
Chapter 6 --- Experimental Evaluation
Chapter 6.1 --- Corpus Description
Chapter 6.2 --- Evaluation Metric
Chapter 6.3 --- Result Analysis
Chapter 7 --- Conclusions and Future Work
Bibliography
Appendix A --- Detailed result on the inherent concept detection process for the TDT and RCV1 corpora


19

"New learning strategies for automatic text categorization." 2001. http://library.cuhk.edu.hk/record=b5890838.


Abstract:

Lai Kwok-yin.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references (leaves 125-130).
Abstracts in English and Chinese.
Chapter 1 --- Introduction
Chapter 1.1 --- Automatic Textual Document Categorization
Chapter 1.2 --- Meta-Learning Approach For Text Categorization
Chapter 1.3 --- Contributions
Chapter 1.4 --- Organization of the Thesis
Chapter 2 --- Related Work
Chapter 2.1 --- Existing Automatic Document Categorization Approaches
Chapter 2.2 --- Existing Meta-Learning Approaches For Information Retrieval
Chapter 2.3 --- Our Meta-Learning Approaches
Chapter 3 --- Document Pre-Processing
Chapter 3.1 --- Document Representation
Chapter 3.2 --- Classification Scheme Learning Strategy
Chapter 4 --- Linear Combination Approach
Chapter 4.1 --- Overview
Chapter 4.2 --- Linear Combination Approach - The Algorithm
Chapter 4.2.1 --- Equal Weighting Strategy
Chapter 4.2.2 --- Weighting Strategy Based On Utility Measure
Chapter 4.2.3 --- Weighting Strategy Based On Document Rank
Chapter 4.3 --- Comparisons of Linear Combination Approach and Existing Meta-Learning Methods
Chapter 4.3.1 --- LC versus Simple Majority Voting
Chapter 4.3.2 --- LC versus BORG
Chapter 4.3.3 --- LC versus Restricted Linear Combination Method
Chapter 5 --- The New Meta-Learning Model - MUDOF
Chapter 5.1 --- Overview
Chapter 5.2 --- Document Feature Characteristics
Chapter 5.3 --- Classification Errors
Chapter 5.4 --- Linear Regression Model
Chapter 5.5 --- The MUDOF Algorithm
Chapter 6 --- Incorporating MUDOF into Linear Combination approach
Chapter 6.1 --- Background
Chapter 6.2 --- Overview of MUDOF2
Chapter 6.3 --- Major Components of the MUDOF2
Chapter 6.4 --- The MUDOF2 Algorithm
Chapter 7 --- Experimental Setup
Chapter 7.1 --- Document Collection
Chapter 7.2 --- Evaluation Metric
Chapter 7.3 --- Component Classification Algorithms
Chapter 7.4 --- Categorical Document Feature Characteristics for MUDOF and MUDOF2
Chapter 8 --- Experimental Results and Analysis
Chapter 8.1 --- Performance of Linear Combination Approach
Chapter 8.2 --- Performance of the MUDOF Approach
Chapter 8.3 --- Performance of MUDOF2 Approach
Chapter 9 --- Conclusions and Future Work
Chapter 9.1 --- Conclusions
Chapter 9.2 --- Future Work
Appendix A --- Details of Experimental Results for Reuters-21578 corpus
Appendix B --- Details of Experimental Results for OHSUMED corpus
Bibliography


20

Lin, Ching-Han, and 林京翰. "Cascaded Class Reduction for Automatic Text Categorization." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/09069339792107773438.


Abstract:

Master's
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
91
The task of text categorization is the classification of natural text (or hypertext) documents into a fixed number of predefined categories. This problem arises in a number of different areas, including email filtering, web searching, office automation, sorting documents by topic, and classification of news agency stories. Some approaches, such as K-nearest neighbor (KNN) and support vector machines (SVM), achieve outstanding performance, but they suffer from long classification times when the number of predefined categories is very large. In this thesis, we investigate and propose a cascaded class reduction method in which a sequence of classifiers is cascaded to successively reduce the set of possible classes. We show that by cascading simple classifiers with SVM or KNN, we can improve classification accuracy while reducing classification time.
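The cascade described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the thesis's implementation: the cheap first stage is a class-centroid ranker, the expensive second stage is a cosine-similarity kNN, and the names `reduce_classes`, `keep`, and the toy `TRAIN` data are hypothetical.

```python
import math
from collections import Counter

def bag(text):
    """Turn a text into a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroids(train):
    """Stage-1 model: one summed term vector per class."""
    cents = {}
    for text, label in train:
        cents.setdefault(label, Counter()).update(bag(text))
    return cents

def reduce_classes(doc, cents, keep):
    """Cheap stage: keep only the `keep` classes whose centroid is closest."""
    ranked = sorted(cents, key=lambda c: cosine(doc, cents[c]), reverse=True)
    return set(ranked[:keep])

def knn(doc, train, allowed, k=3):
    """Expensive stage: kNN restricted to documents of the surviving classes."""
    scored = [(cosine(doc, bag(t)), lab) for t, lab in train if lab in allowed]
    scored.sort(key=lambda s: s[0], reverse=True)
    votes = Counter(lab for _, lab in scored[:k])
    return votes.most_common(1)[0][0]

def classify(text, train, cents, keep=2, k=3):
    """Run the two-stage cascade on one document."""
    doc = bag(text)
    return knn(doc, train, reduce_classes(doc, cents, keep), k)

# Hypothetical toy data, for illustration only.
TRAIN = [
    ("stocks fell as markets closed lower", "finance"),
    ("bank shares and bond markets rallied", "finance"),
    ("the team won the football match", "sport"),
    ("striker scored in the cup match", "sport"),
    ("new vaccine trial shows promise", "health"),
    ("hospital reports flu vaccine shortage", "health"),
]
CENTS = centroids(TRAIN)
print(classify("markets and bank stocks", TRAIN, CENTS))
```

The second stage compares the document only against training documents of the classes that survive the first stage, which is where the time saving would come from when the number of categories is large.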


21

Yang, Cheng-Han, and 楊承翰. "Automatic Text Categorization Model Based on Genetic Algorithm." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/z9j2cq.


Abstract:

Master's
National Chung Cheng University
Department and Graduate Institute of Information Management
101
The rapid accumulation of digital information makes searching for information increasingly difficult, so managing documents effectively has become an important task, and Text Categorization (TC) research is growing in importance. Most TC studies try to find the single classifier with the highest accuracy among several candidates and use it as the TC model. However, an individual classifier often gives good results only on data that suits it. Our research therefore integrates various individual classifiers into an ensemble to improve classification performance, combining the opinions of different experts (classifiers) to make a decision. This addresses the problem that an individual classifier fits only particular document datasets. TC is also likely to face excessively high document feature dimensionality. We therefore use a Genetic Algorithm (GA) to optimize classifier training so that the classifiers have diverse features, are mutually independent, and predict better, further enhancing overall classification performance. We propose two GA encoding methods: (1) Selection of Disjoint Feature Subsets (SDFS), in which each feature can be used to train only one kind of classifier; (2) Selection of Possibly Overlapping Feature Subsets (SPOFS), in which each feature can be used to train more than one kind of classifier. In the experimental evaluation, we use the real-world Reuters-21578 news article collection with the Modified Apte split. The experimental results show that our method improves document classification accuracy for both individual classifiers and the ensemble, and that the ensemble document classification model gives good and stable classification results.
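One way to picture the difference between the two encodings is the sketch below. The abstract does not give the actual chromosome layout, so both layouts are assumptions: an SDFS-style chromosome is taken to hold one classifier index per feature, and an SPOFS-style chromosome one bit tuple per feature. The function names are hypothetical.

```python
import random

def decode_sdfs(chromosome, n_classifiers):
    """SDFS-style: gene i names the single classifier that receives feature i."""
    subsets = [[] for _ in range(n_classifiers)]
    for feature, owner in enumerate(chromosome):
        subsets[owner].append(feature)
    return subsets

def decode_spofs(chromosome, n_classifiers):
    """SPOFS-style: gene i is a bit tuple; bit j set means classifier j uses feature i."""
    subsets = [[] for _ in range(n_classifiers)]
    for feature, bits in enumerate(chromosome):
        for clf, bit in enumerate(bits):
            if bit:
                subsets[clf].append(feature)
    return subsets

def mutate_sdfs(chromosome, n_classifiers, rate, rng):
    """Reassign each gene to a random classifier with probability `rate`."""
    return [rng.randrange(n_classifiers) if rng.random() < rate else gene
            for gene in chromosome]

# Four features split among three classifiers without overlap.
print(decode_sdfs([0, 2, 1, 0], 3))
# Three features, two classifiers; feature 1 is shared, feature 2 is unused.
print(decode_spofs([(1, 0), (1, 1), (0, 0)], 2))
```

Under these assumptions, a fitness function would train each classifier on its decoded subset and score the resulting ensemble; that part is omitted because the abstract does not describe it.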


22

Ying, Jia-Ching, and 英家慶. "Automatic Chinese Text Categorization Using N-gram Model." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/p5k937.


Abstract:

Master's
Ming Chuan University
Master's Program, Department of Information and Telecommunications Engineering
95
Chinese text classification is an important and well-known technique in the field of machine learning. However, most applications avoid the problem of word segmentation and ignore the relationships between words. It is important to build a classifier suited to Chinese text classification. In this paper, we propose an N-gram-based language model for Chinese text categorization that takes the relationships between words into account. To handle out-of-vocabulary items, we also propose a novel smoothing approach based on logistic regression to improve accuracy. The experimental results show that our approach outperforms the previous N-gram-based classification model by more than 11% in micro-averaged F-measure.
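As a rough illustration of classification with per-class N-gram language models, the sketch below trains one character-bigram model per category and picks the category whose model gives the text the highest log-probability. It rests on several assumptions that are not from the thesis: character bigrams are used so no word segmentation is needed, class priors are uniform, and plain add-one smoothing stands in for the logistic-regression-based smoothing the abstract mentions. The toy data and names are hypothetical.

```python
import math
from collections import Counter

class BigramLM:
    """Character-bigram counts for one category."""

    def __init__(self, texts):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.vocab = set()
        for t in texts:
            padded = "^" + t  # "^" marks the start of a document
            self.vocab.update(padded)
            for a, b in zip(padded, padded[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1

    def log_prob(self, text, vocab_size):
        """Add-one smoothed log-probability (a stand-in, not the thesis's smoothing)."""
        padded = "^" + text
        total = 0.0
        for a, b in zip(padded, padded[1:]):
            total += math.log((self.bigrams[(a, b)] + 1) /
                              (self.contexts[a] + vocab_size))
        return total

def train(labeled):
    """Build one bigram model per label and a shared vocabulary size."""
    by_class = {}
    for text, label in labeled:
        by_class.setdefault(label, []).append(text)
    models = {c: BigramLM(ts) for c, ts in by_class.items()}
    vocab = set().union(*(m.vocab for m in models.values()))
    return models, len(vocab) + 1  # +1 reserves mass for unseen characters

def classify(text, models, vocab_size):
    """Pick the label whose model scores the text highest (uniform priors assumed)."""
    return max(models, key=lambda c: models[c].log_prob(text, vocab_size))

# Hypothetical toy data, for illustration only.
DATA = [("股市上漲", "finance"), ("銀行股票下跌", "finance"),
        ("足球比賽", "sport"), ("籃球比賽冠軍", "sport")]
MODELS, V = train(DATA)
print(classify("股票上漲", MODELS, V))
```

Without smoothing, any bigram unseen in a category's training text would drive that category's probability to zero, which is the out-of-vocabulary problem the abstract refers to.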


23

Ko-Li, Kan, and 甘可立. "Effectiveness Issues in Keyword Extraction and Automatic Text Categorization." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/06898275923006333937.


Abstract:

Master's
Chung Hua University
Master's Program, Department of Computer Science and Information Engineering
94
At present, automatic text categorization is primarily based on extracting keywords from documents. Keyword extraction is also a basic, core technology for document analysis. Today, keyword extraction mostly depends on the judgement of professional researchers, which wastes time and manpower. Therefore, it is important to employ automatic keyword extraction methods in text categorization. In this thesis, we propose four keyword extraction methods to improve the efficiency and accuracy of automatic text categorization: (1) Content-reduction text categorization: the abstract of an article is retrieved before keyword extraction; (2) Hierarchical text categorization: keywords are extracted according to the taxonomy hierarchy; (3) PN pruning: redundant keywords are pruned to retain the important keywords; (4) TFxR keyword weighting method: the accuracy of categorization is increased by calculating keyword weights. We evaluated the new methods by the efficiency of both keyword extraction and text categorization as well as the accuracy of text categorization. The experimental results showed that our new approaches improve both the efficiency of keyword extraction and the accuracy of text categorization. Furthermore, our new methods demonstrate a large saving in manpower and time, especially when applied to the knowledge management systems of some industries.


24

Wang, Jing-Doo, and 王經篤. "Design and Evaluation of Approaches for Automatic Chinese Text Categorization." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/16980764430249360032.


Abstract:

Doctoral
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
90
In recent years, we have seen tremendous growth in the number of online text documents available on the Internet, in digital libraries and in news sources. Effective location of information in these huge resources is difficult without good indexing and organization of text collections. Automatic text categorization, defined as the task of assigning predefined class (category) labels to free-text documents, is one of the main techniques useful both in organizing and in locating information in these huge collections. Many approaches to text categorization and web page classification have been proposed. Most of them have been evaluated using English texts; evaluation of these approaches using texts in Chinese and other oriental languages has been limited. This dissertation proposes and evaluates approaches for categorizing Chinese texts, which consist of term extraction, term selection, term clustering and text classification. For term extraction, we propose an I/O-efficient approach that uses frequency counts to identify the left and right boundaries of possibly significant terms. We then perform term selection and term clustering to reduce the dimension of the term space to a practical level without losing classification accuracy. We study and compare the performance of three well-known classifiers, namely the linear classifier, the naive Bayes probabilistic classifier and the k-Nearest Neighbors (kNN) classifier, when they are applied to categorize Chinese texts. Overall, kNN achieves the best accuracy but requires a large amount of computation time and memory to classify new texts. The linear classifier is very time- and memory-efficient in practical implementation, but achieves accuracy slightly worse than that of kNN. To compensate for the potential weakness of the linear classifier, which computes one representative for each class, we increase the number of representatives for each class.
Experimental results show that this approach improved the linear classifier and achieved micro-averaged accuracy similar to that of kNN, with much less classification time. Furthermore, we suggest a way to reorganize the structure of classes when identifying new representatives for the linear classifier. Because our term extraction approach scales to large text collections derived from chronologically ordered Chinese news, we can mine for periodic events via the term frequency distribution of significant terms over time. Note that chronologically ordered news articles concerned with regular events such as annual festivals, ceremonies, games and customs are appealing to a foreigner who would like a deep understanding of an unfamiliar country, and are useful to an observer who wants to review news after a long period.


25

Lin, Jeng-Wei, and 林政緯. "A Study on Automatic Text Categorization And Its Performance Evaluation." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/13076597873081176063.


Abstract:

Master's
Fu Jen Catholic University
Department of Library and Information Science
90
This study uses computers to classify documents automatically and evaluates the efficiency of each methodology and of the study itself. From the results of the processes in these classification systems, we learn which factors may affect efficiency. The system in this thesis trains the classification module on documents that have been classified into many categories, in order to improve the rate of correct classification. We use the ApteMod version of Reuters-21578, which was obtained by eliminating unlabelled documents and selecting the categories that have at least one document in both the training set and the test set. This process resulted in 90 categories in both the training and test sets. After eliminating documents that do not belong to any of these 90 categories, we obtained a training set of 7769 documents and a test set of 3019 documents. In the thesis, we not only discuss the Linear Function in IR, the Rocchio algorithm and k-Nearest Neighbor (kNN), but also investigate methodologies including the Vector Space Model and the kNN classifier. Based on these concepts, we run several experiments. Finally, we compare the results with data from the references and evaluate the efficiency of the study.


26

Li, Po-Yi, and 李柏毅. "Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/65318851471594812159.


Abstract:

Master's
National Kaohsiung University of Applied Sciences
Master's Program, Department of Electrical Engineering
92
"Automatic text categorization" uses machine learning techniques to classify heterogeneous texts through an implemented classification system. The theory of the Support Vector Machine (SVM) was built on statistical learning, neural networks and optimization techniques. The major features of SVM are: (1) the capacity to deal with linear and non-linear problems, and (2) no limit on the total number of data items tested (data size). As a result, the SVM algorithm offers an effective solution to the difficulties of text categorization at large data sizes. This research is mainly based on the SVM learning algorithm and proposes a feature selection strategy for the classification of Chinese documents. Based on several experimental settings, we discuss the differences among several feature selection strategies and verify their impact on the performance of SVM-based classification tasks. Then, according to the analysis of the strategies, we choose one of them for the implementation of our classification system, and combine different kernel functions with various parameters in the SVM algorithm to set up the document categorization experiments. Our experimental results indicate that the SVM algorithm for document classification can produce satisfactory performance with the chosen feature selection strategy. We also demonstrate that, with only 500 dimensions, our system achieves outstanding categorization accuracy. Finally, we conducted several experiments to compare neural network and kNN classifiers with our implemented SVM classifier for document categorization. The SVM classifier also performs better than the others.


27

Wu, Chia-Chuan, and 吳家銓. "A Study of Automatic Text Categorization based on Directional Term Structure." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/25870222133144392967.


Abstract:

Master's
National Chung Hsing University
Department and Graduate Institute of Information Management
98
In mainstream research on automatic text categorization, rule-based classifiers offer an interesting advantage: interpretability. Because the rules in the classifier are composed of terms, they can be easily understood, modified and maintained by humans. The classifier's accuracy is also competitive with other accurate classifiers such as SVM and Bayes nets, so rule-based classification techniques are very popular. Today's rule-based techniques identify valuable patterns in training documents to construct classification rules. These techniques do not consider the relationship between terms and paragraphs in documents. Since a dominant topic must run throughout a document's content, it generates certain term structures across paragraphs. Therefore, this study presents a new concept, the Meaningful Inner Link Object (MILO), which finds underlying directional term links across the paragraphs of a document for text categorization. In this study, the MILO process for text categorization consists of four main procedures. First, feature selection: the purpose is to find representative terms with which to compose MILOs from training documents that contain a great many noise terms. Second, MILO filtering: since the number of MILOs can exceed ten thousand, measuring MILO quality is an important issue, and filtering out useless MILOs can improve accuracy. Third, the design of a scoring model: to classify documents correctly, an effective model is needed to assign categories accurately to unlabeled documents. Finally, classification structures: traditional techniques use only one classifier for classification, while this study presents a hierarchical classification structure to improve accuracy. To summarize our method: first, a novel method is presented that observes the distribution of terms in document paragraphs to extract MILOs for text categorization.
Second, an improved method is presented that eliminates noise MILOs and uses a hierarchical classification structure. The experimental results of the two methods in this study show competitive performance on well-known benchmarks such as Reuters, WebKB and Ohsumed.


28

Chen, Chao-long, and 陳朝龍. "A Text Categorization Method Based on Term Distributional Clustering and Automatic Summarization." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/73204557664150148160.


Abstract:

Master's
National Pingtung Institute of Commerce
Department of Information Management
96
Owing to the exponential growth of electronic documents, research on automatic summarization has flourished over the last decade. This is evident from the fact that Computational Linguistics (Vol. 28, No. 4, 2002) and Information Processing & Management (Vol. 43, No. 6, 2007) have both published special issues on automatic summarization. However, the summaries generated by these sophisticated methods have little use beyond affording document searchers a glimpse of what a document is about. Therefore, this study proposes to use automatic summarization for term selection as a means of dimension reduction. Furthermore, to solve the problem that similar concepts with different term representations might degrade classification, we also investigate the effect of term expansion on classification accuracy. Instead of using term distributional clustering for feature extraction, we propose to use it for expanding the feature terms. Finally, we compare four data sets with different attributes (including Chinese and English news stories, longer articles such as academic research papers, and short articles such as medical abstracts) and different classification algorithms (KNN, Naive Bayes, and SVM) to assess the feasibility of the proposed method. The results show that text summarization is an effective means of dimension reduction. Classification based on summarization is more accurate than classification based on the traditional TFIDF and Information Gain term weighting schemes. Term distributional clustering can also be applied to term expansion and further improves classification accuracy, especially when the number of feature terms is small. The proposed method not only reduces the dimensionality of the term vector and selects more representative terms; it also saves computational resources. That is, one need not redo the feature selection process to cope with the task of text categorization.
Finally, a by-product of our proposed method is that it can generate indicative summaries of those documents. Thus, readers can easily grasp the concepts of those documents when browsing the classification results.


29

Su, Jong-Ming, and 蘇中明. "Use the Automatic Text Categorization Technology to Support the Management of the Discussion Portfolio Process." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/11562412456994381992.


Abstract:

Master's
Da-Yeh University
Graduate Institute of Information Management
90
The subject of this thesis is clustering technology in information retrieval. The simplest approach, free of jargon, is used to find the similarities between articles on this subject. The technology acceptance model, aided by automatic text categorization, is also used. A web-based education database provides discussion groups for students and links to other sites useful to all students. Moreover, teachers with a heavy workload are assisted: before the school year, the teacher can reorganize the database so that it can be easily retrieved. In this way, teachers can save time and effort in carrying out their jobs. Furthermore, the analysis of the learning portfolio should focus on the acquisition of knowledge. By recording the progress of each student, we may influence each student's study techniques and increase our knowledge of the learning process itself through data mining. The focus of this thesis is therefore using automatic text categorization to provide a richer learning environment for students. Finally, Davis's technology acceptance model is used to test the approach, with the results shown in Chapter 4; Chapter 5 discusses the differences from Davis's model and looks into the reason for them, which is the strategy of adding points to students' scores. Key Words: Information Retrieval, Clustering Technology, Technology Acceptance Model, Automatic Text Categorization


30

Alberts, Inge. "Exploitation des genres de textes pour assister les pratiques textuelles dans les environnements numériques de travail : le cas du courriel chez des cadres et des secrétaires dans une municipalité et une administration fédérale canadiennes." Thèse, 2009. http://hdl.handle.net/1866/2839.


Abstract:

Notre recherche a pour but de déterminer comment les genres textuels peuvent être exploités dans le design des environnements numériques de travail afin de faciliter l’accomplissement des pratiques textuelles de cadres et de secrétaires dans une municipalité et une administration fédérale canadiennes. À cet effet, le premier objectif consiste à évaluer l’aptitude des environnements numériques de travail à supporter les pratiques textuelles (lecture, écriture et manipulation des textes) de ces employés. Le deuxième objectif est de décrire les rôles des genres textuels au cours des pratiques textuelles. Avec l’exemple du courriel, le troisième objectif vise à examiner comment le genre peut être exploité dans une perspective d’assistance à la réalisation des pratiques textuelles dans les environnements numériques de travail. Cette recherche de nature qualitative comporte une méthodologie en deux étapes. La première étape consiste en un examen minutieux des pratiques textuelles, des difficultés rencontrées au cours de celles-ci, du rôle du genre dans les environnements numériques de travail, ainsi que des indices sollicités au cours de la gestion du courriel. Trois modes de collecte des données qualitatives sont utilisés auprès de 17 cadres et de 17 secrétaires issus de deux administrations publiques : l’entrevue semi-dirigée, le journal de bord et l’enquête cognitive. Les résultats sont examinés à l’aide de stratégies d’analyse de contenu qualitative. La deuxième phase comprend la mise au point d’une chaîne de traitement du courriel, visant à étayer notre réflexion sur le genre textuel et son exploitation dans la conception des environnements numériques de travail. Un corpus de 1703 messages est élaboré à partir d’un échantillon remis par deux cadres gouvernementaux. Les résultats permettent d’abord de dresser un portrait général des pratiques de lecture, d’écriture et de manipulation des textes communes et spécifiques aux cadres et aux secrétaires. 
L’importance du courriel, qui constitue environ 40% des systèmes notés dans les journaux de bord, est soulignée. Les difficultés rencontrées dans les environnements numériques de travail sont également décrites. Dans un deuxième temps, les rôles du genre au cours des pratiques textuelles sont examinés en fonction d’une matrice tenant à la fois compte de ses dimensions individuelles et collectives, ainsi que de ses trois principales facettes ; la forme, le contenu et la fonction. Ensuite, nous présentons un cadre d’analyse des indices affectant la gestion du courriel qui synthétise le processus d’interprétation des messages par le destinataire. Une typologie des patrons de catégorisation des cadres est également définie, puis employée dans une expérimentation statistique visant la description et la catégorisation automatique du courriel. Au terme de ce processus, on observe des comportements linguistiques marqués en fonction des catégories du courriel. Il s’avère également que la catégorisation automatique basée sur le lexique des messages est beaucoup plus performante que la catégorisation non lexicale. À l’issue de cette recherche, nous suggérons d’enrichir le paradigme traditionnel relevant de l’interaction humain-ordinateur par une sémiotique du genre dans les environnements numériques de travail. L’étude propose également une réflexion sur l’appartenance du courriel à un genre, en ayant recours aux concepts théoriques d’hypergenre, de genre et de sous-genre. Le succès de la catégorisation automatique du courriel en fonction de facettes tributaires du genre (le contenu, la forme et la fonction) offre des perspectives intéressantes sur l’application de ce concept au design des environnements numériques de travail en vue de faciliter l’accomplissement des pratiques textuelles par les employés.
This research reveals how textual genres can be exploited in digital work environments to improve the textual practices of managers and secretaries in the context of a municipality and the Canadian federal government. The first objective of this research assesses the suitability of digital work environments to support the textual practices of managers and secretaries in their reading, writing and manipulation of texts. The second objective describes the various roles of textual genre during the managerial and secretarial textual practices. Using email as a focal point, the third objective examines how genre can be exploited to advance the benefits of textual practices in the digital work environments. This qualitative research entails a two-phase methodology. By the study of 17 secretaries and 17 managers, the first phase consists of a thorough examination of the current textual practices in the Canadian federal government and municipal contexts and the difficulties encountered during these practices. This phase also considers the various roles of genre in the digital work environments along with the salient clues sought during email management. This study deployed three data collection techniques: semi-structured interviews, diary journals and cognitive inquiries. The results are examined using several qualitative content analysis techniques. The second phase of this research consists of developing an email processing sequence to further expand our understanding of textual genre and its exploitation in the design of digital work environments. The data for this phase uses a corpus of 1703 messages developed from a sample of two governmental managers. The results provide an encompassing overview of practices relating to the reading, writing and manipulation of texts that are both common and specific to managers and secretaries. 
With over 40% of events recorded in the diary journal relating to email, the importance of this type of system in digital work environments is clearly emphasized. The difficulties encountered in the digital work environments are also described. The role of genre during textual practices is examined according to a matrix illustrating both the individual and collective dimensions of genre in addition to its three main facets: the form, the content and the purpose. Next, we present an analytic framework of the prominent cues affecting email management to summarize the process of interpreting messages by the recipient. A typology of the categorization patterns of managers is also developed and used in a statistical experiment aiming to automatically describe and categorize email. As a result of this experiment, we observe specific linguistic behaviours that characterize each email category. It is also revealed that automatic categorization based on message lexicon is more efficient than non-lexical categorization. At the conclusion of this research, we suggest enriching the traditional human-computer interaction paradigm with a semiotics of genre in the digital work environments. The study also offers a reflection regarding email membership to a specific genre using the theoretical concepts of hypergenre, genre and sub-genre. The success of the automatic categorization of email according to genre-related facets (the content, the form and the purpose) uncovers valuable insights and perspectives in designing digital work environments with the objective of facilitating the vital performance of textual practices by employees.
Conseil de recherches en sciences humaines du Canada (CRSH), Faculté des études supérieures de l'Université de Montréal

