Academic literature on the topic 'Multi-label Text Classification'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multi-label Text Classification.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Multi-label Text Classification"

1

Siringoringo, Rimbun, Jamaluddin Jamaluddin, and Resianta Perangin-angin. "TEXT MINING DAN KLASIFIKASI MULTI LABEL MENGGUNAKAN XGBOOST." METHOMIKA Jurnal Manajemen Informatika dan Komputerisasi Akuntansi 6, no. 2 (2022): 234–38. http://dx.doi.org/10.46880/jmika.vol6no2.pp234-238.

Full text
Abstract:
The conventional classification process is applied to find a single criterion or label. The multi-label classification process is more complex because a large number of labels results in more classes. Another aspect that must be considered in multi-label classification is the existence of mutual dependencies between data labels. In traditional binary classification, the analysis only aims to determine whether the label of a text is positive or negative. This method is sub-optimal because the relationships between labels cannot be determined. To overcome the weaknesses of these traditional methods, multi-label classification is one solution for data labeling. Multi-label text classification allows a document to carry many labels and admits semantic correlations between those labels. This research performs multi-label classification on research article texts using an ensemble classifier approach, namely XGBoost. Classification performance is evaluated using several criteria: the confusion matrix, accuracy, and F1 score. The model is also evaluated by comparing the performance of XGBoost with Logistic Regression. Using a train-test split and cross-validation, Logistic Regression obtained an average training and testing accuracy of 0.81 and an average F1 score of 0.47, while XGBoost obtained an average accuracy of 0.88 and an average F1 score of 0.78. The results show that the XGBoost classifier model can be applied to produce good classification performance.
APA, Harvard, Vancouver, ISO, and other styles
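As a concrete illustration of the setup described in the abstract above, here is a minimal, hedged sketch of one-vs-rest multi-label text classification with XGBoost on TF-IDF features; the toy documents, labels, and hyperparameters are placeholders, not the paper's data or configuration.

    # One binary XGBoost model per label (binary relevance / one-vs-rest),
    # evaluated with subset accuracy and micro F1, as a rough illustration only.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.metrics import accuracy_score, f1_score
    from xgboost import XGBClassifier

    docs = ["deep learning for image segmentation",
            "bayesian inference in genomics",
            "convolutional networks and genomics pipelines"]
    labels = [["computer_science"], ["statistics", "biology"],
              ["computer_science", "biology"]]

    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(labels)              # binary indicator matrix, one column per label
    X = TfidfVectorizer().fit_transform(docs)  # sparse TF-IDF features

    clf = OneVsRestClassifier(XGBClassifier(n_estimators=50, max_depth=3))
    clf.fit(X, Y)

    pred = clf.predict(X)
    print("subset accuracy:", accuracy_score(Y, pred))
    print("micro F1:", f1_score(Y, pred, average="micro"))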
2

Wu, Tianxiang, and Shuqun Yang. "Contrastive Enhanced Learning for Multi-Label Text Classification." Applied Sciences 14, no. 19 (2024): 8650. http://dx.doi.org/10.3390/app14198650.

Full text
Abstract:
Multi-label text classification (MLTC) aims to assign appropriate labels to each document from a given set. Prior research has acknowledged the significance of label information, but its utilization remains insufficient. Existing approaches often focus on either label correlation or label textual semantics, without fully leveraging the information contained within labels. In this paper, we propose a multi-perspective contrastive model (MPCM) with an attention mechanism to integrate labels and documents, utilizing contrastive methods to enhance label information from both textual semantic and correlation perspectives. Additionally, we introduce techniques for contrastive global representation learning and positive label representation alignment to improve the model’s perception of accurate labels. The experimental results demonstrate that our algorithm achieves superior performance compared to existing methods when evaluated on the AAPD and RCV1-V2 datasets.
APA, Harvard, Vancouver, ISO, and other styles
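The contrastive component of such models can be hard to picture from prose alone; below is a rough, generic sketch of an InfoNCE-style objective that pulls a document embedding towards the embeddings of its true labels and away from the others. It only illustrates the flavour of contrastive label enhancement and is not the MPCM architecture from the paper; all tensors are random placeholders.

    # Generic contrastive objective between a document and its labels (illustrative only).
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    doc = F.normalize(torch.randn(1, 64), dim=-1)        # one document embedding
    label_emb = F.normalize(torch.randn(5, 64), dim=-1)   # embeddings for 5 candidate labels
    positive = torch.tensor([0, 2])                       # indices of the document's true labels

    temperature = 0.1
    sims = (doc @ label_emb.T) / temperature              # similarity to every label, shape (1, 5)
    log_prob = F.log_softmax(sims, dim=-1)
    loss = -log_prob[0, positive].mean()                  # raise similarity to true labels vs. the rest
    print("contrastive loss:", loss.item())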
3

Tidake, Vaishali S., and Shirish S. Sane. "Multi-label Classification: A Survey." International Journal of Engineering & Technology 7, no. 4.19 (2018): 1045. http://dx.doi.org/10.14419/ijet.v7i4.19.28284.

Full text
Abstract:
Wide use of the internet generates huge amounts of data that need proper organization, which leads to text categorization. Earlier it was assumed that a document describes one category; it was soon realized that a document can describe multiple categories simultaneously. This scenario motivates multi-label classification, a supervised learning approach that assigns a predefined set of labels to an object by looking at its characteristics. First used in text categorization, it soon became the choice of researchers for wide-ranging applications such as marketing, multimedia annotation, and bioinformatics. The two most common approaches to multi-label classification are problem transformation, which benefits from existing single-label classifiers by first converting multi-label data to single-label form, and algorithm adaptation, which designs classifiers that handle multi-label data directly. Another popular approach is an ensemble of multiple classifiers that takes the votes of all. Other approaches are also available, namely algorithm-independent and algorithm-dependent approaches. Based on the predictions produced, a suitable metric is used for example-wise or label-wise evaluation, depending on whether the prediction is binary or a ranking. Every approach offers benefits and issues, such as loss of label dependency in transformation, complexity in the case of adaptation, and improved results with ensembles, which should be considered during the design of the underlying application.
APA, Harvard, Vancouver, ISO, and other styles
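To make the example-wise and label-wise evaluation mentioned in the survey concrete, here is a small illustrative computation of common multi-label metrics on made-up indicator matrices (not results from the paper).

    # Hamming loss plus micro-, macro-, and example-averaged F1 on toy predictions.
    import numpy as np
    from sklearn.metrics import hamming_loss, f1_score

    Y_true = np.array([[1, 0, 1],
                       [0, 1, 1],
                       [1, 1, 0]])
    Y_pred = np.array([[1, 0, 0],
                       [0, 1, 1],
                       [1, 0, 0]])

    print("hamming loss:", hamming_loss(Y_true, Y_pred))               # fraction of wrong label decisions
    print("micro F1   :", f1_score(Y_true, Y_pred, average="micro"))   # pooled over all labels
    print("macro F1   :", f1_score(Y_true, Y_pred, average="macro"))   # unweighted mean over labels
    print("samples F1 :", f1_score(Y_true, Y_pred, average="samples")) # example-wise (per-document) mean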
4

Abdullahi, Adeleke, Noor Azah Samsudin, Mohd Hisyam Abdul Rahim, Shamsul Kamal Ahmad Khalid, and Riswan Efendi. "Multi-label classification approach for Quranic verses labeling." Indonesian Journal of Electrical Engineering and Computer Science 24, no. 1 (2021): 484–90. https://doi.org/10.11591/ijeecs.v24.i1.pp484-490.

Full text
Abstract:
Machine learning involves the task of training systems to be able to make decisions without being explicitly programmed. Important among machine learning tasks is classification, the process of training machines to make predictions from predefined labels. Classification is broadly categorized into three distinct groups: single-label (SL), multi-class, and multi-label (ML) classification. This research work presents an application of a multi-label classification (MLC) technique to automating the labeling of Quranic verses. MLC has been gaining attention in recent years, due to the increasing amount of work on real-world classification problems with multi-label data. In traditional classification problems, patterns are associated with a single label from a set of disjoint labels. In MLC, however, an instance of data is associated with a set of labels. In this paper, three standard MLC methods, binary relevance (BR), classifier chain (CC), and label powerset (LP) algorithms, are implemented with four baseline classifiers: support vector machine (SVM), naïve Bayes (NB), k-nearest neighbors (kNN), and J48. The research methodology adopts the multi-label problem transformation (PT) approach. The results are validated using six conventional performance metrics: hamming loss, accuracy, one-error, micro-F1, macro-F1, and average precision. From the results, the classifiers effectively achieved above the 70% accuracy mark. Overall, SVM achieved the best results with the CC and LP algorithms.
APA, Harvard, Vancouver, ISO, and other styles
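For orientation, here is a hedged scikit-learn sketch of two of the problem-transformation methods named above (binary relevance and classifier chains) with a linear SVM base learner; label powerset can be added analogously, for example via scikit-multilearn. The tiny corpus and label set are invented for illustration and are not the paper's Quranic verse data.

    # Binary relevance (one SVM per label) vs. a classifier chain on toy data.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.multioutput import ClassifierChain
    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.svm import LinearSVC
    from sklearn.metrics import hamming_loss

    docs = ["charity and almsgiving", "patience in hardship",
            "prayer and charity", "patience and prayer"]
    labels = [["charity"], ["patience"], ["prayer", "charity"], ["patience", "prayer"]]

    Y = MultiLabelBinarizer().fit_transform(labels)
    X = TfidfVectorizer().fit_transform(docs)

    br = OneVsRestClassifier(LinearSVC()).fit(X, Y)   # binary relevance: labels treated independently
    cc = ClassifierChain(LinearSVC(), order="random", random_state=0).fit(X, Y)  # chain feeds earlier labels forward

    print("BR hamming loss:", hamming_loss(Y, br.predict(X)))
    print("CC hamming loss:", hamming_loss(Y, cc.predict(X)))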
5

Tandon, Kushagri, and Niladri Chatterjee. "Multi-label text classification with an ensemble feature space." Journal of Intelligent & Fuzzy Systems 42, no. 5 (2022): 4425–36. http://dx.doi.org/10.3233/jifs-219232.

Full text
Abstract:
Multi-label text classification aims at assigning more than one class to a given text document, which makes the task more ambiguous and challenging at the same time. The ambiguities come from the fact that several labels in the prescribed label set are often semantically close to each other, making a clear demarcation between them difficult. As a consequence, any machine learning based approach for developing a multi-label classification scheme needs to define its feature space by choosing features beyond linguistic or semi-linguistic features, so that the semantic closeness between the labels is also taken into account. The present work describes a scheme of feature extraction where the training document set and the prescribed label set are intertwined in a novel way to capture the ambiguity in a meaningful way. In particular, experiments were conducted using Topic Modeling and Fuzzy C-Means clustering, which aim at measuring the underlying uncertainty using probability-based and membership-based measures, respectively. Several nonparametric hypothesis tests establish the effectiveness of the features obtained through Fuzzy C-Means clustering in multi-label classification. A new algorithm has been proposed for training the system for multi-label classification using the above set of features.
APA, Harvard, Vancouver, ISO, and other styles
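As a loose illustration of augmenting a document's feature space before multi-label classification, the sketch below concatenates LDA topic proportions with TF-IDF features; the fuzzy C-means side of the paper is omitted, and the corpus, label set, and topic count are invented.

    # Combine lexical TF-IDF features with topic-model features, then classify.
    from scipy.sparse import hstack, csr_matrix
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import MultiLabelBinarizer

    docs = ["stock markets and interest rates", "football world cup results",
            "central bank policy and inflation", "tennis and football highlights"]
    labels = [["economy"], ["sport"], ["economy"], ["sport"]]

    Y = MultiLabelBinarizer().fit_transform(labels)

    tfidf = TfidfVectorizer().fit_transform(docs)        # lexical features
    counts = CountVectorizer().fit_transform(docs)
    topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)  # topic proportions

    X = hstack([tfidf, csr_matrix(topics)]).tocsr()      # ensemble feature space
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
    print(clf.predict(X))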
6

Sellah, Smail, and Vincent Hilaire. "Label Clustering for a Novel Problem Transformation in Multi-label Classification." JUCS - Journal of Universal Computer Science 26, no. 1 (2020): 71–88. https://doi.org/10.3897/jucs.2020.005.

Full text
Abstract:
Document classification is a large body of research, and many approaches have been proposed for single-label and multi-label classification. We focus on multi-label classification, more precisely on methods that transform multi-label classification into single-label classification. In this paper, we propose a novel problem transformation that leverages label dependency. We used the Reuters-21578 corpus, which is among the most widely used corpora for text categorization and classification research. Results show that our approach improves document classification by at least 8% compared with one-vs-all classification.
APA, Harvard, Vancouver, ISO, and other styles
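To give a sense of how label dependency can be exploited in a problem transformation, here is a hedged sketch that clusters labels by their co-occurrence; it only shows the general idea, is not the paper's algorithm, and uses a synthetic label matrix.

    # Cluster labels by co-occurrence as a rough proxy for label dependency.
    import numpy as np
    from sklearn.cluster import KMeans

    # rows = documents, columns = labels (synthetic indicator matrix)
    Y = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 1, 1, 1],
                  [0, 0, 1, 1]])

    cooc = Y.T @ Y                                    # label co-occurrence counts
    norm = np.sqrt(np.outer(cooc.diagonal(), cooc.diagonal()))
    sim = cooc / np.maximum(norm, 1e-9)               # cosine-style label similarity

    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(sim)
    print("label cluster assignments:", clusters)
    # Each cluster of dependent labels could then be handled by its own
    # smaller single-label or multi-label classifier.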
7

Maruthupandi, J., and K. Vimala Devi. "Multi-label text classification using optimised feature sets." International Journal of Data Mining, Modelling and Management 9, no. 3 (2017): 237. http://dx.doi.org/10.1504/ijdmmm.2017.086583.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Maruthupandi, J., and K. Vimala Devi. "Multi-label text classification using optimised feature sets." International Journal of Data Mining, Modelling and Management 9, no. 3 (2017): 237. http://dx.doi.org/10.1504/ijdmmm.2017.10007699.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

林, 娜. "Hierarchical Multi-label Text Classification Based on Bert." Advances in Applied Mathematics 13, no. 05 (2024): 2141–47. http://dx.doi.org/10.12677/aam.2024.135202.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Zha, Daochen, and Chenliang Li. "Multi-label dataless text classification with topic modeling." Knowledge and Information Systems 61, no. 1 (2018): 137–60. http://dx.doi.org/10.1007/s10115-018-1280-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "Multi-label Text Classification"

1

Wei, Zhihua. "The research on Chinese text multi-label classification." Thesis, Lyon 2, 2010. http://www.theses.fr/2010LYO20025/document.

Full text
Abstract:
Text Classification (TC), an important field in information technology, has many valuable applications. When facing the sea of information resources, the objects of TC are more complicated and diverse, and research in pursuit of effective and practical TC technology is fairly challenging. More and more researchers regard multi-label TC as better suited to many applications. This thesis analyses the difficulties and problems in multi-label TC and Chinese text representation on the basis of a large body of algorithms for single-label and multi-label TC. Aiming at high dimensionality in the feature space, sparse distribution in text representation, and poor performance of multi-label classifiers, the thesis brings forward corresponding algorithms from different angles. Focusing on the dimensionality "disaster" that arises when Chinese texts are represented using n-grams, a two-step feature selection algorithm is constructed. The method combines filtering rare features within a class and selecting discriminative features across classes. Moreover, the proper value of "n", the strategy of feature weighting, and the correlation among features are discussed on the basis of a variety of experiments, contributing some useful conclusions to the research on n-gram representation of Chinese texts. In view of a disadvantage of the Latent Dirichlet Allocation (LDA) model, namely the arbitrary revision of the variable in the smoothing process, a new smoothing strategy based on Tolerance Rough Sets (TRS) is put forward. It first constructs tolerance classes over the global vocabulary and then assigns a value to each out-of-vocabulary (OOV) word in a class according to its tolerance class. In order to improve the performance of multi-label classifiers and reduce computational complexity, a new TC method based on the LDA model is applied to Chinese text representation: it extracts topics statistically from texts, and the texts are then represented by their topic vectors. It shows competitive performance on both English and Chinese corpora. To further enhance the performance of classifiers in multi-label TC, a compound classification framework is proposed. It partitions the text space by computing upper and lower approximations, decomposing a multi-label TC problem into several single-label TC problems and several multi-label TC problems with fewer labels than the original problem. That is, an unknown text is classified by a single-label classifier when it falls into the lower approximation space of some class; otherwise, it is classified by the corresponding multi-label classifier. An application system, TJ-MLWC (Tongji Multi-label Web Classifier), was designed. It can call results from search engines directly and classify them in real time using an improved Naïve Bayes classifier, making browsing more convenient: users can immediately locate the texts they are interested in according to the class information given by TJ-MLWC.
APA, Harvard, Vancouver, ISO, and other styles
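One representation idea from the thesis, character n-grams that avoid word segmentation for Chinese, can be illustrated with a short, hedged sketch; the two documents are toy examples, and the thesis's feature-selection and LDA steps are not reproduced.

    # Character unigram + bigram TF-IDF features for Chinese text (no segmentation needed).
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["我喜欢自然语言处理", "机器学习用于文本分类"]

    vec = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))  # n = 1 and 2 over characters
    X = vec.fit_transform(docs)
    print(X.shape)
    print(vec.get_feature_names_out()[:10])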
2

Burkhardt, Sophie. "Online Multi-label Text Classification using Topic Models / Sophie Burkhardt." Mainz: Universitätsbibliothek Mainz, 2018. http://d-nb.info/1173911235/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Sendur, Zeynel. "Text Document Categorization by Machine Learning." Scholarly Repository, 2008. http://scholarlyrepository.miami.edu/oa_theses/209.

Full text
Abstract:
Because of the explosion of digital and online text information, automatic organization of documents has become a very important research area. There are mainly two machine learning approaches to enhance the task of organizing digital documents. One is the supervised approach, where pre-defined category labels are assigned to documents based on the likelihood suggested by a training set of labeled documents; the other is the unsupervised approach, where there is no need for human intervention or labeled documents at any point in the whole process. In this thesis, we concentrate on the supervised learning task, which deals with document classification. One of the most important tasks of information retrieval is to induce classifiers capable of categorizing text documents. The same document can belong to two or more categories, and this situation is referred to by the term multi-label classification. Multi-label classification domains have been encountered in diverse fields. Most of the existing machine learning techniques for multi-label classification domains are extremely expensive, since the documents are characterized by an extremely large number of features. In this thesis, we try to reduce these computational costs by applying different types of algorithms to documents characterized by a large number of features. Another goal of this thesis is to achieve the highest possible accuracy while maintaining high computational performance on text document categorization.
APA, Harvard, Vancouver, ISO, and other styles
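The computational cost that the thesis targets usually stems from very high-dimensional document features; a minimal, hedged way to illustrate one standard remedy (not the thesis's specific algorithms) is to compress TF-IDF vectors with truncated SVD before training per-label classifiers, as sketched below on toy data.

    # Dimensionality reduction before one-vs-rest multi-label classification.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.pipeline import make_pipeline
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import MultiLabelBinarizer

    docs = ["tax reform and budget vote", "league final and transfer news",
            "budget cuts hit sports funding", "election results and tax policy"]
    labels = [["politics"], ["sport"], ["politics", "sport"], ["politics"]]

    Y = MultiLabelBinarizer().fit_transform(labels)
    pipe = make_pipeline(
        TfidfVectorizer(),
        TruncatedSVD(n_components=3, random_state=0),        # compress the feature space
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    )
    pipe.fit(docs, Y)
    print(pipe.predict(docs))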
4

Průša, Petr. "Multi-label klasifikace textových dokumentů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-412872.

Full text
Abstract:
The master's thesis deals with automatic classification of text documents. It explains basic terms and problems of text mining. The thesis explains term clustering and shows some basic clustering algorithms. It also presents some methods of classification and deals closely with matrix regression. An application using matrix regression for classification was designed and developed. Experiments focused on normalization and thresholding.
APA, Harvard, Vancouver, ISO, and other styles
5

Artmann, Daniel. "Applying machine learning algorithms to multi-label text classification on GitHub issues." Thesis, Högskolan i Halmstad, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-43097.

Full text
Abstract:
This report compares five machine learning algorithms in their ability to categorize code repositories. The focus of expanding software projects tends to shift from developing new software to the maintenance of the projects. Maintainers can label code repositories to organize the project, but this requires manual labor and time. This report evaluates how machine learning algorithms perform in automatically classifying code repositories. Automatic classification can aid the management process by reducing both manual labor and human errors. GitHub provides online hosting for both private and public code repositories. In these repositories, users can open issues and assign labels to them, to keep track of bugs, enhancements, or requests. GitHub was used as a source for all data as it contains millions of open-source repositories. The focus was on the most popular labels from GitHub, both default labels and those defined by users. This report investigated the algorithms linear regression (LR), convolutional neural network (CNN), recurrent neural network (RNN), random forest (RF), and k-nearest-neighbor (KNN) in multi-label text classification. The mentioned algorithms were implemented, trained, and tested with the Keras and Scikit-learn libraries. The training sets contained around 38 thousand rows and the test set around 12 thousand rows. Cross-validation was used to measure the performance of each algorithm. The metrics used to obtain the results were precision, recall, and F1-score. The algorithms were empirically tested on different numbers of output labels. In order to maximize the F1-score, different designs of the neural networks and different natural language processing (NLP) methods were evaluated. This was done to see if the algorithms could be used to efficiently organize code repositories. CNN displayed the best scores in all experiments, but LR, RNN, and RF also showed some good results. LR, CNN, and RNN had the highest F1-scores, while RF could achieve a particularly high precision. KNN performed much worse than all other algorithms. The highest F1-score of 46.48% was achieved when using a non-sequential CNN model that used text input with stem words. The highest precision of 89.17% was achieved by RF. It was concluded that LR, CNN, RNN, and RF were all viable in classifying labels in software-related texts, among those found in GitHub issues. KNN was not found to be a viable candidate for this purpose.
APA, Harvard, Vancouver, ISO, and other styles
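For readers unfamiliar with the CNN setup such a comparison implies, here is a compact, hedged Keras sketch of a multi-label text CNN (sigmoid outputs with binary cross-entropy, one output unit per label); the architecture, issue texts, and label set are illustrative placeholders, not the thesis's exact configuration.

    # Minimal multi-label text CNN: vectorize issue titles, embed, convolve, pool, sigmoid.
    import numpy as np
    import tensorflow as tf

    texts = ["fix crash when saving file", "add dark mode to settings",
             "docs: update install instructions", "error thrown on login"]
    # label columns: [bug, enhancement, documentation]
    Y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype="float32")

    vectorize = tf.keras.layers.TextVectorization(max_tokens=1000, output_sequence_length=20)
    vectorize.adapt(texts)
    X = vectorize(texts)                                   # integer token ids, shape (4, 20)

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=1000, output_dim=32),
        tf.keras.layers.Conv1D(64, 3, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(3, activation="sigmoid"),    # independent per-label probabilities
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(X, Y, epochs=5, verbose=0)
    print((model.predict(X) > 0.5).astype(int))            # threshold to obtain label sets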
6

Dendamrongvit, Sareewan. "Induction in Hierarchical Multi-label Domains with Focus on Text Categorization." Scholarly Repository, 2011. http://scholarlyrepository.miami.edu/oa_dissertations/542.

Full text
Abstract:
Induction of classifiers from sets of preclassified training examples is one of the most popular machine learning tasks. This dissertation focuses on the techniques needed in the field of automated text categorization. Here, each document can be labeled with more than one class, sometimes with many classes. Moreover, the classes are hierarchically organized, the mutual relations being typically expressed in terms of a generalization tree. Both aspects (multi-label classification and hierarchically organized classes) have so far received inadequate attention. Existing literature work largely assumes that it is enough to induce a separate binary classifier for each class, and the question of class hierarchy is rarely addressed. This, however, ignores some serious problems. For one thing, induction of thousands of classifiers from hundreds of thousands of examples described by tens of thousands of features (a common case in automated text categorization) incurs prohibitive computational costs---even a single binary classifier in domains of this kind often takes hours, even days, to induce. For another, the circumstance that the classes are hierarchically organized affects the way we view the classification performance of the induced classifiers. The presented work proposes a technique referred to by the acronym "H-kNN-plus." The technique combines support vector machines and nearest neighbor classifiers with the intention to capitalize on the strengths of both. As for performance evaluation, a variety of measures have been used to evaluate hierarchical classifiers, including the standard non-hierarchical criteria that assign the same weight to different types of error. The author proposes a performance measure that overcomes some of their weaknesses. The dissertation begins with a study of (non-hierarchical) multi-label classification. One of the reasons for the poor performance of earlier techniques is the class-imbalance problem---a small number of positive examples being outnumbered by a great many negative examples. Another difficulty is that each of the classes tends to be characterized by a different set of characteristic features. This means that most of the binary classifiers are induced from examples described by predominantly irrelevant features. Addressing these weaknesses by majority-class undersampling and feature selection, the proposed technique significantly improves the overall classification performance. Even more challenging is the issue of hierarchical classification. Here, the dissertation introduces a new induction mechanism, H-kNN-plus, and subjects it to extensive experiments with two real-world datasets. The results indicate its superiority, in these domains, over earlier work in terms of prediction performance as well as computational costs.
APA, Harvard, Vancouver, ISO, and other styles
7

Rios, Anthony. "Deep Neural Networks for Multi-Label Text Classification: Application to Coding Electronic Medical Records." UKnowledge, 2018. https://uknowledge.uky.edu/cs_etds/71.

Full text
Abstract:
Coding Electronic Medical Records (EMRs) with diagnosis and procedure codes is an essential task for billing, secondary data analyses, and monitoring health trends. Both speed and accuracy of coding are critical. While coding errors could lead to more patient-side financial burden and misinterpretation of a patient’s well-being, timely coding is also needed to avoid backlogs and additional costs for the healthcare facility. Therefore, it is necessary to develop automated diagnosis and procedure code recommendation methods that can be used by professional medical coders. The main difficulty with developing automated EMR coding methods is the nature of the label space. The standardized vocabularies used for medical coding contain over 10 thousand codes. The label space is large, and the label distribution is extremely unbalanced - most codes occur very infrequently, with a few codes occurring several orders of magnitude more than others. A few codes never occur in the training dataset at all. In this work, we present three methods to handle the large unbalanced label space. First, we study how to augment EMR training data with biomedical data (research articles indexed on PubMed) to improve the performance of standard neural networks for text classification. PubMed indexes more than 23 million citations. Many of the indexed articles contain relevant information about diagnosis and procedure codes. Therefore, we present a novel method of incorporating this unstructured data in PubMed using transfer learning. Second, we combine ideas from metric learning with recent advances in neural networks to form a novel neural architecture that better handles infrequent codes. And third, we present new methods to predict codes that have never appeared in the training dataset. Overall, our contributions constitute advances in neural multi-label text classification with potential consequences for improving EMR coding.
APA, Harvard, Vancouver, ISO, and other styles
8

Rodríguez, Medina Samuel. "Multi-Label Text Classification with Transfer Learning for Policy Documents : The Case of the Sustainable Development Goals." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-395186.

Full text
Abstract:
We created and analyzed a text classification dataset from freely available web documents related to the United Nations' Sustainable Development Goals. We then used it to train and compare different multi-label text classifiers with the aim of exploring alternative methods that facilitate the search for information in this type of document. We explored the effectiveness of deep learning and transfer learning in text classification by fine-tuning different pre-trained language representations: Word2Vec, GloVe, ELMo, ULMFiT and BERT. We also compared these approaches against a baseline of more traditional algorithms without using transfer learning. More specifically, we used multinomial Naive Bayes, logistic regression, k-nearest neighbors and Support Vector Machines. We then analyzed the results of our experiments quantitatively and qualitatively. The best results in terms of micro-averaged F1 scores and AUROC are obtained by BERT. However, it is also interesting that the second best classifier in terms of micro-averaged F1 scores is the Support Vector Machine, closely followed by the logistic regression classifier, both of which have the advantage of being less computationally expensive than BERT. The results also show a close relation between our dataset size and the effectiveness of the classifiers.
APA, Harvard, Vancouver, ISO, and other styles
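To show what BERT fine-tuning for multi-label classification looks like in practice, here is a hedged sketch using the Hugging Face transformers library, where problem_type="multi_label_classification" selects sigmoid outputs with a BCE loss; the model name, label set, texts, and targets are placeholders, not the thesis's exact pipeline.

    # Multi-label BERT setup: multi-hot float targets, one forward pass with loss.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    labels = ["no_poverty", "quality_education", "climate_action"]
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased",
        num_labels=len(labels),
        problem_type="multi_label_classification",   # sigmoid + BCEWithLogitsLoss
    )

    texts = ["Policies for free primary schooling in rural areas",
             "Carbon pricing to cut national emissions"]
    targets = torch.tensor([[0., 1., 0.],
                            [0., 0., 1.]])           # multi-hot float targets

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=targets)
    print("loss:", out.loss.item())
    probs = torch.sigmoid(out.logits)
    print("predicted label sets:", (probs > 0.5).int().tolist())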
9

Borggren, Lukas. "Automatic Categorization of News Articles With Contextualized Language Models." Thesis, Linköpings universitet, Artificiell intelligens och integrerade datorsystem, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177004.

Full text
Abstract:
This thesis investigates how pre-trained contextualized language models can be adapted for multi-label text classification of Swedish news articles. Various classifiers are built on pre-trained BERT and ELECTRA models, exploring global and local classifier approaches. Furthermore, the effects of domain specialization, using additional metadata features and model compression are investigated. Several hundred thousand news articles are gathered to create unlabeled and labeled datasets for pre-training and fine-tuning, respectively. The findings show that a local classifier approach is superior to a global classifier approach and that BERT outperforms ELECTRA significantly. Notably, a baseline classifier built on SVMs yields competitive performance. The effect of further in-domain pre-training varies; ELECTRA’s performance improves while BERT’s is largely unaffected. It is found that utilizing metadata features in combination with text representations improves performance. Both BERT and ELECTRA exhibit robustness to quantization and pruning, allowing model sizes to be cut in half without any performance loss.
APA, Harvard, Vancouver, ISO, and other styles
10

Dalloux, Clément. "Fouille de texte et extraction d'informations dans les données cliniques." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S050.

Full text
Abstract:
With the introduction of clinical data warehouses, more and more health data are available for research purposes. While a significant part of these data exists in structured form, much of the information contained in electronic health records is available as free text that can be exploited for many tasks. In this manuscript, two tasks are explored: the multi-label classification of clinical texts and the detection of negation and uncertainty. The first is studied in cooperation with the Rennes University Hospital, owner of the clinical texts that we use, while, for the second, we use publicly available biomedical texts that we annotate and release free of charge. In order to solve these tasks, we propose several approaches based mainly on deep learning algorithms, used in supervised and unsupervised learning settings.
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "Multi-label Text Classification"

1

Hrala, Michal, and Pavel Král. "Multi-label Document Classification in Czech." In Text, Speech, and Dialogue. Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40585-3_44.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Vilar, David, María José Castro, and Emilio Sanchis. "Multi-label Text Classification Using Multinomial Models." In Advances in Natural Language Processing. Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-30228-5_20.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kubat, Miroslav, Kanoksri Sarinnapakorn, and Sareewan Dendamrongvit. "Induction in Multi-Label Text Classification Domains." In Advances in Machine Learning II. Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-05179-1_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Yoshimura, Kosuke, Tomoaki Iwase, Yukino Baba, and Hisashi Kashima. "Interdependence Model for Multi-label Classification." In Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-30490-4_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Liu, Han, Caixia Yuan, and Xiaojie Wang. "Label-Wise Document Pre-training for Multi-label Text Classification." In Natural Language Processing and Chinese Computing. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60450-9_51.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Chen, Zihao, Yang Liu, Baitai Cheng, and Jing Peng. "Integrating Label Semantic Similarity Scores into Multi-label Text Classification." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-15931-2_20.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Lehečka, Jan, and Jan Švec. "Improving Multi-label Document Classification of Czech News Articles." In Text, Speech, and Dialogue. Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-24033-6_35.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Chen, Wenshi, Xinhui Liu, Dongyu Guo, and Mingyu Lu. "Multi-label Text Classification Based on Sequence Model." In Data Mining and Big Data. Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-32-9563-6_21.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Guo, Xiaodong, and Yang Weng. "Deep Dependency Network for Multi-label Text Classification." In Pattern Recognition and Computer Vision. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60636-7_25.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Chen, Xiaolong, Jieren Cheng, Zhixin Rong, Wenghang Xu, Shuai Hua, and Zhu Tang. "Multi-label Text Classification Based on Improved Seq2Seq." In Lecture Notes in Electrical Engineering. Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-99-9243-0_43.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Multi-label Text Classification"

1

Yelamanchili, Likhitha, Ching-Seh Mike Wu, Chris Pollett, and Robert Chun. "Multi-Label Text Classification with Transfer Learning." In 2024 IEEE/ACIS 9th International Conference on Big Data, Cloud Computing, and Data Science (BCD). IEEE, 2024. http://dx.doi.org/10.1109/bcd61269.2024.10743077.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Gu, Qiliang, Shuo Zhao, Jianqiang Zhang, Gongpeng Song, and Qin Lu. "MFFLEN: Multi-Label Text Classification Based on Multi-Feature Fusion and Label Embedding." In 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2024. https://doi.org/10.1109/smc54092.2024.10831836.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Gu, Qiliang, and Qin Lu. "Multi-Label Text Classification for Judicial Texts via Dual Graph and Label Feature Fusion." In 2024 IEEE Smart World Congress (SWC). IEEE, 2024. https://doi.org/10.1109/swc62898.2024.00130.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Xu, Pengyu, Liping Jing, and Jian Yu. "Enhancing Multi-Label Text Classification under Label-Dependent Noise: A Label-Specific Denoising Framework." In Findings of the Association for Computational Linguistics: EMNLP 2024. Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.findings-emnlp.324.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Reddy, Veerababu, Usha Rani Uppukonda, and N. Veeranjaneyulu. "Enhancing Multi-label text classification using adaptive promptify concepts." In 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT). IEEE, 2024. http://dx.doi.org/10.1109/icccnt61001.2024.10724953.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

M, Agbozoh Aku Sitsofe, and Kaiwei Sun. "A Semantic-Based Framework for Multi-Label Text Classification." In 2024 2nd International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA). IEEE, 2024. http://dx.doi.org/10.1109/prmvia63497.2024.00010.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Zhang, Shengnan, and Huiyang Xu. "Multi-label text classification method based on ReBERTa-TextCNN." In Fourth International Conference on Computer Vision, Application, and Algorithm (CVAA 2024), edited by Hui Yuan and Lu Leng. SPIE, 2025. https://doi.org/10.1117/12.3055909.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Pan, Lihu, Xiaohua Li, Zhengkui Wang, Rui Zhang, Nan Yang, and Wen Shan. "Enhancing Multi-Label Text Classification by Incorporating Label Dependency to Handle Imbalanced Data." In 2024 International Joint Conference on Neural Networks (IJCNN). IEEE, 2024. http://dx.doi.org/10.1109/ijcnn60899.2024.10650276.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Li, Xintong, Jinya Jiang, Ria Dharmani, Jayanth Srinivasa, Gaowen Liu, and Jingbo Shang. "Open-world Multi-label Text Classification with Extremely Weak Supervision." In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.emnlp-main.841.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Guo, Hao, Xiangyang Li, Lei Zhang, Jia Liu, and Wei Chen. "Label-Aware Text Representation for Multi-Label Text Classification." In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. http://dx.doi.org/10.1109/icassp39728.2021.9413921.

Full text
APA, Harvard, Vancouver, ISO, and other styles