Journal articles on the topic "Text document classification"

Browse the top 50 journal articles for research on the topic "Text document classification".

1

Y Baravkar, B. "Automated Text Document Classification Using Predictive Network." International Journal of Scientific Engineering and Research 12, no. 1 (2024): 17–19. https://doi.org/10.70729/se24120142420.

2

D Krishna, Erukulla Laasya, A Sowmya Sri, T Ravinder Reddy, and Akhil Sanjoy. "BIOMEDICAL TEXT DOCUMENT CLASSIFICATION." International Journal of Engineering Technology and Management Sciences 7, no. 3 (2023): 788–92. http://dx.doi.org/10.46647/ijetms.2023.v07i03.121.

Abstract:
Information extraction, retrieval, and text categorization are only a few of the significant research fields covered by "bio medical text classification." This study examines many text categorization techniques utilised in practise, as well as their strengths and weaknesses, in order to improve knowledge of various information extraction opportunities in the field of data mining. We compiled a dataset with a focus on three categories: "Thyroid Cancer," "Lung Cancer," and "Colon Cancer." This paper presents an empirical study of a classifier. The investigation was carried out using biomedical l
3

Mukherjee, Indrajit, Prabhat Kumar Mahanti, Vandana Bhattacharya, and Samudra Banerjee. "Text classification using document-document semantic similarity." International Journal of Web Science 2, no. 1/2 (2013): 1. http://dx.doi.org/10.1504/ijws.2013.056572.

4

Wu, Tiandeng, Qijiong Liu, Yi Cao, Yao Huang, Xiao-Ming Wu, and Jiandong Ding. "Continual Graph Convolutional Network for Text Classification." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (2023): 13754–62. http://dx.doi.org/10.1609/aaai.v37i11.26611.

Abstract:
Graph convolutional network (GCN) has been successfully applied to capture global non-consecutive and long-distance semantic information for text classification. However, while GCN-based methods have shown promising results in offline evaluations, they commonly follow a seen-token-seen-document paradigm by constructing a fixed document-token graph and cannot make inferences on new documents. It is a challenge to deploy them in online systems to infer streaming text data. In this work, we present a continual GCN model (ContGCN) to generalize inferences from observed documents to unobserved docum
5

Katta, Divya. "Multilingual Text Document Clustering and Classification." International Journal for Research in Applied Science and Engineering Technology 13, no. 7 (2025): 1009–14. https://doi.org/10.22214/ijraset.2025.73119.

Abstract:
The increasing volume of digital content in multiple languages has created a strong need for intelligent systems that can organize and retrieve multilingual documents efficiently. This project introduces a comprehensive pipeline for clustering and semantic search of multilingual text documents, supporting English, Hindi, and Telugu. The system begins by accepting PDF documents and identifying their language using the langdetect library. This is followed by language-specific preprocessing, including Unicode normalization, sentence tokenization, punctuation removal, stopword elimination, and lem
6

Yao, Liang, Chengsheng Mao, and Yuan Luo. "Graph Convolutional Networks for Text Classification." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 7370–77. http://dx.doi.org/10.1609/aaai.v33i01.33017370.

Abstract:
Text classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolution on regular grid, e.g., sequence) to classification. However, only a limited number of studies have explored the more flexible graph convolutional neural networks (convolution on non-grid, e.g., arbitrary graph) for the task. In this work, we propose to use graph convolutional networks for text classification. We build a single text graph for a corpus based on word co-occurrence and document word relations, then lea
7

Mohammed Ali, Sura I., Marwah Nihad, Hussien Mohamed Sharaf, and Haitham Farouk. "Machine learning for text document classification-efficient classification approach." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 1 (2024): 703–10. https://doi.org/10.11591/ijai.v13.i1.pp703-710.

Abstract:
Numerous alternative methods for text classification have been created because of the increase in the amount of online text information available. The cosine similarity classifier is the most extensively utilized simple and efficient approach. It improves text classification performance. It is combined with estimated values provided by conventional classifiers such as Multinomial Naive Bayesian (MNB). Consequently, combining the similarity between a test document and a category with the estimated value for the category enhances the performance of the classifier. This approach provides a text d
8

K, Dinesh Balaji. "SMART DOCUMENT COMPANION - TEXT DATA CLASSIFICATION IN DOCUMENTS USING AI." International Research Journal of Education and Technology 6, no. 11 (2024): 2041–46. https://doi.org/10.70127/irjedt.vol.7.issue03.2046.

9

Mohammed Ali, Sura I., Marwah Nihad, Hussien Mohamed Sharaf, and Haitham Farouk. "Machine learning for text document classification-efficient classification approach." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 1 (2024): 703. http://dx.doi.org/10.11591/ijai.v13.i1.pp703-710.

Abstract:
Numerous alternative methods for text classification have been created because of the increase in the amount of online text information available. The cosine similarity classifier is the most extensively utilized simple and efficient approach. It improves text classification performance. It is combined with estimated values provided by conventional classifiers such as Multinomial Naive Bayesian (MNB). Consequently, combining the similarity between a test document and a category with the estimated value for the category enhances the performance of the classifier. This approach provides
10

Cheng, Betty Yee Man, Jaime G. Carbonell, and Judith Klein-Seetharaman. "Protein classification based on text document classification techniques." Proteins: Structure, Function, and Bioinformatics 58, no. 4 (2005): 955–70. http://dx.doi.org/10.1002/prot.20373.

11

Naïve, Anna Fay E., and Jocelyn B. Barbosa. "Efficient Accreditation Document Classification Using Naïve Bayes Classifier." Indian Journal of Science and Technology 15, no. 1 (2022): 9–18. https://doi.org/10.17485/IJST/v15i1.1761.

Abstract:
Objectives: To develop a desktop application that automatically classifies a document as to which area of accreditation documents it should belong to. Specifically, it aims to: a) create a predictive model that addresses document classification tasks; b) design and develop an application that classifies documents according to document classification; c) evaluate the performance measures of the automatic document classification. Methods: We introduce an innovative approach for the automatic classification of accreditation docume
12

Kim, Jiyun, and Han-joon Kim. "Multidimensional Text Warehousing for Automated Text Classification." Journal of Information Technology Research 11, no. 2 (2018): 168–83. http://dx.doi.org/10.4018/jitr.2018040110.

Abstract:
This article describes how, in the era of big data, a data warehouse is an integrated multidimensional database that provides the basis for the decision making required to establish crucial business strategies. Efficient, effective analysis requires a data organization system that integrates and manages data of various dimensions. However, conventional data warehousing techniques do not consider the various data manipulation operations required for data-mining activities. With the current explosion of text data, much research has examined text (or document) repositories to support text mining
13

Muhaimin, Amri, Tresna Maulana Fahrudin, Trimono, Prismahardi Aji Riyantoko, and Kartika Maulida Hindrayani. "Metric Comparison For Text Classification." Internasional Journal of Data Science, Engineering, and Anaylitics 2, no. 1 (2022): 86–90. http://dx.doi.org/10.33005/ijdasea.v2i1.34.

Abstract:
Text classification has been popular in recent years. To classify text, the first step is to convert the text into values. Values that can be used include Term Frequencies, Inverse Document Frequencies, Term Frequencies – Inverse Document Frequencies, and the frequency of the word itself. This study aims to determine which metric value is best for text classification. The methods used are Naïve Bayes, Logistic Regression, and Random Forest. The evaluation scores used are accuracy and the Area Under Curve value. It turns out that some metric values produce similar eva
14

P, Ashokkumar, Siva Shankar G, Gautam Srivastava, Praveen Kumar Reddy Maddikunta, and Thippa Reddy Gadekallu. "A Two-stage Text Feature Selection Algorithm for Improving Text Classification." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 3 (2021): 1–19. http://dx.doi.org/10.1145/3425781.

Abstract:
As the number of digital text documents increases on a daily basis, the classification of text is becoming a challenging task. Each text document consists of a large number of words (or features) that drive down the efficiency of a classification algorithm. This article presents an optimized feature selection algorithm designed to reduce a large number of features to improve the accuracy of the text classification algorithm. The proposed algorithm uses noun-based filtering, a word ranking that enhances the performance of the text classification algorithm. Experiments are carried out on three b
15

Zheng, Jianming, Yupu Guo, Chong Feng, and Honghui Chen. "A Hierarchical Neural-Network-Based Document Representation Approach for Text Classification." Mathematical Problems in Engineering 2018 (2018): 1–10. http://dx.doi.org/10.1155/2018/7987691.

Abstract:
Document representation is widely used in practical application, for example, sentiment classification, text retrieval, and text classification. Previous work is mainly based on the statistics and the neural networks, which suffer from data sparsity and model interpretability, respectively. In this paper, we propose a general framework for document representation with a hierarchical architecture. In particular, we incorporate the hierarchical architecture into three traditional neural-network models for document representation, resulting in three hierarchical neural representation models for d
16

Gaidhane, Kiran V., L. H. Patil, and C. U. Chouhan. "AN EFFICIENT APPROACH FOR TEXT MINING USING SIDE INFORMATION." INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY 5, no. 7 (2016): 1137–41. https://doi.org/10.5281/zenodo.58632.

Abstract:
Nowadays, in many text mining applications, information is present in the form of text documents. A text document contains various types of information, such as side information or metadata. Different types of information, such as document provenance information, the title of the document, links in the document, user-access behavior from web logs, or other non-textual attributes contained in the text document, are treated as side information. Such attributes contain a large amount of information for clustering purposes. It is difficult to estimate the importance of this side-information when text do
17

Lee, Kangwook, Sanggyu Han, and Sung-Hyon Myaeng. "A discourse-aware neural network-based text model for document-level text classification." Journal of Information Science 44, no. 6 (2017): 715–35. http://dx.doi.org/10.1177/0165551517743644.

Abstract:
Capturing semantics scattered across entire text is one of the important issues for Natural Language Processing (NLP) tasks. It would be particularly critical with long text embodying a flow of themes. This article proposes a new text modelling method that can handle thematic flows of text with Deep Neural Networks (DNNs) in such a way that discourse information and distributed representations of text are incorporated. Unlike previous DNN-based document models, the proposed model enables discourse-aware analysis of text and composition of sentence-level distributed representations guided by the
18

Kumari, Lalitha, and Ch Satyanarayana. "An novel cluster based feature selection and document classification model on high dimension trec data." International Journal of Engineering & Technology 7, no. 1.1 (2017): 466. http://dx.doi.org/10.14419/ijet.v7i1.1.10146.

Abstract:
TREC text documents are complex to analyze the features its relevant similar documents using the traditional document similarity measures. As the size of the TREC repository is increasing, finding relevant clustered documents from a large collection of unstructured documents is a challenging task. Traditional document similarity and classification models are implemented on homogeneous TREC data to find essential features for document entities that are similar to the TREC documents. Also, most of the traditional models are applicable to limited text document sets for text analysis. The main iss
19

Rahamat Basha, S., J. Keziya Rani, and J. J. C. Prasad Yadav. "A Novel Summarization-based Approach for Feature Reduction Enhancing Text Classification Accuracy." Engineering, Technology & Applied Science Research 9, no. 6 (2019): 5001–5. https://doi.org/10.5281/zenodo.3566535.

Abstract:
Automatic summarization is the process of shortening one (in single document summarization) or multiple documents (in multi-document summarization). In this paper, a new feature selection method for the nearest neighbor classifier by summarizing the original training documents based on sentence importance measure is proposed. Our approach for single document summarization uses two measures for sentence similarity: the frequency of the terms in one sentence and the similarity of that sentence to other sentences. All sentences were ranked accordingly and the sentences with top ranks (with a thre
20

Nakajima, Hiromu, and Minoru Sasaki. "Text Classification Based on the Heterogeneous Graph Considering the Relationships between Documents." Big Data and Cognitive Computing 7, no. 4 (2023): 181. http://dx.doi.org/10.3390/bdcc7040181.

Abstract:
Text classification is the task of estimating the genre of a document based on information such as word co-occurrence and frequency of occurrence. Text classification has been studied by various approaches. In this study, we focused on text classification using graph structure data. Conventional graph-based methods express relationships between words and relationships between words and documents as weights between nodes. Then, a graph neural network is used for learning. However, there is a problem that conventional methods are not able to represent the relationship between documents on the gr
21

Seifert, Christin, Eva Ulbrich, Roman Kern, and Michael Granitzer. "Text Representation for Efficient Document Annotation." JUCS - Journal of Universal Computer Science 19, no. 3 (2013): 383–405. https://doi.org/10.3217/jucs-019-03-0383.

Abstract:
In text classification the amount and quality of training data is crucial for the performance of the classifier. The generation of training data is done by human labellers - a tedious and time-consuming work. To reduce the labelling time for single documents we propose to use condensed representations of text documents instead of the full-text document. These condensed representations are key sentences and key phrases and can be generated in a fully unsupervised way. We extended and evaluated the TextRank algorithm to automatically extract key sentences and key phrases. For representing key ph
22

Uddin, Farid, Yibo Chen, Zuping Zhang, and Xin Huang. "Corpus Statistics Empowered Document Classification." Electronics 11, no. 14 (2022): 2168. http://dx.doi.org/10.3390/electronics11142168.

Abstract:
In natural language processing (NLP), document classification is an important task that relies on the proper thematic representation of the documents. Gaussian mixture-based clustering is widespread for capturing rich thematic semantics but ignores emphasizing potential terms in the corpus. Moreover, the soft clustering approach causes long-tail noise by putting every word into every cluster, which affects the natural thematic representation of documents and their proper classification. It is more challenging to capture semantic insights when dealing with short-length documents where word co-o
23

Wang, Bohan, Rui Qi, Jinhua Gao, Jianwei Zhang, Xiaoguang Yuan, and Wenjun Ke. "Mining the Frequent Patterns of Named Entities for Long Document Classification." Applied Sciences 12, no. 5 (2022): 2544. http://dx.doi.org/10.3390/app12052544.

Abstract:
Nowadays, a large amount of information is stored as text, and numerous text mining techniques have been developed for various applications, such as event detection, news topic classification, public opinion detection, and sentiment analysis. Although significant progress has been achieved for short text classification, document-level text classification requires further exploration. Long documents always contain irrelevant noisy information that shelters the prominence of indicative features, limiting the interpretability of classification results. To alleviate this problem, a model called MI
24

Rahamat Basha, S., J. Keziya Rani, and J. J. C. Prasad Yadav. "A Novel Summarization-based Approach for Feature Reduction Enhancing Text Classification Accuracy." Engineering, Technology & Applied Science Research 9, no. 6 (2019): 5001–5. http://dx.doi.org/10.48084/etasr.3173.

Abstract:
Automatic summarization is the process of shortening one (in single document summarization) or multiple documents (in multi-document summarization). In this paper, a new feature selection method for the nearest neighbor classifier by summarizing the original training documents based on sentence importance measure is proposed. Our approach for single document summarization uses two measures for sentence similarity: the frequency of the terms in one sentence and the similarity of that sentence to other sentences. All sentences were ranked accordingly and the sentences with top ranks (with a thre
25

Chouhan, Khushi Udaysingh, Nikita Pradeep Kumar Jha, Roshni Sanjay Jha, Shaikh Insha Kamaluddin, and Dr Sujata Khedkar. "Legal Document Analysis." International Journal for Research in Applied Science and Engineering Technology 11, no. 4 (2023): 548–57. http://dx.doi.org/10.22214/ijraset.2023.50123.

Abstract:
Text preprocessing is the most essential and foremost step for any Machine Learning model. The raw data needs to be cleaned and pre-processed to get better performance. It is the method that cleans the data and makes it ready to feed to the model. Text classification is the heart of many software systems that involve text document processing. The purpose of text classification is to classify text documents automatically into two or more defined categories. In this paper, various preprocessing and classification approaches are used such as NLP, Machine Learning, etc from pa
26

Ranjan, Nihar M., and Rajesh S. Prasad. "A Brief Survey of Text Document Classification Algorithms and Processes." Journal of Data Mining and Management 8, no. 1 (2023): 6–11. http://dx.doi.org/10.46610/jodmm.2023.v08i01.002.

Abstract:
The exponential growth of unstructured data is one of the most critical challenges in data mining, text analytics, or data analytics. Around 80% of the world's data are available in unstructured format and most are left unattended due to the complexity of its analysis. It is a great challenge to guarantee the quality of the text document classifier that classifies documents based on user preferences because of large-scale terms and data patterns. The World Wide Web is growing rapidly and the availability of electronic documents is also increasing. Therefore, the automatic categorization of doc
27

Lee, Youngseok, and Jungwon Cho. "Web document classification using topic modeling based document ranking." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 3 (2021): 2386–92. https://doi.org/10.11591/ijece.v11i3.pp2386-2392.

Abstract:
In this paper, we propose a web document ranking method using topic modeling for effective information collection and classification. The proposed method is applied to the document ranking technique to avoid duplicated crawling when crawling at high speed. Through the proposed document ranking technique, it is feasible to remove redundant documents, classify the documents efficiently, and confirm that the crawler service is running. The proposed method enables rapid collection of many web documents; the user can search the web pages with constant data update efficiently. In addition, the effic
28

Bethu, Srikanth, and B. Sankara Babu. "DATA MINING AND TEXT MINING: EFFICIENT TEXT CLASSIFICATION USING SVMS FOR LARGE DATASETS." Global Journal of Engineering Science and Research Management 3, no. 8 (2016): 47–56. https://doi.org/10.5281/zenodo.60657.

Abstract:
Text mining and data mining support different kinds of algorithms for the classification of large data sets. Text categorization is traditionally done using Term Frequency and Inverse Document Frequency. This method does not eliminate unimportant words in the document. To reduce the error of classifying documents into the wrong category, efficient classification algorithms are needed. Support Vector Machines (SVM) is used based on the large margin data sets for classification algorithms that give good generalization, compactness and performance. Support Vector Machines (S
29

Hong, Jiwon, Dongho Jeong, and Sang-Wook Kim. "Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences." Applied Sciences 12, no. 8 (2022): 4088. http://dx.doi.org/10.3390/app12084088.

Abstract:
Cyberattacks widely occur by using malicious documents. A malicious document is an electronic document containing malicious codes along with some plain-text data that is human-readable. In this paper, we propose a novel framework that takes advantage of such plaintext data to determine whether a given document is malicious. We extracted plaintext features from the corpus of electronic documents and utilized them to train a classification model for detecting malicious documents. Our extensive experimental results with different combinations of three well-known vectorization strategies and three
30

D Krishna, Erukulla Laasya, A Sowmya Sri, T Ravinder Reddy, and Akhil Sanjoy. "A SURVEY ON BIOMEDICAL TEXT DOCUMENT CLASSIFICATION." International Journal of Engineering Technology and Management Sciences 6, no. 6 (2022): 503–8. http://dx.doi.org/10.46647/ijetms.2022.v06i06.086.

Abstract:
Information extraction, information retrieval, and text classification are only a few of the important study areas that fall under the heading of "bio medical text classification." In order to increase understanding of various information extraction opportunities in the field of data mining, this study analyses several text categorization approaches used in practise, their strengths and shortcomings. We have gathered a dataset with a strong emphasis on three categories, including "Thyroid Cancer," "Lung Cancer," and "Colon Cancer." This essay offers an empirical investigation of a classifier.
31

Rahamat Basha, S., and J. K. Rani. "A Comparative Approach of Dimensionality Reduction Techniques in Text Classification." Engineering, Technology & Applied Science Research 9, no. 6 (2019): 4974–79. http://dx.doi.org/10.48084/etasr.3146.

Abstract:
This work deals with document classification. It is a supervised learning method (it needs a labeled document set for training and a test set of documents to be classified). The procedure of document categorization includes a sequence of steps consisting of text preprocessing, feature extraction, and classification. In this work, a self-made data set was used to train the classifiers in every experiment. This work compares the accuracy, average precision, precision, and recall with or without combinations of some feature selection techniques and two classifiers (KNN and Naive Bayes). The resul
32

Rahamat Basha, S., and J. K. Rani. "A Comparative Approach of Dimensionality Reduction Techniques in Text Classification." Engineering, Technology & Applied Science Research 9, no. 6 (2019): 4974–79. https://doi.org/10.5281/zenodo.3566201.

Abstract:
This work deals with document classification. It is a supervised learning method (it needs a labeled document set for training and a test set of documents to be classified). The procedure of document categorization includes a sequence of steps consisting of text preprocessing, feature extraction, and classification. In this work, a self-made data set was used to train the classifiers in every experiment. This work compares the accuracy, average precision, precision, and recall with or without combinations of some feature selection techniques and two classifiers (KNN and Naive Bayes). The resul
33

Anne, Chaitanya, Avdesh Mishra, Md Tamjidul Hoque, and Shengru Tu. "Multiclass patent document classification." Artificial Intelligence Research 7, no. 1 (2017): 1. http://dx.doi.org/10.5430/air.v7n1p1.

Abstract:
Text classification is used in information extraction and retrieval from a given text, and text classification has been considered as an important step to manage a vast number of records given in digital form that is far-reaching and expanding. This article addresses patent document classification problem into fifteen different categories or classes, where some classes overlap with each other for practical reasons. For the development of the classification model using machine learning techniques, useful features have been extracted from the given documents. The features are used to classify pa
34

Golub, Koraljka. "Automatic Subject Indexing of Text." KNOWLEDGE ORGANIZATION 46, no. 2 (2019): 104–21. http://dx.doi.org/10.5771/0943-7444-2019-2-104.

Abstract:
Automatic subject indexing addresses problems of scale and sustainability and can be at the same time used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collections, and enhance consistency of the metadata. In this work, automatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing like thesauri, subject headings systems and classification systems. The following major approaches are discussed, in terms of their similarities and d
35

M.Shaikh, Mustafa, Ashwini A. Pawar, and Vibha B. Lahane. "Pattern Discovery Text Mining for Document Classification." International Journal of Computer Applications 117, no. 1 (2015): 6–12. http://dx.doi.org/10.5120/20516-2101.

36

Elhadad, Mohamed, Khaled Badran, and Gouda Salama. "Towards Ontology-Based web text Document Classification." Journal of Engineering Science and Military Technologies 17, no. 17 (2017): 1–8. http://dx.doi.org/10.21608/ejmtc.2017.21564.

37

Sinoara, Roberta A., Jose Camacho-Collados, Rafael G. Rossi, Roberto Navigli, and Solange O. Rezende. "Knowledge-enhanced document embeddings for text classification." Knowledge-Based Systems 163 (January 2019): 955–71. http://dx.doi.org/10.1016/j.knosys.2018.10.026.

38

Elhadad, Mohamed, Khaled Badran, and Gouda Salama. "Towards Ontology-Based web text Document Classification." International Conference on Aerospace Sciences and Aviation Technology 17, AEROSPACE SCIENCES (2017): 1–8. http://dx.doi.org/10.21608/asat.2017.22749.

39

Smith, Dan, and Richard Harvey. "Document Retrieval Using SIFT Image Features." JUCS - Journal of Universal Computer Science 17, no. 1 (2011): 3–15. https://doi.org/10.3217/jucs-017-01-0003.

Abstract:
This paper describes a new approach to document classification based on visual features alone. Text-based retrieval systems perform poorly on noisy text. We have conducted a series of experiments using cosine distance as our similarity measure, selecting varying numbers of local interest points per page, and varying numbers of nearest neighbour points in the similarity calculations. We have found that a distance-based measure of similarity outperforms a rank-based measure except when there are few interest points. We show that using visual features substantially outperforms text-based approaches for
40

Endalie, Demeke, Getamesay Haile, and Wondmagegn Taye Abebe. "Feature selection by integrating document frequency with genetic algorithm for Amharic news document classification." PeerJ Computer Science 8 (April 25, 2022): e961. http://dx.doi.org/10.7717/peerj-cs.961.

Abstract:
Text classification is the process of categorizing documents based on their content into a predefined set of categories. Text classification algorithms typically represent documents as collections of words and deal with a large number of features. The selection of appropriate features becomes important when the initial feature set is quite large. In this paper, we present a hybrid of document frequency (DF) and genetic algorithm (GA)-based feature selection method for Amharic text classification. We evaluate this feature selection method on Amharic news documents obtained from the Ethiopia
41

Ernawati, Iin. "NAIVE BAYES CLASSIFIER DAN SUPPORT VECTOR MACHINE SEBAGAI ALTERNATIF SOLUSI UNTUK TEXT MINING." Jurnal Teknologi Informasi dan Pendidikan 12, no. 2 (2019): 32–38. http://dx.doi.org/10.24036/tip.v12i2.219.

Abstract:
This study addresses text-based data mining, often called text mining, using two commonly used classification methods: the Naïve Bayes classifier (NBC) and the support vector machine (SVM). The classification focuses on Indonesian-language documents, while the relationship between documents is measured by a probability that can be verified with other classification algorithms. This is evident from the conclusion that the probability result Naïve Bayes Classifier (NBC) word “party” at least in the economic document and political. Then the result of the algorithm support vector machine (svm) w
42

Aubaid, Asmaa M., and Alok Mishra. "A Rule-Based Approach to Embedding Techniques for Text Document Classification." Applied Sciences 10, no. 11 (2020): 4009. http://dx.doi.org/10.3390/app10114009.

Abstract:
With the growth of online information and the sudden expansion in the number of electronic documents provided on websites and in electronic libraries, categorizing text documents has become difficult. A rule-based approach is one solution to this problem; the purpose of this study is to classify documents using a rule-based approach. This paper deals with the rule-based approach with the embedding technique for document-to-vector (doc2vec) files. An experiment was performed on two data sets, Reuters-21578 and the 20 Newsgroups, to classify the top ten categories of these data sets by using a
43

Ni'mah, Ana Tsalitsatun, and Fahmi Syuhada. "Term Weighting Based Indexing Class and Indexing Short Document for Indonesian Thesis Title Classification." Journal of Computer Science and Informatics Engineering (J-Cosine) 6, no. 2 (2022): 167–75. http://dx.doi.org/10.29303/jcosine.v6i2.471.

Abstract:
Document classification nowadays is an easy thing to do because there are the latest methods to get maximum results. Document classification using the term weighting TF-IDF-ICF method has been widely studied. Documents used in this research generally use large documents. If the term weighting TF-IDF method is used in a short text document such as the Thesis Title, the document will not get a perfect score from the classification results. Because in the IDF will calculate the weight of words that always appear to be few, ICF will calculate the weight of words that often appear in the class to b
44

Chiraratanasopha, Boonthida, Thanaruk Theeramunkong, and Salin Boonbrahm. "Hierarchical text classification using Relative Inverse Document Frequency." ECTI Transactions on Computer and Information Technology (ECTI-CIT) 15, no. 2 (2021): 166–76. http://dx.doi.org/10.37936/ecti-cit.2021152.240515.

Abstract:
Automatic hierarchical text classification has been a challenging and much-needed task, given the growth of hierarchical taxonomies driven by the boom in knowledge organization. The hierarchical structure identifies the dependence relationships between different categories, in which generalized and specific concepts can overlap within the tree. This paper presents the use of the frequency of terms occurring in related categories of the hierarchical tree to help in document classification. The four extended term weighting of Relative Inverse Document Frequency (IDFr) including its locat
45

N J, Avinash, Krishnaraj Rao, Rama Moorthy H, et al. "A Novel Automatic Text Document Classification Using Learning based Text Classification(LbTC) Approach." Procedia Computer Science 258 (2025): 4279–90. https://doi.org/10.1016/j.procs.2025.04.677.

46

Fragoso, Rogerio C. P., Roberto H. W. Pinheiro, and George Cavalcanti. "Data-driven Feature Selection Methods for Text Classification: an Empirical Evaluation." JUCS - Journal of Universal Computer Science 25, no. 4 (2019): 334–60. https://doi.org/10.3217/jucs-025-04-0334.

Abstract:
Dimensionality reduction is a crucial task in text classification. The most adopted strategy is feature selection using filter methods. This approach presents a difficulty in determining the best size for the final feature vector. At Least One FeaTure (ALOFT), Maximum f Features per Document (MFD), Maximum f Features per Document-Reduced (MFDR) and Class-dependent Maximum f Features per Document-Reduced (cMFDR) are feature selection methods that define automatically the number of features per Corpus. However, MFD, MFDR, and cMFDR require a parameter that defines the number of features to be se
47

Idrush, G. Mahammad. "Offensive Language and Image Identification on Social Media Based on Text and Image Classification." International Journal for Research in Applied Science and Engineering Technology 13, no. 5 (2025): 2148–52. https://doi.org/10.22214/ijraset.2025.70351.

Abstract:
A digital signature is like a digital version of a handwritten signature but much more secure. It ensures that digital documents are authentic, unaltered, and genuinely from the sender. Our project, Digital Signature Tool, focuses on creating an easy-to-use application for securely signing and verifying documents. Using advanced cryptographic methods like RSA or ECDSA, the tool allows users to generate and manage private and public keys securely. To sign a document, the sender uses their private key to create a unique digital signature, while the receiver uses the sender’s public key to verify
48

Parsafard, Pouyan, Hadi Veisi, Niloofar Aflaki, and Siamak Mirzaei. "Text Classification based on DiscriminativeSemantic Features and Variance of Fuzzy Similarity." International Journal of Intelligent Systems and Applications 14, no. 2 (2022): 26–39. http://dx.doi.org/10.5815/ijisa.2022.02.03.

Abstract:
Due to the rapid growth of the Internet, large amounts of unlabelled textual data are produced daily. Clearly, finding the subject of a text document is a primary source of information in the text processing applications. In this paper, a text classification method is presented and evaluated for Persian and English. The proposed technique utilizes variance of fuzzy similarity besides discriminative and semantic feature selection methods. Discriminative features are those that distinguish categories with higher power and the concept of semantic feature takes into the calculations the similarit
49

Jia, Longjia, and Bangzuo Zhang. "A new document representation based on global policy for supervised term weighting schemes in text categorization." Mathematical Biosciences and Engineering 19, no. 5 (2022): 5223–40. http://dx.doi.org/10.3934/mbe.2022245.

Abstract:
There are two main factors involved in document classification: the document representation method and the classification algorithm. In this study, we focus on the document representation method and demonstrate that the choice of representation method affects the quality of classification results. We propose a document representation strategy for supervised text classification named document representation based on global policy (DRGP), which can obtain an appropriate document representation according to the distribution of terms. The main idea o
50

Choo, Wou Onn, Lam Hong Lee, Yen Pei Tay, Khang Wen Goh, Dino Isa, and Suliman Mohamed Fati. "Automatic Folder Allocation System for Electronic Text Document Repositories Using Enhanced Bayesian Classification Approach." International Journal of Intelligent Information Technologies 15, no. 2 (2019): 1–19. http://dx.doi.org/10.4018/ijiit.2019040101.

Abstract:
This article proposes a system equipped with the enhanced Bayesian classification techniques to automatically assign folders to store electronic text documents. Despite computer technology advancements in the information age where electronic text files are so pervasive in information exchange, almost every single document created or downloaded from the Internet requires manual classification by the users before being deposited into a folder in a computer. Not only does such a tedious task cause inconvenience to users, the time taken to repeatedly classify and allocate a folder for each text do