Journal articles on the topic 'Classification de document' (document classification)

Consult the top 50 journal articles for your research on the topic 'Classification de document.'

1

Shvetsova-Vodka, G. N. "Document classification as a theoretical problem of documentology." Scientific and Technical Libraries, no. 9 (October 4, 2022): 147–68. http://dx.doi.org/10.33186/1027-3689-2022-9-147-168.

Abstract:
The changes and amendments to the general document classification as a theoretical problem of document studies and documentology, are discussed. The document in the meaning “recorded information” is the subject of classification. The author examines the faceted-block classification of document based on various characteristics united into six clusters: “Types of documents physical carrier”, “Types of documents by document acquisition circumstances”, “Types of documents by information representational and transfer tools (by signative component)”, “Types of documents by information reception (per
2

Chen, Chung-Hsing, and Ko-Wei Huang. "Document Classification Using Lightweight Neural Network." 網際網路技術學刊 24, no. 7 (2023): 1505–11. http://dx.doi.org/10.53106/160792642023122407012.

Abstract:
In recent years, OCR data has been used for learning and analyzing document classification. In addition, some neural networks have used image recognition for training, such as the network published by the ImageNet Large Scale Visual Recognition Challenge for document image training, AlexNet, GoogleNet, and MobileNet. Document image classification is important in data extraction processes and often requires significant computing power. Furthermore, it is difficult to implement image classification using general computers without a graphics processing unit (GPU). Therefore, this study p
3

Hao, Xiaolong, Jason T. L. Wang, Michael P. Bieber, and Peter A. Ng. "Heuristic Classification of Office Documents." International Journal on Artificial Intelligence Tools 03, no. 02 (1994): 233–65. http://dx.doi.org/10.1142/s0218213094000121.

Abstract:
Document Processing Systems (DPSs) support office workers to manage information. Document classification is a major function of DPSs. By analyzing a document’s layout and conceptual structures, we present in this paper a sample-based approach to document classification. We represent a document’s layout structure by an ordered labeled tree through a procedure known as nested segmentation and represent the document’s conceptual structure by a set of attribute type pairs. The layout similarities between the document to be classified and sample documents are determined by a previously developed ap
4

D’Silva, Suzanne, Neha Joshi, Sudha Rao, Sangeetha Venkatraman, and Seema Shrawne. "Improved Algorithms for Document Classification & Query-based Multi-Document Summarization." International Journal of Engineering and Technology 3, no. 4 (2011): 404–9. http://dx.doi.org/10.7763/ijet.2011.v3.261.

5

Calvo, Rafael A., and H. A. Ceccatto. "Intelligent document classification." Intelligent Data Analysis 4, no. 5 (2000): 411–20. http://dx.doi.org/10.3233/ida-2000-4503.

6

Lee, Jae-Moon, and Rafael A. Calvo. "Scalable document classification." Intelligent Data Analysis 9, no. 4 (2005): 365–80. http://dx.doi.org/10.3233/ida-2005-9404.

7

Jaiswal, Babita. "Automatic Document Classification." DESIDOC Bulletin of Information Technology 19, no. 3 (1999): 23–28. http://dx.doi.org/10.14429/dbit.19.3.3486.

8

Surovtseva, Nataliya G. "Classification of documents as a theoretical problem in office work and in the archive." Herald of an archivist, no. 3 (2022): 756–71. http://dx.doi.org/10.28995/2073-0101-2022-3-756-771.

Abstract:
For archival and documentary studies, classification issues are of key importance for the development of the methodology of scientific research, the results of which should be applied in practice. The issues of document classification were raised by scientists and specialists in the field of working with documents throughout the Soviet period. K.G. Mityaev paid a lot of attention to the problem of document classification. He was the first to draw attention to the fact that it is necessary to achieve conjugation in the classification of documents in office work and archive. In archival science
9

Jijo Varghese and P. Tamil Selvan. "A Novel Clustering and Matrix Based Computation for Big Data Dimensionality Reduction and Classification." Journal of Advanced Research in Applied Sciences and Engineering Technology 32, no. 1 (2023): 238–51. http://dx.doi.org/10.37934/araset.32.1.238251.

Abstract:
For higher dimensional or "Big Data (BD)" clustering and classification, the dimensions of documents have to be considered. The overhead of classifying methods might also be reduced by resolving the volumetric issue of documents. However, the dimensions of the shortened collection of documents might potentially generate noise and abnormalities. Previous noise and abnormality information removal strategies include several different approaches that have already been established throughout time. To increase classification accuracy, current classifications or new classification methods that has cr
10

Uddin, Farid, Yibo Chen, Zuping Zhang, and Xin Huang. "Corpus Statistics Empowered Document Classification." Electronics 11, no. 14 (2022): 2168. http://dx.doi.org/10.3390/electronics11142168.

Abstract:
In natural language processing (NLP), document classification is an important task that relies on the proper thematic representation of the documents. Gaussian mixture-based clustering is widespread for capturing rich thematic semantics but ignores emphasizing potential terms in the corpus. Moreover, the soft clustering approach causes long-tail noise by putting every word into every cluster, which affects the natural thematic representation of documents and their proper classification. It is more challenging to capture semantic insights when dealing with short-length documents where word co-o
11

Lee, Youngseok, and Jungwon Cho. "Web document classification using topic modeling based document ranking." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 3 (2021): 2386. http://dx.doi.org/10.11591/ijece.v11i3.pp2386-2392.

Abstract:
In this paper, we propose a web document ranking method using topic modeling for effective information collection and classification. The proposed method is applied to the document ranking technique to avoid duplicated crawling when crawling at high speed. Through the proposed document ranking technique, it is feasible to remove redundant documents, classify the documents efficiently, and confirm that the crawler service is running. The proposed method enables rapid collection of many web documents; the user can search the web pages with constant data update efficiently. In addition, the effic
12

Mukherjee, Indrajit, Prabhat Kumar Mahanti, Vandana Bhattacharya, and Samudra Banerjee. "Text classification using document-document semantic similarity." International Journal of Web Science 2, no. 1/2 (2013): 1. http://dx.doi.org/10.1504/ijws.2013.056572.

13

Bai, Juho, Inwook Shim, and Seog Park. "MEXN: Multi-Stage Extraction Network for Patent Document Classification." Applied Sciences 10, no. 18 (2020): 6229. http://dx.doi.org/10.3390/app10186229.

Abstract:
The patent document has different content for each paragraph, and the length of the document is also very long. Moreover, patent documents are classified hierarchically as multi-labels. Many works have employed deep neural architectures to classify the patent documents. Traditional document classification methods have not well represented the characteristics of entire patent document contents because they usually require a fixed input length. To address this issue, we propose a neural network-based document classification for patent documents by designing a novel multi-stage feature extraction
14

Al-Obaydy, Wasseem N. Ibrahem, Hala A. Hashim, Yassen AbdelKhaleq Najm, and Ahmed Adeeb Jalal. "Document classification using term frequency-inverse document frequency and K-means clustering." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 3 (2022): 1517. http://dx.doi.org/10.11591/ijeecs.v27.i3.pp1517-1524.

Abstract:
Increased advancement in a variety of study subjects and information technologies, has increased the number of published research articles. However, researchers are facing difficulties and devote a significant time amount in locating scientific research publications relevant to their domain of expertise. In this article, an approach of document classification is presented to cluster the text documents of research articles into expressive groups that encompass a similar scientific field. The main focus and scopes of target groups were adopted in designing the proposed method, each group include
15

Nohuddin et al. "Content analytics based on random forest classification technique: An empirical evaluation using online news dataset." International Journal of Advanced and Applied Sciences 8, no. 2 (2021): 77–84. http://dx.doi.org/10.21833/ijaas.2021.02.011.

Abstract:
In this paper, a study is established for exploiting a document classification technique for categorizing a set of random online documents. The technique is aimed to assign one or more classes or categories to a document, making it easier to manage and sort. This paper describes an experiment on the proposed method for classifying documents effectively using the decision tree technique. The proposed research framework is a Document Analysis based on the Random Forest Algorithm (DARFA). The proposed framework consists of 5 components, which are (i) Document dataset, (ii) Data Preprocessing, (ii
16

Anne, Chaitanya, Avdesh Mishra, Md Tamjidul Hoque, and Shengru Tu. "Multiclass patent document classification." Artificial Intelligence Research 7, no. 1 (2017): 1. http://dx.doi.org/10.5430/air.v7n1p1.

Abstract:
Text classification is used in information extraction and retrieval from a given text, and text classification has been considered as an important step to manage a vast number of records given in digital form that is far-reaching and expanding. This article addresses patent document classification problem into fifteen different categories or classes, where some classes overlap with each other for practical reasons. For the development of the classification model using machine learning techniques, useful features have been extracted from the given documents. The features are used to classify pa
17

Mao, Yafei, Yufang Sun, Peter Bauer, et al. "Highlighted Document Image Classification." Color and Imaging Conference 29, no. 1 (2021): 154–59. http://dx.doi.org/10.2352/issn.2169-2629.2021.29.154.

18

D Krishna, Erukulla Laasya, A Sowmya Sri, T Ravinder Reddy, and Akhil Sanjoy. "Biomedical Text Document Classification." International Journal of Engineering Technology and Management Sciences 7, no. 3 (2023): 788–92. http://dx.doi.org/10.46647/ijetms.2023.v07i03.121.

Abstract:
Information extraction, retrieval, and text categorization are only a few of the significant research fields covered by "bio medical text classification." This study examines many text categorization techniques utilised in practise, as well as their strengths and weaknesses, in order to improve knowledge of various information extraction opportunities in the field of data mining. We compiled a dataset with a focus on three categories: "Thyroid Cancer," "Lung Cancer," and "Colon Cancer." This paper presents an empirical study of a classifier. The investigation was carried out using biomedical l
19

Choi, Gihyeon, Shinhyeok Oh, and Harksoo Kim. "Improving Document-Level Sentiment Classification Using Importance of Sentences." Entropy 22, no. 12 (2020): 1336. http://dx.doi.org/10.3390/e22121336.

Abstract:
Previous researchers have considered sentiment analysis as a document classification task, in which input documents are classified into predefined sentiment classes. Although there are sentences in a document that support important evidences for sentiment analysis and sentences that do not, they have treated the document as a bag of sentences. In other words, they have not considered the importance of each sentence in the document. To effectively determine polarity of a document, each sentence in the document should be dealt with different degrees of importance. To address this problem, we pro
20

Eirund, Helmut, and Klaus Kreplin. "Knowledge based document classification supporting integrated document handling." ACM SIGOIS Bulletin 9, no. 2-3 (1988): 189–96. http://dx.doi.org/10.1145/966861.45430.

21

Shobha Rani, N. "An Efficient Deep Classification for Malayalam Handwritten Document." Journal of Research on the Lepidoptera 51, no. 2 (2020): 01–12. http://dx.doi.org/10.36872/lepi/v51i2/301074.

22

Naïve, Anna Fay E., and Jocelyn B. Barbosa. "Efficient Accreditation Document Classification Using Naïve Bayes Classifier." Indian Journal of Science and Technology 15, no. 1 (2022): 9–18. http://dx.doi.org/10.17485/ijst/v15i1.1761.

23

Yerokhin, A. L., and O. V. Zolotukhin. "Fuzzy probabilistic neural network in document classification tasks." Information extraction and processing 2018, no. 46 (2018): 68–71. http://dx.doi.org/10.15407/vidbir2018.46.068.

24

Shvetsova-Vodka, H. M. "General Classification of the Document: Changes and Additions." Library Mercury, no. 2(26) (December 24, 2021): 143–64. http://dx.doi.org/10.18524/2707-3335.2021.2(26).245132.

Abstract:
Object, theme, purpose of work. The article is devoted to one of major problems of typology of document: general classification of document. A research purpose is to optimize the chart of general classification of document, specify some terms and offer material for a further discussion for development of typology of document. Method or methodology of realization of work. The research methodology is based on the information and communication approach, which allows to determine the features of the classification of the document, which ensure the fixation of information in the document and its tr
25

Mannar Mannan, J., K. Sindhanai Selvan, and R. Mohemmed Yousuf. "Independent document ranking for E-learning using semantic-based document term classification." Journal of Intelligent & Fuzzy Systems 40, no. 1 (2021): 893–905. http://dx.doi.org/10.3233/jifs-201070.

Abstract:
Massive digital documents on Internet leading to use e-learning, and it becomes an emerging field of research due to the massive growth of internet users. E-learning requires suitable document ranking method to avoid navigating to the next Search Engine Result Page (SERP) frequently. The existing document ranking methods are lacking to rank the documents independently based on the conceptual contents. This paper proposes a novel method for ranking the documents independently based on the different classification of term it contains. In this approach, the terms are classified into five categori
26

Kim, Pan-Jun, and Jae-Yun Lee. "Utilizing Unlabeled Documents in Automatic Classification with Inter-document Similarities." Journal of the Korean Society for Information Management 24, no. 1 (2007): 251–71. http://dx.doi.org/10.3743/kosim.2007.24.1.251.

27

Wu, Tiandeng, Qijiong Liu, Yi Cao, Yao Huang, Xiao-Ming Wu, and Jiandong Ding. "Continual Graph Convolutional Network for Text Classification." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (2023): 13754–62. http://dx.doi.org/10.1609/aaai.v37i11.26611.

Abstract:
Graph convolutional network (GCN) has been successfully applied to capture global non-consecutive and long-distance semantic information for text classification. However, while GCN-based methods have shown promising results in offline evaluations, they commonly follow a seen-token-seen-document paradigm by constructing a fixed document-token graph and cannot make inferences on new documents. It is a challenge to deploy them in online systems to infer streaming text data. In this work, we present a continual GCN model (ContGCN) to generalize inferences from observed documents to unobserved docum
28

Seifollahi, Sattar, Massimo Piccardi, and Alireza Jolfaei. "An Embedding-Based Topic Model for Document Classification." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 3 (2021): 1–13. http://dx.doi.org/10.1145/3431728.

Abstract:
Topic modeling is an unsupervised learning task that discovers the hidden topics in a collection of documents. In turn, the discovered topics can be used for summarizing, organizing, and understanding the documents in the collection. Most of the existing techniques for topic modeling are derivatives of the Latent Dirichlet Allocation which uses a bag-of-word assumption for the documents. However, bag-of-words models completely dismiss the relationships between the words. For this reason, this article presents a two-stage algorithm for topic modelling that leverages word embeddings and word co-
29

Kumari, Lalitha, and Ch Satyanarayana. "An novel cluster based feature selection and document classification model on high dimension trec data." International Journal of Engineering & Technology 7, no. 1.1 (2017): 466. http://dx.doi.org/10.14419/ijet.v7i1.1.10146.

Abstract:
TREC text documents are complex to analyze the features its relevant similar documents using the traditional document similarity measures. As the size of the TREC repository is increasing, finding relevant clustered documents from a large collection of unstructured documents is a challenging task. Traditional document similarity and classification models are implemented on homogeneous TREC data to find essential features for document entities that are similar to the TREC documents. Also, most of the traditional models are applicable to limited text document sets for text analysis. The main iss
30

Guo, Zhong Wei, Qi Long Fu, Bo Yang, Tao Feng, and Meng Zi Zhang. "A Method of Words Classification and Attribute Description for Military Documents Electronic Lexicon." Applied Mechanics and Materials 278-280 (January 2013): 1989–93. http://dx.doi.org/10.4028/www.scientific.net/amm.278-280.1989.

Abstract:
Based on modern Chinese words classification and characteristics of military documents field, the method of classification and attribute description for military document words is presented. Through the description of morphology, syntax, semantic and reference, etc. it realizes the organic combination of words classification and attribute description. This method can meet the requirement of military documents intelligent processing, proved by building the military document electronic lexicon.
31

Zheng, Jianming, Yupu Guo, Chong Feng, and Honghui Chen. "A Hierarchical Neural-Network-Based Document Representation Approach for Text Classification." Mathematical Problems in Engineering 2018 (2018): 1–10. http://dx.doi.org/10.1155/2018/7987691.

Abstract:
Document representation is widely used in practical application, for example, sentiment classification, text retrieval, and text classification. Previous work is mainly based on the statistics and the neural networks, which suffer from data sparsity and model interpretability, respectively. In this paper, we propose a general framework for document representation with a hierarchical architecture. In particular, we incorporate the hierarchical architecture into three traditional neural-network models for document representation, resulting in three hierarchical neural representation models for d
32

Endalie, Demeke, Getamesay Haile, and Wondmagegn Taye Abebe. "Feature selection by integrating document frequency with genetic algorithm for Amharic news document classification." PeerJ Computer Science 8 (April 25, 2022): e961. http://dx.doi.org/10.7717/peerj-cs.961.

Abstract:
Text classification is the process of categorizing documents based on their content into a predefined set of categories. Text classification algorithms typically represent documents as collections of words and it deals with a large number of features. The selection of appropriate features becomes important when the initial feature set is quite large. In this paper, we present a hybrid of document frequency (DF) and genetic algorithm (GA)-based feature selection method for Amharic text classification. We evaluate this feature selection method on Amharic news documents obtained from the Ethiopia
33

Mariyam, Ayesha, SK Althaf Hussain Basha, and S. Viswanadha Raju. "On Optimality of Long Document Classification using Deep Learning." International Journal on Recent and Innovation Trends in Computing and Communication 10, no. 12 (2022): 51–58. http://dx.doi.org/10.17762/ijritcc.v10i12.5866.

Abstract:
Document classification is effective with elegant models of word numerical distributions. The word embeddings are one of the categories of numerical distributions of words from the WordNet. The modern machine learning algorithms yearn on classifying documents based on the categorical data. The context of interest on the categorical data is posed with weights and the sense and quality of the sentences is estimated for sensible classification of documents. The focus of the current work is on legal and criminal documents extracted from the popular news channels, particularly on classification of
34

Guo, Zhong Wei, Peng Jiang, Jian Li Zhang, Xiao Song Zhang, and Li Jian Ji. "The Construction of Military Documents Electronic Lexicon." Applied Mechanics and Materials 373-375 (August 2013): 1682–85. http://dx.doi.org/10.4028/www.scientific.net/amm.373-375.1682.

Abstract:
Based on the classification system of modern Chinese words classification system and characteristics of military documents field, the military document words are classified and descripted with morphology, syntax, complex feature, semantic and reference. And a military documents electronic lexicon prototype is designed. This lexicon can meet the requirement of military document intelligent processing well proved by application of military documents electronic lexicon.
35

Guo, Shun, and Nianmin Yao. "Generating word and document matrix representations for document classification." Neural Computing and Applications 32, no. 14 (2019): 10087–108. http://dx.doi.org/10.1007/s00521-019-04541-x.

36

Cheng, Betty Yee Man, Jaime G. Carbonell, and Judith Klein-Seetharaman. "Protein classification based on text document classification techniques." Proteins: Structure, Function, and Bioinformatics 58, no. 4 (2005): 955–70. http://dx.doi.org/10.1002/prot.20373.

37

Rahamat Basha, S., J. Keziya Rani, and J. J. C. Prasad Yadav. "A Novel Summarization-based Approach for Feature Reduction Enhancing Text Classification Accuracy." Engineering, Technology & Applied Science Research 9, no. 6 (2019): 5001–5. http://dx.doi.org/10.48084/etasr.3173.

Abstract:
Automatic summarization is the process of shortening one (in single document summarization) or multiple documents (in multi-document summarization). In this paper, a new feature selection method for the nearest neighbor classifier by summarizing the original training documents based on sentence importance measure is proposed. Our approach for single document summarization uses two measures for sentence similarity: the frequency of the terms in one sentence and the similarity of that sentence to other sentences. All sentences were ranked accordingly and the sentences with top ranks (with a thre
38

Florence, Angelin. "Document Classification Using NLP Techniques." International Journal for Research in Applied Science and Engineering Technology 6, no. 4 (2018): 1222–24. http://dx.doi.org/10.22214/ijraset.2018.4209.

39

Bista, Umanga, Alexander Mathews, Minjeong Shin, Aditya Krishna Menon, and Lexing Xie. "Comparative Document Summarisation via Classification." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 20–28. http://dx.doi.org/10.1609/aaai.v33i01.330120.

Abstract:
This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups. We formulate a set of new objective functions for this problem that connect recent literature on document summarisation, interpretable machine learning, and data subset selection. In particular, by casting the problem as a binary classification amongst different groups, we derive objectives based on the notion of maxi
40

Catania, B., A. Maddalena, and A. Vakali. "XML document indexes: a classification." IEEE Internet Computing 9, no. 5 (2005): 64–71. http://dx.doi.org/10.1109/mic.2005.115.

41

Fernández, Julio, Jarvin A. Antón-Vargas, Yenny Villuendas-Rey, José F. Cabrera-Venegas, Yusbel Chávez, and Amadeo J. Argüelles-Cruz. "Clustering Techniques for Document Classification." Research in Computing Science 118, no. 1 (2016): 115–25. http://dx.doi.org/10.13053/rcs-118-1-11.

42

Hong, Jiwon, Dongho Jeong, and Sang-Wook Kim. "Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences." Applied Sciences 12, no. 8 (2022): 4088. http://dx.doi.org/10.3390/app12084088.

Abstract:
Cyberattacks widely occur by using malicious documents. A malicious document is an electronic document containing malicious codes along with some plain-text data that is human-readable. In this paper, we propose a novel framework that takes advantage of such plaintext data to determine whether a given document is malicious. We extracted plaintext features from the corpus of electronic documents and utilized them to train a classification model for detecting malicious documents. Our extensive experimental results with different combinations of three well-known vectorization strategies and three
43

Ni'mah, Ana Tsalitsatun, and Fahmi Syuhada. "Term Weighting Based Indexing Class and Indexing Short Document for Indonesian Thesis Title Classification." Journal of Computer Science and Informatics Engineering (J-Cosine) 6, no. 2 (2022): 167–75. http://dx.doi.org/10.29303/jcosine.v6i2.471.

Abstract:
Document classification nowadays is an easy thing to do because there are the latest methods to get maximum results. Document classification using the term weighting TF-IDF-ICF method has been widely studied. Documents used in this research generally use large documents. If the term weighting TF-IDF method is used in a short text document such as the Thesis Title, the document will not get a perfect score from the classification results. Because in the IDF will calculate the weight of words that always appear to be few, ICF will calculate the weight of words that often appear in the class to b
44

Wang, Bohan, Rui Qi, Jinhua Gao, Jianwei Zhang, Xiaoguang Yuan, and Wenjun Ke. "Mining the Frequent Patterns of Named Entities for Long Document Classification." Applied Sciences 12, no. 5 (2022): 2544. http://dx.doi.org/10.3390/app12052544.

Abstract:
Nowadays, a large amount of information is stored as text, and numerous text mining techniques have been developed for various applications, such as event detection, news topic classification, public opinion detection, and sentiment analysis. Although significant progress has been achieved for short text classification, document-level text classification requires further exploration. Long documents always contain irrelevant noisy information that shelters the prominence of indicative features, limiting the interpretability of classification results. To alleviate this problem, a model called MI
45

Amin, Adnan, and Ricky Shiu. "Page Segmentation and Classification Utilizing Bottom-Up Approach." International Journal of Image and Graphics 01, no. 02 (2001): 345–61. http://dx.doi.org/10.1142/s0219467801000219.

Abstract:
Document image processing has become an increasingly important technology in the automation of office documentation tasks. Automatic document scanners such as text readers and OCR (Optical Character Recognition) systems are an essential component of systems capable of those tasks. One of the problems in this field is that the document to be read is not always placed correctly on a flat-bed scanner. This means that the document may be skewed on the scanner bed, resulting in a skewed image. This skew has a detrimental effect on document analysis, document understanding, and character segmentatio
46

Mohan, Divya, and Latha Ravindran Nair. "A Robust Deep Model for Improved Categorization of Legal Documents for Predictive Analytics." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 3s (2023): 175–83. http://dx.doi.org/10.17762/ijritcc.v11i3s.6179.

Abstract:
Predictive legal analytics is a technology used to predict the chances of successful and unsuccessful outcomes in a particular case. Predictive legal analytics is performed through automated document classification for facilitating legal experts in their classification of court documents to retrieve and understand the details of specific legal factors from legal judgments for accurate document analysis. However, extracting these factors from legal text documents is a time-consuming process. In order to facilitate the task of classifying documents, a robust method namely Distributed Stochastic
47

Park, Heum. "Document Classification of Small Size Documents Using Extended Relief-F Algorithm." KIPS Transactions:PartB 16B, no. 3 (2009): 233–38. http://dx.doi.org/10.3745/kipstb.2009.16-b.3.233.

48

Mohammed Ali, Sura I., Marwah Nihad, Hussien Mohamed Sharaf, and Haitham Farouk. "Machine learning for text document classification-efficient classification approach." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 1 (2024): 703. http://dx.doi.org/10.11591/ijai.v13.i1.pp703-710.

Abstract:
Numerous alternative methods for text classification have been created because of the increase in the amount of online text information available. The cosine similarity classifier is the most extensively utilized simple and efficient approach. It improves text classification performance. It is combined with estimated values provided by conventional classifiers such as Multinomial Naive Bayesian (MNB). Consequently, combining the similarity between a test document and a category with the estimated value for the category enhances the performance of the classifier. This approach provides
49

Tkachenko, A. L., and L. A. Denisova. "Designing an information system for the electronic document management of a university: automatic classification of documents." Journal of Physics: Conference Series 2182, no. 1 (2022): 012035. http://dx.doi.org/10.1088/1742-6596/2182/1/012035.

Abstract:
To ensure the effective functioning of the university educational environment, document flow processes automation, which includes the task of documents automatic classification, is of great importance. The article considers the task of classifying university documents by machine learning methods in order to improve the quality of classification. Documents preprocessing was carried out, which made it possible to distinguish significant words in documents, due to which the accuracy of documents classification increased. Described are methods of extracting features from text TF and TF-ID
50

Puri, Shalini, and Satya Prakash Singh. "A Hybrid Hindi Printed Document Classification System Using SVM and Fuzzy." Journal of Information Technology Research 12, no. 4 (2019): 107–31. http://dx.doi.org/10.4018/jitr.2019100106.

Abstract:
This article introduces a new advanced tri-layered segmentation and bi-leveled-classifier-based Hindi printed document classification system, which categorizes imaged documents into pre-defined mutually exclusive categories by using SVM and Fuzzy matching at character and document classifications, respectively. During training, the improved and noise-free image is segmented into lines and words by profiling. Then it obtains Shirorekha Less (SL) isolated characters along with upper, left and right modifier components from the SL words. These components use their locations and inter character-mo