Journal articles on the topic 'Text Clustering Techniques'

Author: Grafiati

Published: 3 June 2025

Last updated: 13 July 2025

Consult the top 50 journal articles for your research on the topic 'Text Clustering Techniques.'

1

Kumar, Mukesh, and Amandeep Verma. "Text Clustering Techniques A Review." International Journal of Computer Sciences and Engineering 6, no. 6 (2018): 1091–99. http://dx.doi.org/10.26438/ijcse/v6i6.10911099.

2

Mohit, A. Charan Kumari, and Meghna Sharma. "Text Clustering Techniques: A Survey." International Journal of Engineering Sciences & Research Technology 6, no. 5 (2017): 248–56. https://doi.org/10.5281/zenodo.573535.

Abstract:

Advances in mobile computing, grid computing, cloud computing, and the Internet of Things, and above all the availability of the internet on hand-held devices, have been key factors in the growth of large amounts of data. The main challenge is to organize this big data in a structured manner that helps to derive new insights, support predictive analysis, and find trends, patterns, and their correlations. One solution is to cluster the text, a significant data mining technique. This paper investigates various techniques that have been tried in text clustering. It also describes the process of text clustering along with various similarity measures.

3

Upadhye, Akshata. "A Survey of Text Clustering Techniques: Algorithms, Applications, and Challenges." International Journal of Science and Research (IJSR) 10, no. 9 (2021): 1749–52. http://dx.doi.org/10.21275/sr24304163737.

4

Patil, Ratna S., and B. S. Chordia. "Mining Text Data using different Text Clustering Techniques." International Journal of Computer Trends and Technology 43, no. 2 (2017): 87–93. http://dx.doi.org/10.14445/22312803/ijctt-v43p113.

5

Sharma, Saurabh, and Vishal Gupta. "Recent Developments in Text Clustering Techniques." International Journal of Computer Applications 37, no. 6 (2012): 14–19. http://dx.doi.org/10.5120/4611-6604.

6

Liu, Wei, and Wilson Wong. "Web service clustering using text mining techniques." International Journal of Agent-Oriented Software Engineering 3, no. 1 (2009): 6. http://dx.doi.org/10.1504/ijaose.2009.022944.

7

Jalal, Ahmed Adeeb, and Basheer Husham Ali. "Text documents clustering using data mining techniques." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 1 (2021): 664–70. http://dx.doi.org/10.11591/ijece.v11i1.pp664-670.

Abstract:

Progress in numerous research fields and information technologies has led to an increase in the publication of research papers. As a result, researchers spend a lot of time finding research papers that are close to their field of specialization. In this paper we propose a document classification approach that clusters the text documents of research papers into meaningful categories, each covering a similar scientific field. The approach is based on the focus and scope of the target categories, where each category includes many topics. We extract the word tokens related to a specific category from these topics, separately for each category. The frequency of word tokens in a document determines the document's weight, which is calculated using the term frequency-inverse document frequency (TF-IDF) statistic. The proposed approach uses the title, abstract, and keywords of the paper, in addition to the category topics, to perform the classification. Documents are then classified and clustered into the primary categories according to the highest cosine similarity between the category weights and the document weights.
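
The scoring step described in this abstract (TF-IDF weights, then assignment to the category with the highest cosine similarity) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the toy categories in the usage note are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's tokenizer is not specified.
    return text.lower().split()


def tfidf_vectors(token_lists):
    """Return one sparse {term: weight} vector per token list."""
    n = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per text
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Assumption: relative term frequency times a smoothed IDF (always >= 1).
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_categories(category_topics, documents):
    """Label each document with the category whose topic text is most similar."""
    names = list(category_topics)
    texts = [tokenize(category_topics[name]) for name in names]
    texts += [tokenize(doc) for doc in documents]
    vectors = tfidf_vectors(texts)
    category_vecs, document_vecs = vectors[:len(names)], vectors[len(names):]
    labels = []
    for doc_vec in document_vecs:
        scores = [cosine(doc_vec, cat_vec) for cat_vec in category_vecs]
        labels.append(names[scores.index(max(scores))])
    return labels
```

With two made-up categories, `assign_categories({"networks": "routing protocol wireless network packet", "nlp": "text language corpus parsing translation"}, ["a wireless routing protocol for sensor network"])` labels the document `networks`, because it shares no terms with the other category.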

8

Pande, Varsha C., Harshala B. Pethe, and Abha S. Khandelwal. "Clustering and Classification Techniques Using Text Mining." Global Journal of Engineering Science and Researches [NC-Rase 18] (November 12, 2018): 9–15. https://doi.org/10.5281/zenodo.1483957.

Abstract:

Text is a combination of characters, so analyzing such data sets and extracting information patterns from them is complex. Several methods have been proposed for analyzing such texts and extracting information. Text mining, a specific area of data mining, is used to classify huge volumes of semi-structured or unstructured data, which requires proper clustering. Handling large numbers of text documents involves fast retrieval of information, arrangement of documents, and exploration of information from the documents. Declaration of text input data and classification of the documents is a complex process. Text clustering is an unsupervised method in which no input-output patterns are predefined. It is based on the idea of placing similar texts in the same cluster. Each cluster consists of a number of records. A clustering is considered better if the contents of documents within a cluster are more alike than the contents of documents in different clusters. Classification is used to determine which group each data instance in a given dataset belongs to. It classifies data into different classes according to some constraints. Several major kinds of classification algorithms, including C4.5, ID3, the k-nearest neighbor classifier, Naive Bayes, SVM, and ANN, are used for classification. This paper presents a comparative study of clustering and classification algorithms.

10

Vedmiediev, Daniil, and Nataliia Shapoval. "Text Message Clustering." Electronics and Control Systems 4, no. 78 (2023): 16–20. http://dx.doi.org/10.18372/1990-5548.78.18255.

Abstract:

The division of text messages into groups is considered, which can be useful when building a personalized approach in different systems. To solve this problem, Word2Vec embeddings are proposed. To improve the division into groups, mini-batch k-means is suggested as a method with lower computational demands. This recommendation aligns with the practical need for efficient and scalable clustering methods, especially when dealing with large datasets. Furthermore, the proposed metric based on the greatest common sequence is highlighted as a valuable tool for evaluating the similarity of texts. This metric not only serves as a means to assess clustering quality but also underscores the methodological approach of directly working with text data. The combination of these techniques presents a comprehensive framework for robust and effective text clustering, with potential applications in diverse fields, such as personalized system interactions and information retrieval.

11

Shankar, A. Ananda, and K. R. Ananda Kumar. "Data Mining Technique for Opinion Retrieval in Healthcare System." International Journal of Data Mining & Knowledge Management Process (IJDKP) 5, no. 5 (2019): 75–84. https://doi.org/10.5281/zenodo.3463239.

Abstract:

The aim of this paper is to apply text mining (TM) concepts in the field of healthcare systems. Decision making in healthcare nowadays involves a number of opinions given by a group of medical experts on a specific disease, in the form of decisions that are recorded in a medical database as text. These decisions are then mined from the database with the help of data mining techniques. Text document clustering is considered a tool for performing information-based operations. For clustering, the K-means technique is normally used. In this paper we use the Bisecting K-means clustering technique, which performs better than the traditional K-means technique. The objective is to study the revealed groupings of similar opinion types associated with the likelihood of physicians and medical experts.

12

Bomma, Hari Prasad. "Data mining techniques and their applicability for data engineers in development and reporting." International Journal of Multidisciplinary Research and Growth Evaluation 03, no. 02 (2022): 625–27. https://doi.org/10.54660/.IJMRGE.2022.3.2-625-627.

Abstract:

When companies need to build decision-making reports on huge amounts of data, data mining is their go-to method. Data mining is like being a detective who finds hidden clues in a sea of information. Data engineers use clever techniques to turn raw data into useful insights that businesses can act on. This paper explores various data mining techniques like association rule learning, clustering, classification, regression analysis, and text mining. Additionally, the paper highlights the importance of new technologies like machine learning, AI, and big data platforms, and how these advancements make data processing more efficient and insightful.

13

Bewoor, Mrunal S., and Suhas H. Patil. "Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms." Engineering, Technology & Applied Science Research 8, no. 1 (2018): 2562–67. https://doi.org/10.5281/zenodo.1207394.

Abstract:

The availability of various digital sources has created a demand for text mining mechanisms. Effective summary generation mechanisms are needed in order to utilize relevant information from often overwhelming digital data sources. In this view, this paper conducts a survey of various single as well as multi-document text summarization techniques. It also provides analysis of treating a query sentence as a common one, segmented from documents for text summarization. Experimental results show the degree of effectiveness in text summarization over different clustering algorithms.

14

Rao, Bapuji, and Brojo Kishore Mishra. "An Approach to Clustering of Text Documents Using Graph Mining Techniques." International Journal of Rough Sets and Data Analysis 4, no. 1 (2017): 38–55. http://dx.doi.org/10.4018/ijrsda.2017010103.

Abstract:

This paper introduces a new approach to clustering text documents based on a set of words, using graph mining techniques. Given a set of text documents, the proposed approach clusters (groups) those documents in which the given set of words is found. The document-word relation can be represented as a bipartite graph. Every cluster of text documents is represented as a sub-graph. Further, the paper proposes an algorithm for clustering text documents for a given set of words. It is an automated system and requires minimal human interaction. The algorithm has been implemented in the C++ programming language, with satisfactory results.
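
The document-word bipartite graph mentioned in this abstract can be illustrated with a short Python sketch. The paper's own algorithm (implemented in C++) is not reproduced here; the function names are hypothetical, and the rule that a document joins the cluster only if it contains every query word is an assumption made for the example.

```python
from collections import defaultdict


def build_bipartite(documents):
    """Build the word -> documents side of the bipartite graph.

    `documents` maps a document id to its text. Each (word, doc_id) pair
    is one edge between a word node and a document node.
    """
    word_to_docs = defaultdict(set)
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            word_to_docs[word].add(doc_id)
    return word_to_docs


def cluster_for_words(word_to_docs, query_words):
    """Return the sub-graph for the query words and the resulting cluster.

    Assumption: the cluster holds the documents that contain ALL query words.
    """
    subgraph = {word: set(word_to_docs.get(word, set())) for word in query_words}
    cluster = set.intersection(*subgraph.values()) if subgraph else set()
    return subgraph, cluster
```

For example, with documents `d1 = "graph mining of text"`, `d2 = "text clustering with graph"`, and `d3 = "image retrieval"`, querying for `["graph", "text"]` yields the cluster `{"d1", "d2"}`. Replacing the intersection with a union would instead group documents that contain any of the words.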

15

Paladugu, Rama Krishna, and Gangadhara Rao Kancherla. "Harnessing Deep Learning Techniques for Text Clustering and Document Categorization." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 8 (2023): 125–39. http://dx.doi.org/10.17762/ijritcc.v11i8.7930.

Abstract:

This research paper delves into the realm of deep text clustering algorithms with the aim of enhancing the accuracy of document classification. In recent years, the fusion of deep learning techniques and text clustering has shown promise in extracting meaningful patterns and representations from textual data. This paper provides an in-depth exploration of various deep text clustering methodologies, assessing their efficacy in improving document classification accuracy. Delving into the core of deep text clustering, the paper investigates various feature representation techniques, ranging from conventional word embeddings to contextual embeddings furnished by BERT and GPT models. By critically reviewing and comparing these algorithms, we shed light on their strengths, limitations, and potential applications. Through this comprehensive study, we offer insights into the evolving landscape of document analysis and classification, driven by the power of deep text clustering algorithms. Through an original synthesis of existing literature, this research serves as a beacon for researchers and practitioners in harnessing the prowess of deep learning to enhance the accuracy of document classification endeavors.

16

Bhardwaj, Bhavna. "Text Mining, its Utilities, Challenges and Clustering Techniques." International Journal of Computer Applications 135, no. 7 (2016): 22–24. http://dx.doi.org/10.5120/ijca2016908452.

17

Roussinov, Dmitri, and J. Leon Zhao. "Text clustering and summary techniques for CRM message management." Journal of Enterprise Information Management 17, no. 6 (2004): 424–29. http://dx.doi.org/10.1108/17410390410566715.

18

Elista, Kiki, Riska Evangelina Hutabarat, Jefri Ricardo Doloksaribu, and Desiana Bondar. "Effect of Using Clustering Technique on the Students Achievement in Writing Recount Text." ELT (English Language Teaching Prima Journal) 1, no. 2 (2020): 59–88. http://dx.doi.org/10.34012/eltp.v1i2.1352.

Abstract:

This research was conducted to investigate the effect of using the clustering technique on student achievement in writing recount text. An experimental study was conducted to obtain the data. The population of this study was the first-year students in the academic year 2019/2020 of SMA DHARMAWANGSA Medan, comprising two classes with 80 students in total, all of whom were taken as the sample. The 40 students in the experimental group were taught using the clustering technique, while the other 40 students in the control group were taught using the free writing technique. A writing test was used to obtain the data. The data analysis indicated a large effect of using the clustering technique on the students' achievement in writing recount text, since t-observed > t-table (2.568 > 2.06). Thus, the null hypothesis is rejected and the alternative hypothesis is accepted: students taught using the clustering technique achieved better results than students taught using the free writing technique. The results suggest that teachers should consider using the clustering technique when teaching recount text writing to their students. Keywords: Writing, Recount Text, Clustering Technique

19

Girdher, Heena, and Poonam Gaur. "Text Mining Techniques: A Review." Global Journal of Engineering Science and Researches 4, no. 6 (2017): 68–73. https://doi.org/10.5281/zenodo.817359.

Abstract:

Text mining is a technology used to extract meaningful information from unstructured or semi-structured text. The amount of data is increasing at tremendous speed, so there is a need to extract meaningful information from huge amounts of data. Text mining techniques are used for this purpose. This paper focuses on the text mining process and various text mining techniques. In addition, we compare text mining techniques on the basis of goal, algorithms, and tools.

20

Wijayanto, Feri. "Clustering Analysis of Chess Portable Game Notation Text." Jurnal Sains, Nalar, dan Aplikasi Teknologi Informasi 3, no. 3 (2024): 137–42. http://dx.doi.org/10.20885/snati.v3.i3.42.

Abstract:

Chess is a game that requires a high level of intelligence and strategy. Generally, in order to understand complex move patterns and strategies, the expertise of chess masters is required. With the rapid development in the field of machine learning, the digitization of chess game recordings in Portable Game Notation (PGN) format, and the availability of large and widely accessible data, it is possible to apply machine learning techniques to analyze chess games. This research studies the use of text clustering algorithms, specifically hierarchical clustering and K-means clustering, to categorize chess games based on their moves. We extracted 100 chess games that use certain openings such as French Defence, Queen's Gambit Declined, and English Opening. In the implementation of hierarchical clustering, single, average, and complete linkage methods are used. As a result, our findings show that hierarchical clustering with single linkage is less effective. On the other hand, the average and complete linkage methods, as well as K-means clustering, successfully identify clusters corresponding to the original openings. Notably, K-means clustering showed the highest accuracy in clustering chess games. This research highlights the potential of machine learning techniques in uncovering strategic patterns in chess games, paving the way for deeper insights into game strategies.
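
The three linkage methods compared in this abstract differ only in how the distance between two clusters is derived from the distances between their members. The sketch below shows that difference in a naive agglomerative procedure. It is an illustration rather than the study's code: the function names are hypothetical, and it assumes each item (for instance a chess game) has already been turned into a numeric vector compared by Euclidean distance, which the abstract does not specify.

```python
import math
from itertools import combinations


def cluster_distance(points, a, b, linkage):
    """Distance between clusters a and b (lists of point indices)."""
    distances = [math.dist(points[i], points[j]) for i in a for j in b]
    if linkage == "single":      # closest pair of members
        return min(distances)
    if linkage == "complete":    # farthest pair of members
        return max(distances)
    if linkage == "average":     # mean over all member pairs
        return sum(distances) / len(distances)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(points, k, linkage="average"):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > max(k, 1):
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(
                points, clusters[pair[0]], clusters[pair[1]], linkage
            ),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(cluster) for cluster in clusters)
```

On well-separated data such as `[(0, 0), (0, 1), (5, 5), (5, 6)]` with `k=2`, all three linkages return the same two clusters. They diverge on elongated or noisy data, where single linkage tends to chain neighbouring points into one long cluster.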

21

Bano, Shahana, B. Divyanjali, A. K. M. L. R. V. Virajitha, and M. Tejaswi. "Document Summarization Using Clustering and Text Analysis." International Journal of Engineering & Technology 7, no. 2.32 (2018): 456. http://dx.doi.org/10.14419/ijet.v7i2.32.15740.

Abstract:

Document summarization is the process of shortening a text document with software in order to create a summary containing the significant parts of the original document. Nowadays users are busy with their work and do not have much time to spend reading large amounts of information. They want the maximum amount of accurate information, describing everything while occupying minimum space. This paper discusses an important approach to document summarization using clustering and text analysis. We apply clustering and text analytics techniques to reduce data redundancy, identify similar sentences in the text of documents, and group them into clusters based on the term frequency values of their words. These techniques mainly help to reduce the data, and documents are generated with high efficiency.

22

Alqahtani, A., H. Alhakami, T. Alsubait, and A. Baz. "A Survey of Text Matching Techniques." Engineering, Technology & Applied Science Research 11, no. 1 (2021): 6656–61. https://doi.org/10.48084/etasr.3968.

Abstract:

Text matching is the process of identifying and locating particular text matches in raw data. Text matching is a vital component in practical applications and an essential process in several fields. Furthermore, several dynamic techniques have been introduced in this context in order to create ease in pattern generation from words. The process involves matching of text files, text mining, text clustering, association rule extraction, word cloud, natural language processing, and text similarity measures (knowledge-based, corpus-based, string-based, and hybrid similarities). The string-based approach forms the most conspicuous form of text mining applied in different cases. The survey attempted in the present study covers a new research premise that uses text-matching to solve problems. The study also summarizes different approaches that are being used in this domain.

23

Khan, Danish. "Modeling and Semantic Clustering in Large-scale Text Data: A Review of Machine Learning Techniques and Applications." International Journal of Scientific Research in Engineering and Management 09, no. 04 (2025): 1–9. https://doi.org/10.55041/ijsrem46510.

Abstract:

With the exponential growth of textual data across diverse domains, the task of efficiently modelling and clustering large-scale text has emerged as a key challenge in natural language processing (NLP). Conventional text representation approaches, such as Term Frequency-Inverse Document Frequency (TF-IDF) and Bag-of-Words (BoW), often fall short in capturing semantic nuances. This limitation has encouraged the adoption of more advanced techniques, including word embeddings (e.g., Word2Vec, GloVe) and transformer-based models like BERT and GPT. Similarly, traditional clustering algorithms such as K-Means and Hierarchical Clustering often struggle with the high dimensionality and sparsity inherent in text data. Consequently, models like Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and deep learning-based clustering frameworks have gained popularity. This review paper presents a comprehensive overview of recent machine learning-based text representation and semantic clustering techniques, examining their performance, scalability, and relevance across applications. It also outlines persisting challenges such as interpretability, noise handling, and computational overhead, while identifying potential research directions to enhance semantic clustering in large-scale text environments. Keywords: Semantic Clustering, Text Representation, Word Embeddings, Transformer Models, Deep Learning in NLP, Text Mining.

24

Hamou, Reda Mohamed, Hadj Ahmed Bouarara, and Abdelmalek Amine. "Bio-Inspired Techniques in the Clustering of Texts." International Journal of Applied Metaheuristic Computing 6, no. 4 (2015): 39–68. http://dx.doi.org/10.4018/ijamc.2015100103.

Abstract:

Today, the development of large-scale internet/intranet access networks has increased the amount of textual information available online and offline, where billions of documents have been created. In the last few years, bio-inspired techniques have entered the world of text mining, where tasks such as clustering represent a critical problem in the digital society, especially in information retrieval (IR). This article recapitulates a set of works as a comparative study between the authors' experiments, carried out by applying a set of bio-inspired techniques (social spiders (SS), 2D cellular automata (2D-CA), 3D cellular automata (3D-CA), artificial immune system (AIS), particle swarm optimization (PSO)), and other techniques found in the literature (ant colony optimization (ACO) and genetic algorithms (GAs)) for solving the text clustering challenge using the Reuters-21578 benchmark. They analyse the results in terms of entropy, F-measure, execution time, and number of clusters in order to find the ideal configuration (similarity measure and text representation method) for each technique. Their objectives are to improve the efficiency of text clustering systems and to make decisions that can be the starting point for other researchers.

26

Aoun, Muhammad. "Comparative Analysis of Text Mining Techniques for News Article Summarization." LC International Journal of STEM (ISSN: 2708-7123) 4, no. 1 (2023): 52–63. https://doi.org/10.5281/zenodo.7893329.

Abstract:

This text mining research paper is a scientific study that focuses on the development and application of text mining techniques for extracting valuable information from unstructured textual data. The paper discusses the challenges of working with unstructured data and the need for advanced text mining techniques to address these challenges. The paper outlines the various steps involved in the text mining process, such as data preprocessing, text representation, and feature selection. It discusses the importance of selecting appropriate algorithms for different types of text mining tasks, including text classification, clustering, sentiment analysis, and topic modeling. The paper also discusses the challenges of evaluating text mining models, including issues related to data quality, model performance, and interpretability. It highlights the importance of using appropriate evaluation metrics and techniques to ensure the reliability and validity of the results. Finally, the paper provides case studies and real-world examples of text mining applications in various domains such as healthcare, social media analysis, and financial analysis. It emphasizes the potential of text mining to provide valuable insights and knowledge that can be used to support decision-making in different industries. Overall, the paper highlights the importance of text mining as a powerful tool for analyzing unstructured textual data and provides a comprehensive overview of the key techniques and challenges in this field.

27

Mathuna, K. T., I. Elizabeth Shanthi, and K. Nandhini. "Applying Clustering Techniques for Efficient Text Mining in Twitter Data." International Journal of Web Technology 004, no. 002 (2015): 36–39. http://dx.doi.org/10.20894/ijwt.104.004.002.002.

28

Saraswathi, S., and R. Arti. "Multi-Document Text Summarization Using Clustering Techniques and Lexical Chaining." ICTACT Journal on Soft Computing 1, no. 1 (2010): 23–29. http://dx.doi.org/10.21917/ijsc.2010.0004.

29

Patil, Kiran Sanajy, and N. V. Kurhade. "Experimental Result Analysis of Text Categorization using Clustering and Classification Algorithms." International Journal of Trend in Scientific Research and Development 3, no. 4 (2019): 1216–19. https://doi.org/10.5281/zenodo.3590801.

Abstract:

The world routinely produces ever more textual data, and managing it is a critical task. Many text analysis methods are available for managing and visualizing such data, but many techniques give low accuracy because of the ambiguity of natural language. To provide fine-grained analysis, this paper introduces efficient machine learning algorithms for categorizing text data. To improve accuracy, the proposed system uses the Natural Language Toolkit (NLTK) Python library to perform natural language processing. The main aim of the proposed system is to generalize the model for real-time text categorization applications by using efficient text classification and clustering machine learning algorithms, and to find an efficient and accurate model for the input dataset using performance measures. Patil Kiran Sanajy | Prof. Kurhade N. V. "Experimental Result Analysis of Text Categorization using Clustering and Classification Algorithms" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd25077.pdf

30

Dutta, Arjun. "Clustering Techniques and Their Applications: A Review." American Journal of Advanced Computing 1, no. 4 (2020): 1–6. http://dx.doi.org/10.15864/ajac.1404.

Abstract:

This paper presents a concise study of clustering: existing methods and developments made at various times. Clustering is defined as a form of unsupervised learning in which the targets are sorted on the basis of some similarity inherent among them. In recent times we deal with large masses of data, including images, video, social text, DNA, gene information, etc. Data clustering analysis has emerged as an efficient technique for accurately categorizing information into sensible groups. Clustering is closely associated with research in several scientific fields. The k-means algorithm was suggested in 1957 and is the most popular partitional clustering method to date. Clustering techniques are used in many commercial and non-commercial fields. The applications of clustering in areas such as image segmentation, object and role recognition, and data mining are highlighted. In this paper, we present a brief description of the existing types of clustering approaches, followed by a survey of the areas.

31

Yopie A.T. Pangemanan, Jane E. Scipio, James Ed. Lalira, Morshe E. Lumansik, and Barrylyn S.H. Kussoy. "THE IMPLEMENTATION OF CLUSTERING TECHNIQUE IN LEARNING WRITING DESCRIPTTIVE TEXT AT 8TH GRADE OF SMP N 5 RANOYAPO." Santhet (Jurnal Sejarah Pendidikan Dan Humaniora) 8, no. 2 (2024): 12495–504. https://doi.org/10.36526/santhet.v8i2.4286.

Abstract:

The purpose of this research is to find out 1) whether there is a significant increase in students' ability to write texts with clustering techniques, and 2) what difficulties students encounter in using clustering techniques. The instruments used are a descriptive text writing test and questionnaires. The method in this research is Post-Test Pre-Test Unit Design. The research design uses one class as an experimental class, which gets two consultations using the clustering technique. The sample was 20 students in class VIII of SMP N 5 Ranoyapo. Research data were collected through a pre-test, a post-test, and a questionnaire. The results showed a statistically significant difference between pre-test and post-test scores after the application of clustering techniques. Furthermore, most students find it difficult to use adjectives and develop main topics when using clustering techniques to write descriptive paragraphs. This shows that the clustering technique makes it easier for students to improve their writing skills.

32

Radomirović, Branislav, Vuk Jovanović, Bosko Nikolić, et al. "Text Document Clustering Approach by Improved Sine Cosine Algorithm." Information Technology and Control 52, no. 2 (2023): 541–61. http://dx.doi.org/10.5755/j01.itc.52.2.33536.

Abstract:

Due to the vast amounts of textual data available in various forms such as online content, social media comments, corporate data, public e-services and media data, text clustering has been experiencing rapid development. Text clustering involves categorizing and grouping similar content. It is a process of identifying significant patterns from unstructured textual data. Algorithms are being developed globally to extract useful and relevant information from large amounts of text data. Measuring the significance of content in documents to partition the collection of text data is one of the most important obstacles in text clustering. This study suggests utilizing an improved metaheuristic algorithm to fine-tune the K-means approach for the text clustering task. The suggested technique is evaluated using the first 30 unconstrained test functions from the CEC2017 test suite and six standard criterion text datasets. The simulation results and comparison with existing techniques demonstrate the robustness and superiority of the suggested method.

33

Harefa, Febriwan, Sahlan Tampubolon, and Arsen Nahum Pasaribu. "EXPLORING THE IMPACT OF MIND MAPPING AND CLUSTERING TECHNIQUES ON RECOUNT TEXT LEARNING FOR 10TH GRADE STUDENTS AT SMK NEGERI 1 DHARMA CARAKA." Academic Journal Perspective : Education, Language, and Literature 12, no. 1 (2024): 54–76. https://doi.org/10.33603/perspective.v12i1.9473.

Abstract:

This research aimed to determine the effectiveness of clustering and mind mapping techniques in enhancing the writing abilities of 10th-grade students at SMK Negeri 1 Dharma Caraka, Gunungsitoli Selatan, specifically in Recount text learning. The quasi-experimental study involved 231 students across seven classes, with a sample of 93 students divided into three groups: 31 students using the mind mapping technique, 31 using the clustering technique, and 31 using conventional methods. Data were analyzed using SPSS 29.00. The findings revealed significant improvements in the students' writing abilities with both techniques. The mind mapping group's post-test mean score was 84.51, up from a pre-test mean of 51.451. The clustering group's post-test mean was 89.482, compared to a pre-test mean of 54.838. An ANOVA analysis showed average scores of 84.51 for the mind mapping technique, 89.48 for the clustering technique, and 78.09 for the conventional technique. Overall, the clustering technique proved to be more effective than the mind mapping technique in improving the students' writing skills in Recount text learning. Thus, clustering is recommended for enhancing writing abilities in this context.

34

Risma Rahajeng Lestari and Neny Triana Dewi. "THE EFFECTIVENESS OF USING CLUSTERING TECHNIQUE TOWARD WRITING PROCEDURE TEXT." Jurnal Ilmu Pendidikan Muhammadiyah Kramat Jati 3, no. 1 (2022): 1–9. http://dx.doi.org/10.55943/jipmukjt.v3i1.22.

Abstract:

This research was conducted to measure the effectiveness of the clustering technique in writing procedure texts among students of Roudlotun Nasyiin Mojokerto. The researcher used a quasi-experimental research design. This design involves two classes as research subjects, the experimental class and the control class. These classes have similar abilities in English achievement. The researcher gave the experimental treatment in the experimental class and the control treatment in the control class. After the treatments in each class, the researcher gave a post-test. The post-test scores of the two classes were compared to measure the impact of the treatments. There are two variables in this research: the independent variable is the clustering technique and the dependent variable is writing achievement. The treatment was given over several meetings before the data were collected. The results show that students’ writing ability after being taught with the clustering technique was very good. In short, the clustering technique was an appropriate strategy for writing a text, especially a procedure text. The writing ability of students taught without the clustering technique was lower than that of students taught with it. The result of the t-test was 7.153. The researcher then compared this score with the t-table value (df = n1 + n2 – 2 = 72; significance 5% = 0.05), which was 1.666. This indicated that Ha was accepted and H0 was rejected.

35

Larabi-Marie-Sainte, Souad, Mashael Bin Alamir, and Abdulmajeed Alameer. "Arabic Text Clustering Using Self-Organizing Maps and Grey Wolf Optimization." Applied Sciences 13, no. 18 (2023): 10168. http://dx.doi.org/10.3390/app131810168.

Abstract:

Arabic text clustering is an essential topic in Arabic Natural Language Processing (ANLP). Its significance resides in various applications, such as document indexing, categorization, user review analysis, and others. After inspecting the current work on clustering Arabic text, it is observed that most researchers focus on applying K-Means clustering while overlooking other clustering techniques. Our evaluation shows that K-Means suffers from inconsistent clustering results and weak clustering performance when the data dimensionality increases. Unlike K-Means clustering, Artificial Neural Network (ANN) models such as Self-Organizing Maps (SOM) demonstrated higher accuracy and efficiency in clustering, even with high-dimensional datasets. In this paper, we introduce a new clustering model based on an optimization technique called Grey Wolf Optimization (GWO), used conjointly with SOM clustering to enhance its clustering performance and accuracy. The evaluation results of our proposed technique show an improvement in effectiveness and efficiency in comparison with state-of-the-art approaches.

36

Abualigah, Laith Mohammad, Essam Said Hanandeh, Ahamad Tajudin Khader, Mohammed Abdallh Otair, and Shishir Kumar Shandilya. "An Improved B-hill Climbing Optimization Technique for Solving the Text Documents Clustering Problem." Current Medical Imaging Formerly Current Medical Imaging Reviews 16, no. 4 (2020): 296–306. http://dx.doi.org/10.2174/1573405614666180903112541.

Abstract:

Background: Considering the increasing volume of text document information on Internet pages, dealing with such a tremendous amount of knowledge becomes very complex due to its large size. Text clustering is a common optimization problem used to organize a large amount of text information into a subset of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, namely β-hill climbing, to solve the text document clustering problem by modeling the β-hill climbing technique to partition similar documents into the same cluster. Methods: The β parameter is the primary innovation in the β-hill climbing technique. It has been introduced in order to strike a balance between local and global search. Local search methods, such as the k-medoid and k-means techniques, have been successfully applied to the text document clustering problem. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics taken from the Laboratory of Computational Intelligence (LABIC). The results showed that the proposed β-hill climbing achieved better results than the original hill climbing technique in solving the text clustering problem. Conclusion: The performance of text clustering is improved by adding the β operator to hill climbing.
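
The local search described in this abstract can be illustrated with a minimal Python sketch. It follows the general β-hill climbing scheme as commonly described (a neighbourhood move, a β-controlled random reset, then greedy acceptance) and is not the authors' implementation. The objective (sum of squared distances to cluster centroids), the function names `sse` and `beta_hill_climbing`, and the default values of `beta` and `iters` are all assumptions made for illustration.

```python
import random


def sse(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in members
        )
    return total


def beta_hill_climbing(points, k, beta=0.05, iters=2000, seed=0):
    """Hypothetical beta-hill climbing over cluster labels (illustrative only)."""
    rng = random.Random(seed)
    current = [rng.randrange(k) for _ in points]
    best_cost = sse(points, current, k)
    for _ in range(iters):
        candidate = list(current)
        # Neighbourhood move: give one random point a random label
        # (possibly the same one it already has).
        candidate[rng.randrange(len(points))] = rng.randrange(k)
        # Beta step: each label is reset at random with probability beta,
        # which lets the search escape local optima.
        for j in range(len(candidate)):
            if rng.random() < beta:
                candidate[j] = rng.randrange(k)
        # Selection: keep the candidate only if it is not worse.
        cost = sse(points, candidate, k)
        if cost <= best_cost:
            current, best_cost = candidate, cost
    return current, best_cost


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels, cost = beta_hill_climbing(points, k=2)
```

Setting `beta=0` reduces the sketch to plain hill climbing over single-label moves, which is the baseline the abstract compares against.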

37

MORE, MAHADEV A. "CONTENT BASED IMAGE RETRIVAL USING DIFFERENT CLUSTERING TECHNIQUES." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 07, no. 09 (2023): 1–11. http://dx.doi.org/10.55041/ijsrem25835.

Abstract:

CBIR (content-based image retrieval) is a software system for retrieving images from a database by using their features. In the CBIR technique, images are retrieved from the dataset by using features such as color, text, shape, texture and similarity. An object recognition technique is used in CBIR. Research on multimedia systems and content-based image retrieval has been given tremendous importance during the last decade. The reason is that multimedia databases handle text, audio, video and image information, which are of prime interest in web and other high-end user applications. Content-based image retrieval deals with the extraction of knowledge, image data relationships, or other patterns not explicitly stored in the images. It uses methods from computer vision, image processing, image retrieval, data retrieval, machine learning, databases and artificial intelligence. Rule retrieval has been applied to large image databases. The proposed system gives an average accuracy of 90%. Keywords: CBIR, Color feature, Shape feature, Texture feature, Feature extraction, Clustering, Image Retrieval.

38

Jinarat, Supakpong, Choochart Haruechaiyasak, and Arnon Rungsawang. "Graph-Based Concept Clustering for Web Search Results." International Journal of Electrical and Computer Engineering (IJECE) 5, no. 6 (2015): 1536. http://dx.doi.org/10.11591/ijece.v5i6.pp1536-1544.

Abstract:

A search engine usually returns a long list of web search results corresponding to a query from the user. Users must spend a lot of time browsing and navigating the search results to find the relevant ones. Many research works have applied text clustering techniques, called web search results clustering, to handle the problem. Unfortunately, a search result document returned from a search engine is a very short text. It is difficult to cluster related documents into the same group because a short document has low informative content. In this paper, we propose a method to cluster web search results with high clustering quality using graph-based clustering with concepts extracted from an external knowledge source. The main idea is to expand the original search results with some related concept terms. We used Wikipedia as the external knowledge source for concept extraction. We compared the clustering results of our proposed method with two well-known search results clustering techniques, Suffix Tree Clustering and Lingo. The experimental results showed that our proposed method significantly outperforms these well-known clustering techniques.

39

Mall, Shalu, Avinash Maurya, Ashutosh Pandey, and Davain Khajuria. "Centroid Based Clustering Approach for Extractive Text Summarization." International Journal for Research in Applied Science and Engineering Technology 11, no. 6 (2023): 3404–9. http://dx.doi.org/10.22214/ijraset.2023.53542.

Abstract:

Extractive text summarization is the process of identifying the most important information from a large text and presenting it in a condensed form. One popular approach to this problem is the use of centroid-based clustering algorithms, which group together similar sentences based on their content and then select representative sentences from each cluster to form a summary. In this research, we present a centroid-based clustering algorithm for email summarization that combines the use of word embeddings with a clustering algorithm. We compare our algorithm to existing summarization techniques. Our results show that our approach stands close to existing methods in terms of summary quality, while also being computationally efficient. Overall, our work demonstrates the potential of centroid-based clustering algorithms for extractive text summarization and suggests avenues for further research in this area.

40

Ahmed, Majid Hameed, Sabrina Tiun, Nazlia Omar, and Nor Samsiah Sani. "Short Text Clustering Algorithms, Application and Challenges: A Survey." Applied Sciences 13, no. 1 (2022): 342. http://dx.doi.org/10.3390/app13010342.

Abstract:

The number of online documents has rapidly grown, and with the expansion of the Web, document analysis, or text analysis, has become an essential task for preparing, storing, visualizing and mining documents. The texts generated daily on social media platforms such as Twitter, Instagram and Facebook are vast and unstructured. Most of these generated texts come in the form of short text and need special analysis because short text suffers from lack of information and sparsity. Thus, this topic has attracted growing attention from researchers in the data storing and processing community for knowledge discovery. Short text clustering (STC) has become a critical task for automatically grouping various unlabelled texts into meaningful clusters. STC is a necessary step in many applications, including Twitter personalization, sentiment analysis, spam filtering, customer reviews and many other social network-related applications. In the last few years, the natural-language-processing research community has concentrated on STC and attempted to overcome the problems of sparseness, dimensionality, and lack of information. We comprehensively review various STC approaches proposed in the literature. Providing insights into the technological component should assist researchers in identifying the possibilities and challenges facing STC. To gain such insights, we review various literature, journals, and academic papers focusing on STC techniques. The contents of this study are prepared by reviewing, analysing and summarizing diverse types of journals and scholarly articles with a focus on the STC techniques from five authoritative databases: IEEE Xplore, Web of Science, Science Direct, Scopus and Google Scholar. This study focuses on STC techniques: text clustering, challenges to short texts, pre-processing, document representation, dimensionality reduction, similarity measurement of short text and evaluation.

41

Basha, M. John, and K. P. Kaliyamurthie. "An Improved Similarity Matching based Clustering Framework for Short and Sentence Level Text." International Journal of Electrical and Computer Engineering (IJECE) 7, no. 1 (2017): 551. http://dx.doi.org/10.11591/ijece.v7i1.pp551-558.

Abstract:

Text clustering plays a key role in the navigation and browsing process. For efficient text clustering, a large amount of information is grouped into meaningful clusters. Many text clustering techniques do not address issues such as high time and space complexity, inability to understand the relational and contextual attributes of words, low robustness, risks related to privacy exposure, etc. To address these issues, an efficient text-based clustering framework is proposed. The Reuters dataset is chosen as the input dataset. Once the input dataset is preprocessed, the similarity between the words is computed using cosine similarity. The similarities between the components are compared and the vector data is created. From the vector data, the clustering particle is computed. To optimize the clustering results, mutation is applied to the vector data. The performance of the proposed text-based clustering framework is analyzed using metrics such as Mean Square Error (MSE), Peak Signal Noise Ratio (PSNR) and processing time. The experimental results show that the proposed text-based clustering framework produced optimal MSE, PSNR and processing time when compared to the existing Fuzzy C-Means (FCM) and Pairwise Random Swap (PRS) methods.
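
The cosine similarity step mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration under assumptions: texts are split on whitespace and represented as raw term-count vectors, which may differ from the representation the authors used, and the function name `cosine_similarity` is chosen here for convenience.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


score = cosine_similarity("oil prices rise", "oil prices fall")
```

Here the two texts share two of their three terms, so `score` is 2/3; texts with no terms in common score 0.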

42

M., John Basha, and Kaliyamurthie K.P. "An Improved Similarity Matching based Clustering Framework for Short and Sentence Level Text." International Journal of Electrical and Computer Engineering (IJECE) 7, no. 1 (2017): 551–58. https://doi.org/10.11591/ijece.v7i1.pp551-558.

43

Anggitaningrum, Nindya Revani, Alimatun Alimatun, Hanafi Wibowo, Minkhatunnakhriyah Minkhatunnakhriyah, and Albiansyah Albiansyah. "Investigating Clustering Technique on Students’ Writing Skill in Narrative Text." Journal Polingua: Scientific Journal of Linguistic Literatura and Education 10, no. 1 (2021): 12–15. http://dx.doi.org/10.30630/polingua.v10i1.164.

Abstract:

The purpose of this study was to explain the effect of the clustering technique on students’ writing skill in narrative text. This study used a quasi-experimental design with a post-test-only control group design. The population of this study was all tenth graders at SMA Tulus Bhakti Bekasi, a total of 132 students. The sample, taken by cluster random sampling, consisted of 60 students. The students were divided into two classes, 30 in the experimental class (X IPS 1) and 30 in the control class (X IPS 2). The instrument used to collect data was a learning achievement test in the form of a subjective test. The statistical method used to analyze the data was one-way ANOVA using SPSS, through testing of the null hypothesis. The result of the ANOVA showed that the value of sig (significance) is lower than 0.05 (0.000<0.05), or Fobserved with df (1/58) is higher than Ftable (28.185>4.01). Accordingly, the null hypothesis (Ho), stating that there is no effect of the clustering technique on grade ten students’ writing skill in narrative text at SMA Tulus Bhakti, was rejected, and the alternative hypothesis (Ha), stating that there is a significant effect of the clustering technique on grade ten students’ writing skill in narrative text at SMA Tulus Bhakti, was accepted. Based on the analysis of the data, the research has empirically shown that the clustering technique has an effect on grade ten students’ writing skill in narrative text at SMA Tulus Bhakti Bekasi. Keywords: clustering techniques; quasi-experimental method; writing skill

44

D., Mabuni. "Modified Cosine Similarity Measure based Data Classification in Data Mining." International Journal of Engineering and Advanced Technology (IJEAT) 9, no. 5 (2020): 649–54. https://doi.org/10.35940/ijeat.E9754.069520.

Abstract:

Text data analytics has become an integral part of World Wide Web data management and of Internet-based applications, which are growing rapidly all over the world. E-commerce applications are growing exponentially in the business field, and competitors in e-commerce are increasingly adopting machine learning techniques for predicting business-related operations, with the aim of increasing product sales to a greater extent. The use of similarity measures is inevitable in modern day-to-day real applications. Cosine similarity plays a dominant role in text data mining applications such as text classification, clustering, querying, searching and so on. A modified clustering-based cosine similarity measure called MCS is proposed in this paper for data classification. The proposed method is experimentally verified on many UCI machine learning datasets involving categorical attributes. The proposed method produces more accurate classification results in the majority of the experiments conducted on the UCI machine learning datasets.

45

Meng, Zu Qiang, Shi Mo Shen, and Qiu Lian Chen. "A Network Decomposition-Based Text Clustering Algorithm for Topic Detection." Applied Mechanics and Materials 239-240 (December 2012): 1318–23. http://dx.doi.org/10.4028/www.scientific.net/amm.239-240.1318.

Abstract:

Text clustering is one of the most popular topic detection techniques. However, the existing text clustering approaches require that each document be assigned to one and only one cluster. This is not reasonable in some cases, because there exist some documents which should not be used to constitute topics. This paper first models a text document set as a network and designs a method for decomposing such a network, and then proposes an original text clustering algorithm for topic detection, called the network decomposition-based text clustering algorithm for topic detection (NDTCATD). The proposed algorithm ensures that meaningless documents cannot be used to constitute topics. Experimental results show that NDTCATD is much better than the bisecting k-means algorithm in terms of overall similarity and average cluster similarity. Therefore the proposed algorithm is reasonable and effective, and is especially suitable for topic detection.

46

Abualigah, Laith, Amir H. Gandomi, Mohamed Abd Elaziz, et al. "Nature-Inspired Optimization Algorithms for Text Document Clustering—A Comprehensive Analysis." Algorithms 13, no. 12 (2020): 345. http://dx.doi.org/10.3390/a13120345.

Abstract:

Text clustering is one of the efficient unsupervised learning techniques used to partition a huge number of text documents into a subset of clusters. Each cluster contains similar documents, while different clusters contain dissimilar documents. Nature-inspired optimization algorithms have been successfully used to solve various optimization problems, including text document clustering problems. In this paper, a comprehensive review is presented to show the most relevant nature-inspired algorithms that have been used in solving the text clustering problem. Moreover, comprehensive experiments are conducted and analyzed to show the performance of the common, well-known nature-inspired optimization algorithms in solving text document clustering problems, including the Harmony Search (HS) Algorithm, Genetic Algorithm (GA), Particle Swarm Optimization (PSO) Algorithm, Ant Colony Optimization (ACO), Krill Herd Algorithm (KHA), Cuckoo Search (CS) Algorithm, Gray Wolf Optimizer (GWO), and Bat-inspired Algorithm (BA). Seven text benchmark datasets are used to validate the performance of the tested algorithms. The results showed that the performance of the well-known nature-inspired optimization algorithms is almost the same, with slight differences. For improvement purposes, new modified versions of the tested algorithms can be proposed and tested to tackle text clustering problems.

47

Dodda, Ratnam, and Alladi Suresh Babu. "Text document clustering using mayfly optimization algorithm with k-means technique." Indonesian Journal of Electrical Engineering and Computer Science 35, no. 2 (2024): 1099. http://dx.doi.org/10.11591/ijeecs.v35.i2.pp1099-1109.

Abstract:

Text clustering is a subfield of machine learning (ML) and natural language processing (NLP) that consists of grouping similar sentences or documents based on their content. However, insignificant features in the documents reduce the accuracy of information retrieval, which makes it challenging for the clustering approach to efficiently cluster similar documents. In this research, the mayfly optimization algorithm (MOA) with a k-means approach is proposed for text document clustering (TDC) to effectively cluster similar documents. Initially, the data is obtained from the Reuters-21578, 20-Newsgroup, and BBC sports datasets, and then pre-processing is performed by stemming and stop word removal to remove unwanted phrases or words. Data imbalance is handled using an adaptive synthetic sampling algorithm (ADASYN), then term frequency-inverse document frequency (TF-IDF) and WordNet features are employed for feature extraction. Finally, MOA with the K-means technique is utilized for TDC. The proposed approach achieves better accuracy of 99.75%, 99.54%, and 98.24% when compared to the existing techniques like fuzzy rough set-based robust nearest neighbor-convolutional neural network (FRS-RNN-CNN), TopicStriker, Modsup-based frequent itemset, and rider optimization-based moth search algorithm (Modsup-Rn-MSA), hierarchical dirichlet-multinomial mixture, and multi-view clustering via consistent and specific non-negative matrix (MCCS).
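
The TF-IDF and k-means stages named in this abstract can be sketched in plain Python as follows. This is a simplified illustration, not the authors' pipeline: it omits stemming, stop word removal, ADASYN, WordNet features and the mayfly optimization step. The IDF variant, the deterministic farthest-point seeding, the toy documents and the function names `tfidf_vectors`, `sq_dist` and `kmeans` are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (list of floats) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # IDF with a +1 offset (an assumption; several variants are in use).
    idf = {term: math.log(len(docs) / doc_freq[term]) + 1.0 for term in vocab}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = [tf[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point seeding."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)
        )
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "stock market shares",
    "market shares trading",
    "football match goal",
    "goal football league",
]
labels = kmeans(tfidf_vectors(docs), k=2)  # → [0, 0, 1, 1]
```

In the approach the abstract describes, a metaheuristic (MOA) is combined with k-means; the fixed seeding used here is only a stand-in to keep the example deterministic.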

48

Ratnam, Dodda Alladi Suresh Babu. "Text document clustering using mayfly optimization algorithm with k-means technique." Indonesian Journal of Electrical Engineering and Computer Science 35, no. 2 (2024): 1099–109. https://doi.org/10.11591/ijeecs.v35.i2.pp1099-1109.

49

Otradnov, K. K., and V. K. Raev. "EXPERIMENTAL STUDY OF TEXT DOCUMENTS VECTORIZATION TECHNIQUES AND THEIR CLUSTERING ALGORITHMS EFFICIENCY." Vestnik of Ryazan State Radio Engineering University 64 (2018): 73–84. http://dx.doi.org/10.21667/1995-4565-2018-64-2-73-84.

50

Punitha, S. C., and M. Punithavalli. "Performance Evaluation of Semantic Based and Ontology Based Text Document Clustering Techniques." Procedia Engineering 30 (2012): 100–106. http://dx.doi.org/10.1016/j.proeng.2012.01.839.
