Journal articles on the topic 'K-Means Cluster (K-means)'

Author: Grafiati

Published: 5 June 2025

Last updated: 24 June 2025

Consult the top 50 journal articles for your research on the topic 'K-Means Cluster (K-means).'

1

Hedar, Abdel-Rahman, Abdel-Monem Ibrahim, Alaa Abdel-Hakim, and Adel Sewisy. "K-Means Cloning: Adaptive Spherical K-Means Clustering." Algorithms 11, no. 10 (2018): 151. http://dx.doi.org/10.3390/a11100151.

Abstract:

We propose a novel method for adaptive K-means clustering. The proposed method overcomes the problems of the traditional K-means algorithm. Specifically, it does not require prior knowledge of the number of clusters. Additionally, the initial identification of the cluster elements has no negative impact on the final generated clusters. Inspired by cell cloning in microorganism cultures, each added data sample causes the existing cluster ‘colonies’ to evaluate, together with the other clusters, various merging or splitting actions in order to reach the optimum cluster set. The proposed algorithm is suited to clustering data in isolated or overlapping compact spherical clusters. Experimental results support the effectiveness of this clustering algorithm.

2

de Maeyer, Rieke, Sami Sieranoja, and Pasi Fränti. "Balanced k-means revisited." Applied Computing and Intelligence 3, no. 2 (2023): 145–79. http://dx.doi.org/10.3934/aci.2023008.

Abstract:

The k-means algorithm aims at minimizing the variance within clusters without considering the balance of cluster sizes. Balanced k-means defines the partition as a pairing problem that enforces the cluster sizes to be strictly balanced, but the resulting algorithm is impractically slow, O(n^3). Regularized k-means addresses the problem using a regularization term including a balance parameter. It works reasonably well when the balance of the cluster sizes is a mandatory requirement but does not generalize well for soft balance requirements. In this paper, we revisit the k-means algorithm as a two-objective optimization problem with two goals contradicting each other: to minimize the variance within clusters and to minimize the difference in cluster sizes. The proposed algorithm implements a balance-driven variant of k-means which initially only focuses on minimizing the variance but adds more weight to the balance constraint in each iteration. The resulting balance degree is not determined by a control parameter that has to be tuned, but by the point of termination which can be precisely specified by a balance criterion.

3

Mehta, Hetangi D., Daxa Vekariya, and Pratixa Badelia. "COMPARISON AND EVALUATION OF CLUSTER BASED IMAGE SEGMENTATION TECHNIQUES." Global Journal of Engineering Science and Research Management 4, no. 12 (2017): 24–33. https://doi.org/10.5281/zenodo.1098696.

Abstract:

Image segmentation is the classification of an image into different groups. Numerous algorithms using different approaches have been proposed for image segmentation. A major challenge in segmentation evaluation comes from the fundamental conflict between generality and objectivity. This paper reviews different types of clustering methods used for image segmentation and proposes a methodology to classify and quantify clustering algorithms based on their consistency across applications. One of the most popular methods is the k-means clustering algorithm, an unsupervised algorithm used to segment the area of interest from the background. Enhanced k-means clustering is used to improve the accuracy and efficiency of the k-means algorithm. The number of clusters is varied for the fuzzy c-means algorithm. Subtractive clustering is used to generate the initial centers, which are then used in the k-means algorithm to segment the image. A genetic algorithm is used to find the centroids for a given number K of clusters (GAKM). GAKM is good for complex problems because it retains the best features. The results show that the accuracy and performance of GAKM are better than those of simple k-means and the other clustering algorithms.

4

Rosmayati, Mohemad, Naziah Mohd Muhait Nazratul, Maizura Mohamad Noor Noor, and Ali Othman Zulaiha. "Performance analysis in text clustering using k-means and k-medoids algorithms for Malay crime documents." International Journal of Electrical and Computer Engineering (IJECE) 12, no. 5 (2022): 5014–26. https://doi.org/10.11591/ijece.v12i5.pp5014-5026.

Abstract:

Few studies on text clustering for the Malay language have been conducted due to some limitations that need to be addressed. The purpose of this article is to compare the k-means and k-medoids clustering algorithms, using Euclidean distance as the similarity measure, to determine which method is best for clustering documents. Both algorithms are applied to 1,000 documents pertaining to housebreaking crimes involving a variety of modus operandi. The comparison indicates that the k-means algorithm performed best at clustering the relevant documents, with a 78% accuracy rate. K-means also achieves better cluster-evaluation performance than k-medoids on the average within-cluster distance. However, k-medoids performs exceptionally well on the Davies-Bouldin index (DBI). Furthermore, the accuracy of k-means depends on the number of initial clusters, where the appropriate cluster number can be determined using the elbow method.
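Illustrative note: the elbow method mentioned in this abstract is usually applied by eye to a plot of within-cluster sum of squares (WCSS) against k. The sketch below shows one simple numerical stand-in, picking the k at which the curve bends most sharply (largest second difference). The function name `elbow_k` and the WCSS values are assumptions made up for illustration; they are not taken from the article.

```python
def elbow_k(wcss):
    """Return the k with the sharpest bend in the WCSS curve.

    wcss[i] is the within-cluster sum of squares for k = i + 1.
    The bend at k is measured as the drop before k minus the drop after k.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = (wcss[i - 1] - wcss[i]) - (wcss[i] - wcss[i + 1])
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Hypothetical WCSS values for k = 1..5 (not from the article).
wcss = [100.0, 60.0, 20.0, 18.0, 17.0]
chosen_k = elbow_k(wcss)
```

With these made-up values the curve flattens after k = 3, which is the value the heuristic selects. Other elbow criteria exist and can disagree on noisy curves.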

5

Sihombing, Pardomuan Robinson, Yoshep Paulus Apri Caraka Yuda, Busminoloan Busminoloan, and Iis Hayyun Nurul Islam. "KOMPARASI PERFORMA K-MEANS DAN FUZZY C-MEANS." Jurnal Bayesian : Jurnal Ilmiah Statistika dan Ekonometrika 2, no. 2 (2022): 125–32. http://dx.doi.org/10.46306/bay.v2i2.35.

Abstract:

This study compares the performance of the K-Means clustering method with Fuzzy C-Means. The data are the Inclusive Economic Development Index (IEDI) for 34 provinces in Indonesia in 2021, sourced from Bappenas. The optimum number of clusters suggested by the Elbow method is 4. Judged by the silhouette value, the K-Means method is as good as Fuzzy C-Means. However, the K-Means method is better than the Fuzzy C-Means model on the criteria of smaller AIC and BIC values and a larger R². The provinces of Papua and West Papua have negative cluster mean values for all variables, indicating that they still lag on all pillars of the IEDI. On the other hand, the provinces of DI Yogyakarta and DKI Jakarta have positive cluster mean values for all variables, so they are said to be good in terms of the economy, opportunities, and access but still have high inequality and poverty. Comprehensive and targeted policies are needed so that inclusive economic development in Indonesia can be evenly distributed and increase every year.

6

Zahir, Zainuddin, and Alviadi Nur Risal Andi. "Balanced clustering for student admission school zoning by parameter tuning of constrained k-means." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 2 (2024): 2301–13. https://doi.org/10.11591/ijai.v13.i2.pp2301-2313.

Abstract:

The Indonesian government issued a regulation through the Ministry of Education and Culture, number 51 of 2018, which contains zoning rules to improve the quality of education in school educational institutions. This research aims to compare the performance of the k-means algorithm with the constrained k-means algorithm to model the zoning of each school area based on the shortest distance parameter between the school location and the domicile of prospective students. The study used data from 2,248 prospective students and 22 public school locations. The results of testing the k-means algorithm in grouping showed the formation of non-circular patterns in the cluster membership with different numbers of centroid cluster members. In contrast, testing the constrained k-means algorithm showed balanced outcomes in cluster membership with a membership value of 103 for each school as the cluster center. The research findings state that the developed constrained k-means algorithm solves the problem of unbalanced data clustering and overlapping issues in the process of new student admissions. In other words, the constrained k-means algorithm can be a reference for the government in making decisions on new student admissions.

7

Pham, D. T., S. S. Dimov, and C. D. Nguyen. "An Incremental K-means algorithm." Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 218, no. 7 (2004): 783–95. http://dx.doi.org/10.1243/0954406041319509.

Abstract:

Data clustering is an important data exploration technique with many applications in engineering, including parts family formation in group technology and segmentation in image processing. One of the most popular data clustering methods is K-means clustering because of its simplicity and computational efficiency. The main problem with this clustering method is its tendency to converge at a local minimum. In this paper, the cause of this problem is explained and an existing solution involving a cluster centre jumping operation is examined. The jumping technique alleviates the problem with local minima by enabling cluster centres to move in such a radical way as to reduce the overall cluster distortion. However, the method is very sensitive to errors in estimating distortion. A clustering scheme that is also based on distortion reduction through cluster centre movement but is not so sensitive to inaccuracies in distortion estimation is proposed in this paper. The scheme, which is an incremental version of the K-means algorithm, involves adding cluster centres one by one as clusters are being formed. The paper presents test results to demonstrate the efficacy of the proposed algorithm.
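Illustrative note: the local-minimum behaviour described in this abstract can be reproduced with the standard K-means iteration on a four-point toy set. The sketch below is plain Lloyd-style K-means, not the incremental scheme proposed in the paper. The names `kmeans` and `distortion`, the toy points, and both sets of starting centres are assumptions chosen for illustration.

```python
import math


def kmeans(points, centres, max_iter=100):
    """Standard K-means on tuples of floats, started from the given centres."""
    centres = [tuple(c) for c in centres]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [
            min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for j, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centres.append(old)  # leave an empty cluster's centre in place
        if new_centres == centres:
            break  # converged: no centre moved
        centres = new_centres
    return centres, labels


def distortion(points, centres, labels):
    """Sum of squared distances from each point to its assigned centre."""
    return sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))


# Two tight pairs of points, far apart horizontally.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

good = kmeans(points, [(0.0, 0.5), (10.0, 0.5)])  # centres start near each pair
poor = kmeans(points, [(5.0, 0.0), (5.0, 1.0)])   # centres start between the pairs

good_distortion = distortion(points, *good)
poor_distortion = distortion(points, *poor)
```

Both runs converge, but the second start settles on a split into top and bottom rows whose distortion is far higher than that of the left/right split. The outcome of the basic iteration depends on where the centres start.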

8

Gubu, La, Edi Cahyono, Arman, Herdi Budiman, and Muh Kabil Djafar. "CLUSTER ANALYSIS FOR MEAN-VARIANCE PORTFOLIO SELECTION: A COMPARISON BETWEEN K-MEANS AND K-MEDOIDS CLUSTERING." Jurnal Riset dan Aplikasi Matematika (JRAM) 7, no. 2 (2023): 104–15. https://doi.org/10.26740/jram.v7n2.p104-115.

Abstract:

This paper presents the Mean-Variance (MV) portfolio selection using cluster analysis. Stocks are categorized into various clusters using K-Means and K-Medoids clustering. Based on the Sharpe ratio, a stock from each cluster is chosen to represent that cluster. Stocks with the greatest Sharpe ratio are those that are chosen for each cluster. With the guidance of the MV portfolio model, the optimum portfolio is identified. When there are many stocks included in the formation of the portfolio, we may efficiently create the optimal portfolio using this method. For the empirical study, the daily return of stocks traded on the Indonesia Stock Exchange that are part of the LQ-45 index from August 2022 to January 2023 was used to establish the weight of the portfolio, while the fundamental data of LQ-45 stocks for 2022 were used to build clusters. Using K-Means and K-Medoids clustering, this study's results show that LQ-45 stocks are divided into six groups. Additionally, it is obtained that for risk aversion , portfolio performance with K-Means clustering is better than portfolio performance with K-Medoids clustering. In contrast, for risk aversion , portfolio performance with K-Medoids clustering is better than portfolio performance with K-Means clustering.
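Illustrative note: the selection step described in this abstract, choosing from each cluster the stock with the greatest Sharpe ratio, can be sketched as follows. The tickers, returns, and cluster assignments are hypothetical, the risk-free rate is assumed to be zero, and the function names are made up for illustration. The article's own data and its mean-variance optimisation step are not reproduced here.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def pick_representatives(returns_by_stock, cluster_of):
    """For each cluster, keep the stock with the highest Sharpe ratio."""
    best = {}
    for stock, returns in returns_by_stock.items():
        cluster = cluster_of[stock]
        score = sharpe_ratio(returns)
        if cluster not in best or score > best[cluster][1]:
            best[cluster] = (stock, score)
    return {cluster: stock for cluster, (stock, _) in best.items()}


# Hypothetical tickers, daily returns, and cluster labels.
returns_by_stock = {
    "AAA": [0.01, 0.02, 0.01, 0.02],
    "BBB": [0.03, -0.02, 0.04, -0.03],
    "CCC": [0.00, 0.01, 0.00, 0.01],
}
cluster_of = {"AAA": 0, "BBB": 0, "CCC": 1}

representatives = pick_representatives(returns_by_stock, cluster_of)
```

In cluster 0 the steadier stock "AAA" is chosen over the more volatile "BBB", even though "BBB" has the largest single-day return, because the Sharpe ratio penalises variability.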

9

Tri Gustiane, Indri, Martanto Martanto, and Tati Suprapti. "CLUSTERING HASIL CEK DARAH DIABETES LANSIA MENGGUNAKAN METODE K-MEANS DI POSBINDU KP. LEBAKJERO DESA CIHERANG." JATI (Jurnal Mahasiswa Teknik Informatika) 8, no. 2 (2024): 2125–29. http://dx.doi.org/10.36040/jati.v8i2.9281.

Abstract:

This study aims to analyze the blood test results of elderly people with diabetes using the K-Means method. Diabetes is a metabolic disease characterized by high blood sugar levels (hyperglycemia) caused by a lack of insulin or by insulin being ineffective in regulating glucose metabolism. Other factors also contribute to diabetes, such as heredity, body weight, age, and blood pressure. Diabetes is a chronic disease that commonly occurs in the elderly and requires regular monitoring to manage their condition. The K-Means method is used to group the elderly into different categories based on their blood characteristics. K-Means Clustering is a data mining method that works by finding and grouping data with similar characteristics; the data grouped together are not identical data but data that share the same characteristics. Applying the K-Means Clustering method can help the Posbindu Kp. Lebakjero Desa Ciherang. The data are clustered to identify the elderly with the highest diabetes levels at Posbindu Kp. Lebakjero Desa Ciherang. The attributes used in the clustering are Name, Sex, Age, and Blood Test Result. The results of the analysis can help health workers design more specific and effective interventions for managing diabetes in the elderly population. The K-Means Clustering result, supported by a DBI value of -0.597, comprises 6 clusters: cluster0 with 57 members, cluster1 with 24, cluster2 with 30, cluster3 with 23, cluster4 with 44, and cluster5 with 25, with the most optimal result in cluster0, namely 57. Cluster0 contains 57 elderly people; the cluster result is that Kp. Lebakjero has the most elderly people and the highest diabetes levels. In addition, this study also aims to achieve accurate results for the data produced at Posbindu Kp. Lebakjero Desa Ciherang.

10

Rifa, Isna Hidayatur, Hasih Pratiwi, and Respatiwulan Respatiwulan. "CLUSTERING OF EARTHQUAKE RISK IN INDONESIA USING K-MEDOIDS AND K-MEANS ALGORITHMS." MEDIA STATISTIKA 13, no. 2 (2020): 194–205. http://dx.doi.org/10.14710/medstat.13.2.194-205.

Abstract:

An earthquake is the shaking of the earth's surface due to shifts in the earth's plates. This disaster often happens in Indonesia because the country lies on the three largest plates in the world and nine smaller ones, which meet in one area to form a complex plate arrangement. The impact of an earthquake depends on its magnitude and depth. This research was therefore conducted to classify earthquake data in Indonesia by magnitude and depth using the data mining technique known as clustering, applying the k-medoids and k-means algorithms. K-medoids groups data into clusters with a medoid as the cluster center, here using the Clustering Large Applications (CLARA) algorithm, while k-means divides data into k clusters in which each object belongs to the cluster with the closest mean. The results showed that the best clustering for earthquake data in Indonesia based on magnitude and depth is given by the CLARA algorithm, which found five clusters with 2231, 1359, 914, 2392, and 199 objects for cluster 1 to cluster 5, respectively.
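Illustrative note: the distinction this abstract draws between a medoid and a mean can be seen on a single cluster. The sketch below computes both for a few made-up (magnitude, depth in km) pairs. The values and the function names `centroid` and `medoid` are assumptions for illustration only and are not taken from the study's data.

```python
import math


def centroid(points):
    """Coordinate-wise mean; it need not coincide with any data point."""
    return tuple(sum(x) / len(points) for x in zip(*points))


def medoid(points):
    """The data point with the smallest total distance to all the others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


# Hypothetical (magnitude, depth in km) pairs with one very deep event.
cluster = [(5.0, 10.0), (5.1, 11.0), (4.9, 12.0), (5.2, 13.0), (6.0, 300.0)]

c = centroid(cluster)
m = medoid(cluster)
```

The single deep event drags the mean depth to roughly 69 km, a value no event in the cluster has, while the medoid remains one of the shallow events. This is the usual argument for k-medoids being less sensitive to outliers than k-means.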

11

Setiadi, Anggun, and Erma Delima Sikumbang. "K-Means Clustering Dalam Penerimaan Karyawan Baru." INFORMATICS FOR EDUCATORS AND PROFESSIONAL : Journal of Informatics 4, no. 2 (2020): 103. http://dx.doi.org/10.51211/itbi.v4i2.1304.

Abstract:

In hiring new employees, the HR department of PT. Erdikha Elit Sekuritas has difficulty grouping new-employee data, and there is no test system for selecting new employees. K-Means Clustering is a non-hierarchical cluster analysis method that seeks to group existing data into one or more clusters; it is therefore well suited to grouping data on prospective new employees. The method was implemented using RapidMiner software, with results of 0.125% for cluster 1 (2 new-employee records), 0.125% for cluster 2 (2 records), and 0.750% for cluster 3 (12 records). The new-employee selection strategy will follow the cluster containing the most data among the 3 clusters formed, namely cluster 3, because the cluster with the most data is the one that best meets the required criteria. Keywords: K-Means Clustering, acceptance of new employees.

12

Muhamed, Lekaa, and Hayder Mohammed. "On Clustering Scheme for Kernel K-Means." Journal of Al-Rafidain University College For Sciences (Print ISSN: 1681-6870, Online ISSN: 2790-2293), no. 1 (October 1, 2021): 544–54. http://dx.doi.org/10.55562/jrucs.v46i1.106.

Abstract:

Cluster analysis is mainly concerned with dividing a number of data elements into clusters such that observations in the same cluster are homogeneous and are not homogeneous with those in other clusters. In the case of nonparametric data, however, classical estimation cannot be used because it gives misleading results, which has led to the adoption of efficient estimation methods known as kernel methods. One family of clustering methods is non-hierarchical clustering, which aims to divide the dataset into k homogeneous cluster groups based on the idea of the central tendency of the cluster group, using k averages. There are many methods of non-hierarchical clustering; some depend on the arithmetic mean, and others depend on the median or mode.

13

Simoes, Stanley, Deepak P, and Muiris MacCarthaigh. "Towards Fairer Centroids in K-means Clustering." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (2024): 21583–91. http://dx.doi.org/10.1609/aaai.v38i19.30156.

Abstract:

There has been much recent interest in developing fair clustering algorithms that seek to do justice to the representation of groups defined along sensitive attributes such as race and sex. Within the centroid clustering paradigm, these algorithms are seen to generate clusterings where different groups are disadvantaged within different clusters with respect to their representativity, i.e., distance to centroid. In view of this deficiency, we propose a novel notion of cluster-level centroid fairness that targets the representativity unfairness borne by groups within each cluster, along with a metric to quantify the same. Towards operationalising this notion, we draw on ideas from political philosophy aligned with consideration for the worst-off group to develop Fair-Centroid; a new clustering method that focusses on enhancing the representativity of the worst-off group within each cluster. Our method uses an iterative optimisation paradigm wherein an initial cluster assignment is refined by reassigning objects to clusters such that the worst-off group in each cluster is benefitted. We compare our notion with a related fairness notion and show through extensive empirical evaluations on real-world datasets that our method significantly enhances cluster-level centroid fairness at low impact on cluster coherence.

14

Al Rivan, Muhammad Ezar, and Randy Andreo Sonaru. "Perbandingan Metode K-Means dan GA K-Means untuk Clustering Dataset Heart Disease Patients." JATISI (Jurnal Teknik Informatika dan Sistem Informasi) 9, no. 3 (2022): 2585–97. http://dx.doi.org/10.35957/jatisi.v9i3.2799.

Abstract:

Heart disease is a condition in which the heart, a vital human organ, is impaired and does not function properly. It is the deadliest disease in the world and the leading cause of death globally, with about 17.9 million deaths per year. In this study, data on patients diagnosed with heart disease were clustered to examine the characteristics and similarities of the patients. The dataset used is the Heart Disease Patients dataset, comprising 303 patient medical records with 11 attributes or features. The K-Means and GA K-Means methods were used for clustering. A genetic algorithm was used to optimize the initial centroids for K-Means clustering. The results were evaluated by recording the iterations, inter-cluster values, and intra-cluster values of each clustering method. The genetic algorithm was able to optimize the K-Means method, as seen from the average number of iterations falling from 13.4 to 12.5 and the maximum number of iterations falling from 21 to 17. Based on the inter-cluster and intra-cluster calculations, the intra-cluster result of GA K-Means is better than that of K-Means, while the inter-cluster difference is very small, with the average inter-cluster value of K-Means slightly better than that of GA K-Means.

15

Dwitiyanti, Nurfidah, Siti Ayu Kumala, and Shinta Dwi Handayani. "Comparative Study of Earthquake Clustering in Indonesia Using K-Medoids, K-Means, DBSCAN, Fuzzy C-Means and K-AP Algorithms." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 8, no. 6 (2024): 768–78. https://doi.org/10.29207/resti.v8i6.5514.

Abstract:

Indonesia's frequent earthquakes, caused by its position at the convergence of multiple tectonic plates, necessitate precise seismic zone identification to improve disaster preparedness. This research evaluates the effectiveness of five clustering algorithms, namely K-Medoids, K-Means, DBSCAN, Fuzzy C-Means, and K-Affinity Propagation (K-AP), for analyzing earthquake data from January 2017 to January 2023. Using a dataset from BMKG encompassing 13,860 seismic events, each algorithm was assessed based on Silhouette Score and Cluster Purity metrics. Results indicated that K-Means provided the best balance, forming six clusters with a Silhouette Score of 0.3245 and Cluster Purity of 0.7366, making it the most suitable for seismic zone analysis. K-Medoids closely followed with a Silhouette Score of 0.3158 and Cluster Purity of 0.7190. Although DBSCAN effectively handled noise, its negative Silhouette values indicated poor clustering quality. Fuzzy C-Means and K-AP underperformed, with K-AP generating an impractically high number of clusters (196) and the lowest Silhouette Score (0.2550). This study offers a novel, comprehensive comparison of clustering algorithms for Indonesian earthquake data, emphasizing a dual-metric evaluation approach. By identifying K-Means as the most effective algorithm, it provides valuable insights for disaster mitigation and seismic risk analysis.
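Illustrative note: the Silhouette Score used in this abstract ranges from -1 to 1, and negative values mean that points sit, on average, closer to another cluster than to their own. The sketch below computes the mean silhouette coefficient with Euclidean distance on a made-up one-dimensional example. The function name and toy data are assumptions for illustration and are unrelated to the BMKG dataset.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is included in the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
natural = silhouette_score(points, [0, 0, 1, 1])    # neighbours grouped together
scrambled = silhouette_score(points, [0, 1, 0, 1])  # neighbours split apart
```

The natural grouping scores close to 0.9, while the scrambled labelling scores -0.45, which is the kind of negative value the abstract reads as poor clustering quality.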

16

Fauziyah, Wardah Muna, and Anneke Iswani Achmad. "Penerapan Analisis Cluster Hybrid untuk Pengelompokan Kabupaten/Kota di Provinsi Jawa Barat Berdasarkan Indikator Kemiskinan Tahun 2022." Bandung Conference Series: Statistics 3, no. 2 (2023): 566–74. http://dx.doi.org/10.29313/bcss.v3i2.8610.

Abstract:

Hybrid cluster analysis combines hierarchical and non-hierarchical clustering and serves as an alternative method. Its advantage is that the number of clusters k for the non-hierarchical step can be determined from the results of the hierarchical step, which yields an appropriate k. Exploiting this advantage, this research combines the single linkage method with k-means, and Ward's method with k-means. The purpose of the study was to determine the grouping of districts/cities in West Java Province with the most optimal combined method based on poverty in 2022. Hybrid cluster analysis combining k-means and single linkage yielded 4 clusters, while the combination of k-means and Ward's method yielded 3 clusters. Of the two, the combination of k-means and single linkage is the best or most optimal method, with the smallest standard deviation ratio, 88.38%.

17

Belhaouari, Samir Brahim, Shahnawaz Ahmed, and Samer Mansour. "Optimized K-Means Algorithm." Mathematical Problems in Engineering 2014 (2014): 1–14. http://dx.doi.org/10.1155/2014/506480.

Abstract:

The localization of the region of interest (ROI), which contains the face, is the first step in any automatic recognition system and is a special case of face detection. However, face localization from an input image is a challenging task due to possible variations in location, scale, pose, occlusion, illumination, facial expressions, and cluttered background. In this paper we introduce a new optimized k-means algorithm that finds the optimal centers for each cluster, corresponding to the global minimum of the k-means objective. This method was tested to locate the faces in the input image based on image segmentation. It separates the input image into two classes: faces and nonfaces. To evaluate the proposed algorithm, the MIT-CBCL, BioID, and Caltech datasets are used. The results show significant localization accuracy.

18

Indraputra, R. A., and Rina Fitriana. "K-Means Clustering Data COVID-19." JURNAL TEKNIK INDUSTRI 10, no. 3 (2020): 275–82. http://dx.doi.org/10.25105/jti.v10i3.8428.

Abstract:

The COVID-19 pandemic is an event that has generated a great deal of data that are difficult to process. Crucial data such as the number of confirmed infections, deaths, and recoveries can be obtained from databases such as Kaggle; however, these data need further processing to become useful. The purpose of this research is to obtain and process the COVID-19 data available on Kaggle using a data mining method, namely K-Means Clustering. To process big data such as this, the data mining technique of clustering can be used. For K-Means Clustering in this research, three tools are used to process the data: Microsoft Excel and the data mining software Weka and KNIME. The data processing yields two clusters, in which cluster 2 has a higher number of confirmed cases and deaths than cluster 1; the regions in that cluster therefore need priority in handling.

19

Fujiwara, Yasuhiro, Atsutoshi Kumagai, Yasutoshi Ida, Masahiro Nakano, Makoto Nakatsuji, and Akisato Kimura. "Efficient Algorithm for K-Multiple-Means." Proceedings of the ACM on Management of Data 2, no. 1 (2024): 1–26. http://dx.doi.org/10.1145/3639273.

Abstract:

K-Multiple-Means is an extension of K-means for the clustering of multiple means used in many applications, such as image segmentation, load balancing, and blind-source separation. Since K-means uses only one mean to represent each cluster, it fails to capture non-spherical cluster structures of data points. However, since K-Multiple-Means represents the cluster by computing multiple means and grouping them into specified c clusters, it can effectively capture the non-spherical clusters of the data points. To obtain the clusters, K-Multiple-Means updates a similarity matrix of a bipartite graph between the data points and the multiple means by iteratively computing the leading c singular vectors of the matrix. K-Multiple-Means, however, incurs a high computation cost for large-scale data due to the iterative SVD computations. Our proposal, F-KMM, increases the efficiency of K-Multiple-Means by computing the singular vectors from a smaller similarity matrix between the multiple means obtained from the similarity matrix of the bipartite graph. To compute the similarity matrix of the bipartite graph efficiently, we skip unnecessary distance computations and estimate lower bounding distances between the data points and the multiple means. Theoretically, the proposed approach guarantees the same clustering results as K-Multiple-Means since it can exactly compute the singular vectors from the similarity matrix between the multiple means. Experiments show that our approach is several orders of magnitude faster than previous clustering approaches that use multiple means.

20

Shanthi, K., and M. Sivabalakrishnan. "Performance Analysis of Improved K-Means & K-Means in Cluster Generation." International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering 03, no. 09 (2014): 11878–84. http://dx.doi.org/10.15662/ijareeie.2014.0309049.

21

Deng, Ai Ping, Ben Xiao, and Hui Yong Yuan. "Adaptive K-Means Algorithm with Dynamically Changing Cluster Centers and K-Value." Advanced Materials Research 532-533 (June 2012): 1373–77. http://dx.doi.org/10.4028/www.scientific.net/amr.532-533.1373.

Abstract:

To address two drawbacks of the K-means algorithm, namely that the number of clusters must be known in advance and that the result is sensitive to the choice of initial cluster centers, an improved K-means algorithm is proposed in which the cluster centers and the number of clusters change dynamically. The new algorithm determines the cluster centers by calculating the density of data points and the shared nearest neighbor similarity, and controls the number of clusters using the average shared nearest neighbor self-similarity. Experimental results on the IRIS test data set show that the algorithm can select the cluster centers and can distinguish between different types of clusters efficiently.

22

Akbari, Gumilar, and Yusrila Kerlooza. "Peningkatan Hasil Cluster Menggunakan Algoritma Dynamic K-means dan K-means Binary Search Centroid." Jurnal Tata Kelola dan Kerangka Kerja Teknologi Informasi 4, no. 1 (2018): 25–33. http://dx.doi.org/10.34010/jtk3ti.v4i1.1395.

Abstract:

In the customer segmentation case study, the data used for segmentation have attributes based on Recency, Frequency, and Monetary values and comprise 500 records; clustering techniques can be used to form the customer segments. Clustering is the process of grouping data points into a number of clusters. One clustering approach is partitional clustering; the clustering algorithms used in this study are Dynamic K-means (DK) and K-means Binary Search Centroid (KBSC). Dynamic K-means can find the number of clusters but is weak at determining the centroids (cluster centers), whereas KBSC can determine the cluster centroids but is weak at finding the number of clusters. This study combines the DK and KBSC algorithms and tests the combination on synthetic model data, to examine the characteristics of the algorithm, and on the case-study data, to assess its ability to solve the customer segmentation problem. As measured by the Davies-Bouldin Index (DBI), the combined DK-KBSC algorithm yields a better DBI value than the other algorithms when applied to the customer segmentation data.

23

Litvinenko, Natalya, Orken Mamyrbayev, Assem Shayakhmetova, and Mussa Turdalyuly. "Clusterization by the K-means method when K is unknown." ITM Web of Conferences 24 (2019): 01013. http://dx.doi.org/10.1051/itmconf/20192401013.

Abstract:

Various methods of clustering objects are used in different areas of machine learning. Among the vast number of clustering methods, K-means is one of the most popular. The method has both pros and cons. Its main advantage is the rather high speed at which it clusters objects. Its main disadvantage is the need to know the number of clusters before the experiment. This paper describes a new clustering method based on K-means. The suggested method is also quite fast, but it does not require the user to know the exact number of clusters in advance; the user only has to define the range within which the number of clusters lies. In addition, the suggested method makes it possible to limit the radius of clusters, which allows finding the objects that express the criteria of a cluster most distinctly and accurately, and to limit the number of objects in each cluster to a certain range.

24

Anggraini, Rahayu, Elin Haerani, Jasril Jasril, and Iis Afrianty. "Pengelompokkan Penyakit Pasien Menggunakan Algoritma K-Means." JURIKOM (Jurnal Riset Komputer) 9, no. 6 (2022): 1840. http://dx.doi.org/10.30865/jurikom.v9i6.5145.

Abstract:

Health is one of the most important factors alongside education and income, and everyone has the same right to good health services. In Indonesia, the puskesmas (community health center) is the government agency that serves everyone who needs medical services. The Ujung Batu Health Center, located in Ujung Batu sub-district, Rokan Hulu Regency, is one such agency. It stores patient medical record data but only sorts the records by disease. The medical record data therefore need to be processed by clustering with the K-Means method. This algorithm partitions the data into clusters so that records with the same characteristics fall into the same cluster and records with different characteristics fall into other clusters. The data consist of 3875 records and 5 attributes: Gender, Participant Type, Diagnosis, Return Status, and Address. In testing with the K-means algorithm, cluster 1 contains 710 records and cluster 2 contains 3165 records. The study shows that 2 clusters is the best choice, with a Silhouette Coefficient (SC) of 0.646.

25

Mohan Raj, C. S., and V. Srikanth. "K-Means and Fuzzy C-Means Algorithm for Mammogramy Image Segmentation." Sangrathan Journal, UGC Care Listed Journal 4, no. 1 (2024): 203–15. https://doi.org/10.5281/zenodo.11000974.

Abstract:

One of the foremost challenges in image analysis is image segmentation. Most medical applications involve trained operators extracting images from targeted regions that may be physically distinct but statistically indistinguishable. Image segmentation is also time-consuming, has poor reproducibility, and is often subject to manual errors and biases. Identifying clusters in the given data is another challenge during clustering. K-means is a widely used clustering technique that divides the data into K different clusters. In this strategy, clusters are specified in advance, which depends heavily on the early discovery of items that accurately reflect the clusters. To make clusters independent of the initial identification of cluster representatives, several clustering researchers have concentrated on enhancing the clustering process. The method proposed in this paper is an adaptive technique that grows the clusters without an initial selection of elements representing each cluster. It is found to be capable of segmenting regions with smoothly varying intensity distributions, and it produces a noticeable speedup in the search process.

26

Orisa, Mira. "Optimasi Cluster pada Algoritma K-Means." Prosiding SENIATI 6, no. 2 (2022): 430–37. http://dx.doi.org/10.36040/seniati.v6i2.5034.

Abstract:

The evaluation methods used are internal methods. Internal methods evaluate a clustering by examining how far apart the clusters are and how compact they are. The data are clustered using the K-Means algorithm. K-Means has a weakness in how the initial centroids are determined: they are chosen at random for the selected k clusters, so the output depends on that initial choice. K-Means must therefore be run repeatedly to obtain an optimal clustering. The best number of clusters for K-Means can be determined with internal methods such as the Elbow method, the Davies-Bouldin Index (DBI), and the Silhouette Index (SI). The Elbow method is an internal evaluation technique that assesses clusters using the Sum of Squared Errors (SSE). The Davies-Bouldin Index assesses clusters using the Sum of Squares Within Cluster (SSW) and the Sum of Squares Between Clusters (SSB). The Silhouette Index is based on computing a coefficient value. With the Elbow method, the optimal number of clusters is 3, with the elbow point at k = 3. With the Davies-Bouldin Index and the Silhouette Index, the optimal number of clusters is 2: the lowest DBI value, 0.3228986726354396, occurs at k = 2, and the SI value closest to 1, 0.894, also occurs at k = 2.
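
As a rough illustration of two of the internal measures named in this abstract, the Python sketch below computes the SSE used by the Elbow method and a Davies-Bouldin index for a labeled set of points. It is not the author's code: the function names and toy data are hypothetical, and the index follows the common textbook definition (mean distance to the centroid as the within-cluster scatter, centroid distance as the separation), which may differ in detail from the SSW/SSB formulation the paper describes. It assumes at least two clusters with distinct centroids.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters

def sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        c = centroid(members)
        total += sum(math.dist(p, c) ** 2 for p in members)
    return total

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid-distance ratio."""
    clusters = group_by_label(points, labels)
    cents = {lab: centroid(m) for lab, m in clusters.items()}
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in m) / len(m)
               for lab, m in clusters.items()}
    worst = []
    for i in clusters:
        worst.append(max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                         for j in clusters if j != i))
    return sum(worst) / len(worst)

# Toy data (hypothetical): two tight, well-separated clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
print(sse(points, labels), davies_bouldin(points, labels))
```

For an elbow plot, `sse` would be evaluated on the K-Means result for each candidate k and plotted against k; a lower Davies-Bouldin value indicates more compact, better separated clusters.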

27

Deng, Zilong, Yizhang Wang, and Mustafa Muwafak Alobaedy. "Federated k-means based on clusters backbone." PLOS One 20, no. 6 (2025): e0326145. https://doi.org/10.1371/journal.pone.0326145.

Abstract:

Federated clustering is a distributed clustering approach that does not require the transmission of raw data and is widely used. However, it struggles to handle Non-Independent and Identically Distributed (Non-IID) data effectively because accurate global consistency measures are difficult to obtain under Non-IID conditions. To address this issue, we propose a federated k-means clustering algorithm based on a cluster backbone, called FKmeansCB. First, we add Laplace noise to all the local data and run k-means clustering on the client side to obtain cluster centers, which faithfully represent the cluster backbone (i.e., the data structures of the clusters). The cluster backbone represents the client’s features and can approximately capture the features of data points with different labels in Non-IID situations. We then upload these cluster centers to the server. Subsequently, the server aggregates all cluster centers and runs the k-means clustering algorithm to obtain global cluster centers, which are then sent back to the client. Finally, the client assigns all data points to the nearest global cluster center to produce the final clustering results. We have validated the performance of our proposed algorithm using six datasets, including the large-scale MNIST dataset. Compared with the leading non-federated and federated clustering algorithms, FKmeansCB offers significant advantages in both clustering accuracy and running time.

28

Abdelrahman, Radwan, Abdellatif Nasser, Radwan Eyad, and Akhozahieh Maryam. "Fitness function X-means for prolonging wireless sensor networks lifetime." International Journal of Electrical and Computer Engineering (IJECE) 13, no. 1 (2023): 465–72. https://doi.org/10.11591/ijece.v13i1.pp465-472.

Abstract:

X-means and k-means are clustering algorithms proposed as a solution for prolonging wireless sensor network (WSN) lifetime. In general, X-means overcomes k-means limitations such as the predetermined number of clusters. The main concept of X-means is to create a network with basic clusters called parents and then generate a number (j) of child clusters by splitting the parents. X-means provides neither criteria for splitting parent clusters nor a method to determine the acceptable number of children. This article proposes fitness function X-means (FFX-means) as an enhancement of X-means; FFX-means has a new method that determines whether the parent clusters are worth splitting based on predefined network criteria, and then determines the number of children. Furthermore, FFX-means proposes a new cluster-head selection method, in which the cluster head is selected based on the remaining energy of the node and the intra-cluster distance. The simulation results show that FFX-means extends network lifetime by 11.5% over X-means and 75.34% over k-means. Furthermore, the results show that FFX-means balances the nodes’ energy consumption, and nearly all nodes depleted their energy within an acceptable range of simulation rounds.

29

Nugroho, Nursatio, and Faisal Dharma Adhinata. "Penggunaan Metode K-Means dan K-Means++ Sebagai Clustering Data Covid-19 di Pulau Jawa." Teknika 11, no. 3 (2022): 170–79. http://dx.doi.org/10.34148/teknika.v11i3.502.

Abstract:

Coronavirus disease (Covid-19) is an infectious disease that can be transmitted between animals and humans. The virus was identified in Wuhan, China, at the end of December 2019. Today the whole world is struggling to prevent and ultimately overcome the spread of the coronavirus. This study aims to cluster Covid-19 spread data for every regency in Java in order to produce clusters of zones where PPKM (community activity restrictions) should be enforced, based on positive cases and first and second vaccine doses. The K-Means method works by choosing the number of clusters (K), setting the cluster centers arbitrarily, assigning each data point to the cluster with the nearest center, recomputing the cluster centers, and repeating the assignment and update steps until no data point moves to a different cluster. K-Means++ works by randomly choosing the first cluster center from among the data points, assigning data to clusters by minimum distance to a centroid, updating each centroid to the mean of its cluster, and repeating the assignment and update steps until no point moves. The cases are categorized by the numbers of positive cases, recoveries, and deaths. After grouping and obtaining the clusters for each group, the quality of each clustering is evaluated using the silhouette coefficient to select the best one. The results are expected to reveal the extent of the spread of Covid-19 in every regency and city in Java, as well as the clustering with the highest Silhouette Coefficient score. In testing with the Silhouette Coefficient, K-Means gives 0.825 for K=3, 0.873 for K=4, and 0.862 for K=5; K-Means++ gives 0.822 for K=3, 0.865 for K=4, and 0.882 for K=5. The results show that K-Means++ is superior in conveying the extent of the spread of Covid-19, and the Silhouette Coefficient test is used to determine the optimal cluster quality.
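
The two procedures outlined in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names and the toy data are hypothetical, and the seeding step uses the standard k-means++ rule (each further center is drawn with probability proportional to the squared distance to the nearest center already chosen), which the abstract does not spell out. It assumes k does not exceed the number of distinct points.

```python
import math
import random

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: first center uniform at random, later centers weighted by D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest already-chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def lloyd(points, centers, max_iter=100):
    """Alternate assignment and mean-update steps until the assignments stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Toy data (hypothetical): two small, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
rng = random.Random(0)
labels, centers = lloyd(points, kmeans_pp_init(points, 2, rng))
print(labels, centers)
```

Plain K-Means as described in the abstract corresponds to calling `lloyd` with arbitrarily chosen starting centers, for example `rng.sample(points, 2)`; only the initialization differs between the two methods.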

30

Muthmainah, Sekar Ghaida, Asep Id Hadiana, and Melina Melina. "Comparative Analysis of K-Means and K-Medoids Clustering in Retail Store Product Grouping." International Journal of Quantitative Research and Modeling 5, no. 3 (2024): 280–94. https://doi.org/10.46336/ijqrm.v5i3.753.

Abstract:

The retail business is growing very rapidly amid increasing competition. Applying information technology is one strategy for understanding consumer purchasing patterns and grouping sales products. This research analyzes and compares the K-Means and K-Medoids clustering techniques on retail data based on the Davies-Bouldin Index (DBI) value and computing time. K-Means divides data into k clusters based on centroids, while K-Medoids uses medoids, which are actual data objects, as the cluster centers. Both methods yield an optimal number of 3 clusters. K-Means produced 358 records in Cluster 1, 292 in Cluster 2, and 367 in Cluster 3, with a DBI of 0.7160. K-Medoids produced 295 records in Cluster 1, 360 in Cluster 2, and 362 in Cluster 3, with a DBI of 0.7153. In addition, this study calculated the average computation time over 5 experiments: 0.024278/s for K-Means and 0.05719/s for K-Medoids. Based on the lower DBI, K-Medoids gives better clustering results, but K-Means is better in terms of computational efficiency. It is hoped that these results will provide valuable insights for retail businesses analyzing sales data.
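
The distinction between the two kinds of cluster center can be made concrete with a small Python sketch. The function names and data are hypothetical and Euclidean distance is assumed; the paper's own setup is not shown in the abstract.

```python
import math

def centroid(cluster):
    """Mean of the cluster; generally not one of the data points."""
    n = len(cluster)
    return tuple(sum(c) / n for c in zip(*cluster))

def medoid(cluster):
    """The cluster member with the smallest total distance to all members."""
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))

# Hypothetical cluster whose last point is an outlier.
cluster = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (4.0, 1.0), (15.0, 1.0)]
print(centroid(cluster))  # pulled toward the outlier
print(medoid(cluster))    # stays on a central data point
```

Because the medoid must be an existing record, a single extreme value shifts it far less than it shifts the mean, at the cost of the pairwise distance computations, which is consistent with the longer running time the abstract reports for K-Medoids.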

31

Wakhidah, Nur. "CLUSTERING MENGGUNAKAN K-MEANS ALGORITHM." Jurnal Transformatika 8, no. 1 (2010): 33. http://dx.doi.org/10.26623/transformatika.v8i1.45.

Abstract:

Classification is the process of organizing objects into groups whose members are similar in some way, and it is part of pattern recognition. The two types of classification are supervised and unsupervised classification. K-means is an unsupervised classification method that partitions data items into one or more clusters. K-means tries to model a dataset as groups such that the data items within a cluster share similar characteristics and differ from those in other groups.

32

Rouf, Abdur, Marita Qoritunnadyah, Hasyim Asyari, and Maysas Yafi Urrohman. "Clustering of Lecturer Performance Using K-Means." Journal of Informatics Development 3, no. 1 (2024): 27–33. https://doi.org/10.30741/jid.v3i1.1430.

Abstract:

Lecturers serve as professional educators and scientists whose primary roles are knowledge transformation, development, and dissemination in fields such as science, technology, and the arts through education, research, and community service. They play a critical role in fostering an educated generation, and as such, must maintain high levels of integrity in their work. The academic position of a lecturer often reflects their involvement in research, community service, and scientific publications, indicating a broad scope of expertise. This study aims to cluster lecturers based on their academic positions, research activities, community service, and number of publications, using secondary data from the Community Service Research Institute, UPT Academic Positions and Lecturer Certification, and UPT Publications. The clustering was conducted using a non-hierarchical k-means method, which resulted in three clusters: Cluster 1 with 26 members showing minimal productivity in the tridharma tasks, Cluster 2 with 6 members demonstrating high engagement, and Cluster 3 with 20 members with moderate involvement. These findings suggest that universities need to monitor and support lecturers in Cluster 1 to improve their contributions to education, research, and community service. This clustering provides insights that can guide universities in promoting a balanced and active academic environment.

33

Azizah, Anestasya Nur, Tatik Widiharih, and Arief Rachman Hakim. "Kernel K-Means Clustering untuk Pengelompokan Sungai di Kota Semarang Berdasarkan Faktor Pencemaran Air." Jurnal Gaussian 11, no. 2 (2022): 228–36. http://dx.doi.org/10.14710/j.gauss.v11i2.35470.

Abstract:

K-Means Clustering is a frequently used type of non-hierarchical cluster analysis, but it is weak at processing data that are not linearly separable (lacking clear boundaries) or that contain overlapping clusters, that is, cases where one cluster visually lies among the others. The Gaussian kernel function in Kernel K-Means Clustering can be used to handle data that are not linearly separable and have overlapping clusters. The difference between Kernel K-Means Clustering and K-Means lies in the input data, which must be mapped into a new dimension using a kernel function. The real data used are 47 rivers and 18 indicators of river water pollution from the Dinas Lingkungan Hidup (DLH, Environment Agency) of Semarang City for the first semester of 2019. The clustering results are evaluated using the Calinski-Harabasz, Silhouette, and Xie-Beni indexes. The goals of this study are to understand the steps of Kernel K-Means Clustering and to analyze its results for grouping rivers in Semarang City based on water pollution factors. The evaluation shows that the best number of clusters is K=4.
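
For reference, the standard kernel K-means formulation (not necessarily the notation used in this paper) evaluates the squared distance between a mapped point and a cluster mean entirely through kernel values, so the mapping itself never has to be computed. The symbols below are assumed for illustration: φ is the feature map, C_c is cluster c with mean m_c in feature space, and σ is the Gaussian kernel width.

```latex
\[
\lVert \phi(x_i) - m_c \rVert^2
  = K(x_i, x_i)
  - \frac{2}{\lvert C_c \rvert} \sum_{x_j \in C_c} K(x_i, x_j)
  + \frac{1}{\lvert C_c \rvert^{2}} \sum_{x_j \in C_c} \sum_{x_l \in C_c} K(x_j, x_l),
\qquad
K(x, y) = \exp\!\left( -\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}} \right)
\]
```

Each point is assigned to the cluster that minimizes this quantity, and the assignment step is repeated until the memberships stop changing. With the Gaussian kernel, the first term equals 1 for every point.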

34

Sri Fastaf, Chindy Ayudia, and Yuni Yamasari. "Analisa Pemetaan Kriminalitas Kabupaten Bangkalan Menggunakan Metode K-Means dan K-Means++." Journal of Informatics and Computer Science (JINACS) 3, no. 04 (2022): 534–46. http://dx.doi.org/10.26740/jinacs.v3n04.p534-546.

Abstract:

Crime is a common problem in everyday life, and Bangkalan Regency is no exception. Bangkalan is a regency of 18 subdistricts in which crime increases every year, particularly aggravated theft (curat) and motor vehicle theft (curanmor). Crime-prone areas therefore need to be grouped in order to provide information that helps the local police improve security in Bangkalan Regency. Using 10 datasets of crime types from the 18 subdistricts of Bangkalan Regency, this study compares two clustering methods, K-Means and K-Means++, to find the better method for mapping crime areas. The data comprise 492 of the total crime cases recorded in Bangkalan Regency in 2021. Before clustering, cluster validation is performed by determining the optimal number of clusters with the Elbow method. Across the 10 crime-type datasets, the two methods produce different mappings for 3 crime types: assault, fraud, and robbery. The two methods are then validated using the Silhouette Coefficient, whose values likewise differ for those 3 crime types. For K-Means and K-Means++ respectively, the Silhouette values are 0.1683 and 0.2314 for assault, 0.2243 and 0.2534 for fraud, and 0.4898 and 0.4057 for robbery. Based on the Silhouette Coefficient, K-Means++ gives better results for 2 crime types, while K-Means is better for 1.
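
The Silhouette Coefficient used for validation here can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function name and toy data are hypothetical, Euclidean distance is assumed, at least two clusters must be present, and singleton clusters are given a score of 0 by convention.

```python
import math

def silhouette_coefficient(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data (hypothetical): one-dimensional values in two groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
score = silhouette_coefficient(points, labels)
print(score)
```

Values near 1 indicate compact, well-separated clusters, while values near 0, such as the 0.17 to 0.25 range reported for assault and fraud, indicate clusters that overlap considerably.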

35

Saha, Jayasree, and Jayanta Mukherjee. "CNAK: Cluster number assisted K-means." Pattern Recognition 110 (February 2021): 107625. http://dx.doi.org/10.1016/j.patcog.2020.107625.

36

Salamah, Salamah, Dahlan Abdullah, and Nurdin Nurdin. "Comparative Analysis of K-Means and K-Medoids to Determine Study Programs." International Journal of Engineering, Science and Information Technology 5, no. 1 (2024): 167–76. https://doi.org/10.52088/ijesty.v5i1.673.

Abstract:

Education is the main foundation for the advancement of civilization. A high level of education in society is directly proportional to the progress of that civilization. Higher education plays an important role in shaping quality human resources and contributing to community and national development. In today’s era of information and technology, data processing and analysis are key to understanding the development of study programs in higher education institutions. Clustering techniques are used to identify patterns and relationships in large and complex datasets, which are crucial in determining study programs at educational institutions. This research compares two popular clustering methods, K-Means and K-Medoids, to determine study programs. The data consist of the odd-semester grades of 87 third-year high school students, with 5 variables. The cluster information is based on the minimum academic criteria of 18 study programs representing 7 faculties at Malikussaleh University, grouped into 5 clusters. The clusters are evaluated using the Davies-Bouldin Index (DBI). The results indicate that the K-Means algorithm produces 5 clusters with 31, 5, 13, 26, and 17 members and a DBI value of 1.19010, while the K-Medoids algorithm produces 5 clusters with 33, 15, 17, 17, and 5 members and a DBI value of 1.27833. Based on the DBI value, the K-Means algorithm demonstrates better cluster quality than the K-Medoids algorithm.

37

Hayuningtyas, Ratih Yulia, and Ida Darwati. "CLUSTERING HASIL PANEN UBI KAYU MENGGUNAKAN ALGORITMA K-MEANS." Jurnal Teknik Informasi dan Komputer (Tekinkom) 7, no. 1 (2024): 25. https://doi.org/10.37600/tekinkom.v7i1.1327.

Abstract:

Cassava is a commodity with the potential to grow the country's economy, and it is a primary food requirement alongside rice and corn. This research discusses the grouping of cassava production in the Trenggalek area; the collected data are formed into groups, or clusters. Three clusters are created: high, medium, and low. Clustering is a data mining analysis method. This research uses the K-Means algorithm to cluster cassava production results. The research produced a high cluster of 3 items, a medium cluster of 1 item, and a low cluster of 10 items. These results show that many areas in Trenggalek still produce low quantities of cassava. This research can provide strategies or information for increasing cassava production in the future.

38

Nainggolan, Rena, Fenina Adline Twince Tobing, and Eva J. G. Harianja. "Sentiment; Clustering; K-Means Analysis Sentiment in Bukalapak Comments with K-Means Clustering Method." IJNMT (International Journal of New Media Technology) 9, no. 2 (2023): 87–92. http://dx.doi.org/10.31937/ijnmt.v9i2.2914.

Abstract:

Technological development is very fast in this era of globalization, and many aspects of it can be used to facilitate work as well as the flow of information. Computer technology is applied in various fields, such as education, entertainment, health, tourism, culinary, and so on. Clustering is one of the data mining techniques. It works by combining a number of data items or objects into one cluster, with the aim that the data in one cluster are as similar as possible to each other and different from the data or objects in other groups. K-Means Clustering can perform computations that are relatively fast and efficient when combining large amounts of data. In this research, there are 1407 comments, which are used as training data and testing data.

39

Ardhianto, Aan, Dwi Hartanti, and Joni Maulindar. "Implementasi Algoritma K-Means Untuk Rekomendasi Pengadaan Buku." Infotek: Jurnal Informatika dan Teknologi 8, no. 1 (2025): 160–71. https://doi.org/10.29408/jit.v8i1.28134.

Abstract:

One of the main challenges faced by libraries is determining the procurement of book collections that align with the needs and interests of borrowers. At the Sragen Regency Archives and Library Service, book procurement is often based on intuition or unstructured requests, resulting in many books that are less popular among visitors. This leads to a low number of book borrowings, meaning the library cannot provide optimal services. Based on this issue, the author attempts to cluster the book borrowing data for the year 2023 from the Sragen Regency Archives and Library Service by age group, book categories, number of borrowings, number of titles, and number of copies using data mining techniques with the k-means clustering algorithm. For the initial data processing, the author uses the Min-Max normalization method. After normalization, k-means is calculated with 3 clusters, followed by finding the optimal cluster using the elbow method, silhouette, and gap statistics. The results of the optimal cluster are compared with the results from the Dunn Index method. The research identifies three clusters: Cluster 1 contains book groups with low interest, consisting of 7 categories: General Works, Social Sciences, Language, Pure Sciences, Applied Sciences, Arts and Sports, History and Geography; Cluster 2 contains book groups with moderate interest, consisting of 2 categories: Philosophy and Psychology, Religion; and Cluster 3 contains the book group with the highest interest, consisting of 1 category: Literature
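
The Min-Max normalization step mentioned in this abstract rescales every attribute to the range [0, 1] so that attributes measured on large scales do not dominate the distance computation in k-means. A minimal sketch follows; the function name and sample rows are hypothetical and are not taken from the study's data.

```python
def min_max_normalize(rows):
    """Rescale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized

# Hypothetical rows: (number of borrowings, number of copies) per book category.
rows = [[10, 200], [20, 400], [30, 300]]
scaled = min_max_normalize(rows)
print(scaled)
```

The normalized rows, rather than the raw counts, would then be passed to the clustering step.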

40

Kamilah, Nur Azizah, Tatang Rohana, Rahmat Rahmat, and Ahmad Fauzi. "Implementasi Algoritma K-Means dan K-Medoids Dalam Klasterisasi Kasus Kekerasan Terhadap Perempuan." JURNAL MEDIA INFORMATIKA BUDIDARMA 8, no. 2 (2024): 810. http://dx.doi.org/10.30865/mib.v8i2.7558.

Abstract:

The number of cases of violence against women in Indonesia is increasing. In West Java alone, 58,395 cases of violence against women were recorded, among the highest of any province. This high number shows that violence against women is still not being handled seriously. Clustering is therefore carried out to reach a more structured solution that helps the government respond appropriately to the conditions of each region, so that case handling can be more focused. The aim of this research is to group the districts and cities of West Java by cases of violence against women into two clusters, high and low, using the K-Means and K-Medoids algorithms, and to find out which of the two algorithms performs better. The resulting clusters are expected to help the government and related agencies determine which districts or cities should be prioritized in handling cases of violence against women in West Java. The research produced 2 clusters: cluster 0 (high), with 14 districts and cities, and cluster 1 (low), with 13 districts and cities. Comparing the clustering evaluations of K-Means and K-Medoids, the best Silhouette Coefficient, 0.43, was obtained with the K-Medoids algorithm, while the best Davies-Bouldin Index (DBI), 0.95, was obtained with the K-Means algorithm.

41

Santoso, Dwi Budi, and Yuli Wahyuni. "ANALISIS SEGMENTASI PENJUALAN POMPA AIR MENGGUNAKAN ALGORITMA K-MEANS SEGMENTATION ANALYSIS OF WATER PUMP SALES USING K-MEANS ALGORITHM." Jurnal Aplikasi Bisnis dan Komputer 4, no. 2 (2024): 83–89. https://doi.org/10.33751/jubikom.v4i2.10696.

Abstract:

Sales segmentation is important in the water pump industry. This research aims to group water pump products based on their sales characteristics using the K-Means algorithm. The sales data used include product name, quantity sold, and total income. The research methods comprise exploratory data analysis (EDA), data preprocessing (outlier handling and data transformation), application of the K-Means algorithm, and cluster evaluation using the Inertia and Silhouette Score metrics. The results show that water pump products can be grouped into four clusters based on their sales characteristics: cluster 0 has high sales and income, cluster 1 has low sales and income, cluster 2 has medium to high sales with medium income, and cluster 3 has low sales but high income. The Silhouette Score evaluation shows that the resulting clustering is quite good, although not perfect. The segmentation results can give water pump companies valuable insight for developing more effective and targeted marketing strategies. Suggestions for further research are to carry out further analysis to identify other factors that can explain variation within each cluster and to consider other clustering methods. Keywords: Clustering, K-Means, Sales Segmentation, Water Pump
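
The two evaluation metrics named in this abstract, Inertia and Silhouette Score, can be sketched in a few lines. This is a minimal illustration using their standard definitions, not the authors' code; the function names and the toy (quantity sold, total income) points in the comments are assumptions made for this example, and `math.dist` requires Python 3.8 or later.

```python
import math

def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[l]))
        for p, l in zip(points, labels)
    )

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for k, other in clusters.items()
            if k != l
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: (quantity sold, total income) for four products.
# points = [(1, 10), (2, 12), (10, 100), (11, 98)]
# labels = [0, 0, 1, 1]
```

Lower inertia means tighter clusters, while a silhouette score close to 1 means points sit much closer to their own cluster than to the nearest other cluster.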

42

Br. Ginting, Rosa Lina, Relita Buaton, and Husnul Khair. "BPJS SERVICE DATA CLUSTERIZATION USING K-MEANS ALGORITHM." Journal of Engineering, Technology and Computing (JETCom) 2, no. 3 (2023): 155–64. https://doi.org/10.63893/jetcom.v2i3.131.

Abstract:

Employment BPJS is a program formed by the government to provide social protection to workers. Because of the large number of workers using, for example, the Death Insurance program (JKM), the program produces abundant and accumulating data. To gain insight into the BPJS service data, this study groups the data into clusters. One of the most widely used clustering methods is the K-Means algorithm. K-Means is a non-hierarchical grouping method that seeks to partition data into clusters so that data with the same characteristics are placed in the same cluster and data with different characteristics are placed in other clusters. The 20 records were grouped into 3 clusters: Cluster 1 has 3 records, Cluster 2 has 4 records, and Cluster 3 has 13 records. Cluster 1 contains male participants in the BPJS Old Age Guarantee (JHT) program who receive class III services. Cluster 2 contains male participants in the BPJS Death Insurance (JKM) program who receive class I services. Cluster 3 contains female participants in the BPJS Death Insurance (JKM) program who receive class II services.

43

R., Gowri, and Rathipriya R. "Protein Motif Comparator using PSO K-Means." International Journal of Applied Metaheuristic Computing 7, no. 3 (2016): 56–68. http://dx.doi.org/10.4018/ijamc.2016070104.

Abstract:

The main goal of this paper is to compare the motif information extracted from clusters and biclusters of proteins using the Motif Comparator. The clusters and biclusters are obtained using the PSO k-means algorithm. The functions of the proteins are preferably found from their motif information. The Motif Comparator is used to detect the clusters and biclusters, to locate the significant amino acids present, and to find the highly homologous cluster. The motif information acquired is based on the structural homogeneity of the protein sequences, which is evaluated from the similarity of the proteins' secondary structure.

44

Rahmadayanti, Fitria, Inda Anggraini, and Tri Susanti. "Pengklasterisasian Data Penyakit Hipertensi dengan Menggunakan Metode K-Means." Journal of Information System Research (JOSH) 4, no. 2 (2023): 737–41. http://dx.doi.org/10.47065/josh.v4i2.2905.

Abstract:

The number and variety of diseases suffered by the community are increasing due to lifestyle changes influenced by the progress of the times. Periodic disease data collection increases the accumulation of data. This often causes errors in the data search process, so that searching the data takes a long time. This study focuses on hypertension data and aims to cluster the data. The method used in this study is CRISP-DM, with its business understanding, data understanding, data preparation, modeling, evaluation, and deployment phases. The algorithm used for clustering is the K-Means algorithm. The study produced 2 clusters, namely cluster 0 (Normal) and cluster 1 (Hypertension), and these results can serve as information about the two groups.

45

Wahyuni, Sri Ngudi, Nazmun Nahar Khanom, and Yuli Astuti. "K-Means Algorithm Analysis for Election Cluster Prediction." JOIV : International Journal on Informatics Visualization 7, no. 1 (2023): 1. http://dx.doi.org/10.30630/joiv.7.1.1107.

Abstract:

The general election is a democratic process carried out in every country with a presidential system of government, including Indonesia, which holds it every five years. In practice, some people abstain, leading to wasted budget and missed targets. Thus, it is very important to identify clusters of general election districts and map the number of voters in order to plan the budget for the upcoming election. This process needs prediction as an early warning to help reduce budgeting risk. Based on the latest election data taken from Margokaton, Yogyakarta, Indonesia, many people voted in 2021, but the number of abstainers is high. In this case, cluster prediction is important to identify the election participants in each area. The K-Means algorithm could also predict abstainer areas in election activities to facilitate early mitigation in drafting election budgets. Therefore, this study aimed to identify the pattern of voters in the election using the K-means algorithm. The data parameters comprised the list of voters, unused ballot papers, and the number of abstainers. This study is important because it contributes to reducing the election budget of each area. The data, obtained from the official website of the Indonesian Ministry of Internal Affairs in 2021, were processed using the RapidMiner tool. The results showed more than 11% of the non-voters in cluster 1, 16% in cluster 2, and 8% in cluster 3. The cluster evaluation value is 2.04, indicating that the clustering using K-means is suitable, as shown by the DBI value close to 0. The results indicate that testing the cluster optimization of the K-Means algorithm using DBI is highly recommended. Based on this prediction result, the government needs to pay special attention to clusters with many abstainers in order to decrease the number of abstainers and prevent overbudgeting. These results indicate the need to review the election participant data in 2024. Furthermore, there is a need for continuous socialization and education about election activities to reduce the number of abstainers and prevent overbudgeting.
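
The DBI referred to in this abstract is the Davies-Bouldin index, for which lower values indicate more compact and better-separated clusters. The sketch below follows the standard textbook definition (mean distance to the centroid as the scatter, Euclidean distance between centroids as the separation); the study itself used RapidMiner, whose exact computation is not described here, so the function names and conventions are assumptions for illustration only.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points).

    Assumes at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    # scatter[i]: average distance of the members of cluster i to its centroid
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # worst-case similarity of cluster i to any other cluster
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k
```

Moving two clusters closer together while keeping their spread fixed raises the index, which is why the index is often compared across candidate values of k.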

46

Nur, Dahlia, Muhammad Fajri Raharjo, and Muhammad Fikriansyah Chaerul. "INFOGRAFIS KEPENDUDUKAN KOTA MAKASSAR MENGGUNAKAN ALGORITMA K-MEANS." Jurnal Teknologi Elekterika 20, no. 2 (2023): 64. http://dx.doi.org/10.31963/elekterika.v20i2.4488.

Abstract:

Makassar City is the fourth-largest city in Indonesia and the largest in Eastern Indonesia, with an area of 175.77 square km and a population of ±1.5 million people in 2019, divided into 15 districts and 143 sub-districts. Uneven population density across sub-districts contributes to social problems such as an unhealthy environment, chaotic building management, and population registration information that is not kept up to date. To overcome this problem, population infographics are needed that show the population size, age distribution, education, and so on. With infographic data on the population of Makassar city accessible online, people looking for data can use it as a reference. To achieve the objectives of this research, the K-means algorithm is used to divide observations into K clusters, where each observation is a member of the cluster with the closest mean, through an iterative process that continues until the grouping converges. The K-means algorithm can be applied to population data. For example, the population data are grouped into three clusters, namely dense (cluster 1), medium (cluster 2), and not dense (cluster 3). The results show that five districts, Mariso, Mamajang, Makassar, Bontoala, and Tallo, have the densest populations.
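
The iterative procedure described in this abstract (assign each observation to the cluster with the closest mean, recompute the means, repeat until nothing changes) is commonly known as Lloyd's algorithm. The following is a minimal sketch of it, not the authors' implementation; the function name, the explicit initial centroids, and the one-dimensional density values in the comments are hypothetical.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means starting from caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical one-dimensional densities for six districts:
# points = [(1.0,), (1.2,), (5.0,), (5.5,), (9.8,), (10.1,)]
# labels, cents = kmeans(points, [(1.0,), (5.0,), (9.8,)])
```

The result depends on the initial centroids, so practical implementations usually run several random initialisations and keep the best one.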

47

Kim, Dong-Hyun, and Gyemin Lee. "New link of multiple correspondence analysis and K-means cluster analysis." Journal of the Korean Data And Information Science Society 33, no. 6 (2022): 1043–52. http://dx.doi.org/10.7465/jkdi.2022.33.6.1043.

48

Pratama, Yogi, Evy Sulistianingsih, Naomi Nessyana Debataraja, and Nurfitri Imro’ah. "K-Means Clustering dan Mean Variance Efficient Portfolio dalam Portofolio Saham." Jambura Journal of Probability and Statistics 5, no. 1 (2024): 24–30. http://dx.doi.org/10.37905/jjps.v5i1.20298.

Abstract:

K-means clustering is one of the non-hierarchical clustering algorithms that partitions n objects into k clusters. K-means clustering is used to determine which cluster an object belongs to by calculating the proximity distance between the object and the cluster center (centroid). This research aims to form a portfolio using K-means clustering and determine the weights of the portfolio using the Mean Variance Efficient Portfolio (MVEP) method. The data analyzed in this research is the closing price data of 11 stocks in the LQ45 index from January 3, 2022, to January 3, 2023. The analysis results obtained using K-means clustering reveal the formation of two portfolios. The first portfolio consists of the stocks BMRI, INCO, INDF, INTP, and SMGR. The second portfolio consists of the stocks ADRO, ANTM, BBRI, ERAA, and UNVR. Based on the MVEP method calculation, the weights of each stock in the first portfolio are 22.74% (BMRI), 10.11% (INCO), 49.76% (INDF), 18.75% (INTP), and -1.36% (SMGR). The calculation results of stock weights show that there is a stock weight with a negative value, which is -1.36% for SMGR, indicating a short sale in the investment. Furthermore, the weighting results for the second portfolio are 7.08% (ADRO), 9.62% (ANTM), 34.05% (BBRI), 24.80% (ERAA), and 24.45% (UNVR). The variance values of stock portfolio 1 and stock portfolio 2 are 0.000080 and 0.000137, respectively. From the portfolio variance results, it is known that the risk of portfolio 1 is 0.008953 and the risk of portfolio 2 is 0.011706.
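
The abstract does not spell out the MVEP formula. One common formulation, assumed here, chooses the weights that minimise portfolio variance subject to summing to one, w = Σ⁻¹1 / (1ᵀΣ⁻¹1), where Σ is the covariance matrix of returns. The sketch below implements that assumed form with a small linear solver; the function names and the two-asset covariance matrix in the comments are hypothetical, not the paper's data.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def min_variance_weights(cov):
    """Weights proportional to inverse(cov) @ ones, normalised to sum to 1."""
    z = solve(cov, [1.0] * len(cov))
    total = sum(z)
    return [v / total for v in z]

def portfolio_variance(w, cov):
    """Quadratic form w' cov w."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))

# Hypothetical covariance matrix for two uncorrelated assets:
# cov = [[0.04, 0.0], [0.0, 0.01]]
# w = min_variance_weights(cov)
# risk = math.sqrt(portfolio_variance(w, cov))  # standard deviation
```

Under this formulation nothing constrains the weights to be non-negative, which is how a negative weight (a short position) such as the one reported for SMGR can arise. Taking the square root of the variance gives the risk as a standard deviation, consistent with the variance and risk figures quoted in the abstract.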

49

Messakh, Gerald Claudio, Memi Nor Hayati, and Sifriyani Sifriyani. "Comparison K-Means and Fuzzy C-Means In Regencies/Cities Grouping Based on Educational Indicators." Jurnal Varian 7, no. 1 (2023): 99–114. http://dx.doi.org/10.30812/varian.v7i1.2879.

Abstract:

Cluster analysis is an analysis that aims to classify data based on the similarity of specific characteristics. The methods used in this research are K-Means and Fuzzy C-Means (FCM). K-Means is a partition-based non-hierarchical data grouping method. FCM is a clustering technique in which the existence of each data is determined by the degree of membership. The purpose of this study is to classify regencies/cities in Kalimantan based on education indicators in 2021 using K-Means and FCM and find out which method is better to use between K-Means and FCM based on the standard deviation ratio so it can be used efficiently and effectively for decision making by the government to advance the level of education on the island of Kalimantan. Based on the results of the analysis, it’s concluded that K-Means is the better method with the ratio of the standard deviation within a cluster against the standard deviation between clusters of 0.6052 which produces optimal clusters of 2 clusters, namely the first cluster consisting of 14 Regencies/Cities, while the second cluster consists of 42 Regencies/Cities in Kalimantan.
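
The "degree of membership" that distinguishes FCM from K-Means can be illustrated with the standard FCM membership formula, in which a point's membership in cluster i is 1 / Σ_k (d_i / d_k)^(2/(m-1)), where d_i is its distance to center i and m > 1 is the fuzzifier. This is a sketch of that textbook formula only; the function name, the default m = 2, and the handling of a point that coincides with a center are assumptions, and the paper's own settings are not given in the abstract.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy C-Means membership degrees of one point in each cluster."""
    d = [math.dist(point, c) for c in centers]
    if any(x == 0.0 for x in d):
        # The point coincides with one or more centers: share full
        # membership equally among those centers.
        hits = [1.0 if x == 0.0 else 0.0 for x in d]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in d) for di in d]
```

Unlike K-Means, which would assign the point wholly to the nearest center, these degrees sum to 1 across clusters and shrink smoothly as a center gets farther away.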

50

Mitra, Manu. "K-Means Clustering in Machine Learning – a Review." Peer Nest 1, no. 4 (2019): 1–14. https://doi.org/10.5281/zenodo.3401604.

Abstract:

K-means clustering is an unsupervised machine learning algorithm. It aims to partition n observations into k clusters where each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. In this review paper, the K-means algorithm is applied to the Iris data set from UCI. Multiclass logistic regression is also performed to compare its performance. The trained model graph for K-means clustering is plotted. Score Model graphs (F1 log scale, frequency log scale, cumulative distribution, and probability density) of multiclass logistic regression are plotted for “Iris-setosa” and “Iris-versicolor” to view its performance.
