Journal articles on the topic 'Silhouette score value'

Author: Grafiati

Published: 3 June 2025

Last updated: 7 July 2025

1

Mulyani, Heti, Ricak Agus Setiawan, and Halimil Fathi. "Optimization of K Value in Clustering Using Silhouette Score (Case Study: Mall Customers Data)." Journal of Information Technology and Its Utilization 6, no. 2 (2023): 45–50. http://dx.doi.org/10.56873/jitu.6.2.5243.

Abstract:

Clustering is an important phase in data mining, and K-Means is a commonly used clustering method. Choosing the best value of k for the k-means algorithm can be difficult. In this study, the silhouette score is used to determine the value of k, and the resulting k-means model is evaluated with the Davies-Bouldin Index (DBI), whose best value is close to 0. The features used are total consumer income and spending. The results show that the silhouette score method can provide a k value with optimal results: for a mall customer dataset of 200 records, the best silhouette score is obtained at K = 5, with DBI = 0.57.
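A minimal pure-Python sketch of the mean silhouette coefficient used to pick k. For each point, a(i) is its mean distance to the other members of its own cluster, b(i) is the lowest mean distance to the members of any other cluster, and s(i) = (b - a) / max(a, b); singleton clusters contribute 0. The function name `mean_silhouette` and the toy points are illustrative assumptions, not the paper's code.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("need between 2 and n - 1 clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes s(i) = 0
        # a(i): mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(i): lowest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)

pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(mean_silhouette(pts, [0, 0, 1, 1]), 4))  # 0.9002
print(round(mean_silhouette(pts, [0, 1, 0, 1]), 4))  # -0.4475
```

In a k sweep, one would fit K-Means for each candidate k and keep the k with the highest mean. A negative mean, as in the second labeling, means points sit closer to another cluster than to their own.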

2

Ogbuabor, Godwin, and F. N. Ugwoke. "Clustering Algorithm for a Healthcare Dataset Using Silhouette Score Value." International Journal of Computer Science and Information Technology 10, no. 2 (2018): 27–37. http://dx.doi.org/10.5121/ijcsit.2018.10203.

3

Godwin, Ogbuabor, and F. N. Ugwoke. "Clustering Algorithm for a Healthcare Dataset Using Silhouette Score Value." International Journal of Computer Science & Information Technology (IJCSIT) 10, no. 2 (2018): 27–37. https://doi.org/10.5281/zenodo.1248795.

Abstract:

The huge amount of healthcare data, coupled with the need for data analysis tools, has made data mining an interesting research area. Data mining tools and techniques help to discover and understand hidden patterns in a dataset that may not be found by visualizing the data alone. Selecting an appropriate clustering method and the optimal number of clusters for healthcare data is often confusing and difficult. A large number of clustering algorithms are currently available for clustering healthcare data, but it is very difficult for people with little knowledge of data mining to choose suitable clustering algorithms

4

Adek Maulidya, Khairul, Zulham Sitorus, Andysah Putera Utama Siahaan, and Muhammad Iqbal. "Analysis Of Increasing Student Service Satisfaction Using K-Means Clustering Algorithm and Gaussian Mixture Models (GMM)." International Journal Of Computer Sciences and Mathematics Engineering 3, no. 1 (2024): 29–35. http://dx.doi.org/10.61306/ijecom.v3i1.62.

Abstract:

This research compares two cluster analysis algorithms, K-Means Clustering and the Gaussian Mixture Model (GMM), to gain a deeper understanding of the data structure and of model suitability. The analysis shows that K-Means achieves a silhouette score of 0.44528, indicating relatively good cluster grouping, while the Gaussian Mixture Model produces a silhouette score of -0.500119, indicating a mismatch between the data points and their clusters and probability overlap between clusters. Based on the silhouette score, the study concludes that K-Means Clustering is the better algorithm because it produces better and more cohesive clusters. Campuses can use these results to understand student needs more effectively and take appropriate corrective steps.

5

Durairaj, M., and J. Hirudhaya Mary Asha. "Fuzzy probability based person recognition in smart environments." Journal of Intelligent & Fuzzy Systems 40, no. 5 (2021): 9437–52. http://dx.doi.org/10.3233/jifs-201913.

Abstract:

Biometric features are used to verify people's identity in living places such as smart apartments. To increase the classification and recognition rate, the recognition procedure contains several steps: detecting the silhouette from the gait profile, segmenting the silhouette, extracting features from the silhouette, classifying the features and, finally, recognizing the person using a probability value. Person recognition accuracy fluctuates and declines because of occlusion, lighting and posture variation. In the proposed work, the gait profile is formed by capturing the gait of a targeted person within the stipulated time to reach the destination. From the profile, silhouettes are detected using frame difference and segmented from the background using immediate thresholding; features are extracted from the silhouette using a gray-level covariance matrix, and an optimized feature set is formed using PSO. These optimized features are fused, trained and classified using nearest neighbor support vectors. The fuzzy probability method is used to recognize the person based on the probability values of the authentic and impostor scores. The relationships between CMS, TPR, TNR and F-rate are calculated for a 1:1 matcher from the gallery set. The performance of the classifiers is found to be perfect by plotting the DET graph and ROC curve. The proposed fuzzy probability theory is combined with the GLCMPSO and NSFV methods for human recognition. The performance of the proposed method is shown to be acceptable for recognition, with the optimal parameters (Entropy, SSIM, PSNR, CQM) calculated. The work shows that the rank probability is proportional to the match score value of the silhouette stored in the gallery.

6

Samidi, Ronal Yulyanto Suladi, and Dewi Kusumaningsih. "Comparison of the RFM Model's Actual Value and Score Value for Clustering." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 7, no. 6 (2023): 1430–38. http://dx.doi.org/10.29207/resti.v7i6.5416.

Abstract:

Clustering algorithms and Recency-Frequency-Monetary (RFM) models are widely implemented in e-commerce, banking, telecommunications and other industries to obtain customer segmentation. The RFM model assesses each customer by the recency and frequency of their transactions and by the monetary value of the transactions they make. The choice of RFM model also influences the clustering results: ideally, the output is compact within each cluster (intra-cluster) and well separated from other clusters (inter-cluster). Through an experimental approach, this research aims to find the better data set transformation between actual RFM values and RFM scores. The method compares the actual-value RFM model with the RFM score model, using the silhouette score as the indicator of the best clustering result obtained with the K-Means algorithm. The subject of this research is a stall-based e-commerce application, with data taken from the Wiradesa area, Central Java. The resulting data set consists of 273,454 rows with 18 attributes from January 2022 to December 2022, collected from historical transactions between shopping outlets and wholesalers. The data set was transformed with the RFM method into actual values and score values, and each version was then clustered to find the best result. The results show that time-based (time series) transaction data can be transformed into the RFM model, and that the actual-value model performs better than the RFM score model, with a silhouette score of 0.624646 and K = 3 clusters. The clustering process also labels each record with its cluster, producing data suitable for supervised learning.
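The two data set variants compared here can be sketched as follows: "actual" RFM values are raw recency (days since last purchase), frequency (number of transactions) and monetary (total spend) per customer, while "scores" map each feature to an ordinal 1..n bin by rank. The row schema, reference date, rank-based equal-count binning and tie handling (ties broken by sort order) are all assumptions; the abstract does not state the paper's scoring scheme, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

def rfm_actual(transactions, ref_date):
    """Actual RFM values per customer from (customer_id, day, amount) rows."""
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for cust, day, amount in transactions:
        if cust not in last_seen or day > last_seen[cust]:
            last_seen[cust] = day
        frequency[cust] += 1
        monetary[cust] += amount
    # Recency = days between the most recent purchase and the reference date
    return {c: ((ref_date - last_seen[c]).days, frequency[c], monetary[c])
            for c in last_seen}

def rank_scores(values, bins=5, higher_is_better=True):
    """Map each value to a 1..bins score by its rank (equal-count binning)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        s = rank * bins // n + 1
        scores[i] = s if higher_is_better else bins + 1 - s
    return scores

def rfm_scores(rfm, bins=5):
    """Convert actual RFM values into 1..bins scores per feature."""
    custs = sorted(rfm)
    # A small recency (recent buyer) should receive a high score
    r = rank_scores([rfm[c][0] for c in custs], bins, higher_is_better=False)
    f = rank_scores([rfm[c][1] for c in custs], bins)
    m = rank_scores([rfm[c][2] for c in custs], bins)
    return {c: (r[i], f[i], m[i]) for i, c in enumerate(custs)}

rows = [("A", date(2022, 12, 30), 100.0),
        ("A", date(2022, 12, 1), 50.0),
        ("B", date(2022, 6, 1), 20.0)]
actual = rfm_actual(rows, ref_date=date(2022, 12, 31))
print(actual)                      # {'A': (1, 2, 150.0), 'B': (213, 1, 20.0)}
print(rfm_scores(actual, bins=2))  # {'A': (2, 2, 2), 'B': (1, 1, 1)}
```

Either set of vectors (usually after scaling the actual values) would then be fed to K-Means and the two variants compared by silhouette score.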

7

Siregar, Hotmaida Lestari, Muhammad Zarlis, and Syahril Efendi. "Cluster Analysis using K-Means and K-Medoids Methods for Data Clustering of Amil Zakat Institutions Donor." JURNAL MEDIA INFORMATIKA BUDIDARMA 7, no. 2 (2023): 668. http://dx.doi.org/10.30865/mib.v7i2.5315.

Abstract:

Cluster analysis is a multivariate analysis method whose purpose is to classify an object into a group based on certain characteristics. In cluster analysis, determining the number of initial clusters is very important so that the resulting clusters are also optimal. In this study, an analysis of the most optimal number of clusters for data classification will be carried out using the K-Means and K-Medoids methods. The data were analyzed using the RFM model and a comparative analysis was carried out based on the DBI value and cluster compactness which was assessed from the average silhouette score. The K-Means method produces the smallest DBI value of 0.485 and the highest average silhouette score value of 0.781 at k=6, while the K-Medoids method produces the smallest DBI value of 1.096 and the highest average silhouette score value of 0.517 at k=3. The results show that the best method for data clustering donations Amil Zakat Institutions is using the K-Means method with an optimal number of clusters of 6 clusters.
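The Davies-Bouldin Index used here alongside the silhouette score averages, over all clusters, the worst ratio (s_i + s_j) / d(c_i, c_j), where s_i is the mean distance of cluster i's points to its centroid c_i. Lower is better, with 0 as the ideal. A minimal pure-Python sketch assuming Euclidean distance; the function name and toy data are illustrative.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index with Euclidean distance; lower values are better."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Centroid (coordinate-wise mean) of each cluster
    cents = {lab: tuple(sum(c) / len(ps) for c in zip(*ps))
             for lab, ps in clusters.items()}
    # Scatter s_i: mean distance of a cluster's points to its centroid
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in ps) / len(ps)
               for lab, ps in clusters.items()}
    total = 0.0
    for i in clusters:
        # Worst-case similarity R_ij = (s_i + s_j) / d(c_i, c_j) over j != i
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in clusters if j != i)
    return total / len(clusters)

pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(davies_bouldin(pts, [0, 0, 1, 1]))  # 0.2
```

The sketch divides by the centroid distance, so two clusters with identical centroids would raise a ZeroDivisionError.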

8

Prasetya, Dwi Arman, Anggraini Puspita Sari, Mohammad Idhom, and Angela Lisanthoni. "Optimizing Clustering Analysis to Identify High-Potential Markets for Indonesian Tuber Exports." Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics 7, no. 1 (2025): 113–22. https://doi.org/10.35882/skzqbd57.

Abstract:

Agriculture is a key contributor to Indonesia's economic growth, with tubers representing the second most important food crop. Despite their significance, the export value of Indonesia's tuber crops has not yet reached its full potential and has declined since 2021. One contributing factor is the restricted range of export markets. This study analyzes export trade patterns to identify the highest-potential markets for Indonesian tuber commodities. Clustering analysis is used as the key method, grouping countries with similar trade characteristics. Clustering was performed with a Gaussian Mixture Model (GMM) enhanced by Particle Swarm Optimization (PSO) and evaluated with the silhouette score and DBI. The dataset was collected from Indonesia's Central Bureau of Statistics for 2019 to 2023 and covers 5 kinds of tuber exports, with a total of 455 entries and 8 columns. Using the AIC/BIC method, the optimal number of clusters is 2: low market opportunities (cluster 0) and high market opportunities (cluster 1). The GMM model without optimization has a silhouette score of 0.7602 and a DBI of 0.8398, while the GMM+PSO model achieves an improved silhouette score of 0.8884 and a DBI of 0.5584. Both scores indicate a strong structure, but GMM+PSO has the higher silhouette score and the lower DBI, demonstrating the effectiveness of PSO in improving the clustering model's performance. The key potential markets for Indonesian tuber exports are concentrated primarily in Asia, including China, Malaysia, Thailand, Vietnam and Hong Kong, along with the United States.

9

Hartama, Dedy, and Selli Oktaviani. "OPTIMIZATION OF K-MEANS AND K-MEDOIDS CLUSTERING USING DBI SILHOUETTE ELBOW ON STUDENT DATA." JURTEKSI (Jurnal Teknologi dan Sistem Informasi) 11, no. 2 (2025): 289–96. https://doi.org/10.33330/jurteksi.v11i2.3531.

Abstract:

Clustering methods such as K-Means and K-Medoids are often used to analyze data, including student data, because of their efficiency. However, these methods have weaknesses, such as sensitivity to the choice of cluster centers (centroids) and cluster results that depend on the medoid data. Clustering, an essential technique in data analysis, aims to reveal the natural structure of data even in the absence of labeled information. This study, conducted with complete objectivity, compares the performance of two popular clustering methods, K-Means and K-Medoids, on student data. Three evaluation metrics, namely the Davies-Bouldin Index (DBI), the silhouette score and the elbow method, were used to compare the clusterings and determine the ideal number of clusters for the two algorithms. The data used in this study consist of names, attendance, assignments, formative assessments, midterm exams, final exams and grade points. Based on the optimization results, the K-Means method excels at grouping the student data. The best results were obtained from the K-Means algorithm with the Silhouette Coefficient method, with a value of 0.7509 at 2 clusters, and with the Elbow Method, with a value of 1428076.08 at 2 clusters; DBI for K-Medoids gave a value of 0.7413 at 3 clusters. So, the best cluster lies in 3 clusters. Keywords: clustering; Davies-Bouldin index; elbow method; k-means; k-medoids; silhouette score.
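The elbow method plots the within-cluster sum of squares (WSS, also called inertia) against k and looks for the point where the curve stops dropping sharply. A minimal pure-Python sketch of Lloyd's K-Means that returns WSS for each k, assuming squared Euclidean distance and a deterministic farthest-first seeding chosen only for reproducibility (both assumptions; the paper's implementation is not described, and the function names are hypothetical).

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_wss(points, k, max_iter=100):
    """Lloyd's K-Means; returns the within-cluster sum of squares (inertia)."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # New center = mean of its group; keep the old center if the group is empty
        new_centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

data = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
curve = [kmeans_wss(data, k) for k in range(1, 5)]
print(curve)  # [125.5, 4.0, 2.5, 1.0]
```

Here the large drop from k = 1 to k = 2 followed by small gains marks the elbow at k = 2. WSS never increases with k on its own, which is why it is read for the bend rather than for a minimum.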

10

Darmayanti, Irma, Dhanar Intan Surya Saputra, Inka Saputri, Nurul Hidayati, and Nandang Hermanto. "Clustering Sugar Content in Children's Snacks for Diabetes Prevention Using Unsupervised Learning." Journal of Information Systems and Informatics 6, no. 4 (2024): 2923–36. https://doi.org/10.51519/journalisi.v6i4.932.

Abstract:

Diabetes is a chronic health problem with increasing prevalence, especially among children, due to the consumption of sugary foods/beverages. This study aims to cluster children's snack products based on sugar content using unsupervised learning by combining Hierarchical Clustering and K-Means algorithms optimized using Silhouette Score. This combined approach uses Hierarchical Clustering to determine the optimal value of k for K-Means, ensuring the efficiency and accuracy of data clustering. A total of 157 sample data were effectively clustered with K-means. The test results with Silhouette Score yielded the highest value of 0.380 for 2 clusters, while 3 clusters scored 0.350 and 4 clusters scored 0.277. For this reason, 2 clusters better represent the homogeneity of the data in the cluster, although it has not reached the ideal condition. Further analysis showed that high sugar and calorie content in sugary drinks, including milk, could increase blood glucose levels. These findings can be the basis for the development of consumer-friendly nutrition labels. However, support is needed from the government to create a labelling policy to ensure the sustainability of implementation in the community as an educational effort to prevent the risk of diabetes in children.

11

Wardy, Dwiki Krisnanda, I. Ketut Gede Darma Putra, and Ni Kadek Dwi Rusjayanthi. "Clustering Artikel pada Portal Berita Online Menggunakan Metode K-Means." JITTER : Jurnal Ilmiah Teknologi dan Komputer 3, no. 1 (2022): 985. http://dx.doi.org/10.24843/jtrti.2022.v03.i01.p34.

Abstract:

The news categories on news portals are so diverse that the editors' workload keeps increasing. The number of news articles each month adds to the editors' task of manually sorting articles into predetermined categories. Clustering can be used to group similar data into the same category. K-Means is one method for performing clustering: it is a distance-based technique that partitions data into a set of clusters and works only on numeric attributes. The K-Means tests in this study are intended to compare cluster quality. The K-Means pipeline in this study applies TF-IDF, feature selection and PCA. Cluster quality is assessed with a bar plot of each metric considered, namely the mean silhouette, accuracy, precision, recall, F1-score and silhouette score. The K-Means method achieves 94.93% accuracy and recall, 95.07% precision and a 94.94% F1-score.
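The pipeline described above first turns each article into a TF-IDF vector before feature selection, PCA and K-Means. A minimal sketch of the TF-IDF step, assuming lowercase whitespace tokenization, tf = count / document length and the unsmoothed idf = ln(N / df); these are assumptions, since the abstract does not give the exact weighting, and libraries often add smoothing and normalization. The function name and sample documents are illustrative.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weights per document as {term: weight} dicts."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors

docs = ["election results announced", "football results today", "election debate"]
for vec in tfidf(docs):
    print(vec)
```

Terms that appear in every document receive weight 0, so they cannot separate clusters; empty documents would need to be filtered out first to avoid division by zero.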

12

Firsta Rahmania Sucahyo, Indyah Hartami Santi, Mohammad Faried Rahmat, and Diki Fahrizal. "Comparing K-Means and K-medoids algorithms for clustering hamlet regions by tax liabilities in tax determination documents." International Journal of Science and Technology Research Archive 8, no. 1 (2025): 069–78. https://doi.org/10.53771/ijstra.2025.8.1.0023.

Abstract:

Applying data mining technology in village offices, especially in administrative services, is important for ensuring efficient and accurate information services. This research compares the effectiveness of the K-Means and K-Medoids algorithms in clustering hamlet areas based on the tax owed in the tax assessment documents of Pandanarum village. Using quantitative descriptive methods, the two algorithms are applied to group hamlets with tax payable as the main variable. The clustering results are evaluated with the Sum of Squared Errors (SSE) and the Silhouette Score to determine the effectiveness of each algorithm. The results show that K-Medoids performs worse than K-Means, which offers better cluster stability and a higher Silhouette Score of 0.454615, with an SSE of 480.9462. Although K-Medoids is more robust against outliers in the tax payable data, it produces a lower Silhouette Score of 0.382616 and an SSE of 567.6125, which indicates weaker clustering. This research therefore concludes that K-Means is superior to K-Medoids for clustering hamlet areas based on taxes owed.
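The key difference from K-Means is that each cluster center (medoid) must be an actual data point, chosen to minimize the total distance to the other members, which is why K-Medoids is less pulled by outliers. A minimal sketch of the simple alternating ("Voronoi iteration") variant rather than the full PAM swap search, assuming Euclidean distance and a naive deterministic initialization from the first k points (assumptions; results depend on initialization, and the function name is hypothetical).

```python
import math

def k_medoids(points, k, max_iter=100):
    """Alternating K-Medoids: assign points to the nearest medoid, then move
    each medoid to the member with the smallest total distance to its cluster."""
    medoids = list(points[:k])  # naive deterministic initialization (assumption)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, medoids[j]))
            clusters[nearest].append(p)
        new_medoids = [
            min(members, key=lambda m: sum(math.dist(m, q) for q in members))
            if members else medoids[j]
            for j, members in enumerate(clusters)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Total distance of every point to its nearest medoid
    cost = sum(min(math.dist(p, m) for m in medoids) for p in points)
    return medoids, cost

data = [(1,), (2,), (3,), (10,), (11,), (12,), (100,)]
medoids, cost = k_medoids(data, 2)
print(medoids, cost)  # [(2,), (11,)] 93.0
```

With the outlier 100, the right-hand medoid stays at 11, whereas the mean of that cluster (10, 11, 12, 100) would be 33.25.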

13

Saputra, Hermansyah Adi, Galih Wasis Wicaksono, and Yufis Azhar. "Rekomendasi Grup Pada Website Alumni Teknik Informatika Universitas Muhammadiyah Malang." Jurnal Repositor 2, no. 12 (2020): 1647. http://dx.doi.org/10.22219/repositor.v2i12.526.

Abstract:

Lately, almost all universities in Indonesia have their own alumni information systems, which provide information about the condition of their alumni after graduation. Alumni are actors who play an important role in education. The Department of Informatics, Faculty of Engineering, University of Muhammadiyah Malang currently has an alumni website. The problem is that there is no system that recommends groups to alumni so that they can exchange information on the website. Using alumni data supported by a tracer study, group recommendations can be formed from the tracer-study data. Clustering is one of the tools in data mining that aims to group objects into clusters. K-medoid is a method of grouping data into a number of clusters without a hierarchical structure between them. In this study, the k-medoid algorithm has a higher coefficient value than k-means: k-medoid obtains an average Silhouette Score of 0.7325888099 when tested with 5 clusters and 10 repetitions, compared with an average Silhouette Score of only 0.6872873866 for k-means.

14

Aji, Briyan Gifari, Dwi Chandra Aditya Sondawa, Muhammad Rifky Gifari, and Sena Wijayanto. "Penerapan Algoritma K-Means Untuk Clustering Harga Rumah Di Bandung." Jurnal Ilmiah Informatika Global 14, no. 2 (2023): 17–23. http://dx.doi.org/10.36982/jiig.v14i2.3189.

Abstract:

The need for shelter is one of the fundamental aspects of daily life for humans. A house serves not only as a place to seek protection and rest but also as a venue for socializing with family. One of the factors influencing the decision in choosing a house is its price. House prices vary in each region, depending on factors such as location and other attributes. In major cities like Bandung, house prices differ based on their categories. However, many people still find it challenging to determine the value and discern whether a house is classified as affordable or expensive. Hence, there is a need for a clustering process of house prices in Bandung to aid in comprehending and categorizing house prices based on attributes such as the house price, total building area, and total land area. To understand and analyze the patterns of house prices in Bandung, this study utilizes the K-Means method to cluster the house price data into several groups based on their similarity in attributes. Additionally, the research aims to determine the optimal number of clusters through the cluster validation process using the silhouette index. The findings show that when using n_cluster=2, a silhouette score of 0.8870 is obtained, and with n_cluster=3, the silhouette score is 0.8009. These results indicate that clustering with n_cluster=2 and n_cluster=3 both exhibit strong interpretative structures. Thus, the clustering of house prices in Bandung can be effectively grouped into 2 clusters, as evidenced by the higher silhouette score obtained with n_cluster=2, approaching 1 compared to n_cluster=3.

15

Mutawalli, Lalu, Sofiansyah Fadli, and Supardianto Supardianto. "Komparasi Metode Perhitungan Jarak K-Means Paling Baik Terhadap Pembentukan Pola Kunjungan Wisatawan Mancanegara." Journal of Information System Research (JOSH) 5, no. 1 (2023): 159–66. http://dx.doi.org/10.47065/josh.v5i1.4377.

Abstract:

Understanding patterns among foreign tourists is an urgent matter. These patterns can become knowledge that helps in making better, data-driven decisions. The pattern examined here is the clustering of visits by foreign tourists to tourist destinations in Jakarta. Data mining is an approach that extracts knowledge patterns from a dataset. K-Means is one of the data mining algorithms used for clustering data, where data is grouped based on similarity in features and attributes. This study compares the Euclidean Distance, Manhattan Distance, and Haversine Distance methods to obtain more representative data clusters for the datasets. The datasets in this study are not normally distributed because of outliers; hence, the DBSCAN algorithm is used to handle them without removing or cutting data, since removal could create a significant amount of missing values and yield information that does not align with empirical facts. In this study, 5 clusters were created based on elbow calculation results. The K-Means cluster testing with Euclidean distance yielded a Silhouette Score of 0.36, Inertia of 0.86, and Davies-Bouldin Index of 2.39. The Manhattan method resulted in a Silhouette Score of 0.65, Inertia of 1.46, and Davies-Bouldin Index of 0.47. Meanwhile, the Haversine method resulted in a Silhouette Score of 0.36, Inertia of 0.03, and a Davies-Bouldin Index of 2.39.
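The three distance measures compared in this study can be sketched as plain functions. Haversine only makes sense when each point is a (latitude, longitude) pair, here assumed to be in degrees, with an assumed mean Earth radius of 6371 km. Note that the mean-based centroid update of standard K-Means is derived for squared Euclidean distance, so swapping in another metric changes only the assignment step. Function names are illustrative.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

print(euclidean((0, 0), (3, 4)))                  # 5.0
print(manhattan((0, 0), (3, 4)))                  # 7
print(round(haversine_km((0, 0), (0, 90)), 1))    # 10007.5
```

The last call measures a quarter of the equator, which is the radius times pi / 2.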

16

Tampubolon, Andrew Lomaksan Manuel, Thio Marta Elisa Yuridis Butar Butar, and Siti Rochimah. "Segmentasi Pelanggan Majalah pada Situs Web E-Commerce dengan K-Means++ dan Metode RFM." Jurnal Teknologi Informasi dan Ilmu Komputer 11, no. 6 (2024): 1243–52. https://doi.org/10.25126/jtiik.1168208.

Abstract:

Customer segmentation is one method that can be applied to maximize business opportunities, helping businesses remain competitive in the market. Artificial Intelligence (AI) can help business stakeholders understand customer segmentation based on transaction history. This study applies the Recency, Frequency, and Monetary (RFM) method combined with the K-Means++ clustering algorithm for customer segmentation. The silhouette score serves as the indicator for selecting the optimal value of k, the number of clusters. The CRISP-DM framework used in this paper also helps maintain a consistent analysis process. A simple statistical approach is used to classify each RFM feature into low, medium, and high labels to capture customer segmentation patterns. Experimental results show that k = 3 is optimal, based on a WSS value of 843.214747 and a silhouette score of 0.638181. The experiments also indicate that Cluster 0 has average RFM values of 1.14 (low), 1.20 (low), and 301,640 (low); Cluster 1 has average RFM values of 249.61 (high), 2.62 (medium), and 799,934 (medium); and Cluster 2 has average RFM values of 233.01 (medium), 6.41 (high), and 2018.088 (high).

17

Tampubolon, Andrew Lomaksan Manuel, Thio Marta Elisa Yuridis Butar Butar, and Siti Rochimah. "Segmentasi Pelanggan Majalah pada Situs Web E-Commerce dengan K-Means++ dan Metode RFM." Jurnal Teknologi Informasi dan Ilmu Komputer 11, no. 6 (2024): 1243–52. https://doi.org/10.25126/jtiik.2024118208.

Full text

Abstract:

Customer segmentation is one method that can be applied to maximize business opportunities, helping businesses remain competitive in the market. Artificial Intelligence (AI) can give business stakeholders an understanding of customer segmentation based on transaction history. This study applies the Recency, Frequency, and Monetary (RFM) method combined with the K-Means++ clustering algorithm for customer segmentation. The silhouette score serves as the indicator for selecting the optimal value of k, that is, the number of clusters. The CRISP-DM framework used in this paper also helps keep the analysis process consistent. A simple statistical approach is used to classify each RFM feature into low, medium, and high labels to capture customer segmentation patterns. Experimental results show that k = 3 is optimal, with a WSS of 843.214747 and a silhouette score of 0.638181. The experiments also indicate that Cluster 0 has average RFM values of 1.14 (low), 1.20 (low), and 301.640 (low); Cluster 1 has average RFM values of 249.61 (high), 2.62 (medium), and 799.934 (medium); and Cluster 2 has average RFM values of 233.01 (medium), 6.41 (high), and 2018.088 (high).
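Most of the studies in this list pick k, or compare algorithms, by the mean silhouette score. As a reminder of what that number measures, here is a minimal plain-Python sketch of the standard definition, assuming Euclidean distance: for each point i, a(i) is its mean distance to the other members of its own cluster, b(i) is its smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). Points in singleton clusters get s(i) = 0, the convention scikit-learn's `silhouette_samples` also uses. The function names and toy data are illustrative, not taken from the cited paper.

```python
import math


def silhouette_samples(points, labels):
    """Return the silhouette value s(i) of every point (Euclidean distance)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Sum and count of distances from point i to each cluster (i itself excluded).
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += math.dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            scores.append(0.0)  # singleton cluster: s(i) is defined as 0
            continue
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_score(points, labels):
    """Mean silhouette value over all points."""
    values = silhouette_samples(points, labels)
    return sum(values) / len(values)


# Two well-separated toy groups on a line.
pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(round(silhouette_score(pts, [0, 0, 1, 1]), 4))  # 0.8997
```

Values near 1 mean points sit well inside their own clusters; values near 0 or below mean clusters overlap. `sklearn.metrics.silhouette_score` computes the same mean with vectorized distance computations.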

18

Dharmawan, Tio, Chinta 'Aliyyah Candramaya, and Vandha Pradwiyasma Widharta. "Forming Dataset of The Undergraduate Thesis using Simple Clustering Methods." International Journal of Innovation in Enterprise System 7, no. 01 (2023): 31–40. http://dx.doi.org/10.25124/ijies.v7i01.187.

Abstract:

Each university collects a large amount of undergraduate thesis data but has yet to process it so that students can more easily find the references they need. This study aims to group thesis documents and to compare the grouping produced by experts with that produced by simple clustering methods. Experts built the ground truth using OR Boolean retrieval and keyword generation. The experiments searched for the best clusters using the K-Means, K-Medoids, and DBSCAN clustering methods with Euclidean, Manhattan, City Block, and Cosine Similarity metrics. The clustering with the best Silhouette Score was then compared with the expert categorization of each document to measure accuracy. K-Means with the Cosine Similarity metric gave the best result, with a Silhouette Score of 0.105534. Comparing the ground truth with the best clustering result gives an accuracy of 33.42%. The results show that simple clustering methods cannot handle data with negative skewness and leptokurtic kurtosis.

19

Taufiq, Reny Medikawati, Rahmad Firdaus, Fitri Handayani, Putri Fadhilla Muarif, and Riza Rindriani Rizqy. "Density Based Clustering Untuk Pemetaan Daerah Rawan Gempa Bumi Di Wilayah Sumatera Barat Menggunakan Metode DBSCAN." [Density-Based Clustering for Mapping Earthquake-Prone Areas in the West Sumatra Region Using the DBSCAN Method.] JURNAL FASILKOM 14, no. 3 (2025): 817–22. https://doi.org/10.37859/jf.v14i3.8833.

Abstract:

Earthquakes are natural disasters that cannot be prevented or avoided. One affected area is West Sumatra, a region of Indonesia located in the Sumatra basin, which is vulnerable to earthquakes. This study therefore applies density-based clustering to produce a point map of earthquake-prone areas in West Sumatra using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method. DBSCAN requires the epsilon and MinPts parameters, which were determined with the K-Nearest Neighbors (KNN) method, and the results were evaluated with the Silhouette Coefficient. With KNN-derived parameters, DBSCAN produced 3 clusters plus a noise group, with a silhouette coefficient of 0.310, on data from 2010-2023. However, testing without KNN-derived parameters yielded a higher silhouette score of 0.890, with 2 clusters plus a noise group.
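The abstract states only that epsilon and MinPts were derived with a k-nearest-neighbors method. One widely used heuristic of that kind, assumed here rather than confirmed as the authors' procedure, sorts every point's distance to its k-th nearest neighbor (with k tied to MinPts) and takes epsilon at the knee of that curve. The sketch below computes the sorted k-distance curve and locates the knee as the point farthest from the straight line joining the curve's endpoints, which is one simple knee rule among several; `k_distances`, `knee_value`, and the toy points are hypothetical.

```python
import math


def k_distances(points, k):
    """Sorted distances from each point to its k-th nearest neighbour (self excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def knee_value(curve):
    """Value of the curve point farthest from the chord joining its endpoints.

    Note: this rule depends on the relative scaling of the two axes.
    """
    x0, y0 = 0, curve[0]
    x1, y1 = len(curve) - 1, curve[-1]
    best_i, best_d = 0, -1.0
    for i, y in enumerate(curve):
        # Perpendicular distance to the chord, up to a constant factor.
        d = abs((y1 - y0) * i - (x1 - x0) * y + x1 * y0 - y1 * x0)
        if d > best_d:
            best_i, best_d = i, d
    return curve[best_i]


# A tight square of points plus two far-away outliers.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (20, 0)]
curve = k_distances(pts, k=2)
print(knee_value(curve))  # 1.0, a candidate epsilon
```

With epsilon = 1.0 the square would form one dense cluster and the two far points would be left as noise, which is the behaviour the k-distance heuristic aims for.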

20

W, Winarno. "Comparison Of Clustering Levels Of The Learning Burnout Of Students Using The Fuzzy C-Means And K-Means Methods." Jurnal Teknologi Informasi dan Pendidikan 16, no. 1 (2023): 38–53. http://dx.doi.org/10.24036/jtip.v16i1.668.

Abstract:

Learning burnout results from studying continuously without rest, causing physical and emotional fatigue. If it is not addressed, it can make students unproductive and hinder their potential. This study therefore proposes a clustering method to group students by their level of learning burnout. The clustering uses Fuzzy C-Means and K-Means, which previous studies have shown can produce good clusters. The aim of this study is to compare the performance of the two methods on a dataset of student learning burnout, tested with 3, 4, and 5 clusters. Fuzzy C-Means obtained a global silhouette coefficient of 0.278, and K-Means obtained 0.287. For the Davies-Bouldin index, Fuzzy C-Means scored 0.224 and K-Means 0.384; by this index (where lower is better), Fuzzy C-Means produces better clusters than K-Means. However, both show a weak structure, because for some data points the distance to other members of their own cluster is not clearly smaller than the distance to members of other clusters.
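Several studies here report the Davies-Bouldin index (DBI) alongside the silhouette score; unlike the silhouette, a lower DBI is better. In the standard definition, each cluster's scatter S_i is the mean distance of its points to the centroid c_i, and DBI averages, over all clusters, the worst ratio (S_i + S_j) / d(c_i, c_j) against any other cluster j. The following is a minimal sketch with Euclidean distance (the same definition scikit-learn's `davies_bouldin_score` uses); the helper names and toy data are illustrative.

```python
import math


def centroid(members):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(xs) / len(members) for xs in zip(*members))


def davies_bouldin(points, labels):
    """Davies-Bouldin index with Euclidean distance (lower is better)."""
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    if len(groups) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")
    cents = {c: centroid(m) for c, m in groups.items()}
    # Scatter S_i: mean distance of a cluster's points to its centroid.
    scatter = {c: sum(math.dist(p, cents[c]) for p in m) / len(m) for c, m in groups.items()}
    total = 0.0
    for i in groups:
        # Worst-case similarity of cluster i to any other cluster j.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in groups if j != i
        )
    return total / len(groups)


pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(davies_bouldin(pts, [0, 0, 1, 1]))  # 0.2
```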

21

Relangi, Naga Durga Satya Siva Kiran, Aparna Chaparala, and Radhika Sajja. "Identification of Potential Quality of Groundwater Using Improved Fuzzy C Means Clustering Method." Mathematical Modelling of Engineering Problems 9, no. 5 (2022): 1369–77. http://dx.doi.org/10.18280/mmep.090527.

Abstract:

Groundwater quality assessment has gained increasing attention among water quality management stations and researchers. The conventional water quality index method and artificial neural network models are used to assess groundwater, but these models are inadequate for handling data with uncertainty. In this work, we propose an improved Fuzzy C-Means clustering method to identify homogeneous clusters with respect to groundwater quality. For this purpose, data on 1020 groundwater samples with 7 physicochemical parameters from 2019 were collected from West Godavari, Andhra Pradesh, India. The effectiveness of the proposed method is evaluated against two standard clustering methods, K-Means and Fuzzy C-Means. The success of both conventional K-Means and Fuzzy C-Means depends on the initial choice of the number of clusters and the cluster centers. The proposed improved Fuzzy C-Means method identifies the optimal number of clusters based on the water index value, and it is applied to the groundwater data set. Performance is measured with the silhouette score and the Davies-Bouldin index. The proposed method outperforms the existing K-Means and Fuzzy C-Means, with a silhouette score of 0.857 and a Davies-Bouldin index of 0.502 for 4 clusters.

22

Zahrotun, Lisna, Utaminingsih Linarti, Banu Harli Trimulya Suandi As, Herri Kurnia, and Liya Yusrina Sabila. "Comparison of K-Medoids Method and Analytical Hierarchy Clustering on Students' Data Grouping." JOIV : International Journal on Informatics Visualization 7, no. 2 (2023): 446. http://dx.doi.org/10.30630/joiv.7.2.1204.

Abstract:

One indicator of how successfully a university carries out its educational process is the timely graduation of its students. This study compares the Analytic Hierarchy Clustering (AHC) approach with the K-Medoids method, a data mining technique, for grouping student data by school origin, region of origin, average math score, TOEFL score, GPA, and length of study. The study was carried out at University X, which comprises a variety of departments; four of them, departments R, S, T, and U, were analyzed. The K-Medoids and AHC techniques were run with 2, 3, and 4 clusters and evaluated with the silhouette coefficient. Although the silhouette values of AHC and K-Medoids follow a similar linear trend, AHC achieves higher silhouette values (department R: 0.88, S: 0.87, T: 0.88, U: 0.88) than K-Medoids (department R: 0.35; department S: 0.65 with 2 clusters; department T: 0.67 with 2 clusters; department U: 0.52). The results of both K-Medoids and AHC are determined by the distribution of the data being clustered rather than by the amount of data or the number of clusters. Based on this methodology, University X can use the grouping results for the four departments to help students graduate on schedule.

23

Airlangga, Gregorius. "ADVANCED MACHINE LEARNING TECHNIQUES FOR SEISMIC ANOMALY DETECTION IN INDONESIA: A COMPARATIVE STUDY OF LOF, ISOLATION FOREST, AND ONE-CLASS SVM." Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika 5, no. 1 (2024): 49–61. http://dx.doi.org/10.46306/lb.v5i1.490.

Abstract:

This study presents a comprehensive comparison of three machine learning algorithms for anomaly detection within seismic data, focusing on the unique geographical and geological context of Indonesia, a region prone to frequent seismic events. Local Outlier Factor (LOF), Isolation Forest, and One-Class SVM were assessed using a meticulously curated dataset from the Indonesian Meteorology, Climatology, and Geophysical Agency, standardized to ensure consistent feature scale. Our analysis encompassed both statistical metrics and visualizations to evaluate the performance of each algorithm. The One-Class SVM emerged as the most effective method, achieving the highest silhouette score, indicative of its superior cluster formation and clear distinction between inliers and outliers. The Isolation Forest also demonstrated strong performance with a favorable silhouette score and Davies-Bouldin index, suggesting effective anomaly isolation capabilities. In contrast, the LOF algorithm showed less precision, as indicated by lower silhouette scores and a higher Davies-Bouldin index, suggesting potential challenges in distinguishing between normal and anomalous seismic patterns. Statistical validation using the Kruskal-Wallis H-test confirmed significant differences in the anomaly score distributions of the three algorithms, with a p-value of 0.0. Visualizations through PCA and t-SNE reinforced the quantitative findings, displaying a clear demarcation of anomalies by the One-Class SVM and Isolation Forest, unlike the LOF. The findings underscore the importance of selecting appropriate anomaly detection methods for seismic data analysis, highlighting the robustness of One-Class SVM and Isolation Forest for such applications. The implications of this research are profound for seismic risk management, providing insights that enhance the accuracy and reliability of earthquake prediction systems, which is vital for regions with high seismic activity such as Indonesia.

24

Husna, Farida Amila, Diana Purwitasari, Bayu Adjie Sidharta, Drigo Alexander Sihombing, Amiq Fahmi, and Mauridhi Hery Purnomo. "A Clustering Approach for Mapping Dengue Contingency Plan." Scientific Journal of Informatics 9, no. 2 (2022): 149–60. http://dx.doi.org/10.15294/sji.v9i2.36885.

Abstract:

Purpose: Dengue cases are increasing in number and spreading to more areas as mobility and population density rise. It is therefore necessary to control and prevent Dengue Hemorrhagic Fever (DHF) by mapping a DHF contingency plan. However, mapping a dengue contingency plan is not easy, because clinical and managerial issues, vector control, preventive measures, and surveillance must all be considered. This work introduces a cluster-based dengue contingency planning method that groups patient cases by environment and demographics, then maps out plans and selects the appropriate plan for each area.

Methods: We used clustering with silhouette scoring to select features, the best cluster formation, the best clustering method, and cluster severity. Cluster severity is determined by levelling the average attribute values into low, medium, high, and extreme, which relate to the plans each region sets by village type and season type.

Result: On five years of data (2016-2020), approximately 15K cases from Semarang City, Indonesia, feature selection shows that the environmental and demographic feature groups have the highest silhouette score. With these features, K-Means achieves a higher silhouette score than DBSCAN and agglomerative clustering, with an optimal number of three clusters. K-Means also successfully mapped cluster severity and assigned each cluster to a suitable contingency policy.

Novelty: Most research on DHF cases focuses on predicting cases and measuring the risk of DHF occurrence; few studies discuss policy recommendations for dengue control.

25

Abidin, Mochammad Syahrul, Yeni Kustiyahningsih, Eza Rahmanita, Budi Dwi Satoto, and Muhammad Iqbal Firmansyah. "Application of the DBSCAN Algorithm in MSME Clustering using the Silhouette Coefficient Method." Jurnal Sistem Cerdas 7, no. 3 (2024): 260–70. https://doi.org/10.37396/jsc.v7i3.472.

Abstract:

MSMEs (micro, small, and medium enterprises) make a very important contribution to Indonesia's economic development, contributing to GDP and absorbing labor. Most MSMEs in Sidoarjo Regency are still constrained by financial management and the use of technology. This research applies the DBSCAN method to cluster MSMEs in Sidoarjo in order to find patterns in characteristics related to capital, turnover, and workforce. The analysis covers 1,479 MSMEs, and the CRISP-DM methodology guides the process from business understanding to the implementation phase. Simple Feature Scaling normalization was applied before clustering. The best DBSCAN parameter combination is epsilon (ε) = 0.10 and MinPts = 16, which gives the optimal Silhouette Score of 0.4304. This configuration produces seven clusters, of which the third has the highest Silhouette value (0.9326), indicating high similarity within that cluster. These results provide essential lessons for developing more targeted policy strategies and interventions for MSMEs in Sidoarjo, and they explore DBSCAN's capabilities as an effective analytical tool for characterizing businesses in the region.

26

Ningrum, Afifah Vera Ferencia Fitria, Mochammad Anshori, and Risqy Siwi Pradini. "Klasterisasi Peserta KB Aktif di Desa Kalirejo Lawang Menggunakan Metode K-Means." [Clustering Active Family Planning (KB) Participants in Kalirejo Lawang Village Using the K-Means Method.] Jurnal Indonesia : Manajemen Informatika dan Komunikasi 6, no. 1 (2025): 729–41. https://doi.org/10.35870/jimik.v6i1.1273.

Abstract:

The Family Planning (KB) program in Kalirejo Lawang Village faces challenges in clustering active participants, a process that is time-consuming and error-prone. To address these challenges, a clustering solution using the K-Means algorithm was proposed. Experiments tested 2 to 8 clusters and evaluated each configuration with the silhouette score. The optimal number of clusters is two, with a silhouette score of 0.447; none of the scores for 3 to 8 clusters exceeded this value. The lower scores for 3 to 8 clusters indicate that dividing the data into more groups did not create clear separations or worsened cohesion within clusters. The study concludes that the K-Means method is applicable and reliable for clustering active KB participants in Kalirejo Lawang Village. With its speed and accuracy, K-Means offers a significant way to improve the efficiency of the KB program at the village level. In practice, this research provides a more structured basis for planning and decision-making in the village KB program. These findings mark an important step in optimizing the management of KB program data and open opportunities for broader implementation in other areas.
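The selection procedure described above (run K-Means for each k in a range and keep the k with the highest mean silhouette) can be sketched in plain Python. This is a hypothetical illustration, not the authors' code: it uses Lloyd's k-means with deterministic farthest-first seeding so the example is reproducible (K-Means++ and scikit-learn's `KMeans` use randomized seeding instead), Euclidean distance, and made-up data with two obvious groups, scanning k = 2 to 5 because the toy set has only eight points.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first seeding; returns labels."""
    centers = [points[0]]
    while len(centers) < k:
        # Seed the next centre at the point farthest from all current centres.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each centre to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's centre in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute s(i) = 0."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = labels[i]
        if own not in by_cluster:
            continue  # singleton cluster: s(i) = 0
        a = sum(by_cluster[own]) / len(by_cluster[own])
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != own)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def best_k(points, k_values):
    """Return the k with the highest mean silhouette, plus all scores."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


data = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
k, scores = best_k(data, range(2, 6))
print(k, {key: round(v, 3) for key, v in scores.items()})
```

On this toy data k = 2 scores highest, because any larger k has to split a tight group, which leaves some points closer to a neighboring cluster than to their own.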

27

Wikamulia, Nathaniel, and Sani Muhamad Isa. "Predictive business intelligence dashboard for food and beverage business." Bulletin of Electrical Engineering and Informatics 12, no. 5 (2023): 3016–26. http://dx.doi.org/10.11591/eei.v12i5.5162.

Abstract:

This research provides an example of a predictive business intelligence (BI) dashboard implementation for the food and beverage business (businesses that sell perishable goods), using data from a bakery's transactional database. The data are used for demand forecasting with extreme gradient boosting (XGBoost) and for recency, frequency, and monetary value (RFM) analysis with mini batch k-means (MBKM). The data are processed and displayed in a BI dashboard built with Microsoft Power BI. The XGBoost model achieved a root mean square error (RMSE) of 0.188 and an R² score of 0.931. The MBKM model achieved a Dunn index of 0.4264, a silhouette score of 0.4421, and a Davies-Bouldin index of 0.8327. In an end-user evaluation by questionnaire, the BI dashboard received a final score of 4.77 out of 5. The evaluation concluded that the predictive BI dashboard helped the bakery's analysis process by accelerating decision-making, implementing a data-driven decision-making system, and helping the business discover new insights.
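The Dunn index reported above (higher is better) has several variants. This sketch assumes the common form: the smallest distance between two points in different clusters divided by the largest distance between two points in the same cluster (the largest cluster diameter). It checks every pair of points, so it is only suitable for small data; `dunn_index` and the toy data are illustrative, and the paper may use a different variant.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Smallest between-cluster point distance divided by the largest cluster diameter."""
    min_between = math.inf
    max_diameter = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        if lp == lq:
            max_diameter = max(max_diameter, d)  # widest spread inside one cluster
        else:
            min_between = min(min_between, d)  # closest approach of two clusters
    if min_between == math.inf or max_diameter == 0.0:
        raise ValueError("need two clusters and at least one cluster with distinct points")
    return min_between / max_diameter


pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(dunn_index(pts, [0, 0, 1, 1]))  # 5.0
```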

28

Prasetyawan, Daru, Agus Mulyanto, and Rahmadhan Gatra. "Pemetaan Lintasan Karir Alumni Berdasarkan Analisis Cluster: Kombinasi K-Means dan Reduksi Dimensi Autoencoder." [Mapping Alumni Career Paths Based on Cluster Analysis: Combining K-Means and Autoencoder Dimensionality Reduction.] Edumatic: Jurnal Pendidikan Informatika 9, no. 1 (2025): 198–207. https://doi.org/10.29408/edumatic.v9i1.29713.

Abstract:

Alumni career mapping is a crucial aspect of evaluating and developing higher education programs. Cluster analysis, particularly the integration of k-means and autoencoder methods, has emerged as an effective solution for grouping complex and multi-dimensional alumni career data. This study aims to implement and assess the combination of k-means and autoencoder algorithms in alumni career mapping based on GPA, study duration, waiting time, job type, salary, job level, and field of study suitability. The autoencoder is employed to reduce dimensions, while k-means clusters alumni into groups based on the similarity of their career profiles. The data used in the cluster analysis is sourced from the tracer study. Pre-processing of the tracer study data is conducted through several stages, including cleaning, encoding, and normalization. The evaluation results indicate that the combination of k-means and autoencoder yields superior Silhouette and DBI scores. The Silhouette score with the autoencoder achieved 0.6112, while without it, the score was only 0.3956. The DBI value with the autoencoder is 0.566, whereas without it, the DBI reached 1.022. This cluster analysis effectively grouped the tracer study data into six clusters based on similarities in career profiles. The clustering results suggest that the formed clusters are more influenced by the alumni's job type and duration of study.

29

Anggraeni, Rini, Farrikh Alzami, Aris Nurhindarto, et al. "Clustering IT Incidents Using K-Means: Improving Incident Response Time in Service Management." Sinkron 9, no. 2 (2025): 936–47. https://doi.org/10.33395/sinkron.v9i2.14822.

Abstract:

Incident management is a critical process in Information Technology service management that aims to manage disruptions and minimize the impact of unexpected incidents on business services. This study applies the K-Means algorithm to cluster IT service incidents, aiming to improve company operational efficiency. Using a dataset from the UCI Machine Learning Repository comprising 141,712 events related to 24,918 incidents, this research analyzes incident patterns and characteristics for optimized handling. The data went through a series of preprocessing stages, and the elbow and silhouette methods were used to determine the optimal number of clusters. The incidents were grouped into four clusters based on characteristics such as urgency, priority, and number of reassignments, with a distortion score of 964264294.569 and a silhouette score of 0.52. The clustering results show that K-Means effectively identifies incidents that require further handling, such as those with high urgency and priority, and helps the company focus resources on resolving the incidents with the most business impact. This research provides a data-driven solution to improve incident management and Service Level Agreement (SLA) fulfillment, while offering a framework for more effective and efficient IT incident analysis and resource allocation.
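The elbow method plots a within-cluster dispersion measure against k and looks for the bend. The abstract does not define its "distortion score"; this sketch assumes it is the within-cluster sum of squared Euclidean distances to each cluster's centroid, the quantity usually called WSS and exposed by scikit-learn's `KMeans` as `inertia_` (computed there against the fitted centers). The function name and toy data are illustrative.

```python
import math


def wss(points, labels):
    """Within-cluster sum of squared Euclidean distances to each cluster's centroid."""
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    total = 0.0
    for members in groups.values():
        center = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(math.dist(p, center) ** 2 for p in members)
    return total


pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(wss(pts, [0, 0, 0, 0]))  # one cluster: large dispersion
print(wss(pts, [0, 0, 1, 1]))  # two clusters: 4.0
```

Computing `wss` for k = 1, 2, 3, ... and choosing the k where the decrease flattens gives the elbow; because WSS keeps falling as k grows, it is usually paired with the silhouette score, as in this study.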

30

Abdullah, Ahmad Irfan, Adri Priadana, Muhajir Muhajir, and Syahrir Nawir Nur. "Data Mining for Determining The Best Cluster Of Student Instagram Account As New Student Admission Influencer." Telematika 18, no. 2 (2021): 255. http://dx.doi.org/10.31315/telematika.v18i2.5067.

Abstract:

Purpose: This study aims to apply the web data extraction method to collect student Instagram account data, and the K-Means data mining method to cluster it automatically, in order to determine the best cluster of student Instagram accounts to act as influencers for new student admissions.

Design/methodology/approach: This study implemented the web data extraction method to collect student Instagram account data, the K-Means data mining method to cluster the data, and the Silhouette Coefficient method to determine the best number of clusters.

Findings/result: This study identified the seven best student accounts out of 100 that can serve as influencers for new student admissions. Among the configurations selecting between 5 and 10 influencers, the highest Silhouette Score was 0.608, obtained with 22 clusters.

Originality/value/state of the art: No previous study has determined the best cluster of student Instagram accounts as new student admission influencers using web data extraction and K-Means.

31

Ramadhan, Hafid, Mohammad Rizal Abdan Kamaludin, Muhammad Alfan Nasrullah, and Dwi Rolliawati. "Comparison of Hierarchical, K-Means and DBSCAN Clustering Methods for Credit Card Customer Segmentation Analysis Based on Expenditure Level." Journal of Applied Informatics and Computing 7, no. 2 (2023): 246–51. http://dx.doi.org/10.30871/jaic.v7i2.5790.

Abstract:

The amount of credit card user data is increasing year after year. Credit cards have become an important means of payment, and the number of users is growing because credit cards are considered more effective and efficient. The three methods compared here can be used to determine effective outcomes for credit card user scenarios. In this study, the Hierarchical Clustering, K-Means, and DBSCAN methods were compared for credit card customer segmentation analysis to be used as a market strategy. Based on the silhouette coefficient, the best result is hierarchical clustering with two clusters, with a score of 0.82322. Based on the cluster mean values, customers are divided into two segments, and strategies are suggested for both segments.

32

Paramita, Cinantya, Fauzi Adi Rafrastara, and Catur Supriyanto. "Pemanfaatan Algoritma K-Means untuk Membuktikan Implementasi Undang-Undang Pelanggaran Hukum Korupsi di Pengadilan Negeri Banjarmasin." [Using the K-Means Algorithm to Demonstrate the Implementation of the Law on Corruption Offenses in the Banjarmasin District Court.] Jurnal Informatika: Jurnal Pengembangan IT 8, no. 2 (2023): 149–54. http://dx.doi.org/10.30591/jpit.v8i2.5216.

Abstract:

This research aims to demonstrate the implementation of the Anti-Corruption Law in the Banjarmasin District Court using the K-Means algorithm. Corruption, which has persisted in Indonesia for a long time, has reached a critical level, making it crucial to enforce the law fairly and firmly. In this study, the decisions of the panel of judges in the Banjarmasin District Court on corruption cases that caused state losses were analyzed using K-Means clustering and the silhouette coefficient. The findings indicate that the optimal number of clusters is 3, with a silhouette value of 0.686, while the lowest value, 0.454, was obtained with 4 clusters. The clusters are divided into three enforcement categories: cases that have been executed (108 cases), cases that will be executed (26 cases), and cases that have not been executed (2 cases). All clusters have a silhouette score of 0.742, indicating successful enforcement. This research provides concrete evidence that the panel of judges in the Banjarmasin District Court has implemented the Anti-Corruption Law while considering state losses. By using the K-Means algorithm, this study also contributes to a better understanding of enforcement practices in the court. The results are expected to support efforts to strengthen the implementation of the Anti-Corruption Law in Indonesia, particularly in the Banjarmasin District Court.

33

Ji, Hyeonbin, Ingeun Hwang, Junghwon Kim, Suan Lee, and Wookey Lee. "Leveraging feature extraction and risk-based clustering for advanced fault diagnosis in equipment." PLOS ONE 19, no. 12 (2024): e0314931. https://doi.org/10.1371/journal.pone.0314931.

Abstract:

In the contemporary manufacturing landscape, the advent of artificial intelligence and big data analytics has been a game-changer in enhancing product quality. Despite these advancements, their application in diagnosing failure probability and risk remains underexplored. The current practice of failure risk diagnosis is impeded by the manual intervention of managers, leading to varying evaluations for identical products or similar facilities. This study aims to bridge this gap by implementing advanced data analysis techniques on maintenance data from an aluminum extruder. We have employed text embedding, dimensionality reduction, and feature extraction methods, integrating the K-means algorithm with the Silhouette Score for risk level classification. Our findings reveal that the combination of Word2Vec for embedding and Contractive Auto Encoder for dimensionality reduction and feature extraction yields high-performance results. The optimal cluster count, identified as three, achieved the highest Silhouette Score. Statistical analysis using one-way ANOVA confirmed the significance of these findings with a p-value of 5.3213 × 10⁻⁶, well within the 5% significance threshold. Furthermore, this study utilized BERTopic for topic modeling to extract principal topics from each cluster, facilitating an in-depth analysis of the clusters in relation to the extruder’s characteristics. The outcome of this research offers a novel methodology for facility managers to objectively diagnose equipment failures. By minimizing subjective judgment, this approach is poised to significantly enhance the efficacy of quality assurance systems in manufacturing, leveraging the robust capabilities of artificial intelligence.

34

Laurenso, Justin, Danny Jiustian, Felix Fernando, Vartin Suhandi, and Theresia Herlina Rochadiani. "Implementation of K-Means, Hierarchical, and BIRCH Clustering Algorithms to Determine Marketing Targets for Vape Sales in Indonesia." Journal of Applied Informatics and Computing 8, no. 1 (2024): 62–70. http://dx.doi.org/10.30871/jaic.v8i1.4871.

Abstract:

Smoking is common in everyday life today. Over time, an innovation emerged: the electric cigarette, or vape. Electric cigarettes (vapes) use electricity to produce vapor. The e-cigarette business is very promising because market demand has been increasing consistently. However, identifying target buyers is important to a business's success. Because the regions of Indonesia differ in background, data analysis is needed to find which regions have the potential for increased marketing based on profit margins, supporting target-market analysis so that companies avoid losses and improve business success. This study uses vape quantity, margin, and purchasing-power data for each region, processed with three algorithms: K-Means, Hierarchical, and BIRCH. All three algorithms produce the same two clusters: a potential cluster of 18 cities and a non-potential cluster of 45 cities. Model performance was evaluated with the Silhouette score, Davies-Bouldin index, Calinski-Harabasz index, and Dunn index, giving 0.765201, 0.376322, 315.949434, and 0.013554, respectively. From these results, it can be concluded that the clustering is neither very good nor very bad: higher Silhouette Score, Calinski-Harabasz, and Dunn index values indicate better clustering, whereas for Davies-Bouldin a smaller value is better.

35

ASLANTAŞ, Gözde, Mustafacan GENÇGÜL, Merve RUMELLİ, Mustafa ÖZSARAÇ, and Gözde BAKIRLI. "Customer Segmentation Using K-Means Clustering Algorithm and RFM Model." Deu Muhendislik Fakultesi Fen ve Muhendislik 25, no. 74 (2023): 491–503. http://dx.doi.org/10.21205/deufmd.2023257418.

Abstract:

The key points in customer segmentation are determining target customer groups and satisfying their needs. Recency-Frequency-Monetary (RFM) analysis and the K-Means clustering algorithm are popular methods for customer segmentation when analyzing customer behavior. In our study, we adapt the K-Means clustering algorithm to the RFM model by extracting features that represent the RFM aspects of home appliances. Customers with similar RFM-oriented features are assigned to the same cluster, while customers with dissimilar RFM-oriented features are assigned to different clusters. In the experiments, the clustering reached the predetermined Silhouette Score threshold. The resulting clusters were ranked and named using the Customer Lifetime Value (CLV) metric, which measures how valuable a customer is to the business.

36

Nugraha, Dimas Reza, Ahmad Turmudi Zy, and Aswan Supriyadi Sunge. "The Use of K-Means Algorithm Clustering in Grouping Life Expectancy (Case Study: Provinces in Indonesia)." Journal of Computer Networks, Architecture and High Performance Computing 6, no. 3 (2024): 1055–65. http://dx.doi.org/10.47709/cnahpc.v6i3.4171.

Abstract:

Life expectancy describes the expected age at death of a population and gives a general picture of the state of a region. If the infant mortality rate is high, life expectancy in the area is low, and vice versa. Life expectancy is also a benchmark for government action to improve social welfare and the human development index. It is therefore useful to group life expectancy data to identify the provinces with high, middle, and low life expectancy. Cluster testing with the silhouette score showed that two provinces, East Java and Gorontalo, had low silhouette values, which made the clustering less than optimal. The provinces were divided into three clusters. Cluster 1, with high life expectancy, consists of 10 provinces: East Java, Riau, North Sulawesi, Bali, North Kalimantan, DKI Jakarta, West Java, Central Java, East Kalimantan, and the Special Region of Yogyakarta. Cluster 2, with middle life expectancy, consists of 18 provinces: Gorontalo, North Maluku, Central Sulawesi, South Kalimantan, North Sumatra, Bengkulu, West Sumatra, Central Kalimantan, Aceh, South Sumatra, Banten, Riau Islands, South Sulawesi, Bangka Belitung Islands, Lampung, West Kalimantan, Southeast Sulawesi, and Jambi. Cluster 3, with low life expectancy, consists of 6 provinces: West Sulawesi, Papua, Maluku, West Papua, West Nusa Tenggara, and East Nusa Tenggara.

37

Yanto, Vito Dwi, and Irma Handayani. "Implementation of The K-Means Clustering Algorithm in Determining The Rate of Indramayu Mango Fruit." Journal of Scientific Research, Education, and Technology (JSRET) 3, no. 4 (2024): 1929–38. https://doi.org/10.58526/jsret.v3i4.609.

Abstract:

This research aims to classify the ripeness levels of Indramayu mangoes using the K-Means Clustering algorithm based on HSV (Hue, Saturation, Value) color features. The process begins with capturing mango images, followed by preprocessing steps such as normalization and resizing to enhance image quality. Next, color features are extracted, focusing on the Hue value as an indicator of the color changes that characterize ripeness levels. The optimal number of clusters is determined using the Elbow method, resulting in two clusters: ripe mangoes and unripe mangoes. Clustering quality is evaluated using the Silhouette Score, which indicates an accuracy of 80%. The results demonstrate that the K-Means algorithm successfully classifies Indramayu mangoes, with 495 images divided into two main categories. This study contributes to improving the efficiency of automated mango ripeness classification, with potential applications in the agricultural industry.
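
The abstract above uses the Hue channel of HSV as the ripeness feature. A minimal sketch of turning one image into one hue value, assuming the image is already available as a list of 8-bit RGB tuples with background pixels removed; the helper name `mean_hue_degrees` is hypothetical and not taken from the paper. It relies on Python's standard `colorsys.rgb_to_hsv` and takes a circular mean, because hue is an angle and a plain arithmetic mean breaks near the 0/360 degree wrap-around.

```python
import colorsys
import math


def mean_hue_degrees(pixels):
    """Circular mean hue, in degrees [0, 360), of 8-bit (r, g, b) pixels.

    Hypothetical helper: assumes background pixels were masked out beforehand,
    since grey pixels have no meaningful hue (colorsys reports h = 0 for them).
    """
    if not pixels:
        raise ValueError("need at least one pixel")
    sum_cos = sum_sin = 0.0
    for r, g, b in pixels:
        # colorsys expects channels in [0, 1] and returns h in [0, 1).
        h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        angle = 2 * math.pi * h
        sum_cos += math.cos(angle)
        sum_sin += math.sin(angle)
    # atan2 of the summed unit vectors gives the circular mean angle.
    return math.degrees(math.atan2(sum_sin, sum_cos)) % 360


# Pure green sits at 120 degrees and pure yellow at 60, so their circular mean is 90.
print(round(mean_hue_degrees([(0, 255, 0), (255, 255, 0)]), 6))
```

Each image's hue value would then form one feature row for K-Means with k = 2 (ripe vs. unripe); a median hue is a simpler alternative when the hues of interest never wrap around 0 degrees.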

38

Anggriani, Yesi Pitaloka, Alfis Arif, and Febriansyah Febriansyah. "Implementasi Algoritma K-Means Clustering Dalam Menentukan Blok Tanaman Sawit Produktif Pada PT Arta Prigel [Implementation of the K-Means Clustering Algorithm in Determining Productive Oil Palm Plantation Blocks at PT Arta Prigel]." Jurnal Komtika (Komputasi dan Informatika) 8, no. 1 (2024): 22–32. http://dx.doi.org/10.31603/komtika.v8i1.11192.

Abstract:

The purpose of this study is to implement the K-Means Clustering method to determine the patterns of productive oil palm production by block at PT Arta Prigel. The research is motivated by issues within the oil palm blocks, such as the absence of productive block summaries, insufficient plantation land analysis, and erroneous decision-making. The study follows the CRISP-DM methodology, with data spanning two years, from October 2021 to October 2023. Of the 1,275 production records, 1,015 remain after cleaning. Filtering the initial 51 blocks leaves 37 blocks for 2021 and 2022, and 46 blocks for 2023. After clustering, the production outcomes for 2021 are as follows: cluster_0 has 34 blocks, cluster_1 has 2 blocks, and cluster_2 has 10 blocks. For 2022, cluster_0 has 36 blocks, cluster_1 has 8 blocks, and cluster_2 has 28 blocks. For 2023, cluster_0 has 39 blocks, cluster_1 has 8 blocks, and cluster_2 has 33 blocks. Testing uses the silhouette coefficient, and the silhouette score indicates the formation of 3 clusters (K=3) with a value of 0.61. The findings of this study include patterns, graphs, and production tables generated using the K-Means Clustering method at PT Arta Prigel.
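
The abstract reports choosing K with the silhouette coefficient. A minimal, dependency-free sketch of that procedure: cluster the data for several candidate values of k, compute the mean silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the rest of its own cluster and b(i) is the smallest mean distance from i to any other cluster, and keep the k with the highest mean. The deterministic farthest-first initialisation, the function names, and the toy data are assumptions for illustration, not the paper's setup; in practice scikit-learn's `KMeans` and `sklearn.metrics.silhouette_score` are the usual tools.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with a deterministic farthest-first start (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = set(labels)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("silhouette needs between 2 and n - 1 clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        total += 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates):
    """Return the k with the highest mean silhouette, plus the score for every k."""
    scores = {k: silhouette_score(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Three well-separated toy blobs (hypothetical data).
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (20, 0), (20, 1), (21, 0), (21, 1)]
k, scores = best_k(data, range(2, 6))
print(k, round(scores[k], 3))
```

On real data the silhouette curve is often much flatter than on these toy blobs, so inspecting the whole `scores` dictionary is more informative than reading only its maximum.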

39

Anggriani, Yesi Pitaloka, Alfis Arif, and Febriansyah Febriansyah. "Implementation of the K-Means Clustering Algorithm in Determining Productive Oil Palm Blocks at Pt Arta Prigel." JISA (Jurnal Informatika dan Sains) 7, no. 1 (2024): 53–58. http://dx.doi.org/10.31326/jisa.v7i1.2008.

Abstract:

The purpose of this study is to implement the K-Means Clustering method to determine the patterns of productive oil palm production by block at PT Arta Prigel. The research is motivated by issues within the oil palm blocks, such as the absence of productive block summaries, insufficient plantation land analysis, and erroneous decision-making. The study follows the CRISP-DM methodology, with data spanning two years, from October 2021 to October 2023. Of the 1,275 production records, 1,015 remain after cleaning. Filtering the initial 51 blocks leaves 37 blocks for 2021 and 2022, and 46 blocks for 2023. After clustering, the production outcomes for 2021 are as follows: cluster_0 has 34 blocks and cluster_1 has 10 blocks. For 2022, cluster_0 has 24 blocks and cluster_1 has 37 blocks. For 2023, cluster_0 has 44 blocks and cluster_1 has 27 blocks. Testing uses the silhouette coefficient, and the silhouette score indicates the formation of 2 clusters (K=2) with a value of 0.62; the results of testing with 2 clusters indicate that the clusters formed are accurate. The findings of this study include patterns, graphs, and production tables generated using the K-Means Clustering method at PT Arta Prigel.

40

Gani, Friansyah, Hasan S. Panigoro, Sri Lestari Mahmud, Emli Rahmi, Salmun K. Nasib, and La Ode Nashar. "Implementation of K-Nearest Neighbor Algorithm on Density-Based Spatial Clustering Application with Noise Method on Stunting Clustering." Jurnal Diferensial 6, no. 2 (2024): 170–78. https://doi.org/10.35508/jd.v6i2.16278.

Abstract:

This paper studies the implementation of the K-Nearest Neighbor (KNN) algorithm on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method for clustering stunting in the eastern region of Indonesia in 2022. DBSCAN is used because it clusters data with irregular cluster shapes more efficiently. The main objective of this study is to apply the KNN algorithm to DBSCAN clustering of 161 districts/cities in 11 provinces in eastern Indonesia. The DBSCAN results are evaluated and compared using the Silhouette score, BetaCV score, and Davies-Bouldin score as indicators of cluster quality; the lowest result scores, 0.67 and 1.84, were obtained with epsilon = 3.4 and minimum points = 2, producing 4 clusters. Clustering the 161 districts and cities by the factors that cause stunting formed 4 clusters: Cluster 0 consists of 119 districts and cities with very high stunting characteristics, Cluster 1 of 3 districts and cities with high stunting characteristics, Cluster 2 of 2 districts and cities with low stunting characteristics, and Cluster 3 of 2 cities with very low stunting characteristics.
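
The abstract pairs K-Nearest Neighbor with DBSCAN, which typically refers to reading DBSCAN's epsilon off a sorted k-distance curve and then clustering. A minimal pure-Python sketch of both steps, assuming Euclidean distance and the convention (shared by scikit-learn's `min_samples`) that a point counts itself toward the minimum-points threshold; the function names and toy data are hypothetical, and the abstract does not state how the authors arrived at epsilon = 3.4.

```python
import math


def k_distances(points, k):
    """Sorted distance from every point to its k-th nearest neighbour (itself excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def dbscan(points, eps, min_pts):
    """Return a cluster id per point (0, 1, ...), with -1 marking noise."""
    labels = [None] * len(points)

    def region(i):
        # Neighbourhood includes the point itself (distance 0 <= eps).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later turn out to be a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(neighbours)
    return labels


# Two tight groups plus one far-away outlier (hypothetical data).
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(k_distances(data, k=1))  # the jump at the end marks the outlier
print(dbscan(data, eps=1.5, min_pts=3))
```

When the minimum-points value counts the point itself, k for the k-distance curve is commonly set to minimum points minus 1; epsilon is then chosen near the "knee" where the sorted curve starts to rise sharply.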

41

Wikantari, Made Mita, Yuliant Sibaroni, and Aditya Firman Ihsan. "Clustering Content Types and User Motivation Using DBSCAN on Twitter." Journal of Computer System and Informatics (JoSYC) 4, no. 4 (2023): 741–48. http://dx.doi.org/10.47065/josyc.v4i4.3750.

Abstract:

We live in an era full of information and communication technology, and Twitter is one of the communication media in use. Twitter is a microblogging service whose users express their thoughts on a topic in posts called tweets, which can be either positive or negative. One topic currently discussed by Twitter users is Anies Baswedan as a 2024 Indonesian presidential candidate. Many people have tweeted about this, but it is not known how many users support or reject his candidacy. To assist the analysis, the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm is used. DBSCAN can detect data that do not belong to any cluster and treats them as noise, which can improve grouping accuracy because the data in each cluster are cleaner. A TF-IDF vectorizer turns sentences into vectors that the algorithm can process. The results are evaluated with the silhouette score, which reaches 0.29 with 3 clusters formed. An analysis of the top words in each cluster then shows that cluster 0 is positive, supporting Anies Baswedan's run as a 2024 presidential candidate, while cluster 1 is negative, not supporting his candidacy.

42

Villamor, Juanga Jean Marie, Myrafe Sebastian Ylagan, and Patricia Almie Louise R. Mariscotes. "Exploring and Contrasting Artistic Architectural Elements in Selected Residential Structures throughout Davao City: A Comparative Investigation." European Journal of Arts, Humanities and Social Sciences 2, no. 1 (2025): 73–88. https://doi.org/10.59324/ejahss.2025.2(1).09.

Abstract:

This study quantitatively assessed the aesthetic value of selected residential buildings in Davao City and explored the factors influencing aesthetic preferences. Specifically, it aimed to quantify aesthetic scores based on Order and Complexity, compare the results between ancestral and modern houses, and identify the dominant characteristics contributing to their overall aesthetics. A total of sixty-four (64) residential houses were evaluated, comprising thirty-two (32) Heritage Houses recognized by Museo Dabawenyo and thirty-two (32) Modern Houses, evenly distributed across the three districts of Davao City. The assessment employed Birkhoff’s Aesthetic Measure, which considers two primary factors: Order (O) and Complexity (C). The Order factor encompassed six components: symmetry, repetition, equilibrium, disposition, color harmony, and negative factor. Meanwhile, the Complexity factor comprised four components: form complexity, ornaments, silhouette differentiation, and color contrast. Each component was scored on a scale of 0 to 2, and these scores were utilized to compute each house’s aesthetic measure (Order/Complexity) for further interpretation. Findings revealed that both ancestral and modern houses prioritized bilateral symmetry, balanced proportions, and moderate detail complexity. Quantitative results indicated a stronger emphasis on Order rather than Complexity across both house types. This study contributes to the objective evaluation of residential aesthetics and offers valuable insights for architects and designers in creating visually appealing buildings and living environments.
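
The abstract computes each house's aesthetic measure as Order divided by Complexity, with every component scored from 0 to 2. A minimal sketch of that arithmetic, assuming Order is the sum of its five positive components minus the negative factor and Complexity is the plain sum of its four components; the abstract does not spell out the aggregation, and the component keys and example scores below are hypothetical.

```python
ORDER_COMPONENTS = ("symmetry", "repetition", "equilibrium", "disposition", "color_harmony")
COMPLEXITY_COMPONENTS = ("form_complexity", "ornaments", "silhouette_differentiation", "color_contrast")


def aesthetic_measure(scores):
    """Birkhoff-style M = O / C from component scores on a 0-2 scale (assumed aggregation)."""
    for name, value in scores.items():
        if not 0 <= value <= 2:
            raise ValueError(f"{name} must be scored between 0 and 2")
    # Assumption: the negative factor is subtracted from the positive order components.
    order = sum(scores[c] for c in ORDER_COMPONENTS) - scores["negative_factor"]
    complexity = sum(scores[c] for c in COMPLEXITY_COMPONENTS)
    if complexity == 0:
        raise ValueError("complexity must be positive to form O / C")
    return order / complexity


house = {  # hypothetical scores for one house
    "symmetry": 2, "repetition": 2, "equilibrium": 2, "disposition": 1,
    "color_harmony": 2, "negative_factor": 1,
    "form_complexity": 1, "ornaments": 1, "silhouette_differentiation": 2, "color_contrast": 1,
}
print(aesthetic_measure(house))  # O = 9 - 1 = 8, C = 5, so M = 1.6
```

Comparing ancestral and modern houses then amounts to comparing the distributions of M across the two groups of 32 houses.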

43

Achmad, Syifa Latifah, Ahmad Fauzi, Rahmat Rahmat, and Jamaludin Indra. "SEGMENTASI PELANGGAN MENGGUNAKAN K-MEANS CLUSTERING DI TOKO RETAIL [Customer Segmentation Using K-Means Clustering in a Retail Store]." Jurnal Teknik Informasi dan Komputer (Tekinkom) 7, no. 2 (2024): 736. https://doi.org/10.37600/tekinkom.v7i2.1226.

Abstract:

Advancements in information technology have transformed various aspects of human life, including the business world. Companies are required to use technology and data effectively to enhance their competitive advantage. One increasingly relevant strategy is Customer Relationship Management (CRM), where customer data is the main focus. Consumer data segmentation is an approach used to group customers based on certain characteristics. In this study, the K-Means Clustering algorithm is applied to consumer data segmentation to improve the marketing strategy of a store. The study begins with the collection of customer data from the Dan+Dan Telukjambe 2 store, followed by Exploratory Data Analysis (EDA) to understand the patterns and characteristics of the data. Preprocessing steps are carried out to ensure the data is ready for use, including removing irrelevant columns, handling missing values, and data transformation. Principal Component Analysis (PCA) is used to reduce data dimensions before applying K-Means Clustering. The Elbow Method and Silhouette Score are used to determine the optimal number of clusters. The study results indicate that the optimal number of clusters is six. Evaluation using the Silhouette Coefficient provides an average coefficient value of 0.66, indicating good clustering quality. Further analysis shows different distributions of age, purchasing power, occupation, and marital status in each cluster, providing deep insights into customer segments. The resulting clusters offer valuable information for developing more effective and targeted marketing strategies.

44

Rodiatun, Rodiatun, and Sri Lestari. "Assessment Clusterization Teacher Performance with K-Means Algorithm Clustering and Agglomerative Hierarchical Clustering (AHC)." sinkron 9, no. 1 (2025): 357–65. https://doi.org/10.33395/sinkron.v9i1.14200.

Abstract:

This research aims to cluster teacher performance assessments by applying the K-means clustering algorithm and agglomerative hierarchical clustering (AHC). The study is motivated by the need to improve teaching quality through better analysis and evaluation of teacher performance. Performance assessment data were collected from teachers in a local educational environment and processed using both algorithms. The silhouette score for K-means reached 0.364, while AHC produced 0.343; K-means is thus shown to be more effective than AHC in grouping the teacher performance assessment data. The study confirms the importance of applying the K-means algorithm to gain better insight into teacher performance evaluation. The author is prepared to revise the manuscript in accordance with the reviewers' comments and suggestions as a condition for further processing.
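
The abstract compares K-means with agglomerative hierarchical clustering (AHC) by silhouette score. A minimal sketch of AHC, assuming average linkage and Euclidean distance because the abstract does not name the linkage (scikit-learn's `AgglomerativeClustering`, for comparison, defaults to Ward linkage); the function name and teacher scores are hypothetical. The naive pairwise search is fine for a few hundred records but scales poorly.

```python
import math


def agglomerative(points, n_clusters):
    """Naive bottom-up clustering with average linkage; returns one label per point."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")
    clusters = [[i] for i in range(len(points))]  # start: every point is its own cluster

    def average_link(a, b):
        # Mean pairwise distance between members of clusters a and b.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters under average linkage.
        x, y = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: average_link(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical two-criterion performance scores for six teachers.
scores = [(60, 62), (61, 65), (63, 60), (85, 88), (88, 86), (90, 91)]
print(agglomerative(scores, n_clusters=2))
```

The resulting labels can be passed to any silhouette implementation and compared with the K-means labels on the same data, which is the kind of comparison the abstract reports.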

45

Bulut, Yunus. "DENETİMSİZ ÖĞRENME: KÜMELEME ANALİZİ İLE OECD ÜLKELERİNDE ÖZGÜRLÜK [Unsupervised Learning: Freedom in OECD Countries with Clustering Analysis]." International Journal of Educational and Social Sciences 2, no. 2 (2023): 83–106. https://doi.org/10.5281/zenodo.10429529.

Abstract:

This study used cluster analysis, an unsupervised learning method, to evaluate the economic freedom levels of OECD countries in 2023. Countries were examined with two different k-means clustering analyses, with the optimal k value determined by a genetic algorithm and by simulated annealing optimization. According to the results, the simulated annealing method performed better than the genetic algorithm method. Its higher silhouette score indicates that the clusters formed with simulated annealing are more homogeneous and better separated from each other, while its lower clustering error indicates that the data points lie closer to the clusters they belong to. The results of this study make an important contribution by providing a scientifically based approach to assessing levels of economic freedom and shaping future economic policies.

46

Amalia, Nur Laita Rizki, Ahmad Afif Supianto, Nanang Yudi Setiawan, Vicky Zilvan, Asri Rizki Yuliani, and Ade Ramdan. "Student Academic Mark Clustering Analysis and Usability Scoring on Dashboard Development Using K-Means Algorithm and System Usability Scale." Jurnal Ilmu Komputer dan Informasi 14, no. 2 (2021): 137–43. http://dx.doi.org/10.21609/jiki.v14i2.980.

Abstract:

Learning activities are one of the processes of delivering information or messages from teachers to students. SMPN 4 Sidoarjo is a State Junior High School (JHS) located in Sidoarjo Regency. The academic score data collected during the learning process were not yet well organized for teachers and the school principal to monitor student learning performance. The score data come from the Bahasa Indonesia subject taught by one teacher and comprise 222 records from the 2019/2020 school year. Students are clustered with K-Means, and the number of clusters is determined with the elbow method and displayed as a graph. The clustering results can serve as a reference for teachers in forming study groups and determining the best treatment for each cluster. The best clustering results are confirmed by validation scores from the Davies-Bouldin Index, Silhouette Width, and Calinski-Harabasz Index. Three clusters were obtained for the data at each class level, while the number of clusters for each study group ranges from two to five. A dashboard is used to visualize the clustering results. Usability testing with the System Usability Scale (SUS) yielded a score of 87.5, which means that the dashboard is acceptable to SMPN 4 Sidoarjo.
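
The dashboard's usability is reported as a System Usability Scale score of 87.5. A minimal sketch of the standard SUS scoring rule: ten items answered on a 1-5 scale, odd (positively worded) items contribute the response minus 1, even (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a 0-100 score. The responses below are hypothetical; a study-level score such as the paper's is normally the mean over all respondents.

```python
def sus_score(responses):
    """System Usability Scale score (0-100) for one respondent's ten 1-5 answers."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects exactly ten responses on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def mean_sus(all_responses):
    """Average SUS score over several respondents."""
    return sum(sus_score(r) for r in all_responses) / len(all_responses)


print(sus_score([5, 1, 5, 2, 5, 2, 5, 2, 4, 2]))  # 19 + 16 = 35 points, times 2.5
```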

47

Pribowo, Putra. "Data Prediction Of Receivables In 2021-2023 At Bank Syariah Indonesia Tbk With Regression And Clustering Methods." Eduvest - Journal of Universal Studies 5, no. 2 (2025): 1830–45. https://doi.org/10.59188/eduvest.v5i2.1803.

Abstract:

This study aims to predict receivables data at Bank Syariah Indonesia Tbk for the 2021-2023 period using regression and clustering methods. Data analysis methods such as regression and clustering have been used to predict credit risk and receivables payment behavior. A linear regression model is applied to predict the future value of different types of receivables (Murabahah, Istishna, Multijasa, Qardh, Serent), while K-Means clustering is used to group data based on five main variables. The results show that the linear regression model predicts future values with fairly good accuracy, as shown by the agreement between actual and predicted values. K-Means clustering produces three fairly good clusters, with a silhouette score of 0.51, which indicates adequate cluster quality. Visualization of the results shows the distribution and patterns in the data, providing insight into the relationships between different types of receivables. This research provides a deeper understanding of the structure of receivables data and aids decision-making based on future predictions and data grouping.
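
The abstract applies a linear regression model to forecast receivables. A minimal sketch of ordinary least squares on a time index, fitting y = intercept + slope * t and extrapolating one period ahead; the function name and the figures are hypothetical, and the abstract does not state which predictors the paper actually used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical receivables balance for four consecutive periods (t = 0..3).
periods = [0, 1, 2, 3]
balance = [10.0, 12.0, 14.0, 16.0]
intercept, slope = fit_line(periods, balance)
print(intercept + slope * 4)  # forecast for the next period
```

Fitting one such line per receivable type would give one forecast per type; comparing fitted and actual values on held-out periods is one way to check the agreement the abstract describes.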

48

Raditya, Muhammad Hafidh, Indwiarti, and Aniq Atiqi Rohmawati. "House Prices Segmentation Using Gaussian Mixture Model-Based Clustering." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 6, no. 5 (2022): 866–71. http://dx.doi.org/10.29207/resti.v6i5.4459.

Abstract:

A house is a place for people to live and a basic human necessity. Over the years, the need for houses has increased and diversified, which affects house selling prices. Therefore, more research is needed on house selling prices. This research focuses only on house price segmentation in DKI Jakarta using Gaussian Mixture Model (GMM)-based clustering with the Expectation-Maximization algorithm. The goal of this research is to build a house price segmentation model that yields useful information for potential buyers. Clustering with a GMM uses the log-likelihood function to optimize the GMM parameters. The results show that houses in DKI Jakarta can be segmented into 3 clusters: the first for low-profile houses, the second for mid-profile houses, and the third for high-profile houses. The clustering method produces a silhouette score of 0.60866, which is quite good because it is close to 1.

49

Nikmah, Tiara Lailatul, Nur Hazimah Syani Harahap, Gina Cahya Utami, and Muhammad Mirza Razzaq. "Customer Segmentation Based on Loyalty Level Using K-Means and LRFM Feature Selection in Retail Online Store." Jurnal ELTIKOM 7, no. 1 (2023): 21–28. http://dx.doi.org/10.31961/eltikom.v7i1.648.

Abstract:

Customer experience is a key component in increasing sales. Customers are important assets that a company must retain, and prioritizing customer service is one way to protect client loyalty. To ensure that service priorities are on target, this research was conducted on groups of consumers who are anticipated to have high business prospects. A 2011 online retail sales dataset with 379,980 records and eight characteristics was used. The length, recency, frequency, and monetary (LRFM) feature selection approach was used to select features for further segmentation with the K-Means data mining method to define consumer types. Customers were divided into four categories: Premium Loyalty, Inertia Loyalty, Latent Loyalty, and No Loyalty. The validity of the clustering results is shown by a validation test using the Silhouette Score Index technique, which yielded a score of 0.943898. Based on the outcomes of this segmentation, businesses can prioritize providing clients with the proper service.
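
The abstract builds Length, Recency, Frequency, and Monetary (LRFM) features before running K-Means. A minimal sketch under commonly used definitions, assumed rather than taken from the paper: Length is the number of days between a customer's first and last purchase, Recency the days from the last purchase to an analysis date, Frequency the number of transactions, and Monetary the total spend. Each input tuple is treated as one transaction; with line-item data you would aggregate by invoice first. The function name and transactions are hypothetical.

```python
from datetime import date


def lrfm(transactions, analysis_date):
    """LRFM features per customer from (customer_id, purchase_date, amount) tuples."""
    history = {}
    for customer, day, amount in transactions:
        history.setdefault(customer, []).append((day, amount))
    features = {}
    for customer, rows in history.items():
        days = [day for day, _ in rows]
        first, last = min(days), max(days)
        features[customer] = {
            "L": (last - first).days,                # Length: first to last purchase
            "R": (analysis_date - last).days,        # Recency: last purchase to analysis date
            "F": len(rows),                          # Frequency: number of transactions
            "M": sum(amount for _, amount in rows),  # Monetary: total spend
        }
    return features


sales = [  # hypothetical transactions
    ("A", date(2011, 1, 1), 10.0),
    ("A", date(2011, 3, 1), 20.0),
    ("B", date(2011, 6, 1), 5.0),
]
print(lrfm(sales, analysis_date=date(2011, 12, 31)))
```

The four columns would usually be standardised before K-Means, since Monetary totals can dwarf day counts and dominate the Euclidean distance.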

50

Irianto, Maulana Rafael, Achmad Maududie, and Fajrin Nurman Arifin. "Implementation of K-Means Clustering Method for Trend Analysis of Thesis Topics (Case Study: Faculty of Computer Science, University of Jember)." BERKALA SAINSTEK 10, no. 4 (2022): 210. http://dx.doi.org/10.19184/bst.v10i4.29524.

Abstract:

The development of information technology has produced a large number of digital documents, especially theses, which creates a risk that students choose the same, unvaried topics. Thesis documents can be grouped by topic by analyzing their abstracts, and the groups can then be visualized to analyze the trend of each topic. A total of 490 theses by students of the Faculty of Computer Science, University of Jember, were retrieved from the University of Jember repository through web scraping. The preprocessing stage uses text mining methods including cleaning, filtering, stemming, and tokenizing. The weight of each word is then calculated with the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, followed by dimensionality reduction with Principal Component Analysis (PCA) after Z-score normalization. Outliers are removed before the documents are classified. The documents are then grouped with the K-Means Clustering method, using cosine similarity as the distance calculation and the Silhouette Coefficient as the test. Testing with various k values gave an optimum at k = 2, with a silhouette value of 0.80. Topics are then detected in each cluster with the Latent Dirichlet Allocation (LDA) algorithm. Each cluster is visualized with a line chart and a linear trend and analyzed to determine its trend. The analysis shows that the topic of Decision Support System Development is trending down, while the topic of IT Performance Measurement and Forecasting is trending up. The authors conclude that the topic of Decision Support System Development should be reduced so that other topics can emerge.
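
The pipeline in the abstract weights words with TF-IDF and compares documents by cosine similarity. A minimal sketch of both, assuming whitespace tokenisation, term frequency as count divided by document length, and the unsmoothed idf = ln(N / df); libraries differ on these choices (scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed idf by default), so the exact weights will not match any particular tool. The function names and the three short "abstracts" are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Sparse TF-IDF vectors (dicts of term -> weight) for whitespace-tokenised docs."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


abstracts = [
    "decision support system",
    "decision support system development",
    "it performance measurement forecasting",
]
vecs = tfidf(abstracts)
print(round(cosine_similarity(vecs[0], vecs[1]), 3), cosine_similarity(vecs[0], vecs[2]))
```

Cosine similarity is a similarity rather than a distance; for L2-normalised vectors the squared Euclidean distance equals 2 - 2 × cosine similarity, which is why clustering normalised TF-IDF vectors with Euclidean K-Means is closely related to clustering by cosine similarity.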
