Journal articles on the topic 'Synthetic minority oversampling technique'

Author: Grafiati

Published: 2 June 2025

Last updated: 19 July 2025

Consult the top 50 journal articles for your research on the topic 'Synthetic minority oversampling technique.'

1

Hooda, Sakshi, and Suman Mann. "Distributed Synthetic Minority Oversampling Technique." International Journal of Computational Intelligence Systems 12, no. 2 (2019): 929. http://dx.doi.org/10.2991/ijcis.d.190719.001.

2

Suci, Wulan, and Samsudin Samsudin. "Algoritma K-Nearest Neighbors dan Synthetic Minority Oversampling Technique dalam Prediksi Pemesanan Tiket Pesawat." JURNAL MEDIA INFORMATIKA BUDIDARMA 6, no. 3 (2022): 1775. http://dx.doi.org/10.30865/mib.v6i3.4374.

Abstract:

This study applies the Synthetic Minority Oversampling Technique to improve the performance of the K-Nearest Neighbors method in predicting the unbalanced data class. Most classification algorithms implicitly assume that the processed data has a balanced distribution, so that the standard classifier is more inclined towards data with a dominant class number (majority class). The use of Synthetic Minority Oversampling Technique can improve the performance of the K-Nearest Neighbors method for flight ticket booking data. Although in terms of accuracy, Synthetic Minority Oversampling Technique wi
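The abstract names SMOTE without describing how it produces new samples. As a rough illustration only, the sketch below assumes the commonly described formulation: pick a minority sample, choose one of its k nearest minority neighbours, and place a synthetic point at a random position on the line segment between them. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and are not taken from the cited study.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class feature vectors.

    Hypothetical helper for illustration; needs at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to it.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest minority neighbours.
        neighbour = rng.choice(others[:k])
        # Interpolate: new = base + u * (neighbour - base), with u in [0, 1).
        u = rng.random()
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (assumed data).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
print(len(new_points))
```

In a typical workflow the synthetic points would be appended to the training set only, before fitting a classifier such as K-Nearest Neighbors.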

3

Rahardian, Hanif, Mohammad Reza Faisal, Friska Abadi, Radityo Adi Nugroho, and Rudy Herteno. "IMPLEMENTATION OF DATA LEVEL APPROACH TECHNIQUES TO SOLVE UNBALANCED DATA CASE ON SOFTWARE DEFECT CLASSIFICATION." Journal of Data Science and Software Engineering 1, no. 01 (2020): 53–62. http://dx.doi.org/10.20527/jdsse.v1i01.13.

Abstract:

Defects can cause significant software rework, delays, and high costs; to prevent them, the likelihood of defects must be predicted. Software metrics datasets are used to make this prediction. NASA MDP is a popular collection of software metrics for predicting software defects; it comprises 13 datasets and is generally unbalanced. The imbalance in the datasets can degrade software defect prediction because unbalanced data favors the majority class. Data imbalance can be handled with 2 approaches, namely the data level approach technique and the algorithm level approach technique

4

Gnip, Peter, Liberios Vokorokos, and Peter Drotár. "Selective oversampling approach for strongly imbalanced data." PeerJ Computer Science 7 (June 18, 2021): e604. http://dx.doi.org/10.7717/peerj-cs.604.

Abstract:

Challenges posed by imbalanced data are encountered in many real-world applications. One of the possible approaches to improve the classifier performance on imbalanced data is oversampling. In this paper, we propose the new selective oversampling approach (SOA) that first isolates the most representative samples from minority classes by using an outlier detection technique and then utilizes these samples for synthetic oversampling. We show that the proposed approach improves the performance of two state-of-the-art oversampling methods, namely, the synthetic minority oversampling technique and

5

Erlin, Erlin, Yenny Desnelita, Nurliana Nasution, Laili Suryati, and Fransiskus Zoromi. "Dampak SMOTE terhadap Kinerja Random Forest Classifier berdasarkan Data Tidak seimbang." MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer 21, no. 3 (2022): 677–90. http://dx.doi.org/10.30812/matrik.v21i3.1726.

Abstract:

In machine learning applications, it is very common to encounter datasets with varying degrees of imbalance, from slight and moderate to extreme. Most machine learning models trained on imbalanced data will be biased, giving high accuracy on the majority class and low accuracy on the minority class. The aim of this study is to evaluate the impact of SMOTE (Synthetic Minority Oversampling Technique) on a Random Forest classifier for predicting heart disease. The data, 299 records in total, come from the UCI Machine l

6

Vijayvargiya, Ankit, Aparna Sinha, Naveen Gehlot, Ashutosh Jena, Rajesh Kumar, and Kieran Moran. "S-WD-EEMD: A hybrid framework for imbalanced sEMG signal analysis in diagnosis of human knee abnormality." PLOS ONE 19, no. 5 (2024): e0301263. http://dx.doi.org/10.1371/journal.pone.0301263.

Abstract:

The diagnosis of human knee abnormalities using the surface electromyography (sEMG) signal obtained from lower limb muscles with machine learning is a major problem due to the noisy nature of the sEMG signal and the imbalance in data corresponding to healthy and knee abnormal subjects. To address this challenge, a combination of wavelet decomposition (WD) with ensemble empirical mode decomposition (EEMD) and the Synthetic Minority Oversampling Technique (S-WD-EEMD) is proposed. In this study, a hybrid WD-EEMD is considered for the minimization of noises produced in the sEMG signal during the c

7

Viana, Diogo, Maria Teixeira, José Baptista, and Tiago Pinto. "Synthetic minority oversampling technique for synthetic meteorological data generation*." IET Conference Proceedings 2024, no. 29 (2025): 798–802. https://doi.org/10.1049/icp.2024.4759.

8

Ai, Xusheng, Jian Wu, Victor S. Sheng, Pengpeng Zhao, and Zhiming Cui. "Immune Centroids Oversampling Method for Binary Classification." Computational Intelligence and Neuroscience 2015 (2015): 1–11. http://dx.doi.org/10.1155/2015/109806.

Abstract:

To improve the classification performance of imbalanced learning, a novel oversampling method, immune centroids oversampling technique (ICOTE) based on an immune network, is proposed. ICOTE generates a set of immune centroids to broaden the decision regions of the minority class space. The representative immune centroids are regarded as synthetic examples in order to resolve the imbalance problem. We utilize an artificial immune network to generate synthetic examples on clusters with high data densities, which can address the problem of synthetic minority oversampling technique (SMOTE), which

9

Hanifatul Azizah, Bagus Setya Rintyarna, and Triawan Adi Cahyanto. "Sentimen Analisis Untuk Mengukur Kepercayaan Masyarakat Terhadap Pengadaan Vaksin Covid-19 Berbasis Bernoulli Naive Bayes." BIOS : Jurnal Teknologi Informasi dan Rekayasa Komputer 3, no. 1 (2022): 23–29. http://dx.doi.org/10.37148/bios.v3i1.36.

Abstract:

This study analyzes the sentiment of Indonesians on Twitter toward government policy in handling the COVID-19 pandemic. It uses the Bernoulli Naive Bayes method to model and test the classification of sentiment data. Accuracy, precision, and recall are used to measure the performance of the Bernoulli Naive Bayes method. For data splitting and the test scenarios, K-fold cross-validation is used with k = 2, 4, 5, 8, and 10. Data imbalance in this study is addressed using
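The abstract mentions K-fold cross-validation with k = 2, 4, 5, 8, and 10. As a minimal sketch of what such a split looks like, the code below partitions sample indices into k contiguous folds and yields one train/test pair per fold. The helper name `k_fold_indices` and the dataset size are assumptions made for illustration; the study's own splitting procedure is not described in the visible text.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds; yield (train, test) index lists."""
    # Distribute the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


n_samples = 40  # hypothetical dataset size, not taken from the study
for k in (2, 4, 5, 8, 10):
    folds = list(k_fold_indices(n_samples, k))
    # Every sample appears in exactly one test fold.
    covered = sorted(i for _, test in folds for i in test)
    assert covered == list(range(n_samples))
    print(k, [len(test) for _, test in folds])
```

When cross-validation is combined with oversampling, a common precaution is to oversample only the training part of each fold so that synthetic points do not leak into the test fold. The visible abstract does not say whether the study did this.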

10

Zhu, Tuanfei, Yaping Lin, and Yonghe Liu. "Synthetic minority oversampling technique for multiclass imbalance problems." Pattern Recognition 72 (December 2017): 327–40. http://dx.doi.org/10.1016/j.patcog.2017.07.024.

11

Zhang, Yining. "Machine learning with oversampling for space debris classification based on radar cross section." Applied and Computational Engineering 49, no. 1 (2024): 102–8. http://dx.doi.org/10.54254/2755-2721/49/20241070.

Abstract:

Over the past few years, the likelihood of collisions between space objects has increased as the quantity of space debris has risen. Space debris classification and identification have become more crucial to space asset security and space situational awareness. Radar cross section (RCS), one of the essential parameters for tracking space debris, was measured by European Incoherent Scatter Scientific Association (EISCAT) and other radar systems. This study investigates the effectiveness of seven machine learning methods employed to address the classification of space objects based on RCS data from European Space A

12

Saputra, Pramana Yoga, Moch Zawaruddin Abdullah, and Annisa Puspa Kirana. "Improvisasi Teknik Oversampling MWMOTE Untuk Penanganan Data Tidak Seimbang." JURNAL MEDIA INFORMATIKA BUDIDARMA 5, no. 2 (2021): 398. http://dx.doi.org/10.30865/mib.v5i2.2811.

Abstract:

Imbalanced data is a condition in which the quantity of data differs between the majority class (classes with very many members) and the minority class (classes with very few members). It can complicate classification because machine learning algorithms are designed to classify balanced data. Oversampling resolves data imbalance by adding synthetic data to the minority class until it has the same volume of data as the majority class. MWMOTE is an oversampling technique that generates syntheti

13

Mustaqim, Mustaqim, Budi Warsito, and Bayu Surarso. "COMBINATION OF SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE (SMOTE) AND BACKPROPAGATION NEURAL NETWORK TO CONTRACEPTIVE IUD PREDICTION." MEDIA STATISTIKA 13, no. 1 (2020): 36–46. http://dx.doi.org/10.14710/medstat.13.1.36-46.

Abstract:

Data imbalance occurs when one class contains more data than another. The majority class has more data, while the minority class has less. Class imbalance decreases the performance of classification algorithms. Data on IUD contraceptive use are imbalanced: national IUD failures in 2018 numbered 959, or 3.5% of 27,400 users. The synthetic minority oversampling technique (SMOTE) is used to balance the data on IUD failure. The balanced data are then used for prediction with neural networks. The system predicts whether someone using an IUD will become pregnant. This study uses 250 dat

14

Santoso, Noviyanti, Wahyu Wibowo, and Hilda Hikmawati. "Integration of synthetic minority oversampling technique for imbalanced class." Indonesian Journal of Electrical Engineering and Computer Science 13, no. 1 (2019): 102–8. http://dx.doi.org/10.11591/ijeecs.v13.i1.pp102-108.

Abstract:

In data mining, class imbalance is a problematic issue that calls for solutions. This is probably because machine learning algorithms are constructed on the assumption that each class has a balanced number of instances, so with imbalanced classes the prediction results may not be appropriate. Solutions offered for class imbalance include oversampling, undersampling, and the synthetic minority oversampling technique (SMOTE). Both oversampling and undersampling have their disadvantages, so SMOTE is an alternative that overcomes them. By integrating SMOTE in

16

Anju Fauziah and Julan Hernadi. "Klasifikasi Data Tak Seimbang Menggunakan Algoritma Random Forest dengan SMOTE dan SMOTE-ENN." Teknomatika: Jurnal Informatika dan Komputer 17, no. 2 (2025): 38–47. https://doi.org/10.30989/teknomatika.v17i2.1530.

Abstract:

The random forest algorithm is one of the most widely used machine learning classification methods because it reduces the risk of overfitting while improving overall predictive performance. For data with imbalanced classes, however, the algorithm cannot reach its best performance, particularly when predicting the minority class. This article therefore offers two resampling methods for balancing the data: the Synthetic Minority Oversampling Technique (SMOTE) and the Synthetic Minority Oversampling Technique with Edited Nearest Neighbors (SMOTE-ENN).
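SMOTE-ENN is usually described as SMOTE followed by an Edited Nearest Neighbours cleaning pass. The sketch below illustrates only that cleaning pass, under the assumption that a sample is dropped when its label disagrees with the majority vote of its k nearest neighbours. The function name `enn_filter` and the toy data are hypothetical; the article's own implementation is not shown in the abstract.

```python
import math
from collections import Counter


def enn_filter(points, labels, k=3):
    """Keep only samples whose label agrees with the majority vote of their k nearest neighbours."""
    kept_points, kept_labels = [], []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Rank all other samples by Euclidean distance to p and keep the k closest.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        vote = Counter(labels[j] for j in neighbours).most_common(1)[0][0]
        if vote == y:
            kept_points.append(p)
            kept_labels.append(y)
    return kept_points, kept_labels


# Toy data (assumed): two clusters plus one class-1 point inside the class-0 cluster.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
          [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1],
          [0.05, 0.05]]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
clean_points, clean_labels = enn_filter(points, labels, k=3)
print(len(clean_points))  # the stray class-1 point is removed, leaving 8 samples
```

In the combined method this filter would run on the data after synthetic minority samples have been added, so that points lying deep inside the other class are discarded.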

17

Jin, Dian, Dehong Xie, Di Liu, and Murong Gong. "Clustering-based improved adaptive synthetic minority oversampling technique for imbalanced data classification." Intelligent Data Analysis 27, no. 3 (2023): 635–52. http://dx.doi.org/10.3233/ida-226612.

Abstract:

Synthetic Minority Oversampling Technique (SMOTE) and some extensions based on it are popularly used to balance imbalanced data. In this study, we concentrate on solving overfitting of the classification model caused by choosing instances to oversample that increase the occurrence of overlaps with the majority class. Our method called Clustering-based Improved Adaptive Synthetic Minority Oversampling Technique (CI-ASMOTE1) decomposes minority instances into sub-clusters according to their connectivity in the feature space and then selects minority sub-clusters which are relatively close to the

18

Sun, Maohua, Ruidi Yang, and Mengying Liu. "Privacy-Preserving Minority Oversampling Protocols with Fully Homomorphic Encryption." Security and Communication Networks 2022 (March 10, 2022): 1–9. http://dx.doi.org/10.1155/2022/3068199.

Abstract:

In recent years, blockchain and machine-learning techniques have received increasing attention both in theoretical and practical aspects. However, the applications of these techniques have many challenges, one of which is the privacy-preserving issue. In this paper, we focus on, specifically, the privacy-preserving issue of imbalanced datasets, a commonly found problem in real-world applications. Built based on the fully homomorphic encryption technique, this paper presents two new secure protocols, Privacy-Preserving Synthetic Minority Oversampling Protocol (PPSMOS) and Borderline Privacy-Pre

19

HAJJAOUI, Btıssam. "Crucial Challenges In Corporate Credit Risk Assessment: A Case Study." Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi 7, no. 2 (2024): 834–54. http://dx.doi.org/10.47495/okufbed.1340798.

Abstract:

This article aims to assess corporate credit risk by predicting the variable that indicates whether the customer has defaulted or not. The dataset used for this purpose is obtained from one of the leading institutions in the finance sector in Türkiye. It consists of 401 variables generally referring to the applicant's data, corporate data, shareholder data, and the applicant's credit history within the creditor's institution. We reduce this large number of variables by identifying the input variables from the others and then studying those inputs to avoid using strongly correlated variables an

20

Sumantiawan, Dody Indra, Jatmiko Endro Suseno, and Wahyul Amien Syafei. "Sentiment Analysis of Customer Reviews Using Support Vector Machine and Smote-Tomek Links For Identify Customer Satisfaction." J. Sistem Info. Bisnis 13, no. 1 (2023): 1–9. http://dx.doi.org/10.21456/vol13iss1pp1-9.

Abstract:

Shopping activities in the online market, especially fashion trends, continue to increase with all the promo efforts offered. One of the considerations for buying products on the online market is to read reviews. Each consumer review shows the level of interest in the product. The number of negative reviews and the emergence of many varied reviews pose a problem in categorizing reviews. Sentiment analysis is a way of looking at the polarity of reviews to classify positive and negative reviews. The Support Vector Machine method and the combination of the Synthetic Minority Oversampling Techniqu

21

Jiang, Liangxiao, Chen Qiu, and Chaoqun Li. "A Novel Minority Cloning Technique for Cost-Sensitive Learning." International Journal of Pattern Recognition and Artificial Intelligence 29, no. 04 (2015): 1551004. http://dx.doi.org/10.1142/s0218001415510040.

Abstract:

In many real-world applications, it is often the case that the class distribution of instances is imbalanced and the costs of misclassification are different. Thus, the class-imbalanced cost-sensitive learning has attracted much attention from researchers. Sampling is one of the widely used techniques in dealing with the class-imbalance problem, which alters the class distribution of instances so that the minority class is well represented in the training data. In this paper, we propose a novel Minority Cloning Technique (MCT) for class-imbalanced cost-sensitive learning. MCT alters the class

22

Zhang, Dong, Xiang Huang, Gen Li, Shengjie Kong, and Liang Dong. "MWMOTE-FRIS-INFFC: An Improved Majority Weighted Minority Oversampling Technique for Solving Noisy and Imbalanced Classification Datasets." Applied Sciences 15, no. 9 (2025): 4670. https://doi.org/10.3390/app15094670.

Abstract:

In industrial fault diagnosis and good product testing, high-noise unbalanced data samples are widespread, and such samples are very difficult to analyze. Oversampling has proved to be a simple solution to unbalanced data, but it has no significant resistance to noise. To solve the binary classification problem of high-noise unbalanced data, this study introduces an enhanced majority-weighted minority oversampling technique, MWMOTE-FRIS-INFFC, which is specially used for processing noise-unba

23

Antonio, Roy, and Hironimus Leong. "PERFORMANCE OF SYNTHETIC MINORITY OVER-SAMPLING TECHNIQUE ON SUPPORT VECTOR MACHINE AND K-NEAREST NEIGHBOR FOR SENTIMENT ANALYSIS OF METAVERSE IN INDONESIA." Proxies : Jurnal Informatika 6, no. 2 (2024): 160–70. http://dx.doi.org/10.24167/proxies.v6i2.12459.

Abstract:

The metaverse is one of the most discussed topics on Twitter in Indonesia. Views of it in Indonesian society can be both positive and negative, hence the need for sentiment analysis. However, building a sentiment classification model on unbalanced data reduces performance. For this reason, Synthetic Minority Oversampling is applied with the Support Vector Machine and K-Nearest Neighbor algorithms. The results show that Synthetic Minority Oversampling can improve the accuracy of the Support Vector Machine and K-Nearest Neighbor algorithms.

24

Cinar, Ahmet Cevahir. "A synergistic oversampling technique with differential evolution and safe level synthetic minority oversampling." Applied Soft Computing 172 (March 2025): 112819. https://doi.org/10.1016/j.asoc.2025.112819.

25

Mia, Rajib, Shapla Khanam, Amira Mahjabeen, et al. "Exploring Machine Learning for Predicting Cerebral Stroke: A Study in Discovery." Electronics 13, no. 4 (2024): 686. http://dx.doi.org/10.3390/electronics13040686.

Abstract:

Cerebral strokes, the abrupt cessation of blood flow to the brain, lead to a cascade of events, resulting in cellular damage due to oxygen and nutrient deprivation. Contemporary lifestyle factors, including high glucose levels, heart disease, obesity, and diabetes, heighten the risk of stroke. This research investigates the application of robust machine learning (ML) algorithms, including logistic regression (LR), random forest (RF), and K-nearest neighbor (KNN), to the prediction of cerebral strokes. Stroke data is collected from Harvard Dataverse Repository. The data includes—clinical, physi

26

Raveendhran, Nareshkumar, and Nimala Krishnan. "A novel hybrid SMOTE oversampling approach for balancing class distribution on social media text." Bulletin of Electrical Engineering and Informatics 14, no. 1 (2025): 638–46. http://dx.doi.org/10.11591/eei.v14i1.8380.

Abstract:

Depression is a frequent and dangerous medical disorder that negatively affects how a person feels, thinks, and acts. Early detection and treatment of depression may avoid painful and perhaps life-threatening symptoms. An imbalance in the data creates several challenges. Consequently, the majority of learners will be biased toward the majority class and, in extreme situations, may completely dismiss the minority class. For decades, class disparity research has employed traditional machine learning methods. In

27

Xiong, Chuang, Runhan Zhao, Jingtao Xu, et al. "Construct and Validate a Predictive Model for Surgical Site Infection after Posterior Lumbar Interbody Fusion Based on Machine Learning Algorithm." Computational and Mathematical Methods in Medicine 2022 (August 23, 2022): 1–11. http://dx.doi.org/10.1155/2022/2697841.

Abstract:

Purpose. Surgical site infection is one of the serious complications after lumbar fusion. Early prediction and timely intervention can reduce the harm to patients. The aims of this study were to construct and validate a machine learning model for predicting surgical site infection after posterior lumbar interbody fusion, to screen out the most important risk factors for surgical site infection, and to explore whether synthetic minority oversampling technique could improve the model performance. Method. This study reviewed 584 patients who underwent posterior lumbar interbody fusion for degener

28

Maradana Durga Venkata Prasad. "Multi-Entity Real-Time Fraud Detection System using Machine Learning: Improving Fraud Detection Efficiency using FROST-Enhanced Oversampling." Journal of Electrical Systems 20, no. 7s (2024): 1380–94. http://dx.doi.org/10.52783/jes.3710.

Abstract:

Fraudulent transactions pose a significant threat to financial institutions and e-commerce platforms. Machine learning models, trained on historical labeled data (fraudulent vs. legitimate transactions), are often employed to identify and prevent fraud. However, real-world datasets frequently exhibit class imbalance, where fraudulent transactions (minority class) are significantly outnumbered by legitimate transactions (majority class). Machine learning models may perform poorly as a result of this imbalance, underestimating fraud and favouring the majority class. This paper proposes a novel a

29

Alharbi, Fayez, Lahcen Ouarbya, and Jamie A. Ward. "Comparing Sampling Strategies for Tackling Imbalanced Data in Human Activity Recognition." Sensors 22, no. 4 (2022): 1373. http://dx.doi.org/10.3390/s22041373.

Abstract:

Human activity recognition (HAR) using wearable sensors is an increasingly active research topic in machine learning, aided in part by the ready availability of detailed motion capture data from smartphones, fitness trackers, and smartwatches. The goal of HAR is to use such devices to assist users in their daily lives in application areas such as healthcare, physical therapy, and fitness. One of the main challenges for HAR, particularly when using supervised learning methods, is obtaining balanced data for algorithm optimisation and testing. As people perform some activities more than others (

30

Ayesha Shakith and L. Arockiam. "EMSMOTE: Ensemble multiclass synthetic minority oversampling technique to improve accuracy of multilingual sentiment analysis on imbalance data." Scientific Temper 15, no. 04 (2024): 3099–104. https://doi.org/10.58414/scientifictemper.2024.15.4.17.

Abstract:

Natural language processing (NLP) tasks, such as multilingual sentiment analysis, are inherently challenging, especially when dealing with unbalanced data. A dataset is considered imbalanced when one class significantly dominates the others, creating an unbalanced distribution. In many domains, the minority class holds crucial information, presenting unique challenges. This research addresses these challenges using an ensemble-based oversampling technique, EMSMOTE (Ensemble Multiclass Synthetic Minority Oversampling Technique). By leveraging SMOTE, EMSMOTE generates multiple synthetic datasets

31

A, Krishnapriya, et al. "Machine Learning For Medicare Fraud Detection: Tackling Class Imbalance With SMOTE-ENN." International Journal of Computational Learning & Intelligence 4, no. 4 (2025): 716–24. https://doi.org/10.5281/zenodo.15251088.

Abstract:

The realm of healthcare fraud detection is continually changing and encounters substantial obstacles, especially when dealing with data imbalance problems. Earlier research primarily concentrated on standard machine learning (ML) methods, which often have difficulty with imbalanced data. This issue manifests in several ways. It involves the danger of overfitting with Random Oversampling (ROS), the creation of noise by the Synthetic Minority Oversampling Technique (SMOTE), and the possible loss of vital information with Random Undersampling (RUS). Furthermore, enhancing model performance, exami
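The abstract contrasts Random Oversampling (ROS), which risks overfitting, with Random Undersampling (RUS), which can lose information. A small sketch of both, under the usual textbook definitions, makes the trade-off concrete: ROS only repeats existing minority rows, and RUS throws majority rows away. The function names and the toy records are hypothetical and do not come from the cited paper.

```python
import random


def random_oversample(minority, target_size, seed=0):
    """Duplicate randomly chosen minority samples until target_size is reached."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra


def random_undersample(majority, target_size, seed=0):
    """Keep a random subset of the majority class and discard the rest."""
    rng = random.Random(seed)
    return rng.sample(majority, target_size)


# Hypothetical records: 20 legitimate claims and 4 fraudulent ones.
majority = [("legit", i) for i in range(20)]
minority = [("fraud", i) for i in range(4)]

ros = random_oversample(minority, len(majority))
rus = random_undersample(majority, len(minority))
print(len(ros), len(set(ros)))  # 20 rows, but no more than 4 distinct ones
print(len(rus))                 # 4 rows remain; 16 majority rows were discarded
```

The repeated rows in `ros` are what can encourage a model to memorise individual minority examples, while the rows missing from `rus` are the information that undersampling gives up.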

32

Djafar, Nur Mutmainnah, and Achmad Fauzan. "Implementation of K-Nearest Neighbor using the oversampling technique on mixed data for the classification of household welfare status." Statistics in Transition new series 25, no. 1 (2024): 109–24. http://dx.doi.org/10.59170/stattrans-2024-007.

Abstract:

Welfare is closely related to poverty and the socio-economic disparities in a society. Based on data from the Central Bureau of Statistics, Kulon Progo in Indonesia had the highest poverty rate in the province of the Special Region of Yogyakarta; an increasing trend was observed every year from 2019 to 2021; Kulon Progo also had a low poverty line (after Gunung Kidul) compared to other regencies/cities in this province. This study aimed to classify the household welfare status in Kulon Progo in March 2021 using the K-Nearest Neighbor (KNN) method. Since imbalance was found between the poor and

33

Purnawan, I. Ketut Adi, Adhi Dharma Wibawa, Arik Kurniawati, and Mauridhi Hery Purnomo. "Optimizing Diabetic Neuropathy Severity Classification Using Electromyography Signals Through Synthetic Oversampling Techniques." Jurnal Nasional Pendidikan Teknik Informatika (JANAPATI) 13, no. 3 (2024): 681–90. https://doi.org/10.23887/janapati.v13i3.85675.

Abstract:

Electromyography signals are electrical signals generated by muscle activity and are very useful for analyzing the health conditions of muscles and nerves. Data imbalance is a prevalent issue in EMG signal data, especially when addressing patients with varied health conditions and restricted data availability. A major difficulty for machine learning models is class imbalance in datasets, which frequently leads to biased predictions favoring the dominant class and neglecting the minority classes. The data augmentation method employs the Synthetic Minority Over Sampling Technique (SMOTE) and Ran

34

Muhammad, Rizky Pribadi, Dwi Purnomo Hindriyanto, and Hendry Hendry. "A three-step combination strategy for addressing outliers and class imbalance in software defect prediction." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 3 (2024): 2987–98. https://doi.org/10.11591/ijai.v13.i3.pp2987-2998.

Abstract:

Software defect prediction often involves datasets with imbalanced distributions where one or more classes are underrepresented, referred to as the minority class, while other classes are overrepresented, known as the majority class. This imbalance can hinder accurate predictions of the minority class, leading to misclassification. While the synthetic minority oversampling technique (SMOTE) is a widely used approach to address imbalanced learning data, it can inadvertently generate synthetic minority samples that resemble the majority class and are considered outliers. This study aims to enhan

35

Nguyen, Teo, Kerrie Mengersen, Damien Sous, and Benoit Liquet. "SMOTE-CD: SMOTE for compositional data." PLOS ONE 18, no. 6 (2023): e0287705. http://dx.doi.org/10.1371/journal.pone.0287705.

Abstract:

Compositional data are a special kind of data, represented as a proportion carrying relative information. Although this type of data is widely spread, no solution exists to deal with the cases where the classes are not well balanced. After describing compositional data imbalance, this paper proposes an adaptation of the original Synthetic Minority Oversampling TEchnique (SMOTE) to deal with compositional data imbalance. The new approach, called SMOTE for Compositional Data (SMOTE-CD), generates synthetic examples by computing a linear combination of selected existing data points, using composi

36

Chin, F. Y., C. A. Lim, and K. H. Lem. "Handling leukaemia imbalanced data using synthetic minority oversampling technique (SMOTE)." Journal of Physics: Conference Series 1988, no. 1 (2021): 012042. http://dx.doi.org/10.1088/1742-6596/1988/1/012042.

37

Chamorro-Atalaya, Omar, Florcita Aldana-Trejo, Nestor Alvarado-Bravo, et al. "Student Satisfaction Classification Algorithm Using the Minority Synthetic Oversampling Technique." International Journal of Information and Education Technology 13, no. 7 (2023): 1094–100. http://dx.doi.org/10.18178/ijiet.2023.13.7.1909.

Abstract:

This study uses university students’ opinions posted on the social network Twitter to assess teaching performance in the context of virtual learning by means of sentiment analysis. However, when establishing the classification algorithm, an imbalance was found in the number of opinions rating teaching performance in the satisfied and dissatisfied classes. Therefore, the objective of this investigation is to determine the improvement in the performance of the student satisfaction classification algorithm, based on the class balancing method from the application of the m

38

Alkhawaldeh, Ibraheem M., Ibrahem Albalkhi, and Abdulqadir Jeprel Naswhan. "Challenges and limitations of synthetic minority oversampling techniques in machine learning." World Journal of Methodology 13, no. 5 (2023): 373–78. http://dx.doi.org/10.5662/wjm.v13.i5.373.

Abstract:

Oversampling is the most widely used approach for dealing with class-imbalanced datasets, as seen in the plethora of oversampling methods developed over the last two decades. In this editorial, we discuss issues with oversampling that stem from the possibility of overfitting and from the generation of synthetic cases that might not accurately represent the minority class. These limitations should be considered when using oversampling techniques. We also propose several alternative strategies for dealing with imbalanced data, as well as a future work perspective.

39

Putra, Muhammad Akmal A., Suwarno, and Rahman Azis Prasojo. "Improving Transformer Health Index Prediction Performance Using Machine Learning Algorithms with a Synthetic Minority Oversampling Technique." Energies 18, no. 9 (2025): 2364. https://doi.org/10.3390/en18092364.

Abstract:

Machine learning (ML) has emerged as a powerful tool in transformer condition assessment, enabling more accurate diagnostics by leveraging historical test data. However, imbalanced datasets, often characterized by limited samples in poor transformer conditions, pose significant challenges to model performance. This study investigates the application of oversampling techniques to enhance ML model accuracy in predicting the Health Index of transformers. A dataset comprising 3850 transformer tests collected from utilities across Indonesia was used. Key parameters, including oil quality, dissolved

40

Chang, Young-Soo, Hee-Sung Park, and Il-Joon Moon. "Predicting the Cochlear Dead Regions Using a Machine Learning-Based Approach with Oversampling Techniques." Medicina 57, no. 11 (2021): 1192. http://dx.doi.org/10.3390/medicina57111192.

Abstract:

Background and Objectives: Determining the presence or absence of cochlear dead regions (DRs) is essential in clinical practice. This study proposes a machine learning (ML)-based model that applies oversampling techniques for predicting DRs in patients. Materials and Methods: We used recursive partitioning and regression for classification tree (CT) and logistic regression (LR) as prediction models. To overcome the imbalanced nature of the dataset, oversampling techniques to duplicate examples in the minority class or to synthesize new examples from existing examples in the minority class were
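The first of the two oversampling styles mentioned in this abstract, duplicating existing minority examples, can be illustrated with a short sketch. It assumes a simple list-based dataset and is not the procedure used by the authors; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every
    class has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate examples to duplicate for this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical toy data: four majority examples (label 0), one minority (label 1).
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Duplication adds no new information about the minority class, which is one reason synthesizing new examples is often considered as an alternative.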

41

Bansal, Ankita, Makul Saini, Rakshit Singh, and Jai Kumar Yadav. "Analysis of SMOTE." International Journal of Information Retrieval Research 11, no. 2 (2021): 15–37. http://dx.doi.org/10.4018/ijirr.2021040102.

Abstract:

The tremendous amount of data generated through IoT can be imbalanced, causing the class imbalance problem (CIP). CIP is one of the major issues in machine learning, where most of the samples belong to one of the classes, thus producing biased classifiers. The authors of this paper work on four imbalanced datasets belonging to diverse domains. The objective of this study is to deal with CIP using oversampling techniques. One of the commonly used oversampling approaches is the synthetic minority oversampling technique (SMOTE). In this paper, the authors have suggested modifications to SMOTE and pr

42

Anis, Maira, and Mohsin Ali. "Investigating the Performance of Smote for Class Imbalanced Learning: A Case Study of Credit Scoring Datasets." European Scientific Journal, ESJ 13, no. 33 (2017): 340. http://dx.doi.org/10.19044/esj.2017.v13n33p340.

Abstract:

Classification of datasets is one of the major issues encountered by the data mining community. The problem becomes harder when real-world datasets are also imbalanced. A dataset is imbalanced when the observations belonging to a rare class are greatly outnumbered by the observations of another class. The class with the greater number of observations is called the majority or negative class, while the class with rare observations is referred to as the minority or positive class. The literature presents a number of resampling techniques that address the problem of class

43

Liu, Ankang, Lingfei Cheng, and Changdong Yu. "SASMOTE: A Self-Attention Oversampling Method for Imbalanced CSI Fingerprints in Indoor Positioning Systems." Sensors 22, no. 15 (2022): 5677. http://dx.doi.org/10.3390/s22155677.

Abstract:

WiFi localization based on channel state information (CSI) fingerprints has become the mainstream method for indoor positioning due to the widespread deployment of WiFi networks, in which fingerprint database building is critical. However, issues such as insufficient samples or missing data in the collected fingerprint database result in unbalanced training data for the localization system during the construction of the CSI fingerprint database. To address this issue, we propose a deep learning-based oversampling method, called Self-Attention Synthetic Minority Oversampling Technique (

44

Wiharto, Wiharto, and Angga Exca Pradipta Syaifuddin. "Squeeze-excitation half U-Net and synthetic minority oversampling technique oversampling for papilledema image classification." IAES International Journal of Artificial Intelligence (IJ-AI) 14, no. 2 (2025): 1410. https://doi.org/10.11591/ijai.v14.i2.pp1410-1419.

Abstract:

The emergence of various convolutional neural network (CNN) architectures indicates progress in the computer vision field. However, most of the architectures have a large number of parameters, which tends to increase the computational cost of the training process. Additionally, imbalanced data sources are often encountered, causing the model to overfit. The aim of this study is to evaluate a new method to classify retinal fundus images from imbalanced data into the corresponding classes by using fewer parameters than the previous method. To achieve this, squeeze-excitation half U-Net (SEHUNET) architectur

46

Hu, Libin, and Yunfeng Zhang. "GDSMOTE: A Novel Synthetic Oversampling Method for High-Dimensional Imbalanced Financial Data." Mathematics 12, no. 24 (2024): 4036. https://doi.org/10.3390/math12244036.

Abstract:

Synthetic oversampling methods for dealing with imbalanced classification problems have been widely studied. However, the current synthetic oversampling methods still cannot perform well when facing high-dimensional imbalanced financial data. The failure of distance measurement in high-dimensional space, error accumulation caused by noise samples, and the reduction of recognition accuracy of majority samples caused by the distribution of synthetic samples are the main reasons that limit the performance of current methods. Taking these factors into consideration, a novel synthetic oversampling

47

Tekkali, Chandana Gouri, and Karthika Natarajan. "An advancement in AdaSyn for imbalanced learning: An application to fraud detection in digital transactions." Journal of Intelligent & Fuzzy Systems 46, no. 5-6 (2024): 11381–96. http://dx.doi.org/10.3233/jifs-236392.

Abstract:

Imbalanced learning is a significant issue in machine learning, affecting the performance and accuracy of binary and multi-class classification algorithms, especially in large-scale data handling and classification. Popular techniques for converting imbalanced data into balanced data include undersampling, undersampling with Tomek links, random oversampling, the synthetic minority oversampling technique (SMOTE), and adaptive synthetic generation (ADASYN). Generally, the ADASYN algorithm could be used to propagate minority sample points to raise the imbalance ratio between majority an

48

Gao, Kehan, Taghi M. Khoshgoftaar, and Amri Napolitano. "An Empirical Investigation of Combining Filter-Based Feature Subset Selection and Data Sampling for Software Defect Prediction." International Journal of Reliability, Quality and Safety Engineering 22, no. 06 (2015): 1550027. http://dx.doi.org/10.1142/s0218539315500278.

Abstract:

The main goal of software quality engineering is to produce a high-quality software product through the use of various techniques and processes. Classification models are effective tools for software quality prediction, helping practitioners to detect potentially problematic modules and eventually improve the software product. However, two potential problems, high dimensionality and class imbalance, may affect the classifiers' performance. In this study, we propose a data pre-processing approach, in which feature selection is combined with data sampling, to overcome these problems. We investigate t

49

Li, Der-Chiang, Ssu-Yang Wang, Kuan-Cheng Huang, and Tung-I. Tsai. "Learning class-imbalanced data with region-impurity synthetic minority oversampling technique." Information Sciences 607 (August 2022): 1391–407. http://dx.doi.org/10.1016/j.ins.2022.06.067.

50

Li, Yihong, Yunpeng Wang, Tao Li, Beibei Li, and Xiaolong Lan. "SP-SMOTE: A novel space partitioning based synthetic minority oversampling technique." Knowledge-Based Systems 228 (September 2021): 107269. http://dx.doi.org/10.1016/j.knosys.2021.107269.
