Journal articles on the topic 'SMOTE technique'

1

Chawla, N. V., K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. "SMOTE: Synthetic Minority Over-sampling Technique." Journal of Artificial Intelligence Research 16 (June 1, 2002): 321–57. http://dx.doi.org/10.1613/jair.953.

Abstract:
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the se
2

Bansal, Ankita, Makul Saini, Rakshit Singh, and Jai Kumar Yadav. "Analysis of SMOTE." International Journal of Information Retrieval Research 11, no. 2 (2021): 15–37. http://dx.doi.org/10.4018/ijirr.2021040102.

Abstract:
The tremendous amount of data generated through IoT can be imbalanced, causing the class imbalance problem (CIP). CIP is one of the major issues in machine learning, where most of the samples belong to one of the classes, thus producing biased classifiers. The authors in this paper are working on four imbalanced datasets belonging to diverse domains. The objective of this study is to deal with CIP using oversampling techniques. One of the commonly used oversampling approaches is the synthetic minority oversampling technique (SMOTE). In this paper, the authors have suggested modifications in SMOTE and pr
3

Santoso, Noviyanti, Wahyu Wibowo, and Hilda Hikmawati. "Integration of synthetic minority oversampling technique for imbalanced class." Indonesian Journal of Electrical Engineering and Computer Science 13, no. 1 (2019): 102. http://dx.doi.org/10.11591/ijeecs.v13.i1.pp102-108.

Abstract:
In data mining, class imbalance is a problematic issue that calls for solutions. This is probably because machine learning algorithms are built on the assumption that each class has a balanced number of instances, so when the classes are imbalanced, the prediction results may not be appropriate. Solutions have been offered to address class imbalance, including oversampling, undersampling, and the synthetic minority oversampling technique (SMOTE). Both oversampling and undersampling have their disadvantages, so SMOTE is an alternative to overcome them. By integrating SMOTE in
4

Shoohi, Liqaa M., and Jamila H. Saud. "Adaptation Proposed Methods for Handling Imbalanced Datasets based on Over-Sampling Technique." Al-Mustansiriyah Journal of Science 31, no. 2 (2020): 25. http://dx.doi.org/10.23851/mjs.v31i2.740.

Abstract:
Classification of imbalanced data is an important issue. Many algorithms have been developed for classification, such as Back Propagation (BP) neural networks, decision trees, Bayesian networks, etc., and have been used repeatedly in many fields. These algorithms face the problem of imbalanced data, in which some classes contain more instances than others. Imbalanced data result in poor performance and bias toward one class at the expense of the others. In this paper, we proposed three techniques based on the Over-Sampling (O.S.) technique for processing imbalanced dataset and redistributing it
5

Rachburee, Nachirat, and Wattana Punlumjeak. "Oversampling technique in student performance classification from engineering course." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 4 (2021): 3567. http://dx.doi.org/10.11591/ijece.v11i4.pp3567-3574.

Abstract:
The first year was important for an engineering student's academic planning. All subjects in the first year were essential as an engineering foundation. Student performance prediction helped academics improve student performance. Students also checked their performance themselves. If they were aware that their performance was low, they could take steps to improve it. This research focused on combining oversampling of the minority class data with various kinds of classifier models. Oversampling techniques were SMOTE, Borderline-SMOTE, SVMSMOTE, and A
6

Kasanah, Anis Nikmatul, Muladi Muladi, and Utomo Pujianto. "Penerapan Teknik SMOTE untuk Mengatasi Imbalance Class dalam Klasifikasi Objektivitas Berita Online Menggunakan Algoritma KNN." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 3, no. 2 (2019): 196–201. http://dx.doi.org/10.29207/resti.v3i2.945.

Abstract:
The amount of information in the form of online news needs to be balanced with the ability of readers to sort or classify news as subjective or objective. A special system is therefore needed for online news objectivity classification, so that it can help readers identify subjective or objective news. This research proposes the development of machine learning techniques to help sort news objectivity automatically based on the content of the news. The algorithm proposed is the K-Nearest Neighbor (KNN) algorithm. News samples obtained from kompas.com by scraping occur imbalance clas
7

Rekha, Gillala, and V. Krishna Reddy. "A Novel Approach for Handling Outliers in Imbalanced Data." International Journal of Engineering & Technology 7, no. 3.1 (2018): 1. http://dx.doi.org/10.14419/ijet.v7i3.1.16783.

Abstract:
Most of the traditional classification algorithms assume their training data to be well-balanced in terms of class distribution. Real-world datasets, however, are imbalanced in nature and thus degrade the performance of the traditional classifiers. To solve this problem, many strategies are adopted to balance the class distribution at the data level. The data-level methods balance the imbalanced distribution between majority and minority classes using either oversampling or undersampling techniques. The main concern of this paper is to remove the outliers that may be generated while using oversampling
8

Lee, Taejun, Minju Kim, and Sung-Phil Kim. "Improvement of P300-Based Brain–Computer Interfaces for Home Appliances Control by Data Balancing Techniques." Sensors 20, no. 19 (2020): 5576. http://dx.doi.org/10.3390/s20195576.

Abstract:
The oddball paradigm used in P300-based brain–computer interfaces (BCIs) intrinsically poses the issue of data imbalance between target stimuli and nontarget stimuli. Data imbalance can cause overfitting problems and, consequently, poor classification performance. The purpose of this study is to improve BCI performance by solving this data imbalance problem with sampling techniques. The sampling techniques were applied to BCI data in 15 subjects controlling a door lock, 15 subjects an electric light, and 14 subjects a Bluetooth speaker. We explored two categories of sampling techniques: oversa
9

Kurniawati, Yulia Ery. "Class Imbalanced Learning Menggunakan Algoritma Synthetic Minority Over-sampling Technique – Nominal (SMOTE-N) pada Dataset Tuberculosis Anak." Jurnal Buana Informatika 10, no. 2 (2019): 134. http://dx.doi.org/10.24002/jbi.v10i2.2441.

Abstract:
Class Imbalance Learning (CIL) is a learning process for data representation and information extraction under poor data distribution, intended to support effective decisions in the decision-making process. SMOTE-N is one of the data-level approaches in CIL and uses an over-sampling method. SMOTE-N generates synthetic instances to balance the number of instances in the minority class. This study applies SMOTE-N to the Tuberculosis Anak (TB Anak, childhood tuberculosis) dataset, which has class imbalance. The over-sampling method was chosen to avoid losing inf
10

de Carvalho, Alexandre M., and Ronaldo C. Prati. "DTO-SMOTE: Delaunay Tessellation Oversampling for Imbalanced Data Sets." Information 11, no. 12 (2020): 557. http://dx.doi.org/10.3390/info11120557.

Abstract:
One of the significant challenges in machine learning is the classification of imbalanced data. In many situations, standard classifiers cannot learn how to distinguish minority class examples from the others. Since many real problems are unbalanced, this problem has become very relevant and deeply studied today. This paper presents a new preprocessing method based on Delaunay tessellation and the preprocessing algorithm SMOTE (Synthetic Minority Over-sampling Technique), which we call DTO-SMOTE (Delaunay Tessellation Oversampling SMOTE). DTO-SMOTE constructs a mesh of simplices (in this paper
11

Akbar, Shahid, Maqsood Hayat, Muhammad Kabir, and Muhammad Iqbal. "iAFP-gap-SMOTE: An Efficient Feature Extraction Scheme Gapped Dipeptide Composition is Coupled with an Oversampling Technique for Identification of Antifreeze Proteins." Letters in Organic Chemistry 16, no. 4 (2019): 294–302. http://dx.doi.org/10.2174/1570178615666180816101653.

Abstract:
Antifreeze proteins (AFPs) perform distinguishable roles in maintaining homeostatic conditions of living organisms and protect their cell and body from freezing in extremely cold conditions. Owing to high diversity in protein sequences and structures, the discrimination of AFPs from non-AFPs through experimental approaches is expensive and lengthy. It is, therefore, vastly desirable to propose a computational intelligent and high throughput model that truly reflects AFPs quickly and accurately. In a sequel, a new predictor called “iAFP-gap-SMOTE” is proposed for the identification of AFPs. Pr
12

Wibowo, Prasetyo, and Chastine Fatichah. "An in-depth performance analysis of the oversampling techniques for high-class imbalanced dataset." Register: Jurnal Ilmiah Teknologi Sistem Informasi 7, no. 1 (2021): 63. http://dx.doi.org/10.26594/register.v7i1.2206.

Abstract:
Class imbalance occurs when the distribution of classes between the majority and the minority classes is not the same. The data on imbalanced classes may vary from mild to severe. The effect of high-class imbalance may affect the overall classification accuracy since the model is most likely to predict most of the data that fall within the majority class. Such a model will give biased results, and the performance predictions for the minority class often have no impact on the model. The use of the oversampling technique is one way to deal with high-class imbalance, but only a few are used to so
13

Fernandez, Alberto, Salvador Garcia, Francisco Herrera, and Nitesh V. Chawla. "SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary." Journal of Artificial Intelligence Research 61 (April 20, 2018): 863–905. http://dx.doi.org/10.1613/jair.1.11192.

Abstract:
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data. This is due to its simplicity in the design of the procedure, as well as its robustness when applied to different types of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several different domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has also significantly contributed to new supervised learning paradigms, including multilabel cla
14

Seo, Jae-Hyun, and Yong-Hyuk Kim. "Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection." Computational Intelligence and Neuroscience 2018 (November 1, 2018): 1–11. http://dx.doi.org/10.1155/2018/9704672.

Abstract:
The KDD CUP 1999 intrusion detection dataset was introduced at the third international knowledge discovery and data mining tools competition, and it has been widely used for many studies. The attack types of KDD CUP 1999 dataset are divided into four categories: user to root (U2R), remote to local (R2L), denial of service (DoS), and Probe. We use five classes by adding the normal class. We define the U2R, R2L, and Probe classes, which are each less than 1% of the total dataset, as rare classes. In this study, we attempt to mitigate the class imbalance of the dataset. Using the synthetic minori
15

Mukherjee, Mimi, and Matloob Khushi. "SMOTE-ENC: A Novel SMOTE-Based Method to Generate Synthetic Data for Nominal and Continuous Features." Applied System Innovation 4, no. 1 (2021): 18. http://dx.doi.org/10.3390/asi4010018.

Abstract:
Real-world datasets are heavily skewed where some classes are significantly outnumbered by the other classes. In these situations, machine learning algorithms fail to achieve substantial efficacy while predicting these underrepresented instances. To solve this problem, many variations of synthetic minority oversampling methods (SMOTE) have been proposed to balance datasets which deal with continuous features. However, for datasets with both nominal and continuous features, SMOTE-NC is the only SMOTE-based oversampling technique to balance the data. In this paper, we present a novel minority ov
16

Mustaqim, Mustaqim, Budi Warsito, and Bayu Surarso. "COMBINATION OF SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE (SMOTE) AND BACKPROPAGATION NEURAL NETWORK TO CONTRACEPTIVE IUD PREDICTION." MEDIA STATISTIKA 13, no. 1 (2020): 36–46. http://dx.doi.org/10.14710/medstat.13.1.36-46.

Abstract:
Data imbalance occurs when one class contains more data than another. The majority class has more data, while the minority class has less. Class imbalance decreases the performance of the classification algorithm. Data on IUD contraceptive use are imbalanced. National IUD failures in 2018 numbered 959, or 3.5% of 27,400 users. Synthetic minority oversampling technique (SMOTE) is used to balance data on IUD failure. Balanced data is then predicted with neural networks. The system predicts whether someone using an IUD becomes pregnant or not. This study uses 250 dat
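The figures quoted in this abstract (959 failures among 27,400 users) can be used to illustrate why plain accuracy is misleading on imbalanced data. The short sketch below is an illustration only: the "always predict the majority class" baseline is a hypothetical classifier introduced here, not a model from the cited study.

```python
# Figures taken from the abstract: 959 IUD failures among 27,400 users.
minority = 959
total = 27_400
majority = total - minority

# Share of the minority (failure) class in the data.
minority_share = minority / total

# Hypothetical baseline (an assumption for illustration): a classifier that
# always predicts the majority class. It is right on every majority example
# and wrong on every minority example.
baseline_accuracy = majority / total
baseline_minority_recall = 0 / minority  # it never predicts the minority class

print(f"minority share:           {minority_share:.1%}")            # 3.5%
print(f"baseline accuracy:        {baseline_accuracy:.1%}")         # 96.5%
print(f"baseline minority recall: {baseline_minority_recall:.1%}")  # 0.0%
```

A model can therefore report 96.5% accuracy while detecting no failures at all, which is why imbalance studies usually report per-class measures as well.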
17

Bejjanki, Kiran Kumar, Jayadev Gyani, and Narsimha Gugulothu. "Class Imbalance Reduction (CIR): A Novel Approach to Software Defect Prediction in the Presence of Class Imbalance." Symmetry 12, no. 3 (2020): 407. http://dx.doi.org/10.3390/sym12030407.

Abstract:
Software defect prediction (SDP) is the technique used to predict the occurrences of defects in the early stages of the software development process. Early prediction of defects will reduce the overall cost of software and also increase its reliability. Most of the defect prediction methods proposed in the literature suffer from the class imbalance problem. In this paper, a novel class imbalance reduction (CIR) algorithm is proposed to create a symmetry between the defect and non-defect records in the imbalanced datasets by considering distribution properties of the datasets and is compared with SM
18

Davagdorj, Khishigsuren, Jong Seol Lee, Van Huy Pham, and Keun Ho Ryu. "A Comparative Analysis of Machine Learning Methods for Class Imbalance in a Smoking Cessation Intervention." Applied Sciences 10, no. 9 (2020): 3307. http://dx.doi.org/10.3390/app10093307.

Abstract:
Smoking is one of the major public health issues, which has a significant impact on premature death. In recent years, numerous decision support systems have been developed to deal with smoking cessation based on machine learning methods. However, the inevitable class imbalance is considered a major challenge in deploying such systems. In this paper, we study an empirical comparison of machine learning techniques to deal with the class imbalance problem in the prediction of smoking cessation intervention among the Korean population. For the class imbalance problem, the objective of this paper i
19

Khamsan, Muhammad Muhaimin, and Ruhaila Maskat. "HANDLING HIGHLY IMBALANCED OUTPUT CLASS LABEL." MALAYSIAN JOURNAL OF COMPUTING 4, no. 2 (2019): 304. http://dx.doi.org/10.24191/mjoc.v4i2.7021.

Abstract:
In practice, a balanced target class is rare. However, an imbalanced target class can be handled by resampling the original dataset, either by oversampling/upsampling or undersampling/downsampling. A popular upsampling technique is Synthetic Minority Over-sampling Technique (SMOTE). This technique increases the minority class by generating synthetic class labels and assigning the class based on the K-Nearest Neighbour (K-NN). SMOTE upsampling can only upsample at most one minority class at a time, which means for a multiclass dataset, it needs to undergo multilayer SMOTE to balance the class la
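The nearest-neighbour step mentioned in this abstract can be sketched in a few lines. What follows is a minimal illustration of SMOTE-style interpolation as it is commonly described (a synthetic point placed on the segment between a minority sample and one of its k nearest minority neighbours). It is not code from any paper in this list, and the function name `smote`, its parameters, and the toy data are assumptions made for the example. It also picks base samples at random rather than generating a fixed number per sample.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Sketch: create n_synthetic points between minority samples and
    their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a base minority sample at random (a simplification).
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position along the base-neighbour segment.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class with two continuous features (made-up values).
minority_samples = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority_samples, n_synthetic=6, k=2)
print(len(new_points))  # 6
```

Because every synthetic point is a convex combination of two existing minority samples, it inherits the minority label and stays inside the region those samples span. The sketch assumes continuous features only.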
20

Wang, Xin, Yue Yang, Mingsong Chen, et al. "AGNES-SMOTE: An Oversampling Algorithm Based on Hierarchical Clustering and Improved SMOTE." Scientific Programming 2020 (September 23, 2020): 1–9. http://dx.doi.org/10.1155/2020/8837357.

Abstract:
Aiming at the low classification accuracy of imbalanced datasets, an oversampling algorithm, AGNES-SMOTE (Agglomerative Nesting-Synthetic Minority Oversampling Technique), based on hierarchical clustering and improved SMOTE, is proposed. Its key procedures are to hierarchically cluster majority samples and minority samples, respectively; divide minority subclusters on the basis of the obtained majority subclusters; select a “seed sample” based on the sampling weight and probability distribution of the minority subcluster; and restrict the generation of new samples in a certain area by centroid method in t
21

Umma, Fatiya Nur, Budi Warsito, and Di Asih I. Maruddani. "KLASIFIKASI STATUS KEMISKINAN RUMAH TANGGA DENGAN ALGORITMA C5.0 DI KABUPATEN PEMALANG." Jurnal Gaussian 10, no. 2 (2021): 221–29. http://dx.doi.org/10.14710/j.gauss.v10i2.29934.

Abstract:
Pemalang regency is a district with a poverty rate of around 16.04%. One of the efforts that must be improved in tackling poverty is increasing the accuracy of the government program’s targeting. Improved targeting accuracy is expected to have a better impact on the welfare of the population. This study classified the poverty status of households in Pemalang regency using the C5.0 algorithm. The poverty status of households is divided into two classes, namely poor and non-poor. There was an imbalance of data in both classes. Data imbalances were handled by using Synthetic Minority Oversa
22

Heranova, Omer. "Synthetic Minority Oversampling Technique pada Averaged One Dependence Estimators untuk Klasifikasi Credit Scoring." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 3, no. 3 (2019): 443–50. http://dx.doi.org/10.29207/resti.v3i3.1275.

Abstract:
A bank or financial institution is a business entity whose activities are collecting funds from the public in the form of deposits and channeling them to the public in the form of credit and/or other forms. Problems often occur in credit financing, and one of the problems faced in credit assessment is class imbalance in the data sets. This problem can be overcome by resampling methods, namely oversampling, undersampling, and hybrids that combine the two sampling approaches. This research proposes the method of applying SMOTE or Synthetic Minority Oversampling Technique
23

Zhao, Ziqi, Yonghong Xu, and Yong Zhao. "SXGBsite: Prediction of Protein–Ligand Binding Sites Using Sequence Information and Extreme Gradient Boosting." Genes 10, no. 12 (2019): 965. http://dx.doi.org/10.3390/genes10120965.

Abstract:
The prediction of protein–ligand binding sites is important in drug discovery and drug design. Protein–ligand binding site prediction computational methods are inexpensive and fast compared with experimental methods. This paper proposes a new computational method, SXGBsite, which includes the synthetic minority over-sampling technique (SMOTE) and the Extreme Gradient Boosting (XGBoost). SXGBsite uses the position-specific scoring matrix discrete cosine transform (PSSM-DCT) and predicted solvent accessibility (PSA) to extract features containing sequence information. A new balanced dataset was
24

Thamrin, Sri Astuti, Dian Sidik, Hedi Kuswanto, Armin Lawi, and Ansariadi Ansariadi. "Exploration of Obesity Status of Indonesia Basic Health Research 2013 With Synthetic Minority Over-Sampling Techniques." Indonesian Journal of Statistics and Its Applications 5, no. 1 (2021): 75–91. http://dx.doi.org/10.29244/ijsa.v5i1p75-91.

Abstract:
The accuracy of the data class is very important in classification with a machine learning approach. The more accurate the existing data sets and classes, the better the output generated by machine learning. In fact, classification can experience imbalance class data in which each class does not have the same portion of the data set it has. The existence of data imbalance will affect the classification accuracy. One of the easiest ways to correct imbalanced data classes is to balance it. This study aims to explore the problem of data class imbalance in the medium case dataset and to address th
25

Chin, F. Y., C. A. Lim, and K. H. Lem. "Handling leukaemia imbalanced data using synthetic minority oversampling technique (SMOTE)." Journal of Physics: Conference Series 1988, no. 1 (2021): 012042. http://dx.doi.org/10.1088/1742-6596/1988/1/012042.

26

Hidayati, Isti Samrotul, and I. Made Arcana. "PENERAPAN CHAID DENGAN PENDEKATAN SMOTE PADA KEMATIAN BALITA DI KAWASAN TIMUR INDONESIA TAHUN 2017." Seminar Nasional Official Statistics 2019, no. 1 (2020): 357–67. http://dx.doi.org/10.34123/semnasoffstat.v2019i1.97.

Abstract:
The Chi-squared Automatic Interaction Detection (CHAID) method is a segmentation method based on the relationship between response and explanatory variables using the chi-square test, and its application requires attention to data balance in order to minimize classification errors. One approach that can be used on imbalanced data is the Synthetic Minority Over-sampling Technique (SMOTE). In this study, the CHAID method with the SMOTE approach is applied to the under-five mortality rate (Angka Kematian Balita, AKBa) in Eastern Indonesia (Kawasan Timur Indonesia, KTI). The aim is to identify the varia
27

GAO, KEHAN, TAGHI M. KHOSHGOFTAAR, and RANDALL WALD. "THE USE OF UNDER- AND OVERSAMPLING WITHIN ENSEMBLE FEATURE SELECTION AND CLASSIFICATION FOR SOFTWARE QUALITY PREDICTION." International Journal of Reliability, Quality and Safety Engineering 21, no. 01 (2014): 1450004. http://dx.doi.org/10.1142/s0218539314500041.

Abstract:
Software quality prediction models are useful tools for creating high quality software products. The general process is that practitioners use software metrics and defect data along with various data mining techniques to build classification models for identifying potentially faulty program modules, thereby enabling effective project resource allocation. The predictive accuracy of these classification models is often affected by the quality of input data. Two main problems which can affect the quality of input data are high dimensionality (too many independent attributes in a dataset) and clas
28

Hemalatha, Putta, and Geetha Mary Amalanathan. "FG-SMOTE: Fuzzy-based Gaussian synthetic minority oversampling with deep belief networks classifier for skewed class distribution." International Journal of Intelligent Computing and Cybernetics 14, no. 2 (2021): 270–87. http://dx.doi.org/10.1108/ijicc-12-2020-0202.

Abstract:
Purpose: Adequate resources for learning and training the data are an important constraint to develop an efficient classifier with outstanding performance. The data usually follows a biased distribution of classes that reflects an unequal distribution of classes within a dataset. This issue is known as the imbalance problem, which is one of the most common issues occurring in real-time applications. Learning of imbalanced datasets is a ubiquitous challenge in the field of data mining. Imbalanced data degrades the performance of the classifier by producing inaccurate results. Design/methodology/ap
29

Gui, Chun. "Analysis of imbalanced data set problem: The case of churn prediction for telecommunication." Artificial Intelligence Research 6, no. 2 (2017): 93. http://dx.doi.org/10.5430/air.v6n2p93.

Abstract:
Class-imbalanced datasets are common in the mobile Internet industry. We tested three kinds of feature selection techniques: Random Forest (RF), Relative Weight (RW) and Standardized Regression Coefficients (SRC); three kinds of balance methods: over-sampling (OS), under-sampling (US) and synthetic minority over-sampling (SMOTE); and a widely used classification method, RF. The combined models are composed of feature selection techniques, balancing techniques and classification method. The original dataset, which has 45 thousand records and 22 features, was used to evaluate the performances o
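For readers unfamiliar with the two simpler balance methods named in this abstract, the sketch below shows random over-sampling (duplicating minority records) and random under-sampling (discarding majority records) in plain Python. It is an illustrative assumption of how such methods are typically implemented, not the procedure used in the cited study. The function names and the toy records are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen records of smaller classes until every
    class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_samples, out_labels = [], []
    for cls in counts:
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for s in rng.sample(pool, target):
            out_samples.append(s)
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data: six majority records (label 0) and two minority records (label 1).
records = [(i, i * 2) for i in range(8)]
classes = [0, 0, 0, 0, 0, 0, 1, 1]

_, over_labels = random_oversample(records, classes)
_, under_labels = random_undersample(records, classes)
print(sorted(Counter(over_labels).items()))   # [(0, 6), (1, 6)]
print(sorted(Counter(under_labels).items()))  # [(0, 2), (1, 2)]
```

Over-sampling keeps all original records but repeats minority ones, whereas under-sampling discards majority records. SMOTE differs from both by creating new, interpolated minority records.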
30

Qu, Zhengwei, Hongwen Li, Yunjing Wang, Jiaxi Zhang, Ahmed Abu-Siada, and Yunxiao Yao. "Detection of Electricity Theft Behavior Based on Improved Synthetic Minority Oversampling Technique and Random Forest Classifier." Energies 13, no. 8 (2020): 2039. http://dx.doi.org/10.3390/en13082039.

Abstract:
Effective detection of electricity theft is essential to maintain power system reliability. With the development of smart grids, traditional electricity theft detection technologies have become ineffective in dealing with the increasingly complex data on the users’ side. To improve the auditing efficiency of grid enterprises, a new electricity theft detection method based on an improved synthetic minority oversampling technique (SMOTE) and an improved random forest (RF) method is proposed in this paper. The data of normal and electricity theft users were classified as positive data (PD) and negative dat
31

Huang, Min-Wei, Chien-Hung Chiu, Chih-Fong Tsai, and Wei-Chao Lin. "On Combining Feature Selection and Over-Sampling Techniques for Breast Cancer Prediction." Applied Sciences 11, no. 14 (2021): 6574. http://dx.doi.org/10.3390/app11146574.

Abstract:
Breast cancer prediction datasets are usually class imbalanced, where the number of data samples in the malignant and benign patient classes are significantly different. Over-sampling techniques can be used to re-balance the datasets to construct more effective prediction models. Moreover, some related studies have considered feature selection to remove irrelevant features from the datasets for further performance improvement. However, since the order of combining feature selection and over-sampling can result in different training sets to construct the prediction model, it is unknown which or
32

Ijaz, Muhammad Fazal, Muhammad Attique, and Youngdoo Son. "Data-Driven Cervical Cancer Prediction Model with Outlier Detection and Over-Sampling Methods." Sensors 20, no. 10 (2020): 2809. http://dx.doi.org/10.3390/s20102809.

Abstract:
Globally, cervical cancer remains as the foremost prevailing cancer in females. Hence, it is necessary to distinguish the importance of risk factors of cervical cancer to classify potential patients. The present work proposes a cervical cancer prediction model (CCPM) that offers early prediction of cervical cancer using risk factors as inputs. The CCPM first removes outliers by using outlier detection methods such as density-based spatial clustering of applications with noise (DBSCAN) and isolation forest (iForest) and by increasing the number of cases in the dataset in a balanced way, for exa
33

Wijaya, Junjun, Agus M. Soleh, and Akbar Rizki. "Penanganan Data Tidak Seimbang pada Pemodelan Rotation Forest Keberhasilan Studi Mahasiswa Program Magister IPB." Xplore: Journal of Statistics 2, no. 2 (2018): 32–40. http://dx.doi.org/10.29244/xplore.v2i2.99.

Abstract:
The Graduate School of Bogor Agricultural University (SPs-IPB) stated that not all students of the IPB master program successfully complete their studies. This serves as an evaluation for IPB to be more selective in choosing students in the future. This study aims to model the success classification of IPB master students from 2011 to 2015. The classification method used is rotation forest. The percentage of students who graduated is very large compared to those who did not, which can distort the evaluation values. SMOTE (Synthetic Minority Oversampling Technique) is one method to handle suc
34

Bao, Fuguang, Yongqiang Wu, Zhaogang Li, Yongzhao Li, Lili Liu, and Guanyu Chen. "Effect Improved for High-Dimensional and Unbalanced Data Anomaly Detection Model Based on KNN-SMOTE-LSTM." Complexity 2020 (September 17, 2020): 1–17. http://dx.doi.org/10.1155/2020/9084704.

Abstract:
High-dimensional and unbalanced data anomaly detection is common. Effective anomaly detection is essential for problem or disaster early warning and maintaining system reliability. A significant research issue related to the data analysis of the sensor is the detection of anomalies. The anomaly detection is essentially an unbalanced sequence binary classification. The data of this type contains characteristics of large scale, high complex computation, unbalanced data distribution, and sequence relationship among data. This paper uses long short-term memory networks (LSTMs) combined with histor
35

Dentamaro, Vincenzo, Donato Impedovo, and Giuseppe Pirlo. "LICIC: Less Important Components for Imbalanced Multiclass Classification." Information 9, no. 12 (2018): 317. http://dx.doi.org/10.3390/info9120317.

Abstract:
Multiclass classification in cancer diagnostics, using DNA or Gene Expression Signatures, but also classification of bacteria species fingerprints in MALDI-TOF mass spectrometry data, is challenging because of imbalanced data and the high number of dimensions with respect to the number of instances. In this study, a new oversampling technique called LICIC will be presented as a valuable instrument in countering both class imbalance, and the famous “curse of dimensionality” problem. The method enables preservation of non-linearities within the dataset, while creating new instances without addin
36

Rendón, Eréndira, Roberto Alejo, Carlos Castorena, Frank J. Isidro-Ortega, and Everardo E. Granda-Gutiérrez. "Data Sampling Methods to Deal With the Big Data Multi-Class Imbalance Problem." Applied Sciences 10, no. 4 (2020): 1276. http://dx.doi.org/10.3390/app10041276.

Abstract:
The class imbalance problem has been a hot topic in the machine learning community in recent years. Nowadays, in the time of big data and deep learning, this problem remains in force. Much work has been performed to deal with the class imbalance problem, with random sampling methods (over- and under-sampling) being the most widely employed approaches. Moreover, sophisticated sampling methods have been developed, including the Synthetic Minority Over-sampling Technique (SMOTE), and they have also been combined with cleaning techniques such as Editing Nearest Neighbor or Tomek’s Links (SMOTE+ENN and
37

Hamdy, Abeer, and Abdulrahman El-Laithy. "SMOTE and Feature Selection for More Effective Bug Severity Prediction." International Journal of Software Engineering and Knowledge Engineering 29, no. 06 (2019): 897–919. http://dx.doi.org/10.1142/s0218194019500311.

Abstract:
“Severity” is one of the essential features of software bug reports, which is a crucial factor for developers to decide which bug should be fixed immediately and which bug could be delayed to a next release. Severity assignment is a manual process and its accuracy depends on the experience of the assignee. Prior research proposed several models to automate this process. These models are based on textual preprocessing of historical bug reports and classification techniques. Although bug repositories suffer from severity class imbalance, none of the prior studies investigated the impact of imple
38

Li, Yihong, Yunpeng Wang, Tao Li, Beibei Li, and Xiaolong Lan. "SP-SMOTE: A novel space partitioning based synthetic minority oversampling technique." Knowledge-Based Systems 228 (September 2021): 107269. http://dx.doi.org/10.1016/j.knosys.2021.107269.

39

P V R N S S V Sai Leela, Bankapalli Jyothi, Pullagura Indira Priyadarsini. "Towards Intelligent Machine Learning Models for Intrusion Detection System." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 5 (2021): 643–55. http://dx.doi.org/10.17762/turcomat.v12i5.1062.

Abstract:
The Internet has become an important resource for mankind. Information security in particular is a never-ending concern in the present world. Hence, a more potent Intrusion Detection System (IDS) should be built. Machine learning techniques are used in developing proficient models for IDS. Imbalanced learning is a crucial task for many classification processes. Resampling training data towards a more balanced distribution is an effective way to combat this issue. The most prevalent techniques are undersampling and oversampling. In this paper, the issues of imbalanced data distribution and h
40

Gul, Hira, Nadeem Javaid, Ibrar Ullah, Ali Mustafa Qamar, Muhammad Khalil Afzal, and Gyanendra Prasad Joshi. "Detection of Non-Technical Losses Using SOSTLink and Bidirectional Gated Recurrent Unit to Secure Smart Meters." Applied Sciences 10, no. 9 (2020): 3151. http://dx.doi.org/10.3390/app10093151.

Abstract:
Energy consumption is increasing exponentially with the increase in electronic gadgets. Losses occur during generation, transmission, and distribution. The energy demand leads to an increase in electricity theft (ET) on the distribution side. Data analysis is the process of assessing the data using different analytical and statistical tools to extract useful information. Fluctuation in energy consumption patterns indicates electricity theft. Utilities bear losses of millions of dollars every year. Hardware-based solutions are considered to be the best; however, the deployment cost of these solutions i
41

Mustaqim, Mustaqim, Budi Warsito, and Bayu Surarso. "Kombinasi Synthetic Minority Oversampling Technique (SMOTE) dan Neural Network Backpropagation untuk menangani data tidak seimbang pada prediksi pemakaian alat kontrasepsi implan." Register: Jurnal Ilmiah Teknologi Sistem Informasi 5, no. 2 (2019): 128. http://dx.doi.org/10.26594/register.v5i2.1705.

Abstract:
Combination of Synthetic Minority Oversampling Technique (SMOTE) and Backpropagation Neural Network to handle imbalanced class in predicting the use of contraceptive implants. Failure of a contraceptive implant is the occurrence of pregnancy in a woman while she is using the contraceptive correctly. Nationally, contraceptive implant failures in 2018 numbered 1,852 users, or 4% of 41,947 users. The ratio of failures to successes in contraceptive implant use tends to be imbalanced (class imbalance), which makes failure difficult to predict. The imbalance
42

Sulistiyowati, Nina, and Mohamad Jajuli. "INTEGRASI NAIVE BAYES DENGAN TEKNIK SAMPLING SMOTE UNTUK MENANGANI DATA TIDAK SEIMBANG." NUANSA INFORMATIKA 14, no. 1 (2020): 34. http://dx.doi.org/10.25134/nuansa.v14i1.2411.

Abstract:
Classification of data with unbalanced classes is a major problem in the field of machine learning and data mining. When working on unbalanced data, almost all classification algorithms will produce much higher accuracy for majority classes than minority classes. This research will implement the Synthetic Minority Over-sampling Technique (SMOTE) method to overcome unbalanced data on credit customer data in Rawamerta teacher cooperatives. The research methodology uses SEMMA, with the stages Sample, Explore, Modify, Model, and Assess. The Sample phase was conducted to choose the data of
43

Fan, Ziqi, Yuanbo Wu, Changwei Zhou, Xiaojun Zhang, and Zhi Tao. "Class-Imbalanced Voice Pathology Detection and Classification Using Fuzzy Cluster Oversampling Method." Applied Sciences 11, no. 8 (2021): 3450. http://dx.doi.org/10.3390/app11083450.

Abstract:
The Massachusetts Eye and Ear Infirmary (MEEI) database is an international-standard training database for voice pathology detection (VPD) systems. However, there is a class-imbalanced distribution in normal and pathological voice samples and different types of pathological voice samples in the MEEI database. This study aimed to develop a VPD system that uses the fuzzy clustering synthetic minority oversampling technique algorithm (FC-SMOTE) to automatically detect and classify four types of pathological voices in a multi-class imbalanced database. The proposed FC-SMOTE algorithm processes the
44

Park, Kwang Ho, Erdenebileg Batbaatar, Yongjun Piao, Nipon Theera-Umpon, and Keun Ho Ryu. "Deep Learning Feature Extraction Approach for Hematopoietic Cancer Subtype Classification." International Journal of Environmental Research and Public Health 18, no. 4 (2021): 2197. http://dx.doi.org/10.3390/ijerph18042197.

Abstract:
Hematopoietic cancer is a malignant transformation in immune system cells. Hematopoietic cancer is characterized by the cells that are expressed, so it is usually difficult to distinguish its heterogeneities in the hematopoiesis process. Traditional approaches for cancer subtyping use statistical techniques. Furthermore, due to the overfitting problem of small samples, in case of a minor cancer, it does not have enough sample material for building a classification model. Therefore, we propose not only to build a classification model for five major subtypes using two kinds of losses, namely rec
45

Zhou, Kaibo, Jianyu Zhang, Yusong Ren, Zhen Huang, and Luanxiao Zhao. "A gradient boosting decision tree algorithm combining synthetic minority oversampling technique for lithology identification." GEOPHYSICS 85, no. 4 (2020): WA147–WA158. http://dx.doi.org/10.1190/geo2019-0429.1.

Abstract:
Lithology identification based on conventional well-logging data is of great importance for geologic features characterization and reservoir quality evaluation in the exploration and production development of petroleum reservoirs. However, there are some limitations in the traditional lithology identification process: (1) It is very time consuming to build a model so that it cannot realize real-time lithology identification during well drilling, (2) it must be modeled by experienced geologists, which consumes a lot of manpower and material resources, and (3) the imbalance of labeled data in we
46

Douzas, Georgios, Fernando Bacao, Joao Fonseca, and Manvel Khudinyan. "Imbalanced Learning in Land Cover Classification: Improving Minority Classes’ Prediction Accuracy Using the Geometric SMOTE Algorithm." Remote Sensing 11, no. 24 (2019): 3040. http://dx.doi.org/10.3390/rs11243040.

Abstract:
The automatic production of land use/land cover maps continues to be a challenging problem, with important impacts on the ability to promote sustainability and good resource management. The ability to build robust automatic classifiers and produce accurate maps can have a significant impact on the way we manage and optimize natural resources. The difficulty in achieving these results comes from many different factors, such as data quality and uncertainty. In this paper, we address the imbalanced learning problem, a common and difficult conundrum in remote sensing that affects the quality of cl
47

Rezki, Muhammad, Desiana Nur Kholifah, Muhammad Faisal, Priyono Priyono, and Rachmat Suryadithia. "Analisis Review Pengguna Google Meet dan Zoom Cloud Meeting Menggunakan Algoritma Naïve Bayes." Jurnal Infortech 2, no. 2 (2020): 264–70. http://dx.doi.org/10.31294/infortech.v2i2.9286.

Abstract:
The whole world is currently facing an outbreak of an infectious disease, the Covid-19 virus. Social restriction, or physical distancing, is a set of non-pharmaceutical infection control measures intended to stop or slow the spread of the disease. The public is therefore expected to carry out their activities at home to stop the spread of the Covid-19 virus. To keep up these activities from home, virtual meetings are needed for communicating with team members or colleagues. Virtual meetings are now widely used. An app's rating on the Play Store has
48

Turlapati, Venkata Pavan Kumar, and Manas Ranjan Prusty. "Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19." Intelligence-Based Medicine 3-4 (December 2020): 100023. http://dx.doi.org/10.1016/j.ibmed.2020.100023.

49

Fonseca, Joao, Georgios Douzas, and Fernando Bacao. "Improving Imbalanced Land Cover Classification with K-Means SMOTE: Detecting and Oversampling Distinctive Minority Spectral Signatures." Information 12, no. 7 (2021): 266. http://dx.doi.org/10.3390/info12070266.

Abstract:
Land cover maps are a critical tool to support informed policy development, planning, and resource management decisions. With significant upsides, the automatic production of Land Use/Land Cover maps has been a topic of interest for the remote sensing community for several years, but it is still fraught with technical challenges. One such challenge is the imbalanced nature of most remotely sensed data. The asymmetric class distribution negatively impacts the performance of classifiers and adds a new source of error to the production of these maps. In this paper, we address the imbalanced learn
50

Ahlawat, Khyati, Anuradha Chug, and Amit Prakash Singh. "Empirical Evaluation of Map Reduce Based Hybrid Approach for Problem of Imbalanced Classification in Big Data." International Journal of Grid and High Performance Computing 11, no. 3 (2019): 23–45. http://dx.doi.org/10.4018/ijghpc.2019070102.

Abstract:
Imbalanced datasets are those with an uneven distribution of classes, which degrades classifier performance. In this paper, an SVM classifier is combined with the K-Means clustering approach, and a hybrid approach, Hy_SVM_KM, is introduced. The performance of the proposed method is also empirically evaluated using the Accuracy and FN Rate measures and compared with existing methods such as SMOTE. The results show that the proposed hybrid technique outperforms the traditional machine learning classifier SVM on most datasets and performs better than the known pre-processing technique SMOTE for all dat