Journal articles on the topic 'Model-based oversampling'

Author: Grafiati

Published: 5 June 2025

Last updated: 2 August 2025

Consult the top 50 journal articles for your research on the topic 'Model-based oversampling.'

1

Lee, Ji-Na, and Ji-Yeoun Lee. "An Efficient SMOTE-Based Deep Learning Model for Voice Pathology Detection." Applied Sciences 13, no. 6 (2023): 3571. http://dx.doi.org/10.3390/app13063571.

Abstract:

The Saarbruecken Voice Database (SVD) is a public database used by voice pathology detection systems. However, the distributions of the pathological and normal voice samples show a clear class imbalance. This study aims to develop a system for the classification of pathological and normal voices that uses efficient deep learning models based on various oversampling methods, such as the adaptive synthetic sampling (ADASYN), synthetic minority oversampling technique (SMOTE), and Borderline-SMOTE directly applied to feature parameters. The suggested combinations of oversampled linear predictive c

2

Chang, Young-Soo, Hee-Sung Park, and Il-Joon Moon. "Predicting the Cochlear Dead Regions Using a Machine Learning-Based Approach with Oversampling Techniques." Medicina 57, no. 11 (2021): 1192. http://dx.doi.org/10.3390/medicina57111192.

Abstract:

Background and Objectives: Determining the presence or absence of cochlear dead regions (DRs) is essential in clinical practice. This study proposes a machine learning (ML)-based model that applies oversampling techniques for predicting DRs in patients. Materials and Methods: We used recursive partitioning and regression for classification tree (CT) and logistic regression (LR) as prediction models. To overcome the imbalanced nature of the dataset, oversampling techniques to duplicate examples in the minority class or to synthesize new examples from existing examples in the minority class were
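As a rough illustration of the two oversampling families this abstract mentions (duplicating minority examples versus synthesizing new ones from existing ones), the following Python sketch uses only the standard library. The function names, the toy data, and the simplified interpolation rule (a SMOTE-style step between two randomly chosen minority rows, rather than between a row and one of its k nearest neighbours) are assumptions for illustration and are not taken from the paper.

```python
import random


def duplicate_oversample(minority, n_new, rng):
    """Random oversampling: draw existing minority rows with replacement."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def interpolate_oversample(minority, n_new, rng):
    """Simplified SMOTE-style synthesis: each new row lies on the line
    segment between two distinct, randomly chosen minority rows."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # two different rows
        t = rng.random()                # interpolation weight in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


rng = random.Random(0)  # fixed seed so the sketch is repeatable
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]  # toy minority-class rows

copies = duplicate_oversample(minority, 4, rng)
synthetic = interpolate_oversample(minority, 4, rng)
```

Duplication only repeats rows that already exist, whereas interpolation produces new rows that stay inside the range spanned by the minority class.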

3

Vebriyanti, Lo Mei Ly, Shantika Martha, Wirda Andani, and Setyo Wira Rizki. "Analisis Kelayakan Kredit Menggunakan Classification Tree dengan Teknik Random Oversampling." Euler : Jurnal Ilmiah Matematika, Sains dan Teknologi 12, no. 1 (2024): 1–8. http://dx.doi.org/10.37905/euler.v12i1.24182.

Abstract:

Credit is providing money or bills based on the agreement between a bank and another party. Lending is inseparable from bad credit risk, so credit analysis must be conducted on prospective debtors before approving a proposed loan. This research aims to analyze creditworthiness using a Classification Tree as a classification method with Random Oversampling to overcome imbalanced data. This study uses secondary data on the status of debtors from a bank in West Kalimantan. Research data amounted to 800 data samples consisting of collectability variables as target variables and 10 independent vari

4

Lee, Taehwa, and Soojin Lee. "Transformer-based Intrusion Detection Model with Packet Payload Analysis and Oversampling." Journal of Korean Institute of Information Technology 22, no. 10 (2024): 27–34. http://dx.doi.org/10.14801/jkiit.2024.22.10.27.

5

Xu, Yanping, Xiaoyu Zhang, Zhenliang Qiu, Xia Zhang, Jian Qiu, and Hua Zhang. "Oversampling Imbalanced Data Based on Convergent WGAN for Network Threat Detection." Security and Communication Networks 2021 (November 5, 2021): 1–14. http://dx.doi.org/10.1155/2021/9206440.

Abstract:

Class imbalance is a common problem in network threat detection. Oversampling the minority class is regarded as a popular countermeasure by generating enough new minority samples. Generative adversarial network (GAN) is a typical generative model that can generate any number of artificial minority samples, which are close to the real data. However, it is difficult to train GAN, and the Nash equilibrium is almost impossible to achieve. Therefore, in order to improve the training stability of GAN for oversampling to detect the network threat, a convergent WGAN-based oversampling model called con

6

Kang, Hangoo, Dongil Kim, and Sungsu Lim. "Machine Learning-Based Anomaly Detection on Seawater Temperature Data with Oversampling." Journal of Marine Science and Engineering 12, no. 5 (2024): 807. http://dx.doi.org/10.3390/jmse12050807.

Abstract:

This study deals with a method for anomaly detection in seawater temperature data using machine learning methods with oversampling techniques. Data were acquired from 2017 to 2023 using a Conductivity–Temperature–Depth (CTD) system in the Pacific Ocean, Indian Ocean, and Sea of Korea. The seawater temperature data consist of 1414 profiles including 1218 normal and 196 abnormal profiles. This dataset has an imbalance problem in which the amount of abnormal data is insufficient compared to that of normal data. Therefore, we generated abnormal data with oversampling techniques using duplication,

7

Xiong, Chuang, Runhan Zhao, Jingtao Xu, et al. "Construct and Validate a Predictive Model for Surgical Site Infection after Posterior Lumbar Interbody Fusion Based on Machine Learning Algorithm." Computational and Mathematical Methods in Medicine 2022 (August 23, 2022): 1–11. http://dx.doi.org/10.1155/2022/2697841.

Abstract:

Purpose. Surgical site infection is one of the serious complications after lumbar fusion. Early prediction and timely intervention can reduce the harm to patients. The aims of this study were to construct and validate a machine learning model for predicting surgical site infection after posterior lumbar interbody fusion, to screen out the most important risk factors for surgical site infection, and to explore whether synthetic minority oversampling technique could improve the model performance. Method. This study reviewed 584 patients who underwent posterior lumbar interbody fusion for degener

8

García-Vicente, Clara, David Chushig-Muzo, Inmaculada Mora-Jiménez, et al. "Evaluation of Synthetic Categorical Data Generation Techniques for Predicting Cardiovascular Diseases and Post-Hoc Interpretability of the Risk Factors." Applied Sciences 13, no. 7 (2023): 4119. http://dx.doi.org/10.3390/app13074119.

Abstract:

Machine Learning (ML) methods have become important for enhancing the performance of decision-support predictive models. However, class imbalance is one of the main challenges for developing ML models, because it may bias the learning process and the model generalization ability. In this paper, we consider oversampling methods for generating synthetic categorical clinical data aiming to improve the predictive performance in ML models, and the identification of risk factors for cardiovascular diseases (CVDs). We performed a comparative study of several categorical synthetic data generation meth

9

A, Sagaya Priya, and Britto Ramesh Kumar S. "Semi-Supervised Intrusion Detection Based on Stacking and Feature-Engineering to Handle Data Imbalance." Indian Journal of Science and Technology 15, no. 46 (2022): 2548–54. https://doi.org/10.17485/IJST/v15i46.1885.

Abstract:

Objectives: To design an architecture that can effectively handle the imbalance levels and complexities in the network data to provide qualitative predictions. Methods: Experiments were performed with KDD CUP 99 dataset, NSL-KDD dataset and UNSW-NB15 dataset. Comparisons were performed with SAVAERDNN model. Oversampling technique is used for data balancing, and the stacking architecture handles the issue of overtraining introduced due to oversampling. Findings: The proposed Stacking and Feature engineeringba

10

Fieri, Brillian, Joshua La'la, and Derwin Suhartono. "Introversion-Extraversion Prediction using Machine Learning." JOIV : International Journal on Informatics Visualization 7, no. 4 (2023): 2154. http://dx.doi.org/10.62527/joiv.7.4.1019.

Abstract:

Introversion and extroversion are personality traits that assess the type of interaction between people and others. Introversion and extraversion have their advantages and disadvantages. Knowing their personality, people can utilize these advantages and disadvantages for their benefit. This study compares and evaluates several machine learning models and dataset balancing methods to predict the introversion-extraversion personality based on the survey result conducted by Open-Source Psychometrics Project. The dataset was balanced using three balancing methods, and fifteen questions were chosen

11

Fieri, Brillian, Joshua La'la, and Derwin Suhartono. "Introversion-Extraversion Prediction using Machine Learning." JOIV : International Journal on Informatics Visualization 7, no. 4 (2023): 2154–60. http://dx.doi.org/10.30630/joiv.7.4.1019.

12

Fieri, Brillian, Joshua La'la, and Derwin Suhartono. "Introversion-Extraversion Prediction using Machine Learning." JOIV : International Journal on Informatics Visualization 7, no. 4 (2023): 2154. http://dx.doi.org/10.30630/joiv.7.4.01019.

13

Putra, Muhammad Akmal A., Suwarno, and Rahman Azis Prasojo. "Improving Transformer Health Index Prediction Performance Using Machine Learning Algorithms with a Synthetic Minority Oversampling Technique." Energies 18, no. 9 (2025): 2364. https://doi.org/10.3390/en18092364.

Abstract:

Machine learning (ML) has emerged as a powerful tool in transformer condition assessment, enabling more accurate diagnostics by leveraging historical test data. However, imbalanced datasets, often characterized by limited samples in poor transformer conditions, pose significant challenges to model performance. This study investigates the application of oversampling techniques to enhance ML model accuracy in predicting the Health Index of transformers. A dataset comprising 3850 transformer tests collected from utilities across Indonesia was used. Key parameters, including oil quality, dissolved

14

Ren, Jinfu, Yang Liu, and Jiming Liu. "EWGAN: Entropy-Based Wasserstein GAN for Imbalanced Learning." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 10011–12. http://dx.doi.org/10.1609/aaai.v33i01.330110011.

Abstract:

In this paper, we propose a novel oversampling strategy dubbed Entropy-based Wasserstein Generative Adversarial Network (EWGAN) to generate data samples for minority classes in imbalanced learning. First, we construct an entropy-weighted label vector for each class to characterize the data imbalance in different classes. Then we concatenate this entropy-weighted label vector with the original feature vector of each data sample, and feed it into the WGAN model to train the generator. After the generator is trained, we concatenate the entropy-weighted label vector with random noise feature vectors

15

Maraden, Yan, Gunawan Wibisono, I. Gde Dharma Nugraha, et al. "Enhancing Electricity Theft Detection through K-Nearest Neighbors and Logistic Regression Algorithms with Synthetic Minority Oversampling Technique: A Case Study on State Electricity Company (PLN) Customer Data." Energies 16, no. 14 (2023): 5405. http://dx.doi.org/10.3390/en16145405.

Abstract:

Electricity theft has caused massive losses and damage to electricity utilities. The damage affects the electricity supply’s quality and increases the generation load. The losses happen not only for the electricity utilities but also affect the legitimate users who have to pay excessive electricity bills. That is why the method to detect electricity theft is indispensable. Recently, machine learning algorithms have been used to develop a model for detecting electricity theft. However, most algorithms have problems due to imbalanced data, overfitting issues, and lack of data. Therefore, this pa

16

Sug, Hyontai. "An Oversampling Technique with Descriptive Statistics." WSEAS Transactions on Information Science and Applications 21 (July 17, 2024): 318–32. http://dx.doi.org/10.37394/23209.2024.21.31.

Abstract:

Oversampling is often applied as a means to win a better knowledge model. Several oversampling methods based on synthetic instances have been suggested, and SMOTE is one of the representative oversampling methods that can generate synthetic instances of a minor class. Until now, the oversampled data has been used conventionally to train machine learning models without statistical analysis, so it is not certain that the machine learning models will be fine for unseen cases in the future. However, because such synthetic data is different from the original data, we may wonder how much it resemble

17

Owoade, Ayoade Akeem. "Identifying Financial Fraud Transactions Using Decision Tree Classifier Algorithm." Dutse Journal of Pure and Applied Sciences 11, no. 1a (2025): 338–52. https://doi.org/10.4314/dujopas.v11i1a.31.

Abstract:

Fraud in financial transactions is a critical issue for businesses, governments, and consumers, causing substantial financial losses and eroding trust in financial systems. Rule-based systems and other conventional fraud detection techniques have trouble identifying complex fraudulent activity. This research applies various machine learning (ML) algorithms to detect fraud in financial transactions. The research compares the performance of supervised ML techniques, such as random forest, logistic regression, and decision tree classifier, using a publicly available Kaggle dataset. It evaluates t

18

Pertiwi, Dwika Ananda Agustina, Kamilah Ahmad, Jumanto Unjung, and Much Aziz Muslim. "A Performance Comparison of Data Balancing Model to Improve Credit Risk Prediction in P2P Lending." Scientific Journal of Informatics 11, no. 4 (2024): 881–90. https://doi.org/10.15294/sji.v11i4.14018.

Abstract:

Purpose: The problem of imbalanced datasets often affects the performance of classification models for prediction, one of which is credit risk prediction in P2P lending. To overcome this problem, several data balancing models have been applied in the existing literature. However, existing research only evaluates performance based on classification model performance. Thus, in addition to measuring the performance of classification models, this study involves the contribution of the performance of data balancing models including Random Oversampling (ROS), Random Undersampling (RUS), and Syntheti

19

Kucharczyk, Marcin, Grzegorz Dziwoki, Jacek Izydorczyk, et al. "Equalizer Parameters’ Adjustment Based on an Oversampled Channel Model for OFDM Modulation Systems." Electronics 13, no. 5 (2024): 843. http://dx.doi.org/10.3390/electronics13050843.

Abstract:

A physical model of a wireless transmission channel in the time domain usually consists of the main propagation path and only a few reflections. The reasonable assumptions made about the channel model can improve its parameters’ estimation by a greedy OFDM (Orthogonal Frequency Division Multiplexing) equalizer. The equalizer works flawlessly if delays between propagation paths are in the sampling grid. Otherwise, the channel impulse response loses its compressible characteristic and the number of coefficients to find increases. It is possible to get back to the simple channel model by data ove

20

Zhou, Yuqing, Canyang Ye, Deqiang Huang, Bihui Peng, Bintao Sun, and Huan Zhang. "Synthetic Minority Oversampling Enhanced FEM for Tool Wear Condition Monitoring." Processes 11, no. 6 (2023): 1785. http://dx.doi.org/10.3390/pr11061785.

Abstract:

Recent advances in artificial intelligence (AI) technology have led to increasing interest in the development of AI-based tool wear condition monitoring methods, heavily relying on large training samples. However, the high cost of tool wear experiment and the uncertainty of tool wear change in the machining process lead to the problems of sample missing and insufficiency in the model training stage, which seriously affects the identification accuracy of many AI models. In this paper, a novel identification method based on finite-element modeling (FEM) and the synthetic minority oversampling te

21

Pradipta, Gede Angga, Putu Desiana Wulaning Ayu, Made Liandana, and Dandy Pramana Hostiadi. "Weighted nearest neighbors and radius oversampling for imbalanced data classification." IAES International Journal of Artificial Intelligence (IJ-AI) 14, no. 1 (2025): 416. http://dx.doi.org/10.11591/ijai.v14.i1.pp416-427.

Abstract:

The challenges associated with high-dimensional and imbalanced datasets were observed to often lead to a degradation in the performance of classical machine learning algorithms. In the case of high dimensional data, not all features contribute significantly and are considered relevant to the performance of the model. Therefore, this study introduced a novel method called feature weighted variance analysis-nearest neighbors (WFVANN) which was developed on the foundation of k-nearest neighbors (KNN). The process involved modifying the calculation of the Euclidean distance by fully considering th

22

Gede, Angga Pradipta, Desiana Wulaning Ayu Putu, Liandana Made, and Pramana Hostiadi Dandy. "Weighted nearest neighbors and radius oversampling for imbalanced data classification." IAES International Journal of Artificial Intelligence (IJ-AI) 14, no. 1 (2025): 416–27. https://doi.org/10.11591/ijai.v14.i1.pp416-427.

23

Zhafran, Kamil Elian, and Deni Saepudin. "Stock Industry Sector Prediction Based on Financial Reports Using Random Forest." Building of Informatics, Technology and Science (BITS) 6, no. 2 (2024): 1002–11. https://doi.org/10.47065/bits.v6i2.5743.

Abstract:

This study aims to predict the stock industry sector on the Indonesia Stock Exchange (IDX) based on financial reports using the Random Forest method. Implementing this machine learning approach is crucial due to the complexity of financial data, which demands robust and adaptive methods for accurate predictions. The dataset comprises financial data from companies across 10 industrial sectors on the IDX, spanning 2010-2022, and includes 17 features from each financial report. Notably, there is an imbalance in the number of companies per sector, with sector B representing 14.76% and sector G onl

24

Mia, Mia, Anis Fitri Nur Masruriyah, and Adi Rizky Pratama. "The Utilization of Decision Tree Algorithm In Order to Predict Heart Disease." Jurnal Sisfotek Global 12, no. 2 (2022): 138. http://dx.doi.org/10.38101/sisfotek.v12i2.551.

Abstract:

The data on heart disease patients obtained from the Ministry of Health of the Republic of Indonesia in 2020 explains that heart disease has increased every year and ranks as the highest cause of death in Indonesia, especially at productive ages. If people with heart disease are not treated properly, then in their effective period a patient can experience death more quickly. Thus, a predictive model that is able to help medical personnel solve health problems is built. This study employed the Random Forest and Decision Tree algorithm classification process by processing cardiac patient data to

25

Liebenlito, Muhaza, Yanne Irene, and Abdul Hamid. "Classification of Tuberculosis and Pneumonia in Human Lung Based on Chest X-Ray Image using Convolutional Neural Network." InPrime: Indonesian Journal of Pure and Applied Mathematics 2, no. 1 (2020): 24–32. http://dx.doi.org/10.15408/inprime.v2i1.14545.

Abstract:

In this paper, we use chest x-ray images of Tuberculosis and Pneumonia to diagnose the patient using a convolutional neural network model. We use 4273 images of pneumonia, 1989 images of normal, and 394 images of tuberculosis. The data are divided into 80% as the training set and 20% as the testing set. We do the preprocessing steps to all of our images data, such as resize, converting RGB to grayscale, and Gaussian normalization. On the training dataset, the sampling technique used is undersampling and oversampling to balance each class. The best model was chosen based on the Area und

26

Fan, Zongwen, Shaleeza Sohail, Fariza Sabrina, and Xin Gu. "Sampling-Based Machine Learning Models for Intrusion Detection in Imbalanced Dataset." Electronics 13, no. 10 (2024): 1878. http://dx.doi.org/10.3390/electronics13101878.

Abstract:

Cybersecurity is one of the important considerations when adopting IoT devices in smart applications. Even though a huge volume of data is available, data related to attacks are generally in a significantly smaller proportion. Although machine learning models have been successfully applied for detecting security attacks on smart applications, their performance is affected by the problem of such data imbalance. In this case, the prediction model is preferable to the majority class, while the performance for predicting the minority class is poor. To address such problems, we apply two oversampli

27

Ramadhan, Nur Ghaniaviyanto, Azka Khoirunnisa, Kurnianingsih Kurnianingsih, and Takako Hashimoto. "A Hybrid ROS-SVM Model for Detecting Target Multiple Drug Types." JOIV : International Journal on Informatics Visualization 7, no. 3 (2023): 794. http://dx.doi.org/10.30630/joiv.7.3.1171.

Abstract:

Misleading in determining the decision to use the target drug will be fatal, even to death. This study examines five pharmacological targets designated as types A, B, C, X, and Y. Early detection of misleading drug targeting will reduce the risk of death. This study aims to develop hybrid random oversampling techniques (ROS) and support vector machine (SVM) methods. The use of the oversampling technique in this study aims to balance classes in the dataset; due to the data collection in each class, there is a relatively large gap. This study applies five schemes to see which combination of mode

28

Hu, Feng, and Hang Li. "A Novel Boundary Oversampling Algorithm Based on Neighborhood Rough Set Model: NRSBoundary-SMOTE." Mathematical Problems in Engineering 2013 (2013): 1–10. http://dx.doi.org/10.1155/2013/694809.

Abstract:

Rough set theory is a powerful mathematical tool introduced by Pawlak to deal with imprecise, uncertain, and vague information. The Neighborhood-Based Rough Set Model expands the rough set theory; it could divide the dataset into three parts. And the boundary region indicates that the majority class samples and the minority class samples are overlapped. On the basis of what we know about the distribution of original dataset, we only oversample the minority class samples, which are overlapped with the majority class samples, in the boundary region. So, the NRSBoundary-SMOTE can expand the decis

29

van de Wouw, Didrika S., Ryan T. McKay, Bruno B. Averbeck, and Nicholas Furl. "Explaining human sampling rates across different decision domains." Judgment and Decision Making 17, no. 3 (2022): 487–512. http://dx.doi.org/10.1017/s1930297500003557.

Abstract:

Undersampling biases are common in the optimal stopping literature, especially for economic full choice problems. Among these kinds of number-based studies, the moments of the distribution of values that generates the options (i.e., the generating distribution) seem to influence participants’ sampling rate. However, a recent study reported an oversampling bias on a different kind of optimal stopping task: where participants chose potential romantic partners from images of faces (Furl et al., 2019). The authors hypothesised that this oversampling bias might be specific to mate choice. W

30

Liu, Ankang, Lingfei Cheng, and Changdong Yu. "SASMOTE: A Self-Attention Oversampling Method for Imbalanced CSI Fingerprints in Indoor Positioning Systems." Sensors 22, no. 15 (2022): 5677. http://dx.doi.org/10.3390/s22155677.

Abstract:

WiFi localization based on channel state information (CSI) fingerprints has become the mainstream method for indoor positioning due to the widespread deployment of WiFi networks, in which fingerprint database building is critical. However, issues, such as insufficient samples or missing data in the collection fingerprint database, result in unbalanced training data for the localization system during the construction of the CSI fingerprint database. To address the above issue, we propose a deep learning-based oversampling method, called Self-Attention Synthetic Minority Oversampling Technique (

31

Wang, Jinliang, Yang Zhang, Haijiao Shi, Ying Yang, Shuai Wang, and Fengrong Wang. "Construction of Mitochondrial Protection and Monitoring Model of Lon Protease Based on Machine Learning under Myocardial Ischemia Environment." Journal of Environmental and Public Health 2022 (October 8, 2022): 1–10. http://dx.doi.org/10.1155/2022/4805009.

Abstract:

The localization of a protein’s submitochondrial structure is important for therapeutic design of associated disorders caused by mitochondrial abnormalities because many human diseases are directly tied to mitochondria. When Lon protease expression changes, glycolysis replaces respiratory metabolism in the cell, which is a common occurrence in cancer cells. The fact that protein formation is a dynamic research object makes it impossible to reproduce the unique living environment of proteins in an experimental setting, which surely makes it more challenging to determine protein function through

32

Rohman, Muhammad Ghofar, Zubaile Abdullah, Shahreen Kasim, and Rasyidah. "Hybrid Logistic Regression Random Forest on Predicting Student Performance." JOIV : International Journal on Informatics Visualization 9, no. 2 (2025): 852. https://doi.org/10.62527/joiv.9.2.3972.

Abstract:

The research aims to investigate the effects of unbalanced data on machine learning, overcome imbalanced data using SMOTE oversampling, and improve machine learning performance using hyperparameter tuning. This study proposed a model that combines logistic regression and random forests as a hybrid logistic regression, random forest, and random search SV that uses SMOTE oversampling and hyperparameter tuning. The result of this study showed that the prediction model using the hybrid logistic regression, random forest, and random search SV that we proposed produces more effective performance tha

33

Park, Kwang Ho, Erdenebileg Batbaatar, Yongjun Piao, Nipon Theera-Umpon, and Keun Ho Ryu. "Deep Learning Feature Extraction Approach for Hematopoietic Cancer Subtype Classification." International Journal of Environmental Research and Public Health 18, no. 4 (2021): 2197. http://dx.doi.org/10.3390/ijerph18042197.

Abstract:

Hematopoietic cancer is a malignant transformation in immune system cells. Hematopoietic cancer is characterized by the cells that are expressed, so it is usually difficult to distinguish its heterogeneities in the hematopoiesis process. Traditional approaches for cancer subtyping use statistical techniques. Furthermore, due to the overfitting problem of small samples, in case of a minor cancer, it does not have enough sample material for building a classification model. Therefore, we propose not only to build a classification model for five major subtypes using two kinds of losses, namely rec

34

Tummalapalli, Vaibhav. "Stratified sampling in Cohort-based data for Machine learning Model development." International Scientific Journal of Engineering and Management 04, no. 05 (2025): 1–8. https://doi.org/10.55041/isjem03377.

Abstract:

Cohort-based data is a prevalent structure in many industries, enabling longitudinal analyses and tracking customer behaviors over time. However, sampling such data for model development presents unique challenges, especially when events (e.g., responses, purchases) are unevenly distributed across cohorts. Random sampling can introduce biases, leading to models that fail to generalize. This paper presents a stratified sampling framework designed to maintain the proportional representation of events and non-events within each cohort, even when oversampling or undersampling is applied.
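One way to picture the idea of keeping event and non-event proportions intact within each cohort is the minimal Python sketch below. It is not the paper's framework: the `cohort` and `event` field names, the toy rows, and the rule of sampling the same fraction from every (cohort, event) stratum are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(rows, fraction, rng):
    """Sample the same fraction from every (cohort, event) stratum,
    so each cohort keeps its original event rate."""
    strata = defaultdict(list)
    for row in rows:
        strata[(row["cohort"], row["event"])].append(row)

    sample = []
    for key in sorted(strata):  # fixed order keeps runs repeatable
        group = strata[key]
        k = max(1, round(len(group) * fraction))  # at least one row per stratum
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical toy data: two cohorts, each with a 10% event rate.
rows = (
    [{"cohort": "A", "event": 1} for _ in range(10)]
    + [{"cohort": "A", "event": 0} for _ in range(90)]
    + [{"cohort": "B", "event": 1} for _ in range(4)]
    + [{"cohort": "B", "event": 0} for _ in range(36)]
)

sample = stratified_sample(rows, 0.5, random.Random(0))
```

With these toy counts, halving every stratum leaves both cohorts at their original 10% event rate, which plain random sampling over the pooled rows would not guarantee.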

35

Pradana, Rilo Chandra, and Derwin Suhartono. "A Cost-Sensitive Hybrid Model of ALBERT Model and Convolutional Neural Network for Personality Classification." CommIT (Communication and Information Technology) Journal 19, no. 1 (2025): 89–99. https://doi.org/10.21512/commit.v19i1.11822.

Abstract:

A tremendous amount of text data from social media activity can be used to extract information about a user’s personality, including the Myers-Briggs Type Indicator (MBTI). The MBTI personality type is extensively used to identify individual traits, which helps to solve problems in human resources and mental health awareness. Nonetheless, constructing an effective model for classifying MBTI types that are insensitive to unbalanced data remains a major challenge, as certain types dominate the social media environment. The research proposes a hybrid classification model that combines the transfo

36

Hu, Wen-Jing, Gang Bai, Yan Wang, et al. "Predictive modeling for postoperative delirium in elderly patients with abdominal malignancies using synthetic minority oversampling technique." World Journal of Gastrointestinal Oncology 16, no. 4 (2024): 1227–35. http://dx.doi.org/10.4251/wjgo.v16.i4.1227.

Abstract:

BACKGROUND Postoperative delirium, particularly prevalent in elderly patients after abdominal cancer surgery, presents significant challenges in clinical management. AIM To develop a synthetic minority oversampling technique (SMOTE)-based model for predicting postoperative delirium in elderly abdominal cancer patients. METHODS In this retrospective cohort study, we analyzed data from 611 elderly patients who underwent abdominal malignant tumor surgery at our hospital between September 2020 and October 2022. The incidence of postoperative delirium was recorded for 7 d post-surgery. Patients wer

37

Tingyue, Wang, and Maheyzah Md Siraj. "Anomaly Network-based Intrusion Detection Model Based on Hybrid CNN-LSTM and Attention Mechanism." International Journal of Innovative Computing 15, no. 1 (2025): 73–80. https://doi.org/10.11113/ijic.v15n1.524.

Abstract:

With the growing frequency of network attacks, traditional anomaly-based intrusion detection models often fail to identify advanced attack patterns and suffer from high false positive rates. This paper proposes a hybrid deep learning model integrating Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and an Attention Mechanism to enhance detection accuracy and robustness. Leveraging CNNs for spatial feature extraction, LSTMs for temporal pattern recognition, and Attention Mechanisms for prioritizing critical data, the model effectively identifies diverse intrusion types. Using

38

Sani, Dian Ahkam. "A Random Oversampling and BERT-based Model Approach for Handling Imbalanced Data in Essay Answer Correction." JURNAL INFOTEL 16, no. 4 (2024): 729–39. https://doi.org/10.20895/infotel.v16i4.1224.

Abstract:

The task of automated essay scoring has long been plagued by the challenge of imbalanced datasets, where the distribution of scores or labels is skewed towards certain categories. This imbalance can lead to poor performance of machine learning models, as they tend to be biased towards the majority class. One potential solution to this problem is the use of oversampling techniques, which aim to balance the dataset by increasing the representation of the minority class. In this paper, we propose a novel approach that combines random oversampling with a BERT-base uncased model for essay answer co

39

Yin, Li, and Yijun Chen. "An Intrusion Detection Model Based on Extra Trees Algorithm with Dimensionality Reduction and Oversampling." Journal of Computing and Information Technology 31, no. 4 (2024): 219–31. http://dx.doi.org/10.20532/cit.2023.1005771.

Abstract:

With the advancement of the university information process, more and more application systems are running on the campus network, and the information system becomes larger and more complex. With the rapid growth of network users and the popularization and deepening of computer knowledge, the campus network has been transformed from an experimental network for education and scientific research into an operational network that attaches equal importance to education, scientific research, and service. As the most important transmission carrier of digital information, how to ensure its security has

40

Akouhar, Mohamed, Abdallah Abarda, Mohamed El Fatini, and Mohamed Ouhssini. "Enhancing credit card fraud detection: the impact of oversampling rates and ensemble methods with diverse feature selection." Radioelectronic and Computer Systems 2025, no. 1 (2025): 85–101. https://doi.org/10.32620/reks.2025.1.06.

Abstract:

The subject matter of this article is enhancing credit card fraud detection systems by exploring the impact of oversampling rates and ensemble methods with diverse feature selection techniques. Credit card fraud has become a major issue in the financial world, leading to substantial losses for both financial institutions and consumers. As the volume of credit card transactions continues to grow, accurately detecting fraudulent behavior has become increasingly challenging. The goal of this study is to enhance credit card fraud detection by analyzing oversampling rates to select the optimal one

41

Petinrin, Olutomilayo Olayemi, Faisal Saeed, Naomie Salim, Muhammad Toseef, Zhe Liu, and Ibukun Omotayo Muyide. "Dimension Reduction and Classifier-Based Feature Selection for Oversampled Gene Expression Data and Cancer Classification." Processes 11, no. 7 (2023): 1940. http://dx.doi.org/10.3390/pr11071940.

Abstract:

Gene expression data are usually known for having a large number of features. Usually, some of these features are irrelevant and redundant. However, in some cases, all features, despite being numerous, show high importance and contribute to the data analysis. In a similar fashion, gene expression data sometimes have limited instances with a high rate of imbalance among the classes. This can limit the exposure of a classification model to instances of different categories, thereby influencing the performance of the model. In this study, we proposed a cancer detection approach that utilized data

42

Khleel, Nasraldeen Alnor Adam, and Károly Nehéz. "Deep convolutional neural network model for bad code smells detection based on oversampling method." Indonesian Journal of Electrical Engineering and Computer Science 26, no. 3 (2022): 1725. http://dx.doi.org/10.11591/ijeecs.v26.i3.pp1725-1735.

Abstract:

Code smells refers to any symptoms or anomalies in the source code that shows violation of design principles or implementation. Early detection of bad code smells improves software quality. Nowadays several artificial neural network (ANN) models have been used for different topics in software engineering: software defect prediction, software vulnerability detection, and code clone detection. It is not necessary to know the source of the data when using ANN models but require large training sets. Data imbalance is the main challenge of artificial intelligence techniques in detecting

43

Zhou, Rongsheng, Weihao Yin, Wenjin Li, et al. "Prediction Model for Infectious Disease Health Literacy Based on Synthetic Minority Oversampling Technique Algorithm." Computational and Mathematical Methods in Medicine 2022 (March 25, 2022): 1–6. http://dx.doi.org/10.1155/2022/8498159.

Abstract:

Objective. Improving health literacy in infectious diseases is a direct manifestation of the solid advance in disease control and prevention. Our study is aimed at exploring applying synthetic minority oversampling technique (SMOTE) in the prediction assessment of whether residents and business employees have infectious disease health literacy. Methods. The Chinese resident infectious disease health literacy evaluation scale was used to investigate the associated variables. The screened variables were input variables and the presence or absence of infectious diseases health literacy as outcome

44

Khleel, Nasraldeen Alnor Adam, and Károly Nehéz. "Deep convolutional neural network model for bad code smells detection based on oversampling method." Indonesian Journal of Electrical Engineering and Computer Science 26, no. 3 (2022): 1725–35. https://doi.org/10.11591/ijeecs.v26.i3.pp1725-1735.

Abstract:

Code smells refers to any symptoms or anomalies in the source code that shows violation of design principles or implementation. Early detection of bad code smells improves software quality. Nowadays several artificial neural network (ANN) models have been used for different topics in software engineering: software defect prediction, software vulnerability detection, and code clone detection. It is not necessary to know the source of the data when using ANN models but require large training sets. Data imbalance is the main challenge of artificial intelligence techniques in detecting the code sm

45

El-Amir, Shrouk, and Ibrahim El-Henawy. "An Improved Model Using Oversampling Technique and Cost-Sensitive Learning for Imbalanced Data Problem." Information Sciences with Applications 2 (March 16, 2024): 33–50. http://dx.doi.org/10.61356/j.iswa.2024.213073.

Abstract:

In today's world, classification learning is a vital task because of the advancement in technology. However, during the classification process, we found the classifiers (the traditional classification techniques) couldn't handle the imbalanced data, which means the instances (majority instances) that belong to one class are many more than the instances (minority instances) that belong to another class. The use of oversampling approaches and cost-sensitive strategies are two popular approaches for addressing the imbalanced class snag. However, the best outcomes are achieved by combining the two

46

Su, Qianqian. "Smart Teaching Design Mode based on Machine Learning and its Effect Evaluation." Mathematical Problems in Engineering 2022 (July 30, 2022): 1–8. http://dx.doi.org/10.1155/2022/9019339.

Abstract:

With the continuous progress of science and technology, the mode of smart teaching is more and more applied in the actual teaching process. Under the intelligent teaching system mode, teachers and students can jointly complete the construction of curriculum resources and classroom teaching tasks. This paper evaluates its application effect by introducing two examples of smart teaching. Among them, the evaluation effect of the academic teaching network system shows that the teaching network system based on machine learning is composed of three parts: model selection, data preparation, and model

47

Park, Jihwan, Mi Jung Rho, Hyong Woo Moon, et al. "Dr. Answer AI for Prostate Cancer: Predicting Biochemical Recurrence Following Radical Prostatectomy." Technology in Cancer Research & Treatment 20 (January 1, 2021): 153303382110246. http://dx.doi.org/10.1177/15330338211024660.

Abstract:

Objectives: To develop a model to predict biochemical recurrence (BCR) after radical prostatectomy (RP), using artificial intelligence (AI) techniques. Patients and Methods: This study collected data from 7,128 patients with prostate cancer (PCa) who received RP at 3 tertiary hospitals. After preprocessing, we used the data of 6,755 cases to generate the BCR prediction model. There were 16 input variables with BCR as the outcome variable. We used a random forest to develop the model. Several sampling techniques were used to address class imbalances. Results: We achieved good performance using

48

Chen, Yiheng, Jinbai Zou, Lihai Liu, and Chuanbo Hu. "Improved Oversampling Algorithm for Imbalanced Data Based on K-Nearest Neighbor and Interpolation Process Optimization." Symmetry 16, no. 3 (2024): 273. http://dx.doi.org/10.3390/sym16030273.

Abstract:

The problems of imbalanced datasets are generally considered asymmetric issues. In asymmetric problems, artificial intelligence models may exhibit different biases or preferences when dealing with different classes. In the process of addressing class imbalance learning problems, the classification model will pay too much attention to the majority class samples and cannot guarantee the classification performance of the minority class samples, which might be more valuable. By synthesizing the minority class samples and changing the data distribution, unbalanced datasets can be optimized. Traditi

49

Oture, Osasere, Muhammad Zahid Iqbal, and Xining (Ning) Wang. "Enhanced Diagnosis of Thyroid Diseases Through Advanced Machine Learning Methodologies." Sci 7, no. 2 (2025): 66. https://doi.org/10.3390/sci7020066.

Abstract:

Thyroid disease is a health concern related to the thyroid gland, which is vital for controlling the metabolism of the human body. Predominantly affecting women in their fourth or fifth decades of life, thyroid disease can result in physical and mental issues. This research focuses on improving the diagnostic process by creating a classification model that utilises various machine learning models and a deeplearning model to categorise three types of thyroid disease conditions. This research developed an automated system capable of classifying three thyroid conditions using five machine learnin

50

Wang, Yiying. "Fraud detection based on FS-SMOTE model for credit card." Highlights in Science, Engineering and Technology 70 (November 15, 2023): 316–23. http://dx.doi.org/10.54097/hset.v70i.12479.

Abstract:

In the financial security technology, credit card fraud detection technology is an important technical means, which collects and analyzes the credit card transaction data in a certain period of time, detects fraud in many credit card transactions, and takes the corresponding alarm response. At the same time, in view of the extremely imbalanced characteristics of credit card fraud customer data set, the number of minority samples is increased by Smote, which is a representative algorithm of oversampling technology. Logistic regression, KNN, Decision tree, Bagging and Stochastic gradient descent
