Journal articles on the topic 'KNN imputation'

Author: Grafiati

Published: 4 June 2025

Last updated: 1 August 2025

Consult the top 50 journal articles for your research on the topic 'KNN imputation.'

1

Gautam, Ramu, and Shahram Latifi. "COMPARISON OF SIMPLE MISSING DATA IMPUTATION TECHNIQUES FOR NUMERICAL AND CATEGORICAL DATASETS." Journal of Research in Engineering and Applied Sciences 8, no. 1 (2023): 468–75. http://dx.doi.org/10.46565/jreas.202381468-475.

Abstract:

Almost every dataset has missing data. The common reasons are sensor error, equipment malfunction, human error, or translation loss. We study the efficacy of statistical (mean, median, mode) and machine learning based (k-nearest neighbors) imputation methods in accurately imputing missing data in numerical datasets with data missing not at random (MNAR) and data missing completely at random (MCAR), as well as in categorical datasets. Imputed datasets are used to make predictions on the test set, and the mean squared error (MSE) of those predictions is used as the measure of imputation performance. Mean

2

Mamat, Naeimah, and Siti Fatin Mohd Razali. "Comparisons of Various Imputation Methods for Incomplete Water Quality Data: A Case Study of The Langat River, Malaysia." Jurnal Kejuruteraan 35, no. 1 (2023): 191–201. http://dx.doi.org/10.17576/jkukm-2023-35(1)-18.

Abstract:

In this study, the ability of numerous statistical and machine learning models to impute water quality data was investigated at three monitoring stations along the Langat River in Malaysia. Inconsistencies in the percentage of missing data between monitoring stations (varying from 20 percent (moderate) to over 50 percent (high)) represent the greatest obstacle of the study. The main objective was to select the best method for imputation and compare whether there are differences between the methods used by the different stations. The paper focuses on different imputation methods such as Multipl

3

Abidin, Nadzurah Zainal, and Amelia Ritahani Ismail. "An improved K-Nearest neighbour with grasshopper optimization algorithm for imputation of missing data." International Journal of Advances in Intelligent Informatics 7, no. 3 (2021): 304. http://dx.doi.org/10.26555/ijain.v7i3.696.

Abstract:

K-nearest neighbors (KNN) has been extensively used as an imputation algorithm to substitute missing data with plausible values. One of the strengths of KNN imputation is its ability to robustly estimate missing data from its nearest neighbors. Despite these advantages, KNN has drawbacks: it suffers from high time complexity, the difficulty of choosing the right k, and the choice among different functions. Thus, this paper proposes a novel method for imputation of missing data, named KNNGOA, which optimizes the KNN imputation technique based on the grasshopper optimization algor
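
For readers new to the topic, the baseline KNN imputation that this abstract refers to can be sketched as follows. This is a generic illustration in plain Python, not the KNNGOA method proposed in the paper: the function name `knn_impute`, the use of `None` for missing entries, and the distance rescaling over shared columns are all assumptions made for the example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest rows.

    Hypothetical helper for illustration only. Distance is Euclidean over
    the columns both rows have observed, rescaled by the share of columns
    that could be compared.
    """
    n_cols = len(rows[0])

    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing to compare, so never a neighbor
        sq = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(sq * n_cols / len(shared))

    imputed = [list(r) for r in rows]  # leave the input untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where column j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] < math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # stay None when no usable donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


rows = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],   # third value is missing
    [0.9, 1.9, 2.8],
    [8.0, 9.0, 10.0],   # far away, should not be chosen as a neighbor
]
filled = knn_impute(rows, k=2)
print(filled[1][2])
```

In the example, the missing third value of the second row is filled with the mean of that column over its two closest rows. The choice of k and of the distance function, two of the difficulties the abstract names, appear here as the `k` argument and the inner `distance` helper.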

4

Qin, Yongsong, Shichao Zhang, and Chengqi Zhang. "Combining kNN Imputation and Bootstrap Calibrated." International Journal of Data Warehousing and Mining 6, no. 4 (2010): 61–73. http://dx.doi.org/10.4018/jdwm.2010100104.

Abstract:

The k-nearest neighbor (kNN) imputation, as one of the most important research topics in incomplete data discovery, has been developed with great success on industrial data. However, it is difficult to obtain a mathematically valid and simple procedure to construct confidence intervals for evaluating the imputed data. This paper studies a new estimation for missing (or incomplete) data that is a combination of the kNN imputation and bootstrap calibrated EL (Empirical Likelihood). The combination not only relieves the burden of seeking a mathematically valid asymptotic theory for the kNN imputati

5

Murad Ali, Khan. "Enhancing Material Property Predictions through Optimized KNN Imputation and Deep Neural Network Modeling." IgMin Research 2, no. 6 (2024): 425–31. http://dx.doi.org/10.61927/igmin197.

Abstract:

In materials science, the integrity and completeness of datasets are critical for robust predictive modeling. Unfortunately, material datasets frequently contain missing values due to factors such as measurement errors, data non-availability, or experimental limitations, which can significantly undermine the accuracy of property predictions. To tackle this challenge, we introduce an optimized K-Nearest Neighbors (KNN) imputation method, augmented with Deep Neural Network (DNN) modeling, to enhance the accuracy of predicting material properties. Our study compares the performance of our Enhance

6

Hina, Ayub, and Jamil Harun. "Enhancing Missing Values Imputation through Transformer-Based Predictive Modeling." IgMin Research 2, no. 1 (2024): 025–31. http://dx.doi.org/10.61927/igmin140.

Abstract:

This paper tackles the vital issue of missing value imputation in data preprocessing, where traditional techniques like zero, mean, and KNN imputation fall short in capturing intricate data relationships. This often results in suboptimal outcomes, and discarding records with missing values leads to significant information loss. Our innovative approach leverages advanced transformer models renowned for handling sequential data. The proposed predictive framework trains a transformer model to predict missing values, yielding a marked improvement in imputation accuracy. Comparative analysis agains

7

Manyol, Moïse, Samuel Eke, Alphonse J. M. Massoma, Alain Biboum, and Ruben Mouangue. "Preprocessing Approach for Power Transformer Maintenance Data Mining Based on k-Nearest Neighbor Completion and Principal Component Analysis." International Transactions on Electrical Energy Systems 2022 (October 3, 2022): 1–10. http://dx.doi.org/10.1155/2022/8546588.

Abstract:

The accuracy of a knowledge extraction algorithm in a large database depends on the quality of the data preprocessing and the methods used. The massive amounts of data that we collect every day are putting storage capacity at a premium. In reality, many databases are characterized by attributes with outliers, redundant, and even more missing values. Missing data and outliers are ubiquitous in our databases, and imputation techniques will help us mitigate their influence. To solve this problem, as well as the problem of data size, this paper proposes a data preprocessing approach based on the k

8

Kim, Sung Won, and Young Il Kim. "A Data Imputation Approach for Missing Power Consumption Measurements in Water-Cooled Centrifugal Chillers." Energies 18, no. 11 (2025): 2779. https://doi.org/10.3390/en18112779.

Abstract:

In the process of collecting operational data for the performance analysis of water-cooled centrifugal chillers, missing values are inevitable due to various factors such as sensor errors, data transmission failures, and failure of the measurement system. When a substantial amount of missing data is present, the reliability of data analysis decreases, leading to potential distortions in the results. To address this issue, it is necessary to either minimize missing occurrences by utilizing high-precision measurement equipment or apply reliable imputation techniques to compensate for missing val

9

Syauqi, Rofiq Muhammad, Puspita Nurul Sabrina, and Irma Santikarama. "K-Means Clustering with KNN and Mean Imputation on CPU Benchmark Compilation Data." Journal of Applied Informatics and Computing 7, no. 2 (2023): 231–39. http://dx.doi.org/10.30871/jaic.v7i2.6491.

Abstract:

In the rapidly evolving digital age, data is becoming a valuable source for decision-making and analysis. Clustering, as an important technique in data analysis, has a key role in organizing and understanding complex datasets. One of the effective clustering algorithms is k-means. However, this algorithm is prone to the problem of missing values, which can significantly affect the quality of the resulting clusters. To overcome this challenge, imputation methods are used, including mean imputation and K-Nearest Neighbor (KNN) imputation. This study aims to analyze the impact of imputation metho

10

Du, Wenyou, Yichen Wang, Guanglei Meng, and Yuming Guo. "Privacy-Preserving Vertical Federated KNN Feature Imputation Method." Electronics 13, no. 2 (2024): 381. http://dx.doi.org/10.3390/electronics13020381.

Abstract:

Federated learning stands as a pivotal component in the construction of data infrastructure. It significantly fortifies the safety and reliability of data circulation links, facilitating credible sharing and openness among diverse subjects. The presence of missing data poses a pervasive and challenging issue in the implementation of federated learning. Current research on imputing missing values predominantly concentrates on centralized methods and horizontal federation scenarios. However, there is a notable absence of exploration in the context of vertical federated application scenarios. I

11

Alrawajfi, Ala, Mohd Tahir Ismail, Sadam Al Wadi, Saleh Atiewi, and Ahmad Awajan. "Multiple imputation methods: a case study of daily gold price." PeerJ Computer Science 10 (September 25, 2024): e2337. http://dx.doi.org/10.7717/peerj-cs.2337.

Abstract:

Data imputation strategies are necessary to address the prevalent difficulty of missing values in data observation and recording operations. This work utilizes diverse imputation methods to forecast and complete absent values inside a financial time-series dataset, specifically the daily prices of gold. The predictive accuracy of imputed data is assessed in comparison to the original entire dataset to ensure its robustness. The imputation methods are validated using actual closing price data obtained from a daily gold price website. The examined approaches include mean imputation, k-nearest ne

12

Kipkogei, Merary, Arori Wilfred Omwansa, and Otieno Joyce Akinyi. "On Student’s-t ARMA Modelling of Missing Values." Asian Journal of Probability and Statistics 26, no. 12 (2024): 265–86. https://doi.org/10.9734/ajpas/2024/v26i12697.

Abstract:

This study addresses the missing-value problem in the context of univariate ARMA time series models. The main objective was to derive imputation estimators for ARMA models under the Student’s-t distribution assumption and evaluate their imputation performance. The study also used the optimal interpolation criterion for missing values to build the novel imputation estimators for ARMA models. A data set of 1000 samples was generated using the R statistical software. One hundred (100) missing values were created within the generated sample data

13

Parr, Christine L., Anette Hjartåker, Ida Scheel, Eiliv Lund, Petter Laake, and Marit B. Veierød. "Comparing methods for handling missing values in food-frequency questionnaires and proposing k nearest neighbours imputation: effects on dietary intake in the Norwegian Women and Cancer study (NOWAC)." Public Health Nutrition 11, no. 4 (2008): 361–70. http://dx.doi.org/10.1017/s1368980007000365.

Abstract:

Objective: To investigate item non-response in a postal food-frequency questionnaire (FFQ), and to assess the effect of substituting/imputing missing values on dietary intake levels in the Norwegian Women and Cancer study (NOWAC). We have adapted and probably for the first time applied k nearest neighbours (KNN) imputation to FFQ data. Design: Data from a recent reproducibility study were used. The FFQ was mailed twice (test–retest) about 3 months apart to the same subjects. Missing responses in the test FFQ were imputed using the null value (frequencies = null, amount = smallest), the samp

14

Bai, B. Mathura, Mangathayaru N., and Padmaja Rani B. "Modified K-Nearest Neighbour Using Proposed Similarity Fuzzy Measure for Missing Data Imputation on Medical Datasets (MKNNMBI)." International Journal of Fuzzy System Applications 11, no. 3 (2022): 1–15. http://dx.doi.org/10.4018/ijfsa.306278.

Abstract:

Early disease diagnosis is a pressing problem in the health sector, the medical domain, and disease management. During analysis, data quality can be achieved only if the data are complete. Missing values reduce the efficiency of data analysis tasks. Researchers have proposed various imputation methods, but there has always been a need for a better one. The objective of this paper is to propose an imputation method based on a proposed similarity fuzzy measure, through which missing values can be imputed by finding k similar instances, called Modified k-Nearest Neighbour for imputation of missing data (MK

15

Zhang, Shichao. "Nearest neighbor selection for iteratively kNN imputation." Journal of Systems and Software 85, no. 11 (2012): 2541–52. http://dx.doi.org/10.1016/j.jss.2012.05.073.

16

Nguyen, Trung, Simon Jones, Mariela Soto-Berelov, Andrew Haywood, and Samuel Hislop. "A Comparison of Imputation Approaches for Estimating Forest Biomass Using Landsat Time-Series and Inventory Data." Remote Sensing 10, no. 11 (2018): 1825. http://dx.doi.org/10.3390/rs10111825.

Abstract:

The prediction of forest biomass at the landscape scale can be achieved by integrating data from field plots with satellite imagery, in particular data from the Landsat archive, using k-nearest neighbour (kNN) imputation models. While studies have demonstrated different kNN imputation approaches for estimating forest biomass from remote sensing data and forest inventory plots, there is no general agreement on which approach is most appropriate for biomass estimation across large areas. In this study, we compared several imputation approaches for estimating forest biomass using Landsat time-ser

17

Ünal, Fatma, and Hakan Koğar. "An investigation into the effect of different missing data imputation methods on IRT-based differential item functioning." International Journal of Assessment Tools in Education 11, no. 3 (2024): 445–62. http://dx.doi.org/10.21449/ijate.1417166.

Abstract:

The purpose of this study is to examine the effect of missing data imputation methods, namely regression imputation (RI), multiple imputation (MI) and k-nearest neighbor (kNN) on differential item functioning (DIF). In this regard, the datasets used in the research were created by deleting some of the data via the missing completely at random mechanism from the complete datasets obtained from 600 students in Türkiye, the United Kingdom, the USA, New Zealand and Australia, who answered booklets 14 and 15 from the PISA 2018 science literacy test. Data imputation was applied to the datasets throu

18

Rahman, Caecilia A., and Abdul Kudus. "Penggunaan Metode K Nearest Neighborhood untuk Imputasi Data Tersensor Kanan pada Pasien Kanker Paru-Paru Sel Kecil." Bandung Conference Series: Statistics 2, no. 2 (2022): 441–48. http://dx.doi.org/10.29313/bcss.v2i2.4615.

Abstract:

A study usually needs complete data for accurate parameter estimation, but survival analysis often involves incomplete data, known as censored data, which can arise from limited research time and other causes. Completing the censored data requires imputation, and one method for imputing censored data is the K-Nearest Neighborhood (KNN) method. KNN imputation is designed to find the K nearest neighbors of each censored observation among all complete data and then fill in the censored data with events that are most similar to its neighbors. If the target variable (or attribu

19

Poyatos, Rafael, Oliver Sus, Llorenç Badiella, Maurizio Mencuccini, and Jordi Martínez-Vilalta. "Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information." Biogeosciences 15, no. 9 (2018): 2601–17. http://dx.doi.org/10.5194/bg-15-2601-2018.

Abstract:

The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. At the same time, they offer specific challenges in terms of data imputation. Here we compare statistical imputation approaches, using varying levels of environmental information, for five plant traits (leaf biomass to sapwood area ratio, leaf nitrogen content, maximum tree height, leaf mass p

20

Sasu, Gabriel-Vasilică, Bogdan-Iulian Ciubotaru, Nicolae Goga, and Andrei Vasilățeanu. "Addressing Missing Data Challenges in Geriatric Health Monitoring: A Study of Statistical and Machine Learning Imputation Methods." Sensors 25, no. 3 (2025): 614. https://doi.org/10.3390/s25030614.

Abstract:

In geriatric healthcare, missing data pose significant challenges, especially in systems used for frailty monitoring in elderly individuals. This study explores advanced imputation techniques used to enhance data quality and maintain model performance in a system designed to detect frailty insights. We introduce missing data mechanisms—Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR)—into a dataset collected from smart bracelets, simulating real-world conditions. Imputation methods, including Expectation–Maximization (EM), matrix completion, Bayesi

21

Nida, Hafiza. "Comparison of missing data imputation methods using weather data." Pakistan Journal of Agricultural Sciences 60, no. 02 (2023): 327–36. http://dx.doi.org/10.21162/pakjas/23.228.

Abstract:

Researchers and data analysts commonly encounter missing data in their fields of study. Missing data must be handled properly to obtain better and more reliable research outcomes. The objective of this research is to evaluate different imputation techniques for handling missing observations in weather data. For this purpose, daily rainfall, maximum temperature (Tmax), and minimum temperature (Tmin) data from 23 stations in Pakistan for 1981 to 2020 were taken from the Pakistan Meteorological Department. There are a total of 14610 observations of each variable and each variab

22

Khan, Murad Ali. "A Comparative Study on Imputation Techniques: Introducing a Transformer Model for Robust and Efficient Handling of Missing EEG Amplitude Data." Bioengineering 11, no. 8 (2024): 740. http://dx.doi.org/10.3390/bioengineering11080740.

Abstract:

In clinical datasets, missing data often occur due to various reasons including non-response, data corruption, and errors in data collection or processing. Such missing values can lead to biased statistical analyses, reduced statistical power, and potentially misleading findings, making effective imputation critical. Traditional imputation methods, such as Zero Imputation, Mean Imputation, and k-Nearest Neighbors (KNN) Imputation, attempt to address these gaps. However, these methods often fall short of accurately capturing the underlying data complexity, leading to oversimplified assumptions

23

Lestari, Sri, Yulmaini Yulmaini, Aswin Aswin, Singgih Yulizar Ma'ruf, Sulyono Sulyono, and Ruki Rizal Nul Fikri. "Alleviating cold start and sparsity problems in the micro, small, and medium enterprises marketplace using clustering and imputation techniques." International Journal of Electrical and Computer Engineering (IJECE) 14, no. 3 (2024): 3220. http://dx.doi.org/10.11591/ijece.v14i3.pp3220-3229.

Abstract:

Recommendation systems are often implemented in e-commerce and micro, small, and medium enterprises (MSMEs) marketplaces to improve consumer services by providing product recommendations according to their interests. However, it still faces problems, namely sparsity and cold start, thus affecting the quality of recommendations. This research proposes clustering and imputation techniques to overcome this problem. The clustering technique used is k-means, while the missing value imputation method uses average values. The imputation results are then implemented in the k-nearest neighbor (KNN) and

24

Ahmed Malik, Eiyaz, and Rajendra Gupta. "Effectiveness of Correlation Assisted SVM-based Imputation Method for Missing Data Prediction." International Journal of Advanced Networking and Applications 17, no. 01 (2025): 6772–77. https://doi.org/10.35444/ijana.2025.17108.

Abstract:

Handling missing data is a critical challenge in data processing, as it can significantly impact the performance of machine learning models. This paper proposes a Correlation Assisted Support Vector Machine (CA-SVM) based imputation method that integrates correlation analysis with the predictive power of Support Vector Machines to enhance missing data prediction accuracy. The CA-SVM approach identifies highly correlated attributes to guide the SVM in estimating missing values more effectively, preserving the underlying structure of the dataset. A comparative study is conducted between the prop

25

Xu, Jingjing, Yuanshan Wang, Xiangnan Xu, Kian-Kai Cheng, Daniel Raftery, and Jiyang Dong. "NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data." Molecules 26, no. 19 (2021): 5787. http://dx.doi.org/10.3390/molecules26195787.

Abstract:

In mass spectrometry (MS)-based metabolomics, missing values (NAs) may be due to different causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methodologies have been applied to handle NAs, NA imputation remains a challenging problem. Here, we propose a non-negative matrix factorization (NMF)-based method for NA imputation in MS-based metabolomics data, which makes use of both global and local information of the data. The proposed method was compared with three commonly used methods: k-nearest n

26

K. Rizwana Parveen. "Enhanced Credit Scoring Prediction Using KNN-Z-Score Based Logistic Regression (KZ-LR) Algorithm." Journal of Electrical Systems 20, no. 3 (2024): 7230–37. https://doi.org/10.52783/jes.7419.

Abstract:

Credit scoring is a critical tool in the financial sector, enabling lenders to assess borrower risk and make informed decisions. Building an accurate credit scoring model requires extensive data preprocessing to address challenges such as missing values, feature scaling, and data normalization. This study utilizes the [8] dataset to develop a credit scoring model using logistic regression. The preprocessing phase incorporates advanced techniques like KNN imputation, Z-score standardization, and min-max normalization to ensure data integrity and uniformity. Comparative analysis of imputation met

27

Lestari, Sri, Aswin Aswin, Ma'ruf Singgih Yulizar, Sulyono Sulyono, and Nul Fikri Ruki Rizal. "Alleviating cold start and sparsity problems in the micro, small, and medium enterprises marketplace using clustering and imputation techniques." International Journal of Electrical and Computer Engineering (IJECE) 14, no. 3 (2024): 3220–29. https://doi.org/10.11591/ijece.v14i3.pp3220-3229.

Abstract:

Recommendation systems are often implemented in e-commerce and micro, small, and medium enterprises (MSMEs) marketplaces to improve consumer services by providing product recommendations according to their interests. However, it still faces problems, namely sparsity and cold start, thus affecting the quality of recommendations. This research proposes clustering and imputation techniques to overcome this problem. The clustering technique used is k-means, while the missing value imputation method uses average values. The imputation results are then implemented

28

Ryu, Jewan, Seung Yeon Lee, Choong Sung Yi, and Sung Hoon Kim. "A Study on the Comparison of Deep Learning-Based Imputations for Green Algae and Water Quality Data." Crisis and Emergency Management: Theory and Praxis 21, no. 2 (2025): 89–98. https://doi.org/10.14251/crisisonomy.2025.21.2.89.

Abstract:

This study examined various imputation techniques and assessed their performance in handling missing data related to green algae and water quality. Using data from the Daecheong Dam area, a total of 83 weekly datasets from April 2004 to December 2023 were collected and analyzed, including key determinants of green algae bloom: Cyanobacteria cell count, chlorophyll-a concentration, water temperature, and total phosphorus. Artificially induced missing values were implemented for periods of 2, 4, and 8 weeks in each key variable, and missing data were imputed using linear interpolation, kNN, BRIT

29

Widianti, Anisa, and Irfan Pratama. "PENANGANAN MISSING VALUES DAN PREDIKSI DATA TIMBUNAN SAMPAH BERBASIS MACHINE LEARNING." Rabit : Jurnal Teknologi dan Sistem Informasi Univrab 9, no. 2 (2024): 242–51. http://dx.doi.org/10.36341/rabit.v9i2.4789.

Abstract:

The growing volume of waste that accompanies population growth and human activity is a serious challenge for waste management in Central Java. One of the main obstacles in waste prediction research is the large amount of empty data, or missing values, which can reduce the accuracy of prediction models. This study uses three methods to fill in missing values: Mean Imputation, Interpolation, and KNN Imputer. After all the data have been filled in with the missing-value handling above, the predicted values are calculated. Penelitia

30

Mazdadi, Muhammad Itqan, Triando Hamonangan Saragih, Irwan Budiman, Andi Farmadi, and Ahmad Tajali. "The Effectiveness of Data Imputations on Myocardial Infarction Complication Classification Using Machine Learning Approach with Hyperparameter Tuning." Jurnal Ilmiah Teknik Elektro Komputer dan Informatika 10, no. 3 (2024): 520–33. https://doi.org/10.26555/jiteki.v10i3.29479.

Abstract:

Complications from Myocardial Infarction (MI) represent a critical medical emergency caused by the blockage of blood flow to the heart muscle, primarily due to a blood clot in a coronary artery narrowed by atherosclerotic plaque. Diagnosing MI involves physical examination, electrocardiogram (ECG) evaluation, blood sample analysis for specific heart enzyme levels, and imaging techniques such as coronary angiography. Proactively predicting acute myocardial complications can mitigate adverse outcomes, and this study focuses on early prediction using classification methods. Machine learning algor

31

Huang, Min-Wei, Chih-Fong Tsai, Shu-Ching Tsui, and Wei-Chao Lin. "Combining data discretization and missing value imputation for incomplete medical datasets." PLOS ONE 18, no. 11 (2023): e0295032. http://dx.doi.org/10.1371/journal.pone.0295032.

Abstract:

Data discretization aims to transform a set of continuous features into discrete features, thus simplifying the representation of information and making it easier to understand, use, and explain. In practice, users can take advantage of the discretization process to improve knowledge discovery and data analysis on medical domain problem datasets containing continuous features. However, certain feature values were frequently missing. Many data-mining algorithms cannot handle incomplete datasets. In this study, we considered the use of both discretization and missing-value imputation to process

32

Goh, Guo Dong, Xi Huang, Sheng Huang, Jia Li Janessa Thong, Jia Jun Seah, and Wai Yee Yeong. "Data imputation strategies for process optimization of laser powder bed fusion of Ti6Al4V using machine learning." Materials Science in Additive Manufacturing 2, no. 1 (2023): 50. http://dx.doi.org/10.36922/msam.50.

Abstract:

A database linking process parameters and material properties for additive manufacturing enables the performance of the material to be determined based on the process parameters, which are useful in the design and fabrication stage of a product. The data, however, are often incomplete as each individual research work focused on certain process parameters and material properties due to the wide range of variables available. Imputation of missing data is thus required to complete the material library. In this work, we attempt to collate the data of Ti6Al4V, a popular alloy used in aerospace and

33

Cooper, Nathaniel, Maria Giovanna Dainotti, Aditya Narendra, Ioannis Liodakis, and Malgorzata Bogdan. "Fermi LAT AGN classification using supervised machine learning." Monthly Notices of the Royal Astronomical Society 525, no. 2 (2023): 1731–45. http://dx.doi.org/10.1093/mnras/stad2193.

Abstract:

Classifying active galactic nuclei (AGNs) is a challenge, especially for BL Lacertae objects (BLLs), which are identified by their weak emission line spectra. To address the problem of classification, we use data from the fourth Fermi Catalog, Data Release 3. Missing data hinder the use of machine learning to classify AGNs. A previous paper found that Multivariate Imputation by Chain Equations (MICE) imputation is useful for estimating missing values. Since many AGNs have missing redshift and the highest energy, we use data imputation with MICE and k-nearest neighbours (kNN) algorithm

34

Sanju, Sanju, and Vinay Kumar. "Analysis of Incomplete Data Under Different Missingness Mechanism using Imputation Methods for Wheat Genotypes." Current Agriculture Research Journal 11, no. 3 (2024): 1050–56. http://dx.doi.org/10.12944/carj.11.3.33.

Abstract:

Missing values are a persistent problem in the analysis of agricultural data. To improve data quality in agricultural studies, imputation has drawn a lot of research interest. Non-missing data were removed with varying frequency from the genotypic data of the wheat crop under different missingness mechanisms. The imputation methods last observation carried forward, mean, regression, and KNN were applied to these data sets, and their parameters were compared with those of the original data. The performance of the imputation methods is also evaluated by root mean square error for solving missing va

35

Salem, Milad, Shayan Taheri, and Jiann-Shiun Yuan. "An Experimental Evaluation of Fault Diagnosis from Imbalanced and Incomplete Data for Smart Semiconductor Manufacturing." Big Data and Cognitive Computing 2, no. 4 (2018): 30. http://dx.doi.org/10.3390/bdcc2040030.

Abstract:

The SECOM dataset contains information about a semiconductor production line, entailing the products that failed the in-house test line and their attributes. This dataset, similar to most semiconductor manufacturing data, contains missing values, imbalanced classes, and noisy features. In this work, the challenges of this dataset are met and many different approaches for classification are evaluated to perform fault diagnosis. We present an experimental evaluation that examines 288 combinations of different approaches involving data pruning, data imputation, feature selection, and classificati

36

Keerin, Phimmarin, and Tossapon Boongoen. "Improved KNN Imputation for Missing Values in Gene Expression Data." Computers, Materials & Continua 70, no. 2 (2022): 4009–25. http://dx.doi.org/10.32604/cmc.2022.020261.

37

Faquih, Tariq, Maarten van Smeden, Jiao Luo, et al. "A Workflow for Missing Values Imputation of Untargeted Metabolomics Data." Metabolites 10, no. 12 (2020): 486. http://dx.doi.org/10.3390/metabo10120486.

Abstract:

Metabolomics studies have seen a steady growth due to the development and implementation of affordable and high-quality metabolomics platforms. In large metabolite panels, measurement values are frequently missing and, if neglected or sub-optimally imputed, can cause biased study results. We provided a publicly available, user-friendly R script to streamline the imputation of missing endogenous, unannotated, and xenobiotic metabolites. We evaluated the multivariate imputation by chained equations (MICE) and k-nearest neighbors (kNN) analyses implemented in our script by simulations using measu

38

Fouad, Khaled M., Mahmoud M. Ismail, Ahmad Taher Azar, and Mona M. Arafa. "Advanced methods for missing values imputation based on similarity learning." PeerJ Computer Science 7 (July 21, 2021): e619. http://dx.doi.org/10.7717/peerj-cs.619.

Abstract:

Real-world data analysis and processing using data mining techniques often face observations that contain missing values. The main challenge of mining datasets is the existence of missing values. The missing values in a dataset should be imputed using an imputation method to improve the data mining methods’ accuracy and performance. There are existing techniques that use the k-nearest neighbors algorithm for imputing the missing values, but determining the appropriate k value can be a challenging task. There are other existing imputation techniques that are based on hard clustering algor

39

Zhang, Zhengnan, Lin Cao, Christopher Mulverhill, Hao Liu, Yong Pang, and Zengyuan Li. "Prediction of Diameter Distributions with Multimodal Models Using LiDAR Data in Subtropical Planted Forests." Forests 10, no. 2 (2019): 125. http://dx.doi.org/10.3390/f10020125.

Abstract:

Tree diameter distributions are essential for the calculation of stem volume and biomass, as well as simulation of growth and yield and to understand timber assortments. Accurate and reliable prediction of tree diameter distributions is critical for optimizing forest structure compositions, scheduling silvicultural operations and promoting sustainable management. In this study, we investigated the potential of airborne Light Detection and Ranging (LiDAR) data for predicting tree diameter distributions using a bimodal finite mixture model (FMM) and a multimodal k-nearest neighbor (KNN) model (c

40

Alsaber, A., A. Al-Herz, J. Pan, et al. "THU0556 MISSING DATA AND MULTIPLE IMPUTATION IN RHEUMATOID ARTHRITIS REGISTRIES USING SEQUENTIAL RANDOM FOREST METHOD." Annals of the Rheumatic Diseases 79, Suppl 1 (2020): 519.1–519. http://dx.doi.org/10.1136/annrheumdis-2020-eular.4838.

Abstract:

Background: Missing data in clinical epidemiological researches violate the intention to treat principle, reduce statistical power and can induce bias if they are related to patient’s response to treatment. In multiple imputation (MI), covariates are included in the imputation equation to predict the values of missing data. Objectives: To find the best approach to estimate and impute the missing values in Kuwait Registry for Rheumatic Diseases (KRRD) patients data. Methods: A number of methods were implemented for dealing with missing data. These included Multivariate imputation by chained equations (

41

Chandra, Winoto, Bambang Suprihatin, and Yulia Resti. "Median-KNN Regressor-SMOTE-Tomek Links for Handling Missing and Imbalanced Data in Air Quality Prediction." Symmetry 15, no. 4 (2023): 887. http://dx.doi.org/10.3390/sym15040887.

Abstract:

The Air Quality Index (AQI) dataset contains information on measurements of pollutants and ambient air quality conditions at a certain location that can be used to predict air quality. Unfortunately, this dataset often has many missing observations and imbalanced classes. Both of these problems can affect the performance of the prediction model. In particular, predictions for the minority class are very important because inaccurate predictions can be fatal or cause big losses. Moreover, the missing data may lead to biased results. This paper proposes the single imputation of the median and the m

42

Kim, Minkyung, Sangdon Park, Joohyung Lee, Yongjae Joo, and Jun Choi. "Learning-Based Adaptive Imputation Method with kNN Algorithm for Missing Power Data." Energies 10, no. 10 (2017): 1668. http://dx.doi.org/10.3390/en10101668.

43

Priya N, Hari, and Rajeswari S. "Covid-19 Prediction Using Enhanced KNN Imputation for Data Pre-Processing." International Research Journal of Multidisciplinary Scope 05, no. 01 (2024): 714–28. http://dx.doi.org/10.47857/irjms.2024.v05i01.0345.

44

Chan, Kai. "Abstract A041: A Hybrid Active Learning (AL) Random Forest Model with KNN Imputation to Predict Recurrence in Ductal Carcinoma In Situ (DCIS) of the Breast." Clinical Cancer Research 31, no. 13_Supplement (2025): A041. https://doi.org/10.1158/1557-3265.aimachine-a041.

Abstract:

Background: DCIS is a significant clinical burden that may benefit from machine learning. However, the data requirement of traditional Machine Learning (ML) is impractical. Herein, I explore the use of an emerging ML technique that can work with a small dataset. Methods: A dataset from a published DCIS study (Chan et al., 2001; DOI: 10.1002/1097-0142(2001010191:1<9::aid-cncr2>3.0.co;2-e)) was used with permission. Unlike the original analysis, patients with missing margin width data were retained. Little’s Missing Completely at Random (MCAR) test (P = 0.88) supported thi

45

Kumar, Nishith, Md Aminul Hoque, Md Shahjaman, S. M. Shahinul Islam, and Md Nurul Haque Mollah. "A New Approach of Outlier-robust Missing Value Imputation for Metabolomics Data Analysis." Current Bioinformatics 14, no. 1 (2018): 43–52. http://dx.doi.org/10.2174/1574893612666171121154655.

Abstract:

Background: Metabolomics data generation and quantification are different from other types of molecular “omics” data in bioinformatics. Mass spectrometry (MS) based (gas chromatography mass spectrometry (GC-MS), liquid chromatography mass spectrometry (LC-MS), etc.) metabolomics data frequently contain missing values that make some quantitative analysis complex. Typically metabolomics datasets contain 10% to 20% missing values that originate from several reasons, like analytical, computational as well as biological hazard. Imputation of missing values is a very important and interesting issue

46

Ismail, Amelia Ritahani, Nadzurah Zainal Abidin, and Mhd Khaled Maen. "Systematic Review on Missing Data Imputation Techniques with Machine Learning Algorithms for Healthcare." Journal of Robotics and Control (JRC) 3, no. 2 (2022): 143–52. http://dx.doi.org/10.18196/jrc.v3i2.13133.

Abstract:

Missing data is one of the most common issues encountered in the data cleaning process, especially when dealing with medical datasets. A real collected dataset is prone to be incomplete, inconsistent, noisy and redundant due to potential reasons such as human errors, instrumental failures, and adverse death. Therefore, to accurately deal with incomplete data, a sophisticated algorithm is proposed to impute those missing values. Many machine learning algorithms have been applied to impute missing data with plausible values. However, among all machine learning imputation algorithms, KNN algorithm has

47

Huang, Shu-Fen, and Ching-Hsue Cheng. "A Safe-Region Imputation Method for Handling Medical Data with Missing Values." Symmetry 12, no. 11 (2020): 1792. http://dx.doi.org/10.3390/sym12111792.

Abstract:

Medical data usually have missing values; hence, imputation methods have become an important issue. In previous studies, many imputation methods based on variable data had a multivariate normal distribution, such as expectation-maximization and regression-based imputation. These assumptions may lead to deviations in the results, which sometimes create a bottleneck. In addition, directly deleting instances with missing values may have several problems, such as losing important data, producing invalid research samples, and leading to research deviations. Therefore, this study proposed a safe-reg

48

Difa Fitria, Triando Hamonangan Saragih, Muliadi, Dwi Kartini, and Fatma Indriani. "A Classification of Appendicitis Disease in Children Using SVM with KNN Imputation and SMOTE Approach." Journal of Electronics, Electromedical Engineering, and Medical Informatics 6, no. 3 (2024): 302–11. https://doi.org/10.35882/jeeemi.v6i3.470.

Abstract:

This study evaluates the effect of SMOTE and KNN imputation techniques on the performance of SVM classification models on a nearly balanced dataset. The results show that using SMOTE increases model precision but decreases recall. This shows the importance of careful consideration when choosing data processing strategies to achieve optimal classification model performance. This study evaluates the effect of the Synthetic Minority Over-sampling Technique (SMOTE) and K-Nearest Neighbors (KNN) imputation on the performance of Support Vector Machine (SVM) classification models on nearly balanced d

49

Ahmed, Syed Ejaz, Dursun Aydın, and Ersin Yılmaz. "Estimation of Right-censored SETAR-type Nonlinear Time-series Model." E3S Web of Conferences 409 (2023): 02010. http://dx.doi.org/10.1051/e3sconf/202340902010.

Abstract:

This paper focuses on estimating the Self-Exciting Threshold Autoregressive (SETAR) type time-series model under right-censored data. As is known, the SETAR model is used when the underlying function of the relationship between the time series itself ($Y_t$) and its $p$ delays $(Y_{t-j})_{j=1}^{p}$ violates the linearity assumption and this function is formed by multiple behaviors that are called regimes. This paper addresses the right-censored dependent time-series problem which has a serious negative effect on the estimation performance. Right-censored time series cause biased coefficient est

50

Cappelletti, Luca, Tommaso Fontana, Guido Walter Di Donato, Lorenzo Di Tucci, Elena Casiraghi, and Giorgio Valentini. "Complex Data Imputation by Auto-Encoders and Convolutional Neural Networks—A Case Study on Genome Gap-Filling." Computers 9, no. 2 (2020): 37. http://dx.doi.org/10.3390/computers9020037.

Abstract:

Missing data imputation has been a hot topic in the past decade, and many state-of-the-art works have been presented to propose novel, interesting solutions that have been applied in a variety of fields. In the past decade, the successful results achieved by deep learning techniques have opened the way to their application for solving difficult problems where human skill is not able to provide a reliable solution. Not surprisingly, some deep learners, mainly exploiting encoder-decoder architectures, have also been designed and applied to the task of missing data imputation. However, most of th
