
Journal articles on the topic 'Feature and model selection'


Consult the top 50 journal articles for your research on the topic 'Feature and model selection.'


1

Huber, Florian, and Volker Steinhage. "Conditional Feature Selection: Evaluating Model Averaging When Selecting Features with Shapley Values." Geomatics 4, no. 3 (2024): 286–310. http://dx.doi.org/10.3390/geomatics4030016.

Abstract:
Artificial intelligence (AI) and especially machine learning (ML) are rapidly transforming the field of geomatics with respect to collecting, managing, and analyzing spatial data. Feature selection as a building block in ML is crucial because it directly impacts the performance and predictive power of a model by selecting the most critical variables and eliminating the redundant and irrelevant ones. Random forests have been used for decades and allow for building models with high accuracy. However, finding the most expressive features by selecting the most important ones within random forests is still a challenging question. The often-used internal Gini importances of random forests are based on the number of training examples that a feature splits but fail to acknowledge the magnitude of change in the target variable, leading to suboptimal selections. Shapley values are an established and unified framework for feature attribution, i.e., specifying how much each feature in a trained ML model contributes to the predictions for a given instance. Previous studies highlight the effectiveness of Shapley values for feature selection in real-world applications, while other research emphasizes certain theoretical limitations. This study provides an application-driven discussion of Shapley values for feature selection, first proposing four necessary conditions for successful feature selection with Shapley values, extracted from a multitude of critical research in the field. Even when these conditions are met, Shapley value feature selection is by definition a model averaging procedure, in which unimportant features can alter the final selection. Therefore, we additionally present Conditional Feature Selection (CFS) as a novel algorithm that mitigates this problem and use it to evaluate the impact of model averaging in several real-world examples covering the use of ML in geomatics. The results of this study show that Shapley values are a good measure for feature selection when compared with Gini feature importances on four real-world examples, improving the RMSE by 5% when averaged over selections of all possible subset sizes. An even better selection can be achieved by CFS, improving on the Gini selection by approximately 7.5% in terms of RMSE. For random forests, Shapley value calculation can be performed in polynomial time, offering an advantage over the exponential runtime of CFS and creating a trade-off against the accuracy lost in feature selection due to model averaging.
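As a minimal illustration of Shapley-value-based feature ranking for a random forest (assuming the open-source shap package; the dataset, subset size k, and hyperparameters are hypothetical, and this is not the authors' CFS algorithm):

```python
# Rank features of a random forest by mean |SHAP| value, then retrain on the top-k subset.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=30, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# TreeExplainer computes Shapley values for tree ensembles in polynomial time.
shap_values = shap.TreeExplainer(model).shap_values(X_train)
ranking = np.argsort(np.abs(shap_values).mean(axis=0))[::-1]  # most important first

k = 10  # illustrative subset size
top_k = ranking[:k]
reduced = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train[:, top_k], y_train)
rmse = np.sqrt(mean_squared_error(y_test, reduced.predict(X_test[:, top_k])))
print(f"RMSE with top-{k} SHAP-ranked features: {rmse:.3f}")
```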
2

Wang, Jun, Yuanyuan Xu, Hengpeng Xu, Zhe Sun, Zhenglu Yang, and Jinmao Wei. "An Effective Multi-Label Feature Selection Model Towards Eliminating Noisy Features." Applied Sciences 10, no. 22 (2020): 8093. http://dx.doi.org/10.3390/app10228093.

Abstract:
Great effort has consistently been devoted to feature selection for dimension reduction in various machine learning tasks. Existing feature selection models focus on selecting the most discriminative features for learning targets. However, this strategy is weak in handling two kinds of features, namely irrelevant and redundant ones, which are collectively referred to as noisy features. These features may hamper the construction of optimal low-dimensional subspaces and compromise the learning performance of downstream tasks. In this study, we propose a novel multi-label feature selection approach by embedding label correlations (dubbed ELC) to address these issues. In particular, we extract label correlations for reliable label space structures and employ them to steer feature selection. In this way, label and feature spaces can be expected to be consistent and noisy features can be effectively eliminated. An extensive experimental evaluation on public benchmarks validated the superiority of ELC.
3

Wang, Gang, Yang Zhao, Jiasi Zhang, and Yongjie Ning. "A Novel End-To-End Feature Selection and Diagnosis Method for Rotating Machinery." Sensors 21, no. 6 (2021): 2056. http://dx.doi.org/10.3390/s21062056.

Abstract:
Feature selection, the process of obtaining effective features from data, is also known as feature engineering. Traditionally, feature selection and predictive model learning are separated, which creates a problem of inconsistent criteria. This paper presents an end-to-end feature selection and diagnosis method that organically unifies feature expression learning and machine prediction learning in one model. The algorithm first combines the prediction model to calculate the mean impact values (MIVs) of the features and realizes primary feature selection for the prediction model by selecting the features with larger MIVs. To take the performance of the features themselves into account, the within-class and between-class discriminant analysis (WBDA) method is proposed and, combined with a feature diversity strategy, realizes a feature-oriented secondary selection. Eventually, the feature vectors obtained by the two selections are classified using a multi-class support vector machine (SVM). Compared with the modified network variable selection algorithm (MIVs), the principal component analysis dimensionality reduction algorithm (PCA), variable selection based on compensative distance evaluation technology (CDET), and other algorithms, the proposed MIVs-WBDA method exhibits excellent classification accuracy owing to the fusion of feature selection and predictive model learning. According to the results of classification accuracy testing after dimensionality reduction on rotating machinery status, the MIVs-WBDA method achieves a 3% classification accuracy improvement on the low-dimensional feature set. The typical running time of this classification learning algorithm is less than 10 s, whereas a deep learning approach would run for more than a few hours.
4

Ariansyah, M. Hafidz, Esmi Nur Fitri, and Sri Winarno. "Improving Performance of Students’ Grade Classification Model Uses Naïve Bayes Gaussian Tuning Model and Feature Selection." Jurnal Teknik Informatika (Jutif) 4, no. 3 (2023): 493–501. http://dx.doi.org/10.52436/1.jutif.2023.4.3.737.

Abstract:
Student grades are a relevant variable for predicting student academic performance. To achieve good, high-quality student performance, it is necessary to analyze or evaluate the factors that influence it. When an educator can predict students' academic performance from the start, the educator can adjust the way of teaching so that learning can run effectively. The purpose of this research is to determine the interrelationships between variables, find out which variables have an effect, and use them as the basis of a feature selection technique. The researchers then review the most popular classifier, Gaussian Naïve Bayes (GNB), survey feature selection models, and discuss the feature selection approach. In this study, the researchers classify student grades based on existing features to evaluate student performance, which can guide educators in selecting learning methods and assist students in planning the learning process. The result is that applying Gaussian Naïve Bayes (GNB) without feature selection yields an accuracy 10.12% lower than applying it with feature selection.
5

Hamdard, Mohammad Salim, and Hedayatullah Lodin. "Effect of Feature Selection on the Accuracy of Machine Learning Model." INTERNATIONAL JOURNAL OF MULTIDISCIPLINARY RESEARCH AND ANALYSIS 06, no. 09 (2023): 4460–66. https://doi.org/10.5281/zenodo.8379528.

Abstract:
In real-life data science problems, it is rare that all the features in the dataset are useful for building a model. In machine learning, feature selection is the process of selecting a subset of relevant features or attributes for constructing a model. Removing irrelevant and redundant features and selecting relevant ones improves the accuracy of a machine learning model. Furthermore, adding unnecessary variables to a model increases its overall complexity. Our experiment indicates that the accuracy of a classification model is highly affected by the process of feature selection. We trained three algorithms (K-Nearest Neighbors, Decision Tree, Multi-layer Perceptron) on all the features and obtained accuracies of 49%, 84%, and 71%, respectively. After performing feature selection, without any logical changes in the models' code, the accuracy scores jumped to 82%, 86%, and 78%, respectively, which is quite impressive.
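A minimal sketch of this kind of before/after comparison, assuming a univariate filter (SelectKBest with ANOVA F-scores) as the selection step; the dataset, k, and classifier settings are illustrative, not the authors' setup:

```python
# Compare classifier accuracy with all features vs. a filtered subset.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

selector = SelectKBest(f_classif, k=10).fit(X_tr, y_tr)  # keep the 10 highest-scoring features
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

for name, clf in [("k-NN", KNeighborsClassifier()),
                  ("Decision Tree", DecisionTreeClassifier(random_state=0)),
                  ("MLP", MLPClassifier(max_iter=1000, random_state=0))]:
    acc_all = clf.fit(X_tr, y_tr).score(X_te, y_te)
    acc_sel = clf.fit(X_tr_sel, y_tr).score(X_te_sel, y_te)
    print(f"{name}: all features {acc_all:.2f} -> selected {acc_sel:.2f}")
```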
6

Guru, D. S., N. Vinay Kumar, and Mahamad Suhil. "Feature Selection of Interval Valued Data Through Interval K-Means Clustering." International Journal of Computer Vision and Image Processing 7, no. 2 (2017): 64–80. http://dx.doi.org/10.4018/ijcvip.2017040105.

Abstract:
This paper introduces a novel feature selection model for supervised interval valued data based on interval K-means clustering. The proposed model explores two kinds of feature selection through feature clustering, viz., class independent feature selection and class dependent feature selection. The former clusters the features spread across all the samples belonging to all the classes, whereas the latter clusters the features spread across only the samples belonging to the respective classes. Both feature selection models demonstrate the versatility of clustering in selecting interval valued features. For clustering, the kernel of the K-means algorithm has been altered to operate on interval valued data. For experimentation, four standard benchmark datasets and three symbolic classifiers have been used. To corroborate the effectiveness of the proposed model, a comparative analysis against state-of-the-art models is given, and the results show the superiority of the proposed model.
7

Bhuvaneswari, K. "Filter Based Sentiment Feature Selection Using Back Propagation Deep Learning." Journal of Computer Sciences and Informatics 2, no. 1 (2025): 15. https://doi.org/10.5455/jcsi.20241216054507.

Abstract:
Aim: The proposed Filter Based Sentiment Feature Selection (FBSFS) model focuses on improving the performance of Sentiment Learning (SL) by selecting the most relevant sentiment features from text reviews using feature selection methods at the document level. Method: Sentiment learning is applied at the document level to classify text reviews into two categories, positive or negative. The key sentiment features, adjectives (ADJ), adverbs (ADV), and verbs (VRB), which are essential for sentiment analysis, are extracted from text documents using the WordNet dictionary. Feature selection is performed by applying four different algorithms: Information Gain, Correlation, Gini Index, and Chi-Square. These algorithms help identify the most significant features that contribute to sentiment classification. The selected features are then fed into a Back Propagation Deep Learning (BPDL) classification model for sentiment analysis. Result: The experimental findings show that the proposed model achieved a higher accuracy of 91.15% using Correlation feature selection. This accuracy signifies the effectiveness of the proposed model in classifying text reviews, outperforming other methods in terms of sentiment feature selection and classification. Conclusion: The proposed model enhances the performance of sentiment learning by selecting the most relevant sentiment features, particularly those extracted from adjectives, adverbs, and verbs, and combining them with BPDL, making FBSFS a robust tool for sentiment classification.
8

Sholeh, Muhammad, Uning Lestari, and Dina Andayati. "Comparison of Feature Selection with Information Gain Method in Decision Tree, Regression Logistic and Random Forest Algorithms." Journal of Applied Business and Technology 5, no. 3 (2024): 146–53. https://doi.org/10.35145/jabt.v5i3.155.

Abstract:
One approach to improving model accuracy is to perform feature selection. Feature selection is done by identifying the most informative features and discarding features that do not directly contribute to the target feature. The research was conducted by comparing the accuracy of a model without feature selection against a model with feature selection, using the decision tree, logistic regression, and random forest algorithms. The method for feature selection on the science data includes understanding the domain and dataset, exploratory analysis, data cleaning, measuring feature relevance with criteria such as Information Gain, and feature ranking. The results are evaluated and validated using model performance metrics before and after feature selection. This process ensures the selection of relevant features, improving accuracy. The research used the Lung Cancer Prediction dataset, which consists of 306 rows and 16 attributes. The results show that feature selection can improve the performance of a classification model by removing features that do not contribute to the target. Comparing the decision tree, logistic regression, and random forest classification algorithms with feature selection, the highest accuracy of 0.968 was achieved by logistic regression with five selected features.
9

Li, Deyang. "Feature Selection Based on Stock Prediction Model." Journal of Physics: Conference Series 2386, no. 1 (2022): 012021. http://dx.doi.org/10.1088/1742-6596/2386/1/012021.

Abstract:
Stocks, as an important part of financial investment, are becoming more and more popular, with higher rates of both returns and risks. Making predictions for a stock can reduce its risk and help people gain returns. So far, traditional machine learning models are still unable to achieve ideal accuracy. This paper is devoted to analyzing the input features to improve the performance of stock forecasting models. Aiming at the problem that traditional stock prediction algorithms produce different accuracies for models constructed with different input features, the paper predicts the stock by establishing a long short-term memory (LSTM) model. The historical data of Shixia Technology stock, which includes 5 features, is used as the dataset, and different features are chosen as the input. In the experiment, different results are produced by subtracting one input feature at a time. Finally, the models' predictions were compared with each other by R-squared and RMSE, and the analysis revealed which features have a greater impact on stock prediction. The paper finds that different input features have different influences on the model fitting effect and the prediction accuracy of stock forecasting based on the same neural network model.
10

Karthiga, B., Sathya Selvaraj Sinnasamy, V. C. Bharathi, K. Azarudeen, and P. Sherubha. "Design of a Classifier model for Heart Disease Prediction using normalized graph model." Salud, Ciencia y Tecnología - Serie de Conferencias 3 (March 23, 2024): 653. http://dx.doi.org/10.56294/sctconf2024653.

Abstract:
Heart disease is an illness that affects an enormous number of people worldwide. Particularly in cardiology, heart disease diagnosis and treatment need to happen quickly and precisely. Here, a machine learning (ML) approach is proposed for diagnosing cardiac disease that is both effective and accurate. The system was developed using standard feature selection algorithms for removing unnecessary and redundant features. A novel normalized graph model (n-GM) is used for prediction. To address the issue of feature selection, this work considers a significant-information feature selection approach. Feature selection techniques are utilized to improve classification accuracy and shorten classification processing time. Furthermore, cross-validation is used to tune the hyper-parameters and learning techniques for model evaluation. Performance is evaluated with various metrics on the features chosen via feature representation. The outcomes demonstrate that the suggested n-GM gives 98% accuracy for modeling an intelligent system to detect heart disease with a support vector machine classifier.
11

Patel, Damodar, and Amit Kumar Saxena. "Feature Selection in High Dimension Datasets using Incremental Feature Clustering." Indian Journal Of Science And Technology 17, no. 32 (2024): 3318–26. http://dx.doi.org/10.17485/ijst/v17i32.2077.

Abstract:
Objectives: To develop a machine learning-based model that selects the most important features from a high-dimensional dataset in order to classify patterns at high accuracy and reduce dimensionality. Methods: The proposed feature selection method (FSIFC) forms and combines feature clusters incrementally and produces feature subsets each time. The method uses K-means clustering and mutual information (MI) to refine the feature selection process iteratively. Initially, two clusters of features are formed using K-means clustering (K=2) by clustering the features rather than the patterns (the traditional way). From these two clusters, the feature with the highest MI value in each cluster is kept in a feature subset. Classification accuracies (CAs) of the feature subset are calculated using three classifiers, namely support vector machines (SVM), random forest (RF), and k-nearest neighbor (k-NN). The process is repeated by incrementing the value of K, i.e., the number of clusters, until a maximum user-defined value of K is reached. The best CA obtained from these trials is recorded, and the corresponding feature set is finally accepted. Findings: The proposed method is demonstrated on ten datasets, and the results are compared with existing published results using the three classifiers to determine the method's performance. The ten datasets are classified with average CAs of 92.72%, 93.13%, and 91.5% using the SVM, RF, and k-NN classifiers, respectively. The proposed method selects at most thirty features from the datasets. In terms of selecting the most effective and the smallest feature sets, the proposed method outperforms eight other feature selection methods in terms of CAs. Novelty: The proposed model applies feature reduction using combined feature clustering and filter methods in an incremental way. This provides an improved selection of relevant features while removing irrelevant ones at different trials. Keywords: Feature selection, High-dimensional datasets, K-means algorithm, Mutual information, Machine learning
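A minimal sketch of the clustering-plus-MI idea described above, assuming features are clustered by k-means on the transposed data matrix and the highest-MI feature per cluster is kept; the function and parameter names are hypothetical, and the per-subset evaluation with SVM/RF/k-NN is left to the caller:

```python
# Incremental feature clustering: for each K, cluster the features (columns),
# keep the feature with the highest mutual information per cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

def select_by_feature_clustering(X, y, max_k=10, random_state=0):
    mi = mutual_info_classif(X, y, random_state=random_state)  # relevance of each feature
    subsets = {}
    for k in range(2, max_k + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X.T)
        chosen = [int(np.where(labels == c)[0][np.argmax(mi[labels == c])]) for c in range(k)]
        subsets[k] = sorted(chosen)
    return subsets  # evaluate each subset with a classifier and keep the best one

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
for k, subset in select_by_feature_clustering(X, y).items():
    print(f"K={k}: features {subset}")
```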
12

Meng, Fanqi, Wenying Cheng, and Jingdong Wang. "An Integrated Semi-supervised Software Defect Prediction Model." Journal of Internet Technology 24, no. 6 (2023): 1307–17. http://dx.doi.org/10.53106/160792642023112406013.

Abstract:
A novel semi-supervised software defect prediction model, FFeSSTri (Filtered Feature Selecting, Sample and Tri-training), is proposed to address the problem that class imbalance and too many irrelevant or redundant features in labelled samples lower the accuracy of semi-supervised software defect prediction. Its innovation lies in integrating an oversampling technique, a new feature selection method, and a Tri-training algorithm into the construction of FFeSSTri, which effectively improves accuracy. First, the oversampling technique is applied to expand the classes with inadequate samples, solving the unbalanced classification of the labelled samples. Second, a new filtered feature selection method based on relevance and redundancy is proposed, which can exclude irrelevant or redundant features from the labelled samples. Finally, the Tri-training algorithm is used to learn the labelled training samples to build the defect prediction model FFeSSTri. Experiments conducted on the NASA software defect prediction dataset show that FFeSSTri outperforms four existing supervised learning methods and one semi-supervised learning method in terms of F-measure and AUC values.
13

Pant, Shruti. "Sentiment Analysis Using Feature Selection and Classification Algorithms." IJIERT - International Journal of Innovations in Engineering Research and Technology 4, no. 7 (2017): 5–11. https://doi.org/10.5281/zenodo.1459090.

Abstract:
Here we present a technique to compute the sentiments of a movie review dataset so that the overall performance of the model is optimised. The model is trained and tested to find its performance constraints. We first pre-process the dataset, followed by feature selection, and then classify the features to investigate the performance. A textual movie review is important as it reveals the strong and weak points of the movie plot, and a deeper analysis of a movie review can tell whether the movie will meet the expectations of the reviewer.
14

Ludwig, Simone A. "Guided Particle Swarm Optimization for Feature Selection: Application to Cancer Genome Data." Algorithms 18, no. 4 (2025): 220. https://doi.org/10.3390/a18040220.

Abstract:
Feature selection is a crucial step in the data preprocessing stage of machine learning. It involves selecting a subset of relevant features for use in model construction. Feature selection helps in improving model performance by reducing overfitting, enhancing generalization, and decreasing computational cost. Techniques for feature selection can be broadly classified into filter methods, wrapper methods, and embedded methods. This paper presents a feature selection method based on Particle Swarm Optimization (PSO). The proposed algorithm makes use of a guided particle scheme that incorporates three filter-based methods. The proposed algorithm addresses the issue of premature convergence to local optima observed in other PSO-based feature selection methods. In addition, the algorithm is tested on very-high-dimensional genome data that include up to 44,909 features. Results of an experimental comparison with other state-of-the-art feature selection algorithms show that the proposed algorithm produces overall better results.
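For readers unfamiliar with PSO-driven selection, here is a minimal binary-PSO sketch, assuming a sigmoid transfer function and cross-validated k-NN accuracy as the fitness; it illustrates plain binary PSO, not the guided, filter-seeded scheme proposed in the paper:

```python
# Binary PSO: each particle is a bit-string marking which features are kept.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def bpso_select(X, y, n_particles=20, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.integers(0, 2, (n_particles, d))   # particle bit-strings
    vel = rng.normal(0.0, 1.0, (n_particles, d))

    def fitness(mask):
        # cross-validated accuracy on the selected columns; empty masks score 0
        return cross_val_score(KNeighborsClassifier(), X[:, mask.astype(bool)], y, cv=3).mean() if mask.any() else 0.0

    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, d))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = (rng.random((n_particles, d)) < 1.0 / (1.0 + np.exp(-vel))).astype(int)  # sigmoid transfer
        fit = np.array([fitness(p) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest.astype(bool)

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)
print("selected features:", np.flatnonzero(bpso_select(X, y)))
```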
15

Usha, P., and J. G. R. Sathiaseelan. "Enhanced Filtrate Feature Selection Algorithm for Feature Subset Generation." Indian Journal Of Science And Technology 17, no. 29 (2024): 3002–11. http://dx.doi.org/10.17485/ijst/v17i29.2127.

Abstract:
Objectives: In the bioinformatics field, feature selection plays a vital role in selecting relevant features for making better decisions and assessing disease diagnoses. Brain tumour (BT) is the second leading disease in the world, and most BT detection techniques are based on magnetic resonance (MR) images. Methods: In this paper, medical reports are used in the detection of BT to increase the surveillance of patients. To improve the accuracy of predictive models, a new adaptive technique called the Enhanced Filtrate Feature Selection (EFFS) algorithm is proposed for optimal feature selection. Initially, the EFFS algorithm finds the dependency of each attribute and the feature score by using the mutual information gain, chi-square, correlation, and Fisher score filter methods. Afterward, the occurrence rate of each top-ranked attribute is filtered by applying a threshold value, and the optimal features are obtained by using the Pareto principle. Findings: The performance of the selected optimal features is evaluated by time complexity, the number of features selected, and accuracy. The efficiency of the proposed algorithm is measured and analyzed on a high-quality optimal subset with a random forest classifier integrated with the ranking of attributes. The EFFS algorithm selects 39 out of 46 significant and relevant features with minimum selection time and shows 99.31% accuracy for BT, 29 features with 99.47% accuracy for breast cancer, 15 features with 94.61% accuracy for lung cancer, 15 features with 98.84% accuracy for diabetes, and 43 features with 90% accuracy for the COVID-19 dataset. Novelty: To decrease processing time and improve model performance, the feature selection process is carried out in the initial stages for the betterment of the classification task. The proposed EFFS algorithm is applied to different datasets based on medical reports, and EFFS outperforms with greater performance measurements in less time. Appropriate feature selection techniques help to diagnose diseases in an early phase and increase human survival. Keywords: Bioinformatics, Brain Tumour, Chi-Square, Correlation, EFFS, Feature Selection, Fisher Score, Information Gain, Optimal Features, Random Forest
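A minimal sketch of combining several filter scores into one selection, assuming rank aggregation across mutual information, chi-square, ANOVA F (as a stand-in for correlation), and a simple Fisher score; the aggregation rule and cutoff are illustrative, not the exact EFFS procedure:

```python
# Rank features under several filter criteria and keep those scoring well overall.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
X_pos = MinMaxScaler().fit_transform(X)     # chi2 requires non-negative inputs

def fisher_score(X, y):
    # between-class variance of feature means over summed within-class variance
    classes = np.unique(y)
    mu = X.mean(axis=0)
    num = sum((y == c).sum() * (X[y == c].mean(axis=0) - mu) ** 2 for c in classes)
    den = sum((y == c).sum() * X[y == c].var(axis=0) for c in classes)
    return num / (den + 1e-12)

scores = [mutual_info_classif(X, y, random_state=0), chi2(X_pos, y)[0],
          f_classif(X, y)[0], fisher_score(X, y)]
ranks = np.mean([s.argsort().argsort() for s in scores], axis=0)  # mean rank, higher = better
selected = np.argsort(ranks)[::-1][:15]                            # keep the top 15
print("selected feature indices:", sorted(selected.tolist()))
```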
16

Assegie, Tsehay Admassu, Ravulapalli Lakshmi Tulasi, Vadivel Elanangai, and Napa Komal Kumar. "Exploring the performance of feature selection method using breast cancer dataset." Indonesian Journal of Electrical Engineering and Computer Science 25, no. 1 (2022): 232–37. https://doi.org/10.11591/ijeecs.v25.i1.pp232-237.

Abstract:
Breast cancer is the most common type of cancer, occurring mostly in females. In recent years, many researchers have devoted themselves to automating the diagnosis of breast cancer by developing different machine learning models. However, the quality and quantity of features in a breast cancer diagnostic dataset have a significant effect on the accuracy and efficiency of the predictive model. Feature selection is an effective method for reducing dimensionality and improving the accuracy of a predictive model. Feature selection is used to determine the features required for training a model and to remove irrelevant and duplicate features. A duplicate feature is one that is highly correlated with another feature. The objective of this study is to conduct experimental research on three different feature selection methods for breast cancer prediction. Sequential, embedded, and chi-square feature selection are implemented using the breast cancer diagnostic dataset, and their performance is compared on the test set. The experimental results evidently show that sequential feature selection outperforms chi-square (χ²) statistics and embedded feature selection. Overall, sequential feature selection achieves a better accuracy of 98.3% compared to chi-square (χ²) statistics and embedded feature selection.
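A minimal sketch of the sequential approach the study found best, assuming scikit-learn's SequentialFeatureSelector wrapped around a logistic-regression base model on the built-in breast cancer dataset; the estimator and feature count are illustrative choices:

```python
# Forward sequential feature selection on the breast cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
sfs = SequentialFeatureSelector(base, n_features_to_select=10, direction="forward").fit(X_tr, y_tr)

model = base.fit(sfs.transform(X_tr), y_tr)  # retrain on the chosen subset
print("test accuracy:", model.score(sfs.transform(X_te), y_te))
print("chosen features:", sfs.get_support(indices=True))
```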
17

Malik, Azman Ab, Tao Lyu, Noormadinah Allias, and Irni Hamiza Hamzah. "Analysing feature selection: impacts towards forecasting electricity power consumption." International Journal of Reconfigurable and Embedded Systems (IJRES) 14, no. 1 (2025): 265–72. https://doi.org/10.11591/ijres.v14.i1.pp265-272.

Abstract:
This study focuses on the development of electrical power forecasting based on electricity usage in Wuzhou, China. To develop a forecasting model, the important features need to be identified. Therefore, this study investigates the performance of feature selection methods, focusing on mutual information as a filter and random forest as a wrapper-based feature selection. From the experiment, six features were chosen, whereby both feature selection methods chose almost identical features. The selected features were then trained and tested with common machine learning models, namely the random forest regressor, support vector regression (SVR), k-nearest neighbor (KNN) regressor, and extreme gradient boosting (XGBoost) regressor. The performance of the feature selections tested on each of the models is measured in terms of mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R²). Findings from the experiment revealed that XGBoost outperforms the other machine learning models with an RMSE of 0.9566 and an R² of 0.2561. However, SVR outperformed XGBoost and the other models by obtaining an MAE of 0.6028. It can be concluded that the filter-based method outperformed the embedded feature selection.
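A minimal sketch contrasting the two selection routes this study compares, assuming mutual information as the filter and random-forest importances as the model-based ranking on a synthetic regression dataset; the top-6 cut mirrors the six features mentioned, but everything else is illustrative:

```python
# Filter (mutual information) vs. model-based (random forest importance) rankings.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression

X, y = make_regression(n_samples=400, n_features=12, n_informative=6, random_state=0)

mi = mutual_info_regression(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

top_mi = set(np.argsort(mi)[::-1][:6].tolist())
top_rf = set(np.argsort(rf.feature_importances_)[::-1][:6].tolist())
print("mutual information picks:", sorted(top_mi))
print("random forest picks:     ", sorted(top_rf))
print("agreement:", sorted(top_mi & top_rf))  # the study found near-identical picks
```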
18

Maju, Sonam V., and Gnana Prakasi Oliver Sirya Pushpam. "A novel two-tier feature selection model for Alzheimer's disease prediction." Indonesian Journal of Electrical Engineering and Computer Science 33, no. 1 (2024): 227–35. https://doi.org/10.11591/ijeecs.v33.i1.pp227-235.

Abstract:
The interdisciplinary study of artificial intelligence in the health sector is bringing drastic, life-saving changes to the healthcare domain. One such aspect is early disease prediction using machine learning and regression algorithms. The purpose of this research is to improve the prediction accuracy of Alzheimer's disease by analysing the correlation of unexplored Alzheimer-causing diseases. The work proposes the chi square-lasso ridge linear (Chi-LRL) model, a new two-tier feature ranking model that recognizes the significance of including diabetes, blood pressure, and body mass index as potential Alzheimer predictive parameters. The newly added predictive parameters of Alzheimer's disease were statistically verified along with the conventional prediction parameters, using the chi-square method (Chi) as Tier 1 and an embedded model of lasso, ridge, and linear (LRL) regression for feature ranking as Tier 2. The performance of the proposed Chi-LRL model with the selected features was then analysed using machine learning algorithms. The result shows a noticeable improvement, with eleven significant features selected and a 4.5% increase in the prediction accuracy of Alzheimer's disease.
19

Dagogo-George, Tamunopriye Ene, Hammed Adeleye Mojeed, Abdulateef Oluwagbemiga Balogun, Modinat Abolore Mabayoje, and Shakirat Aderonke Salihu. "Tree-based homogeneous ensemble model with feature selection for diabetic retinopathy prediction." Jurnal Teknologi dan Sistem Komputer 8, no. 4 (2020): 297–303. http://dx.doi.org/10.14710/jtsiskom.2020.13669.

Abstract:
Diabetic retinopathy (DR) is a condition that emerges from prolonged diabetes, causing severe damage to the eyes. Early diagnosis of this disease is highly imperative, as late diagnosis may be fatal. Existing studies employed machine learning approaches, with support vector machines (SVM) having the highest performance in most analyses and decision trees (DT) having the lowest. However, SVM is known to suffer from parameter and kernel selection problems, which undermine its predictive capability. Hence, this study presents homogeneous ensemble classification methods with DT as the base classifier to optimize predictive performance. Boosting and bagging ensemble methods with feature selection were employed, and experiments were carried out using Python scikit-learn libraries on DR datasets extracted from the UCI Machine Learning Repository. Experimental results showed that bagged and boosted DT were better than SVM. Specifically, bagged DT performed best with accuracy 65.38%, f-score 0.664, and AUC 0.731, followed by boosted DT with accuracy 65.42%, f-score 0.655, and AUC 0.724, compared to SVM (accuracy 65.16%, f-score 0.652, and AUC 0.721). These results indicate that DT's predictive performance can be optimized by employing homogeneous ensemble methods to outperform SVM in predicting DR.
20

Yogesh, N., Purohit Shrinivasacharya, and Nagaraj Naik. "Novel statistically equivalent signature-based hybrid feature selection and ensemble deep learning LSTM and GRU for chronic kidney disease classification." PeerJ Computer Science 10 (November 13, 2024): e2467. http://dx.doi.org/10.7717/peerj-cs.2467.

Abstract:
Chronic kidney disease (CKD) involves numerous variables, but only a few significantly impact the classification task. The statistically equivalent signature (SES) method, inspired by constraint-based learning of Bayesian networks, is employed to identify essential features in CKD. Unlike conventional feature selection methods, which typically focus on a single set of features with the highest predictive potential, the SES method can identify multiple predictive feature subsets with similar performance. However, most feature selection (FS) classifiers perform suboptimally with strongly correlated data. The FS approach faces challenges in identifying crucial features and selecting the most effective classifier, particularly in high-dimensional data. This study proposes using the Least Absolute Shrinkage and Selection Operator (LASSO) in conjunction with the SES method for feature selection in CKD identification. Following this, an ensemble deep-learning model combining long short-term memory (LSTM) and gated recurrent unit (GRU) networks is proposed for CKD classification. The features selected by the hybrid feature selection method are fed into the ensemble deep-learning model. The model’s performance is evaluated using accuracy, precision, recall, and F1 score metrics. The experimental results are compared with individual classifiers, including decision tree (DT), Random Forest (RF), logistic regression (LR), and support vector machine (SVM). The findings indicate a 2% improvement in classification accuracy when using the proposed hybrid feature selection method combined with the LSTM and GRU ensemble deep-learning model. Further analysis reveals that certain features, such as HEMO, POT, bacteria, and coronary artery disease, contribute minimally to the classification task. Future research could explore additional feature selection methods, including dynamic feature selection that adapts to evolving datasets and incorporates clinical knowledge to enhance CKD classification accuracy further.
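A minimal sketch of the LASSO half of the hybrid selection step, assuming scikit-learn's LassoCV on standardized features and treating the binary label as a numeric target for illustration; the SES stage and the LSTM/GRU ensemble from the paper are not reproduced here:

```python
# LASSO-based feature selection: features with non-zero coefficients survive.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the CKD dataset
X_std = StandardScaler().fit_transform(X)

lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)
kept = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)  # surviving features
print(f"kept {kept.size} of {X.shape[1]} features:", kept.tolist())
# X_std[:, kept] would then feed the downstream ensemble model.
```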
21

Ramineni, Vyshnavi, and Goo-Rak Kwon. "Diagnosis of Alzheimer’s Disease using Wrapper Feature Selection Method." Korean Institute of Smart Media 12, no. 3 (2023): 30–37. http://dx.doi.org/10.30693/smj.2023.12.3.30.

Abstract:
Alzheimer’s disease (AD) is treated through early diagnosis, whereby we can only slow the symptoms; research is still ongoing. With this in mind, several machine learning classification models using T1-weighted images have been proposed to identify AD. In this paper, we consider improved feature selection to reduce complexity by using wrapper techniques and a Restricted Boltzmann Machine (RBM). The present work used the subcortical and cortical features of 278 subjects from the ADNI dataset to identify AD from sMRI. Multi-class classification is used for the experiment, i.e., AD, EMCI, LMCI, and HC. The proposed feature selection consists of forward feature selection, backward feature selection, and combined PCA & RBM. Forward and backward feature selection are iterative methods: forward selection starts with no features, while backward selection starts with all features included. PCA is used to reduce the dimensions, and RBM is used to select the best features without interpreting them. We compared the three models for analysis. The experiments show that combined PCA & RBM and backward feature selection give the best accuracy with the RF classification model, i.e., 88.65% and 88.56%, respectively.
22

Constantinopoulos, C., M. K. Titsias, and A. Likas. "Bayesian feature and model selection for Gaussian mixture models." IEEE Transactions on Pattern Analysis and Machine Intelligence 28, no. 6 (2006): 1013–18. http://dx.doi.org/10.1109/tpami.2006.111.

23

Zhao, Zheng, Lei Wang, and Huan Liu. "Efficient Spectral Feature Selection with Minimum Redundancy." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (2010): 673–78. http://dx.doi.org/10.1609/aaai.v24i1.7671.

Abstract:
Spectral feature selection identifies relevant features by measuring their capability of preserving sample similarity. It provides a powerful framework for both supervised and unsupervised feature selection, and has been proven to be effective in many real-world applications. One common drawback associated with most existing spectral feature selection algorithms is that they evaluate features individually and cannot identify redundant features. Since redundant features can have significant adverse effect on learning performance, it is necessary to address this limitation for spectral feature selection. To this end, we propose a novel spectral feature selection algorithm to handle feature redundancy, adopting an embedded model. The algorithm is derived from a formulation based on a sparse multi-output regression with an L2,1-norm constraint. We conduct theoretical analysis on the properties of its optimal solutions, paving the way for designing an efficient path-following solver. Extensive experiments show that the proposed algorithm can do well in both selecting relevant features and removing redundancy.
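For concreteness, an embedded model of this kind can be written, in a generic form (notation assumed here, not taken from the paper), as a multi-output regression with row-sparsity regularization:

```latex
\min_{W \in \mathbb{R}^{d \times k}} \; \lVert X W - Y \rVert_F^2 \;+\; \gamma \, \lVert W \rVert_{2,1},
\qquad
\lVert W \rVert_{2,1} \;=\; \sum_{i=1}^{d} \Big( \sum_{j=1}^{k} W_{ij}^{2} \Big)^{1/2}
```

where X is the n × d data matrix, Y the n × k target matrix, and γ > 0 controls sparsity. The L2,1 norm drives entire rows of W to zero, so the features corresponding to nonzero rows are the ones selected.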
24

Hamad, Zana O. "Review of Feature Selection Methods Using Optimization Algorithm." Polytechnic Journal 12, no. 2 (2023): 203–14. http://dx.doi.org/10.25156/ptj.v12n2y2022.pp203-214.

Abstract:
Much work has been done to reduce complexity in terms of time and memory space. The feature selection process is one such strategy and can be defined as the process of selecting the most important features from the feature space. The most useful features are kept, and the less useful ones are eliminated. In the fault classification and diagnosis field, feature selection plays an important role in reducing dimensionality and can lead to a higher classification rate. In this paper, a comprehensive review of feature selection and how it can be performed is presented. The primary goal of this research is to examine all of the strategies that have been used in the selection process, including filter, wrapper, meta-heuristic, and embedded methods. The review focuses on nature-inspired algorithms that have been used for feature selection, such as the particle swarm, grey wolf, bat, genetic, whale, and ant colony algorithms. The overall results confirm that the feature selection approach is important in reducing the complexity of any model-based machine learning algorithm and may sometimes result in improved performance of the simulated model.
25

Jin, Shunhao, Fenlin Liu, Chunfang Yang, Yuanyuan Ma, and Yuan Liu. "Feature Selection of the Rich Model Based on the Correlation of Feature Components." Security and Communication Networks 2021 (April 29, 2021): 1–12. http://dx.doi.org/10.1155/2021/6680528.

Abstract:
Currently, popular Rich Model steganalysis features usually contain a large number of redundant feature components, which may bring the “curse of dimensionality” and large computation cost, but existing feature selection methods struggle to effectively reduce the dimensionality when there are many strongly correlated effective feature components. This paper proposes a novel selection method for Rich Model steganalysis features. First, the separability of each feature component in the submodels of the Rich Model is measured based on the Fisher criterion, and the feature components are sorted in descending order of separability. Second, the correlation coefficient between any two feature components in each submodel is calculated, and feature selection is performed according to the Fisher value of each component and the correlation coefficients. Finally, the selected submodels are combined as the final steganalysis feature. The results show that the proposed feature selection method can effectively reduce the dimensionalities of JPEG-domain and spatial-domain Rich Model steganalysis features without affecting detection accuracy.
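A minimal sketch of this two-step idea, Fisher-score ranking followed by a correlation-redundancy filter, assuming a binary-class Fisher score and a fixed correlation cutoff; the dataset and thresholds are illustrative, not the paper's tuned values:

```python
# Rank features by Fisher score, then greedily keep those not too correlated
# with any already-selected feature.
import numpy as np
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)  # stand-in binary-class data

m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
v0, v1 = X[y == 0].var(axis=0), X[y == 1].var(axis=0)
fisher = (m0 - m1) ** 2 / (v0 + v1 + 1e-12)       # separability per feature

corr = np.abs(np.corrcoef(X, rowvar=False))       # feature-feature correlations
selected = []
for f in np.argsort(fisher)[::-1]:                # best separability first
    if all(corr[f, s] < 0.9 for s in selected):   # 0.9 is an illustrative cutoff
        selected.append(int(f))
print("selected features:", selected)
```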
26

Fahrudy, Dony, and Shofwatul 'Uyun. "Classification of Student Graduation using Naïve Bayes by Comparing between Random Oversampling and Feature Selections of Information Gain and Forward Selection." JOIV : International Journal on Informatics Visualization 6, no. 4 (2022): 798. http://dx.doi.org/10.30630/joiv.6.4.982.

Abstract:
Class-imbalanced data with high attribute dimensions frequently cause issues in classification, because the imbalanced numbers of examples in each class and the irrelevant attributes that must be processed degrade algorithms' performance; techniques are therefore needed to overcome the class imbalance, together with feature selection to reduce data complexity and remove irrelevant features. This study applied the random oversampling (ROs) method to overcome the class imbalance and compared two feature selection methods (information gain and forward selection) to determine which is superior, more effective, and more appropriate to apply. The results of feature selection were then used to classify student graduation by creating a classification model with the Naïve Bayes algorithm. This study found an increase in the average accuracy of the Naïve Bayes method from no ROs preprocessing and no feature selection (81.83%), to ROs alone (83.84%), to information gain with 3 selected features (86.03%) and forward selection with 2 selected features (86.42%); these correspond to accuracy increases of 4.2% from no pre-processing to information gain and 4.59% from no pre-processing to forward selection. Therefore, the best feature selection was forward selection with 2 selected features (the 8th-semester GPA and the overall GPA), and ROs and both feature selections were proven to improve the performance of the Naïve Bayes method.
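A minimal sketch of the oversample-then-select-then-classify pipeline, assuming the imbalanced-learn package for random oversampling and mutual information as a stand-in for the information-gain filter; the dataset and feature count are illustrative:

```python
# Random oversampling + information-gain-style filter + Gaussian Naive Bayes.
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the graduation dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_bal, y_bal = RandomOverSampler(random_state=0).fit_resample(X_tr, y_tr)  # balance classes
selector = SelectKBest(mutual_info_classif, k=3).fit(X_bal, y_bal)         # keep 3 features

clf = GaussianNB().fit(selector.transform(X_bal), y_bal)
print("test accuracy:", clf.score(selector.transform(X_te), y_te))
```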
27

Himawan, Salamet Nur, Rendi Rendi, and Nur Budi Nugraha. "Feature Selection Menggunakan Algoritma Meta-Heuristik" [Feature Selection Using Meta-Heuristic Algorithms]. Journal of Practical Computer Science 2, no. 2 (2023): 84–89. http://dx.doi.org/10.37366/jpcs.v2i2.2289.

Abstract:
Machine learning requires data to make predictions, and that data can have a large number of features. A large number of features can cause machine learning models to overfit, increase model complexity, and incur high computational costs. Feature selection is one method for optimizing machine learning models: it reduces the number of features used in the learning process. This research proposes a feature selection method using meta-heuristic algorithms, in which the machine learning model serves as the objective function for the meta-heuristic algorithm. The objective function is evaluated at each iteration to obtain the most influential features in the model. The machine learning models used are Random Forest, k-Nearest Neighbors, and Support Vector Machine; the meta-heuristic algorithms used are Differential Evolution, Flower Pollination, Grey Wolf, and Whale Optimization. The research shows that meta-heuristic algorithms can improve the accuracy of machine learning models with fewer features. The Support Vector Machine with Differential Evolution scheme has the highest accuracy and uses the fewest features.
28

Guo, Jia. "Feature Selection in House Price Prediction." Highlights in Business, Economics and Management 21 (December 12, 2023): 746–52. http://dx.doi.org/10.54097/hbem.v21i.14755.

Abstract:
This study aims to construct a model that chooses effective features and uses them to predict the market price of a given house, which can help people with house pricing and property evaluation. The paper first constructs several machine learning models, such as Linear Regression, SVM, and KNN, and compares their accuracy on this problem to choose the best-fit one for improvement. After using parameter tuning to optimize this model, the study applies recursive feature elimination and a genetic algorithm to select features and improve the simple SVM model. After feature selection, the study re-evaluates the accuracy of the model and compares which features have a greater impact on the predictions. The comparison shows that, in this particular case, features closely connected with living conditions, traffic conditions, and sale conditions have a huge impact on the house price.
29

Sushma, S. J., Tsehay Admassu Assegie, D. C. Vinutha, and S. Padmashree. "An improved feature selection approach for chronic heart disease detection." Bulletin of Electrical Engineering and Informatics 10, no. 6 (2021): 3501–6. http://dx.doi.org/10.11591/eei.v10i6.3001.

Abstract:
Irrelevant features in a heart disease dataset affect the performance of a binary classification model. Consequently, eliminating irrelevant and redundant features from the training set with a feature selection algorithm significantly improves the performance of the classification model on heart disease detection. Sequential feature selection (SFS) is a successful algorithm for improving classification performance on heart disease detection while reducing computational time complexity. In this study, the SFS algorithm is implemented to improve classifier performance on heart disease detection by removing irrelevant features and training a model on the optimal features. Furthermore, exhaustive and permutation-based feature selection algorithms are implemented and compared with the SFS algorithm. The implemented and existing feature selection algorithms are evaluated using the real-world Pima Indian heart disease dataset, and the results show that the SFS algorithm outperforms the exhaustive and permutation-based feature selection algorithms. Overall, the results look promising, and a more effective heart disease detection model is developed with an accuracy of 99.3%.
30

Isnaeni Nurul Afra, Dian, Radhiyatul Fajri, Harnum Annisa Prafitia, Ikhwan Arief, and Aprinaldi Jasa Mantau. "Feature Selection and Performance Evaluation of Buzzer Classification Model." Jurnal Optimasi Sistem Industri 23, no. 1 (2024): 1–14. http://dx.doi.org/10.25077/josi.v23.n1.p1-14.2024.

Abstract:
In the rapidly evolving digital age, social media platforms have become battlegrounds for shaping public opinion. Among these platforms, X has been particularly susceptible to the phenomenon of 'buzzers': paid or coordinated actors who manipulate online discussions and influence public sentiment. This manipulation poses significant challenges for users, researchers, and policymakers alike, necessitating robust detection measures and strategic feature selection for accurate classification models. This research explores various feature selection techniques to identify the most influential features among the 24 features employed in classification modeling with a Support Vector Machine. The study found that selecting 11 key features yields a remarkably effective classification model, achieving an F1-score of 87.54 in distinguishing between buzzer and non-buzzer accounts. These results suggest that focusing on the relevant features can improve the accuracy and efficiency of buzzer detection models. By providing a more robust and adaptable solution to buzzer detection, this research has the potential to advance social media research and policy, enabling researchers and policymakers to devise strategies aimed at mitigating misinformation dissemination and cultivating trust and integrity within social media platforms, thus fostering healthier online interactions and discourse.
31

Han, Yuanyuan, Lan Huang, and Fengfeng Zhou. "Zoo: Selecting Transcriptomic and Methylomic Biomarkers by Ensembling Animal-Inspired Swarm Intelligence Feature Selection Algorithms." Genes 12, no. 11 (2021): 1814. http://dx.doi.org/10.3390/genes12111814.

Abstract:
Biological omics data such as transcriptomes and methylomes have the inherent “large p small n” paradigm, i.e., the number of features is much larger than that of the samples. A feature selection (FS) algorithm selects a subset of the transcriptomic or methylomic biomarkers in order to build a better prediction model. The hidden patterns in the FS solution space make it challenging to achieve a feature subset with satisfying prediction performances. Swarm intelligence (SI) algorithms mimic the target searching behaviors of various animals and have demonstrated promising capabilities in selecting features with good machine learning performances. Our study revealed that different SI-based feature selection algorithms contributed complementary searching capabilities in the FS solution space, and their collaboration generated a better feature subset than the individual SI feature selection algorithms. Nine SI-based feature selection algorithms were integrated to vote for the selected features, which were further refined by the dynamic recursive feature elimination framework. In most cases, the proposed Zoo algorithm outperformed the existing feature selection algorithms on transcriptomics and methylomics datasets.
32

Balcarras, Matthew, Salva Ardid, Daniel Kaping, Stefan Everling, and Thilo Womelsdorf. "Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness." Journal of Cognitive Neuroscience 28, no. 2 (2016): 333–49. http://dx.doi.org/10.1162/jocn_a_00894.

Abstract:
Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.
33

Emima, A., and D. I. George Amalarethinam. "A Hybrid Model of Enhanced Teacher Learner Based Optimization (ETLBO) with Particle Swarm Optimization (PSO) Algorithm for Predicting Academic Student Performance." Indian Journal Of Science And Technology 18, no. 10 (2025): 772–83. https://doi.org/10.17485/ijst/v18i10.240.

Abstract:
Objectives: A hybrid ETLBO-PSO model is developed to improve student performance predictions. It assesses the intellectual, social, and economic background of students to increase the accuracy of student performance predictions. The model optimizes feature selection, which reduces redundancy and increases efficiency. Its efficacy is compared with existing educational data mining techniques. Methods: This study integrates Enhanced Teacher Learner Based Optimization (ETLBO) and the Particle Swarm Optimization (PSO) algorithm for optimal feature selection. The suggested technique is utilized as a feature selection algorithm to identify the most significant elements for predicting student academic performance. The efficacy of the proposed feature selection technique is evaluated using three machine learning classifiers, Extreme Gradient Boosting (XGB), Light Gradient Boosting (LightGB), and Category Gradient Boosting (CatGB), on a student achievement dataset for secondary-education mathematics. Findings: The experimental results show that ETLBO-PSO provides sustained, excellent model performance while reducing accuracy decline. The meta-class model of ETLBO-PSO has an F1-score of 82.43%, which makes it an increasingly robust and reliable strategy. Furthermore, an innovative visual and intuitive method is employed to identify the aspects that most significantly impact the score, facilitating the interpretation and comprehension of the complete model. Novelty: ETLBO-PSO is integrated with SHAP (SHapley Additive exPlanations), and a meta-class model is used to optimize student performance predictions with higher accuracy. Unlike traditional approaches, it continuously refines feature selection throughout training, solving high-dimensional data issues. SHAP's approach ensures precise feature attribution, improving accessibility and decision making. Keywords: Feature Selection, Enhanced Teacher Learner Based Optimization, Particle Swarm Optimization, Academic Student Performance, Classification Algorithm, Optimization Techniques, XGBoost, LGBoost, CATBoost
34

NishaJenipher, V., et al. "Comparing the Performance of Algorithm with Relevant Features for Histological Categorization of Lung Cancer." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 2 (2021): 357–63. http://dx.doi.org/10.17762/turcomat.v12i2.814.

Abstract:
Due to the increasing rate of new cases across the globe, lung cancer has been a favored topic of research for a long period of time. Therefore, many researchers have used prediction or classification algorithms to identify the factors that contribute to the increase of this deadly disease. Two models were built, namely WRF and RF. The RF model provides the results for the features selected by a predominant feature selection method, whereas the WRF model provides results for all features without performing any selection process. A comparison is made to show the importance of selecting features for a classification or prediction algorithm. The accuracy provided by the WRF model is higher than that of the RF model, which highlights the importance of selecting the features for the classification algorithm.
APA, Harvard, Vancouver, ISO, and other styles
35

Benson, Philip J. "Feature see, feature do." Behavioral and Brain Sciences 21, no. 1 (1998): 18–19. http://dx.doi.org/10.1017/s0140525x98250102.

Full text
Abstract:
Physiological evidence predicts a model of concept categorisation that evolves through direct interaction with object feature selection. The requirement stated by Schyns et al. for feature plasticity is supported, but important caveats raise a question about the level at which feature identification can occur. Visual attribute selection for feature creation is likely to be directed by top-down and attentional processes.
APA, Harvard, Vancouver, ISO, and other styles
36

Malik, Azman Ab, Lyu Tao, Noormadinah Allias, and Irni Hamiza Hamzah. "Analysing feature selection: impacts towards forecasting electricity power consumption." International Journal of Reconfigurable and Embedded Systems (IJRES) 14, no. 1 (2025): 265. https://doi.org/10.11591/ijres.v14.i1.pp265-272.

Full text
Abstract:
This study focuses on the development of electrical power forecasting based on electricity usage in Wuzhou, China. To develop a forecasting model, the important features need to be identified. Therefore, this study investigates the performance of feature selection methods, focusing on mutual information as a filter and random forest as a wrapper-based feature selection. From the experiment, six features were chosen, and both feature selection methods chose almost identical features. The selected features are then trained and tested with common machine learning models, namely the random forest regressor, support vector regression (SVR), k-nearest neighbor (KNN) regressor, and extreme gradient boosting (XGBoost) regressor. The performance of the feature selections tested on each of the models is measured in terms of mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²). Findings from the experiment revealed that XGBoost outperforms the other machine learning models with an RMSE of 0.9566 and an R² of 0.2561. However, SVR outperformed XGBoost and the other models by obtaining an MAE of 0.6028. It can be concluded that the filter-based method outperformed the embedded feature selection.
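As a rough illustration of the filter approach described above, the sketch below keeps the six highest mutual-information features and evaluates a regressor with MAE, RMSE, and R². The synthetic data merely stands in for the Wuzhou consumption records, and the random forest regressor is one of the models named in the abstract.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the electricity usage records (assumption).
X, y = make_regression(n_samples=500, n_features=12, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Filter method: keep the six features with the highest mutual information.
selector = SelectKBest(mutual_info_regression, k=6).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

model = RandomForestRegressor(random_state=0).fit(X_tr_sel, y_tr)
pred = model.predict(X_te_sel)
print("MAE :", mean_absolute_error(y_te, pred))
print("RMSE:", mean_squared_error(y_te, pred) ** 0.5)
print("R²  :", r2_score(y_te, pred))
```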
APA, Harvard, Vancouver, ISO, and other styles
37

L. William Mary. "Enhancing COVID-19 Prediction Using Machine Learning: A Comparative Analysis of Feature Selection and Classification Techniques." Journal of Information Systems Engineering and Management 10, no. 24s (2025): 361–69. https://doi.org/10.52783/jisem.v10i24s.3910.

Full text
Abstract:
Introduction: The early and accurate detection of COVID-19 remains a critical challenge in medical analysis. Machine learning is used for predicting disease outcomes based on clinical parameters. This analysis proposes a comparative analysis of feature selection methods and classification techniques to enhance COVID-19 detection accuracy using blood biomarkers. We used an open-source dataset of 1,724 cases, including 35 features. Data preprocessing included outlier handling, normalization, and transformation techniques to improve model performance. To identify the relevant features, we employed three feature selection methods: the Chi-Square test, the Pearson correlation coefficient, and Random Forest. The model prediction accuracy was enhanced using a stacking ensemble classification technique. The machine learning based classification models effectively predicted COVID-19 infectious disease using blood biomarkers with optimized feature selection techniques. Objectives: To enhance the accuracy of COVID-19 prediction using machine learning by applying feature selection and classification techniques to blood biomarkers. Methods: The comparative analysis utilized a publicly available dataset containing 1,724 cases with 35 attributes. Data preprocessing involved outlier handling, normalization, and transformation techniques. The Chi-Square test, the Pearson correlation coefficient, and Random Forest feature selection techniques were employed. A stacking ensemble classification algorithm was utilized for better model performance. Results: The classification models demonstrated efficiency in predicting COVID-19 using blood biomarkers. Optimized feature selection significantly improved predictive accuracy, highlighting the importance of selecting relevant features for model performance enhancement. Conclusions: This study showcases the potential of ML-driven approaches for COVID-19 detection, emphasizing the role of feature selection in improving classification accuracy. The findings contribute to the advancement of diagnostic tools, offering a data-driven solution for rapid and reliable COVID-19 screening.
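A minimal sketch of the pipeline this abstract describes, assuming scikit-learn: chi-square feature selection (after min-max scaling, since chi² requires non-negative inputs) feeding a stacking ensemble. The choice of base learners, k=15, and the synthetic 1,724-by-35 data are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in matching the abstract's 1,724 cases with 35 features.
X, y = make_classification(n_samples=1724, n_features=35, n_informative=10,
                           random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)
# Scale to [0, 1] so the chi-square scores are well defined, select, classify.
pipe = make_pipeline(MinMaxScaler(), SelectKBest(chi2, k=15), stack)
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```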
APA, Harvard, Vancouver, ISO, and other styles
38

Bhaskara, I. Made Wasanta, I. Putu Gede Hendra Suputra, I. Made Widiartha, I. Gusti Agung Gede Arya Kadyanan, I. Gusti Ngurah Anom Cahyadi Putra, and Ida Bagus Gede Dwidasmara. "Klasifikasi Serangan Distributed Denial of Service (DDoS) Menggunakan Random Forest Dengan CFS." JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) 11, no. 2 (2022): 215. http://dx.doi.org/10.24843/jlk.2022.v11.i02.p01.

Full text
Abstract:
Distributed Denial of Service (DDoS) attacks can have serious impacts on an organization and can cause enormous losses. Such an attack works by sending a computer or server a volume of requests that exceeds the capacity of that machine. To classify DDoS attacks in this study, feature selection is performed using correlation-based feature selection (CFS). The dataset used in this study is CSE-CIC-IDS 2018. Applying CFS to the dataset yields the features most related to the class: a total of 31 features with a correlation score greater than 0.1. The average precision generated by the system using the random forest method with CFS feature selection is 99.784%, obtained with the number-of-trees parameter set to 10. For a random forest model without feature selection, the highest accuracy is 49.501%. This indicates that tuning the random forest parameters and applying CFS feature selection strongly affect accuracy.
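The following sketch reproduces only the thresholding step the abstract reports, keeping features whose absolute correlation with the label exceeds 0.1; full CFS additionally penalizes inter-feature correlation, which is omitted here. Data and column names are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification

# Placeholder data standing in for CSE-CIC-IDS 2018 (assumption).
X, y = make_classification(n_samples=1000, n_features=40, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])

# Keep features whose absolute Pearson correlation with the label
# exceeds the 0.1 threshold reported in the abstract.
corr = df.apply(lambda col: abs(np.corrcoef(col, y)[0, 1]))
selected = corr[corr > 0.1].index.tolist()
print(len(selected), "features retained, e.g.:", selected[:5])
```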
APA, Harvard, Vancouver, ISO, and other styles
39

Laborda, Juan, and Seyong Ryoo. "Feature Selection in a Credit Scoring Model." Mathematics 9, no. 7 (2021): 746. http://dx.doi.org/10.3390/math9070746.

Full text
Abstract:
This paper proposes different classification algorithms—logistic regression, support vector machine, K-nearest neighbors, and random forest—in order to identify which candidates are likely to default in a credit scoring model. Three different feature selection methods are used to mitigate the overfitting that arises from the curse of dimensionality in these classification algorithms: one filter method (Chi-squared test and correlation coefficients) and two wrapper methods (forward stepwise selection and backward stepwise selection). The performance of these three methods is discussed using two measures, the mean absolute error and the number of selected features. The methodology is applied to a valuable database from Taiwan. The results suggest that forward stepwise selection yields superior performance in each of the classification algorithms used. The conclusions obtained are related to those in the literature, and their managerial implications are analyzed.
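Forward stepwise selection of the kind compared here can be sketched with scikit-learn's SequentialFeatureSelector, which greedily adds the feature that most improves the cross-validated score. The logistic regression base model, subset size, and synthetic data are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the Taiwan credit data (assumption).
X, y = make_classification(n_samples=600, n_features=23, random_state=0)

# Forward stepwise selection: start empty, add one feature at a time.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=8, direction="forward", cv=5,
)
sfs.fit(X, y)
print("chosen feature indices:", sfs.get_support(indices=True))
```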
APA, Harvard, Vancouver, ISO, and other styles
40

Zhao, X. M., Q. H. Hu, Y. G. Lei, and M. J. Zuo. "Vibration-based fault diagnosis of slurry pump impellers using neighbourhood rough set models." Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 224, no. 4 (2010): 995–1006. http://dx.doi.org/10.1243/09544062jmes1777.

Full text
Abstract:
Rough set models have been widely used as a method for feature selection in fault diagnosis. A neighbourhood rough set model can deal with both nominal and numerical features, but selecting the neighbourhood size for its application may be a challenge. In this article, the authors illustrate that using a common neighbourhood size for all features may overestimate or underestimate a feature's dependency degree. The neighbourhood rough set model is then modified by setting different neighbourhood sizes for different features. The modified model is applied to the fault diagnosis of slurry pump impellers. Results show that the chosen feature subsets generated by the modified neighbourhood rough set model can be physically explained by the corresponding flow patterns and can achieve higher classification accuracy than the raw feature subsets and the feature subsets generated by the original neighbourhood rough set model.
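A minimal sketch of the neighbourhood dependency idea, under simplifying assumptions: a sample counts as consistent if every neighbour (within a per-feature size delta_j on each feature j) shares its class, and the dependency degree is the consistent fraction. The data and delta values are illustrative, and the paper's exact formulation may differ.

```python
import numpy as np

def dependency_degree(X, y, deltas):
    """Fraction of samples whose neighbourhood, defined by the per-feature
    sizes in `deltas`, is pure in the class label."""
    consistent = 0
    for i in range(len(X)):
        # Neighbours: samples within delta_j of sample i on every feature j.
        within = np.all(np.abs(X - X[i]) <= deltas, axis=1)
        if np.all(y[within] == y[i]):
            consistent += 1
    return consistent / len(X)

rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int)

# A common neighbourhood size for every feature ...
print(dependency_degree(X, y, np.array([0.1, 0.1, 0.1])))
# ... versus sizes adapted per feature, as the modified model proposes.
print(dependency_degree(X, y, np.array([0.05, 0.2, 0.2])))
```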
APA, Harvard, Vancouver, ISO, and other styles
41

Madan, Karun, Kavita Taneja, and Harmunish Taneja. "AI BASED FEATURE SELECTION MODEL FOR SOCCER SPORTS MANAGEMENT." International Journal of Engineering Science and Humanities 14, Special Issue 1 (2024): 38–42. http://dx.doi.org/10.62904/mf4qy057.

Full text
Abstract:
With the swift development of data mining and machine learning technology and the explosion of big sports data, sports data mining can no longer rely on statistical methods alone; how to combine machine learning and data mining technology for efficient mining and analysis of sports data, so as to supply useful advice for public physical exercise, is an important question to study. Feature selection algorithms make such sports data mining effective. Addressing the difficult problems in studying the effect of sports, and given the drawbacks of existing datasets and conventional research methods, this paper starts from the data mining algorithm, constructs a sports effect evaluation database and, based on a feature selection scheme, applies the elastic net algorithm and the random forest algorithm to assess the impact of sports on physical gauges. The evaluation algorithm uses machine learning techniques and the feature selection algorithm to guide sports effect evaluation research. In studying this evaluation problem, the elastic net algorithm is applied to the constructed sports effect evaluation database to regularize, realize, and optimize the feature selection. When selecting features of different sports skills, information gain is used to rank the significance of characteristics, which can systematically and accurately quantify the influence of a sport on diverse physical indicators, make physical fitness research more scientific, and uncover the effect of the sport as fully as possible. Experimental results demonstrate that the selected features, as well as the ground truth, achieve good accuracy and good evaluation compared with the baseline method.
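Reading the abstract's elastic-net-style regularization as standard elastic net (an assumption on our part), feature selection by shrinkage could be sketched as follows: coefficients driven to zero mark discarded features. Data and the zero threshold are placeholders.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

# Placeholder stand-in for the sports effect evaluation database (assumption).
X, y = make_regression(n_samples=400, n_features=30, n_informative=8,
                       random_state=0)
X = StandardScaler().fit_transform(X)

# Elastic-net regularization shrinks the coefficients of irrelevant
# features to zero; the surviving non-zero coefficients are the selection.
enet = ElasticNetCV(cv=5).fit(X, y)
kept = np.flatnonzero(np.abs(enet.coef_) > 1e-8)
print("features kept by the elastic net:", kept)
```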
APA, Harvard, Vancouver, ISO, and other styles
42

Sathish, B. R., and Radha Senthilkumar. "A Hybrid Algorithm for Feature Selection and Classification." 網際網路技術學刊 24, no. 3 (2023): 593–602. http://dx.doi.org/10.53106/160792642023052403004.

Full text
Abstract:
With the recent spread of intelligent information systems, massive data collections full of redundant, unintentional, interference-laden data are gathered, and huge feature sets must be handled. Higher dimensional inputs, moreover, contain more correlated variables, which can have a negative impact on model performance. In our model, a hybrid feature selection method was developed by combining the Binary Gravitational Search Particle Swarm Optimization (HBGSPSO) method with an Enhanced Convolution Neural Network Bidirectional Long Short Term Memory (ECNN-BiLSTM). In the proposed system, the Bidirectional Long Short Term Memory (BiLSTM) is introduced to extract hidden dynamic information, using its memory cells to retain long-term historical data after the convolution process. In this paper, thirteen well-defined datasets from the UC Irvine machine learning repository are used to evaluate the efficiency of the proposed system. The experiments are conducted using K Nearest Neighbor (KNN) and Decision Tree (DT) classifiers to evaluate the outcome of the selected features. The outcomes are contrasted and compared with bio-inspired algorithms such as the Genetic Algorithm (GA), Grey Wolf Optimizer (GWO), and Particle Swarm Optimization (PSO).
APA, Harvard, Vancouver, ISO, and other styles
43

Prasetiyowati, Maria Irmina, Nur Ulfa Maulidevi, and Kridanto Surendro. "The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy." PeerJ Computer Science 8 (July 14, 2022): e1041. http://dx.doi.org/10.7717/peerj-cs.1041.

Full text
Abstract:
One of the significant purposes of building a model is to increase its accuracy within a shorter timeframe through the feature selection process. This is carried out by determining the importance of the available features in a dataset using Information Gain (IG), which calculates the amount of information contained in each feature; features with high values are selected to accelerate the performance of an algorithm. In selecting informative features, IG uses a threshold value (cut-off). This research therefore aims to determine the time and accuracy performance gained by improving feature selection through integrating IG, the Fast Fourier Transform (FFT), and the Synthetic Minority Oversampling Technique (SMOTE). The feature selection model is then applied to Random Forest, a tree-based machine learning algorithm with random feature selection. A total of eight datasets, consisting of three balanced and five imbalanced datasets, were used in this research; SMOTE was applied to the imbalanced datasets to balance the data. The results showed that feature selection using Information Gain, FFT, and SMOTE improved the performance accuracy of Random Forest.
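A simplified sketch of the IG-plus-SMOTE pipeline, using scikit-learn's mutual information score as a stand-in for Information Gain and assuming the imbalanced-learn package is available; the mean-score cut-off, the synthetic data, the omission of the FFT step, and the lack of leakage-safe resampling inside cross-validation are all simplifications.

```python
from imblearn.over_sampling import SMOTE  # assumes imbalanced-learn is installed
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

# Imbalanced toy data standing in for one of the five imbalanced datasets.
X, y = make_classification(n_samples=1000, n_features=25, weights=[0.9, 0.1],
                           random_state=0)

# Information-gain-style scores; keep features above a cut-off threshold.
scores = mutual_info_classif(X, y, random_state=0)
X_sel = X[:, scores > scores.mean()]   # mean score as an illustrative cut-off

# Balance the classes before training the random forest.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_sel, y)
rf = RandomForestClassifier(random_state=0)
print("CV accuracy:", cross_val_score(rf, X_bal, y_bal, cv=5).mean())
```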
APA, Harvard, Vancouver, ISO, and other styles
44

Alemneh, Girma Neshir, Hirut Bekele Ashagrie, and Lemlem K. Tegegne. "Feature Selection Methods for ICU Mortality Prediction Model." Journal of Computational Science and Data Analytics 01, no. 1 (2024): 14–38. http://dx.doi.org/10.69660/jcsda.01012402.

Full text
Abstract:
The goal of this research is to offer insightful information that can improve Ethiopia's intensive care unit (ICU) services. There is an increased risk of patient death in Intensive Care Units (ICUs) because of several variables, including preexisting medical issues, lack of resources, and delayed decisions. Healthcare professionals can better prioritize the patients most in need of intensive care, distribute resources more efficiently, and enhance patient outcomes by using predictive models to estimate ICU mortality. ICU data were collected from five Ethiopian public hospitals to develop a machine learning method for predicting ICU mortality. The data include demographic features, vital signs, lab results, and discharge status of 10,798 ICU dataset records. We employed a range of feature selection techniques, including filter, wrapper, and embedded methods, to identify the most crucial features for mortality prediction. We also compared the performance of two machine learning algorithms, Random Forest and Decision Tree. These models were trained using ICU data with features encompassing age, length of stay, temperature, neutrophils, Diagnosis (DX) condition, pH, and Lymphocyte count. These features were selected using Recursive Feature Elimination (RFE). On a number of different evaluation metrics, including accuracy (99.7%), precision (99.4%), recall (98.8%), F1 score (99.1%), and area under the curve (AUC) (99.3%), Random Forest performed better than Decision Tree. Based on our findings, we made recommendations for healthcare practitioners and policymakers, and we suggest key future research directions for researchers in the area.
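Recursive Feature Elimination as used here can be sketched with scikit-learn; the synthetic data stands in for the non-public ICU records, and the seven retained features mirror the count of predictors listed in the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 10,798 ICU records (the real data is not public).
X, y = make_classification(n_samples=10798, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# RFE repeatedly drops the least important feature until seven remain,
# mirroring the seven predictors reported in the abstract.
rfe = RFE(RandomForestClassifier(random_state=0), n_features_to_select=7)
rfe.fit(X_tr, y_tr)
print("kept:", rfe.get_support(indices=True))
print(classification_report(y_te, rfe.predict(X_te)))
```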
APA, Harvard, Vancouver, ISO, and other styles
45

Morkonda Gunasekaran, Dinesh, and Prabha Dhandayudam. "Design of novel multi filter union feature selection framework for breast cancer dataset." Concurrent Engineering 29, no. 3 (2021): 285–90. http://dx.doi.org/10.1177/1063293x211016046.

Full text
Abstract:
Nowadays, breast cancer is commonly diagnosed in women. Feature selection plays an important role when constructing a classification framework. We have proposed the Multi Filter Union (MFU) feature selection method for breast cancer datasets. The feature selection process uses a union model based on the random forest algorithm and the logistic regression (LG) algorithm to select the important features in the dataset. The performance of the data analysis is evaluated using the optimal feature subset from the selected dataset. The experiments are computed with the Wisconsin Diagnostic Breast Cancer dataset and then with a real dataset from a women's health care center. The results of the proposed approach show high performance and efficiency when compared with existing feature selection algorithms.
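A sketch of the union idea on the same Wisconsin Diagnostic Breast Cancer data: take the top features by random forest importance and by absolute logistic regression coefficient, then keep their union. The per-method top-10 cut-off is an illustrative assumption, not the paper's MFU rule.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Top features by random forest importance ...
rf = RandomForestClassifier(random_state=0).fit(X, y)
rf_top = set(np.argsort(rf.feature_importances_)[-10:])

# ... and by absolute logistic regression coefficient.
lr = LogisticRegression(max_iter=5000).fit(X_std, y)
lr_top = set(np.argsort(np.abs(lr.coef_[0]))[-10:])

# The union forms the selected subset, in the spirit of MFU.
union = sorted(rf_top | lr_top)
print(len(union), "features in the union:", union)
```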
APA, Harvard, Vancouver, ISO, and other styles
46

Yakovyna, Vitaliy S., and Ivan I. Symets. "Towards a software defect proneness model: feature selection." Applied Aspects of Information Technology 4, no. 4 (2021): 354–65. http://dx.doi.org/10.15276/aait.04.2021.5.

Full text
Abstract:
This article is focused on improving static models of software reliability by using machine learning methods to select the software code metrics that most strongly affect its reliability. The study used a merged dataset from the PROMISE Software Engineering repository, which contained data on testing software modules of five programs and twenty-one code metrics. For the prepared sample, the most important features that affect the quality of software code were selected using the following feature selection methods: Boruta, Stepwise selection, Exhaustive Feature Selection, Random Forest Importance, LightGBM Importance, Genetic Algorithms, Principal Component Analysis, and Xverse python. Based on voting over the results of these feature selection methods, a static (deterministic) model of software reliability was built, which establishes the relationship between the probability of a defect in a software module and the metrics of its code. It has been shown that this model includes such code metrics as the branch count of a program, McCabe's lines of code and cyclomatic complexity, Halstead's total number of operators and operands, intelligence, volume, and effort value. A comparison of the effectiveness of different feature selection methods was carried out, in particular a study of the effect of the feature selection method on the accuracy of classification using the following classifiers: Random Forest, Support Vector Machine, k-Nearest Neighbors, Decision Tree, AdaBoost, and Gradient Boosting. It has been shown that the use of any feature selection method increases the accuracy of classification by at least ten percent compared with the original dataset, which confirms the importance of this procedure for predicting software defects from metric datasets that contain a significant number of highly correlated software code metrics. It has been found that the best forecast accuracy for most classifiers was achieved using the set of features obtained from the proposed static model of software reliability. In addition, it has been shown that separate methods, such as Autoencoder, Exhaustive Feature Selection, and Principal Component Analysis, can also be used with an insignificant loss of classification and prediction accuracy.
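The voting step could look roughly like the following, using three of the listed selectors for brevity: each method nominates its top features, and features nominated by a majority survive. The parameters and the majority threshold are assumptions.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the PROMISE metrics data: twenty-one code metrics.
X, y = make_classification(n_samples=500, n_features=21, random_state=0)
k = 7
votes = Counter()

# Method 1: random forest importance.
rf = RandomForestClassifier(random_state=0).fit(X, y)
votes.update(np.argsort(rf.feature_importances_)[-k:])

# Method 2: mutual information filter.
votes.update(np.argsort(mutual_info_classif(X, y, random_state=0))[-k:])

# Method 3: recursive feature elimination with logistic regression.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X, y)
votes.update(rfe.get_support(indices=True))

# Features selected by a majority (at least 2 of 3) of the methods.
final = sorted(int(f) for f, v in votes.items() if v >= 2)
print("voted-in features:", final)
```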
APA, Harvard, Vancouver, ISO, and other styles
47

Wang, Chung-Ying, Chien-Yao Huang, and Yen-Han Chiang. "Solutions of Feature and Hyperparameter Model Selection in the Intelligent Manufacturing." Processes 10, no. 5 (2022): 862. http://dx.doi.org/10.3390/pr10050862.

Full text
Abstract:
In the era of Industry 4.0, numerous AI technologies have been widely applied. However, implementation of AI technology requires observation, analysis, and pre-processing of the obtained data, which takes up 60–90% of the total time after data collection. Next, sensors and features are selected. Finally, the AI algorithms are used for clustering or classification. Even after data pre-processing is complete, the subsequent feature selection and hyperparameter tuning of the AI model affect the sensitivity, accuracy, and robustness of the system. In this study, two novel approaches are proposed: a sensor and feature selection system, and a hyperparameter tuning mechanism. In the sensor and feature selection system, the SHapley Additive exPlanations model is used to calculate the contribution of individual features or sensors and to make the black-box AI model transparent, whereas in the hyperparameter tuning mechanism, Hyperopt is used to improve model performance. Implementation of these two new systems is expected to reduce the most frequently occurring problems: selection of the most sensitive features in the pre-processing stage, and tuning of hyperparameters. These methods are also applicable to tool wear monitoring systems in intelligent manufacturing.
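A minimal sketch of ranking sensors or features by SHAP contributions, assuming the shap package is installed; a regressor stands in for the tool wear model, and the Hyperopt tuning stage is omitted.

```python
import numpy as np
import shap  # assumes the shap package is installed
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy stand-in for sensor readings in a tool wear monitoring setting.
X, y = make_regression(n_samples=300, n_features=10, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# TreeExplainer gives per-sample, per-feature contributions to the prediction.
shap_values = shap.TreeExplainer(model).shap_values(X)

# Rank sensors/features by mean absolute contribution.
importance = np.abs(shap_values).mean(axis=0)
for idx in np.argsort(importance)[::-1][:5]:
    print(f"feature {idx}: mean |SHAP| = {importance[idx]:.3f}")
```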
APA, Harvard, Vancouver, ISO, and other styles
48

Wang, Yuehan, Tong Li, Yongquan Cai, Zhenhu Ning, Fei Xue, and Di Jiao. "A Novel Anti-Obfuscation Model for Detecting Malicious Code." International Journal of Open Source Software and Processes 8, no. 2 (2017): 25–43. http://dx.doi.org/10.4018/ijossp.2017040102.

Full text
Abstract:
In this article, the authors present a new malicious code detection model. The detection model improves typical n-gram feature extraction algorithms, which are easy to obfuscate. Specifically, the proposed model can dynamically determine obfuscation features and then adjust the selection of meaningful features to improve the corresponding machine learning analysis. The experimental results show that the feature database, built with the proposed feature selection and cleaning method, contains a stable number of features and can automatically discard obfuscation features. Overall, the proposed detection model offers long-term effectiveness, broad applicability, and high identification accuracy.
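For context, classic n-gram feature extraction, the baseline this model hardens, can be sketched as below; the opcode-like token strings are invented stand-ins for disassembled samples, and the dynamic obfuscation-feature cleaning itself is not reproduced.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Opcode-like token sequences standing in for disassembled samples (assumption).
samples = ["push mov call pop ret",
           "push push mov jmp ret",
           "nop nop mov call ret"]

# Classic n-gram features: every contiguous run of 2-3 tokens becomes a feature.
vec = CountVectorizer(analyzer="word", ngram_range=(2, 3))
X = vec.fit_transform(samples)
print(len(vec.get_feature_names_out()), "n-gram features")
# An obfuscator can pad junk tokens to perturb these counts, which is the
# weakness the proposed model's dynamic obfuscation-feature cleaning targets.
```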
APA, Harvard, Vancouver, ISO, and other styles
49

Damodar, Patel, and Kumar Saxena Amit. "Feature Selection in High Dimension Datasets using Incremental Feature Clustering." Indian Journal of Science and Technology 17, no. 32 (2024): 3318–26. https://doi.org/10.17485/IJST/v17i32.2077.

Full text
Abstract:
Objectives: To develop a machine learning based model that selects the most important features from a high-dimensional dataset, classifying patterns with high accuracy while reducing dimensionality. Methods: The proposed feature selection method (FSIFC) forms and combines feature clusters incrementally, producing a feature subset at each step. The method uses K-means clustering and Mutual Information (MI) to refine the feature selection process iteratively. Initially, two clusters of features are formed using K-means clustering (K=2), taking the features as the basis of clustering instead of the patterns (the traditional way). From each of the two clusters, the feature with the highest MI value is kept in a feature subset. Classification accuracies (CA) of the feature subset are calculated using three classifiers, namely Support Vector Machine (SVM), Random Forest (RF), and k-Nearest Neighbour (k-NN). The process is repeated, incrementing the number of clusters K, until a maximum user-defined value of K is reached. The best CA obtained from these trials is recorded and the corresponding feature set is finally accepted. Findings: The proposed method is demonstrated on ten datasets and the results are compared with existing published results using the three classifiers. The ten datasets are classified with average CAs of 92.72%, 93.13%, and 91.5% using the SVM, RF, and k-NN classifiers respectively. The proposed method selects at most thirty features from each dataset. In terms of selecting the most effective and the smallest feature sets, the proposed method outperforms eight other feature selection methods on CA. Novelty: The proposed model applies feature reduction using combined feature clustering and filter methods in an incremental way. This improves the selection of relevant features while removing irrelevant ones across the trials. Keywords: Feature selection, High-dimensional datasets, K-means algorithm, Mutual information, Machine learning
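A compact sketch of the FSIFC loop described above: cluster the feature columns with K-means, keep the highest-MI feature per cluster, and repeat for increasing K. The synthetic data and the K range are illustrative, and the classifier evaluation of each subset is omitted for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=50, n_informative=10,
                           random_state=0)
mi = mutual_info_classif(X, y, random_state=0)

def select_for_k(k):
    # Cluster the *features* (columns), not the samples, by transposing X.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X.T)
    # From each cluster keep the single feature with the highest MI score.
    return [int(np.flatnonzero(labels == c)[np.argmax(mi[labels == c])])
            for c in range(k)]

# Increment K and record the subset at each step, as FSIFC does; in the full
# method each subset would be scored with SVM, RF, and k-NN classifiers.
for k in range(2, 6):
    print(f"K={k}: features {select_for_k(k)}")
```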
APA, Harvard, Vancouver, ISO, and other styles
50

Fauzi, M. Ali, Agus Zainal Arifin, and Sonny Christiano Gosaria. "Indonesian News Classification Using Naïve Bayes and Two-Phase Feature Selection Model." Indonesian Journal of Electrical Engineering and Computer Science 8, no. 3 (2017): 610. http://dx.doi.org/10.11591/ijeecs.v8.i3.pp610-615.

Full text
Abstract:
Since the rise of the WWW, the information available online has been growing rapidly; one example is Indonesian online news. Automatic text classification has therefore become a very important task for information filtering. One of the major issues in text classification is the high dimensionality of the feature space. Most of the features are irrelevant, noisy, or redundant, which may lower the accuracy of the system; hence, feature selection is needed. Maximal Marginal Relevance for Feature Selection (MMR-FS) has been proven to be a good feature selection method for text with many redundant features, but it has high computational complexity. In this paper, we propose a two-phase feature selection method. In the first phase, to lower the complexity of MMR-FS, we utilize Information Gain to reduce the features; this reduced feature set is then refined using MMR-FS in the second phase. The experimental results showed that our new method reaches the best accuracy of 86%. The new method lowers the complexity of MMR-FS while retaining its accuracy.
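A rough sketch of the two phases, assuming a simplified MMR criterion (relevance minus maximal correlation with already-chosen features) rather than the paper's exact MMR-FS formulation; mutual information stands in for Information Gain, and all sizes are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=400, n_features=60, n_informative=8,
                           random_state=0)

# Phase 1: Information Gain filter keeps the 20 highest-scoring features,
# shrinking the pool that the costlier MMR phase must search.
ig = mutual_info_classif(X, y, random_state=0)
cand = list(np.argsort(ig)[-20:])

# Phase 2: MMR-style greedy pick balancing relevance against redundancy.
lam, selected = 0.7, []
while len(selected) < 8:
    def mmr(f):
        red = max((abs(np.corrcoef(X[:, f], X[:, s])[0, 1]) for s in selected),
                  default=0.0)
        return lam * ig[f] - (1 - lam) * red
    best = max((f for f in cand if f not in selected), key=mmr)
    selected.append(int(best))
print("final features:", selected)
```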
APA, Harvard, Vancouver, ISO, and other styles
