
Journal articles on the topic 'Feature selection'



Consult the top 50 journal articles for your research on the topic 'Feature selection.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Huber, Florian, and Volker Steinhage. "Conditional Feature Selection: Evaluating Model Averaging When Selecting Features with Shapley Values." Geomatics 4, no. 3 (August 8, 2024): 286–310. http://dx.doi.org/10.3390/geomatics4030016.

Full text
Abstract:
Artificial intelligence (AI), and especially machine learning (ML), is rapidly transforming the field of geomatics with respect to collecting, managing, and analyzing spatial data. Feature selection as a building block in ML is crucial because it directly impacts the performance and predictive power of a model by selecting the most critical variables and eliminating the redundant and irrelevant ones. Random forests have now been used for decades and allow for building models with high accuracy. However, finding the most expressive features of a dataset by selecting the most important features within random forests remains a challenging question. The often-used internal Gini importances of random forests are based on the number of training examples that are split by a feature but fail to account for the magnitude of change in the target variable, leading to suboptimal selections. Shapley values are an established and unified framework for feature attribution, i.e., specifying how much each feature in a trained ML model contributes to the predictions for a given instance. Previous studies highlight the effectiveness of Shapley values for feature selection in real-world applications, while other research emphasizes certain theoretical limitations. This study provides an application-driven discussion of Shapley values for feature selection by first proposing four necessary conditions for a successful feature selection with Shapley values, extracted from a multitude of critical research in the field. Even when these conditions are met, Shapley value feature selection is by definition a model averaging procedure, in which unimportant features can alter the final selection. Therefore, we additionally present Conditional Feature Selection (CFS) as a novel algorithm for performing feature selection that mitigates this problem and use it to evaluate the impact of model averaging in several real-world examples, covering the use of ML in geomatics. The results of this study show that Shapley values are a good measure for feature selection when compared with Gini feature importances on four real-world examples, improving the RMSE by 5% when averaged over selections of all possible subset sizes. An even better selection can be achieved by CFS, improving on the Gini selection by approximately 7.5% in terms of RMSE. For random forests, Shapley value calculation can be performed in polynomial time, offering an advantage over the exponential runtime of CFS and creating a trade-off against the accuracy lost in feature selection due to model averaging.
APA, Harvard, Vancouver, ISO, and other styles
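The Gini-versus-Shapley comparison described in the entry above can be illustrated with a short sketch. This is not the authors' CFS algorithm; it is a minimal illustration, assuming the `shap` package and a synthetic regression dataset, that simply ranks features by mean absolute SHAP value versus the forest's built-in Gini importances.

```python
# Minimal sketch: rank features by Gini importance vs. mean |SHAP| on a random forest.
# The DataFrame X and target y are hypothetical stand-ins, not the paper's geomatics data.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 8)), columns=[f"f{i}" for i in range(8)])
y = 3 * X["f0"] + X["f1"] ** 2 + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Built-in Gini (impurity-based) importances.
gini_rank = pd.Series(forest.feature_importances_, index=X.columns).sort_values(ascending=False)

# Shapley values via TreeExplainer (polynomial time for tree ensembles).
explainer = shap.TreeExplainer(forest)
shap_values = explainer.shap_values(X)                     # shape: (n_samples, n_features)
shap_rank = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns).sort_values(ascending=False)

k = 3                                                      # keep the top-k features under each criterion
print("Gini top-k:", list(gini_rank.index[:k]))
print("SHAP top-k:", list(shap_rank.index[:k]))
```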
2

Usha, P., and J. G. R. Sathiaseelan. "Enhanced Filtrate Feature Selection Algorithm for Feature Subset Generation." Indian Journal Of Science And Technology 17, no. 29 (July 31, 2024): 3002–11. http://dx.doi.org/10.17485/ijst/v17i29.2127.

Full text
Abstract:
Objectives: In the bioinformatics field, feature selection plays a vital role in selecting relevant features for making better decisions and assessing disease diagnosis. Brain Tumour (BT) is the second leading disease in the world. Most BT detection techniques are based on Magnetic Resonance (MR) images. Methods: In this paper, medical reports are used in the detection of BT to increase the surveillance of patients. To improve the accuracy of predictive models, a new adaptive technique called the Enhanced Filtrate Feature Selection (EFFS) algorithm for optimal feature selection is proposed. Initially, the EFFS algorithm finds the dependency of each attribute and the feature score by using the Mutual Information Gain, Chi-Square, Correlation, and Fisher score filter methods. Afterward, the occurrence rate of each top-ranked attribute is filtered by applying a threshold value, and the optimal features are obtained by using the Pareto principle. Findings: The performance of the selected optimal features is evaluated by time complexity, number of features selected, and accuracy. The efficiency of the proposed algorithm is measured and analyzed on a high-quality optimal subset based on a Random Forest classifier integrated with the ranking of attributes. The EFFS algorithm selects 39 out of 46 significant and relevant features with minimum selection time and shows 99.31% accuracy for BT, 29 features with 99.47% accuracy for Breast Cancer, 15 features with 94.61% accuracy for Lung Cancer, 15 features with 98.84% accuracy for Diabetes, and 43 features with 90% accuracy for the Covid-19 dataset. Novelty: To decrease processing time and improve model performance, the feature selection process is carried out at the initial stages for the betterment of the classification task. The proposed EFFS algorithm is applied to different datasets based on medical reports and outperforms alternatives in both performance measures and selection time. Appropriate feature selection techniques help to diagnose diseases at an early phase and increase the survival of human beings. Keywords: Bioinformatics, Brain Tumour, Chi-Square, Correlation, EFFS, Feature Selection, Fisher Score, Information Gain, Optimal Features, Random Forest
APA, Harvard, Vancouver, ISO, and other styles
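As a rough illustration of the multi-filter idea in the EFFS abstract above (not the authors' exact algorithm), the sketch below scores features with mutual information, chi-square, ANOVA F (standing in for a Fisher-score style filter) and absolute correlation, counts how often each feature lands in a filter's top ranks, and keeps the features whose occurrence passes a threshold. The dataset, the top-k cut-off and the vote threshold are all assumptions.

```python
# Sketch of a filter-ensemble feature selection: rank features under several filters,
# then keep the features that appear in the top ranks of enough filters (threshold rule).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, chi2, f_classif
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_pos = MinMaxScaler().fit_transform(X)            # chi2 requires non-negative inputs

scores = {
    "mutual_info": mutual_info_classif(X, y, random_state=0),
    "chi2": chi2(X_pos, y)[0],
    "anova_f": f_classif(X, y)[0],                 # stand-in for a Fisher-score style filter
    "abs_corr": np.abs(np.corrcoef(X, y[:, None], rowvar=False)[-1, :-1]),
}

top_k = 15
votes = np.zeros(X.shape[1], dtype=int)
for s in scores.values():
    votes[np.argsort(s)[::-1][:top_k]] += 1        # one vote per filter for each top-k feature

selected = np.flatnonzero(votes >= 3)              # occurrence threshold: at least 3 of the 4 filters
print("selected feature indices:", selected)
```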
3

Wu, Xindong, Kui Yu, Wei Ding, Hao Wang, and Xingquan Zhu. "Online Feature Selection with Streaming Features." IEEE Transactions on Pattern Analysis and Machine Intelligence 35, no. 5 (May 2013): 1178–92. http://dx.doi.org/10.1109/tpami.2012.197.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Li, Jundong, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, and Huan Liu. "Feature Selection." ACM Computing Surveys 50, no. 6 (January 12, 2018): 1–45. http://dx.doi.org/10.1145/3136625.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Sutherland, Stuart. "Feature selection." Nature 392, no. 6674 (March 1998): 350. http://dx.doi.org/10.1038/32817.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Patel, Damodar, and Amit Kumar Saxena. "Feature Selection in High Dimension Datasets using Incremental Feature Clustering." Indian Journal Of Science And Technology 17, no. 32 (August 24, 2024): 3318–26. http://dx.doi.org/10.17485/ijst/v17i32.2077.

Full text
Abstract:
Objectives: To develop a machine learning-based model that selects the most important features from a high-dimensional dataset in order to classify patterns with high accuracy and reduce dimensionality. Methods: The proposed feature selection method (FSIFC) forms and combines feature clusters incrementally and produces a feature subset at each step. The method uses K-means clustering and Mutual Information (MI) to refine the feature selection process iteratively. Initially, two clusters of features are formed using K-means clustering (K=2), taking features as the basis of clustering instead of the patterns (the traditional way). From these two clusters, the feature with the highest MI value in each cluster is kept in a feature subset. Classification accuracies (CA) of the feature subset are calculated using three classifiers, namely Support Vector Machine (SVM), Random Forest (RF), and k-Nearest Neighbour (k-NN). The process is repeated by incrementing the value of K (the number of clusters) until a maximum user-defined value of K is reached. The best CA obtained from these trials is recorded, and the corresponding feature set is finally accepted. Findings: The proposed method is demonstrated on ten datasets, and the results are compared with existing published results using the three classifiers to determine the method's performance. The ten datasets are classified with average CAs of 92.72%, 93.13%, and 91.5% using the SVM, RF, and k-NN classifiers, respectively. The proposed method selects a maximum of thirty features from the datasets. In terms of selecting the most effective and the smallest feature sets, the proposed method outperforms eight other feature selection methods with respect to CA. Novelty: The proposed model applies feature reduction using combined feature clustering and filter methods in an incremental way. This provides an improved selection of relevant features while removing irrelevant ones across different trials. Keywords: Feature selection, High-dimensional datasets, K-means algorithm, Mutual information, Machine learning
APA, Harvard, Vancouver, ISO, and other styles
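A minimal sketch of the clustering idea in the FSIFC abstract above (not the published implementation): the features, rather than the samples, are clustered with K-means on the transposed data matrix, the feature with the highest mutual information with the class is kept from each cluster, and the K yielding the best cross-validated accuracy is retained. The dataset, the classifier and the range of K are assumptions.

```python
# Sketch: incremental feature clustering. Cluster the *features* (columns) with K-means,
# keep the highest-MI feature per cluster, and pick the K giving the best CV accuracy.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)
mi = mutual_info_classif(X, y, random_state=0)

best_acc, best_subset = 0.0, None
for k in range(2, 11):                                  # user-defined maximum K assumed to be 10
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X.T)
    subset = [np.flatnonzero(labels == c)[np.argmax(mi[labels == c])] for c in range(k)]
    acc = cross_val_score(SVC(), X[:, subset], y, cv=5).mean()
    if acc > best_acc:
        best_acc, best_subset = acc, subset

print(f"best K = {len(best_subset)}, features = {sorted(best_subset)}, CV accuracy = {best_acc:.3f}")
```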
7

Wang, Gang, Yang Zhao, Jiasi Zhang, and Yongjie Ning. "A Novel End-To-End Feature Selection and Diagnosis Method for Rotating Machinery." Sensors 21, no. 6 (March 15, 2021): 2056. http://dx.doi.org/10.3390/s21062056.

Full text
Abstract:
Feature selection, which aims to obtain effective features from data, is also known as feature engineering. In traditional approaches, feature selection and predictive model learning are separated, which leads to a problem of inconsistent criteria. This paper presents an end-to-end feature selection and diagnosis method that organically unifies feature expression learning and machine prediction learning in one model. The algorithm first uses the prediction model to calculate the mean impact value (MIV) of each feature and performs primary feature selection for the prediction model by selecting the features with larger MIVs. To take the performance of the features themselves into account, a within-class and between-class discriminant analysis (WBDA) method is proposed and combined with a feature diversity strategy to realize a feature-oriented secondary selection. Eventually, the feature vectors obtained by the two selections are classified using a multi-class support vector machine (SVM). Compared with the modified network variable selection algorithm (MIVs), the principal component analysis dimensionality reduction algorithm (PCA), variable selection based on compensative distance evaluation technology (CDET), and other algorithms, the proposed MIVs-WBDA method exhibits excellent classification accuracy owing to the fusion of feature selection and predictive model learning. According to the results of classification accuracy testing after dimensionality reduction on rotating machinery status, the MIVs-WBDA method achieves a 3% classification accuracy improvement on the low-dimensional feature set. The typical running time of this classification learning algorithm is less than 10 s, whereas a comparable deep learning approach would require more than a few hours.
APA, Harvard, Vancouver, ISO, and other styles
8

Fahrudy, Dony, and Shofwatul 'Uyun. "Classification of Student Graduation using Naïve Bayes by Comparing between Random Oversampling and Feature Selections of Information Gain and Forward Selection." JOIV : International Journal on Informatics Visualization 6, no. 4 (December 31, 2022): 798. http://dx.doi.org/10.30630/joiv.6.4.982.

Full text
Abstract:
Class-imbalanced data with high attribute dimensions frequently cause problems in classification, because the imbalanced number of instances in each class and the irrelevant attributes that must be processed degrade an algorithm's performance; techniques are therefore needed to overcome class imbalance, together with feature selection to reduce data complexity and remove irrelevant features. This study applied the random oversampling (ROs) method to overcome the class-imbalanced data and compared two feature selections (information gain and forward selection) to determine which is superior, more effective, and more appropriate to apply. The results of feature selection were then used to classify student graduation by building a classification model with the Naïve Bayes algorithm. The study showed an increase in the average accuracy of the Naïve Bayes method from 81.83% without ROs preprocessing and feature selection, to 83.84% with ROs, 86.03% with information gain (3 selected features), and 86.42% with forward selection (2 selected features); this corresponds to accuracy gains of 4.2% from no pre-processing to information gain and 4.59% from no pre-processing to forward selection. The best feature selection was therefore forward selection with 2 selected features (the GPA of the 8th semester and the overall GPA), and ROs and both feature selections were shown to improve the performance of the Naïve Bayes method.
APA, Harvard, Vancouver, ISO, and other styles
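The pipeline compared in the entry above, random oversampling followed by either an information-gain filter or forward selection before a Naive Bayes classifier, can be sketched as below. This assumes the `imbalanced-learn` package and a generic dataset; the student graduation data and the specific selected attributes are not reproduced.

```python
# Sketch: random oversampling + (a) an information-gain-style filter or (b) forward selection,
# each followed by Gaussian Naive Bayes. Illustrative only, not the paper's data.
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_ros, y_ros = RandomOverSampler(random_state=0).fit_resample(X, y)

# (a) Information gain approximated by mutual information, keeping 3 features.
ig = SelectKBest(mutual_info_classif, k=3).fit(X_ros, y_ros)
acc_ig = cross_val_score(GaussianNB(), ig.transform(X_ros), y_ros, cv=10).mean()

# (b) Wrapper-style forward selection of 2 features around the same classifier.
sfs = SequentialFeatureSelector(GaussianNB(), n_features_to_select=2, direction="forward").fit(X_ros, y_ros)
acc_fs = cross_val_score(GaussianNB(), sfs.transform(X_ros), y_ros, cv=10).mean()

print(f"information gain (3 features): {acc_ig:.4f}")
print(f"forward selection (2 features): {acc_fs:.4f}")
```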
9

Kar Hoou, Hui, Ooi Ching Sheng, Lim Meng Hee, and Leong Mohd Salman. "Feature selection tree for automated machinery fault diagnosis." MATEC Web of Conferences 255 (2019): 02004. http://dx.doi.org/10.1051/matecconf/201925502004.

Full text
Abstract:
Intelligent machinery fault diagnosis commonly utilises statistical features of sensor signals as the inputs for its machine learning algorithm. Because of the abundance of statistical features that can be extracted from raw signals, inserting all the available features into the machine learning algorithm for machinery fault classification may inadvertently result in less accurate classification due to overfitting. It is therefore only by selecting the most representative features that overfitting can be avoided and classification accuracy improved. Currently, the genetic algorithm (GA) is regarded as the most commonly used and reliable feature selection tool for improving the accuracy of any machine learning algorithm. However, the greatest challenge for GA is that it may fall into local optima and be computationally demanding. To overcome this limitation, a feature selection tree (FST) is proposed here. Feature selection on numerous experimental datasets was executed using both FST and GA; their performance is compared and discussed. The analysis showed that the proposed FST resulted in identical or superior optimal feature subsets when compared to the renowned GA method, but with a simulation time roughly 20 times shorter. The proposed FST is therefore more efficient at performing the feature selection task than GA.
APA, Harvard, Vancouver, ISO, and other styles
10

Heriyanto, Heriyanto, and Dyah Ayu Irawati. "Comparison of Mel Frequency Cepstral Coefficient (MFCC) Feature Extraction, With and Without Framing Feature Selection, to Test the Shahada Recitation." RSF Conference Series: Engineering and Technology 1, no. 1 (December 23, 2021): 335–54. http://dx.doi.org/10.31098/cset.v1i1.395.

Full text
Abstract:
This research examines voice feature extraction using MFCC. Feature extraction is the first step in obtaining features, which must then be refined through feature selection. The feature selection in this research used the Dominant Weight feature for the Shahada voice, with the feature extraction producing frames and cepstral coefficients. Cepstral coefficients 0 to 23 (24 coefficients) were used, while the frames taken were frames 0 to 10 (eleven frames). Voting over 300 recorded voice samples was tested on 200 voice recordings of both male and female speakers. The sampling rate used was 44.1 kHz, 16-bit stereo. This research aimed to gain accuracy by selecting the right features in the frames extracted with MFCC and by matching accuracy with frame feature selection using Dominant Weight Normalization (NBD). The results showed that the MFCC method with selection of the 9th frame had a higher accuracy rate of 86% compared with other frames, while MFCC without feature selection averaged 60%. The conclusion is that selecting the right features in the 9th frame improves the accuracy of recognising the voice of the shahada recitation.
APA, Harvard, Vancouver, ISO, and other styles
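To make the frame-selection idea in the entry above concrete, the sketch below extracts 24 MFCC coefficients per frame with `librosa` and keeps a single frame's coefficient vector as the feature (e.g., frame index 9). The audio file name and the matching rule are assumptions; the Dominant Weight selection itself is not implemented here.

```python
# Sketch: extract MFCCs and keep one selected frame's coefficients as the feature vector.
# The file name is hypothetical; dominant-weight matching from the paper is not shown.
import librosa
import numpy as np

y, sr = librosa.load("shahada_recording.wav", sr=44100)   # 44.1 kHz, as in the study
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=24)        # shape: (24 coefficients, n_frames)

selected_frame = 9                                         # the frame reported as most accurate
feature_vector = mfcc[:, selected_frame]                   # 24 cepstral coefficients of that frame

# A naive matching rule: nearest stored template by Euclidean distance (illustrative only).
def match(template_vectors: np.ndarray, query: np.ndarray) -> int:
    return int(np.argmin(np.linalg.norm(template_vectors - query, axis=1)))
```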
11

Subrahmanyam, Somashekar R. "Fixturing features selection in feature-based systems." Computers in Industry 48, no. 2 (June 2002): 99–108. http://dx.doi.org/10.1016/s0166-3615(02)00037-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Zhou, Peng, Shu Zhao, Yuanting Yan, and Xindong Wu. "Online Scalable Streaming Feature Selection via Dynamic Decision." ACM Transactions on Knowledge Discovery from Data 16, no. 5 (October 31, 2022): 1–20. http://dx.doi.org/10.1145/3502737.

Full text
Abstract:
Feature selection is one of the core concepts in machine learning, which hugely impacts the model’s performance. For some real-world applications, features may exist in a stream mode that arrives one by one over time, while we cannot know the exact number of features before learning. Online streaming feature selection aims at selecting optimal stream features at each timestamp on the fly. Without the global information of the entire feature space, most of the existing methods select stream features in terms of individual feature information or the comparison of features in pairs. This article proposes a new online scalable streaming feature selection framework from the dynamic decision perspective that is scalable on running time and selected features by dynamic threshold adjustment. Regarding the philosophy of “Thinking-in-Threes”, we classify each new arrival feature as selecting, discarding, or delaying, aiming at minimizing the overall decision risks. With the dynamic updating of global statistical information, we add the selecting features into the candidate feature subset, ignore the discarding features, cache the delaying features into the undetermined feature subset, and wait for more information. Meanwhile, we perform the redundancy analysis for the candidate features and uncertainty analysis for the undetermined features. Extensive experiments on eleven real-world datasets demonstrate the efficiency and scalability of our new framework compared with state-of-the-art algorithms.
APA, Harvard, Vancouver, ISO, and other styles
13

Ramineni, Vyshnavi, and Goo-Rak Kwon. "Diagnosis of Alzheimer’s Disease using Wrapper Feature Selection Method." Korean Institute of Smart Media 12, no. 3 (April 30, 2023): 30–37. http://dx.doi.org/10.30693/smj.2023.12.3.30.

Full text
Abstract:
Alzheimer’s disease (AD) can currently only be slowed rather than cured, so early diagnosis is important and research is still ongoing. Accordingly, several machine learning classification models using T1-weighted images have been proposed to identify AD. In this paper, we consider an improved feature selection approach that reduces complexity by using wrapper techniques and a Restricted Boltzmann Machine (RBM). The present work used the subcortical and cortical features of 278 subjects from the ADNI dataset to identify AD from sMRI. Multi-class classification is used for the experiment, i.e., AD, EMCI, LMCI, and HC. The proposed feature selection consists of forward feature selection, backward feature selection, and combined PCA & RBM. Forward and backward feature selection are iterative methods: forward selection starts with no features, while backward selection starts with all features included. PCA is used to reduce the dimensions, and RBM is used to select the best features without interpreting them. We compared the three models for analysis. The experiments show that combined PCA & RBM and backward feature selection give the best accuracy with the RF classification model, i.e., 88.65% and 88.56%, respectively.
APA, Harvard, Vancouver, ISO, and other styles
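A compact sketch of the wrapper portion described in the entry above, forward selection starting from no features and backward elimination starting from all features, both wrapped around a random forest, is given below using scikit-learn; the ADNI features, the RBM step, and the reported accuracies are not reproduced.

```python
# Sketch: forward vs. backward wrapper feature selection around a random forest,
# plus a PCA baseline for comparison. The data are a stand-in, not the ADNI cohort.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0)

forward = SequentialFeatureSelector(rf, n_features_to_select=10, direction="forward").fit(X, y)
backward = SequentialFeatureSelector(rf, n_features_to_select=10, direction="backward").fit(X, y)

for name, selector in [("forward", forward), ("backward", backward)]:
    acc = cross_val_score(rf, selector.transform(X), y, cv=5).mean()
    print(f"{name} selection, 10 features: {acc:.3f}")

pca_acc = cross_val_score(make_pipeline(PCA(n_components=10), rf), X, y, cv=5).mean()
print(f"PCA (10 components): {pca_acc:.3f}")
```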
14

Wang, Jun, Yuanyuan Xu, Hengpeng Xu, Zhe Sun, Zhenglu Yang, and Jinmao Wei. "An Effective Multi-Label Feature Selection Model Towards Eliminating Noisy Features." Applied Sciences 10, no. 22 (November 15, 2020): 8093. http://dx.doi.org/10.3390/app10228093.

Full text
Abstract:
A consistently great amount of effort has been devoted to feature selection as a means of dimension reduction for various machine learning tasks. Existing feature selection models focus on selecting the most discriminative features for the learning targets. However, this strategy is weak in handling two kinds of features, the irrelevant and the redundant ones, which are collectively referred to as noisy features. These features may hamper the construction of optimal low-dimensional subspaces and compromise the learning performance of downstream tasks. In this study, we propose a novel multi-label feature selection approach that embeds label correlations (dubbed ELC) to address these issues. In particular, we extract label correlations for reliable label space structures and employ them to steer feature selection. In this way, label and feature spaces can be expected to be consistent, and noisy features can be effectively eliminated. An extensive experimental evaluation on public benchmarks validated the superiority of ELC.
APA, Harvard, Vancouver, ISO, and other styles
15

Balcarras, Matthew, Salva Ardid, Daniel Kaping, Stefan Everling, and Thilo Womelsdorf. "Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness." Journal of Cognitive Neuroscience 28, no. 2 (February 2016): 333–49. http://dx.doi.org/10.1162/jocn_a_00894.

Full text
Abstract:
Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.
APA, Harvard, Vancouver, ISO, and other styles
16

Li, Haiguang, Xindong Wu, Zhao Li, and Wei Ding. "Online Group Feature Selection from Feature Streams." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 29, 2013): 1627–28. http://dx.doi.org/10.1609/aaai.v27i1.8516.

Full text
Abstract:
Standard feature selection algorithms deal with given candidate feature sets at the individual feature level. When features exhibit certain group structures, it is beneficial to conduct feature selection in a grouped manner. For high-dimensional features, it can be far preferable to generate and process features online, one at a time, rather than wait for all features to be generated before learning begins. In this paper, we discuss a new and interesting problem of online group feature selection from feature streams, operating at both the group and individual feature levels simultaneously. Extensive experiments on both real-world and synthetic datasets demonstrate the superiority of the proposed algorithm.
APA, Harvard, Vancouver, ISO, and other styles
17

Zhao, Zheng, Lei Wang, and Huan Liu. "Efficient Spectral Feature Selection with Minimum Redundancy." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 3, 2010): 673–78. http://dx.doi.org/10.1609/aaai.v24i1.7671.

Full text
Abstract:
Spectral feature selection identifies relevant features by measuring their capability of preserving sample similarity. It provides a powerful framework for both supervised and unsupervised feature selection, and has been proven to be effective in many real-world applications. One common drawback associated with most existing spectral feature selection algorithms is that they evaluate features individually and cannot identify redundant features. Since redundant features can have a significant adverse effect on learning performance, it is necessary to address this limitation for spectral feature selection. To this end, we propose a novel spectral feature selection algorithm to handle feature redundancy, adopting an embedded model. The algorithm is derived from a formulation based on a sparse multi-output regression with an L2,1-norm constraint. We conduct theoretical analysis on the properties of its optimal solutions, paving the way for designing an efficient path-following solver. Extensive experiments show that the proposed algorithm can do well in both selecting relevant features and removing redundancy.
APA, Harvard, Vancouver, ISO, and other styles
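For readers unfamiliar with the notation, a common way to write the kind of sparse multi-output regression with an L2,1-norm penalty mentioned in the entry above is given below; this is the generic regularized form, not necessarily the exact objective or constraint formulation used in the paper.

```latex
\min_{W \in \mathbb{R}^{d \times k}} \; \lVert X W - Y \rVert_F^2 \;+\; \lambda \lVert W \rVert_{2,1},
\qquad
\lVert W \rVert_{2,1} = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{k} W_{ij}^2}
```

Rows of W driven to zero by the row-sparse penalty correspond to features that are discarded, so the surviving rows form the selected, non-redundant subset.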
18

V, Venkatesh, Sharan S B, Mahalaxmy S, Monisha S, Ashick Sanjey D S, and Ashokkumar P. "A Class Specific Feature Selection Method for Improving the Performance of Text Classification." Scalable Computing: Practice and Experience 25, no. 2 (February 24, 2024): 1018–28. http://dx.doi.org/10.12694/scpe.v25i2.2502.

Full text
Abstract:
Recently, a significant amount of research work has been carried out in the field of feature selection. Although these methods help to increase the accuracy of the machine learning classification, the selected subset of features considers all the classes and may not select recommendable features for a particular class. The main goal of our paper is to propose a new class-specific feature selection algorithm that is capable of selecting an appropriate subset of features for each class. In this regard, we first perform class binarization and then select the best features for each class. During the feature selection process, we deal with class imbalance problems and redundancy elimination. The Weighted Average Voting Ensemble method is used for the final classification. Finally, we carry out experiments to compare our proposed feature selection approach with the existing popular feature selection methods. The results prove that our feature selection method outperforms the existing methods with an accuracy of more than 37%.
APA, Harvard, Vancouver, ISO, and other styles
19

Gramegna, Alex, and Paolo Giudici. "Shapley Feature Selection." FinTech 1, no. 1 (February 25, 2022): 72–80. http://dx.doi.org/10.3390/fintech1010006.

Full text
Abstract:
Feature selection is a popular topic. The main approaches to it fall into the three main categories of filters, wrappers, and embedded methods. Advances in algorithms, though fruitful, may not be enough. We propose to integrate an explainable AI approach, based on Shapley values, to provide more accurate information for feature selection. We test our proposal in a real setting, which concerns the prediction of the probability of default of Small and Medium Enterprises. Our results show that the integrated approach may indeed prove fruitful for some feature selection methods, in particular more parsimonious ones like LASSO. In general, the combination of approaches seems to provide useful information with which feature selection algorithms can improve their performance.
APA, Harvard, Vancouver, ISO, and other styles
20

Somol, P., and P. Pudil. "Feature selection toolbox." Pattern Recognition 35, no. 12 (December 2002): 2749–59. http://dx.doi.org/10.1016/s0031-3203(01)00245-x.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Ramze Rezaee, M., B. Goedhart, B. P. F. Lelieveldt, and J. H. C. Reiber. "Fuzzy feature selection." Pattern Recognition 32, no. 12 (December 1999): 2011–19. http://dx.doi.org/10.1016/s0031-3203(99)00005-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Liu, H., E. R. Dougherty, J. G. Dy, K. Torkkola, E. Tuv, H. Peng, C. Ding, et al. "Evolving feature selection." IEEE Intelligent Systems 20, no. 6 (November 2005): 64–76. http://dx.doi.org/10.1109/mis.2005.105.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

de Souza, Jerffeson Teixeira, Stan Matwin, and Nathalie Japkowicz. "Parallelizing Feature Selection." Algorithmica 45, no. 3 (May 24, 2006): 433–56. http://dx.doi.org/10.1007/s00453-006-1220-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Moran, Michal, and Goren Gordon. "Curious Feature Selection." Information Sciences 485 (June 2019): 42–54. http://dx.doi.org/10.1016/j.ins.2019.02.009.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Yang, Yanyan, Degang Chen, Xiao Zhang, Zhenyan Ji, and Yingjun Zhang. "Incremental feature selection by sample selection and feature-based accelerator." Applied Soft Computing 121 (May 2022): 108800. http://dx.doi.org/10.1016/j.asoc.2022.108800.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Luque-Rodriguez, Maria, Jose Molina-Baena, Alfonso Jimenez-Vilchez, and Antonio Arauzo-Azofra. "Initialization of Feature Selection Search for Classification." Journal of Artificial Intelligence Research 75 (November 27, 2022): 953–83. http://dx.doi.org/10.1613/jair.1.14015.

Full text
Abstract:
Selecting the best features in a dataset improves accuracy and efficiency of classifiers in a learning process. Datasets generally have more features than necessary, some of them being irrelevant or redundant to others. For this reason, numerous feature selection methods have been developed, in which different evaluation functions and measures are applied. This paper proposes the systematic application of individual feature evaluation methods to initialize search-based feature subset selection methods. An exhaustive review of the starting methods used by genetic algorithms from 2014 to 2020 has been carried out. Subsequently, an in-depth empirical study has been carried out evaluating the proposal for different search-based feature selection methods (Sequential forward and backward selection, Las Vegas filter and wrapper, Simulated Annealing and Genetic Algorithms). Since the computation time is reduced and the classification accuracy with the selected features is improved, the initialization of feature selection proposed in this work is proved to be worth considering while designing any feature selection algorithms.
APA, Harvard, Vancouver, ISO, and other styles
27

Zabidi, A., W. Mansor, and Khuan Y. Lee. "Optimal Feature Selection Technique for Mel Frequency Cepstral Coefficient Feature Extraction in Classifying Infant Cry with Asphyxia." Indonesian Journal of Electrical Engineering and Computer Science 6, no. 3 (June 1, 2017): 646. http://dx.doi.org/10.11591/ijeecs.v6.i3.pp646-655.

Full text
Abstract:
Mel Frequency Cepstral Coefficient is an efficient feature representation method for extracting human-audible audio signals. However, its representation of features is large and redundant. Therefore, feature selection is required to select the optimal subset of Mel Frequency Cepstral Coefficient features. The performance of two types of feature selection techniques, Orthogonal Least Squares and F-ratio, for selecting Mel Frequency Cepstral Coefficient features of infant cry with asphyxia was examined. OLS selects the feature subset based on their contribution to the reduction of error, while F-Ratio selects them according to their discriminative abilities. The feature selection techniques were combined with Multilayer Perceptron to distinguish between asphyxiated infant cry and normal cry signals. The performance of the feature selection methods was examined by analysing the Multilayer Perceptron classification accuracy resulting from the combination of the feature selection techniques and Multilayer Perceptron. The results indicate that Orthogonal Least Squares is the most suitable feature selection method in classifying infant cry with asphyxia since it produces the highest classification accuracy.
APA, Harvard, Vancouver, ISO, and other styles
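The F-ratio branch of the comparison in the entry above is straightforward to sketch with scikit-learn: the ANOVA F statistic ranks the MFCC-derived features by their discriminative ability, and a multilayer perceptron is trained on the top-ranked subset. The orthogonal least squares branch and the infant-cry data are not reproduced; the feature matrix here is a random stand-in.

```python
# Sketch: F-ratio (ANOVA F) ranking of MFCC-derived features followed by an MLP classifier.
# X_mfcc is a random stand-in for per-sample MFCC feature vectors; labels are hypothetical.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_mfcc = rng.normal(size=(200, 36))          # 200 cry samples x 36 MFCC-derived features (assumed)
y = rng.integers(0, 2, size=200)             # 0 = normal cry, 1 = asphyxiated cry (assumed)

model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=12),            # keep the 12 features with the largest F-ratio
    MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
)
print("CV accuracy:", cross_val_score(model, X_mfcc, y, cv=5).mean())
```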
28

Mitra, P., C. A. Murthy, and S. K. Pal. "Unsupervised feature selection using feature similarity." IEEE Transactions on Pattern Analysis and Machine Intelligence 24, no. 3 (March 2002): 301–12. http://dx.doi.org/10.1109/34.990133.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Yen Huang, Jia. "Feature Selection for Cloud Computing Patents Classification." International Journal of Social Science and Humanity 6, no. 7 (July 2016): 541–46. http://dx.doi.org/10.7763/ijssh.2016.v6.707.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Tatwani, Shaveta, and Ela Kumar. "Parametric Comparison of Various Feature Selection Techniques." Journal of Advanced Research in Dynamical and Control Systems 11, no. 10-SPECIAL ISSUE (October 31, 2019): 1180–90. http://dx.doi.org/10.5373/jardcs/v11sp10/20192961.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Yuan, Xu, Jeng-Shyang Pan, Ai-Qing Tian, and Shu-Chuan Chu. "Binary Sparrow Search Algorithm for Feature Selection." 網際網路技術學刊 24, no. 2 (March 2023): 217–32. http://dx.doi.org/10.53106/160792642023032402001.

Full text
Abstract:
The sparrow search algorithm (SSA) is a novel intelligent optimization algorithm that simulates the foraging and anti-predation behavior of sparrows. SSA can optimize continuous problems, but in reality many problems are binary. In this paper, the binary sparrow search algorithm (BSSA) is proposed to solve binary optimization problems such as feature selection. The transfer function is crucial to BSSA and directly affects its performance. This paper proposes three new transfer functions to improve the performance of BSSA. Mathematical analysis revealed that the original SSA scrounger position update equation is no longer suited to BSSA, so this paper improves the position update equation. We compared BSSA with the BPSO, BGWO, and BBA algorithms and tested them on 23 benchmark functions. In addition, statistical analysis of the experimental results using the Friedman test and the Wilcoxon rank-sum test was performed to verify the effectiveness of BSSA. Finally, the algorithm was used to successfully implement feature selection and obtained satisfactory results on UCI datasets.
APA, Harvard, Vancouver, ISO, and other styles
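The role of the transfer function mentioned in the entry above can be shown in isolation: it maps a sparrow's continuous position component to a probability, which is then thresholded into a 0/1 feature mask. The S-shaped function below is the classic sigmoid choice, given only as an example; the paper's three new transfer functions are not reproduced.

```python
# Sketch: converting a continuous position vector into a binary feature mask
# via an S-shaped transfer function, as used by binary swarm optimizers.
import numpy as np

def s_shaped_transfer(position: np.ndarray) -> np.ndarray:
    """Classic sigmoid transfer: probability that each bit is set to 1."""
    return 1.0 / (1.0 + np.exp(-position))

def binarize(position: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample a 0/1 feature mask from the transfer probabilities."""
    return (rng.random(position.shape) < s_shaped_transfer(position)).astype(int)

rng = np.random.default_rng(0)
continuous_position = rng.normal(size=10)      # one sparrow's position over 10 candidate features
mask = binarize(continuous_position, rng)
print("feature mask:", mask)                   # 1 = feature selected, 0 = feature discarded
```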
32

Muthukrishnan, R., and C. K. James. "The Effect of Multicollinearity on Feature Selection." Indian Journal Of Science And Technology 17, no. 35 (September 9, 2024): 3664–68. http://dx.doi.org/10.17485/ijst/v17i35.1876.

Full text
Abstract:
Objectives: To provide a new LASSO-based feature selection technique that aids in selecting important variables for predicting the response variable in the presence of multicollinearity. Methods: LASSO is a regression method employed to select important covariates for predicting a dependent variable. The traditional LASSO method uses the conventional Ordinary Least Squares (OLS) approach for this purpose, and the OLS-based LASSO gives unreliable results if the data deviate from normality. This study therefore recommends using a Redescending M-estimator-based LASSO approach. The efficacy of this new method is checked against the ordinary LASSO method using a real dataset and a simulation study with various sample sizes (N=100, 200, 1000), different numbers of predictors (p=10, 15, 20), and varying degrees of correlation (ρ = 0.96, 0.98, 0.999). Findings: The usual OLS-based LASSO finds it difficult to select important variables when the independent variables are correlated. The Redescending M-estimator-based LASSO addresses the pitfalls faced by the conventional LASSO methodology. Among other things, the proposed method is far better than the conventional LASSO since it picks out significant factors more effectively, particularly in the presence of multicollinearity. Novelty: The conventional OLS-based LASSO approach selects a greater number of non-significant variables in the presence of multicollinearity, whereas the proposed Redescending M-estimator-based LASSO approach selects the important variables in the presence of multicollinearity. Keywords: Feature Selection, LASSO, MDAE, VIF, Variable Selection
APA, Harvard, Vancouver, ISO, and other styles
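For reference, the conventional (OLS-based) LASSO selection that the entry above argues against amounts to the few lines below: non-zero coefficients of a cross-validated LASSO define the selected variables. The redescending M-estimator variant proposed by the authors is not implemented here, and the highly correlated design matrix is synthetic.

```python
# Sketch: conventional LASSO feature selection on a deliberately collinear design.
# Features with non-zero coefficients are "selected"; this is the baseline the paper improves on.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 200, 10
base = rng.normal(size=(n, 1))
X = base + 0.05 * rng.normal(size=(n, p))       # highly correlated predictors (rho close to 1)
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=n)

X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)
selected = np.flatnonzero(lasso.coef_ != 0)
print("selected variables:", selected, "out of", p)
```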
33

Porebski, Alice, Vinh Truong Hoang, Nicolas Vandenbroucke, and Denis Hamad. "Combination of LBP Bin and Histogram Selections for Color Texture Classification." Journal of Imaging 6, no. 6 (June 23, 2020): 53. http://dx.doi.org/10.3390/jimaging6060053.

Full text
Abstract:
LBP (Local Binary Pattern) is a very popular texture descriptor largely used in computer vision. In most applications, LBP histograms are exploited as texture features, leading to a high-dimensional feature space, especially for color texture classification problems. In the past few years, different solutions were proposed to reduce the dimension of the feature space based on the LBP histogram. Most of these approaches apply feature selection methods in order to find the most discriminative bins. Recently, another strategy proposed selecting the most discriminant LBP histograms in their entirety. This paper aims to improve on these previous approaches and presents a combination of LBP bin and histogram selections, where a histogram ranking method is applied before processing a bin selection procedure. The proposed approach is evaluated on five benchmark image databases, and the obtained results show the effectiveness of the combination of LBP bin and histogram selections, which outperforms the simple LBP bin and LBP histogram selection approaches when they are applied independently.
APA, Harvard, Vancouver, ISO, and other styles
34

Baskar, S. S., and Dr L. Arockiam. "A Novel LAS-Relief Feature Selection Algorithm for Enhancing Classification Accuracy in Data mining." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 11, no. 8 (October 23, 2013): 2921–27. http://dx.doi.org/10.24297/ijct.v11i8.7047.

Full text
Abstract:
Feature selection is an important task in the data mining and machine learning domains. The main objective of feature selection is to find relevant features that predict the knowledge better than the original set of features; this can be achieved by removing irrelevant or redundant features from the original data sets. Feature selection thus involves the significant task of selecting relevant features from the feature space for data mining and pattern recognition. In this paper, a new approach to feature selection with Relief, based on a median-variance model, is introduced and named the LAS-Relief algorithm. Random selection of instances in the data sets leads to fluctuation of the feature weight estimates, which in turn leads to poor evaluation accuracy. The LAS-Relief algorithm stabilises the feature weight estimation compared with the mean-variance-based Relief algorithm and removes irrelevant features from the feature space. It incorporates both the median and the variance of the differences between instances as the criterion for feature weight estimation, which makes the results more stable and more accurate for classification. The relevant features obtained from the original feature space using LAS-Relief outperform those obtained with the mean-variance Relief algorithm.
APA, Harvard, Vancouver, ISO, and other styles
35

Han, Yuanyuan, Lan Huang, and Fengfeng Zhou. "Zoo: Selecting Transcriptomic and Methylomic Biomarkers by Ensembling Animal-Inspired Swarm Intelligence Feature Selection Algorithms." Genes 12, no. 11 (November 18, 2021): 1814. http://dx.doi.org/10.3390/genes12111814.

Full text
Abstract:
Biological omics data such as transcriptomes and methylomes have the inherent “large p small n” paradigm, i.e., the number of features is much larger than that of the samples. A feature selection (FS) algorithm selects a subset of the transcriptomic or methylomic biomarkers in order to build a better prediction model. The hidden patterns in the FS solution space make it challenging to achieve a feature subset with satisfying prediction performances. Swarm intelligence (SI) algorithms mimic the target searching behaviors of various animals and have demonstrated promising capabilities in selecting features with good machine learning performances. Our study revealed that different SI-based feature selection algorithms contributed complementary searching capabilities in the FS solution space, and their collaboration generated a better feature subset than the individual SI feature selection algorithms. Nine SI-based feature selection algorithms were integrated to vote for the selected features, which were further refined by the dynamic recursive feature elimination framework. In most cases, the proposed Zoo algorithm outperformed the existing feature selection algorithms on transcriptomics and methylomics datasets.
APA, Harvard, Vancouver, ISO, and other styles
36

Feng, Chao, Chao Qian, and Ke Tang. "Unsupervised Feature Selection by Pareto Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3534–41. http://dx.doi.org/10.1609/aaai.v33i01.33013534.

Full text
Abstract:
Dimensionality reduction is often employed to deal with the data with a huge number of features, which can be generally divided into two categories: feature transformation and feature selection. Due to the interpretability, the efficiency during inference and the abundance of unlabeled data, unsupervised feature selection has attracted much attention. In this paper, we consider its natural formulation, column subset selection (CSS), which is to minimize the reconstruction error of a data matrix by selecting a subset of features. We propose an anytime randomized iterative approach POCSS, which minimizes the reconstruction error and the number of selected features simultaneously. Its approximation guarantee is well bounded. Empirical results exhibit the superior performance of POCSS over the state-of-the-art algorithms.
APA, Harvard, Vancouver, ISO, and other styles
37

Venkatesh, B., and J. Anuradha. "Fuzzy Rank Based Parallel Online Feature Selection Method using Multiple Sliding Windows." Open Computer Science 11, no. 1 (January 1, 2021): 275–87. http://dx.doi.org/10.1515/comp-2020-0169.

Full text
Abstract:
Nowadays, in real-world applications, the dimensions of data are generated dynamically, and traditional batch feature selection methods are not suitable for streaming data. Online streaming feature selection methods have therefore gained more attention, but the existing methods have demerits such as low classification accuracy, failure to avoid redundant and irrelevant features, and a higher number of selected features. In this paper, we propose a parallel online feature selection method using multiple sliding windows and fuzzy fast-mRMR feature selection analysis, which selects minimally redundant and maximally relevant features and also overcomes the drawbacks of existing online streaming feature selection methods. To increase the speed of the proposed method, parallel processing is used. To evaluate the performance of the proposed online feature selection method, k-NN, SVM, and Decision Tree classifiers are used and compared against state-of-the-art online feature selection methods. Evaluation metrics such as Accuracy, Precision, Recall, and F1-Score are used on benchmark datasets for performance analysis. The experimental analysis shows that the proposed method achieves more than 95% accuracy on most of the datasets, performs well compared with other existing online streaming feature selection methods, and overcomes the drawbacks of the existing methods.
APA, Harvard, Vancouver, ISO, and other styles
38

Paul, Dipanjyoti, Rahul Kumar, Sriparna Saha, and Jimson Mathew. "Multi-objective Cuckoo Search-based Streaming Feature Selection for Multi-label Dataset." ACM Transactions on Knowledge Discovery from Data 15, no. 6 (May 19, 2021): 1–24. http://dx.doi.org/10.1145/3447586.

Full text
Abstract:
The feature selection method is the process of selecting only relevant features by removing irrelevant or redundant features from among the large number of features used to represent data. Nowadays, many application domains, especially social media networks, generate new features continuously at different time stamps. In such a scenario, when the features arrive in an online fashion, the selection task must also be a continuous process in order to cope with the continuous arrival of features. Therefore, a streaming feature selection approach has to be adopted, i.e., every time a new feature or a group of features arrives, the feature selection process has to be invoked. In addition, in recent years many application domains generate data where samples may belong to more than one class, called multi-label datasets. The multiple labels that the instances are associated with may have dependencies amongst themselves, and finding the correlation amongst the class labels helps to select discriminative features across multiple labels. In this article, we develop streaming feature selection methods for multi-label data where the multiple labels are reduced to a lower-dimensional space. Similar labels are grouped together before performing the selection, to improve the selection quality and to make the model time efficient. A multi-objective version of the cuckoo search approach is used to select the optimal feature set. The proposed method develops two versions of the streaming feature selection method: (1) when the features arrive individually and (2) when the features arrive in the form of a batch. Various multi-label datasets from domains such as text, biology, and audio have been used to test the developed streaming feature selection methods. The proposed methods are compared with many previous feature selection methods, and the comparison establishes the superiority of using multiple objectives and label correlation in the feature selection process.
APA, Harvard, Vancouver, ISO, and other styles
39

Solovei, Olga. "NEW ORGANIZATION PROCESS OF FEATURE SELECTION BY FILTER WITH CORRELATION-BASED FEATURES SELECTION METHOD." Innovative Technologies and Scientific Solutions for Industries, no. 3 (21) (November 18, 2022): 39–50. http://dx.doi.org/10.30837/itssi.2022.21.039.

Full text
Abstract:
The subject of the article is the feature selection techniques that are used in the data preprocessing step before building machine learning models. The focus is on the Filter technique when it uses Correlation-based Feature Selection (CFS) with the symmetrical uncertainty method (CFS-SU) or CFS with Pearson correlation (CFS-PearCorr). The goal of the work is to increase the efficiency of feature selection by Filter with CFS by proposing a new organization process for feature selection. The tasks solved in the article are: review and analysis of the existing organization process of feature selection by Filter with CFS; identification of the root causes of performance degradation; proposal of a new approach; and evaluation of the proposed approach. To implement these tasks, the following methods were used: information theory, process theory, algorithm theory, statistics theory, sampling techniques, data modeling theory, and scientific experiments. Results. The results show that: 1) the evaluation function for a chosen feature subset cannot be based only on the CFS merit, as this degrades the learning algorithm's results; 2) the accuracies of the classification learning algorithms improved and the coefficients of determination of the regression learning algorithms increased when features were selected according to the proposed new organization process. Conclusions. The new organization process for feature selection proposed in this work combines filter and learning algorithm properties in the evaluation strategy, which helps to choose the optimal feature subset for a predefined learning algorithm. The computational complexity of the proposed approach does not depend on the dataset's dimensions, which makes it robust to different data varieties, and it eliminates the time needed for feature subset search since subsets are selected randomly. The conducted experiments showed that classification and regression learning algorithms built with features selected according to the new flow outperformed the same learning algorithms built without applying the new process at the data preprocessing step.
APA, Harvard, Vancouver, ISO, and other styles
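The CFS merit referred to throughout the entry above is a standard quantity; one common form of it is sketched below so that the criticism of an evaluation function based only on CFS merit is concrete. The correlation estimates here use plain absolute Pearson correlation (the CFS-PearCorr flavour); symmetrical uncertainty would replace them in CFS-SU. The data are synthetic.

```python
# Sketch: the classic CFS merit of a feature subset,
#   merit = k * mean(feature-class corr) / sqrt(k + k*(k-1) * mean(feature-feature corr)),
# estimated with absolute Pearson correlations.
import numpy as np

def cfs_merit(X: np.ndarray, y: np.ndarray, subset: list) -> float:
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=100)
print(cfs_merit(X, y, [0, 1]), cfs_merit(X, y, [0, 1, 4]))   # adding an irrelevant feature lowers merit
```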
40

Nasib, Salmun K., Fadilah Istiqomah Pammus, Nurwan, and La Ode Nashar. "COMPARISON OF FEATURE SELECTION BASED ON COMPUTATION TIME AND CLASSIFICATION ACCURACY USING SUPPORT VECTOR MACHINE." Indonesian Journal of Applied Research (IJAR) 4, no. 1 (April 18, 2023): 63–74. http://dx.doi.org/10.30997/ijar.v4i1.252.

Full text
Abstract:
The goal of this research is to compare Chi-Square feature selection with Mutual Information feature selection in terms of computation time and classification accuracy. In this research, people's comments on Twitter are classified into positive, negative, and neutral sentiments using the Support Vector Machine method. Sentiment classification has the disadvantage of using many features, so feature selection is needed to optimize classification performance. Chi-Square feature selection and Mutual Information feature selection are both feature selection methods that can improve the accuracy of sentiment classification. The Twitter data were collected using an IDE application from Python. The results of this study indicate that sentiment classification using Chi-Square feature selection requires a computation time of 0.4375 seconds with an accuracy of 78%, while sentiment classification using Mutual Information feature selection produces an accuracy of 80% with a required computation time of 252.75 seconds. In terms of computation time, the Chi-Square feature selection is therefore superior to the Mutual Information feature selection, while in terms of classification accuracy, the Mutual Information feature selection is more accurate than the Chi-Square feature selection. Further research is recommended to use Mutual Information feature selection to obtain high-accuracy results in sentiment classification.
APA, Harvard, Vancouver, ISO, and other styles
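The time-versus-accuracy trade-off reported in the entry above is easy to reproduce in spirit: the sketch below times chi-square and mutual-information scoring of a bag-of-words matrix and evaluates an SVM on the selected terms. The corpus is a stand-in for the Twitter data (it is downloaded on first use), and the absolute timings will differ from the paper's.

```python
# Sketch: compare chi-square vs. mutual information feature selection on text,
# measuring both selection time and downstream SVM accuracy.
import time
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

data = fetch_20newsgroups(subset="train", categories=["rec.autos", "sci.space"])
X = TfidfVectorizer(max_features=5000).fit_transform(data.data)
y = data.target

for name, scorer in [("chi-square", chi2), ("mutual information", mutual_info_classif)]:
    start = time.perf_counter()
    X_sel = SelectKBest(scorer, k=500).fit_transform(X, y)
    elapsed = time.perf_counter() - start
    acc = cross_val_score(LinearSVC(), X_sel, y, cv=5).mean()
    print(f"{name:>18}: selection time {elapsed:.2f}s, accuracy {acc:.3f}")
```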
41

Guru, D. S., N. Vinay Kumar, and Mahamad Suhil. "Feature Selection of Interval Valued Data Through Interval K-Means Clustering." International Journal of Computer Vision and Image Processing 7, no. 2 (April 2017): 64–80. http://dx.doi.org/10.4018/ijcvip.2017040105.

Full text
Abstract:
This paper introduces a novel feature selection model for supervised interval-valued data based on interval K-Means clustering. The proposed model explores two kinds of feature selection through feature clustering, viz., class-independent feature selection and class-dependent feature selection. The former clusters the features spread across all the samples belonging to all the classes, whereas the latter clusters the features spread across only the samples belonging to the respective classes. Both feature selection models are demonstrated to explore the usefulness of clustering in selecting interval-valued features. For clustering, the kernel of the K-means clustering has been altered to operate on interval-valued data. For experimentation purposes, four standard benchmark datasets and three symbolic classifiers have been used. To corroborate the effectiveness of the proposed model, a comparative analysis against the state-of-the-art models is given, and the results show the superiority of the proposed model.
APA, Harvard, Vancouver, ISO, and other styles
42

Ismi, Dewi Pramudi, Shireen Panchoo, and Murinto Murinto. "K-means clustering based filter feature selection on high dimensional data." International Journal of Advances in Intelligent Informatics 2, no. 1 (March 31, 2016): 38. http://dx.doi.org/10.26555/ijain.v2i1.54.

Full text
Abstract:
With hundreds or thousands of features in high-dimensional data, the computational workload is challenging. In the classification process, features that do not contribute significantly to the prediction of classes add to the computational workload. The aim of this paper is therefore to use feature selection to decrease the computational load by reducing the size of high-dimensional data. Subsets of features that represent all features are selected; hence the process is two-fold: discarding irrelevant features and choosing one feature to represent a number of redundant features. There have been many studies regarding feature selection, for example backward feature selection and forward feature selection. In this study, a k-means clustering based feature selection is proposed. It is assumed that redundant features are located in the same cluster, whereas irrelevant features do not belong to any cluster. In this research, two different high-dimensional datasets are used: 1) the Human Activity Recognition Using Smartphones (HAR) dataset, containing 7352 data points, each with 561 features, and 2) the National Classification of Economic Activities dataset, which contains 1080 data points, each with 857 features. Both datasets provide class label information for each data point. Our experiment shows that k-means clustering based feature selection can be performed to produce subsets of features, and the resulting subsets return more than 80% classification accuracy.
APA, Harvard, Vancouver, ISO, and other styles
43

Matharaarachchi, Surani, Mike Domaratzki, and Saman Muthukumarana. "Minimizing features while maintaining performance in data classification problems." PeerJ Computer Science 8 (September 14, 2022): e1081. http://dx.doi.org/10.7717/peerj-cs.1081.

Full text
Abstract:
High dimensional classification problems have gained increasing attention in machine learning, and feature selection has become essential in executing machine learning algorithms. In general, most feature selection methods compare the scores of several feature subsets and select the one that gives the maximum score. There may be other selections of a lower number of features with a lower score, yet the difference is negligible. This article proposes and applies an extended version of such feature selection methods, which selects a smaller feature subset with similar performance to the original subset under a pre-defined threshold. It further validates the suggested extended version of the Principal Component Loading Feature Selection (PCLFS-ext) results by simulating data for several practical scenarios with different numbers of features and different imbalance rates on several classification methods. Our simulated results show that the proposed method outperforms the original PCLFS and existing Recursive Feature Elimination (RFE) by giving reasonable feature reduction on various data sets, which is important in some applications.
APA, Harvard, Vancouver, ISO, and other styles
44

Santiko, Irfan, and Ikhsan Honggo. "Naive Bayes Algorithm Using Selection of Correlation Based Featured Selections Features for Chronic Diagnosis Disease." IJIIS: International Journal of Informatics and Information Systems 2, no. 2 (September 1, 2019): 56–60. http://dx.doi.org/10.47738/ijiis.v2i2.14.

Full text
Abstract:
Chronic kidney disease (CKD) is a disease that can cause death, because its pathophysiology results in a progressive decline in renal function and ends in kidney failure. CKD has become a serious problem worldwide: kidney and urinary tract diseases cause the death of 850,000 people each year, placing the disease twelfth among the causes with the highest mortality rates. Several studies in the health field, including on chronic kidney disease, have been carried out to detect the disease early. In this study, the Naive Bayes algorithm is tested to classify patients as CKD-positive or CKD-negative. The accuracy of the algorithm is compared before and after feature selection with Correlation-based Feature Selection (CFS): Naive Bayes with feature selection achieves 93.58% accuracy, while Naive Bayes without feature selection achieves 93.54%. Both configurations fall within the excellent classification range, as their accuracy values lie between 0.90 and 1.00, with the feature-selected variant giving the slightly higher accuracy.
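A simplified sketch of the pipeline compared above (CFS-style feature selection followed by Naive Bayes) is given below. It is not the study's implementation: the greedy forward search and the use of absolute Pearson correlation as the relevance/redundancy measure are assumptions (Hall's CFS uses symmetrical uncertainty on discretized features).

```python
# Greedy CFS-style selection (merit = k*r_cf / sqrt(k + k(k-1)*r_ff)) + Naive Bayes.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def cfs_merit(feat_class_corr, feat_feat_corr, subset):
    k = len(subset)
    r_cf = np.mean([feat_class_corr[j] for j in subset])
    r_ff = 1.0 if k == 1 else np.mean(
        [feat_feat_corr[a, b] for i, a in enumerate(subset) for b in subset[i + 1:]])
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(X, y, max_features=10):
    n = X.shape[1]
    fcc = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n)])
    ffc = np.abs(np.corrcoef(X, rowvar=False))
    selected, best = [], -np.inf
    while len(selected) < max_features:
        merit, j = max((cfs_merit(fcc, ffc, selected + [j]), j)
                       for j in range(n) if j not in selected)
        if merit <= best:
            break
        selected.append(j); best = merit
    return selected

# Accuracy with and without feature selection, mirroring the comparison above:
# subset  = greedy_cfs(X, y)
# acc_all = cross_val_score(GaussianNB(), X, y, cv=10).mean()
# acc_sel = cross_val_score(GaussianNB(), X[:, subset], y, cv=10).mean()
```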
APA, Harvard, Vancouver, ISO, and other styles
45

SIEDLECKI, WOJCIECH, and JACK SKLANSKY. "ON AUTOMATIC FEATURE SELECTION." International Journal of Pattern Recognition and Artificial Intelligence 02, no. 02 (June 1988): 197–220. http://dx.doi.org/10.1142/s0218001488000145.

Full text
Abstract:
We review recent research on methods for selecting features for multidimensional pattern classification. These methods include nonmonotonicity-tolerant branch-and-bound search and beam search. We describe the potential benefits of Monte Carlo approaches such as simulated annealing and genetic algorithms. We compare these methods to facilitate the planning of future research on feature selection.
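Of the Monte Carlo approaches mentioned in this review, simulated annealing over binary feature masks is the easiest to sketch. The snippet below is only an illustrative toy: the k-NN scoring function, the single-bit-flip move, and the geometric cooling schedule are all assumptions rather than anything prescribed by the survey.

```python
# Toy simulated-annealing feature selection over binary masks.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def sa_feature_selection(X, y, n_steps=200, t0=1.0, cooling=0.98, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    mask = rng.random(n) < 0.5

    def score(m):
        return 0.0 if not m.any() else cross_val_score(
            KNeighborsClassifier(), X[:, m], y, cv=3).mean()

    current, best, best_mask, t = score(mask), -np.inf, mask.copy(), t0
    for _ in range(n_steps):
        cand = mask.copy()
        cand[rng.integers(n)] ^= True          # flip one feature in or out
        s = score(cand)
        if s > current or rng.random() < np.exp((s - current) / t):
            mask, current = cand, s            # accept better moves, and worse ones with some probability
        if current > best:
            best, best_mask = current, mask.copy()
        t *= cooling
    return np.flatnonzero(best_mask), best
```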
APA, Harvard, Vancouver, ISO, and other styles
46

Fitrianah, Devi, and Hisyam Fahmi. "THE IDENTIFICATION OF DETERMINANT PARAMETER IN FOREST FIRE BASED ON FEATURE SELECTION ALGORITHMS." SINERGI 23, no. 3 (October 11, 2019): 184. http://dx.doi.org/10.22441/sinergi.2019.3.002.

Full text
Abstract:
This research studies the use of the Sequential Forward Floating Selection (SFFS) and Sequential Backward Floating Selection (SBFS) algorithms as feature selection algorithms in a forest fire case study. From the supporting data that form the features of the forest fire case, we obtained information about which features are most significant and influential in the occurrence of a forest fire. The data used are weather data and land coverage for each area where a forest fire occurred. Ten features were considered in the selection with both methods. The Sequential Forward Floating Selection method identifies earth surface temperature as the most significant and influential feature with regard to forest fire, while the Sequential Backward Floating Selection method identifies cloud coverage as the most significant. Over a total of 100 tests, the average accuracy of the Sequential Forward Floating Selection method is 96.23%, surpassing the 82.41% average accuracy of the Sequential Backward Floating Selection method.
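For readers who want to reproduce the floating search schemes named above, a common off-the-shelf route is mlxtend's SequentialFeatureSelector with floating=True. The classifier (k-NN), target subset size, and scoring below are assumptions for illustration; the abstract does not fix them.

```python
# Hedged illustration of SFFS (forward) and SBFS (backward) via mlxtend.
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)

sffs = SFS(knn, k_features=5, forward=True,  floating=True, scoring='accuracy', cv=5)
sbfs = SFS(knn, k_features=5, forward=False, floating=True, scoring='accuracy', cv=5)

# X would hold the ten candidate features (surface temperature, cloud coverage, ...);
# y indicates fire / no-fire.
# sffs = sffs.fit(X, y); print(sffs.k_feature_idx_, sffs.k_score_)
# sbfs = sbfs.fit(X, y); print(sbfs.k_feature_idx_, sbfs.k_score_)
```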
APA, Harvard, Vancouver, ISO, and other styles
47

Gakii, Consolata, Paul O. Mireji, and Richard Rimiru. "Graph Based Feature Selection for Reduction of Dimensionality in Next-Generation RNA Sequencing Datasets." Algorithms 15, no. 1 (January 10, 2022): 21. http://dx.doi.org/10.3390/a15010021.

Full text
Abstract:
Analysis of high-dimensional data, with more features (p) than observations (N) (p>N), places significant demands on computational cost and memory usage. Feature selection can be used to reduce the dimensionality of the data. We used a graph-based approach, principal component analysis (PCA), and recursive feature elimination to select features for classification from two lung cancer RNA-seq datasets. The selected features were discretized for association rule mining, where support and lift were used to generate informative rules. Our results show that the graph-based feature selection improved the performance of sequential minimal optimization (SMO) and multilayer perceptron (MLP) classifiers in both datasets. In association rule mining, features selected using the graph-based approach outperformed the other two feature-selection techniques at a support of 0.5 and a lift of 2. The non-redundant rules reflect the inherent relationships between features. Biological features are usually related to functions in living systems, a relationship that cannot be deduced by feature selection and classification alone. Therefore, the graph-based feature-selection approach combined with rule mining is a suitable way of selecting and finding associations between features in high-dimensional RNA-seq data.
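One generic way to realize a graph-based feature-selection step of the kind described above is to build a feature-correlation graph and keep one central feature per connected component. The sketch below does exactly that; the correlation threshold and the use of degree centrality are illustrative assumptions, not the exact construction used in the paper.

```python
# Sketch: feature-correlation graph, one central feature kept per component.
import numpy as np
import networkx as nx

def graph_feature_selection(X, corr_threshold=0.8):
    corr = np.abs(np.corrcoef(X, rowvar=False))
    n = corr.shape[0]
    g = nx.Graph()
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] >= corr_threshold:
                g.add_edge(i, j, weight=corr[i, j])
    selected = []
    for component in nx.connected_components(g):
        comp = list(component)
        if len(comp) == 1:                     # isolated feature, keep as-is
            selected.append(comp[0])
            continue
        centrality = nx.degree_centrality(g.subgraph(comp))
        selected.append(max(centrality, key=centrality.get))
    return sorted(selected)
```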
APA, Harvard, Vancouver, ISO, and other styles
48

Mweshi, George. "Feature Selection using Genetic Programming." Zambia ICT Journal 3, no. 2 (November 30, 2019): 11–18. http://dx.doi.org/10.33260/zictjournal.v3i2.62.

Full text
Abstract:
Extracting useful and novel information from the large amount of collected data has become a necessity for corporations wishing to maintain a competitive advantage. One of the biggest issues in handling these significantly large datasets is the curse of dimensionality: as the dimension of the data increases, the performance of the data mining algorithms employed to mine the data deteriorates. This deterioration is mainly caused by the large search space created by irrelevant, noisy and redundant features in the data. Feature selection is one of the techniques that can be used to remove these unnecessary features. It consequently reduces the dimension of the data as well as the search space, which in turn increases the efficiency and the accuracy of the mining algorithms. In this paper, we investigate the ability of Genetic Programming (GP), an evolutionary search strategy capable of automatically finding solutions in complex and large search spaces, to perform feature selection. We implement a basic GP algorithm and perform feature selection on 5 benchmark classification datasets from the UCI repository. To test the competitiveness and feasibility of the GP approach, we examine the classification performance of four classifiers, namely J48, Naive Bayes, PART, and Random Forests, using the GP-selected features, all the original features, and the features selected by other commonly used feature selection techniques, i.e. principal component analysis, information gain, Relief-F and CFS. The experimental results show that not only does GP select a smaller set of features than the original feature set, but classifiers using GP-selected features also achieve better classification performance than when using all the original features. Furthermore, compared to the other well-known feature selection techniques, GP achieves very competitive results.
APA, Harvard, Vancouver, ISO, and other styles
49

Chaudhry, Muhammad Umar, Muhammad Yasir, Muhammad Nabeel Asghar, and Jee-Hyong Lee. "Monte Carlo Tree Search-Based Recursive Algorithm for Feature Selection in High-Dimensional Datasets." Entropy 22, no. 10 (September 29, 2020): 1093. http://dx.doi.org/10.3390/e22101093.

Full text
Abstract:
Complexity and high dimensionality are inherent concerns of big data, and feature selection has gained prime importance in coping with the issue by reducing the dimensionality of datasets. The compromise between maximum classification accuracy and minimum dimensionality is as yet an unsolved puzzle. Recently, Monte Carlo Tree Search (MCTS)-based techniques have been devised that have attained great success in feature selection by constructing a binary feature selection tree and efficiently focusing on the most valuable features in the feature space. However, one challenging problem associated with such approaches is the tradeoff between the tree search and the number of simulations: with a limited number of simulations, the tree might not reach sufficient depth, inducing a bias towards randomness in feature subset selection. In this paper, a new algorithm for feature selection is proposed in which multiple feature selection trees are built iteratively in a recursive fashion. The state space of every successor feature selection tree is smaller than that of its predecessor, thus increasing the impact of the tree search in selecting the best features while keeping the number of MCTS simulations fixed. Experiments are performed on 16 benchmark datasets for validation purposes, and the performance is compared with state-of-the-art methods in the literature, both in terms of classification accuracy and feature selection ratio.
APA, Harvard, Vancouver, ISO, and other styles
50

K, Bhuvaneswari. "Filter Based Sentiment Feature Selection Using Back Propagation Deep Learning." Journal of Computer Sciences and Informatics 2, no. 1 (2025): 15. https://doi.org/10.5455/jcsi.20241216054507.

Full text
Abstract:
Aim: The proposed Filter Based Sentiment Feature Selection (FBSFS) model focuses on improving the performance of Sentiment Learning (SL) by selecting the most relevant sentiment features from text reviews using feature selection methods at the document level. Method: Sentiment Learning is applied at the document level to classify text reviews into two categories, positive or negative. The key sentiment features, adjectives (ADJ), adverbs (ADV), and verbs (VRB), which are essential for sentiment analysis, are extracted from each text document using the WordNet dictionary. Feature selection is performed by applying four different algorithms: Information Gain, Correlation, Gini Index, and Chi-Square. These algorithms help identify the most significant features that contribute to sentiment classification. The selected features are then fed into a Back Propagation Deep Learning (BPDL) classification model for sentiment analysis. Result: The experimental findings show that the proposed model achieved a higher accuracy of 91.15% using Correlation feature selection, signifying the effectiveness of the proposed model in classifying text reviews and outperforming other methods in terms of sentiment feature selection and classification. Conclusion: The proposed model enhances the performance of sentiment learning by selecting the most relevant sentiment features, particularly those extracted from adjectives, adverbs, and verbs, and combining them with BPDL, making FBSFS a robust tool for sentiment classification.
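The filter-then-classify pipeline described above can be sketched with off-the-shelf components. The snippet below uses scikit-learn's chi-square filter and an MLP as a stand-in for the paper's Back Propagation Deep Learning model; the bag-of-words vectorizer, the number of selected features k, and the network size are assumptions, and the input is assumed to be reviews already reduced to their ADJ/ADV/VRB tokens.

```python
# Hedged sketch: chi-square filter feature selection followed by a neural classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

pipeline = make_pipeline(
    CountVectorizer(),                          # bag-of-words over POS-filtered tokens
    SelectKBest(chi2, k=1000),                  # filter-based sentiment feature selection
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=300),
)

# reviews: list of strings containing the extracted ADJ/ADV/VRB tokens per document;
# labels: 0 = negative, 1 = positive.
# pipeline.fit(reviews_train, labels_train)
# print(pipeline.score(reviews_test, labels_test))
```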
APA, Harvard, Vancouver, ISO, and other styles