
Journal articles on the topic "Feature selection"

Cite a source in APA, MLA, Chicago, Harvard, and many other styles

See the top 50 journal articles for research on the topic "Feature selection".

Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf and read its abstract (summary) online, when one is present in the metadata.

Browse journal articles from many scientific fields and compile your bibliography correctly.

1

Huber, Florian, and Volker Steinhage. "Conditional Feature Selection: Evaluating Model Averaging When Selecting Features with Shapley Values". Geomatics 4, no. 3 (August 8, 2024): 286–310. http://dx.doi.org/10.3390/geomatics4030016.

Abstract:
Artificial intelligence (AI) and especially machine learning (ML) are rapidly transforming the field of geomatics with respect to collecting, managing, and analyzing spatial data. Feature selection as a building block in ML is crucial because it directly impacts the performance and predictive power of a model by selecting the most critical variables and eliminating the redundant and irrelevant ones. Random forests have now been used for decades and allow for building models with high accuracy. However, finding the most expressive features from the dataset by selecting the most important features within random forests is still a challenging question. The often-used internal Gini importances of random forests are based on the number of training examples that are divided by a feature but fail to acknowledge the magnitude of change in the target variable, leading to suboptimal selections. Shapley values are an established and unified framework for feature attribution, i.e., specifying how much each feature in a trained ML model contributes to the predictions for a given instance. Previous studies highlight the effectiveness of Shapley values for feature selection in real-world applications, while other research emphasizes certain theoretical limitations. This study provides an application-driven discussion of Shapley values for feature selection by first proposing four necessary conditions for a successful feature selection with Shapley values, extracted from a multitude of critical research in the field. Even given these conditions, Shapley value feature selection is by definition a model averaging procedure, in which unimportant features can alter the final selection. Therefore, we additionally present Conditional Feature Selection (CFS) as a novel algorithm for performing feature selection that mitigates this problem, and use it to evaluate the impact of model averaging in several real-world examples covering the use of ML in geomatics. The results of this study show Shapley values to be a good measure for feature selection when compared with Gini feature importances on four real-world examples, improving the RMSE by 5% when averaged over selections of all possible subset sizes. An even better selection can be achieved by CFS, improving on the Gini selection by approximately 7.5% in terms of RMSE. For random forests, Shapley value calculation can be performed in polynomial time, offering an advantage over the exponential runtime of CFS and creating a trade-off with the accuracy lost in feature selection due to model averaging.
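As a rough illustration of the general technique this entry discusses (ranking features by mean absolute Shapley value for a random forest), and not of the paper's CFS algorithm itself, a minimal sketch using the shap library (assumed installed) might look like this:

```python
# Minimal sketch: Shapley-value feature ranking for a random forest.
# Illustrative only; this is not the paper's Conditional Feature Selection.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=10, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes exact Shapley values for tree ensembles in polynomial time.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Rank features by mean absolute Shapley value and keep the top k.
importance = np.abs(shap_values).mean(axis=0)
k = 5
selected = np.argsort(importance)[::-1][:k]
print("Selected feature indices:", selected)
```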
2

Usha, P., and J. G. R. Sathiaseelan. "Enhanced Filtrate Feature Selection Algorithm for Feature Subset Generation". Indian Journal Of Science And Technology 17, no. 29 (July 31, 2024): 3002–11. http://dx.doi.org/10.17485/ijst/v17i29.2127.

Abstract:
Objectives: In the bioinformatics field, feature selection plays a vital role in selecting relevant features for making better decisions and assessing disease diagnosis. Brain Tumour (BT) is the second leading disease in the world. Most BT detection techniques are based on Magnetic Resonance (MR) images. Methods: In this paper, medical reports are used in the detection of BT to increase the surveillance of patients. To improve the accuracy of predictive models, a new adaptive technique called the Enhanced Filtrate Feature Selection (EFFS) algorithm for optimal feature selection is proposed. Initially, the EFFS algorithm finds the dependency of each attribute and the feature score by using the Mutual Information Gain, Chi-Square, Correlation, and Fisher score filter methods. Afterward, the occurrence rate of each top-ranked attribute is filtered by applying a threshold value, and the optimal features are obtained by using the Pareto principle. Findings: The performance of the selected optimal features is evaluated by time complexity, number of features selected, and accuracy. The efficiency of the proposed algorithm is measured and analyzed on a high-quality optimal subset based on a Random Forest classifier integrated with the ranking of attributes. The EFFS algorithm selects 39 out of 46 significant and relevant features with minimum selection time and shows 99.31% accuracy for BT, 29 features with 99.47% accuracy for Breast Cancer, 15 features with 94.61% accuracy for Lung Cancer, 15 features with 98.84% accuracy for Diabetes, and 43 features with 90% accuracy for the Covid-19 dataset. Novelty: To decrease processing time and improve model performance, feature selection is performed at the initial stages for the betterment of the classification task. The proposed EFFS algorithm is applied to different datasets based on medical reports and outperforms with greater performance measurements and time. Appropriate feature selection techniques help to diagnose diseases in the prior phase and increase the survival of human beings. Keywords: Bioinformatics, Brain Tumour, Chi-Square, Correlation, EFFS, Feature Selection, Fisher Score, Information Gain, Optimal Features, Random Forest
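The EFFS pipeline described above combines several filter scores with a Pareto-style cut. A hedged sketch of that general idea, using only two of the four filters and an illustrative 80% threshold (the combination rule below is ours, not the paper's), could be:

```python
# Hedged sketch in the spirit of EFFS: score features with several filters,
# combine the scores, and keep the top-ranked features up to a Pareto-style cut.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, chi2
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_pos = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative inputs

mi = mutual_info_classif(X, y, random_state=0)
chi, _ = chi2(X_pos, y)

# Combine normalized scores from both filters (EFFS also uses correlation and Fisher score).
rank = mi / mi.max() + chi / chi.max()
order = np.argsort(rank)[::-1]

# Pareto-style cut: keep the top features accounting for ~80% of the total score mass.
cum = np.cumsum(rank[order]) / rank.sum()
selected = order[: np.searchsorted(cum, 0.8) + 1]
print(f"{len(selected)} of {X.shape[1]} features selected")
```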
3

Wu, Xindong, Kui Yu, Wei Ding, Hao Wang and Xingquan Zhu. "Online Feature Selection with Streaming Features". IEEE Transactions on Pattern Analysis and Machine Intelligence 35, no. 5 (May 2013): 1178–92. http://dx.doi.org/10.1109/tpami.2012.197.

4

Li, Jundong, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang and Huan Liu. "Feature Selection". ACM Computing Surveys 50, no. 6 (January 12, 2018): 1–45. http://dx.doi.org/10.1145/3136625.

5

Sutherland, Stuart. "Feature selection". Nature 392, no. 6674 (March 1998): 350. http://dx.doi.org/10.1038/32817.

6

Patel, Damodar, and Amit Kumar Saxena. "Feature Selection in High Dimension Datasets using Incremental Feature Clustering". Indian Journal Of Science And Technology 17, no. 32 (August 24, 2024): 3318–26. http://dx.doi.org/10.17485/ijst/v17i32.2077.

Abstract:
Objectives: To develop a machine learning-based model to select the most important features from a high-dimensional dataset, classifying the patterns at high accuracy while reducing dimensionality. Methods: The proposed feature selection method (FSIFC) forms and combines feature clusters incrementally and produces feature subsets each time. The method uses K-means clustering and Mutual Information (MI) to refine the feature selection process iteratively. Initially, two clusters of features are formed using K-means clustering (K=2) by taking features as the basis of clustering instead of the patterns (the traditional way). From these two clusters, the features with the highest MI value in each cluster are kept in a feature subset. Classification accuracies (CA) of the feature subset are calculated using three classifiers, namely Support Vector Machines (SVM), Random Forest (RF), and k-Nearest Neighbor (k-NN). The process is repeated by incrementing the value of K, i.e., the number of clusters, until a maximum user-defined value of K is reached. The best value of CA obtained from these trials is recorded, and the corresponding feature set is finally accepted. Findings: The proposed method is demonstrated using ten datasets, and the results are compared with existing published results using three classifiers to determine the method's performance. The ten datasets are classified with average CAs of 92.72%, 93.13%, and 91.5% using the SVM, RF, and k-NN classifiers respectively. The proposed method selects a maximum of thirty features from the datasets. In terms of selecting the most effective and the smallest feature sets, the proposed method outperforms eight other feature selection methods considering CAs. Novelty: The proposed model applies feature reduction using combined feature clustering and filter methods in an incremental way. This provides an improved selection of relevant features while removing irrelevant ones at different trials. Keywords: Feature selection, High-dimensional datasets, K-means algorithm, Mutual information, Machine learning
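A minimal sketch of the core FSIFC idea, clustering features rather than samples and keeping the highest-MI feature per cluster, might look like the following; here K is fixed for brevity instead of being incremented as in the paper:

```python
# Sketch: cluster the features (not the samples) with K-means, then keep the
# feature with the highest mutual information with the label in each cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)

K = 10  # fixed here; the paper increments K up to a user-defined maximum
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(X.T)  # cluster features

selected = [int(np.flatnonzero(labels == c)[np.argmax(mi[labels == c])]) for c in range(K)]
print("One representative feature per cluster:", sorted(selected))
```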
7

Wang, Gang, Yang Zhao, Jiasi Zhang and Yongjie Ning. "A Novel End-To-End Feature Selection and Diagnosis Method for Rotating Machinery". Sensors 21, no. 6 (March 15, 2021): 2056. http://dx.doi.org/10.3390/s21062056.

Abstract:
Feature selection, the process of obtaining effective features from data, is also known as feature engineering. Traditionally, feature selection and predictive model learning are separated, which creates a problem of inconsistent criteria. This paper presents an end-to-end feature selection and diagnosis method that organically unifies feature expression learning and machine prediction learning in one model. The algorithm first combines the prediction model to calculate the mean impact values (MIVs) of the features and realizes primary feature selection for the prediction model by selecting the features with larger MIVs. In order to take into account the performance of the features themselves, the within-class and between-class discriminant analysis (WBDA) method is proposed and, combined with the feature diversity strategy, a feature-oriented secondary selection is realized. Finally, the feature vectors obtained by the two selections are classified using a multi-class support vector machine (SVM). Compared with the modified network variable selection algorithm (MIVs), the principal component analysis dimensionality reduction algorithm (PCA), variable selection based on compensative distance evaluation technology (CDET), and other algorithms, the proposed MIVs-WBDA method exhibits excellent classification accuracy owing to the fusion of feature selection and predictive model learning. According to the results of classification accuracy testing after dimensionality reduction on rotating machinery status, the MIVs-WBDA method achieves a 3% classification accuracy improvement on the low-dimensional feature set. The typical running time of this classification learning algorithm is less than 10 s, whereas a deep learning counterpart would run for more than a few hours.
8

Fahrudy, Dony, and Shofwatul 'Uyun. "Classification of Student Graduation using Naïve Bayes by Comparing between Random Oversampling and Feature Selections of Information Gain and Forward Selection". JOIV : International Journal on Informatics Visualization 6, no. 4 (December 31, 2022): 798. http://dx.doi.org/10.30630/joiv.6.4.982.

Abstract:
Class-imbalanced data with high attribute dimensions frequently cause issues in classification, as imbalanced class counts and irrelevant attributes affect an algorithm's performance during computation; techniques are therefore needed to overcome the class imbalance, together with feature selection to reduce data complexity and remove irrelevant features. This study applied the random oversampling (ROs) method to overcome the class-imbalanced data and compared two feature selection methods (information gain and forward selection) to determine which is superior, more effective, and more appropriate to apply. The results of feature selection were then used to classify student graduation by creating a classification model with the Naïve Bayes algorithm. This study showed an increase in the average accuracy of the Naïve Bayes method: without ROs preprocessing and feature selection, 81.83%; with ROs, 83.84%; with information gain and 3 selected features, 86.03%; and with forward selection and 2 selected features, 86.42%. These results represent accuracy gains of 4.2% from no preprocessing to information gain and 4.59% from no preprocessing to forward selection. The best feature selection was therefore forward selection with 2 selected features (the 8th-semester GPA and the overall GPA), and ROs and both feature selections were proven to improve the performance of the Naïve Bayes method.
9

Kar Hoou, Hui, Ooi Ching Sheng, Lim Meng Hee and Leong Mohd Salman. "Feature selection tree for automated machinery fault diagnosis". MATEC Web of Conferences 255 (2019): 02004. http://dx.doi.org/10.1051/matecconf/201925502004.

Abstract:
Intelligent machinery fault diagnosis commonly utilises statistical features of sensor signals as the inputs for its machine learning algorithm. Because of the abundance of statistical features that can be extracted from raw signals, inserting all the available features into the machine learning algorithm may inadvertently result in less accurate fault classification due to overfitting. It is therefore only by selecting the most representative features that overfitting can be avoided and classification accuracy improved. Currently, the genetic algorithm (GA) is regarded as the most commonly used and reliable feature selection tool for improving the accuracy of any machine learning algorithm. However, the greatest challenge for GA is that it may fall into a local optimum and be computationally demanding. To overcome these limitations, a feature selection tree (FST) is proposed here. Numerous experimental dataset feature selections were executed using FST and GA; their performance is compared and discussed. Analysis showed that the proposed FST resulted in identical or superior optimal feature subsets when compared to the renowned GA method, but with a 20-times-faster simulation period. The proposed FST is therefore more efficient at performing the feature selection task than GA.
10

Heriyanto, Heriyanto, and Dyah Ayu Irawati. "Comparison of Mel Frequency Cepstral Coefficient (MFCC) Feature Extraction, With and Without Framing Feature Selection, to Test the Shahada Recitation". RSF Conference Series: Engineering and Technology 1, no. 1 (December 23, 2021): 335–54. http://dx.doi.org/10.31098/cset.v1i1.395.

Abstract:
This voice research applies MFCC feature extraction as the first step in obtaining features, which then undergo further feature selection. The feature selection in this research used the Dominant Weight feature of the shahada voice, with frames and cepstral coefficients as the extracted features. Cepstral coefficients 0 through 23 (24 coefficients) were used, while frames 0 through 10 (eleven frames) were taken. A total of 300 recorded voice samples were tested against 200 male and female voice recordings, sampled at 44.1 kHz, 16-bit stereo. This research aimed to improve accuracy by selecting the right features on the frame using MFCC feature extraction, and to match accuracy with frame feature selection using Dominant Weight Normalization (NBD). The results show that the MFCC method with selection of the 9th frame had a higher accuracy rate, 86%, than the other frames, while MFCC without feature selection averaged 60%. The conclusion is that selecting the right features, in the 9th frame, affected the accuracy on the voice of the shahada recitation.
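For background, extracting 24 MFCC coefficients per frame and then keeping a single frame could be sketched as below; the audio file name is hypothetical, and the frame index simply mirrors the paper's finding:

```python
# Sketch: MFCC extraction followed by keeping one frame as the feature vector.
# "shahada.wav" is a hypothetical input file; frame 9 mirrors the paper's result.
import librosa
import numpy as np

y, sr = librosa.load("shahada.wav", sr=44100)       # 44.1 kHz sampling rate
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=24)  # 24 cepstral coefficients per frame

frame_index = 9                                     # frame found most accurate in the study
feature_vector = mfcc[:, frame_index]
print(feature_vector.shape)                         # (24,)
```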
11

Subrahmanyam, Somashekar R. "Fixturing features selection in feature-based systems". Computers in Industry 48, no. 2 (June 2002): 99–108. http://dx.doi.org/10.1016/s0166-3615(02)00037-4.

12

Zhou, Peng, Shu Zhao, Yuanting Yan and Xindong Wu. "Online Scalable Streaming Feature Selection via Dynamic Decision". ACM Transactions on Knowledge Discovery from Data 16, no. 5 (October 31, 2022): 1–20. http://dx.doi.org/10.1145/3502737.

Abstract:
Feature selection is one of the core concepts in machine learning and hugely impacts a model's performance. In some real-world applications, features arrive in a stream, one by one over time, and the exact number of features cannot be known before learning. Online streaming feature selection aims at selecting optimal stream features at each timestamp on the fly. Without global information about the entire feature space, most existing methods select stream features in terms of individual feature information or pairwise comparison of features. This article proposes a new online scalable streaming feature selection framework from the dynamic decision perspective that is scalable in running time and in the number of selected features through dynamic threshold adjustment. Following the philosophy of "Thinking-in-Threes", we classify each newly arriving feature as selecting, discarding, or delaying, aiming at minimizing the overall decision risks. With dynamic updating of global statistical information, we add the selecting features to the candidate feature subset, ignore the discarding features, cache the delaying features in the undetermined feature subset, and wait for more information. Meanwhile, we perform redundancy analysis for the candidate features and uncertainty analysis for the undetermined features. Extensive experiments on eleven real-world datasets demonstrate the efficiency and scalability of our new framework compared with state-of-the-art algorithms.
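The three-way select/discard/delay decision can be pictured with a small schematic skeleton; the scoring function and thresholds below are placeholders of our own, not the article's risk model:

```python
# Schematic three-way streaming decision (select / discard / delay).
# The score function and thresholds are illustrative placeholders.
from typing import Callable, List

class ThreeWayStreamSelector:
    def __init__(self, score: Callable[[int], float], hi: float = 0.7, lo: float = 0.3):
        self.score, self.hi, self.lo = score, hi, lo
        self.selected: List[int] = []      # candidate feature subset
        self.undetermined: List[int] = []  # delayed features awaiting more evidence

    def offer(self, feature_id: int) -> str:
        s = self.score(feature_id)
        if s >= self.hi:
            self.selected.append(feature_id)
            return "select"
        if s <= self.lo:
            return "discard"
        self.undetermined.append(feature_id)  # cache and wait for more information
        return "delay"

# Usage with a toy scoring function (hypothetical):
sel = ThreeWayStreamSelector(score=lambda fid: (fid % 10) / 10)
decisions = [sel.offer(fid) for fid in range(20)]
print(decisions.count("select"), "selected,", decisions.count("delay"), "delayed")
```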
13

Ramineni, Vyshnavi, and Goo-Rak Kwon. "Diagnosis of Alzheimer’s Disease using Wrapper Feature Selection Method". Korean Institute of Smart Media 12, no. 3 (April 30, 2023): 30–37. http://dx.doi.org/10.30693/smj.2023.12.3.30.

Abstract:
Alzheimer’s disease (AD) is managed through early diagnosis, which can only slow the symptoms, and research is still ongoing. Accordingly, several machine learning classification models using T1-weighted images have been proposed to identify AD. In this paper, we consider improvised feature selection to reduce complexity by using wrapper techniques and a Restricted Boltzmann Machine (RBM). The present work used the subcortical and cortical sMRI features of 278 subjects from the ADNI dataset to identify AD. Multi-class classification is used for the experiment, i.e., AD, EMCI, LMCI, HC. The proposed feature selection consists of forward feature selection, backward feature selection, and combined PCA & RBM. Forward and backward feature selection are iterative methods, starting with no features in forward selection and with all features included in backward selection. PCA is used to reduce the dimensions, and RBM is used to select the best features without interpreting them. We compared the three models with PCA for analysis. The experiments show that combined PCA & RBM and backward feature selection give the best accuracy with the RF classification model, i.e., 88.65% and 88.56%, respectively.
14

Wang, Jun, Yuanyuan Xu, Hengpeng Xu, Zhe Sun, Zhenglu Yang and Jinmao Wei. "An Effective Multi-Label Feature Selection Model Towards Eliminating Noisy Features". Applied Sciences 10, no. 22 (November 15, 2020): 8093. http://dx.doi.org/10.3390/app10228093.

Abstract:
A consistently great amount of effort has been devoted to feature selection for dimension reduction in various machine learning tasks. Existing feature selection models focus on selecting the most discriminative features for learning targets. However, this strategy is weak in handling two kinds of features, the irrelevant and the redundant ones, which are collectively referred to as noisy features. These features may hamper the construction of optimal low-dimensional subspaces and compromise the learning performance of downstream tasks. In this study, we propose a novel multi-label feature selection approach that embeds label correlations (dubbed ELC) to address these issues. In particular, we extract label correlations for reliable label space structures and employ them to steer feature selection. In this way, label and feature spaces can be expected to be consistent, and noisy features can be effectively eliminated. An extensive experimental evaluation on public benchmarks validated the superiority of ELC.
15

Balcarras, Matthew, Salva Ardid, Daniel Kaping, Stefan Everling and Thilo Womelsdorf. "Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness". Journal of Cognitive Neuroscience 28, no. 2 (February 2016): 333–49. http://dx.doi.org/10.1162/jocn_a_00894.

Abstract:
Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.
16

Li, Haiguang, Xindong Wu, Zhao Li and Wei Ding. "Online Group Feature Selection from Feature Streams". Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 29, 2013): 1627–28. http://dx.doi.org/10.1609/aaai.v27i1.8516.

Abstract:
Standard feature selection algorithms deal with given candidate feature sets at the individual feature level. When features exhibit certain group structures, it is beneficial to conduct feature selection in a grouped manner. For high-dimensional features, it can be far more preferable to generate and process features online, one at a time, rather than wait for all features to be generated before learning begins. In this paper, we discuss a new and interesting problem of online group feature selection from feature streams at both the group and individual feature levels simultaneously. Extensive experiments on both real-world and synthetic datasets demonstrate the superiority of the proposed algorithm.
17

Zhao, Zheng, Lei Wang and Huan Liu. "Efficient Spectral Feature Selection with Minimum Redundancy". Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 3, 2010): 673–78. http://dx.doi.org/10.1609/aaai.v24i1.7671.

Abstract:
Spectral feature selection identifies relevant features by measuring their capability of preserving sample similarity. It provides a powerful framework for both supervised and unsupervised feature selection, and has been proven to be effective in many real-world applications. One common drawback associated with most existing spectral feature selection algorithms is that they evaluate features individually and cannot identify redundant features. Since redundant features can have significant adverse effect on learning performance, it is necessary to address this limitation for spectral feature selection. To this end, we propose a novel spectral feature selection algorithm to handle feature redundancy, adopting an embedded model. The algorithm is derived from a formulation based on a sparse multi-output regression with a L2,1-norm constraint. We conduct theoretical analysis on the properties of its optimal solutions, paving the way for designing an efficient path-following solver. Extensive experiments show that the proposed algorithm can do well in both selecting relevant features and removing redundancy.
18

V, Venkatesh, Sharan S B, Mahalaxmy S, Monisha S, Ashick Sanjey D S and Ashokkumar P. "A Class Specific Feature Selection Method for Improving the Performance of Text Classification". Scalable Computing: Practice and Experience 25, no. 2 (February 24, 2024): 1018–28. http://dx.doi.org/10.12694/scpe.v25i2.2502.

Abstract:
Recently, a significant amount of research work has been carried out in the field of feature selection. Although these methods help to increase the accuracy of the machine learning classification, the selected subset of features considers all the classes and may not select recommendable features for a particular class. The main goal of our paper is to propose a new class-specific feature selection algorithm that is capable of selecting an appropriate subset of features for each class. In this regard, we first perform class binarization and then select the best features for each class. During the feature selection process, we deal with class imbalance problems and redundancy elimination. The Weighted Average Voting Ensemble method is used for the final classification. Finally, we carry out experiments to compare our proposed feature selection approach with the existing popular feature selection methods. The results prove that our feature selection method outperforms the existing methods with an accuracy of more than 37%.
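A hedged sketch of the class binarization step described above, selecting a possibly different feature subset per class with a one-vs-rest split (the class-imbalance handling and the weighted voting ensemble from the paper are omitted), might be:

```python
# Sketch: class-specific feature selection via one-vs-rest binarization;
# each class gets its own top-k features by a chi-square filter.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)  # non-negative features, as chi2 requires
per_class = {}
for cls in np.unique(y):
    y_bin = (y == cls).astype(int)                   # one-vs-rest binarization
    selector = SelectKBest(chi2, k=2).fit(X, y_bin)  # best features for this class
    per_class[int(cls)] = selector.get_support(indices=True)
print(per_class)  # possibly different feature subsets per class
```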
19

Gramegna, Alex, and Paolo Giudici. "Shapley Feature Selection". FinTech 1, no. 1 (February 25, 2022): 72–80. http://dx.doi.org/10.3390/fintech1010006.

Abstract:
Feature selection is a popular topic. The main approaches to it fall into the three categories of filters, wrappers, and embedded methods. Advancements in algorithms, though proving fruitful, may not be enough. We propose to integrate an explainable AI approach, based on Shapley values, to provide more accurate information for feature selection. We test our proposal in a real setting, which concerns the prediction of the probability of default of Small and Medium Enterprises. Our results show that the integrated approach may indeed prove fruitful for some feature selection methods, in particular more parsimonious ones like LASSO. In general, the combination of approaches seems to provide useful information with which feature selection algorithms can improve their performance.
20

Somol, P., and P. Pudil. "Feature selection toolbox". Pattern Recognition 35, no. 12 (December 2002): 2749–59. http://dx.doi.org/10.1016/s0031-3203(01)00245-x.

21

Ramze Rezaee, M., B. Goedhart, B. P. F. Lelieveldt and J. H. C. Reiber. "Fuzzy feature selection". Pattern Recognition 32, no. 12 (December 1999): 2011–19. http://dx.doi.org/10.1016/s0031-3203(99)00005-9.

22

Liu, H., E. R. Dougherty, J. G. Dy, K. Torkkola, E. Tuv, H. Peng, C. Ding et al. "Evolving feature selection". IEEE Intelligent Systems 20, no. 6 (November 2005): 64–76. http://dx.doi.org/10.1109/mis.2005.105.

23

de Souza, Jerffeson Teixeira, Stan Matwin and Nathalie Japkowicz. "Parallelizing Feature Selection". Algorithmica 45, no. 3 (May 24, 2006): 433–56. http://dx.doi.org/10.1007/s00453-006-1220-3.

24

Moran, Michal, and Goren Gordon. "Curious Feature Selection". Information Sciences 485 (June 2019): 42–54. http://dx.doi.org/10.1016/j.ins.2019.02.009.

25

Yang, Yanyan, Degang Chen, Xiao Zhang, Zhenyan Ji and Yingjun Zhang. "Incremental feature selection by sample selection and feature-based accelerator". Applied Soft Computing 121 (May 2022): 108800. http://dx.doi.org/10.1016/j.asoc.2022.108800.

26

Luque-Rodriguez, Maria, Jose Molina-Baena, Alfonso Jimenez-Vilchez and Antonio Arauzo-Azofra. "Initialization of Feature Selection Search for Classification". Journal of Artificial Intelligence Research 75 (November 27, 2022): 953–83. http://dx.doi.org/10.1613/jair.1.14015.

Abstract:
Selecting the best features in a dataset improves the accuracy and efficiency of classifiers in a learning process. Datasets generally have more features than necessary, some of them being irrelevant or redundant with others. For this reason, numerous feature selection methods have been developed, in which different evaluation functions and measures are applied. This paper proposes the systematic application of individual feature evaluation methods to initialize search-based feature subset selection methods. An exhaustive review of the starting methods used by genetic algorithms from 2014 to 2020 has been carried out. Subsequently, an in-depth empirical study has been carried out evaluating the proposal for different search-based feature selection methods (sequential forward and backward selection, Las Vegas filter and wrapper, simulated annealing, and genetic algorithms). Since the computation time is reduced and the classification accuracy with the selected features is improved, the initialization of feature selection proposed in this work proves worth considering when designing feature selection algorithms.
27

Zabidi, A., W. Mansor and Khuan Y. Lee. "Optimal Feature Selection Technique for Mel Frequency Cepstral Coefficient Feature Extraction in Classifying Infant Cry with Asphyxia". Indonesian Journal of Electrical Engineering and Computer Science 6, no. 3 (June 1, 2017): 646. http://dx.doi.org/10.11591/ijeecs.v6.i3.pp646-655.

Abstract:
Mel Frequency Cepstral Coefficient is an efficient feature representation method for extracting human-audible audio signals. However, its representation of features is large and redundant. Therefore, feature selection is required to select the optimal subset of Mel Frequency Cepstral Coefficient features. The performance of two feature selection techniques, Orthogonal Least Squares (OLS) and F-ratio, for selecting Mel Frequency Cepstral Coefficient features of infant cry with asphyxia was examined. OLS selects the feature subset based on its contribution to the reduction of error, while F-ratio selects features according to their discriminative abilities. The feature selection techniques were combined with a Multilayer Perceptron to distinguish between asphyxiated infant cry and normal cry signals. The performance of the feature selection methods was examined by analysing the Multilayer Perceptron classification accuracy resulting from the combination of the feature selection techniques and the Multilayer Perceptron. The results indicate that Orthogonal Least Squares is the most suitable feature selection method for classifying infant cry with asphyxia, since it produces the highest classification accuracy.
28

Mitra, P., C. A. Murthy and S. K. Pal. "Unsupervised feature selection using feature similarity". IEEE Transactions on Pattern Analysis and Machine Intelligence 24, no. 3 (March 2002): 301–12. http://dx.doi.org/10.1109/34.990133.

29

Yen Huang, Jia. "Feature Selection for Cloud Computing Patents Classification". International Journal of Social Science and Humanity 6, no. 7 (July 2016): 541–46. http://dx.doi.org/10.7763/ijssh.2016.v6.707.

30

Tatwani, Shaveta, and Ela Kumar. "Parametric Comparison of Various Feature Selection Techniques". Journal of Advanced Research in Dynamical and Control Systems 11, no. 10-SPECIAL ISSUE (October 31, 2019): 1180–90. http://dx.doi.org/10.5373/jardcs/v11sp10/20192961.

31

Xu Yuan, Jeng-Shyang Pan, Ai-Qing Tian and Shu-Chuan Chu. "Binary Sparrow Search Algorithm for Feature Selection". 網際網路技術學刊 24, no. 2 (March 2023): 217–32. http://dx.doi.org/10.53106/160792642023032402001.

Abstract:
The sparrow search algorithm (SSA) is a novel intelligent optimization algorithm that simulates the foraging and anti-predation behavior of sparrows. SSA can optimize continuous problems, but in reality many problems are binary. In this paper, the binary sparrow search algorithm (BSSA) is proposed to solve binary optimization problems such as feature selection. The transfer function is crucial to BSSA and directly affects its performance. This paper proposes three new transfer functions to improve the performance of BSSA. Mathematical analysis revealed that the original SSA scrounger position update equation is no longer suited to BSSA, so this paper improves the position update equation. We compared BSSA with the BPSO, BGWO, and BBA algorithms and tested it on 23 benchmark functions. In addition, statistical analysis of the experimental results, via the Friedman test and the Wilcoxon rank-sum test, was performed to verify the effectiveness of BSSA. Finally, the algorithm was used to successfully implement feature selection and obtain satisfactory results on the UCI data sets.
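For background, transfer functions of the kind BSSA relies on map a continuous position update to a probability of flipping a bit. The two generic forms below are illustrative, not the three functions proposed in the paper:

```python
# Illustrative transfer functions for binarizing a continuous position update,
# as used in binary metaheuristics like BSSA (generic forms, not the paper's).
import numpy as np

def s_shaped(x):
    return 1.0 / (1.0 + np.exp(-x))  # classic sigmoid (S-shaped) transfer

def v_shaped(x):
    return np.abs(np.tanh(x))        # V-shaped alternative

rng = np.random.default_rng(0)
position = rng.normal(size=8)        # continuous positions of one sparrow
binary = (rng.random(8) < s_shaped(position)).astype(int)  # probabilistic binarization
print(position.round(2), binary)
```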
32

Muthukrishnan, R., and C. K. James. "The Effect of Multicollinearity on Feature Selection". Indian Journal Of Science And Technology 17, no. 35 (September 9, 2024): 3664–68. http://dx.doi.org/10.17485/ijst/v17i35.1876.

Abstract:
Objectives: To provide a new LASSO-based feature selection technique that aids in selecting important variables for predicting the response variable in the presence of multicollinearity. Methods: LASSO is a type of regression method employed to select important covariates for predicting a dependent variable. The traditional LASSO method uses the conventional Ordinary Least Squares (OLS) method for this purpose, and the OLS-based LASSO approach gives unreliable results if the data deviate from normality. This study therefore recommends using a Redescending M-estimator-based LASSO approach. The efficacy of this new method is checked against the ordinary LASSO method using a real dataset and a simulation study with various sample sizes (N=100, 200, 1000), different numbers of predictors (p=10, 15, 20), and varying degrees of correlation (ρ = 0.96, 0.98, 0.999). Findings: The usual OLS-based LASSO finds it difficult to select important variables when the independent variables are correlated. The Redescending M-estimator-based LASSO aims at tackling the pitfalls faced by the conventional LASSO methodology. Among other things, the proposed method is far better than the conventional LASSO, since it helps to pick out significant factors more effectively, particularly in the presence of multicollinearity. Novelty: The conventional OLS-based LASSO approach selects a greater number of non-significant variables in the presence of multicollinearity, whereas the proposed Redescending M-estimator-based LASSO approach selects the important variables. Keywords: Feature Selection, LASSO, MDAE, VIF, Variable Selection
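A minimal sketch of plain LASSO-based variable selection under strongly correlated predictors, the baseline this entry improves on (the paper's robust Redescending M-estimator variant is not implemented here), might look like this:

```python
# Sketch: LASSO-based variable selection with highly correlated predictors.
# Non-zero coefficients are treated as the selected variables.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 200, 15
base = rng.normal(size=(n, 1))
X = 0.98 * base + 0.02 * rng.normal(size=(n, p))  # strongly correlated columns
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

lasso = LassoCV(cv=5, random_state=0).fit(X, y)   # penalty chosen by cross-validation
selected = np.flatnonzero(lasso.coef_)
print("Non-zero coefficients at features:", selected)
```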
33

Porebski, Alice, Vinh Truong Hoang, Nicolas Vandenbroucke and Denis Hamad. "Combination of LBP Bin and Histogram Selections for Color Texture Classification". Journal of Imaging 6, no. 6 (June 23, 2020): 53. http://dx.doi.org/10.3390/jimaging6060053.

Abstract:
LBP (Local Binary Pattern) is a very popular texture descriptor, largely used in computer vision. In most applications, LBP histograms are exploited as texture features, leading to a high-dimensional feature space, especially for color texture classification problems. In the past few years, different solutions were proposed to reduce the dimension of the feature space based on the LBP histogram. Most of these approaches apply feature selection methods in order to find the most discriminative bins. Recently, another strategy proposed selecting the most discriminant LBP histograms in their entirety. This paper aims to improve on these previous approaches, and presents a combination of LBP bin and histogram selections, where a histogram ranking method is applied before processing a bin selection procedure. The proposed approach is evaluated on five benchmark image databases, and the obtained results show the effectiveness of the combination of LBP bin and histogram selections, which outperforms the simple LBP bin and LBP histogram selection approaches when they are applied independently.
34

Baskar, S. S., and L. Arockiam. "A Novel LAS-Relief Feature Selection Algorithm for Enhancing Classification Accuracy in Data mining". INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 11, no. 8 (October 23, 2013): 2921–27. http://dx.doi.org/10.24297/ijct.v11i8.7047.

Abstract:
Feature selection is an important task in the data mining and machine learning domains. Its main objective is to find relevant features that predict the knowledge better than the original set of features, which can be achieved by removing irrelevant or redundant features from the original data sets. In this paper, a new Relief-based approach to feature selection built on a median-variance model, named the LAS-Relief algorithm, is introduced. Relief algorithms select instances at random, which leads to fluctuation in the feature weight estimates and, in turn, to poor evaluation accuracy. The proposed LAS-Relief algorithm stabilises feature weight estimation compared with the mean-variance-based Relief algorithm by incorporating both the median and the variance of the difference between instances as the criterion for feature weight estimation, and it removes irrelevant features from the feature space. This makes the result more stable and the classification more accurate: the relevant features obtained from the original feature space using LAS-Relief outperform those obtained with the Mean Variance Relief algorithm.
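For orientation, the base Relief scheme that LAS-Relief refines can be sketched in a few lines; this toy version uses Manhattan-distance nearest hits and misses and omits the median-variance criterion entirely:

```python
# Toy Relief-style weight estimation: reward features that separate classes
# (distance to nearest miss) and penalize within-class spread (nearest hit).
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X = (X - X.min(0)) / (X.max(0) - X.min(0))  # scale features to [0, 1]
rng = np.random.default_rng(0)
w = np.zeros(X.shape[1])

for i in rng.choice(len(X), size=100):
    d = np.abs(X - X[i]).sum(axis=1)
    d[i] = np.inf                                       # exclude the instance itself
    hit = np.argmin(np.where(y == y[i], d, np.inf))     # nearest same-class instance
    miss = np.argmin(np.where(y != y[i], d, np.inf))    # nearest other-class instance
    w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])

print("Relief weights:", np.round(w / 100, 3))
```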
35

Han, Yuanyuan, Lan Huang and Fengfeng Zhou. "Zoo: Selecting Transcriptomic and Methylomic Biomarkers by Ensembling Animal-Inspired Swarm Intelligence Feature Selection Algorithms". Genes 12, no. 11 (November 18, 2021): 1814. http://dx.doi.org/10.3390/genes12111814.

Abstract:
Biological omics data such as transcriptomes and methylomes have the inherent “large p small n” paradigm, i.e., the number of features is much larger than that of the samples. A feature selection (FS) algorithm selects a subset of the transcriptomic or methylomic biomarkers in order to build a better prediction model. The hidden patterns in the FS solution space make it challenging to achieve a feature subset with satisfying prediction performances. Swarm intelligence (SI) algorithms mimic the target searching behaviors of various animals and have demonstrated promising capabilities in selecting features with good machine learning performances. Our study revealed that different SI-based feature selection algorithms contributed complementary searching capabilities in the FS solution space, and their collaboration generated a better feature subset than the individual SI feature selection algorithms. Nine SI-based feature selection algorithms were integrated to vote for the selected features, which were further refined by the dynamic recursive feature elimination framework. In most cases, the proposed Zoo algorithm outperformed the existing feature selection algorithms on transcriptomics and methylomics datasets.
36

Feng, Chao, Chao Qian and Ke Tang. "Unsupervised Feature Selection by Pareto Optimization". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3534–41. http://dx.doi.org/10.1609/aaai.v33i01.33013534.

Abstract:
Dimensionality reduction is often employed to deal with the data with a huge number of features, which can be generally divided into two categories: feature transformation and feature selection. Due to the interpretability, the efficiency during inference and the abundance of unlabeled data, unsupervised feature selection has attracted much attention. In this paper, we consider its natural formulation, column subset selection (CSS), which is to minimize the reconstruction error of a data matrix by selecting a subset of features. We propose an anytime randomized iterative approach POCSS, which minimizes the reconstruction error and the number of selected features simultaneously. Its approximation guarantee is well bounded. Empirical results exhibit the superior performance of POCSS over the state-of-the-art algorithms.
37

Venkatesh, B., and J. Anuradha. "Fuzzy Rank Based Parallel Online Feature Selection Method using Multiple Sliding Windows". Open Computer Science 11, no. 1 (January 1, 2021): 275–87. http://dx.doi.org/10.1515/comp-2020-0169.

Abstract:
Nowadays, in real-world applications, the dimensions of data are generated dynamically, and traditional batch feature selection methods are not suitable for streaming data. Online streaming feature selection methods have therefore gained attention, but the existing methods have demerits such as low classification accuracy, failure to avoid redundant and irrelevant features, and a higher number of selected features. In this paper, we propose a parallel online feature selection method using multiple sliding windows and fuzzy fast-mRMR feature selection analysis, which selects minimally redundant and maximally relevant features and overcomes the drawbacks of existing online streaming feature selection methods. To increase the speed of the proposed method, parallel processing is used. To evaluate its performance, k-NN, SVM, and Decision Tree classifiers are used, and the method is compared against state-of-the-art online feature selection methods. Evaluation metrics such as Accuracy, Precision, Recall, and F1-Score are used on benchmark datasets for performance analysis. The experimental analysis shows that the proposed method achieves more than 95% accuracy on most of the datasets and performs well compared with other existing online streaming feature selection methods, while overcoming their drawbacks.
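As background for the fuzzy fast-mRMR variant mentioned above, a plain greedy mRMR step (maximize relevance to the label, minimize mean redundancy with already-selected features) can be sketched as follows; windowing, fuzziness, and parallelism are all omitted:

```python
# Sketch: greedy mRMR selection with mutual information.
# Relevance: MI(feature, label). Redundancy: mean MI with selected features.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = load_breast_cancer(return_X_y=True)
relevance = mutual_info_classif(X, y, random_state=0)
selected = [int(np.argmax(relevance))]  # start with the most relevant feature

while len(selected) < 5:
    best, best_score = None, -np.inf
    for f in range(X.shape[1]):
        if f in selected:
            continue
        # Mean MI between candidate f and the already-selected features.
        red = np.mean([mutual_info_regression(X[:, [f]], X[:, s], random_state=0)[0]
                       for s in selected])
        score = relevance[f] - red  # mRMR criterion (difference form)
        if score > best_score:
            best, best_score = f, score
    selected.append(best)

print("mRMR-selected features:", selected)
```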
38

Paul, Dipanjyoti, Rahul Kumar, Sriparna Saha and Jimson Mathew. "Multi-objective Cuckoo Search-based Streaming Feature Selection for Multi-label Dataset". ACM Transactions on Knowledge Discovery from Data 15, no. 6 (May 19, 2021): 1–24. http://dx.doi.org/10.1145/3447586.

Abstract:
Feature selection is the process of retaining only relevant features by removing irrelevant or redundant ones from amongst the large number of features used to represent data. Nowadays, many application domains, especially social media networks, generate new features continuously at different time stamps. In such a scenario, when features arrive in an online fashion, the selection task must also be a continuous process: every time a new feature or a group of features arrives, the feature selection process has to be invoked, i.e., a streaming feature selection approach has to be incorporated. In recent years, many application domains also generate data where samples may belong to more than one class, called multi-label datasets. The multiple labels that instances are associated with may have dependencies amongst themselves, and finding the correlation amongst the class labels helps to select discriminative features across multiple labels. In this article, we develop streaming feature selection methods for multi-label data where the multiple labels are reduced to a lower-dimensional space. Similar labels are grouped together before performing the selection method, to improve the selection quality and to make the model time-efficient. A multi-objective version of the cuckoo search-based approach is used to select the optimal feature set. The proposed work develops two versions of the streaming feature selection method: (1) when the features arrive individually, and (2) when the features arrive in the form of a batch. Various multi-label datasets from domains such as text, biology, and audio have been used to test the developed methods. The proposed methods are compared with many previous feature selection methods, and the comparison establishes the superiority of using multiple objectives and label correlation in the feature selection process.
39

Solovei, Olga. "NEW ORGANIZATION PROCESS OF FEATURE SELECTION BY FILTER WITH CORRELATION-BASED FEATURES SELECTION METHOD". Innovative Technologies and Scientific Solutions for Industries, no. 3 (21) (November 18, 2022): 39–50. http://dx.doi.org/10.30837/itssi.2022.21.039.

Abstract:
The subject of the article is feature selection techniques that are used in the data preprocessing step before building machine learning models. The focus is on the Filter technique when it uses Correlation-based Feature Selection (CFS) with the symmetrical uncertainty method (CFS-SU) or CFS with Pearson correlation (CFS-PearCorr). The goal of the work is to increase the efficiency of feature selection by Filter with CFS by proposing a new organization process for feature selection. The tasks solved in the article are: review and analysis of the existing organization process of feature selection by Filter with CFS; identification of the root causes of performance degradation; proposal of a new approach; and evaluation of the proposed approach. To implement these tasks, the following methods were used: information theory, process theory, algorithm theory, statistics theory, sampling techniques, data modeling theory, and scientific experiments. Results. The obtained results prove that: (1) the chosen feature subset's evaluation function cannot be based only on the CFS merit, as this degrades the learning algorithm's results; (2) the accuracies of the classification learning algorithms improved and the determination coefficients of the regression learning algorithms increased when features were selected according to the proposed new organization process. Conclusions. The new organization process for feature selection proposed in this work combines filter and learning-algorithm properties in an evaluation strategy that helps to choose the optimal feature subset for a predefined learning algorithm. The computational complexity of the proposed approach does not depend on the dataset's dimensions, which makes it robust to different data varieties, and it eliminates the time needed to search feature subsets, as subsets are selected randomly. The conducted experiments proved that classification and regression learning algorithms built with features selected according to the new flow outperformed the same learning algorithms built without applying the new process in the data preprocessing step.
40

Nasib, Salmun K., Fadilah Istiqomah Pammus, Nurwan and La Ode Nashar. "COMPARISON OF FEATURE SELECTION BASED ON COMPUTATION TIME AND CLASSIFICATION ACCURACY USING SUPPORT VECTOR MACHINE". Indonesian Journal of Applied Research (IJAR) 4, no. 1 (April 18, 2023): 63–74. http://dx.doi.org/10.30997/ijar.v4i1.252.

Abstract:
The goal of this research is to compare Chi-Square feature selection with Mutual Information feature selection based on computation time and classification accuracy. In this research, people's comments on Twitter are classified into positive, negative, and neutral sentiments using the Support Vector Machine method. Sentiment classification has the disadvantage of using many features, so feature selection is needed to optimize classification performance; both Chi-Square and Mutual Information feature selection can improve the accuracy of sentiment classification. The data were collected from Twitter using a Python IDE application. The results of this study indicate that sentiment classification using Chi-Square feature selection yields a computation time of 0.4375 seconds with an accuracy of 78%, while sentiment classification using Mutual Information feature selection yields an accuracy of 80% with a required computation time of 252.75 seconds. The conclusions are that, in terms of computation time, Chi-Square feature selection is superior to Mutual Information feature selection, while in terms of classification accuracy, Mutual Information feature selection is more accurate than Chi-Square feature selection. Further research can use Mutual Information feature selection to get highly accurate results in sentiment classification.
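The kind of time/accuracy trade-off reported here is easy to reproduce in outline: chi-square scoring is a cheap vectorized computation, while mutual information estimation for continuous features is nearest-neighbour based and much slower. A small timing sketch (with synthetic data, so the numbers will differ from the study's):

```python
# Sketch: compare the running time of chi-square and mutual-information scoring.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=2000, n_features=500, random_state=0)
X_pos = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative values

t0 = time.perf_counter()
chi2(X_pos, y)
t_chi = time.perf_counter() - t0

t0 = time.perf_counter()
mutual_info_classif(X, y, random_state=0)
t_mi = time.perf_counter() - t0

print(f"chi2: {t_chi:.3f}s, mutual information: {t_mi:.3f}s")
```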
41

Guru, D. S., N. Vinay Kumar and Mahamad Suhil. "Feature Selection of Interval Valued Data Through Interval K-Means Clustering". International Journal of Computer Vision and Image Processing 7, no. 2 (April 2017): 64–80. http://dx.doi.org/10.4018/ijcvip.2017040105.

Abstract:
This paper introduces a novel feature selection model for supervised interval valued data based on interval K-Means clustering. The proposed model explores two kinds of feature selection through feature clustering viz., class independent feature selection and class dependent feature selection. The former one clusters the features spread across all the samples belonging to all the classes, whereas the latter one clusters the features spread across only the samples belonging to the respective classes. Both feature selection models are demonstrated to explore the generosity of clustering in selecting the interval valued features. For clustering, the kernel of the K-means clustering has been altered to operate on interval valued data. For experimentation purpose four standard benchmarking datasets and three symbolic classifiers have been used. To corroborate the effectiveness of the proposed model, a comparative analysis against the state-of-the-art models is given and results show the superiority of the proposed model.
42

Ismi, Dewi Pramudi, Shireen Panchoo and Murinto Murinto. "K-means clustering based filter feature selection on high dimensional data". International Journal of Advances in Intelligent Informatics 2, no. 1 (March 31, 2016): 38. http://dx.doi.org/10.26555/ijain.v2i1.54.

Abstract:
With hundreds or thousands of features in high-dimensional data, the computational workload is challenging. In the classification process, features which do not contribute significantly to the prediction of classes add to the computational workload. The aim of this paper is therefore to use feature selection to decrease the computational load by reducing the size of high-dimensional data. Subsets of features which represent all features were selected, so the process is two-fold: discarding irrelevant data and choosing one feature to represent a number of redundant features. There have been many studies regarding feature selection, for example backward feature selection and forward feature selection. In this study, a k-means clustering based feature selection is proposed. It is assumed that redundant features are located in the same cluster, whereas irrelevant features do not belong to any cluster. In this research, two different high-dimensional datasets are used: 1) the Human Activity Recognition Using Smartphones (HAR) dataset, containing 7352 data points each with 561 features, and 2) the National Classification of Economic Activities dataset, which contains 1080 data points each with 857 features. Both datasets provide class label information for each data point. Our experiment shows that k-means clustering based feature selection can be performed to produce subsets of features, which return more than 80% classification accuracy.
43

Matharaarachchi, Surani, Mike Domaratzki and Saman Muthukumarana. "Minimizing features while maintaining performance in data classification problems". PeerJ Computer Science 8 (September 14, 2022): e1081. http://dx.doi.org/10.7717/peerj-cs.1081.

Abstract:
High dimensional classification problems have gained increasing attention in machine learning, and feature selection has become essential in executing machine learning algorithms. In general, most feature selection methods compare the scores of several feature subsets and select the one that gives the maximum score. There may be other selections of a lower number of features with a lower score, yet the difference is negligible. This article proposes and applies an extended version of such feature selection methods, which selects a smaller feature subset with similar performance to the original subset under a pre-defined threshold. It further validates the suggested extended version of the Principal Component Loading Feature Selection (PCLFS-ext) results by simulating data for several practical scenarios with different numbers of features and different imbalance rates on several classification methods. Our simulated results show that the proposed method outperforms the original PCLFS and existing Recursive Feature Elimination (RFE) by giving reasonable feature reduction on various data sets, which is important in some applications.
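A hedged sketch of the "smaller subset within a tolerance" idea described above, here grafted onto scikit-learn's RFE rather than the authors' PCLFS (the 1% threshold and the step sizes are illustrative):

```python
# Sketch: score nested feature subsets and pick the smallest one whose
# cross-validated accuracy is within a tolerance of the best score.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
sizes = range(1, X.shape[1] + 1, 4)  # coarse grid of subset sizes, for speed
scores = []
for k in sizes:
    rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=k)
    scores.append(cross_val_score(rfe, X, y, cv=5).mean())

scores = np.array(scores)
threshold = 0.01  # accept up to 1% below the best score (illustrative)
ok = np.flatnonzero(scores >= scores.max() - threshold)
best_k = list(sizes)[ok[0]]  # smallest size meeting the tolerance
print(f"Smallest subset within threshold: {best_k} features")
```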
44

Santiko, Irfan, and Ikhsan Honggo. "Naive Bayes Algorithm Using Selection of Correlation Based Featured Selections Features for Chronic Diagnosis Disease". IJIIS: International Journal of Informatics and Information Systems 2, no. 2 (September 1, 2019): 56–60. http://dx.doi.org/10.47738/ijiis.v2i2.14.

Full text
Abstract:
Chronic kidney disease (CKD) is a disease that can cause death, because its pathophysiology results in a progressive decline in renal function, ending in kidney failure. CKD has now become a serious problem in the world. Kidney and urinary tract diseases cause the deaths of 850,000 people each year, placing the disease 12th in mortality rate. Several studies in the health field, including some on chronic kidney disease, have been carried out to detect the disease early. In this study, the Naive Bayes algorithm is tested for detecting the disease in patients who tested positive or negative for CKD. The accuracy of the algorithm is compared before and after feature selection with Correlation-based Feature Selection (CFS): Naive Bayes with feature selection achieves 93.58% accuracy, while Naive Bayes without feature selection achieves 93.54%. In both tests, with and without feature selection, the algorithm yields a very good classification, since the accuracy values lie between 0.90 and 1.00, which falls within the excellent range.
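A minimal sketch of the comparison described above, assuming a public stand-in dataset. Note the filter here only rewards feature-class correlation; true CFS also penalizes correlation between the selected features, so this is a simplified stand-in, not the paper's exact method.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Stand-in for CFS: rank features by |correlation with the class label|
# and keep the top k. (True CFS additionally penalizes feature-feature
# correlation; this simplified filter omits that term.)
X, y = load_breast_cancer(return_X_y=True)
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
top = np.argsort(corr)[::-1][:10]

nb = GaussianNB()
acc_all = cross_val_score(nb, X, y, cv=5).mean()
acc_sel = cross_val_score(nb, X[:, top], y, cv=5).mean()
print(f"all features: {acc_all:.4f}, selected: {acc_sel:.4f}")
```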
45

SIEDLECKI, WOJCIECH, and JACK SKLANSKY. "ON AUTOMATIC FEATURE SELECTION". International Journal of Pattern Recognition and Artificial Intelligence 02, no. 02 (June 1988): 197–220. http://dx.doi.org/10.1142/s0218001488000145.

Full text
Abstract:
We review recent research on methods for selecting features for multidimensional pattern classification. These methods include nonmonotonicity-tolerant branch-and-bound search and beam search. We describe the potential benefits of Monte Carlo approaches such as simulated annealing and genetic algorithms. We compare these methods to facilitate the planning of future research on feature selection.
46

Fitrianah, Devi, and Hisyam Fahmi. "THE IDENTIFICATION OF DETERMINANT PARAMETER IN FOREST FIRE BASED ON FEATURE SELECTION ALGORITHMS". SINERGI 23, no. 3 (October 11, 2019): 184. http://dx.doi.org/10.22441/sinergi.2019.3.002.

Full text
Abstract:
This research studies the use of the Sequential Forward Floating Selection (SFFS) and Sequential Backward Floating Selection (SBFS) algorithms for feature selection in a forest fire case study. From the supporting data that form the features of the forest fire case, we obtained information on which features are most significant and influential in the event of a forest fire. The data used are weather data and land coverage for each area where a forest fire occurred. Based on the available data, ten features were included in the selection using both feature selection methods. The result of the Sequential Forward Floating Selection method shows that earth surface temperature is the most significant and influential feature with regard to forest fire, while the Sequential Backward Floating Selection method identifies cloud coverage as the most significant. Over a total of 100 tests, the average accuracy of the Sequential Forward Floating Selection method is 96.23%, surpassing the 82.41% average accuracy of the Sequential Backward Floating Selection method.
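A minimal sketch of the floating forward search named above, assuming a user-supplied `score` callable (e.g. cross-validated classifier accuracy). This is textbook SFFS, not the authors' exact implementation.

```python
def sffs(score, n_features, k_target):
    """Minimal Sequential Forward Floating Selection sketch.

    score: callable taking a tuple of feature indices and returning a
    quality measure (higher is better). Each round adds the best
    feature, then "floats": it removes features as long as doing so
    beats the best score previously seen at the smaller subset size.
    """
    selected = []
    best_at_size = {}  # best score seen for each subset size
    while len(selected) < k_target:
        # Forward step: add the single best remaining feature.
        remaining = [f for f in range(n_features) if f not in selected]
        f_best = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(f_best)
        best_at_size[len(selected)] = score(tuple(selected))
        # Floating step: conditional backward removals.
        while len(selected) > 2:
            f_worst = max(selected,
                          key=lambda f: score(tuple(x for x in selected if x != f)))
            reduced = [x for x in selected if x != f_worst]
            s = score(tuple(reduced))
            if s > best_at_size.get(len(reduced), float("-inf")):
                selected = reduced
                best_at_size[len(reduced)] = s
            else:
                break
    return selected
```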
47

Gakii, Consolata, Paul O. Mireji, and Richard Rimiru. "Graph Based Feature Selection for Reduction of Dimensionality in Next-Generation RNA Sequencing Datasets". Algorithms 15, no. 1 (January 10, 2022): 21. http://dx.doi.org/10.3390/a15010021.

Full text
Abstract:
Analysis of high-dimensional data, with more features (p) than observations (N) (p>N), places significant demands on computational cost and memory usage. Feature selection can be used to reduce the dimensionality of the data. We used a graph-based approach, principal component analysis (PCA), and recursive feature elimination to select features for classification from RNAseq data in two lung cancer datasets. The selected features were discretized for association rule mining, where support and lift were used to generate informative rules. Our results show that the graph-based feature selection improved the performance of sequential minimal optimization (SMO) and multilayer perceptron (MLP) classifiers on both datasets. In association rule mining, features selected using the graph-based approach outperformed the other two feature-selection techniques at a support of 0.5 and a lift of 2. The non-redundant rules reflect the inherent relationships between features. Biological features are usually related to functions in living systems, a relationship that cannot be deduced by feature selection and classification alone. Therefore, the graph-based feature-selection approach combined with rule mining is a suitable way of selecting and finding associations between features in high-dimensional RNAseq data.
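The support and lift thresholds mentioned above follow the standard association-rule definitions, sketched below on a toy set of discretized feature levels; the gene names are purely illustrative.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent.

    lift > 1 means the two sides co-occur more often than expected
    under independence.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / (support(transactions, antecedent) * support(transactions, consequent))

# Toy transactions built from discretized feature levels.
T = [{"geneA_high", "geneB_high"}, {"geneA_high", "geneB_high"},
     {"geneA_low", "geneB_low"}, {"geneA_high", "geneB_low"}]
print(support(T, {"geneA_high", "geneB_high"}))  # 0.5
print(lift(T, {"geneA_high"}, {"geneB_high"}))   # 0.5 / (0.75 * 0.5) = 1.33...
```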
48

Mweshi, George. "Feature Selection using Genetic Programming". Zambia ICT Journal 3, no. 2 (November 30, 2019): 11–18. http://dx.doi.org/10.33260/zictjournal.v3i2.62.

Full text
Abstract:
Extracting useful and novel information from the large amount of collected data has become a necessity for corporations wishing to maintain a competitive advantage. One of the biggest issues in handling these significantly large datasets is the curse of dimensionality: as the dimension of the data increases, the performance of the data mining algorithms employed to mine the data deteriorates. This deterioration is mainly caused by the large search space created as a result of having irrelevant, noisy and redundant features in the data. Feature selection is one of the various techniques that can be used to remove these unnecessary features; it consequently reduces the dimension of the data as well as the search space, which in turn increases the efficiency and the accuracy of the mining algorithms. In this paper, we investigate the ability of Genetic Programming (GP), an evolutionary search strategy capable of automatically finding solutions in complex and large search spaces, to perform feature selection. We implement a basic GP algorithm and perform feature selection on 5 benchmark classification datasets from the UCI repository. To test the competitiveness and feasibility of the GP approach, we examine the classification performance of four classifiers, namely J48, Naive Bayes, PART, and Random Forests, using the GP-selected features, all the original features, and the features selected by other commonly used feature selection techniques, i.e. principal component analysis, information gain, ReliefF and CFS. The experimental results show that not only does GP select a smaller set of features than the original set, but classifiers using the GP-selected features also achieve better classification performance than those using all the original features. Furthermore, compared to the other well-known feature selection techniques, GP achieves very competitive results.
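The paper evolves GP programs; the sketch below substitutes a deliberately simpler genetic algorithm over feature bitmasks (one-point crossover, bit-flip mutation) to illustrate evolutionary feature selection. Every name and parameter here is an illustrative assumption, not the paper's setup.

```python
import random

def evolve_feature_mask(score, n_features, pop=30, gens=40, p_mut=0.05, seed=0):
    """Toy evolutionary search over feature bitmasks.

    score: callable mapping a tuple of selected feature indices to a
    fitness value (higher is better), e.g. cross-validated accuracy.
    """
    rng = random.Random(seed)

    def fitness(mask):
        idx = tuple(i for i, b in enumerate(mask) if b)
        return score(idx) if idx else float("-inf")

    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = a[:cut] + b[cut:]
            child = [not g if rng.random() < p_mut else g for g in child]
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    return [i for i, b in enumerate(best) if b]
```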
49

Chaudhry, Muhammad Umar, Muhammad Yasir, Muhammad Nabeel Asghar, and Jee-Hyong Lee. "Monte Carlo Tree Search-Based Recursive Algorithm for Feature Selection in High-Dimensional Datasets". Entropy 22, no. 10 (September 29, 2020): 1093. http://dx.doi.org/10.3390/e22101093.

Full text
Abstract:
Complexity and high dimensionality are inherent concerns of big data, and feature selection has gained prime importance in coping with them by reducing the dimensionality of datasets. The compromise between maximum classification accuracy and minimum dimensionality is as yet an unsolved puzzle. Recently, Monte Carlo Tree Search (MCTS)-based techniques have attained great success in feature selection by constructing a binary feature selection tree and efficiently focusing on the most valuable features in the feature space. However, one challenging problem with such approaches is the tradeoff between the tree search and the number of simulations: with a limited number of simulations, the tree might not reach sufficient depth, inducing bias towards randomness in feature subset selection. In this paper, a new feature selection algorithm is proposed in which multiple feature selection trees are built iteratively in a recursive fashion. The state space of every successor feature selection tree is smaller than that of its predecessor, increasing the impact of the tree search in selecting the best features while keeping the number of MCTS simulations fixed. Experiments are performed on 16 benchmark datasets for validation purposes, and we compare the performance with state-of-the-art methods in the literature in terms of both classification accuracy and feature selection ratio.
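The recursive shrinking described above can be sketched independently of the MCTS internals: each round runs a fixed-budget search over the current feature space and keeps only the top-ranked fraction for the next round. The callable, fraction, and stopping size below are assumptions, not the authors' settings.

```python
def recursive_select(features, search_round, keep_frac=0.5, min_size=10):
    """Recursive wrapper around a fixed-budget search.

    features: list of candidate feature indices.
    search_round: callable that, given a feature list, runs one
    fixed-budget search (MCTS in the paper; any ranker here) and
    returns the features ordered from most to least valuable.
    Each round keeps the top `keep_frac` of the ranking, so later
    rounds search a smaller state space with the same budget.
    """
    while len(features) > min_size:
        ranked = search_round(features)
        features = ranked[: max(min_size, int(len(ranked) * keep_frac))]
    return features
```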
50

K, Bhuvaneswari. "Filter Based Sentiment Feature Selection Using Back Propagation Deep Learning". Journal of Computer Sciences and Informatics 2, no. 1 (2025): 15. https://doi.org/10.5455/jcsi.20241216054507.

Full text
Abstract:
Aim: The proposed Filter Based Sentiment Feature Selection (FBSFS) model aims to improve the performance of Sentiment Learning (SL) by selecting the most relevant sentiment features from text reviews using feature selection methods at the document level. Method: Sentiment learning is applied at the document level to classify text reviews into two categories, positive or negative. The key sentiment features, adjectives (ADJ), adverbs (ADV), and verbs (VRB), which are essential for sentiment analysis, are extracted from the text documents using the WordNet dictionary. Feature selection is performed by applying four different algorithms: Information Gain, Correlation, Gini Index, and Chi-Square. These algorithms help identify the most significant features contributing to sentiment classification. The selected features are then fed into a Back Propagation Deep Learning (BPDL) classification model for sentiment analysis. Result: The experimental findings show that the proposed model achieved a higher accuracy of 91.15% using Correlation feature selection. This accuracy signifies the effectiveness of the proposed model in classifying text reviews, outperforming other methods in terms of sentiment feature selection and classification. Conclusion: The proposed model enhances the performance of sentiment learning by selecting the most relevant sentiment features, particularly those extracted from adjectives, adverbs, and verbs, and combining them with BPDL, making FBSFS a robust tool for sentiment classification.
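Of the four filters named above, the Chi-Square one is the easiest to demonstrate with scikit-learn; the sketch below only illustrates that filter step on a toy corpus, omitting the paper's WordNet POS extraction and the BPDL network.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Toy review corpus; 1 = positive, 0 = negative.
reviews = ["great wonderful film", "terrible boring plot",
           "wonderful acting truly great", "boring and terrible"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(reviews)

# Chi-square filter: keep the k terms most associated with the label.
selector = SelectKBest(chi2, k=3)
X_sel = selector.fit_transform(X, labels)
kept = [t for t, keep in zip(vec.get_feature_names_out(),
                             selector.get_support()) if keep]
print(kept)
```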
