Academic literature on the topic 'Feature selection'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Feature selection.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Feature selection"

1

Huber, Florian, and Volker Steinhage. "Conditional Feature Selection: Evaluating Model Averaging When Selecting Features with Shapley Values." Geomatics 4, no. 3 (August 8, 2024): 286–310. http://dx.doi.org/10.3390/geomatics4030016.

Full text
Abstract:
Artificial intelligence (AI), and especially machine learning (ML), is rapidly transforming the field of geomatics with respect to collecting, managing, and analyzing spatial data. Feature selection as a building block in ML is crucial because it directly impacts the performance and predictive power of a model by selecting the most critical variables and eliminating the redundant and irrelevant ones. Random forests have now been used for decades and allow for building models with high accuracy. However, finding the most expressive features from the dataset by selecting the most important features within random forests is still a challenging question. The often-used internal Gini importances of random forests are based on the number of training examples that are split by a feature but fail to acknowledge the magnitude of change in the target variable, leading to suboptimal selections. Shapley values are an established and unified framework for feature attribution, i.e., specifying how much each feature in a trained ML model contributes to the predictions for a given instance. Previous studies highlight the effectiveness of Shapley values for feature selection in real-world applications, while other research emphasizes certain theoretical limitations. This study provides an application-driven discussion of Shapley values for feature selection by first proposing four necessary conditions for a successful feature selection with Shapley values, extracted from a multitude of critical research in the field. Even given these conditions, Shapley value feature selection is by definition a model averaging procedure, where unimportant features can alter the final selection. Therefore, we additionally present Conditional Feature Selection (CFS) as a novel algorithm for performing feature selection that mitigates this problem and use it to evaluate the impact of model averaging in several real-world examples covering the use of ML in geomatics. The results of this study show Shapley values to be a good measure for feature selection when compared with Gini feature importances on four real-world examples, improving the RMSE by 5% when averaged over selections of all possible subset sizes. An even better selection can be achieved by CFS, improving on the Gini selection by approximately 7.5% in terms of RMSE. For random forests, Shapley value calculation can be performed in polynomial time, offering an advantage over the exponential runtime of CFS and building a trade-off against the accuracy lost in feature selection due to model averaging.
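
The contrast the abstract draws between Gini importances and Shapley values can be illustrated with a short sketch. The snippet below ranks the features of a random forest by mean absolute SHAP value and by impurity-based importance; it assumes the third-party `shap` package and a synthetic dataset, and it is not the paper's Conditional Feature Selection (CFS) algorithm.

```python
# Minimal sketch: compare Gini-style and Shapley-value feature rankings for a
# random forest. Assumes the third-party `shap` package; synthetic data only.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=10, n_informative=4, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based (Gini-style) importances built into the forest
gini_rank = np.argsort(model.feature_importances_)[::-1]

# Shapley values via TreeSHAP, which runs in polynomial time for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)              # shape: (n_samples, n_features)
shap_rank = np.argsort(np.abs(shap_values).mean(axis=0))[::-1]

print("Gini ranking:", gini_rank)
print("SHAP ranking:", shap_rank)
# A selection of size k would keep, e.g., shap_rank[:k] and refit the model.
```
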
APA, Harvard, Vancouver, ISO, and other styles
2

Usha, P., and J. G. R. Sathiaseelan. "Enhanced Filtrate Feature Selection Algorithm for Feature Subset Generation." Indian Journal Of Science And Technology 17, no. 29 (July 31, 2024): 3002–11. http://dx.doi.org/10.17485/ijst/v17i29.2127.

Full text
Abstract:
Objectives: In the bioinformatics field, feature selection plays a vital role in selecting relevant features for making better decisions and assessing disease diagnosis. Brain Tumour (BT) is the second leading disease in the world. Most BT detection techniques are based on Magnetic Resonance (MR) images. Methods: In this paper, medical reports are used in the detection of BT to increase the surveillance of patients. To improve the accuracy of predictive models, a new adaptive technique called the Enhanced Filtrate Feature Selection (EFFS) algorithm for optimal feature selection is proposed. Initially, the EFFS algorithm finds the dependency of each attribute and the feature score by using Mutual Information Gain, Chi-Square, Correlation, and Fisher score filter methods. Afterward, the occurrence rate of each top-ranked attribute is filtered by applying a threshold value, and the optimal features are obtained by using the Pareto principle. Findings: The performance of the selected optimal features is evaluated by time complexity, number of features selected, and accuracy. The efficiency of the proposed algorithm is measured and analyzed on a high-quality optimal subset based on a Random Forest classifier integrated with the ranking of attributes. The EFFS algorithm selects 39 out of 46 significant and relevant features with minimum selection time and shows 99.31% accuracy for BT, 29 features with 99.47% accuracy for Breast Cancer, 15 features with 94.61% accuracy for Lung Cancer, 15 features with 98.84% accuracy for Diabetes, and 43 features with 90% accuracy for the Covid-19 dataset. Novelty: To decrease processing time and improve model performance, the feature selection process is carried out at the initial stages for the betterment of the classification task. Thus, the proposed EFFS algorithm is applied to different datasets based on medical reports, and EFFS outperforms with greater performance measurements and time. Appropriate feature selection techniques help to diagnose diseases in the early phase and increase patient survival. Keywords: Bioinformatics, Brain Tumour, Chi-Square, Correlation, EFFS, Feature Selection, Fisher Score, Information Gain, Optimal Features, Random Forest
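
As a rough illustration of the filter-ranking stage described above, the sketch below scores features with several filters (mutual information, chi-square, an ANOVA F statistic standing in for the Fisher score, and absolute correlation) and keeps features ranked highly by most of them. The dataset, the top-k cut-off, and the voting rule are illustrative assumptions, not the published EFFS procedure.

```python
# Simplified sketch of a multi-filter ranking stage (not the published EFFS algorithm):
# score features with several filters, keep those ranked in the top-k by most filters.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, chi2, f_classif
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_pos = MinMaxScaler().fit_transform(X)          # chi2 requires non-negative inputs

scores = {
    "mutual_info": mutual_info_classif(X, y, random_state=0),
    "chi2": chi2(X_pos, y)[0],
    "anova_f": f_classif(X, y)[0],               # stand-in for a Fisher-type score
    "abs_corr": np.abs(np.corrcoef(X.T, y)[:-1, -1]),
}

k = 15
top_sets = [set(np.argsort(s)[::-1][:k]) for s in scores.values()]
votes = np.zeros(X.shape[1], dtype=int)
for s in top_sets:
    for f in s:
        votes[f] += 1

selected = np.where(votes >= 3)[0]               # kept by at least 3 of the 4 filters
print("selected features:", selected)
```
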
APA, Harvard, Vancouver, ISO, and other styles
3

Wu, Xindong, Kui Yu, Wei Ding, Hao Wang, and Xingquan Zhu. "Online Feature Selection with Streaming Features." IEEE Transactions on Pattern Analysis and Machine Intelligence 35, no. 5 (May 2013): 1178–92. http://dx.doi.org/10.1109/tpami.2012.197.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Li, Jundong, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, and Huan Liu. "Feature Selection." ACM Computing Surveys 50, no. 6 (January 12, 2018): 1–45. http://dx.doi.org/10.1145/3136625.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Sutherland, Stuart. "Feature selection." Nature 392, no. 6674 (March 1998): 350. http://dx.doi.org/10.1038/32817.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Patel, Damodar, and Amit Kumar Saxena. "Feature Selection in High Dimension Datasets using Incremental Feature Clustering." Indian Journal Of Science And Technology 17, no. 32 (August 24, 2024): 3318–26. http://dx.doi.org/10.17485/ijst/v17i32.2077.

Full text
Abstract:
Objectives: To develop a machine learning-based model that selects the most important features from a high-dimensional dataset in order to classify patterns at high accuracy and reduce dimensionality. Methods: The proposed feature selection method (FSIFC) forms and combines feature clusters incrementally and produces a feature subset each time. The method uses K-means clustering and Mutual Information (MI) to refine the feature selection process iteratively. Initially, two clusters of features are formed using K-means clustering (K=2) by taking features as the basis of clustering instead of the patterns (the traditional way). From these two clusters, the features with the highest MI value in each cluster are kept in a feature subset. Classification accuracies (CA) of the feature subset are calculated using three classifiers, namely Support Vector Machines (SVM), Random Forest (RF), and k-Nearest Neighbors (k-NN). The process is repeated by incrementing the value of K, i.e. the number of clusters, until a maximum user-defined value of K is reached. The best value of CA obtained from these trials is recorded and the corresponding feature set is finally accepted. Findings: The proposed method is demonstrated using ten datasets and the results are compared with existing published results using three classifiers to determine the method's performance. The ten datasets are classified with average CAs of 92.72%, 93.13%, and 91.5% using the SVM, RF, and k-NN classifiers respectively. The proposed method selects a maximum of thirty features from the datasets. In terms of selecting the most effective and the smallest feature sets, the proposed method outperforms eight other feature selection methods considering CAs. Novelty: The proposed model applies feature reduction using combined feature clustering and filter methods in an incremental way. This provides an improved selection of relevant features while removing irrelevant ones across different trials. Keywords: Feature selection, High-dimensional datasets, K-means algorithm, Mutual information, Machine learning
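
A simplified reading of the incremental feature-clustering idea is sketched below: features (not samples) are clustered with K-means, the highest-MI feature per cluster is kept, and K is swept upward while cross-validated accuracy is tracked. The dataset, the range of K, and the single classifier used are illustrative assumptions rather than the authors' full FSIFC protocol.

```python
# Sketch: cluster features (columns), keep the highest-MI feature per cluster,
# and sweep the number of clusters K. A simplified reading of FSIFC.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)

best_acc, best_subset = 0.0, None
for k in range(2, 16):                            # K is incremented at each trial
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X.T)
    subset = [int(np.where(labels == c)[0][np.argmax(mi[labels == c])])
              for c in range(k)]                  # best-MI feature per cluster
    acc = cross_val_score(SVC(), X[:, subset], y, cv=5).mean()
    if acc > best_acc:
        best_acc, best_subset = acc, subset

print(f"best accuracy {best_acc:.3f} with features {sorted(best_subset)}")
```
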
APA, Harvard, Vancouver, ISO, and other styles
7

Wang, Gang, Yang Zhao, Jiasi Zhang, and Yongjie Ning. "A Novel End-To-End Feature Selection and Diagnosis Method for Rotating Machinery." Sensors 21, no. 6 (March 15, 2021): 2056. http://dx.doi.org/10.3390/s21062056.

Full text
Abstract:
Feature selection, also known as feature engineering, aims to obtain effective features from data. Traditional feature selection and predictive model learning are separated, which leads to a problem of inconsistent criteria. This paper presents an end-to-end feature selection and diagnosis method that organically unifies feature expression learning and machine prediction learning into one model. The algorithm first combines the prediction model to calculate the mean impact values (MIVs) of the features and realizes primary feature selection for the prediction model by selecting the features with larger MIVs. In order to take into account the performance of the features themselves, the within-class and between-class discriminant analysis (WBDA) method is proposed and, combined with a feature diversity strategy, a feature-oriented secondary selection is realized. Eventually, the feature vectors obtained by the two selections are classified using a multi-class support vector machine (SVM). Compared with the modified network variable selection algorithm (MIVs), the principal component analysis dimensionality reduction algorithm (PCA), variable selection based on compensative distance evaluation technology (CDET), and other algorithms, the proposed MIVs-WBDA method exhibits excellent classification accuracy owing to the fusion of feature selection and predictive model learning. According to the results of classification accuracy testing after dimensionality reduction on rotating machinery status, the MIVs-WBDA method achieves a 3% classification accuracy improvement under the low-dimensional feature set. The typical running time of this classification learning algorithm is less than 10 s, while using deep learning its running time would be more than a few hours.
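
The mean-impact-value (MIV) step can be illustrated with a generic perturbation sketch: each feature is perturbed by ±10% and the average change in model output is taken as its impact. This is only a common reading of MIV with placeholder data and model; it does not reproduce the authors' WBDA stage or their SVM-based pipeline.

```python
# Sketch of MIV-style screening: perturb each feature by +/-10% and measure the
# average change in the model's output probability. Illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=12, n_informative=5, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0).fit(X, y)

miv = np.zeros(X.shape[1])
for j in range(X.shape[1]):
    X_up, X_dn = X.copy(), X.copy()
    X_up[:, j] *= 1.10                            # +10% perturbation
    X_dn[:, j] *= 0.90                            # -10% perturbation
    diff = model.predict_proba(X_up)[:, 1] - model.predict_proba(X_dn)[:, 1]
    miv[j] = np.abs(diff).mean()

primary = np.argsort(miv)[::-1][:6]               # keep the features with the largest MIV
print("MIV ranking:", np.argsort(miv)[::-1], "-> primary selection:", primary)
```
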
APA, Harvard, Vancouver, ISO, and other styles
8

Fahrudy, Dony, and Shofwatul 'Uyun. "Classification of Student Graduation using Naïve Bayes by Comparing between Random Oversampling and Feature Selections of Information Gain and Forward Selection." JOIV : International Journal on Informatics Visualization 6, no. 4 (December 31, 2022): 798. http://dx.doi.org/10.30630/joiv.6.4.982.

Full text
Abstract:
Class-imbalanced data with high attribute dimensions frequently cause issues in a classification process, since imbalanced numbers of instances in each class and irrelevant attributes affect an algorithm's performance; techniques are therefore needed to overcome the class imbalance and to select features so as to reduce data complexity and remove irrelevant attributes. This study applied the random oversampling (ROs) method to overcome the class-imbalanced data and compared two feature selections (information gain and forward selection) to determine which is superior, more effective and more appropriate to apply. The results of feature selection were then used to classify student graduation by building a classification model with the Naïve Bayes algorithm. This study indicated an increase in the average accuracy of the Naïve Bayes method without the ROs preprocessing and feature selection (81.83%), with the ROs (83.84%), with information gain with 3 selected features (86.03%), and with forward selection with 2 selected features (86.42%); consequently, these led to an accuracy increase of 4.2% from no pre-processing to information gain and 4.59% from no pre-processing to forward selection. Therefore, the best feature selection was forward selection with 2 selected features (GPA of the 8th semester and the overall GPA), and the ROs and both feature selections were proven to improve the performance of the Naïve Bayes method.
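
The two compared pipelines can be sketched as follows: random oversampling, then either a mutual-information filter (as an information-gain analogue) or forward selection wrapped around Naive Bayes. The snippet assumes the third-party `imbalanced-learn` package, and the dataset and feature counts are placeholders rather than the study's student-graduation data.

```python
# Sketch of the compared pipelines: random oversampling, then either an information-gain
# style filter or forward selection, feeding a Naive Bayes classifier.
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_ros, y_ros = RandomOverSampler(random_state=0).fit_resample(X, y)

# (a) information-gain style filter (mutual information), 3 features kept
X_ig = SelectKBest(mutual_info_classif, k=3).fit_transform(X_ros, y_ros)
print("IG filter:", cross_val_score(GaussianNB(), X_ig, y_ros, cv=5).mean())

# (b) forward selection wrapped around Naive Bayes, 2 features kept
sfs = SequentialFeatureSelector(GaussianNB(), n_features_to_select=2, direction="forward")
X_fwd = sfs.fit_transform(X_ros, y_ros)
print("Forward selection:", cross_val_score(GaussianNB(), X_fwd, y_ros, cv=5).mean())
```
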
APA, Harvard, Vancouver, ISO, and other styles
9

Kar Hoou, Hui, Ooi Ching Sheng, Lim Meng Hee, and Leong Mohd Salman. "Feature selection tree for automated machinery fault diagnosis." MATEC Web of Conferences 255 (2019): 02004. http://dx.doi.org/10.1051/matecconf/201925502004.

Full text
Abstract:
Intelligent machinery fault diagnosis commonly utilises statistical features of sensor signals as the inputs for its machine learning algorithm. Because of the abundance of statistical features that can be extracted from raw signals, inserting all the available features into the machine learning algorithm for machinery fault classification may inadvertently result in less accurate fault classification due to overfitting. It is therefore only by selecting the most representative features that overfitting can be avoided and classification accuracy improved. Currently, the genetic algorithm (GA) is regarded as the most commonly used and reliable feature selection tool for improving the accuracy of any machine learning algorithm. However, the greatest challenge for GA is that it may fall into local optima and be computationally demanding. To overcome this limitation, a feature selection tree (FST) is here proposed. Numerous experimental dataset feature selections were executed using FST and GA; their performance is compared and discussed. Analysis showed that the proposed FST resulted in identical or superior optimal feature subsets when compared to the renowned GA method, but with a 20-times-faster simulation period. The proposed FST is therefore more efficient in performing the feature selection task than GA.
APA, Harvard, Vancouver, ISO, and other styles
10

Heriyanto, Heriyanto, and Dyah Ayu Irawati. "Comparison of Mel Frequency Cepstral Coefficient (MFCC) Feature Extraction, With and Without Framing Feature Selection, to Test the Shahada Recitation." RSF Conference Series: Engineering and Technology 1, no. 1 (December 23, 2021): 335–54. http://dx.doi.org/10.31098/cset.v1i1.395.

Full text
Abstract:
This voice research addresses feature extraction using MFCC. Feature extraction is the first step to obtain features, which then need to be refined further through feature selection. The feature selection in this research used the dominant weight feature for the shahada voice, working on the frames and cepstral coefficients produced by the feature extraction. The cepstral coefficients used ranged from 0 to 23, i.e. 24 cepstral coefficients, while the frames taken consisted of frames 0 to 10, i.e. eleven frames. A total of 300 recorded voice samples were tested against 200 voice recordings of both male and female speakers. The frequency used was 44.100 kHz, 16-bit stereo. This research aimed to gain accuracy by selecting the right features in the frames using MFCC feature extraction and matching accuracy with frame feature selection using Dominant Weight Normalization (NBD). The accuracy results showed that the MFCC method with the selection of the 9th frame had a higher accuracy rate of 86% compared to other frames, while MFCC without feature selection averaged 60%. The conclusion was that selecting the right features in the 9th frame impacted the accuracy for the voice of shahada recitation.
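
A minimal sketch of the extraction step is given below: 24 MFCC coefficients are computed per frame and a single frame (the 9th, which the study found best) is kept as the feature vector. It assumes the `librosa` package; the file name is hypothetical, and the study's dominant-weight selection and matching stages are not reproduced.

```python
# Sketch of MFCC extraction with one frame kept as the selected feature vector.
# Assumes the `librosa` package; the WAV file name is a hypothetical placeholder.
import librosa
import numpy as np

y, sr = librosa.load("shahada_sample.wav", sr=44100)     # hypothetical file name
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=24)       # 24 cepstral coefficients
print("MFCC matrix:", mfcc.shape)                         # (24, n_frames)

selected_frame = 9                                        # the frame the study found best
feature_vector = mfcc[:, selected_frame]                  # 24 coefficients of frame 9
print("selected-frame feature vector:", np.round(feature_vector, 2))
```
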
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Feature selection"

1

Zheng, Ling. "Feature grouping-based feature selection." Thesis, Aberystwyth University, 2017. http://hdl.handle.net/2160/41e7b226-d8e1-481f-9c48-4983f64b0a92.

Full text
Abstract:
Feature selection (FS) is a process which aims to select input domain features that are most informative for a given outcome. Unlike other dimensionality reduction techniques, feature selection methods preserve the underlying semantics or meaning of the original data following reduction. Typically, FS can be divided into four categories: filter, wrapper, hybrid-based and embedded approaches. Many strategies have been proposed for this task in an effort to identify more compact and better quality feature subsets. As various advanced techniques have emerged in the development of search mechanisms, it has become increasingly possible for quality feature subsets to be discovered efficiently without resorting to exhaustive search. Harmony search is a music-inspired stochastic search method. This general technique can be used to support FS in conjunction with many available feature subset quality evaluation methods. The structural simplicity of this technique means that it is capable of reducing the overall complexity of the subset search. The naturally stochastic properties of this technique also help to reduce local optima for any resultant feature subset, whilst locating multiple, potential candidates for the final subset. However, it is not sufficiently flexible in adjusting the size of the parametric musician population, which directly affects the performance on feature subset size reduction. This weakness can be alleviated to a certain extent by an iterative refinement extension, but the fundamental issue remains. Stochastic mechanisms have not been explored to their maximum potential by the original work, as it does not employ a parameter of pitch adjustment rate due to its ineffective mapping of concepts. To address the above problems, this thesis proposes a series of extensions. Firstly, a self-adjusting approach is proposed for the task of FS which involves a mechanism to further improve the performance of the existing harmony search-based method. This approach introduces three novel techniques: a restricted feature domain created for each individual musician contributing to the harmony improvisation in order to improve harmony diversity; a harmony memory consolidation which explores the possibility of exchanging/communicating information amongst musicians such that it can dynamically adjust the population of musicians in improvising new harmonies; and a pitch adjustment which exploits feature similarity measures to identify neighbouring features in order to fine-tune the newly discovered harmonies. These novel developments are also supplemented by a further new proposal involving the application to a feature grouping-based approach proposed herein for FS, which works by searching for feature subsets across homogeneous feature groups rather than examining a massive number of possible combinations of features. This approach radically departs from the traditional FS techniques that work by incrementally adding/removing features from a candidate feature subset one feature at a time or randomly selecting feature combinations without considering the relationship(s) between features. As such, information such as inter-feature correlation may be retained and the residual redundancy in the returned feature subset minimised. Two different instantiations of an FS mechanism are derived from such a feature grouping-based framework: one based upon the straightforward ranking of features within the resultant feature grouping; and the other on the simplification for harmony search-based FS. 
Feature grouping-based FS offers a self-adjusting approach to effectively and efficiently addressing many real-world problems which may have data dimensionality concerns and which require semantics-preserving data reduction. This thesis investigates the application of this approach in the area of intrusion detection, which must deal in a timely fashion with huge quantities of data extracted from network traffic or audit trails. This application empirically demonstrates the efficacy of feature grouping-based FS in action.
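
To make the harmony-search idea concrete, the sketch below runs a basic binary harmony search as a wrapper: a memory of random feature masks, improvisation by memory consideration with occasional random bits, and replacement of the worst harmony. The parameters (memory size, HMCR, iteration count), classifier, and dataset are illustrative assumptions; the thesis's restricted feature domains, memory consolidation, and pitch adjustment extensions are not included.

```python
# Sketch of a binary harmony-search wrapper for feature selection.
# HMCR, memory size, iteration count, classifier, and dataset are illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features, hm_size, hmcr, iterations = X.shape[1], 10, 0.9, 100

def fitness(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()

memory = rng.integers(0, 2, size=(hm_size, n_features)).astype(bool)
scores = np.array([fitness(m) for m in memory])

for _ in range(iterations):
    new = np.empty(n_features, dtype=bool)
    for j in range(n_features):
        if rng.random() < hmcr:                    # memory consideration
            new[j] = memory[rng.integers(hm_size), j]
        else:                                      # random (re)initialisation
            new[j] = rng.random() < 0.5
    s = fitness(new)
    worst = scores.argmin()
    if s > scores[worst]:                          # replace the worst harmony
        memory[worst], scores[worst] = new, s

best = memory[scores.argmax()]
print(f"best CV accuracy {scores.max():.3f} with {best.sum()} features")
```
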
APA, Harvard, Vancouver, ISO, and other styles
2

Dreyer, Sigve. "Evolutionary Feature Selection." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-24225.

Full text
Abstract:
This thesis contains research on feature selection, in particular feature selection using evolutionary algorithms. Feature selection is motivated by increasing data dimensionality and the need to construct simple induction models. A literature review of evolutionary feature selection is conducted. After that, an abstract feature selection algorithm, capable of using many different wrappers, is constructed. The algorithm is configured using a low-dimensional dataset. Finally, it is tested on a wide range of datasets, revealing both its abilities and its problems. The main contribution is the revelation that classifier accuracy is not a sufficient metric for feature selection on high-dimensional data.
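
A compact genetic-algorithm wrapper in the spirit of this thesis is sketched below; its fitness combines cross-validated accuracy with a penalty on subset size, echoing the finding that accuracy alone is not a sufficient selection metric. Population size, mutation rate, penalty weight, dataset, and classifier are all illustrative assumptions.

```python
# Sketch of an evolutionary (GA) wrapper whose fitness penalises large subsets.
# All parameters, the dataset, and the classifier are illustrative choices.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X, y = load_breast_cancer(return_X_y=True)
n_features, pop_size, generations, alpha = X.shape[1], 20, 15, 0.1

def fitness(mask):
    if not mask.any():
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()
    return acc - alpha * mask.sum() / n_features          # penalise large subsets

pop = rng.random((pop_size, n_features)) < 0.5
for _ in range(generations):
    fit = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(fit)[::-1][:pop_size // 2]]  # truncation selection
    children = []
    while len(children) < pop_size - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cross = rng.random(n_features) < 0.5              # uniform crossover
        child = np.where(cross, a, b)
        flip = rng.random(n_features) < 0.02               # bit-flip mutation
        children.append(child ^ flip)
    pop = np.vstack([parents, children])

fit = np.array([fitness(ind) for ind in pop])
best = pop[fit.argmax()]
print(f"fitness {fit.max():.3f} with {best.sum()} features selected")
```
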
APA, Harvard, Vancouver, ISO, and other styles
3

Doquet, Guillaume. "Agnostic Feature Selection." Electronic Thesis or Diss., Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS486.

Full text
Abstract:
With the advent of Big Data, databases whose size far exceeds the human scale are becoming increasingly common. The resulting overabundance of monitored variables (friends on a social network, movies watched, nucleotides coding the DNA, monetary transactions...) has motivated the development of Dimensionality Reduction (DR) techniques. A DR algorithm such as Principal Component Analysis (PCA) or an AutoEncoder typically combines the original variables into new features fewer in number, such that most of the information in the dataset is conveyed by the extracted feature set. A particular subcategory of DR is formed by Feature Selection (FS) methods, which directly retain the most important initial variables. How to select the best candidates is a hot topic at the crossroads of statistics and Machine Learning. Feature importance is usually inferred in a supervised context, where variables are ranked according to their usefulness for predicting a specific target feature. The present thesis focuses on the unsupervised context in FS, i.e. the challenging situation where no prediction goal is available to help assess feature relevance. Instead, unsupervised FS algorithms usually build an artificial classification goal and rank features based on their helpfulness for predicting this new target, thus falling back on the supervised context. Additionally, the efficiency of unsupervised FS approaches is typically also assessed in a supervised setting. In this work, we propose an alternate model combining unsupervised FS with data compression. Our Agnostic Feature Selection (AgnoS) algorithm does not rely on creating an artificial target and aims to retain a feature subset sufficient to recover the whole original dataset, rather than a specific variable. As a result, AgnoS does not suffer from the selection bias inherent to clustering-based techniques. The second contribution of this work (Agnostic Feature Selection, G. Doquet & M. Sebag, ECML PKDD 2019) is to establish both the brittleness of the standard supervised evaluation of unsupervised FS and the stability of the newly proposed AgnoS.
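
The reconstruction idea behind AgnoS can be approximated with a much simpler greedy sketch: features are added one at a time so as to minimise the linear reconstruction error of the full data matrix from the selected subset. This linear stand-in is an assumption for illustration only; AgnoS itself relies on AutoEncoder-style reconstruction and different scoring.

```python
# Sketch of reconstruction-driven unsupervised selection: greedily keep features from
# which all remaining features can be linearly reconstructed. Linear stand-in only.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_wine(return_X_y=True)[0])
n_features, budget = X.shape[1], 5
selected = []

def reconstruction_error(subset):
    if not subset:
        return np.inf
    pred = LinearRegression().fit(X[:, subset], X).predict(X[:, subset])
    return np.mean((X - pred) ** 2)

for _ in range(budget):
    errors = {j: reconstruction_error(selected + [j])
              for j in range(n_features) if j not in selected}
    selected.append(min(errors, key=errors.get))          # largest error reduction

print("selected features:", selected, "error:", round(reconstruction_error(selected), 4))
```
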
APA, Harvard, Vancouver, ISO, and other styles
4

Sima, Chao. "Small sample feature selection." Texas A&M University, 2003. http://hdl.handle.net/1969.1/5796.

Full text
Abstract:
High-throughput technologies for rapid measurement of vast numbers of biological variables offer the potential for highly discriminatory diagnosis and prognosis; however, high dimensionality together with small samples creates the need for feature selection, while at the same time making feature-selection algorithms less reliable. Feature selection is required to avoid overfitting, and the combinatorial nature of the problem demands a suboptimal feature-selection algorithm. In this dissertation, we have found that feature selection is problematic in small-sample settings via three different approaches. First we examined the feature-ranking performance of several kinds of error estimators for different classification rules, by considering all feature subsets and using 2 measures of performance. The results show that their ranking is strongly affected by inaccurate error estimation. Secondly, since enumerating all feature subsets is computationally impossible in practice, a suboptimal feature-selection algorithm is often employed to find from a large set of potential features a small subset with which to classify the samples. If error estimation is required for a feature-selection algorithm, then the impact of error estimation can be greater than the choice of algorithm. Lastly, we took a regression approach by comparing the classification errors for the optimal feature sets and the errors for the feature sets found by feature-selection algorithms. Our study shows that it is unlikely that feature selection will yield a feature set whose error is close to that of the optimal feature set, and the inability to find a good feature set should not lead to the conclusion that good feature sets do not exist.
APA, Harvard, Vancouver, ISO, and other styles
5

Coelho, Frederico Gualberto Ferreira. "Semi-supervised feature selection." Universidade Federal de Minas Gerais, 2013. http://hdl.handle.net/1843/BUOS-97NJ9S.

Full text
Abstract:
As data acquisition has become relatively easy and inexpensive, data sets are becoming extremely large, both in the number of variables and in the number of instances. However, the same is not true for labeled instances. Usually, the cost of obtaining these labels is very high, and for this reason unlabeled data represent the majority of instances, especially when compared with the amount of labeled data. Using such data requires special care, since several problems arise with the increase in dimensionality and the lack of labels. Reducing the size of the data is thus a primordial need. Among the available features, we usually find irrelevant and redundant variables, which can and should be eliminated. When trying to identify these variables, disregarding the unlabeled data and implementing only supervised strategies means losing structural information that can be useful. Likewise, ignoring the labeled data by implementing only unsupervised methods is also a loss of information. In this context, the application of a semi-supervised approach is very suitable, where one can try to take advantage of the best that each type of data has to offer. We work on the problem of semi-supervised feature selection through two different approaches, which may eventually complement each other later. The problem can be addressed in the context of feature clustering, grouping similar variables and discarding the irrelevant ones. On the other hand, we address the problem through a multi-objective approach, since we have arguments that clearly establish its multi-objective nature. In the first approach, a similarity measure capable of taking into account both the labeled and unlabeled data, based on mutual information, is developed, as well as a criterion based on this measure for clustering and discarding variables. The principle of homogeneity between labels and data clusters is also exploited, and two semi-supervised feature selection methods are developed. Finally, a mutual information estimator for a mixed set of discrete and continuous variables is developed as a secondary contribution. In the multi-objective approach, the proposal is to solve both the problem of feature selection and that of function approximation at the same time. The proposed method includes considering different weight vector norms for each layer of a Multi-Layer Perceptron (MLP) neural network, the independent training of each layer, and the definition of objective functions that are able to eliminate irrelevant features.
APA, Harvard, Vancouver, ISO, and other styles
6

Garnes, Øystein Løhre. "Feature Selection for Text Categorisation." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9017.

Full text
Abstract:

Text categorization is the task of discovering the category or class that text documents belong to, or in other words spotting the correct topic for text documents. While many machine learning schemes exist today for building automatic classifiers, these are typically resource demanding and do not always achieve the best results when given the whole contents of the documents. A popular solution to these problems is called feature selection. The features (e.g. terms) in a document collection are given weights based on a simple scheme and then ranked by these weights. Next, each document is represented using only the top-ranked features, typically only a few percent of the features. The classifier is then built in considerably less time, and might even improve accuracy. In situations where the documents can belong to one of a series of categories, one can either build a multi-class classifier and use one feature set for all categories, or one can split the problem into a series of binary categorization tasks (deciding if documents belong to a category or not) and create one ranked feature subset for each category/classifier. Many feature selection metrics have been suggested over the last decades, including supervised methods that make use of a manually pre-categorized set of training documents, and unsupervised methods that need only training documents of the same type or collection that is to be categorized. While many of these look promising, there has been a lack of large-scale comparison experiments. Also, several methods have been proposed in the last two years. Moreover, most evaluations are conducted on a set of binary tasks instead of a multi-class task as this often gives better results, although multi-class categorization with a joint feature set is often used in operational environments. In this report, we present results from the comparison of 16 feature selection methods (in addition to random selection) using various feature set sizes. Of these, 5 were unsupervised and 11 were supervised. All methods are tested on both a Naive Bayes (NB) classifier and a Support Vector Machine (SVM) classifier. We conducted multi-class experiments using a collection with 20 non-overlapping categories, and each feature selection method produced feature sets common to all the categories. We also combined feature selection methods and evaluated their joint efforts. We found that the classical supervised methods had the best performance, including Chi Square, Information Gain and Mutual Information. The Chi Square variant GSS coefficient was also among the top performers. Odds Ratio showed excellent performance for NB, but not for SVM. The three unsupervised methods Collection Frequency, Collection Frequency Inverse Document Frequency and Term Frequency Document Frequency all showed performances close to the best group. The Bi-Normal Separation metric produced excellent results for the smallest feature subsets. The weirdness factor performed several times better than random selection, but was not among the top-performing group. Some combination experiments achieved better results than each method alone, but the majority did not. The top performers Chi Square and GSS coefficient classified more documents when used together than alone. Four of the five combinations that showed an increase in performance included the BNS metric.
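
One of the classical supervised metrics compared in the thesis, the chi-square statistic, can be sketched as a filter inside a multi-class text-categorisation pipeline. The dataset (20 newsgroups), feature-set size, and classifier below are illustrative assumptions rather than the thesis's experimental setup.

```python
# Sketch of chi-square filter selection inside a multi-class text categorisation
# pipeline with a Naive Bayes classifier. Dataset and k are illustrative choices.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

data = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
pipeline = make_pipeline(
    CountVectorizer(stop_words="english"),
    SelectKBest(chi2, k=2000),                 # keep the 2000 highest-scoring terms
    MultinomialNB(),
)
print("CV accuracy:", cross_val_score(pipeline, data.data, data.target, cv=3).mean())
```
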

APA, Harvard, Vancouver, ISO, and other styles
7

Pradhananga, Nripendra. "Effective Linear-Time Feature Selection." The University of Waikato, 2007. http://hdl.handle.net/10289/2315.

Full text
Abstract:
The classification learning task requires selection of a subset of features to represent patterns to be classified. This is because the performance of the classifier and the cost of classification are sensitive to the choice of the features used to construct the classifier. Exhaustive search is impractical since it searches every possible combination of features. The runtime of heuristic and random searches are better but the problem still persists when dealing with high-dimensional datasets. We investigate a heuristic, forward, wrapper-based approach, called Linear Sequential Selection, which limits the search space at each iteration of the feature selection process. We introduce randomization in the search space. The algorithm is called Randomized Linear Sequential Selection. Our experiments demonstrate that both methods are faster, find smaller subsets and can even increase the classification accuracy. We also explore the idea of ensemble learning. We have proposed two ensemble creation methods, Feature Selection Ensemble and Random Feature Ensemble. Both methods apply a feature selection algorithm to create individual classifiers of the ensemble. Our experiments have shown that both methods work well with high-dimensional data.
APA, Harvard, Vancouver, ISO, and other styles
8

Cheng, Iunniang. "Hybrid Methods for Feature Selection." TopSCHOLAR®, 2013. http://digitalcommons.wku.edu/theses/1244.

Full text
Abstract:
Feature selection is one of the important data preprocessing steps in data mining. The feature selection problem involves finding a feature subset such that a classification model built only with this subset would have better predictive accuracy than model built with a complete set of features. In this study, we propose two hybrid methods for feature selection. The best features are selected through either the hybrid methods or existing feature selection methods. Next, the reduced dataset is used to build classification models using five classifiers. The classification accuracy was evaluated in terms of the area under the Receiver Operating Characteristic (ROC) curve (AUC) performance metric. The proposed methods have been shown empirically to improve the performance of existing feature selection methods.
APA, Harvard, Vancouver, ISO, and other styles
9

Athanasakis, D. "Feature selection in computational biology." Thesis, University College London (University of London), 2014. http://discovery.ucl.ac.uk/1432346/.

Full text
Abstract:
This thesis concerns feature selection, with a particular emphasis on the computational biology domain and the possibility of non-linear interaction between features. Towards this it establishes a two-step approach, where the first step is feature selection, followed by the learning of a kernel machine in this reduced representation. Optimization of kernel target alignment is proposed as a model selection criterion and its properties are established for a number of feature selection algorithms, including some novel variants of stability selection. The thesis further studies greedy and stochastic approaches for optimizing alignment, proposing a fast stochastic method with substantial probabilistic guarantees. The proposed stochastic method compares favorably to its deterministic counterparts in terms of computational complexity and resulting accuracy. The characteristics of this stochastic proposal in terms of computational complexity and applicability to multi-class problems make it invaluable to a deep learning architecture which we propose. Very encouraging results of this architecture in a recent challenge dataset further justify this approach, with good further results on a signal peptide cleavage prediction task. These proposals are evaluated in terms of generalization accuracy, interpretability and numerical stability of the models, and speed on a number of real datasets arising from infectious disease bioinformatics, with encouraging results.
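
Kernel target alignment, the model selection criterion named above, can be written out as a short scoring function: the (centred) alignment between the kernel induced by a candidate feature subset and the ideal target kernel yy^T. The dataset, kernel, and candidate subset below are illustrative assumptions, not the thesis's selection algorithms.

```python
# Sketch of centred kernel-target alignment as a feature-subset score.
# Dataset, RBF kernel, and the candidate subset are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import StandardScaler

def centre(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K1, K2):
    K1c, K2c = centre(K1), centre(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
y_pm = np.where(y == 1, 1.0, -1.0)
K_target = np.outer(y_pm, y_pm)                       # ideal kernel y y^T

subset = [0, 7, 20, 27]                               # an illustrative candidate subset
K_subset = rbf_kernel(X[:, subset])
print("alignment of subset kernel with target:", round(alignment(K_subset, K_target), 4))
```
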
APA, Harvard, Vancouver, ISO, and other styles
10

Sarkar, Saurabh. "Feature Selection with Missing Data." University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378194989.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Feature selection"

1

Liu, Huan, and Hiroshi Motoda, eds. Feature Extraction, Construction and Selection. Boston, MA: Springer US, 1998. http://dx.doi.org/10.1007/978-1-4615-5725-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Cakmakov, Dusan. Feature selection for pattern recognition. Skopje: Informa, 2002.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Saunders, Craig, Marko Grobelnik, Steve Gunn, and John Shawe-Taylor, eds. Subspace, Latent Structure and Feature Selection. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11752790.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Bolón-Canedo, Verónica, Noelia Sánchez-Maroño, and Amparo Alonso-Betanzos. Feature Selection for High-Dimensional Data. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-21858-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Wan, Cen. Hierarchical Feature Selection for Knowledge Discovery. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-319-97919-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Liu, Huan, ed. Spectral feature selection for data mining. Boca Raton, FL: CRC Press, 2012.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Stańczyk, Urszula, and Lakhmi C. Jain, eds. Feature Selection for Data and Pattern Recognition. Berlin, Heidelberg: Springer Berlin Heidelberg, 2015. http://dx.doi.org/10.1007/978-3-662-45620-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Bolón-Canedo, Verónica, and Amparo Alonso-Betanzos. Recent Advances in Ensembles for Feature Selection. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-90080-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Lu, Rui. Feature Selection for High Dimensional Causal Inference. [New York, N.Y.?]: [publisher not identified], 2020.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Liu, Huan, and Hiroshi Motoda. Feature Selection for Knowledge Discovery and Data Mining. Boston, MA: Springer US, 1998. http://dx.doi.org/10.1007/978-1-4615-5689-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Feature selection"

1

Verma, Nishchal K., and Al Salour. "Feature Selection." In Studies in Systems, Decision and Control, 175–200. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-0512-6_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

De Silva, Anthony Mihirana, and Philip H. W. Leong. "Feature Selection." In SpringerBriefs in Applied Sciences and Technology, 13–24. Singapore: Springer Singapore, 2015. http://dx.doi.org/10.1007/978-981-287-411-5_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Bolón-Canedo, Verónica, and Amparo Alonso-Betanzos. "Feature Selection." In Intelligent Systems Reference Library, 13–37. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-90080-3_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

García, Salvador, Julián Luengo, and Francisco Herrera. "Feature Selection." In Intelligent Systems Reference Library, 163–93. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-10247-4_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Brank, Janez, Dunja Mladenić, Marko Grobelnik, Huan Liu, Peter A. Flach, Gemma C. Garriga, and Hannu Toivonen. "Feature Selection." In Encyclopedia of Machine Learning, 402–6. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_306.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Sun, Chenglei. "Feature Selection." In Encyclopedia of Systems Biology, 737. New York, NY: Springer New York, 2013. http://dx.doi.org/10.1007/978-1-4419-9863-7_431.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Cornejo, Roger. "Feature Selection." In Dynamic Oracle Performance Analytics, 79–89. Berkeley, CA: Apress, 2018. http://dx.doi.org/10.1007/978-1-4842-4137-0_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Elgendi, Mohamed. "Feature Selection." In PPG Signal Analysis, 165–93. Boca Raton: CRC Press, 2020. http://dx.doi.org/10.1201/9780429449581-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Wright, Marvin N. "Feature Selection." In Applied Machine Learning Using mlr3 in R, 146–60. Boca Raton: Chapman and Hall/CRC, 2023. http://dx.doi.org/10.1201/9781003402848-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Ros, Frederic, and Rabia Riad. "Feature selection." In Unsupervised and Semi-Supervised Learning, 27–44. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-48743-9_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Feature selection"

1

Duan, Xuan, Songbai Liu, Junkai Ji, Lingjie Li, Qiuzhen Lin, and Kay Chen Tan. "Evolutionary Multiobjective Feature Selection Assisted by Unselected Features." In 2024 IEEE Congress on Evolutionary Computation (CEC), 1–8. IEEE, 2024. http://dx.doi.org/10.1109/cec60901.2024.10611992.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Wang, Juanyan, and Mustafa Bilgic. "Context-Aware Feature Selection and Classification." In Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23). California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/480.

Full text
Abstract:
We propose a joint model that performs instance-level feature selection and classification. For a given case, the joint model first skims the full feature vector, decides which features are relevant for that case, and makes a classification decision using only the selected features, resulting in compact, interpretable, and case-specific classification decisions. Because the selected features depend on the case at hand, we refer to this approach as context-aware feature selection and classification. The model can be trained on instances that are annotated by experts with both class labels and instance-level feature selections, so it can select instance-level features that humans would use. Experiments on several datasets demonstrate that the proposed model outperforms eight baselines on a combined classification and feature selection measure, and is able to better emulate the ground-truth instance-level feature selections. The supplementary materials are available at https://github.com/IIT-ML/IJCAI23-CFSC.
APA, Harvard, Vancouver, ISO, and other styles
3

Chaudhary, Seema, Sangeeta Kakarwal, and Chitra Gaikwad. "Feature Selection." In DSMLAI '21': International Conference on Data Science, Machine Learning and Artificial Intelligence. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3484824.3484881.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Li, Haiguang, Xindong Wu, Zhao Li, and Wei Ding. "Group Feature Selection with Streaming Features." In 2013 IEEE International Conference on Data Mining (ICDM). IEEE, 2013. http://dx.doi.org/10.1109/icdm.2013.137.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Wu, Junfang, and Chao Li. "Feature Selection Based on Features Unit." In 2017 4th International Conference on Information Science and Control Engineering (ICISCE). IEEE, 2017. http://dx.doi.org/10.1109/icisce.2017.76.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Nisar, Shibli, and Muhammad Tariq. "Intelligent feature selection using hybrid based feature selection method." In 2016 Sixth International Conference on Innovative Computing Technology (INTECH). IEEE, 2016. http://dx.doi.org/10.1109/intech.2016.7845025.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Li, Jirong. "Feature Selection Based on Correlation between Fuzzy Features and Optimal Fuzzy-Valued Feature Subset Selection." In 2008 Fourth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP). IEEE, 2008. http://dx.doi.org/10.1109/iih-msp.2008.292.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Joshi, Alok A., Peter H. Meckl, Galen B. King, and Kristofer Jennings. "Information-Theoretic Sensor Subset Selection: Application to Signal-Based Fault Isolation in Diesel Engines." In ASME 2006 International Mechanical Engineering Congress and Exposition. ASMEDC, 2006. http://dx.doi.org/10.1115/imece2006-15903.

Full text
Abstract:
In this paper a stepwise information-theoretic feature selector is designed and implemented to reduce the dimension of a data set without losing pertinent information. The effectiveness of the proposed feature selector is demonstrated by selecting features from forty three variables monitored on a set of heavy duty diesel engines and then using this feature space for classification of faults in these engines. Using a cross-validation technique, the effects of various classification methods (linear regression, quadratic discriminants, probabilistic neural networks, and support vector machines) and feature selection methods (regression subset selection, RV-based selection by simulated annealing, and information-theoretic selection) are compared based on the percentage misclassification. The information-theoretic feature selector combined with the probabilistic neural network achieved an average classification accuracy of 90%, which was the best performance of any combination of classifiers and feature selectors under consideration.
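
A stepwise information-theoretic selector in the spirit of the abstract can be sketched as a greedy loop that adds the feature with the highest relevance (mutual information with the class) minus its average redundancy with already-selected features, an mRMR-style simplification. The dataset and subset size are placeholders; the report's diesel-engine sensor data and classifier comparison are not reproduced.

```python
# Sketch of a stepwise information-theoretic selector (mRMR-style simplification):
# greedily add the feature with the highest relevance minus average redundancy.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = load_breast_cancer(return_X_y=True)
relevance = mutual_info_classif(X, y, random_state=0)
selected, budget = [], 8

while len(selected) < budget:
    best_j, best_score = None, -np.inf
    for j in range(X.shape[1]):
        if j in selected:
            continue
        redundancy = np.mean([mutual_info_regression(X[:, [k]], X[:, j], random_state=0)[0]
                              for k in selected]) if selected else 0.0
        score = relevance[j] - redundancy
        if score > best_score:
            best_j, best_score = j, score
    selected.append(best_j)

print("stepwise selection order:", selected)
```
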
APA, Harvard, Vancouver, ISO, and other styles
9

Fei, Hongliang, Brian Quanz, and Jun Huan. "Regularization and feature selection for networked features." In the 19th ACM international conference. New York, New York, USA: ACM Press, 2010. http://dx.doi.org/10.1145/1871437.1871756.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Xiao, Di, and Junfeng Zhang. "Importance Degree of Features and Feature Selection." In 2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery. IEEE, 2009. http://dx.doi.org/10.1109/fskd.2009.625.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Feature selection"

1

Sisto, A., and C. Kamath. Ensemble Feature Selection in Scientific Data Analysis. Office of Scientific and Technical Information (OSTI), September 2013. http://dx.doi.org/10.2172/1097710.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Seo, Young-Woo, Anupriya Ankolekar, and Katia Sycara. Feature Selection for Extracting Semantically Rich Words. Fort Belvoir, VA: Defense Technical Information Center, March 2004. http://dx.doi.org/10.21236/ada597268.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Plinski, M. J. License Application Design Selection Feature Report: Ceramic Coatings. Office of Scientific and Technical Information (OSTI), March 1999. http://dx.doi.org/10.2172/762894.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Massari, J. R. License Application Design Selection Feature Report: Additives and Fillers Design Feature 19. Office of Scientific and Technical Information (OSTI), March 1999. http://dx.doi.org/10.2172/762917.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Tang, J. S. License Application Design Selection Feature Report: Waste Package Self Shielding Design Feature 13. Office of Scientific and Technical Information (OSTI), March 2000. http://dx.doi.org/10.2172/752783.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Silapachote, Piyanuch, Deepak R. Karuppiah, and Allen R. Hanson. Feature Selection Using Adaboost for Face Expression Recognition. Fort Belvoir, VA: Defense Technical Information Center, January 2005. http://dx.doi.org/10.21236/ada438800.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chen, Maximillian Gene, Aleksander Bapst, Kirk Busche, Minh Do, Laura E. Matzen, Laura A. McNamara, and Raymond Yeh. Feature Selection and Inferential Procedures for Video Data. Office of Scientific and Technical Information (OSTI), September 2017. http://dx.doi.org/10.2172/1494165.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Pirozzo, David M., Philip A. Frederick, Shawn Hunt, Bernard Theisen, and Mike Del Rose. Spectrally Queued Feature Selection for Robotic Visual Odometery. Fort Belvoir, VA: Defense Technical Information Center, November 2010. http://dx.doi.org/10.21236/ada535663.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Bennett, Scott M. License Application Design Selection Feature Report: Canistered Assemblies. Office of Scientific and Technical Information (OSTI), March 1999. http://dx.doi.org/10.2172/759933.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Nitti, D. A. License Application Design Selection Feature Report: Rod Consolidation. Office of Scientific and Technical Information (OSTI), June 1999. http://dx.doi.org/10.2172/762903.

Full text
APA, Harvard, Vancouver, ISO, and other styles