To see the other types of publications on this topic, follow the link: UCI dataset.

Journal articles on the topic 'UCI dataset'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'UCI dataset.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Mitra, Malay, and R. K. Samanta. "A Study on UCI Hepatitis Disease Dataset Using Soft Computing." Modelling, Measurement and Control C 78, no. 4 (December 30, 2017): 467–77. http://dx.doi.org/10.18280/mmc_c.780405.

2

Kumar, Ajay, and Indranath Chatterjee. "Data Mining: An experimental approach with WEKA on UCI Dataset." International Journal of Computer Applications 138, no. 13 (March 17, 2016): 23–28. http://dx.doi.org/10.5120/ijca2016909050.

3

Naz, Mehreen, Kashif Zafar, and Ayesha Khan. "Ensemble Based Classification of Sentiments Using Forest Optimization Algorithm." Data 4, no. 2 (May 23, 2019): 76. http://dx.doi.org/10.3390/data4020076.

Abstract:
Feature subset selection is a process of choosing a set of relevant features from a high-dimensionality dataset to improve the performance of classifiers. The meaningful words extracted from data form a set of features for sentiment analysis. Many evolutionary algorithms, like the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), have been applied to the feature subset selection problem, and computational performance can still be improved. This research presents a solution to the feature subset selection problem for classification of sentiments using ensemble-based classifiers. It consists of a hybrid technique of minimum redundancy and maximum relevance (mRMR) and Forest Optimization Algorithm (FOA)-based feature selection. Ensemble-based classification is implemented to optimize the results of individual classifiers. The Forest Optimization Algorithm as a feature selection technique has been applied to various classification datasets from the UCI machine learning repository. The classifiers used in the ensemble methods for the UCI repository datasets are k-Nearest Neighbor (k-NN) and Naïve Bayes (NB). For the classification of sentiments, a 15–20% improvement has been recorded. The dataset used for classification of sentiments is Blitzer’s dataset, consisting of reviews of electronic products. The results are further improved by an ensemble of k-NN, NB, and Support Vector Machine (SVM), with an accuracy of 95% on the sentiment classification task.
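A minimal sketch of the ensemble step described above, assuming scikit-learn: soft voting over k-NN, NB, and SVM. The mRMR and Forest Optimization feature-selection stages are omitted, and the built-in UCI breast cancer data stands in for Blitzer's reviews.

```python
# Ensemble of k-NN, Naive Bayes, and SVM by (soft) majority voting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # a UCI-origin dataset
ensemble = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier()),
                ("nb", GaussianNB()),
                ("svm", SVC(probability=True))],
    voting="soft",  # average the predicted class probabilities
)
print(cross_val_score(ensemble, X, y, cv=5).mean())
```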
4

Naz, Aqdas, Muhammad Javed, Nadeem Javaid, Tanzila Saba, Musaed Alhussein, and Khursheed Aurangzeb. "Short-Term Electric Load and Price Forecasting Using Enhanced Extreme Learning Machine Optimization in Smart Grids." Energies 12, no. 5 (March 5, 2019): 866. http://dx.doi.org/10.3390/en12050866.

Abstract:
A Smart Grid (SG) is a modernized grid that provides efficient, reliable and economic energy to consumers. Energy is the most important resource in the world. Efficient energy distribution is required as smart devices are increasing dramatically. The forecasting of electricity consumption is considered a major constituent in enhancing the performance of SG. Various learning algorithms have been proposed to solve the forecasting problem. The sole purpose of this work is to predict price and load efficiently. The first technique is Enhanced Logistic Regression (ELR) and the second is Enhanced Recurrent Extreme Learning Machine (ERELM). ELR is an enhanced form of Logistic Regression (LR), whereas ERELM optimizes weights and biases using a Grey Wolf Optimizer (GWO). Classification and Regression Tree (CART), Relief-F and Recursive Feature Elimination (RFE) are used for feature selection and extraction. On the basis of the selected features, classification is performed using ELR. Cross-validation is done for ERELM using Monte Carlo and K-Fold methods. The simulations are performed on two different datasets. The first dataset, the UMass Electric Dataset, is multi-variate, while the second, the UCI Dataset, is uni-variate. The first proposed model performed better with the UMass Electric Dataset than with the UCI Dataset, while the accuracy of the second model is better with UCI than with UMass. The prediction accuracy is analyzed on the basis of four performance metrics: Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Mean Square Error (MSE) and Root Mean Square Error (RMSE). The proposed techniques are then compared with four benchmark schemes to verify their adaptivity. The simulation results show that the proposed techniques outperformed the benchmark schemes and efficiently increased the prediction accuracy of load and price, although the computational time increased in both scenarios. ELR achieved almost 5% better results than a Convolutional Neural Network (CNN) and almost 3% better than LR, while ERELM achieved almost 6% better results than ELM and almost 5% better than RELM. However, the computational time increased by almost 20% with ELR and 50% with ERELM. Scalability is also addressed for the proposed techniques using half-yearly and yearly datasets: ELR gives 5% better results and ERELM gives 6% better results on the yearly dataset.
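The four reported error metrics are simple to reproduce; a sketch assuming NumPy, with illustrative inputs:

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """MAPE, MAE, MSE and RMSE as used for load/price forecast evaluation."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    return {
        "MAPE": np.mean(np.abs(err / y_true)) * 100,  # assumes y_true has no zeros
        "MAE": np.mean(np.abs(err)),
        "MSE": np.mean(err ** 2),
        "RMSE": np.sqrt(np.mean(err ** 2)),
    }

print(forecast_metrics([100.0, 110.0, 120.0], [98.0, 112.0, 119.0]))
```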
5

Dash, Ch Sanjeev Kumar, Ajit Kumar Behera, Sarat Chandra Nayak, Satchidananda Dehuri, and Sung-Bae Cho. "An Integrated CRO and FLANN Based Classifier for a Non-Imputed and Inconsistent Dataset." International Journal on Artificial Intelligence Tools 28, no. 03 (May 2019): 1950013. http://dx.doi.org/10.1142/s0218213019500131.

Abstract:
This paper presents an integrated approach considering chemical reaction optimization (CRO) and functional link artificial neural networks (FLANNs) for building a classifier from a dataset with missing values, inconsistent records, and noisy instances. Here, imputation is carried out based on the known values of the two nearest neighbors to address datasets plagued with missing values. A probabilistic approach is used to remove the inconsistency from either of the datasets, original or imputed. The resulting dataset is then given as input to a boosted instance selection approach that selects relevant instances to reduce the size of the dataset without loss of generality or compromised classification accuracy. Finally, the transformed dataset (i.e., from a non-imputed and inconsistent dataset to an imputed and consistent one) is used for developing a classifier based on a CRO-trained FLANN. The method is evaluated extensively on a few benchmark datasets obtained from the University of California, Irvine (UCI) repository. The experimental results confirm that our preprocessing tasks along with the integrated approach can be a promising alternative tool for mitigating missing values, inconsistent records, and noisy instances.
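The imputation step, filling each missing value from the two nearest complete neighbours, has a close off-the-shelf analogue; a sketch assuming scikit-learn (the CRO-trained FLANN itself is beyond a short example):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix with missing entries; each NaN is filled from the two
# nearest rows, mirroring the two-nearest-neighbour imputation above.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [8.0, 8.0]])
print(KNNImputer(n_neighbors=2).fit_transform(X))
```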
6

Jomaa, Hadi S., Lars Schmidt-Thieme, and Josif Grabocka. "Dataset2Vec: learning dataset meta-features." Data Mining and Knowledge Discovery 35, no. 3 (February 25, 2021): 964–85. http://dx.doi.org/10.1007/s10618-021-00737-9.

Abstract:
Meta-learning, or learning to learn, is a machine learning approach that utilizes prior learning experiences to expedite the learning process on unseen tasks. As a data-driven approach, meta-learning requires meta-features that represent the primary learning tasks or datasets, traditionally estimated as engineered dataset statistics that require expert domain knowledge tailored to every meta-task. In this paper, first, we propose a meta-feature extractor called Dataset2Vec that combines the versatility of engineered dataset meta-features with the expressivity of meta-features learned by deep neural networks. Primary learning tasks or datasets are represented as hierarchical sets, i.e., as a set of sets, especially as a set of predictor/target pairs, and then a DeepSet architecture is employed to regress meta-features on them. Second, we propose a novel auxiliary meta-learning task with abundant data, called dataset similarity learning, that aims to predict whether two batches stem from the same dataset or different ones. In an experiment on a large-scale hyperparameter optimization task for 120 UCI datasets with varying schemas as a meta-learning task, we show that the meta-features of Dataset2Vec outperform expert-engineered meta-features and thus demonstrate the usefulness of learned meta-features for datasets with varying schemas for the first time.
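The auxiliary dataset-similarity task is easy to picture as a batch sampler; a hypothetical sketch assuming NumPy, with the DeepSet meta-feature extractor omitted:

```python
import numpy as np

def sample_pair(datasets, batch=16, rng=np.random.default_rng(0)):
    """Draw two batches; the label is 1 if both stem from the same dataset."""
    i, j = rng.integers(len(datasets), size=2)
    if rng.random() < 0.5:  # keep positive and negative pairs roughly balanced
        j = i
    a = datasets[i][rng.integers(len(datasets[i]), size=batch)]
    b = datasets[j][rng.integers(len(datasets[j]), size=batch)]
    return a, b, int(i == j)
```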
7

Al-Sarem, Mohammed, Faisal Saeed, Zeyad Ghaleb Al-Mekhlafi, Badiea Abdulkarem Mohammed, Tawfik Al-Hadhrami, Mohammad T. Alshammari, Abdulrahman Alreshidi, and Talal Sarheed Alshammari. "An Optimized Stacking Ensemble Model for Phishing Websites Detection." Electronics 10, no. 11 (May 28, 2021): 1285. http://dx.doi.org/10.3390/electronics10111285.

Abstract:
Security attacks on legitimate websites to steal users’ information, known as phishing attacks, have been increasing. This kind of attack does not just affect individuals’ or organisations’ websites. Although several detection methods for phishing websites have been proposed using machine learning, deep learning, and other approaches, their detection accuracy still needs to be enhanced. This paper proposes an optimized stacking ensemble method for phishing website detection. The optimisation was carried out using a genetic algorithm (GA) to tune the parameters of several ensemble machine learning methods, including random forests, AdaBoost, XGBoost, Bagging, GradientBoost, and LightGBM. The optimized classifiers were then ranked, and the best three models were chosen as base classifiers of a stacking ensemble method. The experiments were conducted on three phishing website datasets that consisted of both phishing websites and legitimate websites: the Phishing Websites Data Set from UCI (Dataset 1), the Phishing Dataset for Machine Learning from Mendeley (Dataset 2), and the Datasets for Phishing Websites Detection from Mendeley (Dataset 3). The experimental results showed an improvement using the optimized stacking ensemble method, where the detection accuracy reached 97.16%, 98.58%, and 97.39% for Dataset 1, Dataset 2, and Dataset 3, respectively.
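A minimal sketch of the final stacking stage, assuming scikit-learn: three tuned ensemble learners feed a meta-classifier. Defaults stand in for the GA-optimised hyperparameters, and XGBoost/LightGBM are left out to keep the example dependency-free:

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression

# Three base ensembles (the "best three" after tuning) under a meta-learner.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier()),
                ("ada", AdaBoostClassifier()),
                ("gb", GradientBoostingClassifier())],
    final_estimator=LogisticRegression(),
)
# stack.fit(X_train, y_train); stack.score(X_test, y_test)
```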
8

Setiawati, Intan, Adityo Permana, and Arief Hermawan. "IMPLEMENTASI DECISION TREE UNTUK MENDIAGNOSIS PENYAKIT LIVER." Journal of Information System Management (JOISM) 1, no. 1 (July 31, 2019): 13–17. http://dx.doi.org/10.24076/joism.2019v1i1.17.

Abstract:
The liver is one of the most important human organs. The UCI Machine Learning Repository holds many datasets, one of which is the ILPD (Indian Liver Patient Dataset). This study discusses the classification of liver disease in the ILPD dataset using the C4.5 Decision Tree algorithm. Based on the processing performed, the C4.5 Decision Tree algorithm yields an accuracy of 72.67% and also shows that, of the 11 liver disease variables in the ILPD dataset, only 2 variables (Alamine Aminotransferase) are decisive in determining liver disease.
9

Mabuni, D., and S. Aquter Babu. "High Accurate and a Variant of k-fold Cross Validation Technique for Predicting the Decision Tree Classifier Accuracy." International Journal of Innovative Technology and Exploring Engineering 10, no. 2 (January 10, 2021): 105–10. http://dx.doi.org/10.35940/ijitee.c8403.0110321.

Abstract:
In machine learning, data usage is a more important criterion than the logic of the program. With very big and moderately sized datasets it is possible to obtain robust and high classification accuracies, but not with small and very small datasets. In particular, only large training datasets are suitable for producing robust decision tree classification results. The classification results obtained by using only one training and one testing dataset pair are not reliable. Cross-validation uses many random folds of the same dataset for training and validation. In order to obtain reliable and statistically correct classification results, there is a need to apply the same algorithm on different pairs of training and validation datasets. To overcome the problem of using only a single training dataset and a single testing dataset, the existing k-fold cross-validation technique uses a cross-validation plan for obtaining increased decision tree classification accuracy. In this paper, a new cross-validation technique called prime fold is proposed; it is experimentally tested thoroughly and then verified using many benchmark UCI machine learning datasets. It is observed that the prime-fold-based decision tree classification accuracies obtained after experimentation are far better than those of the existing techniques.
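For context, the standard k-fold baseline that prime fold modifies looks like this in scikit-learn (the prime fold variant itself is not a public API, so plain KFold is shown):

```python
from sklearn.datasets import load_iris  # stands in for any UCI dataset
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=KFold(n_splits=10, shuffle=True, random_state=0))
print(scores.mean(), scores.std())  # accuracy across the 10 folds
```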
10

Homjandee, Suvaporn, and Krung Sinapiromsaran. "A Random Forest with Minority Condensation and Decision Trees for Class Imbalanced Problems." WSEAS TRANSACTIONS ON SYSTEMS AND CONTROL 16 (September 16, 2021): 502–7. http://dx.doi.org/10.37394/23203.2021.16.46.

Abstract:
Building an effective classifier that can classify a target class of instances in a dataset from historical data has played an important role in machine learning for a decade. Standard classification algorithms have difficulty generating an appropriate classifier when faced with an imbalanced dataset. In 2019, an efficient splitting measure, minority condensation entropy (MCE) [1], was proposed that can build a decision tree to classify minority instances. The aim of this research is to extend the concept of a random forest to use both decision trees and minority condensation trees. The algorithm builds a minority condensation tree from a bootstrapped dataset maintaining all minorities, while it builds a decision tree from a bootstrapped balanced dataset. The experimental results on synthetic datasets confirm that the proposed algorithm, compared with the standard random forest, is suitable for dealing with the binary-class imbalanced problem. Furthermore, the experiment on real-world datasets from the UCI repository shows that the proposed algorithm constructs a random forest that outperforms other existing random forest algorithms based on the recall, the precision, the F-measure, and the geometric mean.
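A sketch of the balanced-bootstrap idea behind each tree, assuming NumPy arrays; the MCE splitting criterion itself is not reproduced here:

```python
import numpy as np

def balanced_bootstrap(X, y, minority_label, rng=np.random.default_rng(0)):
    """All minority instances plus an equal-sized majority bootstrap sample."""
    minority = np.where(y == minority_label)[0]
    majority = np.where(y != minority_label)[0]
    picked = rng.choice(majority, size=len(minority), replace=True)
    idx = np.concatenate([minority, picked])
    return X[idx], y[idx]  # balanced dataset for growing one tree
```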
11

Jain, Siddhartha, Ge Liu, Jonas Mueller, and David Gifford. "Maximizing Overall Diversity for Improved Uncertainty Estimates in Deep Ensembles." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 4264–71. http://dx.doi.org/10.1609/aaai.v34i04.5849.

Abstract:
The inaccuracy of neural network models on inputs that do not stem from the distribution underlying the training data is problematic and at times unrecognized. Uncertainty estimates of model predictions are often based on the variation in predictions produced by a diverse ensemble of models applied to the same input. Here we describe Maximize Overall Diversity (MOD), an approach to improve ensemble-based uncertainty estimates by encouraging larger overall diversity in ensemble predictions across all possible inputs. We apply MOD to regression tasks including 38 Protein-DNA binding datasets, 9 UCI datasets, and the IMDB-Wiki image dataset. We also explore variants that utilize adversarial training techniques and data density estimation. For out-of-distribution test examples, MOD significantly improves predictive performance and uncertainty calibration without sacrificing performance on test data drawn from the same distribution as the training data. We also find that in Bayesian optimization tasks, the performance of UCB acquisition is improved via MOD uncertainty estimates.
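The uncertainty signal in question, the spread of predictions across ensemble members, reduces to a few lines; a sketch assuming NumPy and regressors sharing a predict() method:

```python
import numpy as np

def ensemble_uncertainty(models, X):
    """Mean prediction and per-sample disagreement of an ensemble."""
    preds = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return preds.mean(axis=0), preds.std(axis=0)  # prediction, uncertainty
```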
12

Althnian, Alhanoof, Duaa AlSaeed, Heyam Al-Baity, Amani Samha, Alanoud Bin Dris, Najla Alzakari, Afnan Abou Elwafa, and Heba Kurdi. "Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain." Applied Sciences 11, no. 2 (January 15, 2021): 796. http://dx.doi.org/10.3390/app11020796.

Abstract:
Dataset size is considered a major concern in the medical domain, where lack of data is a common occurrence. This study aims to investigate the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six widely used models in the medical field, including support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), adaboost (AB), and naïve Bayes (NB), on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models when trained on each resulting dataset with respect to accuracy, precision, recall, f-score, specificity, and area under the ROC curve (AUC). Our results indicated that the overall performance of classifiers depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the most robust models for limited medical data are AB and NB, followed by SVM, and then RF and NN, while the least robust model is DT. Furthermore, an interesting observation is that a model that is robust to a limited dataset does not necessarily provide the best performance compared to other models.
13

Shukla, Alok Kumar, Pradeep Singh, and Manu Vardhan. "A New Hybrid Feature Subset Selection Framework Based on Binary Genetic Algorithm and Information Theory." International Journal of Computational Intelligence and Applications 18, no. 03 (September 2019): 1950020. http://dx.doi.org/10.1142/s1469026819500202.

Abstract:
The explosion of high-dimensional datasets in scientific repositories has been encouraging interdisciplinary research in data mining, pattern recognition and bioinformatics. The fundamental problem of an individual Feature Selection (FS) method is to extract informative features for a classification model and to detect malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that, for a given cardinality, there can be several subsets with similar information. This paper introduces a novel hybrid FS algorithm, called Filter-Wrapper Feature Selection (FWFS), for a classification problem and also addresses the limitations of existing methods. In the proposed model, the front-end filter ranking method, Conditional Mutual Information Maximization (CMIM), selects the highly ranked feature subset, while the succeeding method, a Binary Genetic Algorithm (BGA), accelerates the search in identifying the significant feature subsets. One of the merits of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without loss of classification accuracy on the reduced dataset when a learning model is applied to the selected subsets of features. The efficacy of the proposed FWFS method is examined with a Naive Bayes (NB) classifier, which works as a fitness function. The effectiveness of the selected feature subset is evaluated using numerous classifiers on five biological datasets and five UCI datasets of varied dimensionality and number of instances. The experimental results emphasize that the proposed method provides a significant reduction of the features and outperforms the existing methods. For microarray datasets, we found the lowest classification accuracy is 61.24% on the SRBCT dataset and the highest is 99.32% on Diffuse large B-cell lymphoma (DLBCL). On the UCI datasets, the lowest classification accuracy is 40.04% on Lymphography using k-nearest neighbor (k-NN) and the highest is 99.05% on ionosphere using a support vector machine (SVM).
14

P., Ashok, and G. M. Kadhar Nawaz. "Outlier Detection Method on UCI Repository Dataset by Entropy Based Rough K-means." Defence Science Journal 66, no. 2 (March 23, 2016): 113. http://dx.doi.org/10.14429/dsj.66.9463.

Abstract:
Rough set theory is used to handle uncertainty and incomplete information by applying two sets, the lower and upper approximations. In this paper, the clustering process is improved by adapting a preliminary centroid selection method to the rough K-means (RKM) algorithm. The entropy based rough K-means (ERKM) method is developed by adapting entropy based preliminary centroid selection to RKM; it is executed and validated by cluster validity indexes. An example shows that ERKM performs effectively through the selection of entropy based preliminary centroids. In addition, outlier detection is an important task in data mining, outliers being very much different from the rest of the objects in a cluster. The entropy based rough outlier factor (EROF) method is used to detect outliers effectively in the yeast dataset. An example shows that EROF detects outliers effectively on protein localisation sites and that the ERKM clustering algorithm performs effectively. Further, experimental readings show that the ERKM and EROF methods outperform the other methods.
15

Li, Fengqi, Chuang Yu, Nanhai Yang, Feng Xia, Guangming Li, and Fatemeh Kaveh-Yazdy. "Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data." Scientific World Journal 2013 (2013): 1–9. http://dx.doi.org/10.1155/2013/875450.

Abstract:
Transductive graph-based semi-supervised learning methods usually build an undirected graph utilizing both labeled and unlabeled samples as vertices. These methods propagate the label information of labeled samples to neighbors through their edges in order to obtain the predicted labels of unlabeled samples. Most popular semi-supervised learning approaches are sensitive to the initial label distribution that arises in imbalanced labeled datasets. The class boundary will be severely skewed by the majority classes in an imbalanced classification. In this paper, we propose a simple and effective approach to alleviate the unfavorable influence of the imbalance problem by iteratively selecting a few unlabeled samples and adding them to the minority classes to form a balanced labeled dataset for the subsequent learning methods. The experiments on UCI datasets and the MNIST handwritten digits dataset showed that the proposed approach outperforms other existing state-of-the-art methods.
16

Nafea, Ohoud, Wadood Abdul, Ghulam Muhammad, and Mansour Alsulaiman. "Sensor-Based Human Activity Recognition with Spatio-Temporal Deep Learning." Sensors 21, no. 6 (March 18, 2021): 2141. http://dx.doi.org/10.3390/s21062141.

Abstract:
Human activity recognition (HAR) remains a challenging yet crucial problem to address in computer vision. HAR is primarily intended to be used with other technologies, such as the Internet of Things, to assist in healthcare and eldercare. With the development of deep learning, automatic high-level feature extraction has become a possibility and has been used to optimize HAR performance. Furthermore, deep-learning techniques have been applied in various fields for sensor-based HAR. This study introduces a new methodology using convolutional neural networks (CNN) with varying kernel dimensions along with bi-directional long short-term memory (BiLSTM) to capture features at various resolutions. The novelty of this research lies in the effective selection of the optimal video representation and in the effective extraction of spatial and temporal features from sensor data using traditional CNN and BiLSTM. Wireless sensor data mining (WISDM) and UCI datasets are used for this proposed methodology, in which data are collected through diverse methods, including accelerometers, sensors, and gyroscopes. The results indicate that the proposed scheme is efficient in improving HAR. It was thus found that, unlike other available methods, the proposed method improved accuracy, attaining a higher score on the WISDM dataset compared to the UCI dataset (98.53% vs. 97.05%).
17

Yao, Dengju, Jing Yang, and Xiaojuan Zhan. "An Improved Random Forest Algorithm for Class-Imbalanced Data Classification and its Application in PAD Risk Factors Analysis." Open Electrical & Electronic Engineering Journal 7, no. 1 (June 14, 2013): 62–70. http://dx.doi.org/10.2174/1874129001307010062.

Abstract:
The classification problem is one of the important research subjects in the field of machine learning. However, most machine learning algorithms train a classifier based on the assumption that the numbers of training examples of the classes are almost equal. When a classifier is trained on imbalanced data, its performance declines clearly. To resolve the class-imbalanced problem, an improved random forest algorithm based on sampling with replacement was proposed. We randomly extract multiple example subsets with replacement from the majority class, where each extracted subset contains the same number of examples as the minority class dataset. Then, multiple new training datasets are constructed by combining each extracted majority subset with the minority class dataset, and multiple random forest classifiers are trained on these training datasets. For a prediction example, the class is determined by majority voting of the multiple random forest classifiers. The experimental results on five groups of UCI datasets and a real clinical dataset show that the proposed method can deal with the class-imbalanced data problem and that the improved random forest algorithm outperforms the original random forest and other methods in the literature.
18

Jamjoom, Mona. "The pertinent single-attribute-based classifier for small datasets classification." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 3 (June 1, 2020): 3227. http://dx.doi.org/10.11591/ijece.v10i3.pp3227-3234.

Abstract:
Classifying a dataset using machine learning algorithms can be a big challenge when the target is a small dataset. The OneR classifier can be used for such cases due to its simplicity and efficiency. In this paper, we reveal the power of a single attribute by introducing the pertinent single-attribute-based-heterogeneity-ratio classifier (SAB-HR), which uses a pertinent attribute to classify small datasets. SAB-HR uses a feature selection method based on the Heterogeneity-Ratio (H-Ratio) measure to identify the most homogeneous attribute among the attributes in the set. Our empirical results on 12 benchmark datasets from the UCI machine learning repository showed that the SAB-HR classifier significantly outperforms the classical OneR classifier for small datasets. In addition, using the H-Ratio as the criterion for selecting the single attribute was more effective than traditional criteria such as Information Gain (IG) and Gain Ratio (GR).
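For context, the OneR-style baseline that SAB-HR builds on fits in a few lines of plain Python; the H-Ratio criterion itself is not reproduced:

```python
from collections import Counter, defaultdict

def one_r(column, labels):
    """Rule on a single attribute: map each value to its majority class."""
    by_value = defaultdict(list)
    for v, c in zip(column, labels):
        by_value[v].append(c)
    rule = {v: Counter(cs).most_common(1)[0][0] for v, cs in by_value.items()}
    accuracy = sum(rule[v] == c for v, c in zip(column, labels)) / len(labels)
    return rule, accuracy

print(one_r(["sunny", "rainy", "sunny", "rainy"], ["no", "yes", "no", "yes"]))
```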
19

Spits Warnars, Harco Leslie Hendric. "Using Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP) to Mine Frequent Patterns." International Journal of Electrical and Computer Engineering (IJECE) 6, no. 6 (December 1, 2016): 3037. http://dx.doi.org/10.11591/ijece.v6i6.10579.

Abstract:
Frequent patterns in Attribute Oriented Induction High level Emerging Pattern (AOI-HEP) are recognized when the target dataset maximally subsumes (is a superset of) the contrasting dataset (contrasting ⊂ target) and has a large High Emerging Pattern (HEP) growth rate and support in the target dataset. HEP frequent patterns were successfully mined with AOI-HEP on 4 UCI machine learning datasets, adult, breast cancer, census and IPUMS, with 48842, 569, 2458285 and 256932 instances respectively, where each dataset has concept hierarchies built from its five chosen attributes. Two frequent patterns were found in the adult dataset and one in the breast cancer dataset, while no frequent pattern was found in the census and IPUMS datasets. The HEP frequent patterns found in the adult dataset are adults who have a government workclass with an intermediate education (80.53%) and America as native country (33%). Meanwhile, the only HEP frequent pattern from the breast cancer dataset is breast cancer with clump thickness of type AboutAverClump and cell size of VeryLargeSize (3.56%). Finding HEP frequent patterns with AOI-HEP is influenced by learning on a high-level concept in one of the chosen attributes, and an extended experiment upon the adult dataset, learning on the marital-status attribute, showed that there is no frequent pattern.
20

Spits Warnars, Harco Leslie Hendric. "Using Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP) to Mine Frequent Patterns." International Journal of Electrical and Computer Engineering (IJECE) 6, no. 6 (December 1, 2016): 3037. http://dx.doi.org/10.11591/ijece.v6i6.pp3037-3046.

Abstract:
Frequent patterns in Attribute Oriented Induction High level Emerging Pattern (AOI-HEP) are recognized when the target dataset maximally subsumes (is a superset of) the contrasting dataset (contrasting ⊂ target) and has a large High Emerging Pattern (HEP) growth rate and support in the target dataset. HEP frequent patterns were successfully mined with AOI-HEP on 4 UCI machine learning datasets, adult, breast cancer, census and IPUMS, with 48842, 569, 2458285 and 256932 instances respectively, where each dataset has concept hierarchies built from its five chosen attributes. Two frequent patterns were found in the adult dataset and one in the breast cancer dataset, while no frequent pattern was found in the census and IPUMS datasets. The HEP frequent patterns found in the adult dataset are adults who have a government workclass with an intermediate education (80.53%) and America as native country (33%). Meanwhile, the only HEP frequent pattern from the breast cancer dataset is breast cancer with clump thickness of type AboutAverClump and cell size of VeryLargeSize (3.56%). Finding HEP frequent patterns with AOI-HEP is influenced by learning on a high-level concept in one of the chosen attributes, and an extended experiment upon the adult dataset, learning on the marital-status attribute, showed that there is no frequent pattern.
21

Jais, Imran Khan Mohd, Amelia Ritahani Ismail, and Syed Qamrun Nisa. "Adam Optimization Algorithm for Wide and Deep Neural Network." Knowledge Engineering and Data Science 2, no. 1 (June 23, 2019): 41. http://dx.doi.org/10.17977/um018v2i12019p41-46.

Abstract:
The objective of this research is to evaluate the effects of Adam when used together with a wide and deep neural network. The dataset used was a diagnostic breast cancer dataset taken from the UCI Machine Learning Repository. Then, the dataset was fed into a conventional neural network for a benchmark test. Afterwards, the dataset was fed into the wide and deep neural network with and without Adam. It was found that the results of the wide and deep network improved with Adam. In conclusion, Adam is able to improve the performance of a wide and deep neural network.
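A sketch of a wide-and-deep binary classifier compiled with Adam, assuming TensorFlow/Keras; the layer sizes are illustrative rather than the paper's:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(30,))  # 30 features in the WDBC dataset
deep = tf.keras.layers.Dense(64, activation="relu")(inputs)
deep = tf.keras.layers.Dense(32, activation="relu")(deep)
merged = tf.keras.layers.concatenate([inputs, deep])  # wide path skips the stack
output = tf.keras.layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.Model(inputs, output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=50, validation_split=0.2)
```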
22

Jabor, Ali Hakem, and Ali Hussein Ali. "Dual Heuristic Feature Selection Based on Genetic Algorithm and Binary Particle Swarm Optimization." JOURNAL OF UNIVERSITY OF BABYLON for Pure and Applied Sciences 27, no. 1 (April 1, 2019): 171–83. http://dx.doi.org/10.29196/jubpas.v27i1.2106.

Abstract:
Feature selection is one of the data mining tools used to select the most important features of a given dataset. It saves time and memory when handling a given dataset. Following these principles, we have proposed a feature selection method based on mixing two metaheuristic algorithms, Binary Particle Swarm Optimization and the Genetic Algorithm, working individually. The K-Nearest Neighbour (K-NN) classifier is used as an objective function to evaluate the proposed feature selection algorithm. The Dual Heuristic Feature Selection based on Genetic Algorithm and Binary Particle Swarm Optimization (DHFS) was tested and compared on 26 well-known datasets from UCI machine learning. The numerical experiment results imply that DHFS performs better compared with the full feature sets and with those selected by the mentioned algorithms (Genetic Algorithm and Binary Particle Swarm Optimization).
23

Nahato, Kindie Biredagn, Khanna Nehemiah Harichandran, and Kannan Arputharaj. "Knowledge Mining from Clinical Datasets Using Rough Sets and Backpropagation Neural Network." Computational and Mathematical Methods in Medicine 2015 (2015): 1–13. http://dx.doi.org/10.1155/2015/460189.

Abstract:
The availability of clinical datasets and knowledge mining methodologies encourages researchers to pursue research in extracting knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist the clinician in decision making. The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from the minimal set of attributes extracted from the clinical dataset. In this work, a rough set indiscernibility relation method with a backpropagation neural network (RS-BPNN) is used. This work has two stages. The first stage is the handling of missing values, to obtain a smooth data set, and the selection of appropriate attributes from the clinical dataset by the indiscernibility relation method. The second stage is classification using a backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained from the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.
24

Nguyen, Sinh-Huy, and Van-Hung Le. "Standardized UCI-EGO Dataset for Evaluating 3D Hand Pose Estimation on the Point Cloud." Advances in Science, Technology and Engineering Systems Journal 6, no. 1 (January 2021): 1–9. http://dx.doi.org/10.25046/aj060101.

25

Chen, Guangchun, Juan Hu, Hong Peng, Jun Wang, and Xiangnian Huang. "A Spectral Clustering Algorithm Improved by P Systems." International Journal of Computers Communications & Control 13, no. 5 (September 29, 2018): 759–71. http://dx.doi.org/10.15837/ijccc.2018.5.3238.

Abstract:
The spectral clustering algorithm finds it difficult to identify clusters when the dataset has large differences in density, and its clustering effect depends on the selection of initial centers. To overcome these shortcomings, we propose a novel spectral clustering algorithm based on a membrane computing framework, called the MSC algorithm, whose idea is to use a membrane clustering algorithm to realize the clustering component in spectral clustering. A tissue-like P system is used as its computing framework, where each object in the cells denotes a set of cluster centers and a velocity-location model is used as the evolution rules. Under the control of the evolution-communication mechanism, the tissue-like P system can obtain a good clustering partition for each dataset. The proposed spectral clustering algorithm is evaluated on three artificial datasets and ten UCI datasets, and it is further compared with classical spectral clustering algorithms. The comparison results demonstrate the advantage of the proposed spectral clustering algorithm.
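The classical baseline referred to in the comparison is readily available; a sketch assuming scikit-learn (the membrane-computing variant has no off-the-shelf implementation):

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)  # one of the usual UCI test datasets
labels = SpectralClustering(n_clusters=3, affinity="nearest_neighbors",
                            random_state=0).fit_predict(X)
print(labels[:10])
```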
26

Jin-Mao, Wei, Wang Shu-Qin, and Wang Ming-Yang. "Novel Approach to Decision-Tree Construction." Journal of Advanced Computational Intelligence and Intelligent Informatics 8, no. 3 (May 20, 2004): 332–35. http://dx.doi.org/10.20965/jaciii.2004.p0332.

Abstract:
A new approach is presented in which rough set theory is applied to select attributes as nodes of a decision tree. Initially, the dataset is partitioned into subsets based on different condition attributes; then an attribute is chosen as a node for branching when the size of its corresponding implicit region is smaller than that of all other attributes. This approach is compared to the entropy-based method on datasets from the UCI Machine Learning Database Repository, which illustrates the performance of the rough set approach. Statistical experiments showed that the proposed approach is feasible for decision-tree construction.
27

Nayak, Suvra, Chhabi Panigrahi, Bibudhendu Pati, Sarmistha Nanda, and Meng-Yen Hsieh. "Comparative analysis of HAR datasets using classification algorithms." Computer Science and Information Systems, no. 00 (2021): 43. http://dx.doi.org/10.2298/csis201221043n.

Abstract:
In the current research and development era, Human Activity Recognition (HAR) plays a vital role in analyzing the movements and activities of a human being. The main objective of HAR is to infer current behaviour by extracting previous information. Nowadays, the continuous improvement of human living conditions is changing human society dramatically. To detect the activities of human beings, various devices, such as smartphones and smart watches, use different types of sensors, such as multi-modal, non-video-based and video-based sensors, and so on. Among machine learning approaches, classification techniques are extensively adopted in applications such as smart homes with active and assisted living, healthcare, security and surveillance, decision making, tele-immersion, weather forecasting, official tasks, and risk analysis prediction in society. In this paper, we apply three classification algorithms, Sequential Minimal Optimization (SMO), Random Forest (RF), and Simple Logistic (SL), to two HAR datasets, UCI HAR and WISDM, downloaded from the UCI repository. The experiment described in this paper uses the WEKA tool to evaluate performance with the metrics Kappa statistic, relative absolute error, mean absolute error, ROC Area, and PRC Area, by the 10-fold cross-validation technique. We also provide a comparative analysis of the classification algorithms on the two datasets by calculating the accuracy with precision, recall, and F-measure metrics. In the experimental results, all three algorithms achieve nearly the same accuracy of 98% on the UCI HAR dataset. The RF algorithm achieves an accuracy of 90.69% on the WISDM dataset, better than the others.
28

Li, Yibo, Chao Liu, Senyue Zhang, Wenan Tan, and Yanyan Ding. "Reproducing Polynomial Kernel Extreme Learning Machine." Journal of Advanced Computational Intelligence and Intelligent Informatics 21, no. 5 (September 20, 2017): 795–802. http://dx.doi.org/10.20965/jaciii.2017.p0795.

Abstract:
The conventional kernel support vector machine (KSVM) has the problem of slow training speed, and the single kernel extreme learning machine (KELM) also has some performance limitations, for which this paper proposes a new combined KELM model built from the polynomial kernel and a reproducing kernel on Sobolev Hilbert space. This model combines the advantages of global and local kernel functions and has fast training speed. At the same time, an efficient optimization algorithm called the cuckoo search algorithm is adopted to avoid blindness and inaccuracy in parameter selection. Experiments performed on the bi-spiral benchmark dataset, the Banana dataset, as well as a number of classification and regression datasets from the UCI benchmark repository illustrate the feasibility of the proposed model. It achieves better robustness and generalization performance when compared to other conventional KELM and KSVM models, which demonstrates its effectiveness and usefulness.
29

Ma, Wenlu, and Han Liu. "Least Squares Support Vector Machine Regression Based on Sparse Samples and Mixture Kernel Learning." Information Technology and Control 50, no. 2 (June 17, 2021): 319–31. http://dx.doi.org/10.5755/j01.itc.50.2.27752.

Abstract:
Least squares support vector machine (LSSVM) is a machine learning algorithm based on statistical theory. Its advantages include robustness and calculation simplicity, and it has good performance in the data processing of small samples. The LSSVM model lacks sparsity and is unable to handle large-scale data problems, so this article proposes an LSSVM method based on mixture kernel learning and sparse samples. The algorithm reduces the initial training set to a sub-dataset using a sparse selection strategy. It converts the single kernel function in the LSSVM model into a mixed kernel function and optimizes its parameters. The reduced sub-dataset is used for training the LSSVM. Finally, a group of datasets in the UCI Machine Learning Repository were used to verify the effectiveness of the proposed algorithm, which is applied to real-world power load data to achieve better fitting and improve the prediction accuracy.
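The mixture kernel at the core of the method, a convex combination of a global polynomial kernel and a local RBF kernel, can be sketched as a callable kernel, assuming scikit-learn; SVR stands in here for the LSSVM, which scikit-learn does not provide:

```python
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVR

def mixture_kernel(A, B, lam=0.5, degree=2, gamma=0.1):
    """Convex combination of a global (polynomial) and local (RBF) kernel."""
    return (lam * polynomial_kernel(A, B, degree=degree)
            + (1 - lam) * rbf_kernel(A, B, gamma=gamma))

model = SVR(kernel=mixture_kernel)
# model.fit(X_train, y_train); model.predict(X_test)
```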
30

Orenes, Yolanda, Alejandro Rabasa, Jesus Javier Rodriguez-Sala, and Joaquin Sanchez-Soriano. "Benchmarking Analysis of the Accuracy of Classification Methods Related to Entropy." Entropy 23, no. 7 (July 1, 2021): 850. http://dx.doi.org/10.3390/e23070850.

Abstract:
In the machine learning literature we can find numerous methods to solve classification problems. We propose two new performance measures to analyze such methods. These measures are defined by using the concept of proportional reduction of classification error with respect to three benchmark classifiers: the random classifier and two intuitive classifiers based on how a non-expert person could perform classification simply by applying a frequentist approach. We show that these three simple methods are closely related to different aspects of the entropy of the dataset. Therefore, these measures account somewhat for the entropy of the dataset when evaluating the performance of classifiers. This allows us to measure the improvement in classification results compared to simple methods, and at the same time how entropy affects classification capacity. To illustrate how these new performance measures can be used to analyze classifiers while taking into account the entropy of the dataset, we carry out an intensive experiment in which we use the well-known J48 algorithm and a UCI repository dataset on which we have previously selected a subset of the most relevant attributes. Then we carry out an extensive experiment in which we consider four heuristic classifiers and 11 datasets.
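A sketch of one such proportional-reduction-of-error measure, assuming NumPy and using the intuitive majority-class rule as the benchmark (the paper's exact definitions may differ):

```python
import numpy as np

def pre_score(y_true, y_pred):
    """Share of the benchmark's error removed by the classifier."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    majority = np.bincount(y_true).argmax()  # labels as non-negative ints
    e_bench = np.mean(y_true != majority)    # error of the majority rule
    e_model = np.mean(y_true != y_pred)      # error of the evaluated classifier
    return (e_bench - e_model) / e_bench     # 1.0 = perfect, <= 0 = no gain
```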
31

V., Yasaswini, and Santhi Baskaran. "An Optimization of Feature Selection for Classification using Bat Algorithm." International Journal of Recent Technology and Engineering 9, no. 6 (March 30, 2021): 39–43. http://dx.doi.org/10.35940/ijrte.f5331.039621.

Abstract:
Data mining is the process of searching large existing databases in order to extract new and useful information. It plays a major and vital role nowadays in all sorts of fields like medicine, engineering, banking, education and fraud detection. In this paper, feature selection, which is a part of data mining, is performed for classification. The role of feature selection is considered in the context of deep learning and how it relates to feature engineering. Feature selection is a preprocessing technique that selects the appropriate features from the dataset to obtain an accurate result for the classification. Nature-inspired optimization algorithms like Ant Colony, Firefly, Cuckoo Search and Harmony Search have shown better performance, giving the best accuracy rates with fewer selected features, and good F-measure values have been noted. These algorithms are used to perform classification that accurately predicts the target class for each case in the dataset. We propose a technique to obtain an optimized feature selection for classification using metaheuristic algorithms. We applied the recent advanced optimization algorithm named the Bat algorithm to UCI datasets; it showed results comparable to the best-performing existing Firefly algorithm but with fewer features selected. The work is implemented in Java, and medical datasets from UCI have been used. These datasets were chosen due to their nominal class features. The number of attributes, instances and classes varies across the chosen datasets to represent different combinations. Classification is done using the J48 classifier in the WEKA tool. We thoroughly demonstrate the comparative results of the presently used algorithms against the existing algorithms.
32

Wang, Zi-yang, Xiao-yi Luo, and Jun Liang. "A Label Noise Robust Stacked Auto-Encoder Algorithm for Inaccurate Supervised Classification Problems." Mathematical Problems in Engineering 2019 (May 14, 2019): 1–19. http://dx.doi.org/10.1155/2019/2182616.

Abstract:
In real applications, label noise and feature noise are the two main noise sources. Like feature noise, label noise is greatly detrimental to training classification models. Motivated by the successful application of deep learning methods to normal classification problems, this paper proposes a new framework called LNC-SDAE to handle datasets corrupted with label noise, or so-called inaccurate supervision problems. The LNC-SDAE framework contains a preliminary label noise cleansing part and a stacked denoising auto-encoder. In the preliminary label noise cleansing part, the k-fold cross-validation idea is applied to detect and relabel mislabeled samples. After being preprocessed by the label noise cleansing part, the cleansed training dataset is input into the stacked denoising auto-encoder to learn a robust representation for classification. A corrupted UCI standard dataset and a corrupted real industrial dataset are used for testing, both of which contain a certain proportion of label noise (the ratio ranging from 0% to 30%). The experimental results prove the effectiveness of LNC-SDAE; the representation it learns is shown to be robust.
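The cleansing stage can be pictured with out-of-fold predictions; a hypothetical sketch assuming scikit-learn, relabeling a sample only when the fold model confidently contradicts its given label (the paper's exact detection rule may differ):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def cleanse_labels(X, y, n_folds=5, threshold=0.9):
    """K-fold label cleansing; y holds integer classes 0..k-1."""
    proba = cross_val_predict(RandomForestClassifier(), X, y,
                              cv=n_folds, method="predict_proba")
    pred = proba.argmax(axis=1)                 # out-of-fold prediction
    confident = proba.max(axis=1) >= threshold
    return np.where(confident & (pred != y), pred, y)
```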
33

Hairani, Hairani, and Muhammad Innuddin. "Kombinasi Metode Correlated Naive Bayes dan Metode Seleksi Fitur Wrapper untuk Klasifikasi Data Kesehatan." Jurnal Teknik Elektro 11, no. 2 (April 27, 2020): 50–55. http://dx.doi.org/10.15294/jte.v11i2.23693.

Abstract:
Health data often contain many irrelevant features that can reduce the performance of classification methods. Health datasets with many attributes include the Pima Indian Diabetes and Thyroid datasets. Diabetes is a deadly disease caused by rising blood sugar due to the body's inability to produce enough insulin, and its complications can lead to heart attacks and strokes. The purpose of this research is to combine the Correlated Naïve Bayes method with Wrapper-based feature selection for the classification of health data. This research consists of several stages, namely: (1) collection of the Pima Indian Diabetes and Thyroid datasets from the UCI Machine Learning Repository, (2) data pre-processing such as transformation, scaling, and Wrapper-based feature selection, (3) classification using the Correlated Naive Bayes and Naive Bayes methods, and (4) a performance test based on accuracy using the 10-fold cross-validation method. Based on the results, the combination of the Correlated Naive Bayes method and Wrapper-based feature selection achieves the best accuracy for both datasets used. For the Pima Indian Diabetes dataset the accuracy is 71.4%, and for the Thyroid dataset it is 79.38%. Thus, the combination of the Correlated Naïve Bayes method and Wrapper-based feature selection results in better accuracy than without feature selection, with an increase of 4.1% for the Pima Indian Diabetes dataset and 0.48% for the Thyroid dataset.
34

Yang, Lingkai, Yinan Guo, and Jian Cheng. "Manifold Distance-Based Over-Sampling Technique for Class Imbalance Learning." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 10071–72. http://dx.doi.org/10.1609/aaai.v33i01.330110071.

Abstract:
Over-sampling technology for handling the class imbalance problem generates more minority samples to balance the dataset sizes of different classes. However, sampling in the original data space is ineffective as the data in different classes are overlapped or disjunct. Based on this, a new minority sample generation method is presented in terms of the manifold distance rather than the Euclidean distance. The overlapped majority and minority samples tend to distribute in fully disjunct subspaces from the view of manifold learning. Moreover, the method avoids generating samples between minority data located far apart in manifold space. Experiments on 23 UCI datasets show that the proposed method achieves better classification accuracy.
35

Handayani, Putri Kurnia. "Penerapan Principal Component Analysis untuk Peningkatan Kinerja Algoritma Decision Tree pada Iris Dataset." Indonesian Journal of Technology, Informatics and Science (IJTIS) 1, no. 2 (June 30, 2020): 55–58. http://dx.doi.org/10.24176/ijtis.v1i2.4939.

Abstract:
Data mining is a field of science useful for recognizing patterns/knowledge stored in databases. Classification is one of the roles within data mining. As a form of supervised learning, classification is used to predict objects that do not yet have a class/label. The decision tree algorithm was used for mining the iris flower dataset because of the ease of representing the resulting knowledge. In addition, the decision tree is an eager learner, so the accuracy of the resulting knowledge is better. Principal component analysis (PCA) is used to optimize the decision tree algorithm during dataset preprocessing. PCA serves to reduce dimensionality; mutually correlated features are retained. The public iris flower dataset was taken from the UCI Repository. Based on the calculations, the accuracy of the decision tree algorithm after optimization with PCA on the iris flower dataset is 95.33%.
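The evaluated pipeline, PCA preprocessing in front of a decision tree on the UCI iris data, maps directly onto scikit-learn; a sketch in which the component count is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
pipe = make_pipeline(PCA(n_components=2), DecisionTreeClassifier(random_state=0))
print(cross_val_score(pipe, X, y, cv=10).mean())  # 10-fold accuracy
```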
36

Untoro, Meida Cahyo, Mugi Praseptiawan, Mastuti Widianingsih, Ilham Firman Ashari, Aidil Afriansyah, and Oktafianto. "Evaluation of Decision Tree, K-NN, Naive Bayes and SVM with MWMOTE on UCI Dataset." Journal of Physics: Conference Series 1477 (March 2020): 032005. http://dx.doi.org/10.1088/1742-6596/1477/3/032005.

37

Kartal, Serkan, Mustafa Oral, and Buse Melis Ozyildirim. "Pattern Layer Reduction for a Generalized Regression Neural Network by Using a Self–Organizing Map." International Journal of Applied Mathematics and Computer Science 28, no. 2 (June 1, 2018): 411–24. http://dx.doi.org/10.2478/amcs-2018-0031.

Abstract:
In a general regression neural network (GRNN), the number of neurons in the pattern layer is proportional to the number of training samples in the dataset. The use of a GRNN in applications that have relatively large datasets becomes troublesome due to the architecture and speed required. The great number of neurons in the pattern layer requires a substantial increase in memory usage and causes a substantial decrease in calculation speed. Therefore, there is a strong need for pattern layer size reduction. In this study, a self-organizing map (SOM) structure is introduced as a pre-processor for the GRNN. First, an SOM is generated for the training dataset. Second, each training record is labelled with the most similar map unit. Lastly, when a new test record is applied to the network, the most similar map units are detected, and the training data that have the same labels as the detected units are fed into the network instead of the entire training dataset. This scheme enables a considerable reduction in the pattern layer size. The proposed hybrid model was evaluated by using fifteen benchmark test functions and eight different UCI datasets. According to the simulation results, the proposed model significantly simplifies the GRNN’s structure without any performance loss.
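A minimal GRNN makes the pattern-layer problem concrete: every training sample becomes a neuron, so prediction cost grows with the dataset. A sketch assuming NumPy; the paper's SOM pre-selection would shrink X_train before this step:

```python
import numpy as np

def grnn_predict(X_train, y_train, X_test, sigma=0.5):
    """Gaussian-weighted average of training targets (one neuron per sample)."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))      # pattern-layer activations
    return (w @ y_train) / w.sum(axis=1)    # summation and output layers
```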
38

Nanni, Loris, Sheryl Brahnam, Stefano Ghidoni, and Alessandra Lumini. "Toward a General-Purpose Heterogeneous Ensemble for Pattern Classification." Computational Intelligence and Neuroscience 2015 (2015): 1–10. http://dx.doi.org/10.1155/2015/909123.

Abstract:
We perform an extensive study of the performance of different classification approaches on twenty-five datasets (fourteen image datasets and eleven UCI data mining datasets). The aim is to find General-Purpose (GP) heterogeneous ensembles (requiring little to no parameter tuning) that perform competitively across multiple datasets. The state-of-the-art classifiers examined in this study include the support vector machine, Gaussian process classifiers, random subspace of adaboost, random subspace of rotation boosting, and deep learning classifiers. We demonstrate that a heterogeneous ensemble based on the simple fusion by sum rule of different classifiers performs consistently well across all twenty-five datasets. The most important result of our investigation is demonstrating that some very recent approaches, including the heterogeneous ensemble we propose in this paper, are capable of outperforming an SVM classifier (implemented with LibSVM), even when both kernel selection and SVM parameters are carefully tuned for each dataset.
39

Naseem, Rashid, Bilal Khan, Muhammad Arif Shah, Karzan Wakil, Atif Khan, Wael Alosaimi, M. Irfan Uddin, and Badar Alouffi. "Performance Assessment of Classification Algorithms on Early Detection of Liver Syndrome." Journal of Healthcare Engineering 2020 (December 12, 2020): 1–13. http://dx.doi.org/10.1155/2020/6680002.

Abstract:
In the recent era, liver disease, which damages vital capacity, has become exceptionally common throughout the world. It has been found that liver disease is more prevalent in young people than in older people. When liver function fails, life endures scarcely one or two more days, and it is very hard to predict such illness at an early stage. Researchers are trying to devise a model for early prediction of liver disease utilizing various machine learning approaches. This study compares ten classifiers, including A1DE, NB, MLP, SVM, KNN, CHIRP, CDT, Forest-PA, J48, and RF, to find the optimal solution for early and accurate prediction of liver disease. The datasets utilized in this study are taken from the UCI ML repository and the GitHub repository. The outcomes are assessed via RMSE, RRSE, recall, specificity, precision, G-measure, F-measure, MCC, and accuracy. The exploratory outcomes show a better result for RF on the UCI dataset: assessing RF using RMSE and RRSE, the outcomes are 0.4328 and 87.6766, while the accuracy of RF is 72.1739%, which is also better than the other employed classifiers. However, utilizing the GitHub dataset, SVM beats the other employed techniques in terms of accuracy, reaching 71.3551%. Moreover, the comprehensive outcomes of this exploration can be utilized as a reference point for further research studies, so that any claim concerning the enhancement in prediction through a new technique, model, or framework can be benchmarked and confirmed.
APA, Harvard, Vancouver, ISO, and other styles
40

Li, Weinan, Weiguo Zhang, Jingping Shi, and Yunyan Wu. "A Battlefield Target Grouping Method Based on M-CFSFDP Algorithm." Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 36, no. 6 (December 2018): 1121–28. http://dx.doi.org/10.1051/jnwpu/20183661121.

Full text
Abstract:
Target grouping divides battlefield targets into battle-space groups, which reduces the difficulty of situation assessment and increases the efficiency of decision-making. To solve the target grouping problem, a method based on the Manifold-CFSFDP (M-CFSFDP) algorithm is proposed. The method turns target grouping into dataset clustering: after computing a manifold distance that measures the similarity of targets, it searches for clustering centers and classifies the remaining data points with CFSFDP based on that manifold distance. Simulation experiments on artificial and UCI datasets show that M-CFSFDP is more effective than CFSFDP, and its correctness and feasibility are also demonstrated by static and dynamic grouping of battlefield targets.
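For readers unfamiliar with CFSFDP, the sketch below implements its core steps (local density rho, distance delta to the nearest denser point, and peak selection by rho * delta) using plain Euclidean distances. The manifold similarity measure that distinguishes M-CFSFDP, as well as the cutoff dc and the number of centers, are not given in the abstract, so illustrative values stand in.

import numpy as np
from scipy.spatial.distance import pdist, squareform

def cfsfdp(X, dc=0.5, n_centers=3):
    # Density-peaks clustering (Rodriguez & Laio, 2014), Euclidean version.
    D = squareform(pdist(X))
    n = len(X)
    rho = (D < dc).sum(axis=1) - 1        # local density: neighbours within dc
    order = np.argsort(-rho)              # densest first
    delta = np.zeros(n)
    nearest_denser = np.zeros(n, dtype=int)
    delta[order[0]] = D[order[0]].max()   # global peak gets the maximum distance
    nearest_denser[order[0]] = order[0]
    for k in range(1, n):
        i, denser = order[k], order[:k]   # already-processed points are denser
        j = denser[np.argmin(D[i, denser])]
        delta[i], nearest_denser[i] = D[i, j], j
    centers = np.argsort(-(rho * delta))[:n_centers]
    if order[0] not in centers:           # the global density peak is a centre
        centers[-1] = order[0]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    for i in order:                       # assign in decreasing-density order,
        if labels[i] == -1:               # inheriting the nearest denser label
            labels[i] = labels[nearest_denser[i]]
    return labels

print(cfsfdp(np.random.default_rng(0).random((200, 2))))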
APA, Harvard, Vancouver, ISO, and other styles
41

Liao, Jian, Shao Lei Zhou, and Xian Jun Shi. "Parameter Optimization of SVM Based on Maximum Variance – Entropy Criterion." Applied Mechanics and Materials 373-375 (August 2013): 1053–59. http://dx.doi.org/10.4028/www.scientific.net/amm.373-375.1053.

Full text
Abstract:
Kernel parameter selection for the support vector machine (SVM) is difficult in practical applications. By analyzing the principle of the SVM classifier, a parameter selection algorithm based on the maximum variance-entropy criterion is proposed. The algorithm uses this criterion to measure the linear separability of the dataset in the feature space and combines it with particle swarm optimization (PSO) for parameter optimization. Experimental results on UCI datasets show that the algorithm achieves excellent accuracy and improves the training performance of the SVM. To further verify its effectiveness, the method was applied to fault diagnosis of a biquadratic filter circuit, where the results confirm improved diagnostic accuracy.
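The abstract does not spell out the variance-entropy formula, so the sketch below substitutes cross-validated accuracy as the PSO fitness; everything else (the swarm update and the log-scale search over C and gamma) follows the usual PSO-for-SVM recipe. Swarm size, inertia, and iteration counts are illustrative.

import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)         # stand-in UCI dataset

def fitness(p):
    # Stand-in fitness: CV accuracy for parameters (log2 C, log2 gamma).
    clf = make_pipeline(StandardScaler(),
                        SVC(C=2.0 ** p[0], gamma=2.0 ** p[1]))
    return cross_val_score(clf, X, y, cv=3).mean()

rng = np.random.default_rng(0)
n, w, c1, c2 = 12, 0.7, 1.5, 1.5
pos = rng.uniform(-5, 5, (n, 2))
vel = np.zeros((n, 2))
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmax()]
for _ in range(15):
    r1, r2 = rng.random((n, 2)), rng.random((n, 2))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -5, 5)
    f = np.array([fitness(p) for p in pos])
    better = f > pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[pbest_f.argmax()]
print("best log2(C), log2(gamma):", gbest, "accuracy:", pbest_f.max())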
APA, Harvard, Vancouver, ISO, and other styles
42

Packianather, Michael S., Ammar K. Al-Musawi, and Fatih Anayi. "Bee for mining (B4M) – A novel rule discovery method using the Bees algorithm with quality-weight and coverage-weight." Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 233, no. 14 (March 7, 2019): 5101–12. http://dx.doi.org/10.1177/0954406219833719.

Full text
Abstract:
This paper proposes a novel tool known as Bee for Mining (B4M) for classification tasks, which enables the Bees Algorithm (BA) to discover rules automatically. In the proposed B4M, two parameters, namely quality-weight and coverage-weight, have been added to the BA to avoid ambiguous situations during the prediction phase. The contributions of the proposed B4M algorithm are two-fold: the first, in the field of swarm intelligence, is a new version of the BA for automatic rule discovery, and the second is the formulation of a weight metric based on the quality and coverage of the rules discovered from the dataset to carry out Meta-Pruning, making it suitable for any real-world classification problem. The proposed algorithm was implemented and tested on five datasets from the University of California, Irvine (UCI) Machine Learning Repository and compared with other well-known classification algorithms. The results show that B4M achieved better classification accuracy while reducing the number of rules on four of the five UCI datasets. Furthermore, B4M was not only effective and more robust but also more efficient, making it at least as good as other methods such as C5.0, C4.5, JRip and other evolutionary algorithms, and in some cases better.
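The abstract names the two added parameters but not their formulas, so the sketch below only illustrates the two ingredients of such a weight: how accurate a rule is when it fires (quality) and how much of the data it covers (coverage). The combination used to break ties between conflicting rules is our placeholder, not B4M's published metric.

def rule_weights(rule, X, y, target_class):
    # `rule` is any predicate over a sample, e.g. lambda x: x[0] > 3.5.
    covered = [i for i, x in enumerate(X) if rule(x)]
    if not covered:
        return 0.0, 0.0
    correct = sum(1 for i in covered if y[i] == target_class)
    quality = correct / len(covered)    # accuracy of the rule when it fires
    coverage = len(covered) / len(X)    # share of the data the rule explains
    return quality, coverage

# When several rules fire on one test sample, prediction can be resolved
# in favour of the rule with the larger combined weight, e.g.
# quality * coverage, avoiding ambiguous situations during prediction.
X = [[4.0], [2.0], [5.0], [1.0]]
y = [1, 0, 1, 0]
print(rule_weights(lambda x: x[0] > 3.5, X, y, target_class=1))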
APA, Harvard, Vancouver, ISO, and other styles
43

Bharti, Rohit, Aditya Khamparia, Mohammad Shabaz, Gaurav Dhiman, Sagar Pande, and Parneet Singh. "Prediction of Heart Disease Using a Combination of Machine Learning and Deep Learning." Computational Intelligence and Neuroscience 2021 (July 1, 2021): 1–11. http://dx.doi.org/10.1155/2021/8387680.

Full text
Abstract:
Correct prediction of heart disease can prevent life-threatening situations, while incorrect prediction can prove fatal. In this paper, different machine learning and deep learning algorithms are applied to the UCI Machine Learning Heart Disease dataset to compare their results. The dataset consists of 14 main attributes used for the analysis. Promising results are achieved and validated using accuracy and the confusion matrix. The dataset contains some outlying records, which are handled using Isolation Forest, and the data are also normalized to obtain better results. The paper also discusses how this study can be combined with multimedia technology such as mobile devices. Using the deep learning approach, 94.2% accuracy was obtained.
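A minimal sketch of the preprocessing described, assuming Isolation Forest is used to drop outlying records before min-max normalisation; the contamination rate and the stand-in data shapes are our assumptions, since the abstract does not state them.

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)                          # stand-in for the
X, y = rng.random((303, 13)), rng.integers(0, 2, 303)   # 14-attribute data

keep = IsolationForest(contamination=0.05, random_state=0).fit_predict(X) == 1
X, y = X[keep], y[keep]                 # drop records the forest flags as -1

X = MinMaxScaler().fit_transform(X)     # normalise before model fitting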
APA, Harvard, Vancouver, ISO, and other styles
44

Murugesan, S., R. S. Bhuvaneswaran, H. Khanna Nehemiah, S. Keerthana Sankari, and Y. Nancy Jane. "Feature Selection and Classification of Clinical Datasets Using Bioinspired Algorithms and Super Learner." Computational and Mathematical Methods in Medicine 2021 (May 17, 2021): 1–18. http://dx.doi.org/10.1155/2021/6662420.

Full text
Abstract:
A computer-aided diagnosis (CAD) system that employs a super learner to diagnose the presence or absence of a disease has been developed. Each clinical dataset is preprocessed and split into a training set (60%) and a testing set (40%). A wrapper approach that uses three bioinspired algorithms, namely, cat swarm optimization (CSO), krill herd (KH), and bacterial foraging optimization (BFO), with the classification accuracy of a support vector machine (SVM) as the fitness function, has been used for feature selection. The features selected by each bioinspired algorithm are stored in three separate databases and used to train three back propagation neural networks (BPNN) independently using the conjugate gradient algorithm (CGA). Classifier testing is performed by applying the testing set to each trained classifier, and the diagnostic results obtained are used to evaluate the performance of each classifier. The classification results produced by the three classifiers for each instance of the testing set, together with the class label associated with each instance, form the candidate instances for training and testing the super learner, with 80% of these instances used for training and 20% for testing. Experimentation has been carried out using seven clinical datasets from the University of California Irvine (UCI) machine learning repository. The super learner has achieved a classification accuracy of 96.83% for the Wisconsin diagnostic breast cancer dataset (WDBC), 86.36% for the Statlog heart disease dataset (SHD), 94.74% for the hepatocellular carcinoma dataset (HCC), 90.48% for the hepatitis dataset (HD), 81.82% for the vertebral column dataset (VCD), 84% for the Cleveland heart disease dataset (CHD), and 70% for the Indian liver patient dataset (ILP).
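The stacking step can be sketched with scikit-learn's StackingClassifier, which trains base learners and feeds their out-of-fold predictions to a meta-learner. Two caveats on assumptions: scikit-learn's MLP trains with adam rather than the conjugate gradient algorithm used in the paper, and each base network here sees all features instead of one bio-inspired wrapper's subset.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)     # stand-in for WDBC

def bpnn(seed):
    # Stand-in for a CGA-trained BPNN on one wrapper's feature subset.
    return make_pipeline(StandardScaler(),
                         MLPClassifier(max_iter=1000, random_state=seed))

super_learner = StackingClassifier(
    estimators=[("cso", bpnn(0)), ("kh", bpnn(1)), ("bfo", bpnn(2))],
    final_estimator=LogisticRegression(),
    cv=5,                                      # out-of-fold base predictions
)
print(cross_val_score(super_learner, X, y, cv=5).mean())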
APA, Harvard, Vancouver, ISO, and other styles
45

Mqadi, Nhlakanipho Michael, Nalindren Naicker, and Timothy Adeliyi. "Solving Misclassification of the Credit Card Imbalance Problem Using Near Miss." Mathematical Problems in Engineering 2021 (July 19, 2021): 1–16. http://dx.doi.org/10.1155/2021/7194728.

Full text
Abstract:
In ordinary credit card datasets, there are far fewer fraudulent transactions than ordinary transactions, and in dealing with this credit card imbalance problem, the ideal solution must have low bias and low variance. The paper aims to provide an in-depth experimental investigation of the effect of using a hybrid data-point approach to resolve the class misclassification problem in imbalanced credit card datasets. The goal of the research was to use a novel technique for managing unbalanced datasets to improve the effectiveness of machine learning algorithms in detecting fraud or anomalous patterns in huge volumes of financial transaction records where the class distribution is imbalanced. The paper proposed using random forest and a hybrid data-point approach combining feature selection with the Near Miss-based undersampling technique. We assessed the proposed method on two imbalanced credit card datasets, namely, the European Credit Card dataset and the UCI Credit Card dataset, and reported the experimental results using performance metrics. We compared the classification results of logistic regression, support vector machine, decision tree, and random forest before and after applying our approach. The findings showed that the proposed approach improved the predictive accuracy of all four algorithms on credit card datasets and that, of the four, random forest produced the best results.
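A minimal sketch of the hybrid data-point approach, assuming the imbalanced-learn package for Near Miss and univariate feature selection as a stand-in for the paper's (unspecified) selector; the synthetic data and k=10 are illustrative.

from imblearn.under_sampling import NearMiss        # imbalanced-learn package
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Skewed stand-in for a credit card dataset (about 2% minority class).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.98], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Feature selection first, then Near Miss undersampling of the majority
# class; both are fitted on the training split only.
sel = SelectKBest(f_classif, k=10).fit(X_tr, y_tr)
X_bal, y_bal = NearMiss(version=1).fit_resample(sel.transform(X_tr), y_tr)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_bal, y_bal)
print(classification_report(y_te, clf.predict(sel.transform(X_te))))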
APA, Harvard, Vancouver, ISO, and other styles
46

Zhang, Xiangrong, Licheng Jiao, Anand Paul, Yongfu Yuan, Zhengli Wei, and Qiang Song. "Semisupervised Particle Swarm Optimization for Classification." Mathematical Problems in Engineering 2014 (2014): 1–11. http://dx.doi.org/10.1155/2014/832135.

Full text
Abstract:
A semisupervised classification method based on particle swarm optimization (PSO) is proposed. The semisupervised PSO simultaneously uses limited labeled samples and large amounts of unlabeled samples to find a collection of prototypes (or centroids) that precisely represent the patterns of the whole dataset; then, following the nearest-neighbor principle, the unlabeled data can be classified with the obtained prototypes. To validate the performance of the proposed method, we compare the classification accuracy of the PSO classifier, the k-nearest neighbor algorithm, and the support vector machine on six UCI datasets, four typical artificial datasets, and the USPS handwritten dataset. Experimental results demonstrate that the proposed method performs well even with very limited labeled samples, because it uses both the discriminant information provided by labeled samples and the structure information provided by unlabeled samples.
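The decision step is simply nearest-prototype classification; in the sketch below the prototypes are random stand-ins for the positions the semisupervised PSO would find, and their labels stand in for labels inferred from the labeled samples.

import numpy as np

def nearest_prototype_predict(X, prototypes, proto_labels):
    # Assign each sample the label of its closest prototype.
    d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
    return proto_labels[d.argmin(axis=1)]

rng = np.random.default_rng(0)
X = rng.random((100, 2))
prototypes = rng.random((4, 2))           # stand-in for PSO-found centroids
proto_labels = np.array([0, 0, 1, 1])     # stand-in inferred labels
print(nearest_prototype_predict(X, prototypes, proto_labels)[:10])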
APA, Harvard, Vancouver, ISO, and other styles
47

Febriantono, M. Aldiki, Ridho Herasmara, and Gusti Pangestu. "Cost Sensitive Tree dan Naïve Bayes pada Klasifikasi Multiclass." Jurnal Informatika Polinema 7, no. 2 (February 23, 2021): 57–64. http://dx.doi.org/10.33795/jip.v7i2.533.

Full text
Abstract:
Data mining is the process of working through data to make fast, precise, and accurate decisions. In healthcare and manufacturing, data mining is especially important because a misclassification can have serious consequences. A major problem in data mining arises when the data are imbalanced and multiclass, because the classifier struggles to classify the data, leading to misclassification. A solution for minimizing misclassification is to apply cost-sensitive methods to the C5.0 decision tree and naïve Bayes classifiers. This study uses the glass, lymphography, vehicle, thyroid, and wine datasets obtained from the UCI Repository. Attribute selection was performed on all five datasets using particle swarm optimization, and the datasets were then tested using the cost-sensitive C5.0 decision tree and cost-sensitive naïve Bayes methods. With the cost-sensitive C5.0 decision tree, the accuracies on the glass, lymphography, vehicle, thyroid, and wine datasets were 76.17%, 83.33%, 75.27%, 95.81%, and 95.83%, respectively, while the cost-sensitive naïve Bayes method achieved accuracies of 32.24%, 82.61%, 25.53%, 97.67%, and 94.94% on the same datasets.
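Cost sensitivity can be sketched with per-class weights: errors on rare classes are made more expensive so the classifier stops favouring the majority class. C5.0 is not available in scikit-learn, so DecisionTreeClassifier stands in below, and adjusted class priors approximate cost-sensitive naïve Bayes; the priors shown are illustrative.

from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)      # wine is one of the five UCI datasets

tree = DecisionTreeClassifier(class_weight="balanced", random_state=0)
nb = GaussianNB(priors=[1 / 3, 1 / 3, 1 / 3])   # illustrative equal priors
for name, clf in [("cost-sensitive tree", tree),
                  ("cost-sensitive naive bayes", nb)]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())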
APA, Harvard, Vancouver, ISO, and other styles
48

Geldiev, Ertan Mustafa, Nayden Valkov Nenkov, and Mariana Mateeva Petrova. "EXERCISE OF MACHINE LEARNING USING SOME PYTHON TOOLS AND TECHNIQUES." CBU International Conference Proceedings 6 (September 25, 2018): 1062–70. http://dx.doi.org/10.12955/cbup.v6.1295.

Full text
Abstract:
One of the goals of predictive analytics training with Python tools is to create a model from classified examples that can classify new examples from a dataset, and the purpose of the different strategies and experiments is to make that prediction model more accurate. The goals set out in this study are to work through successive steps to find an accurate model for a dataset and to preserve it for later use with Python instruments. Once the right model has been found, it is saved and later reloaded to classify new cases, in our case whether a page is "phishing". Given the path taken to discover the model, one can ask how much of the process can be automated and whether a computer program could be written to go through the unified steps automatically and find the right model. Because the steps for finding a suitable model are often uniform and repetitive across different types of data, we propose a hypothetical algorithm for a program that searches for a model, for example in a classification task; this algorithm is directional rather than all-encompassing. The research explores features of scientific Python packages such as NumPy, Pandas, Matplotlib, SciPy, and scikit-learn to create a more accurate model. The dataset used for the research was downloaded free of charge from the UCI Machine Learning Repository (UCI Machine Learning Repository, 2017).
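The save-and-reload step the authors describe is a one-liner with joblib (shipped alongside scikit-learn); the file name, model, and synthetic data below are illustrative.

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(model, "phishing_model.joblib")     # preserve the chosen model

loaded = joblib.load("phishing_model.joblib")   # reload it later ...
print(loaded.predict(X[:5]))                    # ... to classify new records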
APA, Harvard, Vancouver, ISO, and other styles
49

Dinata, Rozzi Kesuma, Haried Novriando, Novia Hasdyna, and Sujacka Retno. "Reduksi Atribut Menggunakan Information Gain untuk Optimasi Cluster Algoritma K-Means." Jurnal Edukasi dan Penelitian Informatika (JEPIN) 6, no. 1 (April 27, 2020): 48. http://dx.doi.org/10.26418/jp.v6i1.37606.

Full text
Abstract:
Clustering with the K-Means algorithm on a dataset with many attributes affects the number of iterations required. In this study, the Information Gain method is used to reduce the dataset's attributes, and the attribute-reduced dataset is then clustered with K-Means. The dataset analyzed is Hepatitis C Virus data obtained from the UCI Machine Learning Repository, with 29 attributes and 1385 records. The results show that over 10 test runs, conventional K-Means required an average of 32 iterations, while K-Means with attribute reduction required an average of 27.7 iterations. Cluster validity was computed using the Davies-Bouldin Index (DBI): the DBI for conventional K-Means was 2.1972, while for K-Means with 1 to 5 attributes removed the average DBI values were 2.0290, 1.8771, 1.8641, 1.8389, and 1.8117, respectively.
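A rough sketch of the pipeline, using mutual information as a proxy for Information Gain (the two coincide for discrete targets) and synthetic data shaped like the HCV matrix; the label column, the number of attributes dropped, and the cluster count are our assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import davies_bouldin_score

X, y = make_classification(n_samples=1385, n_features=29, n_informative=8,
                           random_state=0)      # stand-in for the HCV data

gain = mutual_info_classif(X, y, random_state=0)
keep = np.argsort(gain)[5:]                     # drop the 5 weakest attributes
X_red = X[:, keep]

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_red)
print("DBI:", davies_bouldin_score(X_red, labels))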
APA, Harvard, Vancouver, ISO, and other styles
50

Faraj, Azhi Abdalmohammed, Didam Ahmed Mahmud, and Bilal Najmaddin Rashid. "Comparison of Different Ensemble Methods in Credit Card Default Prediction." UHD Journal of Science and Technology 5, no. 2 (July 19, 2021): 20–25. http://dx.doi.org/10.21928/uhdjst.v5n2y2021.pp20-25.

Full text
Abstract:
Credit card defaults pose a business-critical threat to banking systems, so prompt detection of defaulters is a crucial and challenging research problem. Machine learning algorithms must deal with a heavily skewed dataset, since the ratio of defaulters to non-defaulters is very small. The purpose of this research is to apply different ensemble methods and compare their performance in predicting the probability of customers' credit card default payments in Taiwan, using data from the UCI Machine Learning Repository. This is done on both the original skewed dataset and a balanced version of it. Several studies have shown the superiority of neural networks over traditional machine learning algorithms; however, the results of our study show that ensemble methods consistently outperform neural networks and other machine learning algorithms in terms of F1 score and area under the receiver operating characteristic curve, regardless of whether the dataset is balanced.
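Evaluating on F1 and ROC AUC rather than raw accuracy is what makes the comparison meaningful on skewed data; a minimal sketch follows, with gradient boosting standing in for the ensemble methods and synthetic data standing in for the Taiwan dataset.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier

# Skewed stand-in (roughly 22% defaulters, as in the Taiwan data).
X, y = make_classification(n_samples=4000, weights=[0.78], random_state=0)

for name, clf in [("ensemble", GradientBoostingClassifier(random_state=0)),
                  ("neural net", MLPClassifier(max_iter=1000, random_state=0))]:
    scores = cross_validate(clf, X, y, cv=5, scoring=("f1", "roc_auc"))
    print(name, scores["test_f1"].mean(), scores["test_roc_auc"].mean())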
APA, Harvard, Vancouver, ISO, and other styles