Academic literature on the topic 'UCI dataset'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'UCI dataset.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of each publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "UCI dataset"

1. Mitra, Malay, and R. K. Samanta. "A Study on UCI Hepatitis Disease Dataset Using Soft Computing." Modelling, Measurement and Control C 78, no. 4 (December 30, 2017): 467–77. http://dx.doi.org/10.18280/mmc_c.780405.

2. Kumar, Ajay, and Indranath Chatterjee. "Data Mining: An Experimental Approach with WEKA on UCI Dataset." International Journal of Computer Applications 138, no. 13 (March 17, 2016): 23–28. http://dx.doi.org/10.5120/ijca2016909050.

3. Naz, Mehreen, Kashif Zafar, and Ayesha Khan. "Ensemble Based Classification of Sentiments Using Forest Optimization Algorithm." Data 4, no. 2 (May 23, 2019): 76. http://dx.doi.org/10.3390/data4020076.
Abstract: Feature subset selection is a process of choosing a set of relevant features from a high-dimensionality dataset to improve the performance of classifiers. The meaningful words extracted from data form a set of features for sentiment analysis. Many evolutionary algorithms, like the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), have been applied to the feature subset selection problem, and computational performance can still be improved. This research presents a solution to the feature subset selection problem for classification of sentiments using ensemble-based classifiers. It consists of a hybrid technique of minimum redundancy and maximum relevance (mRMR) and Forest Optimization Algorithm (FOA)-based feature selection. Ensemble-based classification is implemented to optimize the results of individual classifiers. The Forest Optimization Algorithm as a feature selection technique has been applied to various classification datasets from the UCI machine learning repository. The classifiers used in the ensemble methods for the UCI repository datasets are k-Nearest Neighbor (k-NN) and Naïve Bayes (NB). For the classification of sentiments, a 15–20% improvement has been recorded. The dataset used for classification of sentiments is Blitzer's dataset, consisting of reviews of electronic products. The results are further improved by an ensemble of k-NN, NB, and Support Vector Machine (SVM), with an accuracy of 95% for the sentiment classification task.

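The ensemble described in this abstract can be sketched as a majority-voting combination of k-NN, Naïve Bayes, and SVM. The snippet below is a minimal, illustrative sketch using scikit-learn; the synthetic dataset and all hyperparameters are assumptions standing in for Blitzer's review data and the paper's exact configuration, and no mRMR/FOA feature selection step is shown.

```python
# Illustrative sketch only: a hard-voting ensemble of k-NN, NB, and SVM.
# Synthetic data replaces the paper's sentiment dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def build_ensemble():
    """Combine k-NN, Naive Bayes, and SVM with majority (hard) voting."""
    return VotingClassifier(
        estimators=[
            ("knn", KNeighborsClassifier(n_neighbors=5)),
            ("nb", GaussianNB()),
            ("svm", SVC(kernel="rbf")),
        ],
        voting="hard",
    )


X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = build_ensemble().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

Each base learner votes on the predicted class; the ensemble returns the majority label, which is what typically lets it outperform any single constituent classifier.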
4. Naz, Aqdas, Muhammad Javed, Nadeem Javaid, Tanzila Saba, Musaed Alhussein, and Khursheed Aurangzeb. "Short-Term Electric Load and Price Forecasting Using Enhanced Extreme Learning Machine Optimization in Smart Grids." Energies 12, no. 5 (March 5, 2019): 866. http://dx.doi.org/10.3390/en12050866.
Abstract: A Smart Grid (SG) is a modernized grid that provides efficient, reliable and economic energy to consumers. Energy is the most important resource in the world, and efficient energy distribution is required as smart devices increase dramatically. Forecasting of electricity consumption is a major constituent in enhancing the performance of the SG. Various learning algorithms have been proposed to solve the forecasting problem. The sole purpose of this work is to predict price and load efficiently. The first technique is Enhanced Logistic Regression (ELR) and the second is Enhanced Recurrent Extreme Learning Machine (ERELM). ELR is an enhanced form of Logistic Regression (LR), whereas ERELM optimizes weights and biases using a Grey Wolf Optimizer (GWO). Classification and Regression Tree (CART), Relief-F and Recursive Feature Elimination (RFE) are used for feature selection and extraction. On the basis of the selected features, classification is performed using ELR. Cross validation is done for ERELM using Monte Carlo and K-Fold methods. The simulations are performed on two different datasets: the UMass Electric Dataset, which is multivariate, and the UCI Dataset, which is univariate. The first proposed model performed better with the UMass Electric Dataset than with the UCI Dataset, while the accuracy of the second model is better with UCI than with UMass. Prediction accuracy is analyzed on the basis of four performance metrics: Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Mean Square Error (MSE) and Root Mean Square Error (RMSE). The proposed techniques are then compared with four benchmark schemes to verify their adaptivity. The simulation results show that the proposed techniques outperformed the benchmark schemes and efficiently increased the prediction accuracy of load and price, although the computational time increased in both scenarios. ELR achieved almost 5% better results than a Convolutional Neural Network (CNN) and almost 3% better than LR, while ERELM achieved almost 6% better results than ELM and almost 5% better than RELM. However, the computational time increased by almost 20% with ELR and 50% with ERELM. Scalability is also addressed using half-yearly and yearly datasets: ELR gives 5% better results and ERELM gives 6% better results when used on the yearly dataset.

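The four error metrics this abstract reports (MAPE, MAE, MSE, RMSE) follow standard definitions. A small self-contained sketch of how they are computed, with toy arrays in place of the paper's load and price series:

```python
# Standard forecasting error metrics; the toy arrays are illustrative only.
import numpy as np


def forecast_errors(actual, predicted):
    """Return MAPE (percent), MAE, MSE, and RMSE for a pair of series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    # MAPE assumes no zero values in the actual series.
    mape = 100.0 * np.mean(np.abs(err / actual))
    return {"MAPE": mape, "MAE": mae, "MSE": mse, "RMSE": rmse}


metrics = forecast_errors([100, 200, 300], [110, 190, 330])
```

Note that RMSE is simply the square root of MSE, so the two always rank models identically; MAPE differs by normalizing each error by the actual value, which is why it is reported as a percentage.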
5. Dash, Ch Sanjeev Kumar, Ajit Kumar Behera, Sarat Chandra Nayak, Satchidananda Dehuri, and Sung-Bae Cho. "An Integrated CRO and FLANN Based Classifier for a Non-Imputed and Inconsistent Dataset." International Journal on Artificial Intelligence Tools 28, no. 03 (May 2019): 1950013. http://dx.doi.org/10.1142/s0218213019500131.
Abstract: This paper presents an integrated approach considering chemical reaction optimization (CRO) and functional link artificial neural networks (FLANNs) for building a classifier from a dataset with missing values, inconsistent records, and noisy instances. Here, imputation is carried out based on the known values of the two nearest neighbors to address a dataset plagued with missing values. A probabilistic approach is used to remove the inconsistency from either of the datasets, original or imputed. The resulting dataset is then given as input to a boosted instance selection approach for selection of relevant instances, reducing the size of the dataset without loss of generality or compromising classification accuracy. Finally, the transformed dataset (i.e., from non-imputed and inconsistent to imputed and consistent) is used for developing a classifier based on a CRO-trained FLANN. The method is evaluated extensively on a few benchmark datasets obtained from the University of California, Irvine (UCI) repository. The experimental results confirm that the preprocessing tasks, along with the integrated approach, can be a promising alternative tool for mitigating missing values, inconsistent records, and noisy instances.

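The imputation step this abstract describes fills a missing value from the known values of the two nearest neighbors. As a hedged illustration (not the authors' exact procedure), scikit-learn's `KNNImputer` with `n_neighbors=2` performs the same kind of two-nearest-neighbour mean imputation; the tiny matrix below is invented for demonstration.

```python
# Two-nearest-neighbour imputation sketch using scikit-learn's KNNImputer
# as a stand-in for the paper's method. The data matrix is illustrative.
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [1.0, 2.0],
    [1.1, np.nan],  # missing value to impute
    [0.9, 2.2],
    [8.0, 9.0],     # distant point, should not be chosen as a neighbour
])

# Neighbours are found on the observed features; the missing entry is
# replaced by the mean of the two nearest rows' known values in that column.
imputed = KNNImputer(n_neighbors=2).fit_transform(X)
```

Here the two nearest rows to the incomplete record (on the first feature) are the first and third, so the missing entry becomes the mean of 2.0 and 2.2, i.e. 2.1.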
6. Jomaa, Hadi S., Lars Schmidt-Thieme, and Josif Grabocka. "Dataset2Vec: Learning Dataset Meta-Features." Data Mining and Knowledge Discovery 35, no. 3 (February 25, 2021): 964–85. http://dx.doi.org/10.1007/s10618-021-00737-9.
Abstract: Meta-learning, or learning to learn, is a machine learning approach that utilizes prior learning experiences to expedite the learning process on unseen tasks. As a data-driven approach, meta-learning requires meta-features that represent the primary learning tasks or datasets; these are traditionally estimated as engineered dataset statistics that require expert domain knowledge tailored to every meta-task. In this paper, first, we propose a meta-feature extractor called Dataset2Vec that combines the versatility of engineered dataset meta-features with the expressivity of meta-features learned by deep neural networks. Primary learning tasks or datasets are represented as hierarchical sets, i.e., as a set of sets, especially as a set of predictor/target pairs, and then a DeepSet architecture is employed to regress meta-features on them. Second, we propose a novel auxiliary meta-learning task with abundant data, called dataset similarity learning, that aims to predict if two batches stem from the same dataset or different ones. In an experiment on a large-scale hyperparameter optimization task for 120 UCI datasets with varying schemas as a meta-learning task, we show that the meta-features of Dataset2Vec outperform the expert-engineered meta-features and thus demonstrate the usefulness of learned meta-features for datasets with varying schemas for the first time.

7. Al-Sarem, Mohammed, Faisal Saeed, Zeyad Ghaleb Al-Mekhlafi, Badiea Abdulkarem Mohammed, Tawfik Al-Hadhrami, Mohammad T. Alshammari, Abdulrahman Alreshidi, and Talal Sarheed Alshammari. "An Optimized Stacking Ensemble Model for Phishing Websites Detection." Electronics 10, no. 11 (May 28, 2021): 1285. http://dx.doi.org/10.3390/electronics10111285.
Abstract: Security attacks on legitimate websites to steal users' information, known as phishing attacks, have been increasing. This kind of attack does not affect just individuals' or organisations' websites. Although several detection methods for phishing websites have been proposed using machine learning, deep learning, and other approaches, their detection accuracy still needs to be enhanced. This paper proposes an optimized stacking ensemble method for phishing website detection. The optimisation was carried out using a genetic algorithm (GA) to tune the parameters of several ensemble machine learning methods, including random forests, AdaBoost, XGBoost, Bagging, GradientBoost, and LightGBM. The optimized classifiers were then ranked, and the best three models were chosen as base classifiers of a stacking ensemble method. The experiments were conducted on three phishing website datasets that consisted of both phishing and legitimate websites: the Phishing Websites Data Set from UCI (Dataset 1), the Phishing Dataset for Machine Learning from Mendeley (Dataset 2), and the Datasets for Phishing Websites Detection from Mendeley (Dataset 3). The experimental results showed an improvement using the optimized stacking ensemble method, where the detection accuracy reached 97.16%, 98.58%, and 97.39% for Dataset 1, Dataset 2, and Dataset 3, respectively.

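A stacking ensemble of the kind this abstract describes feeds the base classifiers' out-of-fold predictions into a meta-learner. The sketch below is illustrative only: the base learners, final estimator, and synthetic data are assumptions, and the GA parameter-tuning and ranking steps that precede stacking in the paper are omitted.

```python
# Hedged sketch of a stacking ensemble (no GA tuning step shown).
# Synthetic data stands in for the phishing-website datasets.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=12, random_state=1)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
        ("gb", GradientBoostingClassifier(random_state=1)),
        ("ada", AdaBoostClassifier(random_state=1)),
    ],
    final_estimator=LogisticRegression(),
    cv=3,  # out-of-fold base predictions train the meta-learner
)
accuracy = stack.fit(X, y).score(X, y)
```

The `cv` parameter is the key design choice: training the meta-learner on cross-validated predictions rather than in-sample ones prevents it from simply learning which base model overfits the most.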
8. Setiawati, Intan, Adityo Permana, and Arief Hermawan. "Implementasi Decision Tree untuk Mendiagnosis Penyakit Liver" [Implementation of a Decision Tree for Diagnosing Liver Disease]. Journal of Information System Management (JOISM) 1, no. 1 (July 31, 2019): 13–17. http://dx.doi.org/10.24076/joism.2019v1i1.17.
Abstract: The liver is one of the most important human organs. The UCI Machine Learning Repository holds many datasets, one of which is the ILPD (Indian Liver Patient Dataset). This study discusses the classification of liver disease on the ILPD dataset using the C4.5 decision tree algorithm. The results show that the C4.5 algorithm yields an accuracy of 72.67% and that, of the 11 liver disease variables in the ILPD dataset, only 2 variables (Alamine Aminotransferase) are decisive in determining liver disease.

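C4.5 splits nodes by information gain, which scikit-learn approximates with `DecisionTreeClassifier(criterion="entropy")`. The sketch below is a stand-in, not the authors' implementation: the synthetic data replaces the ILPD dataset, and the depth limit and fold count are illustrative.

```python
# C4.5-style (entropy-based) decision tree sketch with cross-validated
# accuracy. Synthetic data stands in for the ILPD liver dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# ILPD has 11 feature columns, mirrored here with n_features=11.
X, y = make_classification(n_samples=400, n_features=11, random_state=2)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=2)
scores = cross_val_score(tree, X, y, cv=5)
mean_accuracy = scores.mean()
```

After fitting, `tree.feature_importances_` shows which few variables dominate the splits, which is how a study like this one can conclude that only a couple of features are decisive.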
9. Mabuni, D., and S. Aquter Babu. "High Accurate and a Variant of k-fold Cross Validation Technique for Predicting the Decision Tree Classifier Accuracy." International Journal of Innovative Technology and Exploring Engineering 10, no. 2 (January 10, 2021): 105–10. http://dx.doi.org/10.35940/ijitee.c8403.0110321.
Abstract: In machine learning, the data used matters even more than the logic of the program. With very big and moderately sized datasets it is possible to obtain robust and high classification accuracies, but not with small and very small datasets. In particular, only large training datasets are suitable for producing robust decision tree classification results. Classification results obtained using only one training and one testing dataset pair are not reliable. The cross validation technique instead uses many random folds of the same dataset for training and validation. To obtain reliable and statistically correct classification results, the same algorithm must be applied to different pairs of training and validation datasets. To overcome the limitation of a single training dataset and a single testing dataset, the existing k-fold cross validation technique uses a cross validation plan to obtain increased decision tree classification accuracy. In this paper a new cross validation technique called prime fold is proposed; it is thoroughly tested experimentally and verified using many benchmark UCI machine learning datasets. It is observed that the prime fold based decision tree classification accuracies are far better than those obtained with existing techniques.

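Standard k-fold cross-validation, the baseline this paper's "prime fold" variant is compared against, partitions the data into k folds so that every sample is validated exactly once. A minimal sketch (the data and k below are illustrative):

```python
# Minimal k-fold cross-validation sketch: every sample lands in exactly
# one validation fold across the k splits.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features

folds = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X))

# Concatenating the validation indices of all folds recovers every sample.
validation_indices = np.concatenate([test_idx for _, test_idx in folds])
```

Each of the 5 `(train_idx, test_idx)` pairs trains on 8 samples and validates on 2; averaging the per-fold accuracies gives the cross-validated estimate that a single train/test split cannot provide reliably.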
10. Homjandee, Suvaporn, and Krung Sinapiromsaran. "A Random Forest with Minority Condensation and Decision Trees for Class Imbalanced Problems." WSEAS Transactions on Systems and Control 16 (September 16, 2021): 502–7. http://dx.doi.org/10.37394/23203.2021.16.46.
Abstract: Building an effective classifier that can classify a target class of instances in a dataset from historical data has played an important role in machine learning for a decade. Standard classification algorithms have difficulty generating an appropriate classifier when faced with an imbalanced dataset. In 2019, an efficient splitting measure, minority condensation entropy (MCE) [1], was proposed that can build a decision tree to classify minority instances. The aim of this research is to extend the concept of a random forest to use both decision trees and minority condensation trees. The algorithm builds a minority condensation tree from a bootstrapped dataset that maintains all minority instances, while it builds a decision tree from a bootstrapped, balanced dataset. Experimental results on synthetic datasets confirm that the proposed algorithm, compared with the standard random forest, is suitable for dealing with the binary-class imbalanced problem. Furthermore, experiments on real-world datasets from the UCI repository show that the proposed algorithm constructs a random forest that outperforms other existing random forest algorithms based on recall, precision, F-measure, and the geometric mean.

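The four evaluation measures this paper reports for imbalanced classification follow standard definitions; the geometric mean in particular balances sensitivity on the minority class against specificity on the majority class. A self-contained sketch with invented toy label vectors:

```python
# Standard imbalanced-classification metrics for binary labels (1 = minority
# positive class). The label vectors below are toy values for illustration.
import numpy as np


def imbalance_metrics(y_true, y_pred):
    """Return recall, precision, F-measure, and G-mean for binary labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    recall = tp / (tp + fn)             # sensitivity on the minority class
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)        # accuracy on the majority class
    g_mean = np.sqrt(recall * specificity)
    return recall, precision, f_measure, g_mean


recall, precision, f_measure, g_mean = imbalance_metrics(
    [1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
```

Unlike plain accuracy, the G-mean collapses toward zero if either class is ignored, which is why it is a preferred measure for imbalanced problems.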

Dissertations / Theses on the topic "UCI dataset"

1. Duncan, Andrew Paul. "The Analysis and Application of Artificial Neural Networks for Early Warning Systems in Hydrology and the Environment." Thesis, University of Exeter, 2014. http://hdl.handle.net/10871/17569.
Abstract: Artificial Neural Networks (ANNs) have been comprehensively researched, both from a computer scientific perspective and with regard to their use for predictive modelling in a wide variety of applications, including hydrology and the environment. Yet their adoption for live, real-time systems remains on the whole sporadic and experimental. A plausible hypothesis is that this may be at least in part due to their treatment heretofore as "black boxes" that implicitly contain something that is unknown, or even unknowable. It is understandable that many of those responsible for delivering Early Warning Systems (EWS) might not wish to take the risk of implementing solutions perceived as containing unknown elements, despite the computational advantages that ANNs offer. This thesis therefore builds on existing efforts to open the box and develop tools and techniques that visualise, analyse and use ANN weights and biases, especially from the viewpoint of neural pathways from inputs to outputs of feedforward networks. In so doing, it aims to demonstrate novel approaches to self-improving predictive model construction for both regression and classification problems. This includes Neural Pathway Strength Feature Selection (NPSFS), which uses ensembles of ANNs trained on differing subsets of data and analysis of the learnt weights to infer degrees of relevance of the input features and so build simplified models with reduced input feature sets. Case studies are carried out for prediction of flooding at multiple nodes in urban drainage networks located in three urban catchments in the UK, which demonstrate rapid, accurate prediction of flooding both for regression and classification. Predictive skill is shown to reduce beyond the time of concentration of each sewer node when actual rainfall is used as input to the models. Further case studies model and predict statutory bacteria count exceedances for bathing water quality compliance at 5 beaches in Southwest England. An illustrative case study using a forest fires dataset from the UCI machine learning repository is also included. Results from these model ensembles generally exhibit improved performance when compared with single ANN models. Ensembles with reduced input feature sets, using NPSFS, demonstrate equal or improved performance when compared with the full-feature-set models. Conclusions are drawn about a new set of tools and techniques, including NPSFS and visualisation techniques for inspection of ANN weights, the adoption of which it is hoped may lead to improved confidence in the use of ANNs for live real-time EWS applications.

2. Bankler, Hampus. "Effektivisering av UI-utveckling i datorspel" [Streamlining UI Development in Computer Games]. Thesis, Linköpings universitet, Institutionen för datavetenskap, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-66305.
Abstract: This thesis presents my work at the company Fatshark AB in Stockholm. My supervisor Rikard Blomberg, who works as a chief technology officer at Fatshark and is one of the owners of the company, explained that they had experienced weaknesses in the methodology used when developing heads-up displays (HUD) for their games. In the latest production, Lead and Gold, the HUD had been redesigned a number of times, becoming an unnecessarily big expense due to the number of work hours invested. In modern software development, an iterative workflow is commonly encouraged. Despite this fact, work efficiency could likely be increased by setting up guidelines to help in the process of developing HUDs and reviewing the solutions. There was also a need for a way to estimate a particular redesign's impact on the game before the actual implementation had been made, and for ways to define the pros and cons of each redesign. My task was to come up with standards for how the different elements in a UI could be structured and reviewed, in order to improve the UI development process and facilitate communication between employees. These standards were to be designed by a team consisting of Fatshark employees from different work disciplines and myself.

3. Rioux, Jonathan. "Un modèle rétroactif de réconciliation utilité-confidentialité sur les données d'assurance" [A Retroactive Model for Reconciling Utility and Privacy in Insurance Data]. Thèse, 2016. http://hdl.handle.net/1866/16180.
Abstract: Privacy-preserving data sharing is a challenge for almost any enterprise nowadays, no matter their field of expertise. Research is evolving at a rapid pace, but there is still a lack of adapted and adaptable solutions for best business practices regarding the management and sharing of privacy-aware datasets. To this problem, we offer PEPS, a modular, upgradeable and end-to-end system tailored to the needs of insurance companies and researchers. We take into account the entire cycle of sharing data, from data management to publication, while negotiating with external forces and policies. Our system distinguishes itself by taking advantage of domain-specific and problem-specific knowledge to tailor itself to the situation and increase the utility of the resulting dataset. To this end, we also present a strongly contextualised privacy algorithm and adapted utility measures to evaluate the performance of a successful disclosure of experience analysis.


Book chapters on the topic "UCI dataset"

1. Dasgupta, Soumik Ranjan, and Srinibas Rana. "Face Recognition Using Transfer Learning on UFI Dataset." In Learning and Analytics in Intelligent Systems, 989–96. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-42363-6_114.

2. Krüger, Eduardo L., Ivan Julio Apolonio Callejas, Luísa Alcantara Rosa, Eduardo Grala da Cunha, Linccon Carvalho, Solange Leder, Thiago Vieira, Simone Hirashima, and Patricia Drach. "Regional Adaptation of the UTCI: Comparisons Between Different Datasets in Brazil." In Applications of the Universal Thermal Climate Index UTCI in Biometeorology, 113–35. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-76716-7_6.

3. Choubey, Dilip Kumar, Sanchita Paul, Kanchan Bala, Manish Kumar, and Uday Pratap Singh. "Implementation of a Hybrid Classification Method for Diabetes." In Intelligent Innovations in Multimedia Data Engineering and Management, 201–40. IGI Global, 2019. http://dx.doi.org/10.4018/978-1-5225-7107-0.ch009.
Abstract: This chapter presents a classification method for diabetes. The proposed approach consists of two stages. In the first stage, the Pima Indian diabetes dataset is obtained from the UCI repository of machine learning databases. In the second stage, the authors perform classification using a fuzzy decision tree on the Pima Indian diabetes dataset. They then apply PSO_SVM as a feature selection technique, followed by classification using a fuzzy decision tree on the same dataset. In this chapter, the optimization of SVM using PSO reduces the number of attributes, and hence applying the fuzzy decision tree improves the accuracy of detecting diabetes. The hybrid combination of feature selection and classification is needed so that the resulting system can be used for the classification of diabetes.

4. Jain, Gauri, Manisha Sharma, and Basant Agarwal. "Spam Detection on Social Media Using Semantic Convolutional Neural Network." In Deep Learning and Neural Networks, 704–19. IGI Global, 2020. http://dx.doi.org/10.4018/978-1-7998-0414-7.ch039.
Abstract: This article describes how spam detection in social media text is becoming increasingly important because of the exponential increase in spam volume over the network. It is challenging, especially for text with a limited number of characters. Effective spam detection requires a larger number of efficient features to be learned. In the current article, the use of a deep learning technique known as a convolutional neural network (CNN) is proposed for spam detection, with a semantic layer added on top of it. The resulting model is known as a semantic convolutional neural network (SCNN). The semantic layer is composed by training random word vectors with the help of Word2vec to get semantically enriched word embeddings. WordNet and ConceptNet are used to find words similar to a given word in case it is missing in the word2vec vocabulary. The architecture is evaluated on two corpora: the SMS Spam dataset (UCI repository) and a Twitter dataset (tweets scraped from public live tweets). The authors' approach outperforms the state-of-the-art results with 98.65% accuracy on the SMS spam dataset and 94.40% accuracy on the Twitter dataset.

5. Buabin, Emmanuel. "Hybrid Neural Genetic Architecture." In Intelligent Techniques in Recommendation Systems, 245–70. IGI Global, 2013. http://dx.doi.org/10.4018/978-1-4666-2542-6.ch013.
Abstract: The objective is neural-based feature selection in intelligent recommender systems. In particular, a hybrid neural genetic architecture is modeled based on human nature, interactions, and behaviour. The main contribution of this chapter is the development of a novel genetic algorithm based on human nature, interactions, and behaviour. The novel genetic algorithm, termed the "Buabin Algorithm," is fully integrated with a hybrid neural classifier to form a Hybrid Neural Genetic Architecture. The research presents GA in a more attractive manner and opens up the various departments of a GA for active research. Although no scientific experiment is conducted to compare network performance with standard approaches, the engaged techniques reveal drastic reductions in genetic operator operations. For illustration purposes, the UCI Molecular Biology (Splice Junction) dataset is used. Overall, the "Buabin Algorithm" seeks to integrate human-related interactions into genetic algorithms and to imitate human genetics in recommender systems design, so as to understand underlying datasets explicitly.

6. Leema N., Khanna H. Nehemiah, Elgin Christo V. R., and Kannan A. "Evaluation of Parameter Settings for Training Neural Networks Using Backpropagation Algorithms." In Research Anthology on Artificial Neural Network Applications, 202–26. IGI Global, 2022. http://dx.doi.org/10.4018/978-1-6684-2408-7.ch009.
Abstract: Artificial neural networks (ANN) are widely used for classification, and the training algorithm commonly used is the backpropagation (BP) algorithm. The major bottleneck faced in backpropagation neural network training is fixing appropriate values for the network parameters. The network parameters are the initial weights, biases, activation function, number of hidden layers and number of neurons per hidden layer, number of training epochs, learning rate, minimum error, and momentum term for the classification task. The objective of this work is to investigate the performance of 12 different BP algorithms under variations in network parameter values for neural network training. The algorithms were evaluated with different training and testing samples taken from three benchmark clinical datasets, namely Pima Indian Diabetes (PID), Hepatitis, and Wisconsin Breast Cancer (WBC), obtained from the University of California, Irvine (UCI) machine learning repository.

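The parameters this chapter studies (hidden-layer size, learning rate, momentum, epoch count) map directly onto the constructor of a backpropagation-trained network. As a hedged illustration, scikit-learn's `MLPClassifier` exposes the same knobs; the values and synthetic data below are assumptions, not the chapter's evaluated settings or its clinical datasets.

```python
# Backpropagation network parameter sketch using scikit-learn's MLPClassifier.
# All parameter values are illustrative; synthetic data replaces the clinical
# UCI datasets (PID, Hepatitis, WBC).
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=3)

net = MLPClassifier(
    hidden_layer_sizes=(10,),   # one hidden layer with 10 neurons
    solver="sgd",               # plain gradient-descent backpropagation
    learning_rate_init=0.1,     # learning rate
    momentum=0.9,               # momentum term
    max_iter=500,               # maximum number of training epochs
    random_state=3,             # initial weights depend on this seed
)
accuracy = net.fit(X, y).score(X, y)
```

Sweeping these constructor arguments over a grid and re-scoring is, in essence, the parameter-settings evaluation the chapter performs across its 12 BP variants.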
7. Mukherjee, Soumen, Arpan Deyasi, Arup Kumar Bhattacharjee, Arindam Mondal, and Anirban Mukherjee. "Role of Metaheuristic Optimization in Portfolio Management for the Banking Sector." In Metaheuristic Approaches to Portfolio Optimization, 198–220. IGI Global, 2019. http://dx.doi.org/10.4018/978-1-5225-8103-1.ch009.
Abstract: In this chapter, the importance of optimization techniques, more specifically metaheuristic optimization, in banking portfolio management is reviewed. The present work deals with an interactive bank marketing campaign of a specific Portuguese bank, taken from the UCI dataset archive. This dataset consists of 45,211 samples with 17 features, including one response/output variable. The classification work is carried out on all data using decision tree (DT), support vector machine (SVM), and k-nearest neighbour (k-NN), without any feature optimization. A metaheuristic genetic algorithm (GA) is then used as a feature optimizer to select only 5 of the 16 features. Finally, classification with the optimized features shows relatively good accuracy in comparison to classification with the full feature set. This result shows that with a smaller number of optimized features, better classification can be achieved with less computational overhead.

8. Çetin, Aydın, and Tuba Gökhan. "Differential Diagnosis of Erythematous Squamous Diseases With Feature Selection and Classification Algorithms." In Nature-Inspired Intelligent Techniques for Solving Biomedical Engineering Problems, 103–29. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-4769-3.ch005.
Abstract: In this chapter, the differential diagnosis of erythematous squamous diseases is determined using data mining and machine learning algorithms. A dermatology dataset from the UCI Machine Learning Repository was used for the study. The dataset consists of 366 data items with 34 attributes. Initially, feature selection was performed, and then classification was carried out using various algorithms. The number of attributes was reduced from 34 to 19 by integrating correlation-based filter methods with various heuristic search methods. The evaluation results show that Naive Bayes achieves a 100% success rate in classifying psoriasis, seborrheic dermatitis, lichen planus, rose disease, chronic dermatitis, and pityriasis rubra pilaris with the 19 attributes selected by the feature extraction algorithms.

9

Dash, Ch Sanjeev Kumar, Ajit Kumar Behera, and Sarat Chandra Nayak. "DE-Based RBFNs for Classification With Special Attention to Noise Removal and Irrelevant Features." In Advances in Computational Intelligence and Robotics, 218–43. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-2857-9.ch011.

Full text
Abstract:
This chapter presents a novel approach to dataset classification by suitably tuning the parameters of radial basis function networks, with the additional cost of feature selection. Feeding an optimal, relevant set of features to a radial basis function network may greatly enhance the network's efficiency (in terms of accuracy) while at the same time compacting its size. In this chapter, the authors use information gain theory (a kind of filter approach) for reducing the features, and differential evolution for tuning the centers and spreads of the radial basis functions. Different feature selection methods, the handling of missing values, and the removal of inconsistency to improve the classification accuracy of the proposed model are emphasized. The proposed approach is validated on a few benchmark highly skewed and balanced datasets retrieved from the University of California, Irvine (UCI) repository. The experimental study is encouraging enough to pursue further extensive research on highly skewed data.
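The information-gain filter the authors use for feature reduction can be sketched as follows (an illustrative stand-alone version, not the chapter's code): the gain of an attribute is the entropy of the class labels minus the weighted entropy remaining after splitting on that attribute.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on one attribute."""
    base = entropy(labels)
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return base - remainder

# Toy example: attribute A predicts the class perfectly, attribute B does not.
labels = ['pos', 'pos', 'neg', 'neg']
a = ['x', 'x', 'y', 'y']   # perfect split -> gain equals the full entropy (1 bit)
b = ['x', 'y', 'x', 'y']   # uninformative -> gain 0
print(information_gain(a, labels), information_gain(b, labels))  # → 1.0 0.0
```

Ranking attributes by this score and dropping the low-gain ones is the filter step; differential evolution then tunes the RBF centers and spreads on the reduced input.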
APA, Harvard, Vancouver, ISO, and other styles
10

Rathipriya, R., and K. Thangavel. "Hybrid Swarm Intelligence-Based Biclustering Approach for Recommendation of Web Pages." In Emerging Research on Swarm Intelligence and Algorithm Optimization, 161–80. IGI Global, 2015. http://dx.doi.org/10.4018/978-1-4666-6328-2.ch007.

Full text
Abstract:
This chapter focuses on recommender systems based on coherent users' browsing patterns. A biclustering approach is used to discover aggregate usage profiles from the preprocessed Web data. A combination of Discrete Artificial Bee Colony Optimization and a Simulated Annealing technique is used for optimizing the aggregate usage profiles from the preprocessed clickstream data. The Web page recommendation process is structured into two components, performed online and offline with respect to Web server activity. The offline component builds the usage profiles or usage models by analyzing historical data, such as the server access log file or Web logs, using the hybrid biclustering approach. The recommendation process is the online component, in which the current user's session is used to capture the user's interest and recommend pages for the next navigation step. The experiment was conducted on benchmark clickstream data (the MSNBC and MSWEB datasets from the UCI repository). The results signify the improved prediction accuracy of recommendations using the biclustering approach.
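The simulated-annealing half of the hybrid search can be sketched generically (illustrative only; the objective below is a toy stand-in, since a real run would score bicluster coherence on the clickstream matrix):

```python
import math
import random

random.seed(3)

def simulated_annealing(score, neighbour, state, t0=1.0, cooling=0.95, steps=200):
    """Generic SA loop of the kind used to refine candidate solutions:
    worse neighbours are accepted with probability exp(delta / T), so the
    search can escape local optima while the temperature cools geometrically."""
    best, best_score, t = state, score(state), t0
    for _ in range(steps):
        cand = neighbour(state)
        delta = score(cand) - score(state)
        if delta > 0 or random.random() < math.exp(delta / t):
            state = cand
            if score(state) > best_score:
                best, best_score = state, score(state)
        t *= cooling
    return best

# Toy stand-in objective: choose the subset of rows {0..7} with the largest
# index sum (a real run would score bicluster coherence instead).
score = lambda s: sum(s)

def neighbour(s):
    return s ^ {random.randrange(8)}   # toggle one row in/out of the subset

best_subset = simulated_annealing(score, neighbour, frozenset())
print(sorted(best_subset))
```

In the chapter's hybrid, the bee-colony search proposes candidate biclusters and an SA-style acceptance rule refines them.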
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "UCI dataset"

1

Chang, Sun, Yue Shihong, and Li Qi. "Clustering Characteristics of UCI Dataset." In 2020 39th Chinese Control Conference (CCC). IEEE, 2020. http://dx.doi.org/10.23919/ccc50068.2020.9189507.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Jik Lee, Byoung. "Extracting the Significant Degrees of Attributes in Unlabeled Data using Unsupervised Machine Learning." In 4th International Conference on Computer Science and Information Technology (COMIT 2020). AIRCC Publishing Corporation, 2020. http://dx.doi.org/10.5121/csit.2020.101608.

Full text
Abstract:
We propose a valid approach for finding the degrees of importance of attributes in an unlabeled dataset to improve clustering performance. The significant degrees of the attributes are extracted by training unsupervised simple competitive learning on the raw unlabeled data. These significant degrees are applied to the original dataset to generate a weighted dataset that reflects the degrees of influential values for the set of attributes. This work is simulated on UCI Machine Learning Repository datasets. Scikit-learn K-Means clustering is tested with the raw data, scaled data, and the weighted data. The results show that the proposed approach improves performance.
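The weighting idea can be sketched as follows: train winner-take-all competitive learning on the raw data, read off per-attribute significance from the spread of the learned prototypes, and rescale the data. This is an invented illustration of the general idea, not the paper's implementation.

```python
import random

random.seed(1)

def competitive_learning(data, n_units=2, lr=0.1, epochs=50):
    """Winner-take-all training: the prototype closest to each sample is
    moved a little toward it, so prototypes settle on the cluster centres."""
    protos = [list(data[i]) for i in random.sample(range(len(data)), n_units)]
    for _ in range(epochs):
        for x in data:
            winner = min(protos, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)))
            for j in range(len(x)):
                winner[j] += lr * (x[j] - winner[j])
    return protos

def attribute_significance(protos):
    """Per-attribute spread across the learned prototypes: attributes that
    separate the clusters score high, uninformative ones score near zero."""
    dim = len(protos[0])
    return [max(p[j] for p in protos) - min(p[j] for p in protos) for j in range(dim)]

# Toy data: attribute 0 separates two groups, attribute 1 is near-constant noise.
data = [(0.0, 0.5), (0.1, 0.4), (0.9, 0.5), (1.0, 0.6)]
sig = attribute_significance(competitive_learning(data))
weighted = [tuple(x[j] * sig[j] for j in range(2)) for x in data]
print(sig)  # attribute 0 should receive the larger significance
```

The `weighted` dataset is what would then be handed to K-Means in place of the raw or merely scaled data.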
APA, Harvard, Vancouver, ISO, and other styles
3

Wang, Nan, Xibin Zhao, Yu Jiang, and Yue Gao. "Iterative Metric Learning for Imbalance Data Classification." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/389.

Full text
Abstract:
In many classification applications, the amounts of data from different categories usually vary significantly, for example in software defect prediction and medical diagnosis. Under such circumstances, it is essential to propose a proper method to solve the imbalance issue among the data. However, most existing methods mainly focus on improving the performance of classifiers rather than on searching for an appropriate way to find an effective data space for classification. In this paper, we propose a method named Iterative Metric Learning (IML) to explore the correlations among imbalance data and construct an effective data space for classification. Given the imbalance training data, it is important to select a subset of training samples for each testing sample. Thus, we aim to find a more stable neighborhood for testing data using an iterative metric learning strategy. To evaluate the effectiveness of the proposed method, we have conducted experiments on two groups of datasets, i.e., the NASA Metrics Data Program (NASA) datasets and UCI Machine Learning Repository (UCI) datasets. Experimental results and comparisons with state-of-the-art methods have exhibited the better performance of our proposed method.
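The iterative neighbourhood-selection strategy can be sketched with a diagonal metric (an invented simplification of IML, which learns a richer metric): alternate between picking the k nearest training samples under the current metric and re-learning the metric from that subset, until the neighbourhood stabilizes.

```python
def diag_metric(rows):
    """Per-feature inverse-variance weights learned from the current subset."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    var = [sum((r[j] - means[j]) ** 2 for r in rows) / n + 1e-9 for j in range(d)]
    return [1.0 / v for v in var]

def iml_neighbourhood(train, x, k=3, max_iter=10):
    """Alternate between selecting the k nearest samples under the current
    diagonal metric and re-learning the metric from that subset, stopping
    when the neighbourhood no longer changes."""
    w = [1.0] * len(x)
    prev = None
    for _ in range(max_iter):
        dist = lambda r: sum(wj * (a - b) ** 2 for wj, a, b in zip(w, r, x))
        nbrs = tuple(sorted(sorted(range(len(train)), key=lambda i: dist(train[i]))[:k]))
        if nbrs == prev:
            break
        prev = nbrs
        w = diag_metric([train[i] for i in nbrs])
    return prev

# Attribute 1 is noisy; re-weighting by the neighbourhood's variances keeps
# the same stable neighbourhood for the test point (0.05, 0.0).
train = [(0.0, 0.0), (0.1, 9.0), (0.2, -9.0), (5.0, 0.1), (5.1, 0.0)]
print(iml_neighbourhood(train, (0.05, 0.0)))  # → (0, 3, 4)
```

A classifier would then be trained only on this stable neighbourhood, which is how the method sidesteps the global imbalance.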
APA, Harvard, Vancouver, ISO, and other styles
4

Sen, Anupam. "Data Mining and Principal Component Analysis on Coimbra Breast Cancer Dataset." In Intelligent Computing and Technologies Conference. AIJR Publisher, 2021. http://dx.doi.org/10.21467/proceedings.115.5.

Full text
Abstract:
Machine Learning (ML) techniques play an important role in the medical field. Early diagnosis is required to improve the treatment of carcinoma. In this analysis, the Breast Cancer Coimbra dataset (BCCD) with ten predictors is analyzed to classify carcinoma. In this paper, feature selection methods and machine learning algorithms are applied to the dataset from the UCI repository. The WEKA ("Waikato Environment for Knowledge Analysis") tool is used for the machine learning techniques, and Principal Component Analysis (PCA) is used for feature extraction. Different machine learning classification algorithms, such as Glmnet, Gbm, ada boosting, Adabag boosting, C50, Cforest, DcSVM, fnn, Ksvm, and Node Harvest, are applied through WEKA to compare accuracy as well as values such as the Kappa statistic, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). The 10-fold cross-validation method is used for training, testing, and validation purposes.
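The PCA feature-extraction step can be illustrated without any ML library via power iteration on the covariance matrix (a sketch of the underlying mathematics, not the WEKA workflow used in the paper):

```python
import math

def first_principal_component(rows, iters=200):
    """Power iteration on the covariance matrix: returns the unit direction of
    maximum variance (the first principal component)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    v = [1.0] * d
    for _ in range(iters):
        # w = Cov . v, computed as X^T (X v) / n without forming Cov explicitly
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) / n for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data varying almost entirely along the first axis
rows = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2), (3.0, 0.1)]
pc1 = first_principal_component(rows)
print(pc1)  # dominated by the first coordinate
```

Projecting the ten BCCD predictors onto the top few such components is what reduces the input before the classifiers are compared.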
APA, Harvard, Vancouver, ISO, and other styles
5

Shi, Hong, Shaojun Pan, Jian Yang, and Chen Gong. "Positive and Unlabeled Learning via Loss Decomposition and Centroid Estimation." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/373.

Full text
Abstract:
Positive and Unlabeled learning (PU learning) aims to train a binary classifier based on only positive and unlabeled examples, where the unlabeled examples could be either positive or negative. State-of-the-art algorithms usually cast PU learning as a cost-sensitive learning problem and impose distinct weights on different training examples in a manual or automatic way. However, such weight adjustment or estimation can be inaccurate and thus often leads to unsatisfactory performance. Therefore, this paper regards all unlabeled examples as negative, which means that some of the original positive data are mistakenly labeled as negative. By doing so, we convert PU learning into a risk minimization problem in the presence of false-negative label noise, and propose a novel PU learning algorithm termed "Loss Decomposition and Centroid Estimation" (LDCE). By decomposing the hinge loss function into two parts, we show that only the second part is influenced by label noise, and that this adverse effect can be reduced by estimating the centroid of the negative examples. We intensively validate our approach on synthetic datasets, UCI benchmark datasets, and real-world datasets, and the experimental results firmly demonstrate the effectiveness of our approach compared with other state-of-the-art PU learning methodologies.
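The centroid-estimation idea rests on a simple identity: if a known fraction (the class prior) of the unlabeled pool is secretly positive, the true negative centroid can be recovered from the two observed means. The sketch below shows that identity in isolation; it is not the LDCE algorithm itself, and the helper name is invented.

```python
def estimate_negative_centroid(unlabeled, positives, class_prior):
    """If a fraction `class_prior` of the unlabeled set is secretly positive,
    the true negative centroid follows from the two observed means:
        mean(U) = prior * mean(P) + (1 - prior) * mean(N)
    """
    d = len(unlabeled[0])
    mean = lambda rows, j: sum(r[j] for r in rows) / len(rows)
    return [(mean(unlabeled, j) - class_prior * mean(positives, j)) / (1 - class_prior)
            for j in range(d)]

# Toy 1-D check: positives centred at 2.0, negatives at -1.0, and the
# unlabeled pool is half of each, so its observed mean sits at 0.5.
positives = [(2.0,), (2.0,)]
unlabeled = [(2.0,), (-1.0,), (2.0,), (-1.0,)]
neg_centroid = estimate_negative_centroid(unlabeled, positives, class_prior=0.5)
print(neg_centroid)  # → [-1.0]
```

In LDCE this recovered centroid is what corrects the noise-affected part of the decomposed hinge loss.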
APA, Harvard, Vancouver, ISO, and other styles
6

Kaur, Simarjeet, Meenakshi Bansal, and Ashok Kumar Bathla. "A Comparitive Study of E-Mail Spam Detection using Various Machine Learning Techniques." In International Conference on Women Researchers in Electronics and Computing. AIJR Publisher, 2021. http://dx.doi.org/10.21467/proceedings.114.56.

Full text
Abstract:
Due to the rise in the use of messaging and mailing services, spam detection tasks are of much greater importance than before. In such a set of communications, efficient classification is a comparatively onerous job. Spam can be defined as redundant or junk email that the addressee does not want in his or her inbox. After pre-processing and feature extraction, various machine learning algorithms were applied to the Spambase dataset from the UCI Machine Learning Repository in order to classify incoming emails into two categories: spam and non-spam. The outcomes of the various algorithms have been compared. This paper used the random forest, Naive Bayes, support vector machine (SVM), logistic regression, and k-nearest neighbour (KNN) machine learning algorithms to successfully classify email spam messages. The main goal of this study is to improve the prediction accuracy of spam email filters.
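One of the listed classifiers, Naive Bayes, is simple enough to sketch end-to-end on toy token counts (an illustration of the technique, not the paper's pipeline or the Spambase features):

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Multinomial Naive Bayes with add-one smoothing over token counts."""
    model = {}
    vocab = {w for d in docs for w in d}
    for cls in set(labels):
        cls_docs = [d for d, y in zip(docs, labels) if y == cls]
        counts = Counter(w for d in cls_docs for w in d)
        total = sum(counts.values())
        model[cls] = {
            "prior": math.log(len(cls_docs) / len(docs)),
            "loglik": {w: math.log((counts[w] + 1) / (total + len(vocab)))
                       for w in vocab},
        }
    return model

def classify(model, doc):
    # pick the class with the highest log prior + summed log likelihoods;
    # unseen words are simply ignored (contribute 0)
    def score(cls):
        entry = model[cls]
        return entry["prior"] + sum(entry["loglik"].get(w, 0.0) for w in doc)
    return max(model, key=score)

docs = [["win", "cash", "now"], ["cheap", "cash", "offer"],
        ["meeting", "at", "noon"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(classify(model, ["cash", "offer"]))    # → spam
print(classify(model, ["project", "noon"]))  # → ham
```

The Spambase dataset provides precomputed word-frequency features rather than raw tokens, but the same class-conditional reasoning applies.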
APA, Harvard, Vancouver, ISO, and other styles
7

Ramroach, Sterling, Jonathan Herbert, and Ajay Joshi. "CUDA-ACCELERATED FEATURE SELECTION." In International Conference on Emerging Trends in Engineering & Technology (IConETech-2020). Faculty of Engineering, The University of the West Indies, St. Augustine, 2020. http://dx.doi.org/10.47412/juqg5057.

Full text
Abstract:
Identifying important features in high-dimensional data is usually done using one-dimensional filtering techniques. These techniques discard noisy attributes and those that are constant throughout the data. This is a time-consuming task that has scope for acceleration via high-performance computing techniques involving the graphics processing unit (GPU). The proposed algorithm achieves this acceleration via the Compute Unified Device Architecture (CUDA) framework developed by Nvidia, which facilitates the seamless scaling of computation across CUDA-enabled GPUs. Thus, the Pearson correlation coefficient can be applied in parallel to each feature with respect to the response variable, and the resulting per-feature ranks can be used to determine the most relevant features to select. Using data from the UCI Machine Learning Repository, our results show an increase in efficiency for multi-dimensional analysis with a more reliable feature-importance ranking. When tested on a high-dimensional dataset of 1,000 samples and 10,000 features, we achieved a 1,230-fold speedup using CUDA. Being embarrassingly parallel, the workload scales readily as more GPU resources become available.
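The one-score-per-feature layout maps naturally onto any parallel backend. The CPU sketch below mimics the idea with a thread pool, one task per feature (illustrative only; the actual work uses CUDA kernels, and these helper names are invented):

```python
import math
from concurrent.futures import ThreadPoolExecutor

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0  # constant columns score 0

def rank_features(columns, response, workers=4):
    """Score every feature against the response concurrently (one task per
    feature, mirroring the one-thread-per-feature GPU layout), then rank."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda col: abs(pearson(col, response)), columns))
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)

columns = [[1, 2, 3, 4],     # perfectly correlated with the response
           [4, 4, 4, 4],     # constant -> score 0
           [2, 1, 4, 3]]     # weakly correlated
response = [10, 20, 30, 40]
print(rank_features(columns, response))  # → [0, 2, 1]
```

On a GPU the same mapping assigns each feature's correlation to its own thread block, which is why the feature count, not the sample count, dominates the available parallelism.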
APA, Harvard, Vancouver, ISO, and other styles
8

Gorpincenko, Artjoms, and Michal Mackiewicz. "SVW-UCF Dataset for Video Domain Adaptation." In International Conference on Image Processing and Vision Engineering. SCITEPRESS - Science and Technology Publications, 2021. http://dx.doi.org/10.5220/0010460901070111.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Ashok, P., and G. M. Kadhar Nawaz. "Detecting outliers on UCI repository datasets by Adaptive Rough Fuzzy clustering method." In 2016 Online International Conference on Green Engineering and Technologies (IC-GET). IEEE, 2016. http://dx.doi.org/10.1109/get.2016.7916697.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Feng, Lei, and Bo An. "Leveraging Latent Label Distributions for Partial Label Learning." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/291.

Full text
Abstract:
In partial label learning, each training example is assigned a set of candidate labels, only one of which is the ground-truth label. Existing partial label learning frameworks either assume that each candidate label has equal confidence or treat the ground-truth label as a latent variable hidden in the indiscriminate candidate label set, while the different labeling confidence levels of the candidate labels are regrettably ignored. In this paper, we formalize the different labeling confidence levels as latent label distributions, and propose a novel unified framework that estimates the latent label distributions while simultaneously training the model. Specifically, we present a biconvex formulation with constrained local consistency and adopt an alternating method to solve this optimization problem. The process of alternating optimization exactly facilitates the mutual adaptation of the model training and the constrained label propagation. Extensive experimental results on controlled UCI datasets as well as real-world datasets clearly show the effectiveness of the proposed approach.
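The constrained label propagation at the heart of the framework can be sketched as follows (a simplified, invented illustration, not the paper's biconvex formulation): mix each example's label distribution with its neighbours', zero out non-candidate labels, and renormalise.

```python
def propagate(candidates, neighbours, n_labels, alpha=0.5, iters=20):
    """Constrained label propagation: blend each example's label distribution
    with its neighbours', then zero out non-candidate labels and renormalise,
    so mass concentrates on candidates the neighbourhood agrees on."""
    dist = [[1.0 / len(c) if l in c else 0.0 for l in range(n_labels)]
            for c in candidates]
    for _ in range(iters):
        new = []
        for i, cands in enumerate(candidates):
            mixed = [(1 - alpha) * dist[i][l] +
                     alpha * sum(dist[j][l] for j in neighbours[i]) / len(neighbours[i])
                     for l in range(n_labels)]
            masked = [m if l in cands else 0.0 for l, m in enumerate(mixed)]
            z = sum(masked)
            new.append([m / z for m in masked])
        dist = new
    return dist

# Toy example with 3 labels: examples 0 and 1 are mutual neighbours; example 1
# is unambiguous (label 0), so propagation pulls example 0 toward label 0.
candidates = [{0, 1}, {0}]
neighbours = [[1], [0]]
dist = propagate(candidates, neighbours, n_labels=3)
print(dist[0])  # confidence of label 0 exceeds label 1
```

In the paper this propagation is one half of an alternating optimization; the other half retrains the model on the current label distributions.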
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "UCI dataset"

1

Billing, Suzannah-Lynn, Shannon Anderson, Andrew Parker, Martin Eichhorn, Lindsay Louise Vare, and Emily Thomson. Scottish Inshore Fisheries Integrated Data System (SIFIDS): work package 4 final report assessment of socio-economic and cultural characteristics of Scottish inshore fisheries. Edited by Mark James and Hannah Ladd-Jones. Marine Alliance for Science and Technology for Scotland (MASTS), 2018. http://dx.doi.org/10.15664/10023.23450.

Full text
Abstract:
[Extract from Executive Summary] The European Maritime and Fisheries Fund (EMFF) has funded the ‘Scottish Inshore Fisheries Integrated Data System’ (SIFIDS) project, which aims to integrate data collection and analysis for the Scottish inshore fishing industry. SIFIDS Work Package 4 was tasked with assessing the socio-economic and cultural characteristics of Scottish Inshore Fisheries. The aim was to develop replicable frameworks for collecting and analysing cultural data in combination with defining and analysing already available socio-economic datasets. An overview of the current available socio-economic data is presented and used to identify the data gaps. Primary socio-economic and cultural research was conducted to fill these gaps in order to capture complex cultural, social and economic relationships in a usable and useful manner. Some of the results from this Work Package will be incorporated into the platform that SIFIDS Work Package 6 is building. All primary research conducted within this work package followed the University of the Highlands and Islands (UHI) Research Ethics Framework and was granted Ethical Approval by the UHI Research Ethics Committee under code ETH895.
APA, Harvard, Vancouver, ISO, and other styles