Journal articles on the topic 'Machine learning. Data mining. Software measurement'

Consult the top 50 journal articles for your research on the topic 'Machine learning. Data mining. Software measurement.'

1

Bagriyanik, Selami, and Adem Karahoca. "Using Data Mining to Identify COSMIC Function Point Measurement Competence." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 6 (December 1, 2018): 5253. http://dx.doi.org/10.11591/ijece.v8i6.pp5253-5259.

Abstract:
COSMIC Function Point (CFP) measurement errors lead to budget, schedule, and quality problems in software projects. It is therefore important to identify and plan requirements engineers' CFP training needs quickly and correctly. The purpose of this paper is to identify software requirements engineers' COSMIC Function Point measurement competence development needs by using machine learning algorithms and requirements artifacts created by engineers. The artifacts used were provided by a large service and technology company ecosystem in the telecommunications sector. First, a feature set was extracted from the requirements model at hand. To prepare the data for educational data mining, requirements and CFP audit documents were converted into a CFP data set based on the designed feature set. This data set was used to train and test the machine learning models, with two different experiment settings designed to reach statistically significant results. Ten different machine learning algorithms were used. Finally, algorithm performances were compared with a baseline and with each other to find the best-performing models on this data set. In conclusion, the REPTree, OneR, and Support Vector Machine (SVM) with Sequential Minimal Optimization (SMO) algorithms achieved the top performance in forecasting requirements engineers' CFP training needs.
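
To make the experiment design concrete, here is a minimal sketch (an illustration, not the authors' code) of comparing classifiers against a baseline with cross-validation, as the paper does; the synthetic X and y stand in for the proprietary requirements-derived features and training-need labels, and scikit-learn's DecisionTreeClassifier and SVC stand in for Weka's REPTree and SMO.

```python
# Hedged sketch: cross-validated comparison of learners against a
# majority-class baseline. X, y are synthetic stand-ins for the paper's
# requirements-derived features and CFP-training-need labels.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

models = {
    "baseline (majority class)": DummyClassifier(strategy="most_frequent"),
    "decision tree (REPTree analogue)": DecisionTreeClassifier(random_state=0),
    "SVM (SMO analogue)": SVC(kernel="linear"),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: mean accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```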
2

Parasich, Andrey, Victor Parasich, and Irina Parasich. "Training set formation in machine learning problems (review)." Information and Control Systems, no. 4 (September 13, 2021): 61–70. http://dx.doi.org/10.31799/1684-8853-2021-4-61-70.

Abstract:
Introduction: Proper training set formation is a key factor in machine learning. In real training sets, problems and errors commonly occur and have a critical impact on the training result. A training set needs to be formed in every machine learning problem; therefore, knowledge of possible difficulties will be helpful. Purpose: To give an overview of possible problems in the formation of a training set, in order to facilitate their detection and elimination when working with real training sets, and to analyze the impact of these problems on training results. Results: The article gives an overview of possible errors in training set formation, such as lack of data, imbalance, false patterns, sampling from a limited set of sources, change in the general population over time, and others. We discuss the influence of these errors on the training result, test set formation, and the measurement of training algorithm quality. Pseudo-labeling, data augmentation, and hard sample mining are considered the most effective ways to expand a training set. We offer practical recommendations for the formation of a training or test set. Examples from the practice of Kaggle competitions are given. For the problem of cross-dataset generalization in neural network training, we propose an algorithm called Cross-Dataset Machine, which is simple to implement and yields a gain in cross-dataset generalization. Practical relevance: The materials of the article can be used as a practical guide in solving machine learning problems.
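
As an illustration of one expansion technique the review names, the following is a minimal pseudo-labeling sketch (our illustration, not the authors' code): a model trained on labeled data labels the unlabeled samples it is confident about, and those are appended to the training set. The data and the 0.95 confidence threshold are arbitrary stand-ins.

```python
# Hedged sketch of pseudo-labeling for training-set expansion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=1)
X_lab, y_lab, X_unlab = X[:100], y[:100], X[100:]

model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
proba = model.predict_proba(X_unlab)
confident = proba.max(axis=1) > 0.95          # keep only confident predictions
X_pseudo = X_unlab[confident]
y_pseudo = proba.argmax(axis=1)[confident]    # predicted labels become pseudo-labels

X_big = np.vstack([X_lab, X_pseudo])          # expanded training set
y_big = np.concatenate([y_lab, y_pseudo])
model = LogisticRegression(max_iter=1000).fit(X_big, y_big)
```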
3

Gunarathna, M. H. J. P., Kazuhito Sakai, Tamotsu Nakandakari, Kazuro Momii, and M. K. N. Kumari. "Machine Learning Approaches to Develop Pedotransfer Functions for Tropical Sri Lankan Soils." Water 11, no. 9 (September 18, 2019): 1940. http://dx.doi.org/10.3390/w11091940.

Abstract:
Poor data availability on soil hydraulic properties in tropical regions hampers many studies, including crop and environmental modeling. The high cost and effort of measurement and the increasing demand for such data have driven researchers to search for alternative approaches. Pedotransfer functions (PTFs) are predictive functions used to estimate soil properties from easily measurable soil parameters. PTFs are popular in temperate regions, but few attempts have been made to develop PTFs in tropical regions. Regression approaches are widely used to develop PTFs worldwide, and recently a few attempts were made using machine learning methods. PTFs for tropical Sri Lankan soils have already been developed using classical multiple linear regression approaches; however, no attempts were made to use machine learning approaches. This study aimed to determine the applicability of machine learning algorithms in developing PTFs for tropical Sri Lankan soils. We tested three machine learning algorithms (artificial neural networks (ANN), k-nearest neighbor (KNN), and random forest (RF)) with three different input combinations (sand, silt, and clay (SSC) percentages; SSC and bulk density (BD); SSC, BD, and organic carbon (OC)) to estimate volumetric water content (VWC) at −10 kPa, −33 kPa (representing field capacity (FC); most studies in Sri Lanka use −33 kPa as the FC), and −1500 kPa (representing the permanent wilting point (PWP)) of Sri Lankan soils. This analysis used the open-source data mining software of the Waikato Environment for Knowledge Analysis. Using a wrapper approach and the best-first search method, we selected the most appropriate inputs to develop PTFs using different machine learning algorithms and input levels. We developed PTFs to estimate FC and PWP and compared them with the previously reported PTFs for tropical Sri Lankan soils. We found that RF was the best algorithm for developing PTFs for tropical Sri Lankan soils. We furthered the development of PTFs by adding VWC at −10 kPa as an input variable, because it is an easily measurable parameter compared to the other targeted VWCs. With the addition of VWC at −10 kPa, all machine learning algorithms improved in performance, and RF remained the best. We studied the functionality of the fine-tuned PTFs and found that they can estimate the available water content of Sri Lankan soils as well as measurement-based calculations can. We identified RF as a robust alternative to linear regression methods for developing PTFs to estimate the field capacity and permanent wilting point of tropical Sri Lankan soils. Given these findings, we recommend that PTFs be developed using the RF algorithm in related software to fill the data gaps present in tropical regions.
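
As a rough illustration of the kind of pedotransfer function the study builds (a sketch under assumed, randomly generated data, not the authors' Sri Lankan soil measurements), a random forest can be trained to map sand/silt/clay, bulk density, and organic carbon to volumetric water content:

```python
# Hedged sketch of a random-forest PTF: predicting VWC at -33 kPa (field
# capacity) from texture, bulk density and organic carbon. The data are
# random placeholders for real soil measurements.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((150, 5))        # columns: sand, silt, clay, BD, OC (placeholders)
y = rng.random(150)             # VWC at -33 kPa (placeholder values)

ptf = RandomForestRegressor(n_estimators=200, random_state=0)
rmse = -cross_val_score(ptf, X, y, cv=5,
                        scoring="neg_root_mean_squared_error").mean()
print(f"cross-validated RMSE: {rmse:.3f}")
```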
4

Wilkening, Jan. "Towards Spatial Data Science: Bridging the Gap between GIS, Cartography and Data Science." Abstracts of the ICA 1 (July 15, 2019): 1–2. http://dx.doi.org/10.5194/ica-abs-1-403-2019.

Abstract:
Data is regarded as the oil of the 21st century, and the concept of data science has received increasing attention in recent years. These trends are mainly caused by the rise of big data – data that is big in terms of volume, variety and velocity. Consequently, data scientists are required to make sense of these large datasets. Companies have problems acquiring talented people to solve data science problems. This is not surprising, as employers often expect skillsets that can hardly be found in one person: not only does a data scientist need a solid background in machine learning, statistics and various programming languages, but often also in IT systems architecture, databases and complex mathematics. Above all, she should have strong non-technical domain expertise in her field (see Figure 1).

As it is widely accepted that 80% of data has a spatial component, developments in data science could provide exciting new opportunities for GIS and cartography: cartographers are experts in spatial data visualization, and often also very skilled in statistics, data pre-processing and analysis in general. Cartographers' skill levels often depend on the degree to which cartography programs at universities focus on the "front end" (visualization) of spatial data and leave the "back end" (modelling, gathering, processing, analysis) to GIScientists. In many university curricula, these front-end and back-end distinctions between cartographers and GIScientists are not clearly defined, and the boundaries are somewhat blurred.

In order to become good data scientists, cartographers and GIScientists need to acquire certain additional skills that are often beyond their university curricula. These skills include programming, machine learning and data mining. These are important technologies for extracting knowledge from big spatial data sets, and thereby the logical advancement of "traditional" geoprocessing, which focuses on "traditional" (small, structured, static) datasets such as shapefiles or feature classes.

To bridge the gap between spatial sciences (such as GIS and cartography) and data science, we need an integrated framework of "spatial data science" (Figure 2).

Spatial sciences focus on causality, using theory-based approaches to explain why things are happening in space. In contrast, the scope of data science is to find similar patterns in big datasets with techniques of machine learning and data mining – often without considering spatial concepts (such as topology, spatial indexing, spatial autocorrelation, modifiable areal unit problems, map projections and coordinate systems, uncertainty in measurement, etc.).

Spatial data science could become the core competency of GIScientists and cartographers who are willing to integrate methods from the data science knowledge stack. Moreover, data scientists could enhance their work by integrating important spatial concepts and tools from GIS and cartography into data science workflows. A non-exhaustive knowledge stack for spatial data scientists, including typical tasks and tools, is given in Table 1.

There are many interesting ongoing projects at the interface of spatial and data science. Examples from the ArcGIS platform include:

- Integration of Python GIS APIs with machine learning libraries, such as scikit-learn or TensorFlow, in Jupyter Notebooks
- Combination of R (advanced statistics and visualization) and GIS (basic geoprocessing, mapping) in ModelBuilder and other automation frameworks
- Enterprise GIS solutions for distributed geoprocessing operations on big, real-time vector and raster datasets
- Dashboards for visualizing real-time sensor data and integrating it with other data sources
- Applications for interactive data exploration
- GIS tools for machine learning tasks such as prediction, clustering and classification of spatial data
- GIS integration for Hadoop

While the discussion about proprietary (ArcGIS) vs. open-source (QGIS) software is beyond the scope of this article, it has to be stated that (a) many ArcGIS projects are actually open-source, and (b) using a complete GIS platform instead of several open-source pieces has several advantages, particularly in efficiency, maintenance and support (see Wilkening et al. (2019) for a more detailed consideration). At any rate, cartography and GIS tools are essential technology blocks for solving the (80% spatial) data science problems of the future.
5

Makhlouf Shabou, Basma, Julien Tièche, Julien Knafou, and Arnaud Gaudinat. "Algorithmic methods to explore the automation of the appraisal of structured and unstructured digital data." Records Management Journal 30, no. 2 (July 3, 2020): 175–200. http://dx.doi.org/10.1108/rmj-09-2019-0049.

Abstract:
Purpose: This paper describes interdisciplinary and innovative research conducted in Switzerland at the Geneva School of Business Administration HES-SO and supported by the State Archives of Neuchâtel (Office des archives de l'État de Neuchâtel, OAEN). The problem to be addressed is one of the most classical ones: how to extract and discriminate relevant data in a huge amount of diversified and complex data record formats and contents. The goal of this study is to provide a framework and a proof of concept for software that helps take defensible decisions on the retention and disposal of records and data proposed to the OAEN. For this purpose, the authors designed two axes: the archival axis, to propose archival metrics for the appraisal of structured and unstructured data, and the data mining axis, to propose algorithmic methods as complementary and/or additional metrics for the appraisal process. Design/methodology/approach: Based on these two axes, this exploratory study designs and tests the feasibility of archival metrics that are paired with data mining metrics, to advance the digital appraisal process in a systematic or even automatic way as far as possible. Under Axis 1, the authors took three steps: first, the design of a conceptual framework for records and data appraisal with a detailed three-dimensional approach (trustworthiness, exploitability, representativeness), together with the main principles and postulates that guide the operationalization of the conceptual dimensions; second, the operationalization, which proposed metrics expressed in terms of variables supported by a quantitative method for their measurement and scoring; third, the sharing of this conceptual framework, with its dimensions and operationalized variables (metrics), with experienced professionals to validate them. The experts' feedback gave the authors an idea of the relevance and the feasibility of these metrics, two aspects that may demonstrate the acceptability of such a method in real-life archival practice. In parallel, Axis 2 proposes functionalities that cover not only macro analysis of data but also the algorithmic methods that enable the computation of digital archival and data mining metrics. On this basis, three use cases were proposed to imagine plausible and illustrative scenarios for the application of such a solution. Findings: The main results demonstrate the feasibility of measuring the value of data and records with a reproducible method. More specifically, for Axis 1, the authors applied the metrics in a flexible and modular way and defined the main principles needed to enable a computational scoring method. The results obtained through the experts' consultation on the relevance of 42 metrics indicate an acceptance rate above 80%. In addition, the results show that 60% of all metrics can be automated. Regarding Axis 2, 33 functionalities were developed and proposed under six main types: macro analysis, micro analysis, statistics, retrieval, administration and, finally, decision modeling and machine learning. The relevance of the metrics and functionalities rests on the theoretical validity and computational character of their method. These results are largely satisfactory and promising. Originality/value: This study offers valuable aid for improving the validity and performance of archival appraisal processes and decision-making. The transferability and applicability of these archival and data mining metrics could be considered for other types of data; an adaptation of this method and its metrics could be tested on research data, medical data or banking data.
6

Aló, Richard, and Vladik Kreinovich. "Selected Papers from InTech'04." Journal of Advanced Computational Intelligence and Intelligent Informatics 10, no. 3 (May 20, 2006): 243–44. http://dx.doi.org/10.20965/jaciii.2006.p0243.

Abstract:
The main objective of the annual International Conference on Intelligent Technologies (InTech) is to bring together researchers and practitioners who implement intelligent and fuzzy technologies in real-world environments. The Fifth International Conference on Intelligent Technologies (InTech'04) was held in Houston, Texas, on December 2-4, 2004. Topics of InTech'04 included mathematical foundations of intelligent technologies, traditional Artificial Intelligence techniques, uncertainty processing and methods of soft computing, learning/adaptive systems/data mining, and applications of intelligent technologies. This special issue contains versions of 15 selected papers originally presented at InTech'04. These papers cover most of the topics of the conference. Several papers describe new applications of existing intelligent techniques. R. Aló et al. show how traditional statistical hypothesis testing techniques – originally designed for processing measurement results – need to be modified when applied to simulated data, e.g., when we compare the quality of two algorithms. Y. Frayman et al. use mathematical morphology and genetic algorithms in the design of a machine vision system for detecting surface defects in aluminum die casting. Y. Murai et al. propose a new, faster entropy-based placement algorithm for VLSI circuit design and similar applications. A. P. Salvatore et al. show how expert-system-type techniques can help in scheduling botox treatment for voice disorders. H. Tsuji et al. propose a new method, based on partial differential equations, for automatically identifying and extracting objects from a video. N. Ward uses Ordered Weighted Average (OWA) techniques to design a model that predicts admission of computer science students into different graduate schools. An important aspect of intelligence is the ability to learn. In A. Mahaweerawat et al., neural-based machine learning is used to identify and predict software faults. J. Han et al. show that we can drastically improve the quality of machine learning if, in addition to discovering traditional (positive) rules, we also search for negative rules. A serious problem with many neural-based machine learning algorithms is that, often, the results of their learning are unintelligible rules and numbers. M. I. Khan et al. show, on the example of robotic arm applications, that if we allow neurons with different input-output dependencies – including linear neurons – then we can extract meaningful knowledge from the resulting network. Several papers analyze the Equivalent Transformation (ET) model, which allows the user to automatically generate code from specifications. A general description of this model is given by K. Akama et al. P. Chippimolchai et al. describe how, within this model, we can transform a user's query into an equivalent, more efficient one. H. Koike et al. apply this approach to natural language processing. Y. Shigeta et al. show how existing constraint techniques can be translated into equivalent transformation rules and thus combined with other specifications. I. Takarajima et al. extend the ET approach to situations like parallel computations, where the order in which different computations are performed on different processors depends on other processes and is thus non-deterministic. Finally, a paper by J. Chandra – based on his invited talk at InTech'04 – describes a general framework for robust and resilient critical infrastructure systems, with potential applications to transportation systems, power grids, communication networks, water resources, health delivery systems, and financial networks. We want to thank all the authors for their outstanding work, the participants of InTech'04 for their helpful suggestions, the anonymous reviewers for their thorough analysis and constructive help, and – last but not least – Professor Kaoru Hirota for his kind suggestion to host this issue, as well as the entire staff of the journal for their tireless work.
7

Jiao, Changyi. "Big Data Mining Optimization Algorithm Based on Machine Learning Model." Revue d'Intelligence Artificielle 34, no. 1 (February 29, 2020): 51–57. http://dx.doi.org/10.18280/ria.340107.

8

Mastoi, Qurat-ul-ain, Muhammad Suleman Memon, Abdullah Lakhan, Mazin Abed Mohammed, Mumtaz Qabulio, Fadi Al-Turjman, and Karrar Hameed Abdulkareem. "Machine learning-data mining integrated approach for premature ventricular contraction prediction." Neural Computing and Applications 33, no. 18 (March 14, 2021): 11703–19. http://dx.doi.org/10.1007/s00521-021-05820-2.

9

Ghaffarian, Seyed Mohammad, and Hamid Reza Shahriari. "Software Vulnerability Analysis and Discovery Using Machine-Learning and Data-Mining Techniques." ACM Computing Surveys 50, no. 4 (November 8, 2017): 1–36. http://dx.doi.org/10.1145/3092566.

10

Mallikharjuna, L. K., and V. S. K. Reddy. "An adaptive correlation based video data mining using machine learning." International Journal of Knowledge-based and Intelligent Engineering Systems 24, no. 1 (April 9, 2020): 1–9. http://dx.doi.org/10.3233/kes-200023.

11

Yang, Jie, Chenzhou Ye, and Nianyi Chen. "DMiner-I: A software tool of data mining and its applications." Robotica 20, no. 5 (September 2002): 499–508. http://dx.doi.org/10.1017/s0263574702004307.

Abstract:
Summary: A software tool for data mining (DMiner-I) is introduced, which integrates pattern recognition (PCA, Fisher, clustering, HyperEnvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough sets, support vector machines), and computational intelligence (neural networks, genetic algorithms, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rules, fuzzy rules, neural network, genetic algorithm, HyperEnvelop, support vector machine and visualization. The principles, algorithms and knowledge representation of several function models of data mining are described. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The tool is realized in Visual C++ under Windows 2000 and has been satisfactorily applied in predicting regularities of the formation of ternary intermetallic compounds in alloy systems and in the diagnosis of brain glioma.
12

Kanevski, M., R. Parkin, A. Pozdnukhov, V. Timonin, M. Maignan, V. Demyanov, and S. Canu. "Environmental data mining and modeling based on machine learning algorithms and geostatistics." Environmental Modelling & Software 19, no. 9 (September 2004): 845–55. http://dx.doi.org/10.1016/j.envsoft.2003.03.004.

13

Canaparo, Marco, and Elisabetta Ronchieri. "Data Mining Techniques for Software Quality Prediction in Open Source Software." EPJ Web of Conferences 214 (2019): 05007. http://dx.doi.org/10.1051/epjconf/201921405007.

Abstract:
Software quality monitoring and analysis are among the most productive topics in software engineering research. Their results may be effectively employed by engineers during the software development life cycle. Open source software constitutes a valid test case for the assessment of software characteristics. The data mining approach has been proposed in the literature to extract software characteristics from software engineering data. This paper aims at comparing diverse data mining techniques (e.g., those derived from machine learning) for developing effective software quality prediction models. To achieve this goal, we tackled various issues, such as the collection of software metrics from open source repositories, the assessment of prediction models to detect software issues, and the adoption of statistical methods to evaluate data mining techniques. The results of this study aim to identify the data mining techniques that perform best among all the ones used in this paper for software quality prediction models.
14

Shao, Yanli, Yusheng Liu, Xiaoping Ye, and Shuting Zhang. "A machine learning based global simulation data mining approach for efficient design changes." Advances in Engineering Software 124 (October 2018): 22–41. http://dx.doi.org/10.1016/j.advengsoft.2018.07.002.

15

Prasad, Manjula C. M., Lilly Florence, and Arti Arya. "A Study on Software Metrics based Software Defect Prediction using Data Mining and Machine Learning Techniques." International Journal of Database Theory and Application 8, no. 3 (June 30, 2015): 179–90. http://dx.doi.org/10.14257/ijdta.2015.8.3.15.

16

Kukar, Matjaž. "Quality assessment of individual classifications in machine learning and data mining." Knowledge and Information Systems 9, no. 3 (September 9, 2005): 364–84. http://dx.doi.org/10.1007/s10115-005-0203-z.

17

Götz, Marco, Ferenc Leichsenring, Thomas Kropp, Peter Müller, Tobias Falk, Wolfgang Graf, Michael Kaliske, and Welf-Guntram Drossel. "Data Mining and Machine Learning Methods Applied to A Numerical Clinching Model." Computer Modeling in Engineering & Sciences 117, no. 3 (December 29, 2018): 387–423. http://dx.doi.org/10.31614/cmes.2018.04112.

18

Latif, Abdul, Lady Agustin Fitriana, and Muhammad Rifqi Firdaus. "COMPARATIVE ANALYSIS OF SOFTWARE EFFORT ESTIMATION USING DATA MINING TECHNIQUE AND FEATURE SELECTION." JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer) 6, no. 2 (February 2, 2021): 167–74. http://dx.doi.org/10.33480/jitk.v6i2.1968.

Abstract:
Software development involves several interrelated factors that influence development effort and productivity. Improving the estimation techniques available to project managers will facilitate more effective time and budget control in software development. Software effort estimation (software cost estimation) can help a software development company overcome the difficulties experienced in estimating software development effort. This study compares the machine learning methods Linear Regression (LR), Multilayer Perceptron (MLP), Radial Basis Function (RBF), and Decision Tree Random Forest (DTRF) for calculating estimated software cost/effort. These approaches are tested on 10 software development project datasets, producing new knowledge about which methods are the most accurate for software effort estimation, as well as about whether using Particle Swarm Optimization (PSO) for attribute selection increases accuracy compared to not using it. The data mining algorithm that produced the most optimal software effort estimates was Linear Regression, with an average RMSE value of 1603.024 over the 10 datasets tested. Applying PSO feature selection increased accuracy further, reducing the average RMSE to 1552.999. The result indicates that, compared with the original linear regression model, the error of software effort estimation decreased by 3.12% by applying PSO feature selection.
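
The core comparison can be sketched as follows (an illustration, not the study's code): linear-regression effort estimation scored by RMSE, with and without feature selection. Since PSO-based selection is not available in scikit-learn, SelectKBest stands in purely to show the with/without-selection contrast on synthetic data.

```python
# Hedged sketch: RMSE of linear regression with and without a feature
# selection step. SelectKBest is a stand-in for the study's PSO selection.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=100, n_features=15, noise=10, random_state=0)

plain = LinearRegression()
selected = make_pipeline(SelectKBest(f_regression, k=8), LinearRegression())
for name, model in [("LR", plain), ("LR + feature selection", selected)]:
    rmse = -cross_val_score(model, X, y, cv=10,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name}: RMSE {rmse:.1f}")
```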
19

Jacobucci, Ross, and Kevin J. Grimm. "Machine Learning and Psychological Research: The Unexplored Effect of Measurement." Perspectives on Psychological Science 15, no. 3 (April 29, 2020): 809–16. http://dx.doi.org/10.1177/1745691620902467.

Abstract:
Machine learning (i.e., data mining, artificial intelligence, big data) has been increasingly applied in psychological science. Although some areas of research have benefited tremendously from a new set of statistical tools, most often in the use of biological or genetic variables, the hype has not been substantiated in more traditional areas of research. We argue that this phenomenon results from measurement errors that prevent machine-learning algorithms from accurately modeling nonlinear relationships, if indeed they exist. This shortcoming is showcased across a set of simulated examples, demonstrating that model selection between a machine-learning algorithm and regression depends on the measurement quality, regardless of sample size. We conclude with a set of recommendations and a discussion of ways to better integrate machine learning with statistics as traditionally practiced in psychological science.
20

Ren, Jiadong, Zhangqi Zheng, Qian Liu, Zhiyao Wei, and Huaizhi Yan. "A Buffer Overflow Prediction Approach Based on Software Metrics and Machine Learning." Security and Communication Networks 2019 (March 3, 2019): 1–13. http://dx.doi.org/10.1155/2019/8391425.

Abstract:
Buffer overflow vulnerability is the most common and serious type of vulnerability in software today, as network security issues have become increasingly critical. To alleviate the security threat, many vulnerability mining methods based on static and dynamic analysis have been developed. However, current analysis methods have problems regarding high computational time, low test efficiency, low accuracy, and low versatility. This paper proposes a software buffer overflow vulnerability prediction method using software metrics and a decision tree algorithm. First, software metrics were extracted from the software source code, and data from the dynamic data stream at the function level was extracted by a data mining method. Second, a model based on a decision tree algorithm was constructed to measure multiple types of buffer overflow vulnerabilities at the function level. Finally, the experimental results showed that our method ran in less time than the SVM, Bayes, AdaBoost, and random forest algorithms and achieved 82.53% and 87.51% accuracy on two different data sets. The method presented in this paper accurately predicts software buffer overflow vulnerabilities in C/C++ and Java programs.
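
A minimal sketch of the core idea (not the paper's implementation): a decision tree trained on function-level software metrics to flag likely buffer-overflow vulnerabilities, with synthetic, imbalanced stand-in data.

```python
# Hedged sketch: decision-tree vulnerability prediction from software
# metrics. X, y are placeholders for extracted function-level metrics.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, weights=[0.8],
                           random_state=0)   # vulnerable functions are rare
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```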
21

Pratama, Jaka Aulia, Yadi Suprijadi, and Zulhanif Zulhanif. "The Analisis Sentimen Sosial Media Twitter Dengan Algoritma Machine Learning Menggunakan Software R." Jurnal Fourier 6, no. 2 (October 25, 2017): 85. http://dx.doi.org/10.14421/fourier.2017.62.85-89.

Abstract:
Social media is a medium for expressing opinions on particular topics. The information and opinions available from social media users form a very large collection of text documents, useful both for research and for decision-making by interested parties. Text mining can be defined as a process of information extraction in which users interact with a collection of documents over time using analysis tools. Sentiment analysis, or opinion mining, is a field of computational study that deals with public opinions, judgments, attitudes, and emotions. This study applies machine learning methods to sentiment analysis of Twitter users' views of Donald Trump and Barack Obama across 20,000 tweets. The accuracy obtained with the machine learning method is fairly high: 87.52% on the training data and 87.4% on the test data.
22

Amigo, José Manuel. "Data Mining, Machine Learning, Deep Learning, Chemometrics. Definitions, common points and Trends (Spoiler Alert: VALIDATE your models!)." Brazilian Journal of Analytical Chemistry 8, no. 32 (May 14, 2021): 45–61. http://dx.doi.org/10.30744/brjac.2179-3425.ar-38-2021.

Abstract:
Concepts like Machine Learning, Data Mining and Artificial Intelligence have become part of our daily life. This is mostly due to the incredible advances made in computation (hardware and software), the increasing capabilities for generating and storing all types of data and, especially, the benefits (societal and economic) generated by the analysis of such data. Simultaneously, Chemometrics has played an important role since the late 1970s, analyzing data within the natural sciences (and especially in Analytical Chemistry). Even with the strong parallels between all of the abovementioned terms, and even though they are familiar to most of us, it is still difficult to clearly define or differentiate the meanings of Machine Learning, Data Mining, Artificial Intelligence, Deep Learning and Chemometrics. This manuscript brings some light to the definitions of Machine Learning, Data Mining, Artificial Intelligence and Big Data Analysis, defines their application ranges and seeks an application space within the field of analytical chemistry (a.k.a. Chemometrics). The manuscript is full of personal, sometimes probably subjective, opinions and statements. Therefore, all opinions here are open for constructive discussion with the only purpose of Learning (like the Machines do nowadays).
23

Pietraszek, Tadeusz, and Axel Tanner. "Data mining and machine learning—Towards reducing false positives in intrusion detection." Information Security Technical Report 10, no. 3 (January 2005): 169–83. http://dx.doi.org/10.1016/j.istr.2005.07.001.

24

Morejón, Reinier, Marx Viana, and Carlos Lucena. "An Approach to Generate Software Agents for Health Data Mining." International Journal of Software Engineering and Knowledge Engineering 27, no. 09n10 (November 2017): 1579–89. http://dx.doi.org/10.1142/s0218194017400125.

Abstract:
Data mining is a hot topic that attracts researchers from different areas, such as databases, machine learning, and agent-oriented software engineering. As a consequence of the growth of data volume, there is an increasing need to obtain knowledge from these large datasets, which are very difficult to handle and process with traditional methods. Software agents can play a significant role in performing data mining processes in more efficient ways. For instance, they can work to perform selection, extraction, preprocessing, and integration of data, as well as parallel, distributed, or multisource mining. This paper proposes a framework based on multiagent systems to apply data mining techniques to health datasets. The usage scenarios are datasets for hypothyroidism and diabetes, and two different mining processes are run in parallel on each database.
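
The parallel-mining idea can be sketched minimally (our illustration, not the paper's agent framework): two mining jobs, one per dataset, run concurrently in separate processes, each fitting its own model on stand-in data.

```python
# Hedged sketch: two mining jobs (e.g., hypothyroidism and diabetes
# datasets) run in parallel. The synthetic loader is a placeholder.
from concurrent.futures import ProcessPoolExecutor

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def mine(seed):
    # stand-in for loading one health dataset and mining it
    X, y = make_classification(n_samples=200, random_state=seed)
    return DecisionTreeClassifier(random_state=0).fit(X, y).score(X, y)

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(list(pool.map(mine, [0, 1])))   # one job per dataset
```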
25

Fusco, Terence, Yaxin Bi, Haiying Wang, and Fiona Browne. "Data mining and machine learning approaches for prediction modelling of schistosomiasis disease vectors." International Journal of Machine Learning and Cybernetics 11, no. 6 (November 18, 2019): 1159–78. http://dx.doi.org/10.1007/s13042-019-01029-x.

26

Zelinska, Snizhana. "Machine learning: technologies and potential application at mining companies." E3S Web of Conferences 166 (2020): 03007. http://dx.doi.org/10.1051/e3sconf/202016603007.

Abstract:
Implementation of machine learning systems is currently one of the most sought-after spheres of human activity at the interface of information technologies, mathematical analysis and statistics. Machine learning technologies are penetrating our lives through applied software created with the help of artificial intelligence algorithms. It is obvious that machine learning technologies will develop fast and become part of the human information space, both in everyday life and in professional activities. However, building machine learning systems requires a great labour contribution from specialists in the sphere of artificial intelligence and in the subject area where the technology is to be applied. The article considers technologies and potential applications of machine learning at mining companies. It describes the basic methods of machine learning: unsupervised learning, action learning, and semi-supervised machine learning. Criteria are singled out to assess machine learning: operation speed; assessment time; implemented model accuracy; ease of integration; flexible deployment within the subject area; ease of practical application; and result visualization. The article describes the practical application of machine learning technologies and considers the dispatch system at a mining enterprise (as exemplified by the dispatch system of the mining and transportation complex "Quarry"), used to increase the efficiency of operational management of enterprise performance and to increase the reliability and agility of mining and transportation complex performance records and monitoring. There is also a list of equipment performance data that can be stored in the database and used as a basis for processing by machine learning algorithms to obtain new knowledge. Application of machine learning technologies in the mining industry is a promising and necessary condition for increasing mining efficiency and ensuring environmental security. Selection of the optimal process flow sheet of mining operations, selection of the optimal complex of stripping and mining equipment, optimal planning of mining operations, and mining equipment performance control are some of the tasks where machine learning technologies can be used. However, despite the prospects of machine learning technologies, this area remains understudied and requires further research.
27

Wu, Zhiying, and Yuan Chen. "Digital Art Feature Association Mining Based on the Machine Learning Algorithm." Complexity 2021 (April 5, 2021): 1–11. http://dx.doi.org/10.1155/2021/5562298.

Abstract:
With the development of computer hardware and software, digital art has emerged as a new discipline that uses computers and digital technology as tools for artistic expression. It can range from binary numerical codes centered on the computer to the various categories of computer-aided creation. The research scope is set in the field of digital art: the accidental factors of digital art creation are mined with machine learning algorithms and analyzed for feature correlation. Building on the hidden associations in massive data, the study focuses on mining implicit associations among digital art features for a recommendation algorithm. Categorical and continuous data feature attributes are introduced and discretized, and the binary representation of data features is extended to ensure the diversity of data feature attributes. To mine correlation features in the data, a heuristic feature mining method based on minimum support is studied to discover the frequency of correlated features and construct the optimal feature subset. Based on the frequent feature items, the study examines a heuristic algorithm for digital art feature association mining based on minimum confidence and carries out feature matching based on digital art feature association mining under different situation modes. The validity of the proposed algorithm is verified using experimental data on health and medical situations from a machine learning data repository.
28

Maddouri, M., and M. Elloumi. "A data mining approach based on machine learning techniques to classify biological sequences." Knowledge-Based Systems 15, no. 4 (May 2002): 217–23. http://dx.doi.org/10.1016/s0950-7051(01)00143-5.

29

HOLZMAN, LARS E., TODD A. FISHER, LEON M. GALITSKY, APRIL KONTOSTATHIS, and WILLIAM M. POTTENGER. "A SOFTWARE INFRASTRUCTURE FOR RESEARCH IN TEXTUAL DATA MINING." International Journal on Artificial Intelligence Tools 13, no. 04 (December 2004): 829–49. http://dx.doi.org/10.1142/s0218213004001843.

Abstract:
Few tools exist that address the challenges facing researchers in the Textual Data Mining (TDM) field. Some are too specific to their application, or are prototypes not suitable for general use. More general tools often are not capable of processing large volumes of data. We have created a Textual Data Mining Infrastructure (TMI) that incorporates both existing and new capabilities in a reusable framework conducive to developing new tools and components. TMI adheres to strict guidelines that allow it to run in a wide range of processing environments – as a result, it accommodates the volume of computing and diversity of research occurring in TDM. A unique capability of TMI is support for optimization. This facilitates text mining research by automating the search for optimal parameters in text mining algorithms. In this article we describe a number of applications that use the TMI. A brief tutorial is provided on the use of TMI. We present several novel results that have not been published elsewhere. We also discuss how the TMI utilizes existing machine-learning libraries, thereby enabling researchers to continue and extend their endeavors with minimal effort. Towards that end, TMI is available on the web at .
30

Deng, Jeremiah D., Martin Purvis, and Maryam Purvis. "Software Effort Estimation." International Journal of Intelligent Information Technologies 7, no. 3 (July 2011): 41–53. http://dx.doi.org/10.4018/jiit.2011070104.

Abstract:
Software development effort estimation is important for quality management in the software development industry, yet its automation still remains a challenging issue. Applying machine learning algorithms alone often cannot achieve satisfactory results. This paper presents an integrated data mining framework that incorporates domain knowledge into a series of data analysis and modeling processes, including visualization, feature selection, and model validation. An empirical study on the software effort estimation problem using a benchmark dataset shows the necessity and effectiveness of the proposed approach.
31

Daimi, Kevin, and Shadi Banitaan. "Using Data Mining to Predict Possible Future Depression Cases." International Journal of Public Health Science (IJPHS) 3, no. 4 (December 1, 2014): 231. http://dx.doi.org/10.11591/ijphs.v3i4.4697.

Abstract:
Depression is a disorder characterized by misery and gloominess felt over a period of time. Some symptoms of depression overlap with somatic illnesses implying considerable difficulty in diagnosing it. This paper contributes to its diagnosis through the application of data mining, namely classification, to predict patients who will most likely develop depression or are currently suffering from depression. Synthetic data is used for this study. To acquire the results, the popular suite of machine learning software, WEKA, is used.
32

Naaz, Sameena. "Detection of Phishing in Internet of Things Using Machine Learning Approach." International Journal of Digital Crime and Forensics 13, no. 2 (March 2021): 1–15. http://dx.doi.org/10.4018/ijdcf.2021030101.

Abstract:
Phishing attacks are growing in the same manner as e-commerce industries are growing. Prediction and prevention of phishing attacks is a very critical step towards safeguarding online transactions. Data mining tools can be applied in this regard, as the technique is very easy and can mine millions of records within seconds and deliver accurate results. With the help of machine learning algorithms like random forest, decision tree, neural network, and linear model, we can classify data into phishing, suspicious, and legitimate. The devices that are connected over the internet, known as the Internet of Things (IoT), are also at very high risk of phishing attack. In this work, the machine learning algorithms random forest classifier, support vector machine, and logistic regression have been applied to an IoT dataset for the detection of phishing attacks, and the results have been compared with previous work carried out on the same dataset as well as on a different dataset. The results of these algorithms have then been compared in terms of accuracy, error rate, precision, and recall.
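
A minimal sketch of the comparison described (not the authors' code): the three learners fitted on a synthetic stand-in for the IoT phishing features and reported by accuracy, precision, and recall.

```python
# Hedged sketch: comparing three classifiers on placeholder phishing data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

learners = [("random forest", RandomForestClassifier(random_state=0)),
            ("SVM", SVC()),
            ("logistic regression", LogisticRegression(max_iter=1000))]
for name, model in learners:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: acc {accuracy_score(y_te, pred):.3f}, "
          f"prec {precision_score(y_te, pred):.3f}, "
          f"rec {recall_score(y_te, pred):.3f}")
```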
33

Behrens, Grit, Klaus Schlender, and Florian Fehring. "Data mining methods of healthy indoor climate coefficients for comfortable well-being." Environmental Protection and Natural Resources 29, no. 3 (September 1, 2018): 7–12. http://dx.doi.org/10.2478/oszn-2018-0013.

Abstract:
This article provides information about 'Smart Monitoring', a measurement and analysis system currently under development, which is used in a scientific project on healthy indoor air coefficients, as well as about the processing of the collected data for machine learning algorithms. The target is to reduce CO2 emissions caused by wrong ventilation habits in the building sector after the renovation of older buildings.
34

Vankayalapati, Revathi, Kalyani Balaso Ghutugade, Rekha Vannapuram, and Bejjanki Pooja Sree Prasanna. "K-Means Algorithm for Clustering of Learners Performance Levels Using Machine Learning Techniques." Revue d'Intelligence Artificielle 35, no. 1 (February 28, 2021): 99–104. http://dx.doi.org/10.18280/ria.350112.

Abstract:
Data clustering is the process of grouping objects in such a way that objects in the same group are more similar to each other than to those in other groups. In this paper, k-means clustering is used to assess the performance of students. Machine learning is used in many fields, including education, pattern recognition, sports, and industrial applications, and its significance grows with the stakes of students' futures in the educational system. Data collection in education is very useful, as data volumes in the education system grow each day. Data mining in higher education is relatively new, but its significance grows with the expanding databases. There are several ways to assess the success of students, and k-means is one of the best and most successful; the hidden information in the database is extracted using data mining to improve students' outcomes. The decision tree is also a way to predict student success. In recent years, educational institutions have faced great challenges in the growth of data and in using it to improve efficiency so that better decisions can be made. Clustering is one of the most important methods used for the analysis of data sets. This study uses cluster analysis to section students into various classes according to their features, using the unsupervised k-means algorithm. Educational data mining is used to study the information available in the field of education and to surface hidden, significant, and useful knowledge. The proposed model applies k-means clustering to analyze learner performance; with this support, students' outcomes and futures can be strengthened. The results show that the k-means clustering algorithm is useful for grouping students based on similar performance features.
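
A minimal sketch of the proposed approach (illustrative only): k-means groups learners into performance levels from a synthetic score matrix standing in for real assessment data.

```python
# Hedged sketch: k-means grouping of students by performance level.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
scores = rng.integers(0, 101, size=(60, 4))   # 60 students, 4 assessments

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)
for level in range(3):
    members = np.where(km.labels_ == level)[0]
    print(f"cluster {level}: {len(members)} students, "
          f"mean score {scores[members].mean():.1f}")
```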
35

Fernández, Susana, Tomás de la Rosa, Fernando Fernández, Rubén Suárez, Javier Ortiz, Daniel Borrajo, and David Manzano. "Using automated planning for improving data mining processes." Knowledge Engineering Review 28, no. 2 (February 7, 2013): 157–73. http://dx.doi.org/10.1017/s0269888912000409.

Abstract:
This paper presents a distributed architecture for automating data mining (DM) processes using standard languages. DM is a difficult task that relies on an exploratory and analytic process of processing large quantities of data in order to discover meaningful patterns. The increasing heterogeneity and complexity of available data requires some expert knowledge on how to combine the multiple and alternative DM tasks to process the data. Here, we describe DM tasks in terms of Automated Planning, which allows us to automate the construction of the DM knowledge flow. The work is based on the use of standards that have been defined in both the DM and automated-planning communities. Thus, we use PMML (Predictive Model Markup Language) to describe DM tasks. From the PMML, a problem description in PDDL (Planning Domain Definition Language) can be generated, so any current planning system can be used to generate a plan. This plan is, in turn, translated into a DM workflow description in the Knowledge Flow for Machine Learning format (a Knowledge Flow file for the WEKA (Waikato Environment for Knowledge Analysis) tool), so the plan or DM workflow can be executed in WEKA.
36

Rogers, Frank. "EDUCATIONAL FUZZY DATA-SETS AND DATA MINING IN A LINEAR FUZZY REAL ENVIRONMENT." Journal of Honai Math 2, no. 2 (August 8, 2019): 77–84. http://dx.doi.org/10.30862/jhm.v2i2.81.

Abstract:
Educational data mining is the process of converting raw data from educational systems into useful information that can be used by educational software developers, students, teachers, parents, and other educational researchers. Fuzzy educational datasets are datasets consisting of uncertain values. The purpose of this study is to develop and test a classification model under the uncertainty unique to the modern student. This is done by developing a model of the uncertain data that comes from an educational setting, using Linear Fuzzy Real data. Machine learning was then used to understand students and their optimal learning environment. The ability to predict student performance is important in a web or online environment. This is true in the brick-and-mortar classroom as well, and is especially important in rural areas where academic achievement is lower than ideal.
37

Kalathas, Ilias, and Michail Papoutsidakis. "Predictive Maintenance Using Machine Learning and Data Mining: A Pioneer Method Implemented to Greek Railways." Designs 5, no. 1 (January 15, 2021): 5. http://dx.doi.org/10.3390/designs5010005.

Abstract:
In every business, the production of knowledge from effectively processed information is recognized as a strategic asset and a source of competitive advantage. In the field of railways, a vast amount of data is produced, which must be assessed, deployed in an optimal way, and used as a mechanism for making the right decisions, aiming at saving resources while maintaining the fundamental principle of the railways: passenger safety. This paper uses stored, inactive data from a Greek railway company, applying data mining and machine learning techniques to create strategic decision support and to draw up a risk and control plan for trains. We apply open-source machine learning software (Weka) to the company's obsolete rolling stock maintenance procedures (hand-written work orders from supervisors to technicians, dealing with the dysfunctions of a train unit by experience, and the lack of planning and coding of malfunctions and the maintenance schedule). Using the J48 and M5P algorithms from the Weka software, data are recorded, processed, and analyzed in a way that can help monitor or discover, with great accuracy, possible damage or stresses before they occur, without adding new recording or monitoring devices to the trains, with the aim of predictive diagnosis of the train fleet. The method can be used as a tool to optimize the management of trains and to provide the appropriate information for planning and assessing the technical capability of trains, in order to achieve the target of greatest importance for the railways: passenger safety.
38

Fouad, Maha, Mahmoud M. Abd Ellatif, Mohamed Hagag, and Ahmed Akl. "Prediction Of Long Term Living Donor Kidney Graft Outcome: Comparison Between Different Machine Learning Methods." International Journal of Computers & Technology 14, no. 2 (December 9, 2014): 5419–31. http://dx.doi.org/10.24297/ijct.v14i2.2066.

Abstract:
Predicting the outcome of a graft transplant with a high level of accuracy is a challenging task in medical fields, and data mining has a great role to play in answering the challenge. The goal of this study is to compare the performance and features of data mining techniques, namely Decision Tree and Rule-Based classifiers, with Logistic Regression as a standard statistical data mining method, to predict the outcome of kidney transplants over a 5-year horizon. The dataset was compiled from the Urology and Nephrology Center (UNC), Mansoura, Egypt. Classifiers were developed using the Weka machine learning software workbench by applying Rule-Based classifiers (RIPPER, DTNB), Decision Tree classifiers (BF, J48), and Logistic Regression. From the experimental results, it has been found that the Decision Tree and Rule-Based classifiers provide improved accuracy and interpretable models compared to the other classifier.
39

Verma, Pratibha, Vineet Kumar Awasthi, and Sanat Kumar Sahu. "A Novel Design of Classification of Coronary Artery Disease Using Deep Learning and Data Mining Algorithms." Revue d'Intelligence Artificielle 35, no. 3 (June 30, 2021): 209–15. http://dx.doi.org/10.18280/ria.350304.

Abstract:
Data mining techniques are combined with ensemble learning and deep learning for classification. The methods used for classification are: a single C5.0 tree (C5.0), Classification and Regression Tree (CART), kernel-based Support Vector Machine (SVM) with a linear kernel, an ensemble (CART, SVM, C5.0), a single-hidden-layer neural network (NN), neural networks with Principal Component Analysis (PCA-NN), and the deep-learning-based H2OBinomialModel-Deeplearning (HBM-DNN) and Enhanced H2OBinomialModel-Deeplearning (EHBM-DNN). In this study, experiments were conducted on pre-processed datasets using R programming and the 10-fold cross-validation technique. The findings show that the ensemble model (CART, SVM, and C5.0) and EHBM-DNN are more accurate for classification compared with the other methods.
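
The ensemble component can be sketched as follows (an illustration, not the study's R code): CART, SVM, and a C5.0-style tree combined by majority vote and scored with 10-fold cross-validation. Scikit-learn has no C5.0, so an entropy-criterion decision tree stands in for it.

```python
# Hedged sketch: hard-voting ensemble of CART, SVM and a C5.0 stand-in,
# evaluated with 10-fold cross-validation on placeholder data.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=13, random_state=0)

ensemble = VotingClassifier([
    ("cart", DecisionTreeClassifier(criterion="gini", random_state=0)),
    ("svm", SVC(kernel="linear")),
    ("c50ish", DecisionTreeClassifier(criterion="entropy", random_state=0)),
])
print("mean CV accuracy:", cross_val_score(ensemble, X, y, cv=10).mean())
```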
40

Bohanec, Marko, Marko Robnik-Šikonja, and Mirjana Kljajić Borštnar. "Decision-making framework with double-loop learning through interpretable black-box machine learning models." Industrial Management & Data Systems 117, no. 7 (August 14, 2017): 1389–406. http://dx.doi.org/10.1108/imds-09-2016-0409.

Full text
Abstract:
Purpose The purpose of this paper is to address the problem of weak acceptance of machine learning (ML) models in business. The proposed framework of top-performing ML models coupled with general explanation methods provides additional information to the decision-making process, building a foundation for sustainable organizational learning. Design/methodology/approach To address user acceptance, the participatory approach of action design research (ADR) was chosen. The proposed framework is demonstrated on a B2B sales forecasting process in an organizational setting, following the cross-industry standard process for data mining (CRISP-DM) methodology. Findings The provided ML model explanations efficiently support business decision makers, reduce the forecasting error for new sales opportunities, and facilitate discussion about the context of opportunities within the sales team. Research limitations/implications The quality and quantity of available data affect the performance of the models and explanations. Practical implications The application in a real-world company demonstrates the utility of the approach and provides evidence that transparent explanations of ML models contribute to individual and organizational learning. Social implications All of the methods used are available as open-source software and can improve the acceptance of ML in data-driven decision making. Originality/value The proposed framework incorporates existing ML models and general explanation methodology into a decision-making process. To the authors' knowledge, this is the first attempt to support organizational learning with a framework combining ML explanations, ADR, and a data mining methodology based on the CRISP-DM industry standard.
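One widely available, model-agnostic explanation method of the kind such a framework can couple with a black-box model is permutation importance; the sketch below is generic and does not reproduce the paper's specific explanation methods.

```python
# Sketch: permutation importance as a general explanation of a black-box
# model. Features whose shuffling hurts held-out performance most are the
# ones the model relies on. Data are synthetic placeholders.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=8, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)
model = RandomForestRegressor(random_state=2).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=20,
                                random_state=2)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```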
APA, Harvard, Vancouver, ISO, and other styles
42

Chaturvedi, K. K., and V. B. Singh. "An Empirical Comparison of Machine Learning Techniques in Predicting the Bug Severity of Open and Closed Source Projects." International Journal of Open Source Software and Processes 4, no. 2 (April 2012): 32–59. http://dx.doi.org/10.4018/jossp.2012040103.

Full text
Abstract:
Bug severity is the degree of impact that a defect has on the development or operation of a component or system, and bugs can be classified into different levels based on their impact on the system. Identifying the severity level can help the bug triager allocate the bug to the appropriate bug fixer. Various researchers have applied text mining techniques to predict the severity of bugs, detect duplicate bug reports, and assign bugs to a suitable fixer. In this paper, an attempt has been made to compare the performance of different machine learning techniques, namely Support Vector Machine (SVM), probability-based Naïve Bayes (NB), decision-tree-based J48 (a Java implementation of C4.5), rule-based Repeated Incremental Pruning to Produce Error Reduction (RIPPER), and Random Forest (RF) learners, in predicting the severity level (1 to 5) of a reported bug by analyzing the summary or short description of the bug reports. The bug report data have been taken from NASA's PITS (Projects and Issue Tracking System) datasets as closed source projects and from components of the Eclipse, Mozilla, and GNOME datasets as open source projects. The analysis has been carried out in the RapidMiner and STATISTICA data mining tools. The authors measured the performance of the different machine learning techniques by considering (i) the value of accuracy and F-measure for all severity levels and (ii) the number of best cases at different threshold levels of accuracy and F-measure.
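The core pipeline the paper describes (vectorize the bug-report summary, then train a classifier such as SVM or Naïve Bayes) can be sketched as follows; the report texts and severity labels are invented placeholders, and the paper itself used RapidMiner and STATISTICA rather than this library.

```python
# Sketch: severity prediction from bug-report summaries. Texts and labels
# below are made up for illustration; severity runs 1 (most severe) to 5.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

summaries = ["crash on startup", "typo in tooltip", "data loss on save",
             "slow rendering of large files", "button misaligned"]
severity = [1, 5, 1, 3, 4]

for clf in (LinearSVC(), MultinomialNB()):
    model = make_pipeline(TfidfVectorizer(), clf).fit(summaries, severity)
    print(type(clf).__name__, model.predict(["crash when saving file"]))
```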
APA, Harvard, Vancouver, ISO, and other styles
43

GUO, GONGDE, and DANIEL NEAGU. "FUZZY kNNMODEL APPLIED TO PREDICTIVE TOXICOLOGY DATA MINING." International Journal of Computational Intelligence and Applications 05, no. 03 (September 2005): 321–33. http://dx.doi.org/10.1142/s1469026805001635.

Full text
Abstract:
A robust method, fuzzy kNNModel, for toxicity prediction of chemical compounds is proposed. The method is based on a supervised clustering method, called kNNModel, which employs fuzzy partitioning instead of crisp partitioning to group clusters. The merits of fuzzy kNNModel are two-fold: (1) it overcomes the problem of choosing, for each data set, the parameter ε (the allowed error rate in a cluster) and the parameter N (the minimal number of instances covered by a cluster); (2) it better captures the characteristics of boundary data by assigning them different degrees of membership, between 0 and 1, to different clusters. The experimental results of fuzzy kNNModel, conducted on thirteen public data sets from the UCI machine learning repository and seven toxicity data sets from real-world applications, are compared with the results of fuzzy c-means clustering, k-means clustering, kNN, fuzzy kNN, and kNNModel in terms of classification performance. This application shows that fuzzy kNNModel is a promising method for the toxicity prediction of chemical compounds.
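The distinguishing ingredient is the fuzzy membership degree. The sketch below shows a standard fuzzy kNN membership computation (inverse-distance-weighted class votes), not the kNNModel clustering step itself, so it should be read as an illustration of the fuzzy idea rather than the authors' algorithm.

```python
# Sketch: fuzzy class memberships for a query point from its k nearest
# neighbours, using the common inverse-distance weighting with fuzzifier m.
import numpy as np

def fuzzy_knn_memberships(X_train, y_train, x, k=5, m=2.0):
    d = np.linalg.norm(X_train - x, axis=1)          # distances to training points
    nn = np.argsort(d)[:k]                           # indices of k nearest
    w = 1.0 / np.maximum(d[nn], 1e-12) ** (2.0 / (m - 1.0))
    classes = np.unique(y_train)
    u = np.array([w[y_train[nn] == c].sum() for c in classes])
    return classes, u / u.sum()                      # memberships in [0, 1]

X = np.random.rand(100, 4)
y = (X[:, 0] > 0.5).astype(int)
print(fuzzy_knn_memberships(X, y, np.random.rand(4)))
```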
APA, Harvard, Vancouver, ISO, and other styles
44

Turesson, Hjalmar K., Henry Kim, Marek Laskowski, and Alexandra Roatis. "Privacy Preserving Data Mining as Proof of Useful Work." Journal of Database Management 32, no. 1 (January 2021): 69–85. http://dx.doi.org/10.4018/jdm.2021010104.

Full text
Abstract:
Blockchains rely on a consensus among participants to achieve decentralization and security. However, reaching consensus in an online, digital world where identities are not tied to physical users is a challenging problem. Proof-of-work provides a solution by linking representation to a valuable, physical resource. While this has worked well, it consumes a tremendous amount of specialized hardware and energy, with no utility beyond blockchain security. Here, the authors propose an alternative consensus scheme that directs the computational resources to the optimization of machine learning (ML) models, a task with more general utility. This is achieved by a hybrid consensus scheme relying on three parties: data providers, miners, and a committee. The data provider makes data available and provides payment in return for the best model, miners compete for the payment and for access to the committee by producing ML-optimized models, and the committee controls the ML competition.
APA, Harvard, Vancouver, ISO, and other styles
45

VAN SOMEREN, MAARTEN, and TANJA URBANČIČ. "Applications of machine learning: matching problems to tasks and methods." Knowledge Engineering Review 20, no. 4 (December 2005): 363–402. http://dx.doi.org/10.1017/s0269888906000762.

Full text
Abstract:
The terminology of Machine Learning and Data Mining methods does not always allow a simple match between practical problems and methods. Some problems look similar from the user's point of view but require different methods to solve, while others look very different yet can be solved by applying the same methods and tools. Choosing appropriate Machine Learning methods for problem solving in practice is therefore largely a matter of experience, and it is not realistic to expect a simple look-up table with matches between problems and methods. However, some guidelines can be given, and a collection that summarizes other people's experience can also be helpful. A small number of definitions characterize the tasks that are performed by a large proportion of methods. Most of the variation among methods concerns differences in data types and algorithmic aspects. In this paper, we summarize the main task types and illustrate how a wide variety of practical problems can be formulated in terms of these tasks. The match between problems and tasks is illustrated with a collection of example applications, with the aim of helping to express new practical problems as Machine Learning tasks. Some tasks can be decomposed into subtasks, allowing a wider variety of matches between practical problems and (combinations of) methods. We review the main principles for choosing between alternatives and illustrate this with a large collection of applications. We believe that this provides some useful guidelines.
APA, Harvard, Vancouver, ISO, and other styles
46

Anggraini, Recha Abriana, Galih Widagdo, Arief Setya Budi, and M. Qomaruddin. "Penerapan Data Mining Classification untuk Data Blogger Menggunakan Metode Naïve Bayes." Jurnal Sistem dan Teknologi Informasi (JUSTIN) 7, no. 1 (January 31, 2019): 47. http://dx.doi.org/10.26418/justin.v7i1.30211.

Full text
Abstract:
The growing number of users of the Blogger site makes it necessary to classify user data in order to determine whether a user falls into the professional blogger category or not. Prior studies on this topic serve as references for this research. The descriptive and predictive classification modeling technique uses a data mining algorithm, namely the Naïve Bayes method. The data were processed with RapidMiner Studio 6.0, and the blogger dataset was obtained from the UCI Machine Learning Repository. The performance vector calculation shows that the classification accuracy of the Naïve Bayes method is 86.67%. The class precision and class recall for the "yes" prediction show a precision of 91.30%, and for the "no" prediction, 71.43%. The classification of the blogger data with the Naïve Bayes method divides the PB (professional blogger) classification into two classes, "yes" and "no", with a class value of 0.680 for "yes" and 0.320 for "no". The results show that the classification accuracy for the blogger data reaches 86.67%.
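A minimal sketch of the experiment's shape (Naïve Bayes on categorical attributes, evaluated by accuracy, precision, and recall); the data below are synthetic stand-ins, since the study ran on the UCI blogger data set in RapidMiner Studio 6.0.

```python
# Sketch: Naïve Bayes on categorical attributes with the three measures
# the abstract reports. Attributes and labels are randomly generated.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(3)
X = rng.integers(0, 3, size=(100, 5))        # 5 categorical attributes
y = rng.integers(0, 2, size=100)             # PB: 1 = yes, 0 = no

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)
pred = CategoricalNB(min_categories=3).fit(X_tr, y_tr).predict(X_te)
print("accuracy :", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("recall   :", recall_score(y_te, pred))
```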
APA, Harvard, Vancouver, ISO, and other styles
47

Khan, Bilal, Rashid Naseem, Muhammad Arif Shah, Karzan Wakil, Atif Khan, M. Irfan Uddin, and Marwan Mahmoud. "Software Defect Prediction for Healthcare Big Data: An Empirical Evaluation of Machine Learning Techniques." Journal of Healthcare Engineering 2021 (March 15, 2021): 1–16. http://dx.doi.org/10.1155/2021/8899263.

Full text
Abstract:
Software defect prediction (SDP) in the initial period of the software development life cycle (SDLC) remains a critical and important task. SDP has been studied intensively during the last few decades, as it helps to assure the quality of software systems. Quick forecasting of defective or imperfect artifacts in software development can help the development team use the existing assets competently and effectively to deliver high-quality software products within the given, often narrow, time. Previously, several researchers have developed models for defect prediction utilizing machine learning (ML) and statistical techniques. ML methods are considered an effective and practical approach to pinpointing defective modules, working by mining concealed patterns among software metrics (attributes). ML techniques have also been applied by several researchers to healthcare datasets. This study applies different ML techniques to software defect prediction using seven widely used datasets. The ML techniques include the multilayer perceptron (MLP), support vector machine (SVM), decision tree (J48), radial basis function (RBF), random forest (RF), hidden Markov model (HMM), credal decision tree (CDT), K-nearest neighbor (KNN), average one dependency estimator (A1DE), and Naïve Bayes (NB). The performance of each technique is evaluated using different measures, for instance, relative absolute error (RAE), mean absolute error (MAE), root mean squared error (RMSE), root relative squared error (RRSE), recall, and accuracy. The overall outcome shows the best performance from RF, with 88.32% average accuracy and a 2.96 rank value; the second-best performance is achieved by SVM, with 87.99% average accuracy and a 3.83 rank value. Moreover, CDT shows 87.88% average accuracy and a 3.62 rank value, placing it in third position. The comprehensive outcomes of this research can be used as a reference point for new research in the SDP domain, so that any assertion concerning an enhancement in prediction by a new technique or model can be benchmarked and proved.
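MAE and RMSE are library one-liners, while RAE and RRSE are conventionally computed relative to a predictor that always outputs the mean of the actual values; a sketch of all four measures on made-up values:

```python
# Sketch: the numeric error measures used in the evaluation. RAE and RRSE
# normalize by the errors of the constant mean predictor.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def rae(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return (np.abs(actual - predicted).sum()
            / np.abs(actual - actual.mean()).sum())

def rrse(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(((actual - predicted) ** 2).sum()
                   / ((actual - actual.mean()) ** 2).sum())

actual    = [0, 1, 1, 0, 1, 0]   # illustrative defect labels
predicted = [0, 1, 0, 0, 1, 1]
print("MAE :", mean_absolute_error(actual, predicted))
print("RMSE:", np.sqrt(mean_squared_error(actual, predicted)))
print("RAE :", rae(actual, predicted))
print("RRSE:", rrse(actual, predicted))
```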
APA, Harvard, Vancouver, ISO, and other styles
48

Ehrenman, Gayle. "Mining What Others Miss." Mechanical Engineering 127, no. 02 (February 1, 2005): 26–31. http://dx.doi.org/10.1115/1.2005-feb-1.

Full text
Abstract:
This article discusses data mining that draws upon extensive work in areas such as statistics, machine learning, pattern recognition, databases, and high-performance computing to discover interesting and previously unknown information in data. More specifically, data mining is the analysis of large data sets to find relationships and patterns that aren’t readily apparent, and to summarize the data in new and useful ways. Data mining technology has enabled earth scientists from NASA to discover changes in the global carbon cycle and climate system, and biologists to map and explore the human genome. Data mining is not restricted solely to vast banks of data with unlimited ways of analyzing it. Manufacturers, such as W.L. Gore (the maker of GoreTex) use commercially available data mining tools to warehouse and analyze their data, and improve their manufacturing process. Gore uses data mining tools from analytic software vendor SAS for statistical modeling in its manufacturing process.
APA, Harvard, Vancouver, ISO, and other styles
49

Baydaa Mohammed Merzah. "Actual Needs Criteria for Assessing Data Classification Platforms." Samarra Journal of Pure and Applied Science 3, no. 1 (September 24, 2021): 125–38. http://dx.doi.org/10.54153/sjpas.2021.v3i1.227.

Full text
Abstract:
Software tools play an important role in different research areas, generally saving time and effort. In computer science, such tools can help in communications, web-site development, software metrics, data mining, machine learning, and many other fields. Many specialized tools are built to support a specific purpose, and users and researchers spend a lot of time and effort selecting among the large number of available platforms. Each has its own characteristics; some are open source, while others are licensed with a trial version for testing. In this work we focus on platforms related to the data mining research area. The selected tools are widely used and trusted ones in their most recent versions. We study the platforms from different perspectives: they have different data-processing features, but they support common algorithms, which allows us to compare them. Four data mining tools and four datasets were selected. The assessment procedure was carried out from multiple points of view, as described in the methodology section of this article. The criteria were collected from a survey conducted among researchers interested in the field of data mining and machine learning. The contribution of this work is to assess the selected platforms against new, actual-needs criteria. These criteria give researchers a clear idea of how to determine the best platform for their resources. The results highlight the strengths of each platform; Orange and Weka show the best performance over the rest. These results can guide beginners, and researchers from outside the computer science field, in selecting the appropriate platform for their needs and available resources.
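One simple way such actual-needs criteria can be turned into a ranking is a weighted score per platform; the criteria, weights, and ratings below are invented for illustration and are not the paper's survey results.

```python
# Sketch: weighted-criteria scoring of data mining platforms.
# All numbers here are hypothetical placeholders.
criteria_weights = {"ease of use": 0.3, "algorithm coverage": 0.3,
                    "data handling": 0.2, "documentation": 0.2}

ratings = {   # each platform rated 1-5 per criterion (made-up values)
    "Weka":   {"ease of use": 4, "algorithm coverage": 5,
               "data handling": 4, "documentation": 4},
    "Orange": {"ease of use": 5, "algorithm coverage": 4,
               "data handling": 4, "documentation": 4},
}

for platform, r in ratings.items():
    score = sum(criteria_weights[c] * r[c] for c in criteria_weights)
    print(f"{platform}: {score:.2f}")
```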
APA, Harvard, Vancouver, ISO, and other styles
50

Krishna Mohan, G., N. Yoshitha, M. L. N. Lavanya, and A. Krishna Priya. "Assessment and Analysis of Software Reliability Using Machine Learning Techniques." International Journal of Engineering & Technology 7, no. 2.32 (May 31, 2018): 201. http://dx.doi.org/10.14419/ijet.v7i2.32.15567.

Full text
Abstract:
Software reliability models assess reliability through fault prediction. Reliability is a real-world phenomenon with many associated real-time problems, and a large number of soft computing techniques have been developed to obtain solutions to these problems quickly, accurately, and acceptably. We attempt to address software failure problems by modeling software failure data using machine learning techniques such as support vector machine (SVM) regression and generalized additive models. The study of software reliability can be divided into three parts: modeling, measurement, and improvement. Software reliability modeling has matured to the point that meaningful results can be obtained by applying suitable models to the problem, but there is no single model universally applicable to all situations. We propose different machine learning methods for the assessment of software reliability, such as artificial neural network and support vector machine approaches. We then analyze the results from the machine learning models and compare them with those of some generalized linear modeling techniques that are equivalent to software reliability models.
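A minimal sketch of one of the named techniques, SVM (support vector) regression, fitted to an invented cumulative-failure series; the numbers are illustrative only.

```python
# Sketch: SVR fitted to cumulative software-failure counts over test weeks,
# then used to extrapolate. Failure data are made up for illustration.
import numpy as np
from sklearn.svm import SVR

weeks = np.arange(1, 21).reshape(-1, 1)          # test week index
failures = np.array([4, 9, 13, 18, 21, 25, 27, 30, 31, 33,
                     34, 36, 36, 37, 38, 38, 39, 39, 40, 40])

model = SVR(kernel="rbf", C=100.0, epsilon=0.5).fit(weeks, failures)
print("predicted cumulative failures at week 25:",
      model.predict([[25]])[0])
```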
APA, Harvard, Vancouver, ISO, and other styles