Academic literature on the topic 'Machine learning. Data mining. Software measurement'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Machine learning. Data mining. Software measurement.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Journal articles on the topic "Machine learning. Data mining. Software measurement"

1

Bagriyanik, Selami, and Adem Karahoca. "Using Data Mining to Identify COSMIC Function Point Measurement Competence." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 6 (December 1, 2018): 5253. http://dx.doi.org/10.11591/ijece.v8i6.pp5253-5259.

Full text
Abstract:
COSMIC Function Point (CFP) measurement errors lead to budget, schedule, and quality problems in software projects. It is therefore important to identify and plan requirements engineers' CFP training needs quickly and correctly. The purpose of this paper is to identify software requirements engineers' COSMIC Function Point measurement competence development needs by using machine learning algorithms and requirements artifacts created by engineers. The artifacts used were provided by a large telco service and technology company ecosystem. First, a feature set was extracted from the requirements model at hand. To prepare the data for educational data mining, requirements and CFP audit documents were converted into a CFP data set based on the designed feature set. This data set was used to train and test the machine learning models, with two different experiment settings designed to reach statistically significant results. Ten different machine learning algorithms were used. Finally, algorithm performances were compared with a baseline and with each other to find the best-performing models on this data set. In conclusion, the REPTree, OneR, and Support Vector Machines (SVM) with Sequential Minimal Optimization (SMO) algorithms achieved the top performance in forecasting requirements engineers' CFP training needs.
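To make the experimental setup concrete, here is a minimal sketch of the compare-against-a-baseline workflow the abstract describes. The paper worked in Weka with REPTree, OneR and SMO; this sketch uses scikit-learn stand-ins on synthetic data, so all names and values are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CFP feature set extracted from requirements
X, y = make_classification(n_samples=300, n_features=12, random_state=0)

models = {
    "majority-class baseline": DummyClassifier(strategy="most_frequent"),
    "decision tree (REPTree stand-in)": DecisionTreeClassifier(max_depth=5, random_state=0),
    "linear SVM (SMO stand-in)": LinearSVC(dual=False),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```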
APA, Harvard, Vancouver, ISO, and other styles
2

Parasich, Andrey, Victor Parasich, and Irina Parasich. "Training set formation in machine learning problems (review)." Information and Control Systems, no. 4 (September 13, 2021): 61–70. http://dx.doi.org/10.31799/1684-8853-2021-4-61-70.

Full text
Abstract:
Introduction: Proper training set formation is a key factor in machine learning. In real training sets, problems and errors commonly occur and have a critical impact on the training result. A training set needs to be formed in every machine learning problem; therefore, knowledge of the possible difficulties is helpful. Purpose: To give an overview of possible problems in the formation of a training set, in order to facilitate their detection and elimination when working with real training sets, and to analyze the impact of these problems on the training results. Results: The article gives an overview of possible errors in training set formation, such as lack of data, imbalance, false patterns, sampling from a limited set of sources, change in the general population over time, and others. We discuss the influence of these errors on the training result, test set formation, and the measurement of training algorithm quality. Pseudo-labeling, data augmentation, and hard sample mining are considered the most effective ways to expand a training set. We offer practical recommendations for forming a training or test set. Examples from the practice of Kaggle competitions are given. For the problem of cross-dataset generalization in neural network training, we propose an algorithm called Cross-Dataset Machine, which is simple to implement and yields a gain in cross-dataset generalization. Practical relevance: The materials of the article can be used as a practical guide for solving machine learning problems.
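Of the expansion techniques mentioned, pseudo-labeling is easy to illustrate. The following is a minimal sketch, not the authors' code; the data, model choice and 0.95 confidence threshold are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
X_lab, y_lab = X[:200], y[:200]   # small labeled set
X_unlab = X[200:]                 # large unlabeled pool

# Train on the labeled data, then label the unlabeled pool
clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
proba = clf.predict_proba(X_unlab)
confident = proba.max(axis=1) > 0.95  # keep only confident predictions
X_pseudo = X_unlab[confident]
y_pseudo = clf.classes_[proba[confident].argmax(axis=1)]

# Retrain on the expanded training set
clf_expanded = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_lab, X_pseudo]), np.concatenate([y_lab, y_pseudo]))
print(f"added {confident.sum()} pseudo-labeled samples")
```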
APA, Harvard, Vancouver, ISO, and other styles
3

Gunarathna, M. H. J. P., Kazuhito Sakai, Tamotsu Nakandakari, Kazuro Momii, and M. K. N. Kumari. "Machine Learning Approaches to Develop Pedotransfer Functions for Tropical Sri Lankan Soils." Water 11, no. 9 (September 18, 2019): 1940. http://dx.doi.org/10.3390/w11091940.

Full text
Abstract:
Poor data availability on soil hydraulic properties in tropical regions hampers many studies, including crop and environmental modeling. The high cost and effort of measurement and the increasing demand for such data have driven researchers to search for alternative approaches. Pedotransfer functions (PTFs) are predictive functions used to estimate soil properties from easily measurable soil parameters. PTFs are popular in temperate regions, but few attempts have been made to develop PTFs in tropical regions. Regression approaches are widely used to develop PTFs worldwide, and recently a few attempts were made using machine learning methods. PTFs for tropical Sri Lankan soils have already been developed using classical multiple linear regression approaches, but no attempts had been made to use machine learning approaches. This study aimed to determine the applicability of machine learning algorithms in developing PTFs for tropical Sri Lankan soils. We tested three machine learning algorithms (artificial neural networks (ANN), k-nearest neighbor (KNN), and random forest (RF)) with three different input combinations (sand, silt, and clay (SSC) percentages; SSC and bulk density (BD); SSC, BD, and organic carbon (OC)) to estimate volumetric water content (VWC) at −10 kPa, −33 kPa (representing field capacity (FC); most studies in Sri Lanka use −33 kPa as the FC) and −1500 kPa (representing the permanent wilting point (PWP)) of Sri Lankan soils. This analysis used the open-source data mining software Waikato Environment for Knowledge Analysis (WEKA). Using a wrapper approach and the best-first search method, we selected the most appropriate inputs to develop PTFs using different machine learning algorithms and input levels. We developed PTFs to estimate FC and PWP and compared them with the previously reported PTFs for tropical Sri Lankan soils. We found that RF was the best algorithm for developing PTFs for tropical Sri Lankan soils. We furthered the development of the PTFs by adding volumetric water content at −10 kPa as an input variable, because it is quite an easily measurable parameter compared to the other targeted VWCs. With the addition of VWC at −10 kPa, all machine learning algorithms improved their performance, and RF remained the best. We studied the functionality of the fine-tuned PTFs and found that they can estimate the available water content of Sri Lankan soils as well as measurement-based calculations can. We identified RF as a robust alternative to linear regression methods for developing PTFs to estimate the field capacity and permanent wilting point of tropical Sri Lankan soils. Given those findings, we recommend that PTFs be developed using the RF algorithm in related software to fill the data gaps present in tropical regions.
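As an illustration of the modeling idea (the study itself used WEKA), here is a minimal random-forest pedotransfer-function sketch in scikit-learn; the soil features and the toy VWC target below are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
# Columns: sand, silt, clay, bulk density (all scaled to [0, 1]; synthetic)
X = rng.uniform(0, 1, size=(n, 4))
# Toy volumetric water content at -33 kPa, driven mostly by clay and bulk density
vwc_fc = 0.20 + 0.30 * X[:, 2] - 0.10 * X[:, 3] + rng.normal(0, 0.02, n)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(rf, X, vwc_fc, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.2f}")
```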
APA, Harvard, Vancouver, ISO, and other styles
4

Wilkening, Jan. "Towards Spatial Data Science: Bridging the Gap between GIS, Cartography and Data Science." Abstracts of the ICA 1 (July 15, 2019): 1–2. http://dx.doi.org/10.5194/ica-abs-1-403-2019.

Full text
Abstract:
Data is regarded as the oil of the 21st century, and the concept of data science has received increasing attention in recent years. These trends are mainly caused by the rise of big data: data that is big in terms of volume, variety and velocity. Consequently, data scientists are required to make sense of these large datasets, and companies have trouble acquiring talented people to solve data science problems. This is not surprising, as employers often expect skillsets that can hardly be found in one person: not only does a data scientist need a solid background in machine learning, statistics and various programming languages, but often also in IT systems architecture, databases and complex mathematics. Above all, she should have strong non-technical domain expertise in her field (see Figure 1).

As it is widely accepted that 80% of data has a spatial component, developments in data science could provide exciting new opportunities for GIS and cartography: cartographers are experts in spatial data visualization, and often also very skilled in statistics, data pre-processing and analysis in general. Cartographers' skill levels often depend on the degree to which cartography programs at universities focus on the "front end" (visualization) of spatial data and leave the "back end" (modelling, gathering, processing, analysis) to GIScientists. In many university curricula, these front-end and back-end distinctions between cartographers and GIScientists are not clearly defined, and the boundaries are somewhat blurred.

In order to become good data scientists, cartographers and GIScientists need to acquire certain additional skills that are often beyond their university curricula. These skills include programming, machine learning and data mining: important technologies for extracting knowledge from big spatial data sets, and thereby the logical advancement of "traditional" geoprocessing, which focuses on "traditional" (small, structured, static) datasets such as shapefiles or feature classes.

To bridge the gap between spatial sciences (such as GIS and cartography) and data science, we need an integrated framework of "spatial data science" (Figure 2).

Spatial sciences focus on causality: theory-based approaches to explain why things are happening in space. In contrast, the scope of data science is to find similar patterns in big datasets with techniques of machine learning and data mining, often without considering spatial concepts (such as topology, spatial indexing, spatial autocorrelation, the modifiable areal unit problem, map projections and coordinate systems, uncertainty in measurement, etc.).

Spatial data science could become the core competency of GIScientists and cartographers who are willing to integrate methods from the data science knowledge stack. Moreover, data scientists could enhance their work by integrating important spatial concepts and tools from GIS and cartography into data science workflows. A non-exhaustive knowledge stack for spatial data scientists, including typical tasks and tools, is given in Table 1.

There are many interesting ongoing projects at the interface of spatial and data science. Examples from the ArcGIS platform include:
- Integration of Python GIS APIs with machine learning libraries, such as scikit-learn or TensorFlow, in Jupyter Notebooks
- Combination of R (advanced statistics and visualization) and GIS (basic geoprocessing, mapping) in ModelBuilder and other automation frameworks
- Enterprise GIS solutions for distributed geoprocessing operations on big, real-time vector and raster datasets
- Dashboards for visualizing real-time sensor data and integrating it with other data sources
- Applications for interactive data exploration
- GIS tools for machine learning tasks for prediction, clustering and classification of spatial data
- GIS integration for Hadoop

While the discussion about proprietary (ArcGIS) vs. open-source (QGIS) software is beyond the scope of this article, it has to be stated that a) many ArcGIS projects are actually open-source and b) using a complete GIS platform instead of several open-source pieces has several advantages, particularly in efficiency, maintenance and support (see Wilkening et al. (2019) for a more detailed consideration). At any rate, cartography and GIS tools are the essential technology blocks for solving the (80% spatial) data science problems of the future.
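As a small taste of the GIS/ML integration listed above, here is a minimal sketch that clusters point coordinates with scikit-learn; in a real workflow the coordinates would come from a GIS layer (e.g., loaded via geopandas or a Python GIS API) rather than being generated synthetically, and the hotspot locations below are invented.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Synthetic lon/lat points scattered around two hypothetical hotspots
pts = np.vstack([
    rng.normal([11.57, 48.14], 0.01, size=(100, 2)),  # around one city center
    rng.normal([13.40, 52.52], 0.01, size=(100, 2)),  # around another
])
# Density-based clustering; eps is in degrees here, purely for illustration
labels = DBSCAN(eps=0.05, min_samples=5).fit_predict(pts)
print(f"found {labels.max() + 1} spatial clusters")
```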
APA, Harvard, Vancouver, ISO, and other styles
5

Makhlouf Shabou, Basma, Julien Tièche, Julien Knafou, and Arnaud Gaudinat. "Algorithmic methods to explore the automation of the appraisal of structured and unstructured digital data." Records Management Journal 30, no. 2 (July 3, 2020): 175–200. http://dx.doi.org/10.1108/rmj-09-2019-0049.

Full text
Abstract:
Purpose: This paper describes interdisciplinary and innovative research conducted in Switzerland, at the Geneva School of Business Administration HES-SO and supported by the State Archives of Neuchâtel (Office des archives de l'État de Neuchâtel, OAEN). The problem to be addressed is one of the most classical ones: how to extract and discriminate relevant data in a huge amount of diversified and complex data record formats and contents. The goal of this study is to provide a framework and a proof of concept for software that helps take defensible decisions on the retention and disposal of records and data proposed to the OAEN. For this purpose, the authors designed two axes: the archival axis, to propose archival metrics for the appraisal of structured and unstructured data, and the data mining axis, to propose algorithmic methods as complementary and/or additional metrics for the appraisal process. Design/methodology/approach: Based on the two axes, this exploratory study designs and tests the feasibility of archival metrics that are paired with data mining metrics, to advance the digital appraisal process, as much as possible, in a systematic or even automatic way. Under Axis 1, the authors took three steps: first, the design of a conceptual framework for records and data appraisal with a detailed three-dimensional approach (trustworthiness, exploitability, representativeness), along with the main principles and postulates to guide the operationalization of the conceptual dimensions. Second, the operationalization proposed metrics expressed in terms of variables supported by a quantitative method for their measurement and scoring. Third, the authors shared this conceptual framework, proposing the dimensions and operationalized variables (metrics), with experienced professionals to validate them. The experts' feedback finally gave the authors an idea of the relevance and the feasibility of these metrics, two aspects that may demonstrate the acceptability of such a method in real-life archival practice. In parallel, Axis 2 proposes functionalities to cover not only macro analysis of data but also the algorithmic methods to enable the computation of digital archival and data mining metrics. Based on that, three use cases were proposed to imagine plausible and illustrative scenarios for the application of such a solution. Findings: The main results demonstrate the feasibility of measuring the value of data and records with a reproducible method. More specifically, for Axis 1, the authors applied the metrics in a flexible and modular way and defined the main principles needed to enable a computational scoring method. The results obtained through the experts' consultation on the relevance of 42 metrics indicate an acceptance rate above 80%. In addition, the results show that 60% of all metrics can be automated. Regarding Axis 2, 33 functionalities were developed and proposed under six main types: macro analysis, microanalysis, statistics, retrieval, administration and, finally, decision modeling and machine learning. The relevance of the metrics and functionalities is based on the theoretical validity and computational character of their method. These results are largely satisfactory and promising. Originality/value: This study offers valuable aid to improve the validity and performance of archival appraisal processes and decision-making. The transferability and applicability of these archival and data mining metrics could be considered for other types of data; an adaptation of this method and its metrics could be tested on research data, medical data or banking data.
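To illustrate what a computational scoring method over appraisal metrics could look like, here is a minimal sketch following the paper's three dimensions; the metric names, values and weights are invented for illustration (the paper itself defines 42 metrics).

```python
# Hypothetical per-dimension metric values, each scaled to [0, 1]
metrics = {
    "trustworthiness": {"source_reliability": 0.9, "format_integrity": 0.8},
    "exploitability": {"machine_readable": 1.0, "metadata_completeness": 0.6},
    "representativeness": {"coverage_of_activity": 0.7},
}
# Hypothetical dimension weights summing to 1
weights = {"trustworthiness": 0.4, "exploitability": 0.3, "representativeness": 0.3}

def appraisal_score(metrics, weights):
    """Weighted mean of per-dimension metric averages, in [0, 1]."""
    total = 0.0
    for dim, values in metrics.items():
        dim_score = sum(values.values()) / len(values)
        total += weights[dim] * dim_score
    return total

print(f"overall appraisal score: {appraisal_score(metrics, weights):.2f}")
```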
APA, Harvard, Vancouver, ISO, and other styles
6

Aló, Richard, and Vladik Kreinovich. "Selected Papers from InTech'04." Journal of Advanced Computational Intelligence and Intelligent Informatics 10, no. 3 (May 20, 2006): 243–44. http://dx.doi.org/10.20965/jaciii.2006.p0243.

Full text
Abstract:
The main objective of the annual International Conference on Intelligent Technologies (InTech) is to bring together researchers and practitioners who implement intelligent and fuzzy technologies in real-world environments. The Fifth International Conference on Intelligent Technologies, InTech'04, was held in Houston, Texas, on December 2-4, 2004. Topics of InTech'04 included mathematical foundations of intelligent technologies, traditional Artificial Intelligence techniques, uncertainty processing and methods of soft computing, learning/adaptive systems/data mining, and applications of intelligent technologies. This special issue contains versions of 15 selected papers originally presented at InTech'04. These papers cover most of the topics of the conference. Several papers describe new applications of existing intelligent techniques. R. Aló et al. show how traditional statistical hypothesis testing techniques, originally designed for processing measurement results, need to be modified when applied to simulated data, e.g., when we compare the quality of two algorithms. Y. Frayman et al. use mathematical morphology and genetic algorithms in the design of a machine vision system for detecting surface defects in aluminum die casting. Y. Murai et al. propose a new, faster entropy-based placement algorithm for VLSI circuit design and similar applications. A. P. Salvatore et al. show how expert-system-type techniques can help in scheduling botox treatment for voice disorders. H. Tsuji et al. propose a new method, based on partial differential equations, for automatically identifying and extracting objects from a video. N. Ward uses Ordered Weighted Average (OWA) techniques to design a model that predicts admission of computer science students into different graduate schools. An important aspect of intelligence is the ability to learn. In A. Mahaweerawat et al., neural-based machine learning is used to identify and predict software faults. J. Han et al. show that we can drastically improve the quality of machine learning if, in addition to discovering traditional (positive) rules, we also search for negative rules. A serious problem with many neural-based machine learning algorithms is that, often, the results of their learning are unintelligible rules and numbers. M. I. Khan et al. show, on the example of robotic arm applications, that if we allow neurons with different input-output dependencies, including linear neurons, then we can extract meaningful knowledge from the resulting network. Several papers analyze the Equivalent Transformation (ET) model, which allows the user to automatically generate code from specifications. A general description of this model is given by K. Akama et al. P. Chippimolchai et al. describe how, within this model, we can transform a user's query into an equivalent, more efficient one. H. Koike et al. apply this approach to natural language processing. Y. Shigeta et al. show how existing constraint techniques can be translated into equivalent transformation rules and thus combined with other specifications. I. Takarajima et al. extend the ET approach to situations like parallel computations, where the order in which different computations are performed on different processors depends on other processes and is, thus, non-deterministic. Finally, a paper by J. Chandra, based on his invited talk at InTech'04, describes a general framework for robust and resilient critical infrastructure systems, with potential applications to transportation systems, power grids, communication networks, water resources, health delivery systems, and financial networks. We want to thank all the authors for their outstanding work, the participants of InTech'04 for their helpful suggestions, the anonymous reviewers for their thorough analysis and constructive help, and, last but not least, Professor Kaoru Hirota for his kind suggestion to host this issue, as well as the entire staff of the journal for their tireless work.
APA, Harvard, Vancouver, ISO, and other styles
7

Jiao, Changyi. "Big Data Mining Optimization Algorithm Based on Machine Learning Model." Revue d'Intelligence Artificielle 34, no. 1 (February 29, 2020): 51–57. http://dx.doi.org/10.18280/ria.340107.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Mastoi, Qurat-ul-ain, Muhammad Suleman Memon, Abdullah Lakhan, Mazin Abed Mohammed, Mumtaz Qabulio, Fadi Al-Turjman, and Karrar Hameed Abdulkareem. "Machine learning-data mining integrated approach for premature ventricular contraction prediction." Neural Computing and Applications 33, no. 18 (March 14, 2021): 11703–19. http://dx.doi.org/10.1007/s00521-021-05820-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Ghaffarian, Seyed Mohammad, and Hamid Reza Shahriari. "Software Vulnerability Analysis and Discovery Using Machine-Learning and Data-Mining Techniques." ACM Computing Surveys 50, no. 4 (November 8, 2017): 1–36. http://dx.doi.org/10.1145/3092566.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Mallikharjuna, L. K., and V. S. K. Reddy. "An adaptive correlation based video data mining using machine learning." International Journal of Knowledge-based and Intelligent Engineering Systems 24, no. 1 (April 9, 2020): 1–9. http://dx.doi.org/10.3233/kes-200023.

Full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "Machine learning. Data mining. Software measurement"

1

Ammar, Kareem. "Multi-heuristic theory assessment with iterative selection." Morgantown, W. Va. : [West Virginia University Libraries], 2004. https://etd.wvu.edu/etd/controller.jsp?moduleName=documentdata&jsp%5FetdId=3701.

Full text
Abstract:
Thesis (M.S.)--West Virginia University, 2004.
Title from document title page. Document formatted into pages; contains viii, 106 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 105-106).
APA, Harvard, Vancouver, ISO, and other styles
2

Badayos, Noah Garcia. "Machine Learning-Based Parameter Validation." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/47675.

Full text
Abstract:
As power system grids continue to grow to support increasing energy demand, the system's behavior evolves accordingly, continuing to challenge designs for maintaining security. It has become apparent in the past few years that accurate simulations are as critical as discovering vulnerabilities in the power network. This study explores a classification method for validating simulation models, using disturbance measurements from phasor measurement units (PMU). The technique employs the Random Forest learning algorithm to find a correlation between specific model parameter changes and variations in the dynamic response. The measurements used for building and evaluating the classifiers were characterized using Prony decomposition. The generator model, consisting of an exciter and governor with their standard parameters, was validated using short-circuit faults. Single-error classifiers were tested first, comparing the accuracies of classifiers built using positive, negative, and zero sequence measurements. The negative sequence measurements consistently produced the best classifiers, with the majority of the parameter classes attaining F-measure accuracies greater than 90%. A multiple-parameter-error technique for validation was also developed and tested on standard generator parameters. Only a few target parameter classes had good accuracies in the presence of multiple parameter errors, but the results were enough to permit a sequential process of validation, in which eliminating a highly detectable error can improve the accuracy of suspect errors dependent on the former's removal, continuing the procedure until all corrections are covered.
Ph. D.
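To make the classification setup concrete, here is a minimal sketch of training a Random Forest to recognize which parameter class was perturbed and scoring it with the F-measure; the feature matrix is synthetic, whereas the study derived its features from Prony decomposition of PMU measurements.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: rows are simulated disturbance responses,
# classes are hypothetical "which parameter was changed" labels
X, y = make_classification(n_samples=500, n_features=20, n_classes=4,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print(f"macro F-measure: {f1_score(y_te, clf.predict(X_te), average='macro'):.3f}")
```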
APA, Harvard, Vancouver, ISO, and other styles
3

Thun, Julia, and Rebin Kadouri. "Automating debugging through data mining." Thesis, KTH, Data- och elektroteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-203244.

Full text
Abstract:
Contemporary technological systems generate massive quantities of log messages. These messages can be stored, searched and visualized efficiently using log management and analysis tools. The analysis of log messages offers insights into system behavior such as performance, server status and execution faults in web applications. iStone AB wants to explore the possibility of automating their debugging process. Since iStone does most of their debugging manually, it takes time to find errors within the system. The aim was therefore to find different solutions to reduce the time it takes to debug. An analysis of log messages within access and console logs was made, so that the most appropriate data mining techniques for iStone's system could be chosen. Data mining algorithms and log management and analysis tools were compared. The result of the comparisons showed that the ELK Stack as well as a mixture of Eclat and a hybrid algorithm (Eclat and Apriori) were the most appropriate choices. To demonstrate their feasibility, the ELK Stack and Eclat were implemented. The produced results show that data mining and the use of a platform for log analysis can facilitate and reduce the time it takes to debug.
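Eclat, one of the algorithms chosen above, can be sketched compactly. The following is a minimal vertical tid-list implementation, not the thesis code; the transactions are invented stand-ins for tokenized log lines.

```python
from itertools import combinations

# Hypothetical tokenized log lines (one set of tokens per log message)
transactions = [
    {"timeout", "db", "retry"},
    {"timeout", "db"},
    {"timeout", "retry"},
    {"db", "retry"},
]
min_support = 2

# Vertical representation: item -> set of transaction ids containing it
tidlists = {}
for tid, items in enumerate(transactions):
    for item in items:
        tidlists.setdefault(item, set()).add(tid)

# Frequent 1-itemsets, then grow itemsets by intersecting tid-lists
frequent = {frozenset([i]): t for i, t in tidlists.items() if len(t) >= min_support}
current = frequent
while current:
    nxt = {}
    for (a, ta), (b, tb) in combinations(current.items(), 2):
        candidate, tids = a | b, ta & tb
        if len(candidate) == len(a) + 1 and len(tids) >= min_support:
            nxt[candidate] = tids
    frequent.update(nxt)
    current = nxt

for itemset, tids in sorted(frequent.items(), key=lambda kv: -len(kv[1])):
    print(set(itemset), "support:", len(tids))
```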
APA, Harvard, Vancouver, ISO, and other styles
4

Tierno, Ivan Alexandre Paiz. "Assessment of data-driven bayesian networks in software effort prediction." Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/71952.

Full text
Abstract:
Software prediction is a difficult but important task that can aid the manager in decision making, possibly sparing time and resources and achieving higher software quality, among other benefits. One of the approaches set forth to perform this task has been the application of machine learning techniques. One of these techniques is Bayesian Networks, which have been promoted for software project management due to their special features. However, the pre-processing procedures related to their application remain mostly neglected in this field. In this context, this study presents an assessment of automatic Bayesian Networks (i.e., Bayesian Networks solely based on data) on three public data sets and brings forward a discussion on data pre-processing procedures and the validation approach. We carried out a comparison of automatic Bayesian Networks against mean and median baseline models, and also against ordinary least squares regression with a logarithmic transformation, which a recent comprehensive study deemed a top performer with regard to accuracy. The results obtained through careful validation procedures support that automatic Bayesian Networks can be competitive against other techniques, but still need improvements in order to catch up with linear regression models accuracy-wise. Some current limitations of Bayesian Networks are highlighted and possible improvements are discussed. Furthermore, this study provides some guidelines on the exploration of data; these guidelines can be useful for any Bayesian Networks that use data for model learning. Finally, this study also confirms the potential benefits of feature selection in software effort prediction.
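The log-transformed OLS baseline mentioned above is easy to sketch. Below is a minimal version with MMRE (mean magnitude of relative error), a common accuracy measure in effort prediction; the size and effort data are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
size = rng.uniform(10, 500, 100)                          # e.g., function points
effort = 3.0 * size ** 0.9 * rng.lognormal(0, 0.3, 100)   # toy effort values

# Ordinary least squares in log space, back-transformed to the effort scale
X = np.log(size).reshape(-1, 1)
model = LinearRegression().fit(X, np.log(effort))
pred = np.exp(model.predict(X))

mmre = np.mean(np.abs(pred - effort) / effort)  # mean magnitude of relative error
print(f"MMRE: {mmre:.2f}")
```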
APA, Harvard, Vancouver, ISO, and other styles
5

Sun, Boya. "PRECISION IMPROVEMENT AND COST REDUCTION FOR DEFECT MINING AND TESTING." Case Western Reserve University School of Graduate Studies / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=case1321827962.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Parisi, Luca. "A Knowledge Flow as a Software Product Line." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/12217/.

Full text
Abstract:
Building a data mining workflow depends at least on the dataset and on the users' goals. The process is complex because of the large number of available algorithms and the difficulty of choosing the best algorithm, suitably parameterized. Usually, data scientists use analysis tools to decide which algorithm performs best on their specific dataset, comparing performance across the different algorithms. The aim of this project is to lay the groundwork for a software system that steers the construction of such workflows in the right direction, in order to find the best one for the users' dataset and goals.
APA, Harvard, Vancouver, ISO, and other styles
7

Sivrioglu, Damla. "A Method For Product Defectiveness Prediction With Process Enactment Data In A Small Software Organization." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614516/index.pdf.

Full text
Abstract:
As a part of quality management, product defectiveness prediction is vital for small software organizations, as it is for institutional ones. Although many studies have been conducted on defect prediction, process enactment data cannot usually be used because of the difficulty of collection. Additionally, there is no generally known approach for the analysis of process enactment data in software engineering. In this study, we developed a method to show the applicability of process enactment data for defect prediction and answered the questions "Is process enactment data beneficial for defect prediction?", "How can we use process enactment data?" and "Which approaches and analysis methods can our method support?". We used a multiple case study design and conducted case studies with and without process enactment data in a small software development company. We preferred machine learning approaches over statistical ones in order to cluster the data that includes process enactment information, since we believed they suit the pattern-oriented nature of the data. The case studies performed yielded promising results. We evaluated the performance values of the prediction models to demonstrate the advantage of using process enactment data for predicting the defect open duration value. When we had enough data points to apply machine learning methods and the data could be clustered homogeneously, analyses including process enactment data produced approximately 3% (ranging from -10% to 17%) more accurate results than those without it.
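The general shape of the approach (cluster the data, then build a prediction model per cluster) can be sketched as follows; this is not the thesis' pipeline, and the features and "defect open duration" target are synthetic placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))                       # e.g., process enactment features
y = X @ rng.normal(size=5) + rng.normal(size=120)   # e.g., defect open duration

# Cluster homogeneously, then fit one prediction model per cluster
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
models = {}
for c in np.unique(clusters):
    mask = clusters == c
    models[c] = LinearRegression().fit(X[mask], y[mask])
    print(f"cluster {c}: {mask.sum()} samples, "
          f"R^2 = {models[c].score(X[mask], y[mask]):.2f}")
```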
APA, Harvard, Vancouver, ISO, and other styles
8

Artchounin, Daniel. "Tuning of machine learning algorithms for automatic bug assignment." Thesis, Linköpings universitet, Programvara och system, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139230.

Full text
Abstract:
In software development projects, bug triage consists mainly of assigning bug reports to software developers or teams (depending on the project). The partial or total automation of this task would have a positive economic impact on many software projects. This thesis introduces a systematic four-step method to find some of the best configurations of several machine learning algorithms intending to solve the automatic bug assignment problem. These four steps are respectively used to select a combination of pre-processing techniques, a bug report representation, a potential feature selection technique and to tune several classifiers. The aforementioned method has been applied on three software projects: 66 066 bug reports of a proprietary project, 24 450 bug reports of Eclipse JDT and 30 358 bug reports of Mozilla Firefox. 619 configurations have been applied and compared on each of these three projects. In production, using the approach introduced in this work on the bug reports of the proprietary project would have increased the accuracy by up to 16.64 percentage points.
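The pipeline shape described above (text representation, feature handling, classifier tuning) can be sketched as follows. This is not the thesis' setup: the bug reports and teams are invented, and the tiny grid stands in for the 619 configurations compared.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Invented bug reports and their assigned teams
reports = ["NPE in parser when file is empty",
           "UI freezes on large project import",
           "Crash in parser on malformed input",
           "Rendering glitch in dark theme"]
teams = ["core", "ui", "core", "ui"]

# Bug-report text -> TF-IDF representation -> tuned linear classifier
pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC(dual=False))])
grid = GridSearchCV(pipe, {"tfidf__ngram_range": [(1, 1), (1, 2)],
                           "clf__C": [0.1, 1.0, 10.0]}, cv=2)
grid.fit(reports, teams)
print(grid.best_params_, grid.predict(["parser crashes on empty file"]))
```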
APA, Harvard, Vancouver, ISO, and other styles
9

Krüger, Franz David, and Mohamad Nabeel. "Hyperparameter Tuning Using Genetic Algorithms : A study of genetic algorithms impact and performance for optimization of ML algorithms." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-42404.

Full text
Abstract:
As machine learning (ML) is becoming more and more frequent in the business world, information gathering through data mining (DM) is on the rise, and DM practitioners generally use several rules of thumb to avoid having to spend a decent amount of time tuning the hyperparameters (parameters that control the learning process) of an ML algorithm to gain a high accuracy score. The proposal in this report is an approach that systematically optimizes ML algorithms using genetic algorithms (GA), evaluating if and how the model should be constructed to find global solutions for a specific data set. By implementing a GA approach on two ML algorithms, K-nearest neighbors and Random Forest, on two numerical data sets, the Iris data set and the Wisconsin breast cancer data set, the model is evaluated by its accuracy scores as well as its computational time, which are then compared against a search method, specifically exhaustive search. The results have shown that GA works well in finding good accuracy scores in a reasonable amount of time. There are some limitations, as a parameter's significance for an ML algorithm may vary.
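A genetic algorithm for hyperparameter tuning in this spirit can be sketched in a few lines. The following evolves the n_neighbors parameter of KNN on the Iris data set; the population size, selection, crossover and mutation schemes are arbitrary choices, not the authors'.

```python
import random
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
random.seed(0)

def fitness(k):
    """Cross-validated accuracy of KNN with n_neighbors=k."""
    return cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()

population = [random.randint(1, 50) for _ in range(8)]
for generation in range(10):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:4]  # selection: keep the fittest half
    # Crossover (average of two parents) plus a small random mutation
    children = [max(1, (random.choice(parents) + random.choice(parents)) // 2
                    + random.randint(-2, 2)) for _ in range(4)]
    population = parents + children

best = max(population, key=fitness)
print(f"best n_neighbors = {best}, CV accuracy = {fitness(best):.3f}")
```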
APA, Harvard, Vancouver, ISO, and other styles
10

Chu, Justin. "CONTEXT-AWARE DEBUGGING FOR CONCURRENT PROGRAMS." UKnowledge, 2017. https://uknowledge.uky.edu/cs_etds/61.

Full text
Abstract:
Concurrency faults are difficult to reproduce and localize because they usually occur under specific inputs and thread interleavings. Most existing fault localization techniques focus on sequential programs and fail to identify the faulty memory access patterns across threads that are usually the root causes of concurrency faults. Moreover, existing techniques for sequential programs cannot be adapted to identify faulty paths in concurrent programs. While concurrency fault localization techniques have been proposed that analyze passing and failing executions obtained from running a set of test cases in order to identify faulty access patterns, they primarily rely on statistical analysis. We present a novel approach to fault localization using feature selection techniques from machine learning. Our insight is that the concurrency access patterns obtained from a large volume of coverage data generally constitute high-dimensional data sets, yet existing statistical analysis techniques for fault localization are usually applied to low-dimensional data sets. Each additional failing or passing run can provide more diverse information, which can help localize faulty concurrency access patterns in code. The patterns with maximum feature diversity information can point to the most suspicious pattern. We then apply data mining techniques to identify the interleaving patterns that occur most frequently and provide the possible faulty paths. We also evaluate the effectiveness of fault localization using test suites generated from different test adequacy criteria. We have evaluated Cadeco on 10 real-world multi-threaded Java applications. Results indicate that Cadeco outperforms state-of-the-art approaches for localizing concurrency faults.
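The core insight (rank access patterns by how well they separate failing from passing runs) can be sketched with an off-the-shelf feature selection score; the coverage matrix below is invented, and mutual information stands in for whatever scoring Cadeco actually uses.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Rows: test runs; columns: whether an interleaving/access pattern occurred
coverage = np.array([[1, 0, 1, 0],
                     [1, 1, 1, 0],
                     [0, 0, 1, 1],
                     [0, 1, 0, 1],
                     [1, 0, 0, 1]])
failing = np.array([1, 1, 0, 0, 0])  # 1 = failing run, 0 = passing run

# Score each pattern by how much information it carries about failure
scores = mutual_info_classif(coverage, failing,
                             discrete_features=True, random_state=0)
suspicious = np.argsort(scores)[::-1]
print("patterns ranked by suspiciousness:", suspicious)
```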
APA, Harvard, Vancouver, ISO, and other styles
More sources

Books on the topic "Machine learning. Data mining. Software measurement"

1

Mining software specifications: Methodologies and applications. Boca Raton, FL: CRC Press, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Chen, Peter P., Leah Y. Wong, and International Conference on Conceptual Modeling (25th: 2006: Tucson, Ariz.), eds. Active conceptual modeling of learning: Next generation learning-base system development. Berlin: Springer, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

DeRiggi, Ritchie Marylyn, Giacobini Mario, and SpringerLink (Online service), eds. Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics: 9th European Conference, EvoBIO 2011, Torino, Italy, April 27-29, 2011. Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Vanneschi, Leonardo. Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics: 11th European Conference, EvoBIO 2013, Vienna, Austria, April 3-5, 2013. Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Stützle, Thomas. Learning and Intelligent Optimization: Third International Conference, LION 3, Trento, Italy, January 14-18, 2009. Selected Papers. Berlin, Heidelberg: Springer-Verlag Berlin Heidelberg, 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Costa, José A. F., Guilherme Barreto, and SpringerLink (Online service), eds. Intelligent Data Engineering and Automated Learning - IDEAL 2012: 13th International Conference, Natal, Brazil, August 29-31, 2012. Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Sansone, Carlo. Multiple Classifier Systems: 10th International Workshop, MCS 2011, Naples, Italy, June 15-17, 2011. Proceedings. Berlin, Heidelberg: Springer-Verlag GmbH Berlin Heidelberg, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Gayar, Neamat El. Multiple Classifier Systems: 9th International Workshop, MCS 2010, Cairo, Egypt, April 7-9, 2010. Proceedings. Berlin, Heidelberg: Springer-Verlag Berlin Heidelberg, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Lenca, Philippe, Jean-Marc Petit, and SpringerLink (Online service), eds. Discovery Science: 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Data Mining And Machine Learning In Cybersecurity. Auerbach Publications, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "Machine learning. Data mining. Software measurement"

1

Yang, Ying. "Measurement Scales." In Encyclopedia of Machine Learning and Data Mining, 808–9. Boston, MA: Springer US, 2017. http://dx.doi.org/10.1007/978-1-4899-7687-1_529.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Shirabad, Jelber Sayyad. "Predictive Techniques in Software Engineering." In Encyclopedia of Machine Learning and Data Mining, 992–1000. Boston, MA: Springer US, 2017. http://dx.doi.org/10.1007/978-1-4899-7687-1_661.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Yuan, Xiaobu, Manpreet Kaler, and Vijaya Mulpuri. "Personalized Visualization Based upon Wavelet Transform for Interactive Software Customization." In Machine Learning and Data Mining in Pattern Recognition, 361–75. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-62416-7_26.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Grechanik, Mark, Nitin Prabhu, Daniel Graham, Denys Poshyvanyk, and Mohak Shah. "Can Software Project Maturity Be Accurately Predicted Using Internal Source Code Metrics?" In Machine Learning and Data Mining in Pattern Recognition, 774–89. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-41920-6_59.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

"Predictive Software Models." In Encyclopedia of Machine Learning and Data Mining, 992. Boston, MA: Springer US, 2017. http://dx.doi.org/10.1007/978-1-4899-7687-1_100372.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Ozturk Kiyak, Elife. "Data Mining and Machine Learning for Software Engineering." In Data Mining - Methods, Applications and Systems [Working Title]. IntechOpen, 2020. http://dx.doi.org/10.5772/intechopen.91448.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Meziane, Farid, and Sunil Vadera. "Artificial Intelligence in Software Engineering." In Machine Learning, 1215–36. IGI Global, 2012. http://dx.doi.org/10.4018/978-1-60960-818-7.ch504.

Full text
Abstract:
Artificial intelligence techniques such as knowledge-based systems, neural networks, fuzzy logic and data mining have been advocated by many researchers and developers as a way to improve many software development activities. As with many other disciplines, software development quality improves with the experience and knowledge of the developers, past projects and expertise. Software also evolves as it operates in changing and volatile environments. Hence, there is significant potential for using AI to improve all phases of the software development life cycle. This chapter provides a survey on the use of AI for software engineering that covers the main software development phases and AI methods such as natural language processing techniques, neural networks, genetic algorithms, fuzzy logic, ant colony optimization, and planning methods.
APA, Harvard, Vancouver, ISO, and other styles
8

Rodrigues, Anisha P., Niranjan N. Chiplunkar, and Roshan Fernandes. "Social Big Data Mining." In Handbook of Research on Emerging Trends and Applications of Machine Learning, 528–49. IGI Global, 2020. http://dx.doi.org/10.4018/978-1-5225-9643-1.ch025.

Full text
Abstract:
Social media is used to share data and information among large groups of people. Numerous forums, blogs, social networks, news reports, e-commerce websites, and many other online media play a role in sharing individual opinions. The data generated from these sources is huge and in unstructured format. Big data is a term used for data sets that are so large or complex that they cannot be processed by traditional processing systems. Sentiment analysis is one of the major forms of data analytics applied to big data. It is a natural language processing task that determines whether a text contains subjective information and what information it expresses. It helps in achieving various goals, such as measuring customer satisfaction, observing public mood on political movements, movie sales prediction, market intelligence, and many more. In this chapter, the authors present various techniques used for sentiment analysis and related work using these techniques. The chapter also presents open issues and challenges in the sentiment analysis landscape.
APA, Harvard, Vancouver, ISO, and other styles
9

Catal, Cagatay, and Soumya Banerjee. "Application of Artificial Immune Systems Paradigm for Developing Software Fault Prediction Models." In Machine Learning, 371–87. IGI Global, 2012. http://dx.doi.org/10.4018/978-1-60960-818-7.ch302.

Full text
Abstract:
Artificial Immune Systems, a biologically inspired computing paradigm like Artificial Neural Networks, Genetic Algorithms, and Swarm Intelligence, embody the principles and advantages of vertebrate immune systems. They have been applied to solve several complex problems in different areas such as data mining, computer security, robotics, aircraft control, scheduling, optimization, and pattern recognition. There is increasing interest in the use of this paradigm, and it is widely used in conjunction with other methods such as Artificial Neural Networks, Swarm Intelligence and Fuzzy Logic. In this chapter, we demonstrate the procedure for applying this paradigm and bio-inspired algorithm to developing software fault prediction models. The task of the fault prediction unit is to identify the modules which are likely to contain faults in the next release of a large software system. Software metrics and fault data belonging to a previous software version are used to build the model. Fault-prone modules of the next release are predicted by using this model and current software metrics. From a machine learning perspective, this type of modeling approach is called supervised learning. A sample fault dataset is used to show the elaborated working of Artificial Immune Recognition Systems (AIRS).
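The modeling setup the chapter describes can be sketched as follows, with a standard classifier standing in for the chapter's AIRS algorithm; the metric columns and fault labels are toy data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Columns: e.g., lines of code, cyclomatic complexity, coupling (invented)
prev_metrics = rng.uniform(0, 1, size=(100, 3))
# Toy fault labels for the previous release's modules
prev_faulty = (prev_metrics[:, 1] + rng.normal(0, 0.2, 100) > 0.7).astype(int)

# Supervised learning: train on the previous version's metrics and fault data
model = RandomForestClassifier(random_state=0).fit(prev_metrics, prev_faulty)

# Predict fault-prone modules of the next release from current metrics
next_metrics = rng.uniform(0, 1, size=(10, 3))
print("predicted fault-prone modules:", np.flatnonzero(model.predict(next_metrics)))
```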
APA, Harvard, Vancouver, ISO, and other styles
10

Narayanapppa, Manjunath Thimmasandra, T. P. Puneeth Kumar, and Ravindra S. Hegadi. "Essentiality of Machine Learning Algorithms for Big Data Computation." In Advances in Data Mining and Database Management, 156–67. IGI Global, 2016. http://dx.doi.org/10.4018/978-1-4666-9767-6.ch011.

Full text
Abstract:
Recent technological advancements have led to the generation of huge volumes of data from distinct domains (scientific sensors, health care, user-generated data, financial companies, the internet, and supply chain systems) over the past decade. To capture the meaning of this emerging trend, the term big data was coined. In addition to its huge volume, big data also exhibits several unique characteristics compared with traditional data. For instance, big data is generally unstructured and requires more real-time analysis. This development calls for new system platforms for data acquisition, storage, transmission and large-scale data processing mechanisms. In recent years, the analytics industry's interest has been expanding towards big data analytics to uncover potentials concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and the computational environment, including hardware and software, that is required to perform analytics on big data.
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Machine learning. Data mining. Software measurement"

1

Gao, Zheng-Ming, Juan Zhao, and Yu-Rong Hu. "Data Mining of Agricultural Software and Suggestions." In MLMI '20: 2020 The 3rd International Conference on Machine Learning and Machine Intelligence. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3426826.3426841.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Pafka, Szilárd. "Machine Learning Software in Practice." In KDD '17: The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3097983.3106683.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Silva, Raniel. "Development of an Automated Machine Learning Solution for Educational Data Mining (S)." In The 33rd International Conference on Software Engineering and Knowledge Engineering. KSI Research Inc., 2021. http://dx.doi.org/10.18293/seke2021-068.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Liu, Bin, Shu-Gui Cao, Dong-Fang Cao, Qing-Chun Li, Hai-Tao Liu, and Shao-Nan Shi. "An ontology based semantic heterogeneity measurement framework for optimization in distributed data mining." In 2012 International Conference on Machine Learning and Cybernetics (ICMLC). IEEE, 2012. http://dx.doi.org/10.1109/icmlc.2012.6358897.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Ye, Hanmin, Ziyi Zhong, and Shiming Huang. "Research on Insulator Creepage Distance Measurement Based on Different Photographic Equipment." In ICDMML 2019: 2019 International Conference on Data Mining and Machine Learning. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3335656.3335702.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Lin, Chih-Jen. "Experiences and lessons in developing industry-strength machine learning and data mining software." In the 18th ACM SIGKDD international conference. New York, New York, USA: ACM Press, 2012. http://dx.doi.org/10.1145/2339530.2339714.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Fachrina, Zulva, and Dwi H. Widyantoro. "Aspect-sentiment classification in opinion mining using the combination of rule-based and machine learning." In 2017 International Conference on Data and Software Engineering (ICoDSE). IEEE, 2017. http://dx.doi.org/10.1109/icodse.2017.8285850.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Gabrielyan, Diana, Jaan Masso, and Lenno Uusküla. "Mining News Data for the Measurement and Prediction of Inflation Expectations." In CARMA 2020 - 3rd International Conference on Advanced Research Methods and Analytics. Valencia: Universitat Politècnica de València, 2020. http://dx.doi.org/10.4995/carma2020.2020.11322.

Full text
Abstract:
In this paper we use high frequency multidimensional textual news data and propose an index of inflation news. We utilize the power of text mining and its ability to convert large collections of text from unstructured to structured form for in-depth quantitative analysis of online news data. The significant relationship between households' inflation expectations and news topics is documented, and the forecasting performance of news-based indices is evaluated for different horizons and model variations. Results suggest that with an optimal number of topics a machine learning model is able to forecast inflation expectations with greater accuracy than simple autoregressive models. Additional results from forecasting headline inflation indicate that the overall forecasting accuracy is at a good level. The findings in this paper support the view in the literature that news is a good indicator of inflation and is able to capture inflation expectations well.
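The pipeline shape (news texts to topics to a regression on expectations) can be sketched as follows; the texts, expectation values and model sizes are invented placeholders, and the paper's actual topic model and forecasting setup may differ.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LinearRegression

# Invented news snippets and matching survey-based inflation expectations
news = ["energy prices rise again", "central bank holds rates",
        "food costs climb in stores", "rates unchanged says bank",
        "fuel and food prices increase", "bank signals stable policy"]
expectations = np.array([3.1, 2.0, 3.3, 1.9, 3.4, 1.8])

# Texts -> word counts -> topic shares -> regression on expectations
counts = CountVectorizer().fit_transform(news)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
model = LinearRegression().fit(topics, expectations)
print("in-sample R^2:", round(model.score(topics, expectations), 2))
```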
APA, Harvard, Vancouver, ISO, and other styles
9

Karlstetter, Roman, Robert Widhopf-Fenk, Jakob Hermann, Driek Rouwenhorst, Amir Raoofy, Carsten Trinitis, and Martin Schulz. "Turning Dynamic Sensor Measurements From Gas Turbines Into Insights: A Big Data Approach." In ASME Turbo Expo 2019: Turbomachinery Technical Conference and Exposition. American Society of Mechanical Engineers, 2019. http://dx.doi.org/10.1115/gt2019-91259.

Full text
Abstract:
Gas turbine power plants generate an ever-growing amount of high-frequency dynamic sensor data. One application of this data is protection against problems induced by combustion dynamics, e.g., with the ArgusOMDS system developed by IfTA. In the light of digitalization, this data has the potential to be used in other areas as well and to ultimately transform maintenance, repair and overhaul approaches. However, current solutions are not designed to cope with the large time windows needed for general analysis, which can hinder the development of advanced machine analysis algorithms. In this work, we present an end-to-end approach for large-scale sensor measurement analysis, employing data mining techniques and enabling machine learning algorithms. Our approach covers the complete data pipeline from sensor measurement acquisition to analysis and visualization. We demonstrate the feasibility of our approach by presenting several case studies that prove its benefits over existing solutions.
APA, Harvard, Vancouver, ISO, and other styles
10

Borozdin, Sergey Olegovich, Anatoly Nikolaevich Dmitrievsky, Nikolai Alexandrovich Eremin, Alexey Igorevich Arkhipov, Alexander Georgievich Sboev, Olga Kimovna Chashchina-Semenova, and Leonid Konstantinovich Fitzner. "Drilling Problems Forecast Based on Neural Network." In Offshore Technology Conference. OTC, 2021. http://dx.doi.org/10.4043/30984-ms.

Full text
Abstract:
This paper poses and solves the problem of using artificial intelligence methods for processing big volumes of geodata from geological and technological measurement stations in order to identify and predict complications during well drilling. The volumes of geodata from the stations of geological and technological measurements during drilling ranged from units to tens of terabytes. Digital modernization of the life cycle of well construction using machine learning methods contributes to improving the efficiency of drilling oil and gas wells. Clustering of big volumes of geodata from the various sources and types of sensors used to measure parameters during drilling has been carried out. In the process of creating, training and applying software components with artificial neural networks, the specified accuracy of calculations was achieved, and hidden, non-obvious patterns were revealed in big volumes of geological, geophysical, technical and technological parameters. To predict the operational results of drilling wells, classification models were developed using artificial intelligence methods. The use of a high-performance computing cluster significantly reduced the time spent on assessing the probability of complications and predicting these probabilities 7-10 minutes ahead. A hierarchical distributed data warehouse has been formed, containing real-time drilling data in WITSML format using SQL Server (Microsoft). The module for preprocessing and uploading geodata to the WITSML repository uses the Energistics Standards DevKit API and Energistics data objects to work with geodata in the WITSML format. The drilling problems forecast accuracy reached with the developed system may significantly reduce non-productive time spent on eliminating stuck pipe, mud loss, and oil and gas influx events.
APA, Harvard, Vancouver, ISO, and other styles