
Dissertations / Theses on the topic 'Machine learning. Data mining. Software measurement'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 18 dissertations / theses for your research on the topic 'Machine learning. Data mining. Software measurement.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Ammar, Kareem. "Multi-heuristic theory assessment with iterative selection." Morgantown, W. Va. : [West Virginia University Libraries], 2004. https://etd.wvu.edu/etd/controller.jsp?moduleName=documentdata&jsp%5FetdId=3701.

Abstract:
Thesis (M.S.)--West Virginia University, 2004.
Title from document title page. Document formatted into pages; contains viii, 106 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 105-106).
2

Badayos, Noah Garcia. "Machine Learning-Based Parameter Validation." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/47675.

Abstract:
As power system grids continue to grow to support an increasing energy demand, the system's behavior evolves accordingly, continuing to challenge designs for maintaining security. It has become apparent in the past few years that accurate simulations are as critical as discovering vulnerabilities in the power network. This study explores a classification method for validating simulation models, using disturbance measurements from phasor measurement units (PMU). The technique employs the Random Forest learning algorithm to find a correlation between specific model parameter changes and the variations in the dynamic response. The measurements used for building and evaluating the classifiers were characterized using Prony decomposition. The generator model, consisting of an exciter, a governor, and their standard parameters, was validated using short-circuit faults. Single-error classifiers were tested first, comparing the accuracies of classifiers built using positive, negative, and zero sequence measurements. The negative sequence measurements consistently produced the best classifiers, with the majority of the parameter classes attaining F-measure accuracies greater than 90%. A multiple-parameter error technique for validation was also developed and tested on standard generator parameters. Only a few target parameter classes had good accuracies in the presence of multiple parameter errors, but the results were enough to permit a sequential process of validation, in which eliminating a highly detectable error improves the accuracy for suspect errors that depend on its removal, continuing the procedure until all corrections are covered.
Ph. D.
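The classification step described above can be illustrated with a short, hypothetical scikit-learn sketch. The random feature matrix below merely stands in for the Prony-decomposed negative-sequence PMU measurements used in the thesis, and the class labels stand in for parameter-error classes; both are invented for illustration.

    # Hedged sketch: Random Forest classification of parameter-error classes.
    # Features and labels are synthetic placeholders, not real PMU data.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 12))    # stand-in for Prony modes (amplitude, damping, frequency, phase)
    y = rng.integers(0, 4, size=500)  # stand-in for parameter-error classes

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("macro F-measure:", f1_score(y_te, clf.predict(X_te), average="macro"))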
3

Thun, Julia, and Rebin Kadouri. "Automating debugging through data mining." Thesis, KTH, Data- och elektroteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-203244.

Abstract:
Contemporary technological systems generate massive quantities of log messages. These messages can be stored, searched and visualized efficiently using log management and analysis tools. The analysis of log messages offers insights into system behavior such as performance, server status and execution faults in web applications. iStone AB wants to explore the possibility of automating its debugging process. Since iStone does most of its debugging manually, it takes time to find errors within the system. The aim was therefore to find solutions that reduce the time it takes to debug. An analysis of log messages within access and console logs was made, so that the most appropriate data mining techniques for iStone's system could be chosen. Data mining algorithms and log management and analysis tools were compared. The result of the comparisons showed that the ELK Stack, as well as a mixture of Eclat and a hybrid algorithm (Eclat and Apriori), were the most appropriate choices. To demonstrate their feasibility, the ELK Stack and Eclat were implemented. The produced results show that data mining and the use of a platform for log analysis can facilitate and reduce the time it takes to debug.
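For readers unfamiliar with Eclat, the following minimal sketch shows its core idea: represent transactions vertically as item-to-transaction-ID sets and find frequent itemsets by intersecting those sets. The toy log-event transactions are invented placeholders for iStone's access and console log data.

    # Minimal Eclat sketch over toy log-event transactions (hypothetical data).
    transactions = [{"timeout", "db"}, {"timeout", "db", "retry"},
                    {"db"}, {"timeout", "retry"}]
    min_support = 2

    # Vertical representation: item -> set of transaction ids (tidset)
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    def eclat(prefix, candidates, results):
        # Depth-first search: extend the prefix with each frequent candidate
        for i, (item, tids) in enumerate(candidates):
            if len(tids) >= min_support:
                results[prefix + (item,)] = len(tids)
                # Intersecting tidsets yields the supports of longer itemsets
                rest = [(item2, tids & tids2) for item2, tids2 in candidates[i + 1:]]
                eclat(prefix + (item,), rest, results)

    results = {}
    eclat((), sorted(tidsets.items()), results)
    print(results)  # frequent itemsets with their support counts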
4

Tierno, Ivan Alexandre Paiz. "Assessment of data-driven bayesian networks in software effort prediction." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/71952.

Abstract:
Software effort prediction is a difficult but important task that can aid managers in decision making, potentially saving time and resources and achieving higher software quality, among other benefits. One of the approaches set forth to perform this task has been the application of machine learning techniques. One of these techniques is Bayesian Networks, which have been promoted for software project management due to their special features. However, the pre-processing procedures related to their application remain mostly neglected in this field. In this context, this study presents an assessment of automatic Bayesian Networks (i.e., Bayesian Networks based solely on data) on three public data sets and brings forward a discussion on data pre-processing procedures and the validation approach. We compared automatic Bayesian Networks against mean and median baseline models and also against ordinary least squares regression with a logarithmic transformation, which a recent comprehensive study deemed a top performer with regard to accuracy. The results obtained through careful validation procedures support that automatic Bayesian Networks can be competitive with other techniques, but still need improvements to catch up with linear regression models in terms of accuracy. Some current limitations of Bayesian Networks are highlighted and possible improvements are discussed. Furthermore, this study provides some guidelines on the exploration of data. These guidelines can be useful to any Bayesian Networks that use data for model learning. Finally, this study also confirms the potential benefits of feature selection in software effort prediction.
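The baselines mentioned above are simple to reproduce; this hedged sketch compares mean and median predictors with ordinary least squares on log-transformed data, using synthetic size and effort figures invented for illustration.

    # Sketch: mean/median baselines vs. OLS with a logarithmic transformation.
    # The size/effort data is synthetic, standing in for public effort data sets.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    size = rng.uniform(1, 100, 80)                    # e.g. function points
    effort = 5 * size ** 1.1 * rng.lognormal(0, 0.3)  # synthetic effort values

    X = np.log(size).reshape(-1, 1)
    ols = LinearRegression().fit(X, np.log(effort))
    pred_ols = np.exp(ols.predict(X))                 # back-transform to effort scale

    mae = lambda pred: np.mean(np.abs(effort - pred))
    print("mean baseline MAE:  ", mae(np.full_like(effort, effort.mean())))
    print("median baseline MAE:", mae(np.full_like(effort, np.median(effort))))
    print("log-OLS MAE:        ", mae(pred_ols))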
5

Sun, Boya. "PRECISION IMPROVEMENT AND COST REDUCTION FOR DEFECT MINING AND TESTING." Case Western Reserve University School of Graduate Studies / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=case1321827962.

6

Parisi, Luca. "A Knowledge Flow as a Software Product Line." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/12217/.

Abstract:
Building a data mining workflow depends at least on the dataset and on the users' goals. This process is complex because of the large number of available algorithms and the difficulty of choosing the best algorithm, properly parameterized. Data scientists usually use analysis tools to decide which algorithm performs best on their specific dataset, comparing performance across the different algorithms. The purpose of this project is to lay the groundwork for a software system that steers the construction of such workflows in the right direction, in order to find the best one for the users' dataset and their goals.
7

Sivrioglu, Damla. "A Method For Product Defectiveness Prediction With Process Enactment Data In A Small Software Organization." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614516/index.pdf.

Abstract:
As a part of quality management, product defectiveness prediction is as vital for small software organizations as it is for institutional ones. Although many studies have been conducted on defect prediction, process enactment data cannot usually be used because it is difficult to collect. Additionally, there is no generally known approach for analyzing process enactment data in software engineering. In this study, we developed a method to show the applicability of process enactment data for defect prediction and answered the questions "Is process enactment data beneficial for defect prediction?", "How can we use process enactment data?" and "Which approaches and analysis methods can our method support?". We used a multiple case study design and conducted case studies with and without process enactment data in a small software development company. We preferred machine learning approaches over statistical ones to cluster the data that includes process enactment information, since we believed they fit the pattern-oriented nature of the data. The case studies performed yielded promising results. We evaluated the performance values of the prediction models to demonstrate the advantage of using process enactment data for predicting the defect open duration value. When we had enough data points to apply machine learning methods and the data could be clustered homogeneously, analyses with process enactment data were approximately 3% (ranging from -10% to 17%) more accurate than those without it.
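The clustering-then-prediction idea described above can be sketched briefly; the feature names and data below are hypothetical stand-ins for the proprietary process enactment records used in the case studies.

    # Sketch: cluster defect records (including process enactment attributes),
    # then fit a per-cluster model for defect open duration. Data is synthetic.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(2)
    X = rng.normal(size=(120, 4))             # e.g. review effort, rework count, phase, severity
    open_duration = rng.gamma(2.0, 5.0, 120)  # synthetic defect open durations

    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    for c in range(3):
        mask = labels == c
        model = LinearRegression().fit(X[mask], open_duration[mask])
        print(f"cluster {c}: n={mask.sum()}, R^2={model.score(X[mask], open_duration[mask]):.2f}")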
8

Artchounin, Daniel. "Tuning of machine learning algorithms for automatic bug assignment." Thesis, Linköpings universitet, Programvara och system, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139230.

Abstract:
In software development projects, bug triage consists mainly of assigning bug reports to software developers or teams (depending on the project). The partial or total automation of this task would have a positive economic impact on many software projects. This thesis introduces a systematic four-step method for finding some of the best configurations of several machine learning algorithms for the automatic bug assignment problem. The four steps select, respectively, a combination of pre-processing techniques, a bug report representation, and a potential feature selection technique, and finally tune several classifiers. The aforementioned method has been applied on three software projects: 66 066 bug reports of a proprietary project, 24 450 bug reports of Eclipse JDT and 30 358 bug reports of Mozilla Firefox. 619 configurations have been applied and compared on each of these three projects. In production, using the approach introduced in this work on the bug reports of the proprietary project would have increased the accuracy by up to 16.64 percentage points.
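A minimal sketch of such a pipeline appears below, assuming TF-IDF as the bug report representation and a linear classifier; the report texts and team labels are invented placeholders, not data from the projects above.

    # Hedged sketch: text classification pipeline for automatic bug assignment.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    reports = ["NullPointerException in editor on save",
               "UI freezes when opening large project",
               "Crash in JIT compiler on ARM build",
               "Toolbar icons render blurry on HiDPI"]
    teams = ["core", "ui", "runtime", "ui"]  # hypothetical assignment labels

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    model.fit(reports, teams)
    print(model.predict(["Editor crashes with NullPointerException"]))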
9

Krüger, Franz David, and Mohamad Nabeel. "Hyperparameter Tuning Using Genetic Algorithms : A study of genetic algorithms impact and performance for optimization of ML algorithms." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-42404.

Abstract:
As machine learning (ML) becomes more and more frequent in the business world, information gathering through data mining (DM) is on the rise, and DM practitioners generally use several rules of thumb to avoid spending considerable time tuning the hyperparameters (parameters that control the learning process) of an ML algorithm to gain a high accuracy score. The proposal in this report is an approach that systematically optimizes ML algorithms using genetic algorithms (GA), and an evaluation of whether and how the model should be constructed to find global solutions for a specific data set. By implementing a GA approach on two ML algorithms, K-nearest neighbors and Random Forest, on two numerical data sets, the Iris data set and the Wisconsin breast cancer data set, the model is evaluated by its accuracy scores as well as its computational time, which is then compared with a search method, specifically exhaustive search. The results show that GA works well in finding good accuracy scores in a reasonable amount of time. There are some limitations, as a parameter's significance to an ML algorithm may vary.
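As a toy illustration of the approach, the sketch below evolves a single hyperparameter (k for K-nearest neighbors) on the Iris data set with a tiny genetic algorithm; a real chromosome would encode several hyperparameters at once, and the GA operators here are deliberately simplistic.

    # Toy GA tuning k for KNN on Iris (selection, crossover, mutation).
    import random
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    def fitness(k):
        # 5-fold cross-validated accuracy as the fitness of chromosome k
        return cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()

    random.seed(0)
    population = [random.randint(1, 30) for _ in range(6)]
    for generation in range(5):
        parents = sorted(population, key=fitness, reverse=True)[:2]   # selection
        children = [max(1, (parents[0] + parents[1]) // 2 + random.randint(-2, 2))
                    for _ in range(4)]                                # crossover + mutation
        population = parents + children
    best = max(population, key=fitness)
    print("best k:", best, "accuracy:", round(fitness(best), 3))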
10

Chu, Justin. "CONTEXT-AWARE DEBUGGING FOR CONCURRENT PROGRAMS." UKnowledge, 2017. https://uknowledge.uky.edu/cs_etds/61.

Abstract:
Concurrency faults are difficult to reproduce and localize because they usually occur under specific inputs and thread interleavings. Most existing fault localization techniques focus on sequential programs but fail to identify faulty memory access patterns across threads, which are usually the root causes of concurrency faults. Moreover, existing techniques for sequential programs cannot be adapted to identify faulty paths in concurrent programs. While concurrency fault localization techniques have been proposed that analyze passing and failing executions obtained from running a set of test cases to identify faulty access patterns, they primarily rely on statistical analysis. We present a novel approach to fault localization using feature selection techniques from machine learning. Our insight is that the concurrency access patterns obtained from a large volume of coverage data generally constitute high-dimensional data sets, yet existing statistical analysis techniques for fault localization are usually applied to low-dimensional data sets. Each additional failing or passing run can provide more diverse information, which can help localize faulty concurrency access patterns in code. The patterns with maximum feature diversity information can point to the most suspicious pattern. We then apply a data mining technique to identify the interleaving patterns that occur most frequently and provide the possible faulty paths. We also evaluate the effectiveness of fault localization using test suites generated from different test adequacy criteria. We have evaluated Cadeco on 10 real-world multi-threaded Java applications. Results indicate that Cadeco outperforms state-of-the-art approaches for localizing concurrency faults.
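One way to realize the feature-selection insight is to score each access pattern by how much information it carries about run outcomes; the sketch below does this with mutual information. The pattern names and coverage matrix are hypothetical stand-ins for real instrumentation data.

    # Sketch: rank concurrency access patterns by mutual information with
    # pass/fail outcomes. All data here is invented for illustration.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    patterns = ["W1-R2 on x", "R1-W2 on x", "W1-W2 on y", "R1-R2 on z"]
    coverage = np.array([[1, 0, 1, 1],   # each row: one run, 1 = pattern observed
                         [1, 1, 0, 1],
                         [0, 1, 0, 1],
                         [0, 1, 1, 0]])
    failed = np.array([0, 1, 1, 1])      # run outcome: 1 = failing

    scores = mutual_info_classif(coverage, failed, discrete_features=True, random_state=0)
    for pattern, score in sorted(zip(patterns, scores), key=lambda p: -p[1]):
        print(f"{score:.3f}  {pattern}")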
11

Pardos, Zachary Alexander. "Predictive Models of Student Learning." Digital WPI, 2012. https://digitalcommons.wpi.edu/etd-dissertations/185.

Abstract:
In this dissertation, several approaches I have taken to build upon the student learning model are described. There are two focuses of this dissertation. The first focus is on improving the accuracy with which future student knowledge and performance can be predicted by individualizing the model to each student. The second focus is to predict how different educational content and tutorial strategies will influence student learning. The two focuses are complementary but are approached from slightly different directions. I have found that Bayesian Networks, based on belief propagation, are strong at achieving the goals of both focuses. In prediction, they excel at capturing the temporal nature of data produced where student knowledge is changing over time. This concept of state change over time is very difficult to capture with classical machine learning approaches. Interpretability is also hard to come by with classical machine learning approaches; however, it is one of the strengths of Bayesian models and aids in studying the direct influence of various factors on learning. The domain in which these models are being studied is that of computer tutoring systems, software which uses artificial intelligence to enhance computer-based tutorial instruction. These systems are growing in relevance. At their best they have been shown to achieve the same educational gain as one-on-one human interaction. Computer tutors have also received the attention of the White House, which mentioned a tutoring platform called ASSISTments in its National Educational Technology Plan. With the fast-paced adoption of these data-driven systems, it is important to learn how to improve their educational effectiveness by making sense of the data they generate. The studies in this proposal use data from these educational systems, which primarily teach topics of Geometry and Algebra, but the methods can be applied to any domain with clearly defined sub-skills and dichotomous student response data. One of the intended impacts of this work is for these knowledge modeling contributions to facilitate the move towards computer adaptive learning in much the same way that Item Response Theory models facilitated the move towards computer adaptive testing.
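A classic instantiation of the temporal knowledge model discussed above is Bayesian Knowledge Tracing. The sketch below, with made-up parameter values, updates the probability that a student knows a skill after each observed response; it is offered as an illustration of the state-change-over-time idea, not as the dissertation's exact model.

    # Bayesian Knowledge Tracing update (illustrative parameter values).
    p_know, p_transit, p_slip, p_guess = 0.3, 0.1, 0.1, 0.2

    def bkt_update(p_know, correct):
        # Posterior probability of knowing the skill, given the response
        if correct:
            evidence = p_know * (1 - p_slip) + (1 - p_know) * p_guess
            posterior = p_know * (1 - p_slip) / evidence
        else:
            evidence = p_know * p_slip + (1 - p_know) * (1 - p_guess)
            posterior = p_know * p_slip / evidence
        # Account for learning between practice opportunities
        return posterior + (1 - posterior) * p_transit

    for correct in [True, True, False, True]:
        p_know = bkt_update(p_know, correct)
        print(f"P(skill known) = {p_know:.3f}")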
12

van Schaik, Sebastiaan Johannes. "A framework for processing correlated probabilistic data." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:91aa418d-536e-472d-9089-39bef5f62e62.

Abstract:
The amount of digitally-born data has surged in recent years. In many scenarios, this data is inherently uncertain (or: probabilistic), such as data originating from sensor networks, image and voice recognition, location detection, and automated web data extraction. Probabilistic data requires novel and different approaches to data mining and analysis, which explicitly account for the uncertainty and the correlations therein. This thesis introduces ENFrame: a framework for processing and mining correlated probabilistic data. Using this framework, it is possible to express both traditional and novel algorithms for data analysis in a special user language, without having to explicitly address the uncertainty of the data on which the algorithms operate. The framework will subsequently execute the algorithm on the probabilistic input, and perform exact or approximate parallel probability computation. During the probability computation, correlations and provenance are succinctly encoded using probabilistic events. This thesis contains novel contributions in several directions. An expressive user language – a subset of Python – is introduced, which allows a programmer to implement algorithms for probabilistic data without requiring knowledge of the underlying probabilistic model. Furthermore, an event language is presented, which is used for the probabilistic interpretation of the user program. The event language can succinctly encode arbitrary correlations using events, which are the probabilistic counterparts of deterministic user program variables. These highly interconnected events are stored in an event network, a probabilistic interpretation of the original user program. Multiple techniques for exact and approximate probability computation (with error guarantees) of such event networks are presented, as well as techniques for parallel computation. Adaptations of multiple existing data mining algorithms are shown to work in the framework, and are subsequently subjected to an extensive experimental evaluation. Additionally, a use-case is presented in which a probabilistic adaptation of a clustering algorithm is used to predict faults in energy distribution networks. Lastly, this thesis presents techniques for integrating a number of different probabilistic data formalisms for use in this framework and in other applications.
13

Macedo, Charles Mendes de. "Aplicação de algoritmos de agrupamento para descoberta de padrões de defeito em software JavaScript." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-29012019-152129/.

Abstract:
Applications developed with the JavaScript language are increasing every day, not only client-side but also server-side and on mobile devices. In this context, the existence of tools to identify faults is fundamental to assist developers during the evolution of their applications. Most of these tools use a list of predefined faults that are discovered from observation of programming best practices and developer intuition. To improve these tools, the automatic discovery of faults and code smells is important because it identifies which ones actually occur in practice and how frequently. A tool that implements a semiautomatic strategy for discovering bug patterns by grouping the changes made during project development is BugAID. The objective of this work is to contribute to the BugAID tool, extending it with improvements in the extraction of characteristics to be used by the clustering algorithm. The extended module that extracts the characteristics is called BE+. Additionally, an evaluation of the clustering algorithms used for discovering fault patterns in JavaScript software is performed.
14

Davis, Jason Victor. "Mining statistical correlations with applications to software analysis." 2008. http://hdl.handle.net/2152/18340.

Abstract:
Machine learning, data mining, and statistical methods work by representing real-world objects in terms of feature sets that best describe them. This thesis addresses problems related to inferring and analyzing correlations among such features. The contributions of this thesis are two-fold: we develop formulations and algorithms for addressing correlation mining problems, and we also provide novel applications of our methods to statistical software analysis domains. We consider problems related to analyzing correlations via unsupervised approaches, as well as algorithms that infer correlations using fully-supervised or semi-supervised information. In the context of correlation analysis, we propose the problem of correlation matrix clustering, which employs a k-means style algorithm to group sets of correlations in an unsupervised manner. Fundamental to this algorithm is a measure for comparing correlations called the log-determinant (LogDet) divergence, and a primary contribution of this thesis is that of interpreting and analyzing this measure in the context of information theory and statistics. Additionally, based on the LogDet divergence, we present a metric learning problem called Information-Theoretic Metric Learning which uses semi-supervised or fully-supervised data to infer correlations for parametrization of a Mahalanobis distance metric. We also consider the problem of learning Mahalanobis correlation matrices in high dimensions, when the number of pairwise correlations can grow very large. In validating our correlation mining methods, we consider two in-depth, real-world statistical software analysis problems: software error reporting and unit test prioritization. In the context of Clarify, we investigate two types of correlation mining applications: metric learning for nearest neighbor software support, and decision trees for error classification. We show that our metric learning algorithms can learn program-specific similarity models for more accurate nearest neighbor comparisons. In the context of decision tree learning, we address the problem of learning correlations with associated feature costs, in particular the overhead costs of software instrumentation. As our second application, we present a unit test ordering algorithm which uses clustering and nearest neighbor algorithms, along with a metric learning component, to efficiently search and execute large unit test suites.
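For positive-definite matrices A and B of size n, the LogDet divergence named above is D_ld(A, B) = tr(AB^-1) - log det(AB^-1) - n, which is zero exactly when A = B. A small numpy sketch with arbitrary example matrices follows:

    # LogDet (Stein) divergence between two positive-definite matrices.
    import numpy as np

    def logdet_divergence(A, B):
        M = A @ np.linalg.inv(B)
        n = A.shape[0]
        # slogdet returns (sign, log|det|); take the log-determinant part
        return np.trace(M) - np.linalg.slogdet(M)[1] - n

    A = np.array([[2.0, 0.3], [0.3, 1.0]])
    B = np.eye(2)
    print(logdet_divergence(A, B))   # 0.0 iff A == B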
15

Thomas, Stephen. "Mining Unstructured Software Repositories Using IR Models." Thesis, 2012. http://hdl.handle.net/1974/7688.

Abstract:
Mining Software Repositories, which is the process of analyzing the data related to software development practices, is an emerging field which aims to aid development teams in their day-to-day tasks. However, data in many software repositories is currently unused because the data is unstructured, and therefore difficult to mine and analyze. Information Retrieval (IR) techniques, which were developed specifically to handle unstructured data, have recently been used by researchers to mine and analyze the unstructured data in software repositories, with some success. The main contribution of this thesis is the idea that the research and practice of using IR models to mine unstructured software repositories can be improved by going beyond the current state of affairs. First, we propose new applications of IR models to existing software engineering tasks. Specifically, we present a technique to prioritize test cases based on their IR similarity, giving highest priority to those test cases that are most dissimilar. In another new application of IR models, we empirically recover how developers use their mailing list while developing software. Next, we show how the use of advanced IR techniques can improve results. Using a framework for combining disparate IR models, we find that bug localization performance can be improved by 14–56% on average, compared to the best individual IR model. In addition, by using topic evolution models on the history of source code, we can uncover the evolution of source code concepts with an accuracy of 87–89%. Finally, we show the risks of current research, which uses IR models as black boxes without fully understanding their assumptions and parameters. We show that data duplication in source code has undesirable effects on IR models, and that eliminating the duplication improves the accuracy of IR models. Additionally, we find that in the bug localization task, an unwise choice of parameter values results in an accuracy of only 1%, where optimal parameters can achieve an accuracy of 55%. Through empirical case studies on real-world systems, we show that all of our proposed techniques and methodologies significantly improve the state-of-the-art.
Thesis (Ph.D., Computing) -- Queen's University, 2012.
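The dissimilarity-first prioritization idea can be sketched in a few lines: represent each test case with TF-IDF and greedily pick the test least similar to those already chosen. The test descriptions below are invented placeholders.

    # Sketch: IR-based test prioritization favoring maximally dissimilar tests.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    tests = ["login with valid password", "login with invalid password",
             "export report to pdf", "import data from csv"]
    sim = cosine_similarity(TfidfVectorizer().fit_transform(tests))

    order = [0]                       # start from an arbitrary test case
    while len(order) < len(tests):
        remaining = [i for i in range(len(tests)) if i not in order]
        # Next test = the one least similar to anything already selected
        order.append(min(remaining, key=lambda i: max(sim[i][j] for j in order)))
    print([tests[i] for i in order])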
16

Saradha, R. "Malware Analysis using Profile Hidden Markov Models and Intrusion Detection in a Stream Learning Setting." Thesis, 2014. http://hdl.handle.net/2005/3129.

Abstract:
In the last decade, many machine learning and data mining based approaches have been used in the areas of intrusion detection, malware detection and classification, and traffic analysis. In the area of malware analysis, static binary analysis techniques have become increasingly difficult due to the code obfuscation and code packing employed when writing malware. For this reason, behavior-based analysis techniques are being used in large malware analysis systems. In prior art, a number of clustering and classification techniques have been used to classify malware into families and to identify new malware families from behavior reports. In this thesis, we analyze in detail the use of Profile Hidden Markov Models for the problem of malware classification and clustering. The ability to build accurate models from limited examples is very helpful in early detection and modeling of malware families. The thesis also revisits the learning setting of an Intrusion Detection System (IDS) that employs machine learning for identifying attacks and normal traffic. It substantiates the suitability of an incremental learning setting (or stream-based learning setting) for the problem of learning attack patterns in an IDS, when large volumes of data arrive in a stream. Related to the above problem, an elaborate survey of IDSs that use data mining and machine learning was done. Experimental evaluation and comparison show that, in terms of speed and accuracy, the stream-based algorithms perform very well as large volumes of data are presented for classification as attack or non-attack patterns. The possibilities for using stream algorithms in different security problems are elucidated in the conclusion.
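The stream-based setting can be illustrated with scikit-learn's partial_fit interface, which updates a linear classifier one batch at a time so the full traffic log never has to fit in memory. The flow features and the attack rule below are synthetic assumptions made purely for the sketch.

    # Sketch: incremental (stream) learning for intrusion detection.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(3)
    clf = SGDClassifier(random_state=0)
    classes = np.array([0, 1])          # 0 = normal traffic, 1 = attack

    for batch in range(10):             # each batch = one chunk of the stream
        X = rng.normal(size=(200, 8))   # stand-in for per-flow features
        y = (X[:, 0] + 0.5 * X[:, 1] > 0.7).astype(int)  # synthetic attack rule
        clf.partial_fit(X, y, classes=classes)
        print(f"batch {batch}: accuracy on batch = {clf.score(X, y):.2f}")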
17

Bharadwaj, Venkatesh. "Aural Mapping of STEM Concepts Using Literature Mining." 2013. http://hdl.handle.net/1805/3242.

Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Recent technological applications have made people's lives heavily dependent on Science, Technology, Engineering, and Mathematics (STEM) and its applications. Understanding basic science is a must in order to use and contribute to this technological revolution. Science education at the middle and high school levels, however, depends heavily on visual representations such as models, diagrams, figures, animations and presentations. This leaves visually impaired students with very few options to learn science and secure a career in STEM-related areas. Recent experiments have shown that small aural cues called Audemes are helpful for understanding and memorization of science concepts among visually impaired students. Audemes are non-verbal sound translations of a science concept. In order to present science concepts as Audemes for visually impaired students, this thesis presents an automatic system for audeme generation from STEM textbooks. It describes the systematic application of multiple Natural Language Processing tools and techniques, such as a dependency parser, a POS tagger, an information retrieval algorithm, semantic mapping of aural words, and machine learning, to transform a science concept into a combination of atomic sounds, thus forming an audeme. We present a rule-based classification method for all STEM-related concepts. This work also presents a novel way of mapping and extracting the most related sounds for the words used in a textbook. Additionally, machine learning methods are used in the system to guarantee the customization of output according to a user's perception. The system being presented is robust, scalable, fully automatic and dynamically adaptable for audeme generation.
18

Dale, Ashley S. "3D Object Detection Using Virtual Environment Assisted Deep Network Training." Thesis, 2021.

Abstract:

An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic data and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score of all classes and all epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score for synthetically trained networks is σ*_F1 = 0.015, compared to σ_F1 = 0.020 for the networks trained exclusively with real data. Various backgrounds in synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder, then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.

