To see the other types of publications on this topic, follow the link: Naive Bayes classifier.

Dissertations / Theses on the topic 'Naive Bayes classifier'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Naive Bayes classifier.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Wester, Philip. "Anomaly-based intrusion detection using Tree Augmented Naive Bayes Classifier." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-295754.

Full text
Abstract:
With the rise of information technology and the dependence on these systems, it becomes increasingly important to keep the systems secure. The ability to detect an intrusion with intrusion detection systems (IDS) is one of multiple fundamental technologies that may increase the security of a system. One of the bigger challenges for an IDS is to detect types of intrusions that have not previously been encountered, so-called unknown intrusions. These types of intrusions are generally detected by using methods collectively called anomaly detection methods. In this thesis I evaluate the performance of the Tree Augmented Naive Bayes classifier (TAN) as an intrusion detection classifier. More specifically, I created a TAN program from scratch in Python and tested the program on two data sets containing data traffic. The thesis aims to create a better understanding of how TAN works and to evaluate whether it is a reasonable algorithm for intrusion detection. The results show that TAN is able to perform at an acceptable level with a reasonably high accuracy. The results also highlight the importance of using the smoothing operator included in the standard version of TAN.
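The smoothing operator the abstract highlights can be illustrated with Laplace (add-one) smoothing, the standard way naive Bayes and TAN estimators avoid zero probabilities for feature values unseen in training. A minimal sketch in Python; the counts and protocol names are illustrative, not from the thesis:

```python
from collections import Counter

def smoothed_prob(counts, value, n_values, alpha=1.0):
    """P(X = value) with Laplace (add-alpha) smoothing over n_values categories."""
    total = sum(counts.values())
    return (counts.get(value, 0) + alpha) / (total + alpha * n_values)

# Without smoothing, a value unseen in training gets probability 0 and zeroes out
# the whole product of conditional probabilities; smoothing keeps it small but nonzero.
counts = Counter({"tcp": 8, "udp": 2})
p_seen = smoothed_prob(counts, "tcp", n_values=3)     # (8 + 1) / (10 + 3)
p_unseen = smoothed_prob(counts, "icmp", n_values=3)  # (0 + 1) / (10 + 3)
```

With `alpha=0` this reduces to the raw maximum-likelihood estimate, which is exactly what makes unseen values fatal for the classifier.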
APA, Harvard, Vancouver, ISO, and other styles
2

Eldud, Omer Ahmed Abdelkarim. "Prediction of protein secondary structure using binary classification trees, naive Bayes classifiers and the logistic regression classifier." Thesis, Rhodes University, 2016. http://hdl.handle.net/10962/d1019985.

Full text
Abstract:
The secondary structure of proteins is predicted using various binary classifiers. The data are taken from the RS126 database. The original data consist of protein primary and secondary structure sequences encoded using alphabetic letters; these are re-encoded into unary vectors comprising ones and zeros only. Different binary classifiers, namely naive Bayes, logistic regression and classification trees, are trained on the encoded data using hold-out and 5-fold cross-validation. For each of the classifiers three classification tasks are considered: helix against not helix (H/∼H), sheet against not sheet (S/∼S) and coil against not coil (C/∼C). The performance of these binary classifiers is compared using the overall accuracy in predicting the protein secondary structure for various window sizes. Our results indicate that hold-out validation achieved higher accuracy than 5-fold cross-validation. The naive Bayes classifier, using 5-fold cross-validation, achieved the lowest accuracy for predicting helix against not helix. The classification tree classifiers, using 5-fold cross-validation, achieved the lowest accuracies for both coil against not coil and sheet against not sheet. The logistic regression classifier's accuracy depends on the window size: there is a positive relationship between accuracy and window size. The logistic regression approach achieved the highest accuracy of the three classifiers for each classification task: 77.74 percent for helix against not helix, 81.22 percent for sheet against not sheet and 73.39 percent for coil against not coil. It is noted that classifiers would be easier to compare if the classification process could be carried out entirely in R. Alternatively, these logistic regression classifiers would be easier to assess if SPSS had a function to determine the accuracy of the logistic regression classifier.
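The unary (one-hot) encoding described above can be sketched as follows; building a fixed-length vector from a sliding window around each residue is the usual way such encodings feed a per-residue classifier (the sequence and window size here are illustrative):

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def one_hot(residue):
    """Unary vector: a single 1 at the residue's position in the alphabet."""
    return [1 if aa == residue else 0 for aa in ALPHABET]

def window_features(seq, center, half_width):
    """Concatenate one-hot vectors over a window around `center`;
    positions outside the sequence are padded with zeros."""
    feats = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(seq):
            feats.extend(one_hot(seq[i]))
        else:
            feats.extend([0] * len(ALPHABET))
    return feats

# A window of size 5 (half_width=2) yields a 5 * 20 = 100-dimensional binary vector;
# at position 0, the two left-hand positions fall outside the sequence and are zero-padded.
x = window_features("MKTAYIAK", center=0, half_width=2)
```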
APA, Harvard, Vancouver, ISO, and other styles
3

Koc, Levent. "Application of a Hidden Naive Bayes Multiclass Classifier in Network Intrusion Detection." The George Washington University, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Vallin, Simon. "Likelihood-based classification of single trees in hemi-boreal forests." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-99691.

Full text
Abstract:
Determining the species of individual trees is important for forest management. In this thesis we investigate whether it is possible to discriminate between Norway spruce, Scots pine and deciduous trees from airborne laser scanning data by using unique probability density functions estimated for each species. We estimate the probability density functions in three different ways: by fitting a beta distribution, by histogram density estimation and by kernel density estimation. All these methods classify single laser returns (and not segments of laser returns). The resulting classification is compared with a reference method based on features extracted from airborne laser scanning data. We measure how well a method performs by the overall accuracy, that is, the proportion of correctly predicted trees. The highest overall accuracy of the methods developed in this thesis, 83.4 percent, is obtained with histogram density estimation. This can be compared with the best result from the reference method, an overall accuracy of 84.1 percent. The fact that we achieve a high level of correctly classified trees indicates that it is possible to use these types of methods for identification of tree species.
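A minimal sketch of the histogram-based approach: estimate a density per species from training laser-return values, then assign a new return to the species whose density is highest at that value. The bin edges, feature values and species data below are invented for illustration:

```python
import bisect

def histogram_density(samples, edges):
    """Estimate a density over fixed bin edges from training samples."""
    counts = [0] * (len(edges) - 1)
    for s in samples:
        i = min(max(bisect.bisect_right(edges, s) - 1, 0), len(counts) - 1)
        counts[i] += 1
    widths = [edges[i + 1] - edges[i] for i in range(len(counts))]
    n = len(samples)
    return [c / (n * w) for c, w in zip(counts, widths)]

def classify(value, edges, densities_by_species):
    """Pick the species whose estimated density is highest at the observed value."""
    i = min(max(bisect.bisect_right(edges, value) - 1, 0), len(edges) - 2)
    return max(densities_by_species, key=lambda sp: densities_by_species[sp][i])

edges = [0.0, 0.25, 0.5, 0.75, 1.0]
dens = {
    "spruce": histogram_density([0.1, 0.2, 0.15, 0.3], edges),
    "pine": histogram_density([0.7, 0.8, 0.9, 0.6], edges),
}
label = classify(0.85, edges, dens)
```

The kernel and beta-distribution variants the thesis compares differ only in how the per-species density is estimated, not in this maximum-likelihood decision rule.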
APA, Harvard, Vancouver, ISO, and other styles
5

Warsitha, Tedy, and Robin Kammerlander. "Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188132.

Full text
Abstract:
A study was performed on Naive Bayes and Label Spreading methods applied as classifiers in a spam filter. In the testing procedure their predictive ability was observed and the results were compared in a McNemar test, leading to the discovery of the strengths and weaknesses of the chosen methods in an environment of varying training data. Though the results were inconclusive due to resource restrictions, the theory is discussed from various angles in order to provide a better understanding of the conditions that can lead to potentially different results between the chosen methods, opening up for improvement and further studies. The conclusion of this study is that a significant difference exists between the two classifiers in their ability to predict labels. On a secondary note, it is recommended to choose a classifier depending on available training data and computational power.
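The McNemar test used to compare the two classifiers needs only the counts of instances on which they disagree. A small sketch with illustrative disagreement counts (the thesis's actual counts are not given):

```python
def mcnemar_statistic(b, c):
    """McNemar chi-square statistic with continuity correction.

    b: cases classifier A got right and B got wrong;
    c: cases B got right and A got wrong.
    Cases where both agree do not enter the statistic."""
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

# Invented disagreement counts between the two spam filters' predictions.
stat = mcnemar_statistic(b=40, c=10)
# Compare against the chi-square critical value with 1 degree of freedom
# (3.84 at significance level 0.05).
significant = stat > 3.84
```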
APA, Harvard, Vancouver, ISO, and other styles
6

SILVA, Antonio Carlos de Castro da. "Reconhecimento automático de defeitos de fabricação em painéis TFT-LCD através de inspeção de imagem." Universidade Federal de Pernambuco, 2016. https://repositorio.ufpe.br/handle/123456789/17823.

Full text
Abstract:
The early detection of defects in the parts used in manufacturing assembly lines is crucial for assuring the good quality of the final product. Thus, this work presents a platform developed for automatically detecting manufacturing defects in TFT-LCD (Thin Film Transistor-Liquid Crystal Display) panels by image inspection. The developed platform is camera-based: the panel under inspection is positioned in a closed chamber to avoid interference from environmental light sources. The inspection steps encompass image acquisition by the cameras, setting the region of interest (frame detection), feature extraction, image analysis, classification of defects, and the decision to approve or reject the panel. Features are extracted from both the standard RGB and grayscale images. For each RGB component the pixel intensity is analyzed and its variance is calculated; a panel is rejected if the measured value deviates by 5% from the reference values. The classification is performed using the Naive Bayes algorithm. The results obtained show an accuracy rate of 94.23% in defect detection. Samsung (Manaus) is considering incorporating the platform described here into its mass production line.
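The per-channel variance check with a 5% tolerance can be sketched as follows; the pixel values and reference variances are illustrative, and the thesis's actual thresholding details may differ:

```python
def channel_variances(pixels):
    """Per-channel pixel-intensity variance of an RGB image given as (r, g, b) tuples."""
    n = len(pixels)
    variances = []
    for ch in range(3):
        vals = [p[ch] for p in pixels]
        mean = sum(vals) / n
        variances.append(sum((v - mean) ** 2 for v in vals) / n)
    return variances

def reject_panel(variances, reference, tolerance=0.05):
    """Reject if any channel's variance deviates from its reference by more than 5%."""
    return any(abs(v - r) > tolerance * r for v, r in zip(variances, reference))

# Tiny invented 3-pixel "image": red and blue vary, green is constant.
pixels = [(100, 100, 100), (110, 100, 90), (90, 100, 110)]
var_rgb = channel_variances(pixels)
```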
APA, Harvard, Vancouver, ISO, and other styles
7

Pyon, Yoon Soo. "Variant Detection Using Next Generation Sequencing Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=case1347053645.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Anderson, Michael P. "Bayesian classification of DNA barcodes." Diss., Manhattan, Kan. : Kansas State University, 2009. http://hdl.handle.net/2097/2247.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Lee, Jun won. "Relationships Among Learning Algorithms and Tasks." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/2478.

Full text
Abstract:
Metalearning aims to obtain knowledge of the relationship between the mechanism of learning and the concrete contexts in which that mechanism is applicable. As new mechanisms of learning are continually added to the pool of learning algorithms, the chances of encountering behavioral similarity among algorithms increase. Understanding the relationships among algorithms, and the interactions between algorithms and tasks, helps to narrow down the space of algorithms to search for a given learning task. In addition, this process helps to disclose factors contributing to the similar behavior of different algorithms. We first study general characteristics of learning tasks and their correlation with the performance of algorithms, isolating two metafeatures whose values are fairly distinguishable between easy and hard tasks. We then devise a new metafeature that measures the difficulty of a learning task independently of the performance of learning algorithms on it. Building on these preliminary results, we then investigate more formally how we might measure the behavior of algorithms at a finer-grained level than a simple dichotomy between easy and hard tasks. We prove that, among many possible candidates, the Classifier Output Difference (COD) measure is the only one possessing the properties of a metric necessary for further use in our proposed behavior-based clustering of learning algorithms. Finally, we cluster 21 algorithms based on COD and show the value of the clustering in 1) highlighting interesting behavioral similarity among algorithms, which leads us to a thorough comparison of Naive Bayes and Radial Basis Function Network learning, and 2) designing more accurate algorithm selection models by predicting clusters rather than individual algorithms.
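COD can be computed as the fraction of test instances on which two classifiers output different labels; as the normalised Hamming distance between the two output vectors it satisfies identity, symmetry and the triangle inequality. A minimal sketch with invented predictions:

```python
def cod(preds_a, preds_b):
    """Classifier Output Difference: fraction of instances on which two classifiers disagree."""
    assert len(preds_a) == len(preds_b)
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

# Invented label vectors for two classifiers over four test instances.
d = cod(["spam", "ham", "spam", "ham"],
        ["spam", "spam", "spam", "ham"])
```

A pairwise COD matrix over the 21 algorithms is then an ordinary distance matrix, which is what makes the behavior-based clustering in the thesis well defined.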
APA, Harvard, Vancouver, ISO, and other styles
10

Giunchi, Massimiliano. "Tecnologie per la gestione di big data: analisi della piattaforma Hadoop." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2017.

Find full text
Abstract:
The aim of this thesis is to examine Big Data and the technologies suited to handling them, with a specific focus on Hadoop, and to carry out an experiment that puts the first two points into practice. Regarding Big Data, an overview is given of their main characteristics, the sources that generate them and the opportunities they offer. As for the technologies for storing and processing Big Data, several solutions offered by the market are analysed: the most widespread is the Hadoop platform, implemented in various ways. Alternative systems for managing Big Data, such as NoSQL DBMSs, are also illustrated. The work continues with a detailed analysis of Hadoop: its distributed file system HDFS, the MapReduce paradigm and YARN, the resource manager. The experimental part proceeded in parallel with the theoretical study: the first step was to install Hadoop on a cluster. Since the goal was to analyse a data set coming from a typical Big Data source, Twitter was chosen, and the analysis undertaken was sentiment analysis. This required one tool to capture the data, one to process it and one to classify it: Flume and Hive covered the first two steps, while a naive Bayes classifier was used for the classification. Mahout is the framework library that contains several machine learning algorithms, including classification algorithms. The work continues with an explanation of the VSM model for representing documents as vectors, the TF-IDF algorithm for correctly weighting the constructed dictionary, and the statistical indices needed to evaluate the classifier's performance. Finally, the results obtained on the acquired data sets are presented.
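The TF-IDF weighting mentioned above can be sketched in a few lines. This is the textbook formulation (term frequency times the log of the inverse document frequency), not Mahout's exact implementation; the documents are invented:

```python
import math

def tf_idf(docs):
    """Weight each term in each document by tf * log(N / df)."""
    n = len(docs)
    df = {}  # number of documents containing each term
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {}
        for term in doc:
            tf = doc.count(term) / len(doc)          # relative term frequency
            w[term] = tf * math.log(n / df[term])    # discount terms common to many docs
        weights.append(w)
    return weights

docs = [["hadoop", "big", "data"],
        ["big", "data", "data"],
        ["twitter", "sentiment"]]
w = tf_idf(docs)
```

A term that appears in every document gets weight zero, which is exactly the "correct weighting of the dictionary" role TF-IDF plays before classification.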
APA, Harvard, Vancouver, ISO, and other styles
11

Kraus, Michal. "Zjednoznačňování slovních významů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235964.

Full text
Abstract:
The master's thesis deals with word sense disambiguation of Czech words. The reader is introduced to the history of the task and to the algorithms used: the naive Bayes classifier, the AdaBoost classifier, the maximum entropy method and decision trees, each of which is clearly demonstrated. The following parts describe the data used. The last part presents the results achieved and closes with some ideas for improving the system.
APA, Harvard, Vancouver, ISO, and other styles
12

Денисова, П. А., and P. A. Denisova. "COVID-19: Анализ эмоциональной окраски сообщений в социальных сетях (на материале сети «Twitter») : магистерская диссертация." Master's thesis, б. и, 2021. http://hdl.handle.net/10995/97958.

Full text
Abstract:
The work is devoted to the sentiment analysis of messages in the Twitter social network. The research material consisted of 818,224 messages retrieved for 17 keywords, of which 89,025 tweets contained the words "COVID-19" and "Coronavirus". In the first part, theoretical and methodological issues are considered: the concept of sentiment analysis is introduced and various approaches to text classification are analyzed. Particular attention is given to the naive Bayes classifier, which shows high accuracy. The features of sentiment analysis in social networks during epidemics and disease outbreaks are studied, and the procedure and algorithm for analyzing the sentiment of a text are described. Much attention is paid to sentiment analysis in Python using the TextBlob library; in addition, a SaaS (software as a service) tool is chosen that allows real-time sentiment analysis without the extensive machine learning and natural language processing experience that the Python approach requires. The second part of the study begins with sampling, i.e. the definition of the keywords by which the necessary tweets are searched for and exported. For this purpose the Coronavirus Corpus is used, designed to reflect the social, cultural and economic consequences of the coronavirus (COVID-19) in 2020 and beyond. The dynamics of topic-word usage during 2020 is analyzed and an analogy is drawn between the frequency of their usage and the events taking place. Next, the selected keywords are used to search for tweets and, based on the data obtained, sentiment analysis of the messages is carried out using the TextBlob Python library, created for processing textual data, and the Brand24 online service. A comparison of these tools shows similar results.
The study helps to understand, quickly and in real time, public sentiment about the COVID-19 outbreak, thereby contributing to the understanding of developing events. This work can also serve as a model for determining the emotional state of Internet users in other situations.
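Lexicon-based polarity scoring of the kind TextBlob performs can be sketched as an average of per-word scores. The mini-lexicon below is invented and far smaller than TextBlob's pattern lexicon, so this only illustrates the idea:

```python
# Hypothetical mini-lexicon; TextBlob itself ships a much larger one.
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0, "outbreak": -0.5}

def polarity(text):
    """Average lexicon score of the known words; roughly in [-1, 1], 0.0 if none known."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

p = polarity("Great response to a terrible outbreak")
```

With TextBlob the call would instead be `TextBlob(text).sentiment.polarity`, but the averaging principle is the same.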
APA, Harvard, Vancouver, ISO, and other styles
13

Petřík, Patrik. "Predikce vývoje akciového trhu prostřednictvím technické a psychologické analýzy." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2010. http://www.nusl.cz/ntk/nusl-222507.

Full text
Abstract:
This work deals with stock market prediction via technical and psychological analysis. We introduce the theoretical foundations of technical and psychological analysis, as well as some methods of artificial intelligence, especially neural networks and genetic algorithms. We design a system for stock market prediction, implement and test part of the system, and conclude by discussing the results.
APA, Harvard, Vancouver, ISO, and other styles
14

Khan, Syeduzzaman. "A PROBABILISTIC MACHINE LEARNING FRAMEWORK FOR CLOUD RESOURCE SELECTION ON THE CLOUD." Scholarly Commons, 2020. https://scholarlycommons.pacific.edu/uop_etds/3720.

Full text
Abstract:
The execution of scientific applications on the Cloud comes with great flexibility, scalability, cost-effectiveness, and substantial computing power. Market-leading Cloud service providers such as Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP) offer various general-purpose, memory-intensive, and compute-intensive Cloud instances for the execution of scientific applications. The scientific community, especially small research institutions and undergraduate universities, faces many hurdles while conducting high-performance computing research in the absence of large dedicated clusters. The Cloud provides a lucrative alternative to dedicated clusters; however, the wide range of Cloud computing choices makes instance selection difficult for end-users. This thesis aims to simplify Cloud instance selection by proposing a probabilistic machine learning framework that helps users select a suitable Cloud instance for their scientific applications. This research builds on the previously proposed A2Cloud-RF framework, which recommends high-performing Cloud instances by profiling the application and the selected Cloud instances. The framework produces a set of objective scores, called the A2Cloud scores, which denote the compatibility level between the application and the selected Cloud instances. When used alone, the A2Cloud scores become increasingly unwieldy as the number of tested Cloud instances grows. Additionally, the framework only examines raw application performance and does not consider execution cost to guide resource selection. To improve the usability of the framework and assist with economical instance selection, this research adds two Naïve Bayes (NB) classifiers that consider both the application's performance and its execution cost: 1) NB with a Random Forest Classifier (RFC) and 2) a standalone NB module.
Naïve Bayes with a Random Forest Classifier (RFC) augments the A2Cloud-RF framework's final instance ratings with the execution cost metric. In the training phase, the classifier builds the frequency and probability tables, and it recommends the Cloud instance with the highest posterior probability for the selected application. The standalone NB classifier uses the generated A2Cloud score (an intermediate result from the A2Cloud-RF framework) and the execution cost metric to construct an NB classifier, forming a frequency table and probability (prior and likelihood) tables. To recommend a Cloud instance for a test application, the classifier calculates the posterior probability of every Cloud instance and recommends the one with the highest value. This study executes eight real-world applications on 20 Cloud instances from AWS, Azure, GCP, and Linode. We train the NB classifiers using 80% of this dataset and employ the remaining 20% for testing. The testing yields more than 90% recommendation accuracy for the chosen applications and Cloud instances. Because of the imbalanced nature of the dataset and the multi-class nature of the classification, we report the confusion matrix (true positives, false positives, true negatives, and false negatives) and an F1 score above 0.9 to describe the model performance. The final goal of this research is to make Cloud computing an accessible resource for conducting high-performance scientific executions by enabling users to select an effective Cloud instance across multiple providers.
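The posterior-based recommendation step can be sketched as follows; the instance names, priors and likelihoods are invented for illustration and do not come from the thesis:

```python
def recommend(priors, likelihoods, evidence):
    """Pick the class with the highest (unnormalised) posterior:
    prior * product of per-feature likelihoods, as in a naive Bayes table lookup."""
    best, best_score = None, -1.0
    for cls, prior in priors.items():
        score = prior
        for feature, value in evidence.items():
            # Unseen (feature, value) pairs get a tiny probability instead of zero.
            score *= likelihoods[cls].get((feature, value), 1e-9)
        if score > best_score:
            best, best_score = cls, score
    return best

# Hypothetical instance types and probability tables.
priors = {"c5.xlarge": 0.5, "m5.xlarge": 0.5}
likelihoods = {
    "c5.xlarge": {("a2cloud_score", "high"): 0.8, ("cost", "low"): 0.3},
    "m5.xlarge": {("a2cloud_score", "high"): 0.4, ("cost", "low"): 0.7},
}
pick = recommend(priors, likelihoods, {"a2cloud_score": "high", "cost": "low"})
```

Here the cost likelihood outweighs the performance score, so the cheaper-looking instance wins, mirroring how the thesis's NB module trades raw performance against execution cost.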
APA, Harvard, Vancouver, ISO, and other styles
15

Drábek, Matěj. "Využití vybraných metod strojového učení pro modelování kreditního rizika." Master's thesis, Vysoká škola ekonomická v Praze, 2017. http://www.nusl.cz/ntk/nusl-360509.

Full text
Abstract:
This master's thesis is divided into three parts. In the first part I describe P2P lending, its characteristics, basic concepts and practical implications, and compare the P2P markets in the Czech Republic, the UK and the USA. The second part covers the theoretical basics of the chosen machine learning methods: the naive Bayes classifier, classification trees, random forests and logistic regression. I also describe methods to evaluate the quality of the classification models listed above. The third part is practical and shows the complete workflow of creating a classification model, from data preparation to model evaluation.
APA, Harvard, Vancouver, ISO, and other styles
16

Helmersson, Benjamin. "Definition Extraction From Swedish Technical Documentation : Bridging the gap between industry and academy approaches." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-131057.

Full text
Abstract:
Terminology is concerned with the creation and maintenance of concept systems, terms and definitions. Automatic term and definition extraction is used to simplify this otherwise manual and sometimes tedious process. This thesis presents an integrated approach of pattern matching and machine learning, utilising feature vectors in which each feature is a Boolean function of a regular expression. The integrated approach is compared with the two more classic approaches, showing a significant increase in recall while maintaining a comparable precision score. Less promising is the negative correlation between the performance of the integrated approach and training size. Further research is suggested.
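A feature vector in which each feature is a Boolean function of a regular expression can be sketched like this; the patterns are hypothetical definition cues, not the thesis's actual expressions (which target Swedish technical documentation):

```python
import re

# Hypothetical definition-cue patterns; one Boolean feature per pattern.
PATTERNS = [
    re.compile(r"\bis defined as\b"),
    re.compile(r"\brefers to\b"),
    re.compile(r"\bi\.e\."),
]

def feature_vector(sentence):
    """True/False per pattern: does the regular expression match the sentence?"""
    return [bool(p.search(sentence)) for p in PATTERNS]

v = feature_vector("A term is defined as a lexical unit, i.e. a word or phrase.")
```

Such vectors let a standard classifier learn weights over the patterns, which is the "integrated" part of combining pattern matching with machine learning.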
APA, Harvard, Vancouver, ISO, and other styles
17

Hrach, Vlastimil. "Využití prostředků umělé inteligence na kapitálových trzích." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2011. http://www.nusl.cz/ntk/nusl-222912.

Full text
Abstract:
The diploma thesis deals with the use of artificial intelligence for predictions on stock markets. The prediction is unconventionally based on Bayes' theorem and the naive Bayes classifier built on it. In the practical part, an algorithm is designed that uses recognized relations between technical analysis indicators; concretely, exponential moving averages over 20 and 50 days were used. The program's output is a graphical forecast of future stock development, built on the classification of the relations between the indicators.
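The exponential moving averages underlying the designed algorithm can be sketched as follows; short spans and invented prices stand in for the 20- and 50-day averages so the example stays small:

```python
def ema(prices, span):
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    out = [prices[0]]  # seed with the first price
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

prices = [10.0, 10.5, 11.0, 10.8, 11.2]
fast = ema(prices, span=2)  # stands in for the 20-day average
slow = ema(prices, span=4)  # stands in for the 50-day average
# A fast average above the slow one is the classic bullish crossover signal.
bullish = fast[-1] > slow[-1]
```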
APA, Harvard, Vancouver, ISO, and other styles
18

Mackových, Marek. "Analýza experimentálních EKG." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2016. http://www.nusl.cz/ntk/nusl-241981.

Full text
Abstract:
This thesis is focused on the analysis of experimental ECG records acquired from isolated rabbit hearts and aims to describe changes in the ECG caused by ischemia and left ventricular hypertrophy. It consists of a theoretical analysis of the problems in evaluating the ECG during ischemia and hypertrophy, and a description of experimental ECG recording. The theoretical part is followed by a practical section which describes the method for calculating morphological parameters, followed by ROC analysis to evaluate their suitability for the classification of hypertrophy; the final part is focused on classification.
APA, Harvard, Vancouver, ISO, and other styles
19

Guňka, Jiří. "Adaptivní klient pro sociální síť Twitter." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-237052.

Full text
Abstract:
The goal of this term project is to create a user-friendly Twitter client. It may use machine learning methods such as the naive Bayes classifier to highlight new tweets of interest to the user. Hyperbolic trees and other methods will be used to visualize these tweets.
APA, Harvard, Vancouver, ISO, and other styles
20

Maršánová, Lucie. "Analýza experimentálních EKG záznamů." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2015. http://www.nusl.cz/ntk/nusl-221365.

Full text
Abstract:
This diploma thesis deals with the analysis of experimental electrograms (EG) recorded from isolated rabbit hearts. The theoretical part is focused on the basic principles of electrocardiography, pathological events in ECGs, automatic classification of ECGs and experimental cardiological research. The practical part deals with the manual classification of individual pathological events – these results will be presented in the database of EG records which is currently under development at the Department of Biomedical Engineering at BUT. Manual scoring of the data was discussed with experts. After that, the presence of pathological events within particular experimental periods was described and the influence of ischemia on the heart's electrical activity was reviewed. In the last part, morphological parameters calculated from EG beats were statistically analysed with Kruskal-Wallis and Tukey-Kramer tests as well as principal component analysis (PCA), and used as classification features to automatically classify four types of beats. Classification was realized with four approaches: discriminant function analysis, k-Nearest Neighbours, support vector machines, and the naive Bayes classifier.
APA, Harvard, Vancouver, ISO, and other styles
21

Ekdahl, Magnus. "Approximations of Bayes Classifiers for Statistical Learning of Clusters." Licentiate thesis, Linköping : Linköpings universitet, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-5856.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Nálevka, Petr. "Improving Efficiency of Prevention in Telemedicine." Doctoral thesis, Vysoká škola ekonomická v Praze, 2010. http://www.nusl.cz/ntk/nusl-113299.

Full text
Abstract:
This thesis employs data-mining techniques and modern information and communication technology to develop methods which may improve the efficiency of prevention-oriented telemedical programs. In particular, it uses the ITAREPS program as a case study and demonstrates that an extension of the program based on the proposed methods may significantly improve the program's efficiency. ITAREPS itself is a state-of-the-art telemedical program operating since 2006. It has been deployed in 8 different countries around the world, and in the Czech Republic alone it has helped prevent schizophrenic relapse in over 400 participating patients. The outcomes of this thesis are widely applicable not just to schizophrenic patients but also to other psychotic or non-psychotic diseases which follow a relapsing course and satisfy certain preconditions defined in this thesis. Two main areas of improvement are proposed. First, this thesis studies various temporal data-mining methods to improve relapse prediction efficiency based on diagnostic data history. Second, the latest telecommunication technologies are used in order to improve the quality of the gathered diagnostic data directly at the source.
APA, Harvard, Vancouver, ISO, and other styles
23

Sjöqvist, Hugo. "Classifying Forest Cover type with cartographic variables via the Support Vector Machine, Naive Bayes and Random Forest classifiers." Thesis, Örebro universitet, Handelshögskolan vid Örebro Universitet, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-58384.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Sheppard, Sarah E. "Application of a Naïve Bayes Classifier to Assign Polyadenylation Sites from 3' End Deep Sequencing Data: A Dissertation." eScholarship@UMMS, 2013. http://escholarship.umassmed.edu/gsbs_diss/653.

Full text
Abstract:
Cleavage and polyadenylation of a precursor mRNA is important for transcription termination, mRNA stability, and regulation of gene expression. This process is directed by a multitude of protein factors and cis elements in the pre-mRNA sequence surrounding the cleavage and polyadenylation site. Importantly, the location of the cleavage and polyadenylation site helps define the 3’ untranslated region of a transcript, which is important for regulation by microRNAs and RNA binding proteins. Additionally, these sites have generally been poorly annotated. To identify 3’ ends, many techniques utilize an oligo-dT primer to construct deep sequencing libraries. However, this approach can lead to identification of artifactual polyadenylation sites due to internal priming in homopolymeric stretches of adenines. Previously, simple heuristic filters relying on the number of adenines in the genomic sequence downstream of a putative polyadenylation site have been used to remove these sites of internal priming. However, these simple filters may not remove all sites of internal priming and may also exclude true polyadenylation sites. Therefore, I developed a naïve Bayes classifier to identify putative sites from oligo-dT primed 3’ end deep sequencing as true or false/internally primed. Notably, this algorithm uses a combination of sequence elements to distinguish between true and false sites. Finally, the resulting algorithm is highly accurate in multiple model systems and facilitates identification of novel polyadenylation sites.
APA, Harvard, Vancouver, ISO, and other styles
25

Trevino, Alberto. "Improving Filtering of Email Phishing Attacks by Using Three-Way Text Classifiers." BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/3103.

Full text
Abstract:
The Internet has been plagued with endless spam for over 15 years. However, in the last five years spam has morphed from an annoying advertising tool into a social engineering attack vector. Much of today's unwanted email tries to deceive users into replying with passwords or bank account information, or into visiting malicious sites which steal login credentials and spread malware. These email-based attacks are known as phishing attacks. Much has been published about these attacks, which try to appear legitimate not only to users but also to spam filters. Several sources indicate traditional content filters have a hard time detecting phishing attacks because the emails lack the traditional features and characteristics of spam messages. This thesis tests the hypothesis that by separating the messages into three categories (ham, spam and phish), content filters will yield better filtering performance. Even though experimentation showed that three-way classification did not improve performance, several additional premises were tested, including the validity of the claim that phishing emails are too similar to legitimate emails, and the ability of Naive Bayes classifiers to properly classify emails.
APA, Harvard, Vancouver, ISO, and other styles
26

Sandberg, Sebastian. "Identifying Hateful Text on Social Media with Machine Learning Classifiers and Normalization Methods - Using Support Vector Machines and Naive Bayes Algorithm." Thesis, Umeå universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-155353.

Full text
Abstract:
Hateful content on social media is a growing problem. In this thesis, machine learning algorithms and pre-processing methods have been combined in order to train classifiers to identify hateful text on social media. The combinations have been compared in terms of performance, where the considered performance criteria are F-score and classification accuracy. Training is performed using the Naive Bayes algorithm (NB) and Support Vector Machines (SVM). The pre-processing techniques used are tokenization and normalization. For tokenization, an open-source unigram tokenizer has been used, while a normalization model that normalizes each tweet before classification has been developed in Java. Normalization includes basic clean-up methods such as removing stop words, URLs, and punctuation, as well as altering methods such as emoticon conversion and spell checking. Both binary and multi-class versions of the classifiers have been used on balanced and unbalanced data. Both machine learning algorithms perform at a reasonable level, with accuracy between 76.70% and 93.55% and an F-score between 0.766 and 0.935. The results indicate that the main purpose of normalization is to reduce noise, that balancing data is necessary, and that SVM seems to slightly outperform NB.
APA, Harvard, Vancouver, ISO, and other styles
27

Polák, Michael Adam. "Identifikace zařízení na základě jejich chování v síti." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2020. http://www.nusl.cz/ntk/nusl-417243.

Full text
Abstract:
This thesis deals with the identification of network devices based on their behaviour in the network. With the ever-increasing number of devices on a network, the ability to identify devices is increasingly important for security reasons. The thesis further discusses the fundamentals of computer networks and the methods that have been used in the past to identify network devices. Subsequently, the algorithms used in machine learning are described, together with their advantages and disadvantages. Finally, the thesis tests two traditional machine learning algorithms and proposes two new approaches to the identification of network devices. The resulting algorithm proposed in this thesis achieves 89% accuracy in identifying network devices on a real dataset with more than 10,000 devices.
APA, Harvard, Vancouver, ISO, and other styles
28

Gaspareto, Marinaldo José. "SELEÇÃO DE ATRIBUTOS EM IMAGENS COLETADAS SOB CONDIÇÕES DE ILUMINAÇÃO NÃO CONTROLADA E SUA INFLUÊNCIA NO DESEMPENHO DE CLASSIFICADORES NAIVE BAYES PARA IDENTIFICAÇÃO DE OBJETOS EM ESTUFAS AGRÍCOLAS." UNIVERSIDADE ESTADUAL DE PONTA GROSSA, 2013. http://tede2.uepg.br/jspui/handle/prefix/172.

Full text
Abstract:
A problem in the implementation of navigation systems for autonomous mobile robots is detecting the objects of interest and the obstacles present in the environment. This study considers the detection of the walls/low walls of agricultural greenhouses in digital images obtained without illumination control. The proposed approach employs digital image processing and digital classification techniques to detect the object of interest. The digital classifier developed was of the Naive Bayes type. Two important issues when employing classification methods in computer vision are the accuracy of the classifier and the time complexity of the computation. The selection of the descriptor attributes that compose a classifier has a great impact on these two factors; in general, the fewer attributes are required, the lower the computational cost. With this in mind, this study compared the performance of two feature selection methods based on principal component analysis, named B2 and B4, in two scenarios. In the first scenario, feature selection was conducted on all the data extracted from all images. In the second, selection was performed for images grouped by similarity. After selection, the attributes selected by each approach were used to construct Naive Bayes classifiers with 12, 17, 22 and 27 input variables. The results indicate that grouping images is useful when: (a) the distance from the centre of the group to the centre of the original database exceeds a threshold and (b) the correlation between the descriptor variables and the target variable is greater in the group than in the complete dataset. Keywords: Greenhouses, Autonomous navigation, Attribute selection, Naive Bayes classifiers.
Um problema relativo à implementação de sistemas de navegação para robôs autônomos móveis é a detecção dos objetos de interesse e dos obstáculos que estão no ambiente. Este trabalho considera a detecção das paredes/muretas de estufas agrícolas em imagens digitais adquiridas sem controle de iluminação. A abordagem proposta emprega técnicas de processamento digital de imagens e classificação digital para detectar o objeto de interesse. O classificador digital desenvolvido foi do tipo Naive Bayes. Duas questões importantes quando do emprego de métodos de classificação em visão computacional são a acurácia do classificador e a complexidade de tempo de computação. A seleção dos atributos descritores que compõem um classificador tem grande impacto sobre estes dois fatores, de um modo geral, quanto menos atributos forem necessários, menor o custo computacional. Considerando isso, este trabalho comparou o desempenho de dois métodos de seleção de atributos baseados na análise de componentes principais, chamados B2 e B4 em duas situações. Na primeira situação, a seleção de atributos foi realizada sobre o conjunto dos dados extraídos de todas as imagens. Na segunda, a seleção foi realizada para imagens agrupadas por similaridade. Após a seleção, os atributos selecionados em cada uma das abordagens foram usados para construir classificadores do tipo Naive Bayes com 12, 17, 22 e 27 variáveis de entrada. Os resultados indicam que o agrupamento de imagens é útil quando: (a) a distância do centro do grupo ao centro da base original ultrapassa um limiar e (b) a correlação entre as variáveis descritoras e a variável meta é maior no grupo do que no conjunto completo de dados.
APA, Harvard, Vancouver, ISO, and other styles
29

Marin, Rodenas Alfonso. "Comparison of Automatic Classifiers’ Performances using Word-based Feature Extraction Techniques in an E-government setting." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-32363.

Full text
Abstract:
Nowadays email is commonly used by citizens to communicate with their government. Among the received emails, governments deal with some common queries and subjects which handling officers have to answer manually. Automatic classification of the incoming emails increases communication efficiency by decreasing the delay between a query and its response. This thesis is part of the IMAIL project, which aims to provide an automatic answering solution for the Swedish Social Insurance Agency (SSIA) ("Försäkringskassan" in Swedish). The goal of this thesis is to analyze and compare the classification performance of different sets of features extracted from SSIA emails using different automatic classifiers. The features extracted from the emails also depend on the preprocessing that is carried out beforehand. Compound splitting, lemmatization, stop word removal, Part-of-Speech tagging and n-grams are the processes used on the data set. Moreover, classification is performed using Support Vector Machines, k-Nearest Neighbors and Naive Bayes. For the analysis and comparison of the results, precision, recall and F-measure are used. From the results obtained in this thesis, SVM provides the best classification, with an F-measure value of 0.787. However, Naive Bayes provides a better classification than SVM for most of the email categories, so it cannot be concluded whether SVM classifies better than Naive Bayes or not. Furthermore, a comparison to Dalianis et al. (2011) is made. The results obtained in this approach outperformed the results obtained before: SVM provided an F-measure value of 0.858 when using PoS-tagging on the original emails, improving by almost 3% on the 0.83 obtained in Dalianis et al. (2011). In this case, SVM was clearly better than Naive Bayes.
APA, Harvard, Vancouver, ISO, and other styles
30

Margold, Tomáš. "Klasifikace příspěvků ve webových diskusích." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235908.

Full text
Abstract:
This thesis deals with text ranking on the internet. It describes the available methods for the classification and splitting of text reports. Part of this thesis is the implementation of the naive Bayes algorithm and of a classifier using neural networks. The selected methods are compared with respect to their error rate and other ranking characteristics.
APA, Harvard, Vancouver, ISO, and other styles
31

Michel, David. "All Negative on the Western Front: Analyzing the Sentiment of the Russian News Coverage of Sweden with Generic and Domain-Specific Multinomial Naive Bayes and Support Vector Machines Classifiers." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447398.

Full text
Abstract:
This thesis explores to what extent Multinomial Naive Bayes (MNB) and Support Vector Machines (SVM) classifiers can be used to determine the polarity of news, specifically the news coverage of Sweden by the Russian state-funded news outlets RT and Sputnik. Three experiments are conducted.  In the first experiment, an MNB and an SVM classifier are trained with the Large Movie Review Dataset (Maas et al., 2011) with a varying number of samples to determine how training data size affects classifier performance.  In the second experiment, the classifiers are trained with 300 positive, negative, and neutral news articles (Agarwal et al., 2019) and tested on 95 RT and Sputnik news articles about Sweden (Bengtsson, 2019) to determine if the domain specificity of the training data outweighs its limited size.  In the third experiment, the movie-trained classifiers are put up against the domain-specific classifiers to determine if well-trained classifiers from another domain perform better than relatively untrained, domain-specific classifiers.  Four different types of feature sets (unigrams, unigrams without stop words removal, bigrams, trigrams) were used in the experiments. Some of the model parameters (TF-IDF vs. feature count and SVM’s C parameter) were optimized with 10-fold cross-validation.  Other than the superior performance of SVM, the results highlight the need for comprehensive and domain-specific training data when conducting machine learning tasks, as well as the benefits of feature engineering, and to a limited extent, the removal of stop words. Interestingly, the classifiers performed the best on the negative news articles, which made up most of the test set (and possibly of Russian news coverage of Sweden in general).
APA, Harvard, Vancouver, ISO, and other styles
32

Dočekal, Martin. "Porovnání klasifikačních metod." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403211.

Full text
Abstract:
This thesis deals with a comparison of classification methods. At first, these classification methods based on machine learning are described, then a classifier comparison system is designed and implemented. The thesis also describes some classification tasks and datasets on which the designed system will be tested. The evaluation of the classification tasks is done according to standard metrics. The thesis also presents the design and implementation of a classifier based on the principle of evolutionary algorithms.
APA, Harvard, Vancouver, ISO, and other styles
33

Hátle, Lukáš. "Využití Bayesovských sítí pro predikci korporátních bankrotů." Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-192331.

Full text
Abstract:
The aim of this study is to evaluate the feasibility of using Bayes classifiers for predicting corporate bankruptcies. The results obtained show that Bayes classifiers reach results comparable to the more commonly used methods such as logistic regression and decision trees. The comparison has been carried out on Czech and Polish data sets. The overall accuracy of these so-called naive Bayes classifiers, using entropic discretization along with hybrid pre-selection of the explanatory attributes, reaches 77.19% for the Czech dataset and 79.76% for the Polish one. The AUC values for these data sets are 0.81 and 0.87, respectively. The results obtained for the Polish data set have been compared to the articles already published by Tsai (2009) and Wang et al. (2014), who applied different classification algorithms. The method proposed in my study, when compared to the above earlier works, comes out as quite successful. The thesis also includes a comparison of various approaches to the discretization of numerical attributes and the selection of relevant explanatory attributes; these are the key issues for increasing the performance of naive Bayes classifiers.
APA, Harvard, Vancouver, ISO, and other styles
34

Ochuko, Rita E. "E-banking operational risk assessment. A soft computing approach in the context of the Nigerian banking industry." Thesis, University of Bradford, 2012. http://hdl.handle.net/10454/5733.

Full text
Abstract:
This study investigates E-banking Operational Risk Assessment (ORA) to enable the development of a new ORA framework and methodology. The general view is that E-banking systems have modified some of the traditional banking risks, particularly Operational Risk (OR), as suggested by the Basel Committee on Banking Supervision in 2003. In addition, recent E-banking financial losses together with risk management principles and standards raise the need for an effective ORA methodology and framework in the context of E-banking. Moreover, evaluation tools and/or methods for ORA are highly subjective, are still in their infancy, and have not yet reached a consensus. Therefore, it is essential to develop valid and reliable methods for effective ORA and evaluations. The main contribution of this thesis is to apply a Fuzzy Inference System (FIS) and a Tree Augmented Naïve Bayes (TAN) classifier as standard tools for identifying OR and measuring OR exposure level. In addition, a new ORA methodology is proposed which consists of four major steps: a risk model, an assessment approach, an analysis approach and a risk assessment process. Further, a new ORA framework and measurement metrics are proposed with six factors: frequency of the triggering event, effectiveness of avoidance barriers, frequency of the undesirable operational state, effectiveness of recovery barriers before the risk outcome, approximate cost of an Undesirable Operational State (UOS) occurrence, and severity of the risk outcome. The study results were reported based on surveys conducted with Nigerian senior banking officers and banking customers. The study revealed that the framework and assessment tools gave good predictions for risk learning and inference in such systems. Thus, the results obtained can be considered promising and useful for both E-banking system adopters and future researchers in this area.
APA, Harvard, Vancouver, ISO, and other styles
35

Tully, Philip. "Spike-Based Bayesian-Hebbian Learning in Cortical and Subcortical Microcircuits." Doctoral thesis, KTH, Beräkningsvetenskap och beräkningsteknik (CST), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-205568.

Full text
Abstract:
Cortical and subcortical microcircuits are continuously modified throughout life. Despite ongoing changes these networks stubbornly maintain their functions, which persist although destabilizing synaptic and nonsynaptic mechanisms should ostensibly propel them towards runaway excitation or quiescence. What dynamical phenomena exist to act together to balance such learning with information processing? What types of activity patterns do they underpin, and how do these patterns relate to our perceptual experiences? What enables learning and memory operations to occur despite such massive and constant neural reorganization? Progress towards answering many of these questions can be pursued through large-scale neuronal simulations.    In this thesis, a Hebbian learning rule for spiking neurons inspired by statistical inference is introduced. The spike-based version of the Bayesian Confidence Propagation Neural Network (BCPNN) learning rule involves changes in both synaptic strengths and intrinsic neuronal currents. The model is motivated by molecular cascades whose functional outcomes are mapped onto biological mechanisms such as Hebbian and homeostatic plasticity, neuromodulation, and intrinsic excitability. Temporally interacting memory traces enable spike-timing dependence, a stable learning regime that remains competitive, postsynaptic activity regulation, spike-based reinforcement learning and intrinsic graded persistent firing levels.    The thesis seeks to demonstrate how multiple interacting plasticity mechanisms can coordinate reinforcement, auto- and hetero-associative learning within large-scale, spiking, plastic neuronal networks. Spiking neural networks can represent information in the form of probability distributions, and a biophysical realization of Bayesian computation can help reconcile disparate experimental observations.


APA, Harvard, Vancouver, ISO, and other styles
36

Ochuko, Rita Erhovwo. "E-banking operational risk assessment : a soft computing approach in the context of the Nigerian banking industry." Thesis, University of Bradford, 2012. http://hdl.handle.net/10454/5733.

Full text
Abstract:
This study investigates E-banking Operational Risk Assessment (ORA) to enable the development of a new ORA framework and methodology. The general view is that E-banking systems have modified some of the traditional banking risks, particularly Operational Risk (OR), as suggested by the Basel Committee on Banking Supervision in 2003. In addition, recent E-banking financial losses together with risk management principles and standards raise the need for an effective ORA methodology and framework in the context of E-banking. Moreover, evaluation tools and/or methods for ORA are highly subjective, are still in their infancy, and have not yet reached a consensus. Therefore, it is essential to develop valid and reliable methods for effective ORA and evaluations. The main contribution of this thesis is to apply a Fuzzy Inference System (FIS) and a Tree Augmented Naïve Bayes (TAN) classifier as standard tools for identifying OR and measuring OR exposure level. In addition, a new ORA methodology is proposed which consists of four major steps: a risk model, an assessment approach, an analysis approach and a risk assessment process. Further, a new ORA framework and measurement metrics are proposed with six factors: frequency of the triggering event, effectiveness of avoidance barriers, frequency of the undesirable operational state, effectiveness of recovery barriers before the risk outcome, approximate cost of an Undesirable Operational State (UOS) occurrence, and severity of the risk outcome. The study results were reported based on surveys conducted with Nigerian senior banking officers and banking customers. The study revealed that the framework and assessment tools gave good predictions for risk learning and inference in such systems. Thus, the results obtained can be considered promising and useful for both E-banking system adopters and future researchers in this area.
APA, Harvard, Vancouver, ISO, and other styles
37

Блінков, Євген Миколайович. "Інформаційна технологія визначення тональності текстів." Bachelor's thesis, КПІ ім. Ігоря Сікорського, 2020. https://ela.kpi.ua/handle/123456789/39700.

Full text
Abstract:
Структура та обсяг роботи. Пояснювальна записка дипломного проекту складається з п’яти розділів, містить 26 рисунків, 7 таблиць, 1 додаток, 17 джерел. Дипломний проект присвячений автоматизації процесів проведення аналізу тональності текстів, застосовуючи різні алгоритми та порівняння ефективності цих алгоритмів. В даному проекті описуються методи визначення тональності текстів та способи їх застосування при розробці інформаційної технології визначення тональності текстів. У розділі інформаційного забезпечення були наведені набори вхідних та вихідних даних, а також описано їх формат, структуру та призначення у програмному продукті. Розділ математичного забезпечення, насамперед, присвячений опису змістовної та математичної постановок задачі, а також ключових методів розв’язання поставленої задачі. Крім цього, наводиться обґрунтування вибору даних методів для їх реалізації у програмному продукті. У розділі програмного забезпечення наводяться засоби розробки, які були використані у даному програмному продукті, а також описується принцип роботи веб-застосування у вигляді різних діаграм. У технологічному розділі наводиться керівництво користувачу при користуванні даною програмою, а також описуються результати випробувань.
Structure and scope of the work. The explanatory note of the diploma project consists of five sections and contains 26 figures, 7 tables, 1 appendix and 17 sources. The diploma project is devoted to automating the sentiment analysis of texts, applying various algorithms and comparing the efficiency of these algorithms. The project describes methods of text sentiment analysis and the principles of their application in the development of an information technology for text sentiment analysis. The information support section provides the sets of input and output data and describes their format, structure and purpose in the software product. The mathematical support section is primarily devoted to the description of the substantive and mathematical formulations of the problem, as well as the key methods for solving it; in addition, it gives the rationale for choosing these methods for implementation in the software product. The software section lists the development tools that were used in this software product and describes how the web application works in the form of various diagrams. The technological section provides user guidance for using the program and describes the test results.
APA, Harvard, Vancouver, ISO, and other styles
38

Heidfors, Filip, and Elias Moltedo. "Maskininlärning: avvikelseklassificering på sekventiell sensordata. En jämförelse och utvärdering av algoritmer för att klassificera avvikelser i en miljövänlig IoT produkt med sekventiell sensordata." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20742.

Full text
Abstract:
Ett företag har tagit fram en miljövänlig IoT produkt med sekventiell sensordata och vill genom maskininlärning kunna klassificera avvikelser i sensordatan. Det har genom åren utvecklats ett flertal väl fungerande algoritmer för klassificering men det finns emellertid ingen algoritm som fungerar bäst för alla olika problem. Syftet med det här arbetet var därför att undersöka, jämföra och utvärdera olika klassificerare inom "supervised machine learning" för att ta reda på vilken klassificerare som ger högst träffsäkerhet att klassificera avvikelser i den typ av IoT produkt som företaget tagit fram. Genom en litteraturstudie tog vi först reda på vilka klassificerare som vanligtvis använts och fungerat bra i tidigare vetenskapliga arbeten med liknande applikationer. Vi kom fram till att jämföra och utvärdera Random Forest, Naïve Bayes klassificerare och Support Vector Machines ytterligare. Vi skapade sedan ett dataset på 513 exempel som vi använde för träning och validering för respektive klassificerare. Resultatet visade att Random Forest hade betydligt högre träffsäkerhet med 95,7% jämfört med Naïve Bayes klassificerare (81,5%) och Support Vector Machines (78,6%). Slutsatsen för arbetet är att Random Forest med sina 95,7% ger en tillräckligt hög träffsäkerhet så att företaget kan använda maskininlärningsmodellen för att förbättra sin produkt. Resultatet pekar också på att Random Forest, för det här arbetets specifika klassificeringsproblem, är den klassificerare som fungerar bäst inom "supervised machine learning" men att det eventuellt finns möjlighet att få ännu högre träffsäkerhet med andra tekniker som till exempel "unsupervised machine learning" eller "semi-supervised machine learning".
A company has developed an environmentally friendly IoT device with sequential sensor data and wants to use machine learning to classify anomalies in that data. Throughout the years, several well-performing classification algorithms have been developed. However, there is no optimal algorithm for every problem. The purpose of this work was therefore to investigate, compare and evaluate different classifiers within supervised machine learning to find out which classifier gives the best accuracy for classifying anomalies in the kind of IoT device that the company has developed. With a literature review we first identified which classifiers are commonly used and have worked well in related work for similar purposes and applications. We chose to further compare and evaluate Random Forest, Naïve Bayes and Support Vector Machines. We created a dataset of 513 examples that we used for training and evaluation of each classifier. The result showed that Random Forest had superior accuracy with 95.7% compared to Naïve Bayes (81.5%) and Support Vector Machines (78.6%). The conclusion of this work is that Random Forest, with 95.7%, gives an accuracy high enough for the company to make good use of the machine learning model. The result also indicates that Random Forest, for this thesis's specific classification problem, is the best classifier within supervised machine learning, but that there is potentially a possibility to reach even higher accuracy with other techniques such as unsupervised machine learning or semi-supervised machine learning.
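The evaluation procedure this abstract describes — training several classifiers on a labeled sensor dataset and comparing their hold-out accuracy — can be sketched as follows. The toy data and the two stand-in classifiers (a majority-class baseline and 1-nearest-neighbour) are invented for illustration and are not the Random Forest/Naïve Bayes/SVM models the thesis actually compared:

```python
from collections import Counter

def accuracy(predict, X, y):
    """Fraction of validation examples the classifier labels correctly."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def majority_baseline(train_y):
    """Always predicts the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label

def one_nn(train_X, train_y):
    """1-nearest-neighbour classifier, a stand-in for the compared models."""
    def predict(x):
        dists = [(sum((a - b) ** 2 for a, b in zip(x, tx)), ty)
                 for tx, ty in zip(train_X, train_y)]
        return min(dists)[1]
    return predict

# Invented sensor-style data: "ok" readings vs "anomaly" readings.
train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
train_y = ["ok", "ok", "anomaly", "anomaly"]
val_X = [(0.0, 0.5), (5.0, 5.5)]
val_y = ["ok", "anomaly"]

for name, clf in [("baseline", majority_baseline(train_y)),
                  ("1-NN", one_nn(train_X, train_y))]:
    print(name, accuracy(clf, val_X, val_y))
```

The thesis's comparison works the same way in outline, only with a 513-example dataset and real classifier implementations in place of these stand-ins.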
APA, Harvard, Vancouver, ISO, and other styles
39

Moraes, Rodrigo de. "Uma investigação empírica e comparativa da aplicação de RNAs ao problema de mineração de opiniões e análise de sentimentos." Universidade do Vale do Rio dos Sinos, 2013. http://www.repositorio.jesuita.org.br/handle/UNISINOS/3411.

Full text
Abstract:
A área de Mineração de Opiniões e Análise de Sentimentos surgiu da necessidade de processamento automatizado de informações textuais referentes a opiniões postadas na web. Como principal motivação está o constante crescimento do volume desse tipo de informação, proporcionado pelas tecnologias trazidas pela Web 2.0, que torna inviável o acompanhamento e análise dessas opiniões, úteis tanto para usuários com pretensão de compra de novos produtos quanto para empresas para a identificação de demanda de mercado. Atualmente, a maioria dos estudos em Mineração de Opiniões e Análise de Sentimentos que fazem o uso de mineração de dados se voltam para o desenvolvimento de técnicas que procuram uma melhor representação do conhecimento e acabam utilizando técnicas de classificação comumente aplicadas, não explorando outras que apresentam bons resultados em outros problemas. Sendo assim, este trabalho tem como objetivo uma investigação empírica e comparativa da aplicação do modelo clássico de Redes Neurais Artificiais (RNAs), o multilayer perceptron, no problema de Mineração de Opiniões e Análise de Sentimentos. Para isso, bases de dados de opiniões são definidas e técnicas de representação de conhecimento textual são aplicadas sobre essas, objetivando uma igual representação dos textos para os classificadores através de unigramas. A partir dessa representação, os classificadores Support Vector Machines (SVM), Naïve Bayes (NB) e RNAs são aplicados considerando três diferentes contextos de base de dados: (i) bases de dados balanceadas, (ii) bases com diferentes níveis de desbalanceamento e (iii) bases em que a técnica para o tratamento do desbalanceamento, undersampling randômico, é aplicada. A investigação do contexto desbalanceado e de outros originados dele se mostra relevante uma vez que bases de opiniões disponíveis na web normalmente apresentam mais opiniões positivas do que negativas. 
Para a avaliação dos classificadores são utilizadas métricas tanto para a mensuração de desempenho de classificação quanto para a de tempo de execução. Os resultados obtidos sobre o contexto balanceado indicam que as RNAs conseguem superar significativamente os resultados dos demais classificadores e, apesar de apresentarem um grande custo computacional para treinamento, proporcionam tempos de classificação significantemente inferiores aos do classificador que apresentou os resultados de classificação mais próximos aos dos resultados das RNAs. Já para o contexto desbalanceado, as RNAs se mostram sensíveis ao aumento de ruído na representação dos dados e ao aumento do desbalanceamento, se destacando nestes experimentos, o classificador NB. Com a aplicação de undersampling as RNAs conseguem ser equivalentes aos demais classificadores apresentando resultados competitivos. Porém, podem não ser o classificador mais adequado de se adotar nesse contexto quando considerados os tempos de treinamento e classificação, e também a diferença pouco expressiva de acerto de classificação.
The area of Opinion Mining and Sentiment Analysis emerges from the need for automated processing of textual information about reviews posted on the web. The main motivation of this area is the constant growth in volume of such information, brought about by the technologies of Web 2.0, which makes it impossible to monitor and analyze these reviews, useful both for users who wish to purchase new products and for companies that want to identify market demand. Currently, most studies of Opinion Mining and Sentiment Analysis that make use of data mining aim at developing techniques that seek a better knowledge representation, use commonly applied classification techniques, and do not explore other classifiers that work well on other problems. Thus, this work aims at a comparative empirical investigation of the application of the classical model of Artificial Neural Networks (ANN), the multilayer perceptron, to the Opinion Mining and Sentiment Analysis problem. For this, review datasets are defined and techniques for textual knowledge representation are applied to them, aiming at an equal representation of the texts for the classifiers. From this representation, the classifiers Support Vector Machines (SVM), Naïve Bayes (NB) and ANN are applied considering three data contexts: (i) balanced datasets, (ii) datasets with different imbalance ratios and (iii) datasets with the application of the random undersampling technique for handling imbalance. The investigation of the imbalanced context, and of others derived from it, is relevant because review datasets available on the web ordinarily contain more positive opinions than negative ones. For the evaluation of the classifiers, metrics both for classification performance and for running time are used. 
The results obtained in the balanced context indicate that ANN significantly outperformed the other classifiers and, although it has a large computational cost in the training phase, the ANN classifier provides classification (real-time) times significantly lower than those of the classifier whose results were closest to the ANN's. In the imbalanced context, the ANN is sensitive to the growth of noise in the representation and to the growth of the imbalance, while the NB classifier stood out. With the undersampling application, the ANN classifier is equivalent to the other classifiers, attaining competitive results. However, it may not be the most appropriate classifier to adopt in this context once the training and classification times, and its small advantage in classification accuracy, are considered.
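Random undersampling, the imbalance-handling technique applied in context (iii) of this abstract, can be sketched in a few lines. The toy review data and the fixed seed are invented for illustration:

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Randomly drop majority-class examples until every class has as many
    examples as the smallest class."""
    rng = random.Random(seed)
    target = min(Counter(y).values())
    pairs = list(zip(X, y))
    rng.shuffle(pairs)
    kept = Counter()
    balanced = []
    for x, label in pairs:
        if kept[label] < target:
            kept[label] += 1
            balanced.append((x, label))
    return balanced

# Invented imbalanced review set: 6 positive, 2 negative.
X = ["review %d" % i for i in range(8)]
y = ["pos"] * 6 + ["neg"] * 2
balanced = random_undersample(X, y)
print(Counter(label for _, label in balanced))  # two of each class remain
```

Which majority-class examples survive depends on the random seed, which is why studies like this one typically average results over repeated runs.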
APA, Harvard, Vancouver, ISO, and other styles
40

Solis, Montero Andres. "Efficient Feature Extraction for Shape Analysis, Object Detection and Tracking." Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/34830.

Full text
Abstract:
During the course of this thesis, two scenarios are considered. In the first one, we contribute to feature extraction algorithms. In the second one, we use features to improve object detection solutions and localization. The two scenarios give rise to four thesis sub-goals. First, we present a new shape skeleton pruning algorithm based on contour approximation and the integer medial axis. The algorithm effectively removes unwanted branches, conserves the connectivity of the skeleton and respects the topological properties of the shape. The algorithm is robust to significant boundary noise and to rigid shape transformations. It is fast and easy to implement. While shape-based solutions via boundary and skeleton analysis are viable solutions to object detection, keypoint features are important for textured object detection. Second, therefore, we present a keypoint feature-based planar object detection framework for vision-based localization. We demonstrate that our framework is robust against illumination changes, perspective distortion, motion blur, and occlusions. We increase robustness of the localization scheme in cluttered environments and decrease false detection of targets. We present an off-line target evaluation strategy and a scheme to improve pose. Third, we extend planar object detection to a real-time approach for 3D object detection using a mobile and uncalibrated camera. We develop our algorithm based on two novel naive Bayes classifiers for viewpoint and feature matching that improve performance and decrease memory usage. Our algorithm exploits the specific structure of various binary descriptors in order to boost feature matching by conserving descriptor properties. Our novel naive classifiers require a database with a small memory footprint because we only store efficiently encoded features. We improve the feature-indexing scheme to speed up the matching process, creating a highly efficient database for objects. 
Finally, we present a model-free long-term tracking algorithm based on the Kernelized Correlation Filter. The proposed solution improves the correlation tracker based on precision, success, accuracy and robustness while increasing frame rates. We integrate adjustable Gaussian window and sparse features for robust scale estimation creating a better separation of the target and the background. Furthermore, we include fast descriptors and Fourier spectrum packed format to boost performance while decreasing the memory footprint. We compare our algorithm with state-of-the-art techniques to validate the results.
APA, Harvard, Vancouver, ISO, and other styles
41

Hung, Chi-Chang, and 洪啟彰. "Improving Naive Bayes Classifier with Association Rules." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/72415233167975672024.

Full text
Abstract:
Master's thesis
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
92
The naive Bayes classifier in machine learning is a probabilistic classifier based on Bayesian theory. It uses a statistical method to classify a new instance by assigning it the class with the maximum conditional probability. Naive Bayes assumes that the conditional probabilities of terms are independent. In another domain, association rule mining learns rules by exhaustive search; it aims to find all rules that satisfy a user-specified minimum support and minimum confidence. A classifier trained only on such a set of rules may not classify accurately, but association rule mining can find strong evidence in the training set. Our approach is to incorporate that strong evidence into the naive Bayes classifier. The accuracy of the combination is better than that of the naive Bayes classifier alone.
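A minimal sketch of the kind of combination this abstract describes: a Laplace-smoothed naive Bayes classifier whose prediction is overridden when a high-confidence association rule matches the instance. The toy data, the rule, and the parameter choices are invented for illustration, not taken from the thesis:

```python
import math
from collections import Counter

def train_nb(rows, labels):
    """Count class frequencies and per-(attribute, value, class) frequencies."""
    priors = Counter(labels)
    cond = Counter()
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, v, c)] += 1
    return priors, cond

def predict(row, priors, cond, n_values=2, alpha=1.0, rules=()):
    """Naive Bayes with Laplace smoothing; a matching rule's class wins outright."""
    for antecedent, consequent in rules:
        if all(row[i] == v for i, v in antecedent):
            return consequent
    total = sum(priors.values())
    best, best_score = None, float("-inf")
    for c in priors:
        score = math.log(priors[c] / total)
        for i, v in enumerate(row):
            score += math.log((cond[(i, v, c)] + alpha) / (priors[c] + alpha * n_values))
        if score > best_score:
            best, best_score = c, score
    return best

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
labels = ["no", "no", "yes", "yes"]
priors, cond = train_nb(rows, labels)
rule = ([(1, "mild")], "yes")   # pretend this strong evidence was mined separately
print(predict(("sunny", "mild"), priors, cond))               # plain NB prediction
print(predict(("sunny", "mild"), priors, cond, rules=[rule])) # rule overrides
```

How mined rules are actually weighed against the probabilistic evidence is the thesis's contribution; the hard override above is only the simplest possible scheme.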
APA, Harvard, Vancouver, ISO, and other styles
42

Tsai, Zong-Ching, and 蔡宗欽. "Improving Naive Bayes Classifier with Multiple Attributes Association Rules." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/54197840022046525752.

Full text
Abstract:
Master's thesis
Southern Taiwan University of Science and Technology
Graduate Institute of Industrial Management
93
Many previous studies indicate that the NBC (Naïve Bayesian Classifier) is a simple and effective classification method. However, the attribute-independence assumption on which the NBC is based makes it unable to cope with dependence among attributes, which affects its classification performance. Many improvement methods have been proposed over the years; although they raise accuracy, they usually require tremendous computation time. This research provides a new classification method, MANBC (Multiple Attributes Naïve Bayes Classifier). MANBC adopts a greedy algorithm to generate important class association rules from the examples misclassified by the NBC and uses the best rule in prediction. In the experiment, we compared MANBC with 13 other methods on 31 data sets from the UCI Repository. Experimental results showed that MANBC outperforms the NBC on average: its classification accuracy is higher by about 1% to 2%, and on data sets where the attribute-independence assumption is violated, its margin is about 20%. Meanwhile, the number of rules generated by MANBC is usually only 7% of CPAR's. Regarding time, the classifiers that are as accurate as MANBC have running times slightly longer than MANBC's, except CPAR.
APA, Harvard, Vancouver, ISO, and other styles
43

Liu, Yu-Hsuan, and 劉宇軒. "Naive Bayes classifier with Principal Components Analysis and Fisher Information." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/ugz36c.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Information Management
105
The naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions between the features. We propose a method based on the naive Bayes classifier with Principal Components Analysis (PCA) and Fisher information. We use Principal Components Analysis to make the features uncorrelated. The transformed features are ranked by their Fisher information score, which measures the amount of information they carry, and we calculate the posterior probability with the likelihood replaced by a p-value. We conclude our research with the classification accuracy on some examples and present our vision for future research.
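The decorrelate-then-rank pipeline this abstract outlines might look roughly like the sketch below (it assumes NumPy, uses a common between-class/within-class form of the Fisher score, and omits the thesis's p-value step; the toy data is invented):

```python
import numpy as np

def pca_fisher_rank(X, y):
    """Decorrelate features via PCA, then rank the components by Fisher score."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest-variance axes first
    Z = Xc @ eigvecs[:, order]                 # uncorrelated transformed features
    # Fisher score per component: between-class spread over within-class spread.
    mu = Z.mean(axis=0)
    classes = np.unique(y)
    between = sum((y == c).sum() * (Z[y == c].mean(axis=0) - mu) ** 2 for c in classes)
    within = sum((y == c).sum() * Z[y == c].var(axis=0) for c in classes)
    scores = between / (within + 1e-12)
    return Z, np.argsort(scores)[::-1]

# Invented data: the second raw feature separates the classes, the first is noise.
X = np.array([[0.00, 0.0], [0.10, 0.2], [0.05, 5.0], [-0.05, 5.2]])
y = np.array([0, 0, 1, 1])
Z, rank = pca_fisher_rank(X, y)
print(rank)   # the first (highest-variance) component carries the class signal
```

After this step, a naive Bayes model over the top-ranked components is justified by construction, since PCA leaves the components linearly uncorrelated.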
APA, Harvard, Vancouver, ISO, and other styles
44

Chen, I.-Chieh, and 陳羿捷. "Image Classification Using Naive Bayes Classifier With Pairwise Local Observations." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/99031641900696139918.

Full text
Abstract:
Master's thesis
National Tsing Hua University
Department of Electrical Engineering
103
We present an image classification method using a naive Bayes classifier with pairwise local observations (NBPLO), based on salient region (SR) selection and local feature detection. Unlike previous image classification algorithms, our method is a scale-, translation-, and rotation-invariant classification algorithm. By transforming the pairwise local observations into training vectors, we simulate the human visual system, developing the classification model from the neighboring relationships of the selected SRs. We verify our assumptions on the Scene-15 and Caltech-101 databases, compare the mainstream feature point detection methods, and also compare our experimental results with those of the bag-of-features (BoF) and SPM algorithms.
APA, Harvard, Vancouver, ISO, and other styles
45

Lin, Cheng-Lung, and 林政龍. "Internet Traffic Classification based on Hybrid Naive Bayes HMMs Classifier." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/59142288452653192129.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering
96
To deal with large network infrastructure, we must rely on automatic network management systems. Traditionally, most firewalls simply use the port numbers of packets to identify abnormal network traffic. Some also observe application-layer characteristics, such as packet payloads, to identify abnormal traffic. However, these traditional security mechanisms encounter difficulties with the increasing popularity of encrypted protocols. Recently, related research has identified application protocols from the restricted characteristics and behaviors still observable in the transport layer of the TCP/IP model after encryption. We therefore combine and implement two models, Naive Bayes and Hidden Markov Models (HMMs), as an automatic system that uses the limited information in encrypted packets to infer and classify application protocol behavior. Generally speaking, HMMs are relatively good at estimating latent relationships in temporal data, while Naive Bayes is simple, fast, and effective, and is often used to handle multidimensional datasets. In this thesis, we propose a hybrid Naive Bayes HMM classifier as a fundamental framework to infer application protocol behavior in encrypted network traffic. The hybrid model uses the temporal property of HMMs to inspect the relationships between packets and employs Naive Bayes to characterize the statistical signature. Our approach can not only identify network behavior in encrypted network traffic but also exploit the temporal property to raise accuracy. It can be applied to infer application protocols and to detect abnormal behavior. Compared to related research, our method uses only a few features to classify multi-flow protocols and achieves respectable performance.
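The HMM half of such a hybrid can be illustrated with the standard forward algorithm: each candidate protocol gets its own small HMM over a discretized per-packet feature (here, 0 = small packet, 1 = large packet), and a flow is scored against each model. All probabilities and the two toy "protocol" models below are invented, not the thesis's trained parameters:

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    xs = list(xs)
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of an observation sequence under a discrete HMM."""
    states = range(len(start))
    alpha = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states]
    for o in obs[1:]:
        alpha = [logsumexp(alpha[p] + math.log(trans[p][s]) for p in states)
                 + math.log(emit[s][o]) for s in states]
    return logsumexp(alpha)

# Toy "interactive" protocol: mostly small packets.  (start, trans, emit)
interactive = ([0.8, 0.2],
               [[0.9, 0.1], [0.2, 0.8]],
               [[0.9, 0.1], [0.3, 0.7]])
# Toy "bulk transfer" protocol: mostly large packets.
bulk = ([0.5, 0.5],
        [[0.5, 0.5], [0.5, 0.5]],
        [[0.2, 0.8], [0.1, 0.9]])

flow = [0, 0, 1, 0, 0]   # observed packet-size symbols for one flow
scores = {"interactive": forward_loglik(flow, *interactive),
          "bulk": forward_loglik(flow, *bulk)}
print(max(scores, key=scores.get))
```

In the hybrid design the abstract describes, these per-model sequence likelihoods would then be combined with naive Bayes evidence from flow-level statistics rather than used alone.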
APA, Harvard, Vancouver, ISO, and other styles
46

Wu, Jo-Ping, and 吳若平. "Naive Bayes classifier with Principal Components Analysis for continuous attributes." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/62937176964151133310.

Full text
Abstract:
Master's thesis
National Central University
Graduate Institute of Industrial Management
103
Due to the progress of science and technology, data is growing rapidly, and the speed of classifiers has become an important aspect of data mining. The naïve Bayes classifier is a simple and practical classification method based on applying Bayes' theorem with strong independence assumptions between the features, but this assumption does not hold in many real situations. We propose a classification method, PC-Naïve, based on the naïve Bayes classifier. We keep the simplicity and speed of the naïve Bayes classifier while relaxing the vital independence assumption of the naïve Bayes classifier model. We use Principal Components Analysis to transform the original data, making the attributes mutually linearly independent. We then discretize the transformed data and calculate the prior and conditional probabilities, from which we obtain the posterior probability and classify the data. We use examples to present the classification procedure and compare the accuracy of four models: the PC-Naïve model, the traditional naïve Bayes model, the Decision Tree model and the Stepwise Logistic Regression model. Finally, we discuss the accuracy of different dimensions and discretization methods.
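The discretization step applied to the PCA-transformed attributes could be as simple as equal-width binning; this sketch, with an invented column and bin count, shows one common form (the thesis compares several discretization methods, not necessarily this one):

```python
def equal_width_bins(values, k):
    """Map continuous values into k equal-width bins, labelled 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0     # guard against a constant column
    return [min(int((v - lo) / width), k - 1) for v in values]

column = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # one PCA-transformed attribute
print(equal_width_bins(column, 3))        # [0, 0, 1, 1, 2, 2]
```

Once every transformed attribute is binned this way, the ordinary frequency-counting machinery of a categorical naïve Bayes classifier applies directly.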
APA, Harvard, Vancouver, ISO, and other styles
47

Chang, Liang-Hao, and 張良豪. "Improving the performance of Naive Bayes Classifier by using Selective Naive Bayesian Algorithm and Prior Distributions." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/92613736217287175606.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Department of Industrial and Information Management
97
Naive Bayes classifiers have been widely used for data classification because of their computational efficiency and competitive accuracy. When all attributes are employed for classification, the accuracy of the naive Bayes classifier is generally affected by noisy attributes, so a mechanism for attribute selection should be considered to improve its prediction accuracy. The selective naive Bayesian method is a very successful approach for removing noisy and/or redundant attributes. In addition, attributes are generally assumed to have prior distributions, such as Dirichlet or generalized Dirichlet distributions, to achieve higher prediction accuracy. Many studies have proposed methods for finding the best priors for attributes, but none of them takes attribute selection into account. This thesis therefore proposes two models that combine prior distributions and feature selection to increase the accuracy of the naive Bayes classifier. Model I finds the best prior for each attribute after all attributes have been determined by the selective naive Bayesian algorithm. Model II finds the best prior of the newest attribute determined by the selective naive Bayesian algorithm once all predecessors of the newest attribute have their best priors. The experimental results on 17 data sets from the UCI data repository show that Model I with the generalized Dirichlet prior generally and consistently achieves a higher classification accuracy.
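The selective naive Bayesian search both models build on is, at heart, greedy forward selection. Here is a sketch in which an invented additive scoring function stands in for the cross-validated naive Bayes accuracy a real implementation would use:

```python
def greedy_select(attributes, score):
    """Add, one at a time, the attribute that most improves the score;
    stop as soon as no candidate helps."""
    chosen, best = [], score([])
    while True:
        candidates = [a for a in attributes if a not in chosen]
        if not candidates:
            break
        top = max(candidates, key=lambda a: score(chosen + [a]))
        if score(chosen + [top]) <= best:
            break
        chosen.append(top)
        best = score(chosen)
    return chosen

# Invented per-attribute contributions; "noise" hurts the score.
weights = {"colour": 3.0, "size": 2.0, "noise": -1.0}
toy_score = lambda subset: sum(weights[a] for a in subset)
print(greedy_select(list(weights), toy_score))   # ['colour', 'size']
```

Model I would tune priors after this loop finishes; Model II would tune the prior of each attribute at the moment the loop adds it.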
APA, Harvard, Vancouver, ISO, and other styles
48

Zhi-JunChen and 陳志濬. "Investigating the Effect of Attribute Value Ranking Methods on Naive Bayes Classifier with Generalized Dirichlet Priors." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/88464897701033696577.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Department of Industrial Management Science
98
Naïve Bayesian classifiers have been widely used for data classification because of their computational efficiency and competitive accuracy. In a naïve Bayesian classifier, the prior distributions of an attribute are generally assumed to be Dirichlet or generalized Dirichlet distributions. The generalized Dirichlet distribution can release the restrictions of the Dirichlet distribution, and usually results in higher classification accuracy. However, the order of the variables in a generalized Dirichlet random vector is generally not arbitrary. In this study, three methods for determining the order of attribute values are proposed to study their impact on the performance of naïve Bayesian classifiers with noninformative generalized Dirichlet priors. The experimental results on 20 data sets from the UCI data repository demonstrate that when attribute values are properly ordered, the classification accuracy can be slightly improved with respect to non-ordered attribute values. When computational efficiency is a major concern, ordering attribute values for employing noninformative generalized Dirichlet priors will not be necessary.
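The role a Dirichlet prior plays in a naïve Bayes estimate fits in one line: the posterior-mean estimate of P(value | class) under a symmetric Dirichlet(α) prior. The generalized Dirichlet priors studied in this thesis allow a separate, order-dependent parameter per attribute value, which this symmetric sketch deliberately does not capture:

```python
def dirichlet_estimate(count, class_total, alpha, n_values):
    """Posterior-mean estimate of P(value | class) with a symmetric
    Dirichlet(alpha) prior over n_values possible attribute values."""
    return (count + alpha) / (class_total + alpha * n_values)

# A value never observed with this class still gets nonzero probability:
print(dirichlet_estimate(0, 8, 1.0, 2))   # 0.1 rather than 0.0
print(dirichlet_estimate(5, 8, 1.0, 2))   # 0.6
```

Because a generalized Dirichlet prior is built from a sequence of Beta variables, the estimate for each value depends on where it sits in that sequence, which is exactly why the ordering methods investigated here matter.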
APA, Harvard, Vancouver, ISO, and other styles
49

Liu, Chong-Hsien, and 劉忠賢. "Real-time and Low-memory Multi-face Detection System Design based on Naive Bayes Classifier using FPGA." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/f92z7n.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Electrical and Control Engineering
104
In recent years, face detection has been widely used in various fields, such as face recognition, image focusing, and surveillance systems. This thesis proposes a real-time face detection system based on a naive Bayesian classifier implemented on an FPGA. The system is divided into three main parts: feature extraction, candidate face detection, and false-positive elimination. First, the image is downscaled into an image pyramid and local binary image features are extracted from each downscaled image; the features then go through the naive Bayesian classifier to identify candidate faces. Finally, a skin color filter and face overlap elimination are used to remove false positives. Detection results are output to the monitor over VGA. Because of the FPGA's parallel processing, at 640×480 resolution, face detection on an image executes within 16.7 milliseconds. Moreover, the improved local binary features, compared to Haar features, save around 140 times the amount of memory. The experimental results show that the accuracy rate is higher than 95% in face detection, which implies the proposed real-time detection system is indeed effective and efficient.
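The local binary feature at the core of such a detector can be sketched in its classic 8-neighbour form; the thesis uses an improved variant whose details are not given here, and the tiny image below is invented:

```python
def lbp_code(img, r, c):
    """Classic 8-neighbour local binary pattern code for pixel (r, c):
    each neighbour >= the centre contributes one bit."""
    center = img[r][c]
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                  (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(neighbours):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

img = [[9, 9, 9],
       [0, 5, 0],
       [0, 0, 0]]
print(lbp_code(img, 1, 1))   # 7: only the three top neighbours are >= 5
```

Comparisons against a single centre pixel need no multipliers, which is what makes this family of features so cheap in hardware compared with Haar features.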
APA, Harvard, Vancouver, ISO, and other styles
50

Mawila, Ntombhimuni. "Natural language processing for research philosophies and paradigms dissertation (DFIT91)." Diss., 2021. http://hdl.handle.net/10500/27471.

Full text
Abstract:
Research philosophies and paradigms (RPPs) reveal researchers’ assumptions and provide a systematic way in which research can be carried out effectively and appropriately. Different studies highlight cognitive and comprehension challenges of RPPs concepts at the postgraduate level. This study develops a natural language processing (NLP) supervised classification application that guides students in identifying RPPs applicable to their study. By using algorithms rooted in a quantitative research approach, this study builds a corpus represented using the Bag of Words model to train the naïve Bayes, Logistic Regression, and Support Vector Machine algorithms. Computer experiments conducted to evaluate the performance of the algorithms reveal that the Naïve Bayes algorithm presents the highest accuracy and precision levels. In practice, user testing results show the varying impact of knowledge, performance, and effort expectancy. The findings contribute to the minimization of issues postgraduates encounter in identifying research philosophies and the underlying paradigms for their studies.
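The pipeline this abstract describes — Bag of Words features feeding a naive Bayes text classifier — can be sketched as follows; the tiny corpus about research paradigms is invented, and the real system also trained Logistic Regression and SVM models on the same representation:

```python
import math
from collections import Counter

def train(docs, labels):
    """Build a vocabulary and per-class word counts (multinomial naive Bayes)."""
    vocab = {w for d in docs for w in d.split()}
    word_counts = {c: Counter() for c in set(labels)}
    for d, c in zip(docs, labels):
        word_counts[c].update(d.split())
    return vocab, word_counts, Counter(labels)

def predict(doc, vocab, word_counts, class_counts):
    """Pick the class with the highest Laplace-smoothed log posterior."""
    n_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c, wc in word_counts.items():
        lp = math.log(class_counts[c] / n_docs)
        total = sum(wc.values())
        for w in doc.split():
            if w in vocab:
                lp += math.log((wc[w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

docs = ["quantitative measurement objective", "objective quantitative survey",
        "meaning context subjective", "subjective interview meaning"]
labels = ["positivism", "positivism", "interpretivism", "interpretivism"]
model = train(docs, labels)
print(predict("objective quantitative study", *model))
```

Words outside the training vocabulary (like "study" here) are simply ignored, which is the usual Bag of Words behaviour at prediction time.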
Science and Technology Education
MTech. (Information Technology)
APA, Harvard, Vancouver, ISO, and other styles
