Dissertations / Theses: 'K-Nearest Neighbors algorithm'

1

Li, Zheng. "Improving Estimation Accuracy of GPS-Based Arterial Travel Time Using K-Nearest Neighbors Algorithm." Thesis, The University of Arizona, 2017. http://hdl.handle.net/10150/625901.

Abstract:

Link travel time plays a significant role in traffic planning, traffic management, and Advanced Traveler Information Systems (ATIS). A public probe vehicle dataset is one collected from members of the public or from public transport. The emergence of such datasets supports travel time collection at a large temporal and spatial scale at relatively low cost. Traditionally, link travel time is an aggregation of the travel times of different movements. A recent study showed that the link travel times of individual movements differ significantly from their aggregate. However, there is still no complete framework for estimating movement-based link travel time. In addition, probe vehicle datasets usually have a low penetration rate, and no previous study has solved this problem. To address these problems, this study proposes a detailed framework for estimating movement-based link travel time using a public probe vehicle dataset with a high sampling rate. The study proposes a k-Nearest Neighbors (k-NN) regression method that increases the number of travel time samples by using incomplete trajectories. An incomplete trajectory is compared with historical complete trajectories, and its link travel time is represented by that of its most similar complete trajectories. The results show that the method can significantly increase the number of link travel time samples, although limitations remain. The study also investigates the performance of k-NN regression under different parameters and input data. The sensitivity analysis shows that the algorithm performs differently under different parameters and input data. The study suggests that optimal parameters should be selected using a historical dataset before real-world application.
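
The k-NN regression idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the feature vectors, the Euclidean distance, the unweighted mean, and the sample values are all assumptions made for illustration.

```python
import math


def knn_regress(history, query, k=3):
    """Predict a target as the mean target of the k nearest historical samples.

    history: list of (feature_vector, target) pairs, e.g. complete trajectories
    query:   feature vector of the sample whose target is unknown
    """
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and len(history)")
    # Rank historical samples by Euclidean distance to the query.
    ranked = sorted(history, key=lambda item: math.dist(item[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical data: two observed speeds (m/s) -> link travel time (s).
history = [
    ((10.0, 12.0), 60.0),
    ((11.0, 12.5), 58.0),
    ((4.0, 5.0), 130.0),
    ((5.0, 4.5), 120.0),
]
estimate = knn_regress(history, (10.5, 12.0), k=2)
```

Here the two nearest historical samples have travel times of 60 s and 58 s, so the estimate is their mean. Choosing k and the features on a historical dataset, as the abstract recommends, would replace the fixed values used here.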

2

Piro, Paolo. "Learning prototype-based classification rules in a boosting framework: application to real-world and medical image categorization." PhD thesis, Université de Nice Sophia-Antipolis, 2010. http://tel.archives-ouvertes.fr/tel-00590403.

3

Gupta, Nidhi. "Mutual k Nearest Neighbor based Classifier." University of Cincinnati / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1289937369.

4

Olivares, Javier. "Scaling out-of-core k-nearest neighbors computation on single machines." Thesis, Rennes 1, 2016. http://www.theses.fr/2016REN1S073/document.

Abstract:

K-Nearest Neighbors (KNN) is an efficient method for finding similar data within a large dataset. Over the years, a huge number of applications have used KNN to discover similarities in data from areas as diverse as business, medicine, music, and computer science. Although years of research have produced several approaches to this algorithm, implementing it remains a challenge, particularly today, when data is growing at unprecedented rates. In this context, running KNN on large datasets raises two major issues: huge memory footprints and very long runtimes. Because of these high costs in computational resources and time, state-of-the-art KNN work does not consider that data can change over time and always assumes that the data remains static throughout the computation, which unfortunately does not match reality. The contributions of this thesis address these challenges. First, we propose an out-of-core approach to computing KNN on large datasets using a single commodity PC. We advocate this approach as an inexpensive way to scale KNN computation compared with the high cost of a distributed algorithm, in terms of both computational resources and coding, debugging, and deployment effort. Second, we propose a multithreaded out-of-core approach to meet the challenges of computing KNN on data that changes rapidly and continuously over time. After a thorough evaluation, we observe that our main contributions address the challenges of computing KNN on large datasets by leveraging the restricted resources of a single machine, decreasing runtimes compared with the baselines, and scaling the computation on both static and dynamic datasets.

5

Wong, Wing Sing. "K-nearest-neighbor queries with non-spatial predicates on range attributes." 2005. http://library.ust.hk/cgi/db/thesis.pl?COMP%202005%20WONGW.

6

Aikes, Junior Jorge. "Estudo da influência de diversas medidas de similaridade na previsão de séries temporais utilizando o algoritmo KNN-TSP." Universidade Estadual do Oeste do Parana, 2012. http://tede.unioeste.br:8080/tede/handle/tede/1084.

Abstract:

A time series can be understood as any set of observations ordered in time. Among the many tasks applicable to temporal data, one that has attracted increasing interest, owing to its many applications, is time series forecasting. The k-Nearest Neighbor - Time Series Prediction (kNN-TSP) algorithm is a non-parametric method for forecasting time series. One of its advantages is its ease of application compared with parametric methods. Even though kNN-TSP's parameters are easier to define, some issues remain open. This research focuses on one of these parameters: the similarity measure. This parameter was evaluated empirically using various similarity measures on a large set of time series, including artificial series with seasonal and chaotic characteristics and several real-world time series. A case study was also carried out comparing the predictive accuracy of the kNN-TSP algorithm with that of the Moving Average (MA), univariate Seasonal Auto-Regressive Integrated Moving Average (SARIMA), and multivariate SARIMA methods on a time series of daily patient flow in the Emergency Department of a Korean hospital. This work also proposes an approach to developing a hybrid similarity measure that combines characteristics of several measures. The results demonstrate that the Lp norm measures have an advantage over the other measures evaluated, owing to their lower computational cost and, in general, greater accuracy in temporal data forecasting with the kNN-TSP algorithm. Although the literature generally adopts the Euclidean measure to calculate the similarity between time series, the Manhattan distance can be considered an interesting candidate for defining similarity, because it shows no statistically significant difference from the Euclidean measure and has a lower computational cost. The measure proposed in this work does not show significant results, but it is promising for further research. In the case study, the kNN-TSP algorithm with only the similarity measure parameter optimized achieves a considerably lower error than the best MA configuration and a slightly greater error than the optimal univariate and multivariate SARIMA settings, the difference being less than one percent.
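
As a rough illustration of nearest-neighbor time series forecasting with an Lp similarity measure, the sketch below averages the values that followed the k past windows most similar to the latest window. The function names, the window length, and the unweighted average are assumptions for illustration and are not taken from the dissertation.

```python
def lp_distance(a, b, p=2):
    """Minkowski (Lp) distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_forecast(series, window=3, k=2, p=1):
    """Forecast the next value from the k past windows most similar to the last one."""
    if len(series) < window + k:
        raise ValueError("series too short for the chosen window and k")
    query = series[-window:]
    candidates = []
    # Each candidate is a past window whose successor value is already known.
    for start in range(len(series) - window):
        past = series[start:start + window]
        successor = series[start + window]
        candidates.append((lp_distance(past, query, p), successor))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical seasonal series with period 3.
series = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
forecast = knn_forecast(series, window=2, k=2, p=1)
```

Switching `p` between 1 and 2 changes only the similarity measure, which is the parameter the study varies. The Manhattan case avoids the squaring and root of the Euclidean case.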

7

Johansson, David. "Price Prediction of Vinyl Records Using Machine Learning Algorithms." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-96464.

Abstract:

Machine learning algorithms have been used for price prediction in several application areas, including real estate, the stock market, tourist accommodation, electricity, art, cryptocurrencies, and fine wine. Studies commonly evaluate the accuracy of predictions and compare different algorithms, such as Linear Regression or Neural Networks. There is a thriving global second-hand market for vinyl records, but research on price prediction in this area is very limited. The purpose of this project was to build on existing knowledge of price prediction in general to evaluate some aspects of price prediction for vinyl records. This included investigating the achievable level of accuracy and comparing the efficiency of algorithms. A dataset of 37,000 vinyl record samples was created with data from the Discogs website, and multiple machine learning algorithms were applied in a controlled experiment. Among the conclusions drawn from the results were that the Random Forest algorithm generally produced the strongest results, that results can vary substantially between artists or genres, and that a large share of the predictions had a good level of accuracy, but a relatively small number of large errors had a considerable effect on the overall results.

8

Mestre, Ricardo Jorge Palheira. "Improvements on the KNN classifier." Master's thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/10923.

Abstract:

Dissertation submitted for the degree of Master in Informatics Engineering
Object classification is an important area of artificial intelligence, and its applications extend to many fields, scientific or otherwise. Among classifiers, K-nearest neighbor (KNN) is one of the simplest and most accurate, especially in environments where the data distribution is unknown or apparently not parameterizable. The algorithm assigns the instance being classified the majority class among its K nearest neighbors. In the original algorithm, this classification requires calculating the distances between the instance being classified and every training object. On the one hand, an extensive training set is important for high accuracy; on the other hand, it makes the classification of each object slower because of the algorithm's lazy-learning nature. Indeed, the algorithm provides no means of storing information about previously calculated classifications, so classifying two equal instances requires repeating the calculation. In a way, it may be said that this classifier does not learn. This dissertation focuses on this lazy-learning weakness and proposes a solution that transforms KNN into an eager-learning classifier. In other words, the algorithm is intended to learn effectively from the training set, thus avoiding redundant calculations. In the context of the proposed change to the algorithm, it is important to highlight the attributes that best characterize the objects according to their discriminating power. Within this framework, the dissertation studies the implementation of these transformations on data of different types: continuous and/or categorical.
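
The original lazy KNN rule described in this abstract can be sketched as follows. This is a minimal illustration of the baseline algorithm only, not of the eager-learning variant the dissertation proposes; the Euclidean distance and the toy data are assumptions.

```python
import math
from collections import Counter


def knn_classify(training, instance, k=3):
    """Return the majority class among the k training objects nearest to instance."""
    # Lazy learning: every call measures the distance to every training object.
    ranked = sorted(training, key=lambda item: math.dist(item[0], instance))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set with continuous attributes.
training = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
]
label = knn_classify(training, (1.1, 1.0), k=3)
```

Calling `knn_classify` twice with the same instance repeats the full distance computation, which is the redundancy the abstract points out.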

9

Liu, Dongqing. "GENETIC ALGORITHMS FOR SAMPLE CLASSIFICATION OF MICROARRAY DATA." University of Akron / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=akron1125253420.

10

Neo, TohKoon. "A Direct Algorithm for the K-Nearest-Neighbor Classifier via Local Warping of the Distance Metric." Diss., 2007. http://contentdm.lib.byu.edu/ETD/image/etd2168.pdf.

11

Borén, Mirjam. "Classification of discrete stress levels in users using eye tracker and K-Nearest Neighbour algorithm." Thesis, Umeå universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-176258.

Abstract:

Head Mounted Displays (HMDs) used for Virtual Reality (VR) have come a long way, and eye tracking is now available in some HMDs. The eyes show physiological responses when healthy individuals are stressed, which justifies eye tracking as a tool for estimating, at a minimum, the presence of stress. Stress can take many forms and may be caused by factors such as work, social situations, cognitive load, and many others. The Group Stroop Color Word Test (GSCWT) can induce four different levels of stress in users: no stress, low stress, medium stress, and high stress. In this thesis, GSCWT was implemented in virtual reality, and users' pupil dilation and blinking rate were recorded. The data was then used to train and test a K-Nearest Neighbour (KNN) algorithm. The KNN algorithm could not accurately distinguish between the four stress classes, but it could predict the presence or absence of stress. VR has been used successfully as a tool for practicing social skills and other everyday life skills for individuals with Autism Spectrum Disorder (ASD). By correctly identifying the user's stress level in VR, tools for practicing social skills for individuals with ASD could be made more personalized and improved.

12

Rudin, Pierre. "Football result prediction using simple classification algorithms, a comparison between k-Nearest Neighbor and Linear Regression." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-187659.

Abstract:

Ever since humans started competing with each other, people have tried to accurately predict the outcome of such events. Football is no exception and is especially interesting as a subject for a project like this, given the ever-growing amount of data gathered from matches. Previously, predictors had to make their predictions using their own knowledge and small amounts of data. This report uses this growing amount of data to find out whether it is possible to accurately predict the outcome of a football match using the k-Nearest Neighbor algorithm and Linear Regression. The algorithms are compared on how accurately they predict the winner of a match, how precisely they predict the number of goals each team will score, and the accuracy of the predicted goal difference. The results are presented in graphs and tables. A discussion analyzes the results and concludes that both algorithms could be useful if used with a good model, and that Linear Regression outperforms k-NN.

13

Neo, Toh Koon Charlie. "A direct boosting algorithm for the k-nearest neighbor classifier via local warping of the distance metric." Diss., 2007. http://contentdm.lib.byu.edu/ETD/image/etd2168.pdf.

14

Karginova, Nadezda. "Identification of Driving Styles in Buses." Thesis, Halmstad University, Intelligent systems (IS-lab), 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-4830.

Abstract:

It is important to detect faults in bus components at an early stage. Because driving style affects the breakdown of different components in the bus, identifying the driving style is important for minimizing the number of failures in buses.

The driver's driving style was identified from input data containing examples of driving runs of each class. K-nearest neighbor and neural network algorithms were used, and different models were tested.

It was shown that the results depend on the selected driving runs. A hypothesis was suggested that the examples from different driving runs have different parameters which affect the results of the classification.

The best results were achieved using a subset of variables chosen with the forward feature selection procedure. The percentage of correct classifications is about 89-90% for the k-nearest neighbor algorithm and 88-93% for the neural networks.

Feature selection significantly improved the results of the k-nearest neighbor algorithm and of the neural network algorithm obtained when the training and testing data sets were selected from different driving runs. On the other hand, feature selection did not affect the results obtained with the neural networks when the training and testing data sets were selected from the same driving runs.

Another way to improve the results is smoothing: computing the average class over a number of consecutive examples reduced the error.
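
One possible reading of this smoothing step is a sliding majority vote over consecutive predictions, sketched below. The abstract does not say how the average class was computed, so the centered window, the majority rule, and the class labels are all assumptions.

```python
from collections import Counter


def smooth_labels(predictions, window=3):
    """Replace each prediction by the most common label in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(predictions)):
        # The window is clipped at both ends of the sequence.
        lo = max(0, i - half)
        hi = min(len(predictions), i + half + 1)
        counts = Counter(predictions[lo:hi])
        smoothed.append(counts.most_common(1)[0][0])
    return smoothed


# Hypothetical per-example driving-style predictions with one isolated outlier.
raw = ["calm", "calm", "aggressive", "calm", "calm"]
smoothed = smooth_labels(raw, window=3)
```

The isolated "aggressive" prediction is outvoted by its neighbors, which is how smoothing of this kind can reduce the error when the true class changes slowly.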

15

Agarwal, Akrita. "Exploring the Noise Resilience of Combined Sturges Algorithm." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1447070335.

16

Mao, Qian. "Clusters Identification: Asymmetrical Case." Thesis, Uppsala universitet, Informationssystem, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-208328.

Abstract:

Cluster analysis is one of the typical tasks in data mining; it groups data objects based only on information found in the data that describes the objects and their relationships. The purpose of this thesis is to verify a modified K-means algorithm in asymmetrical cases, which can be regarded as an extension of the research of Vladislav Valkovsky and Mikael Karlsson in the Department of Informatics and Media. An experiment is designed and implemented to identify clusters with the modified algorithm in asymmetrical cases. The Java application developed for the experiment is based on knowledge established in previous research. The development procedure is described, and the input parameters are presented along with the analysis. The experiment consists of several test suites, each of which simulates a real-world situation, and the test results are displayed graphically. The findings mainly emphasize the limitations of the algorithm, and future work to investigate the algorithm further is suggested.

17

Pathirana, Vindya Kumari. "Nearest Neighbor Foreign Exchange Rate Forecasting with Mahalanobis Distance." Scholar Commons, 2015. http://scholarcommons.usf.edu/etd/5757.

Abstract:

Foreign exchange (FX) rate forecasting has long been a challenging area of study. Various linear and nonlinear methods have been used to forecast FX rates. As currency data are nonlinear and highly correlated, forecasting through nonlinear dynamical systems is becoming more relevant. The nearest neighbor (NN) algorithm is one of the most commonly used nonlinear pattern recognition and forecasting methods, and it outperforms the available linear forecasting methods for high-frequency foreign exchange data. The basic idea behind NN is to capture the local behavior of the data by selecting the instances with similar dynamic behavior. Only the k histories most relevant to the present dynamical structure are used to predict the future. For this reason, the NN algorithm is also known as the k-nearest neighbor (k-NN) algorithm, where k represents the number of chosen neighbors. In the k-nearest neighbor forecasting procedure, similar instances are captured through a distance function. Since the forecasts depend entirely on the chosen nearest neighbors, the distance plays a key role in the k-NN algorithm. By choosing an appropriate distance, we can improve the performance of the algorithm significantly. The distance most commonly used for k-NN forecasting in the past was the Euclidean distance. Because of possible correlation among vectors at different time frames, distances based on deterministic vectors, such as the Euclidean distance, are not very appropriate for foreign exchange data. Since the Mahalanobis distance captures these correlations, we suggest using it in the selection of neighbors. In the present study, we used five foreign currencies, which are among the most traded, to compare the performance of the k-NN algorithm with the traditional Euclidean and Absolute distances against its performance with the proposed Mahalanobis distance. The performances were compared in two ways: (i) forecast accuracy and (ii) transforming the forecasts into a more effective technical trading rule. The results were obtained with real FX trading data and showed that the method introduced in this work outperforms the other popular methods. Furthermore, we conducted a thorough investigation of optimal parameter choice with different distance measures. We applied the concept of distance-based weighting to NN and compared its performance with forecasting based on the traditional unweighted NN algorithm. Time series forecasting methods, such as the autoregressive integrated moving average (ARIMA) process, are widely used as forecasting techniques in many areas of time series analysis. We compared the performance of the proposed Mahalanobis distance-based k-NN forecasting procedure with the traditional general ARIMA-based forecasting algorithm. In this case, the forecasts were also transformed into a technical trading strategy to create buy and sell signals. The two methods were evaluated for their forecasting accuracy and trading performance. Multi-step-ahead forecasting is an important aspect of time series forecasting. Many researchers claim that the k-nearest neighbor forecasting procedure outperforms linear forecasting methods for financial time series data, and the available work in the literature supports this claim with one-step-ahead forecasting. One of our goals in this work was to improve FX trading with multi-step-ahead forecasting. A popular multi-step-ahead forecasting strategy was adopted to obtain forecasts more than one day ahead. We performed a comparative study of the performance of single-step-ahead and multi-step-ahead trading strategies using data for five foreign currencies with the Mahalanobis distance-based k-nearest neighbor algorithm.
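
For reference, the distances contrasted in this abstract can be written out as below. The Mahalanobis and Euclidean definitions are standard. The inverse-distance weights in the last line are only one common choice for distance-based weighting and are an assumption here, since the abstract does not state the scheme used; they are undefined when a neighbor lies at distance zero.

```latex
% Mahalanobis distance between vectors x and y, with covariance matrix \Sigma
d_M(x, y) = \sqrt{(x - y)^{\top} \Sigma^{-1} (x - y)}

% With \Sigma = I this reduces to the Euclidean distance
d_E(x, y) = \sqrt{(x - y)^{\top} (x - y)}

% Distance-weighted k-NN forecast from neighbors x_1, ..., x_k with targets y_1, ..., y_k
% (inverse-distance weights are an assumed, illustrative choice)
\hat{y} = \frac{\sum_{i=1}^{k} w_i \, y_i}{\sum_{i=1}^{k} w_i},
\qquad w_i = \frac{1}{d_M(x, x_i)}
```

Because $\Sigma^{-1}$ rescales and decorrelates the coordinates, two histories that differ mainly along a strongly correlated direction count as closer under $d_M$ than under $d_E$.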

18

Pešek, Milan. "Detekce logopedických vad v řeči." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218106.

Abstract:

The thesis deals with the design and implementation of software for detecting speech defects treated in logopaedics (speech therapy). Because such defects need to be detected early, the software is aimed at child speakers. The introductory part describes the theory of speech production, the modelling of speech production for numerical processing, phonetics, logopaedics, and the basic logopaedic speech defects. It also describes the methods used for feature extraction, for segmenting words into speech sounds, and for classifying features into a correct or incorrect pronunciation class. The next part of the thesis presents the results of testing the selected methods. For recognizing logopaedic speech defects, MFCC and PLP features are extracted. Words are segmented into speech sounds using the Differential Function method. The extracted features of a sound are classified into a correct or incorrect pronunciation class with one of the tested pattern recognition methods. The k-NN, SVM, ANN, and GMM methods are tested for classifying the features.

19

Young, Barrington R. St A. "Efficient Algorithms for Data Mining with Federated Databases." University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1179332091.

20

Torres, Winnie de Lima. "Detecção de desvios vocais utilizando modelos auto regressivos e o algoritmo KNN." PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA ELÉTRICA E DE COMPUTAÇÃO, 2018. https://repositorio.ufrn.br/jspui/handle/123456789/25105.

Abstract:

Several fields of science study vocal tract disorders by analysing voice vibration patterns. The value of this research generally lies in identifying, at a more specific stage, diseases of greater or lesser severity, which may be treated with voice therapy or may require closer attention, including surgical procedures to control them. Although the literature already indicates that digital signal processing allows non-invasive diagnosis of laryngeal pathologies, such as vocal disorders that cause edema, nodules and paralysis, there is no agreement on the most suitable method or on the most appropriate features, or parameters, for detecting vocal deviations. This work therefore proposes an algorithm for detecting vocal deviations through the analysis of voice signals. The study used data from the Disordered Voice Database, developed by the Massachusetts Eye and Ear Infirmary (MEEI), because of its use in voice acoustics research. A total of 166 signals from this database were used, comprising healthy voices and pathological voices affected by edema, nodules and vocal fold paralysis. Autoregressive models (AR and ARMA) were generated from the voice signals to represent them and, using the parameters of the resulting models, the K-Nearest Neighbors (KNN) algorithm was used to classify the signals. To assess the efficiency of the proposed algorithm, its results were compared with a detection method that considers only the Euclidean distance between signals. The results indicate that the proposed method performs well, with a classification hit rate above 71% (compared with 31% when using the Euclidean distance). Moreover, the method is easy to implement and can run on simpler hardware. This research therefore has the potential to yield a cheap and accessible classifier for large-scale use by health professionals, as a non-invasive pre-analysis alternative for detecting otorhinolaryngological pathologies that affect the voice.

21

Curtin, Ryan Ross. "Improving dual-tree algorithms." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54354.

Full text

Abstract:

This large body of work is entirely centered around dual-tree algorithms, a class of algorithm based on spatial indexing structures that often provide large amounts of acceleration for various problems. This work focuses on understanding dual-tree algorithms using a new, tree-independent abstraction, and using this abstraction to develop new algorithms. Stated more clearly, the thesis of this entire work is that we may improve and expand the class of dual-tree algorithms by focusing on and providing improvements for each of the three independent components of a dual-tree algorithm: the type of space tree, the type of pruning dual-tree traversal, and the problem-specific BaseCase() and Score() functions. This is demonstrated by expressing many existing dual-tree algorithms in the tree-independent framework, and focusing on improving each of these three pieces. The result is a formidable set of generic components that can be used to assemble dual-tree algorithms, including faster traversals, improved tree theory, and new algorithms to solve the problems of max-kernel search and k-means clustering.

22

Bacchielli, Tommaso. "Algoritmi di Machine Learning per il riconoscimento di attività umane da vibrazioni strutturali." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Find full text

Abstract:

The thesis deals with the implementation of machine learning algorithms for recognizing four human activities (walking, running, cycling and driving) using only the structural vibrations these produce in the ground, which were recorded by two electromagnetic geophones (one horizontal and one vertical). All stages of the project, from data acquisition and processing to the implementation of the machine learning algorithms, were developed in MATLAB.

23

Balocchi, Leonardo. "Anomaly detection mediante algoritmi di machine learning." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Find full text

Abstract:

The aim of this work is to study the damage state of the Z-24 bridge, a structure that was subjected to progressive damage during its demolition phase. The study was carried out using machine learning algorithms; in particular, two anomaly detection algorithms were chosen. Since two algorithms were used, namely ACH and KNN, a comparison was made at the end of the work to determine which was better for structural analysis. The two algorithms were compared using confusion matrices and ROC curves.

24

Cirincione, Antonio. "Algoritmi di Machine Learning per la Classificazione di Dati Inerziali." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Find full text

Abstract:

This study deals with the implementation of two machine learning algorithms for recognizing three motor activities: walking, running and cycling. Acceleration profiles were acquired from three users with the MATLAB Mobile smartphone application, two used in the test phase and one in the training phase of the algorithms. The activities were classified by extracting suitable features of interest, in particular the standard deviation of the acceleration, which proved a good choice for discriminating between the motor activities. The classification algorithms tested are K-Means and Nearest Neighbour; their confusion matrices show that they recognize whether a user is performing the activities in question in 95.6% and 99.6% of cases, respectively.

25

Samara, Rafat. "TOP-K AND SKYLINE QUERY PROCESSING OVER RELATIONAL DATABASE." Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH. Forskningsmiljö Informationsteknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-20108.

Full text

Abstract:

Top-k and skyline queries have long been studied in the database and information retrieval communities, and they are two popular operations for preference retrieval. A top-k query returns a subset of the most relevant answers instead of all answers. Efficient top-k processing retrieves the k objects that have the highest overall score. This paper presents several algorithms used for efficient top-k processing in different scenarios. It presents a framework based on existing algorithms that takes cost-based optimization into account and works for these scenarios. This framework is used when the user can determine the ranking function, and a real-life scenario is applied to it step by step. A skyline query returns the set of points that are not dominated by other points in the given datasets (a record x dominates another record y if x is as good as y in all attributes and strictly better in at least one attribute). The paper introduces several algorithms used for evaluating skyline queries and presents one of their problems, the curse of dimensionality. It presents a new strategy, based on existing skyline algorithms, skyline frequency and a binary tree strategy, which gives a good solution to this problem. This new strategy is used when the user cannot determine the ranking function, and a real-life scenario applying it step by step is presented. Finally, the advantages of the top-k query are applied to the skyline query in order to retrieve results quickly and efficiently.
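
The dominance definition in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it is a naive quadratic scan, it assumes that larger values are better on every attribute, and the names `dominates`, `skyline` and `top_k` as well as the sample points are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y: at least as good everywhere, strictly better somewhere.

    Assumption: larger attribute values are better.
    """
    at_least_as_good = all(a >= b for a, b in zip(x, y))
    strictly_better = any(a > b for a, b in zip(x, y))
    return at_least_as_good and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_k(points: List[Point], score: Callable[[Point], float], k: int) -> List[Point]:
    """Top-k needs a user-supplied ranking function; the skyline does not."""
    return sorted(points, key=score, reverse=True)[:k]


# Hypothetical two-attribute records.
records = [(1, 5), (3, 3), (5, 1), (2, 2), (1, 1)]
print(skyline(records))
print(top_k(records, lambda p: 2 * p[0] + p[1], k=2))
```

The contrast matches the abstract: `top_k` depends on a ranking function chosen by the user, whereas `skyline` only relies on the dominance relation.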

26

Zapletal, Petr. "Klasifikační metody analýzy vrstvy nervových vláken na sítnici." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2010. http://www.nusl.cz/ntk/nusl-218575.

Full text

Abstract:

This thesis deals with classification of the retinal nerve fibre layer. Texture features from six texture analysis methods are used for classification. All methods calculate a feature vector from the input images. This feature vector is characteristic of every cluster (class). Classification is performed by three supervised learning algorithms and one unsupervised learning algorithm. The first tested algorithm is Ho-Kashyap. The next is the Bayes classifier NDDF (Normal Density Discriminant Function). The third is the nearest neighbor algorithm k-NN, and the last tested classifier is K-means, which belongs to clustering. For better compactness of this thesis, three methods for selecting training patterns in supervised learning algorithms are implemented. The methods are based on the Repeated Random Subsampling Cross Validation, K-Fold Cross Validation and Leave One Out Cross Validation algorithms. All algorithms are quantitatively compared in terms of classification error.

27

Landmér, Pedersen Jesper. "Weighing Machine Learning Algorithms for Accounting RWISs Characteristics in METRo : A comparison of Random Forest, Deep Learning & kNN." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-85586.

Full text

Abstract:

The numerical model to forecast road conditions, Model of the Environment and Temperature of Roads (METRo), laid the foundation of solving the energy balance and calculating the temperature evolution of roads. METRo does this by providing a numerical modelling system making use of Road Weather Information Stations (RWIS) and meteorological projections. While METRo accommodates tools for correcting errors at each station, such as regional differences or microclimates, this thesis proposes machine learning as a supplement to the METRo prognostications for accounting station characteristics. Controlled experiments were conducted by comparing four regression algorithms, that is, recurrent and dense neural network, random forest and k-nearest neighbour, to predict the squared deviation of METRo forecasted road surface temperatures. The results presented reveal that the models utilising the random forest algorithm yielded the most reliable predictions of METRo deviations. However, the study also presents the promise of neural networks and the ability and possible advantage of seasonal adjustments that the networks could offer.

28

Raykhel, Ilya Igorevitch. "Real-Time Automatic Price Prediction for eBay Online Trading." BYU ScholarsArchive, 2008. https://scholarsarchive.byu.edu/etd/1631.

Full text

Abstract:

While Machine Learning is one of the most popular research areas in Computer Science, there are still only a few deployed applications intended for use by the general public. We have developed an exemplary application that can be directly applied to eBay trading. Our system predicts how much an item would sell for on eBay based on that item's attributes. We ran our experiments on the eBay laptop category, with prior trades used as training data. The system implements a feature-weighted k-Nearest Neighbor algorithm, using genetic algorithms to determine feature weights. Our results demonstrate an average prediction error of 16%; we have also shown that this application greatly reduces the time a reseller would need to spend on trading activities, since the bulk of market research is now done automatically with the help of the learned model.
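
As a rough illustration of feature-weighted k-nearest-neighbour regression, the following Python sketch predicts a value as the mean of the k closest training targets under a weighted Euclidean distance. It is only a sketch under stated assumptions: the weights are fixed by hand here, whereas the thesis determines them with genetic algorithms, and the function names, features and prices are hypothetical.

```python
import math
from typing import List, Sequence


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def knn_predict(query: Sequence[float], X: List[Sequence[float]],
                y: Sequence[float], weights: Sequence[float], k: int = 3) -> float:
    """Predict as the mean target of the k nearest training rows."""
    order = sorted(range(len(X)),
                   key=lambda i: weighted_distance(query, X[i], weights))
    nearest = order[:k]
    return sum(y[i] for i in nearest) / len(nearest)


# Hypothetical training data: two numeric features per item and a sale price.
X = [[1.0, 0.0], [2.0, 10.0], [3.0, 0.0], [10.0, 0.0]]
prices = [100.0, 200.0, 300.0, 1000.0]

# A zero weight makes the second feature irrelevant to the neighbour search.
print(knn_predict([2.2, 0.0], X, prices, weights=[1.0, 0.0], k=1))
print(knn_predict([2.2, 0.0], X, prices, weights=[1.0, 1.0], k=1))
```

Changing the weights changes which neighbours are selected, which is why a weight search (by a genetic algorithm or any other optimizer) can affect prediction error.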

29

Linton, Thomas. "Forecasting hourly electricity consumption for sets of households using machine learning algorithms." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186592.

Full text

Abstract:

To address inefficiency, waste, and the negative consequences of electricity generation, companies and government entities are looking to behavioural change among residential consumers. To drive behavioural change, consumers need better feedback about their electricity consumption. A monthly or quarterly bill provides the consumer with almost no useful information about the relationship between their behaviours and their electricity consumption. Smart meters are now widely dispersed in developed countries and they are capable of providing electricity consumption readings at an hourly resolution, but this data is mostly used as a basis for billing and not as a tool to assist the consumer in reducing their consumption. One component required to deliver innovative feedback mechanisms is the capability to forecast hourly electricity consumption at the household scale. The work presented by this thesis is an evaluation of the effectiveness of a selection of kernel-based machine learning methods at forecasting the hourly aggregate electricity consumption for different sized sets of households. The work of this thesis demonstrates that k-Nearest Neighbour Regression and Gaussian Process Regression are the most accurate methods within the constraints of the problem considered. In addition to accuracy, the advantages and disadvantages of each machine learning method are evaluated, and a simple comparison of each algorithm's computational performance is made.

30

Duan, Haoyang. "Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease." Thèse, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31113.

Full text

Abstract:

From a fresh data science perspective, this thesis discusses the prediction of coronary artery disease based on Single-Nucleotide Polymorphisms (SNPs) from the Ontario Heart Genomics Study (OHGS). First, the thesis explains the k-Nearest Neighbour (k-NN) and Random Forest learning algorithms, and includes a complete proof that k-NN is universally consistent in finite dimensional normed vector spaces. Second, the thesis introduces two dimensionality reduction techniques: Random Projections and a new method termed Mass Transportation Distance (MTD) Feature Selection. Then, this thesis compares the performance of Random Projections with k-NN against MTD Feature Selection and Random Forest for predicting artery disease. Results demonstrate that MTD Feature Selection with Random Forest is superior to Random Projections and k-NN. Random Forest is able to obtain an accuracy of 0.6660 and an area under the ROC curve of 0.8562 on the OHGS dataset, when 3335 SNPs are selected by MTD Feature Selection for classification. This area is considerably better than the previous high score of 0.608 obtained by Davies et al. in 2010 on the same dataset.

31

Guňka, Jiří. "Adaptivní klient pro sociální síť Twitter." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-237052.

Full text

Abstract:

The goal of this term project is to create a user-friendly Twitter client. It may use machine learning methods, such as a naive Bayes classifier, to point out new tweets of interest. Hyperbolic trees and some other methods will be used to visualize these tweets.

32

Bastabak, Burcu. "A Data Mining Framework To Detect Tariff Code Circumvention In Turkish Customs Database." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614616/index.pdf.

Full text

Abstract:

Customs and foreign trade regulations are made to regulate import and export activities. The majority of these regulations apply to import procedures. The country of origin and the tariff code become important when determining the tax amount of imported merchandise. Anti-dumping duty is defined as a financial penalty, published by the Ministry of Economy, enforced on suspiciously low-priced imports in order to protect the local industry from unfair competition. It is accrued according to the tariff code and the country of origin. To avoid this obligation and not pay the tax, a tariff code different from the original one may be declared on the customs declaration, which is called "Tariff Code Circumvention". To identify such misdeclarations, a physical examination of the merchandise is required. However, with limited personnel resources, the physical examination of all imported merchandise is not possible. In this study, a data mining framework is developed on the Turkish customs database in order to detect "Tariff Code Circumvention". For this purpose, four types of products, which are the most circumvented goods in Turkish customs, have been chosen. First, with the help of the Risk Analysis Office, the significant features are identified. Then, the Infogain algorithm is used to rank these features. Finally, the KNN algorithm is applied to the Turkish customs database in order to identify the circumvented goods automatically. The results show that the framework is able to find such circumvented goods successfully.

33

Jiao, Lianmeng. "Classification of uncertain data in the framework of belief functions : nearest-neighbor-based and rule-based approaches." Thesis, Compiègne, 2015. http://www.theses.fr/2015COMP2222/document.

Full text

Abstract:

In many classification problems, data are inherently uncertain. The available training data might be imprecise, incomplete, or even unreliable. In addition, partial expert knowledge characterizing the classification problem may also be available. These different types of uncertainty pose great challenges for classifier design. The theory of belief functions provides a well-founded and elegant framework for representing and combining a large variety of uncertain information. In this thesis, we use this theory to address uncertain data classification problems based on two popular approaches, namely the k-nearest neighbor (kNN) rule and rule-based classification systems. For the kNN rule, one concern is that imprecise training data in regions where classes overlap may greatly affect its performance. An evidential editing version of the kNN rule was developed, based on the theory of belief functions, to model the imprecise information carried by samples in overlapping regions. Another consideration is that sometimes only an incomplete training data set is available, in which case the performance of the kNN rule degrades dramatically. Motivated by this problem, we designed an evidential fusion scheme for combining a group of pairwise kNN classifiers built on locally learned pairwise distance metrics. For rule-based classification systems, in order to improve their performance in complex applications, we extended the traditional fuzzy rule-based classification system within the framework of belief functions and developed a belief rule-based classification system to handle uncertain information in complex classification problems. Further, since in some applications partial expert knowledge is available in addition to training data collected by sensors, a hybrid belief rule-based classification system was developed to use these two types of information jointly for classification.

34

Bahri, Maroua. "Improving IoT data stream analytics using summarization techniques." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT017.

Full text

Abstract:

With the evolution of technology, the use of smart Internet-of-Things (IoT) devices, sensors, and social networks results in an overwhelming volume of IoT data streams, generated daily by many applications, that can be transformed into valuable information through machine learning. In practice, several critical issues arise when extracting useful knowledge from these evolving data streams, chiefly that the stream must be handled and processed efficiently. In this context, this thesis aims to improve the performance (in terms of memory and time) of existing data mining algorithms on streams. We focus on the classification task in the streaming framework. The task is challenging on streams, principally because of the high, and increasing, data dimensionality, in addition to the potentially infinite amount of data. The first part of the thesis surveys the state of the art in classification and dimensionality reduction techniques as applied to the stream setting, providing an updated view of the most recent work in this area. In the second part, we detail our contributions to classification in streams, developing novel approaches based on summarization techniques that aim to reduce the computational resources of existing classifiers with no, or minor, loss of classification accuracy. To address high-dimensional data streams and make classifiers efficient, we incorporate an internal preprocessing step that reduces the dimensionality of the input data incrementally before feeding them to the learning stage. We present several approaches applied to several classification tasks: Naive Bayes enhanced with sketches and the hashing trick, k-NN using compressed sensing and UMAP, and the integration of these in ensemble methods.

35

Prokopová, Ivona. "Detekce fibrilace síní v EKG." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-413170.

Full text

Abstract:

Atrial fibrillation is one of the most common cardiac rhythm disorders characterized by ever-increasing prevalence and incidence in the Czech Republic and abroad. The incidence of atrial fibrillation is reported at 2-4 % of the population, but due to the often asymptomatic course, the real prevalence is even higher. The aim of this work is to design an algorithm for automatic detection of atrial fibrillation in the ECG record. In the practical part of this work, an algorithm for the detection of atrial fibrillation is proposed. For the detection itself, the k-nearest neighbor method, the support vector method and the multilayer neural network were used to classify ECG signals using features indicating the variability of RR intervals and the presence of the P wave in the ECG recordings. The best detection was achieved by a model using a multilayer neural network classification with two hidden layers. Results of success indicators: Sensitivity 91.23 %, Specificity 99.20 %, PPV 91.23 %, F-measure 91.23 % and Accuracy 98.53 %.
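
The success indicators listed in this abstract are standard confusion-matrix metrics. The Python sketch below shows how they are usually computed; the counts used are hypothetical and are not the thesis's data. Note that when sensitivity and PPV are equal, as in the reported figures (both 91.23 %), the F-measure (their harmonic mean) takes the same value.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of true positive cases that were detected (recall)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of negative cases correctly classified as negative."""
    return tn / (tn + fp)


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision)."""
    return tp / (tp + fp)


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, fn, fp, tn = 90, 10, 10, 890
se = sensitivity(tp, fn)
precision = ppv(tp, fp)
print(se, specificity(tn, fp), precision, f_measure(precision, se),
      accuracy(tp, tn, fp, fn))
```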

36

Bílý, Ondřej. "Moderní řečové příznaky používané při diagnóze chorob." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2011. http://www.nusl.cz/ntk/nusl-218971.

Full text

Abstract:

This work deals with the diagnosis of Parkinson's disease by analyzing the speech signal. It begins by describing speech signal production. This is followed by a description of speech signal analysis, its preparation and subsequent feature extraction. Next, Parkinson's disease and the changes it causes in the speech signal are described. The work then describes the features used for the diagnosis of Parkinson's disease (FCR, VSA, VOT, etc.). Another part deals with the selection and reduction of features using learning algorithms (SVM, ANN, k-NN) and their subsequent evaluation. The last part of the thesis describes a program for computing the features. Finally, the selection is described and all results are evaluated.

37

Klimeš, Filip. "Zpracování obrazových sekvencí sítnice z fundus kamery." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2015. http://www.nusl.cz/ntk/nusl-220975.

Full text

Abstract:

The aim of my master's thesis was to propose a method for analysing retinal sequences that evaluates the quality of individual frames. The theoretical part also deals with the properties of retinal sequences and with the registration of images from a fundus camera. In the practical part, an image quality assessment method is implemented, tested on real retinal sequences, and its success rate is evaluated. The thesis also assesses the effect of this method on the registration of retinal images.

38

Dočekal, Martin. "Porovnání klasifikačních metod." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403211.

Full text

Abstract:

This thesis deals with a comparison of classification methods. First, the classification methods based on machine learning are described; then a classifier comparison system is designed and implemented. The thesis also describes some classification tasks and datasets on which the designed system is tested. The classification tasks are evaluated according to standard metrics. The thesis also presents the design and implementation of a classifier based on the principle of evolutionary algorithms.

39

Chang, Tung-Lin, and 張東琳. "Improvement Sleep Apnoea Auxiliary Equipment Performance With k-nearest Neighbors Algorithm." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/jdys75.

Abstract:

碩士
國立中央大學
光機電工程研究所
106
In this paper, we use non-invasive continuous detection to monitor blood oxygen and the photoplethysmogram by pulse oximetry. The monitoring data are preprocessed with regression analysis and frequency-domain analysis, and the training set is built from the resulting feature samples. The KNN classification algorithm is used to estimate the subject's clinical Respiratory Disturbance Index (RDI), and the data are transmitted back over the Internet. A control signal is sent through the wireless communication module to the sleep apnoea auxiliary equipment developed in this research, called the “POM Pillow”. This research also designed a variety of sleep postures as trigger conditions. In conclusion, the “POM Pillow” can effectively reduce the frequency of obstructive respiratory arrest in patients suffering from sleep apnea and thereby improve sleep quality.

40

Chen, Pei-Ni, and 陳貝妮. "Diagnosis System of Rotor Faults for Three Phase Induction Motor Based on K-Nearest Neighbors Algorithm." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/r48sdb.

41

Huang, Haitao. "Spatial Analysis of Retinal Pigment Epithelium Morphology." 2016. http://scholarworks.gsu.edu/math_theses/153.

Abstract:

In patients with age-related macular degeneration, a monolayer of cells in the eye called the retinal pigment epithelium (RPE) differs in morphology from that of healthy eyes. It is therefore important to quantify these morphological changes, which will help us better understand the physiology, disease progression and classification. Classification of RPE morphometry has been accomplished with whole-tissue data. In this work, we focused on the spatial aspect of RPE morphometric analysis. We used second-order spatial analysis to reveal distinct patterns of cell clustering between normal and diseased eyes for both simulated and experimental human RPE data. We classified mouse genotype and age with the k-Nearest Neighbors algorithm. Radially aligned regions showed different classification power for several cell shape variables. Our proposed methods provide a useful, noninvasive addition to the classification and prognosis of eye disease.

42

Gu, Yu-Jia, and 古祐嘉. "Adaptive K-Nearest Neighbor Algorithm." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/35887089581319797969.

Abstract:

碩士
元智大學
資訊管理學系
97
The K-nearest-neighbor algorithm traditionally predicts the class of a record from the decision of the record's K nearest neighbors, for a fixed value of K. However, recent studies have shown that using different K values for different records can improve prediction accuracy. This study integrates the Fuzzy C-means algorithm to help determine a proper K value for each record in a local KNN algorithm. Performance results show that this method outperforms traditional KNN in terms of prediction accuracy.
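
The fixed-K voting rule mentioned in this abstract can be sketched roughly as follows. This is only an illustration of plain majority-vote KNN, not the thesis's Fuzzy C-means procedure for choosing K; the function name `knn_predict` and the toy data are assumptions made for the example. It shows why the choice of K matters: the same query record can receive different labels for different K.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Return the majority label among the k training records nearest to query.

    train is a list of (features, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two "a" records near the origin, three "b" records near (1, 1).
train = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
    ((1.2, 0.8), "b"),
]

for k in (1, 3, 5):
    print(k, knn_predict(train, (0.05, 0.1), k))
```

With K equal to the full training set size, the vote degenerates to the overall majority class, which is one motivation for choosing K per record rather than globally.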

43

Lin, Cheng-Yi, and 林承毅. "Accelerating k-Nearest Neighbor Algorithm Using GPU and Chunking Method." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/3hx66j.

44

Chen, Nai-Wen, and 陳艿玟. "Feature Weighting for k-Nearest Neighbor Classifiers Using Differential Evolution Algorithms." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/3gfxz4.

Abstract:

碩士
國立臺灣海洋大學
資訊工程學系
104
Since the industrial revolution, people have sought to replace human workers with machines for the sake of savings in labor, time and cost. With recent advances in hardware and software technology, data collected in practice are becoming larger, faster-changing and more complex. Big Data, which contain large-scale and/or high-dimensional data, pose serious obstacles to data interpretation and application. As a result, machine learning has become a popular research topic in many fields of study. Machine learning, which can iteratively learn from data, allows computers to find hidden insights in data without explicit knowledge. Machine learning techniques have been widely applied to mine valuable information and to support decision making. In dealing with high-dimensional Big Data, determining feature importance plays a key role in reducing the high cost of computation and data storage. This paper presents a method to determine feature importance and feature weighting by integrating the Differential Evolution (DE) algorithm with the k-Nearest Neighbors (kNN) algorithm. The DE algorithm, a heuristic optimization algorithm, mimics biological evolution through mutation, crossover and selection operations to search for an optimal solution. The kNN algorithm is a simple classifier that nevertheless works very well in practice in various fields. In our proposed method, the feature weights and the k value for kNN are first chosen by the DE algorithm and then evaluated by the classification accuracy of the kNN algorithm. Our experimental results on six UCI datasets show that, with appropriate DE parameters, the proposed method achieves better overall accuracy and outperforms the six compared approaches.
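
The loop described in this abstract, in which DE proposes feature weights and kNN accuracy scores them, might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the thesis's implementation: k is held fixed rather than evolved, fitness is leave-one-out accuracy, the DE variant is the common rand/1/bin scheme, and all names (`weighted_knn_accuracy`, `de_feature_weights`), parameter defaults and the toy data are hypothetical.

```python
import random
from collections import Counter


def weighted_knn_accuracy(weights, data, k):
    """Leave-one-out accuracy of kNN under a feature-weighted squared Euclidean distance."""
    correct = 0
    for i, (xi, yi) in enumerate(data):
        others = [d for j, d in enumerate(data) if j != i]
        others.sort(key=lambda d: sum(w * (a - b) ** 2 for w, a, b in zip(weights, d[0], xi)))
        votes = Counter(label for _, label in others[:k])
        correct += votes.most_common(1)[0][0] == yi
    return correct / len(data)


def de_feature_weights(data, k=3, pop_size=10, generations=20, f=0.5, cr=0.9, seed=0):
    """Search for feature weights in [0, 1] with a basic rand/1/bin differential evolution."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    # Start from uniform weights plus random candidates, so the result is never worse than unweighted kNN.
    pop = [[1.0] * dim] + [[rng.random() for _ in range(dim)] for _ in range(pop_size - 1)]
    fit = [weighted_knn_accuracy(w, data, k) for w in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated component
            trial = [
                min(1.0, max(0.0, a[d] + f * (b[d] - c[d])))  # mutation, clipped to [0, 1]
                if (rng.random() < cr or d == j_rand)          # binomial crossover
                else pop[i][d]
                for d in range(dim)
            ]
            trial_fit = weighted_knn_accuracy(trial, data, k)
            if trial_fit >= fit[i]:  # greedy selection
                pop[i], fit[i] = trial, trial_fit
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is pure noise.
_rng = random.Random(1)
data = [((float(c) + _rng.gauss(0, 0.3), _rng.uniform(-5, 5)), c) for c in (0, 1) for _ in range(15)]

weights, accuracy = de_feature_weights(data)
print(weights, accuracy)
```

On data like this one would expect the search to favor a small weight on the noisy feature, but the exact weights depend on the seed and DE parameters.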

45

Scrimieri, Daniele, and S. M. Ratchev. "A k-nearest neighbour technique for experience-based adaptation of assembly stations." 2014. http://hdl.handle.net/10454/17725.

Abstract:

We present a technique for automatically acquiring operational knowledge on how to adapt assembly systems to new production demands or recover from disruptions. Dealing with changes and disruptions affecting an assembly station is a complex process which requires deep knowledge of the assembly process, the product being assembled and the adopted technologies. Shop-floor operators typically perform a series of adjustments by trial and error until the expected results in terms of performance and quality are achieved. With the proposed approach, such adjustments are captured and their effect on the station is measured. Adaptation knowledge is then derived by generalising from individual cases using a variant of the k-nearest neighbour algorithm. The operator is informed about potential adaptations whenever the station enters a state similar to one contained in the experience base, that is, a state for which adaptation information has been captured. A case study is presented, showing how the technique makes it possible to reduce adaptation times. The general system architecture in which the technique has been implemented is described, including the role of the different software components and their interactions.

46

Ku, Chin, and 顧堇. "The study on reproducibility of modified genetic algorithms/k-nearest neighbors method for microarray data." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/44038757595629869436.

47

Chang, Chung-Ting, and 張仲霆. "An Application of V2V Communication: Cooperative Vehicle Positioning System based on Topology Matching and k-Nearest Neighbor Algorithm." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/23xetz.

Abstract:

碩士
國立臺灣大學
電機工程學研究所
104
V2V (Vehicle to Vehicle) and V2I (Vehicle to Infrastructure) communication have been well researched in recent years: V2V technology allows vehicles to share information with nearby vehicles, and V2I technology allows vehicles to share information with nearby infrastructure, through wireless communication devices. Advanced driving assistance systems can be divided into self-sufficient systems and interactive systems. Interactive systems, as the name implies, interact with infrastructure and/or other vehicles; they receive spatial information from nearby vehicles to prevent forward collisions. While self-sufficient systems are limited to line-of-sight detection, interactive systems account for scenarios farther ahead by predicting the positions of occluded vehicles. In this thesis, each vehicle is assumed to generate a local map, which is a set of position measurements of nearby vehicles obtained with an onboard low-cost GPS and a ranging sensor, and to share it with nearby vehicles by broadcasting over a wireless communication device. When the ego-vehicle receives multiple local maps from nearby vehicles, they are matched with the local map generated by the ego-vehicle using topology matching. The position measurements belonging to the same vehicle are clustered by automatic point clustering based on the k-Nearest Neighbor algorithm. These measurements are then combined by adaptive position estimation, which updates the position estimate according to the current accuracy of each sensor. Both simulation results and experimental results for the proposed cooperative vehicle positioning system are presented. The simulation results show that, most of the time, the proposed system detects more vehicles than a single sensor alone. A vehicle can thus obtain an extended view of its surroundings, which improves driving safety.
A stereo camera mounted on the vehicle is used as the ranging sensor to produce position measurements in a real scenario, in which there are three vehicles near the ego-vehicle. First, the ego-vehicle estimates the range to the other three vehicles using the stereo camera only. The experimental results show that the stereo camera achieves higher range estimation accuracy for the middle vehicle than for the side vehicles. Second, the ego-vehicle estimates the range to the other three vehicles using the proposed cooperative vehicle positioning system. The position of the ego-vehicle is estimated from four measurements: one from the GPS sensor of the ego-vehicle, and three from the GPS and ranging sensors of the other three vehicles, respectively. The experimental results show that range estimation with the proposed system is more accurate than with the stereo camera alone.

48

Shen, Kuo-Cheng, and 沈國丞. "Building a PC-Based Image Inspection System to detect the Blood Eggs with the K-Nearest Neighbor Algorithm." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/vynxa3.

Abstract:

碩士
國立虎尾科技大學
電機工程系碩士班
105
There are currently 1,300 poultry farms raising laying hens in Taiwan. However, no more than 25 of them produce eggs whose quality meets the CAS standards. At present, firms must import equipment to grade and package eggs, and this is very expensive. If the equipment could be developed within Taiwan, costs would fall and egg quality would rise. This paper presents a system to detect blood spots in eggs, with a simple man-machine interface that lets users adopt the approach quickly. A non-destructive method based on image detection is proposed. A simple box with a light source is used to make the eggs translucent, and an image is then taken. The captured image is then binarized. We then normalize the images, derive the size of the egg, perform median filtering, and convert the image into the HSV color space for color analysis. We extract the H component as a feature and classify it with the K-Nearest Neighbor algorithm. Finally, the results of the analysis are shown on a PC screen, indicating whether the eggs have blood spots or not.

49

Vicente, Sergio. "Apprentissage statistique avec le processus ponctuel déterminantal." Thesis, 2021. http://hdl.handle.net/1866/25249.

Abstract:

Cette thèse aborde le processus ponctuel déterminantal, un modèle probabiliste qui capture la répulsion entre les points d’un certain espace. Celle-ci est déterminée par une matrice de similarité, la matrice noyau du processus, qui spécifie quels points sont les plus similaires et donc moins susceptibles de figurer dans un même sous-ensemble. Contrairement à la sélection aléatoire uniforme, ce processus ponctuel privilégie les sous-ensembles qui contiennent des points diversifiés et hétérogènes. La notion de diversité acquiert une importante grandissante au sein de sciences comme la médecine, la sociologie, les sciences forensiques et les sciences comportementales. Le processus ponctuel déterminantal offre donc une alternative aux traditionnelles méthodes d’échantillonnage en tenant compte de la diversité des éléments choisis. Actuellement, il est déjà très utilisé en apprentissage automatique comme modèle de sélection de sous-ensembles. Son application en statistique est illustrée par trois articles. Le premier article aborde le partitionnement de données effectué par un algorithme répété un grand nombre de fois sur les mêmes données, le partitionnement par consensus. On montre qu’en utilisant le processus ponctuel déterminantal pour sélectionner les points initiaux de l’algorithme, la partition de données finale a une qualité supérieure à celle que l’on obtient en sélectionnant les points de façon uniforme. Le deuxième article étend la méthodologie du premier article aux données ayant un grand nombre d’observations. Ce cas impose un effort computationnel additionnel, étant donné que la sélection de points par le processus ponctuel déterminantal passe par la décomposition spectrale de la matrice de similarité qui, dans ce cas-ci, est de grande taille. On présente deux approches différentes pour résoudre ce problème. 
On montre que les résultats obtenus par ces deux approches sont meilleurs que ceux obtenus avec un partitionnement de données basé sur une sélection uniforme de points. Le troisième article présente le problème de sélection de variables en régression linéaire et logistique face à un nombre élevé de covariables par une approche bayésienne. La sélection de variables est faite en recourant aux méthodes de Monte Carlo par chaînes de Markov, en utilisant l’algorithme de Metropolis-Hastings. On montre qu’en choisissant le processus ponctuel déterminantal comme loi a priori de l’espace des modèles, le sous-ensemble final de variables est meilleur que celui que l’on obtient avec une loi a priori uniforme.
This thesis presents the determinantal point process, a probabilistic model that captures repulsion between points of a certain space. This repulsion is encoded by a similarity matrix, the kernel matrix, which specifies which points are more similar and therefore less likely to appear in the same subset. Unlike traditional uniform random sampling, this point process gives more weight to subsets whose elements are more diverse. Diversity has become a key concept in domains such as medicine, sociology, forensic sciences and behavioral sciences. The determinantal point process is considered a promising alternative to traditional sampling methods, since it takes into account the diversity of selected elements. It is already actively used in machine learning as a subset selection method. Its application in statistics is illustrated with three papers. The first paper presents consensus clustering, which consists in running a clustering algorithm on the same data a large number of times. To sample the initial points of the algorithm, we propose the determinantal point process instead of uniform random sampling and show that the former produces better clustering results. The second paper extends the methodology developed in the first paper to large datasets. Such datasets impose a computational burden, since sampling with the determinantal point process is based on the spectral decomposition of the large kernel matrix. We introduce two methods to deal with this issue. These methods also produce better clustering results than consensus clustering based on a uniform sampling of initial points. The third paper addresses the problem of variable selection for the linear model and logistic regression when the number of predictors is large. A Bayesian approach is adopted, using Markov chain Monte Carlo methods with the Metropolis-Hastings algorithm. 
We show that setting the determinantal point process as the prior distribution for the model space selects a better final model than the model selected by a uniform prior on the model space.
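
As a hedged illustration of the repulsion this abstract describes: in one standard formulation of a determinantal point process on a finite ground set (the L-ensemble; the abstract does not state which parameterization the thesis uses), the probability of drawing a subset A is proportional to the determinant of the kernel submatrix indexed by A. The symbols L, A and s below are notation assumed for this example. For two points with unit self-similarity and mutual similarity s, the probability of selecting both shrinks as s grows:

```latex
P(Y = A) = \frac{\det(L_A)}{\det(L + I)},
\qquad
L = \begin{pmatrix} 1 & s \\ s & 1 \end{pmatrix}
\;\Longrightarrow\;
P(Y = \{1, 2\}) = \frac{1 - s^2}{4 - s^2},
\quad
P(Y = \{1\}) = P(Y = \{2\}) = P(Y = \emptyset) = \frac{1}{4 - s^2}.
```

The four probabilities sum to one, and as s approaches 1 (two nearly identical points) the chance of selecting both approaches zero, which is the diversity-favoring behavior the abstract contrasts with uniform sampling.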

50

Lee, Chien-Pang, and 李建邦. "The Study on Gene Selection and Sample Classification Based on Gene Expression Data Using Adaptive Genetic Algorithms / k-Nearest Neighbors Method." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/01635740897987498234.

Abstract:

碩士
國立中興大學
農藝學系所
94
Microarray technology has become a valuable tool for studying gene expression in recent years. The main difference between microarrays and traditional methods is that a microarray can measure thousands of genes at the same time. In the past, researchers typically used parametric statistical methods to find significant genes. However, microarray data often violate the assumptions of parametric statistical methods, and the type I error rate is inflated when each gene is tested for significance separately. Therefore, this research sought a variable selection method free of such assumptions to reduce the dimension of the data set. After applying the proposed method, biologists can select relevant genes from the reduced gene subset. In this study, adaptive genetic algorithms / k-nearest neighbors (AGA / KNN) was used to reduce the dimension of the data set; it is based on genetic algorithms / k-nearest neighbors (GA / KNN), first described by Li et al. (2001a). Although AGA and KNN are both well developed, this is the first use of AGA / KNN to analyze microarray data. Since AGA is a machine learning tool and KNN is a nonparametric discriminant analysis method, both can be used without distributional assumptions. There are three main differences between AGA / KNN and GA / KNN. First, the encoding is binary, and each string covers all genes. Second, adaptive probabilities of crossover and mutation are added. Finally, an extinction and immigration strategy is added. Since a GA can only find a near-optimal solution, the best string of each run is often not the same. To address this, AGA / KNN was repeated over many runs and the best string of each run was saved. The selection frequency of each gene was then computed from those strings to reduce the dimension of the data set. In this study, the original colon data set, from a high-density oligonucleotide chip (Alon et al., 1999), was analyzed. 
In addition, the mouse apo AI data set, from a cDNA chip (Callow et al., 2000), was used to compare the gene selection ability of AGA / KNN and GA / KNN. The results showed that both AGA / KNN and GA / KNN could reduce the dimension of the data set and that all samples could be classified correctly. However, the accuracy of AGA / KNN was higher than that of GA / KNN, and it took only half the CPU time of GA / KNN. Therefore, we conclude that the performance of AGA / KNN is no worse than that of GA / KNN. Finally, we suggest that when AGA / KNN is used to analyze microarray data, the 50 to 100 most frequently selected genes be chosen after about 100 runs of AGA / KNN. The selected genes should include the relevant genes and should classify samples correctly.
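
The frequency-based selection step described in this abstract (save the best binary string of each run, then rank genes by how often they were selected) can be sketched as below. This covers only the counting step, under the assumption that each best string is a list of 0/1 flags with one position per gene; the function name `top_genes` and the toy runs are hypothetical, and the AGA / KNN search that would produce the strings is not shown.

```python
from collections import Counter


def top_genes(best_strings, n):
    """Return the indices of the n genes selected most often across runs.

    best_strings holds one binary list per run; position i is 1 if gene i was selected.
    """
    counts = Counter()
    for string in best_strings:
        counts.update(i for i, bit in enumerate(string) if bit)
    return [gene for gene, _ in counts.most_common(n)]


# Hypothetical best strings from three runs over four genes.
runs = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
]
print(top_genes(runs, 2))
```

In the setting the abstract suggests, `best_strings` would hold about 100 strings and `n` would be between 50 and 100.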
