Dissertations / Theses on the topic 'K-Nearest Neighbors algorithm'
Consult the top 50 dissertations / theses for your research on the topic 'K-Nearest Neighbors algorithm.'
Li, Zheng, and Zheng Li. "Improving Estimation Accuracy of GPS-Based Arterial Travel Time Using K-Nearest Neighbors Algorithm." Thesis, The University of Arizona, 2017. http://hdl.handle.net/10150/625901.
Piro, Paolo. "Learning prototype-based classification rules in a boosting framework: application to real-world and medical image categorization." PhD thesis, Université de Nice Sophia-Antipolis, 2010. http://tel.archives-ouvertes.fr/tel-00590403.
Gupta, Nidhi. "Mutual k Nearest Neighbor based Classifier." University of Cincinnati / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1289937369.
Olivares, Javier. "Scaling out-of-core k-nearest neighbors computation on single machines." Thesis, Rennes 1, 2016. http://www.theses.fr/2016REN1S073/document.
The K-Nearest Neighbors (KNN) algorithm is an efficient method for finding similar data within a large dataset. Over the years, a huge number of applications have used KNN's capabilities to discover similarities within data generated in areas as diverse as business, medicine, music, and computer science. Although years of research have produced several variants of this algorithm, its implementation remains a challenge, particularly today, when data is growing at unprecedented rates. In this context, running KNN on large datasets raises two major issues: huge memory footprints and very long runtimes. Because of these high costs in computational resources and time, state-of-the-art KNN work does not account for the fact that data can change over time and always assumes that the data remains static throughout the computation, which unfortunately does not match reality. In this thesis, we address these challenges. First, we propose an out-of-core approach to compute KNN on large datasets using a single commodity PC. We advocate this approach as an inexpensive way to scale the KNN computation compared with the high cost of a distributed algorithm, in terms of both computational resources and coding, debugging, and deployment effort. Second, we propose a multithreaded out-of-core approach to meet the challenges of computing KNN on data that changes rapidly and continuously over time. After a thorough evaluation, we observe that our main contributions address the challenges of computing KNN on large datasets, leveraging the restricted resources of a single machine, decreasing runtimes compared with the baselines, and scaling the computation on both static and dynamic datasets.
Wong, Wing Sing. "K-nearest-neighbor queries with non-spatial predicates on range attributes /." View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?COMP%202005%20WONGW.
Aikes, Junior Jorge. "Estudo da influência de diversas medidas de similaridade na previsão de séries temporais utilizando o algoritmo KNN-TSP." Universidade Estadual do Oeste do Parana, 2012. http://tede.unioeste.br:8080/tede/handle/tede/1084.
Time series can be understood as any set of observations that are ordered in time. Among the many tasks applicable to temporal data, one that has attracted increasing interest, owing to its various applications, is time series forecasting. The k-Nearest Neighbor - Time Series Prediction (kNN-TSP) algorithm is a non-parametric method for forecasting time series. One of its advantages is its ease of application compared with parametric methods. Even though it is easier to define kNN-TSP's parameters, some issues remain open. This research focuses on one of these parameters: the similarity measure. This parameter was evaluated empirically using various similarity measures on a large set of time series, including artificial series with seasonal and chaotic characteristics and several real-world time series. A case study was also carried out comparing the predictive accuracy of the kNN-TSP algorithm with the Moving Average (MA), univariate Seasonal Auto-Regressive Integrated Moving Average (SARIMA), and multivariate SARIMA methods on a time series of daily patient flow in the Emergency Department of a Korean hospital. This work also proposes an approach to developing a hybrid similarity measure that combines characteristics of several measures. The results demonstrate that the Lp norm measures have an advantage over the other measures evaluated, owing to their lower computational cost and, in general, greater accuracy in temporal data forecasting with the kNN-TSP algorithm. Although the literature generally adopts the Euclidean measure to calculate the similarity between time series, the Manhattan distance can be considered an interesting candidate for defining similarity, because it shows no statistically significant difference from the Euclidean measure and has a lower computational cost. The measure proposed in this work does not show significant results, but it is promising for further research. Regarding the case study, the kNN-TSP algorithm with only the similarity measure parameter optimized achieves a considerably lower error than the best MA configuration and a slightly greater error than the optimal univariate and multivariate SARIMA settings, with a difference of less than one percent.
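The role of the similarity measure in nearest-neighbor forecasting can be illustrated with a minimal sketch. This is a generic scheme written for illustration, not necessarily the exact kNN-TSP procedure evaluated in the thesis: the function names, the window length, and the choice to average the successors of the k most similar past windows are all assumptions.

```python
def manhattan(a, b):
    """L1 norm: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 norm: square root of the sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_forecast(series, window=3, k=2, distance=manhattan):
    """Forecast the next value of `series` (hypothetical helper).

    The last `window` observations form the query. Every earlier window
    that has a known successor is compared with the query, and the
    successors of the k most similar windows are averaged.
    """
    query = series[-window:]
    candidates = []
    for start in range(len(series) - window):
        past = series[start:start + window]
        successor = series[start + window]
        candidates.append((distance(past, query), successor))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)


# Toy periodic series; both measures find the same neighbors here.
series = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
forecast_l1 = knn_forecast(series, distance=manhattan)
forecast_l2 = knn_forecast(series, distance=euclidean)
```

Swapping the `distance` argument is the only change needed to compare measures, and the Manhattan version avoids the squaring and square root of the Euclidean one, which is the kind of cost difference the abstract refers to.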
Johansson, David. "Price Prediction of Vinyl Records Using Machine Learning Algorithms." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-96464.
Mestre, Ricardo Jorge Palheira. "Improvements on the KNN classifier." Master's thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/10923.
Object classification is an important area within artificial intelligence, and its applications extend to various areas, within and outside science. Among classifiers, the K-nearest neighbor (KNN) is one of the simplest and most accurate, especially in environments where the data distribution is unknown or apparently not parameterizable. This algorithm assigns to the element being classified the majority class among its K nearest neighbors. In the original algorithm, this classification requires calculating the distances between the instance being classified and every training object. While an extensive training set is important for obtaining high accuracy, it also makes the classification of each object slower because of the algorithm's lazy-learning nature. Indeed, this algorithm provides no means of storing information about previously calculated classifications, so the classification of two equal instances must be computed twice. In a way, it may be said that this classifier does not learn. This dissertation focuses on this lazy-learning weakness and proposes a solution that transforms KNN into an eager-learning classifier. In other words, the algorithm is intended to learn effectively from the training set, thus avoiding redundant calculations. In the context of the proposed change to the algorithm, it is important to highlight the attributes that most characterize the objects according to their discriminating power. In this framework, the dissertation studies the implementation of these transformations on data of different types: continuous and/or categorical.
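The original, lazy form of the algorithm described in this abstract can be sketched in a few lines. This is a minimal illustration of majority voting among the K nearest neighbors, not the eager-learning variant the dissertation proposes; the function name and the toy data are assumptions.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to `query`.

    `train` is a list of (feature_vector, label) pairs. Nothing is learned
    ahead of time: every distance is recomputed for every query, which is
    the lazy-learning cost the abstract describes.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-class toy data.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
label_near_a = knn_classify(train, (1.1, 1.0))
label_near_b = knn_classify(train, (5.0, 5.0), k=1)
```

Calling `knn_classify` twice with the same query repeats the full sort over the training set, which makes the redundancy mentioned in the abstract concrete.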
Liu, Dongqing. "GENETIC ALGORITHMS FOR SAMPLE CLASSIFICATION OF MICROARRAY DATA." University of Akron / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=akron1125253420.
Neo, TohKoon. "A Direct Algorithm for the K-Nearest-Neighbor Classifier via Local Warping of the Distance Metric." Diss., CLICK HERE for online access, 2007. http://contentdm.lib.byu.edu/ETD/image/etd2168.pdf.
Borén, Mirjam. "Classification of discrete stress levels in users using eye tracker and K- Nearest Neighbour algorithm." Thesis, Umeå universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-176258.
Rudin, Pierre. "Football result prediction using simple classification algorithms, a comparison between k-Nearest Neighbor and Linear Regression." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-187659.
Ever since people began competing against each other, they have tried to predict the winners of competitions. Football is no exception and is of particular interest for this study because the amount of available data from football matches is constantly increasing. Previously, personal knowledge and small amounts of data were used to predict results. This report takes advantage of the growing amount of data to find out whether it is possible to predict the results of football matches using the k-Nearest Neighbor algorithm and Linear Regression. The algorithms are compared on how accurately they predict the winner of a match, how many goals each team scores, and how precisely they predict the goal difference. The results are presented in both graphs and tables. A discussion analyzes the results and concludes that both algorithms can be useful if the model is well constructed, and that Linear Regression is better suited than k-NN.
Neo, Toh Koon Charlie. "A direct boosting algorithm for the k-nearest neighbor classifier via local warping of the distance metric /." Diss., CLICK HERE for online access, 2007. http://contentdm.lib.byu.edu/ETD/image/etd2168.pdf.
Karginova, Nadezda. "Identification of Driving Styles in Buses." Thesis, Halmstad University, Intelligent systems (IS-lab), 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-4830.
It is important to detect faults in bus components at an early stage. Because driving style affects the breakdown of different components of the bus, identifying the driving style is important for minimizing the number of failures in buses.
The identification of the driving style of the driver was based on the input data which contained examples of the driving runs of each class. K-nearest neighbor and neural networks algorithms were used. Different models were tested.
It was shown that the results depend on the selected driving runs. A hypothesis was suggested that the examples from different driving runs have different parameters which affect the results of the classification.
The best results were achieved using a subset of variables chosen with the help of the forward feature selection procedure. The percentage of correct classifications is about 89-90% for the k-nearest neighbor algorithm and 88-93% for the neural networks.
Feature selection significantly improved the results of the k-nearest neighbor algorithm, as well as the results of the neural networks when the training and testing data sets were selected from different driving runs. On the other hand, feature selection did not affect the results obtained with the neural networks when the training and testing data sets were selected from the same driving runs.
Another way to improve the results is to use smoothing. Computing the average class over a number of consecutive examples reduced the error.
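One plausible reading of this smoothing step is a sliding majority vote over consecutive per-example predictions. The sketch below is an assumption about how such smoothing could work, not the thesis's implementation; the window size and the class labels are hypothetical.

```python
from collections import Counter


def smooth_labels(predictions, window=3):
    """Replace each prediction with the most common label among the
    `window` most recent predictions (including the current one).

    Isolated misclassifications inside a run of consistent labels are
    voted away, which is how smoothing can lower the error.
    """
    smoothed = []
    for i in range(len(predictions)):
        recent = predictions[max(0, i - window + 1): i + 1]
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


# Hypothetical classifier output with one isolated error in the middle.
raw = ["calm", "calm", "aggressive", "calm", "calm"]
smoothed = smooth_labels(raw, window=3)
```

The trade-off is a delay in detecting genuine changes of class: a real switch in driving style only shows up once it fills a majority of the window.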
Agarwal, Akrita. "Exploring the Noise Resilience of Combined Sturges Algorithm." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1447070335.
Mao, Qian. "Clusters Identification: Asymmetrical Case." Thesis, Uppsala universitet, Informationssystem, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-208328.
Pathirana, Vindya Kumari. "Nearest Neighbor Foreign Exchange Rate Forecasting with Mahalanobis Distance." Scholar Commons, 2015. http://scholarcommons.usf.edu/etd/5757.
Pešek, Milan. "Detekce logopedických vad v řeči." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218106.
Young, Barrington R. St A. "Efficient Algorithms for Data Mining with Federated Databases." University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1179332091.
Torres, Winnie de Lima. "Detecção de desvios vocais utilizando modelos auto regressivos e o algoritmo KNN." PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA ELÉTRICA E DE COMPUTAÇÃO, 2018. https://repositorio.ufrn.br/jspui/handle/123456789/25105.
Some fields of science study vocal tract disorders by analyzing voice vibration patterns. In general, the importance of this research lies in identifying, at a more specific stage, diseases of greater or lesser severity, which may be treated with voice therapy or may require more attention, including surgical procedures for their control. Although the literature already indicates that digital signal processing allows non-invasive diagnosis of laryngeal pathologies, such as vocal disorders that cause edema, nodules, and paralysis, there is no agreement on the most suitable method or on the most appropriate characteristics, or parameters, for detecting vocal deviations. This work therefore proposes an algorithm for detecting vocal deviations through the analysis of voice signals. The study used data from the Disordered Voice Database, developed by the Massachusetts Eye and Ear Infirmary (MEEI), because of its wide use in voice acoustics research. A total of 166 signals from this database were used, including healthy voices and pathological voices affected by edema, nodules, and vocal fold paralysis. From the voice signals, autoregressive models (AR and ARMA) were generated to represent the signals, and the parameters of the resulting models were used with the K-Nearest Neighbors (KNN) algorithm to classify the signals analyzed. To assess the efficiency of the proposed algorithm, its results were compared with those of a detection method that considers only the Euclidean distance between the signals. The results indicate that the proposed method performs well, with a classification accuracy above 71% (compared with 31% when using the Euclidean distance). Moreover, the method is easy to implement and can be used on simpler hardware. Consequently, this research has the potential to produce a cheap and accessible classifier for large-scale use by health care professionals as a non-invasive pre-analysis alternative for detecting otorhinolaryngological pathologies that affect the voice.
Curtin, Ryan Ross. "Improving dual-tree algorithms." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54354.
Bacchielli, Tommaso. "Algoritmi di Machine Learning per il riconoscimento di attività umane da vibrazioni strutturali." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019.
Balocchi, Leonardo. "Anomaly detection mediante algoritmi di machine learning." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019.
Cirincione, Antonio. "Algoritmi di Machine Learning per la Classificazione di Dati Inerziali." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019.
Samara, Rafat. "TOP-K AND SKYLINE QUERY PROCESSING OVER RELATIONAL DATABASE." Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH. Forskningsmiljö Informationsteknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-20108.
Zapletal, Petr. "Klasifikační metody analýzy vrstvy nervových vláken na sítnici." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2010. http://www.nusl.cz/ntk/nusl-218575.
Landmér, Pedersen Jesper. "Weighing Machine Learning Algorithms for Accounting RWISs Characteristics in METRo : A comparison of Random Forest, Deep Learning & kNN." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-85586.
Raykhel, Ilya Igorevitch. "Real-Time Automatic Price Prediction for eBay Online Trading." BYU ScholarsArchive, 2008. https://scholarsarchive.byu.edu/etd/1631.
Linton, Thomas. "Forecasting hourly electricity consumption for sets of households using machine learning algorithms." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186592.
To address inefficiency, waste, and the negative consequences of electricity generation, companies and government agencies want to see behavioral change among household consumers. To bring about behavioral change, consumers need better feedback on their electricity consumption. The current feedback in a monthly or quarterly bill gives the consumer almost no useful information about how their behavior relates to their consumption. Smart meters are now ubiquitous in developed countries and can provide a wealth of information about residential consumption, but this data is used mainly as a basis for billing and not as a tool to help consumers reduce their consumption. One component required to deliver innovative feedback mechanisms is the ability to forecast electricity consumption at the household scale. The work presented in this thesis is an evaluation of the accuracy of a selection of kernel-based machine learning methods for forecasting the aggregate consumption of sets of households of different sizes. The work shows that "k-Nearest Neighbour Regression" and "Gaussian Process Regression" are the most accurate methods within the constraints of the problem. In addition to accuracy, the advantages, disadvantages, and performance of each machine learning method are evaluated.
Duan, Haoyang. "Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease." Thèse, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31113.
Guňka, Jiří. "Adaptivní klient pro sociální síť Twitter." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-237052.
Bastabak, Burcu. "A Data Mining Framework To Detect Tariff Code Circumvention In Turkish Customs Database." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614616/index.pdf.
… "Tariff Code Circumvention". To identify such misdeclarations, a physical examination of the merchandise is required. However, with limited personnel resources, the physical examination of all imported merchandise is not possible. In this study, a data mining framework is developed on the Turkish customs database in order to detect "Tariff Code Circumvention". For this purpose, four types of products, which are the most circumvented goods in Turkish customs, have been chosen. First, with the help of the Risk Analysis Office, the significant features are identified. Then, the Infogain algorithm is used to rank these features. Finally, the KNN algorithm is applied to the Turkish customs database in order to identify the circumvented goods automatically. The results show that the framework is able to find such circumvented goods successfully.
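The feature-ranking step can be illustrated with a small information-gain calculation on categorical data. This is a generic sketch of entropy-based information gain, assumed here to correspond to the "Infogain" ranking named in the abstract; the field names (`origin`, `weekday`), the labels, and the helper functions are all hypothetical and are not taken from the thesis or from any customs data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in label entropy after splitting the rows on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Feature names ordered from most to least informative."""
    return sorted(rows[0].keys(),
                  key=lambda f: information_gain(rows, labels, f),
                  reverse=True)


# Hypothetical declarations: `origin` separates the classes, `weekday` does not.
rows = [
    {"origin": "X", "weekday": "mon"},
    {"origin": "X", "weekday": "tue"},
    {"origin": "Y", "weekday": "mon"},
    {"origin": "Y", "weekday": "tue"},
]
labels = ["flag", "flag", "ok", "ok"]
ranking = rank_features(rows, labels)
```

A ranking like this would then decide which features enter the distance computation of the KNN step, with low-gain features dropped or down-weighted.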
Jiao, Lianmeng. "Classification of uncertain data in the framework of belief functions : nearest-neighbor-based and rule-based approaches." Thesis, Compiègne, 2015. http://www.theses.fr/2015COMP2222/document.
In many classification problems, data are inherently uncertain. The available training data might be imprecise, incomplete, or even unreliable. In addition, partial expert knowledge characterizing the classification problem may also be available. These different types of uncertainty bring great challenges to classifier design. The theory of belief functions provides a well-founded and elegant framework to represent and combine a large variety of uncertain information. In this thesis, we use this theory to address uncertain data classification problems based on two popular approaches: the k-nearest neighbor rule (kNN) and rule-based classification systems. For the kNN rule, one concern is that imprecise training data in class-overlapping regions may greatly affect its performance. An evidential editing version of the kNN rule was developed based on the theory of belief functions in order to model the imprecise information for samples in overlapping regions. Another consideration is that sometimes only an incomplete training data set is available, in which case the performance of the kNN rule degrades dramatically. Motivated by this problem, we designed an evidential fusion scheme for combining a group of pairwise kNN classifiers developed from locally learned pairwise distance metrics. For rule-based classification systems, in order to improve their performance in complex applications, we extended the traditional fuzzy rule-based classification system in the framework of belief functions and developed a belief rule-based classification system to address uncertain information in complex classification problems. Further, considering that in some applications partial expert knowledge is available in addition to training data collected by sensors, a hybrid belief rule-based classification system was developed to use these two types of information jointly for classification.
Bahri, Maroua. "Improving IoT data stream analytics using summarization techniques." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT017.
With the evolution of technology, the use of smart Internet-of-Things (IoT) devices, sensors, and social networks results in an overwhelming volume of IoT data streams, generated daily by many applications, that can be transformed into valuable information through machine learning tasks. In practice, several critical issues arise when extracting useful knowledge from these evolving data streams, chiefly that the stream must be handled and processed efficiently. In this context, this thesis aims to improve the performance (in terms of memory and time) of existing data mining algorithms on streams. We focus on the classification task in the streaming framework. The task is challenging on streams, principally because of the high, and increasing, data dimensionality, in addition to the potentially infinite amount of data. These two aspects make the classification task harder. The first part of the thesis surveys the current state of the art of classification and dimensionality reduction techniques as applied to the stream setting, providing an updated view of the most recent work in this vibrant area. In the second part, we detail our contributions to the field of classification in streams, developing novel approaches based on summarization techniques that aim to reduce the computational resources required by existing classifiers with no, or minor, loss of classification accuracy. To address high-dimensional data streams and make classifiers efficient, we incorporate an internal preprocessing step that reduces the dimensionality of the input data incrementally before feeding them to the learning stage. We present several approaches applied to several classification tasks: Naive Bayes, enhanced with sketches and the hashing trick; k-NN, using compressed sensing and UMAP; and the integration of these in ensemble methods.
Prokopová, Ivona. "Detekce fibrilace síní v EKG." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-413170.
Full textBílý, Ondřej. "Moderní řečové příznaky používané při diagnóze chorob." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2011. http://www.nusl.cz/ntk/nusl-218971.
Full textKlimeš, Filip. "Zpracování obrazových sekvencí sítnice z fundus kamery." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2015. http://www.nusl.cz/ntk/nusl-220975.
Full textDočekal, Martin. "Porovnání klasifikačních metod." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403211.
Full textChang, Tung-Lin, and 張東琳. "Improvement Sleep Apnoea Auxiliary Equipment Performance With k-nearest Neighbors Algorithm." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/jdys75.
國立中央大學
光機電工程研究所
106
In this paper, we use non-invasive continuous detection to monitor blood oxygen and the photoplethysmogram by pulse oximetry. The monitoring data are preprocessed with regression analysis and frequency-domain analysis, and the training set is built after a number of feature samples have been obtained. The KNN classification algorithm is used to estimate the subject's clinical Respiratory Disturbance Index (RDI) value, and the data are transmitted back to the Internet. The control signal is transmitted through the wireless communication module to the sleep apnoea auxiliary equipment developed in this research, called the "POM Pillow". This research also designed a variety of sleep postures as trigger conditions. In conclusion, the "POM Pillow" can effectively reduce the frequency of obstructive respiratory arrest in patients suffering from sleep apnea and thereby improve sleep quality.
Chen, Pei-Ni, and 陳貝妮. "Diagnosis System of Rotor Faults for Three Phase Induction Motor Based on K-Nearest Neighbors Algorithm." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/r48sdb.
Huang, Haitao. "Spatial Analysis of Retinal Pigment Epithelium Morphology." 2016. http://scholarworks.gsu.edu/math_theses/153.
Full textGu, Yu-Jia, and 古祐嘉. "Adaptive K-Nearest Neighbor Algorithm." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/35887089581319797969.
Full text元智大學
資訊管理學系
97
The K-nearest-neighbor algorithm traditionally predicts the class of a record from the decision of its K nearest neighbors, for a fixed value of K. However, recent studies have shown that using different K values for different records can improve prediction accuracy. This study integrates the Fuzzy C-means algorithm to help determine a proper K value for each record in a local KNN algorithm. Performance results show that this method outperforms the traditional KNN in terms of prediction accuracy.
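The idea of a per-record K can be illustrated with a minimal sketch. It is not the thesis's method: the abstract does not say how Fuzzy C-means memberships are turned into a K value, so the sketch swaps in a hard nearest-centroid assignment, and the names `centroids` and `k_per_cluster` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`.

    `train` is a list of (features, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def adaptive_knn_predict(train, query, centroids, k_per_cluster):
    """Choose k from the cluster whose centroid is nearest to `query`.

    Assumption: hard nearest-centroid assignment stands in for the fuzzy
    memberships that Fuzzy C-means would produce.
    """
    nearest = min(range(len(centroids)),
                  key=lambda i: math.dist(centroids[i], query))
    return knn_predict(train, query, k_per_cluster[nearest])
```

With a fixed K the same query can receive different labels for K = 1 and K = 3, which is why letting K vary by region of the feature space can change accuracy.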
LIN, CHENG-YI, and 林承毅. "Accelerating k-Nearest Neighbor Algorithm Using GPU and Chunking Method." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/3hx66j.
Chen, Nai-Wen, and 陳艿玟. "Feature Weighting for k-Nearest Neighbor Classifiers Using Differential Evolution Algorithms." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/3gfxz4.
國立臺灣海洋大學
資訊工程學系
104
Since the industrial revolution, people have sought to replace human workers with machines to save labor, time, and cost. With recent advances in hardware and software technology, data collected in practice are becoming larger, faster-changing, and more complex. Big Data, which contain large-scale and/or high-dimensional data, pose serious obstacles to data interpretation and application. As a result, machine learning has become a popular research topic in many fields of study. Machine learning, which can iteratively learn from data, allows computers to find hidden insights in data without explicit knowledge. Machine learning techniques have been widely applied to mine valuable information and support decision making. In dealing with high-dimensional Big Data, determining feature importance plays a key role in reducing the high cost of computation and data storage. This paper presents a method to determine feature importance and feature weighting by integrating the Differential Evolution (DE) algorithm and the k-Nearest Neighbors (kNN) algorithm. The DE algorithm, a heuristic optimization algorithm, follows biological evolution via mutation, crossover, and selection operations to find an optimal solution. The kNN algorithm is a simple classifier that nonetheless works very well in various fields in practice. In our proposed method, the feature weights and the k value for kNN are first chosen by the DE algorithm and then evaluated by the classification accuracy of the kNN algorithm. Our experimental results on six UCI datasets show that, with appropriate DE parameters, the proposed method achieves better overall accuracy and outperforms the six compared approaches.
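The loop described in the abstract (DE proposes feature weights and a k value, kNN accuracy scores them) might look roughly like the sketch below. The specifics are assumptions rather than details from the thesis: a DE/rand/1/bin variant, leave-one-out accuracy as the fitness, weights clamped to [0, 1], the last gene decoded into k, and all parameter defaults (`pop_size`, `F`, `CR`, `k_max`).

```python
import math
import random
from collections import Counter

def weighted_distance(x, y, weights):
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def loo_accuracy(data, weights, k):
    """Leave-one-out accuracy of weighted kNN on (features, label) pairs."""
    correct = 0
    for i, (x, label) in enumerate(data):
        others = [item for j, item in enumerate(data) if j != i]
        nearest = sorted(
            others, key=lambda item: weighted_distance(item[0], x, weights))[:k]
        vote = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        correct += vote == label
    return correct / len(data)

def de_optimize(data, k_max=3, pop_size=10, generations=20, F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin search over feature weights plus one gene encoding k."""
    rng = random.Random(seed)
    dim = len(data[0][0]) + 1  # one gene per feature weight, one for k

    def decode(vec):
        # Map the last gene from [0, 1] onto an integer k in 1..k_max.
        k = 1 + min(k_max - 1, int(vec[-1] * k_max))
        return vec[:-1], k

    def fitness(vec):
        weights, k = decode(vec)
        return loo_accuracy(data, weights, k)

    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(vec) for vec in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    value = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, value)))
                else:
                    trial.append(pop[i][j])
            # Selection: keep the trial if it is at least as accurate.
            trial_score = fitness(trial)
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=lambda i: scores[i])
    weights, k = decode(pop[best])
    return weights, k, scores[best]
```

Leave-one-out evaluation keeps the sketch short; on larger datasets a held-out split or cross-validation would be a cheaper fitness function.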
Scrimieri, Daniele, and S. M. Ratchev. "A k-nearest neighbour technique for experience-based adaptation of assembly stations." 2014. http://hdl.handle.net/10454/17725.
We present a technique for automatically acquiring operational knowledge on how to adapt assembly systems to new production demands or recover from disruptions. Dealing with changes and disruptions affecting an assembly station is a complex process which requires deep knowledge of the assembly process, the product being assembled and the adopted technologies. Shop-floor operators typically perform a series of adjustments by trial and error until the expected results in terms of performance and quality are achieved. With the proposed approach, such adjustments are captured and their effect on the station is measured. Adaptation knowledge is then derived by generalising from individual cases using a variant of the k-nearest neighbour algorithm. The operator is informed about potential adaptations whenever the station enters a state similar to one contained in the experience base, that is, a state on which adaptation information has been captured. A case study is presented, showing how the technique makes it possible to reduce adaptation times. The general system architecture in which the technique has been implemented is described, including the role of the different software components and their interactions.
Ku, Chin, and 顧堇. "The study on reproducibility of modified genetic algorithms/k-nearest neighbors method for microarray data." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/44038757595629869436.
Chang, Chung-Ting, and 張仲霆. "An Application of V2V Communication: Cooperative Vehicle Positioning System based on Topology Matching and k-Nearest Neighbor Algorithm." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/23xetz.
國立臺灣大學
電機工程學研究所
104
V2V (Vehicle-to-Vehicle) and V2I (Vehicle-to-Infrastructure) communication have been well researched in recent years: V2V technology allows vehicles to share information with nearby vehicles, and V2I technology allows vehicles to share information with nearby infrastructure, through wireless communication devices. Advanced driving assistance systems can be divided into self-sufficient systems and interactive systems. Interactive systems, as the name implies, interact with infrastructure and/or other vehicles; they receive spatial information from nearby vehicles to prevent forward collisions. While self-sufficient systems are limited to line-of-sight detection, interactive systems account for scenarios farther ahead by predicting the positions of occluded vehicles. In this thesis, each vehicle is assumed to generate a local map, which is a set of position measurements of nearby vehicles obtained with an onboard low-cost GPS and a ranging sensor, and to share it with nearby vehicles by broadcasting over a wireless communication device. When the ego-vehicle receives multiple local maps from nearby vehicles, the received local maps are matched with the local map generated by the ego-vehicle by topology matching. The position measurements belonging to the same vehicle are clustered by automatic point clustering based on the k-Nearest Neighbor algorithm. These measurements are then combined by adaptive position estimation, which updates the position estimate according to the current accuracy of the sensor. In this thesis, both simulation results and experimental results for the proposed cooperative vehicle positioning system are presented. The simulation results show that the proposed cooperative vehicle positioning system detects more vehicles than a single sensor alone most of the time. As a result, a vehicle can get an extended view of its surroundings to improve driving safety.
A stereo camera mounted on the vehicle is used as the ranging sensor to produce position measurements in a real scenario. In the scenario, there are 3 vehicles near the ego-vehicle. First, the ego-vehicle estimates the range to the other 3 vehicles by stereo camera only. The experimental results show that the stereo camera achieves higher range estimation accuracy for the middle vehicle than for the side vehicles. Second, the ego-vehicle estimates the range to the other 3 vehicles with the proposed cooperative vehicle positioning system. The position of the ego-vehicle is estimated from 4 measurements: 1 is measured by the GPS sensor of the ego-vehicle, and the other 3 are measured by the GPS sensors and ranging sensors of the other 3 vehicles, respectively. The experimental results show that the range estimation accuracy of the proposed system is better than that of the stereo camera alone.
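One common way to combine several position measurements of the same vehicle according to sensor accuracy is inverse-variance weighting. The sketch below uses that rule purely as an illustration; the abstract does not state which estimator the thesis uses, and the function name `fuse_positions` and the per-measurement variance input are assumptions.

```python
def fuse_positions(measurements):
    """Inverse-variance weighted mean of 2-D position measurements.

    `measurements` is a list of ((x, y), variance) pairs; a smaller variance
    means a more trusted sensor. Assumption: each measurement comes with a
    single scalar variance, and measurements are independent.
    """
    total_weight = sum(1.0 / var for _, var in measurements)
    x = sum(pos[0] / var for pos, var in measurements) / total_weight
    y = sum(pos[1] / var for pos, var in measurements) / total_weight
    return (x, y)
```

For example, fusing a measurement at x = 10 with variance 1 and one at x = 15 with variance 4 pulls the estimate toward the more accurate sensor rather than to the midpoint.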
Shen, Kuo-Cheng, and 沈國丞. "Building a PC-Based Image Inspection System to detect the Blood Eggs with the K-Nearest Neighbor Algorithm." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/vynxa3.
國立虎尾科技大學
電機工程系碩士班
105
There are currently 1,300 poultry farms raising laying hens in Taiwan. However, no more than 25 of them produce eggs whose quality meets the CAS standards. At present, firms must import equipment to grade and package eggs, and this is very expensive. If the equipment could be developed within Taiwan, this would reduce costs and raise egg quality. This paper presents a system to detect blood spots in eggs, with a simple man-machine interface that lets users adopt the approach quickly. A non-destructive method based on image detection is proposed. A simple box with a light source is used to make the eggs transparent, and then an image is taken. The captured image is binarized. We then normalize the image, derive the size of the egg, perform median filtering, and convert the image into the HSV color space for color analysis. We extract the H component as a feature and use K-Nearest Neighbor classification for processing. Finally, the results of the analysis are shown on a PC screen, revealing whether the eggs have blood spots or not.
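The last two steps of the pipeline above (take the H component as the feature, then classify with kNN) can be sketched as follows. This is only an illustration under assumptions: the thesis does not say how hue is summarized per egg, so the sketch uses a circular mean over pixels and a circular hue distance (hue wraps around at red), and the labels and training hues in the usage note are made up.

```python
import colorsys
import math
from collections import Counter

def mean_hue(pixels):
    """Circular mean of hue in [0, 1) for (r, g, b) pixels with 0-255 channels.

    Hue is an angle, so it is averaged on the unit circle; a plain mean
    would mishandle reds, which sit on both sides of the 0/1 wraparound.
    """
    sum_cos = sum_sin = 0.0
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        sum_cos += math.cos(2 * math.pi * h)
        sum_sin += math.sin(2 * math.pi * h)
    return (math.atan2(sum_sin, sum_cos) / (2 * math.pi)) % 1.0

def hue_distance(h1, h2):
    """Shortest distance between two hues on the circle of length 1."""
    d = abs(h1 - h2) % 1.0
    return min(d, 1.0 - d)

def classify(train, hue, k=3):
    """Majority vote among the k training hues closest to `hue`.

    `train` is a list of (hue, label) pairs.
    """
    nearest = sorted(train, key=lambda item: hue_distance(item[0], hue))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

A usage example with hypothetical training data: given `train = [(0.02, "blood"), (0.98, "blood"), (0.0, "blood"), (0.12, "normal"), (0.14, "normal"), (0.16, "normal")]`, the call `classify(train, mean_hue(egg_pixels))` votes among the three closest hues.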
Vicente, Sergio. "Apprentissage statistique avec le processus ponctuel déterminantal." Thesis, 2021. http://hdl.handle.net/1866/25249.
This thesis presents the determinantal point process, a probabilistic model that captures repulsion between points of a certain space. This repulsion is encoded by a similarity matrix, the kernel matrix, which identifies which points are more similar and therefore less likely to appear in the same subset. This point process gives more weight to subsets whose elements are more diverse, which is not the case with traditional uniform random sampling. Diversity has become a key concept in domains such as medicine, sociology, forensic sciences and behavioral sciences. The determinantal point process is considered a promising alternative to traditional sampling methods, since it takes into account the diversity of selected elements. It is already actively used in machine learning as a subset selection method. Its application in statistics is illustrated with three papers. The first paper presents consensus clustering, which consists of running a clustering algorithm on the same data a large number of times. To sample the initial points of the algorithm, we propose the determinantal point process as a sampling method instead of uniform random sampling and show that the former option produces better clustering results. The second paper extends the methodology developed in the first paper to large datasets. Such datasets impose a computational burden, since sampling with the determinantal point process is based on the spectral decomposition of the large kernel matrix. We introduce two methods to deal with this issue. These methods also produce better clustering results than consensus clustering based on uniform sampling of initial points. The third paper addresses the problem of variable selection for the linear model and logistic regression when the number of predictors is large. A Bayesian approach is adopted, using Markov Chain Monte Carlo methods with the Metropolis-Hastings algorithm.
We show that setting the determinantal point process as the prior distribution for the model space selects a better final model than the model selected by a uniform prior on the model space.
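The repulsion property can be made concrete with a small sketch. It assumes the standard L-ensemble form of a determinantal point process, in which a subset A has probability proportional to det(L_A), the determinant of the kernel matrix restricted to A; whether the thesis uses exactly this parameterization is not stated in the abstract. The brute-force enumeration is only practical for a handful of items.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices)."""
    if not m:
        return 1.0  # determinant of the empty matrix, used for the empty subset
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total

def dpp_probabilities(L):
    """Probability of every subset under an L-ensemble DPP with kernel L."""
    n = len(L)
    weights = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            sub = [[L[i][j] for j in subset] for i in subset]
            weights[subset] = det(sub)
    normalizer = sum(weights.values())  # equals det(L + I)
    return {subset: w / normalizer for subset, w in weights.items()}
```

With a kernel in which items 0 and 1 have similarity 0.9 and item 2 is unrelated to both, the subset {0, 1} gets weight 1 - 0.81 = 0.19 while {0, 2} gets weight 1, so the diverse pair is about five times more likely to be drawn.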
Lee, Chien-Pang, and 李建邦. "The Study on Gene Selection and Sample Classification Based on Gene Expression Data Using Adaptive Genetic Algorithms / k-Nearest Neighbors Method." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/01635740897987498234.
國立中興大學
農藝學系所
94
Microarray technology has become a valuable tool for studying gene expression in recent years. The main difference between microarrays and traditional methods is that a microarray can measure thousands of genes at the same time. In the past, researchers typically used parametric statistical methods to find significant genes. However, microarray data often violate some assumptions of parametric statistical methods, and the type I error becomes inflated when each gene is tested for significance. Therefore, this research sought a variable selection method free of such assumptions to reduce the dimension of the data set. After applying the proposed method, biologists can select the relevant genes from the reduced gene subset. In this study, adaptive genetic algorithms / k-nearest neighbors (AGA / KNN) was used to reduce the dimension of the data set; it is based on genetic algorithms / k-nearest neighbors (GA / KNN), first described by Li et al. (2001a). Although AGA and KNN are both well developed, this is the first use of AGA / KNN to analyze microarray data. Since AGA is a machine learning tool and KNN is a nonparametric discriminant analysis, both can be used without such assumptions. There are three main differences between AGA / KNN and GA / KNN. First, the encoding is binary, and each string covers all genes. Second, adaptive probabilities of crossover and mutation were added. Finally, an extinction and immigration strategy was added. Since GA can only find a near-optimal solution, the best string of each run is often not the same. Here, AGA / KNN was repeated for many runs to address that problem, so many best strings were saved. The frequency of each gene was computed from those strings to reduce the dimension of the data set. In this study, the original colon data, from a high-density oligonucleotide chip (Alon et al., 1999), were analyzed.
In addition, mouse apo AI data from a cDNA chip (Callow et al., 2000) were also used to compare the gene selection ability of AGA / KNN and GA / KNN. The results show that both AGA / KNN and GA / KNN could reduce the dimension of the data set and classify all samples correctly. However, the accuracy of AGA / KNN was higher than that of GA / KNN, and it took only half the CPU time of GA / KNN. Therefore, we conclude that the performance of AGA / KNN is no worse than that of GA / KNN. Finally, we suggest that when AGA / KNN is used to analyze microarray data, the top 50 to 100 most frequent genes be selected after AGA / KNN has been repeated for about 100 runs. The selected genes should include the relevant genes and should classify samples correctly.
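The final selection step, counting how often each gene appears in the best binary strings saved across runs and keeping the most frequent ones, can be sketched as below. The function name `top_genes` and the tie-breaking rule (lower gene index first) are assumptions; the genetic algorithm that produces the strings is not shown.

```python
from collections import Counter

def top_genes(best_strings, n):
    """Return the indices of the n genes selected most often.

    `best_strings` holds one binary string per run (a list of 0/1 values,
    one position per gene); a 1 means the gene was selected in that run.
    Ties are broken by the lower gene index (an arbitrary choice).
    """
    counts = Counter()
    for string in best_strings:
        counts.update(i for i, bit in enumerate(string) if bit)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:n]]
```

For instance, with the three saved strings `[1, 0, 1, 0]`, `[1, 0, 0, 0]` and `[1, 1, 1, 0]`, gene 0 is selected three times and gene 2 twice, so `top_genes(..., 2)` keeps genes 0 and 2.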