Dissertations / Theses on the topic 'K-Nearest Neighbours (KNN)'

Author: Grafiati

Published: 5 June 2025

Last updated: 24 June 2025

Consult the top 20 dissertations / theses for your research on the topic 'K-Nearest Neighbours (KNN).'

1

Villa, Medina Joe Luis. "Reliability of classification and prediction in k-nearest neighbours." Doctoral thesis, Universitat Rovira i Virgili, 2013. http://hdl.handle.net/10803/127108.

Abstract:

In this doctoral thesis, the calculation of classification reliability and prediction reliability has been developed using the k-nearest neighbours (kNN) method and bootstrap-based resampling strategies. In addition, two new classification methods have been developed, Probabilistic Bootstrap k-Nearest Neighbours (PBkNN) and Bagged k-Nearest Neighbours (Bagged kNN), together with a new prediction method, Direct Orthogonalization kNN (DOkNN). In all cases, the results obtained with the new methods were comparable to or better than those obtained with classical multivariate classification and calibration methods.
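
The abstract names bootstrap resampling and a Bagged kNN classifier but gives no details. As a rough illustration only, here is a minimal Python sketch of generic bagging applied to kNN, assuming that each bag is a bootstrap resample of the training set, that kNN votes within each bag, and that the bags are combined by majority vote. The agreement fraction returned with the label is an assumed, simple reliability proxy, not the thesis's PBkNN or Bagged kNN reliability measure; the function names and parameters (`k`, `n_bags`, `seed`) are hypothetical.

```python
import random
from collections import Counter
from math import dist


def knn_predict(train, query, k):
    """Plain kNN: majority label among the k training points closest to `query`."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def bagged_knn_predict(train, query, k=3, n_bags=25, seed=0):
    """Generic bagging around kNN (a sketch, not the thesis's Bagged kNN).

    Returns the majority label over all bags and the fraction of bags that
    voted for it, used here as a simple, assumed reliability proxy.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_bags):
        # Bootstrap resample: len(train) draws with replacement.
        sample = [rng.choice(train) for _ in train]
        votes[knn_predict(sample, query, k)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_bags
```

With `train` as a list of `(features, label)` pairs, `bagged_knn_predict(train, (0.2, 0.2))` returns a `(label, agreement)` tuple; a low agreement flags a query whose predicted class depends strongly on which training points happened to be sampled.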

2

Darborg, Alex. "Identifiera känslig data inom ramen för GDPR : Med K-Nearest Neighbors" [Identifying sensitive data within the framework of the GDPR: with K-Nearest Neighbors]. Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-34070.

Abstract:

The General Data Protection Regulation (GDPR) is a regulation coming into effect on 25 May 2018. As a result, organizations face major decisions about how sensitive data stored in databases is to be identified. Meanwhile, machine learning is expanding in the software market. The goal of this project has been to develop a tool that can identify sensitive data through machine learning. The tool was developed using agile methods, and the work included comparisons of various algorithms and the development of a prototype, using tools such as Spyder and XAMPP. The results show that different types of sensitive data give varying results in the developed software. The kNN algorithm showed strong results when the sensitive data concerned 10-digit Swedish social security numbers (personnummer), phone numbers of ten or eleven digits starting with 46, 070, 072 or 076, and addresses. Regular expressions showed strong results for e-mail addresses and IP addresses.
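
The abstract reports that regular expressions worked well for e-mail and IP addresses but does not give the expressions used. A minimal Python sketch with simplified, hypothetical patterns might look like the following; real e-mail syntax is far more permissive than this pattern, and IPv6 is not handled.

```python
import re

# Hypothetical, simplified patterns; the thesis's actual expressions are not given.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")
# One IPv4 octet in the range 0-255, without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4_RE = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def classify_field(value):
    """Return 'email', 'ipv4', or None for a single database field value."""
    value = value.strip()
    if EMAIL_RE.fullmatch(value):
        return "email"
    if IPV4_RE.fullmatch(value):
        return "ipv4"
    # Anything else is left to other detectors, e.g. a kNN model for digit strings.
    return None
```

Using `fullmatch` means the whole field must have the expected shape, so a value such as `256.1.1.1` is rejected rather than partially matched.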

3

Kuhlman, Caitlin Anne. "Pivot-based Data Partitioning for Distributed k Nearest Neighbor Mining." Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-theses/1212.

Abstract:

This thesis addresses the need for a scalable distributed solution for k-nearest-neighbor (kNN) search, a fundamental data mining task. This unsupervised method poses particular challenges on shared-nothing distributed architectures, where global information about the dataset is not available to individual machines. The distance within which to search for neighbors is not known a priori, so a dynamic data partitioning strategy is required to guarantee that the exact kNN can be found autonomously on each machine. Pivot-based partitioning has been shown to facilitate the bounding of partitions; however, state-of-the-art methods suffer from prohibitive data duplication (upwards of 20x the size of the dataset). This work presents PkNN, an innovative method for exact distributed kNN search. The key idea is to perform the computation over several rounds, leveraging pivot-based data partitioning at each stage. Aggressive data-driven bounds limit communication costs, and a number of optimizations are designed for efficient computation. An experimental study on large real-world data (over 1 billion points) compares PkNN to the state-of-the-art distributed solution, demonstrating that the benefits of the additional stages of computation in PkNN heavily outweigh the added I/O overhead. PkNN achieves a data duplication rate close to 1, a significant speedup over previous solutions, and scales effectively in data cardinality and dimension. PkNN can facilitate distributed solutions to other unsupervised learning methods that rely on kNN search as a critical building block; as one example, a distributed framework for the Local Outlier Factor (LOF) algorithm is given. Tests on large real-world and synthetic data with varying characteristics measure the scalability of PkNN and the distributed LOF framework in data size and dimensionality.
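
The abstract describes pivot-based partitioning with data-driven bounds but not the PkNN algorithm itself. The following single-machine Python sketch illustrates only the general bounding idea, under stated assumptions: points are assigned to their nearest pivot, a first round computes a local kNN inside the query's home partition to obtain a distance bound r, and the triangle inequality d(q, x) >= d(q, pivot_j) - radius_j rules out partitions that cannot contain a closer neighbor. Pivot selection, the multi-round distributed protocol and communication are not modelled, and all names are hypothetical.

```python
import heapq
from math import dist, inf


def partition_by_pivots(points, pivots):
    """Assign every point to its nearest pivot and record each partition's radius."""
    parts = [[] for _ in pivots]
    for p in points:
        nearest = min(range(len(pivots)), key=lambda j: dist(p, pivots[j]))
        parts[nearest].append(p)
    # Radius = distance from the pivot to the farthest point in its partition.
    radii = [max((dist(p, pivots[j]) for p in part), default=0.0)
             for j, part in enumerate(parts)]
    return parts, radii


def exact_knn(query, pivots, parts, radii, k):
    """Exact kNN that only scans partitions the triangle-inequality bound cannot exclude."""
    home = min(range(len(pivots)), key=lambda j: dist(query, pivots[j]))
    # Round 1: local kNN in the home partition gives an upper bound r on the k-th distance.
    local = heapq.nsmallest(k, parts[home], key=lambda p: dist(query, p))
    r = dist(query, local[-1]) if len(local) == k else inf
    # Round 2: any x in partition j satisfies d(q, x) >= d(q, pivot_j) - radius_j,
    # so partitions whose lower bound exceeds r cannot improve the answer.
    scanned = [home]
    candidates = list(local)
    for j, part in enumerate(parts):
        if j != home and dist(query, pivots[j]) - radii[j] <= r:
            scanned.append(j)
            candidates.extend(part)
    return heapq.nsmallest(k, candidates, key=lambda p: dist(query, p)), scanned
```

The second return value lists the partitions that actually had to be read; in a distributed setting, each skipped partition corresponds to data that would not need to be shipped or duplicated.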

4

Aikes, Junior Jorge. "Estudo da influência de diversas medidas de similaridade na previsão de séries temporais utilizando o algoritmo KNN-TSP" [Study of the influence of various similarity measures on time series forecasting using the KNN-TSP algorithm]. Universidade Estadual do Oeste do Parana, 2012. http://tede.unioeste.br:8080/tede/handle/tede/1084.

Abstract:

Time series can be understood as any set of time-ordered observations. Among the many tasks applicable to temporal data, one that has attracted increasing interest because of its many applications is time series forecasting. The k-Nearest Neighbor - Time Series Prediction (kNN-TSP) algorithm is a non-parametric method for forecasting time series. One of its advantages is its ease of application compared with parametric methods. Even though kNN-TSP's parameters are easier to define, some issues remain open. This research focuses on one of these parameters: the similarity measure. This parameter was evaluated empirically using various similarity measures on a large set of time series, including artificial series with seasonal and chaotic characteristics and several real-world time series. A case study was also carried out comparing the predictive accuracy of the kNN-TSP algorithm with the Moving Average (MA), univariate Seasonal Auto-Regressive Integrated Moving Average (SARIMA) and multivariate SARIMA methods on a time series of daily patient flow in the Emergency Department of a Korean hospital. This work also proposes an approach to developing a hybrid similarity measure that combines characteristics of several measures. The results show that the Lp norm measures have an advantage over the other measures evaluated, owing to their lower computational cost and, in general, greater accuracy in forecasting temporal data with the kNN-TSP algorithm. Although the literature generally adopts the Euclidean measure to calculate the similarity between time series, the Manhattan distance can be considered an interesting candidate for defining similarity, because it shows no statistically significant difference from the Euclidean measure and has a lower computational cost. The measure proposed in this work does not show significant results, but it is promising for further research. Regarding the case study, the kNN-TSP algorithm with only the similarity measure parameter optimized achieves a considerably lower error than the best MA configuration, and a slightly greater error than the optimal settings of the univariate and multivariate SARIMA methods, the difference being less than one percent.
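
The abstract does not spell out the kNN-TSP procedure. A common formulation, assumed here, slides a window of length `window` over the series, finds the k past windows most similar to the latest one, and averages the values that immediately followed them. The minimal Python sketch below makes the similarity measure a swappable parameter, mirroring the Euclidean-versus-Manhattan comparison in the thesis; the function names are hypothetical and no normalization of the windows is applied.

```python
from math import dist


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return dist(a, b)


def knn_tsp_forecast(series, window, k, distance=euclidean):
    """One-step-ahead forecast from the k past windows most similar to the latest one."""
    query = series[-window:]
    candidates = []
    # Every past window that has a successor value is a candidate neighbour.
    for start in range(len(series) - window):
        past = series[start:start + window]
        candidates.append((distance(past, query), series[start + window]))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    # Forecast = mean of the values that followed the nearest windows.
    return sum(nxt for _, nxt in nearest) / len(nearest)
```

For a periodic series such as `[1, 2, 3, 4] * 5`, the latest window `[2, 3, 4]` has exact matches earlier in the series, all followed by `1`, so both `euclidean` and `manhattan` yield a forecast of `1.0`; on real series the two measures can rank neighbours differently, which is the effect the thesis evaluates.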

5

Kharsikar, Saket. "A Gene Ontology Based Computational Approach for the Prediction of Protein Functions." University of Akron / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=akron1187026388.

6

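Bertilsson, Tobias, and Romario Johansson. "Undersökning om hjulmotorströmmar kan användas som alternativ metod för kollisiondetektering i autonoma gräsklippare. : Klassificering av hjulmotorströmmar med KNN och MLP" [Investigation of whether wheel motor currents can be used as an alternative method for collision detection in autonomous lawn mowers: classification of wheel motor currents with KNN and MLP]. Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH, Datateknik och informatik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-43555.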

Abstract:

Purpose – The purpose of the study is to expand the knowledge of how wheel motor currents can be combined with machine learning in a collision detection system for autonomous robots, in order to reduce the number of external sensors, open new design opportunities and lower production costs. Method – The study was conducted as design science research in which two artefacts were developed in cooperation with Globe Tools Group. The artefacts were evaluated on how they categorize data from an autonomous robot into two categories, collision and non-collision, and were then tested with generated data to analyse their ability to categorize. Findings – Both artefacts showed 100% accuracy in detecting the collisions in the data from the autonomous robot. In the second part of the experiment, the artefacts showed different decision boundaries in how they categorize the data, which makes them useful in different applications. Implications – The study contributes to expanding knowledge of how machine learning and wheel motor currents can be used in a collision detection system. The results can lead to lower production costs and new design opportunities. Limitations – The data used in the study were gathered by an autonomous robot (a robotic lawn mower) which only made frontal collisions on an artificial lawn. Keywords – Machine learning, K-Nearest Neighbour, Multilayer Perceptron, collision detection, autonomous robots, collision detection based on current.

7

Ozsakabasi, Feray. "Classification Of Forest Areas By K Nearest Neighbor Method: Case Study, Antalya." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12609548/index.pdf.

Abstract:

Among the various remote sensing methods that can be used to map forest areas, the K Nearest Neighbor (KNN) supervised classification method is becoming increasingly popular for creating forest inventories in some countries. In this study, the utility of the KNN algorithm is evaluated for forest/non-forest/water stratification. Antalya is selected as the study area. The data used are composed of Landsat TM and Landsat ETM satellite images, acquired in 1987 and 2002, respectively, SRTM 90 meters digital elevation model (DEM) and land use data from the year 2003. The accuracies of different modifications of the KNN algorithm are evaluated using Leave One Out, which is a special case of K-fold cross-validation, and traditional accuracy assessment using error matrices. The best parameters are found to be Euclidean distance metric, inverse distance weighting, and k equal to 14, while using bands 4, 3 and 2. With these parameters, the cross-validation error is 0.009174, and the overall accuracy is around 86%. The results are compared with those from the Maximum Likelihood algorithm. KNN results are found to be accurate enough for practical applicability of this method for mapping forest areas.
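
The thesis reports Euclidean distance, inverse distance weighting, k = 14 and leave-one-out cross-validation as its best configuration. A minimal Python sketch of that kind of classifier, assuming each sample is a `(band_values, label)` pair (for example reflectances of bands 4, 3 and 2) and that each neighbour's vote is weighted by 1/d, might look like this; the thesis's exact weighting exponent and tie handling are not stated, and the function names are hypothetical.

```python
from collections import defaultdict
from math import dist


def idw_knn_classify(train, query, k, eps=1e-12):
    """kNN vote where each of the k nearest neighbours counts 1 / distance."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    scores = defaultdict(float)
    for features, label in neighbours:
        # eps avoids division by zero when a training point coincides with the query.
        scores[label] += 1.0 / (dist(features, query) + eps)
    return max(scores, key=scores.get)


def leave_one_out_error(data, k):
    """Fraction of samples misclassified when each one is held out in turn."""
    errors = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if idw_knn_classify(rest, features, k) != label:
            errors += 1
    return errors / len(data)
```

Inverse distance weighting lets one very close neighbour outvote several distant ones: with training points at 0.0 (`'a'`) and 5.0, 5.1 (`'b'`), a query at 0.5 with k = 3 is labelled `'a'`, although a plain majority vote would give `'b'`. On a real pixel sample, `leave_one_out_error(samples, 14)` would compute the kind of cross-validation error quoted in the abstract.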

8

Neo, TohKoon. "A Direct Algorithm for the K-Nearest-Neighbor Classifier via Local Warping of the Distance Metric." Diss., 2007. http://contentdm.lib.byu.edu/ETD/image/etd2168.pdf.

9

Mestre, Ricardo Jorge Palheira. "Improvements on the KNN classifier." Master's thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/10923.

Abstract:

Dissertation submitted for the degree of Master in Informatics Engineering. Object classification is an important area within artificial intelligence, and its applications extend to many fields, both within and outside science. Among classifiers, K-nearest neighbor (KNN) is one of the simplest and most accurate, especially where the data distribution is unknown or apparently not parameterizable. The algorithm assigns the instance being classified the majority class among its K nearest neighbors. In the original algorithm, this requires computing the distance between the instance being classified and every training object. On the one hand, a large training set is important for achieving high accuracy; on the other hand, it makes the classification of each object slower, because KNN is a lazy-learning algorithm. Indeed, the algorithm provides no means of storing information about previously computed classifications, so two identical instances must each be classified from scratch. In a sense, this classifier does not learn. This dissertation focuses on this lazy-learning weakness and proposes a solution that transforms KNN into an eager-learning classifier; in other words, the algorithm should learn effectively from the training set, thus avoiding redundant calculations. Within the proposed change to the algorithm, it is important to identify the attributes that best characterize the objects according to their discriminating power. In this framework, the dissertation studies how these transformations apply to data of different types: continuous and/or categorical.
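
The abstract's point that plain KNN recomputes every distance, even for an instance it has already classified, can be made concrete with a minimal Python sketch. The memoization shown here is an assumed, simple illustration of avoiding that redundant work; it is not the eager-learning transformation the dissertation proposes, which the abstract does not describe in detail. The class name and the `distance_computations` counter are hypothetical.

```python
from collections import Counter
from math import dist


class CachedKNN:
    """Majority-vote kNN that remembers answers for instances it has already classified."""

    def __init__(self, train, k):
        self.train = [(tuple(x), label) for x, label in train]
        self.k = k
        self.cache = {}
        self.distance_computations = 0  # counts work, to show what caching saves

    def classify(self, instance):
        key = tuple(instance)
        if key in self.cache:
            # A repeated instance costs no distance computations at all.
            return self.cache[key]
        distances = []
        for features, label in self.train:
            self.distance_computations += 1
            distances.append((dist(features, key), label))
        distances.sort(key=lambda d: d[0])
        # Majority class among the k nearest training objects.
        votes = Counter(label for _, label in distances[:self.k])
        result = votes.most_common(1)[0][0]
        self.cache[key] = result
        return result
```

Classifying the same instance twice costs `len(train)` distance computations the first time and none the second. Caching only helps with exact repeats, though; the eager-learning approach described in the abstract targets the cost of classifying new instances as well.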

10

Chucre, Mirla Rafaela Rafael Braga. "K-nearest neighbors queries in time-dependent road networks: analyzing scenarios where points of interest move to the query point." Repositório Institucional da UFC, 2015. http://www.repositorio.ufc.br/handle/riufc/23696.

Abstract:

CHUCRE, Mirla Rafaela Rafael Braga. K-nearest neighbors queries in time-dependent road networks: analyzing scenarios where points of interest move to the query point. 2015. 65 p. Master's dissertation (Computer Science), Universidade Federal do Ceará, Fortaleza, 2015. A kNN query retrieves the k points of interest closest to the query point, where proximity is computed from the query point to the points of interest. Time-dependent road networks are represented as weighted graphs in which the weight of an edge depends on the time at which it is traversed. This makes it possible to model periodic congestion during rush hour and similar effects. Travel time on road networks depends heavily on traffic and, typically, the time a moving object takes to traverse a segment depends on its departure time. In time-dependent networks, a kNN query, called TD-kNN, returns the k points of interest with minimum travel time from the query point. As a concrete example, imagine a tourist in Paris who wants to visit the closest tourist attraction, and consider two points of interest in the city, the Eiffel Tower and the Cathedral of Notre Dame. If the tourist asks for the attraction that can be reached fastest at that moment, the answer depends on the departure time: at 10h it takes 10 minutes to reach the Cathedral, so it is the nearest attraction, but if the same query is asked at 22h from the same location, the nearest attraction is the Eiffel Tower. In this work, we identify a variation of nearest neighbors queries in time-dependent road networks that has wide applications and requires novel processing algorithms. Unlike TD-kNN queries, we aim to minimize the travel time from the points of interest to the query point. With this approach, a cab company can find the taxi that can reach a passenger requesting transportation in the least time. More specifically, we address the following query: find the k points of interest (e.g. taxi drivers) that can move to the query point (e.g. a taxi user) in the minimum amount of time. Previous works have proposed solutions to kNN queries that consider the time dependency of the network but do not compute proximity from the points of interest to the query point. We propose and discuss a solution to this type of query based on the previously proposed incremental network expansion, using the A* search algorithm equipped with suitable heuristic functions. We also discuss the design and correctness of our algorithm and present experimental results on real and synthetic data that show the efficiency and effectiveness of our solution.
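
The dissertation's own method (incremental network expansion with A* search and suitable heuristics) is not detailed in the abstract. As a point of comparison, a naive baseline for the same query, assumed here rather than taken from the thesis, runs one time-dependent Dijkstra search from every point of interest towards the query point, with all of them departing at the same time, and keeps the k smallest travel times. Edge costs are functions of the time the edge is entered, and the search is correct under the usual FIFO assumption (entering an edge later never means leaving it earlier). The graph layout and all names are hypothetical.

```python
import heapq
from math import inf


def td_travel_time(graph, source, target, depart):
    """Time-dependent Dijkstra: travel time from source to target when leaving at `depart`.

    graph[u] is a list of (v, cost_fn) pairs, where cost_fn(t) is the travel time of
    edge (u, v) when entering it at time t. Assumes FIFO edge costs.
    """
    arrival = {source: depart}
    heap = [(depart, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            return t - depart
        if t > arrival.get(u, inf):
            continue  # stale heap entry
        for v, cost_fn in graph.get(u, []):
            t_v = t + cost_fn(t)
            if t_v < arrival.get(v, inf):
                arrival[v] = t_v
                heapq.heappush(heap, (t_v, v))
    return inf


def k_nearest_movers(graph, pois, query, depart, k):
    """Naive baseline: one search per point of interest, keep the k fastest to reach `query`."""
    times = [(td_travel_time(graph, p, query, depart), p) for p in pois]
    return [p for _, p in heapq.nsmallest(k, times)]
```

With a hypothetical congestion profile on one road that eases at 9:00 (times in minutes since midnight), the same three taxis can rank differently for `depart=480` and `depart=600`, mirroring the tourist example above. The baseline performs one full search per point of interest, which is the kind of cost the thesis's method addresses by reducing the fraction of the network explored.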

11

Torres, Winnie de Lima. "Detecção de desvios vocais utilizando modelos auto regressivos e o algoritmo KNN" [Detection of vocal deviations using autoregressive models and the KNN algorithm]. Programa de Pós-Graduação em Engenharia Elétrica e de Computação, 2018. https://repositorio.ufrn.br/jspui/handle/123456789/25105.

Abstract:

Some fields of science study vocal tract disorders by analysing voice vibration patterns. In general, the importance of this research lies in identifying, at a more specific stage, diseases of varying severity, from those that can be treated with voice therapy to those that require closer attention, including surgical procedures for their control. Although there is evidence in the literature that digital signal processing allows a non-invasive diagnosis of laryngeal pathologies, such as vocal disorders that cause edema, nodules and paralysis, there is no consensus on the most suitable method or on the most appropriate characteristics or parameters for detecting voice deviations. This work therefore proposes an algorithm that detects vocal deviations through voice signal analysis. It uses data from the Disordered Voice Database, developed by the Massachusetts Eye and Ear Infirmary (MEEI), because of its wide use in voice and speech research. A total of 166 signals from this database were used, including healthy voices and pathological voices affected by edema, nodules and vocal fold paralysis. Autoregressive models (AR and ARMA) were fitted to represent the voice signals, and the parameters of the resulting models were classified with the K-Nearest Neighbors (KNN) algorithm. To assess the efficiency of the proposed algorithm, its results were compared with a detection method that considers only the Euclidean distance between the signals. The results indicate that the proposed method performs well, with a classification hit rate above 71% (compared with 31% when using the Euclidean distance alone). Moreover, the method is easy to implement and can run on simpler hardware. This research therefore has the potential to produce a cheap and accessible classifier for large-scale use by health professionals, as a non-invasive pre-analysis for detecting otorhinolaryngological pathologies that affect the voice.
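
The pipeline in the abstract (fit an autoregressive model to each voice signal, then classify the model parameters with KNN) can be sketched in a few lines of Python with NumPy. Model orders, the estimation method, the distance and k are not given in the abstract, so this sketch assumes a least-squares AR fit only (no ARMA), Euclidean distance in coefficient space, and hypothetical labels and toy feature values rather than MEEI data.

```python
import numpy as np


def ar_coefficients(signal, order):
    """Least-squares AR(order) fit: x[n] ~ a_1*x[n-1] + ... + a_order*x[n-order]."""
    x = np.asarray(signal, dtype=float)
    # Row n holds the `order` samples preceding x[n], most recent first.
    lagged = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    coeffs, *_ = np.linalg.lstsq(lagged, x[order:], rcond=None)
    return coeffs


def knn_classify(train_features, train_labels, query, k=3):
    """Majority vote among the k training feature vectors closest (Euclidean) to `query`."""
    distances = np.linalg.norm(np.asarray(train_features, dtype=float) - query, axis=1)
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

For a noise-free signal generated by x[n] = 1.5 x[n-1] - 0.9 x[n-2], `ar_coefficients(signal, 2)` recovers approximately `[1.5, -0.9]`, and that coefficient vector can then be passed to `knn_classify` as the query. Representing each recording by a handful of model parameters, rather than by raw samples, is what keeps the classification step cheap enough for simple hardware.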

12

Vance, Danny W. "An All-Attributes Approach to Supervised Learning." University of Cincinnati / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1162335608.

13

Landmér, Pedersen Jesper. "Weighing Machine Learning Algorithms for Accounting RWISs Characteristics in METRo : A comparison of Random Forest, Deep Learning & kNN." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-85586.

Abstract:

The numerical model for forecasting road conditions, the Model of the Environment and Temperature of Roads (METRo), laid the foundation for solving the energy balance and calculating the temperature evolution of roads. METRo does this through a numerical modelling system that uses Road Weather Information Stations (RWIS) and meteorological projections. While METRo includes tools for correcting errors at each station, such as regional differences or microclimates, this thesis proposes machine learning as a supplement to METRo's forecasts to account for station characteristics. Controlled experiments compared four regression algorithms (recurrent neural network, dense neural network, random forest, and k-nearest neighbour) for predicting the squared deviation of METRo's forecast road surface temperatures. The results show that the models using the random forest algorithm yielded the most reliable predictions of METRo deviations. However, the study also shows the promise of neural networks and the possible advantage of the seasonal adjustments they could offer.

14

Stümer, Wolfgang. "Kombination von terrestrischen Aufnahmen und Fernerkundungsdaten mit Hilfe der kNN-Methode zur Klassifizierung und Kartierung von Wäldern." [Combining terrestrial surveys and remote sensing data using the kNN method to classify and map forests.] Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2004. http://nbn-resolving.de/urn:nbn:de:swb:14-1096379861218-08302.

Abstract:

In recent years, politics and industry have developed a growing need for information about forests. Remote sensing is an important tool for meeting this need, since it can provide area-wide data. The k-nearest-neighbours method (kNN method), which combines terrestrial surveys with remote sensing data, is one way to build this data basis with the help of remote sensing. This dissertation therefore examines the kNN method in detail. Extensive calculations were carried out for two attributes, basal area (metric data) and deadwood (categorical data), covering a wide range of kNN variations. These variations include different settings of the distance function, the weighting function, and the number k of nearest neighbours. Landsat and hyperspectral data, which differ in both spectral and spatial resolution, served as remote sensing sources. A multitemporal approach was also considered, using Landsat scenes of one area from different dates. The terrestrial data consist of field surveys with different sampling designs, an important criterion being an even distribution of attribute values (e.g. basal area values) across the feature space. The calculations were performed with a program written in Visual Basic that integrates all functions into its user interface and is therefore easy to use. The pixel-wise output produced detailed maps, and the results were verified using the percentage root mean square error and the bootstrap method. The accuracies achieved for basal area range from 35 % to 67 % (Landsat) and from 65 % to 67 % (HyMap™). For deadwood, the agreement between the kNN estimates and the reference values ranges from 60.0 % to 73.3 % (Landsat) and from 60.0 % to 63.3 % (HyMap™). Given these accuracies, the kNN method is suitable for classifying forest stands or for integration into classification procedures.

Mapping forest variables and associated characteristics is fundamental for forest planning and management. This work describes the k-nearest neighbors (kNN) method for improving estimates and producing maps of the attributes basal area (metric data) and deadwood (categorical data). Several variations of the kNN method were tested, including the distance metric, the weighting function, and the number of neighbors. Landsat TM satellite images and hyperspectral data, which differ in both spectral and spatial resolution, were used as remote sensing sources. Two Landsat scenes of the same area, acquired in September 1999 and 2000, were used for a multitemporal approach. The field data for the kNN method comprise three sets of field measurements collected at the Tharandter Wald test site (Germany), each with a different sampling design. A program integrating all kNN functions was developed for the kNN calculations. The relative root mean square error (RMSE) and the bootstrap method were evaluated in order to find optimal parameters. The estimation accuracy for basal area is between 35 % and 67 % (Landsat) and between 65 % and 67 % (HyMap™). For deadwood, the accuracy is between 60 % and 73 % (Landsat) and between 60 % and 63 % (HyMap™). Recommendations for applying the kNN method to mapping and regional estimation are provided.
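The kNN variations this thesis tested (distance function, weighting function, number of neighbours) and its relative RMSE evaluation can be illustrated with a small sketch. It is our own simplification, not the author's Visual Basic program: field plots are assumed to be (spectral feature vector, basal area) pairs, weights are inverse distance raised to a hypothetical `power` parameter (power 0 gives an unweighted mean), and accuracy is checked by leave-one-out cross-validation instead of the bootstrap the thesis used. How the reported accuracy percentages were derived from the RMSE is not stated in the abstract, so the sketch stops at the relative RMSE.

```python
import math


def knn_estimate(plots, pixel, k=5, power=1.0, metric=math.dist):
    """Distance-weighted kNN estimate of a metric attribute (e.g. basal area).

    plots: list of (spectral_feature_vector, observed_value) field-plot pairs.
    pixel: spectral feature vector of the pixel to estimate.
    Weights are 1 / d**power; a plot at zero distance is returned directly.
    """
    nearest = sorted(plots, key=lambda p: metric(p[0], pixel))[:k]
    num = den = 0.0
    for features, value in nearest:
        d = metric(features, pixel)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def relative_rmse_loo(plots, **kwargs):
    """Leave-one-out relative RMSE in percent: 100 * RMSE / mean(observed)."""
    errors = []
    for i, (features, value) in enumerate(plots):
        others = plots[:i] + plots[i + 1:]  # hold out plot i
        errors.append(knn_estimate(others, features, **kwargs) - value)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mean_obs = sum(v for _, v in plots) / len(plots)
    return 100.0 * rmse / mean_obs
```

Looping over `k`, `metric`, and `power` and keeping the setting with the lowest `relative_rmse_loo` is one simple way to mimic the parameter search described above.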

15

Axillus, Viktor. "Comparing Julia and Python : An investigation of the performance on image processing with deep neural networks and classification." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-19160.

Abstract:

Python is the most popular language for prototyping and developing machine learning algorithms. As an interpreted language, Python suffers a significant performance loss compared to compiled languages. Julia is a newly developed language that tries to bridge the gap between high-performance but cumbersome languages such as C++ and highly abstracted but typically slow languages such as Python. However, over the years, the Python community has developed many tools that address its performance problems. This raises the question of whether choosing one language over the other makes any significant performance difference. This thesis compares the performance, in terms of execution time, of the two languages in the machine learning domain: more specifically, image processing with GPU-accelerated deep neural networks, and classification with k-nearest neighbor on the MNIST and EMNIST datasets. Python with Keras and TensorFlow is compared against Julia with Flux for GPU-accelerated neural networks. For classification, Python with scikit-learn is compared against Julia with NearestNeighbors.jl. The results suggest that Julia has a performance edge for GPU-accelerated deep neural networks, outperforming Python by roughly 1.25x to 1.5x. For classification with k-nearest neighbor, the results were more varied, with Julia outperforming Python in 5 out of 8 measurements. However, there are some threats to validity, and further research covering all the frameworks available for both languages is needed to provide a more conclusive and generalized answer.

16

Alsouda, Yasser. "An IoT Solution for Urban Noise Identification in Smart Cities : Noise Measurement and Classification." Thesis, Linnéuniversitetet, Institutionen för fysik och elektroteknik (IFE), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-80858.

Abstract:

Noise is defined as any undesired sound. Urban noise and its effect on citizens is a significant environmental problem, and the increasing level of noise has become a critical problem in some cities. Fortunately, noise pollution can be mitigated by better planning of urban areas or controlled by administrative regulations. However, such actions require well-established noise monitoring systems. In this thesis, we present a solution for noise measurement and classification using a low-power, inexpensive IoT unit. To measure the noise level, we implement an algorithm that calculates the sound pressure level in dB, achieving a measurement error of less than 1 dB. Our machine learning-based method for noise classification uses Mel-frequency cepstral coefficients for audio feature extraction and four supervised classification algorithms: support vector machine, k-nearest neighbors, bootstrap aggregating, and random forest. We evaluate our approach experimentally on a dataset of about 3000 sound samples grouped into eight sound classes (such as car horn, jackhammer, or street music). We explore the parameter space of the four algorithms to estimate optimal parameter values for classifying the sound samples in the dataset under study, achieving noise classification accuracy between 88% and 94%.
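The sound pressure level mentioned in this abstract is conventionally computed as SPL = 20 log10(p_rms / p_ref) with p_ref = 20 µPa. The sketch below shows only that textbook formula; it is not the thesis's algorithm. It assumes the samples are already calibrated to pascals, applies no frequency weighting (such as A-weighting), and uses a hypothetical `pa_per_count` calibration factor for raw microphone counts.

```python
import math

P_REF = 20e-6  # reference pressure in pascals (20 µPa), the usual reference in air


def spl_db(samples_pa):
    """Unweighted sound pressure level in dB re 20 µPa from pressure samples in Pa."""
    if not samples_pa:
        raise ValueError("need at least one sample")
    rms = math.sqrt(sum(p * p for p in samples_pa) / len(samples_pa))
    if rms == 0.0:
        return float("-inf")  # silence: the logarithm diverges
    return 20.0 * math.log10(rms / P_REF)


def counts_to_pascals(counts, pa_per_count):
    """Scale raw ADC counts by a hypothetical, device-specific calibration factor."""
    return [c * pa_per_count for c in counts]
```

As a sanity check, a signal with an RMS pressure of 1 Pa gives about 94 dB, the level used by common acoustic calibrators.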

17

Rekathati, Faton. "Curating news sections in a historical Swedish news corpus." Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166313.

Abstract:

The National Library of Sweden uses optical character recognition software to digitize its collections of historical newspapers. The purpose of such software is first to automatically segment text and images from scanned newspaper pages, and second to read the contents of the identified text regions. While the raw text is often digitized successfully, important contextual information, such as whether the text is a header, a section title, or the body text of an article, is not captured. These characteristics are easy for a human to distinguish but remain difficult for a machine to recognize. The main purpose of this thesis is to investigate how well section titles in the newspaper Svenska Dagbladet can be classified using so-called image embeddings as features. A secondary aim is to examine whether section titles become harder to classify in older newspaper data. Lastly, we explore whether manual annotation work can be reduced by using the predictions of a semi-supervised classifier to help in the labeling process. Results indicate that image embeddings help substantially in classifying section titles. Datasets from three time periods (1990-1997, 2004-2013, and 2017 onwards) were sampled and annotated. The best-performing model (XGBoost) achieved macro F1 scores of 0.886, 0.936, and 0.980 for the respective time periods. The results also showed that classification became more difficult on older newspapers. Furthermore, a semi-supervised classifier achieved an average precision of 83% with only single section title examples, showing promise as a way to speed up manual annotation of data.
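Macro F1, the score reported in this abstract, is the unweighted mean of the per-class F1 scores, so rare section titles count as much as frequent ones. A minimal pure-Python sketch follows; it is our own illustration, not the thesis's evaluation code, and it assumes that a class with no predicted or true positives gets an F1 of 0 (intended to mirror scikit-learn's macro average with `zero_division=0`).

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)
```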

18

"SPSR Efficient Processing of Socially k-Nearest Neighbors with Spatial Range Filter." Master's thesis, 2016. http://hdl.handle.net/2286/R.I.40219.

Abstract:

Social media has become popular in the past decade; Facebook, for example, has 1.59 billion monthly active users. With such massive social networks generating large amounts of data, there is strong interest in leveraging knowledge from social networks to make systems more personalized for their end users. With the rapid increase in the use of mobile phones and wearables, social media data is also increasingly tied to spatial networks. This research proposes an efficient technique for answering socially k-nearest-neighbor queries with a spatial range filter. The proposed approach performs a joint search over the social and spatial domains, which dramatically improves performance compared with straightforward solutions. It introduces a novel index that combines social and spatial indexes: graph data is stored in an organized manner so that it can be filtered at query time by spatial constraints (the region of interest) and social constraints (the top-k closest vertices). This enables pruning during the social graph traversal, so that only the top-k socially closest venues are returned. Experiments on a real geo-social dataset extracted from Yelp show that the proposed approach outperforms existing baseline approaches by at least a factor of three, and compare how each of the algorithms performs under various conditions.
Dissertation/Thesis. Master's thesis, Computer Science, 2016.
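For context, the kind of straightforward baseline such an index is compared against can be sketched as a Dijkstra traversal of the social graph that keeps the first k venues falling inside the query rectangle. This is our own illustrative baseline, not the thesis's SPSR index: the graph representation, the `locations` mapping, and the rectangle format are assumptions, and vertex identifiers must be mutually comparable (e.g. strings) because distance ties in the heap fall back to comparing them.

```python
import heapq


def social_knn_in_range(graph, locations, source, rect, k):
    """Baseline: Dijkstra from `source`, keeping the first k venues inside `rect`.

    graph: dict vertex -> list of (neighbour, edge_weight) pairs (social distance).
    locations: dict venue -> (x, y); vertices missing from it are not venues.
    rect: (xmin, ymin, xmax, ymax) spatial range filter.
    Returns up to k (venue, social_distance) pairs in increasing distance.
    """
    xmin, ymin, xmax, ymax = rect
    dist = {source: 0.0}
    heap = [(0.0, source)]
    visited = set()
    result = []
    while heap and len(result) < k:
        d, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale heap entry
        visited.add(v)
        loc = locations.get(v)
        if v != source and loc is not None:
            x, y = loc
            if xmin <= x <= xmax and ymin <= y <= ymax:
                result.append((v, d))
        for u, w in graph.get(v, ()):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return result
```

Because it explores vertices in order of social distance and filters by location only afterwards, this baseline may traverse many socially close but out-of-range vertices, which is the kind of cost a joint social-spatial index aims to prune.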

19

Liu, Pei-Chi (劉佩琦). "An Efficient Processing Framework for Multiple k Nearest Neighbor (kNN) Queries." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/30331584107279865951.

20

Σαψάνης, Χρήστος. "Αναγνώριση βασικών κινήσεων του χεριού με χρήση ηλεκτρομυογραφήματος." [Recognition of basic hand movements using electromyography.] Thesis, 2013. http://hdl.handle.net/10889/6420.

Abstract:

The aim of this work was to recognize six basic hand movements using two systems. As the topic is interdisciplinary, the work involved studying the anatomy of the forearm muscles, biosignals, electromyography (EMG), and pattern recognition methods. The signal also contained considerable noise, so it had to be decomposed using EMD, features had to be extracted, and their dimensionality had to be reduced using RELIEF and PCA in order to improve the classification success rate. The first part uses a Delsys EMG system, initially on one subject and then on six subjects, with an average classification success rate above 80% for the six movements. The second part involves building an autonomous EMG system using an Arduino microcontroller, EMG sensors, and electrodes mounted on an elastic glove. In this case, classification results reached about 75%.
