Dissertations / Theses on the topic 'K-Means Cluster (K-means)'

Author: Grafiati

Published: 5 June 2025

Last updated: 24 June 2025

Consult the top 50 dissertations / theses for your research on the topic 'K-Means Cluster (K-means).'

1

Hong, Sui. "Experiments with K-Means, Fuzzy c-Means and Approaches to Choose K and C." Honors in the Major Thesis, University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/1224.

Abstract:

This item is only available in print in the UCF Libraries. If this is your Honors Thesis, you can help us make it available online for use by researchers around the world by following the instructions on the distribution consent form at http://library.ucf
Bachelors; Engineering and Computer Science; Computer Engineering

2

Thibodeau, Éric. "Profiling and optimizing K-means algorithms in a beowulf cluster environment." Mémoire, École de technologie supérieure, 2009. http://espace.etsmtl.ca/85/1/THIBODEAU_%C3%89ric.pdf.

Abstract:

The K-means statistical clustering algorithm is used to classify unlabeled databases into K groups. Because it forms part of the evaluation function of an Evolutionary Algorithm (EA), its optimization has become a point of interest. Despite the many approaches proposed for its optimization and parallelization, very little research has addressed the performance and parallel efficiency of implementations. In most cases, descriptions of the execution environment remain opaque, and a precise presentation of execution profiles is often absent. We address these gaps by presenting a detailed description of two environments: Beowulf compute clusters and Symmetric Multi-Processor (SMP) parallel machines. A combination of theoretical and empirical models then serves as a benchmark for measuring the performance of K-means in these environments. Because multidisciplinary expertise is needed, a detailed use of the Tuning and Analysis Utilities (TAU) tool suite is presented to simplify the task of profiling parallel code. Coupled with the high-precision counters provided by the Performance Application Programming Interface (PAPI), we present a "grey box" approach that made it possible to transform a master-slave parallel implementation of K-means into a highly efficient version using the computation-islands paradigm. The optimizations are guided by the theoretical and empirical models we obtained. Our work reveals that optimizing parallel programs involves much more than a balance between computation and communication. We reveal the negative impacts of using mathematical function libraries as well as certain versions of communication libraries. A high-precision execution profile established that data representation and preprocessing can prove more costly than computation and communication combined.

3

Zhao, Jianmin. "Optimal Clustering: Genetic Constrained K-Means and Linear Programming Algorithms." VCU Scholars Compass, 2006. http://hdl.handle.net/10156/1583.

4

Quinteiro, José António Teixeira. "Segmentação de individuos no Facebook que gostam de música: abordagem exploratória, recorrendo à comparação entre dois algoritmos, k-means e fuzzy c-means." Master's thesis, Instituto Superior de Economia e Gestão, 2011. http://hdl.handle.net/10400.5/4338.

Abstract:

Master's in Management/MBA. To define the best strategic plans, the marketing decisions that must be taken to approach the market, choose the best advertising campaign, and select the segment and the type of product or service to offer must be based on a sound technical analysis of the available information or data. The choice of segmentation method is of paramount importance, since the results obtained may change the target-market selection strategy and the positioning strategy for products or services, in addition to the costs inherent in taking the decision. This study seeks differences between two descriptive post-hoc segmentation methods (k-means and fuzzy c-means) in the clusters they produce, based on the Portuguese population that likes music and has an active Facebook account. The work includes a review of the known literature, followed by segmentation of the sample obtained using the two algorithms. The study was complemented with a descriptive analysis of the frequencies of the mode, acquisition, and listening of the various types of music.

5

CALENDER, CHRISTOPHER R. "APPROXIMATE N-NEAREST NEIGHBOR CLUSTERING ON DISTRIBUTED DATABASES USING ITERATIVE REFINEMENT." University of Cincinnati / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1092929952.

6

Reanier, Richard Eugene. "Refinements to K-means clustering : spatial analysis of the Bateman site, arctic Alaska /." Thesis, Connect to this title online; UW restricted, 1992. http://hdl.handle.net/1773/6420.

7

Castillo, Gregorio Alfonso. "A K-MEANS BASED WATERSHED IMAGING SEGMENTATION ALGORITHM FOR BANANA CLUSTER QUALITY INSPECTION." OpenSIUC, 2016. https://opensiuc.lib.siu.edu/theses/2037.

Abstract:

Bananas have become the most commonly consumed fresh fruit among the US population. Using computer vision to separate touching bananas is a challenge; for this purpose, a novel image segmentation algorithm is proposed that combines k-means and the watershed transformation. The first part extracts the background using k-means in the HS space. The second part segments individual bananas: a smarter selection of the initial markers from which the watershed transformation grows is attained by fusing two morphological filters with different structural elements. The proposed algorithm was validated using 124 experimentally captured, manually segmented banana pictures. For background extraction, k-means in HS space performed best of the three methods tested, with an average F1 score of 96.99%; Otsu and K-means (L*a*b*) scored 82.58% and 88.06%, respectively. The result of the watershed segmentation was also compared with the manual segmentation; the overall average F1 score is 92.28%. Performance would improve with modifications to the system, including more homogeneous illumination, restricting the positions allowed for the banana cluster, and a more suitable background selection.

8

Schorsch, Andrea. "Statistische Eigenschaften von Clusterverfahren." Master's thesis, Universität Potsdam, 2008. http://opus.kobv.de/ubp/volltexte/2009/2902/.

Abstract:

This thesis addresses two aspects of the statistical properties of clustering procedures. First, it examines the existence of different cluster analysis methods for finding structure and how their procedures differ. Starting from the same data, the method of distance between manifolds and the k-means method yield different final clusterings. The second part of the thesis looks more closely at the asymptotic properties of the k-means procedure. Here the set of optimal cluster centres is consistent: as the sample size tends to infinity, it converges in probability to the set of cluster centres that minimizes the variance criterion (the within-cluster sum of squares). Likewise, as n tends to infinity, the set of optimal cluster centres converges to a normal distribution. It turns out that the individual cluster centres depend on each other.

9

Hou, Jun. "Using Hadoop to Cluster Data in Energy System." University of Dayton / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1430092547.

10

Rogers, Matthew Alan. "Properties of the tropical hydrologic cycle as analyzed through 3-dimensional k-means cluster analysis." online access from Digital Dissertation Consortium, 2008. http://libweb.cityu.edu.hk/cgi-bin/er/db/ddcdiss.pl?3332703.

11

Buck, Robert. "Cluster-Based Salient Object Detection Using K-Means Merging and Keypoint Separation with Rectangular Centers." DigitalCommons@USU, 2016. https://digitalcommons.usu.edu/etd/4631.

Abstract:

The explosion of internet traffic, advent of social media sites such as Facebook and Twitter, and increased availability of digital cameras has saturated life with images and videos. Never before has it been so important to sift quickly through large amounts of digital information. Salient Object Detection (SOD) is a computer vision topic that finds methods to locate important objects in pictures. SOD has proven to be helpful in numerous applications such as image forgery detection and traffic sign recognition. In this thesis, I outline a novel SOD technique to automatically isolate important objects from the background in images.

12

Camara, Assa. "Využití fuzzy množin ve shlukové analýze se zaměřením na metodu Fuzzy C-means Clustering." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2020. http://www.nusl.cz/ntk/nusl-417051.

Abstract:

This master's thesis deals with cluster analysis, more specifically with clustering methods that use fuzzy sets. Basic clustering algorithms and the necessary multivariate transformations are described in the first chapter. In the practical part, in the third chapter, we apply fuzzy c-means clustering and k-means clustering to real data. The data used for clustering are the inputs of the chemical transport model CMAQ, which is used to approximate the concentration of air pollutants in the atmosphere. We used two different methods to select the optimal weighting exponent for finding structure in our data, and we compared all three resulting data structures. The structures resembled each other, but with fuzzy c-means clustering one of the clusters did not resemble any of the clustering inputs. The end of the third chapter is dedicated to an attempt to find a regression model that captures the relationship between the inputs and outputs of the CMAQ model.

13

Oliveira, Max Gontijo de. "Sistema de localização de facilidades: uma abordagem para mensuração de pontos de demanda e localização de facilidades." Universidade Federal de Goiás, 2012. http://repositorio.bc.ufg.br/tede/handle/tede/5512.

Abstract:

Several organizations need to solve the problem of locating and allocating facilities within a geographic area. Location/allocation problems arise in many situations, such as the distribution of police cars, ambulances, repair vehicles for electrical network faults, taxi drivers, and bus stops, among numerous others where the location of such facilities is strategic for the organization. In location/allocation problems, each demand point is usually allocated to the closest facility, and each facility is located at the center of its demand points, with the demand used as a weight. However, most real location problems have capacity constraints: each facility has a certain capacity that depends on the type of demand. Facility location problems can be continuous or discrete. In continuous problems (also called the multi-source Weber problem), any point in the plane is a potential site for a facility. There are several approaches for working with continuous models and many others for models with capacity constraints, but most of them discretize the model. The objective of this work is to present an approach for generating good distributions of facilities for the continuous location/allocation problem with capacity constraints. A case study is presented to evaluate the results.

14

Leisch, Friedrich. "Bagged clustering." SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business, 1999. http://epub.wu.ac.at/1272/1/document.pdf.

Abstract:

A new ensemble method for cluster analysis is introduced, which can be interpreted in two different ways: as a complexity-reducing preprocessing stage for hierarchical clustering and as a combination procedure for several partitioning results. The basic idea is to locate and combine structurally stable cluster centers and/or prototypes. Random effects of the training set are reduced by repeatedly training on resampled sets (bootstrap samples). We discuss the algorithm both from a more theoretical and an applied point of view and demonstrate it on several data sets. (author's abstract)
Series: Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science"

15

Abualhaj, Bedor [Verfasser], and Frederik [Akademischer Betreuer] Wenz. "[18F]FET-PET brain image segmentation using k-means: Evaluation of five cluster validity indices / Bedor Abualhaj ; Betreuer: Frederik Wenz." Heidelberg : Universitätsbibliothek Heidelberg, 2017. http://d-nb.info/1178010406/34.

16

Palomino, Arce Magda Cristina, and Calhua Renee Michael Morales. "Aplicación de clustering utilizando K-means para la segmentación de clientes en una empresa de televisión paga." Bachelor's thesis, Universidad Nacional Mayor de San Marcos, 2015. https://hdl.handle.net/20.500.12672/8829.

Abstract:

Full-text publication not authorized by the author. Companies today hold a large amount of information about their customers, and this information is vital for taking tactical actions that allow them to stay in the market. Telecommunications companies are especially sensitive to customer satisfaction, because their profitability depends on how long customers stay with them, above all those who generate the most value for the company. The pay-TV company in this case needs to know which customers generate the most value so that it can carry out loyalty actions aimed at them. The best technique identified is clustering, supported by the K-means algorithm, which together allow the solution to be implemented easily and with a high degree of efficiency, achieving a good customer segmentation. Submitted as a "Trabajo de suficiencia profesional" (professional proficiency report).

17

Zubková, Kateřina. "Text mining se zaměřením na shlukovací a fuzzy shlukovací metody." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2018. http://www.nusl.cz/ntk/nusl-382412.

Abstract:

This thesis is focused on cluster analysis in the field of text mining and its application to real data. The aim of the thesis is to find suitable categories (clusters) in the transcribed calls recorded in the contact center of Česká pojišťovna a.s. by transferring these textual documents into the vector space using basic text mining methods and the implemented clustering algorithms. From the formal point of view, the thesis contains a description of preprocessing and representation of textual data, a description of several common clustering methods, cluster validation, and the application itself.

18

Cisterna, Malloco César Enrique. "Segmentación de clientes activos de una entidad financiera empleando el algoritmo de K-means y árbol de decisión." Bachelor's thesis, Universidad Nacional Mayor de San Marcos, 2021. https://hdl.handle.net/20.500.12672/17359.

Abstract:

The financial institution has currently classified its customers, according to their interaction with physical and digital channels, into active customers (42%) and inactive customers (58%), which makes it essential to carry out differentiated commercial actions on this customer base. An active customer is defined as one who carried out monetary and non-monetary operations through the bank's digital channels within the last six months, or who carried out operations through physical channels within the last six months. For this reason, the business areas in charge of campaigns decided to prioritize commercial action on active customers, who number around one million seven hundred thousand per month. However, different commercial actions are desired depending on the profile of the active customers, since not all of them share the same profile. This work therefore consists of the segmentation of active customers, carried out within the Business Analytics area, which is responsible for customer profiling and segmentation. Through the segmentation, business owners will be able to carry out commercial actions to manage the established KPIs, namely the "cross" indicator, credit or debit card use, and increased use of digital channels. The segmentation gives an accurate picture of the active customers' profiles, making it possible to offer products that fit their needs and thereby improve their KPIs.

19

Bergström, Sebastian. "Customer segmentation of retail chain customers using cluster analysis." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252559.

Abstract:

In this thesis, cluster analysis was applied to data comprising customer spending habits at a retail chain in order to perform customer segmentation. The method used was a two-step cluster procedure. The first step consisted of feature engineering, a square root transformation of the data to handle big spenders in the data set, and finally principal component analysis (PCA) to reduce the dimensionality of the data set and so mitigate the effects of high dimensionality. The second step consisted of applying clustering algorithms to the transformed data. The methods used were K-means clustering, Gaussian mixture models in the MCLUST family, t-distributed mixture models in the tEIGEN family, and non-negative matrix factorization (NMF). For the NMF clustering a slightly different data pre-processing step was taken; specifically, no PCA was performed. Clustering partitions were compared on the basis of the Silhouette index, the Davies-Bouldin index, and subject matter knowledge, which revealed that K-means clustering with K = 3 produces the most reasonable clusters. This algorithm was able to separate the customers into different segments depending on how many purchases they made overall, and some minor differences in spending habits are also evident in these clusters. In other words, there is some support for the claim that the customer segments have some variation in their spending habits.
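The Silhouette index named in this abstract can be illustrated with a minimal sketch. This is not the thesis author's code: the function names, the toy points, and the singleton-cluster convention are assumptions made for illustration, and the function assumes at least two distinct cluster labels.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_silhouette(points, labels):
    """Average silhouette width; assumes at least two distinct labels."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # common convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is in the sum, hence len - 1).
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy data: two tight pairs that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # pairs kept together
bad = mean_silhouette(points, [0, 1, 0, 1])   # pairs split apart
print(good > bad)
```

Values near 1 indicate compact, well-separated clusters, and negative values suggest points assigned to the wrong cluster, which is why the index can be used to compare partitions produced by different algorithms or different values of K.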

20

Márquez, Ángela Marqués. "A Machine Learning Approach for Studying Linked Residential Burglaries." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4280.

Abstract:

Context. Multiple studies demonstrate that most residential burglaries are committed by a few offenders. Statistics collected by the Swedish National Council for Crime Prevention show that the number of residential burglaries varies from year to year but normally increases. In addition, around half of all reported burglaries occur in big cities, and only some occur in sparsely populated areas. Law enforcement agencies therefore need to study possibly linked residential burglaries in their investigations. Linking crime reports is a difficult task, and there is currently no systematic way to do it. Objectives. This study analyses the features of the residential burglaries recorded by law enforcement in Sweden. The objective is to study the possibility of linking crimes based on these features. The characteristics used are residential features, modus operandi, victim features, goods stolen, difference in days, and distance between crimes. Methods. To reach the objectives, a quasi-experiment and repeated measures are used. Distances between crimes are obtained from routes computed with Google Maps. Different clustering methods are investigated to obtain the best cluster solution for linking residential burglaries. In addition, the study compares different algorithms to identify which performs best in linking crimes. Results. Clustering quality is measured using different methods: the rule of thumb, the elbow method, and Silhouette. To evaluate these measurements, ANOVA, Tukey's test, and Fisher's test are used. Silhouette presents the greatest quality level compared with the other methods. Other clustering algorithms present similar average Silhouette width and, therefore, similar clustering quality. Results also show that distance, days, and residential features are the most important features for linking crimes. Conclusions. The clustering suggests that it is possible to reduce the number of burglary cases by finding linked residential burglaries. Once the clustering is done, the results have to be investigated by law enforcement.

21

Kondapalli, Swetha. "An Approach To Cluster And Benchmark Regional Emergency Medical Service Agencies." Wright State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=wright1596491788206805.

22

Протас, О. М. "Порівняльний аналіз якості методів кластеризації: задача кластеризації італійських вин". Master's thesis, Сумський державний університет, 2021. https://essuir.sumdu.edu.ua/handle/123456789/86511.

Abstract:

The thesis presents a comparative analysis of the quality of clustering methods, using the problem of clustering Italian wines by their chemical composition, based on data from https://www.kaggle.com/harrywang/wine-dataset-for-clustering. Using standard methods, the number of clusters in the data set was determined to be three. To improve clustering quality, it was proposed to preprocess the data so that the mean of every feature of the objects studied equals zero and the variance equals one. This preprocessing raised the accuracy of cluster recognition from 71% to 97%. It was found that this substantial improvement in clustering quality is due to the change in feature scales, which significantly affected the distances between objects. Rescaling the features is proposed as a way to improve clustering quality. The highest clustering quality on the data studied was achieved with the K-means method (accuracy of 96.6%).
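The preprocessing described in this abstract (zero mean and unit variance for every feature) can be sketched as follows. This is an illustrative sketch only, not the thesis code: the function names and the four toy rows are assumptions, and the data are not taken from the wine data set.

```python
import math

def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-scores)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Population standard deviation; a constant column falls back to 1.0
    # so that the division below never fails.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical data: feature 0 has a small range, feature 1 a large one.
data = [[1.0, 1000.0], [2.0, 1010.0], [1.1, 2000.0], [2.1, 2010.0]]
scaled = standardize(data)

# Before scaling, feature 1 dominates every distance.
raw_ratio = euclidean(data[0], data[2]) / euclidean(data[0], data[1])
# After scaling, both features contribute on a comparable scale.
scaled_ratio = euclidean(scaled[0], scaled[2]) / euclidean(scaled[0], scaled[1])
print(raw_ratio > scaled_ratio)
```

Because K-means assigns points by Euclidean distance, a feature measured in large units can otherwise decide the clustering almost alone, which is consistent with the effect on distances that the abstract reports.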

23

Angeles, Bocanegra Oscar Raúl, and Quispe Cesar Abel Melgarejo. "Algoritmo de clustering utilizando k-means e índice de validación Rose turi para la segmentación de clientes de la Caja Rural Prymera." Bachelor's thesis, Universidad Nacional Mayor de San Marcos, 2012. https://hdl.handle.net/20.500.12672/12131.

Abstract:

Companies today need to exploit the information they hold about their customers. Caja Prymera in particular needs to identify groups of customers so that it can direct its resources and efforts to each group individually. Clustering techniques are very useful for obtaining groups whose members share similar characteristics while the groups themselves are heterogeneous with respect to one another. A study is therefore carried out to select the most suitable technique for the customer segmentation problem; the K-Means algorithm, complemented by the Rose Turi index, is chosen for its low computational cost, its ease of implementation, and because it yields the optimal number of clusters. In addition, to validate the efficiency of the proposed technique, the Davies-Bouldin index is implemented and compared against the Rose Turi index. The results indicate that the proposed technique obtained clusters with an effectiveness 25% higher than that obtained with the Davies-Bouldin index, and that it is also 17% better in processing-time efficiency. Submitted as a "Trabajo de suficiencia profesional" (professional proficiency report).

24

Pospíšil, David. "Shluková analýza signálu EKG." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2013. http://www.nusl.cz/ntk/nusl-219954.

Abstract:

This diploma thesis deals with the use of selected cluster analysis methods on the ECG signal in order to sort QRS complexes by morphology into normal and abnormal. Agglomerative hierarchical clustering and the non-hierarchical K-means method are used, and an application for them was developed in the MathWorks MATLAB environment. The first part covers the theory of the ECG signal and of cluster analysis. The second part covers the design, implementation, and evaluation of the developed software when applied to the ECG signal for the automatic division of QRS complexes into clusters.

25

Hu, Yajie. "Exploring Equity and Resilience of Transportation Network through Modeling Travel Behavior: A Study of OKI Region." University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1554212469614412.

26

Yan, Mingjin. "Methods of Determining the Number of Clusters in a Data Set and a New Clustering Criterion." Diss., Virginia Tech, 2005. http://hdl.handle.net/10919/29957.

Abstract:

In cluster analysis, a fundamental problem is to determine the best estimate of the number of clusters, which has a decisive effect on the clustering results. However, a limitation in current applications is that no convincingly acceptable solution to the best-number-of-clusters problem is available, owing to the high complexity of real data sets. In this dissertation, we tackle the problem of estimating the number of clusters, with particular attention to very complicated data that may contain multiple types of cluster structure. Two new methods of choosing the number of clusters are proposed, which have been shown empirically to be highly effective given clear and distinct cluster structure in a data set. In addition, we propose a sequential clustering approach, called multi-layer clustering, that combines these two methods. Multi-layer clustering not only functions as an efficient method of estimating the number of clusters but also, by superimposing a sequential idea, improves the flexibility and effectiveness of any existing one-layer clustering method. Empirical studies have shown that multi-layer clustering is more efficient than one-layer clustering approaches, especially in detecting clusters in complicated data sets. The multi-layer clustering approach has been successfully implemented in clustering the WTCHP microarray data, and the results can be interpreted very well based on known biological knowledge. Choosing an appropriate clustering method is another critical step in clustering. K-means clustering is one of the most popular clustering techniques used in practice. However, the k-means method tends to generate clusters containing a nearly equal number of objects, which is referred to as the "equal-size" problem. We propose a clustering method that competes with the k-means method. Our newly defined method aims to overcome the "equal-size" problem associated with the k-means method while maintaining its advantage of computational simplicity. Advantages of the proposed method over k-means clustering have been demonstrated empirically using simulated data with low dimensionality.

Ph. D.

27

Єфіменко, Тетяна Михайлівна, Татьяна Михайловна Ефименко, Tetiana Mykhailivna Yefimenko, Олена Владиславівна Коробченко, Елена Владиславовна Коробченко, and Olena Vladyslavivna Korobchenko. "Informational Extreme Cluster Analysis of Input Data." Thesis, Sumy State University, 2016. http://essuir.sumdu.edu.ua/handle/123456789/47076.

Abstract:

The article considers a categorical model and a learning algorithm for a decision support system. The proposed algorithm makes it possible to create a decision support system that operates in cluster-analysis mode. Synthesis of the decision support system is based on maximizing the system's informational ability by introducing additional information restrictions in the learning process.

28

Val, José Eduardo do. "Alternativas para seleção de touros da raça Nelore considerando características múltiplas de interesse econômico." Universidade de São Paulo, 2006. http://www.teses.usp.br/teses/disponiveis/17/17135/tde-17082007-165258/.

Abstract:

This study was developed from genetic evaluation data on sires belonging to herds participating in the Programa de Melhoramento Genético da Raça Nelore (PMGRN-Nelore Brasil), which since 1995 has run a progeny test called Reprodução Programada (RP), whose main purpose is to make animals with more reliable genetic values available on the sire market. The Expected Progeny Differences (EPDs) of 234 sires that took part in the RP from 1996 to 2003 were analyzed with the following objectives: (1) to evaluate the genetic merit of the sires over the years, using linear regression of EPD on the sire's year of participation in the RP, for the traits weight at 120 and 210 days of age, direct and maternal effects (DDPP120, DDPP210, DMPP120, and DMPP210); weight and scrotal circumference at 365 and 450 days of age, direct effect (DDPP365, DDPP450, DDPE365, and DDPE450); and age at first calving (DDIPP); (2) to identify, through multivariate approaches, groups of animals whose EPDs show similar patterns, and to determine which variables most influence the division into groups, in order to support decision making in beef cattle production systems with a view to maximizing productivity. The multivariate procedures of cluster analysis and principal component analysis were applied to the EPDs of seven traits (DMPP120, DMPP210, DDPP365, DDPP450, DDPE365, DDPE450, and DDIPP). The analyses were performed with the Statistica software (STATSOFT, 2004). The genetic trends of the EPDs related to the fertility traits DDPE365, DDPE450, and DDIPP showed genetic progress of 0.051 cm, 0.061 cm, and -0.026 month per year, respectively, while DDPP450 was the trait with the highest genetic gain among the growth EPDs, 1.467 kg/year. Regarding the multivariate approaches, k-means cluster analysis was applied and the solution with three groups was the best obtained; two of these groups stood out for their mean EPD values. The importance of these two groups of sires was confirmed by principal component analysis, which associated them with higher direct EPDs for weight and scrotal circumference. The first two principal components retained 70.22% of the original variability. Genetic progress was observed in the RP sires for all traits during the period studied, indicating that the selection strategy has been effective and highlighting the contribution of the RP sires to improving the reproductive and growth traits of the Nelore breed. This study demonstrates the classificatory and discriminatory power of cluster analysis and principal component analysis, which can contribute greatly to the classification of sires and facilitate the selection of animals in genetic improvement programs.

29

Žambochová, Marta. "Shluková analýza rozsáhlých souborů dat: nové postupy založené na metodě k-průměrů." Doctoral thesis, Vysoká škola ekonomická v Praze, 2005. http://www.nusl.cz/ntk/nusl-77061.

Abstract:

Cluster analysis has become one of the main tools used in extracting knowledge from data, known as data mining. In this area of data analysis, data of large dimensions are often processed, both in the number of objects and in the number of variables that characterize the objects. Many methods for data clustering have been developed. One of the most widely used is the k-means method, which is suitable for clustering data sets containing a large number of objects. It is based on finding the best clustering relative to an initial distribution of objects into clusters, followed by step-by-step redistribution of objects among the clusters according to an optimization function. The aim of this Ph.D. thesis was to compare selected variants of existing k-means methods, characterize their strengths and weaknesses in detail, propose new alternatives of the method, and compare them experimentally with existing approaches. These objectives were met. The work focuses on modifications of the k-means method for clustering large numbers of objects, specifically the BIRCH k-means, filtering, k-means++, and two-phase algorithms. I examined the time complexity of the algorithms, the effect of the initial distribution and of outliers, and the validity of the resulting clusters. Two real data files and several generated data sets were used. The common and differing features of the methods under investigation are summarized at the end of the work. The main aim and contribution of the work is to devise my own modifications that address the bottlenecks of the basic procedure and of the existing variants, and to program and verify them. Some modifications accelerated processing. Applying the main ideas of the k-means++ algorithm to other variants of the k-means method improved their clustering results. The most significant of the proposed changes is a modification of the filtering algorithm that gives it an entirely new capability, the detection of outliers. An accompanying CD is enclosed. It includes the source code of programs written in the MATLAB development environment. The programs were created specifically for this work and are intended for experimental use. The CD also contains the data files used in the various experiments.
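The k-means++ idea mentioned in this abstract concerns how the initial centers are chosen. Below is a minimal standard-library sketch of the usual seeding rule, in which each new center is sampled with probability proportional to its squared distance from the nearest center already chosen. It is written in Python for illustration, whereas the thesis programs are in MATLAB. The name `kmeans_pp_init` and the toy data are assumptions.

```python
import math
import random

def kmeans_pp_init(points, k, rng=None):
    """Choose k initial centers using k-means++ (squared-distance) seeding.

    Assumes the data contain at least k distinct points; otherwise all
    sampling weights become zero and random.choices raises ValueError.
    """
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # Next center: sampled with probability proportional to d2.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

# Hypothetical data: two groups of coincident points.
data = [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0), (10.0, 10.0)]
print(kmeans_pp_init(data, 2, random.Random(0)))
```

Because points that coincide with an existing center get zero weight, the second center in this toy example always comes from the other group. On real data the weighting only makes distant points more likely, not certain.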

30

Chen, Na. "How Do Socio-Demographics and The Built Environment Affect Individual Accessibility Based on Activity Space as A Transport Exclusion Indicator?" The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1467329535.

31

Rychnovský, Martin. "Získávání znalostí na webu - shlukování." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235960.

Abstract:

This work presents the topic of data mining on the web, with a focus on clustering. The aim of the project was to study the field of clustering and to implement clustering with the k-means algorithm. The algorithm was then tested on a data set of text documents and on data extracted from the web. The clustering method was implemented using Java technologies.
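As an illustration of the k-means algorithm named in this abstract, here is a minimal sketch of the standard Lloyd iteration (assign each point to its nearest center, then move each center to the mean of its points). It is written in Python with the standard library only, whereas the thesis implementation is in Java. It works on numeric vectors, so text documents would first need a vector representation. The function name and toy data are assumptions.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)),
                      key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else old  # keep an empty cluster's center in place
            )
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, labels

# Hypothetical data: two groups, with one initial center taken from each.
pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
final_centers, final_labels = kmeans(pts, [(0.0, 0.0), (10.0, 0.0)])
print(final_centers, final_labels)
```

The result depends on the initial centers, which is why seeding strategies and repeated runs are common in practice.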

32

Assirati, Lucas. "Análise da influência da vizinhança no comportamento individual relativo a viagens através de dados em painel." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/18/18144/tde-26112018-171143/.

Abstract:

Individual travel behavior is influenced by individual factors and by the urban environment. Neighborhood is therefore one of the variables to be considered in behavioral analysis of urban trips. The main objective of this work is to analyze the influence of neighborhood on individual travel behavior using panel data. Panel data are an important tool in behavioral analyses of urban travel, since they provide more information than cross-sectional data. Travel patterns are more evident in panel data, which characterize habitual routines of activities and trips and better identify atypical behavior. However, obtaining such data is usually not trivial and requires money and time. A secondary objective of this work is to present a practical and inexpensive way of obtaining panel data through smartphones. These data are then applied to classify individuals according to travel behavior. The potential of the proposal is validated through a case study of university students (undergraduate and PhD) in São Carlos, SP, Brazil. Using the panel data provided by the students, the k-means algorithm was applied to four trip-related variables. The three resulting categories have spatial structure and therefore allow exploratory and confirmatory spatial analyses aimed at understanding neighborhood influences on daily dynamics. This work attests to the existence of spatial autocorrelation in the data set through two indicators: Moran and SivarG (Global Spatial Indicator Based on Variogram). The spatial dependence indicated by the global indicators is confirmed by two discrete choice models. The first contains only variables from the original database. The second is analogous to the first but adds regional covariates obtained through geostatistical concepts. Incorporating the regional covariates makes the model more accurate and increases hit rates in cross-validation.

33

Costa, Kleber Carlos de Oliveira. "Análise de DFA e de agrupamento do perfil de densidade de poços de petróleo." Universidade Federal do Rio Grande do Norte, 2009. http://repositorio.ufrn.br:8080/jspui/handle/123456789/12905.

Abstract:

In recent years, DFA (detrended fluctuation analysis), introduced by Peng, has become established as an important tool capable of detecting long-range autocorrelation in non-stationary time series. The technique has been successfully applied in various areas such as econophysics, biophysics, medicine, physics, and climatology. In this study, we use the DFA technique to obtain the Hurst exponent (H) of the electrical density profile (RHOB) of 53 wells from the Campo Escola de Namorado (Namorado School Field). We want to know whether H can be used to characterize the field spatially. Two hypotheses arise. In the first, the set of H values reflects the local geology, with geographically closer wells showing similar H, and H can then be used in geostatistical procedures. In the second, each well has its own H, the H values of the wells are uncorrelated, and the profiles show only random fluctuations in H that reveal no spatial structure. Cluster analysis is a widely used method in statistical analysis. In this work we use the non-hierarchical k-means method. To verify whether a data set grouped by the k-means method, or grouped at random, forms spatial patterns, we create a neighborhood index. High values of the index indicate more aggregated data; low values indicate dispersed data or data without spatial correlation. Using this index together with the Monte Carlo method, we verify that randomly clustered data show a lower distribution of the index than the actual data clustered by k-means. We thus conclude that the H values obtained from the 53 wells are grouped and can be used to characterize spatial patterns. Contour-line analysis confirmed the k-means results.

34

Girish, Deeptha S. "Thresholded K-means Algorithm for Image Segmentation." University of Cincinnati / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1479815784173769.

35

Nordqvist, My. "Classify part of day and snow on the load of timber stacks : A comparative study between partitional clustering and competitive learning." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-42238.

Abstract:

In today's society, companies are trying to find ways to utilize all the data they have, which contains valuable information and insights for making better decisions. This includes data used to keep track of timber that flows between forest and industry. The growth of Artificial Intelligence (AI) and Machine Learning (ML) has enabled the development of ML models that automate the measurement of timber on timber trucks, based on images. However, to improve the results, information must be obtained from unlabeled images in order to determine weather and lighting conditions. The objective of this study is to perform an extensive investigation of classifying unlabeled images into the categories daylight, darkness, and snow on the load. A comparative study between partitional clustering and competitive learning is conducted to investigate which method gives the best results in terms of different clustering performance metrics. It also examines how dimensionality reduction affects the outcome. The algorithms K-means and Kohonen Self-Organizing Map (SOM) are selected for the clustering. Each model is investigated according to the number of clusters, size of dataset, clustering time, clustering performance, and manual samples from each cluster. The results indicate a noticeable discrepancy in clustering performance between the algorithms with respect to the number of clusters, dataset size, and manual samples. The use of dimensionality reduction led to shorter clustering time but slightly worse clustering performance. The evaluation results further show that the clustering time of Kohonen SOM is significantly higher than that of K-means.

36

Kurin, Erik, and Adam Melin. "Data-driven test automation : augmenting GUI testing in a web application." Thesis, Linköpings universitet, Programvara och system, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-96380.

Abstract:

For many companies today, it is highly valuable to collect and analyse data in order to support decision making and functions of various sorts. However, this kind of data-driven approach is seldom applied to software testing, and there is often a lack of verification that the testing performed is relevant to how the system under test is used. Therefore, the aim of this thesis is to investigate the possibility of introducing a data-driven approach to test automation by extracting user behaviour data and curating it to form input for testing. A prestudy was initially conducted in order to collect and assess different data sources for augmenting the testing. After suitable data sources were identified, the required data, including data about user activity in the system, was extracted. This data was then processed, and three prototypes were built on top of it. The first prototype augments the model-based testing by automatically creating models of the most common user behaviour using data mining algorithms. The second prototype tests the most frequently occurring client actions. The last prototype visualises which features of the system are not covered by automated regression testing. The data extracted and analysed in this thesis facilitates the understanding of the behaviour of the users in the system under test. The three prototypes implemented with this data as their foundation can be used to assist other testing methods by visualising test coverage and executing regression tests.

37

Foster, Robert L. Jr. "Motion tracking using feature point clusters." Thesis, Manhattan, Kan. : Kansas State University, 2008. http://hdl.handle.net/2097/1118.

38

Mao, Qian. "Clusters Identification: Asymmetrical Case." Thesis, Uppsala universitet, Informationssystem, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-208328.

Abstract:

Cluster analysis is one of the typical tasks in data mining. It groups data objects based only on information found in the data that describes the objects and their relationships. The purpose of this thesis is to verify a modified K-means algorithm in asymmetrical cases, which can be regarded as an extension of the research of Vladislav Valkovsky and Mikael Karlsson at the Department of Informatics and Media. In this thesis, an experiment is designed and implemented to identify clusters with the modified algorithm in asymmetrical cases. The Java application developed for the experiment builds on knowledge established in previous research. The development procedures are described, and the input parameters are discussed along with the analysis. The experiment consists of several test suites, each of which simulates a real-world situation, and the test results are displayed graphically. The findings mainly emphasize the limitations of the algorithm, and future work to investigate the algorithm further is suggested.

39

Narreddy, Naga Sambu Reddy, and Tuğrul Durgun. "Clusters (k) Identification without Triangle Inequality : A newly modelled theory." Thesis, Uppsala universitet, Institutionen för informatik och media, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-183608.

Abstract:

Cluster analysis organizes data that are sufficiently similar into meaningful and useful groups (clusters). For example, cluster analysis can be applied to find groups of genes and proteins that are similar, to retrieve information from the World Wide Web, and to identify locations that are prone to earthquakes. The study of clustering has therefore become very important in several fields, including psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning, and data mining [1] [2]. Cluster analysis is one of the most widely used techniques in data mining. Depending on the complexity and amount of data in a system, a variety of cluster analysis algorithms can be used. K-means clustering is one of the most popular and widely used of the top ten data mining algorithms [3]. Like other clustering algorithms, it is not a silver bullet. K-means clustering requires prior analysis and knowledge before the number of clusters and their centroids are determined. Recent studies show a new approach to K-means clustering that does not require any prior knowledge to determine the number of clusters [4]. In this thesis, we propose a new clustering procedure to solve the central problem of identifying the number of clusters (k) by imitating the desired number of clusters with proper properties. The proposed algorithm is validated by investigating different characteristics of the analyzed data with the modified theory and by analyzing the efficiency of the parameters and their relationships. The parameters in this theory include the embryo size (m), significance level (α), distributions (d), and training set (n) used in identifying the clusters (k).

40

Jurásek, Petr. "Shlukování proteinových sekvencí na základě podobnosti primární struktury." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236761.

Abstract:

This master's thesis considers the clustering of protein sequences based on the primary structure of proteins. It studies protein sequences in terms of their primary structure and describes methods for finding similarities in the amino acid sequences of proteins, cluster analysis, and clustering algorithms. The thesis presents the concept of a distance function based on the similarity of protein sequences and implements the clustering algorithms AGNES, k-means, and k-medoids in the Python programming language.

41

Reizer, Gabriella v. "Stability Selection of the Number of Clusters." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/math_theses/98.

Abstract:

Selecting the number of clusters is one of the greatest challenges in clustering analysis. In this thesis, we propose a variety of stability selection criteria based on cross validation for determining the number of clusters. Clustering stability measures the agreement of clusterings obtained by applying the same clustering algorithm on multiple independent and identically distributed samples. We propose to measure the clustering stability by the correlation between two clustering functions. These criteria are motivated by the concept of clustering instability proposed by Wang (2010), which is based on a form of clustering distance. In addition, the effectiveness and robustness of the proposed methods are numerically demonstrated on a variety of simulated and real world samples.

42

Málik, Peter. "Získávání znalostí z multimediálních databází" [Knowledge discovery in multimedia databases]. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-235525.

Abstract:

This master's thesis deals with knowledge discovery in multimedia databases. It covers the general principles of knowledge discovery in databases and, in particular, describes methods of cluster analysis used for data mining in large and multidimensional databases. The next chapter introduces multimedia databases, focusing on the extraction of low-level features from images and video data. The practical part is an implementation of the BIRCH, DBSCAN, and k-means methods for cluster analysis. The final part is dedicated to experiments on the TRECVid 2008 dataset and a description of the results achieved.

43

Inano, Rika. "Voxel-based clustered imaging by multiparameter diffusion tensor images for glioma grading." Kyoto University, 2016. http://hdl.handle.net/2433/215442.

44

Hudson, Cody Landon. "Protein structure analysis and prediction utilizing the Fuzzy Greedy K-means Decision Forest model and Hierarchically-Clustered Hidden Markov Models method." Thesis, University of Central Arkansas, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=1549796.

Abstract:

Structural genomics is a field of study that strives to derive and analyze the structural characteristics of proteins through means of experimentation and prediction using software and other automatic processes. Alongside implications for more effective drug design, the main motivation for structural genomics concerns the elucidation of each protein’s function, given that the structure of a protein almost completely governs its function. Historically, the approach to derive the structure of a protein has been through exceedingly expensive, complex, and time-consuming methods such as x-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy.

In response to the inadequacies of these methods, three families of approaches developed in a relatively new branch of computer science known as bioinformatics. The aforementioned families include threading, homology-modeling, and the de novo approach. However, even these methods fail either due to impracticalities, the inability to produce novel folds, rampant complexity, inherent limitations, etc. In their stead, this work proposes the Fuzzy Greedy K-means Decision Forest model, which utilizes sequence motifs that transcend protein family boundaries to predict local tertiary structure, such that the method is cheap, effective, and can produce semi-novel folds due to its local (rather than global) prediction mechanism. This work further extends the FGK-DF model with a new algorithm, the Hierarchically Clustered-Hidden Markov Models (HC-HMM) method to extract protein primary sequence motifs in a more accurate and adequate manner than currently exhibited by the FGK-DF model, allowing for more accurate and powerful local tertiary structure predictions. Both algorithms are critically examined, their methodology thoroughly explained and tested against a consistent data set, the results thereof discussed at length.

45

AMORIM, Alcides Leite de. "Utilização de técnicas de classificação automática para definir bacias hidrográficas homogêneas em termos da pluviometria e fluviometria" [Use of automatic classification techniques to define hydrographic basins that are homogeneous in terms of rainfall and streamflow]. Universidade Federal de Campina Grande, 1990. http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/2170.

Abstract:

The present thesis constitutes a study of the north-east region of Brazil, with the objective of defining groupings of raingauge stations and flow measuring stations that have similar characteristics in terms of sets of rainfall and streamflow variables. Techniques of automatic classification were applied to a set of variables obtained by combining the reference periods (year, semester, and trimester) with the parameters of the station data (arithmetic mean, standard deviation, coefficient of variation, and skewness coefficient) and the maximum value. The rainfall data cover 400 raingauge stations between 1937 and 1973, with thirty (30) years of records and an overlap of at least 5 years between the stations. The flow data come from 97 flow measuring stations with at least eight years of records, beginning in the 1960s or 1970s or ending in the 1970s or 1980s. The "Quick Cluster" and "K-Means" methods (non-hierarchical classification techniques) were applied to the sets of precipitation variables, and the Ward, Single Linkage, Complete Linkage, and Centroid methods (hierarchical techniques) were applied to the sets of flow variables. The applicability of each of these methods is also discussed. The results of this work, illustrated in maps, are useful for filling gaps in records, generating data, determining the regional probability curve, determining a deterministic rainfall-runoff model, etc.

46

Klus, Roman. "Analýza velkých dat v kontextu optimalizace mobilních sítí" [Big data analysis in the context of mobile network optimization]. Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-400542.

Abstract:

This thesis deals with big data technologies in the context of measuring network parameters. It describes the topic of big data and its uses, and presents basic network parameters, their measurement, and methods of evaluation. It evaluates the RTR NetTest application, its testing procedure, and the measured parameters. A set of tools was created for assessing the basic quantitative parameters of a mobile network based on data from the RTR database. An analysis of the daily effect summarizes the variability of the network over time. Spatial behavior is assessed by binning and cluster analysis, together with a comparison of controlled testing and crowdsourcing.

47

Janse, Van Vuuren Michaella. "Human Pose and Action Recognition using Negative Space Analysis." Diss., University of Cape Town, 2004. http://hdl.handle.net/10919/71571.

Abstract:

This thesis proposes a novel approach to extracting pose information from image sequences. Current state-of-the-art techniques focus exclusively on the image space occupied by the body for pose and action recognition. The method proposed here, however, focuses on the negative spaces: the areas surrounding the individual. This has resulted in the colour-coded negative space approach, an image preprocessing step that circumvents the need for complicated model fitting or template matching methods. The approach can be described as follows: negative spaces surrounding the human silhouette are extracted using horizontal and vertical scanning processes. These negative space areas are more numerous, and undergo more radical changes in shape, than the single area occupied by the figure of the person performing an action. The colour-coded negative space representation is formed using the four binary images produced by the scanning processes. Features are then extracted from the colour-coded images. These are based on the percentage of area occupied by distinct coloured regions as well as the bounding box proportions. Pose clusters are identified using feedback from an independent action set. Subsequent images are classified using a simple Euclidean distance measure. An image sequence is thus temporally segmented into its corresponding pose representations. Action recognition simply becomes the detection of a temporally ordered sequence of poses that characterises the action. The method is purely vision-based, utilising monocular images with no need for body markers or special clothing. Two datasets were constructed using several actors performing different poses and actions. Some of these actions included actors waving their arms, sitting down or kicking a leg. These actions were recorded against a monochrome background to simplify the segmentation of the actors from the background. The actions were then recorded on a DV camera and digitised into a database.
The silhouette images from these actions were isolated and placed in a frame or bounding box. The next step was to highlight the negative spaces using a directional scanning method. This scanning method colour-codes the negative spaces of each action. What became immediately apparent is that very distinctive colour patterns formed for different actions. To emphasise the action, different colours were allocated to the negative spaces surrounding the image. For example, the space between the legs of an actor standing in a T-pose with legs apart would be allocated yellow, while the spaces below the arms would be allocated different shades of green. The space surrounding the head would be different shades of purple. During an action when the actor moves one leg up in a kicking fashion, the yellow colour would increase. Conversely, when the actor closes his legs and puts them together, the yellow colour filling the negative space would decrease substantially. What also became apparent is that these coloured negative spaces are interdependent and that they influence each other during the course of an action. For example, when an actor lifts one of his legs, increasing the yellow-coded negative space, the green space between that leg and the arm decreases. This interrelationship between colours holds true for all poses and actions presented in this thesis. In terms of pose recognition, it is significant that these colour-coded negative spaces, and the way they change during an action or a movement, are substantial and instantly recognisable. Compare, for example, looking at someone lifting an arm with seeing a vast negative space changing shape. In a controlled research environment, several actors were instructed to perform a number of different actions. After colour-coding the negative spaces, it became apparent that every action can be recognised by a unique colour-coded pattern.
The challenge is to ascribe a numerical representation, a mathematical quantification, that extracts the essence of what is so visually apparent. The essence of pose recognition and its measurability lies in the relationship between the colours in these negative spaces and how they affect each other during a pose or an action. The simplest way of measuring this relationship is by calculating the percentage of each colour present during an action. These calculated percentages become the basis of pose and action recognition. Plotting these percentages on a graph confirms that the essence of these different actions and poses can in fact be captured and recognised. Despite variations in these traces caused by time differences, personal appearance and mannerisms, what emerged is a clear, recognisable pattern that can be matched to an action or to different parts of an action. Actors might lift their left leg, some slightly higher than others, some slower than others, and these variations in terms of colour percentages would be recorded as a trace, but there would be very specific stages during the action where the traces would correspond, making the action recognisable. In conclusion, using negative space as a tool in human pose and tracking recognition presents an exciting research avenue because it is influenced less by variations such as differences in personal appearance and changes in the angle of observation. This approach is also simple and does not rely on complicated models and templates.
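The colour-percentage feature and the Euclidean-distance classification described in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the colour list, and the pose centres are hypothetical, each frame is assumed to be already reduced to a flat list of colour labels, and the bounding box proportions mentioned in the abstract are left out.

```python
import math
from collections import Counter


def colour_fractions(pixels, colours):
    """Share of each colour label among the negative-space pixels (0 to 1)."""
    if not pixels:
        raise ValueError("no pixels given")
    counts = Counter(pixels)
    return [counts[c] / len(pixels) for c in colours]


def nearest_pose(features, pose_centres):
    """Name of the pose whose centre is closest in Euclidean distance."""
    return min(pose_centres,
               key=lambda pose: math.dist(features, pose_centres[pose]))


# Hypothetical colour coding and pose centres, for illustration only.
COLOURS = ["yellow", "green", "purple"]
POSE_CENTRES = {
    "standing": [0.2, 0.5, 0.3],
    "kicking": [0.6, 0.2, 0.2],
}

# A frame whose negative space is half yellow (space between the legs).
frame = ["yellow"] * 5 + ["green"] * 3 + ["purple"] * 2
features = colour_fractions(frame, COLOURS)
pose = nearest_pose(features, POSE_CENTRES)
```

Classifying every frame of a sequence this way yields a sequence of pose labels, which is the form of temporal segmentation the abstract describes as the input to action recognition.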

48

"ASA and homogeneity hypothesis using K-means cluster analysis." ROOSEVELT UNIVERSITY, 2008. http://pqdtopen.proquest.com/#viewpdf?dispub=1450566.

49

Lin, You-Shin, and 林佑信. "Divisive K-Means Clustering Algorithm for Determining k and Positions of Cluster Centers." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/90292399455881263618.

Abstract:

Master's<br>National Chiao Tung University<br>Institute of Communications Engineering<br>97<br>Clustering is a well-known research topic that is applied widely in many fields. Among clustering algorithms, k-means is one of the most popular, simple, and fast. However, there are two major problems in applying the k-means algorithm. First, the right value of k is usually unknown for a real data set. Second, it is difficult to select initial cluster centers effectively, and the clustering result is sensitive to the initial cluster centers. To solve these two problems, we propose a new algorithm that extends the standard k-means algorithm by introducing a conflict term into the objective function, making the clustering process insensitive to the initial cluster centers. Combined with a cluster validation technique, the algorithm can determine the optimal k and the positions of the cluster centers. Simulation results on synthetic data sets show the effectiveness of the proposed algorithm in determining the number and positions of the cluster centers.

50

賴仁傑. "Generalized Fuzzy k-Means Clustering Using m Nearest Cluster Centers." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/19302324043911615857.
