
Dissertations / Theses on the topic 'Statistical cluster analysis'


Consult the top 50 dissertations / theses for your research on the topic 'Statistical cluster analysis.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses across a wide variety of disciplines and organise your bibliography correctly.

1

Sullivan, Terry. "The Cluster Hypothesis: A Visual/Statistical Analysis." Thesis, University of North Texas, 2000. https://digital.library.unt.edu/ark:/67531/metadc2444/.

Full text
Abstract:
By allowing judgments based on a small number of exemplar documents to be applied to a larger number of unexamined documents, clustered presentation of search results represents an intuitively attractive possibility for reducing the cognitive resource demands on human users of information retrieval systems. However, clustered presentation of search results is sensible only to the extent that naturally occurring similarity relationships among documents correspond to topically coherent clusters. The Cluster Hypothesis posits just such a systematic relationship between document similarity and topical relevance. To date, experimental validation of the Cluster Hypothesis has proved problematic, with collection-specific results both supporting and failing to support this fundamental theoretical postulate. The present study consists of two computational information visualization experiments, representing a two-tiered test of the Cluster Hypothesis under adverse conditions. Both experiments rely on multidimensionally scaled representations of interdocument similarity matrices. Experiment 1 is a term-reduction condition, in which descriptive titles are extracted from Associated Press news stories drawn from the TREC information retrieval test collection. The clustering behavior of these titles is compared to the behavior of the corresponding full text via statistical analysis of the visual characteristics of a two-dimensional similarity map. Experiment 2 is a dimensionality reduction condition, in which inter-item similarity coefficients for full text documents are scaled into a single dimension and then rendered as a two-dimensional visualization; the clustering behavior of relevant documents within these unidimensionally scaled representations is examined via visual and statistical methods. Taken as a whole, results of both experiments lend strong though not unqualified support to the Cluster Hypothesis. 
In Experiment 1, semantically meaningful 6.6-word document surrogates systematically conform to the predictions of the Cluster Hypothesis. In Experiment 2, the majority of the unidimensionally scaled datasets exhibit a marked nonuniformity of distribution of relevant documents, further supporting the Cluster Hypothesis. Results of the two experiments are profoundly question-specific. Post hoc analyses suggest that it may be possible to predict the success of clustered searching based on the lexical characteristics of users' natural-language expression of their information need.
APA, Harvard, Vancouver, ISO, and other styles
2

Santiago, Calderón José Bayoán. "On Cluster Robust Models." Scholarship @ Claremont, 2019. https://scholarship.claremont.edu/cgu_etd/132.

Full text
Abstract:
Cluster robust models are a class of statistical models that estimate parameters while accounting for potential heterogeneity in treatment effects. Absent heterogeneity in treatment effects, the partial and average treatment effects are the same. When heterogeneity in treatment effects occurs, the average treatment effect is a function of the various partial treatment effects and the composition of the population of interest. The first chapter explores the performance of common estimators as a function of the presence of heterogeneity in treatment effects and other characteristics that may influence their performance in estimating average treatment effects. The second chapter examines various approaches to evaluating and improving cluster structures as a way to obtain cluster-robust models. Both chapters are intended to be useful to practitioners as a how-to guide for examining and thinking about their applications and relevant factors. Empirical examples are provided to illustrate theoretical results, showcase potential tools, and communicate a suggested thought process. The third chapter relates to an open-source statistical software package for the Julia language. The content includes a description of the software's functionality and technical elements. In addition, it features a critique of, and suggestions for, statistical software development and the Julia ecosystem. These comments come from my experience throughout the development process of the package and related activities as an open-source and professional software developer. One goal of the paper is to make econometrics more accessible, not only through access to functionality, but also through understanding of the code, the mathematics, and transparency in implementations.
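The distinction the abstract draws between partial and average treatment effects can be sketched numerically. The toy example below (names and numbers are my own, not from the dissertation) shows the average treatment effect as a composition-weighted mix of group-level effects:

```python
# Hypothetical illustration: with heterogeneous treatment effects,
# the average treatment effect (ATE) depends on both the group-level
# (partial) effects and the composition of the population.
def average_treatment_effect(partial_effects, shares):
    """Weight each group's partial effect by its population share."""
    assert abs(sum(shares) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(e * w for e, w in zip(partial_effects, shares))

# Two subgroups with different effects: the ATE shifts with composition.
balanced = average_treatment_effect([2.0, 0.5], [0.5, 0.5])
skewed = average_treatment_effect([2.0, 0.5], [0.2, 0.8])
```

Absent heterogeneity (equal partial effects), any composition yields the same ATE, which matches the abstract's opening observation.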
3

Hager, Creighton Tsuan-Ren. "Statistical Analysis of ATM Call Detail Records." Thesis, Virginia Tech, 1999. http://hdl.handle.net/10919/30937.

Full text
Abstract:
Network management is a problem that faces designers and operators of any type of network. Conventional methods of capacity planning or configuration management are difficult to apply directly to networks that dynamically allocate resources, such as Asynchronous Transfer Mode (ATM) networks and emerging Internet Protocol (IP) networks employing Differentiated Services (DiffServ). This work shows a method to generically classify traffic in an ATM network such that capacity planning may be possible. These methods are generally applicable to other networks that support dynamically allocated resources. In this research, Call Detail Records (CDRs) captured from a 'live' ATM network were successfully classified into three traffic categories. The traffic categories correspond to three different video speeds (1152 kbps, 768 kbps, and 384 kbps) in the network. Further statistical analysis was used to characterize these traffic categories, which were found to fit deterministic distributions. The statistical analysis methods were also applied to several different network planning and management functions. Three specific potential applications related to network management were examined: capacity planning, traffic modeling, and configuration management.
Master of Science
4

Vohra, Neeru Rani. "Three dimensional statistical graphs, visual cues and clustering." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/MQ56213.pdf.

Full text
5

Chung, Hyoju. "GEE with large cluster sizes: high-dimensional working correlation models." Thesis, Connect to this title online; UW restricted, 2006. http://hdl.handle.net/1773/9545.

Full text
6

Farahi, Arya, August E. Evrard, Eduardo Rozo, Eli S. Rykoff, and Risa H. Wechsler. "Galaxy cluster mass estimation from stacked spectroscopic analysis." OXFORD UNIV PRESS, 2016. http://hdl.handle.net/10150/621426.

Full text
Abstract:
We use simulated galaxy surveys to study: (i) how galaxy membership in redMaPPer clusters maps to the underlying halo population, and (ii) the accuracy of a mean dynamical cluster mass, M_σ(λ), derived from stacked pairwise spectroscopy of clusters with richness λ. Using ~130 000 galaxy pairs patterned after the Sloan Digital Sky Survey (SDSS) redMaPPer cluster sample study of Rozo et al., we show that the pairwise velocity probability density function of central-satellite pairs with m_i < 19 in the simulation matches the form seen in Rozo et al. Through joint membership matching, we deconstruct the main Gaussian velocity component into its halo contributions, finding that the top-ranked halo contributes ~60 per cent of the stacked signal. The halo mass scale inferred by applying the virial scaling of Evrard et al. to the velocity normalization matches, to within a few per cent, the log-mean halo mass derived through galaxy membership matching. We apply this approach, along with miscentring and galaxy velocity bias corrections, to estimate the log-mean matched halo mass at z = 0.2 of SDSS redMaPPer clusters. Employing the velocity bias constraints of Guo et al., we find ⟨ln(M_200c) | λ⟩ = ln(M_30) + α_m ln(λ/30), with M_30 = (1.56 ± 0.35) × 10^14 M_⊙ and α_m = 1.31 ± 0.06 (stat) ± 0.13 (sys). Systematic uncertainty in the velocity bias of satellite galaxies overwhelmingly dominates the error budget.
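As a quick numerical check, the reported richness-mass scaling can be evaluated directly. This sketch plugs in the abstract's central values only (variable and function names are my own):

```python
# Evaluate the scaling relation reported in the abstract:
#   <ln M_200c | lambda> = ln(M_30) + alpha_m * ln(lambda / 30)
# which exponentiates to M(lambda) = M_30 * (lambda / 30) ** alpha_m.
M_30 = 1.56e14      # central value, in solar masses
ALPHA_M = 1.31      # central value of the slope

def log_mean_mass(richness):
    """Log-mean halo mass implied by the scaling at a given richness."""
    return M_30 * (richness / 30.0) ** ALPHA_M

# At the pivot richness lambda = 30 the relation returns M_30 itself;
# doubling the richness raises the mass by a factor of 2**1.31 (about 2.48).
```

The quoted statistical and systematic uncertainties on M_30 and α_m are not propagated here; this only traces the central curve.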
7

Gomes, Manuel. "Statistical methods for cost-effectiveness analysis that use cluster-randomised trials." Thesis, London School of Hygiene and Tropical Medicine (University of London), 2012. http://researchonline.lshtm.ac.uk/4646546/.

Full text
Abstract:
This thesis considers alternative statistical methods for cost-effectiveness analysis (CEA) that use cluster randomised trials (CRTs). The thesis has four objectives: firstly, to develop criteria for identifying appropriate methods for CEA that use CRTs; secondly, to critically appraise the methods used in applied CEAs that use CRTs; thirdly, to assess the performance of alternative methods for CEA that use CRTs in settings where baseline covariates are balanced; fourthly, to compare statistical methods that adjust for systematic covariate imbalance in CEA that use CRTs. The thesis developed a checklist to assess the methodological quality of published CEAs that use CRTs. This checklist was informed by a conceptual review of statistical methods, and applied in a systematic literature review of published CEAs that use CRTs. The review found that most studies adopted statistical methods that ignored clustering or correlation between costs and health outcomes. A simulation study was conducted to assess the performance of alternative methods for CEA that use CRTs across different circumstances where baseline covariates are balanced. This study considered: seemingly unrelated regression (SUR) and generalised estimating equations (GEEs), both with a robust standard error; multilevel models (MLMs); and a non-parametric 'two-stage' bootstrap (TSB). Performance was reported as, for example, bias and confidence interval (CI) coverage of the incremental net benefit. The MLMs and the TSB performed well across all settings; SUR and GEEs reported poor CI coverage in CRTs with few clusters. The thesis compared methods for CEA that use CRTs when there are systematic differences in baseline covariates between the treatment groups. In a case study and further simulations, the thesis considered SUR, MLMs, and the TSB combined with SUR to adjust for covariate imbalance. The case study showed that cost-effectiveness results can differ according to adjustment method. The simulations reported that MLMs performed well across all settings and, unlike the other methods, provided CI coverage close to nominal levels, even with few clusters and unequal cluster sizes. The thesis concludes that MLMs are the most appropriate method across the circumstances considered. This thesis presents methods for improving the quality of CEA that use CRTs, to help future studies provide a sound basis for policy making.
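The outcome measure named in the abstract, the incremental net benefit, has a standard definition that can be sketched as follows (the function name and the numbers are hypothetical, not taken from the thesis):

```python
# Incremental net benefit (INB) at a willingness-to-pay threshold:
#   INB = threshold * (incremental effect) - (incremental cost).
# A positive INB favours the new treatment at that threshold.
def incremental_net_benefit(delta_effect, delta_cost, wtp):
    """INB for a given incremental effect, incremental cost and threshold."""
    return wtp * delta_effect - delta_cost

# Hypothetical trial result: 0.1 extra QALYs for 300 in extra cost,
# evaluated at a threshold of 20 000 per QALY.
inb = incremental_net_benefit(delta_effect=0.1, delta_cost=300.0, wtp=20000.0)
```

In a CRT, the thesis's point is that the uncertainty around such an INB must reflect clustering; the methods it compares (SUR, GEEs, MLMs, TSB) differ chiefly in how they estimate that uncertainty.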
8

Majeed, Salar Mustafa. "Cluster detection and analysis with geo-spatial datasets using a hybrid statistical and neural networks hierarchical approach." Thesis, University of South Wales, 2010. https://pure.southwales.ac.uk/en/studentthesis/cluster-detection-and-analysis-with-geospatial-datasets-using-a-hybrid-statistical-and-neural-networks-hierarchical-approach(c57662b9-b685-4cfb-bd04-33e6e1655758).html.

Full text
Abstract:
Spatial datasets contain information relating to the locations of incidents of phenomena for example, crime and disease. Areas that contain a higher than expected incidence of the phenomena, given background population and census datasets, are of particular interest. By analysing the locations of potential influence, it may be possible to establish where a cause and effect relationship is present in the observed process. Cluster detection techniques can be applied to such datasets in order to reveal information relating to the spatial distribution of the cases. Research in these areas has mainly concentrated on either computational or statistical aspects of cluster detection. Each clustering algorithm has its own strengths and weakness. Their main weaknesses causing their unreliability can be estimating the number of clusters, testing the number of components, selecting initial seeds (centroids), running time and memory requirements. Consequently, a new cluster detection methodology has been developed in this thesis based on knowledge drawn from both statistical and computing domains. This methodology is based on a hybrid of statistical methods using properties of probability rather than distance to associate data with clusters. No previous knowledge of the dataset is required and the number of clusters is not predetermined. It performs efficiently in terms of memory requirements, running time and cluster quality. The algorithm for determining both the centre of clusters and the existence of the clusters themselves was applied and tested on simulated and real datasets. The results which were obtained from identification of hotspots were compared with results of other available algorithms such as CLAP (Cluster Location Analysis Procedure), Satscan and GAM (Geographical Analysis Machine). The outputs are very similar. 
The GIS presented in this thesis encompasses the SCS algorithm, statistics and neural networks for developing a hybrid predictive crime model, mapping and visualising crime data and the corresponding population in the study region, and visualising the location of the obtained clusters and the burglary-incidence concentration 'hotspots' specified by the SCS clustering algorithm. Naturally, the quality of the results is subject to the accuracy of the data used. GIS is used in this thesis to develop a methodology for modelling data containing multiple functions. The census data used throughout this construction provided a useful source of geo-demographic information, and the resulting datasets were used for predictive crime modelling. This thesis has drawn on several existing methodologies to develop a hybrid modelling approach, which was applied to real data on the distribution of burglary incidence in the study region. Relevant principles of statistics, Geographical Information Systems, neural networks and the SCS algorithm were utilised in the analysis of the observed data. Regression analysis was used to build a predictive crime model and was combined with neural networks with the aim of developing a new hierarchical neural network approach that generates more reliable predictions. The promising results were compared with a non-hierarchical back-propagation neural network and with multiple regression analysis. The average percentage accuracy achieved by the new methodology at the testing stage increased by 13% compared with the non-hierarchical back-propagation performance. In general, the analysis reveals a number of predictors that increase the risk of burglary in the study region: specifically, living in a 'one person' or 'lone parent' household, and households whose occupants are in elementary or intermediate occupations or unemployed. For the influence of household space, the results indicate that the risk of burglary increases for households living in shared houses.
9

Fiero, Mallorie H. "Statistical Approaches for Handling Missing Data in Cluster Randomized Trials." Diss., The University of Arizona, 2016. http://hdl.handle.net/10150/612860.

Full text
Abstract:
In cluster randomized trials (CRTs), groups of participants are randomized as opposed to individual participants. This design is often chosen to minimize treatment arm contamination or to enhance compliance among participants. In CRTs, we cannot assume independence among individuals within the same cluster because of their similarity, which leads to decreased statistical power compared to individually randomized trials. The intracluster correlation coefficient (ICC) is crucial in the design and analysis of CRTs, and measures the proportion of total variance due to clustering. Missing data is a common problem in CRTs and should be accommodated with appropriate statistical techniques because they can compromise the advantages created by randomization and are a potential source of bias. In three papers, I investigate statistical approaches for handling missing data in CRTs. In the first paper, I carry out a systematic review evaluating current practice of handling missing data in CRTs. The results show high rates of missing data in the majority of CRTs, yet handling of missing data remains suboptimal. Fourteen (16%) of the 86 reviewed trials reported carrying out a sensitivity analysis for missing data. Despite suggestions to weaken the missing data assumption from the primary analysis, only five of the trials weakened the assumption. None of the trials reported using missing not at random (MNAR) models. Due to the low proportion of CRTs reporting an appropriate sensitivity analysis for missing data, the second paper aims to facilitate performing a sensitivity analysis for missing data in CRTs by extending the pattern mixture approach for missing clustered data under the MNAR assumption. I implement multilevel multiple imputation (MI) in order to account for the hierarchical structure found in CRTs, and multiply imputed values by a sensitivity parameter, k, to examine parameters of interest under different missing data assumptions. 
The simulation results show that estimates of parameters of interest in CRTs can vary widely under different missing data assumptions. A high proportion of missing data can occur among CRTs because missing data can be found at the individual level as well as the cluster level. In the third paper, I use a simulation study to compare missing data strategies to handle missing cluster level covariates, including the linear mixed effects model, single imputation, single level MI ignoring clustering, MI incorporating clusters as fixed effects, and MI at the cluster level using aggregated data. The results show that when the ICC is small (ICC ≤ 0.1) and the proportion of missing data is low (≤ 25%), the mixed model generates unbiased estimates of regression coefficients and ICC. When the ICC is higher (ICC > 0.1), MI at the cluster level using aggregated data performs well for missing cluster level covariates, though caution should be taken if the percentage of missing data is high.
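The ICC the abstract centres on is the share of total outcome variance attributable to clustering. A minimal one-way ANOVA estimator, assuming equal cluster sizes (the function name is my own, not from the dissertation), could look like:

```python
from statistics import mean

def icc_anova(clusters):
    """ANOVA estimate of the intracluster correlation coefficient.

    clusters: list of equal-sized lists of individual-level outcomes,
    one inner list per cluster.
    """
    k = len(clusters)               # number of clusters
    n = len(clusters[0])            # individuals per cluster
    grand = mean(x for c in clusters for x in c)
    # between-cluster and within-cluster mean squares
    msb = n * sum((mean(c) - grand) ** 2 for c in clusters) / (k - 1)
    msw = sum((x - mean(c)) ** 2 for c in clusters for x in c) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)

# Outcomes identical within clusters but different across them give an
# ICC of 1; outcomes varying only within clusters give an ICC <= 0.
```

Real CRT analyses estimate the ICC within a mixed model rather than by raw ANOVA, but the variance decomposition is the same idea.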
10

Torp, Emil, and Patrik Önnegren. "Driving Cycle Generation Using Statistical Analysis and Markov Chains." Thesis, Linköpings universitet, Fordonssystem, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-94147.

Full text
Abstract:
A driving cycle is a velocity profile over time. Driving cycles can be used for environmental classification of cars and to evaluate vehicle performance. The benefit of using stochastic driving cycles instead of predefined driving cycles, e.g. the New European Driving Cycle, is, for instance, that the risk of cycle beating is reduced. Different methods to generate stochastic driving cycles based on real-world data have been used around the world, but the representativeness of the generated driving cycles has been difficult to ensure. The possibility of generating stochastic driving cycles that capture specific features from a set of real-world driving cycles is studied. Data from more than 500 real-world trips have been processed and categorised. The driving cycles are merged into several transition probability matrices (TPMs), where each element corresponds to a specific state defined by its velocity and acceleration. The TPMs are used with Markov chain theory to generate stochastic driving cycles. The driving cycles are validated using percentile limits on a set of characteristic variables obtained from statistical analysis of real-world driving cycles. The distribution of the generated driving cycles is investigated and compared to the distribution of real-world driving cycles. The generated driving cycles prove to represent the original set of real-world driving cycles in terms of key variables determined through statistical analysis. Four different methods are used to determine which statistical variables describe the features of the provided driving cycles. Two of the methods use regression analysis. Hierarchical clustering of statistical variables is proposed as a third alternative, and the last method combines the cluster analysis with the regression analysis. The entire process is automated, and a graphical user interface is developed in Matlab to facilitate the use of the software.
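The TPM-plus-Markov-chain generation step the abstract describes can be sketched in a few lines. This is a simplified illustration only: the state quantisation, step sizes and function names are my own assumptions, not the thesis's implementation.

```python
import random
from collections import defaultdict

def build_tpm(cycles, v_step=2.0, a_step=0.5):
    """Estimate a transition probability matrix over quantised
    (velocity, acceleration) states from recorded driving cycles."""
    counts = defaultdict(lambda: defaultdict(int))
    for cycle in cycles:                      # cycle: list of (v, a) samples
        states = [(round(v / v_step), round(a / a_step)) for v, a in cycle]
        for s, t in zip(states, states[1:]):
            counts[s][t] += 1
    return {s: {t: c / sum(ts.values()) for t, c in ts.items()}
            for s, ts in counts.items()}

def generate_cycle(tpm, start, length, v_step=2.0, seed=0):
    """Sample a stochastic velocity profile from the Markov chain."""
    rng = random.Random(seed)
    state, profile = start, []
    for _ in range(length):
        profile.append(state[0] * v_step)     # de-quantise velocity
        nxt = tpm.get(state)
        if not nxt:                           # no observed transitions: stop
            break
        targets, probs = zip(*nxt.items())
        state = rng.choices(targets, weights=probs)[0]
    return profile
```

Validation against percentile limits on characteristic variables, as the thesis does, would then filter the sampled cycles for representativeness.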
11

Holland, Jennifer M. "An Exploration of the Ground Water Quality of the Trinity Aquifer Using Multivariate Statistical Techniques." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc84218/.

Full text
Abstract:
The ground water quality of the Trinity Aquifer for wells sampled between 2000 and 2009 was examined using multivariate and spatial statistical techniques. A Kruskal-Wallis test revealed that all of the water quality parameters, with the exception of nitrate, vary with land use. A Spearman's rho analysis showed that every water quality parameter, with the exception of silica, correlates with well depth. Factor analysis identified four factors attributable to hydrochemical processes, electrical conductivity, alkalinity, and the dissolution of parent rock material into the ground water. The cluster analysis generated seven clusters. A chi-squared analysis shows that Clusters 1, 2, 5, and 6 reflect the distribution of the entire dataset when looking specifically at land use categories. The nearest neighbor analysis revealed clustered, dispersed, and random patterns depending upon the entity being examined. The spatial autocorrelation technique applied to the water quality parameters for the entire dataset identified all of the parameters as spatially random, with the exception of pH, which was found to be spatially clustered. The combination of the multivariate and spatial techniques identified influences on the Trinity Aquifer including hydrochemical processes, agricultural activities, recharge, and land use. In addition, the techniques aided in identifying areas warranting future monitoring, which are located in the western and southwestern parts of the aquifer.
12

Gao, Dexiang. "Analysis of clustered longitudinal count data." Connect to full text via ProQuest. Limited to UCD Anschutz Medical Campus, 2007.

Find full text
Abstract:
Thesis (Ph.D. in Analytic Health Sciences, Department of Preventive Medicine and Biometrics) -- University of Colorado Denver, 2007.
Typescript. Includes bibliographical references (leaves 75-77). Free to UCD affiliates. Online version available via ProQuest Digital Dissertations.
13

Van, Deventer Jacobus Philippus. "The fundamental building blocks of organisational knowledge management - a statistical evaluation." Thesis, University of Pretoria, 2013. http://hdl.handle.net/2263/39924.

Full text
Abstract:
As organisations and managers start to realise the strategic value of knowledge within their organisation, several attempts have been made to implement Knowledge Management (KM) within these organisations. The standard approach, which leads to the failure of KM initiatives, is to view KM as a type of technological implementation while failing to realise that the organisation needs to facilitate a KM-friendly environment. Organisations that have successfully implemented KM within their boundaries, structure and scope have developed unique and organisation-specific KM implementations, making it difficult for the success factors associated with these implementations to be transferred to other organisations. As a result, researchers and authors have attempted to develop an ontological or taxonomical mechanism that would assist in the sharing of knowledge within and across organisational boundaries. Due to the organisational specialisation of these mechanisms, these attempts have for the most part been unsuccessful. This study presents foundational work that can be used within an organisation to develop KM initiatives. By focusing on the language used by KM researchers and KM practitioners working with and practising KM within organisations, the author identified multiple terms and concepts that represent the fundamental building blocks of KM. If these building blocks are applied appropriately between different organisations, they can assist in the development of a KM initiative. The identified fundamental building blocks offer a starting point for the development of a KM initiative. As the study focuses on organisational KM needs, these building blocks may be used to implement a KM initiative that would satisfy an organisation's KM needs. The goal of this study is therefore to identify the fundamental building blocks of KM that, when applied constructively, would assist the KM practitioner in satisfying an organisation's KM needs.
In order to achieve this goal, the research focused on the following objectives (as reflected in the research question, subquestions and chapter division):
- To identify why there is a need for KM within organisations, and how it has been addressed in research, KM initiatives and organisations.
- To clearly delineate the concepts of Knowledge, Management and KM that can be applied in relationship with the process of organisational management.
- To identify organisational KM needs as linked to a generic organisation that is associated with a system interacting with its environment (gaining or losing knowledge due to the system's nature).
- To identify KM's fundamental building blocks associated with the language used by KM researchers and practitioners.
- To represent the identified fundamental KM building blocks that can be applied to a generic organisation to satisfy organisational KM needs.
As a result of the discussion, review and study conducted for this thesis, the author found specific dimensions pertaining to the fundamental building blocks of KM that satisfy organisational needs:
- It was established that there is a clear need for organisational KM in an effort to retain and manage knowledge resources to the benefit of the organisation. This highlighted the need for organisational KM, outlining possible solutions plus concerns found in previous research. It was found that although there is a need for organisational KM, this need has been poorly addressed thus far.
- Based on the discussion and findings in this thesis, it was found that there is a clear distinction between the concepts of Knowledge, Management and KM, and that KM provides support for the day-to-day management processes to which it is aligned. This highlighted the nature of Knowledge, Management and KM by redefining the construct of KM based on core considerations related to the concepts of Knowledge and Management and the critical interaction between the two.
- It was found that, due to the systemic nature of an organisation, knowledge dissipates into the organisational environment, and KM is essential to minimise this effect. Furthermore, organisational KM needs can be satisfied by applying the fundamental building blocks of KM during the implementation of an organisational KM initiative.
- After analysing the lexicon used by KM practitioners, the building blocks of KM were clearly highlighted by comparing patterns presented within the results analysed for this study.
- The final objective highlights and represents the fundamental building blocks of KM that satisfy organisational KM needs as clearly identified from the language used by KM practitioners.
By extending this study to the language used by KM practitioners, as formulated within communities of practice in describing KM, the results of this study link directly not only to what KM theoretically appears to be, but also to how KM is viewed by people who work within the KM and knowledge environment on a day-to-day basis.
Thesis (PhD)--University of Pretoria, 2013.
Informatics
14

Marco, Almagro Lluís. "Statistical methods in Kansei engineering studies." Doctoral thesis, Universitat Politècnica de Catalunya, 2011. http://hdl.handle.net/10803/85059.

Full text
Abstract:
This doctoral thesis deals with Kansei Engineering (KE), a technique for translating the emotions conveyed by products into technical parameters, and with statistical methods that can benefit the discipline. The basic purpose of KE is to discover how certain properties of a product convey particular emotions to its users. It is a quantitative method, and data are typically collected using questionnaires. Conclusions are drawn by analysing the collected data, usually with some form of regression analysis. KE can be placed in the research area of emotional design. The thesis begins by justifying the importance of emotional design. Since the range of techniques used under the name KE is broad and not very well delimited, the thesis proposes a definition of KE that serves to establish its scope. A model for conducting KE studies is then suggested. The model includes the development of the semantic space (the range of emotions the product can convey) and the space of properties (the technical variables that can be modified in the design phase). After data collection, the synthesis stage links the two spaces, discovering how different product properties convey particular emotions. Each step of the model is explained in detail using a KE study carried out for this thesis: the fruit juice experiment. The initial model is progressively improved throughout the thesis, and the experimental data are reanalysed using the new proposals. Many practical concerns arise when studying the above model for KE studies (among others, how many participants are needed and how the data collection session should be run). An extensive literature review has been carried out with the aim of answering these and other questions. The most common applications of KE are also described, together with comments on particularly interesting ideas from various papers.
The literature review also serves to list the tools most commonly used in the synthesis phase. The central part of the thesis focuses precisely on the tools for the synthesis phase. Statistical tools such as quantification theory type I and ordinal logistic regression are studied in detail, and several improvements are proposed. In particular, a new graphical way of representing the results of an ordinal logistic regression is proposed. A machine learning technique, rough sets, is introduced, together with a discussion of its suitability for KE studies. Simulated datasets are used to assess the behaviour of the suggested statistical tools, leading to some recommendations. Regardless of the analysis tools used in the synthesis phase, the conclusions will probably be wrong when the design matrix is not adequate. A method for assessing the suitability of design matrices is proposed, based on two new indicators: an orthogonality index and a confusion index. The commonly neglected role of interactions in KE studies is examined, and a method for including an interaction is proposed, together with a graphical way of representing it. Finally, the last part of the thesis is devoted to the scarcely addressed topic of variability in KE studies. A method (based on cluster analysis) for segmenting participants according to their emotional responses is proposed, along with a way of ranking participants according to their consistency in rating products (using an intraclass correlation coefficient). Since many KE users are not specialists in interpreting numerical output, visual representations of these two new methods are included to ease the processing of the conclusions.
Esta tesis doctoral trata sobre Ingeniería Kansei (IK), una técnica para trasladar emociones transmitidas por productos en parámetros técnicos, y sobre métodos estadísticos que pueden beneficiar la disciplina. El propósito básico de la IK es descubrir de qué manera algunas propiedades de un producto transmiten ciertas emociones a sus usuarios. Es un método cuantitativo, y los datos se recogen típicamente usando cuestionarios. Se extraen conclusiones al analizar los datos recogidos, normalmente usando algún tipo de análisis de regresión.La IK se puede situar en el área de investigación del diseño emocional. La tesis empieza justificando la importancia del diseño emocional. Como que el rango de técnicas usadas bajo el nombre de IK es extenso y no demasiado claro, la tesis propone una definición de IK que sirve para delimitar su alcance. A continuación, se sugiere un modelo para desarrollar estudios de IK. El modelo incluye el desarrollo del espacio semántico – el rango de emociones que el producto puede transmitir – y el espacio de propiedades – las variables técnicas que se pueden modificar en la fase de diseño. Después de la recogida de datos, la etapa de síntesis enlaza ambos espacios (descubre cómo distintas propiedades del producto transmiten ciertas emociones). Cada paso del modelo se explica detalladamente usando un estudio de IK realizado para esta tesis: el experimento de los zumos de frutas. El modelo inicial se va mejorando progresivamente durante la tesis y los datos del experimento se reanalizan usando nuevas propuestas. Muchas inquietudes prácticas aparecen cuando se estudia el modelo para estudios de IK mencionado anteriormente (entre otras, cuántos participantes son necesarios y cómo se desarrolla la sesión de recogida de datos). Se ha realizado una extensa revisión bibliográfica con el objetivo de responder éstas y otras preguntas. 
Se describen también las aplicaciones de IK más habituales, junto con comentarios sobre ideas particularmente interesantes de distintos artículos. La revisión bibliográfica sirve también para listar cuáles son las herramientas más comúnmente utilizadas en la fase de síntesis. La parte central de la tesis se centra precisamente en las herramientas para la fase de síntesis. Herramientas estadísticas como la teoría de cuantificación tipo I o la regresión logística ordinal se estudian con detalle, y se proponen varias mejoras. En particular, se propone una nueva forma gráfica de representar los resultados de una regresión logística ordinal. Se introduce una técnica de aprendizaje automático, los conjuntos difusos (rough sets), y se incluye una discusión sobre su idoneidad para estudios de IK. Se usan conjuntos de datos simulados para evaluar el comportamiento de las herramientas estadísticas sugeridas, lo que da pie a proponer algunas recomendaciones. Independientemente de las herramientas de análisis utilizadas en la fase de síntesis, las conclusiones serán probablemente erróneas cuando la matriz del diseño no es adecuada. Se propone un método para evaluar la idoneidad de matrices de diseño basado en el uso de dos nuevos indicadores: un índice de ortogonalidad y un índice de confusión. Se estudia el habitualmente olvidado rol de las interacciones en los estudios de IK y se propone un método para incluir una interacción, juntamente con una forma gráfica de representarla. Finalmente, la última parte de la tesis se dedica al escasamente tratado tema de la variabilidad en los estudios de IK. Se proponen un método (basado en el análisis clúster) para segmentar los participantes según sus respuestas emocionales y una forma de ordenar los participantes según su coherencia al valorar los productos (usando un coeficiente de correlación intraclase). 
Puesto que muchos usuarios de IK no son especialistas en la interpretación de salidas numéricas, se incluyen representaciones visuales para estos dos nuevos métodos que facilitan el procesamiento de las conclusiones.
This PhD thesis deals with Kansei Engineering (KE), a technique for translating emotions elicited by products into technical parameters, and with statistical methods that can benefit the discipline. The basic purpose of KE is discovering how certain properties of a product elicit certain emotions in its users. It is a quantitative method, and data are typically collected using questionnaires. Conclusions are reached by analyzing the collected data, normally using some kind of regression analysis. Kansei Engineering can be placed within the more general research area of emotional design. The thesis starts by justifying the importance of emotional design. As the range of techniques used under the name of Kansei Engineering is rather vast and not very clear, the thesis develops a detailed definition of KE that serves the purpose of delimiting its scope. A model for conducting KE studies is then suggested. The model includes spanning the semantic space – the whole range of emotions the product can elicit – and the space of properties – the technical variables that can be modified in the design phase. After the data collection, the synthesis phase links both spaces; that is, it discovers how particular properties of the product elicit certain emotions. Each step of the model is explained in detail using a KE study specially performed for this thesis: the fruit juice experiment. The initial model is progressively improved during the thesis, and data from the experiment are reanalyzed using the new proposals. Many practical concerns arise when looking at the above-mentioned model for KE studies (among many others, how many participants are needed and how the data collection session is conducted). An extensive literature review is done with the aim of answering these and other questions. The most common applications of KE are also depicted, together with comments on particularly interesting ideas from several papers.
The literature review also serves to list the most common tools used in the synthesis phase. The central part of the thesis focuses precisely on tools for the synthesis phase. Statistical tools such as quantification theory type I and ordinal logistic regression are studied in detail, and several improvements are suggested. In particular, a new graphical way to represent results from an ordinal logistic regression is proposed. An automatic learning technique, rough sets, is introduced, and its adequacy for KE studies is discussed. Several sets of simulated data are used to assess the behavior of the suggested statistical techniques, leading to some useful recommendations. No matter which analysis tools are used in the synthesis phase, conclusions are likely to be flawed when the design matrix is not appropriate. A method to evaluate the suitability of design matrices used in KE studies is proposed, based on the use of two new indicators: an orthogonality index and a confusion index. The commonly forgotten role of interactions in KE studies is studied, and a method to include an interaction is suggested, together with a way to represent it graphically. Finally, the scarcely treated topic of variability in KE studies is tackled in the last part of the thesis. A method (based on cluster analysis) for finding segments among subjects according to their emotional responses, and a way to rank subjects based on their coherence when rating products (using an intraclass correlation coefficient), are proposed. As many users of Kansei Engineering are not specialists in interpreting the numerical output of statistical techniques, visual representations for these two new proposals are included to aid understanding.
APA, Harvard, Vancouver, ISO, and other styles
15

Logan, Ben. "A Statistical Examination of the Climatic Human Expert System, The Sunset Garden Zones for California." Thesis, Virginia Tech, 2006. http://hdl.handle.net/10919/32371.

Full text
Abstract:
Twentieth Century climatology was dominated by two great figures: Wladimir Köppen and C. Warren Thornthwaite. The first carefully developed climatic parameters to match the larger world vegetation communities. The second developed complex formulas of "Moisture Factors" that provided an efficient understanding of how evapotranspiration influences plant growth and health, both for native and non-native communities. In the latter half of the Twentieth Century, the Sunset Magazine Corporation developed a purely empirical set of Garden Zones, first for California, then for the thirteen states of the West, and now for the entire nation in the National Garden Maps. The Sunset Garden Zones are well recognized and respected in the Western States for illustrating the several factors of climate that distinguish zones. But the Sunset Garden Zones have never before been digitized and examined statistically to validate their demarcations. This thesis examines the digitized zones with reference to PRISM climate data. Variable coverages resembling those described by Sunset are extracted from the PRISM data. These variable coverages are collected for two buffered areas, one in northern California and one in southern California. The coverages are exported from ArcGIS 9.1 to SAS®, where they are processed first through a Principal Component Analysis; the first five principal components are then entered into a Ward's Hierarchical Cluster Analysis. The resulting clusters are translated back into ArcGIS as a raster coverage, where the clusters represent climatic regions. This process is quite amenable to further examination of other regions of California.
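The workflow this abstract describes (standardize the climate variables, reduce them with PCA, keep the first five principal components, and cut a Ward dendrogram into climatic regions) can be sketched in a few lines. The data below are synthetic stand-ins for the PRISM coverages, not the thesis data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Synthetic stand-in for PRISM climate variables over map cells:
# two climate regimes (say, coastal and inland) described by 8 variables.
cells = np.vstack([
    rng.normal(0.0, 1.0, size=(60, 8)),
    rng.normal(4.0, 1.0, size=(60, 8)),
])

# Standardize, then PCA via SVD of the centred data matrix.
X = (cells - cells.mean(axis=0)) / cells.std(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt.T[:, :5]  # keep the first five principal components

# Ward's hierarchical clustering on the PC scores, cut into two regions.
Z = linkage(scores, method="ward")
regions = fcluster(Z, t=2, criterion="maxclust")
```

With two well-separated synthetic regimes, the two-cluster cut recovers the group structure; on real coverages the number of clusters would be read off the dendrogram.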
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
16

Benkovská, Petra. "Web Usage Mining." Master's thesis, Vysoká škola ekonomická v Praze, 2007. http://www.nusl.cz/ntk/nusl-3950.

Full text
Abstract:
General characteristics of web mining, including the methodology and procedures incorporated under this term, and its relation to other areas (data mining, artificial intelligence, statistics, databases, internet technologies, management, etc.). Web usage mining: data sources, data pre-processing, characterization of analytical methods and tools, interpretation of outputs (results), and possible areas of usage, including examples. The thesis then suggests a solution method, realizes it, and interprets the outputs of a concrete example using the web usage mining methods mentioned above.
APA, Harvard, Vancouver, ISO, and other styles
17

Marková, Monika. "Finding groups of the similar variables with statistical software SAS and SPSS." Master's thesis, Vysoká škola ekonomická v Praze, 2007. http://www.nusl.cz/ntk/nusl-10417.

Full text
Abstract:
My diploma thesis compares the capabilities of the statistical software packages SAS and SPSS in the areas of factor analysis, cluster analysis and multidimensional scaling, which are methods for identifying groups of similar statistical values (variables). The relations found among the variables can serve to reduce the dimension of the vectors of variables describing the individual monitored objects (statistical units), which helps when applying various other methods, for example regression or discriminant analysis. One way of finding the similarity of variables in cluster analysis or multidimensional scaling is to search for their relations. Whereas the basis of factor analysis is the formulation of the relation between two variables by means of covariances, or the Pearson correlation coefficient, correlation coefficients (and in some cases other measures) can also be used for cluster analysis and multidimensional scaling. The thesis mainly describes the command syntax of the procedures implemented in SAS and SPSS. The meaning of the individual parameters and the partial specifications of each command are explained. The results obtained by the various types of analysis are compared on a real dataset. The capabilities of SAS and SPSS are evaluated in the conclusion, with reference to their advantages and disadvantages. Attention is also paid, for example, to the form of the input dataset, the quality of the outputs and the partial methods.
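The variable-clustering idea the abstract compares across SAS and SPSS (measuring similarity between variables through their correlations) can be illustrated outside either package. Here 1 − |r| is used as the distance, and the two latent factors generating the data are hypothetical:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
n = 200
# Two hypothetical latent factors, each driving a group of observed variables.
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([f1 + 0.3 * rng.normal(size=n) for _ in range(3)] +
                    [f2 + 0.3 * rng.normal(size=n) for _ in range(2)])

# Distance between variables: 1 - |Pearson correlation|.
R = np.corrcoef(X, rowvar=False)
D = 1.0 - np.abs(R)
np.fill_diagonal(D, 0.0)

# Average-linkage clustering of the variables themselves.
Z = linkage(squareform(D, checks=False), method="average")
groups = fcluster(Z, t=2, criterion="maxclust")
```

The two-cluster cut recovers the two factor-driven groups of variables, which is the kind of output PROC VARCLUS-style analyses summarize.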
APA, Harvard, Vancouver, ISO, and other styles
18

Fiero, M., S. Huang, and M. L. Bell. "Statistical analysis and handling of missing data in cluster randomised trials: protocol for a systematic review." BMJ, 2015. http://hdl.handle.net/10150/617201.

Full text
Abstract:
UA Open Access Publishing Fund.
Introduction: Cluster randomised trials (CRTs) randomise participants in groups, rather than as individuals, and are key tools used to assess interventions in health research where treatment contamination is likely or if individual randomisation is not feasible. Missing outcome data can reduce power in trials, including in CRTs, and is a potential source of bias. The current review focuses on evaluating methods used in statistical analysis and handling of missing data with respect to the primary outcome in CRTs. Methods and analysis: We will search for CRTs published between August 2013 and July 2014 using PubMed, Web of Science and PsycINFO. We will identify relevant studies by screening titles and abstracts, and examining full-text articles based on our predefined study inclusion criteria. 86 studies will be randomly chosen to be included in our review. Two independent reviewers will collect data from each study using a standardised, prepiloted data extraction template. Our findings will be summarised and presented using descriptive statistics. Ethics and dissemination: This methodological systematic review does not need ethical approval because there are no data used in our study that are linked to individual patient data. After completion of this systematic review, data will be immediately analysed, and findings will be disseminated through a peer-reviewed publication and conference presentation.
APA, Harvard, Vancouver, ISO, and other styles
19

Mosavel, Haajierah. "Petrophysical characterization of sandstone reservoirs through boreholes E-S3, E-S5 and F-AH4 using multivariate statistical techniques and seismic facies in the Central Bredasdorp Basin." Thesis, University of the Western Cape, 2014. http://hdl.handle.net/11394/3984.

Full text
Abstract:
Magister Scientiae - MSc
The thesis aims to determine the depositional environments, rock types and petrophysical characteristics of the reservoirs in Wells E-S3, E-S5 and F-AH4 of Area X in the Bredasdorp Basin, offshore South Africa. The three wells were studied using methods including core description, petrophysical analysis, seismic facies and multivariate statistics in order to evaluate their reservoir potential. The thesis includes digital wireline log signatures, 2D seismic data, well data and core analysis from selected depths. Based on core description, five lithofacies were identified: claystone (HM1), fine- to coarse-grained sandstone (HM2), very fine- to medium-grained sandstone (HM3), fine- to medium-grained sandstone (HM4) and conglomerate (HM5). Deltaic and shallow marine depositional environments were also interpreted from the core description based on the sedimentary structures and ichnofossils. The results obtained from the petrophysical analysis indicate that the sandstone reservoirs show relatively fair to good porosity (range 13-20 %), water saturation (range 17-45 %) and predicted permeability (range 4-108 mD) for Wells E-S3, E-S5 and F-AH4. The seismic facies model of the study area shows five seismic facies, described as parallel, variable-amplitude variable-continuity, semi-continuous high-amplitude, divergent variable-amplitude and chaotic seismic facies, as well as a probable shallow marine, deltaic and submarine fan depositional system. Linking lithofacies to seismic facies maps helped to understand and predict the distribution and quality of reservoir packages in the studied wells.
APA, Harvard, Vancouver, ISO, and other styles
20

French, Benjamin. "Analysis of aggregate longitudinal data with time-dependent exposure /." Thesis, Connect to this title online; UW restricted, 2008. http://hdl.handle.net/1773/9569.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Xu, Tianbing. "Nonparametric evolutionary clustering." Diss., Online access via UMI:, 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
22

Clatworthy, Jane. "The theoretical and statistical value of cluster analysis in health psychology : an empirical investigation using artificial and existing data sets." Thesis, University of Brighton, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.404065.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Krupka, Ondřej. "Klasifikace vzorků 1D gelové elektroforézy." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2015. http://www.nusl.cz/ntk/nusl-221331.

Full text
Abstract:
This term project deals with the classification of 1D gel electrophoresis samples. It describes the theoretical background of gel electrophoresis, various types of errors, image processing, and classification using cluster analysis. One of the main goals is the creation of images of the highest possible quality. Pre-processing and detection of the sample borders are realized in the MATLAB environment. Finally, the samples are classified and a statistical analysis of the results is performed.
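The final classification step (grouping lanes by cluster analysis) might look like the following sketch, where the one-dimensional densitometric profiles and band positions are invented for illustration and MATLAB is replaced by Python:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(5)
x = np.arange(100)

def lane(centers):
    """Hypothetical 1-D densitometric profile: Gaussian bands plus noise."""
    profile = sum(np.exp(-0.5 * ((x - c) / 3.0) ** 2) for c in centers)
    return profile + 0.05 * rng.normal(size=x.size)

# Two sample classes with bands at different positions, six lanes each.
profiles = np.array([lane([20, 55]) for _ in range(6)] +
                    [lane([35, 70]) for _ in range(6)])

# Agglomerative clustering of the lane profiles (Euclidean, average linkage).
Z = linkage(profiles, method="average")
classes = fcluster(Z, t=2, criterion="maxclust")
```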
APA, Harvard, Vancouver, ISO, and other styles
24

Granjeiro, Michel Lopes. "Análise estatística R-Modal, Q-Modal e Cluster no estudo da qualidade da água subterrânea do aquífero Jandaíra na Chapada do Apodi." Universidade Federal do Ceará, 2012. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=7765.

Full text
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico
Analyses of 15 hydrochemical parameters in groundwater from the Jandaíra aquifer of the Potiguar Basin were performed on two sets of wells, one with 97 wells sampled in the dry season and the other with 80 wells sampled at the beginning of the rainy season. The wells are located in the townships of Baraúna and Mossoró, in the State of Rio Grande do Norte, and in Jaguaruana, Quixeré, and Limoeiro do Norte, in the State of Ceará. Statistical treatment of the data from each sampling, applying R-modal Factor Analysis, allowed identification of the processes responsible for the presence of minerals in the waters. During the dry season, Factor 1 indicates the importance of marine aerosol, mainly through the concentrations of chloride and sodium, and lithological influence through hardness. Factor 2 indicates dissolution of limestone, evidenced by the presence of bicarbonate and pH. In the rainy season, Factor 1 also reflects the influence of aerosol and hardness, but Factor 2 considers bicarbonate only, indicating recharge to the aquifer. In both simulations, the predominance of a single variable in the other factors indicates mixing of different water types and also reveals slight human impact. Q-modal Analysis done with all wells indicates as the best-correlated group the wells of Baraúna, represented by Factor 1, followed by the wells of Mossoró, represented by Factor 2. Cluster Analysis complemented the Q-modal Analysis and allows wells to be associated in groups, indicating a hydraulic connection between them or recharge through similar water. This analysis reveals equivalence between the First Group and Factor 1 of the Q-modal Analysis and between the Second Group and Factor 2 of the Q-modal Analysis. The First and Second Groups comprise more wells in the dry season than in the rainy one, when recharge contributes to mixing with water of different composition than that in the aquifer.
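A minimal sketch of the R-modal idea (extracting factors from the correlation matrix between hydrochemical variables, rather than between wells). The two synthetic drivers standing in for the marine-aerosol and carbonate factors, and the four parameters, are hypothetical stand-ins, not the thesis data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 97  # wells sampled in the dry season

# Hypothetical stand-ins: a "marine aerosol" driver behind chloride and
# sodium, and a "carbonate" driver behind bicarbonate and pH.
marine = rng.normal(size=n)
carbonate = rng.normal(size=n)
noise = lambda: 0.4 * rng.normal(size=n)
data = np.column_stack([
    marine + noise(),     # Cl-
    marine + noise(),     # Na+
    carbonate + noise(),  # HCO3-
    carbonate + noise(),  # pH
])

# R-modal analysis works on correlations between variables.
R = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Unrotated loadings of the first two factors.
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])
```

Two factors dominate the spectrum, one per driver, mirroring how Factor 1 and Factor 2 in the study separate aerosol-related salts from carbonate dissolution.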
APA, Harvard, Vancouver, ISO, and other styles
25

Granjeiro, Michel Lopes. "Análise estatística R-Modal, Q-Modal e Cluster no estudo da qualidade da água subterrânea do aquífero Jandaíra na Chapada do Apodi." reponame:Repositório Institucional da UFC, 2012. http://www.repositorio.ufc.br/handle/riufc/12540.

Full text
Abstract:
GRANJEIRO, Michel Lopes. Análise estatística R-Modal, Q-Modal e Cluster no estudo da qualidade da água subterrânea do aquífero Jandaíra na Chapada do Apodi. 2012. 118 f. Tese (Doutorado em Física) - Programa de Pós-Graduação em Física, Departamento de Física, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, 2012.
Analyses of 15 hydrochemical parameters in groundwater from the Jandaíra aquifer of the Potiguar Basin were performed on two sets of wells, one with 97 wells sampled in the dry season and the other one with 80 wells sampled at the beginning of the rainy season. The wells are located in the townships of Baraúna and Mossoró, in the State of Rio Grande do Norte, and in Jaguaruana, Quixeré, and Limoeiro do Norte, in the State of Ceará. Statistical treatment of data from each sampling applying R-modal Factor Analysis allowed to identify the processes responsible for the presence of minerals in the waters. During the dry season, Factor 1 indicates the importance of marine aerosol, mainly through the concentrations of chloride and sodium, and lithologocal influence through hardness. Factor 2 indicates dissolution of limestone, evidenced by the presence of bicarbonate and pH. In the rainy season, Factor 1 also reflects the influence of aerosol and hardness, but Factor 2 considers bicarbonate only, indicating recharge to the aquifer. In both simulations, the predominance of a single variable in the other factors indicates mixing of different water types and also reveals slight human impact. Q-modal Analysis done with all wells indicates as the group best correlated the wells of Baraúna, represented by Factor 1, followed by wells of Mossoró, represented by Factor 2. Cluster Analysis complemented Q-modal Analysis and allows to associate wells in groups, indicating a hydraulic connection between them or recharge through similar water. This analysis reveals equivalence between the First Group and Factor 1 of Q-modal Analysis and of the Second Group and Factor 2 of Q-modal Analysis. The First and Second Groups comprise more wells in the dry season than in the rainy one, when recharge contributes to mixing with water of different composition than that in the aquifer.
APA, Harvard, Vancouver, ISO, and other styles
26

Frank, Erika. "A STATISTICAL APPROACH FOR IDENTIFICATION OF CHEMICAL GROUPINGS OF ELEMENTS IN SWEDISH ROCKS WITH SPECIAL FOCUS ON ARSENIC AND SULPHUR." Thesis, Uppsala universitet, Statistiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447527.

Full text
Abstract:
Groundwater analyses have revealed high concentrations of the toxic element arsenic around Stockholm and Mälardalen, a problem that is often linked to high levels of arsenic in the bedrock and that could be escalated by the many construction projects in the same region. However, it is unknown which part of the bedrock is causing the contamination. The aim of this thesis is to identify the chemical elements that associate with arsenic and to study how the rock types differ in their content of elements and compounds. The highest median concentration of arsenic is found in quartz-feldspar-rich sedimentary rock, while intrusive rock types reveal the lowest levels. Using cluster analysis, arsenic is placed in a group including nine other elements, among which the strongest correlations are found with antimony, bismuth and silver. A moderate correlation with sulphur is also observed. The associations between groupings of elements are analysed using measures of dependence, which reveal relatively strong associations. Dimension reduction and ordination techniques provide further insight into the typical appearances of elements and reveal two groups of similar rock types.
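The association ranking described here (arsenic correlating most strongly with antimony, bismuth and silver) can be mimicked by sorting elements by their correlation with As. The element set and latent drivers below are assumptions for illustration, not the Swedish rock data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
# Hypothetical concentrations: As co-varies with Sb and Bi, while Ca and Mg
# follow an independent driver (the element set is purely illustrative).
t1, t2 = rng.normal(size=n), rng.normal(size=n)
elements = {
    "As": t1 + 0.5 * rng.normal(size=n),
    "Sb": t1 + 0.5 * rng.normal(size=n),
    "Bi": t1 + 0.6 * rng.normal(size=n),
    "Ca": t2 + 0.5 * rng.normal(size=n),
    "Mg": t2 + 0.5 * rng.normal(size=n),
}
names = list(elements)
X = np.column_stack([elements[k] for k in names])
R = np.corrcoef(X, rowvar=False)

# Rank the other elements by the strength of their correlation with arsenic.
i_as = names.index("As")
assoc = sorted(((abs(R[i_as, j]), names[j])
                for j in range(len(names)) if j != i_as), reverse=True)
ranked = [name for _, name in assoc]
```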
APA, Harvard, Vancouver, ISO, and other styles
27

Tansel, Icten. "Differentiation And Classification Of Counterfeit And Real Coins By Applying Statistical Methods." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614417/index.pdf.

Full text
Abstract:
Tansel, Içten. M.Sc., Archaeometry Graduate Program. Supervisor: Assist. Prof. Dr. Zeynep Isil Kalaylioglu; Co-Supervisor: Prof. Dr. Sahinde Demirci. June 2012, 105 pages. In this study, forty coins obtained from the Museum of Anatolian Civilizations (MAC) in Ankara were investigated. Twenty-two of those coins were real and the remaining eighteen were fake. All forty were Greek coins dated back to the middle of the fifth century BCE and the reign of Alexander the Great (336–323 BCE). The major aims of this study can be summarized as follows
APA, Harvard, Vancouver, ISO, and other styles
28

Kersten, Stefan. "Statistical modelling and resynthesis of environmental texture sounds." Doctoral thesis, Universitat Pompeu Fabra, 2016. http://hdl.handle.net/10803/400395.

Full text
Abstract:
Environmental texture sounds are an integral, though often overlooked, part of our daily life. They constitute those elements of our sounding environment that we tend to perceive subconsciously but which we miss when they are missing. Those sounds are also increasingly important for adding realism to virtual environments, from immersive artificial worlds through computer games to mobile augmented reality systems. This work spans the spectrum from data-driven stochastic sound synthesis methods to distributed virtual reality environments and their aesthetic and technological implications. We propose a framework for statistically modelling environmental texture sounds in different sparse signal representations. We explore three different instantiations of this framework, two of which constitute a novel way of representing texture sounds in a physically-inspired sparse statistical model and of estimating model parameters from recorded sound examples.
APA, Harvard, Vancouver, ISO, and other styles
29

Hennon, Christopher C. "Investigating Probabilistic Forecasting of Tropical Cyclogenesis Over the North Atlantic Using Linear and Non-Linear Classifiers." The Ohio State University, 2003. http://rave.ohiolink.edu/etdc/view?acc_num=osu1047237423.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Kim, Doo Young. "Statistical Modeling of Carbon Dioxide and Cluster Analysis of Time Dependent Information: Lag Target Time Series Clustering, Multi-Factor Time Series Clustering, and Multi-Level Time Series Clustering." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6277.

Full text
Abstract:
The current study consists of three major parts: statistical modeling, the connection between statistical modeling and cluster analysis, and new methods for clustering time dependent information. First, we perform statistical modeling of carbon dioxide (CO2) emissions in South Korea in order to identify the attributable variables, including interaction effects. One of the pressing issues of the 21st century is global warming, which is driven by the coupling between atmospheric temperature and CO2 in the atmosphere. To confront this global problem, we first need to identify its causes before we can determine how to address it. We therefore find and rank the attributable variables and their interactions based on their semipartial correlations and compare our findings with results from the United States and the European Union. This comparison shows that the top contributing variable in both South Korea and the United States is Liquid Fuels, whereas it ranks eighth in the EU, providing evidence in support of regional, rather than global, policies for keeping atmospheric CO2 at an optimal level. Second, we study the regional behavior of atmospheric CO2 in the United States. Utilizing a longitudinal transitional modeling scheme, we calculate transition probabilities based on effects from the five end-use sectors that produce most of the CO2 in our atmosphere: the commercial, electric power, industrial, residential, and transportation sectors. Using those transition probabilities, we then perform hierarchical clustering to classify regions with similar characteristics across the nine US climate regions. This study suggests that elected officials could legislate regional policies by end-use sector in order to maintain the optimal level of atmospheric CO2 required by global consensus.
Third, we propose new methods to cluster time dependent information. It is almost impossible to find data that are not time dependent among the floods of information available today, so the importance of mining time dependent information needs no emphasis. The first method we propose, "Lag Target Time Series Clustering (LTTC)", identifies the actual level of time dependence among the objects being clustered. The second, "Multi-Factor Time Series Clustering (MFTC)", allows us to consider distances in a multi-dimensional space by including multiple pieces of information at a time. The last, "Multi-Level Time Series Clustering (MLTC)", is especially important when short-term varying time series responses are to be clustered; that is, it extracts only the pure lag effect from LTTC. The proposed methods give excellent results when applied to time dependent clustering. Finally, we develop an algorithm driven by the analytical structure of the proposed methods to cluster financial information from the ten business sectors of the New York Stock Exchange, using 497 of the stocks that constitute the S&P 500. We illustrate the usefulness of the study by structuring a diversified financial portfolio.
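The regional grouping in the second part can be sketched with standard tools. A minimal sketch, assuming hypothetical per-region profiles in place of the study's sector-based transition probabilities (Ward linkage is one common choice; the thesis does not specify which linkage it uses):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
# Hypothetical per-region transition-probability vectors (rows sum to 1),
# standing in for the five end-use-sector effects described above
profiles = rng.dirichlet(np.ones(5), size=9)      # nine US climate regions

Z = linkage(profiles, method="ward")              # hierarchical clustering
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into <= 3 groups
print(labels)
```

Cutting the tree with `criterion="maxclust"` caps the number of groups rather than fixing a distance threshold, which mirrors classifying the nine climate regions into a few similar sets.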
APA, Harvard, Vancouver, ISO, and other styles
31

Meyer, Andréia da Silva. "Comparação de coeficientes de similaridade usados em análises de agrupamento com dados de marcadores moleculares dominantes." Universidade de São Paulo, 2002. http://www.teses.usp.br/teses/disponiveis/11/11134/tde-24072002-165250/.

Full text
Abstract:
With the recent advent of molecular markers, studies of divergence and phylogenetic relationships between and within plant species of agricultural interest have received greater attention. In these studies, the aim is to group similar individuals so that the largest differences occur among the groups. Statistical methods such as cluster analysis, factor analysis, and principal components analysis can be used in this kind of study. Before employing any of these methods, however, a similarity matrix between genotypes must be obtained using one of the several coefficients proposed in the literature. The aim of this study was to evaluate whether different similarity coefficients influence the results of cluster analysis with dominant markers. Data from 18 inbred lines of maize from two different populations, BR-105 and BR-106, were analyzed with AFLP and RAPD markers, and eight similarity coefficients (Jaccard, Sorensen-Dice, Anderberg, Ochiai, Simple Matching, Rogers and Tanimoto, Ochiai II, and Russel and Rao) were obtained. The similarity matrices were compared by Pearson's and Spearman's correlations, cluster analysis (dendrograms, correlations, distortion and stress between the similarity and cophenetic matrices, and the consensus fork index between all pairs of dendrograms), Tocher's optimization procedure, and projection of the similarity matrices into two-dimensional space. The results showed that, for almost all of the methodologies and both markers, the Jaccard, Sorensen-Dice, Anderberg, and Ochiai coefficients gave similar results, because all of them exclude negative co-occurrences. The Simple Matching, Rogers and Tanimoto, and Ochiai II coefficients likewise gave results similar to one another, probably because all of them include negative co-occurrences.
The Russel and Rao coefficient gave results very different from the others because it excludes negative co-occurrences from the numerator of its expression but includes them in the denominator, which is a reason for not recommending it. Because a shared absence of bands does not necessarily mean that the corresponding DNA regions are identical, choosing one of the coefficients that exclude negative co-occurrences is suggested.
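The coefficients compared in this thesis have simple closed forms in terms of the 2x2 band-presence table (a = joint presences, b and c = mismatches, d = joint absences, the "negative co-occurrences"). A minimal sketch using the standard definitions and made-up genotype vectors, illustrating how the treatment of d separates the coefficient families:

```python
import numpy as np

def binary_similarities(x, y):
    """Similarity coefficients for two 0/1 band-presence vectors.

    a = bands present in both, b/c = present in only one,
    d = absent in both (the negative co-occurrences).
    """
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    a = np.sum(x & y)
    b = np.sum(x & ~y)
    c = np.sum(~x & y)
    d = np.sum(~x & ~y)
    n = a + b + c + d
    return {
        "jaccard": a / (a + b + c),                # ignores d
        "sorensen_dice": 2 * a / (2 * a + b + c),  # ignores d
        "simple_matching": (a + d) / n,            # counts d as agreement
        "russel_rao": a / n,                       # d only in the denominator
    }

# Hypothetical band patterns for two genotypes
g1 = [1, 1, 0, 1, 0, 0, 1, 0]
g2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_similarities(g1, g2))
```

For these example vectors, Jaccard (which discards joint absences) and Russel and Rao (which keeps them only in the denominator) give clearly different values, matching the grouping of coefficients reported above.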
APA, Harvard, Vancouver, ISO, and other styles
32

Nabeel, Muhammad. "A study of micro-particles in the dust and melt at different stages of iron and steelmaking." Doctoral thesis, KTH, Tillämpad processmetallurgi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-196805.

Full text
Abstract:
This study considers the dust particles generated by mechanical wear of iron ore pellets and the clusters formed in molten stainless steel alloyed with rare earth metals (REM). Firstly, the influence of the characteristics of iron ore pellets, the load applied to a pellet bed, and partial reduction of the pellets on the size distribution of the generated dust was investigated. Secondly, REM clusters were investigated to evaluate their size distribution, and an extreme value distribution (EVD) analysis was applied to the observed clusters. Large pellets showed a 10-20% higher wear rate than small pellets during wear in a planetary mill. Moreover, an increase of ~67% was observed in the friction and dust generation in the pellet bed as the applied load increased from 1 to 3 kg, and higher friction in the pellet bed was found to increase the amount of airborne particles. The mechanical wear experiments on pellets reduced at 500 °C (P500) and 850 °C (P850) showed that P500 pellets exhibit a ~16-35% higher wear rate than unreduced pellets, whereas for P850 pellets wear is inhibited by the formation of a metallic layer on the outer surface. The mechanism of dust generation is explained using these results. A reliable cluster size distribution of REM clusters was obtained by improving the observation method, and it was used to explicate the formation and growth mechanism of REM clusters. The results show that the growth of clusters is governed by different types of collisions depending on cluster size. Three different size parameters were considered in the EVD analysis; using the maximum length of clusters yields a better correlation of the EVD regression lines than the other size parameters. Finally, a comparison of predicted and observed maximum cluster lengths showed that further work is required before EVD analyses can be applied to REM clusters.
The study focuses on two types of micro-particles chosen from different parts of the iron- and steelmaking process: dust generated by mechanical wear of particles, and clusters formed in liquid stainless steels alloyed with rare earth metals (REM). Initially, the influence of three factors on the size distribution of dust formed during the handling of iron ore pellets was investigated: the characteristics of the pellets, the load applied to the pellet bed, and the partial reduction of the pellets. Thereafter, three-dimensional investigations of REM clusters extracted by electrolytic extraction were carried out to determine the size distribution of the clusters, and an extreme value distribution (EVD) study was performed for the examined clusters. A planetary mill was used to examine the influence of pellet characteristics on dust formation. The results showed that pellet size can affect the wear rate under these experimental conditions: larger pellets (13.5 < Deq < 15.0 mm) exhibited a 10 to 20% higher wear rate than smaller pellets (9.5 < Deq < 12.5 mm). Based on the analyses of the dust generated during the wear experiments, the wear mechanisms of these pellets were identified as abrasion and collision wear. A pellet bed was constructed to study the influence of an applied load on dust formation and on the friction forces in the bed. Loads of 1 to 3 kg were applied to the pellet bed, and the friction force and dust generation increased by about 67% as the load increased from 1 to 3 kg. The results also showed that a higher friction force in the pellet bed can result in an increased amount of airborne particles.
The mechanical wear of pellets reduced at 500 °C (P500) and 850 °C (P850) was also studied using a planetary mill. The P500 pellets exhibited a roughly 16 to 35% higher wear rate than unreduced reference pellets, while for the P850 pellets wear was counteracted by the formation of a metallic layer on the outer surface. The dust formed by mechanical wear of reduced pellets contained 3 to 6 times more coarse particles (>20 µm) than dust formed from unreduced pellets. Finally, it was discussed how these results relate to industrial conditions, both with respect to the mechanisms involved in the mechanical wear of pellets and with respect to the relation between off-gas velocity and the size and morphology of the dust particles. Clusters containing REM oxides extracted from a 253MA stainless steel grade were examined using a three-dimensional technique. A reliable cluster size distribution (CSD) was obtained by improving the observation method, and it was used to study the formation and growth of REM oxides. The circularity factor of the clusters was used to divide them into two groups, which form and grow by different mechanisms. The results showed that cluster growth is promoted by different types of collisions depending on cluster size; for REM clusters, turbulent collisions were concluded to be the main growth mechanism. The thesis also addresses how to handle fields of view containing no clusters in an extreme value distribution (EVD) analysis, in which three different size parameters were examined.
The results show that using the maximum cluster length (LC) in the analysis gives the best correlation for the EVD regression line: R² was up to 0.9876, compared with 0.9656 to 0.9774 for the other size parameters. Finally, a comparison between predicted and observed maximum cluster lengths shows that EVD analyses of REM clusters require further investigation in future work.
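The EVD analysis of maximum cluster lengths is in the spirit of classical largest-extreme-value (Gumbel) inclusion rating. A hedged sketch with synthetic maxima, not the thesis data; the actual fitting procedure in the study (a regression on the reduced variate) may differ from the maximum-likelihood fit used here:

```python
import numpy as np
from scipy.stats import gumbel_r

rng = np.random.default_rng(0)
# Hypothetical maximum cluster lengths (µm), one per inspected field of view
max_lengths = rng.gumbel(loc=12.0, scale=3.0, size=40)

# Fit a Gumbel (largest extreme value) distribution to the block maxima
loc, scale = gumbel_r.fit(max_lengths)

# Characteristic largest cluster expected when inspecting T times more area
# (the return-period argument of classical extreme-value rating)
T = 1000.0
predicted = gumbel_r.ppf(1.0 - 1.0 / T, loc=loc, scale=scale)
print(loc, scale, predicted)
```

The return-period prediction extrapolates the fitted distribution to a larger inspected volume or area, which is exactly where the comparison of predicted versus observed maximum lengths above becomes the critical test.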


APA, Harvard, Vancouver, ISO, and other styles
33

Tan, Ye. "A comparison of approaches to analysis of clustered binary data when cluster sizes are large /." Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=81443.

Full text
Abstract:
Several methods can be used in cluster-randomized studies with binary outcomes, including GLMM, GEE, and ordinary logistic regression. In this thesis, we study cluster-randomized data with large cluster sizes relative to the number of clusters (for example, the PROBIT study). We compared the GLMM, GEE, and ordinary logistic regression approaches in terms of parameter interpretation, magnitude, and standard errors of model parameters. A simulation study was performed to evaluate the performance of these methods. GLMM implemented with penalized quasi-likelihood performed well, giving the highest empirical coverage of 95% confidence intervals for the true coefficients. GEE was robust, with the smallest MSE when the within-group correlation parameter σu ≥ 0.5. Logistic regression models performed well when the correlation was very weak but fared poorly when it was stronger. When correlations are quite low, logistic models may be acceptable for clustered data; however, they give inappropriate inference when the correlation is elevated.
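The core problem, naive standard errors shrinking as if all k*m subjects were independent while the effective information is closer to the k clusters, can be illustrated without fitting any model. A sketch with simulated data (all parameter values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
k, m = 30, 200          # few clusters, large cluster sizes (as in PROBIT)
sigma_u = 1.0           # between-cluster SD on the logit scale

# Simulate cluster-level random intercepts, then binary outcomes
u = rng.normal(0.0, sigma_u, size=k)
p_cluster = 1.0 / (1.0 + np.exp(-u))          # cluster-specific probabilities
y = rng.binomial(1, np.repeat(p_cluster, m))  # m subjects per cluster

p_hat = y.mean()
naive_se = np.sqrt(p_hat * (1 - p_hat) / (k * m))      # ignores clustering
cluster_means = y.reshape(k, m).mean(axis=1)
robust_se = cluster_means.std(ddof=1) / np.sqrt(k)     # cluster-level SE

print(naive_se, robust_se)
```

With few clusters and large m, the cluster-level SE dominates the naive one; this is the design-effect inflation, roughly 1 + (m-1)ρ, that ordinary logistic regression ignores when the intracluster correlation ρ is nonzero.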
APA, Harvard, Vancouver, ISO, and other styles
34

Xu, Yaomin. "New Clustering and Feature Selection Procedures with Applications to Gene Microarray Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=case1196144281.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Soon, Shih Chung. "On detection of extreme data points in cluster analysis." Connect to resource, 1987. http://rave.ohiolink.edu/etdc/view.cgi?acc%5Fnum=osu1262886219.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Sartorio, Simone Daniela. "Aplicações de técnicas de análise multivariada em experimentos agropecuários usando o software R." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/11/11134/tde-06082008-172655/.

Full text
Abstract:
The use of multivariate analysis techniques is largely restricted to large research centers, big companies, and the academic environment. These techniques are very interesting because they use all response variables simultaneously in the interpretation of the data set, taking into account the correlations between them. One of the main obstacles to their use is that researchers interested in quantitative research do not know them. Another difficulty is that most of the software packages that allow this type of analysis (SAS, MINITAB, BMDP, STATISTICA, S-PLUS, SYSTAT, etc.) are not in the public domain. Disseminating the use of multivariate techniques can improve the quality of research, save time and cost, and ease the interpretation of data structures without loss of information. In this work, some advantages of multivariate over univariate techniques were confirmed for data from agricultural experiments. The analyses were carried out with the R software, an open, "friendly", and free package with many statistical resources available.
APA, Harvard, Vancouver, ISO, and other styles
37

Jokela, N. (Nina). "Saksan ääntäminen laulusarjassa Frauenliebe und Leben:tutkimuskohteena eksperttilaulajat." Doctoral thesis, Oulun yliopisto, 2016. http://urn.fi/urn:isbn:9789526213125.

Full text
Abstract:
Abstract This research explored professional singers’ pronunciation of German in Frauenliebe und Leben, a song cycle, composed in 1840 by Robert Schumann (1810–1856). Adopting a two-fold approach, the thesis examined pronunciation from the viewpoints of normative German pronunciation and vocal pedagogy, paying special attention to the relationship between the two sets of norms. Normative German pronunciation is based on ‘reine Hochlautung’ as defined by Theodor Siebs, as well as on his instructions to classical singers. Norms of pronunciation used in vocal pedagogy were derived from 11 German pronunciation manuals for classical singers. At the centre of this study were 11 elements of the German language: the phonemes /s/, /r/, /b/, /d/ and /g/, the suffix -ig, double consonants, connecting consonants, the letter combination ng, liaison and diphthongs in a melismatic setting. Twelve professional singers’ – Anne Sofie von Otter, Barbara Bonney, Brigitte Fassbaender, Elly Ameling, Elisabeth Grümmer, Irmgard Seefried, Janet Baker, Jessye Norman, Kathleen Ferrier, Soile Isokoski, Marjana Lipovšek, Tamara Takács – pronunciation of these elements was studied via their recording of Frauenliebe und Leben between 1950 and 1997. Both descriptive statistical analysis methods and hierarchical cluster analysis were applied to the data. A reliability analysis was conducted using intra-category correlations and relative frequencies. This analysis indicated a reasonable degree of reliability. Descriptive statistical analysis methods yielded the result that the actual pronunciation of the professional singers, as evidenced by their recordings, was considerably more varied than what could be expected on the basis of normative or pedagogical norms. Hierarchical cluster analysis, too, showed that the professional singers shared the same category to a moderate degree. 
We may conclude that vocal pedagogical theory needs further development, because the pedagogical norms for pronunciation proved partially confusing and difficult to grasp. In addition, specific attention must be given to source criticism. The actual pronunciation of professional singers challenges singing instructors to reflect on how they teach German pronunciation and what they can do to help their students develop the skills required for critical listening of recordings.
APA, Harvard, Vancouver, ISO, and other styles
38

Łuksza, Marta [Verfasser]. "Cluster statistics and gene expression analysis / Marta Łuksza." Berlin : Freie Universität Berlin, 2012. http://d-nb.info/1026883113/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Dickerson, Cynthia Rose. "USING THE QBEST EQUATION TO EVALUATE ELLAGIC ACID SAFETY DATA: GENERATING A QNOAEL WITH CONFIDENCE LEVELS FROM DISPARATE LITERATURE." UKnowledge, 2018. https://uknowledge.uky.edu/pharmacy_etds/94.

Full text
Abstract:
QBEST, a novel statistical method, can be applied to the problem of estimating the No Observed Adverse Effect Level (NOAEL or QNOAEL) of a New Molecular Entity (NME) in order to anticipate a safe starting dose for beginning clinical trials. The NOAEL from QBEST (called the QNOAEL) can be calculated using multiple disparate studies in the literature and/or from the lab. The QNOAEL is similar in some ways to the Benchmark Dose Method (BMD) used widely in toxicological research, but is superior to the BMD in some ways. The QNOAEL simulation generates an intuitive curve that is comparable to the dose-response curve. The NOAEL of ellagic acid (EA) is calculated for clinical trials as a component therapeutic agent (in BSN476) for treating Chikungunya infections. Results are used in a simulation based on nonparametric cluster analysis methods to calculate confidence levels on the difference between the Effect and the No Effect studies. In order to evaluate the statistical power of the algorithm, simulated data clusters with known parameters are fed into the algorithm in a separate study, testing the algorithm’s accuracy and precision “Around the Compass Rose” at known coordinates along the circumference of a multidimensional data cluster. The specific aims of the proposed study are to evaluate the accuracy and precision of the QBEST Simulation and QNOAEL compared to the Benchmark Dose Method, and to calculate the QNOAEL of EA for BSN476 Drug Development.
APA, Harvard, Vancouver, ISO, and other styles
40

Xu, Ximing. "The statistical analysis of generalized adjacency and GA-clusters." Thesis, University of Ottawa (Canada), 2008. http://hdl.handle.net/10393/27657.

Full text
Abstract:
In this thesis I study a parametrized definition of gene clusters that permits control over the trade-off between increasing gene content and conserving gene order within a cluster. This is based on the notion of generalized adjacency, the property shared by any two genes no farther apart, in the linear order of a chromosome, than a fixed threshold parameter theta. We discuss the statistical properties of generalized adjacency (GA) and derive the limiting probability distribution of the number of GAs for random genomes. We also propose a test for gene clusters satisfying the generalized adjacency criterion under the null hypothesis that the genes are ordered randomly along the genomes.
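A minimal sketch of the generalized adjacency idea as stated above: two genes are generalized adjacent if they lie within theta positions of each other, and the statistic of interest counts GAs conserved between genomes. The gene names and toy genomes below are made up:

```python
from itertools import combinations

def ga_pairs(genome, theta):
    """Unordered gene pairs lying within theta positions of each other."""
    pos = {g: i for i, g in enumerate(genome)}
    return {frozenset((a, b))
            for a, b in combinations(genome, 2)
            if abs(pos[a] - pos[b]) <= theta}

def shared_ga(g1, g2, theta):
    """Generalized adjacencies conserved between two genomes."""
    return ga_pairs(g1, theta) & ga_pairs(g2, theta)

g1 = [1, 2, 3, 4, 5, 6]
g2 = [2, 1, 3, 6, 5, 4]
print(len(shared_ga(g1, g2, theta=1)))
```

With theta = 1 this reduces to classical gene adjacencies; raising theta admits pairs that are near but not contiguous, trading conserved order for larger gene content, which is the parametrized trade-off studied in the thesis.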
APA, Harvard, Vancouver, ISO, and other styles
41

Weißmann, Alexandra. "Statistical analysis of the X-ray morphology of galaxy clusters." Diss., Ludwig-Maximilians-Universität München, 2013. http://nbn-resolving.de/urn:nbn:de:bvb:19-165203.

Full text
Abstract:
The morphological analysis of galaxy clusters in X-rays allows a reliable determination of their dynamical state. Substructures on (sub-)Mpc scale influence the gravitational potential of a cluster and manifest themselves in the X-ray surface brightness distribution as secondary peaks or overall irregular shape. They lead to deviations from the hydrostatic equilibrium and spherical shape, two assumptions which are widely used in galaxy cluster studies to derive global astrophysical properties. Analyzing the X-ray morphology of clusters thus yields valuable information, provided that the employed substructure measures are well-tested and well-calibrated. In this work, the X-ray morphology of galaxy clusters is quantified using three common substructure parameters (power ratios, center shift and the asymmetry parameter), which are subsequently employed to study the disturbed cluster fraction as a function of redshift. To ensure a reliable application of these substructure parameters on a variety of X-ray images, a detailed parameter study is conducted. It focuses on the performance and reliability of the parameters for varying data quality using simulated and observed X-ray images. In particular, when applying them to X-ray images with low photon counts such as observations of distant clusters or survey data, it is important to know the characteristics of the parameters. Comparing the three substructure measures, the center shift parameter is most robust against Poisson noise and allows a reliable determination of the clusters' dynamical state even for low-count observations. Power ratios, especially the hexapole P3/P0, and the asymmetry parameter, on the other hand, are severely affected by noise, which results in spuriously high substructure signals. Furthermore, this work presents methods to minimize the noise bias. 
The results of the parameter study provide a step forward in the morphological analysis of high-redshift clusters and are employed in the framework of this thesis to quantify the evolution of the disturbed cluster fraction. The sample used for this analysis comprises 78 low-z (z < 0.3) and 51 high-z (0.3 < z < 1.08) galaxy clusters with varying photon statistics. The low-redshift objects were observed with the XMM-Newton observatory, contain a high number of photon counts and are part of several well-known and representative samples. For z > 0.3, the high-redshift subsets of the 400d2 and SPT survey catalog are used. These objects were mainly observed with the Chandra observatory and have low photon counts. To ensure a fair comparison, which is independent of the data quality, the photon statistics of the low- and high-redshift observations are aligned before performing the morphological analysis. In agreement with the hierarchical structure formation model, a mild positive evolution with redshift, i.e. a larger fraction of clusters with disturbed X-ray morphologies at higher redshift, is found. Owing to the low photon counts and small number of high-redshift observations, the statistical significance of this result is low. For two of the three substructure parameters (power ratios and center shift) the findings are also consistent within the significance limits with no evolution, but a negative evolution of the disturbed cluster fraction can be excluded for all parameters.
APA, Harvard, Vancouver, ISO, and other styles
42

Li, Na. "MMD and Ward criterion in a RKHS : application to Kernel based hierarchical agglomerative clustering." Thesis, Troyes, 2015. http://www.theses.fr/2015TROY0033/document.

Full text
Abstract:
Unsupervised classification consists in grouping objects to form homogeneous groups in the sense of a similarity measure. It is a useful tool for exploring the structure of an unlabeled data set. Moreover, kernel methods, initially introduced in the supervised setting, have demonstrated their value through their ability to perform nonlinear processing of data while limiting algorithmic complexity: they make it possible to transform a nonlinear problem into a linear one in a higher-dimensional space. In this work, we propose a hierarchical agglomerative clustering algorithm using the kernel-method formalism. We first sought similarity measures between probability distributions that are easily computable with kernels. Among these, the maximum mean discrepancy drew our attention. To overcome the limitations inherent in its use, we proposed a modification that leads to Ward's criterion, well known in hierarchical clustering. Finally, we proposed an iterative clustering algorithm based on kernel hierarchical clustering that optimizes the kernel and determines the number of clusters present.
Clustering, as a useful tool for unsupervised classification, is the task of grouping objects according to some measured or perceived characteristics, and it has enjoyed great success in exploring the hidden structure of unlabeled data sets. Kernel-based clustering algorithms have shown great prominence: they provide competitive performance compared with conventional methods owing to their ability to transform nonlinear problems into linear ones in a higher-dimensional feature space. In this work, we propose a Kernel-based Hierarchical Agglomerative Clustering (KHAC) algorithm using Ward's criterion. Our method is motivated by a recently introduced criterion called the Maximum Mean Discrepancy (MMD). This criterion was first proposed to measure the difference between distributions and can easily be embedded into an RKHS. Close relationships have been proved between MMD and Ward's criterion. In our KHAC method, the selection of the kernel parameter and the determination of the number of clusters have been studied and provide satisfactory performance. Finally, an iterative KHAC algorithm is proposed which aims at determining the optimal kernel parameter, giving a meaningful number of clusters and partitioning the data set automatically.
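The Maximum Mean Discrepancy at the heart of this thesis has a closed-form empirical estimate built from kernel evaluations alone: the squared RKHS distance between the two empirical mean embeddings. A short sketch of the biased estimator with an RBF kernel (the bandwidth and toy data are illustrative, not the thesis setup):

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased empirical estimate of the squared MMD between samples X
    and Y in the RKHS induced by an RBF kernel: the squared distance
    between the empirical mean embeddings of the two samples."""
    return (rbf_kernel(X, X, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean()
            - 2 * rbf_kernel(X, Y, sigma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
diff = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2)))
print(same, diff)  # same is close to 0, diff is much larger
```

Because the estimator only needs kernel sums, the same quantity can be evaluated between candidate clusters during agglomeration, which is what links it to Ward's criterion.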
APA, Harvard, Vancouver, ISO, and other styles
43

Harrington, Justin. "Extending linear grouping analysis and robust estimators for very large data sets." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/845.

Full text
Abstract:
Cluster analysis is the study of how to partition data into homogeneous subsets so that the partitioned data share some common characteristic. In one to three dimensions, the human eye can distinguish well between clusters of data if clearly separated. However, when there are more than three dimensions and/or the data is not clearly separated, an algorithm is required which needs a metric of similarity that quantitatively measures the characteristic of interest. Linear Grouping Analysis (LGA, Van Aelst et al. 2006) is an algorithm for clustering data around hyperplanes, and is most appropriate when: 1) the variables are related/correlated, which results in clusters with an approximately linear structure; and 2) it is not natural to assume that one variable is a “response”, and the remainder the “explanatories”. LGA measures the compactness within each cluster via the sum of squared orthogonal distances to hyperplanes formed from the data. In this dissertation, we extend the scope of problems to which LGA can be applied. The first extension relates to the linearity requirement inherent within LGA, and proposes a new method of non-linearly transforming the data into a Feature Space, using the Kernel Trick, such that in this space the data might then form linear clusters. A possible side effect of this transformation is that the dimension of the transformed space is significantly larger than the number of observations in a given cluster, which causes problems with orthogonal regression. Therefore, we also introduce a new method for calculating the distance of an observation to a cluster when its covariance matrix is rank deficient. The second extension concerns the combinatorial problem for optimizing a LGA objective function, and adapts an existing algorithm, called BIRCH, for use in providing fast, approximate solutions, particularly for the case when data does not fit in memory. 
We also provide solutions based on BIRCH for two other challenging optimization problems in the field of robust statistics, and demonstrate, via simulation study as well as application on actual data sets, that the BIRCH solution compares favourably to the existing state-of-the-art alternatives, and in many cases finds a more optimal solution.
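The orthogonal distances that LGA minimises can be computed by total least squares: the best-fit hyperplane through a cluster's centroid has as its normal the eigenvector of the covariance matrix with smallest eigenvalue. A small numpy sketch of that distance measure (not the authors' implementation, and ignoring the rank-deficient case the dissertation treats separately):

```python
import numpy as np

def orthogonal_fit(X):
    """Total least squares fit of a hyperplane through the centroid of X:
    its normal is the eigenvector of the covariance matrix with the
    smallest eigenvalue. Returns the sum of squared orthogonal distances,
    the within-cluster compactness measure that LGA minimises."""
    mu = X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))  # ascending order
    normal = vecs[:, 0]
    resid = (X - mu) @ normal          # signed orthogonal distances
    return mu, normal, (resid ** 2).sum()

# Points scattered tightly around the line y = 2x
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
X = np.column_stack([x, 2 * x + rng.normal(0, 0.01, 100)])
mu, normal, ss = orthogonal_fit(X)
print(ss)  # tiny, since the cloud is almost perfectly linear
```

Note that no variable is singled out as the response: the fit is symmetric in all coordinates, which is exactly the setting LGA is designed for.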
APA, Harvard, Vancouver, ISO, and other styles
44

Přikrylová, Veronika. "Analýza výkonnosti call centra pomocí statistických metod." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2013. http://www.nusl.cz/ntk/nusl-224253.

Full text
Abstract:
The master's thesis analyzes the key performance areas of a call centre that contacts debtors of a non-bank company providing loans. The author analyzes the collection process with a wide range of statistical methods and then proposes actions that would make this process more effective, leading to better performance of the whole call centre.
APA, Harvard, Vancouver, ISO, and other styles
45

Nugent, Rebecca. "Algorithms for estimating the cluster tree of a density /." Thesis, Connect to this title online; UW restricted, 2006. http://hdl.handle.net/1773/8963.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Barvenčík, Oldřich. "Statistické klasifikační metody." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2010. http://www.nusl.cz/ntk/nusl-229025.

Full text
Abstract:
The thesis deals with selected classification methods. It describes the basics of cluster analysis, discriminant analysis and the theory of classification trees. Their usage is demonstrated by classifying simulated data; the calculations are carried out in the program STATISTICA. In the practical part of the thesis, the methods are compared on real data files of various sizes. The classification methods are then used to solve a real task: predicting air pollution based on the weather forecast.
APA, Harvard, Vancouver, ISO, and other styles
47

Musil, Václav. "Analýza AVG signálů." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217717.

Full text
Abstract:
The presented thesis discusses the basic analysis methods for arteriovelocitograms (AVG). The core of this work rests in the classification of signals and a contribution to noninvasive diagnostic methods for evaluating patients with peripheral ischemic occlusive arterial disease. The classification employs multivariate statistical methods and principles of neural networks. The data processing works with an angiographically verified set of arteriovelocitogram data; digital subtraction angiography classified them into 3 separable classes depending on the degree of vascular stenosis. The AVG signals are represented in the program by 6 parameters measured at 3 different places on each patient's leg. Evaluating the disease proved most comprehensive when the signals acquired from the whole leg were considered. The sensitivity of the clustering method compared with angiography is between 82.75 % and 90.90 %, the specificity between 80.66 % and 88.88 %. Using neural networks, the sensitivity is in the range of 79.06 % to 96.87 % and the specificity in the range of 73.07 % to 91.30 %.
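The sensitivity and specificity figures quoted in this abstract follow from a standard confusion-matrix calculation; a tiny sketch with illustrative counts (not the study's actual confusion matrix):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN): fraction of diseased legs detected.
    Specificity = TN / (TN + FP): fraction of healthy legs cleared."""
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative counts only, chosen to land inside the quoted ranges
sens, spec = sensitivity_specificity(tp=29, fn=6, tn=25, fp=4)
print(round(100 * sens, 2), round(100 * spec, 2))  # 82.86 86.21
```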
APA, Harvard, Vancouver, ISO, and other styles
48

Ramos, Iloneide Carlos de Oliveira. "Metodologia estatística na solução do problema do caixeiro viajante e na avaliação de algoritmos : um estudo aplicado à transgenética computacional." Universidade Federal do Rio Grande do Norte, 2005. http://repositorio.ufrn.br:8080/jspui/handle/123456789/15175.

Full text
Abstract:
The problems of combinatorial optimization have involved a large number of researchers in the search for approximate solutions, since it is generally accepted that they are unsolvable in polynomial time. Initially, these solutions were focused on heuristics. Currently, metaheuristics are used more for this task, especially those based on evolutionary algorithms. The two main contributions of this work are: the creation of a heuristic, called 'Operon', for the construction of the information chains necessary for the implementation of transgenetic (evolutionary) algorithms, mainly using statistical methodology (Cluster Analysis and Principal Component Analysis); and the utilization of statistical analyses that are adequate for evaluating the performance of the algorithms developed to solve these problems. The aim of the Operon is to construct good-quality dynamic information chains to promote an 'intelligent' search in the space of solutions. The applications target the Traveling Salesman Problem (TSP) and are based on a transgenetic algorithm known as ProtoG. A strategy is also proposed for the renewal of part of the chromosome population, triggered by adopting a minimum limit on the coefficient of variation of the individuals' fitness function, calculated over the population. Statistical methodology is used to evaluate the performance of four algorithms: the proposed ProtoG, two memetic algorithms and a Simulated Annealing algorithm. Three performance analyses are proposed. The first is accomplished through Logistic Regression, based on the probability that the algorithm being tested finds the optimal solution of a TSP instance. The second is accomplished through Survival Analysis, based on a probability involving the execution time observed until an optimal solution is achieved. 
The third is accomplished by means of a non-parametric Analysis of Variance, considering the Percent Error of the Solution (PES), the percentage by which the solution found exceeds the best solution available in the literature. Six experiments were conducted on sixty-one instances of the Euclidean TSP with sizes of up to 1,655 cities. The first two experiments deal with tuning four parameters of the ProtoG algorithm in an attempt to improve its performance. The last four were undertaken to evaluate the performance of ProtoG in comparison to the three algorithms adopted. For these sixty-one instances, statistical tests give evidence that ProtoG performs better than these three algorithms on fifty instances. In addition, for the thirty-six instances considered in the last three experiments, in which performance was evaluated through PES, the average PES obtained with ProtoG was less than 1% in almost half of these instances, the largest average, 3.52%, being reached for an instance of 1,173 cities. Therefore, ProtoG can be considered a competitive algorithm for solving the TSP, since average PESs greater than 10% are commonly reported in the literature for instances of this size.
[Translated from Portuguese:] Combinatorial optimization problems have engaged a large number of researchers in the search for approximate solutions, since they are generally accepted to be unsolvable in polynomial time. Initially, these solutions were approached through heuristics; currently, metaheuristics are more often used for this task, especially those based on evolutionary algorithms. The two main contributions of this work are: the creation of a heuristic, called Operon, for building the information chains needed to implement transgenetic (evolutionary) algorithms, mainly using statistical methodology (Cluster Analysis and Principal Component Analysis); and the use of statistical analyses suited to evaluating the performance of algorithms designed to solve these problems. The Operon aims to build, dynamically and with good quality, information chains that promote an 'intelligent' search in the solution space. The Traveling Salesman Problem (TSP) is targeted in the applications, which are based on a transgenetic algorithm called ProtoG. A strategy is also proposed for renewing part of the chromosome population, triggered by a minimum limit on the coefficient of variation of the individuals' fitness function, computed over the population. Three statistical analyses are proposed to evaluate the performance of algorithms. The first uses Logistic Regression, based on the probability that the algorithm under test obtains the optimal solution of a TSP instance. The second uses Survival Analysis, based on a probability involving the execution time observed until the optimal solution is obtained. The third uses non-parametric Analysis of Variance, considering the Percent Error of the Solution (PES), the percentage by which the solution found exceeds the best solution available in the literature. This methodology is used to evaluate the performance of four algorithms: the proposed ProtoG, two memetic algorithms and a Simulated Annealing algorithm. Six experiments were carried out on sixty-one instances of the Euclidean TSP with sizes of up to 1,655 cities. The first two experiments deal with tuning four parameters of the ProtoG algorithm in order to improve its performance; the last four evaluate the performance of ProtoG in comparison with the three adopted algorithms. For these sixty-one instances, statistical tests give evidence that ProtoG outperforms the three algorithms on fifty instances. Moreover, for the thirty-six instances considered in the last three experiments, in which performance was evaluated via PES, the average PES obtained with ProtoG was below 1% in almost half of the instances, reaching its largest average, 3.52%, for an instance of 1,173 cities. Hence, ProtoG can be considered a competitive algorithm for solving the TSP, since average PESs greater than 10% are commonly reported in the literature for instances of this size.
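The Percent Error of the Solution used in the third analysis reduces to a one-line formula; a minimal sketch with made-up tour lengths (not values from the dissertation's benchmark):

```python
def percent_error(found, best_known):
    """Percent Error of the Solution (PES): the percentage by which a
    tour length found by a heuristic exceeds the best known length."""
    return 100.0 * (found - best_known) / best_known

# A tour of length 20,704 against a best known 20,000 gives PES = 3.52%
print(percent_error(20_704, 20_000))  # 3.52
```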
APA, Harvard, Vancouver, ISO, and other styles
49

Beghdadi, Azeddine. "Etude statistique de la morphologie des composés métalliques granulaires par analyse d'image." Paris 6, 1986. http://www.theses.fr/1986PA066281.

Full text
Abstract:
Presentation of an image-processing model adapted to electron-microscopy images of thin films of sintered metals and cermets; this processing allows an unambiguous thresholding of the image, whether the grey-level histogram is bimodal or unimodal. The validity of the processing is tested against physical measurements on real films. A statistical study of the morphology of the resulting binarized image is then carried out. The method is applied to granular gold films prepared by thermal evaporation under ultra-high vacuum; the results confirm the random character of the nucleation of the gold films under the experimental conditions of the study.
APA, Harvard, Vancouver, ISO, and other styles
50

Bjärkby, Sarah, and Sofia Grägg. "A Cluster Analysis of Stocks to Define an Investment Strategy." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252746.

Full text
Abstract:
This thesis investigates the possibilities of creating an investment strategy by performing a cluster analysis on stock returns. Clustering yields a diversified portfolio, which has multiple advantages, for instance a reduced investment risk. The cluster analysis was performed using several methods (Average linkage, Centroid and Ward's method) in order to determine the preferable one. According to the results, Ward's method was the most appropriate, since it was the only method providing an analysable result. The investment strategy was therefore based on the result of Ward's method. This resulted in a portfolio of eight stocks from four different clusters, the eight stocks representing four sectors. Most of the results were not interpretable, and some of the decisions regarding the number of clusters and the portfolio composition were not entirely scientific. This thesis should therefore be considered a first indication of the adequacy of using cluster analysis for creating an investment strategy.
[Translated from Swedish:] The report investigates the possibilities of formulating an investment strategy by performing a cluster analysis of stock returns. A cluster analysis is used for this purpose to create a diversified portfolio, which among other things can reduce the risk of investments. The methods applied in the cluster analysis were the Average linkage, Centroid and Ward's methods, which were compared with the aim of finding the most favourable one. According to the results, Ward's method is preferable, as it was the only method that gave a usable result. The investment strategy was therefore based on Ward's method, which resulted in a portfolio of eight stocks from four different clusters, the eight stocks representing four different sectors. Most of the results obtained from the methods could not be analysed, and the choice of the number of clusters and the construction of the portfolio were not made on scientific grounds. This report should therefore only be regarded as a first indication of the suitability of deriving an investment strategy from a cluster analysis.
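Ward's criterion, on which the chosen strategy rests, merges at each step the pair of clusters whose union least increases the within-cluster sum of squares. A self-contained numpy sketch on toy return series (the two-factor "sector" model, counts, and the naive O(n^3) loop are all illustrative, not the thesis data or implementation):

```python
import numpy as np

def ward_clusters(X, k):
    """Naive agglomerative clustering with Ward's criterion: repeatedly
    merge the pair of clusters whose union gives the smallest increase
    in the within-cluster sum of squares (teaching sketch, O(n^3))."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = X[clusters[i]], X[clusters[j]]
                # Ward merging cost: n_a*n_b/(n_a+n_b) * ||mean_a - mean_b||^2
                d = (len(a) * len(b) / (len(a) + len(b))
                     * ((a.mean(0) - b.mean(0)) ** 2).sum())
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

# Toy daily returns for 8 hypothetical stocks in two "sectors": returns
# within a sector share a common factor, so Ward's method groups them.
rng = np.random.default_rng(7)
fa, fb = rng.normal(0, 0.01, (2, 250))
X = np.vstack([fa + rng.normal(0, 0.002, 250) for _ in range(4)]
              + [fb + rng.normal(0, 0.002, 250) for _ in range(4)])
groups = ward_clusters(X, k=2)
print(sorted(sorted(g) for g in groups))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Picking one stock per resulting cluster is what produces the diversified portfolio the thesis describes.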
APA, Harvard, Vancouver, ISO, and other styles
