Dissertations / Theses on the topic 'Statistical multivariate analysis'

Consult the top 50 dissertations / theses for your research on the topic 'Statistical multivariate analysis.'


1

何志興 and Chi-hing Ho. "The statistical analysis of multivariate counts." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1991. http://hub.hku.hk/bib/B31232218.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Ho, Chi-hing. "The statistical analysis of multivariate counts /." [Hong Kong] : University of Hong Kong, 1991. http://sunzi.lib.hku.hk/hkuto/record.jsp?B12922602.

Full text
3

Lawrence, James. "A Multivariate Statistical Analysis of Stock Trends." Miami University Honors Theses / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=muhonors1111001677.

Full text
4

Stensholt, B. K. "Statistical analysis of multivariate bilinear time series models." Thesis, University of Manchester, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.582853.

Full text
Abstract:
In the last thirty years there has been extensive research on the analysis of linear time series models. In analyzing univariate and multivariate time series, the assumption of linearity is in many cases unrealistic. With this in view, many nonlinear models for the analysis of time series have recently been proposed, mainly for univariate series. One class of proposed models that has received considerable interest is the class of bilinear models. In particular, the theory of univariate bilinear time series has been considered in a number of papers (cf. Granger and Andersen (1978), Subba Rao (1981), Bhaskara Rao et al. (1983) and references therein); these models are analogues of the bilinear systems proposed and studied previously by control theorists. Recently several analytic properties of these time series models have been investigated, and their estimation and applications have been reported in Subba Rao and Gabr (1983). But it is important to study the relationship between two or more time series, also in the presence of nonlinearity. Therefore, multivariate generalizations of the bilinear models have been considered by Subba Rao (1985) and Stensholt and Tjøstheim (1985, 1987). Here we consider some theoretical aspects of multivariate bilinear time series models (such as strict and second-order stationarity, ergodicity, invertibility, and, for special cases, strong consistency of least squares estimates). The theory developed is illustrated with simulation results. Two applications to real bivariate data (mink-muskrat data and "housing starts-houses sold" data) and the FORTRAN programs developed in this project are also included.
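The model class can be made concrete in a few lines. The sketch below simulates the simplest scalar bilinear model, X_t = a*X_{t-1} + b*X_{t-1}*e_{t-1} + e_t; the coefficients and sample size are illustrative only and not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_bilinear(n, a=0.4, b=0.3, burn=200):
    """Simulate X_t = a*X_{t-1} + b*X_{t-1}*e_{t-1} + e_t with N(0,1) noise."""
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = a * x[t - 1] + b * x[t - 1] * e[t - 1] + e[t]
    return x[burn:]                      # drop the burn-in transient

x = simulate_bilinear(5000)
print(f"mean={x.mean():.2f}, sd={x.std():.2f}")
```

Unlike a linear AR(1) driven by zero-mean noise, this series has a nonzero mean produced by the product term (E[X] = b/(1-a) for unit noise variance here), one of the second-order properties such models exhibit.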
5

Wang, Lianming. "Statistical analysis of multivariate interval-censored failure time data." Diss., Columbia, Mo. : University of Missouri-Columbia, 2006. http://hdl.handle.net/10355/4375.

Full text
Abstract:
Thesis (Ph.D.)--University of Missouri-Columbia, 2006.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears within the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on May 2, 2007). Vita. Includes bibliographical references.
6

Chen, Man-Hua. "Statistical analysis of multivariate interval-censored failure time data." Diss., Columbia, Mo. : University of Missouri-Columbia, 2007. http://hdl.handle.net/10355/4776.

Full text
Abstract:
Thesis (Ph.D.)--University of Missouri-Columbia, 2007.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears within the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on March 6, 2009). Includes bibliographical references.
7

Pan, Jian-Xin. "Multivariate statistical diagnostics with application to the growth curve model." HKBU Institutional Repository, 1996. http://repository.hkbu.edu.hk/etd_ra/64.

Full text
8

Chowdhury, Ashraful Aziz. "Path analysis : a multivariate statistical procedure for nuptiality studies." Virtual Press, 1986. http://liblink.bsu.edu/uhtbin/catkey/445620.

Full text
Abstract:
This thesis may be broadly divided into two parts. The first part critically discusses various aspects of path analysis as a statistical tool for abstract analysis. The second part investigates the changing pattern of nuptiality in Bangladesh by district using 1981 census data. Path analysis has been applied to find and analyze the nature and extent of the causal relationship between the dependent variable, nuptiality, and its determinants. It is observed that education, urbanization, female employment, and economic development are all strongly positively related to nuptiality. That is, a formal and effective education policy, combined with proper urbanization and development policies, may increase the female employment rate, which in turn will raise age at marriage. Further, the effects of the various indices through childlessness are reasonably high. This indicates that appropriate population distribution policies and the introduction of insurance schemes for childless couples and couples with fewer children may indirectly have a positive effect on nuptiality.
Ball State University, Muncie, IN 47306
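The core identity of path analysis (a zero-order correlation decomposes into a direct path plus correlation-weighted indirect paths) can be verified numerically. The sketch below uses synthetic data for a hypothetical education -> employment -> nuptiality chain; the variable names and coefficients are invented for illustration and are not the Bangladesh census data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy causal chain: education -> female employment -> nuptiality (synthetic).
n = 10000
educ = rng.standard_normal(n)
employ = 0.6 * educ + np.sqrt(1 - 0.6**2) * rng.standard_normal(n)
nupt = 0.3 * educ + 0.5 * employ + 0.5 * rng.standard_normal(n)

def std(v):
    return (v - v.mean()) / v.std()

# Path coefficients = standardized OLS coefficients of the outcome equation.
Z = np.column_stack([std(educ), std(employ)])
beta = np.linalg.lstsq(Z, std(nupt), rcond=None)[0]

p21 = np.corrcoef(educ, employ)[0, 1]        # path educ -> employ
total = beta[0] + p21 * beta[1]              # direct + indirect effect
print(f"total effect {total:.3f} vs correlation "
      f"{np.corrcoef(educ, nupt)[0, 1]:.3f}")
```

The printed total effect matches the zero-order correlation between education and nuptiality exactly, which is the decomposition path analysis exploits.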
9

Ahmadi-Nedushan, Behrooz 1966. "Multivariate statistical analysis of monitoring data for concrete dams." Thesis, McGill University, 2002. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=82815.

Full text
Abstract:
Major dams in the world are often instrumented in order to validate numerical models, to gain insight into the behavior of the dam, to detect anomalies, and to enable a timely response, whether in the form of repairs, reservoir management, or evacuation. Advances in automated data monitoring systems make it possible to regularly collect data on a large number of instruments for a dam. Managing this data is a major concern, since traditional means of monitoring each instrument are time-consuming and personnel-intensive. Among the tasks that need to be performed are: identification of faulty instruments, removal of outliers, data interpretation, model fitting, and management of alarms for detecting statistically significant changes in the response of a dam.
Statistical models such as multiple linear regression and back-propagation neural networks have been used to estimate the response of individual instruments. Multiple linear regression models are of two kinds: (1) Hydro-Seasonal-Time (HST) models and (2) models that consider concrete temperatures as predictors.
Univariate, bivariate, and multivariate methods are proposed for the identification of anomalies in the instrumentation data. The source of these anomalies can be bad readings, faulty instruments, or changes in dam behavior.
The proposed methodologies are applied to three different dams, Idukki, Daniel Johnson, and Chute-a-Caron, which are respectively an arch, a multiple-arch, and a gravity dam. Displacements, strains, flow rates, and crack openings of these three dams are analyzed.
This research also proposes various multivariate statistical analysis and artificial neural network techniques to analyze dam monitoring data. One of these methods, Principal Component Analysis (PCA), is concerned with explaining the variance-covariance structure of a data set through a few linear combinations of the original variables. The general objectives are (1) data reduction and (2) data interpretation. Other multivariate analysis methods such as canonical correlation analysis, partial least squares, and nonlinear principal component analysis are discussed. The advantages of these methodologies for noise reduction, reduction of the number of variables that have to be monitored, prediction of response parameters, and identification of faulty readings are discussed. Results indicated that dam responses are generally correlated and that only a few principal components can summarize the behavior of a dam.
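The finding that a few principal components summarize correlated dam responses is easy to illustrate. The sketch below builds a toy matrix of six hypothetical displacement sensors driven by one common seasonal factor plus noise (not the actual Idukki, Daniel Johnson, or Chute-a-Caron data) and computes PCA via an SVD of the centred data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Six synthetic sensors sharing one seasonal driver, one reading per day.
t = np.arange(365)
season = np.sin(2 * np.pi * t / 365.0)
X = np.outer(season, rng.uniform(0.5, 2.0, 6)) + 0.1 * rng.standard_normal((365, 6))

# PCA via SVD of the centred data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)            # variance ratio per component
print(f"variance explained by PC1: {explained[0]:.2%}")
```

Because the sensors share a single driver, the first component dominates; monitoring that one score (rather than six raw series) is the data-reduction idea described above.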
10

Woldegeorgis, Fasil. "Analysis on seroepidemiology of pertussis, a multivariate statistical approach." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ36539.pdf.

Full text
11

Coles, Stuart. "Statistical methodology for the multivariate analysis of environmental extremes." Thesis, University of Sheffield, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.358244.

Full text
12

Agrawala, Gautam Kumar. "Regional ground water interpretation using multivariate statistical methods." To access this resource online via ProQuest Dissertations and Theses @ UTEP, 2007. http://0-proquest.umi.com.lib.utep.edu/login?COPT=REJTPTU0YmImSU5UPTAmVkVSPTI=&clientId=2515.

Full text
13

Shah, Nauman. "Statistical dynamical models of multivariate financial time series." Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:428015e6-8a52-404e-9934-0545c80da4e1.

Full text
Abstract:
The last few years have witnessed an exponential increase in the availability and use of financial market data, which is sampled at increasingly high frequencies. Extracting useful information about the dependency structure of a system from these multivariate data streams has numerous practical applications and can aid in improving our understanding of the driving forces in the global financial markets. These large and noisy data sets are highly non-Gaussian in nature and require the use of efficient and accurate interaction-measurement approaches for their analysis in a real-time environment. However, most frequently used measures of interaction have certain limitations to their practical use, such as the assumption of normality or computational complexity. This thesis has two major aims: first, to address this lack of suitable methods by presenting a set of approaches to dynamically measure symmetric and asymmetric interactions, i.e. causality, in multivariate non-Gaussian signals in a computationally efficient (online) framework; and second, to make use of these approaches to analyse multivariate financial time series in order to extract interesting and practically useful information from financial data. Most of our proposed approaches are primarily based on independent component analysis, a blind source separation method which makes use of higher-order statistics to capture information about the mixing process which gives rise to a set of observed signals. Knowledge of this information allows us to investigate the information coupling dynamics, as well as to study the asymmetric flow of information, in multivariate non-Gaussian data streams. We extend our multivariate interaction models, using a variety of statistical techniques, to study the scale-dependent nature of interactions and to analyse dependencies in high-dimensional systems using complex coupling networks.
We carry out a detailed theoretical, analytical and empirical comparison of our proposed approaches with some other frequently used measures of interaction, and demonstrate their comparative utility, efficiency and accuracy using a set of practical financial case studies, focusing primarily on the foreign exchange spot market.
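The blind-source-separation idea underlying this approach can be illustrated with a two-signal toy problem. The sketch below mixes two independent heavy-tailed (Laplace) sources through an invented mixing matrix and recovers them with scikit-learn's FastICA, which exploits exactly the non-Gaussianity the abstract emphasizes; the data are synthetic, not market returns.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(6)

# Two independent heavy-tailed sources and an invented 2x2 mixing matrix.
S = rng.laplace(size=(2000, 2))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = S @ A.T                                # observed mixed signals

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)

# Recovered sources match the true ones up to permutation and sign.
corr = np.abs(np.corrcoef(S.T, S_hat.T)[:2, 2:])
print(corr.round(2))
```

Each true source correlates almost perfectly with one recovered component; with Gaussian sources this separation would be impossible, which is why higher-order statistics are essential here.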
14

Ounpraseuth, Songthip T. Young Dean M. "Selected topics in statistical discriminant analysis." Waco, Tex. : Baylor University, 2006. http://hdl.handle.net/2104/4883.

Full text
15

Shiells, Helen. "Advanced multivariate statistical analysis of directly and indirectly observed systems." Thesis, University of Aberdeen, 2017. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=234061.

Full text
16

Hetzer, Joel D. Johnston Dennis A. "Statistical considerations in the analysis of multivariate Phase II testing." Waco, Tex. : Baylor University, 2008. http://hdl.handle.net/2104/5277.

Full text
17

Villasante, Tezanos Alejandro G. "COMPOSITE NONPARAMETRIC TESTS IN HIGH DIMENSION." UKnowledge, 2019. https://uknowledge.uky.edu/statistics_etds/42.

Full text
Abstract:
This dissertation focuses on the problem of making high-dimensional inference for two or more groups. High-dimensional means both the sample size (n) and the dimension (p) tend to infinity, possibly at different rates. Classical approaches for group comparisons fail in the high-dimensional situation, in the sense that they have incorrect sizes and low power. Much has been done in recent years to overcome these problems. However, these recent works make restrictive assumptions in terms of the number of treatments to be compared and/or the distribution of the data. This research aims to (1) propose and investigate refined small-sample approaches for high-dimensional data in the multi-group setting, (2) propose and study a fully nonparametric approach, and (3) conduct an extensive simulation comparison of the proposed methods with some existing ones. When treatment effects can meaningfully be formulated in terms of means, a semiparametric approach under equal and unequal covariance assumptions is investigated. Composites of F-type statistics are used to construct two tests. One test is a moderate-p version, in which the test statistic is centered by its asymptotic mean, and the other is a large-p version that applies an asymptotic-expansion-based finite-sample correction to the mean of the test statistic. These tests do not make any distributional assumptions and are therefore nonparametric in a sense. The theory for the tests requires only mild assumptions to regulate the dependence. Simulation results show that, for moderately small samples, the large-p version yields a substantial gain in size accuracy with a small power tradeoff. In some situations mean-based inference is not appropriate, for example for data on an ordinal scale or with heavy tails. For these situations, a high-dimensional fully nonparametric test is proposed. In the two-sample situation, a composite of Wilcoxon-Mann-Whitney type tests is investigated.
The assumptions needed are weaker than those of the semiparametric approach. Numerical comparisons with the moderate-p version of the semiparametric approach show that the nonparametric test has very similar size but achieves superior power, especially for skewed data with some amount of dependence between variables. Finally, we conduct an extensive simulation to compare our proposed methods with other nonparametric tests and rank-transformation methods. A wide spectrum of simulation settings is considered, including a variety of heavy-tailed and skewed data distributions, homoscedastic and heteroscedastic covariance structures, various amounts of dependence, and choices of the tuning (smoothing window) parameter for the asymptotic variance estimators. The fully nonparametric and rank-transformation methods behave similarly in terms of type I and type II errors. However, the two approaches differ fundamentally in their hypotheses. Although there are no formal mathematical proofs for the rank transformations, they have a tendency to provide immunity against the effects of outliers. From a theoretical standpoint, our nonparametric method essentially uses variable-by-variable ranking, which naturally arises from estimating the nonparametric effect of interest. As a result, our method is invariant under any monotone marginal transformations. For a more practical comparison, real data from an electroencephalogram (EEG) experiment are analyzed.
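The variable-by-variable ranking construction can be sketched directly. The snippet below forms a naive composite of per-variable standardised Wilcoxon-Mann-Whitney statistics on synthetic two-sample data; it assumes continuous data (no tie correction) and omits the refined variance estimation the dissertation develops, so it only illustrates the shape of the construction.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(2)

def composite_wmw(X, Y):
    """Average of variable-wise standardised WMW statistics, scaled by sqrt(p)."""
    n1, p = X.shape
    n2 = Y.shape[0]
    stats = np.empty(p)
    for j in range(p):
        r = rankdata(np.concatenate([X[:, j], Y[:, j]]))
        w = r[:n1].sum()                         # rank sum of group 1
        mu = n1 * (n1 + n2 + 1) / 2.0            # null mean of the rank sum
        sd = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
        stats[j] = (w - mu) / sd
    return stats.mean() * np.sqrt(p)             # composite statistic

# Null case: two groups, p = 200 variables, no group difference.
X = rng.standard_normal((20, 200))
Y = rng.standard_normal((25, 200))
z = composite_wmw(X, Y)
print(f"composite statistic: {z:.2f}")
```

Under the null with independent variables this composite is approximately standard normal, so a value near zero is expected here; dependence between variables is what the dissertation's variance estimators must account for.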
18

李友榮 and Yau-wing Lee. "Modelling multivariate survival data using semiparametric models." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2000. http://hub.hku.hk/bib/B4257528X.

Full text
19

Yan, Lipeng. "The application of multivariate statistical analysis and optimization to batch processes." Thesis, University of Manchester, 2015. https://www.research.manchester.ac.uk/portal/en/theses/the-application-of-multivariate-statistical-analysis-and-optimization-to-batch-processes(e6dbe45d-94bb-4e84-a12f-542876af54f5).html.

Full text
Abstract:
Multivariate statistical process control (MSPC) techniques play an important role in industrial batch process monitoring and control. This research illustrates the capabilities and limitations of existing MSPC technologies, with a particular focus on partial least squares (PLS). In modern industry, batch processes often operate over relatively large operating regions, with many chemical and physical systems displaying nonlinear behavior. However, the linear PLS model cannot predict nonlinear systems, and hence nonlinear extensions to PLS may be required. Nonlinear PLS models can be divided into Type I and Type II. In the Type I nonlinear PLS method, the observed variables are appended with nonlinear transformations. In contrast, the Type II nonlinear PLS method assumes a nonlinear relationship within the latent-variable structure of the model. Type I and Type II nonlinear multi-way PLS (MPLS) models were applied to predict the endpoint value of the product in a benchmark simulation of a penicillin batch fermentation process. By analysing and comparing linear MPLS and Type I and Type II nonlinear MPLS models, the advantages and limitations of these methods were identified and summarized. Due to the limitations of the Type I and II nonlinear PLS models, in this study neural network PLS (NNPLS) was proposed and applied to predict final product quality in the batch process. The application of the NNPLS method is presented with comparison to the linear PLS method and to the Type I and Type II nonlinear PLS methods. Multi-way NNPLS was found to produce the most accurate results, with the added advantage that no a priori information regarding the order of the dynamics was required. The NNPLS model was also able to identify nonlinear system dynamics in the batch process. Finally, NNPLS was used to build the controller, and the NNPLS method was combined with the endpoint control algorithm.
The proposed controller was able to keep the endpoint values of penicillin and biomass concentration at a set-point.
20

Hervás, Marín David. "Use of multivariate statistical methods for the analysis of metabolomic data." Doctoral thesis, Universitat Politècnica de València, 2019. http://hdl.handle.net/10251/130847.

Full text
Abstract:
[EN] In the last decades, advances in technology have enabled the gathering of an increasing amount of data in the fields of biology and biomedicine. The so-called "-omics" technologies, such as genomics, epigenomics, transcriptomics or metabolomics, among others, produce hundreds, thousands or even millions of variables per data set. The analysis of omic data presents different complexities, both methodological and computational. This has driven a revolution in the development of new statistical methods specifically designed for dealing with this type of data. To these methodological complexities one must add the logistic and economic restrictions usually present in scientific research projects, which lead to small sample sizes paired with these wide data sets. This makes the analyses even harder, since there is a problem in having many more variables than observations. Among the methods developed to deal with this type of data there are some based on the penalization of coefficients, such as lasso or elastic net, others based on projection techniques, such as PCA or PLS, and others based on regression or classification trees and ensemble methods such as random forest. All these techniques work fine when dealing with omic data in matrix format (IxJ), but sometimes these IxJ data sets can be expanded by taking, for example, repeated measurements at different time points for each individual, giving IxJxK data sets that raise further methodological complications. These data sets are called three-way data. In these cases, the majority of the cited techniques lose all or a good part of their applicability, leaving very few viable options for the analysis of this type of data structure. One useful tool for analyzing three-way data, when some Y data structure is to be predicted, is N-PLS.
N-PLS reduces the inclusion of noise in the models and obtains more robust parameters compared to PLS while, at the same time, producing easy-to-understand plots. Related to the problem of small sample sizes and exorbitant variable numbers comes the issue of variable selection. Variable selection is essential for facilitating biological interpretation of the results when analyzing omic data sets. Often, the aim of the study is not only to predict the outcome, but also to understand why it happens and which variables are involved. It is also of interest to be able to perform new predictions without having to collect all the variables again. Because of all this, the main goal of this thesis is to improve the existing methods for omic data analysis, specifically those dealing with three-way data, by incorporating variable selection and improving the predictive capacity and interpretability of results. All this is implemented in a fully documented R package that includes all the functions necessary for performing complete analyses of three-way data. The work included in this thesis consists of a first theoretical-conceptual part, where the idea and development of the algorithm take place, together with its tuning, validation and performance assessment; a second empirical-practical part, where the algorithm is compared to other variable selection methodologies; and an additional programming and software development part, where the R package development, functionality and capabilities are presented. The development and validation of the technique, as well as the publication of the R package, has opened many future research lines.
Hervás Marín, D. (2019). Use of multivariate statistical methods for the analysis of metabolomic data [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/130847
21

Mudavanhu, Precious. "A brief introduction to basic multivariate economic statistical process control." Thesis, Stellenbosch : Stellenbosch University, 2012. http://hdl.handle.net/10019.1/71679.

Full text
Abstract:
Thesis (MComm)--Stellenbosch University, 2012.
ENGLISH ABSTRACT: Statistical process control (SPC) plays a very important role in monitoring and improving industrial processes to ensure that products produced or shipped to the customer meet the required specifications. The main tool used in SPC is the statistical control chart. The traditional approach to statistical control chart design assumed that a process is described by a single quality characteristic. However, according to Montgomery and Klatt (1972), industrial processes and products can have more than one quality characteristic, and their joint effect describes product quality. Process monitoring in which several related variables are of interest is referred to as multivariate statistical process control (MSPC). The most vital and commonly used tool in MSPC is, as in SPC, the statistical control chart. The design of a control chart requires the user to select three parameters: the sample size n, the sampling interval h, and the control limits k. Several authors have developed control charts based on more than one quality characteristic; among them was Hotelling (1947), who pioneered the use of multivariate process control techniques through the development of the T²-control chart, well known as Hotelling's T²-control chart. Since the introduction of the control chart technique, the most common and widely used method of control chart design has been the statistical design. However, according to Montgomery (2005), the design of control charts also has economic implications. Costs are incurred during the operation of a control chart, namely: costs of sampling and testing, costs associated with investigating an out-of-control signal and possibly correcting any assignable cause found, costs associated with the production of nonconforming products, etc. This paper gives an overview of the different methods or techniques that have been employed to develop economic statistical models for MSPC.
The first multivariate economic model presented in this paper is the economic design of Hotelling's T²-control chart to maintain current control of a process, developed by Montgomery and Klatt (1972). This is followed by the work of Kapur and Chao (1996), in which the concept of creating a specification region for the multiple quality characteristics, together with the use of a multivariate quality loss function, is implemented to minimize the total loss to both the producer and the customer. Another approach, by Chou et al. (2002), is also presented, in which a procedure is developed that simultaneously monitors the process mean and covariance matrix through the use of a quality loss function. The procedure is based on the test statistic −2 ln L, and the cost model builds on the ideas of Montgomery and Klatt (1972) as well as Kapur and Chao (1996). One example of the use of the variable sample size technique in the economic and economic statistical design of the control chart is also presented. Specifically, an economic and economic statistical design of the T²-control chart with two adaptive sample sizes (Faraz et al., 2010) is presented. Faraz et al. (2010) developed a cost model of a variable sample size T²-control chart for the economic and economic statistical design using Lorenzen and Vance's (1986) model. There are several other approaches to the multivariate economic statistical process control (MESPC) problem, but in this project the focus is on cases based on phase II of the process, where the mean vector and the covariance matrix have been fairly well established and can be taken as known, but both are subject to assignable causes. This latter aspect is often ignored by researchers. Nevertheless, the article by Faraz et al. (2010) is included to give more insight into how more sophisticated approaches may fit in with MESPC, even if only the mean vector may be subject to assignable causes.
Keywords: control chart; statistical process control; multivariate statistical process control; multivariate economic statistical process control; multivariate control chart; loss function.
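The Hotelling T²-control chart at the centre of this abstract can be made concrete with a small sketch. This is an illustration, not the thesis's own code: it assumes a phase II setting in which the in-control mean vector and covariance matrix are known, and uses the standard chi-square upper control limit for that case.

```python
import numpy as np
from scipy.stats import chi2

# Phase II Hotelling T^2 statistic with known in-control parameters
# (an assumption made for illustration, as discussed in the abstract).
def hotelling_t2(x, mu, sigma_inv):
    """T^2 = (x - mu)' Sigma^{-1} (x - mu) for one observation vector x."""
    d = np.asarray(x) - mu
    return float(d @ sigma_inv @ d)

mu = np.zeros(2)                               # in-control mean vector
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])     # in-control covariance
sigma_inv = np.linalg.inv(sigma)

# In control, T^2 follows a chi-square distribution with p degrees of
# freedom, so a natural upper control limit at alpha = 0.005 is:
ucl = chi2.ppf(0.995, df=2)

in_control = hotelling_t2([0.2, -0.1], mu, sigma_inv)   # stays below ucl
shifted = hotelling_t2([4.0, 4.0], mu, sigma_inv)       # signals
```

The sample size n, sampling interval h, and limit k (here a chi-square quantile) are exactly the three design parameters the abstract mentions; an economic design would choose them to minimise expected cost rather than fixing a false-alarm rate.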
APA, Harvard, Vancouver, ISO, and other styles
22

Yue, Hongyu. "Multivariate statistical monitoring and diagnosis with applications in semiconductor processes /." Digital version accessible at:, 2000. http://wwwlib.umi.com/cr/utexas/main.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Yoon, Seongkyu. "Using external information for statistical process control /." *McMaster only, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
24

Lee, David, and 李大為. "Statistical inference of a threshold model in extreme value analysis." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2012. http://hub.hku.hk/bib/B4819945X.

Full text
Abstract:
In many data sets, a mixture distribution formulation applies when it is known that each observation comes from one of the underlying categories. Even if there are no apparent categories, an implicit categorical structure may justify a mixture distribution. This thesis concerns the modeling of extreme values in such a setting within the peaks-over-threshold (POT) approach. Specifically, the traditional POT modeling using the generalized Pareto distribution is augmented in the sense that, in addition to threshold exceedances, data below the threshold are also modeled by means of the mixture exponential distribution. In the first part of this thesis, the conventional frequentist approach is applied for data modeling. In view of the mixture nature of the problem, the EM algorithm is employed for parameter estimation, where closed-form expressions for the iterates are obtained. A simulation study is conducted to confirm the suitability of such method, and the observation of an increase in standard error due to the variability of the threshold is addressed. The model is applied to two real data sets, and it is demonstrated how computation time can be reduced through a multi-level modeling procedure. With the fitted density, it is possible to derive many useful quantities such as return periods and levels, value-at-risk, expected tail loss and bounds for ruin probabilities. A likelihood ratio test is then used to justify model choice against the simpler model where the thin-tailed distribution is homogeneous exponential. The second part of the thesis deals with a fully Bayesian approach to the same model. It starts with the application of the Bayesian idea to a special case of the model where a closed-form posterior density is computed for the threshold parameter, which serves as an introduction. 
This is extended to the threshold mixture model by the use of the Metropolis-Hastings algorithm to simulate samples from a posterior distribution known up to a normalizing constant. The concept of depth functions is proposed in multidimensional inference, where a natural ordering does not exist. Such methods are then applied to real data sets. Finally, the issue of model choice is considered through the use of posterior Bayes factor, a criterion that stems from the posterior density.
published_or_final_version
Statistics and Actuarial Science
Master
Master of Philosophy
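The peaks-over-threshold step described in this abstract, fitting a generalized Pareto distribution (GPD) to the excesses over a threshold, can be sketched in a few lines. This is an illustration on simulated exponential data, not the author's code: the threshold is fixed at an arbitrary quantile, whereas the thesis treats it as a model parameter, and the mixture-exponential bulk with its EM step is omitted.

```python
import numpy as np
from scipy.stats import expon, genpareto

# Simulated data standing in for an observed series (an assumption).
rng = np.random.default_rng(42)
data = expon.rvs(scale=1.0, size=5000, random_state=rng)

u = np.quantile(data, 0.95)        # illustrative fixed threshold
excesses = data[data > u] - u      # exceedances shifted to start at 0

# Fit a GPD to the excesses, fixing the location at 0; for exponential
# data the true shape is 0 and the true scale is 1.
shape, loc, scale = genpareto.fit(excesses, floc=0)
```

Quantities such as return levels or value-at-risk then follow from combining quantiles of the fitted tail with a model for the data below u.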
APA, Harvard, Vancouver, ISO, and other styles
25

Stempski, Mark Owen. "Multivariate statistical strategies for the diagnosis of space-occupying liver disease." Diss., The University of Arizona, 1987. http://hdl.handle.net/10150/184280.

Full text
Abstract:
This dissertation investigated the use of a variety of multivariate statistical procedures to answer questions regarding the value of a number of medical tests and procedures in the diagnosis of space-occupying liver disease. Also investigated were some aspects of test ordering behavior by physicians. A basic methodology was developed to deal with archival data. A number of methodological problems were addressed. Discriminant function analysis was used to determine which procedures and tests served to provide the best classification of disease entities. Although the results were not spectacular, some variables, including a physical examination variable and a number of laboratory procedures, were identified as being important. A more detailed analysis of the role of the laboratory variables was afforded by the use of stepwise logistic regression. In these analyses pairs of disease classifications were compared. Two of the more specific laboratory tests, total bilirubin and alkaline phosphatase, entered into the equations to provide a fit to the data. Logistic regression analyses employing patient variables mirrored the results obtained with the discriminant function analyses. Liver-spleen scan indicants were also employed as predictor variables in a series of logistic regression analyses. In general, for a range of comparisons, those indicants cited in the literature as being valuable in discriminating between disease entities entered into the equations. Log-linear models were used to investigate test ordering behavior. In general, test ordering was independent of department, the sole exception being the Gynecology-Oncology department, which relies heavily on ultrasound. Log-linear analyses investigating the use of a number of procedures showed differential use of procedures consistent with what is usually suggested in the medical literature for the combination of different imaging and more specialized procedures.
Finally, a set of analyses investigated the ordering of a number of procedures relative to specific disease classifications. This set of analyses suffers, as do a number of the other analyses, from insufficient numbers of cases. However, some indications of differential performance of tests for different disease classifications were evident. Suggestions for further study concentrated on the development of experimental procedures given the results of this study.
APA, Harvard, Vancouver, ISO, and other styles
26

Murphy, Terrence Edward. "Multivariate Quality Control Using Loss-Scaled Principal Components." Diss., Available online, Georgia Institute of Technology, 2004:, 2004. http://etd.gatech.edu/theses/available/etd-11222004-122326/unrestricted/murphy%5Fterrence%5Fe%5F200412%5Fphd.pdf.

Full text
Abstract:
Thesis (Ph. D.)--Industrial and Systems Engineering, Georgia Institute of Technology, 2005.
Victoria Chen, Committee Co-Chair ; Kwok Tsui, Committee Chair ; Janet Allen, Committee Member ; David Goldsman, Committee Member ; Roshan Vengazhiyil, Committee Member. Vita. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
27

Beltran, Luis. "NONPARAMETRIC MULTIVARIATE STATISTICAL PROCESS CONTROL USING PRINCIPAL COMPONENT ANALYSIS AND SIMPLICIAL DEPTH." Doctoral diss., University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4080.

Full text
Abstract:
Although there has been progress in the area of Multivariate Statistical Process Control (MSPC), there are numerous limitations as well as unanswered questions with the current techniques. MSPC charts plotting Hotelling's T2 require the normality assumption for the joint distribution among the process variables, which is not feasible in many industrial settings, hence the motivation to investigate nonparametric techniques for multivariate data in quality control. In this research, the goal will be to create a systematic distribution-free approach by extending current developments and focusing on the dimensionality reduction using Principal Component Analysis. The proposed technique is different from current approaches given that it creates a nonparametric control chart using robust simplicial depth ranks of the first and last set of principal components to improve signal detection in multivariate quality control with no distributional assumptions. The proposed technique has the advantages of ease of use and robustness in MSPC for monitoring variability and correlation shifts. By making the approach simple to use in an industrial setting, the probability of adoption is enhanced. Improved MSPC can result in a cost savings and improved quality.
Ph.D.
Department of Industrial Engineering and Management Systems
Engineering and Computer Science
Industrial Engineering and Management Systems
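Simplicial depth, the distribution-free ranking tool named in this abstract, has a simple definition in two dimensions: the depth of a point is the fraction of triangles formed by triples of sample points that contain it. The naive O(n³) sketch below is an illustration, not the dissertation's implementation; in the proposed chart such depth values would be computed on principal component scores rather than raw observations.

```python
from itertools import combinations

# Naive 2-D simplicial depth: depth(p) = fraction of sample triangles
# that contain p. Points on a triangle's boundary count as inside.
def _sign(p, a, b):
    return (p[0] - b[0]) * (a[1] - b[1]) - (a[0] - b[0]) * (p[1] - b[1])

def _in_triangle(p, a, b, c):
    d1, d2, d3 = _sign(p, a, b), _sign(p, b, c), _sign(p, c, a)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def simplicial_depth(p, sample):
    triples = list(combinations(sample, 3))
    hits = sum(_in_triangle(p, a, b, c) for a, b, c in triples)
    return hits / len(triples)
```

Central points receive depth near 1, far-away points depth near 0, which is why ranks of depth values can replace parametric distributional assumptions in a control chart.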
APA, Harvard, Vancouver, ISO, and other styles
28

Lin, Haisheng. "The application of multivariate statistical analysis and batch process control in industrial processes." Thesis, University of Manchester, 2010. https://www.research.manchester.ac.uk/portal/en/theses/the-application-of-multivariate-statistical-analysis-and-batch-process-control-in-industrial-processes(a80fba25-82b1-4f55-a38e-c486262e18dd).html.

Full text
Abstract:
To manufacture safe, effective and affordable medicines with greater efficiency, process analytical technology (PAT) has been introduced by the Food and Drug Administration to encourage the pharmaceutical industry to develop and design well-understood processes. PAT requires chemical imaging techniques to be used to collect process variables for real-time process analysis. Multivariate statistical analysis tools and process control tools are important for implementing PAT in the development and manufacture of pharmaceuticals, as they enable information to be extracted from the PAT measurements. Multivariate statistical analysis methods such as principal component analysis (PCA) and independent component analysis (ICA) are applied in this thesis to extract information regarding a pharmaceutical tablet. ICA was found to outperform PCA and was able to identify the presence of five different materials and their spatial distribution around the tablet. Another important area for PAT is in improving the control of processes. In the pharmaceutical industry, many processes operate in a batch strategy, which introduces difficult control challenges. Near-infrared (NIR) spectroscopy is a non-destructive analytical technique that has been used extensively to extract chemical and physical information from a product sample based on the scattering effect of light. In this thesis, NIR measurements were incorporated as feedback information into several control strategies. Although these controllers performed reasonably well, they could only regulate the NIR spectrum at a number of wavenumbers, rather than over the full spectrum. In an attempt to regulate the entire NIR spectrum, a novel control algorithm was developed. This controller was found to be superior to the only comparable controller in regulating the NIR spectrum. The benefits of the proposed controller were demonstrated using a benchmark simulation of a batch reactor.
APA, Harvard, Vancouver, ISO, and other styles
29

Lynch, James Charles. "A flexible class of models for regression modelling of multivariate failure time data /." Thesis, Connect to this title online; UW restricted, 1996. http://hdl.handle.net/1773/9561.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Hu, Cecilia X. Matthew C. Knitt. "A comparative analysis of multivariate statistical detection methods applied to syndromic surveillance." Monterey, Calif. : Naval Postgraduate School, 2007. http://bosun.nps.edu/uhtbin/hyperion-image.exe/07Jun%5FHu.pdf.

Full text
Abstract:
Thesis (M.S. in Applied Science (Operations Research) )--Naval Postgraduate School, June 2007.
Thesis Advisor(s): Ronald D. Fricker. "June 2007." Includes bibliographical references (p. 71-72). Also available in print.
APA, Harvard, Vancouver, ISO, and other styles
31

Hong, Jeong Jin. "Multivariate statistical modelling for fault analysis and quality prediction in batch processes." Thesis, University of Newcastle Upon Tyne, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.576960.

Full text
Abstract:
Multivariate statistical process control (MSPC) has emerged as an effective technique for monitoring processes with a large number of correlated process variables. MSPC techniques use principal component analysis (PCA) and partial least squares (PLS) to project the high dimensional correlated process variables onto a low dimensional principal component or latent variable space and process monitoring is carried out in this low dimensional space. This study is focused on developing enhanced MSPC techniques for fault diagnosis and quality prediction in batch processes. A progressive modelling method is developed in this study to facilitate fault analysis and fault localisation. A PCA model is developed from normal process operation data and is used for on-line process monitoring. Once a fault is detected by the PCA model, process variables that are related to the fault are identified using contribution analysis. The time information on when abnormalities occurred in these variables is identified using time series plot of the squared prediction errors (SPE) on these variables. These variables are then removed and another PCA model is developed using the remaining variables. If the faulty batch cannot be detected by the new PCA model, then the remaining variables are not related to the fault. If the faulty batch can still be detected by the new PCA model, then further variables associated with the fault are identified from SPE contribution analysis. The procedure is repeated until the faulty batch can no longer be detected using the remaining variables. Multi-block methods are then applied with the progressive modelling scheme to enhance fault analysis and localisation efficiency. The methods are tested on a benchmark simulated penicillin production process and real industrial data. An enhanced multi-block PLS predictive modelling method is developed in this study. 
It is based on the hypothesis that meaningful variable selection can lead to better prediction performance. A data partitioning method for enhanced predictive process modelling is proposed; it enables data to be separated into blocks by measurement time. Model parameters can be used to express contributions
APA, Harvard, Vancouver, ISO, and other styles
32

Knitt, Matthew C. "A comparative analysis of multivariate statistical detection methods applied to syndromic surveillance." Thesis, Monterey, California. Naval Postgraduate School, 2007. http://hdl.handle.net/10945/3417.

Full text
Abstract:
Biological terrorism is a threat to the security and well-being of the United States. It is critical to detect the presence of these attacks in a timely manner, in order to provide sufficient and effective responses to minimize or contain the damage inflicted. Syndromic surveillance is the process of monitoring public health-related data and applying statistical tests to determine the potential presence of a disease outbreak in the observed system. Our research involved a comparative analysis of two multivariate statistical methods, the multivariate CUSUM (MCUSUM) and the multivariate exponentially weighted moving average (MEWMA), both modified to look only for increases in disease incidence. While neither of these methods is currently in use in a biosurveillance system, they are among the most promising multivariate methods for this application. Our analysis was based on a series of simulations using synthetic syndromic surveillance data that mimics various types of background disease incidence and outbreaks. We found that, similar to results for the univariate CUSUM and EWMA, the directionally-sensitive MCUSUM and MEWMA perform very similarly.
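The MEWMA recursion underlying this comparison can be sketched as follows. This is a generic illustration assuming a known in-control mean of zero and a known covariance matrix; the one-sided (increase-only) behaviour is approximated here by truncating the smoothed vector at zero, which is one simple way to make the chart directionally sensitive and is not necessarily the exact modification studied in the thesis.

```python
import numpy as np

# Directionally-sensitive MEWMA sketch: z_t = lam*x_t + (1-lam)*z_{t-1},
# truncated at zero so only upward shifts accumulate; the monitoring
# statistic is z_t' Sigma_z^{-1} z_t with the asymptotic covariance of z_t.
def mewma_stats(x, sigma, lam=0.1):
    sigma_z = (lam / (2.0 - lam)) * sigma
    sigma_z_inv = np.linalg.inv(sigma_z)
    z = np.zeros(x.shape[1])
    stats = []
    for xt in x:
        z = np.maximum(lam * xt + (1.0 - lam) * z, 0.0)
        stats.append(float(z @ sigma_z_inv @ z))
    return np.array(stats)

# Two synthetic syndromic data streams: in control for 100 days, then a
# one-standard-deviation mean increase mimicking an outbreak.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(1.0, 1.0, (100, 2))])
t2 = mewma_stats(x, np.eye(2))
```

An alarm would be raised when the statistic crosses a control limit chosen for a target in-control average run length.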
APA, Harvard, Vancouver, ISO, and other styles
33

Rameseder, Jonathan. "Multivariate methods for the statistical analysis of hyperdimensional high-content screening data." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/92957.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Computational and Systems Biology Program, 2014.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references.
In the post-genomic era, greater emphasis has been placed on understanding the function of genes at the systems level. To meet these needs, biologists are creating larger, and increasingly complex datasets. In recent years, high-content screening (HCS) using RNA interference (RNAi) or other perturbation techniques in combination with automated microscopy has emerged as a promising investigative tool to explore intricate biological processes. Image-based HC screens produce massive hyperdimensional data sets. To identify novel components of the DNA damage response (DDR) after ionizing radiation, we recently performed an image-based HC RNAi screen in an osteosarcoma cell line. Robust univariate hit identification methods and manual network analysis identified an isoform of BRD4, a bromodomain and extra-terminal domain family member, as an endogenous inhibitor of DDR signaling. However, despite the plethora of data generated from our and other HC screens, little progress has been made in analyzing HC data using multivariate computational methods that exploit the full richness of hyperdimensional data and identify more than just the most salient knockdown phenotypes to gain a detailed understanding of how gene products cooperate to regulate complex cellular processes. We developed a novel multivariate method using logistic regression models and least absolute shrinkage and selection operator regularization for analyzing hyperdimensional HC data. We applied this method to our HC screen to identify genes that exhibit subtle but consistent phenotypic changes upon knockdown that would have been missed by conventional univariate hit identification approaches. Our method automatically selects the most predictive features at the most predictive time points to facilitate the more efficient design of follow-up experiments and puts the identified hits in a network context using the Prize-Collecting Steiner Tree algorithm.
This method offers superior performance over the current gold standard for the analysis of HC RNAi screens. A surprising finding from our analysis is that training sets of genes involved in complex biological phenomena used to train predictive models must be broken down into functionally coherent subsets in order to enhance new gene discovery. Additionally, we found that in the case of RNAi screening, statistical cell-to-cell variation in phenotypic responses in a well of cells targeted by a single shRNA is an important predictor of gene-dependent events.
by Jonathan Rameseder.
Ph. D.
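The core of the multivariate method described above, logistic regression with an L1 (lasso) penalty that zeroes out uninformative features, can be illustrated in a few lines. This sketch uses scikit-learn on synthetic data as a stand-in for the screening features; it is not the author's pipeline, and the Steiner-tree network step is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for hyperdimensional screening data: 200 samples,
# 50 features, of which only the first 3 drive the binary phenotype.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
logit = X[:, 0] + 0.8 * X[:, 1] - 0.8 * X[:, 2]
y = (logit + 0.5 * rng.normal(size=200) > 0).astype(int)

# L1-regularised logistic regression: the penalty drives most
# coefficients exactly to zero, performing variable selection.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.2)
model.fit(X, y)
selected = np.flatnonzero(model.coef_[0])   # indices of retained features
```

The strength of the penalty (here C=0.2) controls how aggressively features are discarded and would normally be chosen by cross-validation.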
APA, Harvard, Vancouver, ISO, and other styles
34

Loddo, Antonello. "Bayesian analysis of multivariate stochastic volatility and dynamic models." Diss., Columbia, Mo. : University of Missouri-Columbia, 2006. http://hdl.handle.net/10355/4359.

Full text
Abstract:
Thesis (Ph.D.)--University of Missouri-Columbia, 2006.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file viewed on (April 26, 2007) Vita. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
35

Eno, Daniel R. "Noninformative Prior Bayesian Analysis for Statistical Calibration Problems." Diss., Virginia Tech, 1999. http://hdl.handle.net/10919/27140.

Full text
Abstract:
In simple linear regression, it is assumed that two variables are linearly related, with unknown intercept and slope parameters. In particular, a regressor variable is assumed to be precisely measurable, and a response is assumed to be a random variable whose mean depends on the regressor via a linear function. For the simple linear regression problem, interest typically centers on estimation of the unknown model parameters, and perhaps application of the resulting estimated linear relationship to make predictions about future response values corresponding to given regressor values. The linear statistical calibration problem (or, more precisely, the absolute linear calibration problem), bears a resemblance to simple linear regression. It is still assumed that the two variables are linearly related, with unknown intercept and slope parameters. However, in calibration, interest centers on estimating an unknown value of the regressor, corresponding to an observed value of the response variable. We consider Bayesian methods of analysis for the linear statistical calibration problem, based on noninformative priors. Posterior analyses are assessed and compared with classical inference procedures. It is shown that noninformative prior Bayesian analysis is a strong competitor, yielding posterior inferences that can, in many cases, be correctly interpreted in a frequentist context. We also consider extensions of the linear statistical calibration problem to polynomial models and multivariate regression models. For these models, noninformative priors are developed, and posterior inferences are derived. The results are illustrated with analyses of published data sets. In addition, a certain type of heteroscedasticity is considered, which relaxes the traditional assumptions made in the analysis of a statistical calibration problem. It is shown that the resulting analysis can yield more reliable results than an analysis of the homoscedastic model.
Ph. D.
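The calibration problem described in this abstract inverts a fitted regression line: given a new response value, estimate the unknown regressor value that produced it. A minimal classical (non-Bayesian) sketch on simulated data, included purely for illustration:

```python
import numpy as np

# Classical point estimate for the absolute linear calibration problem:
# fit y = a + b*x by least squares, then invert the line at a new y0.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.1, size=50)   # true a=2, b=0.5

b, a = np.polyfit(x, y, 1)          # slope, intercept
y0 = 4.5                            # newly observed response
x0_hat = (y0 - a) / b               # calibration estimate of the regressor
```

The thesis compares such frequentist inferences with posterior inferences under noninformative priors, which additionally yield interval estimates for the unknown regressor value.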
APA, Harvard, Vancouver, ISO, and other styles
36

Ley, Christophe. "Univariate and multivariate symmetry: statistical inference and distributional aspects." Doctoral thesis, Universite Libre de Bruxelles, 2010. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210029.

Full text
Abstract:
This thesis deals with several statistical and probabilistic aspects of symmetry and asymmetry, both in a univariate and multivariate context, and is divided into three distinct parts.

The first part, composed of Chapters 1, 2 and 3 of the thesis, solves two conjectures associated with multivariate skew-symmetric distributions. Since the introduction in 1985 by Adelchi Azzalini of the most famous representative of that class of distributions, namely the skew-normal distribution, it has been well known that, in the vicinity of symmetry, the Fisher information matrix is singular and the profile log-likelihood function for skewness admits a stationary point whatever the sample under consideration. Since then, researchers have tried to determine the subclasses of skew-symmetric distributions that suffer from each of those problems, which has led to the aforementioned two conjectures. This thesis completely solves both problems.

The second part of the thesis, namely Chapters 4 and 5, aims at applying and constructing extremely general skewing mechanisms. In Chapter 4, we use the univariate mechanism of Ferreira and Steel (2006) to build optimal (in the Le Cam sense) and very flexible tests for univariate symmetry. Since their mechanism can turn a given symmetric distribution into any asymmetric distribution, the alternatives to the null hypothesis of symmetry can take any possible shape. These univariate mechanisms, besides that surjectivity property, enjoy numerous good properties, but cannot be extended to higher dimensions in a satisfactory way. For this reason, we propose in Chapter 5 different general mechanisms, sharing all the nice properties of their competitors in Ferreira and Steel (2006), but which moreover can be extended to any dimension. We formally prove that the surjectivity property holds in dimensions k>1 and study the principal characteristics of these new multivariate mechanisms.

Finally, the third part of this thesis, composed of Chapter 6, proposes a test for multivariate central symmetry by having recourse to the concepts of statistical depth and runs. This test extends the celebrated univariate runs test of McWilliams (1990) to higher dimensions. We analyze its asymptotic behavior (especially in dimension k=2) under the null hypothesis and its invariance and robustness properties. We conclude with an overview of possible modifications of these new tests.

Cette thèse traite de différents aspects statistiques et probabilistes de symétrie et asymétrie univariées et multivariées, et est subdivisée en trois parties distinctes.

La première partie, qui comprend les chapitres 1, 2 et 3 de la thèse, est destinée à la résolution de deux conjectures associées aux lois skew-symétriques multivariées. Depuis l'introduction en 1985 par Adelchi Azzalini du plus célèbre représentant de cette classe de lois, à savoir la loi skew-normale, il est bien connu qu'en un voisinage de la situation symétrique la matrice d'information de Fisher est singulière et la fonction de vraisemblance profile pour le paramètre d'asymétrie admet un point stationnaire quel que soit l'échantillon considéré. Dès lors, des chercheurs ont essayé de déterminer les sous-classes de lois skew-symétriques qui souffrent de chacune de ces problématiques, ce qui a mené aux deux conjectures précitées. Cette thèse résoud complètement ces deux problèmes.

La deuxième partie, constituée des chapitres 4 et 5, poursuit le but d'appliquer et de proposer des méchanismes d'asymétrisation très généraux. Ainsi, au chapitre 4, nous utilisons le méchanisme univarié de Ferreira and Steel (2006) pour construire des tests de symétrie univariée optimaux (au sens de Le Cam) qui sont très flexibles. En effet, leur méchanisme permettant de transformer une loi symétrique donnée en n'importe quelle loi asymétrique, les contre-hypothèses à la symétrie peuvent prendre toute forme imaginable. Ces méchanismes univariés, outre cette propriété de surjectivité, possèdent de nombreux autres attraits, mais ne permettent pas une extension satisfaisante aux dimensions supérieures. Pour cette raison, nous proposons au chapitre 5 des méchanismes généraux alternatifs, qui partagent toutes les propriétés de leurs compétiteurs de Ferreira and Steel (2006), mais qui en plus sont généralisables à n'importe quelle dimension. Nous démontrons formellement que la surjectivité tient en dimension k > 1 et étudions les caractéristiques principales de ces nouveaux méchanismes multivariés.

Finally, the third part of this thesis, consisting of Chapter 6, proposes a test of multivariate central symmetry based on the concepts of statistical depth and runs. This test extends the celebrated univariate runs test of McWilliams (1990) to higher dimensions. We analyse its asymptotic behaviour (mainly in dimension k = 2) under the null hypothesis, as well as its invariance and robustness properties. We conclude with an overview of possible modifications of these new tests.
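The univariate McWilliams (1990) runs test that Chapter 6 extends is simple to state: sort the observations by absolute distance from the hypothesized centre and count runs in the resulting sign sequence; under symmetry the signs behave like fair coin flips, so unusually few runs suggests asymmetry. A minimal sketch in Python, on simulated data invented purely for illustration:

```python
import numpy as np

def mcwilliams_runs_statistic(x, center=0.0):
    """Sort observations by |x - center| and count sign runs
    (the univariate runs test of McWilliams, 1990)."""
    x = np.asarray(x, dtype=float) - center
    order = np.argsort(np.abs(x))
    signs = np.sign(x[order])
    signs = signs[signs != 0]          # drop exact ties with the centre
    return 1 + int(np.count_nonzero(signs[1:] != signs[:-1]))

rng = np.random.default_rng(0)
sym = rng.normal(size=500)             # symmetric about 0
skw = rng.exponential(size=500) - 1.0  # mean-centred but asymmetric
print(mcwilliams_runs_statistic(sym), mcwilliams_runs_statistic(skw))
```

For the symmetric sample the run count stays near n/2, while the skewed sample produces markedly fewer runs because the heavy right tail forms long all-positive stretches.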
Doctorat en Sciences

APA, Harvard, Vancouver, ISO, and other styles
37

Zhao, Hongya. "Statistical analysis of gene expression data in cDNA microarray experiments." HKBU Institutional Repository, 2006. http://repository.hkbu.edu.hk/etd_ra/657.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Tao, Hui. "An Investigation of False Discovery Rates in Multiple Testing under Dependence." Fogler Library, University of Maine, 2005. http://www.library.umaine.edu/theses/pdf/TaoH2005.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Mohamed, Nuri Eltabit [Verfasser], Rainer [Akademischer Betreuer] Schwabe, and Waltraud [Akademischer Betreuer] Kahle. "Statistical analysis in multivariate sampling / Nuri Eltabit Mohamed. Betreuer: Rainer Schwabe ; Waltraud Kahle." Magdeburg : Universitätsbibliothek, 2011. http://d-nb.info/1047558963/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Rowland, Adewumi. "GIS-based prediction of pipeline third-party interference using hybrid multivariate statistical analysis." Thesis, University of Newcastle Upon Tyne, 2011. http://hdl.handle.net/10443/2529.

Full text
Abstract:
In reported pipeline failures globally, third-party interference (TPI) has been recognised as a dominant failure mechanism in the oil and gas industry, although there has been limited research in this area. The problem is receiving considerable attention within the industry because of security threats (e.g. Al Qaeda's capabilities) and the natural vulnerability of pipelines arising from their long-distance network distribution. The ability to predict and secure pipelines against TPI is valuable knowledge in the pipeline industry, and especially for the safety of the millions of people who live near pipelines. This thesis develops an understanding of the relationships between the many and various contributory factors leading to potential TPI, which frequently results in mass deaths, economic losses, and widespread destruction of property. The thesis used GIS-based spatial statistical methodologies: first, hotspot and cold spot cluster analyses to explain pipeline incident patterns and distributions, together with a geographically weighted regression (GWR) model to investigate the determinants of TPI and to identify local and global effects of the independent variables; secondly, a generalized linear model (GLM) methodology of Poisson GLM and logistic regression (LR) procedures, using a combination of land use types, pipeline geometry and intrinsic properties, and socio-economic and socio-political factors, to identify and predict potentially vulnerable pipeline segments and regions in a pipeline network. The GWR model showed a significant spatial relationship between TPI, geographical accessibility, and pipeline intrinsic properties (e.g. depth, age, size), varying with location in the study area. The thesis showed that depth of pipeline and the socio-economic conditions of the population living near a pipeline are the two major factors influencing the occurrence of TPI.
These findings prompt the need for selective protection of vulnerable segments of a pipeline by installing security tools where they are most needed. The thesis examined the available literature and critically evaluated and assessed selected international pipeline failure databases, their effectiveness, limitations, trends, and the evolving difficulties of addressing and minimising TPI. The review revealed irregular nomenclature and the need for a universal classification of pipeline incident databases. The advantages and disadvantages of the different detection and prevention tools used in the pipeline industry for minimising TPI are discussed. A questionnaire survey of employees and managers in the pipeline industry was developed and administered as part of the thesis. The results of the data analysis have contributed to the body of knowledge on pipeline TPI, especially regarding industry perceptions, prevention strategies, capabilities, and the complexities of the various application methods presently being implemented. The thesis also outlined the actions that governments and industry can and should take to help manage and effectively reduce the risk of pipeline TPI. The results of this study will be used as a reference to develop strategies for managing pipeline TPI. They also indicated that communication with all stakeholders is more effective in preventing intentional pipeline interference, and that the government's social responsibility to communities is the major factor influencing the occurrence of intentional pipeline TPI.
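One ingredient of the hybrid approach above, a Poisson GLM with a log link, can be fitted by iteratively reweighted least squares (IRLS). The sketch below is a generic illustration on simulated incident counts, not the thesis's actual pipeline data or model specification:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Fit a Poisson GLM (log link) by iteratively reweighted least squares."""
    X = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                   # current fitted means
        z = X @ beta + (y - mu) / mu            # working response
        XtW = X.T * mu                          # IRLS weights: Var(y_i) = mu_i
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Hypothetical counts driven by a single covariate, true coefficients (0.5, 0.8).
rng = np.random.default_rng(42)
x = rng.normal(size=2000)
y = rng.poisson(np.exp(0.5 + 0.8 * x))
print(poisson_irls(x[:, None], y))              # estimates close to (0.5, 0.8)
```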
APA, Harvard, Vancouver, ISO, and other styles
41

Albazzaz, Hamza. "Multivariate statistical batch process control and data visualisation based on independent component analysis." Thesis, University of Leeds, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.432293.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Crawford, Jesse B. "Interpretation of eigenvalues in multivariate statistical analysis and Bartlett's test for Riesz distributions." [Bloomington, Ind.] : Indiana University, 2008. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3331317.

Full text
Abstract:
Thesis (Ph.D.)--Indiana University, Dept. of Mathematics, 2008.
Title from PDF t.p. (viewed on Jul 27, 2009). Source: Dissertation Abstracts International, Volume: 69-11, Section: B, page: 6890. Adviser: Steen A. Andersson.
APA, Harvard, Vancouver, ISO, and other styles
43

Petters, Patrik. "Development of a Supervised Multivariate Statistical Algorithm for Enhanced Interpretability of Multiblock Analysis." Thesis, Linköpings universitet, Matematiska institutionen, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-138112.

Full text
Abstract:
In modern biological research, OMICs techniques, such as genomics, proteomics or metabolomics, are often employed to gain deep insights into metabolic regulations and biochemical perturbations in response to a specific research question. To gain complementary biologically relevant information, multiOMICs, i.e., several different OMICs measurements on the same specimen, is becoming increasingly frequent. To take full advantage of this complementarity, joint analysis of such multiOMICs data is necessary, but this is still an underdeveloped area. In this thesis, a theoretical background is given on general component-based methods for dimensionality reduction, such as PCA and PLS for single-block analysis, and multiblock PLS for co-analysis of OMICs data. This is followed by a rotation of an unsupervised analysis method, whose aim is to divide dimensionality-reduced data into block-distinct and common variance partitions using the DISCO-SCA approach. Finally, an algorithm for a similar rotation of a supervised (PLS) solution is presented using data available in the literature. To the best of our knowledge, this is the first time that such an approach for rotating a supervised analysis into block-distinct and common partitions has been developed and tested. The newly developed DISCO-PLS algorithm clearly showed an increased potential for visualisation and interpretation of data compared to standard PLS, as demonstrated by biplots of observation scores and multiblock variable loadings.
APA, Harvard, Vancouver, ISO, and other styles
44

Hajigholizadeh, Mohammad. "Water Quality Modelling Using Multivariate Statistical Analysis and Remote Sensing in South Florida." FIU Digital Commons, 2016. http://digitalcommons.fiu.edu/etd/2992.

Full text
Abstract:
The overall objective of this dissertation research is to understand the spatiotemporal dynamics of water quality parameters in different water bodies of South Florida. Two major approaches (multivariate statistical techniques and remote sensing) were used. The multivariate statistical techniques, including cluster analysis (CA), principal component analysis (PCA), factor analysis (FA), discriminant analysis (DA), absolute principal component score-multiple linear regression (APCS-MLR), and PMF receptor modeling, were used to assess the water quality and to identify and quantify the potential pollution sources affecting the water quality of three major rivers of South Florida. For this purpose, a 15-year (2000–2014) data set of 12 water quality variables, comprising about 35,000 observations, was used. Agglomerative hierarchical CA grouped 16 monitoring sites into three groups (low, moderate, and high pollution) based on the similarity of their water quality characteristics. DA, as an important data reduction method, was used to assess the water pollution status and to analyse its spatiotemporal variation. PCA/FA identified potential pollution sources in the wet and dry seasons, and the effective mechanisms, rules, and causes were explained. The APCS-MLR and PMF models apportioned the sources' contributions to each water quality variable. In addition, the bio-physical parameters associated with the water quality of two important water bodies, Lake Okeechobee and Florida Bay, were investigated based on remotely sensed data. The principal objective of this part of the study was to monitor and assess the spatial and temporal changes of water quality through the integrated application of remote sensing, GIS data, and statistical techniques. The optical bands in the region from blue to near infrared, and all possible band ratios, were used to explore the relation between the reflectance of a water body and the observed data.
The developed MLR models appeared to be promising for monitoring and predicting the spatiotemporal dynamics of optically active and inactive water quality characteristics in Lake Okeechobee and Florida Bay. It is believed that the results of this study could be very useful to local authorities for the control and management of pollution and better protection of water quality in the most important water bodies of South Florida.
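The dimension-reduction step shared by PCA, FA and APCS-MLR can be sketched compactly: standardize the variables, then take components from a singular value decomposition. The simulated data below are a hypothetical stand-in for the study's 12 water quality variables, assuming two underlying pollution sources:

```python
import numpy as np

# Toy monitoring data: rows = samples, columns = water quality variables.
rng = np.random.default_rng(1)
n, p = 200, 6
latent = rng.normal(size=(n, 2))                 # two latent "sources"
loadings = rng.normal(size=(2, p))
X = latent @ loadings + 0.3 * rng.normal(size=(n, p))

# Standardize, then extract principal components via SVD.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = s**2 / np.sum(s**2)                  # proportion of variance per PC
scores = Z @ Vt.T                                # component scores per sample

print("variance explained by first two PCs:", explained[:2].sum())
```

Because the toy data contain two latent sources plus modest noise, the first two components capture most of the variance; APCS-MLR would then regress each variable on the (absolute) component scores to apportion source contributions.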
APA, Harvard, Vancouver, ISO, and other styles
45

Heeb, Thomas Gregory. "Examination of turbulent mixing with multiple second order chemical reactions by the statistical analysis technique /." The Ohio State University, 1986. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487267024995615.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Lopez, Montero Eduardo. "Use of multivariate statistical methods for control of chemical batch processes." Thesis, University of Manchester, 2016. https://www.research.manchester.ac.uk/portal/en/theses/use-of-multivariate-statistical-methods-for-control-of-chemical-batch-processes(6cf45624-2388-4e85-b4c6-99503547ad06).html.

Full text
Abstract:
In order to meet tight product quality specifications for chemical batch processes, it is vital to monitor and control product quality throughout the batch duration. However, the frequent lack of in situ sensors for continuous monitoring of batch product quality complicates the control problem and calls for novel control approaches. This thesis focuses on the study and application of multivariate statistical methods to control product quality in chemical batch processes. These multivariate statistical methods can be used to identify data-driven prediction models that can be integrated within a model predictive control (MPC) framework. The ideal MPC control strategy achieves end-product quality specifications by performing trajectory tracking during the batch operating time. However, due to the lack of in-situ sensors, measurements of product quality are usually obtained by laboratory assays and are, therefore, inherently intermittent. This thesis proposes a new approach to realise trajectory tracking control of batch product quality in those situations where only intermittent measurements are available. The scope of this methodology consists of: 1) the identification of a partial least squares (PLS) model that works as an estimator of product quality, 2) the transformation of the PLS model into a recursive formulation utilising a moving window technique, and 3) the incorporation of the recursive PLS model as a predictor into a standard MPC framework for tracking the desired trajectory of batch product quality. The structure of the recursive PLS model allows a straightforward incorporation of process constraints in the optimisation process. Additionally, a method to incorporate a nonlinear inner relation within the proposed PLS recursive model is introduced. This nonlinear inner relation is a combination of feedforward artificial neural networks (ANNs) and linear regression. 
Nonlinear models based on this method can predict product quality of highly nonlinear batch processes and can, therefore, be used within an MPC framework to control such processes. The use of linear regression in addition to ANNs within the PLS model reduces the risk of overfitting and also reduces the computational effort of the optimisation carried out by the controller. The benefits of the proposed modelling and control methods are demonstrated using a number of simulated batch processes.
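The PLS backbone of such a predictor can be illustrated with a bare-bones NIPALS-style PLS1 fit; this is a generic sketch on simulated data, not the thesis's recursive moving-window formulation:

```python
import numpy as np

def pls1(X, y, n_comp):
    """Single-response PLS regression via NIPALS-style deflation."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y                        # weight vector: covariance direction
        w /= np.linalg.norm(w)
        t = X @ w                          # score vector
        tt = t @ t
        p = X.T @ t / tt                   # X loading
        c = (y @ t) / tt                   # y loading
        X = X - np.outer(t, p)             # deflate X
        y = y - c * t                      # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)  # coefficients for centred X

rng = np.random.default_rng(7)
X = rng.normal(size=(150, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=150)
B = pls1(X, y, n_comp=5)
print(B)                                    # close to the true coefficients
```

With the number of components equal to the number of predictors, PLS1 reproduces the least-squares fit; fewer components give the regularised projections that make PLS useful for the collinear measurements typical of batch data.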
APA, Harvard, Vancouver, ISO, and other styles
47

Lawal, Najib. "Modelling and multivariate data analysis of agricultural systems." Thesis, University of Manchester, 2015. https://www.research.manchester.ac.uk/portal/en/theses/modelling-and-multivariate-data-analysis-of-agricultural-systems(f6b86e69-5cff-4ffb-a696-418662ecd694).html.

Full text
Abstract:
The broader research area investigated during this programme was conceived from a goal to contribute towards solving the challenge of food security in the 21st century through the reduction of crop loss and minimisation of fungicide use. This is aimed to be achieved through the introduction of an empirical approach to agricultural disease monitoring. In line with this, the SYIELD project, initiated by a consortium involving University of Manchester and Syngenta, among others, proposed a novel biosensor design that can electrochemically detect viable airborne pathogens by exploiting the biology of plant-pathogen interaction. This approach offers improvement on the inefficient and largely experimental methods currently used. Within this context, this PhD focused on the adoption of multidisciplinary methods to address three key objectives that are central to the success of the SYIELD project: local spore ingress near canopies, the evaluation of a suitable model that can describe spore transport, and multivariate analysis of the potential monitoring network built from these biosensors. The local transport of spores was first investigated by carrying out a field trial experiment at Rothamsted Research UK in order to investigate spore ingress in OSR canopies, generate reliable data for testing the prototype biosensor, and evaluate a trajectory model. During the experiment, spores were air-sampled and quantified using established manual detection methods. Results showed that the manual methods, such as colourimetric detection, are more sensitive than the proposed biosensor, suggesting the proxy measurement mechanism used by the biosensor may not be reliable in live deployments where spores are likely to be contaminated by impurities and other inhibitors of oxalic acid production. Spores quantified using the more reliable quantitative Polymerase Chain Reaction proved informative and provided novel data of high experimental value.
The dispersal data were found to fit a power decay law, a finding that is consistent with experiments in other crops. In the second area investigated, a 3D backward Lagrangian Stochastic (bLS) model was parameterised and evaluated with the field trial data. The bLS model, parameterised with Monin-Obukhov Similarity Theory (MOST) variables, showed good agreement with experimental data and compared favourably in terms of performance statistics with a recent application of an LS model in a maize canopy. Results obtained from the model were found to be more accurate above the canopy than below it. This was attributed to a higher error during initialisation of release velocities below the canopy. Overall, the bLS model performed well and demonstrated suitability for adoption in estimating above-canopy spore concentration profiles, which can further be used for designing efficient deployment strategies. The final area of focus was the monitoring of a potential biosensor network. A novel framework based on Multivariate Statistical Process Control (MSPC) concepts was proposed and applied to data from a pollution-monitoring network. The main limitation of traditional MSPC in spatial data applications was identified as a lack of spatial awareness by the PCA model when considering correlation breakdowns caused by an incoming erroneous observation. This resulted in misclassification of healthy measurements as erroneous. The proposed Kriging-augmented MSPC approach was able to incorporate this capability and significantly reduce the number of false alarms.
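A power decay law of the kind reported for the spore dispersal data corresponds to a straight line on log-log axes, so the decay exponent can be estimated by ordinary least squares on log-transformed values. The distances, counts and exponent below are invented for illustration only:

```python
import numpy as np

# Hypothetical spore counts at increasing distances from the source,
# generated from counts ~ a * d**b with multiplicative noise.
rng = np.random.default_rng(3)
d = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
true_a, true_b = 500.0, -1.6
counts = true_a * d**true_b * np.exp(0.05 * rng.normal(size=d.size))

# On log-log axes the model is linear: log(counts) = log(a) + b * log(d).
slope, intercept = np.polyfit(np.log(d), np.log(counts), 1)
print("estimated decay exponent:", slope)   # close to the assumed -1.6
```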
APA, Harvard, Vancouver, ISO, and other styles
48

Holland, Jennifer M. "An Exploration of the Ground Water Quality of the Trinity Aquifer Using Multivariate Statistical Techniques." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc84218/.

Full text
Abstract:
The ground water quality of the Trinity Aquifer for wells sampled between 2000 and 2009 was examined using multivariate and spatial statistical techniques. A Kruskal-Wallis test revealed that all of the water quality parameters, with the exception of nitrate, vary with land use. A Spearman's rho analysis illustrates that every water quality parameter, with the exception of silica, correlated with well depth. Factor analysis identified four factors attributable to hydrochemical processes, electrical conductivity, alkalinity, and the dissolution of parent rock material into the ground water. The cluster analysis generated seven clusters. A chi-squared analysis shows that Clusters 1, 2, 5, and 6 are reflective of the distribution of the entire dataset when looking specifically at land use categories. The nearest neighbor analysis revealed clustered, dispersed, and random patterns depending upon the entity being examined. The spatial autocorrelation technique used on the water quality parameters for the entire dataset identified that all of the parameters are random, with the exception of pH, which was found to be spatially clustered. The combination of the multivariate and spatial techniques together identified influences on the Trinity Aquifer including hydrochemical processes, agricultural activities, recharge, and land use. In addition, the techniques aided in identifying areas warranting future monitoring, which are located in the western and southwestern parts of the aquifer.
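A Spearman's rho screen of the kind used to relate each parameter to well depth is just Pearson correlation computed on ranks. A minimal version (ignoring tie corrections), with hypothetical depth and concentration values:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation (no tie correction)."""
    rank = lambda v: np.argsort(np.argsort(v))   # 0-based ranks
    return np.corrcoef(rank(x), rank(y))[0, 1]

depth = np.array([30.0, 80.0, 120.0, 200.0, 260.0, 340.0, 410.0, 500.0])
nitrate = 12.0 / np.sqrt(depth)        # hypothetical monotone decline with depth
print(spearman_rho(depth, nitrate))    # -1.0 for a strictly decreasing trend
```

Because Spearman's rho depends only on ranks, it detects any monotone association, which suits water quality variables whose relation to depth need not be linear.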
APA, Harvard, Vancouver, ISO, and other styles
49

Kong, Xiaoli. "High Dimensional Multivariate Inference Under General Conditions." UKnowledge, 2018. https://uknowledge.uky.edu/statistics_etds/33.

Full text
Abstract:
In this dissertation, we investigate four distinct and interrelated problems in high-dimensional inference on mean vectors across multiple groups. The first problem considered is the profile analysis of high-dimensional repeated measures. We introduce new test statistics and derive their asymptotic distributions under normality for both equal and unequal covariance cases. Our derivations of the asymptotic distributions mimic those of the Central Limit Theorem, with some important peculiarities addressed with sufficient rigor. We also derive consistent and unbiased estimators of the asymptotic variances for the equal and unequal covariance cases, respectively. The second problem considered is accurate inference for high-dimensional repeated measures in factorial designs, as well as for any comparisons among the cell means. We derive an asymptotic expansion for the null distribution and the quantiles of a suitable test statistic under normality. We also derive estimators, with second-order consistency, of the parameters contained in the approximate distribution. The most important contribution is the high accuracy of the methods, in the sense that p-values are accurate up to the second order in sample size as well as in dimension. The third problem pertains to high-dimensional inference under non-normality. We relax the commonly imposed dependence conditions, which have become standard assumptions in high-dimensional inference; with the relaxed conditions, the scope of applicability of the results broadens. The fourth problem pertains to a fully nonparametric rank-based comparison of high-dimensional populations. To develop the theory in this context, we prove a novel result on the asymptotic behavior of quadratic forms in ranks. Simulation studies provide evidence that our methods perform reasonably well in high-dimensional situations. Real data from an electroencephalograph (EEG) study of alcoholic and control subjects are analyzed to illustrate the application of the results.
APA, Harvard, Vancouver, ISO, and other styles
50

Idrus, Muhammad Rijal. "Multivariate morphometric analysis of seasonal changes in overwintering arctic charr (Salvelinus alpinus L.)." Thesis, McGill University, 1996. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=27346.

Full text
Abstract:
This study developed a robust technique for the assessment of morphometric differences among overwintering northern fish populations. Arctic charr were sampled soon before the freeze-up and just after ice break-up at two subarctic Quebec lakes. A homogeneous sample of 397 fish was used. Regression analyses of the length-weight relationships and their derived condition indices were insufficient, due to their inherent limitations, to recognize the differences between sampling groups. A series of multivariate analyses (canonical, stepwise and discriminant analysis), based on eleven morphometric characters of the fish, provided a better assessment. The analysis recognized the distinctions between sampling groups, correctly classified 70-100% of the fish into their appropriate groupings, and indicated that body height measured at the anal opening was the most discriminatory variable. Landmark variables related to shape differences were effective in discriminating fish according to their lake of origin, whereas length and weight variables, which closely reflected the size differences, were better at distinguishing seasonal changes. The study provides a simple, efficient assessment method based on phenotypic variations to explain the different survival strategies, and the associated life history traits, adopted by fish.
APA, Harvard, Vancouver, ISO, and other styles