
Dissertations / Theses on the topic 'Robust functional data analysis'


Consult the top 50 dissertations / theses for your research on the topic 'Robust functional data analysis.'


1

Willersjö Nyfelt, Emil. "Comparison of the 1st and 2nd order Lee–Carter methods with the robust Hyndman–Ullah method for fitting and forecasting mortality rates." Thesis, Mälardalens högskola, Akademin för utbildning, kultur och kommunikation, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-48383.

Abstract:
The 1st and 2nd order Lee–Carter methods were compared with the Hyndman–Ullah method with regard to goodness of fit and forecasting ability for mortality rates. Swedish population data from the Human Mortality Database were used. The robust estimation property of the Hyndman–Ullah method was also tested by including the Spanish flu and a hypothetical scenario of the COVID-19 pandemic. After presenting the three methods and making several comparisons between them, it is concluded that the Hyndman–Ullah method is overall superior among the three methods on the chosen dataset. Its robust estimation of mortality shocks could also be confirmed.
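For readers unfamiliar with the first-order Lee–Carter model, log m(x,t) = a_x + b_x k_t, a minimal sketch of the standard SVD fit follows; the mortality matrix here is synthetic and all names are illustrative, not taken from the thesis:

import numpy as np

# Hypothetical age-by-year matrix of central death rates (20 ages, 30 years).
rng = np.random.default_rng(0)
ages, years = np.arange(20), np.arange(30)
m = np.exp(-5 + 0.1 * ages[:, None] - 0.01 * years[None, :]
           + 0.02 * rng.standard_normal((20, 30)))

log_m = np.log(m)
a = log_m.mean(axis=1)                        # age pattern a_x
U, s, Vt = np.linalg.svd(log_m - a[:, None], full_matrices=False)
b = U[:, 0] / U[:, 0].sum()                   # age loadings b_x, normalised to sum to 1
k = s[0] * Vt[0] * U[:, 0].sum()              # period index k_t
fit = a[:, None] + np.outer(b, k)             # first-order Lee-Carter fit

# Forecast k_t as a random walk with drift, the usual Lee-Carter choice.
drift = np.diff(k).mean()
k_forecast = k[-1] + drift * np.arange(1, 11)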
2

Yao, Fang. "Functional data analysis for longitudinal data." For electronic version search Digital dissertations database. Restricted to UC campuses. Access is free to UC campus dissertations, 2003. http://uclibs.org/PID/11984.

3

Hadjipantelis, Pantelis-Zenon. "Functional data analysis in phonetics." Thesis, University of Warwick, 2013. http://wrap.warwick.ac.uk/62527/.

Abstract:
The study of speech sounds has established itself as a distinct area of research, namely Phonetics, because speech production is a complex phenomenon mediated by the interaction of multiple components of a linguistic and non-linguistic nature. To investigate such phenomena, this thesis employs a Functional Data Analysis framework in which speech segments are viewed as functions. FDA treats functions as its fundamental unit of analysis; the thesis takes advantage of this, in conceptual as well as practical terms, achieving theoretical coherence as well as statistical robustness in its insights. The main techniques employed in this work are functional principal components analysis, functional mixed-effects regression models and phylogenetic Gaussian process regression for functional data. As will be shown, these techniques allow for complementary analyses of linguistic data. The thesis presents a series of novel applications of functional data analysis in Phonetics. Firstly, it investigates the influence linguistic information has on speech intonation patterns, through an analysis combining FPCA with a series of mixed effects models from which meaningful categorical prototypes are built. Secondly, the interplay of phase and amplitude variation in functional phonetic data is investigated, and a multivariate mixed effects framework is developed for jointly analysing the phase and amplitude information contained in phonetic data. Lastly, the phylogenetic associations between languages within a multi-language phonetic corpus are analysed: utilizing a small subset of related Romance languages, a phylogenetic investigation of the words' spectrograms (functional objects defined over two continua simultaneously) is conducted as a proof-of-concept experiment connecting FDA and Evolutionary Linguistics.
4

Lee, Ho-Jin. "Functional data analysis: classification and regression." Texas A&M University, 2004. http://hdl.handle.net/1969.1/2805.

Abstract:
Functional data refer to data which consist of observed functions or curves evaluated at a finite subset of some interval. In this dissertation, we discuss statistical analysis, especially classification and regression, when data are available in functional form. Due to the nature of functional data, one works in function spaces, and each functional observation is viewed as a realization generated by a random mechanism in such a space. The classification procedure in this dissertation is based on dimension reduction of these spaces. One commonly used method is functional principal component analysis (functional PCA), in which an eigendecomposition of the covariance function is employed to find the directions of highest variability of the data in the function space. The reduced space spanned by a few eigenfunctions is thought of as a space containing most of the features of the functional data. We also propose a functional regression model for scalar responses. The infinite dimensionality of the predictor space causes many problems, one being that there are infinitely many solutions. The space of the parameter function is therefore restricted to Sobolev-Hilbert spaces, and the so-called ε-insensitive loss function is utilized. As a robust technique of function estimation, we present a way to find a function that deviates from the observed values by at most ε and at the same time is as smooth as possible.
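A hedged sketch of the ε-insensitive idea: curves are reduced to a few principal component scores, and a support vector regression with ε-insensitive loss is fitted to those scores. The data and basis are invented for illustration; this is one plausible reading of the approach, not the dissertation's exact estimator:

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 101)
basis = np.array([np.sin((j + 1) * np.pi * t) for j in range(5)])
X = rng.standard_normal((40, 5)) @ basis              # 40 observed curves on a grid
y = X[:, ::10].sum(axis=1) + 0.1 * rng.standard_normal(40)

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :4] * s[:4]                             # functional PCA scores

# epsilon-insensitive loss: residuals within +/- epsilon are not penalised.
svr = SVR(kernel="linear", epsilon=0.1, C=10.0).fit(scores, y)
beta = Vt[:4].T @ svr.coef_.ravel()                   # coefficient function on the grid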
5

Friman, Ola. "Adaptive analysis of functional MRI data." Linköping: Univ., 2003. http://www.bibl.liu.se/liupubl/disp/disp2003/tek836s.pdf.

6

Zoglat, Abdelhak. "Analysis of variance for functional data." Thesis, University of Ottawa (Canada), 1994. http://hdl.handle.net/10393/10136.

Abstract:
In this dissertation we present an extension of the well-known theory of multivariate analysis of variance. In various situations data are continuous stochastic functions of time or space; the speed of pollutants diffusing through a river, the real amplitude of a signal received from a broadcasting satellite, or the hydraulic conductivity rates in a given region are examples of such processes. After the mathematical background, we develop tools for analyzing such data: namely, estimators, tests, and confidence sets for the parameters of interest. We extend these results, obtained under the normality assumption, and show that they are still valid if this assumption is relaxed. Some examples of applications of our techniques are given, and we outline how the latter apply to random and mixed models for continuous data. In the appendix, we give some programs which we use to compute the distributions of some of our test statistics.
7

Martinenko, Evgeny. "Functional Data Analysis and its application to cancer data." Doctoral diss., University of Central Florida, 2014. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/6323.

Abstract:
The objective of the current work is to develop novel procedures for the analysis of functional data and apply them to the investigation of gender disparity in survival of lung cancer patients. In particular, we use the time-dependent Cox proportional hazards model, where the clinical information is incorporated via time-independent covariates and the current age is modeled using its expansion over wavelet basis functions. We developed computer algorithms and applied them to a data set derived from the Florida Cancer Data depository (all personal information that would allow patients to be identified was eliminated). We also studied the problem of estimation of a continuous matrix-variate function of low rank. We constructed an estimator of such a function using its basis expansion and the subsequent solution of an optimization problem with a Schatten-norm penalty. We derive an oracle inequality for the constructed estimator, study its properties via simulations, and apply the procedure to the analysis of dynamic contrast medical imaging data.
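As a hedged illustration of Schatten-norm penalisation: for the Schatten-1 (nuclear) norm, the proximal step of the penalised optimisation soft-thresholds singular values, which is what drives the low-rank structure of the estimator. The snippet is a generic sketch, not the thesis's algorithm:

import numpy as np

def prox_nuclear(M, tau):
    # Proximal operator of tau * ||.||_* : soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(2)
low_rank = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
noisy = low_rank + 0.5 * rng.standard_normal((30, 20))
denoised = prox_nuclear(noisy, tau=3.0)     # shrinks towards a low-rank matrix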
8

Kröger, Viktor. "Classification in Functional Data Analysis : Applications on Motion Data." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184963.

Abstract:
Anterior cruciate knee ligament injuries are common and well known, especially amongst athletes. These injuries often require surgery and long rehabilitation programmes, and can lead to function loss and re-injuries (Marshall et al., 1977). This work aims to explore the possibility of applying supervised classification to knee functionality, using different types of models and testing different divisions of classes. The data used are gathered through a performance test in which individuals perform one-leg hops with motion sensors attached to their bodies. The obtained data represent position over time and are considered functional data. With functional data analysis (FDA), a process can be analysed as a continuous function of time instead of being reduced to finite data points. FDA includes many useful tools, but also some challenges. A functional observation can, for example, be differentiated, a handy tool not found in the multivariate tool-box; the speed and acceleration can then be calculated from the obtained data. How to define "similarity" is, on the other hand, not as obvious as with point data. In this work, an FDA approach is taken to classifying knee kinematic data from a long-term follow-up study on knee ligament injuries. This work studies kernel functional classifiers and k-nearest-neighbour models, and performs significance tests on model accuracy using re-sampling methods. Depending on how similarity is defined, the models can distinguish different features of the data. Attempts at utilising more information through ensemble methods do not exceed the single models they are created from. Further, it is shown that classification on optimised sub-domains can be superior, in terms of predictive power, to classifiers using the full domain.
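A minimal sketch of a k-nearest-neighbour functional classifier of the kind discussed above, where "similarity" is an L2-type semi-metric computed either on the curves or on their (difference-approximated) derivatives; all data and names are hypothetical, and labels are assumed coded as small non-negative integers:

import numpy as np

def knn_classify(train_curves, train_labels, new_curve, k=5, on_derivative=True):
    # Differencing approximates the derivative, so the semi-metric can
    # compare shape (velocity) rather than raw position.
    A = np.diff(train_curves, axis=1) if on_derivative else train_curves
    b = np.diff(new_curve) if on_derivative else new_curve
    dist = np.sqrt(((A - b) ** 2).sum(axis=1))   # discretised L2 distance
    nearest = np.argsort(dist)[:k]
    votes = np.asarray(train_labels)[nearest]
    return np.bincount(votes).argmax()           # majority vote among neighbours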
9

Anderson, Joseph T. "Geometric Methods for Robust Data Analysis in High Dimension." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1488372786126891.

10

Alshabani, Ali Khair Saber. "Statistical analysis of human movement functional data." Thesis, University of Nottingham, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.421478.

11

Benko, Michal. "Functional data analysis with applications in finance." Doctoral thesis, Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät, 2007. http://dx.doi.org/10.18452/15585.

Abstract:
In many fields of applied statistics the object of interest depends on a continuous parameter. Typical examples in finance are implied volatility functions, yield curves and risk-neutral densities. Due to market conventions and further technical reasons, these objects are observable only on a discrete grid, e.g. the strikes and maturities for which trades were settled at a given time point. By collecting these functions for several time points (e.g. days) or for different underlyings, a sample of functions, a functional data set, is obtained. The first topic considered in this thesis concerns strategies for recovering the functional objects (e.g. the implied volatility function) from the observed data using nonparametric smoothing methods. Besides standard smoothing methods, a procedure combining nonparametric smoothing with no-arbitrage theory is proposed for implied volatility smoothing. The second part of the thesis is devoted to functional data analysis (FDA) and its connection to problems arising in the empirical analysis of financial markets. The theoretical part focuses on functional principal components analysis, the functional counterpart of the well-known multivariate dimension-reduction technique. A comprehensive overview of existing methods is given, and an estimation method motivated by the dual problem, as well as two-sample inference based on functional principal components, are discussed. The FDA techniques are applied to the analysis of implied volatility and yield curve dynamics. In addition, the implementation of the FDA techniques, together with an FDA library for the statistical environment XploRe, is presented.
12

Prentius, Wilmer. "Exploring Cumulative Income Functions by Functional Data Analysis." Thesis, Umeå universitet, Statistik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-122685.

Abstract:
Cumulative income can be seen as yearly incomes added up over a number of years. It can also be thought of as a continuous curve, where income flows continuously into one's account. Analysing curves, or functions, instead of uni- or multivariate data both requires and enables different approaches. In this thesis, functional data analysis methods are used to show how such cumulative income curves can be analysed, mainly through functional adaptations of principal component analysis and linear regression. Results show how smoothing the curves helps to decrease variance in a bias-variance trade-off, while having problems accounting for data containing many low-valued observations. Furthermore, the results indicate that education might have an effect in the sample when controlling for employment rate.
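The smoothing step behind the bias-variance trade-off mentioned above can be sketched as roughness-penalised least squares; larger lambda gives smoother (more biased, less variable) curves. This is a generic Whittaker-type smoother, assumed for illustration rather than taken from the thesis:

import numpy as np

def smooth(y, lam):
    # Minimise ||y - f||^2 + lam * ||second differences of f||^2.
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)        # second-difference operator
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

income = np.cumsum(np.random.default_rng(3).exponential(1.0, 40))  # toy cumulative income curve
smooth_curve = smooth(income, lam=10.0)        # raise lam for more smoothing (more bias)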
13

Zhang, Zongjun. "Adaptive Robust Regression Approaches in data analysis and their Applications." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1445343114.

14

Hu, Zonghui. "Semiparametric functional data analysis for longitudinal/clustered data: theory and application." Texas A&M University, 2004. http://hdl.handle.net/1969.1/3088.

Abstract:
Semiparametric models play important roles in the field of biological statistics. In this dissertation, two types of semiparametric models are studied. One is the partially linear model, where the parametric part is a linear function; we investigate the two common estimation methods for partially linear models when the data are correlated, i.e., longitudinal or clustered. The other is a semiparametric model in which a latent covariate is incorporated in a mixed effects model; we propose a semiparametric approach for estimating this model and apply it to a study of colon carcinogenesis. First, we study the profile-kernel and backfitting methods in partially linear models for clustered/longitudinal data. For independent data, despite the potential root-n inconsistency of the backfitting estimator noted by Rice (1986), the two estimators have the same asymptotic variance matrix, as shown by Opsomer and Ruppert (1999). In this work, theoretical comparisons of the two estimators for multivariate responses are investigated. We show that, for correlated data, backfitting often produces a larger asymptotic variance than the profile-kernel method; that is, in addition to its bias problem, the backfitting estimator does not have the same asymptotic efficiency as the profile-kernel estimator when data are correlated. Consequently, the common practice of using the backfitting method to compute profile-kernel estimates is no longer advised. We illustrate this in detail by following Zeger and Diggle (1994) and Lin and Carroll (2001), with a working-independence covariance structure for nonparametric estimation and a correlated covariance structure for parametric estimation. Numerical performance of the two estimators is investigated through a simulation study, and their application to an ophthalmology dataset is also described. Next, we study a mixed effects model where the main response and covariate variables are linked through the positions where they are measured but, for technical reasons, are not measured at the same positions. We propose a semiparametric approach for this misaligned-measurements problem and derive the asymptotic properties of the semiparametric estimators under reasonable conditions. An application of the semiparametric method to a colon carcinogenesis study is provided. We find that, compared with a corn oil supplemented diet, a fish oil supplemented diet tends to inhibit the increase of bcl-2 (oncogene) gene expression in rats as the amount of DNA damage increases, and thus promotes apoptosis.
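A hedged sketch of backfitting for a partially linear model y = X beta + g(t) + error, alternating a parametric least squares step with a kernel smoothing step under working independence; the smoother, bandwidth and data layout are illustrative assumptions, not the dissertation's exact procedure:

import numpy as np

def nw_smooth(t, r, h):
    # Nadaraya-Watson smoother of residuals r on the index t.
    W = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
    return (W @ r) / W.sum(axis=1)

def backfit(y, X, t, h=0.1, iters=50):
    beta, g = np.zeros(X.shape[1]), np.zeros(len(y))
    for _ in range(iters):
        beta = np.linalg.lstsq(X, y - g, rcond=None)[0]   # parametric step
        g = nw_smooth(t, y - X @ beta, h)                 # nonparametric step
    return beta, g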
15

Jiang, Huijing. "Statistical computation and inference for functional data analysis." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37087.

Abstract:
My doctoral research dissertation focuses on two aspects of functional data analysis (FDA): FDA under spatial interdependence and FDA for multi-level data. The first part of my thesis focuses on developing modeling and inference procedures for functional data under spatial dependence. The methodology introduced in this part is motivated by a research study on inequities in accessibility to financial services. The first research problem in this part is concerned with a novel model-based method for clustering random time functions which are spatially interdependent. A cluster consists of time functions which are similar in shape. The time functions are decomposed into spatial global and time-dependent cluster effects using a semi-parametric model. We also assume that the clustering membership is a realization from a Markov random field. Under these model assumptions, we borrow information across curves from nearby locations, resulting in enhanced estimation accuracy of the cluster effects and of the cluster membership. In a simulation study, we assess the estimation accuracy of our clustering algorithm under a series of settings: small numbers of time points, high noise levels and varying dependence structures. Over all simulation settings, the spatial-functional clustering method outperforms existing model-based clustering methods. In the case study presented in this project, we estimate and classify service accessibility patterns varying over a large geographic area (California and Georgia) and over a period of 15 years. The focus of this study is on financial services, but it applies generally to other service operations. The second research project of this part develops an association analysis of space-time varying processes which is rigorous, computationally feasible and implementable with standard software. We introduce general measures to model different aspects of the temporal and spatial association between processes varying in space and time. Using a nonparametric spatiotemporal model, we show that the proposed association estimators are asymptotically unbiased and consistent. We complement the point association estimates with simultaneous confidence bands to assess the uncertainty in the point estimates. In a simulation study, we evaluate the accuracy of the association estimates with respect to the sample size as well as the coverage of the confidence bands. In the case study in this project, we investigate the association between service accessibility and income level. The primary objective of this association analysis is to assess whether there are significant changes in the income-driven equity of financial service accessibility over time and to identify potentially under-served markets. The second part of the thesis discusses novel statistical methodology for analyzing multilevel functional data, including a clustering method based on a functional ANOVA model and a spatio-temporal model for functional data with a nested hierarchical structure. In this part, I introduce and compare a series of clustering approaches for multilevel functional data. For brevity, I present the clustering methods for two-level data: multiple samples of random functions, each sample corresponding to a case and each random function within a sample/case corresponding to a measurement type. A cluster consists of cases which have similar within-case means (level-1 clustering) or similar between-case means (level-2 clustering).
Our primary focus is to compare a model-based clustering approach with more straightforward hard clustering methods. The clustering model is based on a multilevel functional principal component analysis. In a simulation study, we assess the estimation accuracy of our clustering algorithm under a series of settings: small vs. moderate numbers of time points, high noise levels and small numbers of measurement types. We demonstrate the applicability of the clustering analysis to a real data set consisting of time-varying sales for multiple products sold by a large retailer in the U.S. My ongoing research in multilevel functional data analysis develops a statistical model for estimating temporal and spatial associations of a series of time-varying variables with an intrinsic nested hierarchical structure. This work has great potential in many real applications where the data are areal data collected from different sources and over geographic regions of different spatial resolution.
16

Wang, Wei. "Linear mixed effects models in functional data analysis." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/253.

Abstract:
Regression models with a scalar response and a functional predictor have been extensively studied. One approach is to approximate the functional predictor using basis function or eigenfunction expansions. In the expansion, the coefficient vector can either be fixed or random. The random coefficient vector is also known as random effects and thus the regression models are in a mixed effects framework. The random effects provide a model for the within individual covariance of the observations. But it also introduces an additional parameter into the model, the covariance matrix of the random effects. This additional parameter complicates the covariance matrix of the observations. Possibly, the covariance parameters of the model are not identifiable. We study identifiability in normal linear mixed effects models. We derive necessary and sufficient conditions of identifiability, particularly, conditions of identifiability for the regression models with a scalar response and a functional predictor using random effects. We study the regression model using the eigenfunction expansion approach with random effects. We assume the random effects have a general covariance matrix and the observed values of the predictor are contaminated with measurement error. We propose methods of inference for the regression model's functional coefficient. As an application of the model, we analyze a biological data set to investigate the dependence of a mouse's wheel running distance on its body mass trajectory.
17

Wagner, Heiko [Verfasser]. "A Contribution to Functional Data Analysis / Heiko Wagner." Bonn : Universitäts- und Landesbibliothek Bonn, 2016. http://d-nb.info/1122193726/34.

18

Li, Yehua. "Topics in functional data analysis with biological applications." College Station, Tex.: Texas A&M University, 2006. http://hdl.handle.net/1969.1/ETD-TAMU-1867.

19

Rubanova, Natalia. "MasterPATH : network analysis of functional genomics screening data." Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCC109/document.

Abstract:
In this work we developed a new exploratory network analysis method that works on an integrated network (comprising protein-protein, transcriptional, miRNA-mRNA and metabolic interactions) and aims at uncovering potential members of molecular pathways important for a given phenotype, using hit-list data sets from "omics" experiments. The method extracts a subnetwork built from the shortest paths of four different types (with only protein-protein interactions; with at least one transcription interaction; with at least one miRNA-mRNA interaction; with at least one metabolic interaction) between hit genes and so-called "final implementers", biological components involved in the molecular events responsible for the final phenotypic realization (if known), or between hit genes alone (if "final implementers" are not known). The method calculates a centrality score for each node and each path in the subnetwork as the number of shortest paths found in the previous step that pass through that node or path. The statistical significance of each centrality score is then assessed by comparing it with centrality scores in subnetworks built from the shortest paths for randomly sampled hit lists. It is hypothesized that nodes and paths with statistically significant centrality scores can be considered putative members of the molecular pathways leading to the studied phenotype. In case experimental scores and p-values are available for a large number of nodes in the network, the method can also calculate a path's experiment-based score (as the average of the experimental scores of the nodes in the path) and experiment-based p-value (by aggregating the p-values of the nodes in the path using Fisher's combined probability test and a permutation approach). The method is illustrated by analyzing the results of miRNA loss-of-function screening and transcriptomic profiling of terminal muscle differentiation, and of a "druggable" loss-of-function screening of the DNA repair process. The Java source code is available on GitHub: https://github.com/daggoo/masterPATH
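A toy sketch of the centrality computation described above, using networkx on a hypothetical integrated network: shortest paths are enumerated from hit genes to a "final implementer" and each intermediate node is scored by the number of paths passing through it. Edge typing and the significance test are omitted, and all node names are invented:

import networkx as nx
from collections import Counter

G = nx.Graph()   # hypothetical integrated network
G.add_edges_from([("hitA", "p1"), ("p1", "p2"), ("p2", "final"),
                  ("hitB", "p1"), ("hitB", "p3"), ("p3", "final")])

node_score = Counter()
for hit in ["hitA", "hitB"]:
    for path in nx.all_shortest_paths(G, hit, "final"):
        node_score.update(path[1:-1])        # count intermediate nodes only

# Significance would be assessed by recomputing these scores for randomly
# sampled hit lists and comparing, as the abstract describes.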
20

Sarmad, Majid. "Robust data analysis for factorial experimental designs : improved methods and software." Thesis, Durham University, 2006. http://etheses.dur.ac.uk/2432/.

Abstract:
Factorial experimental designs are a large family of experimental designs, and robust statistics has been a subject of considerable research in recent decades; robust analysis of factorial designs is therefore applicable to many real problems. Seheult and Tukey (2001) suggested a method of robust analysis of variance for a full factorial design without replication. Their method is generalised here to many other factorial designs, without the restriction of one observation per cell. Furthermore, a new algorithm to decompose data from a factorial design is introduced and programmed in the statistical computing package R. The whole procedure of robust data analysis is also programmed in R, and it is intended to submit the library to CRAN, the repository of R software. In the procedure of robust data analysis, a cut-off value is needed to detect possible outliers. A set of optimum cut-off values for univariate data and some dimensions of two-way designs (complete and incomplete) is also provided, using an improved simulation study design.
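Seheult and Tukey's robust decomposition builds on median polish; a minimal sketch for an unreplicated two-way layout follows (the iteration count and the outlier cut-off step are illustrative, and this is a generic version rather than the thesis's generalised algorithm):

import numpy as np

def median_polish(x, iters=10):
    # Robust two-way fit: x ~ overall + row effect + column effect + residual.
    res = np.asarray(x, dtype=float).copy()
    overall = 0.0
    row, col = np.zeros(res.shape[0]), np.zeros(res.shape[1])
    for _ in range(iters):
        rm = np.median(res, axis=1); row += rm; res -= rm[:, None]
        cm = np.median(res, axis=0); col += cm; res -= cm[None, :]
        for eff in (row, col):
            m = np.median(eff); overall += m; eff -= m
    return overall, row, col, res

# Possible outliers are then flagged where |residual| exceeds a chosen cut-off value.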
21

Ait Si Ali, Amine. "Custom IP cores for robust data analysis and pattern recognition algorithms." Thesis, University of the West of Scotland, 2016. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.739192.

22

Kim, Yoon G. "A response surface approach to data analysis in robust parameter design." Diss., Virginia Tech, 1992. http://hdl.handle.net/10919/38627.

Abstract:
It has become obvious that combined arrays and a response surface approach can be effective tools in our quest to reduce (process) variability. An important aspect of quality improvement is to suppress the magnitude of the influence coming from subtle changes in noise factors. To model and control process variability induced by noise factors, we take a response surface approach. The derivatives of the standard response function with respect to the noise factors, i.e., the slopes of the response function in the directions of the noise factors, play an important role in the study of minimum process variance. For a better understanding of process variability, we study various properties of both biased and unbiased estimators of the process variance. Response surface modeling techniques, together with the ideas involved in modeling and estimating the variance through functions of the aforementioned derivatives, are valuable concepts in this study. In what follows, we describe the use of response surface methodology for situations in which noise factors are present. The approach is to combine Taguchi's notion of heterogeneous variability with standard design and modeling techniques available in response surface methodology.
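In symbols (notation assumed here for illustration, not quoted from the dissertation): with noise factors z entering a response y = f(x, z), the transmitted variance that robust parameter design seeks to minimise is, to first order,

\mathrm{Var}_z[\, y \mid x \,] \;\approx\; \left(\frac{\partial f}{\partial z}\right)^{\!\top} \Sigma_z \left(\frac{\partial f}{\partial z}\right) + \sigma_{\varepsilon}^{2},

so the slopes in the noise directions directly control the process variance, and the design goal is to choose control settings x that flatten them.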
23

Paszkowski-Rogacz, Maciej. "Integration and analysis of phenotypic data from functional screens." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2011. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-63063.

Abstract:
Motivation: Although various high-throughput technologies provide a lot of valuable information, each of them gives insight into different aspects of cellular activity and each has its own limitations. Thus, a complete and systematic understanding of the cellular machinery can be achieved only by a combined analysis of results coming from different approaches. However, methods and tools for the integration and analysis of heterogeneous biological data still have to be developed. Results: This work presents a systemic analysis of basic cellular processes, i.e. cell viability and the cell cycle, as well as embryonic stem cell pluripotency and differentiation. These phenomena were studied using several high-throughput technologies, whose combined results were analysed with existing and novel clustering and hit-selection algorithms. This thesis also introduces two novel data management and data analysis tools. The first, called DSViewer, is a database application designed for integrating and querying results coming from various genome-wide experiments. The second, named PhenoFam, is an application performing gene set enrichment analysis by employing structural and functional information on families of protein domains as annotation terms. Both programs are accessible through a web interface. Conclusions: The investigations presented in this work provide the research community with a novel and markedly improved repertoire of computational tools and methods that facilitate the systematic analysis of information accumulated from high-throughput studies into novel biological insights.
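PhenoFam itself performs gene set enrichment analysis; as a hedged, simpler stand-in, the over-representation of a protein-domain family in a hit list can be scored with a hypergeometric tail probability (all counts below are invented):

from scipy.stats import hypergeom

N = 20000   # genes in the background
K = 150     # genes annotated to the domain family
n = 300     # hits from the screen
k = 12      # hits that fall in the family

p_value = hypergeom.sf(k - 1, N, K, n)   # P(X >= k): is the family over-represented?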
24

Liu, Haiyan [Verfasser]. "On Functional Data Analysis with Dependent Errors / Haiyan Liu." Konstanz : Bibliothek der Universität Konstanz, 2016. http://d-nb.info/1114894222/34.

25

Vogetseder, Georg. "Functional Analysis of Real World Truck Fuel Consumption Data." Thesis, Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-1148.

Abstract:

This thesis covers the analysis of sparse and irregular fuel consumption data of long

distance haulage articulate trucks. It is shown that this kind of data is hard to analyse with multivariate as well as with functional methods. To be able to analyse the data, Principal Components Analysis through Conditional Expectation (PACE) is used, which enables the use of observations from many trucks to compensate for the sparsity of observations in order to get continuous results. The principal component scores generated by PACE, can then be used to get rough estimates of the trajectories for single trucks as well as to detect outliers. The data centric approach of PACE is very useful to enable functional analysis of sparse and irregular data. Functional analysis is desirable for this data to sidestep feature extraction and enabling a more natural view on the data.
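The heart of PACE is a best linear prediction of each truck's principal component scores given its few noisy observations; a compact sketch of that conditional-expectation step follows, assuming the mean function mu, eigenfunctions phi, eigenvalues lam and noise variance sigma2 have already been estimated from the pooled data (all names are illustrative):

import numpy as np

def pace_scores(y, idx, mu, phi, lam, sigma2):
    # E[scores | sparse observations]: the PACE conditional expectation.
    P = phi[idx]                                  # eigenfunctions at the observed grid points
    Sigma = (P * lam) @ P.T + sigma2 * np.eye(len(idx))
    return lam * (P.T @ np.linalg.solve(Sigma, y - mu[idx]))

# A truck's whole trajectory is then reconstructed as mu + phi @ scores,
# even though only a handful of points were observed for that truck.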

26

Wang, Shanshan. "Exploring and modeling online auctions using functional data analysis." College Park, Md. : University of Maryland, 2007. http://hdl.handle.net/1903/6962.

27

Doehring, Orlando. "Peak selection in metabolic profiles using functional data analysis." Thesis, Imperial College London, 2013. http://hdl.handle.net/10044/1/11062.

Abstract:
In this thesis we describe sparse principal component analysis (PCA) methods and apply them to the analysis of short multivariate time series in order to perform both dimensionality reduction and variable selection. We take a functional data analysis (FDA) modelling approach in which each time series is treated as a continuous smooth function of time, or curve. These techniques have been applied to analyse time series data arising in the area of metabonomics, the study of chemical processes involving small-molecule metabolites in a cell. We use experimental data obtained from the COnsortium for MEtabonomic Toxicology (COMET) project, formed by six pharmaceutical companies and Imperial College London, UK. In the COMET project, repeated measurements of several metabolites over time were collected from rats subjected to different drug treatments. The aim of our study is to detect important metabolites by analysing the multivariate time series. Multivariate functional PCA is an exploratory technique for describing the observed time series. In its standard form, PCA involves linear combinations of all variables (i.e. metabolite peaks) and does not perform variable selection. In order to select a subset of important metabolites, we introduce sparsity into the model. We develop a novel functional Sparse Grouped Principal Component Analysis (SGPCA) algorithm using ideas related to the Least Absolute Shrinkage and Selection Operator (LASSO), a regularized regression technique, with grouped variables. This SGPCA algorithm detects a sparse linear combination of metabolites which explains a large proportion of the variance. Apart from SGPCA, we also propose two alternative approaches for metabolite selection. The first is based on thresholding the multivariate functional PCA solution, while the second computes the variance of each metabolite curve independently and then ranks the curves in decreasing order of importance. To the best of our knowledge, this is the first application of sparse functional PCA methods to the problem of modelling multivariate metabonomic time series data and selecting a subset of metabolite peaks. We present comprehensive experimental results using simulated data and COMET project data for different multivariate and functional PCA variants from the literature and for SGPCA. Simulation results show that the SGPCA algorithm recovers a high proportion of truly important metabolite variables. Furthermore, in the case of SGPCA applied to the COMET dataset, we identify a small number of important metabolites independently for two different treatment conditions. A comparison of the selected metabolites in both treatment conditions reveals an overlap of over 75 percent.
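Of the three selection routes above, the thresholding one is easiest to sketch: compute the leading functional PC loading vector and zero out the small entries, leaving a short list of candidate metabolite peaks. The fraction kept is an arbitrary illustration, not a value from the thesis:

import numpy as np

def thresholded_leading_pc(X, keep_frac=0.2):
    # X: observations x metabolite variables (e.g. stacked curve evaluations).
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = Vt[0].copy()
    cut = np.quantile(np.abs(v), 1 - keep_frac)   # keep only the largest loadings
    v[np.abs(v) < cut] = 0.0
    return v / np.linalg.norm(v)                  # sparse loading vector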
28

Sheppard, Therese. "Extending covariance structure analysis for multivariate and functional data." Thesis, University of Manchester, 2010. https://www.research.manchester.ac.uk/portal/en/theses/extending-covariance-structure-analysis-for-multivariate-and-functional-data(e2ad7f12-3783-48cf-b83c-0ca26ef77633).html.

Abstract:
For multivariate data, when testing homogeneity of covariance matrices arising from two or more groups, Bartlett's (1937) modified likelihood ratio test statistic is appropriate under the null hypothesis of equal covariance matrices, where the null distribution of the test statistic rests on the restrictive assumption of normality. Zhang and Boos (1992) provide a pooled bootstrap approach for when the data cannot be assumed to be normally distributed. We give three alternative bootstrap techniques for testing homogeneity of covariance matrices when it is inappropriate to pool the data into one single population, as the pooled bootstrap procedure does, and when the data are not normally distributed. We further show that our alternative bootstrap methodology can be extended to testing Flury's (1988) hierarchy of covariance structure models. Where deviations from normality exist, we show by simulation that the normal-theory log-likelihood ratio test statistic is less viable than our bootstrap methodology. For functional data, Ramsay and Silverman (2005) and Lee et al. (2002) together provide four computational techniques for functional principal component analysis (PCA) followed by covariance structure estimation. When the smoothing of individual profiles is based on least squares cubic B-splines or regression splines, we find that the ensuing covariance matrix estimate suffers from loss of dimensionality. We show that ridge regression can be used to resolve this problem, but only for the discretisation and numerical quadrature approaches to estimation, and that the choice of a suitable ridge parameter is not arbitrary. We further show the unsuitability of regression splines when deciding on the optimal degree of smoothing to apply to individual profiles. To gain insight into smoothing parameter choice for functional data, we compare kernel and spline approaches to smoothing individual profiles in a nonparametric regression context. Our simulation results justify a kernel approach using a new criterion based on predicted squared error. We also show by simulation that, when taking account of correlation, a kernel approach using a generalized cross-validatory type criterion performs well. These data-based methods for selecting the smoothing parameter are illustrated prior to a functional PCA on a real data set.
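A hedged sketch of one possible non-pooled bootstrap for covariance homogeneity: groups are centred separately, resampled within group, and a Bartlett-type log-determinant statistic is recomputed on each resample. This is a generic variant for illustration, not necessarily one of the three techniques proposed in the thesis:

import numpy as np

def bartlett_stat(groups):
    # Bartlett-type statistic comparing group covariances to the pooled covariance.
    ns = np.array([len(g) - 1 for g in groups]); N = ns.sum()
    Ss = [np.cov(g, rowvar=False) for g in groups]
    Sp = sum(n * S for n, S in zip(ns, Ss)) / N
    return N * np.log(np.linalg.det(Sp)) - sum(
        n * np.log(np.linalg.det(S)) for n, S in zip(ns, Ss))

def bootstrap_p(groups, B=999, seed=0):
    rng = np.random.default_rng(seed)
    obs = bartlett_stat(groups)
    cent = [g - g.mean(axis=0) for g in groups]      # impose the null within each group
    boot = [bartlett_stat([c[rng.integers(0, len(c), len(c))] for c in cent])
            for _ in range(B)]
    return (1 + sum(b >= obs for b in boot)) / (B + 1)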
29

Cheng, Yafeng. "Functional regression analysis and variable selection for motion data." Thesis, University of Newcastle upon Tyne, 2016. http://hdl.handle.net/10443/3150.

Abstract:
Modern technology offers us highly evolved data collection devices. They allow us to observe data densely over continua such as time, distance, space and so on. The observations are normally assumed to follow certain continuous and smooth underlying functions of the continua. Thus the analysis must consider two important properties of functional data: infinite dimension and smoothness. Traditional multivariate data analysis normally works with low-dimensional and independent data; we therefore need to develop new methodology for functional data analysis. In this thesis, we first study the linear relationship between a scalar variable and a group of functional variables using three different discrete methods. We combine this linear relationship with the idea of least angle regression to propose a new variable selection method, named functional LARS. It is designed for functional linear regression with a scalar response and a group of mixed functional and scalar variables. We also propose two new stopping rules for the algorithm, since the conventional stopping rules may fail for functional data. The algorithm can be used when there are more variables than samples. The performance of the algorithm and the stopping rules is compared with existing algorithms through comprehensive simulation studies. The proposed algorithm is applied to analyse motion data including a scalar response, more than 200 scalar covariates and 500 functional covariates. Models with and without functional variables are compared. We have achieved very accurate results for these complex data, particularly with the models including functional covariates. Research in functional variable selection is limited due to its complexity and onerous computational burdens. We have demonstrated that the proposed functional LARS is a very efficient method that can cope with functional data of very large dimension. The methodology and the idea have the potential to be used to address other challenging problems in functional data analysis.
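A minimal sketch of the ingredients of such an approach: the functional covariate is represented by a few principal component scores, stacked next to the scalar covariates, and least angle regression then orders the variables by entry. Data, basis and dimensions are all invented, and this is a generic LARS rather than the thesis's functional LARS with its new stopping rules:

import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(4)
grid = np.linspace(0, 1, 50)
Z = rng.standard_normal((60, 3))                          # scalar covariates
curves = rng.standard_normal((60, 4)) @ np.array(
    [np.cos((j + 1) * np.pi * grid) for j in range(4)])   # one functional covariate
y = 2 * Z[:, 0] + curves @ np.sin(np.pi * grid) / 50 + 0.1 * rng.standard_normal(60)

Xc = curves - curves.mean(axis=0)
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
design = np.hstack([Z, U[:, :4] * s[:4]])                 # scalars + FPC scores
lars = Lars(n_nonzero_coefs=4).fit(design, y)
print(lars.active_)                                       # order in which variables enter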
30

Harrington, Justin. "Extending linear grouping analysis and robust estimators for very large data sets." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/845.

Abstract:
Cluster analysis is the study of how to partition data into homogeneous subsets so that the partitioned data share some common characteristic. In one to three dimensions, the human eye can distinguish well between clusters of data if they are clearly separated. However, when there are more than three dimensions and/or the data are not clearly separated, an algorithm is required which needs a metric of similarity that quantitatively measures the characteristic of interest. Linear Grouping Analysis (LGA, Van Aelst et al. 2006) is an algorithm for clustering data around hyperplanes, and is most appropriate when: 1) the variables are related/correlated, which results in clusters with an approximately linear structure; and 2) it is not natural to assume that one variable is a "response" and the remainder "explanatories". LGA measures the compactness within each cluster via the sum of squared orthogonal distances to hyperplanes formed from the data. In this dissertation, we extend the scope of problems to which LGA can be applied. The first extension relates to the linearity requirement inherent in LGA: we propose a new method of non-linearly transforming the data into a feature space, using the kernel trick, such that in this space the data might form linear clusters. A possible side effect of this transformation is that the dimension of the transformed space can be significantly larger than the number of observations in a given cluster, which causes problems with orthogonal regression. We therefore also introduce a new method for calculating the distance of an observation to a cluster when its covariance matrix is rank deficient. The second extension concerns the combinatorial problem of optimizing the LGA objective function, and adapts an existing algorithm, called BIRCH, to provide fast, approximate solutions, particularly when the data do not fit in memory. We also provide solutions based on BIRCH for two other challenging optimization problems in the field of robust statistics, and demonstrate, via simulation studies as well as applications to actual data sets, that the BIRCH solution compares favourably to the existing state-of-the-art alternatives, and in many cases finds a more optimal solution.
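The core LGA computation, the orthogonal distance from points to a cluster's best-fit hyperplane, reduces to an SVD; a bare sketch follows (the rank-deficient and kernelised cases treated in the dissertation need further care, and all names here are illustrative):

import numpy as np

def orthogonal_distances(points, cluster):
    # Hyperplane through the cluster mean; normal = direction of least variance.
    mu = cluster.mean(axis=0)
    _, _, Vt = np.linalg.svd(cluster - mu, full_matrices=True)
    normal = Vt[-1]
    return np.abs((points - mu) @ normal)   # orthogonal distance of each point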
31

Gromski, Piotr Sebastian. "Application of chemometrics for the robust analysis of chemical and biochemical data." Thesis, University of Manchester, 2015. https://www.research.manchester.ac.uk/portal/en/theses/application-of-chemometrics-for-the-robust-analysis-of-chemical-and-biochemical-data(3049006f-e218-4286-83a8-e1fd85004366).html.

Abstract:
In the last two decades chemometrics has become an essential tool for the experimental biologist and chemist, with a level of contribution that varies strongly depending on the type of research performed. Chemometrics may be used to interpret and explain results, to compare experimental data with real-world "unseen" data, to accurately detect certain chemical vapours, to identify cancer-related metabolites, to identify and rank potentially relevant/important variables, or simply for a pictorial interpretation and understanding of the results. Whilst many chemometric methods are well established in the areas of chemistry and metabolomics, many scientists still use them with what is often referred to as a "black-box" approach, that is, without prior knowledge of the methods and their well-recognised statistical properties. This gap owes much to the wide availability of powerful computers and, perhaps more notably, up-to-date, easy-to-use and reliable software. The main aim of this study is to narrow this gap by providing an extensive demonstration of several approaches applied at different stages of the data analysis pipeline, highlighting the importance of appropriate method selection. The comparisons are based on both chemical and biochemical (metabolomics) data and construct a firm basis for researchers in terms of understanding chemometric methods and the influence of parameter selection. Consequently, this thesis investigates and compares different approaches employed at various statistical steps, including pre-treatment steps such as dealing with missing data and scaling. First, different substitutions of missing values and their influence on unsupervised and supervised learning were compared, where it was shown that metabolites displaying skewed distributions can have a significant impact on the replacement approach. The scaling approaches were compared in terms of their effect on classification accuracy for a variety of metabolomics data sets, and it was shown that the most standard option, autoscaling, is not always the best. In the next step, a comparison of various variable selection methods commonly used for the analysis of chemical data was carried out. The results revealed that random forests, with its variable selection techniques, and support vector machines, combined with recursive feature elimination as a variable selection method, displayed the best results in comparison with other approaches. Moreover, in this study a double cross-validation procedure was applied to minimize the consequences of over-fitting. Finally, seven different algorithms and two model validation procedures, based on either 10-fold cross-validation or bootstrapping, were investigated to allow direct comparison between different classification approaches.
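The double (nested) cross-validation mentioned above separates tuning from assessment; a compact sketch with random forests is shown below, with all data and parameter grids invented:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(5)
X = rng.standard_normal((80, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

inner = GridSearchCV(RandomForestClassifier(random_state=0),
                     {"max_features": [2, 5, 10]}, cv=5)     # inner loop: tuning
scores = cross_val_score(inner, X, y, cv=10)                 # outer loop: assessment
print(scores.mean())   # performance estimate shielded from tuning over-fit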
32

Fitzgerald-DeHoog, Lindsay M. "Multivariate analysis of proteomic data: Functional group analysis using a global test." Thesis, California State University, Long Beach, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=1602759.

Abstract:

Proteomics is a relatively new discipline being implemented in the life sciences. It allows a whole-systems approach to discerning changes in organismal physiology due to physical perturbations. The advantages of a proteomic approach may, however, be counteracted by difficulties in analyzing the data in a meaningful way, owing to inherent problems with statistical assumptions. Furthermore, analyzing significant protein volume differences among treatment groups often requires analysis of numerous proteins, even when limiting analyses to a particular protein type or physiological pathway. Improper use of traditional techniques leads to problems with multiple hypothesis testing.

This research examines two common techniques used to analyze proteomic data and applies them to a novel proteomic data set. In addition, a Global Test originally developed for gene array data is employed to discover its utility for proteomic data and its ability to counteract the multiple hypothesis testing problems encountered with traditional analyses.
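A hedged sketch of the flavour of such a global test: a single quadratic score statistic is computed for a whole group of proteins and calibrated by permuting the outcome, so one test covers the group instead of many per-protein tests. This simplified permutation version is assumed for illustration, not quoted from the thesis:

import numpy as np

def global_test_p(X, y, B=2000, seed=0):
    # X: samples x proteins (one functional group); y: outcome per sample.
    rng = np.random.default_rng(seed)
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    stat = ((Xc.T @ yc) ** 2).sum()           # one statistic for the whole group
    perm = np.array([((Xc.T @ rng.permutation(yc)) ** 2).sum() for _ in range(B)])
    return (1 + (perm >= stat).sum()) / (B + 1)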

33

Charles, Nathan Richard. "Data model refinement, generic profiling, and functional programming." Thesis, University of York, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.341629.

34

Zhang, Wen 1978. "Functional data analysis for detecting structural boundaries of cortical area." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=98531.

Abstract:
It is widely accepted that the cortex can be divided into a series of spatially discrete areas based on their specific laminar patterns. It is of great interest to divide the cortex into different areas in terms of both neuronal function and cellular composition. The division of cortical areas can be reflected in the cell arrangements or cellular composition; therefore, the cortical structure can be represented by functional neuronal density data. Techniques from functional data analysis help to develop measures that indicate structural changes.
In order to separate roughness from structural variations and from the influences of convolutions and foldings, a method called bivariate smoothing is proposed for the noisy density data. This smoothing method is applied to four sets of cortical density data provided by Prof. Petrides [1] and Scott Mackey [2].
The first- and second-order derivatives of the density function reflect the change and the rate of change of the density, respectively. Therefore, derivatives of the density function are used to analyze the structural features, as an attempt to detect indicators for boundaries of subareas of the four cortex sections.
Finally, the accuracy and limitations of this smoothing method are tested using simulated examples.
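As a rough one-dimensional simplification of this derivative-based boundary detection (the thesis uses bivariate smoothing; here a univariate spline on a simulated step-like density, with an assumed smoothing parameter):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.linspace(0, 1, 200)
density = np.where(x < 0.5, 1.0, 2.0) + 0.1 * np.random.default_rng(2).normal(size=200)

spline = UnivariateSpline(x, density, s=1.0)  # smoothing level s is a tuning choice
d1 = spline.derivative(1)(x)                  # first derivative: rate of change
boundary_guess = x[np.argmax(np.abs(d1))]     # sharpest change = boundary candidate
print(f"candidate boundary near x = {boundary_guess:.2f}")
```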
APA, Harvard, Vancouver, ISO, and other styles
35

Li, Yan. "Analysis of complex survey data using robust model-based and model-assisted methods." College Park, Md. : University of Maryland, 2006. http://hdl.handle.net/1903/4080.

Full text
Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2006.
Thesis research directed by: Survey Methodology. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
APA, Harvard, Vancouver, ISO, and other styles
36

Uddin, Mohammad Moin. "ROBUST STATISTICAL METHODS FOR NON-NORMAL QUALITY ASSURANCE DATA ANALYSIS IN TRANSPORTATION PROJECTS." UKnowledge, 2011. http://uknowledge.uky.edu/gradschool_diss/153.

Full text
Abstract:
The American Association of State Highway and Transportation Officials (AASHTO) and the Federal Highway Administration (FHWA) require the use of statistically based quality assurance (QA) specifications for construction materials. As a result, many state highway agencies (SHAs) have implemented QA specifications for highway construction. In these statistically based QA specifications, the quality characteristics of most construction materials are assumed to be normally distributed; however, the normality assumption can be violated in several ways. The distribution of the data can be skewed, kurtotic, or bimodal. If the process shows evidence of a significant departure from normality, then the quality measures calculated may be erroneous. In this research study, an extended QA data analysis model is proposed which significantly improves the Type I error and power of the F-test and t-test, and removes bias in Percent Within Limits (PWL) based pay factor calculations. For the F-test, three alternative tests are proposed when the sampling distribution is non-normal: 1) Levene's test; 2) Brown and Forsythe's test; and 3) O'Brien's test. One alternative is proposed for the t-test: the non-parametric Wilcoxon-Mann-Whitney rank-sum test. For PWL-based pay factor calculation when lot data suffer from non-normality, three schemes were investigated: 1) simple transformation methods, 2) the Clements method, and 3) a modified Box-Cox transformation using the golden section search method. A Monte Carlo simulation study revealed that both Levene's test and Brown and Forsythe's test are robust alternative tests of variances when the underlying population distribution is non-normal. Between the t-test and the Wilcoxon test, the t-test was found to be significantly robust even when the sample population distribution was severely non-normal. Among the data transformations for the PWL-based pay factor, the modified Box-Cox transformation using the golden section search method was found to be the most effective in minimizing or removing pay bias. Field QA data were analyzed to validate the model, and Microsoft® Excel macro-based software was developed that can adjust any pay consequences due to non-normality.
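Two of the alternative tests named above are directly available in SciPy, so a hedged illustration on simulated skewed lot data (not the study's field data) is straightforward; note that scipy.stats.levene with center="median" is the Brown-Forsythe variant:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
lot_a = rng.lognormal(0.0, 0.4, 30)   # skewed, non-normal lot data
lot_b = rng.lognormal(0.1, 0.6, 30)

print("Levene:        ", stats.levene(lot_a, lot_b, center="mean"))
print("Brown-Forsythe:", stats.levene(lot_a, lot_b, center="median"))
print("Wilcoxon-M-W:  ", stats.mannwhitneyu(lot_a, lot_b))
```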
APA, Harvard, Vancouver, ISO, and other styles
37

Burrell, Lauren S. "Feature analysis of functional mri data for mapping epileptic networks." Diss., Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26528.

Full text
Abstract:
This research focused on the development of a methodology for analyzing functional magnetic resonance imaging (fMRI) data collected from patients with epilepsy in order to map epileptic networks. Epilepsy, a chronic neurological disorder characterized by recurrent, unprovoked seizures, affects up to 1% of the world's population. Antiepileptic drug therapies either do not successfully control seizures or have unacceptable side effects in over 30% of patients. Approximately one-third of patients whose seizures cannot be controlled by medication are candidates for surgical removal of the affected area of the brain, potentially rendering them seizure free. Accurate localization of the epileptogenic focus, i.e., the area of seizure onset, is critical for the best surgical outcome. The main objective of the research was to develop a set of fMRI data features that could be used to distinguish between normal brain tissue and the epileptic focus. To determine the best combination of features from various domains for mapping the focus, genetic programming and several feature selection methods were employed. These composite features and feature sets were subsequently used to train a classifier capable of discriminating between the two classes of voxels. The classifier was then applied to a separate testing set in order to generate maps showing brain voxels labeled as either normal or epileptogenic based on the best feature or set of features. It should be noted that although this work focuses on the application of fMRI analysis to epilepsy data, similar techniques could be used when studying brain activations due to other sources. In addition to investigating in vivo data collected from temporal lobe epilepsy patients with uncertain epileptic foci, phantom (simulated) data were created and processed to provide quantitative measures of the efficacy of the techniques.
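A generic sketch in the spirit of this feature-selection-plus-classifier pipeline (synthetic features and labels, a linear SVM with recursive feature elimination; not the study's data, voxel features, or exact method):

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 30))  # 30 candidate per-voxel features
y = (X[:, 0] + X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
selector = RFE(SVC(kernel="linear"), n_features_to_select=5).fit(X_tr, y_tr)
print("selected features:", np.where(selector.support_)[0])
print("held-out accuracy:", selector.score(X_te, y_te))
```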
APA, Harvard, Vancouver, ISO, and other styles
38

McGonigle, John. "Data-driven analysis methods in pharmacological and functional magnetic resonance." Thesis, University of Bristol, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.573929.

Full text
Abstract:
This thesis introduces several novel methods for the data-driven and ex- ploratory analysis of functional brain images. Functional magnetic resonance imaging (fMRI) has emerged as a safe and non-invasive way to image the hu- man brain in action. In pharmacological MRI (phMRI), a drug's effect on the brain is of interest, rather than the brain's response to a specific task as in fMRI. However, the sometimes prolonged response to a drug necessi- tates different methodologies than those for task related effects, with further methods development needed to deliver robust results so that phMRI may be of practical use during drug development. There are many confounding issues in analysing these data, including under-informed models of response, subject motion, scanner drift, and gross differences in brain volume. In this work, data from a phMRI experiment was analysed to examine the effect of a pharmacological dose of hydrocortisone; a glucocorticoid associ- ated with the body's response to stress, and used in a number of medical conditions. The key findings were that even without using a priori hypothe- ses about the site of action, hydrocortisone significantly reduces a phMRI signal associated with blood oxygenation in the dorsal hippocampi, which is confirmed by decreases in absolute perfusion measured using arterial spin labelling. Methods were developed for the detection and correction of artefacts, includ- ing intra-scan motion and scanner drift. Functional connectivity methods were examined, and methodological issues in comparing groups investigated, revealing that many previously observed differences may have been biased or even artefactual due to gross differences in brain volume. Temporal decom- position techniques were also explored for their use in brain imaging, with wavelet cluster analysis being developed into an interactive and iterative method, while an adaptive analysis method, empirical mode decomposition, is built upon to allow the analysis of many thousands of time courses.
APA, Harvard, Vancouver, ISO, and other styles
39

Lee, Homin, William Braynen, Kiran Keshav, and Paul Pavlidis. "ErmineJ: Tool for functional analysis of gene expression data sets." BioMed Central, 2005. http://hdl.handle.net/10150/610121.

Full text
Abstract:
BACKGROUND: It is common for the results of a microarray study to be analyzed in the context of biologically motivated groups of genes such as pathways or Gene Ontology categories. The most common method for such analysis uses the hypergeometric distribution (or a related technique) to look for "over-representation" of groups among genes selected as being differentially expressed or otherwise of interest based on a gene-by-gene analysis. However, this method suffers from some limitations, and biologist-friendly tools that implement alternatives have not been reported.
RESULTS: We introduce ErmineJ, a multiplatform, user-friendly, stand-alone software tool for the analysis of functionally relevant sets of genes in the context of microarray gene expression data. ErmineJ implements multiple algorithms for gene set analysis, including over-representation and resampling-based methods that focus on gene scores or correlation of gene expression profiles. In addition to a graphical user interface, ErmineJ has a command line interface and an application programming interface that can be used to automate analyses. The graphical user interface includes tools for creating and modifying gene sets, visualizing the Gene Ontology as a table or tree, and visualizing gene expression data. ErmineJ comes with a complete user manual, and is open-source software licensed under the GNU Public License.
CONCLUSION: The availability of multiple analysis algorithms, together with a rich feature set and simple graphical interface, should make ErmineJ a useful addition to the biologist's informatics toolbox. ErmineJ is available from http://microarray.cu.genome.org.
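The over-representation calculation that ErmineJ implements alongside its resampling methods reduces to a hypergeometric tail probability; a minimal sketch with made-up counts:

```python
from scipy.stats import hypergeom

M = 20000   # genes on the array
n = 150     # genes annotated to the category of interest
N = 500     # genes selected as differentially expressed
k = 12      # selected genes that fall in the category

# P(X >= k) under sampling without replacement
p_over = hypergeom.sf(k - 1, M, n, N)
print(f"over-representation p-value = {p_over:.4g}")
```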
APA, Harvard, Vancouver, ISO, and other styles
40

Parameswaran, Rupa. "A Robust Data Obfuscation Technique for Privacy Preserving Collaborative Filtering." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/11459.

Full text
Abstract:
Privacy is defined as the freedom from unauthorized intrusion. The availability of personal information through online databases, such as government records, medical records, and voters' lists, poses a threat to personal privacy. The concern over individual privacy has led to the development of legal codes for safeguarding privacy in several countries. However, the ignorance of individuals as well as loopholes in the systems have led to information breaches even in the presence of such rules and regulations. Protection of data privacy requires modification of the data itself. The term 'data obfuscation' is used to refer to the class of algorithms that modify the values of the data items without distorting the usefulness of the data. The main goal of this thesis is the development of a data obfuscation technique that provides robust privacy protection with minimal loss in usability of the data. Although medical and financial services are two of the major areas where information privacy is a concern, privacy breaches are not restricted to these domains. One of the areas where the concern over data privacy is of growing interest is collaborative filtering. Collaborative filtering systems are widely used in E-commerce applications to provide recommendations to users regarding products that might be of interest to them. The prediction accuracy of these systems depends on the size and accuracy of the data provided by users. However, the lack of sufficient guidelines governing the use and distribution of user data raises concerns over individual privacy. Users often provide the minimal information that is required for accessing these E-commerce services. The lack of rules governing the use and distribution of data disallows sharing of data among different communities for collaborative filtering. The goals of this thesis are (a) the definition of a standard for classifying DO techniques, (b) the development of a robust cluster-preserving data obfuscation algorithm, and (c) the design and implementation of a privacy-preserving shared collaborative filtering framework using the data obfuscation algorithm.
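As a hedged illustration of one generic cluster-preserving obfuscation idea (not the algorithm developed in this thesis): a random orthogonal rotation perturbs every value while preserving pairwise Euclidean distances, and hence distance-based cluster structure:

```python
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4))            # user-by-attribute ratings matrix
R = ortho_group.rvs(4, random_state=0)   # random rotation/reflection
X_obf = X @ R                            # obfuscated data

d_orig = np.linalg.norm(X[0] - X[1])
d_obf = np.linalg.norm(X_obf[0] - X_obf[1])
print(f"pairwise distance before/after: {d_orig:.4f} / {d_obf:.4f}")  # equal
```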
APA, Harvard, Vancouver, ISO, and other styles
41

Jiang, Cheng. "Investigation and application of functional data analysis technology for calibration of near-infrared spectroscopic data." Thesis, University of Newcastle Upon Tyne, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.601687.

Full text
Abstract:
This thesis focuses on the investigation and application of Functional Data Analysis methodologies to address calibration challenges for spectroscopic data. Of particular interest is the calibration of near-infrared spectral data. Different strategies for constructing functional linear calibration methodologies, and a number of functional linear calibration approaches, are initially discussed. A novel approach is then proposed to compare functional linear calibration methodologies with a well-established and widely used methodology in the chemometrics area, Partial Least Squares (PLS). From this perspective, a common framework can be established to investigate the similarities and differences between these two methodologies. It is shown that the model structures of the two methodologies are similar, but that they differ in the choice of basis functions used to represent the original spectral data: as opposed to the loadings of PLS, B-splines capture local features of the data.
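A minimal PLS calibration baseline of the kind compared against here, sketched on synthetic "spectra" (assumed data; not the NIR study's):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(6)
wavelengths = np.linspace(0, 1, 100)
concentration = rng.uniform(0, 1, 60)
# each spectrum: a concentration-scaled Gaussian band plus noise
X = (concentration[:, None] * np.exp(-((wavelengths - 0.5) ** 2) / 0.01)
     + 0.01 * rng.normal(size=(60, 100)))

pls = PLSRegression(n_components=3)
y_hat = cross_val_predict(pls, X, concentration, cv=10).ravel()
print("CV RMSE:", np.sqrt(np.mean((y_hat - concentration) ** 2)))
```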
APA, Harvard, Vancouver, ISO, and other styles
42

Jin, Zhongnan. "Statistical Methods for Multivariate Functional Data Clustering, Recurrent Event Prediction, and Accelerated Degradation Data Analysis." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/102628.

Full text
Abstract:
In this dissertation, we introduce three projects in machine learning and reliability applications after the general introductions in Chapter 1. The first project concentrates on multivariate sensory data, the second project is related to bivariate recurrent processes, and the third project introduces thermal index (TI) estimation for accelerated destructive degradation test (ADDT) data, for which an R package is developed. All three projects are related to, and can be used to solve, certain reliability problems. Specifically, in Chapter 2, we introduce a clustering method for multivariate functional data. In order to cluster the customized events extracted from multivariate functional data, we apply functional principal component analysis (FPCA) and use a model-based clustering method on a transformed matrix. A penalty term is imposed on the likelihood so that variable selection is performed automatically. In Chapter 3, we propose a covariate-adjusted model to predict the next event in a bivariate recurrent event system. Inspired by geyser eruptions in Yellowstone National Park, we consider two event types and model the relationship between their event gap times. External systematic conditions are taken into account in the model via covariates. The proposed covariate-adjusted recurrent process (CARP) model is applied to the Yellowstone National Park geyser data. In Chapter 4, we compare estimation methods for the TI. In ADDT, the TI is an important index indicating the reliability of materials when the accelerating variable is temperature. Three methods are introduced for TI estimation: the least-squares method, a parametric model, and a semi-parametric model. An R package implements all three methods. Applications of the R functions are introduced in Chapter 5 with publicly available ADDT datasets. Chapter 6 includes conclusions and areas for future work.
Doctor of Philosophy
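The general recipe of the first project, FPCA scores fed to a model-based clusterer, can be sketched as follows (plain SVD-based FPCA on densely sampled synthetic curves; the chapter's penalised variable-selection step is omitted):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
t = np.linspace(0, 1, 50)
curves = np.vstack(
    [np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=50) for _ in range(30)]
    + [np.cos(2 * np.pi * t) + 0.1 * rng.normal(size=50) for _ in range(30)])

centered = curves - curves.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U[:, :3] * s[:3]  # first three FPC scores per curve

labels = GaussianMixture(n_components=2, random_state=0).fit_predict(scores)
print("cluster sizes:", np.bincount(labels))
```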
APA, Harvard, Vancouver, ISO, and other styles
43

Jeanmougin, Marine. "Statistical methods for robust analysis of transcriptome data by integration of biological prior knowledge." Thesis, Evry-Val d'Essonne, 2012. http://www.theses.fr/2012EVRY0029/document.

Full text
Abstract:
Recent advances in Molecular Biology have led biologists toward high-throughput genomic studies. In particular, the investigation of the human transcriptome offers unprecedented opportunities for understanding cellular and disease mechanisms. In this PhD, we focus on providing robust statistical methods dedicated to the treatment and analysis of high-throughput transcriptome data. We discuss the differential analysis approaches available in the literature for identifying genes associated with a phenotype of interest and propose a comparison study. We provide practical recommendations on the appropriate method to be used, based on various simulation models and real datasets. With the eventual goal of overcoming the inherent instability of differential analysis strategies, we have developed an innovative approach called DiAMS, for DIsease Associated Modules Selection. This method selects significant modules of genes rather than individual genes and involves the integration of both transcriptome and protein interaction data in a local-score strategy. We then focus on the development of a framework to infer gene regulatory networks by integrating a biologically informative prior over network structures using Gaussian graphical models. This approach offers the possibility of exploring the molecular relationships between genes, leading to the identification of altered regulations potentially involved in disease processes. Finally, we apply our statistical developments to the study of metastatic relapse in breast cancer.
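As a hedged stand-in for the network-inference step, the graphical lasso estimates a sparse Gaussian graphical model from expression-like data (the biological prior over network structures that the thesis integrates is not reproduced here):

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(8)
cov = np.array([[1.0, 0.6, 0.0, 0.0, 0.0],
                [0.6, 1.0, 0.5, 0.0, 0.0],
                [0.0, 0.5, 1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 1.0, 0.4],
                [0.0, 0.0, 0.0, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(5), cov, size=300)  # 5 "genes"

model = GraphicalLassoCV().fit(X)
# non-zero off-diagonal precision entries suggest direct gene-gene links
print(np.round(model.precision_, 2))
```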
APA, Harvard, Vancouver, ISO, and other styles
44

Xie, Guangrui. "Robust and Data-Efficient Metamodel-Based Approaches for Online Analysis of Time-Dependent Systems." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/98806.

Full text
Abstract:
Metamodeling is regarded as a powerful analysis tool for learning the input-output relationship of a system based on a limited amount of data, collected when experiments with real systems are costly or impractical. As a popular metamodeling method, Gaussian process regression (GPR) has been successfully applied to analyses of various engineering systems. However, GPR-based metamodeling for time-dependent systems (TDSs) is especially challenging for three reasons. First, TDSs require an appropriate account of temporal effects; however, standard GPR cannot address temporal effects easily and satisfactorily. Second, TDSs typically require analytics tools with sufficiently high computational efficiency to support online decision making, but standard GPR may not be adequate for real-time implementation. Lastly, reliable uncertainty quantification is a key to success for operational planning of TDSs in the real world; however, research on how to construct adequate error bounds for GPR-based metamodeling is sparse. Inspired by the challenges encountered in GPR-based analyses of two representative stochastic TDSs, i.e., load forecasting in a power system and trajectory prediction for unmanned aerial vehicles (UAVs), this dissertation aims to develop novel modeling, sampling, and statistical analysis techniques for enhancing the computational and statistical efficiencies of GPR-based metamodeling to meet the requirements of practical implementations. Furthermore, an in-depth investigation of building uniform error bounds for stochastic kriging is conducted, which sets up a foundation for developing robust GPR-based metamodeling techniques for analyses of TDSs under the impact of strong heteroscedasticity.
Ph.D.
Metamodeling has been regarded as a powerful analysis tool to learn the input-output relationship of an engineering system with a limited amount of experimental data available. As a popular metamodeling method, Gaussian process regression (GPR) has been widely applied to analyses of various engineering systems whose input-output relationships do not depend on time. However, GPR-based metamodeling for time-dependent systems (TDSs), whose input-output relationships depend on time, is especially challenging due to three reasons. First, standard GPR cannot properly address temporal effects for TDSs. Second, standard GPR is typically not computationally efficient enough for real-time implementations in TDSs. Lastly, research on how to adequately quantify the uncertainty associated with the performance of GPR-based metamodeling is sparse. To fill this knowledge gap, this dissertation aims to develop novel modeling, sampling, and statistical analysis techniques for enhancing standard GPR to meet the requirements of practical implementations for TDSs. Effective solutions are provided to address the challenges encountered in GPR-based analyses of two representative stochastic TDSs, i.e., load forecasting in a power system and trajectory prediction for unmanned aerial vehicles (UAVs). Furthermore, an in-depth investigation on quantifying the uncertainty associated with the performance of stochastic kriging (a variant of standard GPR) is conducted, which sets up a foundation for developing robust GPR-based metamodeling techniques for analyses of more complex TDSs.
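A minimal sketch of standard GPR metamodeling with uncertainty quantification, i.e., the baseline this dissertation enhances (scikit-learn's stock API on synthetic data; not the proposed load-forecasting or UAV models):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(9)
X = rng.uniform(0, 10, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=40)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)
X_new = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = gpr.predict(X_new, return_std=True)  # predictive mean and sd
for x, m, s in zip(X_new.ravel(), mean, std):
    print(f"x={x:4.1f}: prediction {m:+.2f} +/- {1.96 * s:.2f}")
```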
APA, Harvard, Vancouver, ISO, and other styles
45

Zhou, Rensheng. "Degradation modeling and monitoring of engineering systems using functional data analysis." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45897.

Full text
Abstract:
In this thesis, we develop several novel degradation models based on techniques from functional data analysis. These models are suitable for characterizing different types of sensor-based degradation signals, whether they are censored at a certain fixed time point or truncated at the failure threshold. Our proposed models can also be easily extended to accommodate the effects of environmental conditions on degradation processes. Unlike many existing degradation models that rely on the existence of a historical sample of complete degradation signals, our modeling framework is well suited for modeling complete as well as incomplete (sparse and fragmented) degradation signals. We utilize these models to predict and continuously update, in real time, the residual life distributions of partially degraded components. We assess and compare the performance of our proposed models and existing benchmark models by using simulated signals and real-world data sets. The results indicate that our models can provide a better characterization of the degradation signals and a more accurate prediction of a system's lifetime under different signal scenarios. Another major advantage of our models is their robustness to model mis-specification, which is especially important for applications with incomplete (sparse or fragmented) degradation signals.
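A deliberately simple sketch of the residual-life prediction step (a linear degradation trend extrapolated to an assumed failure threshold; the thesis models are functional and far richer than this):

```python
import numpy as np

threshold = 10.0                                        # assumed failure level
t_obs = np.linspace(0, 5, 20)                           # observation times
rng = np.random.default_rng(10)
signal = 1.0 + 1.2 * t_obs + 0.2 * rng.normal(size=20)  # partial degradation signal

slope, intercept = np.polyfit(t_obs, signal, 1)
t_fail = (threshold - intercept) / slope                # predicted crossing time
print(f"predicted failure at t = {t_fail:.2f}, "
      f"residual life = {t_fail - t_obs[-1]:.2f}")
```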
APA, Harvard, Vancouver, ISO, and other styles
46

Zhang, Bairu. "Functional data analysis in orthogonal designs with applications to gait patterns." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/44698.

Full text
Abstract:
This thesis presents a contribution to the active research area of functional data analysis (FDA) and is concerned with the analysis of data from complex experimental designs in which the responses are curves. High-resolution, closely correlated data sets are encountered in many research fields, but current statistical methodologies often analyse simplistic summary measures and therefore limit the completeness and accuracy of conclusions drawn; specifically, the nature of the curves and the experimental design are not taken into account. Mathematically, such curves can be modelled either as sample paths of a stochastic process or as random elements in a Hilbert space. Despite this more complex type of response, the structure of experiments which yield functional data is often the same as in classical experimentation. Thus, classical experimental design principles and results can be adapted to the FDA setting. More specifically, we are interested in the functional analysis of variance (ANOVA) of experiments which use orthogonal designs. Most of the existing functional ANOVA approaches consider only completely randomised designs. However, we are interested in more complex experimental arrangements such as, for example, split-plot and row-column designs. As with univariate responses, such complex designs imply that the response curves for different observational units are correlated. We use the design to derive a functional mixed-effects model and adapt the classical projection approach in order to derive the functional ANOVA. As a main result, we derive new functional F tests for hypotheses about treatment effects in the appropriate strata of the design. The approximate null distribution of these tests is derived by applying the Karhunen-Loève expansion to the covariance functions in the relevant strata. These results extend existing work on functional F tests for completely randomised designs. The methodology developed in the thesis has wide applicability. In particular, we consider novel applications of functional F tests to gait analysis. Results are presented for two empirical studies. In the first study, gait data of patients with cerebral palsy were collected during barefoot walking and walking with ankle-foot orthoses. The effects of ankle-foot orthoses are assessed by functional F tests and compared with pointwise F tests and the traditional univariate repeated-measurements ANOVA. The second study is a designed experiment in which a split-plot design was used to collect gait data from healthy subjects. This is commonly done in gait research in order to better understand, for example, the effects of orthoses while avoiding confounded analysis from the high variability observed in abnormal gait. Moreover, from a technical point of view the study may be regarded as a real-world alternative to simulation studies: by using healthy individuals it is possible to collect data which are in better agreement with the underlying model assumptions. The penultimate chapter of the thesis presents a qualitative study with clinical experts to investigate the utility of gait analysis for the management of cerebral palsy. We explore potential pathways by which the statistical analyses in the thesis might influence patient outcomes. The thesis has six chapters. After describing motivation and introduction in Chapter 1, mathematical representations of functional data are presented in Chapter 2. Chapter 3 considers orthogonal designs in the context of functional data analysis. New functional F tests for complex designs are derived in Chapter 4 and applied in two gait studies. Chapter 5 is devoted to a qualitative study. The thesis concludes with a discussion which details the extent to which the research question has been addressed, the limitations of the work and the degree to which it has been answered.
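For comparison, the pointwise F test mentioned above can be sketched directly: an ANOVA F statistic at each grid point of the curves (synthetic gait-like curves; the group difference is assumed for illustration):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(11)
t = np.linspace(0, 1, 101)  # fraction of the gait cycle
group1 = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=(15, 101))
group2 = (np.sin(2 * np.pi * t) + 0.2 * (t > 0.5)
          + 0.3 * rng.normal(size=(15, 101)))

# one one-way ANOVA per grid point (no multiplicity correction here)
pvals = np.array([f_oneway(group1[:, j], group2[:, j]).pvalue
                  for j in range(t.size)])
print("grid points with p < 0.05:", (pvals < 0.05).sum(), "of", t.size)
```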
APA, Harvard, Vancouver, ISO, and other styles
47

Santiago, Calderón José Bayoán. "On Cluster Robust Models." Scholarship @ Claremont, 2019. https://scholarship.claremont.edu/cgu_etd/132.

Full text
Abstract:
Cluster robust models are a kind of statistical model that attempts to estimate parameters while accounting for potential heterogeneity in treatment effects. Absent heterogeneity in treatment effects, the partial and average treatment effects are the same. When heterogeneity in treatment effects occurs, the average treatment effect is a function of the various partial treatment effects and the composition of the population of interest. The first chapter explores the performance of common estimators as a function of the presence of heterogeneity in treatment effects and other characteristics that may influence their performance for estimating average treatment effects. The second chapter examines various approaches to evaluating and improving cluster structures as a way to obtain cluster-robust models. Both chapters are intended to be useful to practitioners as a how-to guide for examining and thinking about their applications and relevant factors. Empirical examples are provided to illustrate theoretical results, showcase potential tools, and communicate a suggested thought process. The third chapter relates to an open-source statistical software package for the Julia language. The content includes a description of the software functionality and technical elements. In addition, it features a critique of, and suggestions for, statistical software development and the Julia ecosystem. These comments come from my experience throughout the development process of the package and related activities as an open-source and professional software developer. One goal of the paper is to make econometrics more accessible, not only through access to functionality, but also through understanding of the code and mathematics, and transparency in implementations.
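Although the thesis's software targets the Julia language, the cluster-robust covariance idea itself can be sketched in a few lines with statsmodels (synthetic grouped data; a generic illustration, not the package under discussion):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
groups = np.repeat(np.arange(20), 10)         # 20 clusters of 10 observations
x = rng.normal(size=200)
cluster_effect = rng.normal(size=20)[groups]  # induces within-cluster correlation
y = 1.0 + 2.0 * x + cluster_effect + rng.normal(size=200)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": groups})
print(fit.summary().tables[1])  # standard errors robust to clustering
```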
APA, Harvard, Vancouver, ISO, and other styles
48

Sun, Jian. "Robust Multichannel Functional-Data-Analysis Methods for Data Recovery in Complex Systems." 2011. http://trace.tennessee.edu/utk_graddiss/1230.

Full text
Abstract:
In recent years, Condition Monitoring (CM), which can be performed via several sensor channels, has been recognized as an effective paradigm for failure prevention in operational equipment and processes. However, the complexity caused by asynchronous data collection with different and/or time-varying sampling/transmission rates has long been a hindrance to the effective use of multichannel data in constructing empirical models. The problem becomes more challenging when sensor readings are incomplete. Traditional sensor data recovery techniques are often prohibited in asynchronous CM environments, not to mention sparse datasets. The proposed Functional Principal Component Analysis (FPCA) methodologies, e.g., a nonparametric FPC model and a semi-parametric functional regression model, provide new sensor data recovery techniques to improve the reliability and robustness of multichannel CM systems. Based on the FPCA results obtained from historical asynchronous data, the deviation from the smoothed trajectory of each sensor signal can be described by a set of unit-specific model parameters. Furthermore, the relationships among these sensor signals can be identified and used to construct regression models for the correlated signals. For real-time or online implementation, the use of these models, along with parameters adjusted by real-time CM data, becomes a powerful tool for dealing with asynchronous CM data while recovering lost data when needed. To improve robustness and predictability in dealing with asynchronous data, which may be skewed in probability distribution, robust methods were developed based on Functional Data Analysis (FDA) and Local Quantile Regression (LQR) models. Case studies examining turbofan aircraft engines and an experimental two-tank flow-control loop are used to demonstrate the effectiveness and adaptability of the proposed sensor data recovery techniques. The proposed methods may also find a variety of applications in other industries, such as nuclear power plants, wind turbines, railway systems, and economic fields, which may face asynchronous sampling and/or missing data problems.
APA, Harvard, Vancouver, ISO, and other styles
49

Chiu, Sheng-Che, and 邱聖哲. "A Robust Estimation Method for Outlier-Resistant in Functional Data Analysis." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/40523172376862192665.

Full text
Abstract:
Master's thesis
National Tsing Hua University
Institute of Communications Engineering
99
Functional data is widely applied in our daily life, such as human height growth, circadian rhythms and internet traffic analysis. Outliers often occur in functional data; they are sometimes difficult to detect directly by visual inspection and cause incorrect conclusions in statistical analysis. Thus, methods that can handle outliers automatically and efficiently are an important issue. In this paper, we propose a method for handling and guarding against outliers in parameter estimation. The estimators are based on Student's t distribution and can automatically detect outliers. In addition, outliers are resisted via a weight function, which enhances robustness. In the theoretical derivation, we introduce the consistency property of the estimators and analyze their outlier sensitivity and asymptotic covariance by deriving their influence function. Our numerical results demonstrate that the proposed estimators are robust and can efficiently resist outliers.
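The weighting idea behind a Student's-t based robust estimator can be sketched in a toy scalar setting (fixed scale and assumed degrees of freedom; not the thesis' functional-data estimator): weights w = (nu + 1)/(nu + r^2) automatically downweight large residuals:

```python
import numpy as np

rng = np.random.default_rng(13)
data = np.concatenate([rng.normal(5.0, 1.0, 95),
                       rng.normal(30.0, 1.0, 5)])  # 5% gross outliers

nu = 4.0                                            # assumed degrees of freedom
mu = np.median(data)
scale = 1.4826 * np.median(np.abs(data - mu))       # robust (MAD-based) scale
for _ in range(50):                                 # IRLS location updates
    r = (data - mu) / scale
    w = (nu + 1.0) / (nu + r ** 2)                  # t-distribution weights
    mu = np.sum(w * data) / np.sum(w)

print(f"sample mean = {data.mean():.2f}, robust t-estimate = {mu:.2f}")
```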
APA, Harvard, Vancouver, ISO, and other styles
50

Ciou, Shih-yun, and 邱詩芸. "Correlated binary data analysis using robust likelihood." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/62465397378121646143.

Full text
Abstract:
Master's thesis
National Central University
Graduate Institute of Statistics
97
Correlated data are commonly encountered in many fields. The correlation may come from genetic heredity, familial aggregation, environmental heterogeneity, or repeated measures. Royall and Tsou (2003) proposed a parametric robust likelihood technique. With large samples, the adjusted binomial likelihood is asymptotically legitimate for correlated binary data. In this work, we use the adjustment based on the binomial working model and obtain a new method for estimating the correlation between observations within a cluster.
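A related and widely available approach for correlated binary outcomes, shown here only as a hedged illustration (a GEE with an exchangeable working correlation, not the Royall-Tsou adjusted-likelihood method): the estimated working-correlation parameter plays the role of the within-cluster correlation.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(14)
n_clusters, size = 50, 4
groups = np.repeat(np.arange(n_clusters), size)
u = rng.normal(scale=1.0, size=n_clusters)[groups]  # shared cluster effect
x = rng.normal(size=n_clusters * size)
p = 1 / (1 + np.exp(-(0.5 * x + u)))                # correlated success probabilities
y = rng.binomial(1, p)

X = sm.add_constant(x)
model = sm.GEE(y, X, groups=groups, family=sm.families.Binomial(),
               cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.cov_struct.summary())  # estimated exchangeable correlation
```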
APA, Harvard, Vancouver, ISO, and other styles