To see the other types of publications on this topic, follow the link: Statistics and methodology.

Dissertations / Theses on the topic 'Statistics and methodology'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Statistics and methodology.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Zhang, Bo. "Machine Learning on Statistical Manifold." Scholarship @ Claremont, 2017. http://scholarship.claremont.edu/hmc_theses/110.

Full text
Abstract:
This senior thesis project explores and generalizes some fundamental machine learning algorithms from the Euclidean space to the statistical manifold, an abstract space in which each point is a probability distribution. In this thesis, we adapt the optimal separating hyperplane, the k-means clustering method, and the hierarchical clustering method for classifying and clustering probability distributions. In these modifications, we use statistical distances as a measure of the dissimilarity between objects. We describe a situation where the clustering of probability distributions is needed and useful. We present many interesting and promising empirical clustering results, which demonstrate that the statistical-distance-based clustering algorithms often outperform the same algorithms with the Euclidean distance in many complex scenarios. In particular, we apply our statistical-distance-based hierarchical and k-means clustering algorithms to the univariate normal distributions with k = 2 and k = 3 clusters, the bivariate normal distributions with diagonal covariance matrix and k = 3 clusters, and the discrete Poisson distributions with k = 3 clusters. Finally, we prove that the k-means clustering algorithm, applied to discrete distributions with the Hellinger distance, converges not only to the partial optimal solution but also to a local minimum.
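As a rough illustration of the kind of statistical-distance clustering the abstract describes, the following Python sketch runs k-means over discrete probability vectors using the Hellinger distance. The toy Poisson data, the cluster count, and the barycentre update are invented for the example (the renormalised squared mean of the square-root pmfs is one natural Hellinger centroid) and are not taken from the thesis.

import numpy as np
from scipy.stats import poisson

def hellinger(p, q):
    # Hellinger distance between two discrete probability vectors
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def kmeans_hellinger(dists, k, n_iter=50, seed=0):
    # dists: (n, m) array, each row a probability vector over m support points
    rng = np.random.default_rng(seed)
    centers = dists[rng.choice(len(dists), k, replace=False)]
    for _ in range(n_iter):
        # assign each distribution to the nearest centre in Hellinger distance
        d = np.array([[hellinger(x, c) for c in centers] for x in dists])
        labels = d.argmin(axis=1)
        # update each centre as the Hellinger barycentre of its members
        for j in range(k):
            members = dists[labels == j]
            if len(members):
                s = np.sqrt(members).mean(axis=0)
                centers[j] = s ** 2 / np.sum(s ** 2)
    return labels, centers

# toy example: three groups of Poisson pmfs truncated to {0, ..., 19}
support = np.arange(20)
rates = np.concatenate([np.full(10, 2.0), np.full(10, 6.0), np.full(10, 12.0)])
pmfs = np.array([poisson.pmf(support, r) for r in rates])
pmfs = pmfs / pmfs.sum(axis=1, keepdims=True)      # renormalise after truncation
labels, _ = kmeans_hellinger(pmfs, k=3)
print(labels)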
APA, Harvard, Vancouver, ISO, and other styles
2

Vrahimis, Andreas. "Smoothing methodology with applications to nonparametric statistics." Thesis, University of Manchester, 2011. https://www.research.manchester.ac.uk/portal/en/theses/smoothing-methodology-with-applications-to-nonparametric-statistics(6d6567f2-1bfa-4e77-8dbb-71fea7564185).html.

Full text
Abstract:
The work in this thesis is based on kernel smoothing techniques with applications to nonparametric statistical methods and especially kernel density estimation and nonparametric regression. We examine a bootstrap iterative method of choosing the smoothing parameter, in univariate kernel density estimation, and propose an empirical smoothness correction that generally improves the method for the small to medium sample sizes tested. In a simulation study performed, the corrected bootstrap iterative method shows consistent overall performance and can compete with other popular widely used methods. The theoretical asymptotic properties of the smoothed bootstrap method, in univariate kernel density estimation, are examined and an adaptive data-based choice of fixed pilot smoothing parameter formed, that provides a good performance trade-off among distributions of various shapes, with a fast relative rate of convergence to the optimal. The asymptotic and practical differences of the smoothed bootstrap method, when the diagonal terms of the error criterion are included or omitted, are also examined. The exclusion of the diagonal terms yields faster relative rates of convergence of the smoothing parameter to the optimal, but a simulation study shows that for smaller sample sizes, including the diagonal terms can be favourable. In a real data set application, both methods produced similar smoothing parameters and the resulting kernel density estimates were of reasonable smoothness. Existing methods of kernel density estimation in two dimensions are discussed and the corrected bootstrap iterative method is adapted to work in bivariate kernel density estimation, with considerable success. Additionally, the theoretical asymptotic properties of the smoothed bootstrap method, in bivariate kernel density estimation, are examined, and adaptive data-based choices for the fixed pilot smoothing parameters formed, that provide fast relative rates of convergence to the optimal, compared to other popular methods. The smoothed bootstrap method with the diagonal terms of the error criterion omitted exhibits slightly faster relative rates of convergence, compared to the method which includes the diagonal terms, and in a simulation study both performed considerably well, compared to other methods. Also, we discover that a scaling transformation of the data, before applying the method, leads to poor results for distributions of various shapes, and it should generally be avoided. In an application using the iris flowers data set, both smoothed bootstrap versions suggested produce reasonable kernel density estimates. We also look at various methods of estimating the variance of the errors in nonparametric regression and suggest a simple robust method of estimating the error variance, for the homoscedastic fixed design. The method is based on a multiplicative correction of the variance of the residuals and a comparison with popular difference-based methods shows favourable results, especially when the local linear estimator is employed.
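The corrected bootstrap iterative method itself is not reproduced here, but the following hedged Python sketch shows the general smoothed-bootstrap idea for bandwidth selection: resample from a pilot estimate, measure the integrated squared error of each candidate bandwidth against that pilot, and keep the minimiser. The pilot rule, evaluation grid, and candidate range are arbitrary choices made only for the illustration.

import numpy as np

def kde(x_grid, data, h):
    # Gaussian kernel density estimate evaluated on x_grid
    u = (x_grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def bootstrap_bandwidth(data, candidates, n_boot=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    grid = np.linspace(data.min() - 3 * data.std(), data.max() + 3 * data.std(), 400)
    g = 1.06 * data.std() * n ** (-1 / 5)          # normal-reference pilot bandwidth
    pilot = kde(grid, data, g)
    scores = []
    for h in candidates:
        ise = 0.0
        for _ in range(n_boot):
            # smoothed bootstrap resample: resample the data and add kernel noise
            star = rng.choice(data, n, replace=True) + rng.normal(0, g, n)
            ise += np.trapz((kde(grid, star, h) - pilot) ** 2, grid)
        scores.append(ise / n_boot)
    return candidates[int(np.argmin(scores))]

data = np.random.default_rng(1).normal(size=200)
hs = np.linspace(0.05, 1.0, 20)
print("selected bandwidth:", bootstrap_bandwidth(data, hs))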
APA, Harvard, Vancouver, ISO, and other styles
3

Ren, Yu. "The methodology of flowgraph models." Thesis, London School of Economics and Political Science (University of London), 2011. http://etheses.lse.ac.uk/318/.

Full text
Abstract:
Flowgraph models are directed graph models for describing the dynamic changes in a stochastic process. They are one class of multistate models that are applied to analyse time-to-event data. The main motivation of the flowgraph models is to determine the distribution of the total waiting time until an event of interest occurs in a stochastic process that progresses through various states. This thesis applies the methodology of flowgraph models to the study of Markov and Semi-Markov processes. The underlying approach of the thesis is that access to the moment generating function (MGF) and cumulant generating function (CGF), provided by Mason's rule, enables us to use the Method of Moments (MM), which depends on moments and cumulants. We give a new derivation of Mason's rule to compute the total waiting time MGF based on the internode transition matrix of a flowgraph. Next, we demonstrate methods to determine and approximate the distribution of the total waiting time based on the inversion of the MGF, including an alternative approach using the Padé approximation of the MGF, which always yields a closed form density. For parameter estimation, we extend the Expectation-Maximization (EM) algorithm to estimate parameters in the mixture of negative weight exponential densities. Our second contribution is to develop a bias correction method in the Method of Moments (BCMM). By investigating methods for tail area approximation, we propose a new way to estimate the total waiting time density function and survival
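To make the Mason's-rule mechanics concrete, here is a small symbolic sketch (not from the thesis) for an invented three-state flowgraph with one feedback loop and exponential branch waiting times; the overall MGF is the forward-path transmittance divided by one minus the loop transmittance, and moments follow from derivatives at s = 0.

import sympy as sp

s, p = sp.symbols('s p', positive=True)
l01, l12, l10 = sp.symbols('lambda01 lambda12 lambda10', positive=True)

# Exponential waiting-time MGFs on each branch of a small flowgraph:
# 0 -> 1 always; from 1, go to 2 with probability p or back to 0 with probability 1 - p.
M01 = l01 / (l01 - s)
M12 = l12 / (l12 - s)
M10 = l10 / (l10 - s)

forward = M01 * p * M12            # transmittance of the single forward path 0 -> 1 -> 2
loop = M01 * (1 - p) * M10         # transmittance of the feedback loop 0 -> 1 -> 0
H = sp.simplify(forward / (1 - loop))   # Mason's rule for the overall 0 -> 2 MGF

# moments of the total waiting time from derivatives of the MGF at s = 0
mean = sp.simplify(sp.diff(H, s).subs(s, 0))
second = sp.simplify(sp.diff(H, s, 2).subs(s, 0))
print("E[T] =", mean)
print("Var[T] =", sp.simplify(second - mean ** 2))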
APA, Harvard, Vancouver, ISO, and other styles
4

Levy, Melanie E. "Survey analysis: Methodology and application using CHIS data." Thesis, California State University, Long Beach, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=1527014.

Full text
Abstract:

Over the past hundred years, advances in survey research and in the understanding of survey methodology and analysis have removed major biases that arise when small numbers of respondents must speak for larger groups, and have given modern polls the ability to support inferences about populations. This project presents a brief history of survey methodology and utilizes common applied statistical procedures using the 2009 California Health Interview Survey (CHIS). Survey methodology and analysis will be explored through examples including survey linear regression analysis, canonical correlation and multinomial logistic regression.

This project's goal is to create greater understanding of the survey analysis process, as well as some of the challenges survey researchers face. With this knowledge, more procedures can be adapted to incorporate the survey design, expanding survey methodology and analysis to meet more diverse research needs.
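As a hedged sketch of what survey-weighted regression looks like in practice, the snippet below fits a weighted least squares model to synthetic data standing in for CHIS; the variable names and weights are invented, and proper design-based standard errors for CHIS would additionally use its replicate weights rather than the model-based errors shown here.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
# synthetic stand-in for survey data: outcome, two covariates, and a sampling weight
df = pd.DataFrame({
    "age": rng.integers(18, 85, n),
    "female": rng.integers(0, 2, n),
    "weight": rng.uniform(0.5, 3.0, n),     # final sampling weight (purely illustrative)
})
df["bmi"] = 22 + 0.05 * df["age"] + 0.8 * df["female"] + rng.normal(0, 3, n)

X = sm.add_constant(df[["age", "female"]])
fit = sm.WLS(df["bmi"], X, weights=df["weight"]).fit()
print(fit.params)     # weighted point estimates
# Note: these model-based standard errors ignore the complex design;
# design-based errors would come from replicate weights or Taylor linearisation.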

APA, Harvard, Vancouver, ISO, and other styles
5

Zhang, Ye. "Community Detection| Fundamental Limits, Methodology, and Variational Inference." Thesis, Yale University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10957347.

Full text
Abstract:

Network analysis has become one of the most active research areas over the past few years. A core problem in network analysis is community detection. In this thesis, we investigate it under the Stochastic Block Model and the Degree-corrected Block Model from three different perspectives: 1) the minimax rates of the community detection problem, 2) rate-optimal and computationally feasible algorithms, and 3) computational and theoretical guarantees of variational inference for community detection.
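A minimal, assumption-laden illustration of community detection under a stochastic block model is vanilla spectral clustering, sketched below; it is not one of the rate-optimal or variational algorithms studied in the thesis, just a baseline showing the problem setup, with all parameters invented.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, k = 300, 3
labels_true = rng.integers(0, k, n)
P = np.full((k, k), 0.02) + np.eye(k) * 0.10     # within-block 0.12, between-block 0.02
probs = P[labels_true][:, labels_true]
A = (rng.random((n, n)) < probs).astype(float)
A = np.triu(A, 1)
A = A + A.T                                      # symmetric adjacency, no self-loops

# spectral clustering: k-means on the leading eigenvectors of the adjacency matrix
vals, vecs = np.linalg.eigh(A)
U = vecs[:, np.argsort(np.abs(vals))[-k:]]
labels_hat = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
print("estimated block sizes:", np.bincount(labels_hat))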

APA, Harvard, Vancouver, ISO, and other styles
6

Stewart, Brandon Michael. "Three Papers in Political Methodology." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:17467206.

Full text
Abstract:
This collection of three papers develops two statistical techniques for addressing canonical problems in applied computational social science: unsupervised text analysis and regression with dependent data. In both cases I provide a flexible framework that allows the analyst to leverage known structure within the data to improve inference. The first paper introduces the Structural Topic Model (STM) which generalizes and extends a broad class of probabilistic topic models developed in computer science. Crucially for applied social science, STM provides a framework for estimating the factors which drive topical frequency and content within documents. The second paper explores the challenge that non-convex likelihoods pose for applied research with topic models. The paper presents a series of diagnostics and discusses the under-appreciated role of initialization methods. The third paper introduces Latent Factor Regressions (LFR), a new set of tools for regression modeling in the presence of unobserved heterogeneity or dependence between observations. The approach uses interactive latent effects to provide a unified framework for modeling different data structures, including network, time-series cross-sectional and spatial data. Each of these methods is designed with a focus on applied work. Estimation algorithms are presented which are fast enough for applied work and software is either currently available (STM) or in development (LFR). The use of these techniques is illustrated with a range of applications from across political science.
Government
APA, Harvard, Vancouver, ISO, and other styles
7

Smith, Anna Lantz. "Statistical Methodology for Multiple Networks." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492720126432803.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

An, Baoshe. "Poisson approximation in the context of file-merging methodology /." The Ohio State University, 1997. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487943610785527.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Lesser, Elizabeth Rochelle. "A New Right Tailed Test of the Ratio of Variances." UNF Digital Commons, 2016. http://digitalcommons.unf.edu/etd/719.

Full text
Abstract:
It is important to be able to compare variances efficiently and accurately regardless of the parent populations. This study proposes a new right-tailed test for the ratio of two variances using the Edgeworth expansion. To study the Type I error rate and power performance, a simulation was performed on the new test with various combinations of symmetric and skewed distributions. The new test is found to have better-controlled Type I error rates than the existing tests. Additionally, it also has sufficient power. Therefore, the newly derived test provides a good, robust alternative to the already existing methods.
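The motivation for a more robust test can be seen in a small simulation like the one below, which estimates the Type I error of the classical right-tailed F test when both samples come from the same symmetric or skewed distribution; the sample size, distributions, and replication count are arbitrary illustrative choices, not the thesis's design.

import numpy as np
from scipy import stats

def f_test_right(x, y, alpha=0.05):
    # classical right-tailed F test of H0: var(x) <= var(y)
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    crit = stats.f.ppf(1 - alpha, len(x) - 1, len(y) - 1)
    return f > crit

rng = np.random.default_rng(0)
n, reps = 25, 5000
for name, sampler in [("normal", lambda: rng.normal(size=n)),
                      ("skewed (exponential)", lambda: rng.exponential(size=n) - 1)]:
    rejections = sum(f_test_right(sampler(), sampler()) for _ in range(reps))
    print(f"{name}: empirical Type I error = {rejections / reps:.3f} (nominal 0.05)")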
APA, Harvard, Vancouver, ISO, and other styles
10

Kashin, Konstantin Daniel. "Essays on Political Methodology and Data Science." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:17464583.

Full text
Abstract:
This collection of six essays makes novel methodological contributions to causal inference, time-series cross-sectional forecasting, and supervised text analysis. The first three essays start from the premise that while randomized experiments are the gold standard for causal claims, randomization is not feasible or ethical for many questions in the social sciences. Researchers have thus devised methods that approximate experiments using nonexperimental control units to estimate counterfactuals. However, control units may be costly to obtain, incomparable to the treated units, or completely unavailable when all units are treated. We challenge the commonplace intuition that control units are necessary for causal inference. We propose conditions under which one can use post-treatment variables to estimate causal effects. At its core, we show when one can obtain identification of causal effects by comparing treated units to other treated units, without recourse to control units. The next two essays demonstrate that the U.S. Social Security Administration's (SSA) forecasting errors were approximately unbiased until about 2000, but then began to grow quickly, with increasingly overconfident uncertainty intervals. Moreover, the errors all turn out to be in the same potentially dangerous direction, each making the Social Security Trust Funds look healthier than they actually are. We also discover the cause of these findings with evidence from a large number of interviews we conducted with participants at every level of the forecasting and policy processes. Finally, the last essay develops a new dataset for studying the influence of business on public policy decisions across the American states. Compiling and digitizing nearly 1,000 leaked legislative proposals made by a leading business lobbying group in the states, along with digitized versions of all state legislation introduced or enacted between 1995 and 2013, we use a two-stage supervised classifier to categorize state bills as either sharing the same underlying concepts or specific language as business-drafted model bills. We find these business-backed bills were more likely to be introduced and enacted by legislatures lacking policy resources, such as those without full-time members and with few staffers.
Government
APA, Harvard, Vancouver, ISO, and other styles
11

Zhang, Aijun. "Majorization methodology for experimental designs." HKBU Institutional Repository, 2004. http://repository.hkbu.edu.hk/etd_ra/521.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Herring, Keith 1981. "Propagation models for multiple-antenna systems : methodology, measurements and statistics." Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/43027.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
Includes bibliographical references (leaves 219-223).
The trend in wireless communications is towards utilization of multiple antenna systems. While techniques such as beam-forming and spatial diversity have been implemented for some time, the emergence of Multiple-Input Multiple-Output (MIMO) communications has increased commercial interest and development in multiple-antenna technology. Given this trend it has become increasingly important that we understand the propagation characteristics of the environments where this new technology will be deployed. In particular the development of low-cost, high-performance system architectures and protocols is largely dependent on the accuracy of available channel models for approximating realized propagation behavior. The first contribution of this thesis is a methodology for the modeling of wireless propagation in multiple antenna systems. Specifically we consider the problem of propagation modeling from the perspective of the protocol designer and system engineer. By defining the wireless channel as the complex narrow-band channel response h ∈ ℂ between two devices, we characterize the important degrees of freedom associated with the channel by modeling it as a function of its path-loss, multipath/frequency, time stability, spatial, and polarization characteristics. We then motivate this model by presenting a general set of design decisions that depend on these parameters such as network density, channel allocation, and channel-state information (CSI) update rate. Lastly we provide a parametrization of the environment into measurable factors that can be used to predict channel behavior including link-length, Line-Of-Sight (LOS), link topology (e.g. air-to-ground), building density, and other physical parameters. The second contribution of this thesis is the experimental analysis and development of this modeling space.
Specifically we have gathered a large database of real wireless channel data from a diverse set of propagation environments. A mobile channel-data collection system was built for obtaining the required data, which includes an eight-channel software receiver and a collection of WiFi channel sounders. The software receiver synchronously samples the 20-MHz band centered at 2.4 GHz from eight configurable antennas. Measurements have been carried out for both air-to-ground and ground-to-ground links for distances ranging from tens of meters to several kilometers throughout the city of Cambridge, MA. Here we have developed a collection of models for predicting channel behavior, including a model for estimating the path-loss coefficient α in street environments that utilizes two physical parameters: P1 = percentage of building gaps averaged over each side of the street, P2 = percentage of the street length that has a building gap on at least one side of the street. Results show a linear increase in α of 0.53 and 0.32 per 10% increase in P1 and P2, respectively, with RMS errors of 0.47 and 0.27 in α for α between 2 and 5. Experiments indicate a 10dB performance advantage in estimating path-loss with this multi-factor model over the optimal linear estimator (upper-bound empirical model) for link lengths as short as 100 meters. In contrast, air-to-ground links have been shown to exhibit log-normal fading with an average attenuation of α ≈ 2 and standard deviation of 8dB. Additionally we provide exhaustive evidence that the small-scale fading behavior (frequency domain) of both Non-Line-Of-Sight (NLOS) air-to-ground and ground-to-ground links as short as tens of meters is Rayleigh distributed. More specifically, fading distributions across a diverse set of environments and link lengths have been shown to have Rician K-factors smaller than 1, suggesting robust performance of the Rayleigh model.
A model is also presented that defines a stochastic distribution for the delay-spread of the channel as a function of the link-length (d0), multipath component (MPC) decay-rate (attenuation per unit delay), and MPC arrival-rate (q, MPCs per unit delay). Experiments support the use of this model over a spectrum of link-lengths (50m-700m) and indicate a dense arrival-rate (q) (on the order of 1 MPC) in ground-to-ground links. In this range the frequency structure of the channel is insensitive to q, which reduces the modeling complexity to a single unknown parameter, the decay-rate. We provide estimators for the decay-rate over a variety of environment types that have been shown to closely replicate the fade width distribution in these environments. The observed time-coherence length (tc) of MPCs tends to be either less than 300ms (high-frequency) or 5 seconds and longer (low-frequency), resulting in a Rician-like distribution for fading in the time domain. We show that the time characteristics of the channel are accurately modeled as the superposition of two independent circularly symmetric complex Gaussian random variables corresponding to the channel response due to a set of stable and unstable MPCs. We observe the S-factor, defined as the ratio of average power in stable to unstable MPCs (distinct from the Rician K-factor), which ranges between 0-30dB depending on environment and link length, and can be estimated with an rms error of 3dB in both ground-to-ground and air-to-ground link regimes. Experiments show improved performance of this model over the Rician fading model, which has been shown to underestimate high fade events (tails) in the time domain, corresponding to cases where the stable MPCs destructively combine to form a null. Additionally, the Kronecker MIMO channel model is shown to predict channel capacity (of a 7x7 system) with an rms error of 1.7 ... (at 20dB SNR) over a diverse set of observed outdoor environments.
Experiments indicate a 3dB performance advantage in this prediction when applied to environments that are not dominated by single-bounce propagation paths (single-bounce: 2.1 ... rms, multi-bounce: 1 ... rms).
by Keith T. Herring.
Ph.D.
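The street-geometry model above predicts the path-loss coefficient from building-gap percentages; as a simpler, hedged illustration of where such a coefficient comes from, the sketch below fits the standard log-distance path-loss model to synthetic received-power measurements (all values invented, not measurements from the thesis).

import numpy as np

rng = np.random.default_rng(0)
alpha_true, p0 = 3.2, -40.0                      # illustrative values only
d = rng.uniform(50, 700, 200)                    # link lengths in metres
# log-distance model: received power (dB) = p0 - 10*alpha*log10(d) + shadowing
pr = p0 - 10 * alpha_true * np.log10(d) + rng.normal(0, 6, d.size)

# least-squares fit of p0 and alpha from the measurements
X = np.column_stack([np.ones_like(d), -10 * np.log10(d)])
coef, *_ = np.linalg.lstsq(X, pr, rcond=None)
print(f"estimated p0 = {coef[0]:.1f} dB, estimated alpha = {coef[1]:.2f}")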
APA, Harvard, Vancouver, ISO, and other styles
13

Greenfield, C. C. "Replicated sampling in censuses and surveys." Thesis, [Hong Kong] : University of Hong Kong, 1985. http://sunzi.lib.hku.hk/hkuto/record.jsp?B1232131X.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Lindsey, Heidi Lula. "An Introduction to Bayesian Methodology via WinBUGS and PROC MCMC." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/2784.

Full text
Abstract:
Bayesian statistical methods have long been computationally out of reach because the analysis often requires integration of high-dimensional functions. Recent advancements in computational tools to apply Markov Chain Monte Carlo (MCMC) methods are making Bayesian data analysis accessible for all statisticians. Two such computer tools are WinBUGS and SAS 9.2's PROC MCMC. Bayesian methodology will be introduced through discussion of fourteen statistical examples with code and computer output to demonstrate the power of these computational tools in a wide variety of settings.
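WinBUGS and PROC MCMC do the sampling automatically; purely as an illustration of the underlying MCMC mechanics, here is a minimal random-walk Metropolis sampler in Python for a normal mean with a normal prior, using invented data and tuning values rather than any example from the thesis.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=30)   # invented data
sigma, prior_mu, prior_sd = 1.0, 0.0, 10.0

def log_post(mu):
    # log posterior up to a constant: normal likelihood times normal prior
    loglik = -0.5 * np.sum((data - mu) ** 2) / sigma ** 2
    logprior = -0.5 * (mu - prior_mu) ** 2 / prior_sd ** 2
    return loglik + logprior

draws, mu = [], 0.0
for _ in range(10000):
    prop = mu + rng.normal(0, 0.5)               # random-walk proposal
    if np.log(rng.random()) < log_post(prop) - log_post(mu):
        mu = prop
    draws.append(mu)

posterior = np.array(draws[2000:])               # discard burn-in
print(f"posterior mean {posterior.mean():.2f}, 95% interval "
      f"({np.percentile(posterior, 2.5):.2f}, {np.percentile(posterior, 97.5):.2f})")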
APA, Harvard, Vancouver, ISO, and other styles
15

Guo, Yawen. "On Some Test Statistics for Testing the Population Skewness and Kurtosis: An Empirical Study." FIU Digital Commons, 2016. http://digitalcommons.fiu.edu/etd/3045.

Full text
Abstract:
The purpose of this thesis is to propose some test statistics for testing the skewness and kurtosis parameters of a distribution, not limited to a normal distribution. Since a theoretical comparison is not possible, a simulation study has been conducted to compare the performance of the test statistics. We have compared both parametric methods (classical method with normality assumption) and non-parametric methods (bootstrap in Bias Corrected Standard Method, Efron's Percentile Method, Hall's Percentile Method and Bias Corrected Percentile Method). Our simulation results for testing the skewness parameter indicate that the power of the tests differs significantly across sample sizes, the choice of alternative hypotheses, and the methods used. For testing the kurtosis parameter, the simulation results suggested that the classical method performs well when the data are from both normal and beta distributions, and bootstrap methods are useful for the uniform distribution, especially when the sample size is large.
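As a hedged sketch of the two families of procedures being compared, the snippet below computes a classical large-sample z test of zero skewness alongside an Efron-style bootstrap percentile interval; the data, bootstrap size, and gamma example are illustrative only and do not reproduce the thesis's simulation design.

import numpy as np
from scipy import stats

def classical_skewness_test(x):
    # large-sample z test of H0: skewness = 0 under normality
    n, b1 = len(x), stats.skew(x)
    z = b1 / np.sqrt(6 / n)
    return z, 2 * (1 - stats.norm.cdf(abs(z)))

def bootstrap_skewness_ci(x, level=0.95, n_boot=2000, seed=0):
    # Efron-style percentile interval for the skewness parameter
    rng = np.random.default_rng(seed)
    boots = [stats.skew(rng.choice(x, len(x), replace=True)) for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [(1 - level) / 2 * 100, (1 + level) / 2 * 100])
    return lo, hi

x = np.random.default_rng(1).gamma(shape=2.0, size=200)   # a right-skewed sample
print("classical z, p-value:", classical_skewness_test(x))
print("95% bootstrap percentile CI for skewness:", bootstrap_skewness_ci(x))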
APA, Harvard, Vancouver, ISO, and other styles
16

Miller, Michael Chad. "Global Resource Management of Response Surface Methodology." PDXScholar, 2014. https://pdxscholar.library.pdx.edu/open_access_etds/1621.

Full text
Abstract:
Statistical research can be more difficult to plan than other kinds of projects, since the research must adapt as knowledge is gained. This dissertation establishes a formal language and methodology for designing experimental research strategies with limited resources. It is a mathematically rigorous extension of a sequential and adaptive form of statistical research called response surface methodology. It uses sponsor-given information, conditions, and resource constraints to decompose an overall project into individual stages. At each stage, a "parent" decision-maker determines what design of experimentation to do for its stage of research, and adapts to the feedback from that research's potential "children", each of whom deals with a different possible state of knowledge resulting from the experimentation of the "parent". The research of this dissertation extends the real-world rigor of the statistical field of design of experiments to develop a deterministic, adaptive algorithm that produces deterministically generated, reproducible, testable, defendable, adaptive, resource-constrained multi-stage experimental schedules without having to spend physical resources.
APA, Harvard, Vancouver, ISO, and other styles
17

Sroka, Christopher J. "Extending Ranked Set Sampling to Survey Methodology." The Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=osu1218543909.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Hinchliffe, Sally Rose. "Advancing and appraising competing risks methodology for better communication of survival statistics." Thesis, University of Leicester, 2013. http://hdl.handle.net/2381/28176.

Full text
Abstract:
The probability of an event occurring or the proportion of patients experiencing an event, such as death or disease, is often of interest in medical research. It is a measure that is intuitively appealing to many consumers of statistics and yet the estimation is not always clearly understood or straightforward. Many researchers will take the complement of the survival function, obtained using the Kaplan-Meier estimator. However, in situations where patients are also at risk of competing events, the interpretation of such estimates may not be meaningful. Competing risks are present in almost all areas of medical research. They occur when patients are at risk of more than one mutually exclusive event, such as death from different causes. Although methods for the analysis of survival data in the presence of competing risks have been around since the 1760s, there is increasing evidence that these methods are being underused. The primary aim of this thesis is to develop and apply new and accessible methods for analysing competing risks in order to enable better communication of the estimates obtained from such analyses. These developments will primarily involve the use of the recently established flexible parametric survival model. Several applications of the methods will be considered in various areas of medical research to demonstrate the necessity of competing risks theory. As there is still a great amount of misunderstanding amongst clinical researchers about when these methods should be applied, considerations are made as to how to best present results. Finally, key concepts and assumptions of the methods will be assessed through sensitivity analyses and implications of data quality will be investigated through the use of a simulation study.
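The central point, that the complement of the Kaplan-Meier curve misstates event probabilities when competing events exist, is usually addressed with the cumulative incidence function; a minimal nonparametric (Aalen-Johansen-type) sketch with invented data is shown below, separate from the flexible parametric models the thesis develops.

import numpy as np

def cuminc(times, causes, cause_of_interest):
    # nonparametric cumulative incidence with competing risks;
    # causes: 0 = censored, 1, 2, ... = event types
    order = np.argsort(times)
    times, causes = np.asarray(times)[order], np.asarray(causes)[order]
    surv, cif, out_t, out_cif = 1.0, 0.0, [0.0], [0.0]
    for t in np.unique(times[causes > 0]):
        at_risk = np.sum(times >= t)
        d_any = np.sum((times == t) & (causes > 0))
        d_k = np.sum((times == t) & (causes == cause_of_interest))
        cif += surv * d_k / at_risk          # increment uses survival just before t
        surv *= 1 - d_any / at_risk          # overall (all-cause) Kaplan-Meier update
        out_t.append(t)
        out_cif.append(cif)
    return np.array(out_t), np.array(out_cif)

rng = np.random.default_rng(0)
t1, t2, c = rng.exponential(5, 300), rng.exponential(8, 300), rng.exponential(10, 300)
obs = np.minimum(np.minimum(t1, t2), c)
cause = np.where(obs == c, 0, np.where(t1 <= t2, 1, 2))
t, F1 = cuminc(obs, cause, cause_of_interest=1)
print("estimated P(cause-1 event by t=5):", F1[t <= 5][-1])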
APA, Harvard, Vancouver, ISO, and other styles
19

Graversen, Therese. "Statistical and computational methodology for the analysis of forensic DNA mixtures with artefacts." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:4c3bfc88-25e7-4c5b-968f-10a35f5b82b0.

Full text
Abstract:
This thesis proposes and discusses a statistical model for interpreting forensic DNA mixtures. We develop methods for estimation of model parameters and assessing the uncertainty of the estimated quantities. Further, we discuss how to interpret the mixture in terms of predicting the set of contributors. We emphasise the importance of challenging any interpretation of a particular mixture, and for this purpose we develop a set of diagnostic tools that can be used in assessing the adequacy of the model to the data at hand as well as in a systematic validation of the model on experimental data. An important feature of this work is that all methodology is developed entirely within the framework of the adopted model, ensuring a transparent and consistent analysis. To overcome the challenge that lies in handling the large state space for DNA profiles, we propose a representation of a genotype that exhibits a Markov structure. Further, we develop methods for efficient and exact computation in a Bayesian network. An implementation of the model and methodology is available through the R package DNAmixtures.
APA, Harvard, Vancouver, ISO, and other styles
20

Wang, Ruoying. "A Methodology for the Analysis of Fly Activity Data." Digital Commons @ East Tennessee State University, 2012. https://dc.etsu.edu/honors/132.

Full text
Abstract:
Experiments to learn about the effect of light, sex, and diet on the activity of flies generate great quantities of data that must be analyzed. Since different researchers and students participate in the analysis of those experiments, it is convenient to have a methodology for analyzing the experimental data using software, so that the data can be analyzed in a uniform way. Being a double major in mathematics and biology, I am interested in: (1) deciding which statistical procedure to use to analyze the data so that the research questions of the researchers in biology are answered; (2) recommending how to implement those procedures using software in an efficient way; and (3) writing a prototype for the interpretation of the results. Those are the objectives of this work. In the thesis, we first applied two-way ANOVA to analyze the effect of two selected factors, sex (female and male) and diet (liver and non-liver), on fly activity under the dark condition and under the light condition, respectively. Next, we employed repeated measures to capture how fly activity changes over time (day in this case) and to relate the changes to the selected factors, sex and diet, again under the dark condition and under the light condition, respectively. Finally, we did a little research on the analysis of circadian rhythms and compared the results with those obtained from honey bee activity experiments carried out before.
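As a hedged illustration of the first analysis step described above, the snippet below runs a two-way ANOVA with a sex-by-diet interaction on invented activity counts; the factor levels mirror the abstract, but the data, effect sizes, and software choice (statsmodels) are assumptions made for the example.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
sex = np.repeat(["female", "male"], 40)
diet = np.tile(np.repeat(["liver", "non-liver"], 20), 2)
# invented activity counts with a small sex effect and a sex-by-diet interaction
activity = (50 + 5 * (sex == "male") + 3 * (diet == "liver")
            + 4 * ((sex == "male") & (diet == "liver")) + rng.normal(0, 6, 80))
flies = pd.DataFrame({"sex": sex, "diet": diet, "activity": activity})

model = smf.ols("activity ~ C(sex) * C(diet)", data=flies).fit()
print(anova_lm(model, typ=2))      # two-way ANOVA table with interaction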
APA, Harvard, Vancouver, ISO, and other styles
21

Cernat, Alexandru. "Evaluating mode differences in longitudinal data : moving to a mixed mode paradigm of survey methodology." Thesis, University of Essex, 2015. http://repository.essex.ac.uk/15739/.

Full text
Abstract:
Collecting and combining data using multiple modes of interview (e.g., face-to-face, telephone, Web) is becoming common practice in survey agencies. This is also true for longitudinal studies, a special type of survey that applies questionnaires repeatedly to the same respondents. In this PhD I investigate if and how collecting information using different modes can impact data quality in panel studies. Chapters 2 and 3 investigate how a sequential telephone - face-to-face mixed mode design can bias reliability, validity and estimates of change compared to a single mode. In order to achieve this goal I have used an experimental design from the Understanding Society Innovation Panel. The analyses have shown that there are only small differences in reliability and validity between the two modes but estimates of change might be overestimated in the mixed modes design. Chapter 4 investigates the measurement differences between face-to-face, telephone and Web on three scales: depression, physical activity and religiosity. We use a quasi-experimental (cross-over) design in the Health and Retirement Study. The results indicate systematic differences between interviewer modes and Web. We propose social desirability and recency as possible explanations. In Chapter 5 we investigate, using the Understanding Society Innovation Panel, whether the extra contact by email leads to increased propensity to participate in a sequential Web - face-to-face design. Using the experimental nature of our data we show that the extra contact by email in the mixed mode survey does not increase participation likelihood. One of the main difficulties in the research of (mixed) modes designs is separating the effects of selection and measurement of the modes. Chapter 6 tackles this issue by proposing equivalence testing, a statistical approach to control for measurement differences across groups, as a front-door approach to disentangle these two. A simulation study shows that this approach works and highlights the bias when the two main assumptions don't hold.
APA, Harvard, Vancouver, ISO, and other styles
22

Williams, Ulyana P. "On Some Ridge Regression Estimators for Logistic Regression Models." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3667.

Full text
Abstract:
The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As performance criteria, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monte Carlo simulation study has been executed to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of independent variables, and log odds ratio have been varied in the design of experiment. Simulation results show that under certain conditions, the ridge regression estimators outperform the maximum likelihood estimator. Moreover, an empirical data analysis supports the main findings of this study. This thesis proposed and recommended some good ridge regression estimators of the logistic regression model for practitioners in the field of health, physical and social sciences.
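A stripped-down version of the kind of comparison described above can be sketched as follows: simulate highly correlated predictors, fit a near-unpenalised logistic regression and an L2-penalised (ridge-type) one, and compare coefficient MSE. The sklearn penalty used here is a generic ridge fit, not the specific ridge estimators proposed in the thesis, and all simulation settings are invented.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p, rho, beta = 100, 4, 0.95, np.array([1.0, -1.0, 0.5, 0.5])
cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)   # highly correlated predictors

mse_ml, mse_ridge = [], []
for _ in range(200):
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))
    # very large C approximates plain maximum likelihood; moderate C is a ridge (L2) fit
    b_ml = LogisticRegression(C=1e6, max_iter=2000).fit(X, y).coef_.ravel()
    b_rr = LogisticRegression(C=1.0, max_iter=2000).fit(X, y).coef_.ravel()
    mse_ml.append(np.mean((b_ml - beta) ** 2))
    mse_ridge.append(np.mean((b_rr - beta) ** 2))

print(f"mean coefficient MSE, near-ML fit: {np.mean(mse_ml):.3f}  ridge fit: {np.mean(mse_ridge):.3f}")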
APA, Harvard, Vancouver, ISO, and other styles
23

Jensen, Krista Peine. "Probabilistic Methodology for Record Linkage Determining Robustness of Weights." BYU ScholarsArchive, 2004. https://scholarsarchive.byu.edu/etd/590.

Full text
Abstract:
Record linkage is the process that joins separately recorded pieces of information for a particular individual from one or more sources. To facilitate record linkage, a reliable computer-based approach is ideal. In genealogical research, computerized record linkage is useful in combining information for an individual across multiple censuses. In creating a computerized method for linking census records, it needs to be determined whether weights calculated from one geographical area can be used to link records from another geographical area. Research performed by Marcie Francis calculates field weights using census records from 1910 and 1920 for Ascension Parish, Louisiana. These weights are re-calculated to take into account population changes of the time period and then used on five data sets from different geographical locations to determine their robustness. HeritageQuest provided indexed census records for four states: California, Connecticut, Illinois and Michigan, in addition to Louisiana. Because the record size of California was large and we desired at least five data sets for comparison, this state was split into two groups based on geographical location. Weights for Louisiana were re-calculated to take into consideration Visual Basic code modifications for the fields "Place of Origin", "Age" and "Location" (enumeration district). The validity of these weights was a concern due to the low number of known matches present in the data set for Louisiana. Thus, to get a better feel for how weights calculated from a data source with a larger number of known matches would perform, weights were calculated for Michigan census records. Error rates obtained using weights calculated from the Michigan data set were lower than those obtained using Louisiana weights. In order to determine weight robustness, weights for Southern California were also calculated to allow for comparison between two samples. Error rates acquired using Southern California weights were much lower than either of the previously calculated error rates. This led to the decision to calculate weights for each of the data sets, take the average of the weights, and use the averaged weights to link each data set, in order to take into account fluctuations of the population between geographical locations. Error rates obtained when using the averaged weights proved to be robust enough to use in any of the geographical areas sampled. The weights obtained in this project can be used when linking any census records from 1910 and 1920. When linking census records from other decades, it is necessary to calculate new weights to account for specific time period fluctuations.
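As a generic, hedged sketch of how probabilistic record-linkage field weights are typically combined (the Fellegi-Sunter log-odds form, with invented m- and u-probabilities rather than the thesis's census-field estimates), the snippet below scores a single candidate record pair.

import math

# illustrative m- and u-probabilities per field (not the thesis's estimates):
# m = P(field agrees | records are a true match), u = P(agrees | non-match)
fields = {"surname": (0.95, 0.02), "given_name": (0.90, 0.05),
          "age": (0.85, 0.10), "place_of_origin": (0.80, 0.15)}

def field_weight(m, u, agrees):
    # Fellegi-Sunter log2 agreement / disagreement weight
    return math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))

def record_score(agreement_pattern):
    # total weight for one candidate record pair
    return sum(field_weight(*fields[f], agrees) for f, agrees in agreement_pattern.items())

pair = {"surname": True, "given_name": True, "age": False, "place_of_origin": True}
score = record_score(pair)
print(f"composite weight = {score:.2f}; link if above an upper threshold, "
      "review if between thresholds, reject otherwise")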
APA, Harvard, Vancouver, ISO, and other styles
24

Stone, R. A. "Statistical methodology and causal inference in studies of the health effects of radiation." Thesis, University of Oxford, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.375329.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Fletcher, Douglas. "Generalized Empirical Bayes: Theory, Methodology, and Applications." Diss., Temple University Libraries, 2019. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/546485.

Full text
Abstract:
Statistics
Ph.D.
The two key issues of modern Bayesian statistics are: (i) establishing a principled approach for distilling a statistical prior distribution that is consistent with the given data from an initial believable scientific prior; and (ii) development of a consolidated Bayes-frequentist data analysis workflow that is more effective than either of the two separately. In this thesis, we propose generalized empirical Bayes as a new framework for exploring these fundamental questions along with a wide range of applications spanning fields as diverse as clinical trials, metrology, insurance, medicine, and ecology. Our research marks a significant step towards bridging the "gap" between Bayesian and frequentist schools of thought that has plagued statisticians for over 250 years. Chapters 1 and 2, based on Mukhopadhyay and Fletcher (2018), introduce the core theory and methods of our proposed generalized empirical Bayes (gEB) framework that solves a long-standing puzzle of modern Bayes, originally posed by Herbert Robbins (1980). One of the main contributions of this research is to introduce and study a new class of nonparametric priors DS(G, m) that allows exploratory Bayesian modeling. However, at a practical level, major practical advantages of our proposal are: (i) computational ease (it does not require Markov chain Monte Carlo (MCMC), variational methods, or any other sophisticated computational techniques); (ii) simplicity and interpretability of the underlying theoretical framework, which is general enough to include almost all commonly encountered models; and (iii) easy integration with mainframe Bayesian analysis that makes it readily applicable to a wide range of problems. Connections with other Bayesian cultures are also presented in the chapter. Chapter 3 deals with the topic of measurement uncertainty from a new angle by introducing the foundation of nonparametric meta-analysis. We have applied the proposed methodology to real data examples from astronomy, physics, and medical disciplines. Chapter 4 discusses some further extensions and applications of our theory to distributed big data modeling and the missing species problem. The dissertation concludes by highlighting two important areas of future work: a full Bayesian implementation workflow and potential applications in cybersecurity.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
26

Gryder, Ryan W. "Design & Analysis of a Computer Experiment for an Aerospace Conformance Simulation Study." VCU Scholars Compass, 2016. http://scholarscompass.vcu.edu/etd/4208.

Full text
Abstract:
Within NASA's Air Traffic Management Technology Demonstration # 1 (ATD-1), Interval Management (IM) is a flight deck tool that enables pilots to achieve or maintain a precise in-trail spacing behind a target aircraft. Previous research has shown that violations of aircraft spacing requirements can occur between an IM aircraft and its surrounding non-IM aircraft when it is following a target on a separate route. This research focused on the experimental design and analysis of a deterministic computer simulation which models our airspace configuration of interest. Using an original space-filling design and Gaussian process modeling, we found that aircraft delay assignments and wind profiles significantly impact the likelihood of spacing violations and the interruption of IM operations. However, we also found that implementing two theoretical advancements in IM technologies can potentially lead to promising results.
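As a hedged sketch of the general workflow the abstract describes, a space-filling design followed by a Gaussian process emulator of a deterministic simulator, the snippet below uses a Latin hypercube design and an RBF-kernel GP on an invented two-input test function; it does not reproduce the ATD-1 simulation or the original design used in the thesis.

import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def simulator(x):
    # stand-in for the deterministic airspace simulation (two scaled inputs)
    return np.sin(3 * x[:, 0]) + 0.5 * (x[:, 1] - 0.5) ** 2

# space-filling design over the two-dimensional input space
design = qmc.LatinHypercube(d=2, seed=0).random(n=40)
response = simulator(design)

# Gaussian process emulator of the simulator output
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=[0.2, 0.2]),
                              normalize_y=True).fit(design, response)

x_new = np.array([[0.25, 0.75], [0.9, 0.1]])
mean, sd = gp.predict(x_new, return_std=True)
print("emulator predictions:", mean, "with predictive sd:", sd)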
APA, Harvard, Vancouver, ISO, and other styles
27

Gardner, Sugnet. "Extensions of biplot methodology to discriminant analysis with applications of non-parametric principal components." Thesis, Stellenbosch : Stellenbosch University, 2001. http://hdl.handle.net/10019.1/52264.

Full text
Abstract:
Dissertation (PhD)--Stellenbosch University, 2001.
ENGLISH ABSTRACT: Gower and Hand offer a new perspective on the traditional biplot. This perspective provides a unified approach to principal component analysis (PCA) biplots based on Pythagorean distance; canonical variate analysis (CVA) biplots based on Mahalanobis distance; non-linear biplots based on Euclidean embeddable distances as well as generalised biplots for use with both continuous and categorical variables. The biplot methodology of Gower and Hand is extended and applied in statistical discrimination and classification. This leads to discriminant analysis by means of PCA biplots, CVA biplots, non-linear biplots as well as generalised biplots. Properties of these techniques are derived in detail. Classification regions defined for linear discriminant analysis (LDA) are applied in the CVA biplot, leading to discriminant analysis using biplot methodology. Situations where the assumptions of LDA are not met are considered and various existing alternative discriminant analysis procedures are formulated in terms of biplots; apart from PCA biplots, QDA, FDA and DSM biplots are defined, constructed and their usage illustrated. It is demonstrated that biplot methodology naturally provides for managing categorical and continuous variables simultaneously. It is shown through a simulation study that the techniques based on biplot methodology can be applied successfully to the reversal problem with categorical variables in discriminant analysis. Situations occurring in practice where existing discriminant analysis procedures based on distances from means fail are considered. After discussing self-consistency and principal curves (a form of non-parametric principal components), discriminant analysis based on distances from principal curves (a form of a conditional mean) is proposed. This biplot classification procedure, based upon principal curves, yields much better results. Bootstrapping is considered as a means of describing variability in biplots. Variability in samples as well as of axes in biplot displays receives attention. Bootstrap α-regions are defined and the ability of these regions to describe biplot variability and to detect outliers is demonstrated. Robust PCA and CVA biplots restricting the role of influential observations on biplot displays are also considered. An extensive library of S-PLUS computer programmes is provided for implementing the various discriminant analysis techniques that were developed using biplot methodology. The application of the above theoretical developments and computer software is illustrated by analysing real-life data sets. Biplots are used to investigate the degree of capital intensity of companies and to serve as an aid in risk management of a financial institution. A particular application of the PCA biplot is the TQI biplot used in industry to determine the degree to which manufactured items comply with multidimensional specifications. A further interesting application is to determine whether an Old-Cape furniture item is manufactured of stinkwood or embuia. A data set provided by the Western Cape Nature Conservation Board consisting of measurements of tortoises from the species Homopus areolatus is analysed by means of biplot methodology to determine if morphological differences exist among tortoises from different geographical regions. Allometric considerations need to be taken into account and the resulting small sample sizes in some subgroups severely limit the use of conventional statistical procedures.
Biplot methodology is also applied to classification in a diabetes data set, illustrating the combined advantage of using classification with principal curves in a robust biplot, or biplot classification where covariance matrices are unequal. A discriminant analysis problem where foraging behaviour of deer might eventually result in a change in the dominant plant species is used to illustrate biplot classification of data sets containing both continuous and categorical variables. As an example of the use of biplots with large data sets, a data set consisting of 16828 lemons is analysed using biplot methodology to investigate differences in fruit from various areas of production, cultivars and rootstocks. The proposed α-bags also provide a means of quantifying the graphical overlap among classes. This method is successfully applied in a multidimensional socio-economic data set to quantify the degree of overlap among different race groups. The application of the proposed biplot methodology in practice has an important by-product: it provides the impetus for many a new idea, e.g. applying a PCA biplot in industry led to the development of quality regions; α-bags were constructed to represent thousands of observations in the lemons data set, in turn leading to means for quantifying the degree of overlap. This illustrates the enormous flexibility of biplots - biplot methodology provides an infrastructure for many novelties when applied in practice.
APA, Harvard, Vancouver, ISO, and other styles
28

Carter, William E. "Response surface methodology for optimizing the fermentation of a cycloheximide producing streptomycete." Virtual Press, 2001. http://liblink.bsu.edu/uhtbin/catkey/1221297.

Full text
Abstract:
Many antibiotics are produced as secondary metabolites of Streptomyces species. Commercial production of an antibiotic involves the optimization of environmental parameters, genetic makeup, and medium. Selection of ingredients for both inoculum (seed) and fermentation (production) media must provide for economic production and easy downstream processing of the compound. Antibiotics are produced as secondary shunt metabolites and represent products that are not essential for primary metabolism of the cell; therefore conditions for their optimal expression may or may not be associated with good growth of the organism. Response Surface Methodology (RSM) is a collection of statistically designed experiments and analyses that directs the investigation of many factors and their interactions. This approach minimizes the number of trials required to identify critical factors and possible synergism between factors. In this research, an antifungal antibiotic produced by an unknown streptomycete collected from soil was isolated, characterized and identified as cycloheximide. RSM was then used to formulate both a seed and a production medium that optimize cycloheximide biosynthesis. For the seed medium, RSM was used in a three-step process: i) full factorial categorical screen of many factors, ii) Plackett-Burman two-level screen of promising factors, and iii) orthogonal central composite design of critical factors. Optimal 24 hour packed cell volume was found with a seed medium containing (g/L): 6.6g soluble starch, 23.4g yeast extract, and Mg K2HPO4. Additionally, the effects of inoculum age and passage on resulting cycloheximide production were studied. It was found that the negative effects of increasing inoculum age and passages on cycloheximide production could be mediated by the composition of the seed medium. For the production medium, RSM analysis of 29 ingredients suggests that an optimal production medium for cycloheximide biosynthesis should contain a combination of starch (40 g/L), corn gluten (17.8 g/L), MgSO4.7H2O (1.16 g/L), and NaCl (6.38 g/L). This final production medium resulted in a cycloheximide titer of 943 µg/ml, a 6-fold improvement in antibiotic production.
Department of Biology
APA, Harvard, Vancouver, ISO, and other styles
29

Weyenberg, Grady S. "STATISTICS IN THE BILLERA-HOLMES-VOGTMANN TREESPACE." UKnowledge, 2015. http://uknowledge.uky.edu/statistics_etds/12.

Full text
Abstract:
This dissertation is an effort to adapt two classical non-parametric statistical techniques, kernel density estimation (KDE) and principal components analysis (PCA), to the Billera-Holmes-Vogtmann (BHV) metric space for phylogenetic trees. This adaptation gives a more general framework for developing and testing various hypotheses about apparent differences or similarities between sets of phylogenetic trees than currently exists. For example, while the majority of gene histories found in a clade of organisms are expected to be generated by a common evolutionary process, numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history quite distinct from the histories of the majority of genes. Such “outlying” gene trees are considered to be biologically interesting and identifying these genes has become an important problem in phylogenetics. The R software package kdetrees, developed in Chapter 2, contains an implementation of the kernel density estimation method. The primary theoretical difficulty involved in this adaptation concerns the normalization of the kernel functions in the BHV metric space. This problem is addressed in Chapter 3. In both chapters, the software package is applied to both simulated and empirical datasets to demonstrate the properties of the method. A few first theoretical steps in the adaptation of principal components analysis to the BHV space are presented in Chapter 4. It becomes necessary to generalize the notion of a set of perpendicular vectors in Euclidean space to the BHV metric space, but there is some ambiguity about how to best proceed. We show that convex hulls are one reasonable approach to the problem. The Nye PCA algorithm provides a method of projecting onto arbitrary convex hulls in BHV space, providing the core of a modified PCA-type method.
APA, Harvard, Vancouver, ISO, and other styles
30

Ensor, Joie. "Evidence synthesis for prognosis and prediction : application, methodology and use of individual participant data." Thesis, University of Birmingham, 2017. http://etheses.bham.ac.uk//id/eprint/7759/.

Full text
Abstract:
Prognosis research summarises, explains and predicts future outcomes in patients with a particular condition. This thesis investigates the application and development of evidence synthesis methods for prognosis research, with particular attention given to improving individualised predictions from prognostic models developed and/or validated using meta-analysis techniques. A review of existing prognostic models for recurrence of venous thromboembolism highlighted several methodological and reporting issues. This motivated the development of a new model to address previous shortcomings, in particular by explicitly modelling and reporting the baseline hazard to enable individualised risk predictions over time. The new model was developed using individual participant data from several studies, using a novel internal-external cross-validation approach. This highlighted the potential for between-study heterogeneity in model performance, and motivated the investigation of recalibration methods to substantially improve consistency in model performance across populations. Finally, a new multiple imputation method was developed to investigate the impact of missing threshold information in meta-analysis of prognostic test accuracy. Computer code was developed to implement the method, and applied examples indicated missing thresholds could have a potentially large impact on conclusions. A simulation study indicated that the new method generally improves on the current standard, in terms of bias, precision and coverage.
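A schematic sketch of the internal-external cross-validation idea mentioned above follows, using simulated studies and a plain logistic model rather than the thesis's survival models; the study data, predictors, and performance measure are all assumptions made for illustration.

```python
# Internal-external cross-validation: each study is left out in turn, the model
# is fitted on the remaining studies, and performance is checked externally.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
studies = []
for s in range(5):                        # five hypothetical studies
    x = rng.normal(size=(200, 3))
    logit = 0.8 * x[:, 0] - 0.5 * x[:, 1] + rng.normal(0, 0.3)  # study-level shift
    y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
    studies.append((x, y))

for held_out in range(len(studies)):
    x_train = np.vstack([x for i, (x, y) in enumerate(studies) if i != held_out])
    y_train = np.concatenate([y for i, (x, y) in enumerate(studies) if i != held_out])
    x_test, y_test = studies[held_out]
    model = LogisticRegression().fit(x_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(x_test)[:, 1])
    print(f"study {held_out} held out: external c-statistic = {auc:.3f}")
```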
APA, Harvard, Vancouver, ISO, and other styles
31

Joshi, Shirish. "Simulation-optimization studies : under efficient simulation strategies, and a novel response surface methodology algorithm /." Diss., This resource online, 1993. http://scholar.lib.vt.edu/theses/available/etd-06062008-170545/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

O'Connor, Andrew N. "A general cause based methodology for analysis of dependent failures in system risk and reliability assessments." Thesis, University of Maryland, College Park, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=3587283.

Full text
Abstract:

Traditional parametric Common Cause Failure (CCF) models quantify the soft dependencies between component failures through the use of empirical ratio relationships. Furthermore, CCF modeling has been essentially restricted to identical components in redundant formations. While this has been advantageous in allowing the prediction of system reliability with little or no data, it has been prohibitive in other applications such as modeling the characteristics of a system design or including the characteristics of failure when assessing the risk significance of a failure or degraded performance event (known as an event assessment).

This dissertation extends the traditional definition of CCF to model soft dependencies between like and non-like components. It does this through the explicit modeling of soft dependencies between systems (coupling factors), such as sharing a maintenance team or sharing a manufacturer. By modeling the soft dependencies explicitly, these relationships can be individually quantified based on the specific design of the system, which allows for more accurate event assessment given knowledge of the failure cause.

Since the most data-informed model in use is the Alpha Factor Model (AFM), it has been used as the baseline for the proposed solutions. This dissertation analyzes the US Nuclear Regulatory Commission's Common Cause Failure Database event data to determine the suitability of the data and failure taxonomy for use in the proposed cause-based models. Recognizing that CCF events are characterized by full or partial presence of "root cause" and "coupling factor," a refined failure taxonomy is proposed that provides a direct link between the failure cause category and the coupling factors.

This dissertation proposes two CCF models: (a) the Partial Alpha Factor Model (PAFM), which accounts for the relevant coupling factors based on system design and provides event assessment with knowledge of the failure cause, and (b) the General Dependency Model (GDM), which uses a Bayesian network to model the soft dependencies between components. This is done through the introduction of three parameters for each failure cause that relate to component fragility, failure cause rate, and failure cause propagation probability.

APA, Harvard, Vancouver, ISO, and other styles
33

Shen, Zhiyuan. "EMPIRICAL LIKELIHOOD AND DIFFERENTIABLE FUNCTIONALS." UKnowledge, 2016. http://uknowledge.uky.edu/statistics_etds/14.

Full text
Abstract:
Empirical likelihood (EL) is a recently developed nonparametric method of statistical inference. It has been shown by Owen (1988, 1990) and many others that the empirical likelihood ratio (ELR) method can be used to produce nice confidence intervals or regions. Owen (1988) shows that -2logELR converges to a chi-square distribution with one degree of freedom subject to a linear statistical functional in terms of distribution functions. However, a generalization of Owen's result to the right-censored data setting is difficult since no explicit maximization can be obtained under a constraint in terms of distribution functions. Pan and Zhou (2002), instead, study the EL with right-censored data using a linear statistical functional constraint in terms of cumulative hazard functions. In this dissertation, we extend Owen's (1988) and Pan and Zhou's (2002) results subject to non-linear but Hadamard differentiable statistical functional constraints. For this purpose, a study of differentiable functionals with respect to hazard functions is carried out. We also generalize our results to two-sample problems. Stochastic process and martingale theories will be applied to prove the theorems. The confidence intervals based on the EL method are compared with other available methods. Real data analysis and simulations are used to illustrate our proposed theorem with an application to Gini's absolute mean difference.
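As a minimal illustration of the classical one-sample result cited above (Owen, 1988), the sketch below computes -2 log ELR for a mean and compares it with a chi-square(1) critical value; the data are simulated, and the hazard-based extensions developed in the dissertation are not attempted here.

```python
# One-sample empirical likelihood ratio for a mean: -2 log ELR vs. chi-square(1).
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def neg2_log_elr_mean(x: np.ndarray, mu0: float) -> float:
    z = x - mu0
    if mu0 <= x.min() or mu0 >= x.max():
        return np.inf                      # mu0 outside the convex hull of the data
    eps = 1e-8
    lo = -1.0 / z.max() + eps              # keep all implied weights positive
    hi = -1.0 / z.min() - eps
    g = lambda lam: np.sum(z / (1.0 + lam * z))   # Lagrange multiplier equation
    lam = brentq(g, lo, hi)
    return 2.0 * np.sum(np.log1p(lam * z))

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=80)
stat = neg2_log_elr_mean(x, mu0=2.0)
print(f"-2 log ELR = {stat:.3f}, reject at 5%: {stat > chi2.ppf(0.95, df=1)}")
```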
APA, Harvard, Vancouver, ISO, and other styles
34

Bosse, Anna L. "Comparing the Structural Components Variance Estimator and U-Statistics Variance Estimator When Assessing the Difference Between Correlated AUCs with Finite Samples." VCU Scholars Compass, 2017. https://scholarscompass.vcu.edu/etd/5194.

Full text
Abstract:
Introduction: The structural components variance estimator proposed by DeLong et al. (1988) is a popular approach used when comparing two correlated AUCs. However, this variance estimator is biased and could be problematic with small sample sizes. Methods: A U-statistics-based variance estimator approach is presented and compared with the structural components variance estimator through a large-scale simulation study under different finite-sample size configurations. Results: The U-statistics variance estimator was unbiased for the true variance of the difference between correlated AUCs regardless of the sample size and had lower RMSE than the structural components variance estimator, providing better Type I error control and greater power. The structural components variance estimator provided increasingly biased variance estimates as the correlation between biomarkers increased. Discussion: When comparing two correlated AUCs, it is recommended that the U-statistics variance estimator be used whenever possible, especially for finite sample sizes and highly correlated biomarkers.
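A compact sketch of the DeLong-style structural-components calculation for two correlated AUCs is given below; the marker values are simulated, and the code is illustrative rather than the thesis's implementation of either estimator.

```python
# Structural components for comparing two correlated AUCs on the same subjects.
import numpy as np

def auc_components(cases, controls):
    # psi(x, y) = 1 if case score > control score, 0.5 if tied, 0 otherwise
    psi = (cases[:, None] > controls[None, :]).astype(float)
    psi += 0.5 * (cases[:, None] == controls[None, :])
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)   # AUC, V10, V01

rng = np.random.default_rng(3)
m, n = 50, 70                                  # numbers of cases and controls
# Two correlated markers per subject, built from a shared latent component.
cases = rng.normal(1.0, 1.0, size=(m, 1)) + 0.5 * rng.normal(size=(m, 2))
controls = rng.normal(0.0, 1.0, size=(n, 1)) + 0.5 * rng.normal(size=(n, 2))

aucs, v10, v01 = [], [], []
for k in range(2):
    a, c10, c01 = auc_components(cases[:, k], controls[:, k])
    aucs.append(a); v10.append(c10); v01.append(c01)

s10 = np.cov(np.vstack(v10))                   # 2x2 covariance of case components
s01 = np.cov(np.vstack(v01))                   # 2x2 covariance of control components
cov = s10 / m + s01 / n
contrast = np.array([1.0, -1.0])
z = (aucs[0] - aucs[1]) / np.sqrt(contrast @ cov @ contrast)
print(f"AUC1 = {aucs[0]:.3f}, AUC2 = {aucs[1]:.3f}, z = {z:.2f}")
```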
APA, Harvard, Vancouver, ISO, and other styles
35

Jerome, Guensley. "A Comparison of Some Confidence Intervals for Estimating the Kurtosis Parameter." FIU Digital Commons, 2017. http://digitalcommons.fiu.edu/etd/3489.

Full text
Abstract:
Several methods have been proposed to estimate the kurtosis of a distribution. The three common estimators are g2, G2 and b2. This thesis addresses the performance of these estimators by comparing them under the same simulation environments and conditions. The performance of these estimators is compared through confidence intervals by determining the average width and the probability of capturing the kurtosis parameter of a distribution. We considered and compared classical and non-parametric methods in constructing these intervals. The classical method assumes normality to construct the confidence intervals, while the non-parametric methods rely on bootstrap techniques. The bootstrap techniques used are the Bias-Corrected Standard Bootstrap, Efron’s Percentile Bootstrap, Hall’s Percentile Bootstrap and the Bias-Corrected Percentile Bootstrap. We found significant differences in the performance of classical and bootstrap estimators. We observed that the parametric method works well in terms of coverage probability when data come from a normal distribution, while the bootstrap intervals struggled to consistently reach the nominal 95% coverage level. When sample data are from a distribution with negative kurtosis, both parametric and bootstrap confidence intervals performed well, although we noticed that bootstrap methods tended to produce narrower intervals. When it comes to positive kurtosis, bootstrap methods performed slightly better than classical methods in coverage probability. Among the three kurtosis estimators, G2 performed best. Among bootstrap techniques, Efron’s Percentile intervals had the best coverage.
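As one concrete example of the bootstrap intervals compared in this thesis, the sketch below builds an Efron-style percentile bootstrap confidence interval for excess kurtosis using a simple g2-type estimator; the sample is simulated, and the other interval constructions are not reproduced.

```python
# Efron percentile bootstrap confidence interval for excess kurtosis.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(4)
x = rng.standard_t(df=6, size=200)    # heavy-tailed sample; true excess kurtosis is 3
n_boot = 2000
boot = np.array([
    kurtosis(rng.choice(x, size=x.size, replace=True), fisher=True, bias=True)
    for _ in range(n_boot)
])
lower, upper = np.percentile(boot, [2.5, 97.5])
print(f"g2 = {kurtosis(x, fisher=True, bias=True):.3f}, "
      f"95% percentile bootstrap CI: ({lower:.3f}, {upper:.3f})")
```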
APA, Harvard, Vancouver, ISO, and other styles
36

Wang, Bingxia. "Estimation of Standardized Mortality Ratio in Epidemiological Studies." Fogler Library, University of Maine, 2002. http://www.library.umaine.edu/theses/pdf/WangB2002.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Charoenphol, Dares. "Using robust statistical methodology to evaluate the performance of project delivery systems| A case study of horizontal construction." Thesis, The George Washington University, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10252878.

Full text
Abstract:

The objective of this study is to demonstrate the application of the bootstrapping M-estimator (a robust Analysis of Variance, ANOVA) to test the null hypotheses of equality of means for the cost and schedule performance measures of the three project delivery systems (PDS). A statistical planned contrast methodology is utilized after the robust ANOVA to further determine where the differences in means lie.

The results of this research indicate that traditional PDS (Design-Bid-Build, DBB) outperformed the two alternative PDS (“Design-Build (DB) and Construction Manager/General Contractor (CMGC)”), DBB and CMGC outperformed DB, and DBB outperformed CMGC, for the Cost Growth and the Change Order Cost Factor performance. On the other hand, alternative PDS (“DB & CMGC”) outperformed DBB, DB and CMGC (separately) outperformed DBB, and between the two alternative PDS, CMGC outperformed DB, for the Schedule Cost Growth performance.

These findings can help decision makers/owners make an informed decision regarding cost- and schedule-related aspects when choosing a PDS for their projects. Though the case study of this research is based on sample data obtained from the construction industry, the same methodology and statistical process can be applied to other industries and factors/variables of interest when the study sample data are unbalanced and the normality and homogeneity of variance assumptions are violated.
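A simplified robust-bootstrap sketch in the spirit of the analysis described in this abstract: a percentile-bootstrap test of a planned contrast between 20%-trimmed group means for three hypothetical delivery-system groups. This is a related robust procedure, not the bootstrapped M-estimator used in the dissertation, and all data are invented.

```python
# Percentile-bootstrap test of a planned contrast between trimmed group means.
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(10)
groups = [rng.lognormal(0.0, 0.6, size=n) for n in (25, 40, 18)]   # skewed, unbalanced data
contrast = np.array([1.0, -0.5, -0.5])        # group 1 versus the average of groups 2 and 3

def contrast_value(samples):
    return float(contrast @ [trim_mean(g, 0.2) for g in samples])

observed = contrast_value(groups)
boot = []
for _ in range(2000):
    resampled = [rng.choice(g, size=g.size, replace=True) for g in groups]
    # centre each bootstrap draw so resampling reflects the null hypothesis
    boot.append(contrast_value(resampled) - observed)
p_value = np.mean(np.abs(boot) >= np.abs(observed))
print(f"contrast = {observed:.3f}, bootstrap p-value = {p_value:.3f}")
```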

APA, Harvard, Vancouver, ISO, and other styles
38

Skeppström, Kirlna. "Radon in Groundwater- Influencing Factors and Prediction Methodology for a Swedish Environment." Licentiate thesis, KTH, Land and Water Resources Engineering, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-491.

Full text
Abstract:

This thesis presents a method for predicting radon (222Rn) levels in groundwater on a general scale, within an area of approximately 185 x 145 km2. The method applies to Swedish conditions, where 222Rn is the main contributor to natural radioactivity. Prediction of radon potential in groundwater is complex because there are many different factors affecting radon content, including geochemical and flow processes. The proposed method is based on univariate and multivariate statistical analyses and investigated the influence of different factors such as bedrock, soils, uranium distribution, altitude, distance to fractures and land use. A statistical variable based method (the RV method) was used to estimate risk values related to different radon concentrations. The method was calibrated and tested on more than 4400 drilled wells in Stockholm County. The weighted index (risk value) estimated by the RV method provided a fair prediction of radon potential in groundwater on a general scale. The RV method was successful in estimating the median radon concentration within 12 subregions (at a local scale, each of area 25 x 25 km2), based on weighted index values obtained from half of all wells tested. A high correlation between risk values and median radon concentrations was demonstrated. The factors bedrock, altitude, distance to fracture zone and distribution of uranium in bedrock were found to be significant in the prediction approach on a general scale. Visual data mining, which comprised analysis of 3D images, was a useful tool for data exploration but could not be used as an independent method for drawing conclusions regarding radon in groundwater. Results of a field study based on 38 drilled wells on the island of Ljusterö in the Stockholm archipelago showed that 222Rn concentrations in groundwater were weakly correlated to the parent elements (226Ra and 238U) in solution.

APA, Harvard, Vancouver, ISO, and other styles
39

McQuerry, Kristen J. "Statistical Methods for Handling Intentional Inaccurate Responders." UKnowledge, 2016. http://uknowledge.uky.edu/statistics_etds/17.

Full text
Abstract:
In self-report data, participants who provide incorrect responses are known as intentional inaccurate responders. This dissertation provides statistical analyses for addressing intentional inaccurate responses in the data. Previous work with adolescent self-report data labeled survey participants who intentionally provide inaccurate answers as mischievous responders. This phenomenon also occurs in clinical research. For example, pregnant women who smoke may report that they are nonsmokers. Our advantage is that we do not rely solely on self-report answers and can verify responses with lab values. Currently, there is no clear method for handling these intentional inaccurate respondents when it comes to making statistical inferences. We propose using an EM algorithm to account for the intentional behavior while maintaining all responses in the data. The performance of this model is evaluated using simulated data and real data. The strengths and weaknesses of the EM algorithm approach will be demonstrated.
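The EM idea can be illustrated with a toy two-component mixture: among self-reported nonsmokers, a lab value is modelled as coming either from truthful responders or from intentional inaccurate responders, and EM estimates the misreporting proportion. This is an assumption-laden sketch, not the dissertation's model.

```python
# Basic EM for a two-component normal mixture of a lab value.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
truthful = rng.normal(1.0, 0.5, size=170)       # low lab values
misreporting = rng.normal(4.0, 1.0, size=30)    # high lab values despite "nonsmoker" answer
x = np.concatenate([truthful, misreporting])

pi, mu, sigma = 0.5, np.array([x.min(), x.max()]), np.array([1.0, 1.0])
for _ in range(200):
    # E-step: posterior probability each observation is a misreporter (component 1)
    d0 = (1 - pi) * norm.pdf(x, mu[0], sigma[0])
    d1 = pi * norm.pdf(x, mu[1], sigma[1])
    r = d1 / (d0 + d1)
    # M-step: update mixing proportion, means and standard deviations
    pi = r.mean()
    mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
    sigma = np.array([
        np.sqrt(np.average((x - mu[0]) ** 2, weights=1 - r)),
        np.sqrt(np.average((x - mu[1]) ** 2, weights=r)),
    ])
print(f"estimated misreporting proportion: {pi:.3f}")
```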
APA, Harvard, Vancouver, ISO, and other styles
40

Dufresne, Stephane. "A hierarchical modeling methodology for the definition and selection of requirements." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/24755.

Full text
Abstract:
Thesis (Ph.D.)--Aerospace Engineering, Georgia Institute of Technology, 2008.
Committee Chair: Mavris, Dimitri; Committee Member: Bishop, Carlee; Committee Member: Costello, Mark; Committee Member: Nickol, Craig; Committee Member: Schrage, Daniel
APA, Harvard, Vancouver, ISO, and other styles
41

Fang, Zhou. "Reweighting methods in high dimensional regression." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:26f8541a-9e2d-466a-84aa-e6850c4baba9.

Full text
Abstract:
In this thesis, we focus on the application of covariate reweighting with Lasso-style methods for regression in high dimensions, particularly where p ≥ n. We place particular focus on the case of sparse regression under a priori grouping structures. In such problems, even in the linear case, accurate estimation is difficult. Various authors have suggested ideas such as the Group Lasso and the Sparse Group Lasso, based on convex penalties, or alternatively methods like the Group Bridge, which rely on convergence under repetition to some local minimum of a concave penalised likelihood. We propose in this thesis a methodology that uses concave penalties to inspire a procedure in which we compute weights from an initial estimate and then perform a single, second reweighted Lasso. This procedure -- the Co-adaptive Lasso -- obtains excellent results in empirical experiments, and we present some theoretical prediction and estimation error bounds. Further, several extensions and variants of the procedure are discussed and studied. In particular, we propose a Lasso-style method for additive isotonic regression in high dimensions, the Liso algorithm, and enhance it using the Co-adaptive methodology. We also propose a method of producing rule-based regression estimates for high-dimensional non-parametric regression that often outperforms the current leading method, the RuleFit algorithm. We also discuss extensions involving robust statistics applied to weight computation, repeating the algorithm, and online computation.
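A generic reweighted-Lasso sketch of the two-stage idea described above: an initial Lasso fit supplies covariate weights, and a single second Lasso is run on rescaled columns. The simple 1/|beta| weights below are a placeholder assumption in the adaptive-Lasso style, not the Co-adaptive weighting derived in the thesis.

```python
# Two-stage reweighted Lasso implemented by rescaling the design columns.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n, p = 100, 200                              # p >= n, sparse truth
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3, -2, 1.5, -1, 2]
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Step 1: initial Lasso fit.
init = Lasso(alpha=0.1).fit(X, y)

# Step 2: covariate weights from the initial estimate; small coefficients
# receive large penalties in the second stage.
eps = 1e-3
w = 1.0 / (np.abs(init.coef_) + eps)

# Step 3: a single reweighted Lasso, implemented by rescaling the columns.
second = Lasso(alpha=0.1).fit(X / w, y)
beta_hat = second.coef_ / w
print("selected covariates:", np.flatnonzero(beta_hat).tolist())
```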
APA, Harvard, Vancouver, ISO, and other styles
42

Sánchez, Niubó Albert. "Development of Statistical Methodology to Study the Incidence of Drug Use." Doctoral thesis, Universitat de Barcelona, 2014. http://hdl.handle.net/10803/131161.

Full text
Abstract:
This work aims to contribute methodologically to the epidemiology of drug use, particularly the estimation of incidence. No incidence figures for drug use in Spain had been published prior to those appearing in these articles, and relatively little has been published for other countries. Since around 2000, the European Monitoring Centre for Drugs and Drug Addiction (EMCDDA), which is an agency of the European Union, has been making a concerted effort to promote the determination and publication of drug use incidence figures, given their great importance in designing prevention policies. The approaches used and results obtained by our research have been presented at three EMCDDA meetings (in 2007, 2008 and 2012), at a monographic meeting on incidence promoted by the Norwegian Institute for Alcohol and Drug Research (SIRUS) in 2009, and in the framework of a European project on new methodological tools for policy and programme evaluation (JUST/2010/DPIP/AG/1410) which ran from 2010 to 2012. This work therefore contributes not only by presenting drug use incidence results for Spain, but also by describing the development of methods and sharing ideas that may be adapted for use in other countries.
APA, Harvard, Vancouver, ISO, and other styles
43

Paprzycki, Peter Pawel. "Developing a Methodological Framework for the Analysis of Perceptions: A Case Study of the National Public Opinion Survey “The EU in the Eyes of Asia-Pacific”." University of Toledo / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1430493813.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Chernoff, Parker. "Sabermetrics - Statistical Modeling of Run Creation and Prevention in Baseball." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3663.

Full text
Abstract:
The focus of this thesis was to investigate which baseball metrics are most conducive to run creation and prevention. Stepwise regression and Liu estimation were used to formulate two models for the dependent variables and were also used for cross-validation. Finally, the predicted values were fed into the Pythagorean Expectation formula to predict a team’s most important goal: winning. Each model fit the data well, and collinearity amongst offensive predictors was assessed using variance inflation factors. Hits, walks, and home runs allowed, infield putouts, errors, defense-independent earned run average ratio, defensive efficiency ratio, saves, runners left on base, shutouts, and walks per nine innings were significant defensive predictors. Doubles, home runs, walks, batting average, and runners left on base were significant offensive regressors. Both models produced error rates below 3% for run prediction, and together they did an excellent job of estimating a team’s per-season win ratio.
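The Pythagorean Expectation step mentioned above can be shown in a couple of lines; the season run totals below are hypothetical, and the classical exponent of 2 is assumed.

```python
# Pythagorean Expectation: runs scored and allowed converted into a win ratio.
runs_scored, runs_allowed = 765.0, 693.0      # hypothetical season totals
expected_win_pct = runs_scored**2 / (runs_scored**2 + runs_allowed**2)
print(f"expected win ratio: {expected_win_pct:.3f}")   # about 0.549
```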
APA, Harvard, Vancouver, ISO, and other styles
45

Ogden, Mitchell. "Communications and Methodologies in Crime Geography: Contemporary Approaches to Disseminating Criminal Incidence and Research." Digital Commons @ East Tennessee State University, 2019. https://dc.etsu.edu/etd/3652.

Full text
Abstract:
Many tools exist to assist law enforcement agencies in mitigating criminal activity. For centuries, academics have used statistics in the study of crime and criminals, and more recently, police departments have made use of spatial statistics and geographic information systems in that pursuit. Clustering and hot spot methods of analysis are popular in this application for their relative simplicity of interpretation and ease of use. With recent advancements in geospatial technology, it is easier than ever to publicly share data through visual communication tools like web applications and dashboards. Sharing data and the results of analyses boosts transparency and the public image of police agencies, an image important to maintaining public trust in law enforcement and active participation in community safety.
APA, Harvard, Vancouver, ISO, and other styles
46

Schoergendorfer, Angela. "BAYESIAN SEMIPARAMETRIC GENERALIZATIONS OF LINEAR MODELS USING POLYA TREES." UKnowledge, 2011. http://uknowledge.uky.edu/gradschool_diss/214.

Full text
Abstract:
In a Bayesian framework, prior distributions on a space of nonparametric continuous distributions may be defined using Polya trees. This dissertation addresses statistical problems for which the Polya tree idea can be utilized to provide efficient and practical methodological solutions. One problem considered is the estimation of risks, odds ratios, or other similar measures that are derived by specifying a threshold for an observed continuous variable. It has been previously shown that fitting a linear model to the continuous outcome under the assumption of a logistic error distribution leads to more efficient odds ratio estimates. We will show that deviations from the assumption of logistic error can result in substantial bias in odds ratio estimates. A one-step approximation to the Savage-Dickey ratio will be presented as a Bayesian test for distributional assumptions in the traditional logistic regression model. The approximation utilizes least-squares estimates in place of a full Bayesian Markov chain simulation, and the equivalence of inferences based on the two implementations will be shown. A framework for flexible, semiparametric estimation of risks in the case that the assumption of logistic error is rejected will be proposed. A second application deals with regression scenarios in which residuals are correlated and their distribution evolves over an ordinal covariate such as time. In the context of prediction, such complex error distributions need to be modeled carefully and flexibly. The proposed model introduces dependent but separate Polya tree priors for each time point, thus pooling information across time points to model gradual changes in distributional shapes. Theoretical properties of the proposed model will be outlined, and its potential predictive advantages in simulated scenarios and real data will be demonstrated.
APA, Harvard, Vancouver, ISO, and other styles
47

Boone, Edward L. "Bayesian Methodology for Missing Data, Model Selection and Hierarchical Spatial Models with Application to Ecological Data." Diss., Virginia Tech, 2003. http://hdl.handle.net/10919/26141.

Full text
Abstract:
Ecological data are often fraught with problems such as missing data and spatial correlation. In this dissertation, we use a data set collected by the Ohio EPA as motivation for studying techniques to address these problems. The data set is concerned with the benthic health of Ohio's waterways. A new method for incorporating covariate structure and missing data mechanisms into missing data analysis is considered. This method allows us to detect relationships that other popular methods cannot. We then further extend this method into model selection. In the special case where the unobserved covariates are assumed to be normally distributed, we use the Bayesian Model Averaging method to average the models, select the highest-probability model, and carry out variable assessment. Accuracy in calculating the posterior model probabilities using the Laplace approximation and an approximation based on the Bayesian Information Criterion (BIC) is explored. Simulation shows that the Laplace approximation is superior to the BIC-based approximation. Finally, hierarchical spatial linear models are considered for the data, and we show how to combine analyses that have spatial correlation within and between clusters.
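One ingredient mentioned above, the BIC-based approximation to posterior model probabilities, can be sketched for a small all-subsets example with simulated Gaussian data; the code is illustrative and does not reproduce the dissertation's missing-data or spatial models.

```python
# BIC-based approximate posterior model probabilities over all covariate subsets.
import itertools
import numpy as np

rng = np.random.default_rng(7)
n = 150
X = rng.normal(size=(n, 3))
y = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)    # covariate 2 is irrelevant

def bic_gaussian(Xs, y):
    n = len(y)
    design = np.column_stack([np.ones(n), Xs]) if Xs.shape[1] else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    k = design.shape[1] + 1                                 # +1 for the error variance
    return n * np.log(rss / n) + k * np.log(n)

models = [m for r in range(4) for m in itertools.combinations(range(3), r)]
bics = np.array([bic_gaussian(X[:, list(m)], y) for m in models])
weights = np.exp(-0.5 * (bics - bics.min()))
post = weights / weights.sum()                              # approximate model probabilities
for m, p in zip(models, post):
    print(f"model {m}: approx. posterior probability {p:.3f}")
```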
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
48

Zaldivar, Cynthia. "On the Performance of some Poisson Ridge Regression Estimators." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3669.

Full text
Abstract:
Multiple regression models play an important role in analyzing and making predictions about data. Prediction accuracy becomes lower when two or more explanatory variables in the model are highly correlated. One solution is to use ridge regression. The purpose of this thesis is to study the performance of available ridge regression estimators for Poisson regression models in the presence of moderately to highly correlated variables. As performance criteria, we use mean square error (MSE), mean absolute percentage error (MAPE), and percentage of times the maximum likelihood (ML) estimator produces a higher MSE than the ridge regression estimator. A Monte Carlo simulation study was conducted to compare performance of the estimators under three experimental conditions: correlation, sample size, and intercept. It is evident from simulation results that all ridge estimators performed better than the ML estimator. We proposed new estimators based on the results, which performed very well compared to the original estimators. Finally, the estimators are illustrated using data on recreational habits.
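A rough sketch of a ridge-penalised Poisson regression fitted by penalised iteratively reweighted least squares, compared with the unpenalised ML fit on collinear simulated data, is given below. The penalty value, the data, and the simplification of penalising the intercept are all assumptions for illustration, not the specific estimators studied in the thesis.

```python
# Ridge-penalised Poisson regression via penalised IRLS (lam = 0 gives the ML fit).
import numpy as np

def poisson_ridge_irls(X, y, lam, n_iter=50):
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        W = mu                                   # Poisson working weights
        z = X @ beta + (y - mu) / mu             # working response
        A = X.T @ (W[:, None] * X) + lam * np.eye(p)
        beta = np.linalg.solve(A, X.T @ (W * z))
    return beta

rng = np.random.default_rng(8)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)          # highly correlated with x1
X = np.column_stack([np.ones(n), x1, x2])
beta_true = np.array([0.5, 0.4, 0.4])
y = rng.poisson(np.exp(X @ beta_true))

ml = poisson_ridge_irls(X, y, lam=0.0)
ridge = poisson_ridge_irls(X, y, lam=1.0)
print("ML estimate:   ", np.round(ml, 3))
print("ridge estimate:", np.round(ridge, 3))
```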
APA, Harvard, Vancouver, ISO, and other styles
49

Vähänikkilä, H. (Hannu). "Statistical methods in dental research, with special reference to time-to-event methods." Doctoral thesis, Oulun yliopisto, 2015. http://urn.fi/urn:isbn:9789526207933.

Full text
Abstract:
Abstract Statistical methods are an essential part of published dental research. It is important to evaluate the use of these methods to improve the quality of dental research. In the first part, the aim of this interdisciplinary study is to investigate the development of the use of statistical methods in dental journals, the quality of statistical reporting, and the reporting of statistical techniques and results in dental research papers, with special reference to time-to-event methods. In the second part, the focus is specifically on time-to-event methods, and the aim is to demonstrate the strength of time-to-event methods in analysing detailed data about the development of oral health. The first part of this study is based on an evaluation of dental articles from five dental journals. The second part of the study is based on empirical data from 28 municipal health centres in order to study variations in the survival of tooth health. There were different profiles in the statistical content among the journals. The quality of statistical reporting was quite low in the journals. The use of time-to-event methods increased from 1996 to 2007 in the evaluated dental journals. However, the benefits of these methods have not yet been fully realised in dental research. The current study added new information regarding the status of statistical methods in dental research. Our study also showed that complex time-to-event analysis methods can be utilized even with detailed information on each tooth in large groups of study subjects. Authors of dental articles might apply the results of this study to improve study protocols and planning as well as the statistical sections of their research articles.
Tiivistelmä Statistical research methods are an essential part of dental research. It is important to study the use of these methods so that the quality of dental research can be improved. The first part of this interdisciplinary study examines the use of different statistical methods and study designs, the quality of reporting, and the use of time-to-event analysis methods in dental articles. The second part demonstrates the strength of these analysis methods for analysing large study populations. The material for the first part consists of articles from five dental journals. The material for the second part consisted of patients who received dental care at 28 health centres in different parts of Finland. The journals differed from one another in the use of statistical methods and the presentation of results. The quality of statistical reporting in the journals was deficient. The use of time-to-event analysis methods increased during 1996–2007. Time-to-event analysis methods measure the follow-up time from a defined starting point to a defined endpoint. The studies in this thesis showed that time-to-event analysis methods are well suited to the analysis of large study populations. However, the benefits of these methods have not yet been fully brought out in dental publications. This study provided new information on the use of statistical research methods in dental research. Authors of articles can make use of the results of this study when planning dental research.
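As a minimal example of the time-to-event methods discussed in both abstracts, the sketch below computes a Kaplan-Meier survival curve for hypothetical, right-censored tooth-survival times; the data and variable names are invented.

```python
# Kaplan-Meier estimator for right-censored follow-up times.
import numpy as np

def kaplan_meier(time, event):
    order = np.argsort(time)
    time, event = time[order], event[order]
    curve, s = [], 1.0
    for t in np.unique(time[event == 1]):
        at_risk = np.sum(time >= t)
        failures = np.sum((time == t) & (event == 1))
        s *= 1.0 - failures / at_risk
        curve.append((t, s))
    return curve

rng = np.random.default_rng(9)
follow_up = rng.exponential(scale=8.0, size=30)      # years until failure of a filling
censor = rng.uniform(2.0, 12.0, size=30)             # administrative censoring times
time = np.minimum(follow_up, censor)
event = (follow_up <= censor).astype(int)
for t, s in kaplan_meier(time, event):
    print(f"t = {t:5.2f} years, S(t) = {s:.3f}")
```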
APA, Harvard, Vancouver, ISO, and other styles
50

Frie, Gudrun Louise. "Organizing, describing, analyzing, and retrieving the dissertation literature in special education : a case study using microcomputer technology to develop a personal information retrieval system." Thesis, University of British Columbia, 1988. http://hdl.handle.net/2429/28047.

Full text
Abstract:
This study analyzed special education dissertations published in Dissertation Abstracts International, 1980 to 1985. Keywords, describing the substantive content of each abstract and title, were assigned according to principles used in controlled and natural language indexing. A bibliometric analysis was performed to identify a core vocabulary representing frequent concepts and ideas and the most productive institutions awarding doctorates in special education. Descriptive and bivariate (chi-square) analyses were also conducted to illustrate relationships between demographic variables: year of completion, sex of author, degree awarded, page length, institution; and content variables: category of special education, research type, and data analysis technique. Finally, a microcomputer information retrieval system was developed to provide better access to the dissertation literature. Results indicated that a greater number of women chose to do doctoral work, graduated with Ph.D. degrees, and wrote longer theses. The keyword index illustrated a wide diversity of topics being pursued. The microcomputer personal information retrieval system is multifaceted: it is available for searching, can describe the vocabulary, and will accommodate the growing dissertation base in special education.
Education, Faculty of
Educational and Counselling Psychology, and Special Education (ECPS), Department of
Graduate
APA, Harvard, Vancouver, ISO, and other styles
